
Refactor AI form fill task and add sentry to catch errors #23

Open
sainak wants to merge 6 commits into master from sainak/code-optimizations

Conversation


@sainak sainak commented Dec 12, 2025

Summary by CodeRabbit

  • Improvements

    • Stronger error handling, standardized failure reporting, and clearer user-facing messages during AI form filling.
    • Centralized prompt handling, provider-agnostic AI processing, improved transcription/OCR flows, smarter caching, and more reliable token/accounting metadata.
    • Enhanced logging, timing, trace capture, and mapping of AI outputs back to original form fields; quota-aware processing with explicit status updates.
  • New

    • Exposes a last-modified timestamp for quota records.
    • Stores a hashed Terms & Conditions value at startup.
    • Adds a public base prompt constant and a dedicated error type for AI processing.


Copilot AI review requested due to automatic review settings December 12, 2025 16:20

coderabbitai Bot commented Dec 12, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Changes touch three modules:

  • care_scribe/models/scribe_quota.py: ScribeQuota gained a public last_modified_date property that returns modified_date, falling back to created_date, then datetime.min.
  • care_scribe/settings.py: settings now compute and store self.tnc_hash by hashing SCRIBE_TNC during validation.
  • care_scribe/tasks/scribe.py: substantially reworked. Adds BASE_PROMPT and a public ScribeError exception; restructures the AI form-fill flow with quota checks, provider parsing, caching, transcription branching, expanded error handling and logging (including Sentry traces), more explicit token accounting, and an updated output schema whose post-processing maps generated qX fields back to original form field IDs.
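The fallback chain on last_modified_date can be illustrated with a minimal plain-Python sketch. This is not the actual Django model; the field names come from the walkthrough, while the standalone class is an assumption for illustration only:

```python
from datetime import datetime
from typing import Optional

class ScribeQuota:
    """Illustrative stand-in for the Django model described above."""

    def __init__(self, modified_date: Optional[datetime] = None,
                 created_date: Optional[datetime] = None):
        self.modified_date = modified_date
        self.created_date = created_date

    @property
    def last_modified_date(self) -> datetime:
        # Falls through: modified_date, then created_date, then datetime.min
        return self.modified_date or self.created_date or datetime.min
```

A record that has never been saved reports datetime.min, which gives callers a total ordering without None checks.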

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Primary focus: care_scribe/tasks/scribe.py — dense, heterogeneous changes across control flow, error handling, quota integration, provider parsing, caching, schema/transcription generation, and token accounting.
  • care_scribe/models/scribe_quota.py — verify last_modified_date correctness, timezone-aware vs naive datetime behavior, and fallback semantics.
  • care_scribe/settings.py — confirm self.tnc_hash computation timing, deterministic hashing, and side-effect safety during validate().
  • Check ScribeError propagation, Sentry trace usage, and that processing metadata records tracebacks/tokens consistently.
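The "deterministic hashing" concern for tnc_hash can be made concrete with a hypothetical stand-in for care_scribe/utils.hash_string. The choice of SHA-256 is an assumption; the point is that hashing the same SCRIBE_TNC string during validate() always yields the same digest and has no side effects:

```python
import hashlib

def hash_string(value: str) -> str:
    """Hypothetical hash_string helper (the real one lives in
    care_scribe/utils.py); deterministic for a given input."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()
```

Because the digest is stable across processes, the stored tnc_hash can be compared against a client-submitted hash to detect whether the user accepted the current Terms & Conditions text.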

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 33.33%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled).
  • Title Check: ✅ Passed. The title accurately reflects the main changes: refactoring the AI form fill task (care_scribe/tasks/scribe.py) and integrating Sentry error handling, which are the primary modifications in the changeset.
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment
    • Commit unit tests in branch sainak/code-optimizations

Comment @coderabbitai help to get the list of available commands and usage tips.


Copilot AI left a comment


Pull request overview

This PR refactors the AI form fill task to improve error handling and monitoring by introducing a custom ScribeError exception class and adding Sentry integration for production error tracking. The refactoring consolidates error handling into a single try-except block, eliminates early returns in favor of raising exceptions, and improves logging consistency throughout the task.

Key changes:

  • Introduced ScribeError exception class with processing context support
  • Added Sentry integration for error capture in production environments
  • Refactored error handling to use exceptions instead of early returns with status updates
  • Added tnc_hash property to plugin settings and last_modified_date property to ScribeQuota model
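A hedged sketch of what a ScribeError carrying processing context might look like; the exact constructor signature is an assumption inferred from the review text, not the PR's actual code:

```python
from typing import Optional

class ScribeError(Exception):
    """Error raised during AI form-fill processing; carries the
    processing metadata collected so far for failure reporting."""

    def __init__(self, message: str, processing_context: Optional[dict] = None):
        super().__init__(message)
        # Default to an empty dict so handlers can always read it safely
        self.processing_context = processing_context or {}
```

Raising this instead of returning early lets a single outer except block record the partial processing state, mark the form FAILED, and forward the exception to Sentry.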

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
care_scribe/tasks/scribe.py Major refactoring of the AI form fill task with new error handling, custom exception class, improved logging format, and consolidated error processing
care_scribe/settings.py Added tnc_hash attribute to plugin_settings for caching the hash of terms and conditions
care_scribe/models/scribe_quota.py Added last_modified_date property and reorganized imports



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
care_scribe/models/scribe_quota.py (1)

14-15: Module-level date computation causes stale month boundaries.

start_of_month and end_of_month are computed once at import time. In long-running processes (e.g., Celery workers), these values become stale when the month changes, causing calculate_used() to query incorrect date ranges by default.

Move the computation inside calculate_used() or make them functions:

-start_of_month = timezone.now().replace(day=1, hour=0, minute=0, second=0, microsecond=0)
-end_of_month = (start_of_month + timezone.timedelta(days=31)).replace(day=1) - timezone.timedelta(seconds=1)
+
+def get_month_boundaries():
+    start = timezone.now().replace(day=1, hour=0, minute=0, second=0, microsecond=0)
+    end = (start + timezone.timedelta(days=31)).replace(day=1) - timezone.timedelta(seconds=1)
+    return start, end

Then update calculate_used:

-    def calculate_used(self, from_date=start_of_month, to_date=end_of_month):
+    def calculate_used(self, from_date=None, to_date=None):
+        if from_date is None or to_date is None:
+            from_date, to_date = get_month_boundaries()
care_scribe/tasks/scribe.py (1)

467-481: Missing processing_context in additional error paths.

These ScribeError raises should include processing_context=processing for consistent error tracking:

                     if retry > 0:
-                        raise ScribeError(
-                            f"AI response was malformed, please retry: {str(ai_resp.candidates[0].finish_message)}"
-                        )
+                        raise ScribeError(
+                            f"AI response was malformed, please retry: {ai_resp.candidates[0].finish_message!s}",
+                            processing_context=processing,
+                        )
                     else:
                         processing["retries"] = retry + 1
                         return generate_response(retry + 1)
                 return ai_resp

             ai_response = generate_response()

             if ai_response.candidates[0].finish_reason != types.FinishReason.STOP:
-                raise ScribeError(
-                    f"AI response did not finish successfully: {str(ai_response.candidates[0].finish_reason)}: "
-                    f"{str(ai_response.candidates[0].finish_message)}"
-                )
+                raise ScribeError(
+                    f"AI response did not finish successfully: {ai_response.candidates[0].finish_reason!s}: "
+                    f"{ai_response.candidates[0].finish_message!s}",
+                    processing_context=processing,
+                )

Note: Also using !s conversion flag as suggested by RUF010 instead of explicit str() calls.

🧹 Nitpick comments (2)
care_scribe/tasks/scribe.py (2)

399-408: Refactor cache fetching to avoid raising exception for control flow.

The current pattern raises Exception("Cache not found") inside a try block just to trigger the except handler. This is an anti-pattern flagged by linter rules TRY002/TRY301.

         cached_content = None
-        try:
-            logger.info(
-                f"Scribe[{form.external_id}] status: fetching cache for scribe_{chat_model}_{output_schema_hash}"
-            )
-            cached_content = client.caches.get(name=f"scribe_{chat_model}_{output_schema_hash}")
-            if not cached_content:
-                raise Exception("Cache not found")
-        except Exception as e:
-            logger.error(f"Scribe[{form.external_id}] error fetching cache: {e}")
+        logger.info(
+            f"Scribe[{form.external_id}] status: fetching cache for scribe_{chat_model}_{output_schema_hash}"
+        )
+        try:
+            cached_content = client.caches.get(name=f"scribe_{chat_model}_{output_schema_hash}")
+        except Exception:
+            logger.exception(f"Scribe[{form.external_id}] error fetching cache")
+            cached_content = None
+
+        if not cached_content:
+            logger.info(f"Scribe[{form.external_id}] cache not found, will create new cache")

621-629: Use logger.exception for automatic stack trace inclusion.

logger.error doesn't include the stack trace automatically. Since this is the main error handler, use logger.exception for better debugging context (especially useful when Sentry is not configured).

     except Exception as e:
-        logger.error(f"Scribe[{form.external_id}] status: error occurred while processing form: {e}")
+        logger.exception(f"Scribe[{form.external_id}] status: error occurred while processing form")

         if settings.SENTRY_DSN:
             import sentry_sdk

             sentry_sdk.capture_exception(e)
         else:
             processing["error_trace"] = "".join(traceback.format_exception(None, e, e.__traceback__))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 141629b and 77348e0.

📒 Files selected for processing (3)
  • care_scribe/models/scribe_quota.py (2 hunks)
  • care_scribe/settings.py (2 hunks)
  • care_scribe/tasks/scribe.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
care_scribe/settings.py (1)
care_scribe/utils.py (1)
  • hash_string (3-5)
🪛 Ruff (0.14.8)
care_scribe/tasks/scribe.py
  • 150-153: Abstract raise to an inner function (TRY301); avoid specifying long messages outside the exception class (TRY003)
  • 168-171: TRY301; TRY003
  • 174: TRY301; TRY003
  • 177-180: TRY301; TRY003
  • 183-186: TRY301; TRY003
  • 203-206: TRY301; TRY003
  • 209-212: TRY301; TRY003
  • 225: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904); TRY003
  • 341: TRY003
  • 406: TRY301; create your own exception (TRY002); TRY003
  • 407: Do not catch blind exception: Exception (BLE001)
  • 408: Use logging.exception instead of logging.error (TRY400)
  • 428-430: TRY003
  • 461: TRY003
  • 467-469: TRY301; TRY003
  • 468: Use explicit conversion flag (RUF010)
  • 478-481: TRY301; TRY003
  • 479, 480: RUF010
  • 581: TRY003
  • 586: TRY400
  • 587: TRY003
  • 621: BLE001
  • 622: TRY400

🔇 Additional comments (3)
care_scribe/tasks/scribe.py (3)

28-53: Well-structured prompt template and error handling.

The centralized BASE_PROMPT improves maintainability, and ScribeError with processing_context provides good structured error handling for tracking processing state during failures.


427-430: Missing processing_context in cache creation error.

                 if match and "constraint-is-too-big" in match.group(1):
-                    raise ScribeError(
-                        "The form is too large for Scribe. Please try again with a smaller form."
-                    ) from e
+                    raise ScribeError(
+                        "The form is too large for Scribe. Please try again with a smaller form.",
+                        processing_context=processing,
+                    ) from e

Likely an incorrect or invalid review comment.


202-212: No changes needed — quota comparison logic is correct as written.

The code intentionally compares user_quota.used against facility_quota.tokens_per_user (the facility's current per-user token limit) rather than user_quota.tokens. This is the correct behavior. While user_quota.tokens is initialized to facility_quota.tokens_per_user at quota creation time, the facility's tokens_per_user setting can be updated dynamically by administrators via the API. By comparing against the facility's current limit, the code ensures that quota enforcement reflects the facility's current policy, allowing admins to adjust per-user limits and have those changes apply to all users going forward.
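The enforcement rule described above can be sketched in a few lines. This is a toy model with assumed field names, not the PR's code; it shows why reading the facility's current tokens_per_user (rather than the snapshot copied onto the user quota at creation) makes admin limit changes apply immediately:

```python
class FacilityQuota:
    """Toy stand-in for the facility quota record (field name assumed)."""

    def __init__(self, tokens_per_user: int):
        self.tokens_per_user = tokens_per_user

def user_within_quota(user_used: int, facility: FacilityQuota) -> bool:
    # Compare against the facility's *current* per-user limit, not a
    # value frozen on the user quota when it was created.
    return user_used < facility.tokens_per_user
```

If an admin raises tokens_per_user, previously blocked users pass the check on their next request with no per-user migration needed.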


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (5)
care_scribe/tasks/scribe.py (5)

225-251: Verify schema key mapping is stable and matches UI field order.
You build processed_fields_no_keys via enumeration of processed_fields.values() and later convert back via enumeration of processed_fields.keys(). This is order-dependent; it’s OK only if insertion order is guaranteed to match the form’s field order (and never changes across recursion).

# Confirm field ordering is deterministic and matches the frontend form order.
# Expect: `process_fields()` traverses in a stable order and `form.form_data` preserves order.
rg -n "def process_fields|processed_fields_no_keys|converted_response" -C 3

Also applies to: 582-587
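The order-dependent round-trip this comment warns about can be demonstrated with a toy example (field IDs invented). Stripping keys to q0..qN and restoring them by re-enumerating the same dict is safe only because Python dicts preserve insertion order, and only if no recursion reorders or drops entries between the two passes:

```python
def strip_keys(fields: dict) -> dict:
    # Replace real field IDs with positional q0..qN names for the AI call
    return {f"q{i}": v for i, v in enumerate(fields.values())}

def restore_keys(fields: dict, response: dict) -> dict:
    # Map q0..qN back by enumerating the *same* dict's keys in order
    return {k: response[f"q{i}"] for i, k in enumerate(fields.keys())}
```

Any divergence between the two enumerations (a filtered field, a re-built dict) silently assigns values to the wrong form fields, which is why the comment asks for a determinism check rather than a code change.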


199-206: Provider override uses != "" and can mis-route when key is None (and lacks exception chaining).
This repeats a prior review point: use truthiness for the key, and use raise ... from None inside the except ValueError: to satisfy B904.

             try:
                 api_provider, chat_model = form.chat_model.split("/")
-                if api_provider == "openai" and plugin_settings.SCRIBE_AZURE_API_KEY != "":
+                if api_provider == "openai" and plugin_settings.SCRIBE_AZURE_API_KEY:
                     api_provider = "azure"
             except ValueError:
-                raise ScribeError("Invalid chat model format. Use 'provider/model_name'.")
+                raise ScribeError("Invalid chat model format. Use 'provider/model_name'.") from None

379-420: Cache name/display_name should be sanitized (chat_model may contain /), and cache-fetch flow has redundant raise/logging.
This repeats prior feedback: scribe_{chat_model}_... can break naming if chat_model includes slashes; also raise Exception("Cache not found") then swallowing it is unnecessary, and logger.error should be logger.exception if you care about the traceback.

-            logger.info(
-                f"Scribe[{form.external_id}] status: fetching cache for scribe_{chat_model}_{output_schema_hash}"
-            )
-            cached_content = client.caches.get(name=f"scribe_{chat_model}_{output_schema_hash}")
-            if not cached_content:
-                raise Exception("Cache not found")
-        except Exception as e:
-            logger.error(f"Scribe[{form.external_id}] error fetching cache: {e}")
+            safe_model = re.sub(r"[^a-zA-Z0-9_.-]", "_", chat_model)
+            cache_key = f"scribe_{safe_model}_{output_schema_hash}"
+            logger.info(f"Scribe[{form.external_id}] status: fetching cache for {cache_key}")
+            cached_content = client.caches.get(name=cache_key)
+        except Exception:
+            logger.exception(f"Scribe[{form.external_id}] error fetching cache")
...
-                    cached_content = client.caches.create(
+                    cached_content = client.caches.create(
                         model=chat_model,
                         config=types.CreateCachedContentConfig(
-                            display_name=f"scribe_{chat_model}_{output_schema_hash}",
+                            display_name=cache_key,

463-472: next(...) can raise StopIteration if Gemini returns no function_call part.
This repeats prior feedback: add a default and raise a clear ScribeError.

-            ai_response_json = next(
-                part.function_call.args for part in ai_response.candidates[0].content.parts if part.function_call
-            )
+            function_call_part = next(
+                (part for part in ai_response.candidates[0].content.parts if part.function_call),
+                None,
+            )
+            if not function_call_part:
+                raise ScribeError("AI response did not contain a function call")
+            ai_response_json = function_call_part.function_call.args

563-568: Use logger.exception for parse failures (and avoid dumping full response).
This repeats prior feedback: logger.error(... {ai_response}) loses stack trace and may print PHI; prefer logger.exception with identifiers only.

             except Exception as e:
-                logger.error(f"Scribe[{form.external_id}] error parsing response: {ai_response}")
+                logger.exception(f"Scribe[{form.external_id}] error parsing non-google AI response JSON")
                 raise ScribeError("Error parsing AI response") from e
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77348e0 and 0a3e2d4.

📒 Files selected for processing (1)
  • care_scribe/tasks/scribe.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
care_scribe/tasks/scribe.py (3)
care_scribe/models/scribe.py (4)
  • Scribe (103-154)
  • Status (104-111)
  • audio_file_ids (134-141)
  • document_file_ids (144-151)
care_scribe/models/scribe_quota.py (4)
  • ScribeQuota (17-72)
  • last_modified_date (35-36)
  • calculate_used (50-61)
  • save (66-72)
care_scribe/utils.py (1)
  • hash_string (3-5)
🪛 Ruff (0.14.8)
care_scribe/tasks/scribe.py
  • 148: Abstract raise to an inner function (TRY301); avoid specifying long messages outside the exception class (TRY003)
  • 163: TRY301; TRY003
  • 166: TRY301; TRY003
  • 169: TRY301; TRY003
  • 172: TRY301; TRY003
  • 189: TRY301; TRY003
  • 192: TRY301; TRY003
  • 205: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904); TRY003
  • 321: TRY003
  • 386: TRY301; create your own exception (TRY002); TRY003
  • 387: Do not catch blind exception: Exception (BLE001)
  • 388: Use logging.exception instead of logging.error (TRY400)
  • 408-410: TRY003
  • 441: TRY003
  • 447-449: TRY301; TRY003
  • 448: Use explicit conversion flag (RUF010)
  • 458-461: TRY301; TRY003
  • 459, 460: RUF010
  • 561: TRY003
  • 566: TRY400
  • 567: TRY003
  • 601: BLE001
  • 602: TRY400

🔇 Additional comments (1)
care_scribe/tasks/scribe.py (1)

14-17: Verify OpenAI SDK 2.2.0 support for response_format with json_schema and strict: true, and Azure OpenAI parity.

The code at lines 426–431 correctly handles Google GenAI's constraint: tool_config and tools are set to None when cached_content is used, and vice versa, which matches the API requirement. The GenerateContentConfig call at lines 426–431 properly conditions these parameters.

However, confirm that openai==2.2.0 supports response_format={"type":"json_schema", ..., "strict": true} (feature added August 2024) and that Azure OpenAI parity exists for this SDK version. Azure structured outputs support was added in SDK releases around Jan–Feb 2025; verify this version covers your Azure deployment.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (4)
care_scribe/settings.py (1)

120-131: tnc_hash becomes stale after plugin_settings.reload(); recompute it (or clear it) in reload().

reload() clears cached attrs, but tnc_hash is neither cleared nor recomputed, so runtime updates to SCRIBE_TNC will not be reflected.

 def reload(self) -> None:
@@
     if hasattr(self, "_user_settings"):
         delattr(self, "_user_settings")
+
+    # Keep `tnc_hash` in sync with the latest SCRIBE_TNC after reload
+    self.tnc_hash = hash_string(self.SCRIBE_TNC)
care_scribe/tasks/scribe.py (3)

294-325: High compliance risk: logs/Sentry can capture PHI (transcript, full AI response, raw dumps). Redact/gate content logging and avoid storing raw responses.

At minimum, remove direct content logs and avoid persisting ai_response_raw (or gate behind an explicit allowlist flag defaulting off).

-                    logger.info(f"Scribe[{form.external_id}] transcript: {transcript}")
+                    logger.info(
+                        f"Scribe[{form.external_id}] transcript generated (chars={len(transcript)})"
+                    )
@@
-                processing["ai_response_raw"] = ai_response.model_dump()
+                # Beware PHI: avoid storing full raw model responses in meta by default
+                # processing["ai_response_raw"] = ai_response.model_dump()
@@
-                processing["ai_response_raw"] = ai_response.model_dump()
+                # Beware PHI: avoid storing full raw model responses in meta by default
+                # processing["ai_response_raw"] = ai_response.model_dump()
@@
-        logger.info(f"AI response: {ai_response_json}")
+        logger.info(
+            f"Scribe[{form.external_id}] AI response received (filled_fields={len(converted_response)})"
+        )

Also consider a Sentry scrubber / event processor before capture_exception(e) if exceptions can include request/meta payloads.

Also applies to: 566-570, 582-593, 607-613
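The "Sentry scrubber / event processor" suggestion can be sketched as a before_send hook. The key names below are assumptions based on the fields flagged in this comment; the hook would be registered via sentry_sdk.init(before_send=scrub_event):

```python
# Keys that may carry PHI and must never leave the process (assumed names)
SENSITIVE_KEYS = {"transcript", "ai_response_raw"}

def scrub_event(event: dict, hint=None) -> dict:
    """Redact sensitive extras in-place before the event is sent."""
    extra = event.get("extra", {})
    for key in SENSITIVE_KEYS & extra.keys():
        extra[key] = "[redacted]"
    return event
```

Returning the (scrubbed) event lets it through; a before_send hook can also return None to drop an event entirely.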


133-139: Exception handler can crash if Scribe.objects.get(...) fails; move the get() inside try (or guard form).

 @shared_task
 def process_ai_form_fill(external_id):
     current_time = timezone.now()
-
-    form = Scribe.objects.get(external_id=external_id, status=Scribe.Status.READY)
+    form = None
 
     processing = {
         "created_date": current_time.isoformat(),
     }
 
     try:
+        form = Scribe.objects.get(external_id=external_id, status=Scribe.Status.READY)
@@
-    except Exception as e:
-        logger.error(f"Scribe[{form.external_id}] status: error occurred while processing form: {e}")
+    except Exception as e:
+        logger.exception(
+            f"Scribe[{getattr(form, 'external_id', external_id)}] status: error occurred while processing form"
+        )
@@
-        form.meta["processings"] = [*form.meta.get("processings", []), processing]
-        form.status = Scribe.Status.FAILED
-        form.save()
+        if form:
+            form.meta["processings"] = [*form.meta.get("processings", []), processing]
+            form.status = Scribe.Status.FAILED
+            form.save()

Also applies to: 604-617


147-173: audio_file_ids / document_file_ids are methods; using them as attributes breaks checks/filters. Call once and reuse.

(Your care_scribe/models/scribe.py snippets show these are methods returning QuerySets.)

     try:
@@
-        if not form.audio_file_ids and not form.document_file_ids:
+        audio_ids = list(form.audio_file_ids())
+        document_ids = list(form.document_file_ids())
+
+        if not audio_ids and not document_ids:
             raise ScribeError("No audio or documents associated with the Scribe. Your upload might have failed.")
@@
-            if not facility_quota.allow_ocr and not user_quota.allow_ocr and len(form.document_file_ids) > 0:
+            if not facility_quota.allow_ocr and not user_quota.allow_ocr and len(document_ids) > 0:
                 raise ScribeError("OCR is not enabled for this user or facility.")
@@
-        audio_files = ScribeFile.objects.filter(external_id__in=form.audio_file_ids)
+        audio_files = ScribeFile.objects.filter(external_id__in=audio_ids)
@@
-        if (has_images := len(form.document_file_ids) > 0) or total_audio_duration > (3 * 60 * 1000):
+        if (has_images := len(document_ids) > 0) or total_audio_duration > (3 * 60 * 1000):
@@
-        if api_provider != "google" and len(form.document_file_ids) == 0:
+        if api_provider != "google" and len(document_ids) == 0:
             del output_schema["properties"]["__scribe__transcription"]
             output_schema["required"].remove("__scribe__transcription")
@@
-        document_file_objects = ScribeFile.objects.filter(external_id__in=form.document_file_ids)
+        document_file_objects = ScribeFile.objects.filter(external_id__in=document_ids)

Also applies to: 222-224, 255-266, 333-334

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0a3e2d4 and 16a8177.

📒 Files selected for processing (2)
  • care_scribe/settings.py (2 hunks)
  • care_scribe/tasks/scribe.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
care_scribe/tasks/scribe.py (6)
care_scribe/models/scribe.py (4)
  • Scribe (103-154)
  • Status (104-111)
  • audio_file_ids (134-141)
  • document_file_ids (144-151)
care_scribe/models/scribe_file.py (2)
  • ScribeFile (14-56)
  • save (42-53)
care_scribe/models/scribe_quota.py (4)
  • ScribeQuota (17-72)
  • last_modified_date (35-36)
  • calculate_used (50-61)
  • save (66-72)
care_scribe/utils.py (1)
  • hash_string (3-5)
care_scribe/serializers/scribe.py (1)
  • save (94-144)
care_scribe/serializers/scribe_file.py (1)
  • create (50-67)
care_scribe/settings.py (1)
care_scribe/utils.py (1)
  • hash_string (3-5)
🪛 Ruff (0.14.8)
care_scribe/tasks/scribe.py
  • 148: Abstract raise to an inner function (TRY301); avoid specifying long messages outside the exception class (TRY003)
  • 163: TRY301; TRY003
  • 166: TRY301; TRY003
  • 169: TRY301; TRY003
  • 172: TRY301; TRY003
  • 189: TRY301; TRY003
  • 192: TRY301; TRY003
  • 205: TRY003
  • 321: TRY003
  • 386: Do not catch blind exception: Exception (BLE001)
  • 387: Use logging.exception instead of logging.error (TRY400)
  • 407-409: TRY003
  • 440: TRY003
  • 446-448: TRY301; TRY003
  • 447: Use explicit conversion flag (RUF010)
  • 457-460: TRY301; TRY003
  • 458, 459: RUF010
  • 473: TRY301; TRY003
  • 564: TRY003
  • 569: TRY400
  • 570: TRY003
  • 604: BLE001
  • 605: TRY400

care_scribe/settings.py
  • 120: Do not call getattr with a constant attribute value; it is not any safer than normal property access (B009)


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (3)
care_scribe/tasks/scribe.py (3)

137-137: CRITICAL: Exception handler will crash if Scribe.objects.get() fails.

If line 137 raises Scribe.DoesNotExist or any other exception, the except block at lines 604-618 will attempt to access form.external_id (line 605), causing a secondary NameError that masks the original error.

This was flagged in past reviews. Initialize form = None before the query and guard all form references in the exception handler:

 @shared_task
 def process_ai_form_fill(external_id):
     current_time = timezone.now()
-
+    form = None
     form = Scribe.objects.get(external_id=external_id, status=Scribe.Status.READY)

And in the exception handler:

     except Exception as e:
-        logger.error(f"Scribe[{form.external_id}] status: error occurred while processing form at line {e.__traceback__.tb_lineno}: {e}")
+        form_id = form.external_id if form else external_id
+        logger.exception(f"Scribe[{form_id}] status: error occurred while processing form at line {e.__traceback__.tb_lineno}")
...
-        form.meta["processings"] = [*form.meta.get("processings", []), processing]
-        form.status = Scribe.Status.FAILED
-        form.save()
+        if form:
+            form.meta["processings"] = [*form.meta.get("processings", []), processing]
+            form.status = Scribe.Status.FAILED
+            form.save()
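The guard above can be exercised as a self-contained sketch; `DoesNotExist` and `get_form` here are stand-ins for the ORM lookup, not the task's real helpers:

```python
class DoesNotExist(Exception):
    """Stand-in for Scribe.DoesNotExist."""

def get_form(external_id):
    # Simulate the lookup at the top of the task failing.
    raise DoesNotExist(external_id)

def process_ai_form_fill(external_id):
    form = None  # defined before the query so the handler can test it
    try:
        form = get_form(external_id)
        return f"processed: {form}"
    except Exception:
        # Fall back to the task argument when the lookup itself failed,
        # instead of raising a secondary NameError on form.external_id.
        form_id = form.external_id if form else external_id
        return f"failed: {form_id}"

result = process_ai_form_fill("abc123")
```

Without the `form = None` initialization, the `except` branch would itself crash and mask the original error.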

386-387: Use logger.exception to preserve stack traces.

Line 387 (and also 569, 605) uses logger.error which discards the stack trace. Use logger.exception in exception handlers to preserve debugging context.

             except Exception as e:
-                logger.error(f"Scribe[{form.external_id}] error fetching cache: {e}")
+                logger.exception(f"Scribe[{form.external_id}] error fetching cache")

Apply the same change at lines 569 and 605.

As per coding guidelines from past reviews, logger.exception should always be used in exception handlers.
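The difference is easy to demonstrate with the standard `logging` module: inside an `except` block, `logger.exception` is `logger.error` plus the active traceback.

```python
import io
import logging

# Capture log output in memory to compare the two calls.
stream = io.StringIO()
logger = logging.getLogger("scribe-demo")
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False
logger.setLevel(logging.DEBUG)

def fetch_cache():
    raise KeyError("missing cache entry")

try:
    fetch_cache()
except Exception:
    # logger.error records only the message...
    logger.error("error fetching cache")
    # ...while logger.exception appends the full traceback automatically.
    logger.exception("error fetching cache")

output = stream.getvalue()
```

Only the second call leaves a `Traceback (most recent call last)` section in the captured output.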


324-324: MAJOR: Transcript logging exposes PHI.

Logging the full transcript (line 324) and potentially logging it again in error handlers exposes Protected Health Information. Similarly, line 582 logs the full AI response JSON which likely contains patient data.

Remove content logging or gate it behind an explicit configuration flag (default off):

-                    logger.info(f"Scribe[{form.external_id}] transcript: {transcript}")
+                    logger.info(f"Scribe[{form.external_id}] transcript length: {len(transcript)} chars")

And at line 582:

-        logger.info(f"AI response: {ai_response_json}")
+        logger.info(f"Scribe[{form.external_id}] AI response: {len(ai_response_json)} fields populated")

Based on learnings, past reviews flagged PHI exposure in logs and Sentry traces. Ensure Sentry integration is configured with proper scrubbing for PHI fields.
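A minimal sketch of the gating approach; `SCRIBE_LOG_CONTENT` is a hypothetical flag, not an existing plugin setting:

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical opt-in flag; in the plugin this could live in plugin_settings.
SCRIBE_LOG_CONTENT = False

def log_transcript(form_id: str, transcript: str) -> str:
    """Log transcript info and return the message that was emitted."""
    if SCRIBE_LOG_CONTENT:
        # Full content only when explicitly enabled (never in production).
        message = f"Scribe[{form_id}] transcript: {transcript}"
    else:
        # Default: metadata only, so no PHI reaches the log files.
        message = f"Scribe[{form_id}] transcript length: {len(transcript)} chars"
    logger.info(message)
    return message

msg = log_transcript("abc123", "patient reports mild fever")
```

With the flag off, only the length reaches the logs.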

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16a8177 and 1fc8fe5.

📒 Files selected for processing (1)
  • care_scribe/tasks/scribe.py (9 hunks)
🔇 Additional comments (2)
care_scribe/tasks/scribe.py (2)

139-141: CRITICAL: processing dict contains non-JSON-serializable datetime.

Line 140 stores current_time.isoformat(), but if any error occurs before the processing dict is finalized, the datetime objects in processing will cause "Object of type datetime is not JSON serializable" errors when saving to form.meta at lines 593 or 615.

Based on learnings, user shivankacker reported this exact error in past reviews. Ensure all values in processing are JSON-serializable:

     processing = {
-        "created_date": current_time.isoformat(),
+        "created_date": str(current_time.isoformat()),
     }

Also verify that no other datetime objects are added to processing without serialization.

Likely an incorrect or invalid review comment.
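A quick check confirms why the suggested `str(...)` wrapper is redundant: a raw `datetime` breaks `json.dumps`, but `.isoformat()` already returns a plain string.

```python
import json
from datetime import datetime, timezone

current_time = datetime(2025, 12, 12, 16, 20, tzinfo=timezone.utc)

# A raw datetime value is not JSON serializable...
try:
    json.dumps({"created_date": current_time})
    raw_ok = True
except TypeError:
    raw_ok = False

# ...while .isoformat() yields a JSON-safe string, so an extra
# str() around it changes nothing.
processing = {"created_date": current_time.isoformat()}
encoded = json.dumps(processing)
```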


147-148: Remove this comment - the code is correct as written.

audio_file_ids and document_file_ids are Python @property decorated methods that return Django QuerySets. They are properly accessed as attributes (without parentheses) throughout the code, and QuerySets support all operations being performed: truthiness checks, len(), and iteration in ORM filters. Calling them as methods (form.audio_file_ids()) would fail.

Likely an incorrect or invalid review comment.
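A toy model (standing in for the Django model, with illustrative data) shows the `@property` behavior being described:

```python
class Scribe:
    """Toy stand-in for the Django model."""

    def __init__(self, files):
        self.files = files

    @property
    def audio_file_ids(self):
        # Runs method logic, but callers read it as a plain attribute.
        return [f["id"] for f in self.files if f["kind"] == "audio"]

form = Scribe([{"id": "a1", "kind": "audio"}, {"id": "d1", "kind": "document"}])

ids = form.audio_file_ids  # no parentheses: attribute access
try:
    form.audio_file_ids()  # this calls the returned list, which fails
    called = True
except TypeError:
    called = False
```

Attribute access works and supports truthiness and `len()`; adding parentheses raises `TypeError`.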

Member

@shivankacker shivankacker left a comment


lgtm, clankers have something to say


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (5)
care_scribe/tasks/scribe.py (5)

386-387: Use logger.exception to preserve stack trace.

Per static analysis hint TRY400, replace logger.error(f"... {e}") with logger.exception(...) to automatically capture the full stack trace.

             except Exception as e:
-                logger.error(f"Scribe[{form.external_id}] error fetching cache: {e}")
+                logger.exception(f"Scribe[{form.external_id}] error fetching cache")

571-573: Use logger.exception to preserve stack trace.

Per static analysis hint TRY400 and past review, replace logger.error with logger.exception to capture the full traceback.

             except Exception as e:
-                logger.error(f"Scribe[{form.external_id}] error parsing response: {ai_response}")
+                logger.exception(f"Scribe[{form.external_id}] error parsing response")
                 raise ScribeError("Error parsing AI response") from e

134-148: CRITICAL: Exception handler will crash if Scribe.objects.get() fails.

The form retrieval at line 137 is outside the try block. If the query raises DoesNotExist or any other exception, the except block at line 607-620 will fail with NameError when accessing form.external_id and form.meta.

🔎 Recommended fix
 @shared_task
 def process_ai_form_fill(external_id):
     current_time = timezone.now()
+    form = None
 
-    form = Scribe.objects.get(external_id=external_id, status=Scribe.Status.READY)
+    try:
+        form = Scribe.objects.get(external_id=external_id, status=Scribe.Status.READY)
+    except Scribe.DoesNotExist:
+        logger.warning(f"Scribe[{external_id}] not found or not READY; skipping")
+        return
 
     processing = {
         "created_date": current_time.isoformat(),
     }
 
     try:

And update the exception handler:

     except Exception as e:
-        logger.error(f"Scribe[{form.external_id}] status: error occurred ...")
+        logger.exception(f"Scribe[{form.external_id if form else external_id}] status: error occurred ...")
         ...
-        form.meta["processings"] = [*form.meta.get("processings", []), processing]
-        form.status = Scribe.Status.FAILED
-        form.save()
+        if form:
+            form.meta["processings"] = [*form.meta.get("processings", []), processing]
+            form.status = Scribe.Status.FAILED
+            form.save()

324-324: PHI risk: transcript content logged.

Logging the full transcript may expose patient health information (PHI) in log files. Consider logging only metadata (e.g., transcript length) or gating content logs behind a debug flag.


585-585: PHI risk: AI response logged.

Logging the full ai_response_json may expose patient health information. Consider logging only metadata or removing this log in production.

🧹 Nitpick comments (2)
care_scribe/tasks/scribe.py (2)

199-205: Consider logging the provider override.

When api_provider is openai but SCRIBE_AZURE_API_KEY is configured, the provider silently switches to azure. This could cause confusion during debugging.

🔎 Suggested improvement
             if api_provider == "openai" and plugin_settings.SCRIBE_AZURE_API_KEY:
+                logger.info(f"Scribe[{form.external_id}] overriding openai provider to azure")
                 api_provider = "azure"

607-620: Use logger.exception instead of manual formatting.

Line 608 manually formats the exception with line number, which loses the full stack trace. Use logger.exception for automatic traceback capture.

     except Exception as e:
-        logger.error(f"Scribe[{form.external_id}] status: error occurred while processing form at line {e.__traceback__.tb_lineno}: {e}")
+        logger.exception(f"Scribe[{form.external_id if form else external_id}] status: error occurred while processing form")

Note: Also requires the earlier fix to initialize form = None and guard the meta access, as flagged in the critical issue above.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1fc8fe5 and 87d94c1.

📒 Files selected for processing (1)
  • care_scribe/tasks/scribe.py
🧰 Additional context used
🪛 Ruff (0.14.8)
care_scribe/tasks/scribe.py
  • 148, 163, 166, 169, 172, 189, 192, 446-448, 457-460, 473: TRY301 (Abstract raise to an inner function), TRY003 (Avoid specifying long messages outside the exception class)
  • 205, 321, 407-409, 440, 567, 573: TRY003 (Avoid specifying long messages outside the exception class)
  • 386, 607: BLE001 (Do not catch blind exception: Exception)
  • 387, 572, 608: TRY400 (Use logging.exception instead of logging.error)
  • 447, 458, 459: RUF010 (Use explicit conversion flag)

🔇 Additional comments (12)
care_scribe/tasks/scribe.py (12)

28-47: Well-structured base prompt constant.

Extracting the prompt into a module-level constant improves maintainability and makes it easier to test or override. The placeholder {current_date_time} is appropriately replaced at runtime.
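An illustrative stand-in for the constant (the real `BASE_PROMPT` in care_scribe/tasks/scribe.py is longer) shows the runtime substitution:

```python
# Shortened stand-in for the module-level constant.
BASE_PROMPT = (
    "You are filling a clinical form. "
    "The current date and time is {current_date_time}."
)

def build_prompt(now_iso: str) -> str:
    # The placeholder is resolved at task runtime, not import time.
    return BASE_PROMPT.format(current_date_time=now_iso)

prompt = build_prompt("2025-12-12T16:20:00+00:00")
```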


54-85: LGTM on AI client factory.

The provider-based client instantiation is clean. The Google credentials handling with base64 decoding is appropriate for containerized deployments.


88-130: Clean multi-provider message construction.

The function properly abstracts provider-specific message formats. The file handling with base64 encoding for OpenAI/Azure and native bytes for Google is correct.


174-186: Good edge-case handling for quota recalculation across months.

The logic correctly recalculates used quota when the first request of a new month arrives, preventing edge cases where stale quota data from the previous month would incorrectly block requests.


225-251: Good approach for normalizing field IDs in the schema.

The qN key mapping avoids potential issues with special characters in original field IDs when used in the JSON schema. The reverse mapping at lines 589-593 correctly restores the original keys.
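The round trip can be sketched with hypothetical field IDs containing characters that are awkward in JSON-schema keys:

```python
# Hypothetical field IDs; the real ones come from the form definition.
field_ids = ["patient.name", "vitals/bp", "notes (free text)"]

# Forward: normalize to q1, q2, ... for the schema sent to the model.
schema_keys = {f"q{i}": field_id for i, field_id in enumerate(field_ids, start=1)}

# Simulated model output keyed by the normalized names.
ai_response = {"q1": "Jane Doe", "q2": "120/80", "q3": "stable"}

# Reverse: restore original field IDs using the same enumeration order.
restored = {
    field_id: ai_response[f"q{i}"]
    for i, field_id in enumerate(field_ids, start=1)
}
```

As long as both directions enumerate `field_ids` in the same order, the mapping is lossless.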


420-452: Retry logic is sound with single retry attempt.

The generate_response function correctly handles the known Gemini issue with malformed function calls by retrying once. The exception chaining is properly implemented.
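A minimal sketch of the single-retry shape; `MalformedFunctionCall` and `call_model` stand in for the Gemini error and the provider call:

```python
class MalformedFunctionCall(Exception):
    """Stand-in for the provider's malformed-function-call error."""

attempts = {"count": 0}

def call_model():
    # Simulate the known flake: first call malformed, second succeeds.
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise MalformedFunctionCall("bad function call")
    return {"name": "fill_form"}

def generate_response(retries: int = 1):
    try:
        return call_model()
    except MalformedFunctionCall as e:
        if retries > 0:
            return generate_response(retries - 1)
        # Chain the original error once the retry budget is spent.
        raise RuntimeError("model kept returning malformed calls") from e

result = generate_response()
```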


468-474: Good fix for StopIteration issue.

The next() call now properly uses a default value of None and explicitly checks for the missing function call case with a descriptive error message. This addresses the previous review concern.
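The pattern in isolation, with hypothetical response parts (dicts rather than provider objects):

```python
def first_function_call(parts):
    # A default of None prevents next() from raising StopIteration
    # when the model returned no function call at all.
    call = next((p for p in parts if p.get("function_call")), None)
    if call is None:
        raise ValueError("model response contained no function call")
    return call["function_call"]

found = first_function_call([{"text": "hi"}, {"function_call": {"name": "fill"}}])

try:
    first_function_call([{"text": "hi"}])
    missing_raised = False
except ValueError:
    missing_raised = True
```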


478-541: Comprehensive token accounting.

The detailed breakdown of token usage by modality (audio, image, text) and cache status enables accurate quota tracking and cost analysis. The safe fallbacks to None when details aren't available are appropriate.


588-598: Correct response key mapping restoration.

The dictionary comprehension properly maps the qN keys back to original field IDs using enumeration order, maintaining consistency with the schema construction at line 239.


610-615: Good defensive check for Sentry configuration.

Using getattr(settings, "SENTRY_DSN", None) prevents AttributeError when Sentry isn't configured. The fallback to store the traceback in processing metadata ensures error details are preserved in all environments.
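The defensive access can be shown with a stand-in settings object that, like an unconfigured deployment, lacks the attribute:

```python
# Stand-in for django.conf.settings without SENTRY_DSN defined.
class Settings:
    DEBUG = True

settings = Settings()

# Plain attribute access would raise AttributeError when Sentry is not
# configured; getattr with a default degrades gracefully instead.
sentry_dsn = getattr(settings, "SENTRY_DSN", None)

if sentry_dsn:
    destination = "sentry"
else:
    destination = "processing-metadata"  # keep the traceback locally
```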


544-567: No action needed. The model names "gpt-5", "gpt-5-mini", and "gpt-5-nano" are all valid GPT-5 API model sizes, and temperature is indeed unsupported with these reasoning models. The code's conditional handling of the temperature parameter is correct and necessary.


147-147: No action required. audio_file_ids and document_file_ids are Python properties and are correctly accessed as attributes throughout the code. The @property decorator allows them to execute method logic while being accessed without parentheses.

Likely an incorrect or invalid review comment.
