Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Adds server-side DICOM→NIfTI conversion and streaming: new wadors NIfTI endpoints, a renderer, async utilities to fetch and convert DICOM to NIfTI (and sidecars), client methods to retrieve or iterate multipart NIfTI parts as (filename, BytesIO) tuples, multipart parsing, conversion-specific errors, and acceptance tests.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as AditClient
    participant API as WADO API View
    participant Utils as wadors_utils
    participant DICOM as DICOM Server
    participant Converter as DicomToNiftiConverter
    participant Renderer as NIfTI Renderer
    Client->>API: GET /<ae>/wadors/.../nifti
    API->>Utils: wado_retrieve_nifti(query, level)
    Utils->>DICOM: fetch DICOM datasets
    DICOM-->>Utils: list[Dataset]
    Utils->>Converter: convert datasets -> files on disk
    Converter-->>Utils: files (filename, BytesIO)
    Utils-->>API: AsyncIterator[(filename, BytesIO)]
    API->>Renderer: render(async tuples)
    Renderer-->>API: multipart byte stream
    API-->>Client: HTTP multipart/related response
    Client->>Client: _iter_multipart_response() yields (filename, BytesIO)
```
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Summary of Changes

Hello @mhumzaarain, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the system's interoperability by introducing direct DICOM-to-NIfTI conversion capabilities via new WADO-RS API endpoints. It streamlines the process of accessing medical imaging data in a widely used neuroimaging format, improving data accessibility for researchers and applications. The changes also include robust error handling and intelligent filtering to ensure reliable and efficient conversions.
Code Review: WADO-RS NIfTI Conversion API Endpoints

Summary

This PR adds NIfTI conversion endpoints to the DICOMweb API, allowing users to retrieve DICOM data converted to NIfTI format. The implementation includes proper error handling, client library methods, and acceptance tests. Overall, the code quality is good with thoughtful architecture decisions.

✅ Strengths
🔍 Issues & Recommendations

High Priority
Medium Priority
Low Priority
🎯 Code Quality
🔒 Security
🚀 Performance
🧪 Test Coverage
📝 Documentation
Verdict

This is a well-implemented feature with thoughtful architecture decisions. The error handling and memory management are particularly well done. The main concerns are the high-priority items listed above.
After addressing the high-priority items, this would be ready to merge. Review generated with assistance from Claude Code
Code Review
This pull request introduces new API endpoints for DICOM to NIfTI conversion, along with corresponding client methods. However, the review identified:

- a high-severity PII leak due to missing pseudonymization in the new endpoints,
- a medium-severity HTTP header injection vulnerability in the NIfTI renderer, and
- a medium-severity insecure filename parsing issue in the client library.

Additionally, there is code duplication in both the client and server-side views that could be refactored, a potential high memory usage issue when streaming large NIfTI files, and a fragile implementation detail in the client relying on monkey-patching a third-party library. Addressing these security and maintainability concerns will make the new functionality more robust and scalable.
```python
class RetrieveNiftiStudyAPIView(RetrieveAPIView):
    renderer_classes = [WadoMultipartApplicationNiftiRenderer]

    async def get(
        self, request: AuthenticatedApiRequest, ae_title: str, study_uid: str
    ) -> StreamingHttpResponse:
        async with self.track_session(request.user) as session:
            source_server = await self._get_dicom_server(request, ae_title)

            query = self.query.copy()
            query["StudyInstanceUID"] = study_uid

            images = wado_retrieve_nifti(source_server, query, "STUDY")

            renderer = cast(
                WadoMultipartApplicationNiftiRenderer, getattr(request, "accepted_renderer")
            )
            return StreamingHttpResponse(
                streaming_content=_StreamingSessionWrapper(
                    renderer.render(images), session, self._finalize_statistic
                ),
                content_type=renderer.content_type,
            )
```
The get method in RetrieveNiftiStudyAPIView introduces a high-severity PII leak because it does not implement pseudonymization, which is crucial for protecting patient privacy in ADIT. This allows raw DICOM data, including PII, to be converted to NIfTI. Please ensure pseudonymization parameters are extracted and applied during retrieval and conversion, similar to RetrieveStudyAPIView. Furthermore, this method duplicates logic found in RetrieveNiftiSeriesAPIView and RetrieveNiftiImageAPIView, which could be refactored into a shared base class or helper method for better maintainability.
**Reply:**
NIfTI files contain only voxel data and spatial orientation. They have zero DICOM tags, zero patient identity fields, zero PHI by design. The conversion from DICOM → NIfTI is itself the stripping of all patient identity. There is nothing to pseudonymize because there are no identity fields in the output format.
```python
class RetrieveNiftiSeriesAPIView(RetrieveAPIView):
    renderer_classes = [WadoMultipartApplicationNiftiRenderer]

    async def get(
        self, request: AuthenticatedApiRequest, ae_title: str, study_uid: str, series_uid: str
    ) -> StreamingHttpResponse:
        async with self.track_session(request.user) as session:
            source_server = await self._get_dicom_server(request, ae_title)

            query = self.query.copy()
            query["StudyInstanceUID"] = study_uid
            query["SeriesInstanceUID"] = series_uid

            images = wado_retrieve_nifti(source_server, query, "SERIES")

            renderer = cast(
                WadoMultipartApplicationNiftiRenderer, getattr(request, "accepted_renderer")
            )
            return StreamingHttpResponse(
                streaming_content=_StreamingSessionWrapper(
                    renderer.render(images), session, self._finalize_statistic
                ),
                content_type=renderer.content_type,
            )
```
The new NIfTI conversion endpoint does not implement pseudonymization. This allows users to retrieve raw DICOM data converted to NIfTI, including PII, even when pseudonymization is expected. Please ensure that pseudonymization parameters are extracted from the request and applied during the retrieval and conversion process.
**Reply:**
NIfTI files contain only voxel data and spatial orientation. They have zero DICOM tags, zero patient identity fields, zero PHI by design. The conversion from DICOM → NIfTI is itself the stripping of all patient identity. There is nothing to pseudonymize because there are no identity fields in the output format.
```python
class RetrieveNiftiImageAPIView(RetrieveAPIView):
    renderer_classes = [WadoMultipartApplicationNiftiRenderer]

    async def get(
        self,
        request: AuthenticatedApiRequest,
        ae_title: str,
        study_uid: str,
        series_uid: str,
        image_uid: str,
    ) -> StreamingHttpResponse:
        async with self.track_session(request.user) as session:
            source_server = await self._get_dicom_server(request, ae_title)

            query = self.query.copy()
            query["StudyInstanceUID"] = study_uid
            query["SeriesInstanceUID"] = series_uid
            query["SOPInstanceUID"] = image_uid

            images = wado_retrieve_nifti(source_server, query, "IMAGE")

            renderer = cast(
                WadoMultipartApplicationNiftiRenderer, getattr(request, "accepted_renderer")
            )
            return StreamingHttpResponse(
                streaming_content=_StreamingSessionWrapper(
                    renderer.render(images), session, self._finalize_statistic
                ),
                content_type=renderer.content_type,
            )
```
The new NIfTI conversion endpoint does not implement pseudonymization. This allows users to retrieve raw DICOM data converted to NIfTI, including PII, even when pseudonymization is expected. Please ensure that pseudonymization parameters are extracted from the request and applied during the retrieval and conversion process.
**Reply:**
NIfTI files contain only voxel data and spatial orientation. They have zero DICOM tags, zero patient identity fields, zero PHI by design. The conversion from DICOM → NIfTI is itself the stripping of all patient identity. There is nothing to pseudonymize because there are no identity fields in the output format.
```python
def _iter_multipart_response(self, response, stream=False) -> Iterator[tuple[str, BytesIO]]:
    """Parse a multipart response, yielding (filename, content) tuples."""
    dicomweb_client = self._create_dicom_web_client("")
    original_extract_method = dicomweb_client._extract_part_content

    try:
        dicomweb_client._extract_part_content = self._extract_part_content_with_headers

        for part in dicomweb_client._decode_multipart_message(response, stream=stream):
            headers = {}
            content = part

            idx = part.find(b"\r\n\r\n")
            if idx > -1:
                headers_bytes = part[:idx]
                content = part[idx + 4 :]

                for header_line in headers_bytes.split(b"\r\n"):
                    if header_line and b":" in header_line:
                        name, value = header_line.split(b":", 1)
                        headers[name.decode("utf-8").strip()] = value.decode("utf-8").strip()

            content_disposition = headers.get("Content-Disposition")
            if content_disposition:
                filename = self._extract_filename(content_disposition)
            else:
                for header, value in response.headers.items():
                    if header.lower() == "content-disposition":
                        filename = self._extract_filename(value)
                        break
                else:
                    raise ValueError("No Content-Disposition header found in response")

            yield (filename, BytesIO(content))
    finally:
        dicomweb_client._extract_part_content = original_extract_method
```
The _iter_multipart_response method relies on monkey-patching the private _extract_part_content method of dicomweb_client.DICOMwebClient. This is a fragile approach that can break with future updates to the dicomweb-client library. It would be more robust to handle the HTTP request and multipart parsing without relying on the client's internal implementation for this specific case.
A more robust approach would be to use the underlying requests.Session from the DICOMwebClient to make the GET request and then use a dedicated multipart parsing library (like requests_toolbelt.multipart.decoder) or parse the response stream manually. This would decouple your client from the internal implementation details of dicomweb-client.
**Reply:**
Acknowledged. This limitation is already documented directly in the code with a comment:
NOTE: This method monkey-patches DICOMwebClient._extract_part_content and uses DICOMwebClient._decode_multipart_message — both private APIs. These methods are not part of the public interface and may change without warning. If dicomweb-client is upgraded, verify that these private methods still exist and behave the same way.
The monkey-patch is necessary because the public DICOMwebClient API doesn't expose per-part headers; implementing a full standalone multipart parser would be a larger change. The risk is accepted and documented.
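For scale, the standalone parsing suggested above need not be a large change. Below is a minimal boundary-splitting parser sketch. The helper name `iter_parts` is hypothetical (not part of the codebase), and it assumes the whole body is buffered in memory, CRLF line endings, and that the boundary string never appears inside part content:

```python
from typing import Iterator


def iter_parts(body: bytes, boundary: str) -> Iterator[tuple[dict[str, str], bytes]]:
    """Split a multipart/related body into (headers, content) parts."""
    delimiter = b"--" + boundary.encode()
    for chunk in body.split(delimiter)[1:]:
        if chunk.startswith(b"--"):  # "--boundary--" marks the end of the body
            break
        # Each part is framed as "\r\n<headers>\r\n\r\n<content>\r\n";
        # trim exactly one CRLF on each side to keep binary content intact.
        if chunk.startswith(b"\r\n"):
            chunk = chunk[2:]
        if chunk.endswith(b"\r\n"):
            chunk = chunk[:-2]
        head, _, content = chunk.partition(b"\r\n\r\n")
        headers = {}
        for line in head.split(b"\r\n"):
            if b":" in line:
                name, value = line.split(b":", 1)
                headers[name.decode("utf-8").strip()] = value.decode("utf-8").strip()
        yield headers, content
```

A parser like this would remove the dependency on `dicomweb-client` internals, at the cost of handling streaming and edge cases (nested boundaries, preamble text) yourself.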
adit/dicom_web/utils/wadors_utils.py (outdated diff)
```python
async def _process_single_fetch(
    dicom_images: list[Dataset],
) -> AsyncIterator[tuple[str, BytesIO]]:
    """Convert a list of DICOM datasets to NIfTI format and yield the resulting files.

    For each conversion, yields files in order: JSON first, then NIfTI (.nii.gz or .nii),
    then bval, then bvec.

    If conversion fails with NoValidDicomError or NoSpatialDataError, a warning is logged
    because the series was expected to contain image data (non-image modalities are filtered
    out before this function is called).
    """
    async with TemporaryDirectory() as temp_dir:
        temp_path = Path(temp_dir)

        for file_idx, dicom_image in enumerate(dicom_images):
            dicom_file_path = temp_path / f"dicom_file_{file_idx}.dcm"
            await sync_to_async(write_dataset, thread_sensitive=False)(
                dicom_image, dicom_file_path
            )

        nifti_output_dir = temp_path / "nifti_output"
        await aiofiles.os.makedirs(nifti_output_dir, exist_ok=True)
        converter = DicomToNiftiConverter()

        try:
            await sync_to_async(converter.convert, thread_sensitive=False)(
                temp_path, nifti_output_dir
            )
        except (NoValidDicomError, NoSpatialDataError) as e:
            # The series passed the modality check but still failed conversion.
            # This is unexpected and worth logging as a warning.
            logger.warning(f"Series conversion failed unexpectedly: {e}")
            return
        except DcmToNiftiConversionError as e:
            logger.warning(f"Failed to convert DICOM files to NIfTI: {e}")
            return
        except Exception as e:
            logger.error(f"Error during DICOM to NIfTI conversion: {e}")
            raise

        entries = await aiofiles.os.scandir(nifti_output_dir)
        all_files = [entry.name for entry in entries]

        file_pairs: dict[str, dict[str, str]] = {}
        for filename in all_files:
            base_name, ext = os.path.splitext(filename)
            if ext == ".json":
                file_pairs.setdefault(base_name, {})["json"] = filename
            elif ext == ".gz" and base_name.endswith(".nii"):
                actual_base = os.path.splitext(base_name)[0]
                file_pairs.setdefault(actual_base, {})["nifti"] = filename
            elif ext == ".nii":
                file_pairs.setdefault(base_name, {})["nifti"] = filename
            elif ext == ".bval":
                file_pairs.setdefault(base_name, {})["bval"] = filename
            elif ext == ".bvec":
                file_pairs.setdefault(base_name, {})["bvec"] = filename

        for _base_name, files in file_pairs.items():
            if "json" in files:
                json_file_path = os.path.join(nifti_output_dir, files["json"])
                async with aiofiles.open(json_file_path, "rb") as f:
                    json_content = await f.read()
                yield files["json"], BytesIO(json_content)

            if "nifti" in files:
                nifti_file_path = os.path.join(nifti_output_dir, files["nifti"])
                async with aiofiles.open(nifti_file_path, "rb") as f:
                    nifti_content = await f.read()
                yield files["nifti"], BytesIO(nifti_content)

            if "bval" in files:
                bval_file_path = os.path.join(nifti_output_dir, files["bval"])
                async with aiofiles.open(bval_file_path, "rb") as f:
                    bval_content = await f.read()
                yield files["bval"], BytesIO(bval_content)

            if "bvec" in files:
                bvec_file_path = os.path.join(nifti_output_dir, files["bvec"])
                async with aiofiles.open(bvec_file_path, "rb") as f:
                    bvec_content = await f.read()
                yield files["bvec"], BytesIO(bvec_content)
```
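The extension-pairing step in the diff above hinges on `os.path.splitext` stripping only the last extension, so `.nii.gz` needs a second split to reach the true base name. A standalone sketch of just that grouping step (the helper name `pair_outputs` is illustrative, not the actual function in the codebase):

```python
import os


def pair_outputs(filenames: list[str]) -> dict[str, dict[str, str]]:
    """Group dcm2niix output files by shared base name.

    os.path.splitext only strips the LAST extension, so "scan.nii.gz"
    splits into ("scan.nii", ".gz") and needs a second split to reach
    the base name "scan".
    """
    pairs: dict[str, dict[str, str]] = {}
    for filename in filenames:
        base, ext = os.path.splitext(filename)
        if ext == ".gz" and base.endswith(".nii"):
            pairs.setdefault(os.path.splitext(base)[0], {})["nifti"] = filename
        elif ext == ".nii":
            pairs.setdefault(base, {})["nifti"] = filename
        elif ext in (".json", ".bval", ".bvec"):
            pairs.setdefault(base, {})[ext[1:]] = filename
    return pairs
```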
The _process_single_fetch function reads the entire content of each generated NIfTI file into a BytesIO object in memory before yielding it. NIfTI files can be very large, potentially leading to high memory consumption and scalability issues, especially when processing studies with multiple large series.
To improve memory efficiency, you should stream the file content instead of loading it all at once. This would likely require changes in a few places:

- `_process_single_fetch` could yield an async generator for each file's content that reads it from the temporary file in chunks.
- The `WadoMultipartApplicationNiftiRenderer` would need to be updated to consume this async generator of chunks instead of calling `.getvalue()` on a `BytesIO` object.
This change would ensure that the server's memory usage remains low regardless of the size of the NIfTI files being served.
**Reply:**
Memory is already bounded by the per-series fetching design: at study level, each series is fetched, converted, and streamed independently before moving to the next one. Loading a single converted NIfTI into a BytesIO before yielding is intentional — aiofiles + asyncio don't support true zero-copy streaming of file content through Django's streaming response without materializing it first. Chunk-based streaming would require a larger architectural change to the renderer and is out of scope for this PR.
adit/dicom_web/renderers.py (outdated diff)
```python
yield f"\r\n--{self.boundary}\r\n".encode()

yield "Content-Type: application/octet-stream\r\n".encode()
yield f'Content-Disposition: attachment; filename="{filename}"\r\n\r\n'.encode()
```
The filename is inserted into the Content-Disposition header without sanitization. If the filename contains double quotes or line breaks (which could be triggered by malicious DICOM metadata if dcm2niix does not sanitize it), it can lead to HTTP header injection or response splitting. Please sanitize the filename by removing or escaping double quotes and control characters.
**Reply:**
Already addressed. The renderer sanitizes filenames more thoroughly than suggested: it filters to printable characters only and strips double quotes, capped at 255 characters:

```python
safe_filename = "".join(c for c in filename if c.isprintable() and c != '"')[:255]
```

This strips all control characters (including `\r`, `\n`, and null bytes) rather than just a fixed blacklist.
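The quoted one-liner can be exercised in isolation. A small sketch (the wrapper name `sanitize_filename` is illustrative, not from the codebase):

```python
def sanitize_filename(filename: str) -> str:
    # Keep printable characters only, drop double quotes, cap at 255 chars.
    # str.isprintable() is False for \r, \n, \t, and null bytes, so the
    # usual header-injection vectors are removed in a single pass.
    return "".join(c for c in filename if c.isprintable() and c != '"')[:255]
```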
```python
"""Extract filename from Content-Disposition header."""
if not content_disposition or "filename=" not in content_disposition:
    raise ValueError("No filename found in Content-Disposition header")
filename = content_disposition.split("filename=")[1].strip('"')
```
The filename extraction from the Content-Disposition header is fragile and insecure. It fails to handle multiple parameters correctly and does not sanitize the resulting filename, which could lead to path traversal in applications using this library if they use the returned filename to save files without further validation. Please use a robust parsing method and sanitize the filename to remove any path components or dangerous characters.
**Reply:**
Path traversal is already mitigated — the code calls os.path.basename(filename) immediately after extraction, which strips any directory components including ../. The simple split on filename= is intentional: our server generates these headers in a controlled format (always filename="name"), so a full RFC 5987 parser would add complexity without benefit. The os.path.basename call is the actual security control here.
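To illustrate the guard (the wrapper name `safe_name` is illustrative, not from the codebase):

```python
import os


def safe_name(filename: str) -> str:
    # basename strips every directory component, including "../" sequences,
    # so a traversal attempt collapses to a bare filename.  Caveat: on POSIX
    # systems backslashes are NOT path separators, so Windows-style
    # "..\\evil" strings pass through unchanged.
    return os.path.basename(filename)
```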
```python
def retrieve_nifti_study(self, ae_title: str, study_uid: str) -> list[tuple[str, BytesIO]]:
    """Retrieve NIfTI files for a study."""
    url = f"{self.server_url}/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/nifti"
    dicomweb_client = self._create_dicom_web_client(ae_title)
    response = dicomweb_client._http_get(
        url,
        headers={"Accept": "multipart/related; type=application/octet-stream"},
        stream=True,
    )
    return list(self._iter_multipart_response(response, stream=False))


def iter_nifti_study(self, ae_title: str, study_uid: str) -> Iterator[tuple[str, BytesIO]]:
    """Iterate over NIfTI files for a study."""
    url = f"{self.server_url}/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/nifti"
    dicomweb_client = self._create_dicom_web_client(ae_title)
    response = dicomweb_client._http_get(
        url,
        headers={"Accept": "multipart/related; type=application/octet-stream"},
        stream=True,
    )
    yield from self._iter_multipart_response(response, stream=True)


def retrieve_nifti_series(
    self, ae_title: str, study_uid: str, series_uid: str
) -> list[tuple[str, BytesIO]]:
    """Retrieve NIfTI files for a series."""
    url = (
        f"{self.server_url}/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/"
        f"series/{series_uid}/nifti"
    )
    dicomweb_client = self._create_dicom_web_client(ae_title)
    response = dicomweb_client._http_get(
        url,
        headers={"Accept": "multipart/related; type=application/octet-stream"},
        stream=True,
    )
    return list(self._iter_multipart_response(response, stream=False))


def iter_nifti_series(
    self, ae_title: str, study_uid: str, series_uid: str
) -> Iterator[tuple[str, BytesIO]]:
    """Iterate over NIfTI files for a series."""
    url = (
        f"{self.server_url}/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/"
        f"series/{series_uid}/nifti"
    )
    dicomweb_client = self._create_dicom_web_client(ae_title)
    response = dicomweb_client._http_get(
        url,
        headers={"Accept": "multipart/related; type=application/octet-stream"},
        stream=True,
    )
    yield from self._iter_multipart_response(response, stream=True)


def retrieve_nifti_image(
    self, ae_title: str, study_uid: str, series_uid: str, image_uid: str
) -> list[tuple[str, BytesIO]]:
    """Retrieve NIfTI files for a single image."""
    url = (
        f"{self.server_url}/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/"
        f"series/{series_uid}/instances/{image_uid}/nifti"
    )
    dicomweb_client = self._create_dicom_web_client(ae_title)
    response = dicomweb_client._http_get(
        url,
        headers={"Accept": "multipart/related; type=application/octet-stream"},
        stream=True,
    )
    return list(self._iter_multipart_response(response, stream=False))


def iter_nifti_image(
    self, ae_title: str, study_uid: str, series_uid: str, image_uid: str
) -> Iterator[tuple[str, BytesIO]]:
    """Iterate over NIfTI files for a single image."""
    url = (
        f"{self.server_url}/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/"
        f"series/{series_uid}/instances/{image_uid}/nifti"
    )
    dicomweb_client = self._create_dicom_web_client(ae_title)
    response = dicomweb_client._http_get(
        url,
        headers={"Accept": "multipart/related; type=application/octet-stream"},
        stream=True,
    )
    yield from self._iter_multipart_response(response, stream=True)
```
There was a problem hiding this comment.
There is significant code duplication across the retrieve_nifti_* and iter_nifti_* methods for study, series, and image levels. The logic for URL construction, making the HTTP request, and processing the response is nearly identical in all six methods. This makes the code harder to maintain.
Consider refactoring this logic into one or two private helper methods. For example, you could have a method that constructs the URL and another that performs the request and yields the multipart content. The public methods would then just call these helpers with the appropriate parameters. This would reduce redundancy and improve maintainability.
**Reply:**
Already addressed above — see reply to the equivalent comment.
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In `@adit/core/errors.py`:
- Around line 88-91: The code currently defines NoSpatialDataError but it is
never raised by DicomToNiftiConverter.convert() while
wadors_utils._process_single_fetch catches it alongside NoValidDicomError, so
either remove the dead except clause or make the converter raise the error; to
fix, decide which approach: (A) remove the NoSpatialDataError branch from
wadors_utils._process_single_fetch so only NoValidDicomError is handled, or (B)
update DicomToNiftiConverter.convert to validate spatial attributes (e.g., check
presence of image orientation/position, pixel spacing, slice thickness or other
spatial metadata) and raise NoSpatialDataError with a clear message when those
checks fail (ensure the converter docstring and exception list include
NoSpatialDataError and import/use the NoSpatialDataError symbol), so the handler
in wadors_utils remains correct.
In `@adit/core/utils/dicom_to_nifti_converter.py`:
- Around line 116-117: The except block that catches subprocess.SubprocessError
and raises ExternalToolError loses the original traceback; update the raise so
it chains the original exception (use "raise ExternalToolError(... ) from e")
when handling subprocess.SubprocessError thrown while invoking dcm2niix (the
place referencing ExternalToolError and subprocess.SubprocessError in
dicom_to_nifti_converter.py), so the original exception context is preserved for
debugging.
In `@adit/dicom_web/utils/wadors_utils.py`:
- Around line 211-214: In the except handlers in wadors_utils.py (the blocks
catching RetriableDicomError and DicomError), chain the new exceptions to the
original by using "raise ServiceUnavailableApiError(... ) from err" and "raise
BadGatewayApiError(... ) from err" respectively so the original traceback is
preserved; update the two raise statements inside the except RetriableDicomError
as err and except DicomError as err blocks to use "from err" while keeping the
existing error message string.
- Around line 183-190: Accessing series.Modality can raise AttributeError when
Modality is absent; update the loop in wadors_utils.py (the for series in
series_list block) to defensively read the modality (e.g., via getattr(series,
"Modality", None) or try/except AttributeError) and handle a missing modality by
logging a debug/warn with series.SeriesInstanceUID and continuing (still use
NON_IMAGE_MODALITIES to filter when modality is present). Ensure logger.debug
message still references SeriesInstanceUID and that the code never lets an
AttributeError propagate out of the series iteration.
- Around line 254-256: Replace the logger.error call inside the except Exception
as e handler for the DICOM-to-NIfTI conversion with logger.exception so the
traceback is preserved; specifically, in the except block that currently reads
"except Exception as e: logger.error(f'Error during DICOM to NIfTI conversion:
{e}'); raise", change it to call logger.exception("Error during DICOM to NIfTI
conversion") (keep the subsequent bare raise) so the full stack trace is logged
for the conversion routine that uses the logger instance.
In `@adit/dicom_web/views.py`:
- Around line 336-337: wado_retrieve_nifti should validate the first fetch
before streaming: call the existing peek_images (same check used by
wado_retrieve) to perform an initial _fetch_dicom_data probe and raise/handle
errors early; update wado_retrieve_nifti to invoke peek_images (or replicate its
first-fetch logic) using the same query (including the StudyInstanceUID
assignment) and only proceed to start the streaming generator if peek_images
succeeds so fetch errors aren't emitted as stream payloads.
🧹 Nitpick comments (9)
adit/dicom_web/views.py (1)
327-349: No `peek_images` call: conversion errors may surface mid-stream.

The existing DICOM retrieve views call `await self.peek_images(images)` to catch errors before streaming begins, allowing a proper HTTP error response. The NIfTI views skip this step. While `wado_retrieve_nifti` internally catches `DcmToNiftiConversionError` and logs warnings, DICOM-level errors like `RetriableDicomError`/`DicomError` are re-raised as API exceptions. If these occur after `StreamingHttpResponse` has already started sending headers, Django cannot return a clean error status code.

Consider whether a similar peek mechanism (or pre-flight validation) would be appropriate for the NIfTI endpoints, particularly for the SERIES and IMAGE levels where the fetch happens in one shot before conversion.
adit/dicom_web/tests/acceptance/test_wadors.py (2)
662-667: Use `strict=True` in `zip()` for a stronger assertion.

Although the length check on line 665 guards against mismatches, adding `strict=True` to `zip()` is good defensive practice per Ruff B905. The same applies to line 689.

♻️ Proposed fix

```diff
- for (r_name, _), (i_name, _) in zip(retrieved, iterated):
+ for (r_name, _), (i_name, _) in zip(retrieved, iterated, strict=True):
```
585-645: Missing test coverage for image-level NIfTI and error scenarios.

There are tests for study-level and series-level NIfTI retrieval but none for:

- `retrieve_nifti_image` / `iter_nifti_image` (the image-level NIfTI endpoint)
- Error scenarios (e.g., non-existent study/series, server without WADO support)

These could be added in a follow-up.

Would you like me to draft acceptance tests for image-level NIfTI retrieval and error cases, or open an issue to track this?
adit/dicom_web/renderers.py (1)
90-118: No error handling in the NIfTI renderer, in contrast with the DICOM renderer.

`WadoMultipartApplicationDicomRenderer.render()` (lines 45-52) catches exceptions from the upstream iterator and emits an `_error_stream` part, allowing the client to detect failures mid-stream. The NIfTI renderer has no equivalent: if the upstream iterator raises, the multipart response will be truncated with no error signal.

Given that `wadors_utils._process_single_fetch` catches most conversion errors internally, this is low-risk in practice, but an unhandled edge case (e.g., an I/O error reading a temp file) would produce a silently truncated response.

adit/core/utils/dicom_to_nifti_converter.py (1)
107-114: Success log runs after partial conversion; minor misleading message.

When `exit_code == PARTIAL_CONVERSION` (8), the method logs a warning but then falls through to line 119, which logs "successfully converted." This could be confusing in logs. Consider returning early or adjusting the success message.

Also, the condition on line 111 (`exit_code == UNSPECIFIED_ERROR or exit_code != 0`) could be simplified to just `exit_code != 0`, since all known non-zero codes are already handled above.

♻️ Suggested simplification

```diff
- elif exit_code == DcmToNiftiExitCode.UNSPECIFIED_ERROR or exit_code != 0:
+ elif exit_code != 0:
```

adit/dicom_web/utils/wadors_utils.py (1)
229-236: All DICOM datasets loaded into memory simultaneously before conversion.

`_process_single_fetch` receives the full `list[Dataset]`, writes them all to disk, then runs conversion. For large series (e.g., thousands of CT slices), this holds all datasets in memory until the temp files are written. This is an inherent consequence of the synchronous `_fetch_dicom_data` design but worth noting as a scaling concern for study-level requests, where multiple series are processed sequentially.

adit-client/adit_client/client.py (3)
279-284: Filename extraction is fragile for edge cases.

`split("filename=")[1].strip('"')` assumes a specific format. It would fail or produce incorrect results if:

- The value uses `filename*=` (RFC 5987 encoding)
- Multiple `filename=` tokens appear in the header
- The filename contains escaped quotes

For dcm2niix-generated filenames this is safe in practice, but consider using Python's `email.message` or `cgi` module for more robust parsing.

♻️ More robust alternative

```diff
+ import re
+
  def _extract_filename(self, content_disposition: str) -> str:
      """Extract filename from Content-Disposition header."""
      if not content_disposition or "filename=" not in content_disposition:
          raise ValueError("No filename found in Content-Disposition header")
-     filename = content_disposition.split("filename=")[1].strip('"')
+     match = re.search(r'filename="([^"]+)"', content_disposition)
+     if not match:
+         match = re.search(r'filename=(\S+)', content_disposition)
+     if not match:
+         raise ValueError("No filename found in Content-Disposition header")
+     filename = match.group(1)
      return filename
```
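As an illustration of the stdlib route suggested above, `email.message.Message` can parse `Content-Disposition` parameters, including quoting and RFC 2231 encoding. The `extract_filename` helper here is hypothetical, not code from the PR:

```python
from email.message import Message

def extract_filename(content_disposition: str) -> str:
    # Hypothetical helper: let the stdlib header parser handle quoting
    # and parameter encoding instead of a naive str.split().
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    filename = msg.get_filename()
    if not filename:
        raise ValueError("No filename found in Content-Disposition header")
    return filename

print(extract_filename('attachment; filename="brain-t1.nii.gz"'))  # → brain-t1.nii.gz
```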
193-277: Significant duplication across six NIfTI methods — consider a shared helper.The retrieve/iter method pairs for study, series, and image levels share nearly identical structure, differing only in URL path and parameters. A private helper could consolidate this:
♻️ Sketch of a shared helper
def _nifti_request(self, url: str, stream: bool) -> Iterator[tuple[str, BytesIO]]: """Shared helper for NIfTI multipart requests.""" dicomweb_client = self._create_dicom_web_client("") response = dicomweb_client._http_get( url, headers={"Accept": "multipart/related; type=application/octet-stream"}, stream=True, ) if stream: yield from self._iter_multipart_response(response, stream=True) else: yield from self._iter_multipart_response(response, stream=False) def retrieve_nifti_study(self, ae_title: str, study_uid: str) -> list[tuple[str, BytesIO]]: url = f"{self.server_url}/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/nifti" return list(self._nifti_request(url, stream=False)) def iter_nifti_study(self, ae_title: str, study_uid: str) -> Iterator[tuple[str, BytesIO]]: url = f"{self.server_url}/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/nifti" yield from self._nifti_request(url, stream=True) # ... similar for series and image
296-331: Document the dicomweb-client version this multipart parsing approach was tested against.The monkey-patching of
_extract_part_contentis necessary given thatdicomweb-clientprovides no public API for extracting per-part headers from multipart responses. The code already clearly explains this purpose in the_extract_part_content_with_headersdocstring.To improve maintainability, add a version comment to
_iter_multipart_responsedocumenting whichdicomweb-clientversion this was tested against (currently ≥0.60.0 perpyproject.toml). This helps future maintainers quickly identify when to re-verify compatibility after library upgrades. For example:def _iter_multipart_response(self, response, stream=False) -> Iterator[tuple[str, BytesIO]]: """Parse a multipart response, yielding (filename, content) tuples. Note: Uses private API _extract_part_content (tested with dicomweb-client 0.60.0+) to access per-part headers. Verify compatibility after library upgrades. """
Code Review
This pull request introduces a significant and valuable feature by adding WADO-RS endpoints for DICOM-to-NIfTI conversion. However, a security audit identified two high-severity vulnerabilities: a Path Traversal vulnerability in the adit-client due to improper handling of the Content-Disposition header, and an HTTP Header Injection (CRLF Injection) vulnerability in the WadoMultipartApplicationNiftiRenderer due to unsanitized filenames being placed in response headers. These issues pose serious security risks and require immediate remediation. Additionally, while the implementation is comprehensive and well-structured, feedback focuses on improving maintainability by reducing code duplication in a few key areas.
adit-client/adit_client/client.py
```python
def _extract_filename(self, content_disposition: str) -> str:
    """Extract filename from Content-Disposition header."""
    if not content_disposition or "filename=" not in content_disposition:
        raise ValueError("No filename found in Content-Disposition header")
    filename = content_disposition.split("filename=")[1].strip('"')
    return filename
```
The _extract_filename function extracts a filename from the Content-Disposition header of an HTTP response. The current implementation does not sanitize the extracted filename for path traversal characters (e.g., ../). An attacker who can control the server's HTTP response can set a malicious Content-Disposition header, such as attachment; filename="../../../etc/passwd". If the user of the AditClient library then uses this filename to write the received data to the filesystem, it can result in writing files to arbitrary locations on the system where the client is run, potentially leading to code execution or system compromise. The filename should be sanitized to remove any directory information.
Already addressed — the code calls os.path.basename(filename) right after extracting the raw value, which strips all directory components including path traversal sequences like ../. The extracted name is safe to use as a filename component.
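A quick sketch of why `os.path.basename` neutralizes the traversal payload; `safe_filename_from_disposition` is a stand-in mirroring the client's extraction, not the actual method:

```python
import os

def safe_filename_from_disposition(content_disposition: str) -> str:
    # Naive extraction (as in the client), then basename() strips any
    # directory components, including ../ traversal sequences.
    filename = content_disposition.split("filename=")[1].strip('"')
    return os.path.basename(filename)

print(safe_filename_from_disposition('attachment; filename="../../../etc/passwd"'))  # → passwd
```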
adit/dicom_web/renderers.py
```python
yield f"\r\n--{self.boundary}\r\n".encode()
yield "Content-Type: application/octet-stream\r\n".encode()
yield f'Content-Disposition: attachment; filename="(unknown)"\r\n\r\n'.encode()
```
The render method of the WadoMultipartApplicationNiftiRenderer class constructs a Content-Disposition header using a filename that originates from the output of the dcm2niix tool. The filename format is based on the DICOM Series Description (%s), which can be controlled by an attacker if they can provide a malicious DICOM file. The filename is embedded directly into the response header without sanitization. If an attacker crafts a DICOM file with a SeriesDescription containing CRLF characters (e.g., foo\r\nContent-Type: text/html), the generated filename can break out of the Content-Disposition header and inject arbitrary HTTP headers and body content into the response. This can lead to vulnerabilities like Cross-Site Scripting (XSS), cache poisoning, or session fixation. The filename should be sanitized to remove any characters that are not valid in a filename or could be used for header injection.
Already addressed. The renderer filters filenames to printable characters only and strips double-quotes before embedding in the header:

```python
safe_filename = "".join(c for c in filename if c.isprintable() and c != '"')[:255]
```

`str.isprintable()` returns False for all control characters including `\r` (CR) and `\n` (LF), so CRLF injection is not possible.
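A minimal check of this claim — `sanitize_filename` is a stand-in wrapping the renderer's inline expression:

```python
def sanitize_filename(filename: str) -> str:
    # Mirrors the renderer's filter: printable characters only, no
    # double quotes, capped at 255 chars. isprintable() rejects \r and \n.
    return "".join(c for c in filename if c.isprintable() and c != '"')[:255]

attack = "foo\r\nContent-Type: text/html"
print(sanitize_filename(attack))  # → fooContent-Type: text/html
```

The CRLF pair is dropped, so the injected header text can no longer terminate the `Content-Disposition` line.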
```python
def retrieve_nifti_study(self, ae_title: str, study_uid: str) -> list[tuple[str, BytesIO]]:
    """Retrieve NIfTI files for a study."""
    url = f"{self.server_url}/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/nifti"
    dicomweb_client = self._create_dicom_web_client(ae_title)
    response = dicomweb_client._http_get(
        url,
        headers={"Accept": "multipart/related; type=application/octet-stream"},
        stream=True,
    )
    return list(self._iter_multipart_response(response, stream=False))

def iter_nifti_study(self, ae_title: str, study_uid: str) -> Iterator[tuple[str, BytesIO]]:
    """Iterate over NIfTI files for a study."""
    url = f"{self.server_url}/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/nifti"
    dicomweb_client = self._create_dicom_web_client(ae_title)
    response = dicomweb_client._http_get(
        url,
        headers={"Accept": "multipart/related; type=application/octet-stream"},
        stream=True,
    )
    yield from self._iter_multipart_response(response, stream=True)

def retrieve_nifti_series(
    self, ae_title: str, study_uid: str, series_uid: str
) -> list[tuple[str, BytesIO]]:
    """Retrieve NIfTI files for a series."""
    url = (
        f"{self.server_url}/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/"
        f"series/{series_uid}/nifti"
    )
    dicomweb_client = self._create_dicom_web_client(ae_title)
    response = dicomweb_client._http_get(
        url,
        headers={"Accept": "multipart/related; type=application/octet-stream"},
        stream=True,
    )
    return list(self._iter_multipart_response(response, stream=False))

def iter_nifti_series(
    self, ae_title: str, study_uid: str, series_uid: str
) -> Iterator[tuple[str, BytesIO]]:
    """Iterate over NIfTI files for a series."""
    url = (
        f"{self.server_url}/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/"
        f"series/{series_uid}/nifti"
    )
    dicomweb_client = self._create_dicom_web_client(ae_title)
    response = dicomweb_client._http_get(
        url,
        headers={"Accept": "multipart/related; type=application/octet-stream"},
        stream=True,
    )
    yield from self._iter_multipart_response(response, stream=True)

def retrieve_nifti_image(
    self, ae_title: str, study_uid: str, series_uid: str, image_uid: str
) -> list[tuple[str, BytesIO]]:
    """Retrieve NIfTI files for a single image."""
    url = (
        f"{self.server_url}/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/"
        f"series/{series_uid}/instances/{image_uid}/nifti"
    )
    dicomweb_client = self._create_dicom_web_client(ae_title)
    response = dicomweb_client._http_get(
        url,
        headers={"Accept": "multipart/related; type=application/octet-stream"},
        stream=True,
    )
    return list(self._iter_multipart_response(response, stream=False))

def iter_nifti_image(
    self, ae_title: str, study_uid: str, series_uid: str, image_uid: str
) -> Iterator[tuple[str, BytesIO]]:
    """Iterate over NIfTI files for a single image."""
    url = (
        f"{self.server_url}/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/"
        f"series/{series_uid}/instances/{image_uid}/nifti"
    )
    dicomweb_client = self._create_dicom_web_client(ae_title)
    response = dicomweb_client._http_get(
        url,
        headers={"Accept": "multipart/related; type=application/octet-stream"},
        stream=True,
    )
    yield from self._iter_multipart_response(response, stream=True)
```
There is significant code duplication across the retrieve_nifti_* and iter_nifti_* methods. The logic for URL construction and making the HTTP request is repeated in each method. This can be refactored to improve maintainability and reduce redundancy.
I suggest introducing a private helper method to handle the common logic of making the request. This will make the public methods much cleaner and easier to manage.
```python
def _make_nifti_request(self, path: str):
    url = f"{self.server_url}{path}"
    dicomweb_client = self._create_dicom_web_client("")
    return dicomweb_client._http_get(
        url,
        headers={"Accept": "multipart/related; type=application/octet-stream"},
        stream=True,
    )

def retrieve_nifti_study(self, ae_title: str, study_uid: str) -> list[tuple[str, BytesIO]]:
    """Retrieve NIfTI files for a study."""
    path = f"/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/nifti"
    response = self._make_nifti_request(path)
    return list(self._iter_multipart_response(response, stream=False))

def iter_nifti_study(self, ae_title: str, study_uid: str) -> Iterator[tuple[str, BytesIO]]:
    """Iterate over NIfTI files for a study."""
    path = f"/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/nifti"
    response = self._make_nifti_request(path)
    yield from self._iter_multipart_response(response, stream=True)

def retrieve_nifti_series(
    self, ae_title: str, study_uid: str, series_uid: str
) -> list[tuple[str, BytesIO]]:
    """Retrieve NIfTI files for a series."""
    path = (
        f"/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/"
        f"series/{series_uid}/nifti"
    )
    response = self._make_nifti_request(path)
    return list(self._iter_multipart_response(response, stream=False))

def iter_nifti_series(
    self, ae_title: str, study_uid: str, series_uid: str
) -> Iterator[tuple[str, BytesIO]]:
    """Iterate over NIfTI files for a series."""
    path = (
        f"/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/"
        f"series/{series_uid}/nifti"
    )
    response = self._make_nifti_request(path)
    yield from self._iter_multipart_response(response, stream=True)

def retrieve_nifti_image(
    self, ae_title: str, study_uid: str, series_uid: str, image_uid: str
) -> list[tuple[str, BytesIO]]:
    """Retrieve NIfTI files for a single image."""
    path = (
        f"/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/"
        f"series/{series_uid}/instances/{image_uid}/nifti"
    )
    response = self._make_nifti_request(path)
    return list(self._iter_multipart_response(response, stream=False))

def iter_nifti_image(
    self, ae_title: str, study_uid: str, series_uid: str, image_uid: str
) -> Iterator[tuple[str, BytesIO]]:
    """Iterate over NIfTI files for a single image."""
    path = (
        f"/api/dicom-web/{ae_title}/wadors/studies/{study_uid}/"
        f"series/{series_uid}/instances/{image_uid}/nifti"
    )
    response = self._make_nifti_request(path)
    yield from self._iter_multipart_response(response, stream=True)
```
Thanks for the suggestion. I considered this but decided to keep the explicit methods as-is. The "duplication" is only ~3 lines of boilerplate per method (URL construction + `_http_get` call), and having six clearly-named methods with distinct type signatures provides better IDE support (autocomplete, go-to-definition) and makes each method's URL path immediately visible without indirection. The proposed helper doesn't meaningfully reduce complexity; it just moves it behind another layer.
```python
if exit_code == DcmToNiftiExitCode.SUCCESS:
    pass
elif exit_code == DcmToNiftiExitCode.NO_DICOM_FOUND:
    raise NoValidDicomError(f"No DICOM images found in input folder: {error_msg}")
elif exit_code == DcmToNiftiExitCode.VERSION_REPORT:
    logger.info(f"dcm2niix version report: {error_msg}")
elif exit_code == DcmToNiftiExitCode.CORRUPT_DICOM:
    raise InvalidDicomError(f"Corrupt DICOM file: {error_msg}")
elif exit_code == DcmToNiftiExitCode.INVALID_INPUT_FOLDER:
    raise InputDirectoryError(f"Input folder invalid: {error_msg}")
elif exit_code == DcmToNiftiExitCode.INVALID_OUTPUT_FOLDER:
    raise OutputDirectoryError(f"Output folder invalid: {error_msg}")
elif exit_code == DcmToNiftiExitCode.WRITE_PERMISSION_ERROR:
    raise OutputDirectoryError(
        f"Unable to write to output folder (check permissions): {error_msg}"
    )
elif exit_code == DcmToNiftiExitCode.PARTIAL_CONVERSION:
    logger.warning(f"Converted some but not all input DICOMs: {error_msg}")
elif exit_code == DcmToNiftiExitCode.RENAME_ERROR:
    raise DcmToNiftiConversionError(f"Unable to rename files: {error_msg}")
elif exit_code == DcmToNiftiExitCode.UNSPECIFIED_ERROR or exit_code != 0:
    raise DcmToNiftiConversionError(
        f"Unspecified error (exit code {exit_code}): {error_msg}"
    )
```
The long if/elif chain to handle dcm2niix exit codes is functional but can be made more concise and maintainable. Using a dictionary to map exit codes to exceptions and messages would make this logic more data-driven and easier to extend if more exit codes need to be handled in the future.
```python
error_map = {
    DcmToNiftiExitCode.NO_DICOM_FOUND: (
        NoValidDicomError, f"No DICOM images found in input folder: {error_msg}"
    ),
    DcmToNiftiExitCode.CORRUPT_DICOM: (
        InvalidDicomError, f"Corrupt DICOM file: {error_msg}"
    ),
    DcmToNiftiExitCode.INVALID_INPUT_FOLDER: (
        InputDirectoryError, f"Input folder invalid: {error_msg}"
    ),
    DcmToNiftiExitCode.INVALID_OUTPUT_FOLDER: (
        OutputDirectoryError, f"Output folder invalid: {error_msg}"
    ),
    DcmToNiftiExitCode.WRITE_PERMISSION_ERROR: (
        OutputDirectoryError,
        f"Unable to write to output folder (check permissions): {error_msg}",
    ),
    DcmToNiftiExitCode.RENAME_ERROR: (
        DcmToNiftiConversionError, f"Unable to rename files: {error_msg}"
    ),
}

if exit_code == DcmToNiftiExitCode.SUCCESS:
    pass
elif exit_code == DcmToNiftiExitCode.VERSION_REPORT:
    logger.info(f"dcm2niix version report: {error_msg}")
elif exit_code == DcmToNiftiExitCode.PARTIAL_CONVERSION:
    logger.warning(f"Converted some but not all input DICOMs: {error_msg}")
elif exit_code in error_map:
    exc_class, msg = error_map[exit_code]
    raise exc_class(msg)
elif exit_code != 0:
    raise DcmToNiftiConversionError(
        f"Unspecified error (exit code {exit_code}): {error_msg}"
    )
```
I prefer keeping the if/elif chain here. The exit codes have three distinct behaviors: raise an exception, log and continue, or do nothing (SUCCESS), which don't all fit cleanly into a dictionary lookup. The suggested refactor still needs if/elif for SUCCESS, VERSION_REPORT, and PARTIAL_CONVERSION, so it splits the logic across two patterns (dict + if/elif) instead of one consistent chain. The current version is 24 lines, reads top-to-bottom, and each case is immediately visible. I don't think the indirection improves readability or maintainability for a fixed set of 10 exit codes that rarely changes.
adit/dicom_web/utils/wadors_utils.py
```python
for _base_name, files in file_pairs.items():
    if "json" in files:
        json_file_path = os.path.join(nifti_output_dir, files["json"])
        async with aiofiles.open(json_file_path, "rb") as f:
            json_content = await f.read()
        yield files["json"], BytesIO(json_content)

    if "nifti" in files:
        nifti_file_path = os.path.join(nifti_output_dir, files["nifti"])
        async with aiofiles.open(nifti_file_path, "rb") as f:
            nifti_content = await f.read()
        yield files["nifti"], BytesIO(nifti_content)

    if "bval" in files:
        bval_file_path = os.path.join(nifti_output_dir, files["bval"])
        async with aiofiles.open(bval_file_path, "rb") as f:
            bval_content = await f.read()
        yield files["bval"], BytesIO(bval_content)

    if "bvec" in files:
        bvec_file_path = os.path.join(nifti_output_dir, files["bvec"])
        async with aiofiles.open(bvec_file_path, "rb") as f:
            bvec_content = await f.read()
        yield files["bvec"], BytesIO(bvec_content)
```
The logic for yielding the different file types (json, nifti, bval, bvec) is repetitive. Each file type has its own if block with nearly identical code for opening, reading, and yielding the file content. This can be simplified to reduce duplication.
Consider using a loop over a predefined list of file types to handle this more elegantly.
Suggested change:

```python
file_order = ["json", "nifti", "bval", "bvec"]
for _base_name, files in file_pairs.items():
    for file_type in file_order:
        if file_type in files:
            filename = files[file_type]
            file_path = os.path.join(nifti_output_dir, filename)
            async with aiofiles.open(file_path, "rb") as f:
                content = await f.read()
            yield filename, BytesIO(content)
```
Good suggestion
…s, and simplify file-yielding loop
Pull Request Review: WADO-RS NIfTI Conversion API Endpoints

This is a solid implementation that adds NIfTI conversion capabilities to the DICOMweb API. Here's my detailed feedback:

✅ Strengths

🔧 Code Quality Issues

1. Potential Resource Leak in Client (
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@adit/dicom_web/renderers.py`:
- Around line 101-120: The generator in render (streaming_content) emits only a
closing boundary when images yields zero items because first_part remains True;
modify streaming_content to track whether any part was yielded (e.g., use a
boolean like first_part or parts_emitted) and after the async for loop only emit
the closing boundary if at least one part was emitted—otherwise yield nothing
(or raise/return an empty body) so no standalone closing boundary is sent;
update references in streaming_content (first_part, boundary, images, render)
accordingly.
In `@adit/dicom_web/utils/wadors_utils.py`:
- Around line 246-253: The exception handlers in _process_single_fetch currently
swallow NoValidDicomError, NoSpatialDataError, and DcmToNiftiConversionError and
just return, which yields an empty response for non-STUDY requests; update the
except blocks in _process_single_fetch so that if the request level (the
function's level parameter) is not "STUDY" you re-raise the caught exception (or
raise an appropriate HTTP/ValueError) instead of silently returning, and only
log-and-return for level == "STUDY"; ensure the change is consistent with how
wado_retrieve_nifti expects to handle errors so callers at SERIES/IMAGE levels
receive an error response rather than an empty multipart payload.
- Around line 258-259: The call to aiofiles.os.scandir in wadors_utils.py should
be used as an async context manager to guarantee the underlying OS file
descriptor is closed; replace the current direct await
aiofiles.os.scandir(nifti_output_dir) usage (the entries -> all_files list
comprehension) with a with (await aiofiles.os.scandir(nifti_output_dir)) as
entries: block and then build all_files = [entry.name for entry in entries]
inside that block so entries is always cleaned up even on error or interruption.
🧹 Nitpick comments (5)
adit/dicom_web/tests/acceptance/test_wadors.py (1)
665-669: Add `strict=True` to `zip()` calls (Ruff B905).

The preceding assertion already checks equal lengths, but `strict=True` provides a safety net if that assertion is ever removed or reordered.

🔧 Proposed fix

```diff
- for (r_name, _), (i_name, _) in zip(retrieved, iterated):
+ for (r_name, _), (i_name, _) in zip(retrieved, iterated, strict=True):
```

Apply the same change on lines 693 and 756.

Also applies to: 690-694, 753-757
adit/dicom_web/utils/wadors_utils.py (2)
229-244: dcm2niix input directory includes the `nifti_output` subdirectory.

`temp_path` is used as both the parent for DICOM files and the location of `nifti_output_dir`. Since dcm2niix scans the input directory recursively, it will also scan into `nifti_output/`. On the first conversion call this is harmless (the output dir is empty), but it's fragile — if the code is ever refactored to run multiple conversions into the same temp dir, stale outputs could interfere.

Consider placing DICOM files in their own subdirectory (e.g., `temp_path / "dicom_input"`).

🔧 Proposed fix

```diff
 async with TemporaryDirectory() as temp_dir:
     temp_path = Path(temp_dir)
+    dicom_input_dir = temp_path / "dicom_input"
+    await aiofiles.os.makedirs(dicom_input_dir, exist_ok=True)
     for file_idx, dicom_image in enumerate(dicom_images):
-        dicom_file_path = temp_path / f"dicom_file_{file_idx}.dcm"
+        dicom_file_path = dicom_input_dir / f"dicom_file_{file_idx}.dcm"
         await sync_to_async(write_dataset, thread_sensitive=False)(
             dicom_image, dicom_file_path
         )
     nifti_output_dir = temp_path / "nifti_output"
     await aiofiles.os.makedirs(nifti_output_dir, exist_ok=True)
     converter = DicomToNiftiConverter()
     try:
         await sync_to_async(converter.convert, thread_sensitive=False)(
-            temp_path, nifti_output_dir
+            dicom_input_dir, nifti_output_dir
         )
```
118-156: `_fetch_dicom_data` largely duplicates `wado_retrieve`'s fetch logic.

Both functions build a `QueryDataset`, dispatch on `level`, and call the same `operator.fetch_*` methods. The only difference is sync vs async and the callback accumulating into a list vs a queue. Consider extracting the shared dispatch into a helper to reduce duplication.

Also, the `assert` on line 145 will be stripped under `python -O`. The existing `wado_retrieve` (line 76) has the same issue, so this is consistent, but worth noting.

adit-client/adit_client/client.py (2)
297-332: Monkey-patching `DICOMwebClient._extract_part_content` is fragile.

This approach depends on the internal structure of `dicomweb-client`, which may change without notice across versions. Additionally, line 299 creates a throwaway `DICOMwebClient` with an empty `ae_title` solely to access `_decode_multipart_message`, allocating an unnecessary HTTP session.

Consider implementing a standalone multipart parser (or using a library like `requests-toolbelt`) instead of patching the library internals. This would decouple the NIfTI streaming from the `dicomweb-client` library's private API.
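A sketch of such a standalone parser using only the stdlib `email` package — `iter_multipart_parts` and the hand-built sample body are illustrative, not code from the PR. The trick is prepending the response's `Content-Type` header so the email parser can split the multipart body:

```python
import email.parser
import email.policy
from io import BytesIO

def iter_multipart_parts(body: bytes, content_type: str):
    # Hypothetical standalone parser: prepend the response Content-Type
    # so the stdlib email parser can split the multipart body for us.
    parser = email.parser.BytesParser(policy=email.policy.default)
    msg = parser.parsebytes(
        b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    )
    for part in msg.iter_parts():
        yield part.get_filename(), BytesIO(part.get_payload(decode=True))

# Minimal hand-built multipart/related body for demonstration.
boundary = "nifti-boundary"
body = (
    (
        f"--{boundary}\r\n"
        "Content-Type: application/octet-stream\r\n"
        'Content-Disposition: attachment; filename="scan.nii.gz"\r\n'
        "\r\n"
    ).encode("ascii")
    + b"NIFTIDATA"
    + f"\r\n--{boundary}--\r\n".encode("ascii")
)
parts = list(iter_multipart_parts(body, f'multipart/related; boundary="{boundary}"'))
print(parts[0][0])  # → scan.nii.gz
```

This gives access to each part's headers (and hence the filename) without touching any `dicomweb-client` private method.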
194-278: All six NIfTI methods use `dicomweb_client._http_get` (private API).

This is the same concern as the monkey-patching: coupling to `DICOMwebClient` internals. If you do proceed with this approach, consider using `requests.Session` directly (which is already configured via `_create_dicom_web_client`) to avoid depending on private methods.

Also note `dicomweb_client` is instantiated on e.g. lines 197 and 208 but only used for `_http_get`; the same client instance could be reused within `_iter_multipart_response` instead of creating a second one on line 299.
PR Review: Add WADO-RS NIfTI conversion API endpoints

Summary

This PR adds DICOM-to-NIfTI conversion capabilities to the WADO-RS API, introducing three new endpoints at study, series, and instance levels. The implementation includes granular error handling, modality-based pre-filtering, and streaming multipart responses with associated metadata files.

Code Quality & Best Practices

Strengths
Potential Issues & Improvements

Critical Issues
Medium Priority Issues
Security Concerns
Test Coverage

Good coverage of happy paths, but missing:
Recommendations

Must Fix (Before Merge)
Should Fix
Nice to Have
Conclusion

Solid implementation with good architecture. However, address security and resource management concerns before merging, particularly filename sanitization, resource limits, and cleanup in exception paths.

Overall: Approve with changes requested

Great work! Code quality is high, but please address the critical issues.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@adit/dicom_web/utils/wadors_utils.py`:
- Around line 108-118: The exception handlers in wadors_utils.py must re-raise
API errors using exception chaining to preserve tracebacks: in the
ExceptionGroup loop inside the try/except, change the raises to "raise
ServiceUnavailableApiError(str(exc)) from exc" and "raise
BadGatewayApiError(str(exc)) from exc" when inspecting each exc; likewise update
the individual handlers to "raise ServiceUnavailableApiError(str(err)) from err"
in the except RetriableDicomError block and "raise BadGatewayApiError(str(err))
from err" in the except DicomError block so the original exceptions (exc/err)
are attached as the __cause__.
🧹 Nitpick comments (2)
adit/dicom_web/utils/wadors_utils.py (2)
121-159: Duplicated fetch dispatch logic.

The STUDY/SERIES/IMAGE dispatch (Lines 134–157) mirrors `fetch_with_sentinel` (Lines 67–91) nearly verbatim. Consider extracting a shared helper that accepts a callback, reducing the maintenance burden of keeping both branches in sync.
235-237: Dispatching `sync_to_async` per file is inefficient for large series.

Each iteration dispatches a separate thread-pool task for a single `write_dataset` call. For a series with hundreds of DICOM images, this creates significant overhead. Batch all writes into one `sync_to_async` call.

♻️ Proposed refactor

```diff
- for file_idx, dicom_image in enumerate(dicom_images):
-     dicom_file_path = temp_path / f"dicom_file_{file_idx}.dcm"
-     await sync_to_async(write_dataset, thread_sensitive=False)(dicom_image, dicom_file_path)
+ def _write_all_dicoms(images: list[Dataset], dest: Path) -> None:
+     for file_idx, dicom_image in enumerate(images):
+         write_dataset(dicom_image, dest / f"dicom_file_{file_idx}.dcm")
+
+ await sync_to_async(_write_all_dicoms, thread_sensitive=False)(dicom_images, temp_path)
```
Pull Request Review: Add WADO-RS NIfTI Conversion API Endpoints

This is a well-structured PR that adds a useful feature for converting DICOM to NIfTI format via REST API endpoints. Overall the implementation is solid with good test coverage.

✅ Strengths
🐛 Potential Bugs (High Priority)

1. Race condition in sentinel delivery (adit/dicom_web/utils/wadors_utils.py:660-708)

If the async task is cancelled before the sentinel is placed, the consumer loop will hang forever waiting for None. Add proper cancellation handling with try/except asyncio.CancelledError.

2. Inconsistent exception handling (adit/dicom_web/utils/wadors_utils.py:723-737)

The nested fetch_with_sentinel() function could raise exceptions (e.g., ValueError for invalid level) that would not be handled properly. Add a catch-all exception handler.

3. Potential filename injection vulnerability (adit/dicom_web/renderers.py:348-349)

While filenames are sanitized by removing some characters, consider using a more robust sanitization approach with a whitelist of allowed characters and length limits.
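One common fix for the sentinel-delivery concern in point 1 is to put the sentinel in a `finally` block so it is delivered even when the producer raises. A minimal sketch with a toy producer/consumer, not the PR's actual code:

```python
import asyncio

SENTINEL = object()

async def producer(queue: asyncio.Queue) -> None:
    try:
        for i in range(3):
            await queue.put((f"file_{i}.nii.gz", b"data"))
    finally:
        # The sentinel is delivered even if the loop above raises,
        # so the consumer below can never wait forever.
        await queue.put(SENTINEL)

async def consume() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(producer(queue))
    items = []
    while (item := await queue.get()) is not SENTINEL:
        items.append(item)
    await task  # surfaces any producer exception after draining
    return items

print(len(asyncio.run(consume())))  # → 3
```

Note that cancellation of the producer task mid-`put` still needs care (a `finally` clause can itself be cancelled); shielding the sentinel put or using `queue.put_nowait` on an unbounded queue are typical mitigations.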
Pull Request Review: Add WADO-RS NIfTI conversion API endpoints

Overview

This PR adds three new WADO-RS endpoints for streaming DICOM-to-NIfTI converted files at study, series, and instance levels. The implementation includes robust error handling, modality filtering, and client library support.

✅ Strengths

1. Excellent Error Handling
2. Performance & Memory Efficiency
3. Smart Modality Filtering
4. Comprehensive Testing
🔍 Issues & Recommendations

Critical Issues

1. Missing NoSpatialDataError Definition
…ame sanitization in renderers
- Track sentinel receipt in wado_retrieve to avoid cancelling fetch_task when consumer exits normally after seeing the sentinel - Update test fake_run stubs to create output folders/files so converter validation passes
…ng it Let DcmToNiftiConversionError bubble up from _process_single_fetch and add peek_nifti_files to eagerly surface conversion errors before streaming begins, so callers receive proper error responses. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Raise DcmToNiftiConversionError instead of silently falling through on VERSION_REPORT (exit code 3) in DicomToNiftiConverter - Decode dcm2niix stdout/stderr with errors="replace" to prevent UnicodeDecodeError on non-UTF-8 DICOM tag values - Add try/except and _error_stream to WadoMultipartApplicationNiftiRenderer so mid-stream errors yield a valid closing boundary (RFC-compliant) - Catch DcmToNiftiConversionError in wado_retrieve_nifti to avoid raw 500s - Add raise_for_status() before multipart parsing in all NIfTI client methods - Replace assert with explicit ValueError for IMAGE-level SeriesInstanceUID check - Use getattr(series, "Modality", None) to handle non-compliant PACS responses - Replace aiofiles.os.scandir with listdir to avoid unclosed directory handle - Add PatientID to find_series query for PACS compatibility - Add load-bearing comment to peek_nifti_files explaining why the peek must not be removed
Summary
- Three new WADO-RS endpoints (`/nifti` at study, series, and instance level) that stream DICOM-to-NIfTI converted files as multipart HTTP responses
- `dcm2niix` exit codes mapped to a typed exception hierarchy (`DcmToNiftiConversionError` and subclasses)
- Non-image series are skipped before fetching, with warnings logged for unexpected conversion failures
- New `adit-client` methods (`retrieve_nifti_*` / `iter_nifti_*`) for programmatic consumption of the new endpoints
Specification
New Endpoints
Three WADO-RS endpoints for on-the-fly DICOM-to-NIfTI conversion, appended as `/nifti` to the standard WADO-RS retrieve paths:

| Path | View |
| --- | --- |
| `{ae_title}/wadors/studies/{study_uid}/nifti` | `RetrieveNiftiStudyAPIView` |
| `{ae_title}/wadors/studies/{study_uid}/series/{series_uid}/nifti` | `RetrieveNiftiSeriesAPIView` |
| `{ae_title}/wadors/studies/{study_uid}/series/{series_uid}/instances/{image_uid}/nifti` | `RetrieveNiftiImageAPIView` |

Response format:
`multipart/related; type=application/octet-stream` with boundary `nifti-boundary`. Each part includes a `Content-Disposition` header with the output filename. Files are yielded in a deterministic order per conversion: JSON sidecar first, then NIfTI (`.nii.gz`/`.nii`), then `.bval`, then `.bvec`.

Conversion Pipeline
1. DICOM datasets are fetched via `DicomOperator` (DIMSE).
2. Non-image modalities (`SR`, `KO`, `PR`) are skipped before fetching to avoid unnecessary network and conversion overhead.
3. Conversion runs through `dcm2niix` (`-f "%s-%d" -z y`).
4. Output files are read with `aiofiles` and streamed as a multipart response. Study-level requests process one series at a time to bound memory usage.

Error Handling
dcm2niix Exit-Code Mapping
A typed exception hierarchy (`DcmToNiftiConversionError` and subclasses) maps each `dcm2niix` exit code to a specific exception: `NoValidDicomError`, `InvalidDicomError`, `InputDirectoryError`, `OutputDirectoryError`, and the base `DcmToNiftiConversionError` (used for exit codes 1 and 9).

Graceful Degradation
- `NoValidDicomError` and `NoSpatialDataError` are logged as warnings and the series is skipped; the request continues with remaining series.
- `InvalidDicomError`, `ExternalToolError`, `OutputDirectoryError`, and the base `DcmToNiftiConversionError` (for exit codes 1 and 9) are logged as errors and re-raised, so callers can identify the exact failure.
- `RetriableDicomError` → 503, other `DicomError` → 502.
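Taken together, the exit-code mapping and the skip-or-reraise policy might look like the following sketch. The exception names come from this PR; the exit-code values, helper names, and input shape are illustrative assumptions, not the real converter API:

```python
class DcmToNiftiConversionError(Exception): ...
class NoValidDicomError(DcmToNiftiConversionError): ...
class NoSpatialDataError(DcmToNiftiConversionError): ...
class InvalidDicomError(DcmToNiftiConversionError): ...

# Illustrative mapping; the real codes are defined by dcm2niix.
EXIT_CODE_TO_ERROR: dict[int, type[DcmToNiftiConversionError]] = {
    2: NoValidDicomError,
    8: NoSpatialDataError,  # hypothetical code for this sketch
}

def raise_for_exit_code(code: int, message: str) -> None:
    if code == 0:
        return
    # Unmapped codes fall back to the base exception.
    raise EXIT_CODE_TO_ERROR.get(code, DcmToNiftiConversionError)(message)

def convert_series(results: dict[str, int]) -> list[str]:
    """Convert each series; skip 'empty' failures, re-raise real ones."""
    converted: list[str] = []
    for series_uid, exit_code in results.items():
        try:
            raise_for_exit_code(exit_code, f"series {series_uid} failed")
            converted.append(series_uid)
        except (NoValidDicomError, NoSpatialDataError) as err:
            # Logged as a warning in the real code; the request continues.
            print(f"skipping: {err}")
    return converted

print(convert_series({"1.2.3": 0, "1.2.4": 2, "1.2.5": 0}))
# → ['1.2.3', '1.2.5']
```

Any other exit code raises out of the loop, which is what lets the view layer translate it into a 502/503 response.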
At study level, a QIDO-RS-style series query is issued first. Series whose `Modality` is in `{"SR", "KO", "PR"}` are skipped with a debug log. This prevents fetching large structured-report or presentation-state series that cannot produce NIfTI output.

adit-client Methods
Six new public methods on `AditClient`, in two flavors per level:

| Method | Returns |
| --- | --- |
| `retrieve_nifti_study(ae_title, study_uid)` | `list[tuple[str, BytesIO]]` |
| `iter_nifti_study(ae_title, study_uid)` | `Iterator[tuple[str, BytesIO]]` |
| `retrieve_nifti_series(ae_title, study_uid, series_uid)` | `list[tuple[str, BytesIO]]` |
| `iter_nifti_series(ae_title, study_uid, series_uid)` | `Iterator[tuple[str, BytesIO]]` |
| `retrieve_nifti_image(ae_title, study_uid, series_uid, image_uid)` | `list[tuple[str, BytesIO]]` |
| `iter_nifti_image(ae_title, study_uid, series_uid, image_uid)` | `Iterator[tuple[str, BytesIO]]` |

Security
Filenames in `Content-Disposition` headers are sanitized: `\r`, `\n`, and `"` characters are stripped to prevent header injection.

Files Changed (8 files)
- `adit/dicom_web/urls.py`
- `adit/dicom_web/views.py`
- `adit/dicom_web/renderers.py`: `WadoMultipartApplicationNiftiRenderer`
- `adit/dicom_web/utils/wadors_utils.py`: `wado_retrieve_nifti()`, `_process_single_fetch()`, `_fetch_dicom_data()`, modality filtering
- `adit/core/errors.py`: `DcmToNiftiConversionError` hierarchy (6 subclasses)
- `adit/core/utils/dicom_to_nifti_converter.py`: `DcmToNiftiExitCode` enum, exit-code-to-exception mapping
- `adit-client/adit_client/client.py`
49 tests total (43 unit + 6 acceptance), covering all new components.
Unit Tests (43 tests)
- `adit/core/tests/utils/test_dicom_to_nifti_converter.py`
- `adit/dicom_web/tests/utils/test_wado_retrieve_nifti.py`: `NoValidDicomError`/`NoSpatialDataError` graceful handling, `DcmToNiftiConversionError` propagation, error wrapping (`RetriableDicomError` -> 503, `DicomError` -> 502)
- `adit/dicom_web/tests/test_renderers.py`: filename sanitization (`\r`, `\n`, `"` stripped), `content_type` property
- `adit-client/tests/test_nifti_client.py`: `_extract_filename` (valid, path stripping, missing header, no filename field), `_extract_part_content_with_headers` (boundary markers, empty bytes, normal content), `_iter_multipart_response` (header parsing, response-header fallback, missing disposition error)
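The renderer's sanitization cases above reduce to a tiny helper plus assertions. A sketch under the assumption that sanitization simply strips the three forbidden characters; the helper name is illustrative, not the PR's actual API:

```python
def sanitize_filename(name: str) -> str:
    # Strip characters that would allow Content-Disposition header injection.
    for forbidden in ("\r", "\n", '"'):
        name = name.replace(forbidden, "")
    return name

# The test cases described above, as bare assertions:
assert sanitize_filename('brain"t1.nii.gz') == "braint1.nii.gz"
assert sanitize_filename("a\r\nX-Evil: 1.nii") == "aX-Evil: 1.nii"
assert sanitize_filename("plain.nii.gz") == "plain.nii.gz"
```

Stripping `\r`/`\n` blocks header-splitting, and stripping `"` keeps the quoted `filename="…"` parameter from being terminated early.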
- `adit/dicom_web/tests/acceptance/test_wadors.py`: covers the client methods (`retrieve_*`/`iter_*`), verifying `.nii.gz` + `.json` presence, non-empty content, and filename consistency between retrieve and iter

Design Decisions
- No `Pseudonym` or `TrialProtocolID` parameters (unlike DICOM retrieve endpoints), since manipulation is not applicable to NIfTI output.

API Usage (adit-client)
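The consumption pattern for these methods can be sketched with a stand-in generator. In real use the tuples would come from `client.iter_nifti_series(ae_title, study_uid, series_uid)`; everything below, including filenames and bytes, is illustrative:

```python
from io import BytesIO
from typing import Iterator

def fake_iter_nifti_series() -> Iterator[tuple[str, BytesIO]]:
    # Stand-in for AditClient.iter_nifti_series(); yields (filename, BytesIO)
    # tuples in the documented order: JSON sidecar first, then the NIfTI image.
    yield "t1_mprage-1.json", BytesIO(b'{"Modality": "MR"}')
    yield "t1_mprage-1.nii.gz", BytesIO(b"<gzip bytes>")

received = []
for filename, content in fake_iter_nifti_series():
    received.append(filename)
    data = content.getvalue()  # raw bytes of the sidecar or image
print(received)  # → ['t1_mprage-1.json', 't1_mprage-1.nii.gz']
```

The `iter_*` flavor lets a caller write each part to disk as it arrives; `retrieve_*` simply materializes the same tuples into a list.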
Return type: Each method returns `(filename, BytesIO)` tuples. Typical filenames include:

- `series_description-sequence_name.json`: JSON sidecar with DICOM metadata
- `series_description-sequence_name.nii.gz`: compressed NIfTI image
- `series_description-sequence_name.bval`: b-values (diffusion MRI only)
- `series_description-sequence_name.bvec`: b-vectors (diffusion MRI only)

Summary by CodeRabbit
New Features
Improvements
Tests