Update maisi readme and add input check to controlnet infer script (#1825)

guopengf · pre-commit-ci[bot] · web-flow · commit 69ccafdd382d · 2024-09-16T10:03:33.000+08:00
Fixes # .

### Description
1. Add recommended `"spacing"` values for several output sizes to the
README.
2. Add an input check to the ControlNet inference script to prevent
generating images with a small field of view (FOV).
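
For context, the FOV along each axis is the number of voxels multiplied by the voxel spacing, which is why `output_size` and spacing have to be validated together. The sketch below is illustrative only and is not the `check_input` implementation from `scripts/sample.py`. The helper names and the `min_fov_mm` threshold are hypothetical, and the 256 mm value in the usage lines is an assumed example rather than the limit the script enforces.

```python
from typing import Sequence, Tuple


def field_of_view_mm(output_size: Sequence[int], spacing: Sequence[float]) -> Tuple[float, ...]:
    """Return the physical extent per axis: voxel count times voxel spacing (mm)."""
    if len(output_size) != 3 or len(spacing) != 3:
        raise ValueError("output_size and spacing must each have three elements")
    return tuple(n * s for n, s in zip(output_size, spacing))


def assert_min_fov(output_size: Sequence[int], spacing: Sequence[float], min_fov_mm: float) -> None:
    """Hypothetical guard: reject inputs whose in-plane (x, y) FOV is below min_fov_mm."""
    fov = field_of_view_mm(output_size, spacing)
    if min(fov[0], fov[1]) < min_fov_mm:
        raise ValueError(
            f"output_size {tuple(output_size)} with spacing {tuple(spacing)} gives FOV {fov} mm, "
            f"below the assumed minimum of {min_fov_mm} mm in x/y"
        )


# The three size/spacing pairs recommended in the README table of this PR.
recommended = [
    ((256, 256, 256), (1.5, 1.5, 1.5)),
    ((512, 512, 128), (0.8, 0.8, 2.5)),
    ((512, 512, 512), (1.0, 1.0, 1.0)),
]

for size, spacing in recommended:
    assert_min_fov(size, spacing, min_fov_mm=256.0)  # 256.0 is an assumed threshold
    print(size, spacing, field_of_view_mm(size, spacing))
```

Under these assumptions, all three recommended pairs pass, with in-plane extents of roughly 384 mm, 409.6 mm, and 512 mm, whereas a pair such as `(256, 256, 256)` at 0.5 mm spacing (128 mm in-plane) would be rejected.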

### Checks
<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Avoid including large-size files in the PR.
- [x] Clean up long text outputs from code cells in the notebook.
- [x] For security purposes, please check the contents and remove any
sensitive info such as user names and private key.
- [x] Ensure (1) hyperlinks and markdown anchors are working (2) use
relative paths for tutorial repo files (3) put figure and graphs in the
`./figure` folder
- [x] Notebook runs automatically `./runner.sh -t <path to .ipynb file>`

---------

Signed-off-by: Pengfei Guo <pengfeig@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
diff --git a/generation/maisi/README.md b/generation/maisi/README.md
@@ -68,6 +68,14 @@ The information for the inference input, like body region and anatomy to generat
 
 To generate images with substantial dimensions, such as 512 &times; 512 &times; 512 or larger, using GPUs with 80GB of memory, it is advisable to configure the `"num_splits"` parameter in [the auto-encoder configuration](./configs/config_maisi.json#L11-L37) to 16. This adjustment is crucial to avoid out-of-memory issues during inference.
 
+#### Recommended spacing for different output sizes:
+
+|`output_size`| Recommended `"spacing"`|
+|:-----:|:-----:|
+[256, 256, 256]  | [1.5, 1.5, 1.5] |
+[512, 512, 128]  | [0.8, 0.8, 2.5] |
+[512, 512, 512]  | [1.0, 1.0, 1.0] |
+
 #### Execute Inference:
 To run the inference script, please run:
 ```bash
diff --git a/generation/maisi/scripts/infer_controlnet.py b/generation/maisi/scripts/infer_controlnet.py
@@ -23,7 +23,7 @@
 from monai.transforms import SaveImage
 from monai.utils import RankFilter
 
-from .sample import ldm_conditional_sample_one_image
+from .sample import check_input, ldm_conditional_sample_one_image
 from .utils import define_instance, prepare_maisi_controlnet_json_dataloader, setup_ddp
 
 
@@ -150,10 +150,13 @@ def main():
         top_region_index_tensor = batch["top_region_index"].to(device)
         bottom_region_index_tensor = batch["bottom_region_index"].to(device)
         spacing_tensor = batch["spacing"].to(device)
+        out_spacing = tuple((batch["spacing"].squeeze().numpy() / 100).tolist())
         # get target dimension
         dim = batch["dim"]
         output_size = (dim[0].item(), dim[1].item(), dim[2].item())
         latent_shape = (args.latent_channels, output_size[0] // 4, output_size[1] // 4, output_size[2] // 4)
+        # check if output_size and out_spacing are valid.
+        check_input(None, None, None, output_size, out_spacing, None)
         # generate a single synthetic image using a latent diffusion model with controlnet.
         synthetic_images, _ = ldm_conditional_sample_one_image(
             autoencoder,