From e180c33b94fd3898829a35133bce7f892ce78545 Mon Sep 17 00:00:00 2001
From: Eric
Date: Tue, 10 Mar 2026 13:08:39 -0400
Subject: [PATCH 01/10] Delete docs/hpc/09_ood/02_CellACDC.mdx

---
 docs/hpc/09_ood/02_CellACDC.mdx | 32 --------------------------------
 1 file changed, 32 deletions(-)
 delete mode 100644 docs/hpc/09_ood/02_CellACDC.mdx

diff --git a/docs/hpc/09_ood/02_CellACDC.mdx b/docs/hpc/09_ood/02_CellACDC.mdx
deleted file mode 100644
index 6c8f490c0b..0000000000
--- a/docs/hpc/09_ood/02_CellACDC.mdx
+++ /dev/null
@@ -1,32 +0,0 @@
-# Cell-ACDC in OOD
-
-[Cell-ACDC](https://cell-acdc.readthedocs.io) is a GUI-based Python framework for segmentation, tracking, cell cycle annotations and quantification of microscopy data.
-
-## Getting Started
-You can run Cell-ACDC in OOD by going to the URL [ood.torch.hpc.nyu.edu](http://ood.torch.hpc.nyu.edu) in your browser and selecting `Cell-ACDC` from the `Interactive Apps` pull-down menu at the top of the page. Once you've used it and other interactive apps they'll show up on your home screen under the `Recently Used Apps` header.
-
-:::note
-Be aware that when you start from `Recently Used Apps` it will start with the same configuration that you used previously. If you'd like to configure your Cell-ACDC session differently, you'll need to select it from the menu.
-:::
-
-## Configuration
-
-You can select the number of cores, amount of memory, and number of hours.
-
-![OOD Cell-ACDC Configuration](./static/ood_cellacdc_config.png)
-
-## Cell-ACDC running in OOD
-
-After you hit the `Launch` button you'll have to wait for the scheduler to find node(s) for you to run on:
-![OOD Cell-ACDC in queue](./static/ood_cellacdc_in_queue.png)
-
-Then you'll have a short wait for Cell-ACDC itself to start up.
-Once that happens you'll get one last page that will give you links to:
-- open a terminal window on the compute node your Cell-ACDC session is running on
-- go to the directory associated with your Session ID that stores output, config and other related files for your session
-- make changes to compression and image quality
-- get a link that you can share that will allow others to view your Cell-ACDC session
-
-![Pre-launch Cell-ACDC OOD](./static/ood_cellacdc_prelaunch.png)
-
-Please click the `Launch Cell-ACDC` button and a Cell-ACDC window will open.

From 8a420df6fa56ed7a4b9b8597dc7f0594f8b49bc2 Mon Sep 17 00:00:00 2001
From: Eric
Date: Tue, 10 Mar 2026 13:10:16 -0400
Subject: [PATCH 02/10] Delete docs/hpc/09_ood/03_Dask.mdx

---
 docs/hpc/09_ood/03_Dask.mdx | 113 ------------------------------------
 1 file changed, 113 deletions(-)
 delete mode 100644 docs/hpc/09_ood/03_Dask.mdx

diff --git a/docs/hpc/09_ood/03_Dask.mdx b/docs/hpc/09_ood/03_Dask.mdx
deleted file mode 100644
index ce2a371652..0000000000
--- a/docs/hpc/09_ood/03_Dask.mdx
+++ /dev/null
@@ -1,113 +0,0 @@
-# Dask in Jupyter Notebook in OOD
-
-[Dask](https://docs.dask.org/en/stable/) is a Python library for parallel and distributed computing.
-
-## Getting Started
-You can run Dask in a Jupyter Notebook in OOD by going to the URL [ood.torch.hpc.nyu.edu](http://ood.torch.hpc.nyu.edu) in your browser and selecting `DS-GA.1004 - Jupyter Dask` from the `Interactive Apps` pull-down menu at the top of the page. Once you've used it and other interactive apps they'll show up on your home screen under the `Recently Used Apps` header.
-
-:::note
-Be aware that when you start from `Recently Used Apps` it will start with the same configuration that you used previously. If you'd like to configure your Dask session differently, you'll need to select it from the menu.
-::: - -## Configuration - -You can select the Dask version, number of cores, amount of memory, root directory, number of hours, and optional Slurm options. - -![OOD Dask Configuration](./static/ood_dask_config.png) - -:::warning -If you select to use `/home` as your root directory be careful not to go over your quota. You can find your current usage with the `myquota` command. Please see our [Storage documentation](../03_storage/01_intro_and_data_management.mdx) for details about your storage options. -::: - -## Dask with Jupyter Notebook running in OOD - -After you hit the `Launch` button you'll have to wait for the scheduler to find node(s) for you to run on: -![OOD Dask in queue](./static/ood_dask_in_queue.png) - -Then you'll have a short wait for Dask itself to start up.
-Once that happens you'll get one last page that will give you links to:
-- open a terminal window on the compute node your Dask session is running on
-- go to the directory associated with your Session ID that stores output, config and other related files for your session
-
-![Pre-launch Dask OOD](./static/ood_dask_prelaunch.png)
-
-Please click the `Connect to Jupyter` button and a Jupyter window will open.
-
-## Dask Example
-
-Start a new Jupyter notebook with 4 cores, 16GB memory, and set your root directory to `/scratch`. Enter the following code in the first cell and execute it by pressing the `Shift` and `Enter` keys at the same time.
-```python
-import os
-import pandas as pd
-import numpy as np
-import time
-
-# Create a directory for the large files
-output_dir = "tmp/large_data_files"
-os.makedirs(output_dir, exist_ok=True)
-
-num_files = 5  # Number of files to create
-rows_per_file = 10_000_000  # 10 million rows per file
-for i in range(num_files):
-    data = {
-        'col1': np.random.randint(0, 100, size=rows_per_file),
-        'value': np.random.rand(rows_per_file) * 100
-    }
-    df = pd.DataFrame(data)
-    df.to_csv(os.path.join(output_dir, f'data_{i}.csv'), index=False)
-print(f"{num_files} large CSV files created in '{output_dir}'.")
-
-import dask.dataframe as dd
-from dask.distributed import Client
-import time
-import os
-
-# Start a Dask client for distributed processing (optional but recommended)
-# This allows you to monitor the computation with the Dask dashboard
-client = Client(n_workers=4, threads_per_worker=2, memory_limit='4GB')  # memory_limit is per worker; adjust these as per your session resources
-print(client)
-
-# Load multiple CSV files into a Dask DataFrame
-# Dask will automatically partition and parallelize the reading of these files
-output_dir = "tmp/large_data_files"  # the same directory the files were written to above
-dask_df = dd.read_csv(os.path.join(output_dir, 'data_*.csv'))
-
-# Perform a calculation (e.g., calculate the mean of the 'value' column)
-# This operation will be parallelized across the available workers
-result_dask = dask_df['value'].mean()
-
-# Trigger the computation and measure the time
-start_time = time.time()
-computed_result_dask = result_dask.compute()
-end_time = time.time()
-
-print(f"Dask took {end_time - start_time} seconds to compute the mean across {num_files} files.")
-print(f"Result (Dask): {computed_result_dask}")
-
-import pandas as pd
-import time
-import os
-
-# Perform the same calculation sequentially with Pandas
-start_time_pandas = time.time()
-total_sum = 0
-total_count = 0
-for i in range(num_files):
-    df = pd.read_csv(os.path.join(output_dir, f'data_{i}.csv'))
-    total_sum += df['value'].sum()
-    total_count += len(df)
-computed_result_pandas = total_sum / total_count
-end_time_pandas = time.time()
-
-print(f"Pandas took {end_time_pandas - start_time_pandas} seconds to compute the mean across {num_files} files.")
-print(f"Result (Pandas): {computed_result_pandas}")
-```
-You should get output like:
-```
-5 large CSV files created in 'tmp/large_data_files'.
-
-Dask took 3.448112726211548 seconds to compute the mean across 5 files.
-Result (Dask): 50.010815178612596
-Pandas took 9.641847610473633 seconds to compute the mean across 5 files.
-Result (Pandas): 50.01081517861258
-```

From db7011a8a310286ddd06d2a84789a712a3237a1f Mon Sep 17 00:00:00 2001
From: Eric
Date: Tue, 10 Mar 2026 13:11:03 -0400
Subject: [PATCH 03/10] Delete docs/hpc/09_ood/10_Spark.mdx

---
 docs/hpc/09_ood/10_Spark.mdx | 114 -----------------------------------
 1 file changed, 114 deletions(-)
 delete mode 100644 docs/hpc/09_ood/10_Spark.mdx

diff --git a/docs/hpc/09_ood/10_Spark.mdx b/docs/hpc/09_ood/10_Spark.mdx
deleted file mode 100644
index 27f8b4df9d..0000000000
--- a/docs/hpc/09_ood/10_Spark.mdx
+++ /dev/null
@@ -1,114 +0,0 @@
-# Spark Standalone Cluster with Jupyter Notebook in OOD
-
-## Getting Started
-You can run a Spark Standalone Cluster with Jupyter Notebook in OOD by going to the URL [ood.torch.hpc.nyu.edu](http://ood.torch.hpc.nyu.edu) in your browser and selecting `Spark Standalone Cluster` from the `Interactive Apps` pull-down menu at the top of the page. Once you've used it and other interactive apps they'll show up on your home screen under the `Recently Used Apps` header.
-
-:::note
-Be aware that when you start from `Recently Used Apps` it will start with the same configuration that you used previously. If you'd like to configure your Spark Standalone Cluster with Jupyter Notebook session differently, you'll need to select it from the menu.
-:::
-
-## Configuration
-
-You can select the Spark version, amount of time, number of nodes and cores, amount of memory, GPU type (if any), Jupyter notebook root directory, path to custom PySpark overlay (if any), and optional Slurm options.
-
-![OOD Spark Configuration](./static/ood_spark_config.png)
-
-
-:::warning
-If you select to use `/home` as your root directory be careful not to go over your quota. You can find your current usage with the `myquota` command. Please see our [Storage documentation](../03_storage/01_intro_and_data_management.mdx) for details about your storage options.
-::: - -## Spark Standalone Cluster with Jupyter Notebook running in OOD - -After you hit the `Launch` button you'll have to wait for the scheduler to find node(s) for you to run on: -![OOD Spark in queue](./static/ood_spark_in_queue.png) - -Then you'll have a short wait for Spark itself to start up.
-Once that happens you'll get one last page that will give you links to:
-- open a terminal window on the compute node your Spark session is running on
-- go to the directory associated with your Session ID that stores output, config and other related files for your session
-
-![Pre-launch Spark OOD](./static/ood_spark_prelaunch.png)
-
-Please click the `Connect to the Jupyter Notebook Environment` button and a Jupyter window will open. Select to create a new notebook and you're ready to go.
-
-### Spark Standalone Cluster Jupyter Notebook Example
-
-Please enter the following commands into the first cell of your Jupyter notebook and execute them by typing `Shift` and `Enter` at the same time.
-```python
-from pyspark import SparkContext
-import requests
-
-# Create a SparkContext
-sc = SparkContext("local", "WordCountExample")
-
-# Get text of Moby Dick from Project Gutenberg
-file_url = 'https://www.gutenberg.org/ebooks/2701.txt.utf-8'
-try:
-    response = requests.get(file_url)
-    response.raise_for_status()
-except requests.exceptions.RequestException as e:
-    raise SystemExit(f"Error during request: {e}")
-# Save text to temp file
-with open('moby_dick_temp_spark_example.txt', "w") as file:
-    file.write(response.text)
-
-# Create an RDD from the text file saved above
-lines = sc.textFile("moby_dick_temp_spark_example.txt")
-
-# FlatMap to split lines into words and convert to lowercase
-words = lines.flatMap(lambda line: line.lower().split(" "))
-
-# Map each word to a (word, 1) tuple
-word_pairs = words.map(lambda word: (word, 1))
-
-# ReduceByKey to sum the counts for each word
-word_counts = word_pairs.reduceByKey(lambda a, b: a + b)
-
-# Collect the results to the driver program
-results = word_counts.collect()
-
-# Print the word counts
-for word, count in results:
-    print(f"{word}: {count}")
-```
-
-You should get output like:
-```
-the: 14512
-project: 87
-gutenberg: 25
-ebook: 8
-of: 6682
-moby: 81
-dick;: 10
-or,: 17
-whale: 533
-: 4318
-this: 1277
-is: 1601
-for: 1555
-use: 39
-anyone: 5
-anywhere: 11
-in: 4126
-united: 24
-states: 13
-and: 6321
-most: 284
-other: 360
-parts: 32
-world: 79
-at: 1310
-no: 488
-cost: 3
-with: 1750
-almost: 189
-restrictions: 2
-whatsoever.: 5
-you: 843
-may: 227
-copy: 15
-...
-```
-etc.

From eb303f62f32363a828994bdfd2fe52a2892344e Mon Sep 17 00:00:00 2001
From: Eric
Date: Tue, 10 Mar 2026 13:11:13 -0400
Subject: [PATCH 04/10] Delete docs/hpc/09_ood/11_Stata.mdx

---
 docs/hpc/09_ood/11_Stata.mdx | 43 ------------------------------------
 1 file changed, 43 deletions(-)
 delete mode 100644 docs/hpc/09_ood/11_Stata.mdx

diff --git a/docs/hpc/09_ood/11_Stata.mdx b/docs/hpc/09_ood/11_Stata.mdx
deleted file mode 100644
index 2d702e819c..0000000000
--- a/docs/hpc/09_ood/11_Stata.mdx
+++ /dev/null
@@ -1,43 +0,0 @@
-# Stata in OOD
-
-## Getting Started
-You can run Stata in OOD by going to the URL [ood.torch.hpc.nyu.edu](http://ood.torch.hpc.nyu.edu) in your browser and selecting `STATA` from the `Interactive Apps` pull-down menu at the top of the page. Once you've used it and other interactive apps they'll show up on your home screen under the `Recently Used Apps` header.
-
-:::note
-Be aware that when you start from `Recently Used Apps` it will start with the same configuration that you used previously. If you'd like to configure your Stata session differently, you'll need to select it from the menu.
-:::
-
-## Configuration
-
-You can select the Stata version, number of cores, amount of memory, amount of time, and optional Slurm options.
-
-![OOD Stata Configuration](./static/ood_stata_config.png)
-
-## Stata running in OOD
-
-After you hit the `Launch` button you'll have to wait for the scheduler to find node(s) for you to run on:
-![OOD Stata in queue](./static/ood_stata_in_queue.png)
-
-Then you'll have a short wait for Stata itself to start up.
-Once that happens you'll get one last form that will allow you to:
-- open a terminal window on the compute node your Stata session is running on
-- go to the directory associated with your Session ID that stores output, config and other related files for your session
-- set compression and image quality for your app
-- get a shareable, view-only link to your app
-
-![Pre-launch Stata OOD](./static/ood_stata_prelaunch.png)
-
-Then after you hit the `Launch STATA` button, your Stata window will be displayed.
-
-### Stata example
-
-Please enter the following commands into the Stata command window and execute them by hitting `Enter`:
-```
-. sysuse auto
-. describe
-. summarize
-. twoway (scatter mpg weight)
-```
-
-You should get output like this:
-![OOD Stata example](./static/ood_stata_example.png)

From 23275e69ab46a1a1e5e87eb6e7a169ae24bb7d96 Mon Sep 17 00:00:00 2001
From: Eric
Date: Tue, 10 Mar 2026 13:11:52 -0400
Subject: [PATCH 05/10] Rename 07_jupyter_with_conda_singularity.mdx to 01_jupyter_with_conda_singularity.mdx

---
 ...onda_singularity.mdx => 01_jupyter_with_conda_singularity.mdx} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename docs/hpc/09_ood/{07_jupyter_with_conda_singularity.mdx => 01_jupyter_with_conda_singularity.mdx} (100%)

diff --git a/docs/hpc/09_ood/07_jupyter_with_conda_singularity.mdx b/docs/hpc/09_ood/01_jupyter_with_conda_singularity.mdx
similarity index 100%
rename from docs/hpc/09_ood/07_jupyter_with_conda_singularity.mdx
rename to docs/hpc/09_ood/01_jupyter_with_conda_singularity.mdx

From f8b4fd96319b07b8302bf821bd0c07f4feee2290 Mon Sep 17 00:00:00 2001
From: Eric
Date: Tue, 10 Mar 2026 13:12:09 -0400
Subject: [PATCH 06/10] Rename 09_RStudio.mdx to 02_RStudio.mdx

---
 docs/hpc/09_ood/{09_RStudio.mdx => 02_RStudio.mdx} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename docs/hpc/09_ood/{09_RStudio.mdx => 02_RStudio.mdx} (100%)

diff --git a/docs/hpc/09_ood/09_RStudio.mdx b/docs/hpc/09_ood/02_RStudio.mdx
similarity index 100%
rename from docs/hpc/09_ood/09_RStudio.mdx
rename to docs/hpc/09_ood/02_RStudio.mdx

From a89c0f19516afd61b2141e9d0e2dc976a7491dab Mon Sep 17 00:00:00 2001
From: Eric
Date: Tue, 10 Mar 2026 13:12:23 -0400
Subject: [PATCH 07/10] Rename 08_matlab_proxy.mdx to 03_matlab_proxy.mdx

---
 docs/hpc/09_ood/{08_matlab_proxy.mdx => 03_matlab_proxy.mdx} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename docs/hpc/09_ood/{08_matlab_proxy.mdx => 03_matlab_proxy.mdx} (100%)

diff --git a/docs/hpc/09_ood/08_matlab_proxy.mdx b/docs/hpc/09_ood/03_matlab_proxy.mdx
similarity index 100%
rename from docs/hpc/09_ood/08_matlab_proxy.mdx
rename to docs/hpc/09_ood/03_matlab_proxy.mdx

From 5e75be3423e225c4d26063dc1e3cec88fabeb858 Mon Sep 17 00:00:00 2001
From: Eric
Date: Tue, 10 Mar 2026 13:13:26 -0400
Subject: [PATCH 08/10] Rename 01_jupyter_with_conda_singularity.mdx to 02_jupyter_with_conda_singularity.mdx

---
 ...onda_singularity.mdx => 02_jupyter_with_conda_singularity.mdx} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename docs/hpc/09_ood/{01_jupyter_with_conda_singularity.mdx => 02_jupyter_with_conda_singularity.mdx} (100%)

diff --git a/docs/hpc/09_ood/01_jupyter_with_conda_singularity.mdx b/docs/hpc/09_ood/02_jupyter_with_conda_singularity.mdx
similarity index 100%
rename from docs/hpc/09_ood/01_jupyter_with_conda_singularity.mdx
rename to docs/hpc/09_ood/02_jupyter_with_conda_singularity.mdx

From a74b08777e788281f70280b50cae5ce99e01a942 Mon Sep 17 00:00:00 2001
From: Eric
Date: Tue, 10 Mar 2026 13:13:51 -0400
Subject: [PATCH 09/10] Rename 05_igv.mdx to 07_igv.mdx

---
 docs/hpc/09_ood/{05_igv.mdx => 07_igv.mdx} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename docs/hpc/09_ood/{05_igv.mdx => 07_igv.mdx} (100%)

diff --git a/docs/hpc/09_ood/05_igv.mdx b/docs/hpc/09_ood/07_igv.mdx
similarity index 100%
rename from docs/hpc/09_ood/05_igv.mdx
rename to docs/hpc/09_ood/07_igv.mdx

From 817e9cb9a2d44d95db77e5a437e111b737f5e3ce Mon Sep 17 00:00:00 2001
From: Eric
Date: Tue, 10 Mar 2026 13:14:16 -0400
Subject: [PATCH 10/10] Rename 02_RStudio.mdx to 05_RStudio.mdx

---
 docs/hpc/09_ood/{02_RStudio.mdx => 05_RStudio.mdx} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename docs/hpc/09_ood/{02_RStudio.mdx => 05_RStudio.mdx} (100%)

diff --git a/docs/hpc/09_ood/02_RStudio.mdx b/docs/hpc/09_ood/05_RStudio.mdx
similarity index 100%
rename from docs/hpc/09_ood/02_RStudio.mdx
rename to docs/hpc/09_ood/05_RStudio.mdx
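
The patches in this series are mailbox-format messages of the kind `git format-patch` produces and `git am` consumes. The script below is a self-contained sketch of that round trip using a throwaway repository — the file name, commit messages, and `demo` identity are illustrative stand-ins, not the real docs repo:

```shell
#!/bin/sh
# Sketch: export a rename commit as a [PATCH] mail and re-apply it with `git am`.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "# RStudio in OOD" > 09_RStudio.mdx
git add 09_RStudio.mdx
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Add doc"
# Make a rename commit and export it as a mailbox-format patch:
git mv 09_RStudio.mdx 02_RStudio.mdx
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "Rename 09_RStudio.mdx to 02_RStudio.mdx"
git format-patch -1 -o patches >/dev/null
# Rewind, then apply the exported patch the way a maintainer would:
git reset -q --hard HEAD~1
git -c user.name=demo -c user.email=demo@example.com am -q patches/0001-*.patch
test -f 02_RStudio.mdx && echo "rename applied"
```

`git am` preserves the `From:`, `Date:`, and `Subject:` headers as the author, date, and commit message, which is why those fields appear verbatim in each patch above.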