
Conversation

@jadgt (Contributor) commented Dec 15, 2025

These are the lectures for the Python for HPC course.

…key concepts, communication types, and integration with NumPy.
…rmance optimization, independent processes, and collective communication. Add practical exercises for users to practice single-core speed and MPI concepts.
- Minimize Python in hot paths: Move heavy math into NumPy calls; keep Python for orchestration only.
- Benchmark correctly: Use large N, pin threads to 1 for fair single-core tests, and report the best of multiple runs after a warmup.
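
A minimal sketch of the thread-pinning point above, assuming a NumPy build backed by OpenBLAS or MKL (which environment variables matter depends on the installation); the variables must be set before NumPy is imported:

```python
import os

# Pin the math libraries to a single thread *before* importing NumPy,
# so single-core timings are not inflated by hidden parallelism.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

import numpy as np
from time import perf_counter

x = np.random.rand(10_000_000)          # large N so the measurement is meaningful
np.dot(x, x)                            # warmup run, not timed
t0 = perf_counter()
np.dot(x, x)
print("elapsed:", perf_counter() - t0)
```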

--

Contributor

@jadgt I guess you want to use "---" instead of "--" here

Contributor Author

Thanks!

@ffrancesco94 left a comment

Gorgeous, it's really good <3 I only wrote a couple of minor things. I enjoyed it!

HPC systems, often called *supercomputers* or *clusters*, are made up of
many computers (called **nodes**) connected by a fast network. Each node
can have multiple **CPU** cores (and sometimes **GPUs**) that run
tasks in parallel.


maybe have tasks (called **jobs**) since you use the word 'job' later

Contributor Author

Good idea, I have added it!

- Multicore laptops and workstations
- *Single compute nodes* on a cluster

Programs use **threads** to execute in parallel (e.g., with OpenMP in C/C++/Fortran or **multiprocessing in Python**).

@ffrancesco94 Dec 15, 2025

In Python multiprocessing uses processes, not threads, and the processes do not share the memory space (they actually clone the interpreter and everything else). Python hides it well, so maybe it's not that important, but I think it's better to use threading rather than multiprocessing here. Especially in the new free-threaded versions, this can play a role.

Contributor Author

Very nicely spotted; I have changed it from the multiprocessing module to the threading module.
I have also updated the example to reflect the typical I/O operations that benefit from threading. Thanks.
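
For reference, a minimal sketch of the kind of I/O-bound pattern that benefits from `threading` (illustrative only, not the lesson's actual example): the threads share the process memory, and the simulated I/O wait releases the GIL so the other threads can run.

```python
import threading
import time

results = []                 # shared, in-process data structure
lock = threading.Lock()      # guards concurrent appends (avoids race conditions)

def fetch(item):
    time.sleep(0.5)          # stand-in for a blocking I/O call (network, disk, ...)
    with lock:
        results.append(f"processed {item}")

threads = [threading.Thread(target=fetch, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)               # four items, total wall time ~0.5 s instead of ~2 s
```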

- Add a short delay using `time.sleep(rank)` before sending or receiving.
- Observe how process 0 must wait until process 1 calls `recv()` before it can continue, and vice versa.
- Try swapping the order of the calls (e.g., both processes call `send()` first); what happens?
- You may notice the program hangs or deadlocks, because both processes are waiting for a `recv()` that never starts.


Maybe just add a line saying that there are non-blocking versions of these primitives? No need to explain anything, just a link to the docs so that interested people can read about it.

Contributor Author

Fair enough, I didn't want to go into this because it requires explaining the wait and so on, but I have added links to the docs.
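
For the curious, a minimal sketch of the non-blocking counterparts (`isend`/`irecv` in mpi4py, which return request objects to wait on); this is a generic illustration rather than the lesson's example, run with two processes (`mpiexec -n 2 python script.py`):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Both ranks post their sends first; with blocking send()/recv() this
# ordering could deadlock, but isend() returns immediately.
if rank == 0:
    req_send = comm.isend({"from": 0}, dest=1, tag=11)
    req_recv = comm.irecv(source=1, tag=22)
elif rank == 1:
    req_send = comm.isend({"from": 1}, dest=0, tag=22)
    req_recv = comm.irecv(source=0, tag=11)

if rank < 2:
    req_send.wait()          # complete the send
    data = req_recv.wait()   # complete the receive and obtain the message
    print(f"rank {rank} received {data}")
```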

Comment on lines +64 to +71
```python
# timed runs
tmin = float("inf")
for _ in range(repeats):
    t0 = perf_counter()
    fn(*args, **kwargs)
    dt = perf_counter() - t0
    tmin = min(tmin, dt)
return tmin
```

Contributor

1. While making benchmarks, it makes sense to use either the median or the arithmetic mean. More on this here:
   https://pyperf.readthedocs.io/en/latest/analyze.html#minimum-vs-average
   The general advice I see is that if the benchmark is unstable, which is usually the case, the median is a good measure. Of course we don't go to this depth here, but let's show the right way to do things.
   For either of those operations you can use np.median or np.mean.

2. Kudos for using perf_counter!

3. Collect observations into a list and compute the right stats outside the loop.

4. If it makes sense, we can use timeit.timeit directly. If not, it can be mentioned as a :::{tip} ... :::

Contributor

I meant timeit.timeit, from the std library.
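
A minimal sketch of what the suggested change could look like (collect every timing into a list, then compute the median outside the loop); the function name and signature here are assumptions, not the actual lesson code:

```python
from time import perf_counter
import numpy as np

def bench(fn, *args, repeats=5, **kwargs):
    """Time fn(*args, **kwargs) `repeats` times and return the median runtime."""
    fn(*args, **kwargs)                # warmup run, not timed
    times = []
    for _ in range(repeats):
        t0 = perf_counter()
        fn(*args, **kwargs)
        times.append(perf_counter() - t0)
    return float(np.median(times))     # robust against occasional slow runs
```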

Comment on lines +160 to +168
:::{keypoints}
Advantages:
- Easy communication between threads (shared variables)
- Low latency data access

Limitations:
- Limited by the number of cores on one machine
- Risk of race conditions if data access is not synchronized
:::

Contributor

We reserve keypoints for the end of the episode.

Suggested change
:::{keypoints}
Advantages:
- Easy communication between threads (shared variables)
- Low latency data access
Limitations:
- Limited by the number of cores on one machine
- Risk of race conditions if data access is not synchronized
:::
:::{note}
Advantages:
- Easy communication between threads (shared variables)
- Low latency data access
Limitations:
- Limited by the number of cores on one machine
- Risk of race conditions if data access is not synchronized
:::

:::

:::{exercise} Practice with threaded parallelism in Python
This is a textbook example of I/O-bound concurrency with shared memory. It efficiently handles tasks that spend most of their time waiting (simulated by time.sleep) by allowing other threads to run during those pauses, maximizing efficiency despite the GIL. It also perfectly demonstrates the convenience of Python threading: because all threads live in the same process, they can instantly write to a single global data structure (database), avoiding the complexity of inter-process communication, while using a Lock to safely manage the one major risk of this approach (race conditions).

Contributor

Suggested change
This is a textbook example of I/O-bound concurrency with shared memory. It efficiently handles tasks that spend most of their time waiting (simulated by time.sleep) by allowing other threads to run during those pauses, maximizing efficiency despite the GIL. It also perfectly demonstrates the convenience of Python threading: because all threads live in the same process, they can instantly write to a single global data structure (database), avoiding the complexity of inter-process communication, while using a Lock to safely manage the one major risk of this approach (race conditions).
This is a textbook example of I/O-bound concurrency with shared memory. It efficiently handles tasks that spend most of their time waiting (simulated by `time.sleep`) by allowing other threads to run during those pauses, maximizing efficiency despite the {abbr}`GIL (Global Interpreter Lock: a built-in internal thread lock to prevent race conditions that could corrupt data)`. It also perfectly demonstrates the convenience of Python threading: because all threads live in the same process, they can instantly write to a single global data structure (database), avoiding the complexity of inter-process communication, while using a Lock to safely manage the one major risk of this approach (race conditions).

Contributor

I am using https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#role-abbr here, if you are wondering what {abbr} is.

## Python in High-Performance Computing

Python has become one of the most widely used languages in scientific computing due to its simplicity, readability, and extensive ecosystem of numerical libraries.
Although Python itself is interpreted and slower than compiled languages such as C or Fortran, it now provides a mature set of tools that allow code to **run efficiently on modern HPC architectures**.

Contributor

Suggested change
Although Python itself is interpreted and slower than compiled languages such as C or Fortran, it now provides a mature set of tools that allow code to **run efficiently on modern HPC architectures**.
Although Python itself is interpreted and often slower than compiled languages such as C or Fortran, it now provides a mature set of tools that allow code to **run efficiently on modern HPC architectures**.

Comment on lines +246 to +255
:::{keypoints}
Advantages:
- Scales to thousands of nodes
- Each process works independently, avoiding memory contention

Limitations:
- Requires explicit communication (send/receive)
- More complex programming model
- More latency, requires minimizing movement of data.
:::

Contributor

Suggested change
:::{keypoints}
Advantages:
- Scales to thousands of nodes
- Each process works independently, avoiding memory contention
Limitations:
- Requires explicit communication (send/receive)
- More complex programming model
- More latency, requires minimizing movement of data.
:::
:::{note}
Advantages:
- Scales to thousands of nodes
- Each process works independently, avoiding memory contention
Limitations:
- Requires explicit communication (send/receive)
- More complex programming model
- More latency, requires minimizing movement of data.
:::

- MPI creates multiple independent processes running the same program.
- Point-to-point communication exchanges data directly between two processes.
- Collective communication coordinates data exchange across many processes.
- mpi4py integrates tightly with NumPy for efficient, zero-copy data transfers.

Contributor

Checking zero-copy again.
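
For context on the zero-copy claim: with the uppercase, buffer-based API, mpi4py hands the NumPy array's memory directly to the MPI library instead of pickling the object. A minimal sketch (generic illustration, not the lesson code), run with two processes:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

data = np.zeros(5, dtype="d")          # preallocated buffer on every rank

if rank == 0:
    data[:] = np.arange(5, dtype="d")
    comm.Send([data, MPI.DOUBLE], dest=1, tag=7)    # ships the buffer, no pickling
elif rank == 1:
    comm.Recv([data, MPI.DOUBLE], source=0, tag=7)  # fills `data` in place
    print("rank 1 received", data)
```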

1. **Point-to-point communication**: Data moves **directly** between two processes.
2. **Collective communication**: Data is exchanged among **all processes** in a communicator in a coordinated way.

:::{keypoints}

Contributor

Suggested change
:::{keypoints}
:::{note}

A major strength of `mpi4py` is its **direct integration with NumPy arrays**.
MPI operations can send and receive **buffer-like objects**, such as NumPy arrays, without copying data between Python and C memory.

:::{keypoints} Important

Contributor

Suggested change
:::{keypoints} Important
:::{important}

Comment on lines +160 to +166
#### Syntax differences
**Lowercase (Python objects):**
```python
comm.send(obj, dest=1)
data = comm.recv(source=0)
```
- The message (obj) can be any Python object.

@ashwinvis (Contributor) Dec 16, 2025

Suggested change
#### Syntax differences
**Lowercase (Python objects):**
```python
comm.send(obj, dest=1)
data = comm.recv(source=0)
```
- The message (obj) can be any Python object.
#### Syntax differences
**Lowercase (Python objects):**
```python
comm.send(obj, dest=1)
data = comm.recv(source=0)
```
- The message (`obj`) can be any Python object.

The most basic form of communication in MPI is **point-to-point**, meaning data is sent from one process directly to another.

Each message involves:
- A **sender** and a **receiver**

Contributor

Suggested change
- A **sender** and a **receiver**
- A **sender** (`source`) and a **receiver** (`dest`)

Contributor

Of course you can avoid this since this terminology assumes point-to-point and not collectives.
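
Since collectives came up: for contrast, a minimal sketch of collective communication in mpi4py (a generic broadcast-plus-reduce illustration, not taken from the lesson). Note that there is no `source`/`dest` pair here; every rank in the communicator takes part:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# The root rank defines the parameters; bcast copies them to every rank.
params = {"n": 1000} if rank == 0 else None
params = comm.bcast(params, root=0)

# Each rank computes a partial result; reduce combines them on the root.
partial = rank * params["n"]
total = comm.reduce(partial, op=MPI.SUM, root=0)

if rank == 0:
    print("total =", total)
```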
