Add numba parallel example

gjbex · gjbex · commit abd718c33730 · 2021-12-18T23:29:36.000+01:00
diff --git a/source-code/numba/README.md b/source-code/numba/README.md
@@ -12,3 +12,5 @@ can be obtained without much effort.
 1. `Primes`: code to compute the first n prime numbers comparing a pure Python
     implementation with numba JIT and eager JIT.
 1. `Ufunc`: defining a numpy ufunc using numba.
+1. `numba_parallel.ipynb`: Jupyter notebook experimenting with numba's
+   parallel capabilities.
diff --git a/source-code/numba/numba_parallel.ipynb b/source-code/numba/numba_parallel.ipynb
@@ -0,0 +1,320 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "647ec939-d34d-4a92-8a8d-60e0078fee69",
+   "metadata": {},
+   "source": [
+    "# Requirements"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "98749cd7-43b1-45b4-8ad8-c68006996d22",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from numba import njit\n",
+    "import numpy as np\n",
+    "import random"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3e3a9b00-2e46-4b57-9951-3d649a9ed193",
+   "metadata": {},
+   "source": [
+    "# Random $\\pi$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1c79e332-5fb5-4103-8a02-74ef83972464",
+   "metadata": {},
+   "source": [
+    "Estimate $\\pi$ by generating random points uniformly in the unit square and counting the fraction that lands inside the quarter of the unit circle, i.e., where $x^2 + y^2 < 1$. That fraction approximates $\\pi/4$."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "75bbae32-14d6-44f4-b83d-3dda129355d2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def compute_pi(nr_tries):\n",
+    "    hits = 0\n",
+    "    for _ in range(nr_tries):\n",
+    "        x = random.random()\n",
+    "        y = random.random()\n",
+    "        if x**2 + y**2 < 1.0:\n",
+    "            hits += 1\n",
+    "    return 4.0*hits/nr_tries"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "f96298c8-d477-4da6-a19f-0de852c81329",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "@njit\n",
+    "def compute_pi_jit(nr_tries):\n",
+    "    hits = 0\n",
+    "    for _ in range(nr_tries):\n",
+    "        x = random.random()\n",
+    "        y = random.random()\n",
+    "        if x**2 + y**2 < 1.0:\n",
+    "            hits += 1\n",
+    "    return 4.0*hits/nr_tries"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "id": "a0d922f3-13ba-4c6d-beeb-c8292b1baf67",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "@njit(['float64(int64)'])\n",
+    "def compute_pi_jit_sign(nr_tries):\n",
+    "    hits = 0\n",
+    "    for _ in range(nr_tries):\n",
+    "        x = random.random()\n",
+    "        y = random.random()\n",
+    "        if x**2 + y**2 < 1.0:\n",
+    "            hits += 1\n",
+    "    return 4.0*hits/nr_tries"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "b830e45b-bc46-42f6-9b40-2f636c9989cd",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "27.1 ms ± 277 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
+     ]
+    }
+   ],
+   "source": [
+    "%timeit compute_pi(100_000)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "b98f5c18-a5fb-468c-8f96-ca25782ebac8",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "687 µs ± 9.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
+     ]
+    }
+   ],
+   "source": [
+    "%timeit compute_pi_jit(100_000)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 34,
+   "id": "78c37a87-dd0c-49c6-ac8d-85e6f21832c2",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "685 µs ± 8.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
+     ]
+    }
+   ],
+   "source": [
+    "%timeit compute_pi_jit_sign(np.int64(100_000))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "18e84532-48d9-4aa0-8218-709888e3162e",
+   "metadata": {},
+   "source": [
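+    "Using numba's just-in-time compiler speeds up the computation by a factor of about 40 (27.1 ms versus 687 µs for 100,000 tries). Specifying the signature explicitly (eager compilation) makes no measurable difference to the run time."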
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8fc275a6-8ac6-4481-beed-41896c5b39e9",
+   "metadata": {},
+   "source": [
+    "# Quadrature $\\pi$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "32312a15-8070-4a8a-a198-170252d8efde",
+   "metadata": {},
+   "source": [
+    "Another method to compute $\\pi$ is to compute the definite integral\n",
+    "$$\n",
+    "\\frac{\\pi}{2} = \\int_{-1}^{1} \\sqrt{1 - x^2} dx\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "id": "17597694-cb80-4e2a-aa4c-c4c9e6d6de84",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "@njit\n",
+    "def quad_pi_jit(nr_steps):\n",
+    "    delta = 2.0/nr_steps\n",
+    "    x = np.linspace(-1.0, 1.0, nr_steps)\n",
+    "    f = np.empty_like(x)\n",
+    "    for i in range(x.size):\n",
+    "        f[i] = np.sqrt(1.0 - x[i]**2)\n",
+    "    return 2.0*f.sum()*delta"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "89b96c0e-78bf-4d89-b962-a3dd1cc9e92a",
+   "metadata": {},
+   "source": [
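+    "With `parallel=True`, numba automatically parallelizes the array operations it recognizes, such as array creation and the reduction `f.sum()`. The explicit loop below still uses `range`, so it is not parallelized as a loop; that would require `numba.prange`, which also supports scalar reductions such as `total += ...`."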
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 35,
+   "id": "39a14289-55a9-4775-b902-3d1f1b7f58ec",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "@njit(parallel=True)\n",
+    "def quad_pi_par(nr_steps):\n",
+    "    delta = 2.0/nr_steps\n",
+    "    x = np.linspace(-1.0, 1.0, nr_steps)\n",
+    "    f = np.empty_like(x)\n",
+    "    for i in range(x.size):\n",
+    "        f[i] = np.sqrt(1.0 - x[i]**2)\n",
+    "    return 2.0*f.sum()*delta"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4ea90942-284e-454d-a750-bd9d08ff057e",
+   "metadata": {},
+   "source": [
+    "The pure numpy implementation for comparison."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 44,
+   "id": "f1949a6a-8230-4c9d-b642-ce82e17fede7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def quad_pi_np(nr_steps):\n",
+    "    delta = 2.0/nr_steps\n",
+    "    x = np.linspace(-1.0, 1.0, nr_steps)\n",
+    "    return 2.0*np.sqrt(1.0 - x**2).sum()*delta"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 50,
+   "id": "2492cd21-68d8-4b9a-9a0b-656cfb2c9e2d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "328 ms ± 34.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
+     ]
+    }
+   ],
+   "source": [
+    "%timeit quad_pi_jit(100_000_000)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 51,
+   "id": "2acb30d2-92cf-4082-8572-675b7694747b",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "202 ms ± 1.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
+     ]
+    }
+   ],
+   "source": [
+    "%timeit quad_pi_par(100_000_000)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 52,
+   "id": "be932fa1-f46a-4e8b-a301-384adf777364",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "676 ms ± 43.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
+     ]
+    }
+   ],
+   "source": [
+    "%timeit quad_pi_np(100_000_000)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a402e83f-ccb7-4518-b8de-b14e499f994a",
+   "metadata": {},
+   "source": [
+    "The parallelized version is faster (202 ms versus 328 ms, a speedup of about 1.6), but the parallel efficiency is far from great."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.7"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
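
Note on the quadrature cells: numba supports scalar reductions inside `prange` loops, so the temporary arrays `x` and `f` are not strictly needed. The following is a minimal sketch and not part of this commit; the name `quad_pi_prange`, the use of the midpoint rule, and the plain-Python fallback for environments without numba are assumptions made for illustration.

```python
import math

try:
    from numba import njit, prange
except ImportError:
    # Assumption for illustration: fall back to serial pure Python
    # when numba is not installed, so the sketch still runs.
    prange = range

    def njit(*args, **kwargs):
        # Support both the bare @njit and the @njit(parallel=True) forms.
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit(parallel=True)
def quad_pi_prange(nr_steps):
    """Approximate pi as 2 * integral of sqrt(1 - x^2) over [-1, 1]."""
    delta = 2.0 / nr_steps
    total = 0.0
    # prange marks the loop as parallel; `total +=` is a scalar reduction.
    for i in prange(nr_steps):
        x = -1.0 + (i + 0.5) * delta  # midpoint of the i-th subinterval
        total += math.sqrt(1.0 - x * x)
    return 2.0 * total * delta


if __name__ == "__main__":
    print(quad_pi_prange(1_000_000))
```

Whether this is faster than `quad_pi_par` depends on the machine and the number of threads, so it should be timed with `%timeit` before drawing conclusions.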