From 1af2ec7415bc3bdefb776b7d530f4b4bf3e5ecf5 Mon Sep 17 00:00:00 2001
From: Christoph Boeddeker <cbj@mail.uni-paderborn.de>
Date: Tue, 27 Jul 2021 14:15:00 +0200
Subject: [PATCH 1/6] add first version of overview.md

---
 doc/overview.md | 71 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100644 doc/overview.md
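
Note (outside the diff): the `pb.utils.mapping.Dispatcher` entry below describes a
dict-like object that raises a verbose `KeyError`. The idea can be sketched
as follows. `VerboseDict` is a hypothetical name used for illustration only and
is not taken from paderbox; the actual class may be implemented differently.

```python
import difflib


class VerboseDict(dict):
    """Hypothetical stand-in: a dict whose KeyError names close matches."""

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ only when the lookup fails,
        # so successful lookups take the normal dict path.
        close = difflib.get_close_matches(
            str(key), [str(k) for k in self], n=3
        )
        raise KeyError(
            f"Invalid key {key!r}. Close matches: {close}. "
            f"Available keys: {list(self)}"
        )


windows = VerboseDict(hann="hann window", hamming="hamming window")

try:
    windows["han"]  # misspelled on purpose
except KeyError as e:
    message = str(e)  # mentions the bad key and the available keys
```

The extra work happens only in the failure path, which is one plausible way to
get the "no relevant overhead" property mentioned in the entry.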

diff --git a/doc/overview.md b/doc/overview.md
new file mode 100644
index 00000000..35fa6e09
--- /dev/null
+++ b/doc/overview.md
@@ -0,0 +1,71 @@
+# Paderbox: A collection of utilities for audio / speech processing
+
+ - `pb.io`
+   - `pb.io.load`, `pb.io.dump`: Loads/saves an arbitary file. See docsting for supported formats.
+     - `unsafe` argument can enable unsafe backends like `pickle`
+   - `pb.io.load_{json,yaml,csv,hdf5}`, `pb.io.dump_{json,yaml,csv,hdf5}`, `pb.io.loads_{json,yaml,csv,hdf5}`, `pb.io.dumps_{json,yaml,csv,hdf5}`:
+     - Load or dump/save some data in a particular format. The `s` in `dumps` and `loads` follow python convention to obtain/yield string or bytes representation of an object.
+   - `pb.io.recursive_load_audio`: Recursive load of audio files.
+   - `pb.io.{symlink,update_hdf5,mkdir_p}`
+   - `pb.io.data_dir`: Collection of paths, loaded from enviroment and with defaults for our department file system.
+   - `paderbox.io.atomic`: Atomic file operations. See docstrings for more information.
+ - `pb.transform`:
+   - `pb.transform.{stft,istft,STFT}`: Functions and class (`STFT`) to calculate the stft and its inverse.
+   - Other transformations we either don't use, or rarely use.
+   - `pb.transform.resample_sox`: Resample with `sox` binary
+ - `from paderbox.visualization import plot, axes_context`:
+   - `plot.{line,scatter}`: Make a line or scatter plot
+   - `plot.stft`: Plot the stft signal
+   - `with axes_context(columns=...) as axes`:
+     - Context manager to change visualization parameters (e.g. grid, colors, ...)
+     - Helper to create a plotting grid. Use `ax=axes.new` or `ax=axes.last`.
+ - `pb.array`:
+   - `pb.array.interval`: Helper to have a memory efficient 1D boolian array that represents intervals as replacement for numpy.
+   - `pb.array.pad_axis`: Add an `axis` argument to `np.pad`.
+   - `pb.array.morph`: Deprecated in favour of `einops` (https://github.com/arogozhnikov/einops)
+   - `pb.array.segment_axis`: Segment a signal. Use an implementation detail of numpy to do ist without memory replications.
+ - `pb.utils`:
+   - `pb.utils.process_caller.run_process`:
+     - Wrapper around `subprocess.run` for better exception messages and other defaults.
+   - `pb.utils.process_caller.run_processes`:
+     - Run multiple processes in parallel.
+   - `with pb.utils.debug_utils.debug_on: ...`: Invoke `pdb` on exception.
+   - `pb.utils.mapping.Dispatcher`: Dict-like, but with a verbose exception message on `KeyError`. No relevant overhead compared to `dict`.
+   - `pb.utils.nested.{flatten,deflatten,nested_op,FlatView,...}`:
+     - Utilities to work with nested objects.
+   - `pb.utils.pandas_utils.py_query`: Alternative to `pd.DataFrame.query` that supports all python code.
+   - `pb.utils.pandas_utils.squeeze_df`: Remove "boring" columns from a dataframe (i.e. columns in which every row has the same value)
+   - `pb.utils.pandas_utils.display_df`: Combine `IPython.display.display` with `squeeze_df`.
+   - `pb.utils.pretty.{pretty,pprint}`: Uses `IPython.lib.pretty.*`, but displays a summay of large numpy arrays instead of the actual values.
+   - `pb.utils.profiling.lprun`: Lineprofiler decorator. ToDo: Make internal doku public.
+   - `python -m paderbox.utils.strip_solutions`: CLI Helper to create a template notebook from solution notebook.
+   - `pb.utils.timer.TimerDict`: Helper to get runtime of a codeblock.
+
+# Standalone
+
+ - [`lazy_dataset`](https://github.com/fgnt/lazy_dataset): Process large datasets as if it was an iterable.
+   - Inpur pipeline with lazy loading, transformations and parallel loading.
+   - Not limited to any NN framework.
+ - [`dlp_mpi`](https://github.com/fgnt/dlp_mpi):
+   - Parallisation with MPI based on `mpi4py`
+
+# Special purpose packages
+
+ - [`padertorch`](https://github.com/fgnt/padertorch)
+   - A collection of common functionality to simplify the design, training and evaluation of machine learning models based on pytorch with an emphasis on speech processing.
+ - [`pb_bss`](https://github.com/fgnt/pb_bss): Code related to blind source separation.
+   - Metrics: `pb_bss.evaluation.{???}`
+   - Beamforming: `pb_bss.extraction.{???}`
+   - (Spatial) Mixture Models: `pb_bss.distribution.{???}`
+ - [`nara_wpe`](https://github.com/fgnt/nara_wpe): Weighted Prediction Error
+   - Dereverberation code: `nara_wpe.???`
+ - [`sms_wsj`](https://github.com/fgnt/sms_wsj): SMS-WSJ: Spatialized Multi-Speaker Wall Street Journal database for multi-channel source separation and recognition
+ - [`paderwasn`](https://github.com/fgnt/paderwasn):
+
+# Example code
+
+ - [`nn-gev`](https://github.com/fgnt/nn-gev)
+ - [`pb_chime5`](https://github.com/fgnt/pb_chime5)
+ - [`pb_sed`](https://github.com/fgnt/pb_sed)
+ - [`upb_audio_tagging_2019`](https://github.com/fgnt/upb_audio_tagging_2019)
+ - [`sins`](https://github.com/fgnt/sins)

From 6f1af3b03eb2e27a363bd67df19d5abb97092dcd Mon Sep 17 00:00:00 2001
From: michael-kuhlmann <kuhlmann@nt.uni-paderborn.de>
Date: Wed, 28 Jul 2021 11:47:19 +0200
Subject: [PATCH 2/6] Fix a typo

---
 doc/overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
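
Note (outside the diff): the `pb.array.segment_axis` entry touched here says the
segmentation works without memory replication. A minimal sketch of that general
technique with plain NumPy views, assuming NumPy >= 1.20 for
`sliding_window_view`. It does not use or reproduce the `segment_axis`
signature; `length` and `shift` are illustrative names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

signal = np.arange(10)
length, shift = 4, 2

# Build every window of `length` samples as a strided view,
# then keep every `shift`-th window. Both steps return views.
frames = sliding_window_view(signal, window_shape=length)[::shift]

print(frames.shape)                      # (4, 4)
print(np.shares_memory(frames, signal))  # True: no samples were copied
```

The view returned by `sliding_window_view` is read-only by default, which
guards against accidental writes into overlapping frames.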

diff --git a/doc/overview.md b/doc/overview.md
index 35fa6e09..bb36d669 100644
--- a/doc/overview.md
+++ b/doc/overview.md
@@ -23,7 +23,7 @@
    - `pb.array.interval`: Helper to have a memory efficient 1D boolian array that represents intervals as replacement for numpy.
    - `pb.array.pad_axis`: Add an `axis` argument to `np.pad`.
    - `pb.array.morph`: Deprecated in favour of `einops` (https://github.com/arogozhnikov/einops)
-   - `pb.array.segment_axis`: Segment a signal. Use an implementation detail of numpy to do ist without memory replications.
+   - `pb.array.segment_axis`: Segment a signal. Use an implementation detail of numpy to do it without memory replications.
  - `pb.utils`:
    - `pb.utils.process_caller.run_process`:
      - Wrapper around `subprocess.run` for better exception messages and other defaults.

From 409f6e2d6d79bf8aaaf8e7d0e7394689c19141c9 Mon Sep 17 00:00:00 2001
From: gburrek <58566283+gburrek@users.noreply.github.com>
Date: Wed, 28 Jul 2021 11:54:00 +0200
Subject: [PATCH 3/6] Add a description of paderwasn

---
 doc/overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/overview.md b/doc/overview.md
index bb36d669..504ba97e 100644
--- a/doc/overview.md
+++ b/doc/overview.md
@@ -60,7 +60,7 @@
  - [`nara_wpe`](https://github.com/fgnt/nara_wpe): Weighted Prediction Error
    - Dereverberation code: `nara_wpe.???`
  - [`sms_wsj`](https://github.com/fgnt/sms_wsj): SMS-WSJ: Spatialized Multi-Speaker Wall Street Journal database for multi-channel source separation and recognition
- - [`paderwasn`](https://github.com/fgnt/paderwasn):
+ - [`paderwasn`](https://github.com/fgnt/paderwasn): Collection of methods for acoustic signal processing in wireless acoustic sensor networks
 
 # Example code
 

From e8c4c922f44cac641f85cd861ee037594840e1e1 Mon Sep 17 00:00:00 2001
From: TCord <49722035+TCord@users.noreply.github.com>
Date: Wed, 28 Jul 2021 11:56:38 +0200
Subject: [PATCH 4/6] Create overview.md

---
 doc/overview.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/overview.md b/doc/overview.md
index 504ba97e..7bc85956 100644
--- a/doc/overview.md
+++ b/doc/overview.md
@@ -11,7 +11,7 @@
    - `paderbox.io.atomic`: Atomic file operations. See docstrings for more information.
  - `pb.transform`:
    - `pb.transform.{stft,istft,STFT}`: Functions and class (`STFT`) to calculate the stft and its inverse.
-   - Other transformations we either don't use, or rarely use.
+   - Other transformations we use less frequently (mel filterbanks/MFCCs, ...).
    - `pb.transform.resample_sox`: Resample with `sox` binary
  - `from paderbox.visualization import plot, axes_context`:
    - `plot.{line,scatter}`: Make a line or scatter plot
@@ -20,7 +20,7 @@
      - Context manager to change visualization parameters (e.g. grid, colors, ...)
      - Helper to create a plotting grid. Use `ax=axes.new` or `ax=axes.last`.
  - `pb.array`:
-   - `pb.array.interval`: Helper to have a memory efficient 1D boolian array that represents intervals as replacement for numpy.
+   - `pb.array.interval`: Helper to have a memory efficient 1D boolian array that is represented by its intervals as replacement for numpy.
    - `pb.array.pad_axis`: Add an `axis` argument to `np.pad`.
    - `pb.array.morph`: Deprecated in favour of `einops` (https://github.com/arogozhnikov/einops)
    - `pb.array.segment_axis`: Segment a signal. Use an implementation detail of numpy to do it without memory replications.
@@ -38,13 +38,13 @@
    - `pb.utils.pandas_utils.display_df`: Combine `IPython.display.display` with `squeeze_df`.
    - `pb.utils.pretty.{pretty,pprint}`: Uses `IPython.lib.pretty.*`, but displays a summay of large numpy arrays instead of the actual values.
    - `pb.utils.profiling.lprun`: Lineprofiler decorator. ToDo: Make internal doku public.
-   - `python -m paderbox.utils.strip_solutions`: CLI Helper to create a template notebook from solution notebook.
+   - `python -m paderbox.utils.strip_solutions`: CLI Helper to create a template notebook from a solution notebook for teaching purposes.
    - `pb.utils.timer.TimerDict`: Helper to get runtime of a codeblock.
 
 # Standalone
 
  - [`lazy_dataset`](https://github.com/fgnt/lazy_dataset): Process large datasets as if it was an iterable.
-   - Inpur pipeline with lazy loading, transformations and parallel loading.
+   - Input pipeline with lazy loading, transformations and parallel loading.
    - Not limited to any NN framework.
  - [`dlp_mpi`](https://github.com/fgnt/dlp_mpi):
    - Parallisation with MPI based on `mpi4py`

From e39d5141b850a8b4654e9b287b6fcd765ca9436d Mon Sep 17 00:00:00 2001
From: michael-kuhlmann <kuhlmann@nt.uni-paderborn.de>
Date: Thu, 29 Jul 2021 08:58:12 +0200
Subject: [PATCH 5/6] Update some words

boolian -> boolean
parallisation -> parallelization
---
 doc/overview.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/overview.md b/doc/overview.md
index 7bc85956..828cbc5b 100644
--- a/doc/overview.md
+++ b/doc/overview.md
@@ -20,7 +20,7 @@
      - Context manager to change visualization parameters (e.g. grid, colors, ...)
      - Helper to create a plotting grid. Use `ax=axes.new` or `ax=axes.last`.
  - `pb.array`:
-   - `pb.array.interval`: Helper to have a memory efficient 1D boolian array that is represented by its intervals as replacement for numpy.
+   - `pb.array.interval`: Helper to have a memory efficient 1D boolean array that is represented by its intervals as replacement for numpy.
    - `pb.array.pad_axis`: Add an `axis` argument to `np.pad`.
    - `pb.array.morph`: Deprecated in favour of `einops` (https://github.com/arogozhnikov/einops)
    - `pb.array.segment_axis`: Segment a signal. Use an implementation detail of numpy to do it without memory replications.
@@ -47,7 +47,7 @@
    - Input pipeline with lazy loading, transformations and parallel loading.
    - Not limited to any NN framework.
  - [`dlp_mpi`](https://github.com/fgnt/dlp_mpi):
-   - Parallisation with MPI based on `mpi4py`
+   - Parallelization with MPI based on `mpi4py`
 
 # Special purpose packages
 

From 9c0baaf773ad7db3996937f717c522a026b5b169 Mon Sep 17 00:00:00 2001
From: TCord <49722035+TCord@users.noreply.github.com>
Date: Thu, 29 Jul 2021 10:22:46 +0200
Subject: [PATCH 6/6] Fix some typos

---
 doc/overview.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
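
Note (outside the diff): the reworded `loads`/`dumps` sentence refers to the
Python convention in which the trailing `s` means "string". A short sketch of
that convention using the standard-library `json` module; the `pb.io` functions
themselves are not called here and their signatures are not assumed.

```python
import io
import json

data = {"sample_rate": 16000, "channels": 2}

# With "s": serialize to / parse from an in-memory string.
text = json.dumps(data)
from_text = json.loads(text)

# Without "s": write to / read from a file-like object.
buffer = io.StringIO()
json.dump(data, buffer)
buffer.seek(0)
from_file = json.load(buffer)
```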

diff --git a/doc/overview.md b/doc/overview.md
index 828cbc5b..29b92ad2 100644
--- a/doc/overview.md
+++ b/doc/overview.md
@@ -1,10 +1,10 @@
 # Paderbox: A collection of utilities for audio / speech processing
 
  - `pb.io`
-   - `pb.io.load`, `pb.io.dump`: Loads/saves an arbitary file. See docsting for supported formats.
+   - `pb.io.load`, `pb.io.dump`: Loads/saves an arbitrary file. See docstring for supported formats.
      - `unsafe` argument can enable unsafe backends like `pickle`
    - `pb.io.load_{json,yaml,csv,hdf5}`, `pb.io.dump_{json,yaml,csv,hdf5}`, `pb.io.loads_{json,yaml,csv,hdf5}`, `pb.io.dumps_{json,yaml,csv,hdf5}`:
-     - Load or dump/save some data in a particular format. The `s` in `dumps` and `loads` follow python convention to obtain/yield string or bytes representation of an object.
+     - Load or dump/save some data in a particular format. The `s` as in `dumps` and `loads` follows python convention to obtain/yield string or bytes representation of an object.
    - `pb.io.recursive_load_audio`: Recursive load of audio files.
    - `pb.io.{symlink,update_hdf5,mkdir_p}`
    - `pb.io.data_dir`: Collection of paths, loaded from enviroment and with defaults for our department file system.
@@ -43,7 +43,7 @@
 
 # Standalone
 
- - [`lazy_dataset`](https://github.com/fgnt/lazy_dataset): Process large datasets as if it was an iterable.
+ - [`lazy_dataset`](https://github.com/fgnt/lazy_dataset): Process a large dataset as if it was an iterable.
    - Input pipeline with lazy loading, transformations and parallel loading.
    - Not limited to any NN framework.
  - [`dlp_mpi`](https://github.com/fgnt/dlp_mpi):
@@ -52,7 +52,7 @@
 # Special purpose packages
 
  - [`padertorch`](https://github.com/fgnt/padertorch)
-   - A collection of common functionality to simplify the design, training and evaluation of machine learning models based on pytorch with an emphasis on speech processing.
+   - A collection of common functionalities to simplify the design, training and evaluation of machine learning models based on pytorch with an emphasis on speech processing.
  - [`pb_bss`](https://github.com/fgnt/pb_bss): Code related to blind source separation.
    - Metrics: `pb_bss.evaluation.{???}`
    - Beamforming: `pb_bss.extraction.{???}`