179 changes: 78 additions & 101 deletions AGENTS.md
@@ -1,101 +1,78 @@
Agent Contributor Guidelines and Information
============================================

Read the ``README.rst`` for an overview of libEnsemble.

- libEnsemble uses a manager-worker architecture. Points are generated by a generator and sent to a worker, which runs a simulator.
- The manager determines how and when points get passed to workers via an allocation function.
- See ``libensemble/tests/regression_tests/test_1d_sampling.py`` for a simple example of the libEnsemble interface.

Critical Repository Layout Information
--------------------------------------

- ``libensemble/`` - Source code.
- ``/alloc_funcs`` - Allocation functions. Policies for passing work between the manager and workers.
- ``/comms`` - Modules and abstractions for communication between the manager and workers.
- ``/executors`` - An interface for launching executables, often simulations.
- ``/gen_classes`` - Generators that adhere to the `gest-api` standard.
Recommended over entries from ``/gen_funcs`` that perform similar functionality.
- ``/gen_funcs`` - Generator functions. Modules for producing points for simulations. (Legacy)
- ``/resources`` - Classes and functions for managing compute resources for MPI tasks, libensemble workers.
- ``/sim_funcs`` - Simulator functions. Modules for running simulations or performing experiments.
- ``/tests`` - Tests.
- ``/functionality_tests`` - Primarily tests libEnsemble code only.
- ``/regression_tests`` - Tests libEnsemble code with external code. Often more closely resembles actual use-cases.
- ``/unit_tests`` - Tests for individual modules.
- ``ensemble.py`` - The primary interface for parameterizing and running libEnsemble. The ``Ensemble`` class in this module wraps the lower-level ``libE`` function and automates argument parsing and state management.
- ``generators.py`` - Base classes for generators that adhere to the `gest-api` standard.
- ``history.py`` - Module for recording points that have been generated and simulation results. NumPy structured array.
- ``libE.py`` - libE main file. Previous primary interface for parameterizing and running libEnsemble. The primary interface in ``ensemble.py`` wraps this function.
- ``manager.py`` - Module for maintaining the history array and passing points between the workers.
- ``message_numbers.py`` - Constants that represent states of the ensemble.
- ``specs.py`` - Dataclasses for parameterizing the ensemble. Most importantly, contains ``LibeSpecs, SimSpecs, GenSpecs``.
- ``worker.py`` - Module for running generators and simulators. Communicates with the manager.
- ``examples/`` - The ``*_funcs`` and ``calling_scripts`` directories contain symlinks to examples further in the source code.
- ``/libE_submission_scripts`` - Example scripts for submitting libEnsemble jobs to HPC systems.
- ``/tutorials`` - Tutorials on how to use libEnsemble.

Information about Generators
----------------------------

- Generators are functions or objects that produce points for simulations.
- The History array is a NumPy structured array that stores points that have been generated and simulation results.
Its fields match ``sim_specs/gen_specs["out"]`` or ``vocs`` attributes, plus additional reserved fields for metadata.
- Prior to libEnsemble v1.6.0, generators were plain functions. They often ran in "persistent" mode, meaning they executed in a
long-running loop, sending and receiving points to and from the manager until the ensemble was complete.
- A ``gest-api`` or "standardized" generator is a class that inherits from ``gest_api.Generator``, implements ``suggest`` and ``ingest`` methods (which process lists of dictionaries, not NumPy arrays), and is parameterized by a ``vocs``.
- See ``libensemble/gen_classes/external/sampling.py`` for simple examples of the pure ``gest-api`` interface. (Note: ``libensemble.generators.LibensembleGenerator`` exists to wrap legacy NumPy-based workflows, but pure ``gest_api.Generator`` is preferred).
- Generators are often used for simple sampling, optimization, calibration, uncertainty quantification, and other simulation-based tasks.
- **Automatic Variable Mapping**: When using ``LibensembleGenerator`` subclasses, they automatically map all ``VOCS`` variables to a single multi-dimensional ``"x"`` field in the History array if no explicit ``variables_mapping`` is provided. Pure ``gest_api.Generator`` classes handle variables natively.
- **Mandatory Input Fields**: Even for simple generators that don't ingest data, ``gen_specs["in"]`` or ``gen_specs["persis_in"]`` must be defined if using an allocation function like ``only_persistent_gens`` that attempts to send rows. If these are empty, the manager will raise an ``AssertionError`` stating that no fields were requested to be sent.
- **Default Allocator**: ``only_persistent_gens`` is the default allocator for standardized ``gest-api`` generators. It treats these generators as persistent entities that communicate throughout the run.

General Guidelines
------------------

- If using classic ``sim_specs`` and ``gen_specs``, then ensure that ``sim_specs["out"]`` and ``gen_specs["in"]`` field names match, and vice-versa.
- As-of libEnsemble v1.6.0, ``SimSpecs`` and ``GenSpecs`` can also be parameterized by a ``vocs`` object, imported from ``gest_api.vocs`` (NOT xopt.vocs).
- ``VOCS`` contains variables, objectives, constraints, and other settings that define the problem.
See ``libensemble/tests/regression_tests/test_xopt_EI.py`` for an example of how to use it.
- An MPI distribution is not required for libEnsemble to run, but is required to use the ``MPIExecutor``. ``mpich`` is recommended.
- New tests are heavily encouraged for new features, bug fixes, or integrations. See ``libensemble/tests/regression_tests`` for examples.
- Never use destructive git commands unless explicitly requested.
- Code is in the ``black`` style. This should be enforced by ``pre-commit``.
- When writing new code, prefer the ``LibeSpecs``, ``SimSpecs``, and ``GenSpecs`` dataclasses over the classic ``sim_specs`` and ``gen_specs`` bare dictionaries.
- Read ``CONTRIBUTING.md`` for more information.
- The external ``libE-community-examples`` repository contains past use-cases, generators, and other examples.

Development Environment
-----------------------

- ``pixi`` is the recommended environment manager for libEnsemble development. See ``pyproject.toml`` for the list
of dependencies and the available testing environments. (Note: If ``pixi`` is not in your system path, it can often be found in ``/opt/homebrew/bin/pixi`` or ``/usr/local/bin/pixi``).
- Enter the development environment with ``pixi shell -e dev``. This environment contains the most common dependencies for development and testing.
- For one-off commands, use ``pixi run -e dev``. This will run a single command in the development environment.
- If ``pixi`` is not available or not preferred by the user, ``pip install -e .`` can be used instead. Other dependencies may need to be installed manually.
- If committing, use ``pre-commit`` to ensure that code style and formatting are consistent. See ``.pre-commit-config.yaml`` for
the configuration and ``pyproject.toml`` for other configuration.

Testing
-------

- Run tests with the ``run_tests.py`` script: ``python libensemble/tests/run_tests.py``. See ``libensemble/tests/run_tests.py`` for usage information.
- Some tests require third party software to be installed. When developing a feature or fixing a bug, since the entire test suite will be run on Github Actions,
for local development running individual tests is sufficient.
- Individual unit tests can be run with ``pixi run -e dev pytest path/to/test_file``.
- A libEnsemble run typically outputs an ``ensemble.log`` and ``libE_stats.txt`` file in the working directory. Check these files for tracebacks or run statistics.
- An "ensemble" or "workflow" directory may also be created, often containing per-simulation output directories

Modernizing Scripts for libEnsemble 2.0
---------------------------------------

When modernizing existing libEnsemble scripts (functionality tests, regression tests, or user examples) for version 2.0, follow these steps:

- **Switch to `gest-api` Generators**: Replace legacy generator functions (from `libensemble.gen_funcs`) with standardized generator classes (from `libensemble.gen_classes` or other `gest-api` compatible sources).
- **Use `VOCS` for Parameterization**: Standardized generators are parameterized by a `VOCS` object (from `gest_api.vocs`). Define variables and objectives within this object.
- **Set `gen_specs["generator"]`**: Instead of `gen_f`, use the `generator` field in `GenSpecs` to pass the initialized generator class.
- **Remove Explicit `AllocSpecs`**: In libEnsemble 2.0, `only_persistent_gens` is the default allocator. Scripts that previously used `give_sim_work_first` or other simple allocators can often remove `alloc_specs` entirely when switching to standardized generators.
- **Generator Placement**: By default, generators run on the manager thread (Worker 0). This means all allocated workers are available for simulation tasks unless `gen_on_worker` is explicitly set to `True` in `libE_specs`.
- **Mandatory Fields**: Ensure `gen_specs["in"]` or `gen_specs["persis_in"]` includes at least one field (e.g., `["sim_id"]`) if feedback is sent back to the generator, to satisfy the allocator's requirements.
- **gest-api Simulators**: The gest-api pattern also applies to simulators. Set `SimSpecs.simulator` to a callable with signature `(input_dict: dict, **kwargs) -> dict` instead of providing a `sim_f`. libEnsemble automatically wraps it with `gest_api_sim` from `libensemble.sim_funcs.gest_api_wrapper` and handles all NumPy conversions. `SimSpecs.inputs` and `SimSpecs.outputs` can be derived automatically when `SimSpecs.vocs` is provided.
- **`safe_mode` is opt-in**: `libE_specs["safe_mode"]` defaults to `False`, meaning protected History fields (`gen_worker`, `gen_started_time`, `gen_ended_time`, `sim_worker`, `sim_started`, `sim_started_time`, `sim_ended`, `sim_ended_time`, `gen_informed`, `gen_informed_time`, `kill_sent`) are freely overwritable by default. Set `safe_mode=True` to enable protection. Overwriting these fields without understanding their purpose may crash libEnsemble.
# Agent Contributor Guidelines

## Architecture

Manager-worker framework: manager allocates points from a generator to workers running simulators. See `libensemble/tests/regression_tests/test_1d_sampling.py` for a minimal example.

## Repository Layout

Core paths relative to `libensemble/`:
- `alloc_funcs/` - Allocation policies
- `comms/` - Manager-worker communication
- `executors/` - Launching executables
- `gen_classes/` - **Preferred**: gest-api generators
- `gen_funcs/` - Legacy generators
- `resources/` - Compute resource management
- `sim_funcs/` - Simulator functions
- `tests/{functionality,regression,unit}_tests/`
- `ensemble.py` - Primary interface (wraps `libE()`)
- `generators.py` - gest-api base classes
- `history.py` - NumPy structured array for input/output data
- `libE.py` - Entrypoint for libEnsemble, and also legacy top-level interface
- `manager.py` - History & worker coordination
- `specs.py` - `SimSpecs`, `GenSpecs`, `AllocSpecs`, `ExitCriteria`, `LibeSpecs` dataclasses
- `worker.py` - Runs simulators, communicates with manager. Can be configured to run generators as well

## Specifications (Modern Configs)

All configs use **dataclasses** from `specs.py`, not bare dicts (legacy):
- `SimSpecs` - simulator config (`sim_f`/`simulator`, `in`/`inputs`, `out`/`outputs`, `vocs`)
- `GenSpecs` - generator config (`gen_f`/`generator`, `in`/`inputs`, `out`/`outputs`, `persis_in`, `vocs`, `user`)
- `AllocSpecs` - allocation function config (`alloc_f`, `user`)
- `ExitCriteria` - termination conditions (`sim_max`, `wallclock_max`, `stop_val`)
- `LibeSpecs` - runtime config (`comms`, `nworkers`, `gen_on_worker`, `safe_mode`, etc.)

These accept `vocs` from `gest_api.vocs` (not xopt.vocs). The dict-based `sim_specs`/`gen_specs` API still works but is legacy.

## Generators

- **gest-api** (preferred): class inheriting `gest_api.Generator`, implements `suggest`/`ingest` (which exchange lists of dicts, not NumPy arrays), parameterized by `VOCS`. See `libensemble/gen_classes/external/sampling.py`.
- Generators are used for sampling, optimization, calibration, uncertainty quantification, and other simulation-based tasks.
- **Legacy**: plain functions with persistent loops. Use `LibensembleGenerator` to wrap into gest-api.
- History array: NumPy structured array with fields from `sim_specs/gen_specs["out"]` or `vocs` attributes plus reserved metadata fields.
- **Automatic Variable Mapping**: `LibensembleGenerator` maps all VOCS vars to `"x"` field unless `variables_mapping` is provided.
- **Mandatory Input Fields**: `gen_specs["in"]`/`["persis_in"]` must have >=1 field (e.g. `["sim_id"]`) when using `only_persistent_gens` allocator.
- **Default Allocator**: `only_persistent_gens` for gest-api generators.
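
A minimal sketch of the `suggest`/`ingest` shape described above. To stay self-contained it does not import `gest_api`: `UniformSampler` is a hypothetical stand-in class, and the plain `variables` dict stands in for a `VOCS` object. It assumes `suggest` takes a requested number of points and returns a list of dicts, and that `ingest` receives a list of result dicts.

```python
import random


class UniformSampler:
    """Hypothetical stand-in for a gest-api style generator (no gest_api import)."""

    def __init__(self, variables, seed=0):
        # variables: {"name": [lower, upper]}, a plain-dict stand-in for VOCS variables.
        self.variables = variables
        self.rng = random.Random(seed)
        self.results = []

    def suggest(self, num_points):
        """Return num_points candidate points as a list of dicts."""
        return [
            {name: self.rng.uniform(lo, hi) for name, (lo, hi) in self.variables.items()}
            for _ in range(num_points)
        ]

    def ingest(self, results):
        """Store evaluated points, also passed as a list of dicts."""
        self.results.extend(results)


gen = UniformSampler({"x0": [-3.0, 3.0], "x1": [0.0, 1.0]})
points = gen.suggest(4)
# Pretend a simulator evaluated each point and attached an objective "f".
gen.ingest([{**p, "f": p["x0"] ** 2 + p["x1"]} for p in points])
```

A real generator would subclass `gest_api.Generator` and take a `VOCS` in its constructor; see the sampling module referenced above for the actual interface.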

## Conventions

- Match output fields ↔ input fields (e.g., `SimSpecs.out` ↔ `GenSpecs.in`, and vice-versa).
- Always use the dataclass configs from `specs.py` (`SimSpecs`, `GenSpecs`, etc.) over legacy bare dicts.
- `SimSpecs`/`GenSpecs` accept `vocs` from `gest_api.vocs` (not xopt.vocs).
- Code style: `black` (enforced via `pre-commit`).
- No destructive git commands without explicit request.
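
The field-matching convention can be checked with a few lines of plain Python. This is only an illustrative sketch: `missing_fields` is a hypothetical helper, not part of libEnsemble, and `f_err` is a made-up field name. It assumes `out` entries are `(name, type[, shape])` tuples and `in` entries are field names, and it ignores reserved History fields such as `sim_id`.

```python
def missing_fields(requested, produced):
    """Return requested field names that no `out` entry produces."""
    produced_names = {entry[0] for entry in produced}
    return sorted(set(requested) - produced_names)


gen_out = [("x", float, (2,))]  # fields the generator produces
sim_in = ["x"]                  # fields the simulator expects
sim_out = [("f", float)]        # fields the simulator produces
gen_in = ["f", "f_err"]         # fields the generator asks for

sim_gaps = missing_fields(sim_in, gen_out)
gen_gaps = missing_fields(gen_in, sim_out)
```

Here `sim_gaps` is empty, while `gen_gaps` flags `f_err` because no simulator output provides it.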

## Development

- **pixi** recommended. Enter: `pixi shell -e dev`. One-off: `pixi run -e dev <cmd>`. (If `pixi` is not on your `PATH`, check `/opt/homebrew/bin/pixi` or `/usr/local/bin/pixi`.)
- Fallback: `pip install -e .` (may need manual dependency installs).
- Pre-commit: `pre-commit` (config in `.pre-commit-config.yaml`, `pyproject.toml`).

## Testing

- Full suite: `python libensemble/tests/run_tests.py`
- Single unit test: `pixi run -e dev pytest path/to/test_file`
- Single regression/functionality test: `pixi run -e dev python path/to/test_file -n 4`
- Check `ensemble.log` and `libE_stats.txt` for run diagnostics.

## Modernizing for libEnsemble 2.0

When updating scripts from legacy patterns:

- **Generators**: Replace `gen_f` with gest-api `Generator` class set via `gen_specs["generator"]`.
- **VOCS**: Parameterize generators with `VOCS` from `gest_api.vocs`.
- **AllocSpecs**: `AllocSpecs` dataclass replaces bare dict. Often removable — `only_persistent_gens` is the default allocator.
- **Generator placement**: Runs on manager (Worker 0) by default. Set `LibeSpecs(gen_on_worker=True)` to run on a dedicated worker.
- **Input fields**: `GenSpecs.inputs`/`persis_in` must contain >=1 field.
- **Simulators**: Use `SimSpecs.simulator` with `(input_dict, **kwargs) -> dict` instead of `sim_f`. libEnsemble auto-wraps via `gest_api_sim`. `inputs`/`outputs` auto-derived from `vocs`.
- **safe_mode**: `LibeSpecs(safe_mode=False)` by default (protected History fields overwritable). Set `True` to guard metadata fields (`gen_worker`, `sim_worker`, `sim_started`, `sim_ended`, etc.).
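
As a rough illustration of the simulator signature mentioned above, the callable below takes one point as a dict and returns its outputs as a dict. The function name, the `x0`/`x1`/`f` field names, and the `scale` keyword are assumptions chosen for this sketch; nothing here imports libEnsemble, so the wrapping by `gest_api_sim` is not shown.

```python
def rosenbrock_sim(input_dict, **kwargs):
    """Evaluate one point: dict of variable values in, dict of outputs out."""
    x0, x1 = input_dict["x0"], input_dict["x1"]
    # Optional keyword argument, used here only to show **kwargs handling.
    scale = kwargs.get("scale", 1.0)
    return {"f": scale * ((1 - x0) ** 2 + 100 * (x1 - x0**2) ** 2)}


out = rosenbrock_sim({"x0": 1.0, "x1": 1.0})
scaled = rosenbrock_sim({"x0": 0.0, "x1": 0.0}, scale=2.0)
```

A callable of this shape would be passed as `SimSpecs.simulator` in place of a `sim_f`.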
13 changes: 12 additions & 1 deletion docs/advanced_installation/advanced_installation.rst
@@ -31,7 +31,18 @@ Further recommendations for selected HPC systems are given in the
Globus Compute
--------------

`Globus Compute`_ may be installed optionally to submit simulation function instances to remote Globus Compute endpoints.
`Globus Compute`_ may be installed optionally to submit simulation function
instances to remote Globus Compute endpoints::

    pip install globus-compute-sdk

This is an optional dependency; libEnsemble operates normally without it.
If Globus Compute is not installed and a ``globus_compute_endpoint`` is
configured, libEnsemble will warn and fall back to local execution.

See :ref:`Globus Compute - Remote User Functions<globus_compute_ref>` for
usage, and the :doc:`GlobusComputeExecutor API reference</executor/ex_globus_compute>`
for the full executor interface.

.. _Globus Compute: https://www.globus.org/compute
.. _Python: http://www.python.org
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -31,7 +31,7 @@ def __getattr__(cls, name):
return MagicMock()


autodoc_mock_imports = ["ax", "gpcam", "IPython", "matplotlib", "pandas", "scipy", "surmise"]
autodoc_mock_imports = ["ax", "globus_compute_sdk", "gpcam", "IPython", "matplotlib", "pandas", "scipy", "surmise"]

MOCK_MODULES = [
"argparse",
57 changes: 57 additions & 0 deletions docs/executor/ex_globus_compute.rst
@@ -0,0 +1,57 @@
Globus Compute Executor
=======================

`Overview <ex_overview.html>`__ \|\| `Base Executor <ex_base.html>`__ \|\| `MPI Executor <ex_mpi.html>`__ \|\| **Globus Compute Executor**

The :class:`GlobusComputeExecutor<libensemble.executors.globus_compute_executor.GlobusComputeExecutor>`
submits Python callables to a remote `Globus Compute`_ endpoint instead of
launching local subprocesses. It can be used inside simulator functions in the
same way as the :doc:`MPI Executor<ex_mpi>`, by retrieving it from
``libE_info["executor"]``.

See :ref:`Globus Compute - Remote User Functions<globus_compute_ref>` for an
overview of the two Globus Compute (GC) integration modes (manager-side GC-only
and user-facing executor).

.. note::

   ``globus-compute-sdk`` must be installed to use this executor::

       pip install globus-compute-sdk

   Users must also authenticate via Globus_ and have an active
   `Globus Compute endpoint`_ running on the target system.

GlobusComputeExecutor
---------------------

.. autoclass:: libensemble.executors.globus_compute_executor.GlobusComputeExecutor
   :members: register_app, submit, set_workerID, set_worker_info
   :show-inheritance:

   .. automethod:: __init__

GlobusComputeTask
-----------------

Tasks are created and returned by
:meth:`GlobusComputeExecutor.submit()<libensemble.executors.globus_compute_executor.GlobusComputeExecutor.submit>`.
Each task wraps a ``concurrent.futures.Future`` from the Globus Compute SDK
and exposes the same polling interface as other libEnsemble tasks.

.. autoclass:: libensemble.executors.globus_compute_executor.GlobusComputeTask
   :members: poll, wait, kill, result, running, done, cancelled

**Task states**: ``RUNNING`` | ``FINISHED`` | ``FAILED`` | ``USER_KILLED``

**Key attributes**:

:task.state: (string) Current task state — one of the values above.
:task.finished: (bool) True once the task has completed (successfully or not).
:task.success: (bool) True if the remote callable returned without raising.
:task.runtime: (float) Elapsed wall-clock seconds since submission.
:task.submit_time: (float) Time since epoch at submission.

.. _Globus Compute: https://www.globus.org/compute
.. _Globus: https://www.globus.org/
.. _Globus Compute endpoint: https://globus-compute.readthedocs.io/en/latest/endpoints.html
8 changes: 6 additions & 2 deletions docs/executor/ex_index.rst
@@ -1,6 +1,6 @@
.. _executor_index:

**Overview** \|\| `Base Executor <ex_base.html>`__ \|\| `MPI Executor <ex_mpi.html>`__
**Overview** \|\| `Base Executor <ex_base.html>`__ \|\| `MPI Executor <ex_mpi.html>`__ \|\| `Globus Compute Executor <ex_globus_compute.html>`__

Executors
=========
@@ -14,8 +14,12 @@ portable interface for running and managing user applications.
ex_overview
ex_base
ex_mpi
ex_globus_compute

The **Executor** provides a portable interface for running applications on any system and
any number of compute resources.
any number of compute resources. The :doc:`MPI Executor<ex_mpi>` launches MPI
applications on local resources; the
:doc:`Globus Compute Executor<ex_globus_compute>` submits Python callables to
remote Globus Compute endpoints.

Please select from the sections above or the sidebar navigation to read more.