diff --git a/docs/newproject.md b/docs/newproject.md
index 5ffe34d9..7aa9e69f 100644
--- a/docs/newproject.md
+++ b/docs/newproject.md
@@ -1,5 +1,24 @@
 # New Project Guide
 
+## A Quick Hands-On Approach
+
+This guide is for scientists, or anyone else, who want to start experimenting quickly and make a first attempt at training a model. The sections below give more detail on the nuances and alternatives for each step.
+
+1. Use [https://pyearthtools.readthedocs.io/en/latest/notebooks/tutorial/FourCastMini_Demo.html](https://pyearthtools.readthedocs.io/en/latest/notebooks/tutorial/FourCastMini_Demo.html) as a template for what to do.
+2. Determine the parameters you want to model, such as `temperature` or `wind`. When these become part of the neural network, they will be called *channels*.
+3. Determine the data source they come from, such as ERA5 or another model or re-analysis source.
+4. Develop a `pipeline` which includes data normalisation.
+5. Using a bundled model, configure that model to the size required. This may only require adjusting `img_size`, `in_channels` and `out_channels` to match the size of your data. The grid dimension must be a multiple of four for this model, so you may need to crop or regrid your data to match. In future, a standard approach without this limitation will be added.
+6. Run some number of training steps (using the `.fit` method) and visualise the outputs. Visualising predictions from the trained model every 3000 steps or so provides useful insight into the training process and helps you see when the model might be fully trained. *There is no definite answer to how much training will be required. If your model isn't showing any progress at all after a couple of epochs, there may be a problem. Some models will start to show progress after 3000 steps.*
+
+This approach should be a usable starting point for any gridded inputs and outputs. The example is based on global modelling, but could reasonably be applied to nowcasting, observational data, limited area modelling, or anything else you can represent as gridded data in xarray. You could even add a grid containing data from a weather station at each grid point and see what happens.
+
+Getting a neural network to perform well and make optimal predictions is very hard, with many nuances. Getting started should be reasonably simple.
+
+The sections below go into more detail on how to treat source data, how to develop the most suitable pipeline for your project, how to use alternative neural network architectures, how to manage the training process, and how to perform a more thorough evaluation of the outputs.
+
+## Methodological Information
+
 This guide offers a simple, repeatable process for undertaking a machine learning project. Experts in machine learning will recognise this as a standard approach, but of course it can be adapted as required in the project. Completing a project (whether using PyEarthTools or not) comprises the following steps:
 
 1. Identify the sources of data that you wish to work with
diff --git a/notebooks/tutorial/Working_with_Climate_Data.ipynb b/notebooks/tutorial/Working_with_Climate_Data.ipynb
index f448862f..f50d4a70 100644
--- a/notebooks/tutorial/Working_with_Climate_Data.ipynb
+++ b/notebooks/tutorial/Working_with_Climate_Data.ipynb
@@ -461,8 +461,8 @@
 " parent_experiment: historical\n",
 " modeling_realm: atmos\n",
 " realization: 1\n",
- " cmor_version: 2.5.6