{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# GAIA: Global AI Accelerator \n\nThis repository contains code for training and running climate neural\nnetwork surrogate models. For detais on various experiments visit our\nsite https://stresearch.github.io/gaia/\n\n| The GAIA team is a collaboration between:\n| - [STR](https://www.str.us)_\n| - [University of New South Wales,\n  Sydney](https://www.ccrc.unsw.edu.au/ccrc-team/academic-research/steven-sherwood)_\n\n**Warning:** *This is an active research project. The code base is\nconstantly evolving as new features are being added and old ones are\ndepreciated.*\n\n   This work is part of the DARPA ACTME (AI-assisted Climate\n   Tipping-point Modeling) AIE Program -\n   https://github.com/ACTM-darpa/info-and-links\n\n|image1|\n\n-  [Installation](#installation)_\n-  [Data Preprocessing](#data-preprocessing)_\n\n   -  [Example Toy Dataset](#example-toy-dataset)_\n   -  [Process Raw Dataset](#process-raw-dataset)_\n\n-  [Configuration Parameters](#configuration-parameters)_\n\n   -  [Configuration Parameters\n      Details](#configuration-parameters-details)_\n   -  [Dataset Params](#dataset-params)_\n   -  [Training Params](#training-params)_\n   -  [Model Params](#model-params)_\n\n-  [Training](#training)_\n-  [Inference](#inference)_\n-  [Generate Diagnostic Plots](#generate-diagnostic-plots)_\n-  [Export Model for Integration](#export-model-for-integration)_\n-  [Pre-trained Models](#pre-trained-models)_\n\n## Installation\n\nInstall requirments:\n\n.. code:: bash\n\n   git clone https://github.com/stresearch/gaia\n   pip install -r requirements\n\n## Data Preprocessing\n\n### Example Toy Dataset\n\nWe provide a toy dataset\n[here](https://4d41262f-0f54-45cc-b82b-6ba60be7a600-gaia-models.s3.amazonaws.com/actm_gallery/cam4_toy.tar.gz)_.\nIt\u2019s subsampled cam4 dataset.\n\n### Process Raw Dataset\n\nTo prerocess large scale exports from climate model runs. we work with\noutputs from two climate models: CAM4 and SPCAM. 
\n\n-  We assume raw data resides in an S3 bucket with one file per day in the\n   ``NCDF4`` format.\n-  To preprocess the data we use a fairly large AWS EC2 instance:\n\n   -  ``r4.16xlarge`` with 64 CPUs\n   -  attach at least a 500GB EBS volume for local caching\n\nTo run preprocessing from an AWS instance with default parameters for\n``split=train,test``:\n\n.. code:: python\n\n   # assumes NCDataConstructor lives in the gaia.data module linked below\n   from gaia.data import NCDataConstructor\n\n   # default_data is called on the class, so cls is not passed explicitly\n   NCDataConstructor.default_data(\n           split=\"train\",\n           bucket_name=\"name_of_bucket\",\n           prefix=\"spcamclbm-nx-16-20m-timestep\",\n           save_location=\".\",\n           train_years = 2,\n           cache = \".\",\n           workers = 64\n       )\n\nWe assume the following input/output variables:\n\nThis should generate 4 files:\n\n::\n\n   spcamclbm-nx-16-20m-timestep_4_test.pt   spcamclbm-nx-16-20m-timestep_4_val.pt\n   spcamclbm-nx-16-20m-timestep_4_train.pt  spcamclbm-nx-16-20m-timestep_4_var_index.pt\n\nCopy these files to the machine where you want to train the model. For more details see\nthe [``gaia.data``\nmodule](https://github.com/stresearch/gaia/blob/c0268fa86aac53b04626ba77ebba1c76293f7557/gaia/data.py#L454)_\n\n## Configuration Parameters\n\nTo perform training, we use a machine with at least a single GPU and\n64 GB of RAM (enough to load the full dataset into memory; less is needed for the toy\ndataset). To use the toy dataset, set the environment variable\n``GAIA_TOY_DATA`` to the prefix where it is located.\n\nConfigure the data, model and training parameters. We specify\n``mode, dataset, inputs, outputs, batch_size, model_type, gpus`` and ``max_epochs``:\n\n.. 
code:: python\n\n   import sys\n   import os\n   import glob\n   from gaia.training import main\n   from gaia.config import Config\n\n   os.environ[\"GAIA_TOY_DATA\"] = \"/ssddg1/gaia/cam4_v5/cam4-famip-30m-timestep-third-upload\"\n\n   inputs = ['B_Q [t+1]',\n    'B_T [t+1]',\n    'B_U [t+1]',\n    'B_V [t+1]',\n    'B_OMEGA [t+1]',\n    'B_Z3 [t+1]',\n    'B_PS [t+1]',\n    'SOLIN [t+1]',\n    'B_SHFLX [t+1]',\n    'B_LHFLX [t+1]',\n    'LANDFRAC [t]',\n    'OCNFRAC [t]',\n    'ICEFRAC [t]',\n    'FSNS [t]',\n    'FLNS [t]',\n    'FSNT [t]',\n    'FLNT [t]',\n    'FSDS [t]']\n\n   outputs = ['A_PTTEND [t+1]',\n    'A_PTEQ [t+1]',\n    'FSNS [t+1]',\n    'FLNS [t+1]',\n    'FSNT [t+1]',\n    'FLNT [t+1]',\n    'FSDS [t+1]',\n    'FLDS [t+1]',\n    'SRFRAD [t+1]',\n    'SOLL [t+1]',\n    'SOLS [t+1]',\n    'SOLLD [t+1]',\n    'SOLSD [t+1]',\n    'PRECT [t+1]',\n    'PRECC [t+1]',\n    'PRECL [t+1]',\n    'PRECSC [t+1]',\n    'PRECSL [t+1]']\n\n   # index of the GPU to train on; adjust for your machine\n   gpu = 0\n\n   config = Config(\n           {\n               \"mode\": \"train,test,predict\",\n               \"dataset_params\": {\n                   \"dataset\": \"toy\",\n                   \"inputs\": inputs,\n                   \"outputs\": outputs,\n                   \"batch_size\": 4096,\n               },\n               \"trainer_params\": {\"gpus\": [gpu], \"max_epochs\": 100},\n               \"model_params\": {\n                   \"model_type\": \"fcn\",\n               },\n           }\n       )\n\nThis is what the full config looks like:\n\n.. 
code:: python\n\n   print(config)\n\n   dataset_params:\n     batch_size: 4096\n     dataset: cam4_toy\n     inputs:\n     - B_Q [t+1]\n     - B_T [t+1]\n     - B_U [t+1]\n     - B_V [t+1]\n     - B_OMEGA [t+1]\n     - B_Z3 [t+1]\n     - B_PS [t+1]\n     - SOLIN [t+1]\n     - B_SHFLX [t+1]\n     - B_LHFLX [t+1]\n     - LANDFRAC [t]\n     - OCNFRAC [t]\n     - ICEFRAC [t]\n     - FSNS [t]\n     - FLNS [t]\n     - FSNT [t]\n     - FLNT [t]\n     - FSDS [t]\n     mean_thres: 1.0e-13\n     outputs:\n     - A_PTTEND [t+1]\n     - A_PTEQ [t+1]\n     - FSNS [t+1]\n     - FLNS [t+1]\n     - FSNT [t+1]\n     - FLNT [t+1]\n     - FSDS [t+1]\n     - FLDS [t+1]\n     - SRFRAD [t+1]\n     - SOLL [t+1]\n     - SOLS [t+1]\n     - SOLLD [t+1]\n     - SOLSD [t+1]\n     - PRECT [t+1]\n     - PRECC [t+1]\n     - PRECL [t+1]\n     - PRECSC [t+1]\n     - PRECSL [t+1]\n     test:\n       batch_size: 4096\n       data_grid: &id001\n       - 3.5446380000000097\n       - 7.3888135000000075\n       - 13.967214000000006\n       - 23.944625\n       - 37.23029000000011\n       - 53.1146050000002\n       - 70.05915000000029\n       - 85.43911500000031\n       - 100.51469500000029\n       - 118.25033500000026\n       - 139.11539500000046\n       - 163.66207000000043\n       - 192.53993500000033\n       - 226.51326500000036\n       - 266.4811550000001\n       - 313.5012650000006\n       - 368.81798000000157\n       - 433.8952250000011\n       - 510.45525500000167\n       - 600.5242000000027\n       - 696.7962900000033\n       - 787.7020600000026\n       - 867.1607600000013\n       - 929.6488750000024\n       - 970.5548300000014\n       - 992.5560999999998\n       dataset_file: /ssddg1/gaia/cam4_v5/cam4-famip-30m-timestep-third-upload_test.pt\n       flatten: true\n       include_index: false\n       inputs: &id002\n       - B_Q [t+1]\n       - B_T [t+1]\n       - B_U [t+1]\n       - B_V [t+1]\n       - B_OMEGA [t+1]\n       - B_Z3 [t+1]\n       - B_PS [t+1]\n       - SOLIN [t+1]\n       - B_SHFLX 
[t+1]\n       - B_LHFLX [t+1]\n       - LANDFRAC [t]\n       - OCNFRAC [t]\n       - ICEFRAC [t]\n       - FSNS [t]\n       - FLNS [t]\n       - FSNT [t]\n       - FLNT [t]\n       - FSDS [t]\n       outputs: &id003\n       - A_PTTEND [t+1]\n       - A_PTEQ [t+1]\n       - FSNS [t+1]\n       - FLNS [t+1]\n       - FSNT [t+1]\n       - FLNT [t+1]\n       - FSDS [t+1]\n       - FLDS [t+1]\n       - SRFRAD [t+1]\n       - SOLL [t+1]\n       - SOLS [t+1]\n       - SOLLD [t+1]\n       - SOLSD [t+1]\n       - PRECT [t+1]\n       - PRECC [t+1]\n       - PRECL [t+1]\n       - PRECSC [t+1]\n       - PRECSL [t+1]\n       shuffle: false\n       space_filter: null\n       subsample: 1\n       subsample_mode: random\n       var_index_file: /ssddg1/gaia/cam4_v5/cam4-famip-30m-timestep-third-upload_var_index.pt\n     train:\n       batch_size: 4096\n       data_grid: *id001\n       dataset_file: /ssddg1/gaia/cam4_v5/cam4-famip-30m-timestep-third-upload_train.pt\n       flatten: false\n       include_index: false\n       inputs: *id002\n       outputs: *id003\n       shuffle: true\n       space_filter: null\n       subsample: 1\n       subsample_mode: random\n       var_index_file: /ssddg1/gaia/cam4_v5/cam4-famip-30m-timestep-third-upload_var_index.pt\n     val:\n       batch_size: 4096\n       data_grid: *id001\n       dataset_file: /ssddg1/gaia/cam4_v5/cam4-famip-30m-timestep-third-upload_val.pt\n       flatten: false\n       include_index: false\n       inputs: *id002\n       outputs: *id003\n       shuffle: false\n       space_filter: null\n       subsample: 1\n       subsample_mode: random\n       var_index_file: /ssddg1/gaia/cam4_v5/cam4-famip-30m-timestep-third-upload_var_index.pt\n   mode: train,test,predict\n   model_params:\n     ckpt: null\n     lr: 0.001\n     lr_schedule: cosine\n     model_config:\n       dropout: 0.01\n       hidden_size: 512\n       leaky_relu: 0.15\n       model_type: fcn\n       num_layers: 7\n     model_type: fcn\n     replace_std_with_range: 
false\n     use_output_scaling: false\n     weight_decay: 0\n   seed: true\n   trainer_params:\n     gpus:\n     - 5\n     max_epochs: 100\n     precision: 16\n\n### Configuration Parameters Details\n\nFor default parameters, consult the ``gaia.config.Config`` class. There are\nthree groups of parameters:\n``trainer_params, dataset_params, model_params``.\n\nParameters can be specified by:\n\n-  directly passing nested dictionaries for each group\n-  passing in nothing, which automatically reads the defaults from ``Config``\n-  command line arguments using the ``dot`` notation to override specified\n   ``Config`` defaults\n\nExample configs:\n\n### Dataset Params\n\n.. code:: python\n\n   dataset_params = {\n       'test': {\n           'batch_size': 138240,\n           'dataset_file': '/ssddg1/gaia/cam4/cam4-famip-30m-timestep_4_test.pt',\n           'flatten': True,\n           'shuffle': False,\n           'var_index_file': '/ssddg1/gaia/cam4/cam4-famip-30m-timestep_4_var_index.pt',\n       },\n       'train': {\n           'batch_size': 138240,\n           'dataset_file': '/ssddg1/gaia/cam4/cam4-famip-30m-timestep_4_train.pt',\n           'flatten': False,\n           'shuffle': True,\n           'var_index_file': '/ssddg1/gaia/cam4/cam4-famip-30m-timestep_4_var_index.pt',\n       },\n       'val': {\n           'batch_size': 138240,\n           'dataset_file': '/ssddg1/gaia/cam4/cam4-famip-30m-timestep_4_val.pt',\n           'flatten': False,\n           'shuffle': False,\n           'var_index_file': '/ssddg1/gaia/cam4/cam4-famip-30m-timestep_4_var_index.pt',\n       },\n   }\n\n### Training Params\n\n.. code:: python\n\n   trainer_params = {'precision': 16, 'max_epochs': 200, 'gpus': [0]}\n\n### Model Params\n\n.. code:: python\n\n   model_params = {\n       'lr': 0.001,\n       'optimizer': 'adam',\n       'model_config': {'model_type': 'fcn', 'num_layers': 7},\n   }\n\nWe support the following types of NN models:\n\nfcn: baseline MLP\n\n.. 
code:: python\n\n   model_config = {\n       \"model_type\": \"fcn\",\n       \"num_layers\": 7,\n       \"hidden_size\": 512,\n       \"dropout\": 0.01,\n       \"leaky_relu\": 0.15\n   }\n\nfcn_history: baseline MLP with an extra input of memory variables,\ni.e. outputs from the previous time step\n\n.. code:: python\n\n   model_config = {\n       \"model_type\": \"fcn_history\",\n       \"num_layers\": 7,\n       \"hidden_size\": 512,\n       \"leaky_relu\": 0.15\n   }\n\nconv1d: functionally the same as fcn, but accepts \u201cimage\u201d-like data,\ni.e. an image of lat, lon, variables\n\n.. code:: python\n\n   model_config = {\n       \"model_type\": \"conv1d\",\n       \"num_layers\": 7,\n       \"hidden_size\": 128\n   }\n\nresdnn: architecture from [ref]\n\n.. code:: python\n\n   model_config = {\n       \"model_type\": \"resdnn\",\n       \"num_layers\": 7,\n       \"hidden_size\": 512,\n       \"dropout\": 0.01,\n       \"leaky_relu\": 0.15\n   }\n\nencoderdecoder: encoder/decoder with a bottleneck feature\n\n.. code:: python\n\n   model_config = {\n       \"model_type\": \"encoderdecoder\",\n       \"num_layers\": 7,\n       \"hidden_size\": 512,\n       \"dropout\": 0.01,\n       \"leaky_relu\": 0.15,\n       \"bottleneck_dim\": 32,\n   }\n\ntransformer: transformer with z-level positional encoding\n\n.. code:: python\n\n   model_config = {\n       \"model_type\": \"transformer\",\n       \"num_layers\": 3,\n       \"hidden_size\": 128,\n   }\n\nconv2d: 2D separable depthwise conv net with lat/lons as the spatial\ndimensions\n\n.. code:: python\n\n   model_config = {\n       \"model_type\": \"conv2d\",\n       \"num_layers\": 7,\n       \"hidden_size\": 176,\n       \"kernel_size\": 3,\n   }\n\n## Training\n\nTo train:\n\n.. 
code:: python\n\n   main(**config.config)\n\nAfter training, the model is saved under ``lightning_logs/version_XX``.\nAll the parameters are also saved under\n``lightning_logs/version_XX/hparams.yaml``.\n\n## Inference\n\nTo use a model saved under ``lightning_logs/version_XX``,\npass the checkpoint path as the ``ckpt`` argument and the full configuration\nwill be loaded automatically:\n\n.. code:: python\n\n   config = Config(\n           {\n               \"mode\": \"predict\",\n               \"dataset_params\": {\n                   \"dataset\": \"toy\",\n                   \"inputs\": inputs,\n                   \"outputs\": outputs,\n                   \"batch_size\": 4096,\n               },\n               \"trainer_params\": {\"gpus\": [gpu], \"max_epochs\": 100},\n               \"model_params\": {\n                   \"ckpt\": \"lightning_logs/version_XX\",\n               },\n           }\n       )\n\n   main(**config.config)\n\nThe predictions file will be written out to the experiment checkpoint.\n\n## Generate Diagnostic Plots\n\nPlots will be saved in the experiment directory.\n\n.. code:: python\n\n   from gaia.plot import save_diagnostic_plot, save_gradient_plots\n\n   # directory of the trained model; gpu is the GPU index used in the config above\n   model_dir = \"lightning_logs/version_XX\"\n   save_gradient_plots(model_dir, device = f\"cuda:{gpu}\")\n   save_diagnostic_plot(model_dir)\n\n## Export Model for Integration\n\nExport a pretrained PyTorch model to a TorchScript checkpoint to be loaded\ninto the integrated hybrid model.\n\n.. code:: python\n\n   from gaia.export import export\n\n   model_dir = \"lightning_logs/version_3\"\n   export_name = \"export_model_cam4.pt\"\n\n   export(model_dir, export_name)\n\n## Pre-trained Models\n\nTo use a pretrained model:\n\n.. 
code:: python\n\n   config = Config(\n           {\n               \"mode\": \"predict\",\n               \"dataset_params\": {\n                   \"dataset\": \"toy\",\n                   \"inputs\": inputs,\n                   \"outputs\": outputs,\n                   \"batch_size\": 4096,\n               },\n               \"trainer_params\": {\"gpus\": [gpu], \"max_epochs\": 100},\n               \"model_params\": {\n                   \"ckpt\": \"path_to_checkpoint_directory\",\n               },\n           }\n       )\n\n   main(**config.config)\n\nFor lower-level model access, you can load the model directly:\n\n.. code:: python\n\n   from gaia.models import TrainingModel\n   model = TrainingModel.load_from_checkpoint(get_checkpoint_file(model_dir))\n\nDownload pre-trained models:\n\n-  [FCN\n   CAM4](https://4d41262f-0f54-45cc-b82b-6ba60be7a600-gaia-models.s3.amazonaws.com/actm_gallery/fcn_cam4_model.ckpt)_\n-  [FCN\n   SPCAM](https://4d41262f-0f54-45cc-b82b-6ba60be7a600-gaia-models.s3.amazonaws.com/actm_gallery/fcn_spcam_model.ckpt)_\n\n.. |image1| image:: https://stresearch.github.io/gaia/sections/overview/overview_screenshot.png\n   :target: https://stresearch.github.io/gaia/\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}