cybench.runs package

Submodules

cybench.runs.agml_workshop module

class cybench.runs.agml_workshop.LSTMModel(time_series_have_same_length=False, num_rnn_layers=1, rnn_hidden_size=64, num_outputs=1, *args, **kwargs)

Bases: BaseModel, Module

_get_validation_splits(all_years, num_folds=1, num_valid_years=5)
_optimize_hyperparameters(train_dataset, param_space, loss, batch_size, epochs, save_model_path)
_train_epoch(train_loader, loss, optimizer)
fit(train_dataset, optimize_hyperparameters=False, epochs=10, **fit_params)

Fit or train the model.

Parameters:
  • train_dataset – training Dataset

  • **fit_params – Additional parameters.

Returns:

A tuple containing the fitted model and a dict with additional information.

forward(X_ts, X_rest)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance instead of this method, since the former takes care of running the registered hooks while the latter silently ignores them.
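
To illustrate the note: given a constructed LSTMModel and input tensors whose shapes match the configured predictors, call the module instance rather than forward() directly. The shapes below (1 static and 7 time-series features over 36 dekads) mirror the inputs listed later in this module but are assumptions, as is default construction:

    import torch

    from cybench.runs.agml_workshop import LSTMModel

    model = LSTMModel()              # assumed: defaults are sufficient to construct
    X_ts = torch.randn(8, 36, 7)     # (batch, time steps, time-series features) -- assumed shape
    X_rest = torch.randn(8, 1)       # (batch, static features) -- assumed shape

    y_hat = model(X_ts, X_rest)      # preferred: runs registered hooks
    # model.forward(X_ts, X_rest)    # bypasses hook handling; avoid calling directly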

classmethod load(model_name)

Deserialize a saved model.

Parameters:

model_name – Filename that was used to save the model.

Returns:

The deserialized model.

predict(test_dataset)

Run fitted model on data.

Parameters:
  • test_dataset – test Dataset

  • **predict_params – Additional parameters.

Returns:

A tuple containing a np.ndarray and a dict with additional information.

predict_items(X: list, device: str = 'cpu', **predict_params)

Run fitted model on a list of data items.

Parameters:
  • X (list) – a list of data items, each of which is a dict

  • device (str) – the device to use, e.g. "cpu"

  • **predict_params – Additional parameters.

Returns:

A tuple containing a np.ndarray and a dict with additional information.

save(model_name)

Save model, e.g. using pickle.

Parameters:

model_name – Filename that will be used to save the model.
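
Taken together, the methods above form a simple fit/predict/save/load cycle. A minimal sketch, assuming train_dataset and test_dataset are cybench Dataset objects obtained from the package's dataset-loading utilities (their construction is not shown here):

    from cybench.runs.agml_workshop import LSTMModel

    # train_dataset and test_dataset are assumed cybench Dataset objects.
    model = LSTMModel(time_series_have_same_length=True)

    # Train for a fixed number of epochs without hyperparameter optimization.
    model, fit_info = model.fit(train_dataset, optimize_hyperparameters=False, epochs=10)

    # Predict on held-out data; returns predictions and additional information.
    y_pred, pred_info = model.predict(test_dataset)

    # Persist the fitted model and restore it later.
    model.save("lstm_workshop_model.pkl")
    restored = LSTMModel.load("lstm_workshop_model.pkl")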

cybench.runs.agml_workshop.date_from_dekad(dekad, year)

Reconstruct a date string from dekad and year. NOTE: Do not use this with CY-Bench data aligned to the crop season. For aligned data, KEY_YEAR and the year in "date" can differ, so it is incorrect to infer the date from dekad and year.

Parameters:
  • dekad (int) – a number from 1 to 36 indicating one of the ~10-day periods of the year

  • year (int) – year in YYYY format

Returns:

a date string in YYYYmmdd format
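
For orientation, the usual dekad convention (three dekads per month, starting on days 1, 11 and 21) can be sketched as below; whether date_from_dekad returns the start or another day of the dekad is not specified here, so treat this purely as an illustration:

    from datetime import date

    def dekad_start_date(dekad: int, year: int) -> str:
        """Illustrative dekad-to-date conversion: dekads 1-36, three per month,
        starting on days 1, 11 and 21 (an assumption, not the package's code)."""
        assert 1 <= dekad <= 36
        month = (dekad - 1) // 3 + 1          # three dekads per month
        day = (1, 11, 21)[(dekad - 1) % 3]    # start day within the month
        return date(year, month, day).strftime("%Y%m%d")

    print(dekad_start_date(1, 2024))   # 20240101
    print(dekad_start_date(36, 2024))  # 20241221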

cybench.runs.agml_workshop.get_cybench_data()

Reproduce results from AgML 2024 for LSTM models using CY-Bench data. Compare the workshop LSTM implementation and the benchmark LSTM implementation to validate their performance on the same data. NRMSE must be around 25%. These results were produced with:

Inputs:
  • static: ["awc"]

  • time series: ["tmin", "tmax", "tavg", "prec", "cwb", "rad"] + ["fpar"]

NOTE: These should match the definitions of STATIC_PREDICTORS and TIME_SERIES_PREDICTORS.

NOTE: All time series inputs are at the same (dekadal) resolution. This means BaselineLSTM does not need to aggregate time series data.

Hyperparameters: epochs=10, lr=0.0001, weight_decay=0.0001. Since BaselineLSTM uses weight_decay=0.00001, the same value is now used for the workshop LSTMModel implementation above.
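
For reference, the settings above can be collected in one place; the names below are illustrative constants, not identifiers defined by the package:

    # Illustrative constants mirroring the documented settings.
    STATIC_INPUTS = ["awc"]
    TIME_SERIES_INPUTS = ["tmin", "tmax", "tavg", "prec", "cwb", "rad", "fpar"]

    TRAINING_CONFIG = {
        "epochs": 10,
        "lr": 1e-4,
        "weight_decay": 1e-5,  # matches BaselineLSTM; the original run used 1e-4
    }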

cybench.runs.agml_workshop.get_cybench_data_aligned_to_crop_season()
cybench.runs.agml_workshop.get_workshop_data()

Reproduce results from AgML 2024 for LSTM models. Compare the workshop LSTM implementation and the benchmark LSTM implementation to validate their performance on the same data. NRMSE must be around 25%. These results were produced with:

Inputs:
  • static: ["awc"]

  • time series: ["tmin", "tmax", "tavg", "prec", "cwb", "rad"] + ["fpar"]

NOTE: These should match the definitions of STATIC_PREDICTORS and TIME_SERIES_PREDICTORS.

NOTE: All time series inputs are at the same (dekadal) resolution. This means BaselineLSTM does not need to aggregate time series data.

Hyperparameters: epochs=10, lr=0.0001, weight_decay=0.0001. Since BaselineLSTM uses weight_decay=0.00001, the same value is now used for the workshop LSTMModel implementation above.

cybench.runs.agml_workshop.validate_agml_workshop_results(df_y, dfs_x, time_series_have_same_length=False)

cybench.runs.process_results module

cybench.runs.process_results.df_to_markdown(df, formatted_df)
cybench.runs.process_results.format_row(row, metric)
cybench.runs.process_results.results_to_metrics()
cybench.runs.process_results.results_to_residuals(model_names)
cybench.runs.process_results.write_results_to_table()

cybench.runs.results_plots module

cybench.runs.results_plots.box_plots_metrics(data, crop, countries, metric, metric_label, subplots_per_row=4)
cybench.runs.results_plots.box_plots_residuals(data, crop, countries, residual_cols, residual_labels, ymin, ymax, subplots_per_row=4)
cybench.runs.results_plots.plot_bars(df, metric, metric_label, title_label, file_name)
cybench.runs.results_plots.plot_graph(df, x_col, hue_col, x_label, metric, metric_label, title, file_name, rotation=45)
cybench.runs.results_plots.plot_metrics(df: DataFrame, metric: str | None = None)
cybench.runs.results_plots.plot_yearly_metrics(data, crop, country, metric, metric_label)
cybench.runs.results_plots.plot_yearly_residuals(data, crop, country, residual_cols, residual_labels)

cybench.runs.run_benchmark module

cybench.runs.run_benchmark.compute_metrics(run_name: str, model_names: list | None = None) → DataFrame

Compute evaluation metrics on saved predictions.

Parameters:
  • run_name (str) – The name of the run. Will be used to store log files and model results

  • model_names (list) – names of models

Returns:

a pd.DataFrame containing evaluation metrics
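
A minimal usage sketch; the run name and model names are placeholders for a run whose predictions were saved earlier:

    from cybench.runs.run_benchmark import compute_metrics

    # Placeholder run and model names; predictions must already be saved.
    df_metrics = compute_metrics("my_run", model_names=["model_a", "model_b"])
    print(df_metrics.head())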

cybench.runs.run_benchmark.get_prediction_residuals(run_name: str, model_names: dict) → DataFrame

Get prediction residuals (i.e., model predictions - labels).

Parameters:
  • run_name (str) – The name of the run. Will be used to store log files and model results

  • model_names (dict) – A mapping of model name (key) to a shorter name (value)

Returns:

a pd.DataFrame containing prediction residuals
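
A minimal usage sketch; both the full and the shortened model names below are placeholders:

    from cybench.runs.run_benchmark import get_prediction_residuals

    # Map stored model names (keys) to shorter display names (values).
    df_residuals = get_prediction_residuals(
        "my_run",
        model_names={"workshop_lstm": "lstm_ws", "baseline_lstm": "lstm_base"},
    )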

cybench.runs.run_benchmark.load_results(run_name: str) → DataFrame

Load saved results for analysis or visualization.

Parameters:
  • run_name (str) – The name of the run. Will be used to store log files and model results

Returns:

a pd.DataFrame containing the predictions of benchmark models
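
A minimal usage sketch with a placeholder run name:

    from cybench.runs.run_benchmark import load_results

    df_predictions = load_results("my_run")  # predictions saved by an earlier run
    print(df_predictions.columns)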

cybench.runs.run_benchmark.run_benchmark(run_name: str, model_name: str | None = None, model_constructor: callable | None = None, model_init_kwargs: dict | None = None, model_fit_kwargs: dict | None = None, baseline_models: list | None = None, dataset_name: str = 'maize_NL', sel_years: list | None = None, nn_models_epochs: int | None = None) → dict

Run CY-Bench.

Parameters:
  • run_name (str) – The name of the run. Will be used to store log files and model results

  • model_name (str) – The name of the model. Will be used to store log files and model results

  • model_constructor (Callable) – The constructor of the model. Will be used to construct the model

  • model_init_kwargs (dict) – The kwargs used when constructing the model.

  • model_fit_kwargs (dict) – The kwargs used to fit the model.

  • baseline_models (list) – A list of names of baseline models to run next to the provided model. If unspecified, a default list of baseline models will be used.

  • dataset_name (str) – The name of the dataset to load

  • sel_years (list) – a list of years to run leave-one-year-out (for tests)

  • nn_models_epochs (int) – Number of epochs to run for nn-models (for tests)

Returns:

a dictionary containing the results of the benchmark
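
Two illustrative invocations, assuming the default baseline models; the run names and kwargs are placeholders, and the custom-model variant reuses the workshop LSTMModel from this package:

    from cybench.runs.run_benchmark import run_benchmark
    from cybench.runs.agml_workshop import LSTMModel

    # Baselines only, on the default maize_NL dataset.
    results = run_benchmark(run_name="baselines_maize_NL")

    # Benchmark a custom model next to the baselines (kwargs are placeholders).
    results = run_benchmark(
        run_name="workshop_lstm_maize_NL",
        model_name="workshop_lstm",
        model_constructor=LSTMModel,
        model_init_kwargs={"time_series_have_same_length": True},
        model_fit_kwargs={"epochs": 10},
        dataset_name="maize_NL",
    )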

cybench.runs.run_benchmark.run_benchmark_on_all_data()

cybench.runs.validate_model module

cybench.runs.validate_model.validate_single_model(run_name: str, model_name: str, model_constructor: callable, model_init_kwargs: dict | None = None, model_fit_kwargs: dict | None = None, baseline_models: list | None = None, dataset_name: str = 'test_maize_us', test_years_to_leave_out: list | None = None) → dict

Run a single model on a single outer fold and return validation results. Test data is left out completely and not used for training or validation. Not intended for benchmarking; use run_benchmark instead. Hyperparameters should be optimized in each outer fold in the benchmark; this function should only be used for exploring initial hyperparameter settings.

Parameters:
  • run_name (str) – The name of the run. Will be used to store log files and model results

  • model_name (str) – The name of the model. Will be used to store log files and model results

  • model_constructor (Callable) – The constructor of the model. Will be used to construct the model

  • model_init_kwargs (dict) – The kwargs used when constructing the model.

  • model_fit_kwargs (dict) – The kwargs used to fit the model.

  • baseline_models (list) – A list of names of baseline models to run next to the provided model. If unspecified, a default list of baseline models will be used.

  • dataset_name (str) – The name of the dataset to load

  • test_years_to_leave_out (list) – years to leave out as held-out test data

Returns:

a dictionary containing the validation results
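
An exploratory usage sketch; run name, kwargs and the held-out test years are placeholders:

    from cybench.runs.validate_model import validate_single_model
    from cybench.runs.agml_workshop import LSTMModel

    results = validate_single_model(
        run_name="explore_lstm",
        model_name="workshop_lstm",
        model_constructor=LSTMModel,
        model_init_kwargs={"time_series_have_same_length": True},
        model_fit_kwargs={"epochs": 5},
        dataset_name="test_maize_us",
        test_years_to_leave_out=[2018],  # placeholder year
    )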

Module contents