cybench.models package

Submodules

cybench.models.model module

Model base class

class cybench.models.model.BaseModel

Bases: ABC

abstract fit(dataset: Dataset, **fit_params) tuple

Fit or train the model.

Parameters:
  • dataset – Dataset

  • **fit_params – Additional parameters.

Returns:

A tuple containing the fitted model and a dict with additional information.

abstract load(model_name)

Deserialize a saved model.

Parameters:

model_name – Filename that was used to save the model.

Returns:

The deserialized model.

predict(dataset: Dataset, **predict_params) tuple

Run fitted model on data.

Parameters:
  • dataset – Dataset

  • **predict_params – Additional parameters.

Returns:

A tuple containing a np.ndarray and a dict with additional information.

abstract predict_batch(X: list, **predict_params)

Run fitted model on batched data items.

Parameters:
  • X – a list of data items, each of which is a dict

  • **predict_params – Additional parameters.

Returns:

A tuple containing a np.ndarray and a dict with additional information.

predict_item(X: dict, **predict_params)

Run fitted model on one data item.

Parameters:
  • X – a data item

  • **predict_params – Additional parameters.

Returns:

A tuple containing a np.ndarray and a dict with additional information.

abstract save(model_name)

Save model, e.g. using pickle.

Parameters:

model_name – Filename that will be used to save the model.
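A minimal sketch of how a subclass might satisfy this interface. It is self-contained and does not import cybench: `SketchBaseModel`, `ConstantModel`, and the `"yield"` label key are hypothetical names, and both the delegation from `predict_item` to `predict_batch` and the use of a classmethod for `load` are assumptions rather than documented behavior.

```python
import os
import pickle
import tempfile
from abc import ABC, abstractmethod

import numpy as np


class SketchBaseModel(ABC):
    """Hypothetical stand-in mirroring the documented BaseModel methods."""

    @abstractmethod
    def fit(self, dataset, **fit_params) -> tuple: ...

    @abstractmethod
    def predict_batch(self, X: list, **predict_params) -> tuple: ...

    def predict_item(self, X: dict, **predict_params) -> tuple:
        # Assumption: a single item is handled as a batch of one.
        return self.predict_batch([X], **predict_params)

    @abstractmethod
    def save(self, model_name): ...

    @classmethod
    @abstractmethod
    def load(cls, model_name): ...


class ConstantModel(SketchBaseModel):
    """Predicts the mean training label for every item."""

    def __init__(self):
        self.mean_ = None

    def fit(self, dataset, **fit_params) -> tuple:
        # Here a "dataset" is simply a list of dicts with a "yield" label.
        self.mean_ = float(np.mean([item["yield"] for item in dataset]))
        return self, {}

    def predict_batch(self, X: list, **predict_params) -> tuple:
        return np.full(len(X), self.mean_), {}

    def save(self, model_name):
        # Only the fitted state is pickled, not the object itself.
        with open(model_name, "wb") as f:
            pickle.dump({"mean": self.mean_}, f)

    @classmethod
    def load(cls, model_name):
        with open(model_name, "rb") as f:
            state = pickle.load(f)
        model = cls()
        model.mean_ = state["mean"]
        return model


train = [{"yield": 2.0}, {"yield": 4.0}]
model, _ = ConstantModel().fit(train)
preds, info = model.predict_batch([{}, {}, {}])
single, _ = model.predict_item({})

path = os.path.join(tempfile.mkdtemp(), "constant_model.pkl")
model.save(path)
restored = ConstantModel.load(path)
```

Each prediction method returns a `(np.ndarray, dict)` pair and `fit` returns `(model, dict)`, matching the return shapes described above.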

cybench.models.naive_models module

class cybench.models.naive_models.AverageYieldModel(group_by=['adm_id'])

Bases: BaseModel

A naive yield prediction model.

Predicts the average of the training set by location. If the location is not in the training data, then predicts the global average.
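The averaging rule described above can be sketched as follows. This is an illustration, not the cybench implementation: the helper names and the `"yield"` label key are hypothetical, and computing the global average over all training samples (rather than over the per-location means) is an assumption.

```python
from collections import defaultdict

import numpy as np


def fit_location_averages(items, group_key="adm_id", label_key="yield"):
    """Return (per-location mean, global mean) from training items."""
    by_loc = defaultdict(list)
    for item in items:
        by_loc[item[group_key]].append(item[label_key])
    loc_means = {loc: float(np.mean(vals)) for loc, vals in by_loc.items()}
    # Assumption: the global average is taken over all samples.
    global_mean = float(np.mean([item[label_key] for item in items]))
    return loc_means, global_mean


def predict_location_averages(items, loc_means, global_mean, group_key="adm_id"):
    """Use the location mean when known, else fall back to the global mean."""
    return np.array([loc_means.get(item[group_key], global_mean) for item in items])


train = [
    {"adm_id": "A", "yield": 2.0},
    {"adm_id": "A", "yield": 4.0},
    {"adm_id": "B", "yield": 6.0},
]
loc_means, global_mean = fit_location_averages(train)
# "C" never appears in training, so it receives the global mean.
preds = predict_location_averages(
    [{"adm_id": "A"}, {"adm_id": "B"}, {"adm_id": "C"}], loc_means, global_mean
)
```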

fit(dataset: Dataset, **fit_params) tuple

Fit or train the model.

Parameters:
  • dataset – Dataset

  • **fit_params – Additional parameters.

Returns:

A tuple containing the fitted model and a dict with additional information.

load(model_name)

Deserialize a saved model.

Parameters:

model_name – Filename that was used to save the model.

Returns:

The deserialized model.

predict_batch(X: list)

Run fitted model on batched data items.

Parameters:

X – a list of data items, each of which is a dict

Returns:

A tuple containing a np.ndarray and a dict with additional information.

save(model_name)

Save model, e.g. using pickle.

Parameters:

model_name – Filename that will be used to save the model.

cybench.models.nn_models module

class cybench.models.nn_models.BaseNNModel(**kwargs)

Bases: BaseModel, Module

fit(dataset: Dataset, optimize_hyperparameters: bool = False, param_space: dict | None = None, do_kfold: bool = False, kfolds: int = 5, *args, **kwargs)

Fit or train the model.

Parameters:
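  • dataset – Dataset

  • optimize_hyperparameters – bool, whether to optimize hyperparameters, default is False

  • param_space – dict, the hyperparameter search space, default is None

  • do_kfold – bool, whether to use k-fold cross-validation, default is False

  • kfolds – int, the number of folds, default is 5

  • *args, **kwargs – Additional parameters.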

Returns:

A tuple containing the fitted model and a dict with additional information.

classmethod load(model_name)

Load model using torch.load.

Parameters:

model_name – Filename that was used to save the model.

Returns:

The loaded model.

predict_batch(X: list, device: str | None = None, batch_size: int | None = None)

Run fitted model on batched data items.

Parameters:
  • X – a list of data items, each of which is a dict

  • device – str, the device to use, default is “cuda” if available else “cpu”

  • batch_size – int, the batch size, default is self.batch_size stored during fit method

Returns:

A tuple containing a np.ndarray and a dict with additional information.

save(model_name)

Save model using torch.save.

Parameters:

model_name – Filename that will be used to save the model.

train_model(train_dataset: Dataset, val_dataset: Dataset | None = None, val_fraction: float = 0.1, val_split_by_year: bool = False, val_every_n_epochs: int = 1, do_early_stopping: bool = False, num_epochs: int = 1, batch_size: int = 10, loss_fn: callable | None = None, loss_kwargs: dict | None = None, optim_fn: callable | None = None, optim_kwargs: dict | None = None, scheduler_fn: callable | None = None, scheduler_kwargs: dict | None = None, device: str | None = None, **fit_params)

Fit or train the model.

Parameters:
  • train_dataset – Dataset

  • val_dataset – Dataset, default is None. If None, val_fraction is used to split train_dataset into train and val.

  • val_fraction – float, fraction of the data to use for validation, default is 0.1

  • val_split_by_year – bool, whether to split validation data by year, default is False

  • val_every_n_epochs – int, validation frequency, default is 1

  • do_early_stopping – bool, whether to use early stopping, default is False

  • num_epochs – int, the number of epochs to train the model, default is 1

  • batch_size – int, the batch size, default is 10

  • loss_fn – callable, the loss function, default is torch.nn.functional.mse_loss

  • loss_kwargs – dict, additional parameters for the loss function, default is {“reduction”: “mean”}

  • optim_fn – callable, the optimizer function, default is torch.optim.Adam

  • optim_kwargs – dict, additional parameters for the optimizer function, default is {}

  • scheduler_fn – callable, the scheduler function, default is None

  • scheduler_kwargs – dict, additional parameters for the scheduler function, default is {}

  • device – str, the device to use, default is “cpu”

  • **fit_params – Additional parameters.

Returns:

A tuple containing the fitted model and a dict with additional information.
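One way the `val_fraction` and `val_split_by_year` options could behave is sketched below. This is an assumption about the splitting logic, not cybench's code: the function name, the `"year"` key, and the `seed` argument are hypothetical.

```python
import random


def split_train_val(items, val_fraction=0.1, split_by_year=False,
                    year_key="year", seed=0):
    """Split a list of dict items into (train, val)."""
    rng = random.Random(seed)
    if split_by_year:
        # Hold out whole years so no year appears in both splits.
        years = sorted({item[year_key] for item in items})
        n_val_years = max(1, round(len(years) * val_fraction))
        val_years = set(rng.sample(years, n_val_years))
        train = [it for it in items if it[year_key] not in val_years]
        val = [it for it in items if it[year_key] in val_years]
    else:
        # Shuffle a copy and take the first val_fraction of items.
        shuffled = items[:]
        rng.shuffle(shuffled)
        n_val = max(1, round(len(shuffled) * val_fraction))
        val, train = shuffled[:n_val], shuffled[n_val:]
    return train, val


# 50 items covering 10 years, 5 items per year.
items = [{"year": 2000 + i % 10, "adm_id": f"R{i // 10}"} for i in range(50)]
train, val = split_train_val(items, val_fraction=0.1)
train_y, val_y = split_train_val(items, val_fraction=0.2, split_by_year=True)
```

Splitting by year keeps every sample from a held-out year out of training, which avoids validating on years the model has already seen.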

class cybench.models.nn_models.ExampleLSTM(hidden_size, num_layers, output_size=1, transforms=[<function transform_ts_inputs_to_dekadal>, <function transform_stack_ts_static_inputs>], **kwargs)

Bases: BaseNNModel

forward(x)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

cybench.models.sklearn_model module

class cybench.models.sklearn_model.SklearnModel(sklearn_est, feature_cols=None, scaler=None)

Bases: BaseModel

_design_features(crop, data_df)

Design features using data samples.

Parameters:
  • crop – crop name (e.g. maize)

  • data_df – A pandas dataframe of data samples from Dataset

Returns:

A pandas dataframe with KEY_LOC, KEY_YEAR and features.

_optimize_hyperparameters(X, y, param_space, groups=None, kfolds=5)

Optimize hyperparameters using cross-validation.

Parameters:
  • X – np.ndarray of training features

  • y – np.ndarray of training labels

  • param_space – a dict of parameters to optimize

  • groups – np.ndarray with group values (e.g. year values) for each row in X and y

  • kfolds – number of cross-validation splits

Returns:

A sklearn pipeline refitted with the optimal hyperparameters.
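A grouped hyperparameter search of this kind could look like the sketch below. It assumes a grid search with `GroupKFold`; the actual search strategy, the pipeline step names (`"scaler"`, `"estimator"`), and the choice of `Ridge` are illustrative assumptions, not taken from cybench.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def optimize_hyperparameters(X, y, param_space, groups=None, kfolds=5):
    """Grid-search param_space with (grouped) k-fold CV and refit on all data."""
    pipeline = Pipeline([("scaler", StandardScaler()), ("estimator", Ridge())])
    # With groups, all rows of a group (e.g. a year) stay in the same fold.
    cv = GroupKFold(n_splits=kfolds) if groups is not None else kfolds
    search = GridSearchCV(pipeline, param_grid=param_space, cv=cv, refit=True)
    search.fit(X, y, groups=groups)
    return search.best_estimator_, search.best_params_


rng = np.random.default_rng(0)
years = np.repeat(np.arange(2000, 2010), 5)  # 10 years, 5 samples each
X = rng.normal(size=(years.size, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=years.size)

best_pipeline, best_params = optimize_hyperparameters(
    X, y, {"estimator__alpha": [0.01, 0.1, 1.0]}, groups=years, kfolds=5
)
```

Parameter names in `param_space` are prefixed with the pipeline step name (here `estimator__`), which is how scikit-learn routes them to the right step.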

fit(dataset: Dataset, **fit_params) tuple

Fit or train the model.

Parameters:
  • dataset – Dataset

  • **fit_params – Additional parameters.

Returns:

A tuple containing the fitted model and a dict with additional information.

load(model_name)

Deserialize a saved model.

Parameters:

model_name – Filename that was used to save the model.

Returns:

The deserialized model.

predict(dataset)

Run fitted model on data.

Parameters:

dataset – Dataset

Returns:

A tuple containing a np.ndarray and a dict with additional information.

predict_batch(X: list)

Run fitted model on batched data items.

Parameters:

X – a list of data items, each of which is a dict

Returns:

A tuple containing a np.ndarray and a dict with additional information.

save(model_name)

Save model, e.g. using pickle. For options to save and load scikit-learn models, see https://scikit-learn.org/stable/model_persistence.html

Parameters:

model_name – Filename that will be used to save the model.

cybench.models.trend_model module

class cybench.models.trend_model.TrendModel(trend='linear')

Bases: BaseModel

Default trend estimator.

Trend is estimated using years as features.

_linear_trend_estimator(trend_x, trend_y)

Implements a linear trend.

Parameters:
  • trend_x – a list of years.

  • trend_y – a list of values (e.g. yields)

Returns:

A linear trend estimator

_quadratic_trend_estimator(trend_x, trend_y)

Implements a quadratic trend. Suggested by @ritviksahajpal.

Parameters:
  • trend_x – a np.ndarray of years.

  • trend_y – a np.ndarray of values (e.g. yields)

Returns:

A quadratic trend estimator (with an additive quadratic term)
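Linear and quadratic trends over years can be sketched with a least-squares polynomial fit. The helper below is hypothetical and uses `np.polyfit`; the estimators cybench actually builds may differ. Centering the years before fitting is a choice made here to keep the fit well conditioned.

```python
import numpy as np


def fit_trend(trend_x, trend_y, degree=1):
    """Fit a polynomial trend and return a callable that predicts for new years."""
    x = np.asarray(trend_x, dtype=float)
    y = np.asarray(trend_y, dtype=float)
    x0 = x.mean()  # center years so large values like 2015 do not hurt conditioning
    coeffs = np.polyfit(x - x0, y, deg=degree)
    return lambda pred_x: np.polyval(coeffs, np.asarray(pred_x, dtype=float) - x0)


years = np.arange(2010, 2020)
t = years - 2010

# Yields that grow by exactly 0.1 per year.
linear_yields = 2.0 + 0.1 * t
linear = fit_trend(years, linear_yields, degree=1)
linear_2020 = linear(2020)

# Yields with an additional quadratic term.
quadratic_yields = 2.0 + 0.1 * t + 0.01 * t**2
quadratic = fit_trend(years, quadratic_yields, degree=2)
quadratic_2020 = quadratic(2020)
```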

fit(dataset: Dataset, **fit_params) tuple

Fit or train the model.

Parameters:
  • dataset – Dataset

  • **fit_params – Additional parameters.

Returns:

A tuple containing the fitted model and a dict with additional information.

load(model_name)

Deserialize a saved model.

Parameters:

model_name – Filename that was used to save the model.

Returns:

The deserialized model.

predict_batch(X: list)

Run fitted model on batched data items.

Parameters:

X – a list of data items, each of which is a dict

Returns:

A tuple containing a np.ndarray and a dict with additional information.

save(model_name)

Save model, e.g. using pickle.

Parameters:

model_name – Filename that will be used to save the model.

Module contents