cybench.models package
Submodules
cybench.models.model module
Model base class
- class cybench.models.model.BaseModel
Bases: ABC
- _abc_impl = <_abc._abc_data object>
- abstract fit(dataset: Dataset, **fit_params) tuple
Fit or train the model.
- Parameters:
dataset – Dataset
**fit_params – Additional parameters.
- Returns:
A tuple containing the fitted model and a dict with additional information.
- abstract load(model_name)
Deserialize a saved model.
- Parameters:
model_name – Filename that was used to save the model.
- Returns:
The deserialized model.
- predict(dataset: Dataset, **predict_params) tuple
Run fitted model on data.
- Parameters:
dataset – Dataset
**predict_params – Additional parameters.
- Returns:
A tuple containing a np.ndarray and a dict with additional information.
- abstract predict_batch(X: list, **predict_params)
Run fitted model on batched data items.
- Parameters:
X – a list of data items, each of which is a dict
**predict_params – Additional parameters.
- Returns:
A tuple containing a np.ndarray and a dict with additional information.
- predict_item(X: dict, **predict_params)
Run fitted model on one data item.
- Parameters:
X – a data item
**predict_params – Additional parameters.
- Returns:
A tuple containing a np.ndarray and a dict with additional information.
- abstract save(model_name)
Save model, e.g. using pickle.
- Parameters:
model_name – Filename that will be used to save the model.
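As an illustration of the interface above, here is a minimal sketch of a concrete model. It does not import cybench: `BaseModelSketch` is a hypothetical stand-in that mirrors only the documented method names. `ConstantModel`, the `"target"` key, the plain list used as a dataset, and the choice to make `load` a classmethod are all assumptions made for the example.

```python
import pickle
from abc import ABC, abstractmethod

import numpy as np


class BaseModelSketch(ABC):
    """Hypothetical stand-in mirroring the documented BaseModel methods."""

    @abstractmethod
    def fit(self, dataset, **fit_params) -> tuple: ...

    @abstractmethod
    def predict_batch(self, X: list, **predict_params) -> tuple: ...

    def predict_item(self, X: dict, **predict_params) -> tuple:
        # Assumption: a single item is handled as a batch of size one.
        return self.predict_batch([X], **predict_params)

    @abstractmethod
    def save(self, model_name): ...

    @classmethod
    @abstractmethod
    def load(cls, model_name): ...


class ConstantModel(BaseModelSketch):
    """Predicts the mean training target for every item."""

    def __init__(self):
        self.mean_ = None

    def fit(self, dataset, **fit_params):
        # Assumption: the dataset is an iterable of dicts with a "target" key.
        self.mean_ = float(np.mean([item["target"] for item in dataset]))
        return self, {}

    def predict_batch(self, X, **predict_params):
        # One prediction per item, plus an (empty) dict of extra information.
        return np.full(len(X), self.mean_), {}

    def save(self, model_name):
        # Pickle only the fitted state, not the object itself.
        with open(model_name, "wb") as f:
            pickle.dump({"mean": self.mean_}, f)

    @classmethod
    def load(cls, model_name):
        with open(model_name, "rb") as f:
            state = pickle.load(f)
        model = cls()
        model.mean_ = state["mean"]
        return model
```

Each method returns the shapes described above: `fit` returns the fitted model with a dict, and the prediction methods return a `np.ndarray` with a dict.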
cybench.models.naive_models module
- class cybench.models.naive_models.AverageYieldModel(group_by=['adm_id'])
Bases: BaseModel
A naive yield prediction model.
Predicts the average of the training set by location. If the location is not in the training data, then predicts the global average.
- _abc_impl = <_abc._abc_data object>
- fit(dataset: Dataset, **fit_params) tuple
Fit or train the model.
- Parameters:
dataset – Dataset
**fit_params – Additional parameters.
- Returns:
A tuple containing the fitted model and a dict with additional information.
- load(model_name)
Deserialize a saved model.
- Parameters:
model_name – Filename that was used to save the model.
- Returns:
The deserialized model.
- predict_batch(X: list)
Run fitted model on batched data items.
- Parameters:
X – a list of data items, each of which is a dict
- Returns:
A tuple containing a np.ndarray and a dict with additional information.
- save(model_name)
Save model, e.g. using pickle.
- Parameters:
model_name – Filename that will be used to save the model.
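The per-location average with a global fallback described for `AverageYieldModel` can be sketched in plain Python. This is not the cybench implementation: the function names and the `"yield"` key are hypothetical, and the global average is assumed to be the mean over all training items rather than the mean of the group means.

```python
from collections import defaultdict


def fit_group_averages(items, group_by=("adm_id",), target_key="yield"):
    """Return (per-group means, global mean) for a list of dict items."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for item in items:
        key = tuple(item[col] for col in group_by)
        sums[key] += item[target_key]
        counts[key] += 1
    group_means = {key: sums[key] / counts[key] for key in sums}
    # Assumption: the global mean is taken over all items, not over groups.
    global_mean = sum(sums.values()) / sum(counts.values())
    return group_means, global_mean


def predict_group_average(item, group_means, global_mean, group_by=("adm_id",)):
    """Use the group's mean if seen during fitting, else the global mean."""
    key = tuple(item[col] for col in group_by)
    return group_means.get(key, global_mean)


train = [
    {"adm_id": "A", "yield": 2.0},
    {"adm_id": "A", "yield": 4.0},
    {"adm_id": "B", "yield": 6.0},
]
group_means, global_mean = fit_group_averages(train)
print(predict_group_average({"adm_id": "A"}, group_means, global_mean))  # 3.0
print(predict_group_average({"adm_id": "Z"}, group_means, global_mean))  # 4.0
```

Location `"Z"` never appears in the training items, so the second call falls back to the global mean.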
cybench.models.nn_models module
- class cybench.models.nn_models.BaseNNModel(**kwargs)
Bases: BaseModel, Module
- _abc_impl = <_abc._abc_data object>
- fit(dataset: Dataset, optimize_hyperparameters: bool = False, param_space: dict | None = None, do_kfold: bool = False, kfolds: int = 5, *args, **kwargs)
Fit or train the model.
- Parameters:
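dataset – Dataset
optimize_hyperparameters – bool, default is False
param_space – dict, default is None
do_kfold – bool, default is False
kfolds – int, default is 5
*args, **kwargs – Additional parameters.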
- Returns:
A tuple containing the fitted model and a dict with additional information.
- classmethod load(model_name)
Load model using torch.load.
- Parameters:
model_name – Filename that was used to save the model.
- Returns:
The loaded model.
- predict_batch(X: list, device: str | None = None, batch_size: int | None = None)
Run fitted model on batched data items.
- Parameters:
X – a list of data items, each of which is a dict
device – str, the device to use, default is “cuda” if available else “cpu”
batch_size – int, the batch size, default is self.batch_size stored during fit method
- Returns:
A tuple containing a np.ndarray and a dict with additional information.
- save(model_name)
Save model using torch.save.
- Parameters:
model_name – Filename that will be used to save the model.
- train_model(train_dataset: Dataset, val_dataset: Dataset | None = None, val_fraction: float = 0.1, val_split_by_year: bool = False, val_every_n_epochs: int = 1, do_early_stopping: bool = False, num_epochs: int = 1, batch_size: int = 10, loss_fn: callable | None = None, loss_kwargs: dict | None = None, optim_fn: callable | None = None, optim_kwargs: dict | None = None, scheduler_fn: callable | None = None, scheduler_kwargs: dict | None = None, device: str | None = None, **fit_params)
Fit or train the model.
- Parameters:
train_dataset – Dataset
val_dataset – Dataset, default is None. If None, val_fraction is used to split train_dataset into train and val.
val_fraction – float, fraction of data to use for validation, default is 0.1
val_split_by_year – bool, whether to split validation data by year, default is False
val_every_n_epochs – int, validation frequency, default is 1
do_early_stopping – bool, whether to use early stopping, default is False
num_epochs – int, the number of epochs to train the model, default is 1
batch_size – int, the batch size, default is 10
loss_fn – callable, the loss function, default is torch.nn.functional.mse_loss
loss_kwargs – dict, additional parameters for the loss function, default is {“reduction”: “mean”}
optim_fn – callable, the optimizer function, default is torch.optim.Adam
optim_kwargs – dict, additional parameters for the optimizer function, default is {}
scheduler_fn – callable, the scheduler function, default is None
scheduler_kwargs – dict, additional parameters for the scheduler function, default is {}
device – str, the device to use, default is “cpu”
**fit_params – Additional parameters.
- Returns:
A tuple containing the fitted model and a dict with additional information.
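The `val_fraction` and `val_split_by_year` parameters of `train_model` suggest two ways of holding out validation data. The sketch below shows one plausible reading, not the cybench implementation: the function name, the `seed` argument, and the rounding rule are assumptions.

```python
import random


def split_train_val(years, val_fraction=0.1, val_split_by_year=False, seed=0):
    """Return (train_indices, val_indices) for items labelled with `years`."""
    rng = random.Random(seed)
    if val_split_by_year:
        # Hold out whole years so no year appears in both splits.
        unique_years = sorted(set(years))
        n_val_years = max(1, round(len(unique_years) * val_fraction))
        val_years = set(rng.sample(unique_years, n_val_years))
        val_idx = [i for i, y in enumerate(years) if y in val_years]
    else:
        # Hold out a random subset of individual items.
        n_val = max(1, round(len(years) * val_fraction))
        val_idx = sorted(rng.sample(range(len(years)), n_val))
    val_set = set(val_idx)
    train_idx = [i for i in range(len(years)) if i not in val_set]
    return train_idx, val_idx


# 10 years with 10 items each.
years = [2000 + i // 10 for i in range(100)]
train_idx, val_idx = split_train_val(years, val_fraction=0.1, val_split_by_year=True)
print(len(train_idx), len(val_idx))  # 90 10
```

Splitting by year keeps every item of a held-out year out of training, which avoids evaluating on a year the model has already seen.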
- class cybench.models.nn_models.ExampleLSTM(hidden_size, num_layers, output_size=1, transforms=[<function transform_ts_inputs_to_dekadal>, <function transform_stack_ts_static_inputs>], **kwargs)
Bases: BaseNNModel
- _abc_impl = <_abc._abc_data object>
- forward(x)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
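The difference between calling the instance and calling `forward` directly can be shown with a toy class. `TinyModule` is a hypothetical stand-in written for this example; it imitates the behaviour the note describes and is not how torch implements `Module`.

```python
class TinyModule:
    """Toy stand-in for a module; not torch's implementation."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        self._forward_hooks.append(hook)

    def forward(self, x):
        return 2 * x

    def __call__(self, x):
        # Calling the instance runs forward and then every registered hook.
        out = self.forward(x)
        for hook in self._forward_hooks:
            hook(self, x, out)
        return out


calls = []
module = TinyModule()
module.register_forward_hook(lambda mod, inp, out: calls.append(out))

module(3)          # runs forward and the hook
module.forward(3)  # runs forward only; the hook is skipped
print(calls)       # [6]
```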
cybench.models.sklearn_model module
- class cybench.models.sklearn_model.SklearnModel(sklearn_est, feature_cols=None, scaler=None)
Bases: BaseModel
- _abc_impl = <_abc._abc_data object>
- _design_features(crop, data_df)
Design features using data samples.
- Parameters:
crop – crop name (e.g. maize)
data_df – A pandas dataframe of data samples from Dataset
- Returns:
A pandas dataframe with KEY_LOC, KEY_YEAR and features.
- _optimize_hyperparameters(X, y, param_space, groups=None, kfolds=5)
Optimize hyperparameters.
- Parameters:
X – np.ndarray of training features
y – np.ndarray of training labels
param_space – a dict of parameters to optimize
groups – np.ndarray with group values (e.g. year values) for each row in X and y
kfolds – number of cross-validation splits
- Returns:
A sklearn pipeline refitted with the optimal hyperparameters.
- fit(dataset: Dataset, **fit_params) tuple
Fit or train the model.
- Parameters:
dataset – Dataset
**fit_params – Additional parameters.
- Returns:
A tuple containing the fitted model and a dict with additional information.
- load(model_name)
Deserialize a saved model.
- Parameters:
model_name – Filename that was used to save the model.
- Returns:
The deserialized model.
- predict(dataset)
Run fitted model on data.
- Parameters:
dataset – Dataset
- Returns:
A tuple containing a np.ndarray and a dict with additional information.
- predict_batch(X: list)
Run fitted model on batched data items.
- Parameters:
X – a list of data items, each of which is a dict
- Returns:
A tuple containing a np.ndarray and a dict with additional information.
- save(model_name)
Save model, e.g. using pickle. See https://scikit-learn.org/stable/model_persistence.html for options to save and load scikit-learn models.
- Parameters:
model_name – Filename that will be used to save the model.
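The `_optimize_hyperparameters` entry above describes a search over `param_space` with optional `groups` and `kfolds`, returning a refitted pipeline. The following is a minimal sketch of that pattern using scikit-learn's `GridSearchCV` and `GroupKFold`. Whether cybench uses grid search, this scoring metric, or this pipeline layout is not stated in the documentation, so the step names `"scaler"` and `"estimator"`, the `Ridge` estimator, and the function itself are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def optimize_hyperparameters(X, y, param_space, groups=None, kfolds=5):
    """Grid-search a scaler + Ridge pipeline and return the refitted best one."""
    pipeline = Pipeline([("scaler", StandardScaler()), ("estimator", Ridge())])
    # With groups (e.g. years), keep each group inside a single fold.
    cv = GroupKFold(n_splits=kfolds) if groups is not None else kfolds
    search = GridSearchCV(
        pipeline,
        param_grid=param_space,
        cv=cv,
        scoring="neg_mean_squared_error",
    )
    search.fit(X, y, groups=groups)
    # refit=True (the default) retrains the best pipeline on all of X and y.
    return search.best_estimator_


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=40)
years = np.repeat(np.arange(2000, 2008), 5)  # 8 years, 5 samples each

best = optimize_hyperparameters(
    X, y, {"estimator__alpha": [0.1, 1.0, 10.0]}, groups=years, kfolds=4
)
print(best.named_steps["estimator"].alpha)
```

Grouping folds by year means each validation fold contains only years absent from the corresponding training folds.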
cybench.models.trend_model module
- class cybench.models.trend_model.TrendModel(trend='linear')
Bases: BaseModel
Default trend estimator.
Trend is estimated using years as features.
- _abc_impl = <_abc._abc_data object>
- _linear_trend_estimator(trend_x, trend_y)
Implements a linear trend.
- Parameters:
trend_x – a list of years.
trend_y – a list of values (e.g. yields)
pred_x – year for which to predict trend
- Returns:
A linear trend estimator
- _quadratic_trend_estimator(trend_x, trend_y)
Implements a quadratic trend. Suggested by @ritviksahajpal.
- Parameters:
trend_x – a np.ndarray of years.
trend_y – a np.ndarray of values (e.g. yields)
- Returns:
A quadratic trend estimator (with an additive quadratic term)
- fit(dataset: Dataset, **fit_params) tuple
Fit or train the model.
- Parameters:
dataset – Dataset
**fit_params – Additional parameters.
- Returns:
A tuple containing the fitted model and a dict with additional information.
- load(model_name)
Deserialize a saved model.
- Parameters:
model_name – Filename that was used to save the model.
- Returns:
The deserialized model.
- predict_batch(X: list)
Run fitted model on batched data items.
- Parameters:
X – a list of data items, each of which is a dict
- Returns:
A tuple containing a np.ndarray and a dict with additional information.
- save(model_name)
Save model, e.g. using pickle.
- Parameters:
model_name – Filename that will be used to save the model.
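Estimating a trend with years as features, as `TrendModel` is described above, can be sketched with a polynomial fit. This uses `np.polyfit` as a stand-in and is not necessarily the estimator cybench uses. The function name `fit_trend` and the year-centering step are assumptions.

```python
import numpy as np


def fit_trend(trend_x, trend_y, trend="linear"):
    """Fit a polynomial trend of yield against year and return a predictor."""
    degree = {"linear": 1, "quadratic": 2}[trend]
    x = np.asarray(trend_x, dtype=float)
    # Centre the years so the polynomial fit is numerically well conditioned.
    x_mean = x.mean()
    coeffs = np.polyfit(x - x_mean, np.asarray(trend_y, dtype=float), deg=degree)

    def predict(pred_x):
        return np.polyval(coeffs, np.asarray(pred_x, dtype=float) - x_mean)

    return predict


years = [2000, 2001, 2002, 2003, 2004]
yields = [2.0, 2.1, 2.2, 2.3, 2.4]
linear = fit_trend(years, yields, trend="linear")
print(round(float(linear(2005)), 3))  # 2.5
```

Passing `trend="quadratic"` fits a second-degree polynomial to the same inputs, which adds a quadratic term to the linear trend.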