cybench.datasets package

Submodules

cybench.datasets.alignment module

cybench.datasets.alignment._add_cutoff_days(df, lead_time)
cybench.datasets.alignment.align_data(df_y: DataFrame, dfs_x: tuple) tuple
cybench.datasets.alignment.trim_to_lead_time(df, crop_cal_df, lead_time, spinup_days=90)

cybench.datasets.configured module

cybench.datasets.configured._add_year(df: DataFrame) DataFrame
cybench.datasets.configured._preprocess_time_series_data(df, index_cols, select_cols, df_crop_cal, lead_time)
cybench.datasets.configured.load_dfs(crop: str, country_code: str, lead_time: str = 'mid-season') tuple
cybench.datasets.configured.load_dfs_crop(crop) tuple

cybench.datasets.dataset module

class cybench.datasets.dataset.Dataset(crop, data_target: DataFrame | None = None, data_inputs: list | None = None)

Bases: object

static _empty_df_target() DataFrame

Helper function that creates an empty (but correctly formatted) dataframe for yield statistics

static _filter_df_on_index(df: DataFrame, keys: list, level: int)

Helper method for filtering a dataframe based on the occurrence of certain values in a specified index

Parameters:
  • df – the dataframe that should be filtered

  • keys – the values on which it should filter

  • level – the index level in which samples should be filtered

Returns:

a filtered dataframe
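The filtering described above can be illustrated with a minimal pandas sketch. This is not the cybench implementation: the helper name `filter_on_index`, the index names `loc_id` and `year`, and the toy values are assumptions made for illustration, and the sketch assumes that rows whose index value occurs in `keys` are the ones kept.

```python
import pandas as pd


def filter_on_index(df: pd.DataFrame, keys: list, level: int) -> pd.DataFrame:
    """Keep rows whose index value at `level` occurs in `keys` (hypothetical helper)."""
    mask = df.index.get_level_values(level).isin(keys)
    return df[mask]


# Toy yield table indexed by (loc_id, year); names and values are made up.
index = pd.MultiIndex.from_tuples(
    [(1, 2012), (1, 2013), (2, 2012), (2, 2014)],
    names=["loc_id", "year"],
)
df_y = pd.DataFrame({"yield": [5.1, 4.8, 6.0, 6.3]}, index=index)

# Filter on the second index level (the year) for two selected years.
filtered = filter_on_index(df_y, keys=[2012, 2014], level=1)
```

Building a boolean mask from `get_level_values` leaves the original index structure intact, so the result can be filtered again on another level.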

_get_feature_data(loc_id: int, year: int) dict

Helper function for obtaining feature data corresponding to some index

Parameters:
  • loc_id – location index value

  • year – year index value

Returns:

a dict containing all feature data corresponding to the specified index

static _split_df_on_index(df: DataFrame, split: tuple, level: int)
static _validate_dfs(df_y: DataFrame, dfs_x: list) bool

Helper function that implements some checks on whether the input dataframes are correctly formatted

Parameters:
  • df_y – dataframe containing yield statistics

  • dfs_x – list of dataframes each containing feature data

Returns:

a bool indicating whether the checks have passed

property crop
property feature_names: set

Obtain a set containing all feature names

indices() list
static load(name: str) Dataset
property location_ids: set

Obtain a set containing all location ids occurring in the dataset

property max_date: str
property min_date: str
split_on_years(years_split: tuple) tuple

Create two new datasets based on the provided split in years

Parameters:

years_split – tuple of two lists of years, e.g. ([2012, 2014], [2013, 2015])

Returns:

two datasets
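To make the shape of `years_split` concrete, here is a hedged sketch of a year-based split on a plain pandas dataframe. It does not use the `Dataset` class itself; the function `split_df_on_years`, the index names `loc_id` and `year`, and the toy values are assumptions for illustration only.

```python
import pandas as pd


def split_df_on_years(df: pd.DataFrame, years_split: tuple) -> tuple:
    """Split a dataframe into two parts by the years in its index (hypothetical helper)."""
    years_a, years_b = years_split
    years = df.index.get_level_values("year")
    return df[years.isin(years_a)], df[years.isin(years_b)]


# Toy yield table indexed by (loc_id, year); names and values are made up.
index = pd.MultiIndex.from_tuples(
    [(1, 2012), (1, 2013), (1, 2014), (1, 2015), (2, 2012), (2, 2013)],
    names=["loc_id", "year"],
)
df_y = pd.DataFrame({"yield": [5.1, 4.8, 5.5, 5.0, 6.0, 6.3]}, index=index)

# The first list of years goes to the first part, the second list to the second part.
part_a, part_b = split_df_on_years(df_y, ([2012, 2014], [2013, 2015]))
```

Splitting on whole years keeps all locations of a given year in the same part, so no year contributes samples to both parts.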

targets() array

Obtain a NumPy array of targets (labels)

property years: set

Obtain a set containing all years occurring in the dataset

cybench.datasets.dataset_overview module

cybench.datasets.dataset_torch module

class cybench.datasets.dataset_torch.TorchDataset(dataset: Dataset)

Bases: Dataset

classmethod _cast_to_tensor(sample: dict) dict

Create a sample with all data cast to torch tensors

Parameters:

sample – the sample to convert

Returns:

the converted data sample

classmethod collate_fn(samples: list) dict

Function that takes a list of data samples (as dicts, containing torch tensors) and converts it to a dict of batched torch tensors

Parameters:

samples – a list of data samples

Returns:

a dict with batched data

cybench.datasets.transforms module

cybench.datasets.transforms._transform_ts_input_to_dekadal(ts_key, value, dates, min_date, max_date)
cybench.datasets.transforms.transform_stack_ts_static_inputs(batch_dict, min_date, max_date)
cybench.datasets.transforms.transform_ts_inputs_to_dekadal(batch_dict, min_date, max_date)

Module contents