cybench.datasets package
Submodules
cybench.datasets.alignment module
- cybench.datasets.alignment._add_cutoff_days(df, lead_time)
- cybench.datasets.alignment.align_data(df_y: DataFrame, dfs_x: tuple) → tuple
- cybench.datasets.alignment.trim_to_lead_time(df, crop_cal_df, lead_time, spinup_days=90)
cybench.datasets.configured module
- cybench.datasets.configured._add_year(df: DataFrame) → DataFrame
- cybench.datasets.configured._preprocess_time_series_data(df, index_cols, select_cols, df_crop_cal, lead_time)
- cybench.datasets.configured.load_dfs(crop: str, country_code: str, lead_time: str = 'mid-season') → tuple
- cybench.datasets.configured.load_dfs_crop(crop) → tuple
cybench.datasets.dataset module
- class cybench.datasets.dataset.Dataset(crop, data_target: DataFrame | None = None, data_inputs: list | None = None)
Bases: object
- static _empty_df_target() → DataFrame
Helper function that creates an empty (but correctly formatted) dataframe for yield statistics
- static _filter_df_on_index(df: DataFrame, keys: list, level: int)
Helper method for filtering a dataframe based on the occurrence of certain values in a specified index
- Parameters:
df – the dataframe that should be filtered
keys – the values on which it should filter
level – the index level in which samples should be filtered
- Returns:
a filtered dataframe
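The filtering described above can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: the function name `filter_df_on_index`, the index names `loc_id` and `year`, and the `yield` column are assumptions made for the example.

```python
import pandas as pd


def filter_df_on_index(df, keys, level):
    """Hypothetical stand-in: keep rows whose index value at `level` is in `keys`."""
    mask = df.index.get_level_values(level).isin(keys)
    return df[mask]


# Toy yield table indexed by (loc_id, year); all values are made up.
index = pd.MultiIndex.from_tuples(
    [(1, 2012), (1, 2013), (2, 2012), (2, 2014)],
    names=["loc_id", "year"],
)
df = pd.DataFrame({"yield": [5.1, 4.8, 6.0, 5.5]}, index=index)

# Keep only the rows for location 2 (index level 0).
filtered = filter_df_on_index(df, keys=[2], level=0)
```

Here `level=0` selects on the location index; passing `level=1` with a list of years would filter on the year index instead.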
- _get_feature_data(loc_id: int, year: int) → dict
Helper function for obtaining feature data corresponding to some index
- Parameters:
loc_id – location index value
year – year index value
- Returns:
a dict containing all feature data corresponding to the specified index
- static _split_df_on_index(df: DataFrame, split: tuple, level: int)
- static _validate_dfs(df_y: DataFrame, dfs_x: list) → bool
Helper function that checks whether the input dataframes are correctly formatted
- Parameters:
df_y – dataframe containing yield statistics
dfs_x – list of dataframes each containing feature data
- Returns:
a bool indicating whether the checks have passed
- property crop
- property feature_names: set
Obtain a set containing all feature names
- indices() → list
- property location_ids: set
Obtain a set containing all location ids occurring in the dataset
- property max_date: str
- property min_date: str
- split_on_years(years_split: tuple) → tuple
Create two new datasets based on the provided split in years
- Parameters:
years_split – a tuple of two lists of years, e.g. ([2012, 2014], [2013, 2015])
- Returns:
two datasets, one for each group of years in the split
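As a rough illustration of a year-based split, the sketch below partitions a pandas dataframe by a `year` index level. It is not the `Dataset.split_on_years` implementation: the helper `split_df_on_years`, the index names, and the `yield` column are assumptions for the example, and the real method returns datasets rather than dataframes.

```python
import pandas as pd


def split_df_on_years(df, years_split):
    """Hypothetical helper: partition rows by the 'year' index level."""
    first_years, second_years = years_split
    years = df.index.get_level_values("year")
    return df[years.isin(first_years)], df[years.isin(second_years)]


# Two locations observed in four years each; values are made up.
index = pd.MultiIndex.from_product(
    [[1, 2], [2012, 2013, 2014, 2015]], names=["loc_id", "year"]
)
df = pd.DataFrame({"yield": range(8)}, index=index)

# Same split format as in the parameter description above.
df_a, df_b = split_df_on_years(df, ([2012, 2014], [2013, 2015]))
```

Splitting on whole years keeps every sample from a given year in the same partition, which is the usual reason for splitting this way when evaluating forecasts on unseen years.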
- targets() → array
Obtain a NumPy array of targets (labels)
- property years: set
Obtain a set containing all years occurring in the dataset
cybench.datasets.dataset_overview module
cybench.datasets.dataset_torch module
- class cybench.datasets.dataset_torch.TorchDataset(dataset: Dataset)
Bases: Dataset
- classmethod _cast_to_tensor(sample: dict) → dict
Create a sample with all data cast to torch tensors
- Parameters:
sample – the sample to convert
- Returns:
the converted data sample
- classmethod collate_fn(samples: list) → dict
Function that takes a list of data samples (as dicts containing torch tensors) and converts it to a dict of batched torch tensors
- Parameters:
samples – a list of data samples
- Returns:
a dict with batched data
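The batching step can be sketched as follows. This is an assumption-laden illustration, not the `TorchDataset.collate_fn` source: it assumes every sample holds tensors of matching shape under the same keys, and the function name `collate_samples` and the keys `tavg` and `yield` are made up for the example.

```python
import torch


def collate_samples(samples):
    """Hypothetical stand-in: stack per-sample tensors into one batched tensor per key."""
    keys = samples[0].keys()
    return {key: torch.stack([sample[key] for sample in samples]) for key in keys}


# Two toy samples with a short time series and a scalar target.
samples = [
    {"tavg": torch.tensor([10.0, 12.0, 11.0]), "yield": torch.tensor(3.2)},
    {"tavg": torch.tensor([9.0, 13.0, 12.0]), "yield": torch.tensor(4.1)},
]

batch = collate_samples(samples)
# batch["tavg"] has shape (2, 3); batch["yield"] has shape (2,)
```

A function of this shape is what `torch.utils.data.DataLoader` accepts through its `collate_fn` argument.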
cybench.datasets.transforms module
- cybench.datasets.transforms._transform_ts_input_to_dekadal(ts_key, value, dates, min_date, max_date)
- cybench.datasets.transforms.transform_stack_ts_static_inputs(batch_dict, min_date, max_date)
- cybench.datasets.transforms.transform_ts_inputs_to_dekadal(batch_dict, min_date, max_date)