
AgML - Machine Learning for Agricultural Modeling

AgML is the AgMIP transdisciplinary community of agricultural and machine learning modelers.

AgML aspires to

  • identify key research gaps and opportunities at the intersection of agricultural modelling and machine learning research,

  • support enhanced collaboration and engagement between experts in these disciplines, and

  • conduct and publish protocol-based studies to establish best practices for robust machine learning use in agricultural modelling.

AgML Crop Yield Forecasting

The objective of the AgML Crop Yield Forecasting task is to create a benchmark for comparing crop yield forecasting models across countries and crops. The models and forecasts can be used for food security planning or famine early warning. The benchmark is called CY-Bench (crop yield benchmark).


Overview

Early in-season predictions of crop yields can inform decisions at multiple levels of the food value chain, from late-season agricultural management such as fertilization, harvest, and storage to the import or export of produce. Anticipating crop yields is also important to ensure market transparency at the global level (e.g. Agricultural Market Information System, GEOGLAM Crop Monitor) and to plan response actions in food-insecure countries at risk of food production shortfalls.

We propose CY-Bench, a dataset and benchmark for subnational crop yield forecasting, with coverage of the major crop-growing countries of the world for maize and wheat. By subnational, we mean the administrative level at which yield statistics are published. When statistics are available at multiple levels, we pick the highest resolution. By yield, we mean end-of-season yield statistics as published by national statistics offices or similar entities representing a group of countries. By forecasting, we mean that the prediction is made ahead of harvest. The task is also called in-season crop yield forecasting.

In-season forecasting is done at several time points during the growing season, from mid-season to shortly before harvest: in the middle of the season, i.e. halfway between the start and the end of the season; at a point between mid-season and harvest; and 2 weeks before harvest. These time points depend on the crop calendar for the selected crop and country (or region). Since yield statistics may not be available for the current season, we evaluate models using predictors and yield statistics for all available years.

We compare models, algorithms and architectures by keeping the other parts of the workflow as similar as possible. For example, the dataset uses the same source for each type of predictor (e.g. weather variables, soil moisture, evapotranspiration, remote sensing biomass indicators, soil properties), the selected data are preprocessed using the same pipeline (the same crop mask and crop calendar, and the same boundary files and approach for spatial aggregation), and algorithms that require feature design follow the same feature design protocol.
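The forecast time points described above can be sketched in a few lines of Python. This is only an illustration, not the cybench implementation: the function name `forecast_dates`, the use of calendar dates for the start and end of season, and the choice of the intermediate point as halfway between mid-season and harvest are all assumptions made for this example.

```python
from datetime import date, timedelta


def forecast_dates(season_start: date, season_end: date) -> dict:
    """Return illustrative forecast dates for one growing season.

    Hypothetical helper; the names and the intermediate point are assumptions.
    """
    # Mid-season: start of season plus half of the season length (whole days).
    half_length = (season_end - season_start).days // 2
    mid_season = season_start + timedelta(days=half_length)

    # Assumption: the intermediate point lies halfway between mid-season and harvest.
    remaining = (season_end - mid_season).days // 2
    between = mid_season + timedelta(days=remaining)

    # Last forecast: 2 weeks before the end of season (harvest).
    two_weeks_before = season_end - timedelta(days=14)

    return {
        "mid_season": mid_season,
        "between_mid_season_and_harvest": between,
        "two_weeks_before_harvest": two_weeks_before,
    }


# Example season running from 1 April to 1 October 2024 (made-up dates).
points = forecast_dates(date(2024, 4, 1), date(2024, 10, 1))
for name, day in points.items():
    print(name, day.isoformat())
```

Because the calculation works on dates rather than day-of-year numbers, it also handles seasons that cross a year boundary, as winter wheat typically does.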

Coverage for maize

Undifferentiated maize, or grain maize where differentiated.

[Figure: Maize coverage map]

Coverage for wheat

Undifferentiated wheat, or winter wheat where differentiated.

[Figure: Wheat coverage map]

Deciphering crop names

The terms used to reference different varieties or seasons of maize and wheat have been simplified in CY-Bench. The following table lists the representative crop names as provided in the crop statistics.

| Country/Region | Maize | Wheat |
|----------------|-------|-------|
| EU-EUROSTAT | grain maize | soft wheat |
| Africa-FEWSNET | maize | |
| Argentina | corn | wheat |
| Australia | | winter wheat |
| Brazil | grain corn | grain wheat |
| China | grain corn | grain wheat/spring wheat/winter wheat |
| Germany | grain maize | winter wheat |
| India | maize | wheat |
| Mali | maize | |
| Mexico | white/yellow corn | |
| USA | grain corn | winter wheat |

Getting started

cybench is an open-source Python library for loading the CY-Bench dataset and running the CY-Bench tasks.

Installation

git clone https://github.com/BigDataWUR/AgML-CY-Bench

Requirements

The benchmark results were produced in the following test environment:

Operating system: Ubuntu 18.04
CPU: Intel Xeon Gold 6448Y (32 cores)
Memory (RAM): 256 GB
Disk storage: 2 TB
GPU: NVIDIA RTX A6000

Benchmark run time

During the benchmark run with the baseline models, several countries were run in parallel, each on its own GPU in a distributed cluster. The larger countries took approximately 18 hours to complete. If run sequentially on a single capable GPU, the whole benchmark should take 50-60 hours to complete.

Software requirements: Python 3.9.4, scikit-learn 1.4.2, PyTorch 2.3.0+cu118.

Downloading dataset

Get the dataset from Google Drive or Zenodo.

Running the benchmark

First, write a model class (MyModel in the example below) that extends the BaseModel class. The base model class is defined in cybench.models.model.

from cybench.models.model import BaseModel
from cybench.runs.run_benchmark import run_benchmark


class MyModel(BaseModel):
    # Implement the methods that BaseModel requires here.
    pass


run_name = "my_run"  # placeholder: choose a name for this run
dataset_name = "maize_US"
run_benchmark(run_name=run_name,
              model_name="my_model",
              model_constructor=MyModel,
              model_init_kwargs={},  # keyword arguments for the MyModel constructor
              model_fit_kwargs={},   # keyword arguments used when fitting the model
              dataset_name=dataset_name)

Dataset

The dataset can be loaded by crop and, optionally, by country.

For example

dataset = Dataset.load("maize")

will load data for countries covered by the maize dataset. Maize data for the US can be loaded as follows:

dataset = Dataset.load("maize_US")
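The two names used above suggest a simple convention: the crop name, optionally followed by an underscore and a country code. The sketch below illustrates that convention only. `parse_dataset_name` is a hypothetical helper written for this example and is not part of cybench; how `Dataset.load` interprets its argument internally is not described here.

```python
from typing import Optional, Tuple


def parse_dataset_name(name: str) -> Tuple[str, Optional[str]]:
    """Split a dataset name such as "maize_US" into (crop, country).

    Hypothetical helper for illustration; not a cybench function.
    """
    # Split on the first underscore only; everything before it is the crop.
    crop, separator, country = name.partition("_")
    # Without an underscore there is no country part.
    return crop, (country if separator else None)


print(parse_dataset_name("maize"))     # crop only, all covered countries
print(parse_dataset_name("maize_US"))  # crop and country
```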

Data sources

| Crop Statistics | Shapefiles or administrative boundaries | Predictors, crop masks, crop calendars |
|-----------------|------------------------------------------|----------------------------------------|
| Africa from FEWSNET | Africa from FEWSNET | Weather: AgERA5 |
| Mali (1) | Use Africa shapefiles from FEWSNET | Soil: WISE soil data |
| Argentina | Argentina | Soil moisture: GLDAS |
| Australia | Australia | Evapotranspiration: FAO |
| Brazil | Brazil | FAPAR: JRC FAPAR |
| China | China | Crop calendars: ESA WorldCereal |
| EU | EU | NDVI: MOD09CMG |
| Germany (2) | Use EU shapefiles | Crop Masks: ESA WorldCereal |
| India | India | |
| Mexico | Mexico | |
| US | US | |

1: Mali data is at admin level 3. Mali data is also included in the FEWSNET Africa dataset, but at admin level 1 only.

2: Germany data is also included in the EU dataset, but most of the Germany data in the EU dataset fails coherence tests (e.g. yield = production / harvest_area).
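The coherence test mentioned in footnote 2 can be sketched as follows. The function name, the argument names, and the 5% relative tolerance are assumptions for illustration; the tolerance and units actually used in the CY-Bench checks are not stated here.

```python
import math


def is_coherent(yield_value: float, production: float, harvest_area: float,
                rel_tol: float = 0.05) -> bool:
    """Check whether a reported yield matches production / harvest_area.

    Hypothetical check; rel_tol is an assumed tolerance, not a CY-Bench setting.
    Units must be consistent, e.g. tonnes, hectares and tonnes per hectare.
    """
    # A non-positive area makes the ratio undefined or meaningless.
    if harvest_area <= 0:
        return False
    expected_yield = production / harvest_area
    return math.isclose(yield_value, expected_yield, rel_tol=rel_tol)


# Made-up records: (yield, production, harvest_area)
records = [(5.0, 500.0, 100.0), (5.0, 800.0, 100.0), (5.0, 500.0, 0.0)]
for record in records:
    print(record, is_coherent(*record))
```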

Leaderboard

| Model Name | NRMSE | MAPE |
|------------|-------|------|
| AverageYieldModel | | |
| Linear TrendModel | | |
| Ridge (sklearn) | | |
| TorchLSTMModel | | |
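The leaderboard reports NRMSE (normalized root mean squared error) and MAPE (mean absolute percentage error). The sketch below shows one common way to compute them. Normalizing the RMSE by the mean of the observed yields is an assumption, since the normalization is not defined here, and the function names are chosen for this example rather than taken from cybench.

```python
import math
from typing import Sequence


def nrmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE divided by the mean observed value (assumed normalization)."""
    n = len(y_true)
    mean_squared_error = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mean_observed = sum(y_true) / n
    return math.sqrt(mean_squared_error) / mean_observed


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of |observed - predicted| / |observed|, as a fraction (not percent)."""
    n = len(y_true)
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


# Made-up observed and predicted yields (e.g. in tonnes per hectare).
observed = [2.0, 4.0]
predicted = [3.0, 3.0]
print(nrmse(observed, predicted))
print(mape(observed, predicted))
```

Both metrics are relative, which makes scores comparable across regions and crops with different yield levels. MAPE is undefined when an observed yield is zero, so such records would need separate handling.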

How to cite

Please cite CY-Bench as follows:

@dataset{paudel_etal2024,
  author       = {Paudel, Dilli and
                  Baja, Hilmy and
                  van Bree, Ron and
                  Kallenberg, Michiel and
                  Ofori-Ampofo, Stella and
                  Potze, Aike and
                  Poudel, Pratishtha and
                  Saleh, Abdelrahman and
                  Anderson, Weston and
                  von Bloh, Malte and
                  Castellano, Andres and
                  Ennaji, Oumnia and
                  Hamed, Raed and
                  Laudien, Rahel and
                  Lee, Donghoon and
                  Luna, Inti and
                  Masiliūnas, Dainius and
                  Meroni, Michele and
                  Mutuku, Janet Mumo and
                  Mkuhlani, Siyabusa and
                  Richetti, Jonathan and
                  Ruane, Alex C. and
                  Sahajpal, Ritvik and
                  Shuai, Guanyuan and
                  Sitokonstantinou, Vasileios and
                  de Souza Noia Junior, Rogerio and
                  Srivastava, Amit Kumar and
                  Strong, Robert and
                  Sweet, Lily-belle and
                  Vojnović, Petar and
                  de Wit, Allard and
                  Zachow, Maximilian and
                  Athanasiadis, Ioannis N.},
  title        = {{CY-Bench: A comprehensive benchmark dataset
                   for subnational crop yield forecasting}},
  year         = 2024,
  publisher    = {AgML (https://www.agml.org/)},
  version      = {1.0},
  doi          = {10.5281/zenodo.11502143},
}

How to contribute

Thank you for your interest in contributing to AgML Crop Yield Forecasting. Please check the contributing guidelines to see how to get involved and contribute.

Additional information

For more information please visit the AgML website.
