eii.training
Model training and validation utilities.
NPP model training utilities.
This module provides functions for training and validating the NPP potential model, including sampling from natural areas and model evaluation.
Requires additional dependencies: pip install eii[training]
Note
Most users will not need this module. Use eii.client for retrieving pre-computed EII data, or eii.compute for calculating EII with the existing trained model.
Example
from eii.training import setup_training_grid, train_npp_model
grid_cells = setup_training_grid()
model, export_task = train_npp_model(training_data)
setup_training_grid(grid_size_deg=TRAINING_GRID_SIZE_DEG)
Create a global grid for spatial cross-validation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| grid_size_deg | int | Size of grid cells in degrees. | TRAINING_GRID_SIZE_DEG |
Returns:
| Type | Description |
|---|---|
| dict | Dictionary mapping cell names to ee.Geometry objects. |
Source code in src/eii/training/sampling.py
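As a rough illustration of what a global grid keyed by cell name looks like, the sketch below builds one in plain Python. It is not the library's implementation: `build_grid`, the `cell_<west>_<south>` naming scheme, and the bounding-box tuples are assumptions made for this example, whereas `setup_training_grid` returns `ee.Geometry` objects.

```python
def build_grid(grid_size_deg: int) -> dict:
    """Return {cell_name: (west, south, east, north)} covering the globe.

    Hypothetical helper for illustration only; the real function
    returns ee.Geometry objects rather than tuples.
    """
    if grid_size_deg <= 0:
        raise ValueError("grid_size_deg must be positive")
    cells = {}
    for west in range(-180, 180, grid_size_deg):
        for south in range(-90, 90, grid_size_deg):
            # Clamp the last row/column so cells never exceed the globe.
            east = min(west + grid_size_deg, 180)
            north = min(south + grid_size_deg, 90)
            # Assumed naming scheme: south-west corner of the cell.
            cells[f"cell_{west}_{south}"] = (west, south, east, north)
    return cells


grid = build_grid(30)
```

With a 30 degree cell size this yields 12 columns by 6 rows. Smaller cells give finer spatial blocks at the cost of more geometries to manage.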
train_npp_model(training_data, predictor_names=None, response_property='longterm_avg_npp_sum', output_asset_path=None, num_trees=RF_NUM_TREES, min_leaf_population=RF_MIN_LEAF_POPULATION, variables_per_split=RF_VARIABLES_PER_SPLIT, bag_fraction=RF_BAG_FRACTION, seed=RF_SEED, export=True)
Train Random Forest model for NPP prediction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| training_data | FeatureCollection | FeatureCollection with predictor variables and response. | required |
| predictor_names | list[str] \| None | List of predictor property names. If None, infers from first feature (requires getInfo). | None |
| response_property | str | Name of the response variable property. | 'longterm_avg_npp_sum' |
| output_asset_path | str \| None | Asset path to export model. If None, generates default. | None |
| num_trees | int | Number of trees in the forest. | RF_NUM_TREES |
| min_leaf_population | int | Minimum samples in a leaf. | RF_MIN_LEAF_POPULATION |
| variables_per_split | int | Number of variables to consider per split. | RF_VARIABLES_PER_SPLIT |
| bag_fraction | float | Fraction of data to bag per tree. | RF_BAG_FRACTION |
| seed | int | Random seed. | RF_SEED |
| export | bool | Whether to export the model to an asset. | True |
Returns:
| Type | Description |
|---|---|
| tuple[Classifier, Task \| None] | Tuple of (trained_model, export_task or None). |
Source code in src/eii/training/model.py
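When `predictor_names` is None, the names are inferred from the first feature, which costs a `getInfo` round trip. The sketch below shows one plausible form of that inference on a plain dictionary. It is an assumption, not the library's code: `infer_predictor_names`, the rule "every property except the response", and the property names and values in `props` are all made up for illustration.

```python
def infer_predictor_names(first_feature_properties: dict, response_property: str) -> list:
    """Treat every property except the response as a predictor.

    Hypothetical rule for illustration; the real function may apply
    additional filtering.
    """
    return sorted(
        name for name in first_feature_properties if name != response_property
    )


# Made-up properties of a single training feature.
props = {
    "longterm_avg_npp_sum": 812.4,
    "elevation": 340.0,
    "mean_annual_temp": 11.2,
}
predictors = infer_predictor_names(props, "longterm_avg_npp_sum")
```

Passing `predictor_names` explicitly avoids the inference step and makes the set of predictors visible at the call site.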
get_train_test_split(training_data, split_ratio=TRAIN_TEST_SPLIT_RATIO, seed=RF_SEED, cv_grid_size=CV_GRID_SIZE_DEG, cv_buffer_size=CV_BUFFER_DEG)
Perform spatially stratified train/test split.
Uses a grid (e.g., 2 degrees) to create spatial blocks. Optionally applies a negative buffer (margin) inside each block to ensure physical separation between the training and validation sets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| training_data | FeatureCollection | FeatureCollection of samples to split. | required |
| split_ratio | float | Fraction for training (default 0.9). | TRAIN_TEST_SPLIT_RATIO |
| seed | int | Random seed for reproducibility. | RF_SEED |
| cv_grid_size | int | Grid size in degrees for cross-validation blocks. | CV_GRID_SIZE_DEG |
| cv_buffer_size | float | Buffer size in degrees to exclude from block edges. 0.0 means no buffer. 0.5 means 0.5 deg excluded from all sides. | CV_BUFFER_DEG |
Returns:
| Type | Description |
|---|---|
| tuple[FeatureCollection, FeatureCollection] | Tuple of (training_set, validation_set). |
Source code in src/eii/training/model.py
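The idea of a spatially blocked split with a negative buffer can be sketched in plain Python on (lon, lat) tuples. This is a minimal illustration under stated assumptions, not the Earth Engine implementation: `spatial_block_split`, the per-block random draw, and the default `seed` value are hypothetical, and the real function operates on a FeatureCollection.

```python
import math
import random


def spatial_block_split(points, split_ratio=0.9, seed=42, grid_size=2.0, buffer_size=0.0):
    """Split (lon, lat) points into train/test sets by grid block.

    Hypothetical sketch: every point in a block lands in the same set,
    and points closer than buffer_size to a block edge are dropped.
    """
    block_draws = {}
    train, test = [], []
    for lon, lat in points:
        col = math.floor(lon / grid_size)
        row = math.floor(lat / grid_size)
        # Offsets of the point from the block's south-west corner.
        dx = lon - col * grid_size
        dy = lat - row * grid_size
        edge_distance = min(dx, grid_size - dx, dy, grid_size - dy)
        if buffer_size > 0 and edge_distance < buffer_size:
            continue  # inside the excluded margin
        block = (col, row)
        if block not in block_draws:
            # One seeded draw per block keeps the assignment reproducible.
            block_draws[block] = random.Random(f"{seed}:{col}:{row}").random()
        target = train if block_draws[block] < split_ratio else test
        target.append((lon, lat))
    return train, test


points = [(10.9, 10.9), (11.1, 11.0), (10.2, 11.0), (11.0, 11.0)]
train, test = spatial_block_split(points, buffer_size=0.5)
```

All four points fall in the same 2 degree block, so the three that survive the buffer end up together in one set. The point at longitude 10.2 lies 0.2 degrees from the block's western edge and is dropped.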
validate_model(validation_set, model_asset_path, response_vars=['current_npp'], prediction_names=['classification'])
Validate a trained model using a FeatureCollection of validation points.
This function applies the classifier directly to the features. Because it uses the predictor values already stored in the table, it is much faster than image-based validation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| validation_set | FeatureCollection | FeatureCollection containing predictors and actual response. | required |
| model_asset_path | str | Path to the trained GEE classifier. | required |
| response_vars | list[str] | List of property names for actual values (e.g. ['current_npp', 'npp_std']). | ['current_npp'] |
| prediction_names | list[str] | List of property names for predicted values (matching model output). | ['classification'] |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame containing metrics for each response variable. |
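The reference does not say which metrics the returned DataFrame holds. As a sketch of how paired `response_vars` and `prediction_names` might be scored, the example below computes R², RMSE and MAE in plain Python. The choice of metrics, the `regression_metrics` helper, and the sample rows are assumptions for illustration only.

```python
import math


def regression_metrics(actual, predicted):
    """Return R^2, RMSE and MAE for two equal-length sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal length")
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        # R^2 is undefined when the actual values have no variance.
        "r2": 1 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,
    }


# Made-up validation rows: actual response and model output per feature.
rows = [
    {"current_npp": 2.0, "classification": 2.5},
    {"current_npp": 4.0, "classification": 3.5},
    {"current_npp": 6.0, "classification": 6.0},
]
response_vars = ["current_npp"]
prediction_names = ["classification"]

# Pair each response variable with its prediction column by position.
metrics = {
    response: regression_metrics(
        [row[response] for row in rows],
        [row[prediction] for row in rows],
    )
    for response, prediction in zip(response_vars, prediction_names)
}
```

Pairing the two lists by position mirrors the way the two parameters are documented: the first response variable is compared against the first prediction name, and so on.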