RES API Reference#

Warning

This page is under active development; additional modules and methods will be documented as the API stabilizes.

RESource provides a comprehensive API for variable renewable energy (VRE) resource assessment through a modular architecture. This reference documents the main classes and methods available for building custom assessment workflows.

Core Workflow Classes#

RESource Builder#

Main orchestrator class for renewable energy resource assessments.

class RES.RESources.RESources_builder(config_file_path: Path = None, region_short_code: str = None, resource_type: str = None)[source]

Bases: AttributesParser

Main orchestrator class for renewable energy resource assessment workflows.

RESources_builder coordinates the complete workflow for assessing solar and wind potential at sub-national scales. It integrates spatial grid cell preparation, land availability analysis, weather data processing, economic evaluation, and site clustering into a unified framework.

This class implements a modular architecture where each assessment step is handled by specialized components, enabling reproducible, scalable, and transparent renewable energy assessments.

Parameters:
  • config_file_path (str or Path) -- Path to the YAML configuration file containing project settings

  • region_short_code (str) -- ISO or custom short code for the target region (e.g., 'BC' for British Columbia)

  • resource_type ({'solar', 'wind'}) -- Type of renewable energy resource to assess

store

Root directory for data storage (HDF5 file) and caching.

Type:

Path

units

Handler for unit conversions and standardization

Type:

Units

gridcells

Spatial grid generation and management

Type:

GridCells

timeseries

Climate data processing and capacity factor calculations

Type:

Timeseries

datahandler

HDF5-based data storage and retrieval interface

Type:

DataHandler

cell_processor

Land availability and capacity potential calculations

Type:

CellCapacityProcessor

coders

Canadian power system data integration (substations, transmission lines).

Type:

CODERSData

era5_cutout

ERA5 climate data cutout management

Type:

ERA5Cutout

scorer

Economic scoring and LCOE calculations

Type:

CellScorer

gwa_cells

Global Wind Atlas data integration (wind resources only)

Type:

GWACells

results_save_to

Output directory for assessment results

Type:

Path

region_name

Full name of the assessed region

Type:

str

get_grid_cells()[source]

Generate spatial grid cells covering the region boundary

get_cell_capacity(force_update=False)[source]

Calculate potential capacity based on land availability constraints

extract_weather_data()[source]

Process climate data for capacity factor calculations

update_gwa_scaled_params(memory_resource_limitation=False)[source]

Integrate Global Wind Atlas wind speed corrections (wind only)

get_CF_timeseries(cells=None, force_update=False)[source]

Generate hourly capacity factor time series

find_grid_nodes(cells=None, use_pypsa_buses=False)[source]

Identify nearest electrical grid connection points

score_cells(cells=None)[source]

Calculate economic scores based on LCOE methodology

get_clusters(scored_cells=None, wcss_tolerance=0.05)[source]

Perform spatial clustering of viable sites

get_cluster_timeseries(clusters=None, dissolved_indices=None, cells_timeseries=None)[source]

Generate representative time series for each cluster

build(select_top_sites=True, use_pypsa_buses=False, memory_resource_limitation=True)[source]

Execute complete assessment workflow

export_results(resource_type, region, resource_clusters, cluster_timeseries, save_to=Path('results'))[source]

Export results in standardized format for downstream models

select_top_sites(sites, timeseries, resource_max_capacity=10)[source]

Filter results to highest-potential sites within capacity constraints

Examples

Basic wind assessment workflow:

>>> from RES.RESources import RESources_builder
>>> builder = RESources_builder(
...     config_file_path="config/config_BC.yaml",
...     region_short_code="BC",
...     resource_type="wind"
... )
>>> results = builder.build()
>>> builder.export_results(*results)

Step-by-step workflow with intermediate inspection:

>>> builder = RESources_builder("config/config.yaml", "AB", "solar")
>>> cells = builder.get_grid_cells()
>>> cells_with_capacity = builder.get_cell_capacity()
>>> scored_cells = builder.score_cells(cells_with_capacity)
>>> clusters = builder.get_clusters(scored_cells)

Notes

  • Inherits configuration parsing capabilities from AttributesParser

  • Uses HDF5 storage for efficient handling of large geospatial datasets

  • Implements caching mechanisms to avoid redundant computations

  • Supports both solar PV and onshore wind technologies

  • Economic calculations follow NREL LCOE methodology

  • Clustering uses k-means with automatic cluster number optimization

build(select_top_sites: bool | None = True, use_pypsa_buses: bool | None = False, memory_resource_limitation: bool | None = True)[source]

Execute the specific module logic for the given resource type ('solar' or 'wind').

static create_summary_info(resource_type: str, region: str, sites: DataFrame, timeseries: DataFrame) → str[source]

Creates summary information to be exported alongside results data.

static dump_export_metadata(info: str, save_to: Path | None = 'results/linking')[source]

Dumps the metadata summary information to a file. If the file already exists, it prepends the new info at the top of the file.

static export_results(resource_type: str, region: str, resource_clusters: DataFrame, cluster_timeseries: DataFrame, save_to: Path | None = PosixPath('results'))[source]

Export processed resource cluster results (GeoDataFrame) to standard datafield CSVs as input for downstream models.

Parameters:
  • resource_type -- The type of resource ('solar' or 'wind').

  • resource_clusters -- A DataFrame containing resource cluster information.

  • save_to (optional) -- The directory to save the output files. Defaults to 'results/*.csv'.

Currently supports: CLEWs, PyPSA

extract_weather_data()[source]

Extracts weather data (e.g. wind speed, solar influx) for the cells. This method retrieves the ERA5 cutout and extracts wind speed data for the cells. If the wind speed data is already present in the stored dataset, the extraction from the source is skipped. If the resource type is 'wind', it extracts 'windspeed_ERA5' from the cutout and updates the cells GeoDataFrame. If the resource type is 'solar', extraction from the Global Solar Atlas data is not yet supported.

Returns:

None

Notes

  • Currently active for wind speed only, due to the significant contrast between ERA5 and high-resolution wind data.

find_grid_nodes(cells: GeoDataFrame = None, use_pypsa_buses: bool = False) → GeoDataFrame[source]

Find the grid nodes for the given cells.

Parameters:
  • cells (gpd.GeoDataFrame, optional) -- Cells with their coordinates, geometry, and unique cell ids. Defaults to None.

  • use_pypsa_buses (bool, optional) -- Whether to use PyPSA buses as preferred nodes for resource connection. Defaults to False.

Returns:

Updated grid cells with nearest grid node information

Return type:

gpd.GeoDataFrame

Notes

Could be parallelized with Step 1B/C.
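Nearest-node matching of this kind is commonly done with a KD-tree; the following is a minimal sketch of the idea only (illustrative coordinate arrays, not this method's internals):

>>> import numpy as np
>>> from scipy.spatial import cKDTree
>>> cell_xy = np.array([[-123.1, 49.3], [-122.8, 49.5]])  # hypothetical cell centroids (lon, lat)
>>> node_xy = np.array([[-123.0, 49.2], [-122.9, 49.6]])  # hypothetical substation locations
>>> dist, idx = cKDTree(node_xy).query(cell_xy)  # nearest node index and distance per cell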

get_CF_timeseries(cells: GeoDataFrame = None, force_update=False) → tuple[source]

Extract time series information for the cells, e.g. static CF (yearly mean) and hourly time series.

Parameters:
  • cells (gpd.GeoDataFrame) -- Cells with their coordinates, geometry, and unique cell ids.

  • force_update (bool) -- If True, forces the update of the CF timeseries data.

Returns:

A namedtuple containing the cells with their timeseries data.

Return type:

tuple

Notes

  • The method uses the Timeseries class to retrieve the timeseries data for the cells.

  • The timeseries data is retrieved based on the resource type (e.g., 'solar' or 'wind').

  • If the cells argument is not provided, it retrieves the cells from the data handler.

  • Could be parallelized with Step 2B/2C

get_cell_capacity()[source]

Retrieves the potential capacity of the cells based on land availability and land-use intensity.

Parameters:

force_update (bool) -- If True, forces the update of the cell capacity data.

Returns:

A namedtuple containing the cells with their potential capacity and the capacity matrix.

Return type:

tuple

Notes

  • The capacity matrix is a 2D array where each row corresponds to a cell and each column corresponds to a time step.

  • The potential capacity is calculated as (see the worked example below):
    • Potential capacity (MW) = available land fraction × land-use intensity (MW/km²) × cell area (km²)

  • The method uses the CellCapacityProcessor class to process the capacity data.

  • The method returns a namedtuple with two attributes: data (the cells GeoDataFrame) and matrix (the capacity matrix).

  • Could be parallelized with Step 2A/2C.
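As a quick illustration of the formula above (a worked example with hypothetical numbers, not RESource API calls):

>>> available_land_fraction = 0.35  # 35% of the cell is developable
>>> landuse_intensity = 3.0  # MW/km², hypothetical wind land-use intensity
>>> cell_area = 500.0  # km², roughly a 0.25° × 0.25° cell at ~50° latitude
>>> round(available_land_fraction * landuse_intensity * cell_area, 1)
525.0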

get_cluster_timeseries(clusters: GeoDataFrame = None, dissolved_indices: DataFrame = None, cells_timeseries: DataFrame = None)[source]
get_clusters(scored_cells: GeoDataFrame = None, score_tolerance: float = 200, wcss_tolerance=None)[source]
Parameters:
  • wcss_tolerance (float) -- WCSS (Within-Cluster Sum of Squares) tolerance. A higher tolerance gives more simplification and fewer clusters. Defaults to 0.05.
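The WCSS tolerance acts like an elbow criterion for choosing the cluster count; a minimal sketch of that idea with scikit-learn (illustrative only, not RESource's internal implementation):

>>> import numpy as np
>>> from sklearn.cluster import KMeans
>>> X = np.random.rand(500, 2)  # hypothetical scored-cell features
>>> wcss = [KMeans(n_clusters=k, n_init="auto").fit(X).inertia_ for k in range(1, 11)]
>>> tol = 0.05  # stop once an extra cluster improves WCSS by less than 5% of the total
>>> k_opt = next(k for k in range(1, 10) if (wcss[k - 1] - wcss[k]) / wcss[0] < tol)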

get_grid_cells() → GeoDataFrame[source]

Retrieves the default grid cells for the region.

Parameters:

None

Returns:

A GeoDataFrame containing the grid cells with their coordinates, geometry, and unique cell ids.

Return type:

gpd.GeoDataFrame

Notes

  • The get_default_grid() method creates several attributes, such as the atlite cutout object and the region_boundary.

  • Uses the cutout.grid attribute to create the analysis grid cells (GeoDataFrame).

Step 0: Set up the grid cells and their unique indices to populate incremental datafields and to ease navigation to cells.
  • Creates the cells with unique indices generated from their x, y centroids.

score_cells(cells: GeoDataFrame = None)[source]

Scores the cells based on calculated LCOE ($/MWh). Wrapper around the get_cell_score() method of the CellScorer object.

static select_top_sites(sites: GeoDataFrame | DataFrame, sites_timeseries: DataFrame, resource_max_capacity: float) → Tuple[GeoDataFrame | DataFrame, DataFrame][source]
update_gwa_scaled_params(memory_resource_limitation: bool | None = False)[source]

The RESources_builder class coordinates the complete assessment workflow including spatial grid generation, land availability analysis, weather data processing, economic evaluation, and site clustering.

Key Methods:

  • get_grid_cells(): Generate spatial grid covering region

  • get_cell_capacity(): Calculate land-constrained potential capacity

  • get_CF_timeseries(): Generate capacity factor time series

  • score_cells(): Economic evaluation using LCOE methodology

  • get_clusters(): Spatial clustering of viable sites

  • build(): Execute complete assessment workflow

Spatial Grid Management#

Grid cell generation and spatial discretization.

class RES.cell.GridCells(config_file_path: Path = None, region_short_code: str = None, resource_type: str = None)[source]

Bases: AttributesParser

Spatial grid cell generator for renewable energy resource assessment.

This class creates a regular spatial grid covering a specified region for discretized renewable energy potential analysis. It inherits from AttributesParser for configuration management and integrates with ERA5Cutout and GADMBoundaries to maintain consistency with climate data spatial resolution and regional boundaries.

Grid cells serve as the fundamental spatial units for capacity calculations, land availability analysis, and resource aggregation. Each cell represents a homogeneous area with uniform resource characteristics and constraints.

Parameters:
  • config_file_path (str or Path) -- Path to configuration file containing grid settings

  • region_short_code (str) -- Region identifier for boundary definition

  • resource_type (str) -- Resource type ('solar' or 'wind')

ERA5Cutout

ERA5 climate data cutout handler instance

Type:

ERA5Cutout

gadmBoundary

GADM boundary processor instance

Type:

GADMBoundaries

datahandler

HDF5 data storage interface for grid persistence

Type:

DataHandler

crs

Coordinate reference system ('EPSG:4326')

Type:

str

resolution

Grid resolution with 'dx' and 'dy' keys (decimal degrees)

Type:

dict

bounding_box

Spatial extent with 'minx', 'maxx', 'miny', 'maxy'

Type:

dict

actual_boundary

Precise regional boundary geometry

Type:

gpd.GeoDataFrame

coords

Grid coordinate arrays {'x': array, 'y': array}

Type:

dict

shape

Grid dimensions (rows, columns)

Type:

tuple

bounding_box_grid

Complete grid covering bounding box region

Type:

gpd.GeoDataFrame

grid_cells

Final grid cells intersecting with regional boundary (custom grid)

Type:

gpd.GeoDataFrame

cutout

ERA5 cutout object with climate data

Type:

atlite.Cutout

region_boundary

Regional boundary from ERA5 processing

Type:

gpd.GeoDataFrame

resource_grid_cells

Grid cells from default ERA5-based processing

Type:

gpd.GeoDataFrame

generate_coords() → None[source]

Create coordinate arrays based on resolution and boundary

__get_grid__() → gpd.GeoDataFrame[source]

Generate complete grid with cell geometries (private method)

get_custom_grid() → gpd.GeoDataFrame

Create custom grid cells intersecting with regional boundary

get_default_grid() → gpd.GeoDataFrame[source]

Create grid using ERA5 cutout methodology with climate data alignment

_check_resolution() → None[source]

Validate resolution settings and issue warnings (private method, not currently used)

Examples

Generate grid for British Columbia wind assessment:

>>> from RES.cell import GridCells
>>> grid = GridCells(
...     config_file_path="config/config_BC.yaml",
...     region_short_code="BC",
...     resource_type="wind"
... )
>>> # Using custom grid approach
>>> custom_cells = grid.get_custom_grid()
>>> print(f"Generated {len(custom_cells)} custom grid cells")
>>>
>>> # Using default ERA5-aligned grid
>>> default_cells = grid.get_default_grid()
>>> print(f"Generated {len(default_cells)} ERA5-aligned grid cells")

Custom resolution configuration:

>>> # In configuration file (config_BC.yaml):
>>> # grid_cell_resolution:
>>> #   dx: 0.25  # 0.25 degrees longitude
>>> #   dy: 0.25  # 0.25 degrees latitude
>>> grid._check_resolution()  # Validate resolution settings

Notes

  • Default resolution matches ERA5 climate data (0.25° x 0.25°)

  • Grid cells are represented as square polygons with centroid coordinates

  • Inherits configuration management from AttributesParser

  • Integrates with ERA5Cutout for climate data alignment

  • Uses GADMBoundaries for precise regional boundary definition

  • Uses HDF5 storage for efficient caching of large grid datasets

  • Grid generation respects regional boundaries to avoid unnecessary cells

  • Resolution warnings issued if finer than climate data resolution

  • Coordinate system maintained as WGS84 for global compatibility

  • Supports both custom grid generation and ERA5-aligned grid generation

Grid Generation Approaches#

  1. Custom Grid (get_custom_grid()):
     • Creates grid based on regional bounding box

     • Intersects with precise regional boundaries

     • Stores results in HDF5 with 'cells' key

  2. Default Grid (get_default_grid()):
     • Uses ERA5 cutout grid as base

     • Aligns with climate data resolution

     • Overlays with regional boundaries

     • Stores both 'cells' and 'boundary' in HDF5

Resolution Considerations#

  • Minimum recommended: 0.25° (matching ERA5 resolution)

  • Harmonized resolutions required for interpolation of climate data

  • Coarser resolutions may miss local variations in resource quality

  • Square cells assumed (dx = dy) for geometric consistency

  • Resolution validation available via _check_resolution() method
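The coordinate-to-cell construction can be sketched as follows (a minimal illustration assuming a bounding box dict and dx = dy resolution, not the exact internals of __get_grid__()):

>>> import numpy as np
>>> import geopandas as gpd
>>> from shapely.geometry import box
>>> bounding_box = {"minx": -139.0, "maxx": -114.0, "miny": 48.0, "maxy": 60.0}
>>> resolution = {"dx": 0.25, "dy": 0.25}  # matches ERA5 resolution
>>> xs = np.arange(bounding_box["minx"], bounding_box["maxx"], resolution["dx"])
>>> ys = np.arange(bounding_box["miny"], bounding_box["maxy"], resolution["dy"])
>>> cells = gpd.GeoDataFrame(
...     geometry=[box(x, y, x + resolution["dx"], y + resolution["dy"])
...               for y in ys for x in xs],
...     crs="EPSG:4326",
... )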

Dependencies#

  • geopandas: Spatial data manipulation

  • numpy: Numerical operations for coordinate generation

  • shapely.geometry.box: Grid cell geometry creation

  • RES.AttributesParser: Parent class for configuration management

  • RES.boundaries.GADMBoundaries: Regional boundary processing

  • RES.era5_cutout.ERA5Cutout: Climate data cutout handling

  • RES.hdf5_handler.DataHandler: HDF5 data storage interface

  • RES.utility: Utility functions for cell ID assignment and logging

generate_coords()[source]
get_default_grid()[source]

Handles creation of regular spatial grids for renewable energy assessment with configurable resolution and boundary constraints.

Cell Capacity Processing#

Grid cell processing capabilities for spatial analysis.

class RES.CellCapacityProcessor.CellCapacityProcessor(config_file_path: Path = None, region_short_code: str = None, resource_type: str = None)[source]

Bases: AttributesParser

Renewable energy capacity processor for grid cell-based resource assessment.

This class processes renewable energy potential capacity at the grid cell level by integrating climate data, land availability constraints, and techno-economic parameters. It calculates potential capacity matrices for solar and wind resources, applies land-use exclusions, and generates cost-attributed capacity datasets for energy system modeling.

The class serves as the core processing engine for renewable energy resource assessment, combining spatial analysis, climate data processing, and economic modeling to produce grid cell-level capacity estimates suitable for energy planning and optimization models.

INHERITED METHODS FROM AttributesParser:#

  • get_resource_disaggregation_config() -> Dict[str, dict]: Get resource-specific config

  • get_cutout_config() -> Dict[str, dict]: Get ERA5 cutout configuration

  • get_gadm_config() -> Dict[str, dict]: Get GADM boundary configuration

  • get_region_name() -> str: Get region name from config

  • get_atb_config() -> Dict[str, dict]: Get NREL ATB cost configuration

  • get_default_crs() -> str: Get default coordinate reference system

INHERITED ATTRIBUTES FROM AttributesParser:#

  • config (property): Full configuration dictionary

  • store (property): HDF5 store path for data persistence

  • config_file_path: Path to configuration file

  • region_short_code: Region identifier code

  • resource_type: Resource type identifier

  • Plus other configuration access methods

OWN METHODS DEFINED IN THIS CLASS:#

  • load_cost(resource_atb): Extract and process cost parameters from ATB data

  • __get_unified_region_shape__(): Create unified regional boundary geometry (private)

  • __create_cell_geom__(x, y): Create grid cell geometry from coordinates (private)

  • get_capacity(): Main method to process and calculate renewable energy capacity

  • plot_ERAF5_grid_land_availability(): Visualize land availability on ERA5 grid

  • plot_excluder_land_availability(): Visualize land availability at excluder resolution

Parameters:
  • config_file_path (str or Path) -- Path to configuration file containing processing settings

  • region_short_code (str) -- Region identifier for boundary and data processing

  • resource_type (str) -- Resource type ('solar', 'wind', or 'bess')

ERA5Cutout

ERA5 climate data cutout processor instance

Type:

ERA5Cutout

LandContainer

Land exclusion and constraint processor instance

Type:

LandContainer

resource_disaggregation_config

Resource-specific disaggregation configuration

Type:

dict

resource_landuse_intensity

Land-use intensity for capacity calculation (MW/km²)

Type:

float

atb

NREL Annual Technology Baseline cost data processor

Type:

NREL_ATBProcessor

datahandler

HDF5 data storage interface

Type:

DataHandler

cutout_config

ERA5 cutout configuration parameters

Type:

dict

gadm_config

GADM boundary configuration parameters

Type:

dict

disaggregation_config

General disaggregation configuration

Type:

dict

region_name

Full region name from configuration

Type:

str

utility_pv_cost

Utility-scale PV cost data from NREL ATB

Type:

pd.DataFrame

land_based_wind_cost

Land-based wind cost data from NREL ATB

Type:

pd.DataFrame

composite_excluder

Combined land exclusion container from atlite

Type:

ExclusionContainer

cell_resolution

Grid cell resolution in degrees

Type:

float

cutout

ERA5 cutout object with climate data

Type:

atlite.Cutout

region_boundary

Regional boundary geometry

Type:

gpd.GeoDataFrame

region_shape

Unified regional shape for availability calculations

Type:

gpd.GeoDataFrame

Availability_matrix

Land availability matrix from atlite

Type:

xr.DataArray

capacity_matrix

Potential capacity matrix with resource and land constraints

Type:

xr.DataArray

provincial_cells

Final processed grid cells with capacity and cost attributes

Type:

gpd.GeoDataFrame

resource_capex

Capital expenditure cost (million $/MW)

Type:

float

resource_fom

Fixed operation and maintenance cost (million $/MW)

Type:

float

resource_vom

Variable operation and maintenance cost (million $/MW)

Type:

float

grid_connection_cost_per_km

Grid connection cost per kilometer (million $)

Type:

float

tx_line_rebuild_cost

Transmission line rebuild cost (million $)

Type:

float

load_cost(resource_atb: pd.DataFrame) → tuple[source]

Extract cost parameters from NREL ATB data and convert units

__get_unified_region_shape__() → gpd.GeoDataFrame[source]

Create unified regional boundary by dissolving sub-regional boundaries

__create_cell_geom__(x: float, y: float) → Polygon[source]

Create square grid cell geometry from center coordinates

get_capacity() → tuple[gpd.GeoDataFrame, xr.DataArray][source]

Main processing method to calculate renewable energy capacity with constraints

plot_ERAF5_grid_land_availability(...) → matplotlib.figure.Figure[source]

Create visualization of land availability on ERA5 grid resolution

plot_excluder_land_availability(...) → matplotlib.figure.Figure[source]

Create visualization of land availability at excluder resolution

Examples

Process solar capacity for British Columbia:

>>> from RES.CellCapacityProcessor import CellCapacityProcessor
>>> processor = CellCapacityProcessor(
...     config_file_path="config/config_BC.yaml",
...     region_short_code="BC",
...     resource_type="solar"
... )
>>> cells_gdf, capacity_matrix = processor.get_capacity()
>>> print(f"Processed {len(cells_gdf)} cells with total capacity: "
...       f"{cells_gdf['potential_capacity_solar'].sum():.1f} MW")

Process wind capacity with visualization:

>>> processor = CellCapacityProcessor(
...     config_file_path="config/config_AB.yaml",
...     region_short_code="AB",
...     resource_type="wind"
... )
>>> cells_gdf, capacity_matrix = processor.get_capacity()
>>> # Visualizations are automatically generated and saved

Extract cost parameters:

>>> capex, vom, fom, grid_cost, tx_cost = processor.load_cost(
...     processor.utility_pv_cost
... )
>>> print(f"Solar CAPEX: {capex:.3f} million $/MW")

Notes

  • Integrates climate data from ERA5 via atlite cutouts

  • Applies land-use constraints via ExclusionContainer

  • Converts NREL ATB costs from $/kW to million $/MW

  • Creates square grid cells based on ERA5 resolution (~30km at 0.25°)

  • Supports solar, wind, and battery energy storage systems (BESS)

  • Automatically generates land availability visualizations

  • Uses HDF5 storage for efficient data persistence

  • Grid cells are trimmed to exact regional boundaries

  • Assigns unique cell IDs for downstream processing

  • Cost parameters include CAPEX, FOM, VOM, and transmission costs

Processing Workflow#

  1. Load ERA5 cutout and regional boundaries

  2. Set up land exclusion constraints

  3. Extract cost parameters from NREL ATB

  4. Calculate availability matrix with land constraints

  5. Apply land-use intensity to compute capacity matrix

  6. Convert to GeoDataFrame with cell geometries

  7. Assign static cost parameters to each cell

  8. Trim cells to precise regional boundaries

  9. Generate visualizations and store results
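Steps 2, 4, and 5 lean on atlite's exclusion machinery; a minimal sketch of that pattern (assuming a prepared atlite cutout, a region_shape GeoDataFrame, and hypothetical raster path, area, and intensity values; not RESource's exact configuration):

>>> from atlite.gis import ExclusionContainer
>>> excluder = ExclusionContainer(crs=3347)  # example equal-area CRS for Canada
>>> excluder.add_raster("data/gaez/land_cover.tif", codes=[1, 2], invert=True)  # hypothetical raster
>>> A = cutout.availabilitymatrix(region_shape, excluder)  # available fraction per cell
>>> capacity_matrix = A * cell_area_km2 * landuse_intensity  # MW per cell, with MW/km² intensity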

Cost Parameter Processing#

  • CAPEX: Capital expenditure (converted from $/kW to million $/MW)

  • FOM: Fixed operation & maintenance (million $/MW annually)

  • VOM: Variable operation & maintenance (million $/MWh, if applicable)

  • Grid connection: Cost per kilometer for grid connection

  • Transmission rebuild: Cost for transmission line upgrades

  • Operational life: Asset lifetime (25 years solar, 20 years wind)
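The $/kW to million $/MW conversion noted above reduces to dividing by 1000 (a worked example with a hypothetical ATB value):

>>> capex_usd_per_kw = 1300.0  # hypothetical utility-PV CAPEX from ATB
>>> capex_usd_per_kw * 1000 / 1e6  # $/kW → $/MW → million $/MW
1.3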

Dependencies#

  • geopandas: Spatial data manipulation

  • xarray: Multi-dimensional array operations

  • pandas: Data frame operations

  • shapely.geometry: Geometric operations

  • matplotlib.pyplot: Visualization

  • atlite: Climate data processing and exclusions

  • RES.AttributesParser: Parent class for configuration management

  • RES.lands.LandContainer: Land constraint processing

  • RES.era5_cutout.ERA5Cutout: Climate data cutout handling

  • RES.hdf5_handler.DataHandler: HDF5 data storage

  • RES.atb.NREL_ATBProcessor: Cost data processing

  • RES.utility: Utility functions for cell operations

Raises:
  • KeyError -- If required configuration parameters are missing

  • ValueError -- If resource type is not supported or data processing fails

  • FileNotFoundError -- If configuration files or data dependencies are not found

get_capacity() → tuple[source]

Processes the capacity of the resources based on the availability matrix and other parameters. It calculates the potential capacity for each cell in the region.

Returns:

A named tuple containing the processed data and the capacity matrix, accessible as <self.resources_nt>.data and <self.resources_nt>.matrix.

Return type:

namedtuple

load_cost(resource_atb: DataFrame)[source]

Extracts cost parameters from the NREL ATB DataFrame and converts them to million $/MW.

Parameters:

resource_atb (pd.DataFrame) -- DataFrame containing NREL ATB cost data for the resource type.

Returns:

A dictionary containing the following cost parameters:
  • resource_capex: Capital expenditure in million $/MW

  • resource_vom: Variable operation and maintenance cost in million $/MW

  • resource_fom: Fixed operation and maintenance cost in million $/MW

  • grid_connection_cost_per_km: Grid connection cost per kilometer in million $

  • tx_line_rebuild_cost: Transmission line rebuild cost in million $

Return type:

dict

plot_ERAF5_grid_land_availability(region_boundary: GeoDataFrame = None, Availability_matrix: DataArray = None, figsize=(8, 6), legend_box_x_y: tuple = (1.2, 1))[source]

Plots the land availability based on the ERA5 grid cells.

Parameters:
  • region_boundary (gpd.GeoDataFrame, optional) -- The region boundary to plot. If not provided, the default region boundary will be used.

  • Availability_matrix (xr.DataArray, optional) -- The availability matrix to plot. If not provided, the default Availability matrix will be used.

  • figsize (tuple, optional) -- The size of the figure to create. Defaults to (8, 6).

  • legend_box_x_y (tuple, optional) -- The position of the legend box in the plot. Defaults to (1.2, 1).

Returns:

The figure object containing the plot.

Return type:

fig (matplotlib.figure.Figure)

plot_excluder_land_availability(excluder: ExclusionContainer = None)[source]

Plots the land availability based on the excluder resolution.

Parameters:

excluder (ExclusionContainer, optional) -- The excluder to use for plotting.

Returns:

The figure object containing the plot

Return type:

fig (matplotlib.figure.Figure)

Note

If the above documentation doesn't render, this class provides grid cell processing capabilities for spatial analysis.

Administrative Boundaries#

GADM boundary processor for regional analysis.

class RES.boundaries.GADMBoundaries(config_file_path: Path = None, region_short_code: str = None, resource_type: str = None)[source]

Bases: AttributesParser

GADM (Global Administrative Areas) boundary processor for regional analysis.

This class handles the retrieval, processing, and management of administrative boundaries from the GADM dataset. It provides functionality to extract specific regional boundaries at administrative level 2 (typically states/provinces/districts) for renewable energy resource assessment areas.

INHERITED METHODS FROM AttributesParser:#

  • get_gadm_config() -> Dict[str, dict]: Get GADM configuration from config file

  • get_default_crs() -> str: Get default coordinate reference system ('EPSG:4326')

  • get_country() -> str: Get country name from config file

  • get_region_name() -> str: Get region name from config file using region_short_code

  • get_region_mapping() -> Dict[str, dict]: Get region mapping dictionary

  • is_region_code_valid() -> bool: Validate region short code

  • load_config() -> Dict[str, dict]: Load YAML configuration file

  • get_excluder_crs() -> int: Get recommended CRS for excluder operations

  • get_vis_dir() -> Path: Get visualization directory path

  • region_code_validity (property): Boolean property for region code validation

  • Plus other utility methods for config access

OWN METHODS DEFINED IN THIS CLASS:#

  • get_country_boundary(country=None, force_update=False): Download and process complete country GADM boundaries

  • get_region_boundary(region_name=None, force_update=False): Extract and process specific regional boundary

  • get_bounding_box(): Generate minimum bounding rectangle for region

  • show_regions(basemap='CartoDB positron', save_path='vis/regions', save=False): Create interactive map visualization

  • run(): Execute complete boundary processing workflow

Parameters:
  • config_file_path (str or Path) -- Path to configuration file containing GADM settings

  • region_short_code (str) -- Short code identifying the target region within the country

  • resource_type (str) -- Resource type (passed through from parent workflow)

admin_level

GADM administrative level (fixed at 2 for regional districts)

Type:

int

gadm_root

Root directory for GADM data storage

Type:

Path

gadm_processed

Directory for processed regional boundary files

Type:

Path

crs

Coordinate reference system ('EPSG:4326')

Type:

str

country

Country name extracted from configuration

Type:

str

region_file

Path to processed regional boundary file

Type:

Path

boundary_datafields

Mapping of GADM fields to standardized field names

Type:

dict

country_file

Path to country-level GADM boundary file

Type:

Path

boundary_country

GeoDataFrame containing country-level boundaries

Type:

gpd.GeoDataFrame

boundary_region

GeoDataFrame containing region-specific boundaries

Type:

gpd.GeoDataFrame

actual_boundary

GeoDataFrame containing the actual regional boundary geometry

Type:

gpd.GeoDataFrame

bounding_box

Dictionary containing bounding box coordinates (minx, maxx, miny, maxy)

Type:

dict

get_country_boundary(country=None, force_update=False) → gpd.GeoDataFrame[source]

Download and process complete country GADM boundaries at administrative level 2

get_region_boundary(region_name=None, force_update=False) → gpd.GeoDataFrame[source]

Extract and process specific regional boundary with standardized field names

get_bounding_box() → tuple[source]

Generate minimum bounding rectangle for region and return (bounding_box_dict, boundary_gdf)

show_regions(basemap='CartoDB positron', save_path='vis/regions', save=False) → folium.Map[source]

Create interactive folium map visualization of regional boundaries

run() → gpd.GeoDataFrame or None[source]

Execute complete boundary processing workflow and return regional boundary GeoDataFrame

Examples

Extract British Columbia boundaries:

>>> from RES.boundaries import GADMBoundaries
>>> boundaries = GADMBoundaries(
...     config_file_path="config/config_BC.yaml",
...     region_short_code="BC",
...     resource_type="wind"
... )
>>> bc_boundary = boundaries.get_region_boundary()
>>> country_bounds = boundaries.get_country_boundary("Canada")
>>> bbox, actual_boundary = boundaries.get_bounding_box()
>>> interactive_map = boundaries.show_regions(save=True)
>>> result = boundaries.run()  # Execute full workflow

Notes

  • Uses pygadm package for GADM data access and download

  • Automatically handles data caching to avoid repeated downloads

  • Processes boundaries into GeoJSON format for efficient storage

  • Standardizes field names for consistent downstream processing

  • Administrative level 2 chosen to balance spatial resolution with data availability

  • All geometries maintained in WGS84 (EPSG:4326) for global compatibility

  • Region validation is performed using inherited region_code_validity property

  • Interactive maps are created using folium with optional save functionality

Dependencies#

  • pygadm: GADM data access and processing

  • geopandas: Spatial data manipulation

  • folium: Interactive map visualization (via geopandas.explore())

  • pathlib: Path handling

  • RES.AttributesParser: Parent class for configuration management

  • RES.utility: Utility functions for logging and updates

Raises:
  • ValueError -- If the country is not found in the GADM dataset or if the region code is invalid

  • Exception -- If there is an error fetching or loading the GADM data

get_bounding_box() → tuple[source]

This method loads the region boundary using the get_region_boundary() method and computes its Minimum Bounding Rectangle (MBR).

Returns:

A tuple containing the dictionary of bounding box coordinates, and the actual boundary GeoDataFrame for the specified region.

Return type:

tuple

Purpose:

To be used internally to get the bounding box of the region to set ERA5 cutout boundaries.
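The MBR extraction itself is a one-liner in geopandas (a minimal sketch assuming a hypothetical boundary GeoDataFrame, not the method's exact internals):

>>> minx, miny, maxx, maxy = boundary_region.total_bounds  # hypothetical GeoDataFrame
>>> bounding_box = {"minx": minx, "maxx": maxx, "miny": miny, "maxy": maxy}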

get_country_boundary(country: str = None, force_update: bool = False) → GeoDataFrame[source]

Retrieves and prepares the GADM boundaries dataset for the specified country (Administrative Level 2).

Parameters:
  • country (str) -- The name of the country to fetch GADM data for. If None, extracts the country from the user config file.

  • force_update (bool) -- If True, re-fetch the GADM data even if a local file exists.

Returns:

GeoDataFrame of the country's GADM regions in CRS 'EPSG:4326'

Return type:

gpd.GeoDataFrame

Dependency:

Depends on pygadm package to fetch the GADM data.

Raises:
  • ValueError -- If the country is not found in the GADM dataset.

  • Exception -- If there is an error fetching or loading the GADM data.

get_region_boundary(region_name: str = None, force_update: bool = False) → GeoDataFrame[source]

Prepares the boundaries for the specified region within the country. The default datafields (e.g. NAME_0, NAME_1, NAME_2) get renamed to match the user config file.

Parameters:

force_update (bool) -- To force update the data and formatting.

Returns:

GeoDataFrame of the region boundaries.

Return type:

gpd.GeoDataFrame

Raises:

ValueError -- If the region code is invalid or no data is found for the specified region

run()[source]

Executes the process of extracting boundaries and creating an interactive map. To be used as a main method to run the class's sequential tasks.

show_regions(basemap: str = 'CartoDB positron', save_path: str = 'vis/regions', save: bool = False)[source]

Create and save an interactive map for the specified region.

Parameters:
  • basemap (str) -- The basemap to use (default is 'CartoDB positron').

  • save_path (str) -- The path to save the HTML map. The default is given.

  • save (bool) -- Whether to save the map as a local HTML file.

Returns:

An interactive map object showing the region boundaries.

Return type:

folium.Map

Note

If the above documentation doesn't render properly due to geospatial dependency issues, the GADMBoundaries class provides:

  • get_country_boundary(country, force_update=False): Download complete country boundaries

  • get_region_boundary(region_name, force_update=False): Extract specific regional boundary

  • get_bounding_box(): Generate spatial extent calculations

Downloads and processes administrative boundaries from the Global Administrative Areas (GADM) dataset for spatial analysis scope definition.

Climate Data Processing#

Weather data processing and capacity factor calculations.

class RES.timeseries.Timeseries(config_file_path: Path = None, region_short_code: str = None, resource_type: str = None)[source]

Bases: AttributesParser

Climate data processor and capacity factor calculator for renewable energy resources.

This class handles the extraction, processing, and analysis of meteorological time series data to generate capacity factor profiles for solar and wind resources. It integrates with the Atlite library for climate data processing and provides technology-specific capacity factor calculations based on configurable turbine and panel specifications.

The class processes hourly weather data into capacity factors that represent the fraction of nameplate capacity that can be generated under specific meteorological conditions, accounting for technology performance curves and environmental constraints.

Parameters:
  • config_file_path (str or Path) -- Path to configuration file containing resource and technology settings

  • region_short_code (str) -- Region identifier for spatial data coordination

  • resource_type ({'solar', 'wind'}) -- Type of renewable resource for capacity factor calculation

resource_disaggregation_config

Technology-specific configuration parameters from config file

Type:

dict

datahandler

HDF5 interface for time series data storage and retrieval

Type:

DataHandler

gwa_cells

Global Wind Atlas integration for wind resource bias correction

Type:

GWACells

sites_profile

Raw capacity factor time series for all grid cells

Type:

xarray.DataArray

_CF_ts_df_

Processed time series with cells as columns, time as index

Type:

pandas.DataFrame

get_timeseries(cells)[source]

Generate capacity factor time series for specified grid cells

__process_PV_timeseries__(cells)[source]

Calculate solar PV capacity factors using irradiance and temperature

__process_WIND_timeseries__(cells, turbine_database, turbine_id)[source]

Calculate wind capacity factors using wind speeds and power curves

plot_timeseries_comparison(cells_sample, save_path=None)

Generate comparative plots of capacity factor profiles

get_annual_statistics(cells_timeseries)

Calculate annual capacity factor statistics and metrics

Examples

Generate solar PV time series:

>>> from RES.timeseries import Timeseries
>>> ts_processor = Timeseries(
...     config_file_path="config/config_BC.yaml",
...     region_short_code="BC",
...     resource_type="solar"
... )
>>> cells = builder.get_grid_cells()  # from the RESources_builder workflow step
>>> results = ts_processor.get_timeseries(cells)
>>> cf_timeseries = results.timeseries_df

Wind resource processing with turbine selection:

>>> ts_processor = Timeseries(
...     config_file_path="config/config.yaml",
...     region_short_code="AB",
...     resource_type="wind"
... )
>>> # Turbine parameters defined in configuration
>>> results = ts_processor.get_timeseries(wind_cells)

Time series analysis and visualization:

>>> annual_stats = ts_processor.get_annual_statistics(cf_timeseries)
>>> ts_processor.plot_timeseries_comparison(sample_cells, "output/plots/")

Notes

  • Uses Atlite library for meteorological data processing

  • Solar calculations account for panel orientation, tilt, and temperature effects

  • Wind calculations use power curves from turbine databases (OEDB, manufacturer specs)

  • Time series generated at hourly resolution for full assessment years

  • Global Wind Atlas corrections applied for improved wind speed accuracy

  • Results cached in HDF5 format for efficient reuse and large dataset handling

  • Supports both fixed-tilt and tracking solar PV configurations

  • Wind power curves interpolated for continuous wind speed ranges

Technology Integration#

Solar PV:
  • Irradiance-based capacity factor calculation

  • Temperature derating effects

  • Configurable panel specifications (efficiency, temperature coefficients)

  • Support for fixed tilt and single-axis tracking

Wind:
  • Power curve-based capacity factor calculation

  • Hub height wind speed extrapolation

  • Turbine database integration (OEDB standard)

  • Wake effects and array losses configurable
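Under the hood these map onto atlite's conversion methods; a minimal sketch (assuming a prepared atlite cutout; the panel and turbine identifiers are illustrative atlite resource names, not RESource configuration):

>>> # Static (mean) capacity factors per grid cell
>>> solar_cf = cutout.pv(panel="CSi", orientation="latitude_optimal", capacity_factor=True)
>>> wind_cf = cutout.wind(turbine="Vestas_V112_3MW", capacity_factor=True)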

Data Dependencies#

  • ERA5 reanalysis data for meteorological variables

  • Global Wind Atlas for wind speed bias correction (wind only)

  • Technology databases for turbine and panel specifications

get_cluster_timeseries(all_clusters: DataFrame, cells_timeseries: DataFrame, dissolved_indices: DataFrame, sub_national_unit_tag: str)[source]
get_gwa_geogson_data(gwa_geojson_file_path: str | Path = None)[source]

Loads Global Wind Atlas (GWA) GeoJSON data from the specified file path. If no file path is provided, a default path to 'data/downloaded_data/GWA/canada.geojson' is used. If the file does not exist at the specified or default location, a message is printed informing the user to download the required GIS map data.

Parameters:

gwa_geojson_file_path (str | Path, optional) -- The file path to the GWA GeoJSON file. Defaults to None, which uses the predefined default path.

gwa_geojson_data

The loaded GeoJSON data from the specified file.

Type:

list

Raises:

FileNotFoundError -- If the specified or default GeoJSON file does not exist.


get_timeseries(cells: GeoDataFrame) → tuple[source]

Retrieves the capacity factor (CF) timeseries for the cells.

Parameters:
  • cells (gpd.GeoDataFrame) -- Cells with their coordinates, geometry, and unique cell ids.

  • force_update (bool) -- If True, forces the update of the CF timeseries data.

Returns:

A namedtuple containing the cells with their timeseries data.

Return type:

tuple

Jobs:
  • Extract time series information for the cells, e.g. static CF (yearly mean) and hourly time series.

  • The timeseries data is generated using the atlite library's cutout methods for solar and wind resources.

  • The method processes the timeseries data for the specified resource type (solar or wind) and stores it in a pandas DataFrame.

Notes

Plug in multiple sources to fit timeseries data, e.g. NSRDB (NREL): https://nsrdb.nrel.gov/data-sets/how-to-access-data

get_windatlas_data(gwa_windspeed_raster_path: str | Path = None)[source]

Retrieves wind atlas data from a specified raster file or a default path.

If a raster file path is not provided, a default path is used. If the file does not exist at the default path, it prepares the necessary data. The method then loads and returns the wind speed data from the raster file.

Parameters:

gwa_windspeed_raster_path (str | Path, optional) -- The file path to the wind speed raster file. Defaults to None.

Returns:

The loaded wind speed data from the raster file.

Return type:

numpy.ndarray

get_windspeed_rescaling_data() → tuple[source]

Retrieves wind speed rescaling data, including wind atlas data and geographical wind data in GeoJSON format.

Returns:

A tuple containing:
  • wind_atlas (type depends on get_windatlas_data method): The wind atlas data.

  • wind_geojson (type depends on get_gwa_geogson_data method): The GIS wind data in GeoJSON format.

Return type:

tuple

Integrates with Atlite library for climate data processing and generates technology-specific capacity factor time series from meteorological data.

Economic Evaluation#

LCOE-based economic scoring and site ranking.

class RES.score.CellScorer(config_file_path: Path = None, region_short_code: str = None, resource_type: str = None)[source]

Bases: AttributesParser

Economic evaluation and scoring system for renewable energy grid cells.

This class implements Levelized Cost of Energy (LCOE) calculations to economically rank and score potential renewable energy sites. It integrates capital costs, operational expenses, grid connection costs, and capacity factors to provide comprehensive economic metrics for site comparison and selection.

The scoring methodology follows NREL LCOE documentation and incorporates distance-based grid connection costs, technology-specific capital expenditures, and site-specific capacity factors to generate comparable economic indicators across different locations and technologies.

Parameters:
  • config_file_path (str or Path) -- Configuration file containing economic parameters and assumptions

  • region_short_code (str) -- Region identifier for localized cost parameters

  • resource_type ({'solar', 'wind'}) -- Technology type for appropriate cost parameter selection

Inherits configuration parsing capabilities from AttributesParser.

get_CRF(r, N)[source]

Calculate Capital Recovery Factor for annualized cost calculations

calculate_total_cost(distance_to_grid_km, grid_connection_cost_per_km, tx_line_rebuild_cost, capex_tech, potential_capacity_mw)

Compute total project costs including CAPEX and grid connection

calculate_score(row, CF_column, CRF)[source]

Generate LCOE score for individual grid cells

get_cell_score(cells, CF_column, interest_rate=0.03)[source]

Apply economic scoring to entire dataset of grid cells

calc_LCOE_lambda_m1(row)

Alternative LCOE calculation method following NREL methodology

calc_LCOE_lambda_m2(row)

Enhanced LCOE calculation with detailed cost components

Examples

Basic economic scoring workflow:

>>> from RES.score import CellScorer
>>> scorer = CellScorer(
...     config_file_path="config/config_CAN.yaml",
...     region_short_code="BC",
...     resource_type="wind"
... )
>>> scored_cells = scorer.get_cell_score(cells_with_capacity_factors, 'CF_mean')
>>> # Get top 10% of cells by LCOE
>>> top_sites = scored_cells.head(int(len(scored_cells) * 0.1))

Custom economic analysis:

>>> # Calculate CRF for different financial scenarios
>>> crf_conservative = scorer.get_CRF(r=0.08, N=25)  # 8% discount, 25 year life
>>> crf_aggressive = scorer.get_CRF(r=0.06, N=30)    # 6% discount, 30 year life
>>>
>>> # Apply scoring with custom parameters
>>> for idx, row in cells.iterrows():
...     lcoe = scorer.calculate_score(row, 'CF_mean', crf_conservative)

Notes

LCOE Calculation Methodology:
  • Follows NREL Simple LCOE calculation framework

  • LCOE = (CAPEX × CRF + OPEX) / Annual Energy Production

  • Includes distance-based grid connection costs

  • Uses technology-specific cost parameters from configuration

Cost Components:
  • Technology CAPEX ($/MW installed capacity)

  • Grid connection costs ($/km distance to transmission)

  • Transmission line rebuild costs ($/km)

  • Annual O&M expenses (% of CAPEX)

  • Financial parameters (discount rate, project lifetime)

Economic Parameters:
  • Capital Recovery Factor (CRF) for cost annualization

  • Technology-specific cost assumptions

  • Regional cost multipliers and adjustments

  • Grid connection distance penalties

Limitations:
  • Simplified LCOE model without detailed financial modeling

  • Grid connection costs based on straight-line distances

  • Does not account for economies of scale in large projects

  • Static cost assumptions without temporal price variations
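A worked example of the simple LCOE formula above, with hypothetical inputs (not RESource defaults):

>>> capex = 1.5e6  # $ per MW installed (hypothetical)
>>> crf = 0.0858  # from get_CRF(r=0.07, N=25)
>>> fom = 30000.0  # $ per MW-year fixed O&M (hypothetical)
>>> cf = 0.35  # mean capacity factor
>>> annual_energy = cf * 8760  # MWh per MW-year
>>> round((capex * crf + fom) / annual_energy, 2)  # LCOE in $/MWh
51.76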

calculate_score(row: Series, node_distance_col: str, CF_column: str, CRF: float) → float[source]

Calculate the Levelized Cost of Energy (LCOE) score for an individual grid cell.

LCOE Formula: LCOE = (CAPEX × CRF + FOM + VOM × Annual_Energy) / Annual_Energy

Parameters:
  • row (pd.Series) -- DataFrame row containing cell-specific data

  • node_distance_col (str) -- Column name for distance to grid connection

  • CF_column (str) -- Column name containing capacity factor data

  • CRF (float) -- Capital Recovery Factor for cost annualization

Returns:

LCOE in $/MWh, or 999999 if annual energy production is zero

Return type:

float

calculate_score_debug(row: Series, node_distance_col: str, CF_column: str, CRF: float) → dict[source]

Debug version that returns breakdown of LCOE components. Use this to identify why larger sites get lower scores.

calculate_score_normalized(row: Series, node_distance_col: str, CF_column: str, CRF: float, reference_capacity: float = 1.0) → float[source]

Calculate LCOE using a normalized capacity for fair comparison in clustering.

Parameters:

reference_capacity (float) -- Fixed capacity to use for cost calculations (MW)

calculate_score_per_mw(row: Series, node_distance_col: str, CF_column: str, CRF: float) → float[source]

Calculate LCOE per MW for capacity-independent comparison.

calculate_total_cost(distance_to_grid_km: float, grid_connection_cost_per_km: float, tx_line_rebuild_cost: float, capex_tech: float, potential_capacity_mw: float) → float[source]

Calculate total project cost with economies of scale for grid connection.

calculate_total_cost_shared_infrastructure(distance_to_grid_km: float, grid_connection_cost_per_km: float, tx_line_rebuild_cost: float, capex_tech: float, potential_capacity_mw: float, nearby_projects_mw: float = 0) → float[source]

Calculate total cost considering potential for shared transmission infrastructure. This is most relevant for clustering applications.

calculate_total_cost_smooth_scaling(distance_to_grid_km: float, grid_connection_cost_per_km: float, tx_line_rebuild_cost: float, capex_tech: float, potential_capacity_MW: float, reference_capacity_MW: float = 100, scaling_exponent: float = 0.8) → float[source]

Calculate total cost with smooth economies of scale for grid connection.

Parameters:
  • distance_to_grid_km (float) -- Distance to nearest grid connection point (km)

  • grid_connection_cost_per_km (float) -- Cost per km for grid connection (M$/km)

  • tx_line_rebuild_cost (float) -- Transmission line rebuild cost (M$/km)

  • capex_tech (float) -- Technology-specific capital expenditure (M$/MW)

  • potential_capacity_MW (float) -- Potential installed capacity (MW)

  • reference_capacity_MW (float, optional) -- Reference capacity for scaling. Defaults to 100 MW.

  • scaling_exponent (float, optional) -- Scaling exponent; values < 1 imply economies of scale. Defaults to 0.8.

Returns:

Total project cost in millions of dollars (M$)

Return type:

float

calculate_total_cost_transmission_sizing(distance_to_grid_km: float, grid_connection_cost_per_km: float, tx_line_rebuild_cost: float, capex_tech: float, potential_capacity_mw: float) → float[source]

Calculate total cost with transmission line sizing based on capacity. More realistic approach considering actual transmission requirements.

get_CRF(r: float, N: int) → float[source]

Calculate Capital Recovery Factor (CRF) for annualized cost calculations.

The CRF converts a present-value capital cost into a stream of equal annual payments over the project lifetime. This is essential for LCOE calculations as it allows comparison of projects with different capital costs and lifetimes on an annualized basis.

Formula: CRF = [r × (1 + r)^N] / [(1 + r)^N - 1]

Parameters:
  • r (float) -- Discount rate (as decimal, e.g., 0.08 for 8%)

  • N (int) -- Project lifetime in years

Returns:

Capital Recovery Factor

Return type:

float

Example

>>> scorer = CellScorer(**config)
>>> crf = scorer.get_CRF(r=0.07, N=25)  # 7% discount, 25 years
>>> print(f"CRF: {crf:.4f}")
CRF: 0.0858

get_cell_score(cells: DataFrame, CF_column: str, interest_rate=0.03) → DataFrame[source]

Calculate LCOE scores for all grid cells in a DataFrame and return ranked results.

This method applies economic scoring to an entire dataset of potential renewable energy sites, calculating LCOE for each cell and sorting results by economic attractiveness. It serves as the primary interface for batch economic analysis of renewable energy development opportunities.

Processing Steps:
  1. Calculate Capital Recovery Factor from financial parameters

  2. Apply LCOE calculation to each grid cell

  3. Sort results by LCOE (ascending = most economically attractive first)

  4. Return scored and ranked DataFrame

Parameters:
  • cells (pd.DataFrame) -- DataFrame containing grid cells with required columns:
    • nearest_station_distance_km: Distance to transmission (km)

    • grid_connection_cost_per_km_{resource_type}: Connection cost (M$/km)

    • tx_line_rebuild_cost_{resource_type}: Rebuild cost (M$/km)

    • capex_{resource_type}: Technology CAPEX (M$/MW)

    • potential_capacity_{resource_type}: Installable capacity (MW)

    • Operational_life_{resource_type}: Project lifetime (years)

  • CF_column (str) -- Column name containing capacity factor data (e.g., 'CF_mean', 'wind_CF_mean', 'solar_CF_mean')

  • interest_rate (float, optional) -- Discount rate for CRF calculation. Defaults to 0.03 (3%)

Returns:

Input DataFrame with added LCOE column, sorted by economic attractiveness (lowest LCOE first). Column name format: 'lcoe_{resource_type}' with values in $/MWh.

Return type:

pd.DataFrame

Raises:
  • KeyError -- If required columns are missing from input DataFrame

  • ValueError -- If capacity factors or operational life contain invalid values

Examples

>>> # Score wind energy sites using mean capacity factors
>>> wind_cells = scorer.get_cell_score(
...     cells=grid_data,
...     CF_column='wind_CF_mean',
...     interest_rate=0.07
... )
>>> print(f"Best site LCOE: ${wind_cells.iloc[0]['lcoe_wind']:.2f}/MWh")
>>> # Score solar sites with conservative financial assumptions
>>> solar_cells = scorer.get_cell_score(
...     cells=grid_data,
...     CF_column='solar_CF_mean',
...     interest_rate=0.08
... )

Notes

  • Cells with zero annual energy production receive infinite LCOE values

  • Results are sorted ascending (lowest LCOE = most attractive)

  • Method handles edge cases like zero capacity factors gracefully

  • LCOE values are in $/MWh for standard industry comparison

Implements Levelized Cost of Energy calculations following NREL methodology, incorporating capital costs, grid connection expenses, and capacity factors.

Specialized Modules#

Annual Technology Baseline (ATB)#

Processor for NREL's Annual Technology Baseline data.

class RES.atb.NREL_ATBProcessor(config_file_path: pathlib.Path = <factory>, region_short_code: str = 'None', resource_type: str = 'None')[source]

Bases: object

NREL_ATBProcessor is a class from RESource module, designed to process the Annual Technology Baseline (ATB) data sourced from the National Renewable Energy Laboratory (NREL). This class provides methods to pull, process, and store cost data for various renewable energy technologies, including utility-scale photovoltaic (PV) systems, land-based wind turbines, and battery energy storage systems (BESS).

config_file_path

Path to the configuration file.

Type:

Path

region_short_code

Short code for the region.

Type:

str

resource_type

Type of resource being processed.

Type:

str

atb_config

Configuration dictionary containing paths and settings for ATB data.

Type:

dict

atb_data_save_to

Path to the directory where ATB data will be saved.

Type:

Path

atb_parquet_source

Source URL or path for the ATB Parquet file.

Type:

str

atb_datafile

Name of the ATB data file.

Type:

str

atb_file_path

Full path to the ATB data file.

Type:

Path

datahandler

Instance of DataHandler for storing processed data.

Type:

DataHandler

__post_init__()[source]

Initializes the processor by loading configurations, setting up paths, and creating necessary directories.

pull_data()[source]

Pulls and processes the ATB data, extracting cost data for utility-scale PV, land-based wind, and BESS. Returns the processed data as a tuple.

_check_and_download_data()

Checks for the existence of the ATB data file locally and downloads it if necessary.

_process_solar_cost(atb_cost)

Filters and processes solar cost data from the ATB dataset based on specific criteria. Saves the processed data to a CSV file and stores it in the data handler.

_process_wind_cost(atb_cost)

Filters and processes land-based wind cost data from the ATB dataset based on specific criteria. Saves the processed data to a CSV file and stores it in the data handler.

_process_bess_cost(atb_cost)

Filters and processes battery energy storage system (BESS) cost data from the ATB dataset based on specific criteria. Saves the processed data to a CSV file and stores it in the data handler.

config_file_path: Path
pull_data()[source]

Pulls and processes the Annual Technology Baseline (ATB) data sourced from NREL.

Jobs:
  • Loads the config file, checks and downloads the required data file (from defined source url in config) if not already available.

  • Reads the ATB cost data from a Parquet file.

  • Processes the ATB cost data to extract:
    • Utility-scale photovoltaic (PV) cost.

    • Land-based wind cost.

    • Battery energy storage system (BESS) cost.

Returns:

A tuple containing the processed cost data for:
  • Utility-scale PV (self.utility_pv_cost)

  • Land-based wind (self.land_based_wind_cost)

  • BESS (self.bess_cost)

Return type:

tuple

region_short_code: str = 'None'
resource_type: str = 'None'
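A minimal usage sketch (the config path and region code are assumptions; the three-element return order follows the Returns section above):

>>> from RES.atb import NREL_ATBProcessor
>>> atb = NREL_ATBProcessor(
...     config_file_path="config/config_BC.yaml",
...     region_short_code="BC",
...     resource_type="solar"
... )
>>> utility_pv_cost, land_based_wind_cost, bess_cost = atb.pull_data()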

ℹ️ Version Notice: Currently configured for 2024 ATB data. Review and update configuration when using different years or datasets.

Global Land Cover#

Handler for GAEZ raster data processing.

class RES.gaez.GAEZRasterProcessor(config_file_path: Path = None, region_short_code: str = None, resource_type: str = None)[source]

Bases: AttributesParser

GAEZ (Global Agro-Ecological Zones) raster data processor for renewable energy land constraint analysis.

This class handles the download, extraction, clipping, and visualization of GAEZ raster datasets used in renewable energy resource assessment. GAEZ provides global spatial data on agricultural suitability, land resources, and ecological constraints that are essential for identifying suitable areas for renewable energy development while avoiding productive agricultural land.

The processor integrates GAEZ land constraint data with regional boundaries to support renewable energy siting decisions and capacity assessments. It automatically downloads the required raster datasets, extracts specific layers based on configuration, clips them to regional boundaries, and generates visualization outputs for analysis.

INHERITED METHODS FROM AttributesParser:#

  • get_gaez_data_config() -> Dict[str, dict]: Get GAEZ dataset configuration parameters

  • get_region_name() -> str: Get full region name for display purposes

  • Plus other configuration access methods

INHERITED ATTRIBUTES FROM AttributesParser:#

  • config_file_path: Path to configuration file

  • region_short_code: Region identifier code

  • resource_type: Resource type identifier

  • Plus other configuration attributes

OWN METHODS DEFINED IN THIS CLASS:#

  • process_all_rasters(): Main pipeline for processing all configured raster types

  • plot_gaez_tif(): Generate visualization plots for processed raster data

  • __download_resources_zip_file__(): Download GAEZ ZIP archive from remote source

  • __extract_rasters__(): Extract required raster files from ZIP archive

  • __clip_to_boundary_n_plot__(): Clip rasters to regional boundaries and generate plots

Parameters:
  • config_file_path (str or Path) -- Path to configuration file containing GAEZ dataset parameters

  • region_short_code (str) -- Region identifier for boundary definition and file naming

  • resource_type (str) -- Resource type ('solar', 'wind', 'bess') - used for dependency injection

gadmBoundary

GADM boundary processor for regional extent definition

Type:

GADMBoundaries

gaez_config

GAEZ dataset configuration parameters from config file

Type:

dict

gaez_root

Root directory for GAEZ data storage and processing

Type:

Path

zip_file

Path to the GAEZ ZIP archive file

Type:

Path

Rasters_in_use_direct

Directory for extracted and processed raster files

Type:

Path

raster_types

List of raster type configurations to process

Type:

list

region_boundary

Regional boundary geometry for clipping operations

Type:

gpd.GeoDataFrame

process_all_rasters(show=False) dict[source]

Main pipeline to download, extract, clip, and plot all configured rasters

plot_gaez_tif(tif_path, color_map, plot_title, save_to, show=False) matplotlib.Figure[source]

Generate and save visualization plots for raster data

Examples

Process GAEZ rasters for British Columbia:

>>> from RES.gaez import GAEZRasterProcessor
>>> gaez_processor = GAEZRasterProcessor(
...     config_file_path="config/config_BC.yaml",
...     region_short_code="BC",
...     resource_type="solar"
... )
>>> raster_paths = gaez_processor.process_all_rasters(show=True)
>>> print(f"Processed {len(raster_paths)} raster types")

Access specific raster data:

>>> # Raster paths are returned as dictionary
>>> if 'slope' in raster_paths:
...     slope_path = raster_paths['slope']
...     print(f"Slope raster available at: {slope_path}")

Configuration Requirements#

The GAEZ configuration must include:

```yaml
gaez_data:
  root: "data/downloaded_data/GAEZ"            # Storage directory
  source: "https://s3.eu-west-1.amazonaws.com/data.gaezdev.aws.fao.org/LR.zip"
  zip_file: "LR.zip"                           # ZIP archive filename
  Rasters_in_use_direct: "Rasters_in_use"      # Extraction directory
  raster_types:
    - name: "slope"
      raster: "slope.tif"
      zip_extract_direct: "slope"
      color_map: "terrain"
    # Additional raster type configurations...
```

Data Processing Workflow#

  1. Configuration Loading: Extract GAEZ parameters from config file

  2. Download Check: Verify ZIP archive exists or download from source

  3. Extraction: Extract required raster files from ZIP archive

  4. Boundary Processing: Get regional boundaries from GADM processor

  5. Clipping: Clip each raster to regional boundaries

  6. Visualization: Generate plots for each processed raster

  7. Path Management: Return dictionary of processed raster file paths
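The clipping step (step 5) follows the standard rasterio masking pattern. A minimal sketch, assuming illustrative file paths and a boundary file in any GDAL-readable format:

>>> import geopandas as gpd
>>> import rasterio
>>> from rasterio.mask import mask
>>> boundary = gpd.read_file("data/BC_boundary.gpkg")          # assumed boundary file
>>> with rasterio.open("data/Rasters_in_use/slope.tif") as src:
...     shapes = boundary.to_crs(src.crs).geometry             # match raster CRS before clipping
...     clipped, transform = mask(src, shapes, crop=True)
...     meta = src.meta | {"height": clipped.shape[1], "width": clipped.shape[2], "transform": transform}
>>> with rasterio.open("data/Rasters_in_use/BC_slope.tif", "w", **meta) as dst:
...     dst.write(clipped)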

Raster Type Configuration#

Each raster type requires:
  • name: Identifier for the raster layer

  • raster: Filename of the raster file within ZIP archive

  • zip_extract_direct: Directory path within ZIP archive

  • color_map: Matplotlib colormap for visualization

Supported Raster Types#

Common GAEZ rasters include:
  • Slope: Terrain slope for accessibility analysis

  • Soil Quality: Agricultural productivity constraints

  • Land Cover: Vegetation and land use classifications

  • Elevation: Digital elevation model data

  • Climate Zones: Agro-ecological zone classifications

Spatial Processing#

  • Input CRS: Inherits coordinate system from source rasters

  • Clipping: Uses regional boundaries with geometry buffering

  • Output Format: GeoTIFF files with preserved metadata

  • Resolution: Maintains original raster resolution

  • Compression: Optimized file storage for large datasets

Visualization Features#

  • Automatic Plotting: Generates plots for all processed rasters

  • Custom Colormaps: Configurable visualization schemes

  • Coordinate Display: Latitude/longitude axis labels

  • Legend Integration: Horizontal colorbar with value indicators

  • File Output: PNG format with high-resolution settings

Performance Considerations#

  • ZIP download time scales with file size (typically 100MB-1GB)

  • Extraction time depends on number of raster layers

  • Clipping operations are memory-intensive for large regions

  • Multiple raster processing benefits from parallel execution

  • Network connectivity affects initial download performance

Integration Points#

  • Boundaries: Uses GADMBoundaries for regional extent definition

  • Land Constraints: Provides input data for land availability analysis

  • Capacity Calculation: Supports renewable energy siting decisions

  • Visualization: Integrates with broader visualization workflows

Error Handling#

  • Download Failures: Graceful handling of network issues

  • Missing Files: Clear error messages for missing raster files

  • Extraction Errors: Validation of ZIP archive contents

  • Processing Failures: Detailed logging for debugging

Output Management#

  • Organized Storage: Systematic directory structure for processed data

  • File Naming: Consistent naming convention with region identifiers

  • Metadata Preservation: Maintains spatial reference and statistics

  • Visualization Archive: Organized plot storage for documentation

Notes

  • GAEZ data is provided by FAO (Food and Agriculture Organization)

  • Raster datasets are typically global coverage at moderate resolution

  • Processing large regions may require substantial disk space

  • Results integrate with renewable energy assessment workflows

  • Visualization outputs support decision-making and reporting

  • ZIP archives are cached locally to avoid repeated downloads

Dependencies#

  • requests: HTTP downloading of ZIP archives

  • rasterio: Raster data reading, processing, and writing

  • zipfile: ZIP archive extraction and management

  • pathlib: File path operations and directory management

  • matplotlib: Visualization and plot generation

  • RES.AttributesParser: Parent class for configuration management

  • RES.boundaries.GADMBoundaries: Regional boundary processing

  • RES.utility: Logging and status update functions

Raises:
  • ConnectionError -- If GAEZ data download fails or source is unavailable

  • FileNotFoundError -- If required raster files are missing from ZIP archive

  • ValueError -- If configuration parameters are invalid or incomplete

  • RuntimeError -- If raster processing or clipping operations fail

See also

rasterio.mask.mask

Raster clipping functionality

RES.boundaries.GADMBoundaries

Regional boundary processing

RES.lands

Land constraint integration for renewable energy

plot_gaez_tif(tif_path, color_map, plot_title, save_to, show=False)[source]

Generate and save visualization plot for processed GAEZ raster data.

Creates a publication-quality matplotlib visualization of the clipped GAEZ raster data with proper coordinate system display, color mapping, and legend information. The plot includes geographic extent display with latitude/longitude axes and a horizontal colorbar for value interpretation.

This method supports both interactive display and file output, making it suitable for both exploratory analysis and report generation workflows. The visualization uses proper geographic coordinates and customizable color schemes to effectively communicate spatial patterns in the data.

Parameters:
  • tif_path (str or pathlib.Path) -- Path to the GeoTIFF raster file to visualize. Must be a valid raster file with spatial reference information.

  • color_map (str) -- Name of matplotlib colormap to use for visualization. Examples: 'terrain', 'viridis', 'plasma', 'coolwarm'.

  • plot_title (str) -- Title text to display at the top of the plot. Should describe the raster content and region.

  • save_to (str or pathlib.Path) -- Output path for saving the plot image file. Parent directories will be created if they don't exist.

  • show (bool, default=False) -- Whether to display the plot interactively on screen. If True, plot is shown in addition to being saved. If False, plot is only saved to file without display.

Returns:

The matplotlib Figure object containing the plot. Can be used for further customization or processing.

Return type:

matplotlib.figure.Figure

Examples

Create a basic plot:

>>> fig = processor.plot_gaez_tif(
...     tif_path="data/BC_slope.tif",
...     color_map="terrain",
...     plot_title="Slope Analysis for British Columbia",
...     save_to="plots/BC_slope.png"
... )

Create an interactive plot:

>>> fig = processor.plot_gaez_tif(
...     tif_path="data/BC_elevation.tif",
...     color_map="viridis",
...     plot_title="Elevation Map",
...     save_to="plots/elevation.png",
...     show=True
... )
Raises:
  • rasterio.errors.RasterioIOError -- If the input TIF file cannot be read or is corrupted

  • FileNotFoundError -- If the input TIF file does not exist

  • ValueError -- If the colormap name is not recognized by matplotlib

  • IOError -- If the output plot file cannot be written

Notes

  • Plot dimensions are fixed at 10x8 inches for consistency

  • Colorbar is positioned horizontally below the plot

  • Geographic extent is automatically derived from raster bounds

  • Output directories are created automatically if needed

  • Plot is always saved regardless of the show parameter

  • Figure is closed after processing to prevent memory leaks

  • NoData/masked values are handled transparently in visualization

process_all_rasters(show: bool = False)[source]

Main pipeline to download, extract, clip, and plot rasters based on configuration.

Executes the complete GAEZ raster processing workflow:
  1. Downloading ZIP archive if not present locally

  2. Extracting required raster files from archive

  3. Loading regional boundaries for clipping operations

  4. Processing each configured raster type by clipping to boundaries

  5. Generating visualization plots for all processed rasters

This method orchestrates all processing steps and returns paths to the processed raster files for downstream analysis.

Parameters:

show (bool, default=False) -- Whether to display generated plots interactively during processing. If True, matplotlib plots will be shown on screen. If False, plots are saved to disk without display.

Returns:

Dictionary mapping raster type names to processed file paths. Keys are raster type names from configuration. Values are Path objects pointing to clipped raster files.

Return type:

dict

Examples

Process all rasters with visualization:

>>> raster_paths = processor.process_all_rasters(show=True)
>>> print(f"Processed rasters: {list(raster_paths.keys())}")

Process rasters for programmatic use:

>>> raster_paths = processor.process_all_rasters(show=False)
>>> slope_raster = raster_paths.get('slope')
>>> if slope_raster:
...     print(f"Slope data at: {slope_raster}")
Raises:
  • ConnectionError -- If ZIP archive download fails due to network issues

  • FileNotFoundError -- If required raster files are missing from archive

  • RuntimeError -- If raster clipping or processing operations fail

Notes

  • Processing time scales with number of raster types and region size

  • Large regions may require substantial disk space for processed rasters

  • Network connection required for initial ZIP archive download

  • Existing processed rasters are not regenerated unless missing

  • All plots are saved regardless of the show parameter setting

Global Wind Atlas#

Handler for Global Wind Atlas data processing.

class RES.gwa.GWACells(config_file_path: Path = None, region_short_code: str = None, resource_type: str = None)[source]

Bases: AttributesParser

Global Wind Atlas (GWA) data processor for high-resolution wind resource analysis.

This class integrates Global Wind Atlas data with regional boundaries to provide high-resolution wind resource assessment capabilities for renewable energy planning. GWA provides detailed wind speed, wind power density, and wind class information at much higher spatial resolution than ERA5 data, making it valuable for detailed site assessment and resource characterization.

The class handles downloading, processing, and spatial mapping of GWA raster data to ERA5 grid cells, enabling multi-scale wind resource analysis. It processes multiple GWA data layers including wind speed, wind power density, and IEC wind class classifications to provide comprehensive wind resource information.

INHERITED METHODS FROM GADMBoundaries:#

  • get_bounding_box() -> tuple: Get regional bounding box for spatial clipping

  • get_region_boundary() -> gpd.GeoDataFrame: Get regional boundary geometry

  • get_country_boundary() -> gpd.GeoDataFrame: Get country-level boundary geometry

  • Plus other boundary processing methods

INHERITED METHODS FROM AttributesParser:#

  • get_gwa_config() -> dict: Get GWA data configuration parameters

  • get_default_crs() -> str: Get default coordinate reference system

  • get_region_mapping() -> Dict[str, dict]: Get region mapping configuration

  • Plus other configuration access methods

INHERITED ATTRIBUTES FROM AttributesParser:#

  • region_short_code: Region identifier code

  • region_mapping: Dictionary mapping region codes to configuration

  • store: HDF5 data store path for processed results

  • Plus other configuration attributes

OWN METHODS DEFINED IN THIS CLASS:#

  • prepare_GWA_data(): Download and process GWA raster data for the region

  • download_file(): Download individual files from remote sources

  • load_gwa_cells(): Create GeoDataFrame of GWA cells with spatial geometry

  • map_GWA_cells_to_ERA5(): Map high-resolution GWA data to ERA5 grid cells

Parameters:
  • config_file_path (str or Path) -- Path to configuration file containing GWA data parameters

  • region_short_code (str) -- Region identifier for boundary definition and data filtering

  • resource_type (str) -- Resource type ('wind') for GWA wind resource analysis

merged_data

Merged xarray DataArray containing all GWA data layers

Type:

xr.DataArray

gwa_config

GWA configuration parameters from config file

Type:

dict

datahandler

HDF5 data handler for storing processed results

Type:

DataHandler

gwa_datafields

Field definitions for GWA data layers

Type:

dict

gwa_rasters

Raster file specifications for GWA data

Type:

dict

gwa_sources

Source URLs for downloading GWA data

Type:

dict

gwa_root

Root directory for GWA data storage

Type:

Path

bounding_box

Regional bounding box coordinates for spatial clipping

Type:

dict

region_gwa_cells_df

Processed GWA cells data as pandas DataFrame

Type:

pd.DataFrame

gwa_cells_gdf

GWA cells with spatial geometry for analysis

Type:

gpd.GeoDataFrame

mapped_gwa_cells_aggr_df

GWA data aggregated to ERA5 grid cell resolution

Type:

pd.DataFrame

prepare_GWA_data(windspeed_min=10, windspeed_max=20, memory_resource_limitation=False) pd.DataFrame[source]

Download, process, and merge GWA raster data for the region

download_file(url, destination) None[source]

Download a file from URL to specified destination path

load_gwa_cells(memory_resource_limitation=False) gpd.GeoDataFrame[source]

Load GWA cells as GeoDataFrame with spatial geometry

map_GWA_cells_to_ERA5(memory_resource_limitation=False) None[source]

Map high-resolution GWA data to ERA5 grid cells for integration

Examples

Create GWA processor for British Columbia:

>>> from RES.gwa import GWACells
>>> gwa_processor = GWACells(
...     config_file_path="config/config_BC.yaml",
...     region_short_code="BC",
...     resource_type="wind"
... )
>>>
>>> # Load high-resolution GWA cells
>>> gwa_cells = gwa_processor.load_gwa_cells()
>>> print(f"Loaded {len(gwa_cells)} GWA cells")

Process GWA data with wind speed filtering:

>>> # Prepare data with wind speed constraints
>>> gwa_data = gwa_processor.prepare_GWA_data(
...     windspeed_min=12,
...     windspeed_max=25,
...     memory_resource_limitation=True
... )
>>> print(f"Filtered to {len(gwa_data)} high-quality wind cells")

Map GWA data to ERA5 grid:

>>> # Map high-resolution GWA to ERA5 cells
>>> gwa_processor.map_GWA_cells_to_ERA5(memory_resource_limitation=False)
>>> print("GWA data mapped to ERA5 grid cells")

Configuration Requirements#

The GWA configuration must include:

```yaml
gwa_data:
  root: "data/downloaded_data/GWA"   # Storage directory
  datafields:
    windspeed_gwa: "Wind speed at 100m"
    windpower_gwa: "Wind power density at 100m"
    IEC_Class_ExLoads: "IEC wind class"
  rasters:
    windspeed_gwa: "GWA_country_code_windspeed.tif"
    windpower_gwa: "GWA_country_code_windpower.tif"
    IEC_Class_ExLoads: "GWA_country_code_iec.tif"
  sources:
    windspeed_gwa: "https://globalwindatlas.info/download/GWA_country_code_windspeed.tif"
    windpower_gwa: "https://globalwindatlas.info/download/GWA_country_code_windpower.tif"
    IEC_Class_ExLoads: "https://globalwindatlas.info/download/GWA_country_code_iec.tif"

region_mapping:
  BC:
    GWA_country_code: "CAN"   # Country code for GWA data
```

Data Processing Workflow#

  1. Configuration Loading: Extract GWA parameters and region mapping

  2. Data Download: Check for local data or download from GWA sources

  3. Raster Processing: Load and clip raster data to regional boundaries

  4. Data Merging: Combine multiple GWA layers into unified dataset

  5. Quality Filtering: Apply wind speed and other quality constraints

  6. Spatial Conversion: Convert raster data to point-based GeoDataFrame

  7. Grid Mapping: Aggregate high-resolution GWA data to ERA5 grid cells

  8. Data Storage: Store processed results in HDF5 format for reuse

GWA Data Layers#

Typical GWA datasets include:
  • Wind Speed: Mean wind speed at 100m height (m/s)

  • Wind Power Density: Wind power density at 100m height (W/m²)

  • IEC Wind Class: International Electrotechnical Commission wind classes

  • Capacity Factor: Estimated capacity factors for different turbine types

  • Wind Direction: Prevailing wind direction statistics

Spatial Resolution#

  • GWA Resolution: Typically 250m to 1km spatial resolution

  • ERA5 Resolution: Approximately 25km spatial resolution

  • Aggregation Method: Mean values for continuous variables

  • Coordinate System: WGS84 (EPSG:4326) for global compatibility

  • Clipping Boundaries: Regional boundaries from GADM database

Quality Control#

  • Wind Speed Filtering: Configurable minimum/maximum wind speed thresholds

  • Data Validation: Automatic detection and handling of NoData values

  • Spatial Validation: Clipping to valid regional boundaries

  • Memory Management: Optional memory limitation for large datasets

  • Error Handling: Graceful handling of download and processing errors

Performance Considerations#

  • Download time depends on data availability and network speed

  • Processing time scales with region size and data resolution

  • Memory usage can be substantial for large regions

  • Spatial overlay operations are computationally intensive

  • HDF5 storage provides efficient data access for repeated analysis

Integration Points#

  • ERA5 Data: Integration with ERA5 climate data for multi-scale analysis

  • Boundary Data: Uses GADM boundaries for regional definition

  • Capacity Analysis: Provides high-resolution input for capacity factor calculations

  • Resource Assessment: Supports detailed wind resource characterization

  • Grid Analysis: Compatible with grid cell generation workflows

Output Formats#

  • DataFrame: Tabular data with wind resource attributes

  • GeoDataFrame: Spatial data with point geometries

  • HDF5 Storage: Efficient storage for large datasets

  • Grid Mapping: ERA5-compatible aggregated datasets

Notes

  • GWA data is provided by Technical University of Denmark (DTU)

  • Global coverage with country-specific datasets

  • Higher resolution than ERA5 for detailed site assessment

  • Processing requires substantial computational resources for large regions

  • Results integrate seamlessly with ERA5-based renewable energy workflows

  • Data quality varies by region and local terrain complexity

  • Regular updates are available from the Global Wind Atlas portal

Dependencies#

  • geopandas: Spatial data processing and geometry operations

  • pandas: Tabular data manipulation and analysis

  • rioxarray: Raster data reading and spatial operations

  • xarray: N-dimensional array operations and data merging

  • requests: HTTP downloading of GWA datasets

  • pathlib: File path operations and directory management

  • RES.hdf5_handler.DataHandler: HDF5 data storage and retrieval

  • RES.boundaries.GADMBoundaries: Parent class for boundary processing

  • RES.utility: Logging and utility functions

Raises:
  • ConnectionError -- If GWA data download fails due to network issues

  • FileNotFoundError -- If required configuration files or directories are missing

  • ValueError -- If wind speed thresholds or other parameters are invalid

  • RuntimeError -- If raster processing or spatial operations fail

See also

rioxarray.open_rasterio

Raster data reading functionality

geopandas.GeoDataFrame.overlay

Spatial overlay operations

RES.boundaries.GADMBoundaries

Parent class for boundary processing

RES.hdf5_handler.DataHandler

HDF5 data storage utilities

download_file(url: str, destination: Path) None[source]

Download a file from a remote URL to a specified local destination.

Downloads GWA raster files from remote sources when they are not available locally. The method handles HTTP requests with proper error checking and provides detailed logging of download operations.

Files are downloaded completely before being written to avoid partial downloads. The destination directory is created automatically if it doesn't exist, ensuring reliable file operations.

Parameters:
  • url (str) -- Complete URL of the file to download. Should be a valid HTTP/HTTPS URL pointing to a GWA raster file.

  • destination (Path) -- Local file path where the downloaded file should be saved. Parent directories will be created automatically if needed.

Returns:

The method doesn't return a value but saves the file to disk.

Return type:

None

Examples

Download a GWA wind speed raster:

>>> url = "https://globalwindatlas.info/download/CAN_windspeed.tif"
>>> destination = Path("data/GWA/CAN_windspeed.tif")
>>> processor.download_file(url, destination)

Download with automatic path handling:

>>> url = "https://globalwindatlas.info/download/CAN_windpower.tif"
>>> destination = processor.gwa_root / "CAN_windpower.tif"
>>> processor.download_file(url, destination)
Raises:
  • requests.RequestException -- If the HTTP request fails due to network issues or server errors

  • requests.HTTPError -- If the server returns an HTTP error status code

  • IOError -- If the local file cannot be written due to permissions or disk space

  • FileNotFoundError -- If the destination directory cannot be created

Notes

  • Download progress is logged through utility print functions

  • Network timeouts may occur for large files on slow connections

  • Existing files are overwritten without warning

  • File integrity is not verified after download

  • Destination path is automatically converted to Path object if needed

load_gwa_cells(memory_resource_limitation: bool | None = False)[source]

Load GWA cells as a spatial GeoDataFrame with point geometries.

Converts processed GWA tabular data into a spatial GeoDataFrame by creating point geometries from coordinate information. The resulting GeoDataFrame contains all wind resource attributes along with spatial geometry suitable for spatial analysis and visualization.

The method automatically clips the data to regional boundaries to ensure results are geographically constrained to the area of interest. This spatial filtering removes any cells that fall outside the defined regional boundaries despite being within the bounding box.

Parameters:

memory_resource_limitation (Optional[bool], default=False) -- Whether to enable memory-efficient processing for large datasets. Passed through to prepare_GWA_data() method to control filtering. If True, applies wind speed filtering to reduce memory usage. If False, processes the full dataset without memory limitations.

Returns:

Spatial GeoDataFrame containing GWA cells with:
  • Point geometries representing cell center coordinates

  • Wind resource attributes (speed, power density, IEC class)

  • Spatial reference system matching regional CRS

  • Geographic clipping to regional boundaries

Return type:

geopandas.GeoDataFrame

Examples

Load all GWA cells for the region:

>>> gwa_cells = processor.load_gwa_cells()
>>> print(f"Loaded {len(gwa_cells)} spatial cells")
>>> print(f"CRS: {gwa_cells.crs}")

Load with memory optimization:

>>> gwa_cells = processor.load_gwa_cells(memory_resource_limitation=True)
>>> print(f"Memory-optimized: {len(gwa_cells)} cells")

Access spatial and attribute data:

>>> # Spatial analysis
>>> total_area = gwa_cells.total_bounds
>>> print(f"Spatial extent: {total_area}")
>>>
>>> # Attribute analysis
>>> mean_windspeed = gwa_cells['windspeed_gwa'].mean()
>>> print(f"Average wind speed: {mean_windspeed:.2f} m/s")
Raises:
  • ValueError -- If coordinate columns (x, y) are missing from the GWA data

  • GeometryError -- If point geometries cannot be created from coordinates

  • CRSError -- If the coordinate reference system is invalid or undefined

Notes

  • Point geometries are created from x,y coordinate columns

  • Spatial clipping ensures geographic consistency with boundaries

  • CRS is inherited from the regional configuration

  • Processing time scales with the number of GWA cells

  • Memory usage depends on dataset size and attribute complexity

  • Results are suitable for spatial overlay and intersection operations

map_GWA_cells_to_ERA5(aggregation_level: str, memory_resource_limitation: bool | None)[source]

Map high-resolution GWA cells to coarser ERA5 grid cells for multi-scale analysis.

This method performs spatial aggregation of high-resolution GWA wind data (typically 250m-1km resolution) to ERA5 grid cells (approximately 25km resolution). The aggregation process uses spatial overlay operations to determine which GWA cells fall within each ERA5 cell, then computes mean values for all wind resource attributes.

The mapping enables integration of detailed GWA wind resource data with ERA5-based renewable energy analysis workflows, providing enhanced spatial detail while maintaining compatibility with ERA5 grid structures.

Processing is performed on a region-by-region basis to optimize memory usage and computational efficiency. Results are automatically stored in the HDF5 data store for subsequent analysis operations.

Parameters:

memory_resource_limitation (Optional[bool]) -- Whether to enable memory-efficient processing for large datasets. Passed through to load_gwa_cells() and prepare_GWA_data() methods. If True, applies filtering to reduce memory usage during processing. If False, processes the complete dataset without memory limitations.

Returns:

The method doesn't return a value but stores results in the HDF5 store. Aggregated data is accessible via self.mapped_gwa_cells_aggr_df attribute and permanently stored in the 'cells' store for future access.

Return type:

None

Examples

Map GWA data to ERA5 grid with full dataset:

>>> processor.map_GWA_cells_to_ERA5(memory_resource_limitation=False)
>>> print("GWA data mapped to ERA5 grid cells")

Map with memory optimization for large regions:

>>> processor.map_GWA_cells_to_ERA5(memory_resource_limitation=True)
>>> print("Memory-optimized mapping completed")

Access mapped results:

>>> # Results are stored in datahandler
>>> era5_with_gwa = processor.datahandler.from_store('cells')
>>> print(f"ERA5 cells with GWA data: {len(era5_with_gwa)}")
>>> print(f"Columns: {list(era5_with_gwa.columns)}")
Processing Workflow#
  1. Data Loading: Load ERA5 grid cells from HDF5 store

  2. GWA Loading: Load high-resolution GWA cells as GeoDataFrame

  3. Spatial Overlay: Perform intersection between GWA and ERA5 cells

  4. Coordinate Mapping: Update coordinates to ERA5 cell centers

  5. Aggregation: Compute mean values for numeric attributes by ERA5 cell

  6. Storage: Store aggregated results in HDF5 store with forced update
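The overlay-and-aggregate pattern (steps 3-5) can be sketched with geopandas; gwa_cells_gdf, era5_cells, and the 'cell' identifier column here are assumptions, not the exact internal names:

>>> import geopandas as gpd
>>> # Intersect high-resolution GWA cells with ERA5 cells, then average per ERA5 cell
>>> joined = gpd.overlay(gwa_cells_gdf, era5_cells[["cell", "geometry"]], how="intersection")
>>> aggregated = joined.groupby("cell")[["windspeed_gwa", "windpower_gwa"]].mean()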

Spatial Operations#
  • Overlay Method: Intersection overlay to find spatial relationships

  • Aggregation Function: Mean aggregation for all numeric attributes

  • Coordinate Assignment: ERA5 cell coordinates replace GWA coordinates

  • Regional Processing: Separate processing by geographic regions

  • Memory Management: Regional processing reduces peak memory usage

Performance Considerations#
  • Processing time scales with number of GWA cells and ERA5 cells

  • Memory usage peaks during spatial overlay operations

  • Regional processing improves memory efficiency for large datasets

  • Storage operations may take time for large aggregated datasets

  • Spatial indexing improves performance for repeated operations

Raises:
  • FileNotFoundError -- If ERA5 grid cells are not found in the HDF5 store

  • ValueError -- If spatial overlay operations fail due to geometry issues

  • MemoryError -- If dataset is too large for available memory (use memory limitation)

  • RuntimeError -- If HDF5 storage operations fail

Notes

  • Aggregation preserves all numeric wind resource attributes

  • Categorical attributes (like IEC class) may require special handling

  • Results overwrite existing data in the HDF5 store (force_update=True)

  • Processing is optimized for typical renewable energy analysis workflows

  • Spatial accuracy depends on the quality of ERA5 and GWA geometries

  • Large regions may require substantial processing time and memory

  • Results integrate seamlessly with ERA5-based capacity calculations

Data Quality#
  • Mean aggregation is appropriate for continuous wind variables

  • Statistical significance increases with more GWA cells per ERA5 cell

  • Spatial representation accuracy depends on resolution differences

  • Edge effects may occur at regional boundaries

merged_data: DataArray
prepare_GWA_data(windspeed_min=10, windspeed_max=20, memory_resource_limitation: bool = False) pd.DataFrame[source]

Download, process, and merge Global Wind Atlas raster data for the region.

This method orchestrates the complete GWA data preparation workflow including: downloading required raster files, loading and clipping them to regional boundaries, merging multiple data layers, and applying quality filters. The processed data is returned as a pandas DataFrame ready for analysis.

The method handles multiple GWA data types (wind speed, power density, IEC classes) and automatically downloads missing files from configured sources. Spatial clipping ensures data is limited to the region of interest, and wind speed filtering allows focus on viable wind resources.

Parameters:
  • windspeed_min (float, default=10) -- Minimum wind speed threshold in m/s for filtering cells. Cells with wind speeds below this value are excluded from results.

  • windspeed_max (float, default=20) -- Maximum wind speed threshold in m/s for filtering cells. Cells with wind speeds above this value are excluded from results.

  • memory_resource_limitation (bool, default=False) -- Whether to enable memory-efficient processing for large datasets. If True, applies wind speed filtering to reduce memory usage. If False, uses full wind speed range (0-50 m/s) for processing.

Returns:

Processed GWA data as pandas DataFrame with columns:
  • x, y: Spatial coordinates in regional CRS

  • windspeed_gwa: Wind speed at 100m height (m/s)

  • windpower_gwa: Wind power density at 100m height (W/m²)

  • IEC_Class_ExLoads: IEC wind class classifications

  • Additional fields as configured in GWA data configuration

Return type:

pd.DataFrame

Examples

Prepare data with default wind speed range:

>>> gwa_data = processor.prepare_GWA_data()
>>> print(f"Loaded {len(gwa_data)} wind resource cells")

Apply strict wind speed filtering for high-quality sites:

>>> high_wind_data = processor.prepare_GWA_data(
...     windspeed_min=15,
...     windspeed_max=25,
...     memory_resource_limitation=True
... )
>>> print(f"High-wind sites: {len(high_wind_data)} cells")

Process full dataset without filtering:

>>> all_data = processor.prepare_GWA_data(
...     windspeed_min=0,
...     windspeed_max=50,
...     memory_resource_limitation=False
... )
Raises:
  • ConnectionError -- If GWA data download fails due to network issues

  • FileNotFoundError -- If GWA raster files cannot be found locally or remotely

  • ValueError -- If wind speed thresholds are invalid (min >= max)

  • RuntimeError -- If raster processing or spatial operations fail

Notes

  • Processing time depends on region size and number of data layers

  • Downloaded files are cached locally to avoid repeated downloads

  • Memory usage scales with region size and data resolution

  • Wind speed filtering significantly reduces memory requirements

  • Multiple raster files are automatically merged into unified dataset

  • Spatial coordinates are preserved for subsequent spatial analysis

Turbine Configuration#

Turbine database and configuration management.

class RES.tech.OEDBTurbines(OEDB_config: dict)[source]

Bases: object

fetch_turbine_config(model)[source]

Fetches turbine data based on the resource type (e.g., 'wind') and saves the formatted configuration for the turbines found.

format_and_save_turbine_config(turbine_data: dict, save_to: str)[source]

Formats the turbine's specification data (to match Atlite's requirements) and saves it to a YAML configuration file.

Parameters:
  • turbine_data (dict) -- Turbine specification data.

  • save_to (str) -- The directory path where the YAML file will be saved.

load_turbine_config()[source]

Loads the turbine configuration from a YAML file.
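No usage example is given above; a hypothetical sketch of the intended flow (the OEDB_config contents and the turbine model name are assumptions):

>>> from RES.tech import OEDBTurbines
>>> oedb = OEDBTurbines(OEDB_config=config["OEDB"])     # config dict layout is assumed
>>> oedb.fetch_turbine_config(model="Vestas V90 2000")  # model name is illustrative
>>> turbine_config = oedb.load_turbine_config()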

Units Management#

Unit conversion and management utilities.

class RES.units.Units(config_file_path: Path = None, region_short_code: str = None, resource_type: str = None, SAVE_TO_DIR: Path = PosixPath('data'), EXCEL_FILENAME: str = 'units.csv')[source]

Bases: AttributesParser

Unit conversion and metadata management system for renewable energy analysis.

This class provides standardized unit definitions and conversion capabilities for various parameters used throughout the renewable energy resource assessment workflow. It maintains a comprehensive dictionary of units for economic, technical, and energy parameters, enabling consistent data interpretation and reporting across different analysis modules.

Key Functionality:
  • Defines standard units for economic parameters (CAPEX, OPEX, LCOE)

  • Establishes energy and power unit conventions

  • Provides data persistence through HDF5 storage

  • Exports unit dictionaries for external reference

  • Ensures consistent unit usage across the analysis pipeline

SAVE_TO_DIR

Directory for saving unit reference files

Type:

Path

EXCEL_FILENAME

Filename for CSV export of unit dictionary

Type:

str

datahandler

HDF5 storage interface for data persistence

Type:

DataHandler

Standard Units Defined:
  • capex: Million USD per MW (Mil. USD/MW)

  • fom: Million USD per MW (Mil. USD/MW) - Fixed O&M

  • vom: Million USD per MW (Mil. USD/MW) - Variable O&M

  • potential_capacity: Megawatts (MW)

  • p_lcoe: Megawatt-hours per USD (MWH/USD)

  • energy: Megawatt-hours (MWh)

  • energy_demand: Petajoules (Pj)

Example

>>> units_manager = Units(
...     config_file_path="config/config.yaml",
...     SAVE_TO_DIR=Path("data/units"),
...     EXCEL_FILENAME="units_reference.csv"
... )
>>> units_manager.create_units_dictionary()
>>> # Creates standardized unit reference for project

Notes

  • Units follow international energy industry standards

  • Economic parameters use million USD to match typical project scales

  • Energy units align with grid-scale renewable energy reporting

  • HDF5 storage enables efficient data access and version control

EXCEL_FILENAME: str = 'units.csv'
SAVE_TO_DIR: Path = PosixPath('data')
create_units_dictionary()[source]

Create and persist a comprehensive dictionary of standardized units for renewable energy analysis.

This method establishes the authoritative unit reference for all parameters used in renewable energy resource assessment and economic analysis. It creates a standardized dictionary mapping parameter names to their corresponding units, ensuring consistency across all analysis modules and facilitating data interpretation and reporting.

Unit Categories:
  1. Economic Parameters:
     • Capital expenditures (CAPEX) in Million USD/MW

     • Fixed and variable O&M costs in Million USD/MW

     • LCOE productivity metrics in MWH/USD

  2. Technical Parameters:
     • Power capacity in Megawatts (MW)

     • Energy production in Megawatt-hours (MWh)

     • Energy demand in Petajoules (Pj)

Process:
  1. Defines comprehensive units dictionary with industry-standard units

  2. Converts dictionary to pandas DataFrame for structured storage

  3. Persists data to HDF5 store for efficient access

  4. Exports human-readable CSV file for external reference

Data Storage:
  • HDF5 format: Efficient binary storage for programmatic access

  • CSV format: Human-readable reference for documentation and verification
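In code, the persisted mapping amounts to a simple dictionary built from the standard units listed above (a sketch, not the verbatim source):

>>> import pandas as pd
>>> units = {
...     "capex": "Mil. USD/MW",
...     "fom": "Mil. USD/MW",
...     "vom": "Mil. USD/MW",
...     "potential_capacity": "MW",
...     "p_lcoe": "MWH/USD",
...     "energy": "MWh",
...     "energy_demand": "Pj",
... }
>>> units_df = pd.DataFrame(units.items(), columns=["parameter", "unit"])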

Raises:
  • FileSystemError -- If output directory cannot be created

  • PermissionError -- If CSV file cannot be written

  • StorageError -- If HDF5 store operation fails

Example

>>> units_manager = Units(**config)
>>> units_manager.create_units_dictionary()
INFO: Units information created and saved to 'data/units.csv'

Notes

  • Units follow international energy industry conventions

  • Economic units scaled to typical renewable project magnitudes

  • CSV export includes parameter names and corresponding units

  • HDF5 storage enables version control and audit trails

Clustering and Aggregation#

Spatial clustering utilities for site aggregation.

Spatial clustering module for renewable energy resource assessment.

This module provides K-means clustering functionality for aggregating grid cells with similar renewable energy characteristics into representative clusters. The clustering is based on techno-economic metrics such as Levelized Cost of Electricity (LCOE) and potential capacity, enabling spatial aggregation for energy system modeling and optimization.

The module implements an automated workflow for determining optimal cluster numbers, performing spatial clustering, and creating representative cluster geometries that maintain spatial relationships while reducing computational complexity for large-scale renewable energy assessments.

Key Features#

  • Automated optimal cluster number determination using elbow method

  • Spatial clustering based on LCOE and capacity metrics

  • Grid cell identifier generation for data linking

  • Cluster geometry creation through spatial union operations

  • Regional boundary clipping for precise spatial extent

  • Visualization of clustering analysis results

Functions#

assign_cluster_id(cells, source_column=sub_national_unit_tag, index_name='cell')

Generate unique identifiers for grid cells based on region and coordinates

find_optimal_K(resource_type, data_for_clustering, region, wcss_tolerance, max_k)

Determine optimal number of clusters using elbow method and WCSS tolerance

pre_process_cluster_mapping(cells_scored, vis_directory, wcss_tolerance, resource_type)

Preprocess data and determine optimal cluster numbers for each region

cells_to_cluster_mapping(cells_scored, vis_directory, wcss_tolerance, resource_type, sort_columns)

Map grid cells to clusters based on similarity metrics and optimal cluster numbers

create_cells_Union_in_clusters(cluster_map_gdf, region_optimal_k_df, resource_type)

Create unified cluster geometries by dissolving individual cell boundaries

clip_cluster_boundaries_upto_regions(cell_cluster_gdf, gadm_regions_gdf, resource_type)

Clip cluster boundaries to precise regional administrative boundaries

Clustering Methodology#

The clustering approach follows a multi-step process:

  1. Data Preparation: Grid cells with calculated LCOE and capacity metrics

  2. Optimal K Determination: Uses elbow method with Within-Cluster Sum of Squares (WCSS)

  3. Regional Clustering: Performs K-means clustering separately for each region

  4. Spatial Aggregation: Creates unified cluster geometries through spatial union

  5. Boundary Refinement: Clips results to precise administrative boundaries

The LCOE-based clustering ensures that cells with similar techno-economic characteristics are grouped together, creating representative clusters suitable for energy system optimization while maintaining spatial coherence.
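A minimal sketch of the elbow idea, assuming scikit-learn; the stopping rule shown (WCSS falling below a tolerance fraction of the single-cluster WCSS) is illustrative and may differ from find_optimal_K's exact rule:

>>> import numpy as np
>>> from sklearn.cluster import KMeans
>>> def pick_k(X, wcss_tolerance=0.05, max_k=10):
...     # WCSS (inertia) for each candidate cluster count
...     wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
...             for k in range(1, max_k + 1)]
...     threshold = wcss_tolerance * wcss[0]   # tolerance relative to total spread
...     for k, w in enumerate(wcss, start=1):
...         if w <= threshold:
...             return k
...     return max_k

Here X would be the (imputed) LCOE/capacity feature matrix for a single region.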

Algorithm Details#

  • K-means Clustering: Uses scikit-learn implementation with multiple initializations

  • Elbow Method: Automatically determines optimal cluster count based on WCSS tolerance

  • Missing Data Handling: Imputes missing values using mean strategy

  • Spatial Preservation: Maintains geographic relationships through geometry operations

  • Regional Processing: Handles each administrative region independently

Usage Examples#

Basic clustering workflow:

>>> import pandas as pd
>>> import geopandas as gpd
>>> from RES.cluster import cells_to_cluster_mapping, create_cells_Union_in_clusters
>>>
>>> # Perform clustering analysis
>>> cluster_map_gdf, optimal_k_df = cells_to_cluster_mapping(
...     cells_scored=scored_cells,
...     vis_directory="vis/BC",
...     wcss_tolerance=0.15,
...     resource_type="solar",
...     sort_columns=["lcoe_solar"]
... )
>>>
>>> # Create unified cluster geometries
>>> clusters_gdf, cluster_indices = create_cells_Union_in_clusters(
...     cluster_map_gdf=cluster_map_gdf,
...     region_optimal_k_df=optimal_k_df,
...     resource_type="solar"
... )

Cell identification:

>>> # Generate unique cell identifiers
>>> cells_with_ids = assign_cluster_id(
...     cells=grid_cells,
...     source_column="Province",
...     index_name="cell_id"
... )

Input Data Requirements#

The clustering functions expect GeoDataFrames with specific columns:

Required Columns:
  • 'x', 'y': Grid cell centroid coordinates

  • sub_national_unit_tag: Administrative region classification

  • 'lcoe_{resource_type}': Levelized cost of electricity

  • 'potential_capacity_{resource_type}': Maximum potential capacity

  • 'geometry': Spatial geometry (Polygon or Point)

Optional Columns:
  • 'capex_{resource_type}': Capital expenditure costs

  • 'fom_{resource_type}': Fixed operation and maintenance costs

  • 'vom_{resource_type}': Variable operation and maintenance costs

  • '{resource_type}_CF_mean': Average capacity factor

  • 'nearest_station': Nearest grid connection point

  • 'nearest_station_distance_km': Distance to grid connection
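A minimal illustrative input satisfying the required columns (values are made up; 'Region' stands in for the configured sub_national_unit_tag column):

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> cells_scored = gpd.GeoDataFrame(
...     {
...         "x": [-123.5, -122.5],
...         "y": [49.2, 49.7],
...         "Region": ["BC", "BC"],
...         "lcoe_solar": [55.0, 62.0],
...         "potential_capacity_solar": [120.0, 80.0],
...         "geometry": [Point(-123.5, 49.2), Point(-122.5, 49.7)],
...     },
...     crs="EPSG:4326",
... )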

Output Data Structure#

Clustering results include:

Cluster Map GeoDataFrame:
  • Individual cells with assigned cluster numbers

  • Original cell attributes preserved

  • Cluster_No: Integer cluster identifier

  • Optimal_k: Optimal number of clusters for region

Unified Clusters GeoDataFrame:
  • Dissolved cluster geometries

  • Aggregated techno-economic parameters

  • Representative cluster characteristics

  • Spatial extent covering all member cells

Cluster Indices Dictionary:
  • Mapping of original cell indices to clusters

  • Structure: {region: {cluster_no: [cell_indices]}}

  • Enables traceability from clusters back to individual cells

Visualization Outputs#

The module generates several visualization products:

Elbow Plots:
  • WCSS vs. number of clusters for each region

  • Optimal cluster number identification

  • Saved to vis_directory/Regional_cluster_Elbow_Plots/

Performance Considerations#

  • Memory usage scales with number of grid cells and clusters

  • Processing time increases with higher max_k values

  • Imputation handles missing data but may affect clustering quality

  • Large regions may benefit from hierarchical clustering approaches

Dependencies#

  • pandas: Data manipulation and analysis

  • geopandas: Spatial data operations

  • numpy: Numerical computations

  • matplotlib.pyplot: Visualization

  • sklearn.cluster.KMeans: K-means clustering algorithm

  • sklearn.impute.SimpleImputer: Missing value imputation

  • pathlib: File path operations

  • logging: Progress and error reporting

  • RES.utility: Custom utility functions for spatial operations

Notes

  • Clustering is performed separately for each administrative region

  • WCSS tolerance controls the trade-off between cluster number and representation

  • Missing or infinite values are automatically handled through imputation

  • Cluster ranking is based on ascending LCOE values (lowest cost first)

  • Spatial relationships are preserved through geometry operations

  • Results are suitable for energy system optimization models


RES.cluster.assign_cluster_id(cells: GeoDataFrame, source_column: str = None, index_name: str = 'cell') GeoDataFrame[source]

Generate unique identifiers for grid cells based on region and coordinates.

Creates standardized cell identifiers that combine regional information with spatial coordinates to ensure uniqueness across the entire assessment domain. These identifiers serve as primary keys for data linking and result tracking throughout the assessment workflow.

Parameters:
  • cells (gpd.GeoDataFrame) -- Input GeoDataFrame containing spatial data with 'x', 'y' coordinates and regional classification information

  • source_column (str, default None) -- Column name containing regional classification (e.g., province, state)

  • index_name (str, default 'cell') -- Name for the new unique identifier column

Returns:

GeoDataFrame with new unique cell identifier column set as index

Return type:

gpd.GeoDataFrame

Examples

Basic cell ID assignment:

>>> cells_with_ids = assign_cluster_id(
...     cells=grid_cells,
...     source_column='Province',
...     index_name='cell_id'
... )
>>> print(cells_with_ids.index.name)  # 'cell_id'

Custom identifier format:

>>> # Creates IDs like: "BC_-123.5_49.2"
>>> cells = assign_cluster_id(cells, 'Province', 'unique_cell')
Raises:
  • ValueError -- If source_column doesn't exist in the GeoDataFrame

  • ValueError -- If required coordinate columns 'x', 'y' are missing

Notes

  • Removes spaces from region names for consistent formatting

  • ID format: "{region}_{x_coord}_{y_coord}"

  • Coordinates maintain original decimal precision

  • Sets generated IDs as DataFrame index for efficient lookups

  • Essential for linking spatial analysis results across workflow steps
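The documented ID format amounts to this one-line equivalent (illustrative, with 'Province' as the source column):

>>> # "{region}_{x_coord}_{y_coord}", with spaces stripped from the region name
>>> cells["cell"] = (cells["Province"].str.replace(" ", "", regex=False)
...                  + "_" + cells["x"].astype(str) + "_" + cells["y"].astype(str))
>>> cells = cells.set_index("cell")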

RES.cluster.cells_to_cluster_mapping(cells_scored: DataFrame, vis_directory: str, wcss_tolerance: float, sub_national_unit_tag: str, resource_type: str, sort_columns: list) tuple[DataFrame, DataFrame][source]

Map grid cells to clusters based on similarity metrics and optimal cluster numbers.

Performs spatial clustering of renewable energy grid cells by grouping cells with similar techno-economic characteristics (primarily LCOE) into representative clusters. The function implements a systematic approach to divide each region's cells into the optimal number of clusters determined through elbow method analysis.

This is the main clustering workflow function that transforms individual grid cells into clustered representations suitable for energy system optimization models, reducing computational complexity while preserving spatial and economic relationships.

Parameters:
  • cells_scored (pd.DataFrame) -- Scored grid cells with techno-economic attributes Must contain LCOE, capacity, and regional classification data

  • vis_directory (str) -- Directory path for saving clustering visualization outputs Used for elbow plots and clustering analysis results

  • wcss_tolerance (float) -- Within-Cluster Sum of Squares tolerance (0.0 to 1.0). Controls cluster granularity vs. computational efficiency trade-off

  • sub_national_unit_tag (str) -- Column name identifying the administrative region (e.g., province or municipality) used to cluster cells region by region

  • resource_type (str) -- Renewable energy resource type ('solar', 'wind', 'bess') Determines which columns to use for clustering analysis

  • sort_columns (list) -- Column names for sorting cells before cluster assignment Typically includes LCOE or other ranking metrics

Returns:

  • cells_cluster_map_df: Individual cells with assigned cluster numbers

  • optimal_k_df: Summary of optimal cluster counts by region

Return type:

tuple[pd.DataFrame, pd.DataFrame]

Examples

Perform clustering for wind resources:

>>> cluster_map, optimal_k = cells_to_cluster_mapping(
...     cells_scored=wind_cells_scored,
...     vis_directory="vis/Alberta",
...     wcss_tolerance=0.20,
...     resource_type="wind",
...     sort_columns=["lcoe_wind", "potential_capacity_wind"]
... )
>>> print(f"Created {cluster_map['Cluster_No'].max()} clusters across regions")

Clustering Methodology#

The clustering approach follows several key principles:

  1. Regional Separation: Clustering is performed independently for each administrative region to maintain spatial coherence and respect political boundaries that affect renewable energy development.

  2. LCOE-Based Similarity: Cells are grouped based on Levelized Cost of Electricity (LCOE) as the primary similarity metric, ensuring clusters represent similar economic viability.

  3. Sorted Assignment: Within each region, cells are sorted by specified metrics (typically LCOE) before being assigned to clusters, ensuring that the best cells are distributed across clusters.

  4. Equal Distribution: Cells are divided as evenly as possible across the optimal number of clusters for each region, preventing cluster size imbalances.

Algorithm Workflow#

  1. Preprocessing: Call pre_process_cluster_mapping to determine optimal k

  2. Region Filtering: Focus on regions with valid optimal cluster numbers

  3. Cell Sorting: Sort cells within each region by specified criteria

  4. Cluster Assignment: Divide sorted cells into optimal number of groups

  5. Remainder Handling: Merge small remainder groups into larger clusters

  6. Numbering: Assign sequential cluster numbers within each region

Cluster Assignment Strategy#

For each region with n cells and k optimal clusters:
  • Calculate step_size = n ÷ k

  • Assign cells [0:step_size] to cluster 1

  • Assign cells [step_size:2*step_size] to cluster 2

  • Continue until all cells are assigned

  • Merge any remainder cells into the last cluster

This ensures balanced cluster sizes while maintaining economic similarity through the pre-sorting step.
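Equivalently, in pandas/numpy terms (region_cells is an assumed, already-scored DataFrame with at least k rows):

>>> import numpy as np
>>> cells_sorted = region_cells.sort_values("lcoe_wind")   # sort by ranking metric
>>> k = 4                                                  # optimal k for this region (assumed)
>>> step = len(cells_sorted) // k
>>> # integer division assigns equal blocks; remainder cells fold into the last cluster
>>> cells_sorted["Cluster_No"] = np.minimum(np.arange(len(cells_sorted)) // step, k - 1) + 1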

Output Data Structure#

cells_cluster_map_df contains:
  • All original cell attributes (LCOE, capacity, coordinates, etc.)

  • 'Cluster_No': Integer cluster identifier within region

  • 'Optimal_k': Total number of clusters for the cell's region

  • 'cell': Unique cell identifier (as index)

optimal_k_df contains:
  • sub_national_unit_tag: Administrative region unit (e.g., region or municipality)

  • 'Optimal_k': Optimal number of clusters determined for the region

Performance Considerations#

  • Memory usage scales linearly with number of cells

  • Processing time increases with number of regions and complexity

  • Sorting operations may be memory-intensive for large datasets

  • Cluster assignment is efficient O(n) operation per region

Quality Assurance#

  • Validates that all cells receive cluster assignments

  • Ensures cluster numbers are sequential within regions

  • Maintains data integrity through concatenation operations

  • Preserves spatial relationships through regional processing

Notes

  • Clustering preserves regional boundaries for political/administrative coherence

  • LCOE-based sorting ensures economic similarity within clusters

  • Balanced cluster sizes improve downstream optimization performance

  • Results are suitable for capacity expansion and dispatch optimization models

  • Cluster numbering resets for each region (regional scope)

Raises:
  • ValueError -- If required columns are missing or data validation fails

  • KeyError -- If region names don't match between datasets

  • RuntimeError -- If clustering assignment produces invalid results

See also

pre_process_cluster_mapping

Preprocessing and optimal k determination

create_cells_Union_in_clusters

Spatial union of clustered cells

find_optimal_K

Core optimal cluster number determination

RES.cluster.clip_cluster_boundaries_upto_regions(cell_cluster_gdf: GeoDataFrame, gadm_regions_gdf: GeoDataFrame, resource_type) GeoDataFrame[source]

Clip cluster boundaries to precise regional administrative boundaries.

Refines cluster geometries by clipping them to exact administrative boundaries, ensuring that cluster extents respect political and administrative divisions. This final processing step removes any geometric artifacts from the clustering process and aligns results with official regional boundaries.

The function performs spatial clipping operations to trim cluster polygons to the precise extent of administrative regions, maintaining data integrity while ensuring geographic accuracy for policy and planning applications.

Parameters:
  • cell_cluster_gdf (gpd.GeoDataFrame) -- Unified cluster geometries from create_cells_Union_in_clusters Contains cluster polygons that may extend beyond regional boundaries

  • gadm_regions_gdf (gpd.GeoDataFrame) -- Official administrative boundary geometries from GADM dataset Defines precise regional extents for clipping operations

  • resource_type (str) -- Resource type identifier ('solar', 'wind', 'bess') Used for column identification and sorting operations

Returns:

Clipped cluster geometries with boundaries precisely aligned to administrative regions, sorted by LCOE in ascending order

Return type:

gpd.GeoDataFrame

Examples

Clip wind clusters to provincial boundaries:

>>> clipped_clusters = clip_cluster_boundaries_upto_regions(
...     cell_cluster_gdf=unified_clusters,
...     gadm_regions_gdf=provincial_boundaries,
...     resource_type="wind"
... )
>>> print(f"Clipped {len(clipped_clusters)} clusters to regional boundaries")

Clipping Operations#

  1. Spatial Intersection: Clips cluster geometries using administrative boundaries

  2. Topology Preservation: Maintains valid polygon geometry after clipping

  3. Attribute Retention: Preserves all cluster attributes through clipping

  4. Multi-geometry Handling: Manages potential multi-polygon results

Boundary Alignment Benefits#

  • Policy Compliance: Ensures clusters respect administrative jurisdictions

  • Planning Accuracy: Aligns with regional energy planning boundaries

  • Data Integrity: Removes geometric inconsistencies from processing

  • Visualization Quality: Improves map accuracy for stakeholder communication

Geometric Considerations#

  • Handles edge cases where clusters span multiple regions

  • Preserves cluster identity even after boundary clipping

  • Maintains geometric validity through robust clipping algorithms

  • May create multi-polygon geometries for clusters crossing boundaries

Sorting and Organization#

Results are sorted by LCOE in ascending order to facilitate:

  • Economic dispatch optimization

  • Merit order analysis

  • Least-cost development planning

  • Investment prioritization

Quality Assurance#

  • Validates geometric integrity after clipping operations

  • Ensures all clusters remain within administrative boundaries

  • Maintains attribute consistency through spatial operations

  • Preserves cluster ranking and identification

Performance Notes#

  • Clipping operations scale with geometric complexity

  • Large regions or detailed boundaries increase processing time

  • Memory usage depends on cluster and boundary detail level

  • Results are optimized for downstream energy modeling applications

Use Cases#

  • Regulatory Compliance: Ensuring development respects jurisdictions

  • Policy Analysis: Aligning renewable development with administrative units

  • Planning Integration: Connecting energy models with regional planning

  • Stakeholder Communication: Accurate maps for decision-maker engagement

Notes

  • Final step in the clustering workflow before energy system modeling

  • Essential for maintaining political and administrative coherence

  • Improves visual quality of cluster maps and analysis results

  • Ensures compatibility with regional energy planning frameworks

  • Results are ready for capacity expansion and dispatch optimization

raises GeometryError:

If clipping operations produce invalid geometries

raises ValueError:

If input datasets have incompatible coordinate systems

raises AttributeError:

If required columns are missing from input data

See also

create_cells_Union_in_clusters

Preceding cluster creation function

gpd.GeoDataFrame.clip

Core spatial clipping operation

RES.boundaries.GADMBoundaries

Administrative boundary data source

RES.cluster.create_cells_Union_in_clusters(cluster_map_gdf: GeoDataFrame, region_optimal_k_df: DataFrame, sub_national_unit_tag: str, resource_type: str) tuple[DataFrame, dict][source]

Create unified cluster geometries by dissolving individual cell boundaries.

Transforms individual grid cells assigned to clusters into unified cluster geometries through spatial union operations. This process aggregates both geometric boundaries and techno-economic attributes to create representative cluster entities suitable for energy system optimization models.

The function performs spatial dissolve operations grouped by cluster number within each region, creating cohesive cluster polygons while maintaining traceability back to original cells through detailed index mapping.

Parameters:
  • cluster_map_gdf (gpd.GeoDataFrame) -- Grid cells with cluster assignments from cells_to_cluster_mapping. Must contain the defined sub_national_unit_tag, 'Cluster_No', and geometric attributes.

  • region_optimal_k_df (pd.DataFrame) -- Summary of optimal cluster numbers by region. Contains the defined sub_national_unit_tag and 'Optimal_k' columns.

  • sub_national_unit_tag (str) -- Column name identifying the sub-national administrative unit (e.g., 'Region' or 'Municipality').

  • resource_type (str) -- Resource type identifier ('solar', 'wind', 'bess'). Used for column naming and aggregation rules.

Returns:

  • dissolved_gdf: Unified cluster geometries with aggregated attributes

  • dissolved_indices: Mapping of cluster to original cell indices

Return type:

tuple[pd.DataFrame, dict]

Examples

Create unified solar clusters:

>>> clusters_gdf, cell_mapping = create_cells_Union_in_clusters(
...     cluster_map_gdf=mapped_cells,
...     region_optimal_k_df=optimal_k_summary,
...     sub_national_unit_tag="Region",
...     resource_type="solar"
... )
>>> print(f"Created {len(clusters_gdf)} unified clusters")
>>> print(f"Cluster 1 contains {len(cell_mapping['BC'][1])} original cells")

Aggregation Strategy#

Different attributes are aggregated using specific strategies, as sketched after this list:

Economic Metrics:
  • LCOE: Median value (representative of cluster economics)

  • CAPEX, FOM, VOM: First value (uniform within region/technology)

Performance Metrics:
  • Capacity Factor: Mean value (average performance)

  • Potential Capacity: Sum (total cluster capacity)

Infrastructure Metrics:
  • Nearest Station: First value (primary connection point)

  • Distance to Grid: First value (representative distance)

Classification:
  • Region, Cluster_No: First value (preserved identity)
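A minimal sketch of this per-cluster aggregation, assuming geopandas' dissolve with a per-column aggfunc; the helper name and column names are illustrative, not the library's actual implementation:

import geopandas as gpd

def sketch_dissolve_clusters(region_cells: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Union cell geometries per cluster and aggregate attributes per the rules above."""
    return region_cells.dissolve(
        by="Cluster_No",
        aggfunc={
            "lcoe": "median",             # representative cluster economics
            "capex": "first",             # uniform within region/technology
            "CF_mean": "mean",            # average performance
            "potential_capacity": "sum",  # total cluster capacity
            "nearest_station": "first",   # primary connection point
        },
    )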

Geometric Operations#

  1. Spatial Dissolve: Union of cell geometries within each cluster

  2. Topology Preservation: Maintains valid polygon geometry

  3. Attribute Aggregation: Combines cell attributes per aggregation rules

  4. Index Tracking: Records original cell indices for each cluster

Output Structure#

dissolved_gdf contains unified clusters with:

  • 'cluster_id': Unique cluster identifier (as index)

  • sub_national_unit_tag: Administrative region unit (e.g., Region or Municipality)

  • 'Cluster_No': Sequential cluster number within region

  • 'Rank': Cluster ranking based on LCOE (ascending)

  • Economic attributes: Aggregated costs and performance metrics

  • 'geometry': Unified cluster polygon geometry

dissolved_indices structure:

{
    'region_name': {
        cluster_no: [list_of_original_cell_indices],
        ...
    },
    ...
}

Processing Workflow#

  1. Region Iteration: Process each region independently

  2. Cluster Grouping: Group cells by cluster number within region

  3. Index Recording: Store original cell indices before dissolving

  4. Spatial Dissolve: Union geometries and aggregate attributes

  5. Result Compilation: Concatenate all dissolved clusters

  6. ID Assignment: Generate unique cluster identifiers

  7. Ranking: Sort and rank clusters by economic metrics

  8. Column Cleanup: Standardize column names for downstream use

Traceability Features#

The dissolved_indices dictionary enables:

  • Mapping clusters back to constituent cells (see the example after this list)

  • Detailed analysis of cluster composition

  • Validation of aggregation results

  • Disaggregation for detailed reporting
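For example, a cluster's constituent cells can be retrieved for disaggregated reporting (the region key and DataFrame names below are illustrative):

>>> original_cells = dissolved_indices['BC'][1]        # cell indices of cluster 1
>>> cluster_detail = cells_scored.loc[original_cells]  # per-cell reporting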

Quality Assurance#

  • Validates that all cells are included in clusters

  • Ensures geometric validity after spatial operations

  • Maintains attribute consistency through aggregation

  • Preserves regional and cluster identity information

Performance Considerations#

  • Memory usage scales with cluster complexity and number

  • Spatial operations may be computationally intensive

  • Large clusters with many cells require more processing time

  • Geometric simplification may be beneficial for very detailed cells

Notes

  • Cluster ranking facilitates economic dispatch optimization

  • Column name standardization removes resource type suffixes

  • Median LCOE provides robust cluster economic representation

  • Spatial union preserves geographic relationships

  • Results are optimized for energy system modeling workflows

Raises:
  • ValueError -- If cluster assignments are invalid or missing

  • GeometryError -- If spatial dissolve operations fail

  • KeyError -- If required columns are missing from input data

See also

cells_to_cluster_mapping

Preceding cluster assignment function

clip_cluster_boundaries_upto_regions

Boundary refinement function

gpd.GeoDataFrame.dissolve

Core spatial dissolve operation

RES.cluster.find_optimal_K(resource_type: str, data_for_clustering: DataFrame, region: str, wcss_tolerance: float, max_k: int) DataFrame[source]

Determine optimal number of clusters using elbow method and WCSS tolerance.

Analyzes grid cells with renewable energy characteristics to find the optimal number of K-means clusters using the elbow method. The Within-Cluster Sum of Squares (WCSS) tolerance parameter controls the trade-off between cluster representation accuracy and computational complexity.

The function iteratively tests different cluster numbers (k) and calculates WCSS for each configuration. The optimal k is determined when WCSS falls below the specified tolerance threshold, indicating diminishing returns for additional clusters.

Parameters:
  • resource_type (str) -- Type of renewable energy resource ('solar', 'wind', 'bess'). Used for labeling and file naming.

  • data_for_clustering (pd.DataFrame) -- Preprocessed data containing clustering features (LCOE, capacity). Must have no missing or infinite values.

  • region (str) -- Name of the administrative region being processed. Used for plot titles and output messages.

  • wcss_tolerance (float) -- Tolerance threshold as a fraction of total WCSS (0.0 to 1.0). Lower values yield more clusters; higher values yield fewer.

  • max_k (int) -- Maximum number of clusters to test. Limited by data size and computational constraints.

Returns:

Optimal number of clusters for the region. Returns None if no optimal k is found within the tolerance.

Return type:

int or None

Examples

Find optimal clusters for solar data:

>>> optimal_k = find_optimal_K(
...     resource_type="solar",
...     data_for_clustering=clean_data,
...     region="British Columbia",
...     wcss_tolerance=0.15,
...     max_k=20
... )
>>> print(f"Optimal clusters: {optimal_k}")

Notes

  • WCSS measures squared distances from cluster centroids

  • Higher WCSS tolerance leads to fewer, more aggregated clusters

  • Lower WCSS tolerance leads to more, finer-grained clusters

  • Elbow plots are automatically generated and displayed

  • Function uses K-means with 10 random initializations for stability

  • Processing time increases quadratically with max_k

Algorithm Details#

  1. Test k from 1 to min(max_k, data_size)

  2. Calculate WCSS (inertia) for each k using K-means

  3. Compute tolerance threshold as fraction of total WCSS

  4. Find first k where WCSS ≤ tolerance threshold

  5. Generate elbow plot with optimal k marked

The WCSS measures the sum of squared distances between each data point and its assigned cluster centroid. Lower WCSS indicates tighter, more homogeneous clusters but may lead to over-segmentation.
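A minimal sketch of this selection logic, assuming scikit-learn's KMeans (the notes above mention 10 random initializations); the helper name is hypothetical and plotting is omitted:

import pandas as pd
from sklearn.cluster import KMeans

def sketch_optimal_k(data: pd.DataFrame, wcss_tolerance: float, max_k: int):
    """Return the first k whose WCSS drops below the tolerance threshold."""
    wcss = []
    for k in range(1, min(max_k, len(data)) + 1):
        # random_state fixed here only for reproducibility of the sketch
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
        wcss.append(km.inertia_)  # within-cluster sum of squares at this k
    threshold = wcss_tolerance * wcss[0]  # fraction of total WCSS (k = 1)
    for k, value in enumerate(wcss, start=1):
        if value <= threshold:
            return k  # diminishing returns beyond this k
    return None  # no k satisfied the tolerance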

Raises:
  • ValueError -- If data_for_clustering is empty or contains only NaN values

  • RuntimeError -- If K-means clustering fails for any k value

See also

sklearn.cluster.KMeans

K-means clustering implementation

pre_process_cluster_mapping

Preprocessing function that calls this method

RES.cluster.pre_process_cluster_mapping(cells_scored: DataFrame, vis_directory: str, wcss_tolerance: float, sub_national_unit_tag: str, resource_type: str) tuple[DataFrame, DataFrame][source]

Preprocess data and determine optimal cluster numbers for each region.

Performs comprehensive preprocessing of scored grid cells to prepare them for K-means clustering analysis. The function handles missing data, determines optimal cluster numbers for each administrative region, and generates visualization outputs for clustering analysis.

This function serves as the preprocessing pipeline that prepares raw scored cell data for the main clustering workflow, ensuring data quality and generating region-specific clustering parameters.

Parameters:
  • cells_scored (pd.DataFrame) -- GeoDataFrame containing scored grid cells with LCOE and capacity data. Must include columns: 'Region', 'lcoe_{resource_type}', 'potential_capacity_{resource_type}'.

  • vis_directory (str) -- Base directory path for saving visualization outputs. Elbow plots are saved in the subdirectory 'Regional_cluster_Elbow_Plots'.

  • wcss_tolerance (float) -- WCSS tolerance threshold for optimal cluster determination (0.0 to 1.0). Controls the trade-off between cluster number and representation accuracy.

  • sub_national_unit_tag (str) -- Column name identifying the sub-national administrative unit used for region-by-region processing.

  • resource_type (str) -- Resource type identifier ('solar', 'wind', 'bess'). Used for column name construction and labeling.

Returns:

  • cells_scored_cluster_mapped: Enhanced cell data with optimal k values and cell IDs

  • region_optimal_k_df: Summary of optimal cluster numbers by region

Return type:

tuple[pd.DataFrame, pd.DataFrame]

Examples

Preprocess solar cell data:

>>> cells_mapped, optimal_k_summary = pre_process_cluster_mapping(
...     cells_scored=scored_solar_cells,
...     vis_directory="vis/BC",
...     wcss_tolerance=0.15,
...     sub_national_unit_tag="Region",
...     resource_type="solar"
... )
>>> print(f"Processed {len(cells_mapped)} cells across {len(optimal_k_summary)} regions")

Processing Workflow#

  1. Region Iteration: Process each unique administrative region separately

  2. Data Validation: Check for required columns and sufficient data

  3. Data Cleaning: Handle infinite values and missing data through imputation

  4. Optimal K Finding: Apply elbow method to determine cluster numbers

  5. Visualization: Generate and save elbow plots for each region

  6. Data Integration: Merge optimal k values back to cell data

  7. ID Assignment: Generate unique cell identifiers for data linking

Data Quality Handling#

  • Missing Columns: Regions without required columns are skipped

  • Infinite Values: Replaced with NaN for proper imputation

  • Empty Data: Regions with insufficient data are excluded

  • Imputation: Uses mean strategy for missing value replacement (see the sketch after this list)

  • Zero Clusters: Regions with optimal_k=0 are filtered out
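A minimal sketch of the cleaning and imputation steps above, assuming scikit-learn's SimpleImputer; the helper name is hypothetical and the exact column handling is illustrative:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

def sketch_clean_features(cells: pd.DataFrame, resource_type: str) -> pd.DataFrame:
    """Replace infinite values with NaN, then mean-impute missing entries."""
    cols = [f"lcoe_{resource_type}", f"potential_capacity_{resource_type}"]
    features = cells[cols].replace([np.inf, -np.inf], np.nan)  # infinities -> NaN
    imputer = SimpleImputer(strategy="mean")                   # mean-strategy imputation
    return pd.DataFrame(imputer.fit_transform(features),
                        columns=cols, index=features.index)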

Output Structure#

cells_scored_cluster_mapped contains:

  • All original cell attributes

  • 'Optimal_k': Optimal cluster number for the cell's region

  • 'cell': Unique cell identifier (set as index)

region_optimal_k_df contains:

  • 'Region': Administrative region name

  • 'Optimal_k': Optimal number of clusters for the region

Visualization Outputs#

Generates elbow plots saved to: {vis_directory}/Regional_cluster_Elbow_Plots/elbow_plot_region_{region}.png

Each plot shows:

  • WCSS vs. number of clusters

  • Optimal k marked with a vertical line

  • Region-specific title and labels

Notes

  • Processing is performed region-by-region for spatial coherence

  • Imputation strategy can affect clustering quality

  • Visualization directory is created if it doesn't exist

  • Regions with insufficient data (< 2 cells) may be skipped

  • Memory usage scales with number of regions and cells per region

Raises:
  • ValueError -- If vis_directory path is invalid or cannot be created

  • KeyError -- If required columns are missing from cells_scored

  • RuntimeError -- If imputation or clustering fails for critical regions

See also

find_optimal_K

Core optimal cluster determination function

assign_cluster_id

Cell identifier generation function

cells_to_cluster_mapping

Main clustering workflow function

Note

If the above documentation doesn't render properly, this module provides clustering algorithms for renewable energy resource grouping and analysis.

Key functions from RES.cluster:

  • assign_cluster_id(): Generate unique cell identifiers

  • determine_elbow_optimal_clusters(): Automatic cluster number optimization

  • cluster_sites(): K-means clustering with economic weighting

  • get_representative_timeseries(): Cluster-representative time series generation

Visualization Tools#

Comprehensive plotting and mapping utilities.

Visualization and plotting utilities for renewable energy resource assessment.

This module provides comprehensive visualization tools for displaying renewable energy assessment results including spatial maps, time series plots, capacity distributions, economic analysis charts, and interactive dashboards. It supports both static publication-quality figures and interactive web-based visualizations.

The visualization tools are designed to facilitate analysis interpretation, result communication, and workflow debugging through clear, informative graphics that highlight spatial patterns, temporal variations, and economic trade-offs in renewable energy development potential.

Key Functions:
  • Spatial mapping: Choropleth maps of resource potential and constraints

  • Time series visualization: Capacity factor profiles and seasonal patterns

  • Economic analysis: LCOE distributions and cost component breakdowns

  • Cluster visualization: Site groupings and representative characteristics

  • Interactive dashboards: Web-based exploration interfaces

  • Export utilities: High-resolution figure generation for publications

Dependencies:
  • matplotlib/seaborn: Static plotting and publication graphics

  • plotly: Interactive visualizations and dashboards

  • folium: Web-based interactive maps

  • geopandas: Spatial data visualization

  • xarray: Multi-dimensional data plotting

RES.visuals.add_compass_arrow(ax, x: float = 0.9, y: float = 0.9, fontsize: float = 9, color: str = 'grey', length: float = 0.05, text_offset: float = 0.01, arrow_head_width: float = 6, arrow_width=1.5)[source]

Adds a simple north arrow to the plot.

Parameters:
  • ax (matplotlib.axes.Axes) -- The plot axes to annotate.

  • x (float) -- X position in axes fraction coordinates.

  • y (float) -- Y position in axes fraction coordinates.

  • length (float) -- Length of the arrow in axes fraction units.

  • text_offset (float) -- Offset for the 'N' label below the arrow.

RES.visuals.add_compass_arrow_custom(ax, x: float = 0.9, y: float = 0.9, fontsize: float = 9, color: str = 'grey', length: float = 0.01, text_offset: float = 0.01, arrow_head_width: float = 6, arrow_border_width: float = 0.5, text: str = 'N')[source]

Alternative version with more arrow head customization. Uses the older arrow method for more control over head dimensions.

RES.visuals.add_compass_to_plot(ax, x_offset=0.76, y_offset=0.92, size=14, triangle_size=0.02)[source]

Adds a simple upward-pointing triangle with an 'N' label below it as a North indicator.

Parameters:
  • ax (matplotlib.axes.Axes) -- The plot axes to annotate.

  • x_offset (float) -- X position in axes fraction coordinates.

  • y_offset (float) -- Y position in axes fraction coordinates.

  • size (int) -- Font size for the 'N' label.

  • triangle_size (float) -- Radius of the triangle (in axes fraction units).
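Example

A usage sketch (boundary_gdf is an illustrative GeoDataFrame):

>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots()
>>> boundary_gdf.plot(ax=ax, color='whitesmoke', edgecolor='grey')
>>> add_compass_to_plot(ax, x_offset=0.76, y_offset=0.92)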

RES.visuals.create_key_data_map_interactive(province_gadm_regions_gdf: GeoDataFrame, provincial_conservation_protected_lands: GeoDataFrame, aeroway_with_buffer_solar: GeoDataFrame, aeroway_with_buffer_wind: GeoDataFrame, aeroway: GeoDataFrame, provincial_bus_gdf: GeoDataFrame, current_region: dict, about_OSM_data: dict[dict], map_html_save_to: str)[source]

Creates an interactive map with key data for a specific province, including regions, conservation lands, aeroways, and bus nodes.

Parameters:
  • province_gadm_regions_gdf (gpd.GeoDataFrame) -- GeoDataFrame containing the province's administrative regions.

  • provincial_conservation_protected_lands (gpd.GeoDataFrame) -- GeoDataFrame containing conservation and protected lands.

  • aeroway_with_buffer_solar (gpd.GeoDataFrame) -- GeoDataFrame containing solar aeroways with buffer zones.

  • aeroway_with_buffer_wind (gpd.GeoDataFrame) -- GeoDataFrame containing wind aeroways with buffer zones.

  • aeroway (gpd.GeoDataFrame) -- GeoDataFrame containing aeroways.

  • provincial_bus_gdf (gpd.GeoDataFrame) -- GeoDataFrame containing provincial electrical bus nodes (grid connection points).

  • current_region (dict) -- Dictionary containing information about the current region.

  • about_OSM_data (dict[dict]) -- Dictionary containing information about OSM data.

  • map_html_save_to (str) -- File path for saving the generated interactive HTML map.

RES.visuals.create_raster_image_with_legend(raster: str, cmap: str, title: str, plot_save_to: str)[source]

Creates a raster image with a legend for land classes.

RES.visuals.create_sites_ts_plots_all_sites(resource_type: str, CF_ts_df: DataFrame, save_to_dir: str)[source]

Creates an interactive timeseries plot for the top sites of a given resource type.

Parameters:
  • resource_type (str) -- The type of resource (e.g., 'solar', 'wind').

  • CF_ts_df (pd.DataFrame) -- DataFrame containing the capacity factor timeseries data.

  • save_to_dir (str) -- Directory to save the plot.

RES.visuals.create_sites_ts_plots_all_sites_2(resource_type: str, CF_ts_df: DataFrame, save_to_dir: str)[source]
RES.visuals.create_timeseries_interactive_plots(ts_df: DataFrame, save_to_dir: str)[source]
RES.visuals.create_timeseries_plots(cells_df, CF_timeseries_df, max_resource_capacity, dissolved_indices, resampling, representative_color_palette, std_deviation_gradient, vis_directory)[source]
RES.visuals.create_timeseries_plots_solar(cells_df, CF_timeseries_df, dissolved_indices, max_solar_capacity, resampling, solar_vis_directory)[source]

Generates time series plots for solar capacity factor (CF) data.

Parameters:
  • cells_df (pd.DataFrame) -- DataFrame containing cell information.

  • CF_timeseries_df (pd.DataFrame) -- DataFrame containing capacity factor time series data.

  • dissolved_indices (dict) -- Dictionary mapping regions and cluster numbers to indices in CF_timeseries_df.

  • max_solar_capacity (float) -- Maximum solar capacity for investment.

  • resampling (str) -- Resampling frequency for the time series data.

  • solar_vis_directory (str) -- Directory to save the generated plots.

RES.visuals.get_CF_wind_check_plot(cells: GeoDataFrame, gwa_raster_data: DataArray, boundary: GeoDataFrame, region_code: str, region_name: str, columns: list, figure_height: int = 7, font_family: str = 'sans-serif', save_to: str | Path = None)[source]

Plots GWA benchmark (left), CF_IEC3 (middle), and wind_CF_mean (right).

RES.visuals.get_conservation_lands_plot(CPCAD_actual: GeoDataFrame, CPCAD_with_buffer: GeoDataFrame, save_to: Path | str, font_family: str = 'sans-serif')[source]

Creates a plot comparing original and buffered conservation lands.

RES.visuals.get_data_in_map_plot(cells, resource_type: str = None, datafield: str = None, title: str = None, ax=None, compass_size: float = 10, font_family: str = None, discalimers: bool = False, show=True)[source]

Plots a map of renewable energy resources (solar or wind) with capacity factor, potential capacity, or LCOE.

Parameters:
  • cells (gpd.GeoDataFrame) -- GeoDataFrame containing the resource data.

  • resource_type (str, optional) -- Type of renewable resource ('solar' or 'wind'). Defaults to None.

  • datafield (str, optional) -- Data field to plot ('CF', 'CAPACITY', or 'SCORE'). Defaults to None.

  • title (str, optional) -- Title for the plot. Defaults to None.

  • ax (matplotlib.axes.Axes, optional) -- Axes to plot on. If None, a new figure and axes are created. Defaults to None.

  • compass_size (float, optional) -- Size of the compass in the plot. Defaults to 10.

  • font_family (str, optional) -- Font family for text in the plot. Defaults to 'sans-serif'.

  • discalimers (bool, optional) -- Whether to include disclaimers in the plot. Defaults to False.

  • show (bool, optional) -- Whether to display the plot. Defaults to True.

Returns:

The axes with the plotted map.

Return type:

ax (matplotlib.axes.Axes)
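Example

A usage sketch (the scored_cells GeoDataFrame and output path are illustrative):

>>> ax = get_data_in_map_plot(
...     cells=scored_cells,
...     resource_type='wind',
...     datafield='CF',
...     show=False
... )
>>> ax.figure.savefig('vis/wind_CF_map.png', dpi=300)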

RES.visuals.get_selected_vs_missed_visuals(cells: GeoDataFrame, province_short_code, resource_type, lcoe_threshold: float, CF_threshold: float, capacity_threshold: float, text_box_x=0.4, text_box_y=0.95, title_y=1, title_x=0.6, font_size=10, dpi=1000, figsize=(12, 7), save=False)[source]

Generate visualizations for selected vs missed cells.

Parameters:
  • cells (gpd.GeoDataFrame) -- GeoDataFrame containing cell data.

  • province_short_code (str) -- Short code for the province.

  • resource_type (str) -- Type of renewable resource (e.g., 'solar', 'wind').

  • lcoe_threshold (float) -- LCOE threshold used to separate selected from missed cells.

  • CF_threshold (float) -- Capacity factor threshold used to separate selected from missed cells.

  • capacity_threshold (float) -- Potential capacity threshold used to separate selected from missed cells.

  • text_box_x (float, optional) -- X position of the annotation text box in axes fraction coordinates. Defaults to 0.4.

  • text_box_y (float, optional) -- Y position of the annotation text box in axes fraction coordinates. Defaults to 0.95.

  • title_y (int, optional) -- Y position of the plot title. Defaults to 1.

  • title_x (float, optional) -- X position of the plot title. Defaults to 0.6.

  • font_size (int, optional) -- Font size for plot text. Defaults to 10.

  • dpi (int, optional) -- Resolution of the figure in dots per inch. Defaults to 1000.

  • figsize (tuple, optional) -- Figure size in inches. Defaults to (12, 7).

  • save (bool, optional) -- Whether to save the generated figure. Defaults to False.

RES.visuals.get_stepwise_availability_plots(excluder: ExclusionContainer, region_shape: GeoDataFrame, raster_configs: list[dict], vector_configs: list[dict], save_to: str | Path)[source]
RES.visuals.plot_data_in_GADM_regions(dataframe, data_column_df, gadm_regions_gdf, color_map, dpi, plt_title, plt_file_name, vis_directory)[source]

Plots data from a DataFrame on GADM regions using GeoPandas and Matplotlib.

Parameters:
  • dataframe (pd.DataFrame) -- DataFrame containing the data to plot.

  • data_column_df (str) -- Name of the column in the DataFrame to plot.

  • gadm_regions_gdf (gpd.GeoDataFrame) -- GeoDataFrame containing the GADM regions.

  • color_map (str) -- Name of the color map to use for the plot.

  • dpi (int) -- Dots per inch for the plot.

  • plt_title (str) -- Title of the plot.

  • plt_file_name (str) -- File name for saving the plot.

  • vis_directory (str) -- Directory for saving the visualization.

RES.visuals.plot_gaez_raster_with_boundary(raster_path, legend_csv, gdf_path, dst_crs='EPSG:4326', figsize=(12, 7), compass_length=0.1, font_family='serif', title=None, plot_save_to=None)[source]

Plot a GAEZ categorical raster with a shadowed boundary layer using colors from CSV.

RES.visuals.plot_grid_lines(region_code: str, region_name: str, lines: GeoDataFrame, boundary: GeoDataFrame, font_family: str = None, figsize: tuple = (10, 8), dpi=500, save_to: str | Path = None, show: bool = True)[source]

Plots transmission lines with binned voltage levels in a specified region.

RES.visuals.plot_resources_scatter_metric(resource_type: str, clusters_resources: GeoDataFrame, lcoe_threshold: float = 999, color=None, save_to_root: str | Path = 'vis')[source]

Generate a scatter plot visualizing the relationship between Capacity Factor (CF) and Levelized Cost of Energy (LCOE) for renewable energy resources (solar or wind). The plot highlights clusters of resources based on their potential capacity.

Parameters:
  • resource_type (str) -- The type of renewable resource to plot. Must be either 'solar' or 'wind'.

  • clusters_resources (gpd.GeoDataFrame) -- A GeoDataFrame containing resource cluster data. Expected columns include 'CF_mean' (average capacity factor of the resource cluster), 'lcoe' (Levelized Cost of Energy for the resource cluster), and 'potential_capacity' (potential capacity of the resource cluster, used for bubble size).

  • lcoe_threshold (float) -- The maximum LCOE value to include in the plot. Clusters with LCOE above this threshold are excluded.

  • color (optional) -- Custom color for the scatter plot bubbles. Defaults to 'darkorange' for solar and 'navy' for wind.

  • save_to_root (str | Path, optional) -- Directory path where the plot image will be saved. Defaults to 'vis'.

Returns:

The function saves the generated plot as a PNG image in the specified directory.

Return type:

None

Notes

  • The size of the bubbles in the scatter plot represents the potential capacity of the resource clusters.

  • The x-axis (CF_mean) is formatted as percentages for better readability.

  • A legend is included to indicate the bubble sizes in gigawatts (GW).

  • The plot includes an annotation explaining the scoring methodology for LCOE.

  • The plot is saved as a transparent PNG image with a resolution of 600 dpi.

Example

>>> plot_resources_scatter_metric(
...     resource_type='solar',
...     clusters_resources=solar_clusters_gdf,
...     lcoe_threshold=50,
...     save_to_root='output/plots'
... )
RES.visuals.plot_resources_scatter_metric_combined(solar_clusters: DataFrame, wind_clusters: DataFrame, bubbles_GW: list = [1, 5, 10], bubbles_scale: float = 0.4, lcoe_threshold: float = 200, font_family=None, figsize=(3.5, 2.5), dpi=1000, save_to_root: str = 'vis', set_transparent: bool = False)[source]

Plot combined scatter metrics for solar and wind resources.

Parameters:
  • solar_clusters (pd.DataFrame) -- DataFrame containing solar cluster data.

  • wind_clusters (pd.DataFrame) -- DataFrame containing wind cluster data.

  • bubbles_GW (list, optional) -- List of bubble sizes in GW. Defaults to [1, 5, 10].

  • bubbles_scale (float, optional) -- Scaling factor for bubble sizes. Defaults to 0.4.

  • lcoe_threshold (float, optional) -- LCOE threshold for filtering. Defaults to 200.

  • font_family (str, optional) -- Font family for the plot. Defaults to 'sans-serif'.

  • save_to_root (str, optional) -- Directory to save the plot. Defaults to 'vis'.

  • set_transparent (bool, optional) -- Whether to set the background transparent. Defaults to False.
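Example

A usage sketch (the cluster DataFrames are illustrative):

>>> plot_resources_scatter_metric_combined(
...     solar_clusters=solar_clusters_df,
...     wind_clusters=wind_clusters_df,
...     lcoe_threshold=150,
...     save_to_root='vis'
... )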

RES.visuals.plot_with_matched_cells(ax, cells: GeoDataFrame, filtered_cells: GeoDataFrame, column: str, cmap: str, background_cell_linewidth: float, selected_cells_linewidth: float, font_size: int = 9)[source]

Helper function to plot cells with matched cells overlay.

RES.visuals.size_for_legend(mw)[source]

Calculate bubble size for capacity-based map legends.

Converts megawatt capacity values to appropriate bubble sizes for proportional symbol maps, ensuring visual clarity and proper scaling across different capacity ranges.

Parameters:

mw (float) -- Capacity value in megawatts

Returns:

Scaled bubble size for mapping visualization

Return type:

float

Examples

>>> size_for_legend(100)  # 100 MW site
50.0
>>> size_for_legend(500)  # 500 MW site
150.0
RES.visuals.visualize_ss_nodes(substations_gdf, provincem_gadm_regions_gdf: GeoDataFrame, plot_name)[source]

Visualizes transmission nodes (buses) on a map with different colors based on substation types.

Parameters:
  • substations_gdf (gpd.GeoDataFrame) -- GeoDataFrame containing substations (buses) with a 'substation_type' column.

  • provincem_gadm_regions_gdf (gpd.GeoDataFrame) -- GeoDataFrame containing the base regions to plot.

  • plot_name (str) -- File path to save the plot image.

Returns:

None

The RES.visuals module provides:

  • Spatial mapping with choropleth visualization

  • Time series plotting and seasonal analysis

  • Economic analysis charts and distributions

  • Interactive web-based dashboards

  • Publication-quality figure export

Local Data Store with HDF5 file#

HDF5-based data storage and retrieval.

class RES.hdf5_handler.DataHandler(hdf_file_path: Path = None, silent_initiation: bool | None = True, show_structure: bool | None = False)[source]

Bases: object

A class to handle reading and writing data to an HDF5 file. It provides methods to save DataFrames or GeoDataFrames and is useful for managing large datasets efficiently, allowing quick access to and storage of structured data.

Key Features:
  • Save DataFrames or GeoDataFrames to an HDF5 file with optional geometry handling.

  • Load data from the HDF5 file, converting WKT geometries back to GeoDataFrames.

  • Manage the structure of the HDF5 file, including showing the tree structure and deleting keys.

Dependencies:
  • pandas: For DataFrame operations

  • geopandas: For GeoDataFrame operations

  • h5py: For HDF5 file handling

  • shapely: For geometry serialization and deserialization

store

Path to the HDF5 file.

Type:

Path

data_new

Data to be saved.

Type:

pd.DataFrame or gpd.GeoDataFrame

data_ext

Existing data from the store.

Type:

pd.DataFrame or gpd.GeoDataFrame

updated_data

Updated data after merging new data.

Type:

pd.DataFrame or gpd.GeoDataFrame

__init__(hdf_file_path: Path, silent_initiation: Optional[bool] = True, show_structure: Optional[bool] = False)

Initializes the DataHandler with the file path.

to_store(data: pd.DataFrame or gpd.GeoDataFrame, key: str, hdf_file_path: Path = None, force_update: bool = False)

Saves the DataFrame or GeoDataFrame to the HDF5 file.

from_store(key: str)

Loads data from the HDF5 store and handles geometry conversion.

refresh()

Initializes a new DataHandler instance with the current store path.

show_tree(store_path: Path, show_dataset: bool = False)

Recursively prints the hierarchy of an HDF5 file.

del_key(store_path: Path, key_to_delete: str)

Deletes a specific key from the HDF5 file.

static del_key(store_path, key_to_delete: str)[source]

Deletes a specific key from the HDF5 file.

Parameters:
  • store_path (Path) -- Path to the HDF5 file.

  • key_to_delete (str) -- The key to delete from the HDF5 file.

Raises:

KeyError -- If the key does not exist in the HDF5 file.

Returns:

This method prints the status of the deletion operation.

Return type:

None

Example

>>> DataHandler.del_key(Path('data.h5'), 'my_key')
This will delete 'my_key' from the 'data.h5' file, if it exists.

from_store(key: str)[source]

Load data from the HDF5 store and handle geometry conversion.

Parameters:

key (str) -- Key for loading the DataFrame or GeoDataFrame.

Returns:

The loaded DataFrame or GeoDataFrame.

Return type:

pd.DataFrame or gpd.GeoDataFrame

Raises:
  • FileNotFoundError -- If the key is not found in the store.

  • TypeError -- If the loaded data is not a DataFrame or GeoDataFrame.

refresh()[source]

Initialize a new DataHandler instance with the current store path. This method is useful for reloading the DataHandler with the same store path without needing to reinitialize the entire class.

Parameters:

None

Returns:

A new instance of DataHandler with the same store path.

Return type:

DataHandler

static show_tree(store_path, show_dataset: bool = False)[source]

This method provides a structured view of the keys and datasets within the HDF5 file, allowing users to understand its organization.

Parameters:
  • store_path (Path) -- Path to the HDF5 file.

  • show_dataset (bool) -- If True, also show datasets within the groups.

Raises:

Exception -- If there is an error reading the file.

Returns:

This method prints the structure to the console.

Return type:

None

to_store(data: DataFrame, key: str, hdf_file_path: Path = None, force_update: bool = False)[source]

Save the DataFrame or GeoDataFrame to an HDF5 file.

Parameters:
  • hdf_file_path (Path) -- Path to the HDF5 file. If None, it uses the existing store path.

  • data (pd.DataFrame or gpd.GeoDataFrame) -- The DataFrame or GeoDataFrame to save.

  • key (str) -- Key for saving the DataFrame to the HDF5 file.

  • force_update (bool) -- If True, force update the data even if it exists.

Raises:
  • TypeError -- If the data is not a DataFrame or GeoDataFrame.

  • ValueError -- If the key is empty.
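Example

A minimal usage sketch (the file path, key, and cells_gdf are illustrative):

>>> from pathlib import Path
>>> handler = DataHandler(hdf_file_path=Path('data.h5'))
>>> handler.to_store(data=cells_gdf, key='cells', force_update=True)
>>> cells_back = handler.from_store('cells')
>>> DataHandler.show_tree(Path('data.h5'))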

Note

If the above documentation doesn't render, this class provides HDF5-based data storage and retrieval capabilities for the RESource framework.

Utility Functions#

Common helper functions and data operations.

The RES.utility module includes:

  • Configuration file parsing and validation

  • Data I/O operations (YAML, JSON, geospatial formats)

  • Coordinate transformations and spatial utilities

  • Hierarchical logging and progress reporting

  • URL downloading and caching mechanisms

Configuration Management#

All classes inherit configuration parsing capabilities from AttributesParser, enabling (see the sketch after this list):

  • YAML-based configuration management

  • Parameter validation and default value handling

  • Environment-specific settings (development, production)

  • Technology-specific parameter sets
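A minimal sketch of the underlying YAML loading (the configuration path follows the Examples section below; key contents depend on the project configuration):

import yaml
from pathlib import Path

# Load the raw YAML configuration. In RESource, parsing, validation, and
# default handling are performed by AttributesParser rather than manually.
config = yaml.safe_load(Path("config/config_BC.yaml").read_text())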

Data Storage#

The framework uses HDF5-based storage through DataHandler for:

  • Efficient large dataset management

  • Automated caching to avoid redundant computations

  • Cross-platform compatibility

  • Hierarchical data organization

Examples#

Basic Assessment Workflow#

from RES.RESources import RESources_builder

# Initialize assessment
builder = RESources_builder(
    config_file_path="config/config_BC.yaml",
    region_short_code="BC",
    resource_type="wind"
)

# Execute complete workflow
results = builder.build(
    select_top_sites=True,
    use_pypsa_buses=True, 
    memory_resource_limitation=True
)

# Export results
builder.export_results(*results, save_to="output/BC_wind/")

Step-by-Step Analysis#

# Manual workflow control
cells = builder.get_grid_cells()
cells_with_capacity = builder.get_cell_capacity()
cells_with_timeseries = builder.get_CF_timeseries(cells_with_capacity)
scored_cells = builder.score_cells(cells_with_timeseries) 
clusters = builder.get_clusters(scored_cells)

Notes#

  • All spatial data maintained in WGS84 (EPSG:4326) coordinate system

  • Time series generated at hourly resolution for full assessment years

  • Economic calculations follow NREL LCOE methodology

  • Clustering uses k-means with automatic optimization

  • Caching mechanisms minimize redundant computation

  • Modular design enables workflow customization


Warning

This page is under heavy development