RES API Reference#
Warning
This page is under heavy development; additional modules and methods will be documented as the API stabilizes.
RESource provides a comprehensive API for variable renewable energy (VRE) resource assessment through a modular architecture. This reference documents the main classes and methods available for building custom assessment workflows.
Core Workflow Classes#
RESource Builder#
Main orchestrator class for renewable energy resource assessments.
- class RES.RESources.RESources_builder(config_file_path: Path = None, region_short_code: str = None, resource_type: str = None)[source]
Bases:
AttributesParser
Main orchestrator class for renewable energy resource assessment workflows.
RESources_builder coordinates the complete workflow for assessing solar and wind potential at sub-national scales. It integrates spatial grid cell preparation, land availability analysis, weather data processing, economic evaluation, and site clustering into a unified framework.
This class implements a modular architecture where each assessment step is handled by specialized components, enabling reproducible, scalable, and transparent renewable energy assessments.
- Parameters:
config_file_path (str or Path) -- Path to the YAML configuration file containing project settings
region_short_code (str) -- ISO or custom short code for the target region (e.g., 'BC' for British Columbia)
resource_type ({'solar', 'wind'}) -- Type of renewable energy resource to assess
- store
Root directory for data storage (HDF5 file) and caching.
- Type:
Path
- units
Handler for unit conversions and standardization
- Type:
Units
- gridcells
Spatial grid generation and management
- Type:
GridCells
- timeseries
Climate data processing and capacity factor calculations
- Type:
Timeseries
- datahandler
HDF5-based data storage and retrieval interface
- Type:
DataHandler
- cell_processor
Land availability and capacity potential calculations
- Type:
CellCapacityProcessor
- coders
Canadian power system data integration (substations, transmission lines).
- Type:
CODERSData
- era5_cutout
ERA5 climate data cutout management
- Type:
ERA5Cutout
- scorer
Economic scoring and LCOE calculations
- Type:
CellScorer
- gwa_cells
Global Wind Atlas data integration (wind resources only)
- Type:
GWACells
- results_save_to
Output directory for assessment results
- Type:
Path
- region_name
Full name of the assessed region
- Type:
str
- get_grid_cells()[source]
Generate spatial grid cells covering the region boundary
- get_cell_capacity(force_update=False)[source]
Calculate potential capacity based on land availability constraints
- extract_weather_data()[source]
Process climate data for capacity factor calculations
- update_gwa_scaled_params(memory_resource_limitation=False)[source]
Integrate Global Wind Atlas wind speed corrections (wind only)
- get_CF_timeseries(cells=None, force_update=False)[source]
Generate hourly capacity factor time series
- find_grid_nodes(cells=None, use_pypsa_buses=False)[source]
Identify nearest electrical grid connection points
- score_cells(cells=None)[source]
Calculate economic scores based on LCOE methodology
- get_clusters(scored_cells=None, wcss_tolerance=0.05)[source]
Perform spatial clustering of viable sites
- get_cluster_timeseries(clusters=None, dissolved_indices=None, cells_timeseries=None)[source]
Generate representative time series for each cluster
- build(select_top_sites=True, use_pypsa_buses=True, memory_resource_limitation=True)[source]
Execute complete assessment workflow
- export_results(resource_type, resource_clusters, cluster_timeseries, save_to=Path('results'))[source]
Export results in standardized format for downstream models
- select_top_sites(sites, timeseries, resource_max_capacity=10)[source]
Filter results to highest-potential sites within capacity constraints
Examples
Basic wind assessment workflow:
>>> from RES.RESources import RESources_builder
>>> builder = RESources_builder(
...     config_file_path="config/config_BC.yaml",
...     region_short_code="BC",
...     resource_type="wind"
... )
>>> results = builder.build()
>>> builder.export_results(*results)
Step-by-step workflow with intermediate inspection:
>>> builder = RESources_builder("config/config.yaml", "AB", "solar")
>>> cells = builder.get_grid_cells()
>>> cells_with_capacity = builder.get_cell_capacity()
>>> scored_cells = builder.score_cells(cells_with_capacity)
>>> clusters = builder.get_clusters(scored_cells)
Notes
Inherits configuration parsing capabilities from AttributesParser
Uses HDF5 storage for efficient handling of large geospatial datasets
Implements caching mechanisms to avoid redundant computations
Supports both solar PV and onshore wind technologies
Economic calculations follow NREL LCOE methodology
Clustering uses k-means with automatic cluster number optimization
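One plausible reading of the WCSS-based cluster-count selection, sketched with scikit-learn; the feature columns and the 5% threshold are assumptions for illustration, not the class's internal code:
>>> from sklearn.cluster import KMeans
>>> X = scored_cells[['lcoe_wind', 'CF_mean']].to_numpy()  # assumed feature columns
>>> wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
...         for k in range(1, 11)]
>>> # Smallest k whose WCSS falls within the tolerance of the k=1 baseline
>>> k_opt = next(k for k, w in enumerate(wcss, start=1) if w <= 0.05 * wcss[0])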
- build(select_top_sites: bool | None = True, use_pypsa_buses: bool | None = False, memory_resource_limitation: bool | None = True)[source]
Execute the specific module logic for the given resource type ('solar' or 'wind').
- static create_summary_info(resource_type: str, region: str, sites: DataFrame, timeseries: DataFrame) str [source]
Creates summary information to be exported alongside results data.
- static dump_export_metadata(info: str, save_to: Path | None = 'results/linking')[source]
Dumps the metadata summary information to a file. If the file already exists, it prepends the new info at the top of the file.
- static export_results(resource_type: str, region: str, resource_clusters: DataFrame, cluster_timeseries: DataFrame, save_to: Path | None = PosixPath('results'))[source]
Export processed resource cluster results (GeoDataFrame) to standard datafield CSVs as input for downstream models.
- Parameters:
resource_type (str) -- The type of resource ('solar' or 'wind').
resource_clusters (DataFrame) -- A DataFrame containing resource cluster information.
save_to (Path, optional) -- The directory to save the output files. Defaults to 'results/*.csv'.
> Currently supports: CLEWs, PyPSA
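A call sketch based on the static-method signature above; the cluster variables are placeholders for the outputs of build():
>>> from pathlib import Path
>>> RESources_builder.export_results(
...     resource_type="wind",
...     region="BC",
...     resource_clusters=resource_clusters,    # GeoDataFrame from build()
...     cluster_timeseries=cluster_timeseries,  # DataFrame from build()
...     save_to=Path("results"),
... )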
- extract_weather_data()[source]
Extracts weather data for the cells (e.g. wind speed, solar influx). This method retrieves the ERA5 cutout and extracts wind speed data for the cells. If the wind speed data is already present in the stored dataset, extraction from the source is skipped. If the resource type is 'wind', it extracts 'windspeed_ERA5' from the cutout and updates the cells GeoDataFrame. If the resource type is 'solar', extraction from Global Solar Atlas data is not currently supported.
- Returns:
None
Notes
Currently active for wind speed only, due to the significant contrast between ERA5 and high-resolution data.
- find_grid_nodes(cells: GeoDataFrame = None, use_pypsa_buses: bool = False) GeoDataFrame [source]
Find the grid nodes for the given cells.
- Parameters:
cells (gpd.GeoDataFrame, optional) -- Cells with their coordinates, geometry, and unique cell ids. Defaults to None.
use_pypsa_buses (bool, optional) -- Whether to use PyPSA buses as preferred nodes for resource connection. Defaults to False.
- Returns:
Updated grid cells with nearest grid node information
- Return type:
gpd.GeoDataFrame
Notes
Could be parallelized with Step 1B/C.
- get_CF_timeseries(cells: GeoDataFrame = None, force_update=False) tuple [source]
Extract timeseries information for the cells, e.g. static CF (yearly mean) and hourly timeseries.
- Parameters:
cells (gpd.GeoDataFrame) -- Cells with their coordinates, geometry, and unique cell ids.
force_update (bool) -- If True, forces the update of the CF timeseries data.
- Returns:
A namedtuple containing the cells with their timeseries data.
- Return type:
tuple
Notes
The method uses the Timeseries class to retrieve the timeseries data for the cells.
The timeseries data is retrieved based on the resource type (e.g., 'solar' or 'wind').
If the cells argument is not provided, it retrieves the cells from the data handler.
Could be parallelized with Step 2B/2C
- get_cell_capacity()[source]
Retrieves the potential capacity of the cells based on land availability and land-use intensity.
- Parameters:
force_update (bool) -- If True, forces the update of the cell capacity data.
- Returns:
A namedtuple containing the cells with their potential capacity and the capacity matrix.
- Return type:
tuple
Notes
The capacity matrix is a 2D array where each row corresponds to a cell and each column corresponds to a time step.
- The potential capacity is calculated as:
Potential capacity (MW) = available land (%) × land-use intensity (MW/km²) × cell area (km²)
The method uses the CellCapacityProcessor class to process the capacity data.
The method returns a namedtuple with two attributes: data (the cells GeoDataFrame) and matrix (the capacity matrix).
Could be parallelized with Step 2A/2C.
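A worked instance of the formula above, with illustrative numbers:
>>> available_land = 0.40        # 40% of the cell passes land-use exclusions
>>> landuse_intensity = 3.0      # MW/km² (illustrative wind value)
>>> cell_area = 500.0            # km² (roughly a 0.25° cell at mid-latitudes)
>>> available_land * landuse_intensity * cell_area  # potential capacity in MW
600.0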
- get_cluster_timeseries(clusters: GeoDataFrame = None, dissolved_indices: DataFrame = None, cells_timeseries: DataFrame = None)[source]
- get_clusters(scored_cells: GeoDataFrame = None, score_tolerance: float = 200, wcss_tolerance=None)[source]
- Parameters:
wcss_tolerance (float, optional) -- WCSS (Within-Cluster Sum of Squares) tolerance. A higher tolerance gives more simplification and fewer clusters. Defaults to 0.05.
- get_grid_cells() GeoDataFrame [source]
Retrieves the default grid cells for the region.
- Parameters:
None
- Returns:
A GeoDataFrame containing the grid cells with their coordinates, geometry, and unique cell ids.
- Return type:
gpd.GeoDataFrame
Notes
The get_default_grid() method creates several attributes, such as the atlite cutout object and the region_boundary.
Uses the cutout.grid attribute to create the analysis grid cells (GeoDataFrame).
- Step 0: Set up the grid cells and their unique indices to populate incremental datafields and to ease navigation to cells.
Creates the cells with unique indices generated from their x,y centroids.
- score_cells(cells: GeoDataFrame = None)[source]
Scores the cells based on calculated LCOE ($/MWh). Wrapper around the get_cell_score() method of the CellScorer object.
- static select_top_sites(sites: GeoDataFrame | DataFrame, sites_timeseries: DataFrame, resource_max_capacity: float) Tuple[GeoDataFrame | DataFrame, DataFrame] [source]
- update_gwa_scaled_params(memory_resource_limitation: bool | None = False)[source]
The RESources_builder class coordinates the complete assessment workflow including spatial grid generation, land availability analysis, weather data processing, economic evaluation, and site clustering.
Key Methods:
get_grid_cells(): Generate spatial grid covering the region
get_cell_capacity(): Calculate land-constrained potential capacity
get_CF_timeseries(): Generate capacity factor time series
score_cells(): Economic evaluation using LCOE methodology
get_clusters(): Spatial clustering of viable sites
build(): Execute complete assessment workflow
Spatial Grid Management#
Grid cell generation and spatial discretization.
- class RES.cell.GridCells(config_file_path: Path = None, region_short_code: str = None, resource_type: str = None)[source]
Bases:
AttributesParser
Spatial grid cell generator for renewable energy resource assessment.
This class creates a regular spatial grid covering a specified region for discretized renewable energy potential analysis. It inherits from AttributesParser for configuration management and integrates with ERA5Cutout and GADMBoundaries to maintain consistency with climate data spatial resolution and regional boundaries.
Grid cells serve as the fundamental spatial units for capacity calculations, land availability analysis, and resource aggregation. Each cell represents a homogeneous area with uniform resource characteristics and constraints.
- Parameters:
config_file_path (str or Path) -- Path to configuration file containing grid settings
region_short_code (str) -- Region identifier for boundary definition
resource_type (str) -- Resource type ('solar' or 'wind')
- ERA5Cutout
ERA5 climate data cutout handler instance
- Type:
ERA5Cutout
- gadmBoundary
GADM boundary processor instance
- Type:
GADMBoundaries
- datahandler
HDF5 data storage interface for grid persistence
- Type:
DataHandler
- crs
Coordinate reference system ('EPSG:4326')
- Type:
str
- resolution
Grid resolution with 'dx' and 'dy' keys (decimal degrees)
- Type:
dict
- bounding_box
Spatial extent with 'minx', 'maxx', 'miny', 'maxy'
- Type:
dict
- actual_boundary
Precise regional boundary geometry
- Type:
gpd.GeoDataFrame
- coords
Grid coordinate arrays {'x': array, 'y': array}
- Type:
dict
- shape
Grid dimensions (rows, columns)
- Type:
tuple
- bounding_box_grid
Complete grid covering bounding box region
- Type:
gpd.GeoDataFrame
- grid_cells
Final grid cells intersecting with regional boundary (custom grid)
- Type:
gpd.GeoDataFrame
- cutout
ERA5 cutout object with climate data
- Type:
atlite.Cutout
- region_boundary
Regional boundary from ERA5 processing
- Type:
gpd.GeoDataFrame
- resource_grid_cells
Grid cells from default ERA5-based processing
- Type:
gpd.GeoDataFrame
- generate_coords() None [source]
Create coordinate arrays based on resolution and boundary
- __get_grid__() gpd.GeoDataFrame [source]
Generate complete grid with cell geometries (private method)
- get_custom_grid() gpd.GeoDataFrame
Create custom grid cells intersecting with regional boundary
- get_default_grid() gpd.GeoDataFrame [source]
Create grid using ERA5 cutout methodology with climate data alignment
- _check_resolution() None [source]
Validate resolution settings and issue warnings (private method, not currently used)
Examples
Generate grid for British Columbia wind assessment:
>>> from RES.cell import GridCells
>>> grid = GridCells(
...     config_file_path="config/config_BC.yaml",
...     region_short_code="BC",
...     resource_type="wind"
... )
>>> # Using custom grid approach
>>> custom_cells = grid.get_custom_grid()
>>> print(f"Generated {len(custom_cells)} custom grid cells")
>>>
>>> # Using default ERA5-aligned grid
>>> default_cells = grid.get_default_grid()
>>> print(f"Generated {len(default_cells)} ERA5-aligned grid cells")
Custom resolution configuration:
>>> # In configuration file (config_BC.yaml):
>>> # grid_cell_resolution:
>>> #   dx: 0.25  # 0.25 degrees longitude
>>> #   dy: 0.25  # 0.25 degrees latitude
>>> grid._check_resolution()  # Validate resolution settings
Notes
Default resolution matches ERA5 climate data (0.25° x 0.25°)
Grid cells are represented as square polygons with centroid coordinates
Inherits configuration management from AttributesParser
Integrates with ERA5Cutout for climate data alignment
Uses GADMBoundaries for precise regional boundary definition
Uses HDF5 storage for efficient caching of large grid datasets
Grid generation respects regional boundaries to avoid unnecessary cells
Resolution warnings issued if finer than climate data resolution
Coordinate system maintained as WGS84 for global compatibility
Supports both custom grid generation and ERA5-aligned grid generation
Grid Generation Approaches#
Custom Grid (get_custom_grid()):
- Creates grid based on regional bounding box
- Intersects with precise regional boundaries
- Stores results in HDF5 with 'cells' key
Default Grid (get_default_grid()):
- Uses ERA5 cutout grid as base
- Aligns with climate data resolution
- Overlays with regional boundaries
- Stores both 'cells' and 'boundary' in HDF5
Resolution Considerations#
Minimum recommended: 0.25° (matching ERA5 resolution)
Harmonized resolutions required for interpolation of climate data
Coarser resolutions may miss local variations in resource quality
Square cells assumed (dx = dy) for geometric consistency
Resolution validation available via _check_resolution() method
Dependencies#
geopandas: Spatial data manipulation
numpy: Numerical operations for coordinate generation
shapely.geometry.box: Grid cell geometry creation
RES.AttributesParser: Parent class for configuration management
RES.boundaries.GADMBoundaries: Regional boundary processing
RES.era5_cutout.ERA5Cutout: Climate data cutout handling
RES.hdf5_handler.DataHandler: HDF5 data storage interface
RES.utility: Utility functions for cell ID assignment and logging
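A standalone sketch of how square cells can be assembled from coordinate arrays using the dependencies listed above; extents and resolution are illustrative, not the class's internal code:
>>> import numpy as np
>>> import geopandas as gpd
>>> from shapely.geometry import box
>>> dx = dy = 0.25                      # ERA5-matching resolution
>>> xs = np.arange(-125.0, -120.0, dx)  # illustrative longitudes
>>> ys = np.arange(49.0, 52.0, dy)      # illustrative latitudes
>>> cells = gpd.GeoDataFrame(
...     geometry=[box(x, y, x + dx, y + dy) for y in ys for x in xs],
...     crs="EPSG:4326",
... )
>>> len(cells)  # rows x columns
240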
- generate_coords()[source]
- get_default_grid()[source]
Handles creation of regular spatial grids for renewable energy assessment with configurable resolution and boundary constraints.
Cell Capacity Processing#
Grid cell processing capabilities for spatial analysis.
- class RES.CellCapacityProcessor.CellCapacityProcessor(config_file_path: Path = None, region_short_code: str = None, resource_type: str = None)[source]
Bases:
AttributesParser
Renewable energy capacity processor for grid cell-based resource assessment.
This class processes renewable energy potential capacity at the grid cell level by integrating climate data, land availability constraints, and techno-economic parameters. It calculates potential capacity matrices for solar and wind resources, applies land-use exclusions, and generates cost-attributed capacity datasets for energy system modeling.
The class serves as the core processing engine for renewable energy resource assessment, combining spatial analysis, climate data processing, and economic modeling to produce grid cell-level capacity estimates suitable for energy planning and optimization models.
INHERITED METHODS FROM AttributesParser:#
get_resource_disaggregation_config() -> Dict[str, dict]: Get resource-specific config
get_cutout_config() -> Dict[str, dict]: Get ERA5 cutout configuration
get_gadm_config() -> Dict[str, dict]: Get GADM boundary configuration
get_region_name() -> str: Get region name from config
get_atb_config() -> Dict[str, dict]: Get NREL ATB cost configuration
get_default_crs() -> str: Get default coordinate reference system
INHERITED ATTRIBUTES FROM AttributesParser:#
config (property): Full configuration dictionary
store (property): HDF5 store path for data persistence
config_file_path: Path to configuration file
region_short_code: Region identifier code
resource_type: Resource type identifier
Plus other configuration access methods
OWN METHODS DEFINED IN THIS CLASS:#
load_cost(resource_atb): Extract and process cost parameters from ATB data
__get_unified_region_shape__(): Create unified regional boundary geometry (private)
__create_cell_geom__(x, y): Create grid cell geometry from coordinates (private)
get_capacity(): Main method to process and calculate renewable energy capacity
plot_ERAF5_grid_land_availability(): Visualize land availability on ERA5 grid
plot_excluder_land_availability(): Visualize land availability at excluder resolution
- param config_file_path:
Path to configuration file containing processing settings
- type config_file_path:
str or Path
- param region_short_code:
Region identifier for boundary and data processing
- type region_short_code:
str
- param resource_type:
Resource type ('solar', 'wind', or 'bess')
- type resource_type:
str
- ERA5Cutout
ERA5 climate data cutout processor instance
- Type:
ERA5Cutout
- LandContainer
Land exclusion and constraint processor instance
- Type:
LandContainer
- resource_disaggregation_config
Resource-specific disaggregation configuration
- Type:
dict
- resource_landuse_intensity
Land-use intensity for capacity calculation (MW/km²)
- Type:
float
- atb
NREL Annual Technology Baseline cost data processor
- Type:
NREL_ATBProcessor
- datahandler
HDF5 data storage interface
- Type:
DataHandler
- cutout_config
ERA5 cutout configuration parameters
- Type:
dict
- gadm_config
GADM boundary configuration parameters
- Type:
dict
- disaggregation_config
General disaggregation configuration
- Type:
dict
- region_name
Full region name from configuration
- Type:
str
- utility_pv_cost
Utility-scale PV cost data from NREL ATB
- Type:
pd.DataFrame
- land_based_wind_cost
Land-based wind cost data from NREL ATB
- Type:
pd.DataFrame
- composite_excluder
Combined land exclusion container from atlite
- Type:
ExclusionContainer
- cell_resolution
Grid cell resolution in degrees
- Type:
float
- cutout
ERA5 cutout object with climate data
- Type:
atlite.Cutout
- region_boundary
Regional boundary geometry
- Type:
gpd.GeoDataFrame
- region_shape
Unified regional shape for availability calculations
- Type:
gpd.GeoDataFrame
- Availability_matrix
Land availability matrix from atlite
- Type:
xr.DataArray
- capacity_matrix
Potential capacity matrix with resource and land constraints
- Type:
xr.DataArray
- provincial_cells
Final processed grid cells with capacity and cost attributes
- Type:
gpd.GeoDataFrame
- resource_capex
Capital expenditure cost (million $/MW)
- Type:
float
- resource_fom
Fixed operation and maintenance cost (million $/MW)
- Type:
float
- resource_vom
Variable operation and maintenance cost (million $/MW)
- Type:
float
- grid_connection_cost_per_km
Grid connection cost per kilometer (million $)
- Type:
float
- tx_line_rebuild_cost
Transmission line rebuild cost (million $)
- Type:
float
- load_cost(resource_atb: pd.DataFrame) tuple [source]
Extract cost parameters from NREL ATB data and convert units
- __get_unified_region_shape__() gpd.GeoDataFrame [source]
Create unified regional boundary by dissolving sub-regional boundaries
- __create_cell_geom__(x: float, y: float) Polygon [source]
Create square grid cell geometry from center coordinates
- get_capacity() tuple[gpd.GeoDataFrame, xr.DataArray] [source]
Main processing method to calculate renewable energy capacity with constraints
- plot_ERAF5_grid_land_availability(...) matplotlib.figure.Figure [source]
Create visualization of land availability on ERA5 grid resolution
- plot_excluder_land_availability(...) matplotlib.figure.Figure [source]
Create visualization of land availability at excluder resolution
Examples
Process solar capacity for British Columbia:
>>> from RES.CellCapacityProcessor import CellCapacityProcessor
>>> processor = CellCapacityProcessor(
...     config_file_path="config/config_BC.yaml",
...     region_short_code="BC",
...     resource_type="solar"
... )
>>> cells_gdf, capacity_matrix = processor.get_capacity()
>>> print(f"Processed {len(cells_gdf)} cells with total capacity: "
...       f"{cells_gdf['potential_capacity_solar'].sum():.1f} MW")
Process wind capacity with visualization:
>>> processor = CellCapacityProcessor(
...     config_file_path="config/config_AB.yaml",
...     region_short_code="AB",
...     resource_type="wind"
... )
>>> cells_gdf, capacity_matrix = processor.get_capacity()
>>> # Visualizations are automatically generated and saved
Extract cost parameters:
>>> capex, vom, fom, grid_cost, tx_cost = processor.load_cost(
...     processor.utility_pv_cost
... )
>>> print(f"Solar CAPEX: {capex:.3f} million $/MW")
Notes
Integrates climate data from ERA5 via atlite cutouts
Applies land-use constraints via ExclusionContainer
Converts NREL ATB costs from $/kW to million $/MW
Creates square grid cells based on ERA5 resolution (~30km at 0.25°)
Supports solar, wind, and battery energy storage systems (BESS)
Automatically generates land availability visualizations
Uses HDF5 storage for efficient data persistence
Grid cells are trimmed to exact regional boundaries
Assigns unique cell IDs for downstream processing
Cost parameters include CAPEX, FOM, VOM, and transmission costs
Processing Workflow#
Load ERA5 cutout and regional boundaries
Set up land exclusion constraints
Extract cost parameters from NREL ATB
Calculate availability matrix with land constraints
Apply land-use intensity to compute capacity matrix
Convert to GeoDataFrame with cell geometries
Assign static cost parameters to each cell
Trim cells to precise regional boundaries
Generate visualizations and store results
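A standalone sketch of the atlite primitives the workflow above relies on; the cutout path, CRS, exclusion raster, and codes are illustrative assumptions:
>>> import atlite
>>> from atlite.gis import ExclusionContainer
>>> cutout = atlite.Cutout("cutouts/BC-2021.nc")            # illustrative path
>>> excluder = ExclusionContainer(crs=3978, res=100)        # illustrative CRS/resolution
>>> excluder.add_raster("data/gaez/exclusions.tif", codes=[1, 2])  # illustrative layer
>>> A = cutout.availabilitymatrix(region_shape, excluder)   # region_shape: GeoDataFrame
>>> capacity_matrix = A * landuse_intensity * cell_area_km2  # MW per cell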
Cost Parameter Processing#
CAPEX: Capital expenditure (converted from $/kW to million $/MW)
FOM: Fixed operation & maintenance (million $/MW annually)
VOM: Variable operation & maintenance (million $/MWh, if applicable)
Grid connection: Cost per kilometer for grid connection
Transmission rebuild: Cost for transmission line upgrades
Operational life: Asset lifetime (25 years solar, 20 years wind)
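The $/kW to million $/MW conversion noted above reduces to a division by 1000, since $1/kW = $1,000/MW = 0.001 M$/MW:
>>> capex_usd_per_kw = 1100.0   # illustrative ATB-style value
>>> capex_usd_per_kw / 1000     # M$/MW
1.1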
Dependencies#
geopandas: Spatial data manipulation
xarray: Multi-dimensional array operations
pandas: Data frame operations
shapely.geometry: Geometric operations
matplotlib.pyplot: Visualization
atlite: Climate data processing and exclusions
RES.AttributesParser: Parent class for configuration management
RES.lands.LandContainer: Land constraint processing
RES.era5_cutout.ERA5Cutout: Climate data cutout handling
RES.hdf5_handler.DataHandler: HDF5 data storage
RES.atb.NREL_ATBProcessor: Cost data processing
RES.utility: Utility functions for cell operations
- raises KeyError:
If required configuration parameters are missing
- raises ValueError:
If resource type is not supported or data processing fails
- raises FileNotFoundError:
If configuration files or data dependencies are not found
- get_capacity() tuple [source]
This method processes the capacity of the resources based on the availability matrix and other parameters. It calculates the potential capacity for each cell in the region and returns a named tuple containing the processed data and the capacity matrix.
- Returns:
A named tuple containing the processed data and the capacity matrix. Can be accessed as <self.resources_nt>.data and <self.resources_nt>.matrix.
- Return type:
namedtuple
- load_cost(resource_atb: DataFrame)[source]
Extracts cost parameters from the NREL ATB DataFrame and converts them to million $/MW.
- Parameters:
resource_atb (pd.DataFrame) -- DataFrame containing NREL ATB cost data for the resource type.
- Returns:
- A dictionary containing the following cost parameters:
resource_capex: Capital expenditure in million $/MW
resource_vom: Variable operation and maintenance cost in million $/MW
resource_fom: Fixed operation and maintenance cost in million $/MW
grid_connection_cost_per_km: Grid connection cost per kilometer in million $
tx_line_rebuild_cost: Transmission line rebuild cost in million $
- Return type:
dict
- plot_ERAF5_grid_land_availability(region_boundary: GeoDataFrame = None, Availability_matrix: DataArray = None, figsize=(8, 6), legend_box_x_y: tuple = (1.2, 1))[source]
Plots the land availability based on the ERA5 grid cells.
- Parameters:
region_boundary (gpd.GeoDataFrame, optional) -- The region boundary to plot. If not provided, the default region boundary will be used.
Availability_matrix (xr.DataArray, optional) -- The availability matrix to plot. If not provided, the default Availability matrix will be used.
figsize (tuple, optional) -- The size of the figure to create. Defaults to (8, 6).
legend_box_x_y (tuple, optional) -- The position of the legend box in the plot. Defaults to (1.2, 1).
- Returns:
The figure object containing the plot.
- Return type:
fig (matplotlib.figure.Figure)
- plot_excluder_land_availability(excluder: ExclusionContainer = None)[source]
Plots the land availability based on the excluder resolution.
- Parameters:
excluder (ExclusionContainer, optional) -- The excluder to use for plotting.
- Returns:
The figure object containing the plot
- Return type:
fig (matplotlib.figure.Figure)
Note
If the above documentation doesn't render, this class provides grid cell processing capabilities for spatial analysis.
Administrative Boundaries#
GADM boundary processor for regional analysis.
- class RES.boundaries.GADMBoundaries(config_file_path: Path = None, region_short_code: str = None, resource_type: str = None)[source]
Bases:
AttributesParser
GADM (Global Administrative Areas) boundary processor for regional analysis.
This class handles the retrieval, processing, and management of administrative boundaries from the GADM dataset. It provides functionality to extract specific regional boundaries at administrative level 2 (typically states/provinces/districts) for renewable energy resource assessment areas.
INHERITED METHODS FROM AttributesParser:#
get_gadm_config() -> Dict[str, dict]: Get GADM configuration from config file
get_default_crs() -> str: Get default coordinate reference system ('EPSG:4326')
get_country() -> str: Get country name from config file
get_region_name() -> str: Get region name from config file using region_short_code
get_region_mapping() -> Dict[str, dict]: Get region mapping dictionary
is_region_code_valid() -> bool: Validate region short code
load_config() -> Dict[str, dict]: Load YAML configuration file
get_excluder_crs() -> int: Get recommended CRS for excluder operations
get_vis_dir() -> Path: Get visualization directory path
region_code_validity (property): Boolean property for region code validation
Plus other utility methods for config access
OWN METHODS DEFINED IN THIS CLASS:#
get_country_boundary(country=None, force_update=False): Download and process complete country GADM boundaries
get_region_boundary(region_name=None, force_update=False): Extract and process specific regional boundary
get_bounding_box(): Generate minimum bounding rectangle for region
show_regions(basemap='CartoDB positron', save_path='vis/regions', save=False): Create interactive map visualization
run(): Execute complete boundary processing workflow
- param config_file_path:
Path to configuration file containing GADM settings
- type config_file_path:
str or Path
- param region_short_code:
Short code identifying the target region within the country
- type region_short_code:
str
- param resource_type:
Resource type (passed through from parent workflow)
- type resource_type:
str
- admin_level
GADM administrative level (fixed at 2 for regional districts)
- Type:
int
- gadm_root
Root directory for GADM data storage
- Type:
Path
- gadm_processed
Directory for processed regional boundary files
- Type:
Path
- crs
Coordinate reference system ('EPSG:4326')
- Type:
str
- country
Country name extracted from configuration
- Type:
str
- region_file
Path to processed regional boundary file
- Type:
Path
- boundary_datafields
Mapping of GADM fields to standardized field names
- Type:
dict
- country_file
Path to country-level GADM boundary file
- Type:
Path
- boundary_country
GeoDataFrame containing country-level boundaries
- Type:
gpd.GeoDataFrame
- boundary_region
GeoDataFrame containing region-specific boundaries
- Type:
gpd.GeoDataFrame
- actual_boundary
GeoDataFrame containing the actual regional boundary geometry
- Type:
gpd.GeoDataFrame
- bounding_box
Dictionary containing bounding box coordinates (minx, maxx, miny, maxy)
- Type:
dict
- get_country_boundary(country=None, force_update=False) gpd.GeoDataFrame [source]
Download and process complete country GADM boundaries at administrative level 2
- get_region_boundary(region_name=None, force_update=False) gpd.GeoDataFrame [source]
Extract and process specific regional boundary with standardized field names
- get_bounding_box() tuple [source]
Generate minimum bounding rectangle for region and return (bounding_box_dict, boundary_gdf)
- show_regions(basemap='CartoDB positron', save_path='vis/regions', save=False) folium.Map [source]
Create interactive folium map visualization of regional boundaries
- run() gpd.GeoDataFrame or None [source]
Execute complete boundary processing workflow and return regional boundary GeoDataFrame
Examples
Extract British Columbia boundaries:
>>> from RES.boundaries import GADMBoundaries
>>> boundaries = GADMBoundaries(
...     config_file_path="config/config_BC.yaml",
...     region_short_code="BC",
...     resource_type="wind"
... )
>>> bc_boundary = boundaries.get_region_boundary()
>>> country_bounds = boundaries.get_country_boundary("Canada")
>>> bbox, actual_boundary = boundaries.get_bounding_box()
>>> interactive_map = boundaries.show_regions(save=True)
>>> result = boundaries.run()  # Execute full workflow
Notes
Uses pygadm package for GADM data access and download
Automatically handles data caching to avoid repeated downloads
Processes boundaries into GeoJSON format for efficient storage
Standardizes field names for consistent downstream processing
Administrative level 2 chosen to balance spatial resolution with data availability
All geometries maintained in WGS84 (EPSG:4326) for global compatibility
Region validation is performed using inherited region_code_validity property
Interactive maps are created using folium with optional save functionality
Dependencies#
pygadm: GADM data access and processing
geopandas: Spatial data manipulation
folium: Interactive map visualization (via geopandas.explore())
pathlib: Path handling
RES.AttributesParser: Parent class for configuration management
RES.utility: Utility functions for logging and updates
- raises ValueError:
If the country is not found in the GADM dataset or if region code is invalid
- raises Exception:
If there is an error fetching or loading the GADM data
- get_bounding_box() tuple [source]
This method loads the region boundary using get_region_boundary() method and gets Minimum Bounding Rectangle (MBR).
- Returns:
A tuple containing the dictionary of bounding box coordinates, and the actual boundary GeoDataFrame for the specified region.
- Return type:
tuple
- Purpose:
To be used internally to get the bounding box of the region to set ERA5 cutout boundaries.
- get_country_boundary(country: str = None, force_update: bool = False) GeoDataFrame [source]
Retrieves and prepares the GADM boundaries dataset for the specified country (Administrative Level 2).
- Parameters:
country (str) -- The name of the country to fetch GADM data for. If None, extracts the country from the user config file.
force_update (bool) -- If True, re-fetch the GADM data even if a local file exists.
- Returns:
GeoDataFrame of the country's GADM regions in crs '4326'
- Return type:
gpd.GeoDataFrame
- Dependency:
Depends on pygadm package to fetch the GADM data.
- Raises:
ValueError -- If the country is not found in the GADM dataset.
Exception -- If there is an error fetching or loading the GADM data.
- get_region_boundary(region_name: str = None, force_update: bool = False) GeoDataFrame [source]
Prepares the boundaries for the specified region within the country. The default datafields (e.g. NAME_0, NAME_1, NAME_2) get renamed to match the user config file.
- Parameters:
force_update (bool) -- To force update the data and formatting.
- Returns:
GeoDataFrame of the region boundaries.
- Return type:
gpd.GeoDataFrame
- Raises:
ValueError -- If the region code is invalid or no data is found for the specified region
- run()[source]
Executes the process of extracting boundaries and creating an interactive map. To be used as a main method to run the class's sequential tasks.
- show_regions(basemap: str = 'CartoDB positron', save_path: str = 'vis/regions', save: bool = False)[source]
Create and save an interactive map for the specified region.
- Parameters:
basemap (str) -- The basemap to use (default is 'CartoDB positron').
save_path (str) -- The path to save the HTML map. Defaults to 'vis/regions'.
save (bool) -- Whether to save the map as a local HTML file.
- Returns:
An interactive map object showing the region boundaries.
- Return type:
folium.Map
Note
If the above documentation doesn't render properly due to geospatial dependency issues, the GADMBoundaries class provides:
get_country_boundary(country, force_update=False): Download complete country boundaries
get_regional_boundary(force_update=False): Extract specific regional boundary
create_bounding_box(geometry, buffer_degrees=0.1): Generate spatial extent calculations
Downloads and processes administrative boundaries from the Global Administrative Areas (GADM) dataset for spatial analysis scope definition.
Climate Data Processing#
Weather data processing and capacity factor calculations.
- class RES.timeseries.Timeseries(config_file_path: Path = None, region_short_code: str = None, resource_type: str = None)[source]
Bases:
AttributesParser
Climate data processor and capacity factor calculator for renewable energy resources.
This class handles the extraction, processing, and analysis of meteorological time series data to generate capacity factor profiles for solar and wind resources. It integrates with the Atlite library for climate data processing and provides technology-specific capacity factor calculations based on configurable turbine and panel specifications.
The class processes hourly weather data into capacity factors that represent the fraction of nameplate capacity that can be generated under specific meteorological conditions, accounting for technology performance curves and environmental constraints.
- Parameters:
config_file_path (str or Path) -- Path to configuration file containing resource and technology settings
region_short_code (str) -- Region identifier for spatial data coordination
resource_type ({'solar', 'wind'}) -- Type of renewable resource for capacity factor calculation
- resource_disaggregation_config
Technology-specific configuration parameters from config file
- Type:
dict
- datahandler
HDF5 interface for time series data storage and retrieval
- Type:
DataHandler
- gwa_cells
Global Wind Atlas integration for wind resource bias correction
- Type:
GWACells
- sites_profile
Raw capacity factor time series for all grid cells
- Type:
xarray.DataArray
- _CF_ts_df_
Processed time series with cells as columns, time as index
- Type:
pandas.DataFrame
- get_timeseries(cells)[source]
Generate capacity factor time series for specified grid cells
- __process_PV_timeseries__(cells)[source]
Calculate solar PV capacity factors using irradiance and temperature
- __process_WIND_timeseries__(cells, turbine_database, turbine_id)[source]
Calculate wind capacity factors using wind speeds and power curves
- plot_timeseries_comparison(cells_sample, save_path=None)
Generate comparative plots of capacity factor profiles
- get_annual_statistics(cells_timeseries)
Calculate annual capacity factor statistics and metrics
Examples
Generate solar PV time series:
>>> from RES.timeseries import Timeseries
>>> ts_processor = Timeseries(
...     config_file_path="config/config_BC.yaml",
...     region_short_code="BC",
...     resource_type="solar"
... )
>>> cells = get_grid_cells()  # From previous workflow step
>>> results = ts_processor.get_timeseries(cells)
>>> cf_timeseries = results.timeseries_df
Wind resource processing with turbine selection:
>>> ts_processor = Timeseries(
...     config_file_path="config/config.yaml",
...     region_short_code="AB",
...     resource_type="wind"
... )
>>> # Turbine parameters defined in configuration
>>> results = ts_processor.get_timeseries(wind_cells)
Time series analysis and visualization:
>>> annual_stats = ts_processor.get_annual_statistics(cf_timeseries)
>>> ts_processor.plot_timeseries_comparison(sample_cells, "output/plots/")
Notes
Uses Atlite library for meteorological data processing
Solar calculations account for panel orientation, tilt, and temperature effects
Wind calculations use power curves from turbine databases (OEDB, manufacturer specs)
Time series generated at hourly resolution for full assessment years
Global Wind Atlas corrections applied for improved wind speed accuracy
Results cached in HDF5 format for efficient reuse and large dataset handling
Supports both fixed-tilt and tracking solar PV configurations
Wind power curves interpolated for continuous wind speed ranges
Technology Integration#
- Solar PV:
Irradiance-based capacity factor calculation
Temperature derating effects
Configurable panel specifications (efficiency, temperature coefficients)
Support for fixed tilt and single-axis tracking
- Wind:
Power curve-based capacity factor calculation
Hub height wind speed extrapolation
Turbine database integration (OEDB standard)
Wake effects and array losses configurable
Data Dependencies#
ERA5 reanalysis data for meteorological variables
Global Wind Atlas for wind speed bias correction (wind only)
Technology databases for turbine and panel specifications
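The class builds on Atlite's conversion methods; a minimal standalone sketch of those underlying calls (cutout path, turbine, and panel names are illustrative assumptions) looks like:
>>> import atlite
>>> cutout = atlite.Cutout("cutouts/BC-2021.nc")  # illustrative path
>>> cf_wind = cutout.wind(turbine="Vestas_V112_3MW", capacity_factor=True)
>>> cf_pv = cutout.pv(panel="CSi", orientation="latitude_optimal",
...                   capacity_factor=True)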
- get_cluster_timeseries(all_clusters: DataFrame, cells_timeseries: DataFrame, dissolved_indices: DataFrame, sub_national_unit_tag: str)[source]
- get_gwa_geogson_data(gwa_geojson_file_path: str | Path = None)[source]
Loads Global Wind Atlas (GWA) GeoJSON data from the specified file path. If no file path is provided, the default path 'data/downloaded_data/GWA/canada.geojson' is used. If the file does not exist at the specified or default location, a message is printed informing the user to download the required GIS map data.
- Parameters:
gwa_geojson_file_path (str | Path, optional) -- The file path to the GWA GeoJSON file. Defaults to None, which uses the predefined default path ('data/downloaded_data/GWA/canada.geojson').
- gwa_geojson_data
The loaded GeoJSON data from the specified file.
- Type:
list
- Raises:
FileNotFoundError -- If the specified or default GeoJSON file does not exist.
Notes
- The Global Wind Atlas GIS map data can be downloaded from:
https://globalwindatlas.info/en/download/maps-country-and-region
- get_timeseries(cells: GeoDataFrame) tuple [source]
Retrieves the capacity factor (CF) timeseries for the cells.
- Parameters:
cells (gpd.GeoDataFrame) -- Cells with their coordinates, geometry, and unique cell ids.
force_update (bool) -- If True, forces the update of the CF timeseries data.
- Returns:
A namedtuple containing the cells with their timeseries data.
- Return type:
tuple
- Jobs:
Extract timeseries information for the cells, e.g. static CF (yearly mean) and hourly timeseries.
The timeseries data is generated using the atlite library's cutout methods for solar and wind resources.
The method processes the timeseries data for the specified resource type (solar or wind) and stores it in a pandas DataFrame.
Notes
Plug in multiple sources to fit timeseries data, e.g. NSRDB (NREL): https://nsrdb.nrel.gov/data-sets/how-to-access-data
- get_windatlas_data(gwa_windspeed_raster_path: str | Path = None)[source]
Retrieves wind atlas data from a specified raster file or a default path.
If a raster file path is not provided, a default path is used. If the file does not exist at the default path, it prepares the necessary data. The method then loads and returns the wind speed data from the raster file.
- Parameters:
gwa_windspeed_raster_path (str | Path, optional) -- The file path to the wind speed raster file. Defaults to None.
- Returns:
The loaded wind speed data from the raster file.
- Return type:
numpy.ndarray
- get_windspeed_rescaling_data() tuple [source]
Retrieves wind speed rescaling data, including wind atlas data and geographical wind data in GeoJSON format.
- Returns:
- A tuple containing:
wind_atlas: The wind atlas data (type depends on the get_windatlas_data method).
wind_geojson: The GIS wind data in GeoJSON format (type depends on the get_gwa_geogson_data method).
- Return type:
tuple
Integrates with Atlite library for climate data processing and generates technology-specific capacity factor time series from meteorological data.
Economic Evaluation#
LCOE-based economic scoring and site ranking.
- class RES.score.CellScorer(config_file_path: Path = None, region_short_code: str = None, resource_type: str = None)[source]
Bases:
AttributesParser
Economic evaluation and scoring system for renewable energy grid cells.
This class implements Levelized Cost of Energy (LCOE) calculations to economically rank and score potential renewable energy sites. It integrates capital costs, operational expenses, grid connection costs, and capacity factors to provide comprehensive economic metrics for site comparison and selection.
The scoring methodology follows NREL LCOE documentation and incorporates distance-based grid connection costs, technology-specific capital expenditures, and site-specific capacity factors to generate comparable economic indicators across different locations and technologies.
- Parameters:
config_file_path (str or Path) -- Configuration file containing economic parameters and assumptions
region_short_code (str) -- Region identifier for localized cost parameters
resource_type ({'solar', 'wind'}) -- Technology type for appropriate cost parameter selection
- Inherits configuration parsing capabilities from AttributesParser
- get_CRF(r, N)[source]
Calculate Capital Recovery Factor for annualized cost calculations
- calculate_total_cost(distance_to_grid_km, grid_connection_cost_per_km, tx_line_rebuild_cost, capex_tech, potential_capacity_mw)
Compute total project costs including CAPEX and grid connection
- calculate_score(row, CF_column, CRF)[source]
Generate LCOE score for individual grid cells
- get_cell_score(cells, CF_column, interest_rate=0.03)[source]
Apply economic scoring to entire dataset of grid cells
- calc_LCOE_lambda_m1(row)
Alternative LCOE calculation method following NREL methodology
- calc_LCOE_lambda_m2(row)
Enhanced LCOE calculation with detailed cost components
Examples
Basic economic scoring workflow:
>>> from RES.score import CellScorer
>>> scorer = CellScorer(
...     config_file_path="config/config_CAN.yaml",
...     region_short_code="BC",
...     resource_type="wind"
... )
>>> scored_cells = scorer.get_cell_score(cells_with_capacity_factors, 'CF_mean')
>>> # Get top 10% of cells by LCOE
>>> top_sites = scored_cells.head(int(len(scored_cells) * 0.1))
Custom economic analysis:
>>> # Calculate CRF for different financial scenarios
>>> crf_conservative = scorer.get_CRF(r=0.08, N=25)  # 8% discount, 25 year life
>>> crf_aggressive = scorer.get_CRF(r=0.06, N=30)    # 6% discount, 30 year life
>>>
>>> # Apply scoring with custom parameters
>>> for idx, row in cells.iterrows():
...     lcoe = scorer.calculate_score(row, 'CF_mean', crf_conservative)
Notes
LCOE Calculation Methodology:
- Follows NREL Simple LCOE calculation framework
- LCOE = (CAPEX × CRF + OPEX) / Annual Energy Production
- Includes distance-based grid connection costs
- Uses technology-specific cost parameters from configuration
Cost Components:
- Technology CAPEX ($/MW installed capacity)
- Grid connection costs ($/km distance to transmission)
- Transmission line rebuild costs ($/km)
- Annual O&M expenses (% of CAPEX)
- Financial parameters (discount rate, project lifetime)
Economic Parameters:
- Capital Recovery Factor (CRF) for cost annualization
- Technology-specific cost assumptions
- Regional cost multipliers and adjustments
- Grid connection distance penalties
Limitations:
- Simplified LCOE model without detailed financial modeling
- Grid connection costs based on straight-line distances
- Does not account for economies of scale in large projects
- Static cost assumptions without temporal price variations
- calculate_score(row: Series, node_distance_col: str, CF_column: str, CRF: float) float [source]
Calculate the Levelized Cost of Energy (LCOE) score for an individual grid cell.
LCOE Formula: LCOE = (CAPEX × CRF + FOM + VOM × Annual_Energy) / Annual_Energy
- Parameters:
row (pd.Series) -- DataFrame row containing cell-specific data
node_distance_col (str) -- Column name for distance to grid connection
CF_column (str) -- Column name containing capacity factor data
CRF (float) -- Capital Recovery Factor for cost annualization
- Returns:
LCOE in $/MWh, or 999999 if annual energy production is zero
- Return type:
float
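A worked instance of the formula above, with illustrative numbers; this is a sketch, not the method's internal code, and cost units follow the M$ conventions used elsewhere in this class:
>>> capex = 150.0                    # M$ total project cost (illustrative)
>>> crf = 0.0858                     # from get_CRF(r=0.07, N=25)
>>> fom, vom = 2.0, 0.0              # M$/yr fixed O&M; variable O&M ignored here
>>> capacity_mw, cf = 100.0, 0.35
>>> annual_energy_mwh = capacity_mw * cf * 8760
>>> lcoe = (capex * crf + fom + vom * annual_energy_mwh) * 1e6 / annual_energy_mwh
>>> round(lcoe, 2)                   # $/MWh
48.5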
- calculate_score_debug(row: Series, node_distance_col: str, CF_column: str, CRF: float) dict [source]
Debug version that returns breakdown of LCOE components. Use this to identify why larger sites get lower scores.
- calculate_score_normalized(row: Series, node_distance_col: str, CF_column: str, CRF: float, reference_capacity: float = 1.0) float [source]
Calculate LCOE using a normalized capacity for fair comparison in clustering.
- Parameters:
reference_capacity (float) -- Fixed capacity to use for cost calculations (MW)
- calculate_score_per_mw(row: Series, node_distance_col: str, CF_column: str, CRF: float) float [source]
Calculate LCOE per MW for capacity-independent comparison.
- calculate_total_cost(distance_to_grid_km: float, grid_connection_cost_per_km: float, tx_line_rebuild_cost: float, capex_tech: float, potential_capacity_mw: float) float [source]
Calculate total project cost with economies of scale for grid connection.
- calculate_total_cost_shared_infrastructure(distance_to_grid_km: float, grid_connection_cost_per_km: float, tx_line_rebuild_cost: float, capex_tech: float, potential_capacity_mw: float, nearby_projects_mw: float = 0) float [source]
Calculate total cost considering potential for shared transmission infrastructure. This is most relevant for clustering applications.
- calculate_total_cost_smooth_scaling(distance_to_grid_km: float, grid_connection_cost_per_km: float, tx_line_rebuild_cost: float, capex_tech: float, potential_capacity_MW: float, reference_capacity_MW: float = 100, scaling_exponent: float = 0.8) float [source]
Calculate total cost with smooth economies of scale for grid connection.
- Parameters:
distance_to_grid_km (float) -- Distance to nearest grid connection point (km)
grid_connection_cost_per_km (float) -- Cost per km for grid connection (M$/km)
tx_line_rebuild_cost (float) -- Transmission line rebuild cost (M$/km)
capex_tech (float) -- Technology-specific capital expenditure (M$/MW)
potential_capacity_MW (float) -- Potential installed capacity (MW)
reference_capacity_MW (float, optional) -- Reference capacity for scaling. Defaults to 100 MW.
scaling_exponent (float, optional) -- Exponent for cost scaling; values <1 imply economies of scale. Defaults to 0.8.
- Returns:
Total project cost in millions of dollars (M$)
- Return type:
float
- calculate_total_cost_transmission_sizing(distance_to_grid_km: float, grid_connection_cost_per_km: float, tx_line_rebuild_cost: float, capex_tech: float, potential_capacity_mw: float) float [source]
Calculate total cost with transmission line sizing based on capacity. More realistic approach considering actual transmission requirements.
- get_CRF(r: float, N: int) float [source]
Calculate Capital Recovery Factor (CRF) for annualized cost calculations.
The CRF converts a present-value capital cost into a stream of equal annual payments over the project lifetime. This is essential for LCOE calculations as it allows comparison of projects with different capital costs and lifetimes on an annualized basis.
Formula: CRF = [r × (1 + r)^N] / [(1 + r)^N - 1]
- Parameters:
r (float) -- Discount rate (as decimal, e.g., 0.08 for 8%)
N (int) -- Project lifetime in years
- Returns:
Capital Recovery Factor
- Return type:
float
Example
>>> scorer = CellScorer(**config)
>>> crf = scorer.get_CRF(r=0.07, N=25)  # 7% discount, 25 years
>>> print(f"CRF: {crf:.4f}")
CRF: 0.0858
- get_cell_score(cells: DataFrame, CF_column: str, interest_rate=0.03) DataFrame [source]
Calculate LCOE scores for all grid cells in a DataFrame and return ranked results.
This method applies economic scoring to an entire dataset of potential renewable energy sites, calculating LCOE for each cell and sorting results by economic attractiveness. It serves as the primary interface for batch economic analysis of renewable energy development opportunities.
Processing Steps:
1. Calculate Capital Recovery Factor from financial parameters
2. Apply LCOE calculation to each grid cell
3. Sort results by LCOE (ascending = most economically attractive first)
4. Return scored and ranked DataFrame
- Parameters:
cells (pd.DataFrame) -- DataFrame containing grid cells with required columns:
- nearest_station_distance_km: Distance to transmission (km)
- grid_connection_cost_per_km_{resource_type}: Connection cost (M$/km)
- tx_line_rebuild_cost_{resource_type}: Rebuild cost (M$/km)
- capex_{resource_type}: Technology CAPEX (M$/MW)
- potential_capacity_{resource_type}: Installable capacity (MW)
- Operational_life_{resource_type}: Project lifetime (years)
CF_column (str) -- Column name containing capacity factor data (e.g., 'CF_mean', 'wind_CF_mean', 'solar_CF_mean')
interest_rate (float, optional) -- Discount rate for CRF calculation. Defaults to 0.03 (3%)
- Returns:
Input DataFrame with added LCOE column, sorted by economic attractiveness (lowest LCOE first). Column name format: 'lcoe_{resource_type}', with values in $/MWh.
- Return type:
pd.DataFrame
- Raises:
KeyError -- If required columns are missing from input DataFrame
ValueError -- If capacity factors or operational life contain invalid values
Examples
>>> # Score wind energy sites using mean capacity factors
>>> wind_cells = scorer.get_cell_score(
...     cells=grid_data,
...     CF_column='wind_CF_mean',
...     interest_rate=0.07
... )
>>> print(f"Best site LCOE: ${wind_cells.iloc[0]['lcoe_wind']:.2f}/MWh")
>>> # Score solar sites with conservative financial assumptions
>>> solar_cells = scorer.get_cell_score(
...     cells=grid_data,
...     CF_column='solar_CF_mean',
...     interest_rate=0.08
... )
Notes
Cells with zero annual energy production receive infinite LCOE values
Results are sorted ascending (lowest LCOE = most attractive)
Method handles edge cases like zero capacity factors gracefully
LCOE values are in $/MWh for standard industry comparison
Implements Levelized Cost of Energy calculations following NREL methodology, incorporating capital costs, grid connection expenses, and capacity factors.
Specialized Modules#
Annual Technology Baseline (ATB)#
Processor for NREL's Annual Technology Baseline data.
- class RES.atb.NREL_ATBProcessor(config_file_path: ~pathlib.Path = <factory>, region_short_code: str = 'None', resource_type: str = 'None')[source]
Bases:
object
NREL_ATBProcessor is a class from the RESource module, designed to process the Annual Technology Baseline (ATB) data sourced from the National Renewable Energy Laboratory (NREL). This class provides methods to pull, process, and store cost data for various renewable energy technologies, including utility-scale photovoltaic (PV) systems, land-based wind turbines, and battery energy storage systems (BESS).
- config_file_path
Path to the configuration file.
- Type:
Path
- region_short_code
Short code for the region.
- Type:
str
- resource_type
Type of resource being processed.
- Type:
str
- atb_config
Configuration dictionary containing paths and settings for ATB data.
- Type:
dict
- atb_data_save_to
Path to the directory where ATB data will be saved.
- Type:
Path
- atb_parquet_source
Source URL or path for the ATB Parquet file.
- Type:
str
- atb_datafile
Name of the ATB data file.
- Type:
str
- atb_file_path
Full path to the ATB data file.
- Type:
Path
- datahandler
Instance of DataHandler for storing processed data.
- Type:
DataHandler
- __post_init__()[source]
Initializes the processor by loading configurations, setting up paths, and creating necessary directories.
- pull_data()[source]
Pulls and processes the ATB data, extracting cost data for utility-scale PV, land-based wind, and BESS. Returns the processed data as a tuple.
- _check_and_download_data()
Checks for the existence of the ATB data file locally and downloads it if necessary.
- _process_solar_cost(atb_cost)
Filters and processes solar cost data from the ATB dataset based on specific criteria. Saves the processed data to a CSV file and stores it in the data handler.
- _process_wind_cost(atb_cost)
Filters and processes land-based wind cost data from the ATB dataset based on specific criteria. Saves the processed data to a CSV file and stores it in the data handler.
- _process_bess_cost(atb_cost)
Filters and processes battery energy storage system (BESS) cost data from the ATB dataset based on specific criteria. Saves the processed data to a CSV file and stores it in the data handler.
- config_file_path: Path
- pull_data()[source]
Pulls and processes the Annual Technology Baseline (ATB) data sourced from NREL.
- Jobs:
Loads the config file, checks and downloads the required data file (from defined source url in config) if not already available.
Reads the ATB cost data from a Parquet file.
- Processes the ATB cost data to extract:
Utility-scale photovoltaic (PV) cost.
Land-based wind cost.
Battery energy storage system (BESS) cost.
- Returns:
- A tuple containing the processed cost data for:
Utility-scale PV (self.utility_pv_cost)
Land-based wind (self.land_based_wind_cost)
BESS (self.bess_cost)
- Return type:
tuple
- region_short_code: str = 'None'
- resource_type: str = 'None'
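A minimal usage sketch based on the attributes and pull_data() contract documented above; the config path is illustrative:
>>> from pathlib import Path
>>> from RES.atb import NREL_ATBProcessor
>>> atb = NREL_ATBProcessor(
...     config_file_path=Path("config/config_BC.yaml"),
...     region_short_code="BC",
...     resource_type="solar",
... )
>>> utility_pv_cost, land_based_wind_cost, bess_cost = atb.pull_data()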
ℹ️ Version Notice: Currently configured for 2024 ATB data. Review and update configuration when using different years or datasets.
Global Land Cover#
Handler for GAEZ raster data processing.
- class RES.gaez.GAEZRasterProcessor(config_file_path: Path = None, region_short_code: str = None, resource_type: str = None)[source]
Bases:
AttributesParser
GAEZ (Global Agro-Ecological Zones) raster data processor for renewable energy land constraint analysis.
This class handles the download, extraction, clipping, and visualization of GAEZ raster datasets used in renewable energy resource assessment. GAEZ provides global spatial data on agricultural suitability, land resources, and ecological constraints that are essential for identifying suitable areas for renewable energy development while avoiding productive agricultural land.
The processor integrates GAEZ land constraint data with regional boundaries to support renewable energy siting decisions and capacity assessments. It automatically downloads the required raster datasets, extracts specific layers based on configuration, clips them to regional boundaries, and generates visualization outputs for analysis.
INHERITED METHODS FROM AttributesParser:#
get_gaez_data_config() -> Dict[str, dict]: Get GAEZ dataset configuration parameters
get_region_name() -> str: Get full region name for display purposes
Plus other configuration access methods
INHERITED ATTRIBUTES FROM AttributesParser:#
config_file_path: Path to configuration file
region_short_code: Region identifier code
resource_type: Resource type identifier
Plus other configuration attributes
OWN METHODS DEFINED IN THIS CLASS:#
process_all_rasters(): Main pipeline for processing all configured raster types
plot_gaez_tif(): Generate visualization plots for processed raster data
__download_resources_zip_file__(): Download GAEZ ZIP archive from remote source
__extract_rasters__(): Extract required raster files from ZIP archive
__clip_to_boundary_n_plot__(): Clip rasters to regional boundaries and generate plots
- param config_file_path:
Path to configuration file containing GAEZ dataset parameters
- type config_file_path:
str or Path
- param region_short_code:
Region identifier for boundary definition and file naming
- type region_short_code:
str
- param resource_type:
Resource type ('solar', 'wind', 'bess') - used for dependency injection
- type resource_type:
str
- gadmBoundary
GADM boundary processor for regional extent definition
- Type:
GADMBoundaries
- gaez_config
GAEZ dataset configuration parameters from config file
- Type:
dict
- gaez_root
Root directory for GAEZ data storage and processing
- Type:
Path
- zip_file
Path to the GAEZ ZIP archive file
- Type:
Path
- Rasters_in_use_direct
Directory for extracted and processed raster files
- Type:
Path
- raster_types
List of raster type configurations to process
- Type:
list
- region_boundary
Regional boundary geometry for clipping operations
- Type:
gpd.GeoDataFrame
- process_all_rasters(show=False) dict [source]
Main pipeline to download, extract, clip, and plot all configured rasters
- plot_gaez_tif(tif_path, color_map, plot_title, save_to, show=False) matplotlib.Figure [source]
Generate and save visualization plots for raster data
Examples
Process GAEZ rasters for British Columbia:
>>> from RES.gaez import GAEZRasterProcessor
>>> gaez_processor = GAEZRasterProcessor(
...     config_file_path="config/config_BC.yaml",
...     region_short_code="BC",
...     resource_type="solar"
... )
>>> raster_paths = gaez_processor.process_all_rasters(show=True)
>>> print(f"Processed {len(raster_paths)} raster types")
Access specific raster data:
>>> # Raster paths are returned as dictionary
>>> if 'slope' in raster_paths:
...     slope_path = raster_paths['slope']
...     print(f"Slope raster available at: {slope_path}")
Configuration Requirements#
The GAEZ configuration must include:
root: "data/downloaded_data/GAEZ" # Storage directory source: "https://s3.eu-west-1.amazonaws.com/data.gaezdev.aws.fao.org/LR.zip" zip_file: "LR.zip" # ZIP archive filename Rasters_in_use_direct: "Rasters_in_use" # Extraction directory raster_types:
name: "slope" raster: "slope.tif" zip_extract_direct: "slope" color_map: "terrain"
# Additional raster type configurations...
Data Processing Workflow#
Configuration Loading: Extract GAEZ parameters from config file
Download Check: Verify ZIP archive exists or download from source
Extraction: Extract required raster files from ZIP archive
Boundary Processing: Get regional boundaries from GADM processor
Clipping: Clip each raster to regional boundaries
Visualization: Generate plots for each processed raster
Path Management: Return dictionary of processed raster file paths
Raster Type Configuration#
Each raster type requires:
- name: Identifier for the raster layer
- raster: Filename of the raster file within ZIP archive
- zip_extract_direct: Directory path within ZIP archive
- color_map: Matplotlib colormap for visualization
Supported Raster Types#
Common GAEZ rasters include:
- Slope: Terrain slope for accessibility analysis
- Soil Quality: Agricultural productivity constraints
- Land Cover: Vegetation and land use classifications
- Elevation: Digital elevation model data
- Climate Zones: Agro-ecological zone classifications
Spatial Processing#
Input CRS: Inherits coordinate system from source rasters
Clipping: Uses regional boundaries with geometry buffering
Output Format: GeoTIFF files with preserved metadata
Resolution: Maintains original raster resolution
Compression: Optimized file storage for large datasets
Visualization Features#
Automatic Plotting: Generates plots for all processed rasters
Custom Colormaps: Configurable visualization schemes
Coordinate Display: Latitude/longitude axis labels
Legend Integration: Horizontal colorbar with value indicators
File Output: PNG format with high-resolution settings
Performance Considerations#
ZIP download time scales with file size (typically 100MB-1GB)
Extraction time depends on number of raster layers
Clipping operations are memory-intensive for large regions
Multiple raster processing benefits from parallel execution
Network connectivity affects initial download performance
Integration Points#
Boundaries: Uses GADMBoundaries for regional extent definition
Land Constraints: Provides input data for land availability analysis
Capacity Calculation: Supports renewable energy siting decisions
Visualization: Integrates with broader visualization workflows
Error Handling#
Download Failures: Graceful handling of network issues
Missing Files: Clear error messages for missing raster files
Extraction Errors: Validation of ZIP archive contents
Processing Failures: Detailed logging for debugging
Output Management#
Organized Storage: Systematic directory structure for processed data
File Naming: Consistent naming convention with region identifiers
Metadata Preservation: Maintains spatial reference and statistics
Visualization Archive: Organized plot storage for documentation
Notes
GAEZ data is provided by FAO (Food and Agriculture Organization)
Raster datasets are typically global coverage at moderate resolution
Processing large regions may require substantial disk space
Results integrate with renewable energy assessment workflows
Visualization outputs support decision-making and reporting
ZIP archives are cached locally to avoid repeated downloads
Dependencies#
requests: HTTP downloading of ZIP archives
rasterio: Raster data reading, processing, and writing
zipfile: ZIP archive extraction and management
pathlib: File path operations and directory management
matplotlib: Visualization and plot generation
RES.AttributesParser: Parent class for configuration management
RES.boundaries.GADMBoundaries: Regional boundary processing
RES.utility: Logging and status update functions
- raises ConnectionError:
If GAEZ data download fails or source is unavailable
- raises FileNotFoundError:
If required raster files are missing from ZIP archive
- raises ValueError:
If configuration parameters are invalid or incomplete
- raises RuntimeError:
If raster processing or clipping operations fail
See also
rasterio.mask.mask
Raster clipping functionality
RES.boundaries.GADMBoundaries
Regional boundary processing
RES.lands
Land constraint integration for renewable energy
- plot_gaez_tif(tif_path, color_map, plot_title, save_to, show=False)[source]
Generate and save visualization plot for processed GAEZ raster data.
Creates a publication-quality matplotlib visualization of the clipped GAEZ raster data with proper coordinate system display, color mapping, and legend information. The plot includes geographic extent display with latitude/longitude axes and a horizontal colorbar for value interpretation.
This method supports both interactive display and file output, making it suitable for both exploratory analysis and report generation workflows. The visualization uses proper geographic coordinates and customizable color schemes to effectively communicate spatial patterns in the data.
- Parameters:
tif_path (str or pathlib.Path) -- Path to the GeoTIFF raster file to visualize. Must be a valid raster file with spatial reference information.
color_map (str) -- Name of matplotlib colormap to use for visualization. Examples: 'terrain', 'viridis', 'plasma', 'coolwarm'.
plot_title (str) -- Title text to display at the top of the plot. Should describe the raster content and region.
save_to (str or pathlib.Path) -- Output path for saving the plot image file. Parent directories will be created if they don't exist.
show (bool, default=False) -- Whether to display the plot interactively on screen. If True, plot is shown in addition to being saved. If False, plot is only saved to file without display.
- Returns:
The matplotlib Figure object containing the plot. Can be used for further customization or processing.
- Return type:
matplotlib.figure.Figure
Examples
Create a basic plot:
>>> fig = processor.plot_gaez_tif(
...     tif_path="data/BC_slope.tif",
...     color_map="terrain",
...     plot_title="Slope Analysis for British Columbia",
...     save_to="plots/BC_slope.png"
... )
Create an interactive plot:
>>> fig = processor.plot_gaez_tif(
...     tif_path="data/BC_elevation.tif",
...     color_map="viridis",
...     plot_title="Elevation Map",
...     save_to="plots/elevation.png",
...     show=True
... )
- Raises:
rasterio.errors.RasterioIOError -- If the input TIF file cannot be read or is corrupted
FileNotFoundError -- If the input TIF file does not exist
ValueError -- If the colormap name is not recognized by matplotlib
IOError -- If the output plot file cannot be written
Notes
Plot dimensions are fixed at 10x8 inches for consistency
Colorbar is positioned horizontally below the plot
Geographic extent is automatically derived from raster bounds
Output directories are created automatically if needed
Plot is always saved regardless of the show parameter
Figure is closed after processing to prevent memory leaks
NoData/masked values are handled transparently in visualization
- process_all_rasters(show: bool = False)[source]
Main pipeline to download, extract, clip, and plot rasters based on configuration.
Executes the complete GAEZ raster processing workflow:
1. Downloading ZIP archive if not present locally
2. Extracting required raster files from archive
3. Loading regional boundaries for clipping operations
4. Processing each configured raster type by clipping to boundaries
5. Generating visualization plots for all processed rasters
This method orchestrates all processing steps and returns paths to the processed raster files for downstream analysis.
- Parameters:
show (bool, default=False) -- Whether to display generated plots interactively during processing. If True, matplotlib plots will be shown on screen. If False, plots are saved to disk without display.
- Returns:
Dictionary mapping raster type names to processed file paths. Keys are raster type names from configuration. Values are Path objects pointing to clipped raster files.
- Return type:
dict
Examples
Process all rasters with visualization:
>>> raster_paths = processor.process_all_rasters(show=True)
>>> print(f"Processed rasters: {list(raster_paths.keys())}")
Process rasters for programmatic use:
>>> raster_paths = processor.process_all_rasters(show=False)
>>> slope_raster = raster_paths.get('slope')
>>> if slope_raster:
...     print(f"Slope data at: {slope_raster}")
- Raises:
ConnectionError -- If ZIP archive download fails due to network issues
FileNotFoundError -- If required raster files are missing from archive
RuntimeError -- If raster clipping or processing operations fail
Notes
Processing time scales with number of raster types and region size
Large regions may require substantial disk space for processed rasters
Network connection required for initial ZIP archive download
Existing processed rasters are not regenerated unless missing
All plots are saved regardless of the show parameter setting
Global Wind Atlas#
Handler for Global Wind Atlas data processing.
- class RES.gwa.GWACells(config_file_path: Path = None, region_short_code: str = None, resource_type: str = None)[source]
Bases:
AttributesParser
Global Wind Atlas (GWA) data processor for high-resolution wind resource analysis.
This class integrates Global Wind Atlas data with regional boundaries to provide high-resolution wind resource assessment capabilities for renewable energy planning. GWA provides detailed wind speed, wind power density, and wind class information at much higher spatial resolution than ERA5 data, making it valuable for detailed site assessment and resource characterization.
The class handles downloading, processing, and spatial mapping of GWA raster data to ERA5 grid cells, enabling multi-scale wind resource analysis. It processes multiple GWA data layers including wind speed, wind power density, and IEC wind class classifications to provide comprehensive wind resource information.
INHERITED METHODS FROM GADMBoundaries:#
get_bounding_box() -> tuple: Get regional bounding box for spatial clipping
get_region_boundary() -> gpd.GeoDataFrame: Get regional boundary geometry
get_country_boundary() -> gpd.GeoDataFrame: Get country-level boundary geometry
Plus other boundary processing methods
INHERITED METHODS FROM AttributesParser:#
get_gwa_config() -> dict: Get GWA data configuration parameters
get_default_crs() -> str: Get default coordinate reference system
get_region_mapping() -> Dict[str, dict]: Get region mapping configuration
Plus other configuration access methods
INHERITED ATTRIBUTES FROM AttributesParser:#
region_short_code: Region identifier code
region_mapping: Dictionary mapping region codes to configuration
store: HDF5 data store path for processed results
Plus other configuration attributes
OWN METHODS DEFINED IN THIS CLASS:#
prepare_GWA_data(): Download and process GWA raster data for the region
download_file(): Download individual files from remote sources
load_gwa_cells(): Create GeoDataFrame of GWA cells with spatial geometry
map_GWA_cells_to_ERA5(): Map high-resolution GWA data to ERA5 grid cells
- param config_file_path:
Path to configuration file containing GWA data parameters
- type config_file_path:
str or Path
- param region_short_code:
Region identifier for boundary definition and data filtering
- type region_short_code:
str
- param resource_type:
Resource type ('wind') for GWA wind resource analysis
- type resource_type:
str
- merged_data
Merged xarray DataArray containing all GWA data layers
- Type:
xr.DataArray
- gwa_config
GWA configuration parameters from config file
- Type:
dict
- datahandler
HDF5 data handler for storing processed results
- Type:
DataHandler
- gwa_datafields
Field definitions for GWA data layers
- Type:
dict
- gwa_rasters
Raster file specifications for GWA data
- Type:
dict
- gwa_sources
Source URLs for downloading GWA data
- Type:
dict
- gwa_root
Root directory for GWA data storage
- Type:
Path
- bounding_box
Regional bounding box coordinates for spatial clipping
- Type:
dict
- region_gwa_cells_df
Processed GWA cells data as pandas DataFrame
- Type:
pd.DataFrame
- gwa_cells_gdf
GWA cells with spatial geometry for analysis
- Type:
gpd.GeoDataFrame
- mapped_gwa_cells_aggr_df
GWA data aggregated to ERA5 grid cell resolution
- Type:
pd.DataFrame
- prepare_GWA_data(windspeed_min=10, windspeed_max=20, memory_resource_limitation=False) pd.DataFrame [source]
Download, process, and merge GWA raster data for the region
- download_file(url, destination) None [source]
Download a file from URL to specified destination path
- load_gwa_cells(memory_resource_limitation=False) gpd.GeoDataFrame [source]
Load GWA cells as GeoDataFrame with spatial geometry
- map_GWA_cells_to_ERA5(memory_resource_limitation=False) None [source]
Map high-resolution GWA data to ERA5 grid cells for integration
Examples
Create GWA processor for British Columbia:
>>> from RES.gwa import GWACells
>>> gwa_processor = GWACells(
...     config_file_path="config/config_BC.yaml",
...     region_short_code="BC",
...     resource_type="wind"
... )
>>>
>>> # Load high-resolution GWA cells
>>> gwa_cells = gwa_processor.load_gwa_cells()
>>> print(f"Loaded {len(gwa_cells)} GWA cells")
Process GWA data with wind speed filtering:
>>> # Prepare data with wind speed constraints
>>> gwa_data = gwa_processor.prepare_GWA_data(
...     windspeed_min=12,
...     windspeed_max=25,
...     memory_resource_limitation=True
... )
>>> print(f"Filtered to {len(gwa_data)} high-quality wind cells")
Map GWA data to ERA5 grid:
>>> # Map high-resolution GWA to ERA5 cells
>>> gwa_processor.map_GWA_cells_to_ERA5(memory_resource_limitation=False)
>>> print("GWA data mapped to ERA5 grid cells")
Configuration Requirements#
The GWA configuration must include:
root: "data/downloaded_data/GWA" # Storage directory datafields:
windspeed_gwa: "Wind speed at 100m" windpower_gwa: "Wind power density at 100m" IEC_Class_ExLoads: "IEC wind class"
- rasters:
windspeed_gwa: "GWA_country_code_windspeed.tif" windpower_gwa: "GWA_country_code_windpower.tif" IEC_Class_ExLoads: "GWA_country_code_iec.tif"
- sources:
windspeed_gwa: "https://globalwindatlas.info/download/GWA_country_code_windspeed.tif" windpower_gwa: "https://globalwindatlas.info/download/GWA_country_code_windpower.tif" IEC_Class_ExLoads: "https://globalwindatlas.info/download/GWA_country_code_iec.tif"
- region_mapping:
- BC:
GWA_country_code: "CAN" # Country code for GWA data
Data Processing Workflow#
Configuration Loading: Extract GWA parameters and region mapping
Data Download: Check for local data or download from GWA sources
Raster Processing: Load and clip raster data to regional boundaries
Data Merging: Combine multiple GWA layers into unified dataset
Quality Filtering: Apply wind speed and other quality constraints
Spatial Conversion: Convert raster data to point-based GeoDataFrame
Grid Mapping: Aggregate high-resolution GWA data to ERA5 grid cells
Data Storage: Store processed results in HDF5 format for reuse
GWA Data Layers#
Typical GWA datasets include:
- Wind Speed: Mean wind speed at 100m height (m/s)
- Wind Power Density: Wind power density at 100m height (W/m²)
- IEC Wind Class: International Electrotechnical Commission wind classes
- Capacity Factor: Estimated capacity factors for different turbine types
- Wind Direction: Prevailing wind direction statistics
Spatial Resolution#
GWA Resolution: Typically 250m to 1km spatial resolution
ERA5 Resolution: Approximately 25km spatial resolution
Aggregation Method: Mean values for continuous variables
Coordinate System: WGS84 (EPSG:4326) for global compatibility
Clipping Boundaries: Regional boundaries from GADM database
Quality Control#
Wind Speed Filtering: Configurable minimum/maximum wind speed thresholds
Data Validation: Automatic detection and handling of NoData values
Spatial Validation: Clipping to valid regional boundaries
Memory Management: Optional memory limitation for large datasets
Error Handling: Graceful handling of download and processing errors
Performance Considerations#
Download time depends on data availability and network speed
Processing time scales with region size and data resolution
Memory usage can be substantial for large regions
Spatial overlay operations are computationally intensive
HDF5 storage provides efficient data access for repeated analysis
Integration Points#
ERA5 Data: Integration with ERA5 climate data for multi-scale analysis
Boundary Data: Uses GADM boundaries for regional definition
Capacity Analysis: Provides high-resolution input for capacity factor calculations
Resource Assessment: Supports detailed wind resource characterization
Grid Analysis: Compatible with grid cell generation workflows
Output Formats#
DataFrame: Tabular data with wind resource attributes
GeoDataFrame: Spatial data with point geometries
HDF5 Storage: Efficient storage for large datasets
Grid Mapping: ERA5-compatible aggregated datasets
Notes
GWA data is provided by Technical University of Denmark (DTU)
Global coverage with country-specific datasets
Higher resolution than ERA5 for detailed site assessment
Processing requires substantial computational resources for large regions
Results integrate seamlessly with ERA5-based renewable energy workflows
Data quality varies by region and local terrain complexity
Regular updates are available from the Global Wind Atlas portal
Dependencies#
geopandas: Spatial data processing and geometry operations
pandas: Tabular data manipulation and analysis
rioxarray: Raster data reading and spatial operations
xarray: N-dimensional array operations and data merging
requests: HTTP downloading of GWA datasets
pathlib: File path operations and directory management
RES.hdf5_handler.DataHandler: HDF5 data storage and retrieval
RES.boundaries.GADMBoundaries: Parent class for boundary processing
RES.utility: Logging and utility functions
- raises ConnectionError:
If GWA data download fails due to network issues
- raises FileNotFoundError:
If required configuration files or directories are missing
- raises ValueError:
If wind speed thresholds or other parameters are invalid
- raises RuntimeError:
If raster processing or spatial operations fail
See also
rioxarray.open_rasterio
Raster data reading functionality
geopandas.GeoDataFrame.overlay
Spatial overlay operations
RES.boundaries.GADMBoundaries
Parent class for boundary processing
RES.hdf5_handler.DataHandler
HDF5 data storage utilities
- download_file(url: str, destination: Path) None [source]
Download a file from a remote URL to a specified local destination.
Downloads GWA raster files from remote sources when they are not available locally. The method handles HTTP requests with proper error checking and provides detailed logging of download operations.
Files are downloaded completely before being written to avoid partial downloads. The destination directory is created automatically if it doesn't exist, ensuring reliable file operations.
- Parameters:
url (str) -- Complete URL of the file to download. Should be a valid HTTP/HTTPS URL pointing to a GWA raster file.
destination (Path) -- Local file path where the downloaded file should be saved. Parent directories will be created automatically if needed.
- Returns:
The method doesn't return a value but saves the file to disk.
- Return type:
None
Examples
Download a GWA wind speed raster:
>>> url = "https://globalwindatlas.info/download/CAN_windspeed.tif" >>> destination = Path("data/GWA/CAN_windspeed.tif") >>> processor.download_file(url, destination)
Download with automatic path handling:
>>> url = "https://globalwindatlas.info/download/CAN_windpower.tif" >>> destination = processor.gwa_root / "CAN_windpower.tif" >>> processor.download_file(url, destination)
- Raises:
requests.RequestException -- If the HTTP request fails due to network issues or server errors
requests.HTTPError -- If the server returns an HTTP error status code
IOError -- If the local file cannot be written due to permissions or disk space
FileNotFoundError -- If the destination directory cannot be created
Notes
Download progress is logged through utility print functions
Network timeouts may occur for large files on slow connections
Existing files are overwritten without warning
File integrity is not verified after download
Destination path is automatically converted to Path object if needed
- load_gwa_cells(memory_resource_limitation: bool | None = False)[source]
Load GWA cells as a spatial GeoDataFrame with point geometries.
Converts processed GWA tabular data into a spatial GeoDataFrame by creating point geometries from coordinate information. The resulting GeoDataFrame contains all wind resource attributes along with spatial geometry suitable for spatial analysis and visualization.
The method automatically clips the data to regional boundaries to ensure results are geographically constrained to the area of interest. This spatial filtering removes any cells that fall outside the defined regional boundaries despite being within the bounding box.
- Parameters:
memory_resource_limitation (Optional[bool], default=False) -- Whether to enable memory-efficient processing for large datasets. Passed through to prepare_GWA_data() method to control filtering. If True, applies wind speed filtering to reduce memory usage. If False, processes the full dataset without memory limitations.
- Returns:
Spatial GeoDataFrame containing GWA cells with:
- Point geometries representing cell center coordinates
- Wind resource attributes (speed, power density, IEC class)
- Spatial reference system matching regional CRS
- Geographic clipping to regional boundaries
- Return type:
geopandas.GeoDataFrame
Examples
Load all GWA cells for the region:
>>> gwa_cells = processor.load_gwa_cells()
>>> print(f"Loaded {len(gwa_cells)} spatial cells")
>>> print(f"CRS: {gwa_cells.crs}")
Load with memory optimization:
>>> gwa_cells = processor.load_gwa_cells(memory_resource_limitation=True)
>>> print(f"Memory-optimized: {len(gwa_cells)} cells")
Access spatial and attribute data:
>>> # Spatial analysis
>>> total_area = gwa_cells.total_bounds
>>> print(f"Spatial extent: {total_area}")
>>>
>>> # Attribute analysis
>>> mean_windspeed = gwa_cells['windspeed_gwa'].mean()
>>> print(f"Average wind speed: {mean_windspeed:.2f} m/s")
- Raises:
ValueError -- If coordinate columns (x, y) are missing from the GWA data
GeometryError -- If point geometries cannot be created from coordinates
CRSError -- If the coordinate reference system is invalid or undefined
Notes
Point geometries are created from x,y coordinate columns
Spatial clipping ensures geographic consistency with boundaries
CRS is inherited from the regional configuration
Processing time scales with the number of GWA cells
Memory usage depends on dataset size and attribute complexity
Results are suitable for spatial overlay and intersection operations
- map_GWA_cells_to_ERA5(aggregation_level: str, memory_resource_limitation: bool | None)[source]
Map high-resolution GWA cells to coarser ERA5 grid cells for multi-scale analysis.
This method performs spatial aggregation of high-resolution GWA wind data (typically 250m-1km resolution) to ERA5 grid cells (approximately 25km resolution). The aggregation process uses spatial overlay operations to determine which GWA cells fall within each ERA5 cell, then computes mean values for all wind resource attributes.
The mapping enables integration of detailed GWA wind resource data with ERA5-based renewable energy analysis workflows, providing enhanced spatial detail while maintaining compatibility with ERA5 grid structures.
Processing is performed on a region-by-region basis to optimize memory usage and computational efficiency. Results are automatically stored in the HDF5 data store for subsequent analysis operations.
- Parameters:
aggregation_level (str) -- Label of the spatial aggregation level used when grouping GWA cells into ERA5 grid cells.
memory_resource_limitation (Optional[bool]) -- Whether to enable memory-efficient processing for large datasets. Passed through to load_gwa_cells() and prepare_GWA_data() methods. If True, applies filtering to reduce memory usage during processing. If False, processes the complete dataset without memory limitations.
- Returns:
The method doesn't return a value but stores results in the HDF5 store. Aggregated data is accessible via self.mapped_gwa_cells_aggr_df attribute and permanently stored in the 'cells' store for future access.
- Return type:
None
Examples
Map GWA data to ERA5 grid with full dataset:
>>> processor.map_GWA_cells_to_ERA5(memory_resource_limitation=False)
>>> print("GWA data mapped to ERA5 grid cells")
Map with memory optimization for large regions:
>>> processor.map_GWA_cells_to_ERA5(memory_resource_limitation=True)
>>> print("Memory-optimized mapping completed")
Access mapped results:
>>> # Results are stored in datahandler
>>> era5_with_gwa = processor.datahandler.from_store('cells')
>>> print(f"ERA5 cells with GWA data: {len(era5_with_gwa)}")
>>> print(f"Columns: {list(era5_with_gwa.columns)}")
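A sketch of the underlying step: the module performs an intersection overlay, but the same idea can be illustrated with a point-in-polygon spatial join (the era5_cells GeoDataFrame and its index column are assumptions):
>>> import geopandas as gpd
>>> gwa_cells = processor.load_gwa_cells()                 # high-resolution GWA cells
>>> joined = gpd.sjoin(gwa_cells, era5_cells, predicate="within")
>>> gwa_on_era5 = (joined
...                .groupby("index_right")[["windspeed_gwa", "windpower_gwa"]]
...                .mean())                                # mean per ERA5 cell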
Processing Workflow#
Data Loading: Load ERA5 grid cells from HDF5 store
GWA Loading: Load high-resolution GWA cells as GeoDataFrame
Spatial Overlay: Perform intersection between GWA and ERA5 cells
Coordinate Mapping: Update coordinates to ERA5 cell centers
Aggregation: Compute mean values for numeric attributes by ERA5 cell
Storage: Store aggregated results in HDF5 store with forced update
Spatial Operations#
Overlay Method: Intersection overlay to find spatial relationships
Aggregation Function: Mean aggregation for all numeric attributes
Coordinate Assignment: ERA5 cell coordinates replace GWA coordinates
Regional Processing: Separate processing by geographic regions
Memory Management: Regional processing reduces peak memory usage
Performance Considerations#
Processing time scales with number of GWA cells and ERA5 cells
Memory usage peaks during spatial overlay operations
Regional processing improves memory efficiency for large datasets
Storage operations may take time for large aggregated datasets
Spatial indexing improves performance for repeated operations
- raises FileNotFoundError:
If ERA5 grid cells are not found in the HDF5 store
- raises ValueError:
If spatial overlay operations fail due to geometry issues
- raises MemoryError:
If dataset is too large for available memory (use memory limitation)
- raises RuntimeError:
If HDF5 storage operations fail
Notes
Aggregation preserves all numeric wind resource attributes
Categorical attributes (like IEC class) may require special handling
Results overwrite existing data in the HDF5 store (force_update=True)
Processing is optimized for typical renewable energy analysis workflows
Spatial accuracy depends on the quality of ERA5 and GWA geometries
Large regions may require substantial processing time and memory
Results integrate seamlessly with ERA5-based capacity calculations
Data Quality#
Mean aggregation is appropriate for continuous wind variables
Statistical significance increases with more GWA cells per ERA5 cell
Spatial representation accuracy depends on resolution differences
Edge effects may occur at regional boundaries
- merged_data: DataArray
- prepare_GWA_data(windspeed_min=10, windspeed_max=20, memory_resource_limitation: bool = False) pd.DataFrame [source]
Download, process, and merge Global Wind Atlas raster data for the region.
This method orchestrates the complete GWA data preparation workflow including: downloading required raster files, loading and clipping them to regional boundaries, merging multiple data layers, and applying quality filters. The processed data is returned as a pandas DataFrame ready for analysis.
The method handles multiple GWA data types (wind speed, power density, IEC classes) and automatically downloads missing files from configured sources. Spatial clipping ensures data is limited to the region of interest, and wind speed filtering allows focus on viable wind resources.
- Parameters:
windspeed_min (float, default=10) -- Minimum wind speed threshold in m/s for filtering cells. Cells with wind speeds below this value are excluded from results.
windspeed_max (float, default=20) -- Maximum wind speed threshold in m/s for filtering cells. Cells with wind speeds above this value are excluded from results.
memory_resource_limitation (bool, default=False) -- Whether to enable memory-efficient processing for large datasets. If True, applies wind speed filtering to reduce memory usage. If False, uses full wind speed range (0-50 m/s) for processing.
- Returns:
Processed GWA data as pandas DataFrame with columns:
- x, y: Spatial coordinates in regional CRS
- windspeed_gwa: Wind speed at 100m height (m/s)
- windpower_gwa: Wind power density at 100m height (W/m²)
- IEC_Class_ExLoads: IEC wind class classifications
- Additional fields as configured in GWA data configuration
- Return type:
pd.DataFrame
Examples
Prepare data with default wind speed range:
>>> gwa_data = processor.prepare_GWA_data()
>>> print(f"Loaded {len(gwa_data)} wind resource cells")
Apply strict wind speed filtering for high-quality sites:
>>> high_wind_data = processor.prepare_GWA_data(
...     windspeed_min=15,
...     windspeed_max=25,
...     memory_resource_limitation=True
... )
>>> print(f"High-wind sites: {len(high_wind_data)} cells")
Process full dataset without filtering:
>>> all_data = processor.prepare_GWA_data(
...     windspeed_min=0,
...     windspeed_max=50,
...     memory_resource_limitation=False
... )
- Raises:
ConnectionError -- If GWA data download fails due to network issues
FileNotFoundError -- If GWA raster files cannot be found locally or remotely
ValueError -- If wind speed thresholds are invalid (min >= max)
RuntimeError -- If raster processing or spatial operations fail
Notes
Processing time depends on region size and number of data layers
Downloaded files are cached locally to avoid repeated downloads
Memory usage scales with region size and data resolution
Wind speed filtering significantly reduces memory requirements
Multiple raster files are automatically merged into unified dataset
Spatial coordinates are preserved for subsequent spatial analysis
Turbine Configuration#
Turbine database and configuration management.
- class RES.tech.OEDBTurbines(OEDB_config: dict)[source]
Bases:
object
- fetch_turbine_config(model)[source]
Fetches turbine data based on the resource type (e.g., 'wind') and saves the formatted configuration for the turbines found.
- format_and_save_turbine_config(turbine_data: dict, save_to: str)[source]
Formats the turbine's specification data (to match Atlite's requirements) and saves it to a YAML configuration file.
- Parameters:
turbine_data (dict) -- Turbine specification data.
save_to (str) -- The directory path where the YAML file will be saved.
- load_turbine_config()[source]
Loads the turbine configuration from a YAML file.
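Example
A minimal usage sketch (the OEDB_config keys and the turbine model name are illustrative assumptions):
>>> from RES.tech import OEDBTurbines
>>> oedb = OEDBTurbines(OEDB_config={"source": "https://openenergyplatform.org"})  # config keys assumed
>>> oedb.fetch_turbine_config(model="E-126/4200")  # model name illustrative
>>> turbine_config = oedb.load_turbine_config()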
Units Management#
Unit conversion and management utilities.
- class RES.units.Units(config_file_path: Path = None, region_short_code: str = None, resource_type: str = None, SAVE_TO_DIR: Path = PosixPath('data'), EXCEL_FILENAME: str = 'units.csv')[source]
Bases:
AttributesParser
Unit conversion and metadata management system for renewable energy analysis.
This class provides standardized unit definitions and conversion capabilities for various parameters used throughout the renewable energy resource assessment workflow. It maintains a comprehensive dictionary of units for economic, technical, and energy parameters, enabling consistent data interpretation and reporting across different analysis modules.
Key Functionality:
- Defines standard units for economic parameters (CAPEX, OPEX, LCOE)
- Establishes energy and power unit conventions
- Provides data persistence through HDF5 storage
- Exports unit dictionaries for external reference
- Ensures consistent unit usage across the analysis pipeline
- SAVE_TO_DIR
Directory for saving unit reference files
- Type:
Path
- EXCEL_FILENAME
Filename for CSV export of unit dictionary
- Type:
str
- datahandler
HDF5 storage interface for data persistence
- Type:
DataHandler
- Standard Units Defined:
capex: Million USD per MW (Mil. USD/MW)
fom: Million USD per MW (Mil. USD/MW) - Fixed O&M
vom: Million USD per MW (Mil. USD/MW) - Variable O&M
potential_capacity: Megawatts (MW)
p_lcoe: Megawatt-hours per USD (MWH/USD)
energy: Megawatt-hours (MWh)
energy_demand: Petajoules (Pj)
Example
>>> units_manager = Units(
...     config_file_path="config/config.yaml",
...     SAVE_TO_DIR=Path("data/units"),
...     EXCEL_FILENAME="units_reference.csv"
... )
>>> units_manager.create_units_dictionary()
>>> # Creates standardized unit reference for project
Notes
Units follow international energy industry standards
Economic parameters use million USD to match typical project scales
Energy units align with grid-scale renewable energy reporting
HDF5 storage enables efficient data access and version control
- EXCEL_FILENAME: str = 'units.csv'
- SAVE_TO_DIR: Path = PosixPath('data')
- create_units_dictionary()[source]
Create and persist a comprehensive dictionary of standardized units for renewable energy analysis.
This method establishes the authoritative unit reference for all parameters used in renewable energy resource assessment and economic analysis. It creates a standardized dictionary mapping parameter names to their corresponding units, ensuring consistency across all analysis modules and facilitating data interpretation and reporting.
Unit Categories:
1. Economic Parameters:
- Capital expenditures (CAPEX) in Million USD/MW
- Fixed and variable O&M costs in Million USD/MW
- LCOE productivity metrics in MWH/USD
2. Technical Parameters:
- Power capacity in Megawatts (MW)
- Energy production in Megawatt-hours (MWh)
- Energy demand in Petajoules (Pj)
Process:
1. Defines comprehensive units dictionary with industry-standard units
2. Converts dictionary to pandas DataFrame for structured storage
3. Persists data to HDF5 store for efficient access
4. Exports human-readable CSV file for external reference
Data Storage:
- HDF5 format: Efficient binary storage for programmatic access
- CSV format: Human-readable reference for documentation and verification
- Raises:
FileSystemError -- If output directory cannot be created
PermissionError -- If CSV file cannot be written
StorageError -- If HDF5 store operation fails
Example
>>> units_manager = Units(**config)
>>> units_manager.create_units_dictionary()
INFO: Units information created and saved to 'data/units.csv'
Notes
Units follow international energy industry conventions
Economic units scaled to typical renewable project magnitudes
CSV export includes parameter names and corresponding units
HDF5 storage enables version control and audit trails
Clustering and Aggregation#
Spatial clustering utilities for site aggregation.
Spatial clustering module for renewable energy resource assessment.
This module provides K-means clustering functionality for aggregating grid cells with similar renewable energy characteristics into representative clusters. The clustering is based on techno-economic metrics such as Levelized Cost of Electricity (LCOE) and potential capacity, enabling spatial aggregation for energy system modeling and optimization.
The module implements an automated workflow for determining optimal cluster numbers, performing spatial clustering, and creating representative cluster geometries that maintain spatial relationships while reducing computational complexity for large-scale renewable energy assessments.
Key Features#
Automated optimal cluster number determination using elbow method
Spatial clustering based on LCOE and capacity metrics
Grid cell identifier generation for data linking
Cluster geometry creation through spatial union operations
Regional boundary clipping for precise spatial extent
Visualization of clustering analysis results
Functions#
- assign_cluster_id(cells, source_column=sub_national_unit_tag, index_name='cell')
Generate unique identifiers for grid cells based on region and coordinates
- find_optimal_K(resource_type, data_for_clustering, region, wcss_tolerance, max_k)
Determine optimal number of clusters using elbow method and WCSS tolerance
- pre_process_cluster_mapping(cells_scored, vis_directory, wcss_tolerance, resource_type)
Preprocess data and determine optimal cluster numbers for each region
- cells_to_cluster_mapping(cells_scored, vis_directory, wcss_tolerance, resource_type, sort_columns)
Map grid cells to clusters based on similarity metrics and optimal cluster numbers
- create_cells_Union_in_clusters(cluster_map_gdf, region_optimal_k_df, resource_type)
Create unified cluster geometries by dissolving individual cell boundaries
- clip_cluster_boundaries_upto_regions(cell_cluster_gdf, gadm_regions_gdf, resource_type)
Clip cluster boundaries to precise regional administrative boundaries
Clustering Methodology#
The clustering approach follows a multi-step process:
Data Preparation: Grid cells with calculated LCOE and capacity metrics
Optimal K Determination: Uses elbow method with Within-Cluster Sum of Squares (WCSS)
Regional Clustering: Performs K-means clustering separately for each region
Spatial Aggregation: Creates unified cluster geometries through spatial union
Boundary Refinement: Clips results to precise administrative boundaries
The LCOE-based clustering ensures that cells with similar techno-economic characteristics are grouped together, creating representative clusters suitable for energy system optimization while maintaining spatial coherence.
Algorithm Details#
K-means Clustering: Uses scikit-learn implementation with multiple initializations
Elbow Method: Automatically determines optimal cluster count based on WCSS tolerance
Missing Data Handling: Imputes missing values using mean strategy
Spatial Preservation: Maintains geographic relationships through geometry operations
Regional Processing: Handles each administrative region independently
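The optimal-k logic can be sketched as follows, assuming a simple stopping rule (WCSS falling below wcss_tolerance times the k=1 WCSS); the library's exact criterion may differ:
>>> import numpy as np
>>> from sklearn.cluster import KMeans
>>> rng = np.random.default_rng(0)
>>> data = rng.random((200, 2))                  # e.g. scaled [lcoe, potential_capacity]
>>> wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(data).inertia_
...         for k in range(1, 11)]
>>> wcss_tolerance = 0.15
>>> optimal_k = next(k for k, w in enumerate(wcss, start=1)
...                  if w <= wcss_tolerance * wcss[0])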
Usage Examples#
Basic clustering workflow:
>>> import pandas as pd
>>> import geopandas as gpd
>>> from RES.cluster import cells_to_cluster_mapping, create_cells_Union_in_clusters
>>>
>>> # Perform clustering analysis
>>> cluster_map_gdf, optimal_k_df = cells_to_cluster_mapping(
>>> cells_scored=scored_cells,
>>> vis_directory="vis/BC",
>>> wcss_tolerance=0.15,
>>> resource_type="solar",
>>> sort_columns=["lcoe_solar"]
>>> )
>>>
>>> # Create unified cluster geometries
>>> clusters_gdf, cluster_indices = create_cells_Union_in_clusters(
>>> cluster_map_gdf=cluster_map_gdf,
>>> region_optimal_k_df=optimal_k_df,
>>> resource_type="solar"
>>> )
Cell identification:
>>> # Generate unique cell identifiers
>>> cells_with_ids = assign_cluster_id(
>>> cells=grid_cells,
>>> source_column="Province",
>>> index_name="cell_id"
>>> )
Input Data Requirements#
The clustering functions expect GeoDataFrames with specific columns:
Required Columns:
- 'x', 'y': Grid cell centroid coordinates
- sub_national_unit_tag: Administrative region classification
- 'lcoe_{resource_type}': Levelized cost of electricity
- 'potential_capacity_{resource_type}': Maximum potential capacity
- 'geometry': Spatial geometry (Polygon or Point)
Optional Columns:
- 'capex_{resource_type}': Capital expenditure costs
- 'fom_{resource_type}': Fixed operation and maintenance costs
- 'vom_{resource_type}': Variable operation and maintenance costs
- '{resource_type}_CF_mean': Average capacity factor
- 'nearest_station': Nearest grid connection point
- 'nearest_station_distance_km': Distance to grid connection
Output Data Structure#
Clustering results include:
Cluster Map GeoDataFrame:
- Individual cells with assigned cluster numbers
- Original cell attributes preserved
- Cluster_No: Integer cluster identifier
- Optimal_k: Optimal number of clusters for region
Unified Clusters GeoDataFrame:
- Dissolved cluster geometries
- Aggregated techno-economic parameters
- Representative cluster characteristics
- Spatial extent covering all member cells
Cluster Indices Dictionary:
- Mapping of original cell indices to clusters
- Structure: {region: {cluster_no: [cell_indices]}}
- Enables traceability from clusters back to individual cells
Visualization Outputs#
The module generates several visualization products:
Elbow Plots:
- WCSS vs. number of clusters for each region
- Optimal cluster number identification
- Saved to vis_directory/Regional_cluster_Elbow_Plots/
Performance Considerations#
Memory usage scales with number of grid cells and clusters
Processing time increases with higher max_k values
Imputation handles missing data but may affect clustering quality
Large regions may benefit from hierarchical clustering approaches
Dependencies#
pandas: Data manipulation and analysis
geopandas: Spatial data operations
numpy: Numerical computations
matplotlib.pyplot: Visualization
sklearn.cluster.KMeans: K-means clustering algorithm
sklearn.impute.SimpleImputer: Missing value imputation
pathlib: File path operations
logging: Progress and error reporting
RES.utility: Custom utility functions for spatial operations
Notes
Clustering is performed separately for each administrative region
WCSS tolerance controls the trade-off between cluster number and representation
Missing or infinite values are automatically handled through imputation
Cluster ranking is based on ascending LCOE values (lowest cost first)
Spatial relationships are preserved through geometry operations
Results are suitable for energy system optimization models
- RES.cluster.assign_cluster_id(cells: GeoDataFrame, source_column: str = None, index_name: str = 'cell') GeoDataFrame [source]
Generate unique identifiers for grid cells based on region and coordinates.
Creates standardized cell identifiers that combine regional information with spatial coordinates to ensure uniqueness across the entire assessment domain. These identifiers serve as primary keys for data linking and result tracking throughout the assessment workflow.
- Parameters:
cells (gpd.GeoDataFrame) -- Input GeoDataFrame containing spatial data with 'x', 'y' coordinates and regional classification information
source_column (str, default None) -- Column name containing regional classification (e.g., province, state)
index_name (str, default 'cell') -- Name for the new unique identifier column
- Returns:
GeoDataFrame with new unique cell identifier column set as index
- Return type:
gpd.GeoDataFrame
Examples
Basic cell ID assignment:
>>> cells_with_ids = assign_cluster_id(
...     cells=grid_cells,
...     source_column='Province',
...     index_name='cell_id'
... )
>>> print(cells_with_ids.index.name)  # 'cell_id'
Custom identifier format:
>>> # Creates IDs like: "BC_-123.5_49.2"
>>> cells = assign_cluster_id(cells, 'Province', 'unique_cell')
- Raises:
ValueError -- If source_column doesn't exist in the GeoDataFrame
ValueError -- If required coordinate columns 'x', 'y' are missing
Notes
Removes spaces from region names for consistent formatting
ID format: "{region}_{x_coord}_{y_coord}"
Coordinates maintain original decimal precision
Sets generated IDs as DataFrame index for efficient lookups
Essential for linking spatial analysis results across workflow steps
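A sketch of the ID construction noted above (illustrative pandas code, not the library's internals):
>>> import pandas as pd
>>> cells = pd.DataFrame({"Province": ["BC"], "x": [-123.5], "y": [49.2]})
>>> region = cells["Province"].str.replace(" ", "")  # drop spaces from region names
>>> cells["cell"] = region + "_" + cells["x"].astype(str) + "_" + cells["y"].astype(str)
>>> cells["cell"].iloc[0]
'BC_-123.5_49.2'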
- RES.cluster.cells_to_cluster_mapping(cells_scored: DataFrame, vis_directory: str, wcss_tolerance: float, sub_national_unit_tag: str, resource_type: str, sort_columns: list) tuple[DataFrame, DataFrame] [source]
Map grid cells to clusters based on similarity metrics and optimal cluster numbers.
Performs spatial clustering of renewable energy grid cells by grouping cells with similar techno-economic characteristics (primarily LCOE) into representative clusters. The function implements a systematic approach to divide each region's cells into the optimal number of clusters determined through elbow method analysis.
This is the main clustering workflow function that transforms individual grid cells into clustered representations suitable for energy system optimization models, reducing computational complexity while preserving spatial and economic relationships.
- Parameters:
cells_scored (pd.DataFrame) -- Scored grid cells with techno-economic attributes. Must contain LCOE, capacity, and regional classification data.
vis_directory (str) -- Directory path for saving clustering visualization outputs. Used for elbow plots and clustering analysis results.
wcss_tolerance (float) -- Within-Cluster Sum of Squares tolerance (0.0 to 1.0). Controls the cluster granularity vs. computational efficiency trade-off.
sub_national_unit_tag (str) -- Column name identifying the administrative region of each cell (e.g., Region or Municipality).
resource_type (str) -- Renewable energy resource type ('solar', 'wind', 'bess'). Determines which columns to use for clustering analysis.
sort_columns (list) -- Column names for sorting cells before cluster assignment. Typically includes LCOE or other ranking metrics.
- Returns:
cells_cluster_map_df: Individual cells with assigned cluster numbers
optimal_k_df: Summary of optimal cluster counts by region
- Return type:
tuple[pd.DataFrame, pd.DataFrame]
Examples
Perform clustering for wind resources:
>>> cluster_map, optimal_k = cells_to_cluster_mapping(
...     cells_scored=wind_cells_scored,
...     vis_directory="vis/Alberta",
...     wcss_tolerance=0.20,
...     resource_type="wind",
...     sort_columns=["lcoe_wind", "potential_capacity_wind"]
... )
>>> print(f"Created {cluster_map['Cluster_No'].max()} clusters across regions")
Clustering Methodology#
The clustering approach follows several key principles:
Regional Separation: Clustering is performed independently for each administrative region to maintain spatial coherence and respect political boundaries that affect renewable energy development.
LCOE-Based Similarity: Cells are grouped based on Levelized Cost of Electricity (LCOE) as the primary similarity metric, ensuring clusters represent similar economic viability.
Sorted Assignment: Within each region, cells are sorted by specified metrics (typically LCOE) before being assigned to clusters, ensuring that the best cells are distributed across clusters.
Equal Distribution: Cells are divided as evenly as possible across the optimal number of clusters for each region, preventing cluster size imbalances.
Algorithm Workflow#
Preprocessing: Call pre_process_cluster_mapping to determine optimal k
Region Filtering: Focus on regions with valid optimal cluster numbers
Cell Sorting: Sort cells within each region by specified criteria
Cluster Assignment: Divide sorted cells into optimal number of groups
Remainder Handling: Merge small remainder groups into larger clusters
Numbering: Assign sequential cluster numbers within each region
Cluster Assignment Strategy#
For each region with n cells and k optimal clusters:
- Calculate step_size = n ÷ k
- Assign cells [0:step_size] to cluster 1
- Assign cells [step_size:2*step_size] to cluster 2
- Continue until all cells are assigned
- Merge any remainder cells into the last cluster
This ensures balanced cluster sizes while maintaining economic similarity through the pre-sorting step.
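A minimal sketch of this assignment logic (illustrative only; variable names are assumptions):
>>> import numpy as np
>>> n, k = 10, 3                       # cells in region, optimal clusters
>>> step_size = n // k                 # integer division: cells per cluster
>>> cluster_no = np.arange(n) // step_size + 1
>>> cluster_no[cluster_no > k] = k     # fold remainder cells into the last cluster
>>> cluster_no
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 3])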
Output Data Structure#
cells_cluster_map_df contains:
- All original cell attributes (LCOE, capacity, coordinates, etc.)
- 'Cluster_No': Integer cluster identifier within region
- 'Optimal_k': Total number of clusters for the cell's region
- 'cell': Unique cell identifier (as index)
optimal_k_df contains:
- sub_national_unit_tag: Administrative region unit (e.g., Region or Municipality)
- 'Optimal_k': Optimal number of clusters determined for region
Performance Considerations#
Memory usage scales linearly with number of cells
Processing time increases with number of regions and complexity
Sorting operations may be memory-intensive for large datasets
Cluster assignment is efficient O(n) operation per region
Quality Assurance#
Validates that all cells receive cluster assignments
Ensures cluster numbers are sequential within regions
Maintains data integrity through concatenation operations
Preserves spatial relationships through regional processing
Notes
Clustering preserves regional boundaries for political/administrative coherence
LCOE-based sorting ensures economic similarity within clusters
Balanced cluster sizes improve downstream optimization performance
Results are suitable for capacity expansion and dispatch optimization models
Cluster numbering resets for each region (regional scope)
- raises ValueError:
If required columns are missing or data validation fails
- raises KeyError:
If region names don't match between datasets
- raises RuntimeError:
If clustering assignment produces invalid results
See also
pre_process_cluster_mapping
Preprocessing and optimal k determination
create_cells_Union_in_clusters
Spatial union of clustered cells
find_optimal_K
Core optimal cluster number determination
- RES.cluster.clip_cluster_boundaries_upto_regions(cell_cluster_gdf: GeoDataFrame, gadm_regions_gdf: GeoDataFrame, resource_type) GeoDataFrame [source]
Clip cluster boundaries to precise regional administrative boundaries.
Refines cluster geometries by clipping them to exact administrative boundaries, ensuring that cluster extents respect political and administrative divisions. This final processing step removes any geometric artifacts from the clustering process and aligns results with official regional boundaries.
The function performs spatial clipping operations to trim cluster polygons to the precise extent of administrative regions, maintaining data integrity while ensuring geographic accuracy for policy and planning applications.
- Parameters:
cell_cluster_gdf (gpd.GeoDataFrame) -- Unified cluster geometries from create_cells_Union_in_clusters. Contains cluster polygons that may extend beyond regional boundaries.
gadm_regions_gdf (gpd.GeoDataFrame) -- Official administrative boundary geometries from the GADM dataset. Defines precise regional extents for clipping operations.
resource_type (str) -- Resource type identifier ('solar', 'wind', 'bess'). Used for column identification and sorting operations.
- Returns:
Clipped cluster geometries with boundaries precisely aligned to administrative regions, sorted by LCOE in ascending order
- Return type:
gpd.GeoDataFrame
Examples
Clip wind clusters to provincial boundaries:
>>> clipped_clusters = clip_cluster_boundaries_upto_regions(
...     cell_cluster_gdf=unified_clusters,
...     gadm_regions_gdf=provincial_boundaries,
...     resource_type="wind"
... )
>>> print(f"Clipped {len(clipped_clusters)} clusters to regional boundaries")
Clipping Operations#
Spatial Intersection: Clips cluster geometries using administrative boundaries
Topology Preservation: Maintains valid polygon geometry after clipping
Attribute Retention: Preserves all cluster attributes through clipping
Multi-geometry Handling: Manages potential multi-polygon results
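A sketch of the core operation (GeoDataFrame.clip is the geopandas call referenced under See also; the LCOE column name is an assumption):
>>> clipped = cell_cluster_gdf.clip(gadm_regions_gdf)            # trim to admin boundaries
>>> clipped = clipped.sort_values("lcoe_wind", ascending=True)   # lowest LCOE first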
Boundary Alignment Benefits#
Policy Compliance: Ensures clusters respect administrative jurisdictions
Planning Accuracy: Aligns with regional energy planning boundaries
Data Integrity: Removes geometric inconsistencies from processing
Visualization Quality: Improves map accuracy for stakeholder communication
Geometric Considerations#
Handles edge cases where clusters span multiple regions
Preserves cluster identity even after boundary clipping
Maintains geometric validity through robust clipping algorithms
May create multi-polygon geometries for clusters crossing boundaries
Sorting and Organization#
Results are sorted by LCOE in ascending order to facilitate:
- Economic dispatch optimization
- Merit order analysis
- Least-cost development planning
- Investment prioritization
Quality Assurance#
Validates geometric integrity after clipping operations
Ensures all clusters remain within administrative boundaries
Maintains attribute consistency through spatial operations
Preserves cluster ranking and identification
Performance Notes#
Clipping operations scale with geometric complexity
Large regions or detailed boundaries increase processing time
Memory usage depends on cluster and boundary detail level
Results are optimized for downstream energy modeling applications
Use Cases#
Regulatory Compliance: Ensuring development respects jurisdictions
Policy Analysis: Aligning renewable development with administrative units
Planning Integration: Connecting energy models with regional planning
Stakeholder Communication: Accurate maps for decision-maker engagement
Notes
Final step in the clustering workflow before energy system modeling
Essential for maintaining political and administrative coherence
Improves visual quality of cluster maps and analysis results
Ensures compatibility with regional energy planning frameworks
Results are ready for capacity expansion and dispatch optimization
- raises GeometryError:
If clipping operations produce invalid geometries
- raises ValueError:
If input datasets have incompatible coordinate systems
- raises AttributeError:
If required columns are missing from input data
See also
create_cells_Union_in_clusters
Preceding cluster creation function
gpd.GeoDataFrame.clip
Core spatial clipping operation
RES.boundaries.GADMBoundaries
Administrative boundary data source
- RES.cluster.create_cells_Union_in_clusters(cluster_map_gdf: GeoDataFrame, region_optimal_k_df: DataFrame, sub_national_unit_tag: str, resource_type: str) tuple[DataFrame, dict] [source]
Create unified cluster geometries by dissolving individual cell boundaries.
Transforms individual grid cells assigned to clusters into unified cluster geometries through spatial union operations. This process aggregates both geometric boundaries and techno-economic attributes to create representative cluster entities suitable for energy system optimization models.
The function performs spatial dissolve operations grouped by cluster number within each region, creating cohesive cluster polygons while maintaining traceability back to original cells through detailed index mapping.
- Parameters:
cluster_map_gdf (gpd.GeoDataFrame) -- Grid cells with cluster assignments from cells_to_cluster_mapping. Must contain the defined sub_national_unit_tag, 'Cluster_No', and geometric attributes.
region_optimal_k_df (pd.DataFrame) -- Summary of optimal cluster numbers by region. Contains the defined sub_national_unit_tag and 'Optimal_k' columns.
sub_national_unit_tag (str) -- Column name identifying the administrative region of each cell (e.g., Region or Municipality).
resource_type (str) -- Resource type identifier ('solar', 'wind', 'bess'). Used for column naming and aggregation rules.
- Returns:
dissolved_gdf: Unified cluster geometries with aggregated attributes
dissolved_indices: Mapping of cluster to original cell indices
- Return type:
tuple[pd.DataFrame, dict]
Examples
Create unified solar clusters:
>>> clusters_gdf, cell_mapping = create_cells_Union_in_clusters(
...     cluster_map_gdf=mapped_cells,
...     region_optimal_k_df=optimal_k_summary,
...     sub_national_unit_tag="Region",
...     resource_type="solar"
... )
>>> print(f"Created {len(clusters_gdf)} unified clusters")
>>> print(f"Cluster 1 contains {len(cell_mapping['BC'][1])} original cells")
Aggregation Strategy#
Different attributes are aggregated using specific strategies:
Economic Metrics:
LCOE: Median value (representative of cluster economics)
CAPEX, FOM, VOM: First value (uniform within region/technology)
Performance Metrics:
Capacity Factor: Mean value (average performance)
Potential Capacity: Sum (total cluster capacity)
Infrastructure Metrics:
Nearest Station: First value (primary connection point)
Distance to Grid: First value (representative distance)
Classification:
Region, Cluster_No: First value (preserved identity)
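As an illustration only, this strategy maps naturally onto geopandas' dissolve with a per-column aggregation dict; the column names below are simplified stand-ins (actual columns carry resource-type suffixes), and the input file is hypothetical:

import geopandas as gpd

cells_gdf = gpd.read_file("cells_with_clusters.geojson")  # assumed input

# Aggregation rules mirroring the documented strategy (illustrative names).
agg_rules = {
    "lcoe": "median",             # representative cluster economics
    "capex": "first",             # uniform within region/technology
    "CF_mean": "mean",            # average performance
    "potential_capacity": "sum",  # total cluster capacity
    "nearest_station": "first",   # primary connection point
}

# One unified geometry per (region, cluster), attributes combined per rule.
dissolved = cells_gdf.dissolve(by=["Region", "Cluster_No"], aggfunc=agg_rules)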
Geometric Operations#
Spatial Dissolve: Union of cell geometries within each cluster
Topology Preservation: Maintains valid polygon geometry
Attribute Aggregation: Combines cell attributes per aggregation rules
Index Tracking: Records original cell indices for each cluster
Output Structure#
dissolved_gdf contains unified clusters with:
'cluster_id': Unique cluster identifier (as index)
sub_national_unit_tag: Administrative region unit (e.g., Region or Municipality)
'Cluster_No': Sequential cluster number within region
'Rank': Cluster ranking based on LCOE (ascending)
Economic attributes: Aggregated costs and performance metrics
'geometry': Unified cluster polygon geometry
dissolved_indices structure:
```
{
    'region_name': {
        cluster_no: [list_of_original_cell_indices],
        ...
    }
}
```
Processing Workflow#
Region Iteration: Process each region independently
Cluster Grouping: Group cells by cluster number within region
Index Recording: Store original cell indices before dissolving
Spatial Dissolve: Union geometries and aggregate attributes
Result Compilation: Concatenate all dissolved clusters
ID Assignment: Generate unique cluster identifiers
Ranking: Sort and rank clusters by economic metrics
Column Cleanup: Standardize column names for downstream use
Traceability Features#
The dissolved_indices dictionary enables (see the sketch below):
Mapping clusters back to constituent cells
Detailed analysis of cluster composition
Validation of aggregation results
Disaggregation for detailed reporting
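A short sketch of how this traceability can be used, reusing the names from the example above (cell_mapping is the dissolved_indices dict, mapped_cells the original cells):

# Disaggregate clusters back to their constituent cells.
for region, clusters in cell_mapping.items():
    for cluster_no, cell_indices in clusters.items():
        member_cells = mapped_cells.loc[cell_indices]
        print(region, cluster_no, len(member_cells), "cells")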
Quality Assurance#
Validates that all cells are included in clusters
Ensures geometric validity after spatial operations
Maintains attribute consistency through aggregation
Preserves regional and cluster identity information
Performance Considerations#
Memory usage scales with cluster complexity and number
Spatial operations may be computationally intensive
Large clusters with many cells require more processing time
Geometric simplification may be beneficial for very detailed cells
Notes
Cluster ranking facilitates economic dispatch optimization
Column name standardization removes resource type suffixes
Median LCOE provides robust cluster economic representation
Spatial union preserves geographic relationships
Results are optimized for energy system modeling workflows
- raises ValueError:
If cluster assignments are invalid or missing
- raises GeometryError:
If spatial dissolve operations fail
- raises KeyError:
If required columns are missing from input data
See also
cells_to_cluster_mapping
Preceding cluster assignment function
clip_cluster_boundaries_upto_regions
Boundary refinement function
gpd.GeoDataFrame.dissolve
Core spatial dissolve operation
- RES.cluster.find_optimal_K(resource_type: str, data_for_clustering: DataFrame, region: str, wcss_tolerance: float, max_k: int) DataFrame [source]
Determine optimal number of clusters using elbow method and WCSS tolerance.
Analyzes grid cells with renewable energy characteristics to find the optimal number of K-means clusters using the elbow method. The Within-Cluster Sum of Squares (WCSS) tolerance parameter controls the trade-off between cluster representation accuracy and computational complexity.
The function iteratively tests different cluster numbers (k) and calculates WCSS for each configuration. The optimal k is determined when WCSS falls below the specified tolerance threshold, indicating diminishing returns for additional clusters.
- Parameters:
resource_type (str) -- Type of renewable energy resource ('solar', 'wind', 'bess'). Used for labeling and file naming.
data_for_clustering (pd.DataFrame) -- Preprocessed data containing clustering features (LCOE, capacity). Must have no missing or infinite values.
region (str) -- Name of the administrative region being processed. Used for plot titles and output messages.
wcss_tolerance (float) -- Tolerance threshold as a fraction of total WCSS (0.0 to 1.0). Lower values yield more clusters; higher values yield fewer.
max_k (int) -- Maximum number of clusters to test. Limited by data size and computational constraints.
- Returns:
Optimal number of clusters for the region. Returns None if no optimal k is found within tolerance.
- Return type:
int or None
Examples
Find optimal clusters for solar data:
>>> optimal_k = find_optimal_K(
...     resource_type="solar",
...     data_for_clustering=clean_data,
...     region="British Columbia",
...     wcss_tolerance=0.15,
...     max_k=20
... )
>>> print(f"Optimal clusters: {optimal_k}")
Notes
WCSS measures squared distances from cluster centroids
Higher WCSS tolerance leads to fewer, more aggregated clusters
Lower WCSS tolerance leads to more, finer-grained clusters
Elbow plots are automatically generated and displayed
Function uses K-means with 10 random initializations for stability
Processing time increases quadratically with max_k
Algorithm Details#
Test k from 1 to min(max_k, data_size)
Calculate WCSS (inertia) for each k using K-means
Compute tolerance threshold as fraction of total WCSS
Find first k where WCSS ≤ tolerance threshold
Generate elbow plot with optimal k marked
The WCSS measures the sum of squared distances between each data point and its assigned cluster centroid. Lower WCSS indicates tighter, more homogeneous clusters but may lead to over-segmentation.
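A compact sketch of this elbow rule with scikit-learn's KMeans. It follows the description above; "total WCSS" is taken here as the WCSS at k=1, which is an assumption rather than a confirmed implementation detail:

import numpy as np
from sklearn.cluster import KMeans

def sketch_optimal_k(X: np.ndarray, wcss_tolerance: float, max_k: int):
    # WCSS (inertia): sum of squared distances of points to their centroids.
    ks = range(1, min(max_k, len(X)) + 1)
    wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]
    threshold = wcss_tolerance * wcss[0]  # "total WCSS" assumed to be WCSS at k=1
    for k, w in zip(ks, wcss):
        if w <= threshold:
            return k  # first k within tolerance
    return None  # no k met the tolerance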
- raises ValueError:
If data_for_clustering is empty or contains only NaN values
- raises RuntimeError:
If K-means clustering fails for any k value
See also
sklearn.cluster.KMeans
K-means clustering implementation
pre_process_cluster_mapping
Preprocessing function that calls this method
- RES.cluster.pre_process_cluster_mapping(cells_scored: DataFrame, vis_directory: str, wcss_tolerance: float, sub_national_unit_tag: str, resource_type: str) tuple[DataFrame, DataFrame] [source]
Preprocess data and determine optimal cluster numbers for each region.
Performs comprehensive preprocessing of scored grid cells to prepare them for K-means clustering analysis. The function handles missing data, determines optimal cluster numbers for each administrative region, and generates visualization outputs for clustering analysis.
This function serves as the preprocessing pipeline that prepares raw scored cell data for the main clustering workflow, ensuring data quality and generating region-specific clustering parameters.
- Parameters:
cells_scored (pd.DataFrame) -- GeoDataFrame containing scored grid cells with LCOE and capacity data. Must include columns: 'Region', 'lcoe_{resource_type}', 'potential_capacity_{resource_type}'.
vis_directory (str) -- Base directory path for saving visualization outputs. Elbow plots are saved in the subdirectory 'Regional_cluster_Elbow_Plots'.
wcss_tolerance (float) -- WCSS tolerance threshold for optimal cluster determination (0.0 to 1.0). Controls the trade-off between cluster number and representation accuracy.
sub_national_unit_tag (str) -- Column name identifying the administrative sub-national unit (e.g., 'Region' or 'Municipality').
resource_type (str) -- Resource type identifier ('solar', 'wind', 'bess'). Used for column name construction and labeling.
- Returns:
cells_scored_cluster_mapped: Enhanced cell data with optimal k values and cell IDs
region_optimal_k_df: Summary of optimal cluster numbers by region
- Return type:
tuple[pd.DataFrame, pd.DataFrame]
Examples
Preprocess solar cell data:
>>> cells_mapped, optimal_k_summary = pre_process_cluster_mapping(
...     cells_scored=scored_solar_cells,
...     vis_directory="vis/BC",
...     wcss_tolerance=0.15,
...     sub_national_unit_tag="Region",
...     resource_type="solar"
... )
>>> print(f"Processed {len(cells_mapped)} cells across {len(optimal_k_summary)} regions")
Processing Workflow#
Region Iteration: Process each unique administrative region separately
Data Validation: Check for required columns and sufficient data
Data Cleaning: Handle infinite values and missing data through imputation
Optimal K Finding: Apply elbow method to determine cluster numbers
Visualization: Generate and save elbow plots for each region
Data Integration: Merge optimal k values back to cell data
ID Assignment: Generate unique cell identifiers for data linking
Data Quality Handling#
Missing Columns: Regions without required columns are skipped
Infinite Values: Replaced with NaN for proper imputation
Empty Data: Regions with insufficient data are excluded
Imputation: Uses mean strategy for missing value replacement
Zero Clusters: Regions with optimal_k=0 are filtered out
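A minimal sketch of the documented cleaning steps, assuming region_cells is one region's slice of the scored cells and using the solar column names from this page:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# region_cells: assumed per-region slice of cells_scored.
features = region_cells[["lcoe_solar", "potential_capacity_solar"]].copy()

# Replace infinities with NaN, then impute missing values with the column mean
# (the documented strategy).
features = features.replace([np.inf, -np.inf], np.nan)
imputed = pd.DataFrame(
    SimpleImputer(strategy="mean").fit_transform(features),
    columns=features.columns,
    index=features.index,
)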
Output Structure#
cells_scored_cluster_mapped contains:
All original cell attributes
'Optimal_k': Optimal cluster number for the cell's region
'cell': Unique cell identifier (set as index)
region_optimal_k_df contains:
'Region': Administrative region name
'Optimal_k': Optimal number of clusters for the region
Visualization Outputs#
Generates elbow plots saved to: {vis_directory}/Regional_cluster_Elbow_Plots/elbow_plot_region_{region}.png
Each plot shows:
WCSS vs. number of clusters
Optimal k marked with a vertical line
Region-specific title and labels
Notes
Processing is performed region-by-region for spatial coherence
Imputation strategy can affect clustering quality
Visualization directory is created if it doesn't exist
Regions with insufficient data (< 2 cells) may be skipped
Memory usage scales with number of regions and cells per region
- raises ValueError:
If vis_directory path is invalid or cannot be created
- raises KeyError:
If required columns are missing from cells_scored
- raises RuntimeError:
If imputation or clustering fails for critical regions
See also
find_optimal_K
Core optimal cluster determination function
assign_cluster_id
Cell identifier generation function
cells_to_cluster_mapping
Main clustering workflow function
Note
If the above documentation doesn't render properly, this module provides clustering algorithms for renewable energy resource grouping and analysis.
Key functions from RES.cluster:
assign_cluster_id(): Generate unique cell identifiers
determine_elbow_optimal_clusters(): Automatic cluster number optimization
cluster_sites(): K-means clustering with economic weighting
get_representative_timeseries(): Cluster-representative time series generation
Visualization Tools#
Comprehensive plotting and mapping utilities.
Visualization and plotting utilities for renewable energy resource assessment.
This module provides comprehensive visualization tools for displaying renewable energy assessment results including spatial maps, time series plots, capacity distributions, economic analysis charts, and interactive dashboards. It supports both static publication-quality figures and interactive web-based visualizations.
The visualization tools are designed to facilitate analysis interpretation, result communication, and workflow debugging through clear, informative graphics that highlight spatial patterns, temporal variations, and economic trade-offs in renewable energy development potential.
- Key Functions:
Spatial mapping: Choropleth maps of resource potential and constraints
Time series visualization: Capacity factor profiles and seasonal patterns
Economic analysis: LCOE distributions and cost component breakdowns
Cluster visualization: Site groupings and representative characteristics
Interactive dashboards: Web-based exploration interfaces
Export utilities: High-resolution figure generation for publications
- Dependencies:
matplotlib/seaborn: Static plotting and publication graphics
plotly: Interactive visualizations and dashboards
folium: Web-based interactive maps
geopandas: Spatial data visualization
xarray: Multi-dimensional data plotting
- RES.visuals.add_compass_arrow(ax, x: float = 0.9, y: float = 0.9, fontsize: float = 9, color: str = 'grey', length: float = 0.05, text_offset: float = 0.01, arrow_head_width: float = 6, arrow_width=1.5)[source]
Adds a simple north arrow to the plot.
- Parameters:
ax (matplotlib.axes.Axes) -- The plot axes to annotate.
x (float) -- X position in axes fraction coordinates.
y (float) -- Y position in axes fraction coordinates.
length (float) -- Length of the arrow in axes fraction units.
text_offset (float) -- Offset for the 'N' label below the arrow.
- RES.visuals.add_compass_arrow_custom(ax, x: float = 0.9, y: float = 0.9, fontsize: float = 9, color: str = 'grey', length: float = 0.01, text_offset: float = 0.01, arrow_head_width: float = 6, arrow_border_width: float = 0.5, text: str = 'N')[source]
Alternative version with more arrow head customization. Uses the older arrow method for more control over head dimensions.
- RES.visuals.add_compass_to_plot(ax, x_offset=0.76, y_offset=0.92, size=14, triangle_size=0.02)[source]
Adds a simple upward-pointing triangle with an 'N' label below it as a North indicator.
- Parameters:
ax (matplotlib.axes.Axes) -- The plot axes to annotate.
x_offset (float) -- X position in axes fraction coordinates.
y_offset (float) -- Y position in axes fraction coordinates.
size (int) -- Font size for the 'N' label.
triangle_size (float) -- Radius of the triangle (in axes fraction units).
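A tiny usage sketch for this compass helper (argument values taken from the defaults in the signature above; the output file name is illustrative):

import matplotlib.pyplot as plt
from RES.visuals import add_compass_to_plot

fig, ax = plt.subplots()
add_compass_to_plot(ax, x_offset=0.76, y_offset=0.92, size=14)  # north indicator
fig.savefig("map_with_compass.png")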
- RES.visuals.create_key_data_map_interactive(province_gadm_regions_gdf: GeoDataFrame, provincial_conservation_protected_lands: GeoDataFrame, aeroway_with_buffer_solar: GeoDataFrame, aeroway_with_buffer_wind: GeoDataFrame, aeroway: GeoDataFrame, provincial_bus_gdf: GeoDataFrame, current_region: dict, about_OSM_data: dict[dict], map_html_save_to: str)[source]
Creates an interactive map with key data for a specific province, including regions, conservation lands, aeroways, and bus nodes.
- Parameters:
province_gadm_regions_gdf (gpd.GeoDataFrame) -- GeoDataFrame containing the province's administrative regions.
provincial_conservation_protected_lands (gpd.GeoDataFrame) -- GeoDataFrame containing conservation and protected lands.
aeroway_with_buffer_solar (gpd.GeoDataFrame) -- GeoDataFrame containing solar aeroways with buffer zones.
aeroway_with_buffer_wind (gpd.GeoDataFrame) -- GeoDataFrame containing wind aeroways with buffer zones.
aeroway (gpd.GeoDataFrame) -- GeoDataFrame containing aeroways.
provincial_bus_gdf (gpd.GeoDataFrame) -- GeoDataFrame containing provincial electrical bus nodes.
current_region (dict) -- Dictionary containing information about the current region.
about_OSM_data (dict[dict]) -- Dictionary containing information about OSM data.
map_html_save_to (str) -- File path for saving the interactive map as HTML.
- RES.visuals.create_raster_image_with_legend(raster: str, cmap: str, title: str, plot_save_to: str)[source]
Creates a raster image with a legend for land classes.
- RES.visuals.create_sites_ts_plots_all_sites(resource_type: str, CF_ts_df: DataFrame, save_to_dir: str)[source]
Creates an interactive timeseries plot for the top sites of a given resource type.
- Parameters:
resource_type (str) -- The type of resource (e.g., 'solar', 'wind').
CF_ts_df (pd.DataFrame) -- DataFrame containing the capacity factor timeseries data.
save_to_dir (str) -- Directory to save the plot.
- RES.visuals.create_sites_ts_plots_all_sites_2(resource_type: str, CF_ts_df: DataFrame, save_to_dir: str)[source]
- RES.visuals.create_timeseries_interactive_plots(ts_df: DataFrame, save_to_dir: str)[source]
- RES.visuals.create_timeseries_plots(cells_df, CF_timeseries_df, max_resource_capacity, dissolved_indices, resampling, representative_color_palette, std_deviation_gradient, vis_directory)[source]
- RES.visuals.create_timeseries_plots_solar(cells_df, CF_timeseries_df, dissolved_indices, max_solar_capacity, resampling, solar_vis_directory)[source]
Generates time series plots for solar capacity factor (CF) data.
- Parameters:
cells_df (pd.DataFrame) -- DataFrame containing cell information.
CF_timeseries_df (pd.DataFrame) -- DataFrame containing capacity factor time series data.
dissolved_indices (dict) -- Dictionary mapping regions and cluster numbers to indices in CF_timeseries_df.
max_solar_capacity (float) -- Maximum solar capacity for investment.
resampling (str) -- Resampling frequency for the time series data.
solar_vis_directory (str) -- Directory to save the generated plots.
- RES.visuals.get_CF_wind_check_plot(cells: GeoDataFrame, gwa_raster_data: DataArray, boundary: GeoDataFrame, region_code: str, region_name: str, columns: list, figure_height: int = 7, font_family: str = 'sans-serif', save_to: str | Path = None)[source]
Plots GWA benchmark (left), CF_IEC3 (middle), and wind_CF_mean (right).
- RES.visuals.get_conservation_lands_plot(CPCAD_actual: GeoDataFrame, CPCAD_with_buffer: GeoDataFrame, save_to: Path | str, font_family: str = 'sans-serif')[source]
Creates a plot comparing original and buffered conservation lands.
- RES.visuals.get_data_in_map_plot(cells, resource_type: str = None, datafield: str = None, title: str = None, ax=None, compass_size: float = 10, font_family: str = None, discalimers: bool = False, show=True)[source]
Plots a map of renewable energy resources (solar or wind) with capacity factor, potential capacity, or LCOE.
- Parameters:
cells (gpd.GeoDataFrame) -- GeoDataFrame containing the resource data.
resource_type (str, optional) -- Type of renewable resource ('solar' or 'wind'). Defaults to None.
datafield (str, optional) -- Data field to plot ('CF', 'CAPACITY', or 'SCORE'). Defaults to None.
title (str, optional) -- Title for the plot. Defaults to None.
ax (matplotlib.axes.Axes, optional) -- Axes to plot on. If None, a new figure and axes are created. Defaults to None.
compass_size (float, optional) -- Size of the compass in the plot. Defaults to 10.
font_family (str, optional) -- Font family for text in the plot. Defaults to 'sans-serif'.
discalimers (bool, optional) -- Whether to include disclaimers in the plot. Defaults to False.
show (bool, optional) -- Whether to display the plot. Defaults to True.
- Returns:
The axes with the plotted map.
- Return type:
ax (matplotlib.axes.Axes)
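A possible usage sketch (the cells GeoDataFrame is assumed to come from the scoring step earlier in the workflow; the output file name is illustrative):

import matplotlib.pyplot as plt
from RES.visuals import get_data_in_map_plot

# Plot capacity factor for scored solar cells and save the figure.
ax = get_data_in_map_plot(cells, resource_type="solar", datafield="CF",
                          title="Solar capacity factor", show=False)
ax.figure.savefig("solar_cf_map.png", dpi=300)
plt.close(ax.figure)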
- RES.visuals.get_selected_vs_missed_visuals(cells: GeoDataFrame, province_short_code, resource_type, lcoe_threshold: float, CF_threshold: float, capacity_threshold: float, text_box_x=0.4, text_box_y=0.95, title_y=1, title_x=0.6, font_size=10, dpi=1000, figsize=(12, 7), save=False)[source]
Generate visualizations for selected vs missed cells.
- Parameters:
cells (gpd.GeoDataFrame) -- GeoDataFrame containing cell data.
province_short_code (str) -- Short code for the province.
resource_type (str) -- Type of renewable resource (e.g., 'solar', 'wind').
lcoe_threshold (float) -- LCOE threshold used to separate selected from missed cells.
CF_threshold (float) -- Capacity factor threshold used to separate selected from missed cells.
capacity_threshold (float) -- Potential capacity threshold used to separate selected from missed cells.
text_box_x (float, optional) -- X position of the annotation text box in axes fraction coordinates. Defaults to 0.4.
text_box_y (float, optional) -- Y position of the annotation text box in axes fraction coordinates. Defaults to 0.95.
title_y (int, optional) -- Y position of the plot title. Defaults to 1.
title_x (float, optional) -- X position of the plot title. Defaults to 0.6.
font_size (int, optional) -- Font size for plot text. Defaults to 10.
dpi (int, optional) -- Resolution of the saved figure in dots per inch. Defaults to 1000.
figsize (tuple, optional) -- Figure size in inches. Defaults to (12, 7).
save (bool, optional) -- Whether to save the figure to disk. Defaults to False.
- RES.visuals.get_stepwise_availability_plots(excluder: ExclusionContainer, region_shape: GeoDataFrame, raster_configs: list[dict], vector_configs: list[dict], save_to: str | Path)[source]
- RES.visuals.plot_data_in_GADM_regions(dataframe, data_column_df, gadm_regions_gdf, color_map, dpi, plt_title, plt_file_name, vis_directory)[source]
Plots data from a DataFrame on GADM regions using GeoPandas and Matplotlib.
- Parameters:
dataframe (pd.DataFrame) -- DataFrame containing the data to plot.
data_column_df (str) -- Name of the column in the DataFrame to plot.
gadm_regions_gdf (gpd.GeoDataFrame) -- GeoDataFrame containing the GADM regions.
color_map (str) -- Name of the color map to use for the plot.
dpi (int) -- Dots per inch for the plot.
plt_title (str) -- Title of the plot.
plt_file_name (str) -- File name for saving the plot.
vis_directory (str) -- Directory for saving the visualization.
- RES.visuals.plot_gaez_raster_with_boundary(raster_path, legend_csv, gdf_path, dst_crs='EPSG:4326', figsize=(12, 7), compass_length=0.1, font_family='serif', title=None, plot_save_to=None)[source]
Plot a GAEZ categorical raster with a shadowed boundary layer using colors from CSV.
- RES.visuals.plot_grid_lines(region_code: str, region_name: str, lines: GeoDataFrame, boundary: GeoDataFrame, font_family: str = None, figsize: tuple = (10, 8), dpi=500, save_to: str | Path = None, show: bool = True)[source]
Plots transmission lines with binned voltage levels in a specified region.
- RES.visuals.plot_resources_scatter_metric(resource_type: str, clusters_resources: GeoDataFrame, lcoe_threshold: float = 999, color=None, save_to_root: str | Path = 'vis')[source]
Generate a scatter plot visualizing the relationship between Capacity Factor (CF) and Levelized Cost of Energy (LCOE) for renewable energy resources (solar or wind). The plot highlights clusters of resources based on their potential capacity.
- Parameters:
resource_type (str) -- The type of renewable resource to plot. Must be either 'solar' or 'wind'.
clusters_resources (gpd.GeoDataFrame) -- A GeoDataFrame containing resource cluster data. Expected columns include:
'CF_mean': Average capacity factor of the resource cluster.
'lcoe': Levelized Cost of Energy for the resource cluster.
'potential_capacity': Potential capacity of the resource cluster (used for bubble size).
lcoe_threshold (float) -- The maximum LCOE value to include in the plot. Clusters with LCOE above this threshold are excluded.
color (optional) -- Custom color for the scatter plot bubbles. Defaults to 'darkorange' for solar and 'navy' for wind.
save_to_root (str | Path, optional) -- Directory path where the plot image will be saved. Defaults to 'vis'.
- Returns:
The function saves the generated plot as a PNG image in the specified directory.
- Return type:
None
Notes
The size of the bubbles in the scatter plot represents the potential capacity of the resource clusters.
The x-axis (CF_mean) is formatted as percentages for better readability.
A legend is included to indicate the bubble sizes in gigawatts (GW).
The plot includes an annotation explaining the scoring methodology for LCOE.
The plot is saved as a transparent PNG image with a resolution of 600 dpi.
Example
>>> plot_resources_scatter_metric(
...     resource_type='solar',
...     clusters_resources=solar_clusters_gdf,
...     lcoe_threshold=50,
...     save_to_root='output/plots'
... )
- RES.visuals.plot_resources_scatter_metric_combined(solar_clusters: DataFrame, wind_clusters: DataFrame, bubbles_GW: list = [1, 5, 10], bubbles_scale: float = 0.4, lcoe_threshold: float = 200, font_family=None, figsize=(3.5, 2.5), dpi=1000, save_to_root: str = 'vis', set_transparent: bool = False)[source]
Plot combined scatter metrics for solar and wind resources.
- Parameters:
solar_clusters (pd.DataFrame) -- DataFrame containing solar cluster data.
wind_clusters (pd.DataFrame) -- DataFrame containing wind cluster data.
bubbles_GW (list, optional) -- List of bubble sizes in GW. Defaults to [1, 5, 10].
bubbles_scale (float, optional) -- Scaling factor for bubble sizes. Defaults to 0.4.
lcoe_threshold (float, optional) -- LCOE threshold for filtering. Defaults to 200.
font_family (str, optional) -- Font family for the plot. Defaults to 'sans-serif'.
save_to_root (str, optional) -- Directory to save the plot. Defaults to 'vis'.
set_transparent (bool, optional) -- Whether to set the background transparent. Defaults to False.
- RES.visuals.plot_with_matched_cells(ax, cells: GeoDataFrame, filtered_cells: GeoDataFrame, column: str, cmap: str, background_cell_linewidth: float, selected_cells_linewidth: float, font_size: int = 9)[source]
Helper function to plot cells with matched cells overlay.
- RES.visuals.size_for_legend(mw)[source]
Calculate bubble size for capacity-based map legends.
Converts megawatt capacity values to appropriate bubble sizes for proportional symbol maps, ensuring visual clarity and proper scaling across different capacity ranges.
- Parameters:
mw (float) -- Capacity value in megawatts
- Returns:
Scaled bubble size for mapping visualization
- Return type:
float
Examples
>>> size_for_legend(100)  # 100 MW site
50.0
>>> size_for_legend(500)  # 500 MW site
150.0
- RES.visuals.visualize_ss_nodes(substations_gdf, provincem_gadm_regions_gdf: GeoDataFrame, plot_name)[source]
Visualizes transmission nodes (buses) on a map with different colors based on substation types.
- Parameters:
substations_gdf (gpd.GeoDataFrame) -- GeoDataFrame containing substations (buses) with a 'substation_type' column.
provincem_gadm_regions_gdf (gpd.GeoDataFrame) -- GeoDataFrame containing the base regions to plot.
plot_name (str) -- File path to save the plot image.
- Returns:
None
The RES.visuals module provides:
Spatial mapping with choropleth visualization
Time series plotting and seasonal analysis
Economic analysis charts and distributions
Interactive web-based dashboards
Publication-quality figure export
Local Data Store with HDF5 file#
HDF5-based data storage and retrieval.
- class RES.hdf5_handler.DataHandler(hdf_file_path: Path = None, silent_initiation: bool | None = True, show_structure: bool | None = False)[source]
Bases:
object
A class to handle reading and writing data to an HDF5 file. It provides methods to save DataFrames or GeoDataFrames and is useful for managing large datasets efficiently, allowing quick access to and storage of structured data.
- Key Features:
Save DataFrames or GeoDataFrames to an HDF5 file with optional geometry handling.
Load data from the HDF5 file, converting WKT geometries back to GeoDataFrames.
Manage the structure of the HDF5 file, including showing the tree structure and deleting keys.
- Dependencies:
pandas: For DataFrame operations
geopandas: For GeoDataFrame operations
h5py: For HDF5 file handling
shapely: For geometry serialization and deserialization
- store
Path to the HDF5 file.
- Type:
Path
- data_new
Data to be saved.
- Type:
pd.DataFrame or gpd.GeoDataFrame
- data_ext
Existing data from the store.
- Type:
pd.DataFrame or gpd.GeoDataFrame
- updated_data
Updated data after merging new data.
- Type:
pd.DataFrame or gpd.GeoDataFrame
- __init__(hdf_file_path: Path, silent_initiation: Optional[bool] = True, show_structure: Optional[bool] = False): Initializes the DataHandler with the file path.
- to_store(data: pd.DataFrame or gpd.GeoDataFrame, key: str, hdf_file_path: Path = None, force_update: bool = False): Saves the DataFrame or GeoDataFrame to the HDF5 file.
- from_store(key: str): Loads data from the HDF5 store and handles geometry conversion.
- refresh(): Initializes a new DataHandler instance with the current store path.
- show_tree(store_path: Path, show_dataset: bool = False): Recursively prints the hierarchy of an HDF5 file.
- del_key(store_path: Path, key_to_delete: str): Deletes a specific key from the HDF5 file.
- static del_key(store_path, key_to_delete: str)[source]
Deletes a specific key from the HDF5 file.
- Parameters:
store_path (Path) -- Path to the HDF5 file.
key_to_delete (str) -- The key to delete from the HDF5 file.
- Raises:
KeyError -- If the key does not exist in the HDF5 file.
- Returns:
This method prints the status of the deletion operation.
- Return type:
None
Example
>>> DataHandler.del_key(Path('data.h5'), 'my_key')
This will delete 'my_key' from the 'data.h5' file if it exists.
- from_store(key: str)[source]
Load data from the HDF5 store and handle geometry conversion.
- Parameters:
key (str) -- Key for loading the DataFrame or GeoDataFrame.
- Returns:
The loaded DataFrame or GeoDataFrame.
- Return type:
pd.DataFrame or gpd.GeoDataFrame
- Raises:
FileNotFoundError -- If the key is not found in the store.
TypeError -- If the loaded data is not a DataFrame or GeoDataFrame.
- refresh()[source]
Initialize a new DataHandler instance with the current store path. This method is useful for reloading the DataHandler with the same store path without needing to reinitialize the entire class.
- Parameters:
None
- Returns:
A new instance of DataHandler with the same store path.
- Return type:
DataHandler
- static show_tree(store_path, show_dataset: bool = False)[source]
This method provides a structured view of the keys and datasets within the HDF5 file, allowing users to understand its organization.
- Parameters:
store_path (Path) -- Path to the HDF5 file.
show_dataset (bool) -- If True, also show datasets within the groups.
- Raises:
Exception -- If there is an error reading the file.
- Returns:
This method prints the structure to the console.
- Return type:
None
- to_store(data: DataFrame, key: str, hdf_file_path: Path = None, force_update: bool = False)[source]
Save the DataFrame or GeoDataFrame to an HDF5 file.
- Parameters:
hdf_file_path (Path) -- Path to the HDF5 file. If None, it uses the existing store path.
data (pd.DataFrame or gpd.GeoDataFrame) -- The DataFrame or GeoDataFrame to save.
key (str) -- Key for saving the DataFrame to the HDF5 file.
force_update (bool) -- If True, force update the data even if it exists.
- Raises:
TypeError -- If the data is not a DataFrame or GeoDataFrame.
ValueError -- If the key is empty.
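A minimal round-trip sketch using the methods documented above (the file paths and the key are illustrative):

from pathlib import Path
import geopandas as gpd
from RES.hdf5_handler import DataHandler

handler = DataHandler(hdf_file_path=Path("resources.h5"))

# Save a GeoDataFrame; geometries are serialized (WKT) per the class docs.
cells = gpd.read_file("cells.geojson")  # illustrative input
handler.to_store(cells, key="solar/cells", force_update=True)

# Load it back; WKT geometries are converted back to a GeoDataFrame.
cells_loaded = handler.from_store("solar/cells")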
Note
If the above documentation doesn't render, this class provides HDF5-based data storage and retrieval capabilities for the RESource framework.
Utility Functions#
Common helper functions and data operations.
The RES.utility module includes:
Configuration file parsing and validation
Data I/O operations (YAML, JSON, geospatial formats)
Coordinate transformations and spatial utilities
Hierarchical logging and progress reporting
URL downloading and caching mechanisms
Configuration Management#
All classes inherit configuration parsing capabilities from AttributesParser, enabling:
YAML-based configuration management
Parameter validation and default value handling
Environment-specific settings (development, production)
Technology-specific parameter sets
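As a sketch of YAML-based configuration access (the keys below are placeholders, not the project's actual schema; the config path appears in the Examples section):

import yaml
from pathlib import Path

config = yaml.safe_load(Path("config/config_BC.yaml").read_text())
wind_params = config.get("wind", {})     # hypothetical technology block
print(wind_params.get("turbine_model"))  # hypothetical parameter name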
Data Storage#
The framework uses HDF5-based storage through DataHandler for:
Efficient large dataset management
Automated caching to avoid redundant computations
Cross-platform compatibility
Hierarchical data organization
Examples#
Basic Assessment Workflow#
from RES.RESources import RESources_builder
# Initialize assessment
builder = RESources_builder(
config_file_path="config/config_BC.yaml",
region_short_code="BC",
resource_type="wind"
)
# Execute complete workflow
results = builder.build(
select_top_sites=True,
use_pypsa_buses=True,
memory_resource_limitation=True
)
# Export results
builder.export_results(*results, save_to="output/BC_wind/")
Step-by-Step Analysis#
# Manual workflow control
cells = builder.get_grid_cells()
cells_with_capacity = builder.get_cell_capacity()
cells_with_timeseries = builder.get_CF_timeseries(cells_with_capacity)
scored_cells = builder.score_cells(cells_with_timeseries)
clusters = builder.get_clusters(scored_cells)
Notes#
All spatial data maintained in WGS84 (EPSG:4326) coordinate system
Time series generated at hourly resolution for full assessment years
Economic calculations follow NREL LCOE methodology
Clustering uses k-means with automatic optimization
Caching mechanisms minimize redundant computation
Modular design enables workflow customization
Warning
This page is under heavy development