Source code for RES.gaez

from pathlib import Path
from zipfile import ZipFile

import matplotlib.pyplot as plt
import rasterio
import requests
from rasterio.mask import mask

from RES import utility as utils
from RES.AttributesParser import AttributesParser
from RES.boundaries import GADMBoundaries

print_level_base=4

# Define the GAEZRasterProcessor class
class GAEZRasterProcessor(AttributesParser):
    """
    GAEZ (Global Agro-Ecological Zones) raster data processor for renewable energy
    land constraint analysis.

    This class handles the download, extraction, clipping, and visualization of GAEZ
    raster datasets used in renewable energy resource assessment. GAEZ provides global
    spatial data on agricultural suitability, land resources, and ecological constraints.
    These layers help identify areas suitable for renewable energy development while
    avoiding productive agricultural land.

    The processor combines GAEZ land constraint data with regional boundaries to support
    siting decisions and capacity assessments. It downloads the ZIP archive if needed,
    extracts the layers named in the configuration, clips them to the regional boundary,
    and saves a plot of each clipped layer.

    INHERITED METHODS FROM AttributesParser:
    ----------------------------------------
    - get_gaez_data_config() -> Dict[str, dict]: Get GAEZ dataset configuration parameters
    - get_region_name() -> str: Get full region name for display purposes
    - Plus other configuration access methods

    INHERITED ATTRIBUTES FROM AttributesParser:
    -------------------------------------------
    - config_file_path: Path to configuration file
    - region_short_code: Region identifier code
    - resource_type: Resource type identifier
    - Plus other configuration attributes

    OWN METHODS DEFINED IN THIS CLASS:
    ----------------------------------
    - process_all_rasters(): Main pipeline for processing all configured raster types
    - plot_gaez_tif(): Generate visualization plots for processed raster data
    - __download_resources_zip_file__(): Download GAEZ ZIP archive from remote source
    - __extract_rasters__(): Extract required raster files from ZIP archive
    - __clip_to_boundary_n_plot__(): Clip rasters to regional boundaries and generate plots

    Parameters
    ----------
    config_file_path : str or Path
        Path to configuration file containing GAEZ dataset parameters
    region_short_code : str
        Region identifier for boundary definition and file naming
    resource_type : str
        Resource type ('solar', 'wind', 'bess') - used for dependency injection

    Attributes
    ----------
    gadmBoundary : GADMBoundaries
        GADM boundary processor for regional extent definition
    gaez_config : dict
        GAEZ dataset configuration parameters from config file
    gaez_root : Path
        Root directory for GAEZ data storage and processing
    zip_file : Path
        Filename of the GAEZ ZIP archive, resolved relative to ``gaez_root``
    Rasters_in_use_direct : Path
        Directory for extracted and processed raster files, relative to ``gaez_root``
    raster_types : list
        List of raster type configurations to process
    region_boundary : gpd.GeoDataFrame
        Regional boundary geometry for clipping operations
        (``None`` until ``process_all_rasters`` runs)

    Methods
    -------
    process_all_rasters(show=False) -> dict
        Main pipeline to download, extract, clip, and plot all configured rasters
    plot_gaez_tif(tif_path, color_map, plot_title, save_to, show=False) -> matplotlib.Figure
        Generate and save visualization plots for raster data

    Examples
    --------
    Process GAEZ rasters for British Columbia:

    >>> from RES.gaez import GAEZRasterProcessor
    >>> gaez_processor = GAEZRasterProcessor(
    ...     config_file_path="config/config_BC.yaml",
    ...     region_short_code="BC",
    ...     resource_type="solar"
    ... )
    >>> raster_paths = gaez_processor.process_all_rasters(show=True)
    >>> print(f"Processed {len(raster_paths)} raster types")

    Access specific raster data:

    >>> # Raster paths are returned as dictionary
    >>> if 'slope' in raster_paths:
    ...     slope_path = raster_paths['slope']
    ...     print(f"Slope raster available at: {slope_path}")

    Configuration Requirements
    --------------------------
    The GAEZ configuration must include:

    ```yaml
    gaez_data:
      root: "data/downloaded_data/GAEZ"  # Storage directory
      source: "https://s3.eu-west-1.amazonaws.com/data.gaezdev.aws.fao.org/LR.zip"
      zip_file: "LR.zip"  # ZIP archive filename
      Rasters_in_use_direct: "Rasters_in_use"  # Extraction directory
      raster_types:
        - name: "slope"
          raster: "slope.tif"
          zip_extract_direct: "slope"
          color_map: "terrain"
        # Additional raster type configurations...
    ```

    Data Processing Workflow
    ------------------------
    1. **Configuration Loading**: Extract GAEZ parameters from config file
    2. **Download Check**: Verify ZIP archive exists or download from source
    3. **Extraction**: Extract required raster files from ZIP archive
    4. **Boundary Processing**: Get regional boundaries from GADM processor
    5. **Clipping**: Clip each raster to regional boundaries
    6. **Visualization**: Generate plots for each processed raster
    7. **Path Management**: Return dictionary of processed raster file paths

    Raster Type Configuration
    -------------------------
    Each raster type requires:

    - **name**: Identifier for the raster layer (also used as the plot title)
    - **raster**: Filename of the raster file within ZIP archive
    - **zip_extract_direct**: Directory path within ZIP archive
    - **color_map**: Matplotlib colormap for visualization

    Supported Raster Types
    ----------------------
    Common GAEZ rasters include:

    - **Slope**: Terrain slope for accessibility analysis
    - **Soil Quality**: Agricultural productivity constraints
    - **Land Cover**: Vegetation and land use classifications
    - **Elevation**: Digital elevation model data
    - **Climate Zones**: Agro-ecological zone classifications

    Spatial Processing
    ------------------
    - **Input CRS**: Inherits coordinate system from source rasters
    - **Clipping**: Uses ``rasterio.mask.mask`` with ``crop=True`` on the regional
      boundary geometries
    - **Output Format**: GeoTIFF files written with the source raster's metadata
    - **Resolution**: Maintains original raster resolution
    - **Compression**: Not set explicitly; outputs reuse the source raster's basic metadata

    Visualization Features
    ----------------------
    - **Automatic Plotting**: Generates plots for all processed rasters
    - **Custom Colormaps**: Configurable visualization schemes
    - **Coordinate Display**: Latitude/longitude axis labels
    - **Legend Integration**: Horizontal colorbar with value indicators
    - **File Output**: PNG files written with matplotlib's default ``savefig`` settings

    Performance Considerations
    --------------------------
    - ZIP download time scales with file size (typically 100MB-1GB)
    - Extraction time depends on number of raster layers
    - Clipping operations are memory-intensive for large regions
    - Rasters are processed sequentially, one configured layer at a time
    - Network connectivity affects initial download performance

    Integration Points
    ------------------
    - **Boundaries**: Uses GADMBoundaries for regional extent definition
    - **Land Constraints**: Provides input data for land availability analysis
    - **Capacity Calculation**: Supports renewable energy siting decisions
    - **Visualization**: Integrates with broader visualization workflows

    Error Handling
    --------------
    - **Download Failures**: Non-200 responses are logged with their status code
    - **Missing Files**: Raster files absent from the archive are reported by name
    - **Extraction Errors**: Archive contents are checked before each extraction
    - **Processing Failures**: Status messages are logged at each stage for debugging

    Output Management
    -----------------
    - **Organized Storage**: Systematic directory structure for processed data
    - **File Naming**: Clipped rasters are prefixed with the region short code
    - **Metadata Preservation**: Maintains spatial reference of the source raster
    - **Visualization Archive**: Plots are stored under ``vis/misc``

    Notes
    -----
    - GAEZ data is provided by FAO (Food and Agriculture Organization)
    - Raster datasets are typically global coverage at moderate resolution
    - Processing large regions may require substantial disk space
    - Results integrate with renewable energy assessment workflows
    - Visualization outputs support decision-making and reporting
    - ZIP archives are cached locally to avoid repeated downloads

    Dependencies
    ------------
    - requests: HTTP downloading of ZIP archives
    - rasterio: Raster data reading, processing, and writing
    - zipfile: ZIP archive extraction and management
    - pathlib: File path operations and directory management
    - matplotlib: Visualization and plot generation
    - RES.AttributesParser: Parent class for configuration management
    - RES.boundaries.GADMBoundaries: Regional boundary processing
    - RES.utility: Logging and status update functions

    Raises
    ------
    ConnectionError
        If GAEZ data download fails or source is unavailable
    FileNotFoundError
        If required raster files are missing from ZIP archive
    ValueError
        If configuration parameters are invalid or incomplete
    RuntimeError
        If raster processing or clipping operations fail

    See Also
    --------
    rasterio.mask.mask : Raster clipping functionality
    RES.boundaries.GADMBoundaries : Regional boundary processing
    RES.lands : Land constraint integration for renewable energy
    """

    def __post_init__(self):
        """
        Initialize GAEZ raster processor with configuration and boundary setup.

        Performs post-initialization setup including:

        - Calling parent class initialization
        - Setting up boundary processor with inherited attributes
        - Loading GAEZ configuration from config file
        - Creating directory structure for data storage
        - Configuring raster type processing parameters

        This method is automatically called after dataclass initialization
        to prepare the processor for raster data operations.

        Raises
        ------
        FileNotFoundError
            If the configuration file cannot be found
        OSError
            If the data directories cannot be created
        KeyError
            If 'zip_file', 'Rasters_in_use_direct', or 'raster_types' is missing
            from the GAEZ configuration
        """
        super().__post_init__()

        # Arguments forwarded to GADMBoundaries; all three are inherited from AttributesParser.
        self.required_args = {  # order doesn't matter
            "config_file_path": self.config_file_path,
            "region_short_code": self.region_short_code,
            "resource_type": self.resource_type,
        }
        self.gadmBoundary = GADMBoundaries(**self.required_args)

        self.gaez_config: dict = super().get_gaez_data_config()
        self.gaez_root = Path(self.gaez_config.get('root', 'data/downloaded_data/GAEZ'))
        self.gaez_root.mkdir(parents=True, exist_ok=True)

        self.zip_file = Path(self.gaez_config['zip_file'])
        self.Rasters_in_use_direct = Path(self.gaez_config['Rasters_in_use_direct'])
        # Rasters are always read from and written to gaez_root / Rasters_in_use_direct,
        # so create the directory there rather than relative to the working directory.
        (self.gaez_root / self.Rasters_in_use_direct).mkdir(parents=True, exist_ok=True)

        self.raster_types = self.gaez_config['raster_types']
        self.region_boundary = None  # Loaded lazily in process_all_rasters()
    def process_all_rasters(self, show: bool = False):
        """
        Main pipeline to download, extract, clip, and plot rasters based on configuration.

        Executes the complete GAEZ raster processing workflow including:

        1. Downloading ZIP archive if not present locally
        2. Extracting required raster files from archive
        3. Loading regional boundaries for clipping operations
        4. Processing each configured raster type by clipping to boundaries
        5. Generating visualization plots for all processed rasters

        This method orchestrates all processing steps and returns paths to
        the processed raster files for downstream analysis.

        Parameters
        ----------
        show : bool, default=False
            Whether to display generated plots interactively during processing.
            If True, matplotlib plots will be shown on screen.
            If False, plots are saved to disk without display.

        Returns
        -------
        dict
            Dictionary mapping raster type names to processed file paths.
            Keys are raster type names from configuration.
            Values are Path objects pointing to clipped raster files.

        Examples
        --------
        Process all rasters with visualization:

        >>> raster_paths = processor.process_all_rasters(show=True)
        >>> print(f"Processed rasters: {list(raster_paths.keys())}")

        Process rasters for programmatic use:

        >>> raster_paths = processor.process_all_rasters(show=False)
        >>> slope_raster = raster_paths.get('slope')
        >>> if slope_raster:
        ...     print(f"Slope data at: {slope_raster}")

        Raises
        ------
        ConnectionError
            If ZIP archive download fails due to network issues
        FileNotFoundError
            If required raster files are missing from archive
        RuntimeError
            If raster clipping or processing operations fail

        Notes
        -----
        - Processing time scales with number of raster types and region size
        - Large regions may require substantial disk space for processed rasters
        - Network connection required for initial ZIP archive download
        - Download and extraction are skipped when the files already exist;
          clipped rasters and plots are regenerated on every call
        - All plots are saved regardless of the show parameter setting
        """
        # Download the archive only when it is not already cached locally.
        if not (self.gaez_root / self.zip_file).exists():
            self.__download_resources_zip_file__()

        raster_paths = {}
        self.__extract_rasters__()
        self.region_boundary = self.gadmBoundary.get_region_boundary()

        utils.print_update(level=print_level_base, message=f"{__name__}| Clipping Rasters to regional boundaries.. ")

        # Loop over raster types and process each
        for raster_type in self.raster_types:
            raster_path = self.__clip_to_boundary_n_plot__(raster_type, self.region_boundary.geometry, show)
            raster_paths[raster_type['name']] = raster_path

        utils.print_update(level=print_level_base, message=f"{__name__}| ✔ All required rasters for GAEZ processed and plotted successfully.")
        return raster_paths
    def __download_resources_zip_file__(self):
        """
        Download the GAEZ resources ZIP archive from remote source.

        Downloads the GAEZ raster data ZIP file from the configured source URL.
        The caller (``process_all_rasters``) invokes this method only when the
        archive is not already present locally. The ZIP archive contains all the
        raster datasets needed for land constraint analysis in renewable energy
        resource assessment.

        The method uses the source URL from configuration with a fallback to the
        default FAO GAEZ data source. Downloaded files are saved to the configured
        root directory for subsequent extraction and processing.

        Raises
        ------
        ConnectionError
            If the download fails due to network connectivity issues
        requests.HTTPError
            If the HTTP response indicates an error (non-200 status code)
        IOError
            If the local file cannot be written due to permissions or disk space

        Notes
        -----
        - Download size is typically 100MB-1GB depending on data coverage
        - Network timeout may occur for large files on slow connections
        - Existing ZIP files are not re-downloaded to save bandwidth
        - Progress is logged through utility print functions
        - File integrity is not explicitly verified after download
        """
        url = self.gaez_config.get('source', 'https://s3.eu-west-1.amazonaws.com/data.gaezdev.aws.fao.org/LR.zip')
        response = requests.get(url)

        if response.status_code == 200:
            with open(self.gaez_root / self.zip_file, 'wb') as f:
                f.write(response.content)
            utils.print_update(level=print_level_base, message=f"{__name__}| GAEZ Raster Resource '.zip' file downloaded and saved to: {self.gaez_root}")
        else:
            utils.print_update(level=print_level_base, message=f"{__name__}| ❌ Failed to download the Resources zip file from GAEZ. Status code: {response.status_code}")
            # Surface 4xx/5xx responses as requests.HTTPError, as documented above,
            # instead of continuing without an archive.
            response.raise_for_status()

    def __extract_rasters__(self):
        """
        Extract required raster files from the downloaded GAEZ ZIP archive.

        Selectively extracts only the raster files specified in the configuration
        from the GAEZ ZIP archive. Each raster type configuration defines which
        file to extract and where to place it within the local directory structure.

        The method preserves the internal ZIP directory structure while extracting
        files to organized local directories. Existing files are not re-extracted
        to avoid unnecessary processing time and disk operations.

        The extraction process is guided by the raster_types configuration which specifies:

        - raster: Filename of the raster file within the ZIP
        - zip_extract_direct: Directory path within the ZIP archive

        Progress and status messages are logged for each extraction operation
        to provide visibility into the processing workflow.

        Raises
        ------
        zipfile.BadZipFile
            If the ZIP archive is corrupted or cannot be read
        FileNotFoundError
            If the ZIP archive itself does not exist
        IOError
            If extraction fails due to disk space or permission issues

        Notes
        -----
        - Only configured raster types are extracted, not the entire archive
        - Directory structure from ZIP is preserved in local extraction
        - Existing extracted files are skipped to save processing time
        - Raster files missing from the archive are logged, not raised; the
          clipping step fails later when it tries to open them
        - Large raster files may take considerable time to extract
        - Extraction location follows the Rasters_in_use_direct configuration
        """
        extract_root = self.gaez_root / self.Rasters_in_use_direct

        with ZipFile(self.gaez_root / self.zip_file, 'r') as zip_ref:
            archive_members = set(zip_ref.namelist())

            for raster_type in self.raster_types:
                raster_file = raster_type['raster']
                zip_direct = raster_type['zip_extract_direct']

                # ZIP member names always use forward slashes, regardless of the OS,
                # so build the member name in POSIX form.
                file_inside_zip = (Path(zip_direct) / raster_file).as_posix()
                target_path = extract_root / zip_direct / raster_file

                if not target_path.exists():
                    if file_inside_zip in archive_members:
                        zip_ref.extract(file_inside_zip, path=extract_root)
                        utils.print_update(level=print_level_base, message=f"{__name__}| Raster file '{raster_file}' extracted from {file_inside_zip}")
                    else:
                        utils.print_update(level=print_level_base, message=f"{__name__}| Raster file '{raster_file}' not found in the archive {file_inside_zip}")
                else:
                    utils.print_update(level=print_level_base, message=f"{__name__}| Raster file '{raster_file}' found in local directory, skipping extraction.")

    def __clip_to_boundary_n_plot__(self, raster_type, boundary_geom, show):
        """
        Clip raster data to regional boundaries and generate visualization plot.

        Performs spatial clipping of GAEZ raster data to match the regional
        boundaries and creates a visualization plot of the clipped result.
        This method combines raster processing with immediate visualization
        to support analysis and quality control of the processed data.

        The clipping operation uses the rasterio.mask functionality to crop
        the input raster to the geometry of the regional boundaries, preserving
        the original raster properties while reducing the spatial extent to
        the region of interest.

        Parameters
        ----------
        raster_type : dict
            Raster type configuration containing:

            - 'zip_extract_direct': Directory path within extraction structure
            - 'raster': Filename of the raster file to process
            - 'name': Display name for the raster type
            - 'color_map': Matplotlib colormap for visualization
        boundary_geom : list or geopandas.GeoSeries
            Geometry objects defining the clipping boundaries.
            Typically from regional boundary processing.
        show : bool
            Whether to display the generated plot interactively.
            If True, plot is shown on screen in addition to being saved.

        Returns
        -------
        pathlib.Path
            Path to the clipped raster file saved to disk.
            The file maintains GeoTIFF format with spatial reference.

        Raises
        ------
        rasterio.errors.RasterioIOError
            If the input raster file cannot be read or is corrupted
        ValueError
            If the boundary geometry is invalid or incompatible
        IOError
            If the output raster cannot be written due to disk issues

        Notes
        -----
        - Clipped rasters preserve original resolution and data types
        - Output files are named with region short code prefix for identification
        - Visualization plots are automatically saved regardless of show parameter
        - Memory usage scales with raster size and complexity of clipping geometry
        - Coordinate reference systems are preserved from input raster
        - Plot files are saved under ``vis/misc``
        """
        zip_direct = raster_type['zip_extract_direct']
        raster_file = raster_type['raster']
        plot_title = raster_type['name']
        color_map = raster_type['color_map']

        input_raster = self.gaez_root / self.Rasters_in_use_direct / zip_direct / raster_file
        output_dir = self.gaez_root / self.Rasters_in_use_direct / zip_direct
        output_dir.mkdir(parents=True, exist_ok=True)
        clipped_raster_path = output_dir / f"{self.region_short_code}_{raster_file}"

        with rasterio.open(input_raster) as src:
            # crop=True shrinks the output window to the bounding box of the geometries.
            clipped_raster, clipped_transform = mask(src, boundary_geom, crop=True, indexes=src.indexes)

            # The clipped array is (bands, rows, cols); update size and transform to match.
            clipped_meta = src.meta.copy()
            clipped_meta.update({
                'height': clipped_raster.shape[1],
                'width': clipped_raster.shape[2],
                'transform': clipped_transform
            })

        with rasterio.open(clipped_raster_path, 'w', **clipped_meta) as dst:
            dst.write(clipped_raster)

        # Call visualization method
        plot_save_to = Path('vis/misc') / raster_file.replace('.tif', f'_raster_{self.region_short_code}.png')
        self.plot_gaez_tif(clipped_raster_path, color_map, plot_title, plot_save_to, show)
        utils.print_update(level=print_level_base + 1, message=f"{__name__}| Clipped Raster plot for {super().get_region_name()} saved at: {plot_save_to}")  # INHERITED METHOD from AttributesParser

        return clipped_raster_path
    def plot_gaez_tif(self, tif_path, color_map, plot_title, save_to, show=False):
        """
        Generate and save visualization plot for processed GAEZ raster data.

        Creates a matplotlib visualization of the clipped GAEZ raster data with
        the raster's geographic extent, a configurable colormap, and a legend.
        The plot has latitude/longitude axes and a horizontal colorbar for
        value interpretation.

        This method supports both interactive display and file output, making it
        suitable for both exploratory analysis and report generation workflows.

        Parameters
        ----------
        tif_path : str or pathlib.Path
            Path to the GeoTIFF raster file to visualize.
            Must be a valid raster file with spatial reference information.
        color_map : str
            Name of matplotlib colormap to use for visualization.
            Examples: 'terrain', 'viridis', 'plasma', 'coolwarm'.
        plot_title : str
            Title text to display at the top of the plot.
            Should describe the raster content and region.
        save_to : str or pathlib.Path
            Output path for saving the plot image file.
            Parent directories will be created if they don't exist.
        show : bool, default=False
            Whether to display the plot interactively on screen.
            If True, plot is shown in addition to being saved.
            If False, plot is only saved to file without display.

        Returns
        -------
        matplotlib.figure.Figure
            The matplotlib Figure object containing the plot.
            The figure has already been closed in pyplot when it is returned.

        Examples
        --------
        Create a basic plot:

        >>> fig = processor.plot_gaez_tif(
        ...     tif_path="data/BC_slope.tif",
        ...     color_map="terrain",
        ...     plot_title="Slope Analysis for British Columbia",
        ...     save_to="plots/BC_slope.png"
        ... )

        Create an interactive plot:

        >>> fig = processor.plot_gaez_tif(
        ...     tif_path="data/BC_elevation.tif",
        ...     color_map="viridis",
        ...     plot_title="Elevation Map",
        ...     save_to="plots/elevation.png",
        ...     show=True
        ... )

        Raises
        ------
        rasterio.errors.RasterioIOError
            If the input TIF file cannot be read or is corrupted
        FileNotFoundError
            If the input TIF file does not exist
        ValueError
            If the colormap name is not recognized by matplotlib
        IOError
            If the output plot file cannot be written

        Notes
        -----
        - Plot dimensions are fixed at 10x8 inches for consistency
        - Only the first band of the raster is plotted
        - Colorbar is positioned horizontally below the plot
        - Geographic extent is automatically derived from raster bounds
        - Output directories are created automatically if needed
        - Plot is always saved regardless of the show parameter
        - Figure is closed after processing to prevent memory leaks
        - NoData/masked values are handled transparently in visualization
        """
        with rasterio.open(tif_path) as src:
            data = src.read(1, masked=True)  # First band, with NoData masked out
            extent = src.bounds

        # Accept both str and Path, as documented, before touching .parent.
        save_to = Path(save_to)
        save_to.parent.mkdir(parents=True, exist_ok=True)

        fig, ax = plt.subplots(figsize=(10, 8))
        im = ax.imshow(data, cmap=color_map, extent=[extent.left, extent.right, extent.bottom, extent.top])
        plt.colorbar(im, ax=ax, label="Layer Class", orientation="horizontal", fraction=0.05, pad=0.08)
        ax.set_title(plot_title)
        ax.set_xlabel("Longitude")
        ax.set_ylabel("Latitude")
        ax.grid(visible=False)
        plt.tight_layout()
        plt.savefig(save_to)

        if show:
            plt.show()

        plt.close(fig)
        return fig
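
The selective extraction step described in `__extract_rasters__` can be tried in isolation with only the standard library. The sketch below is not part of `RES.gaez`: the helper names `extract_configured_members` and `run_demo` are hypothetical, the archive is a tiny stand-in built in a temporary directory, and the member contents are placeholder bytes rather than real GeoTIFFs. It assumes the same `raster_types` keys as the configuration above and shows the three outcomes the method logs: extracted, skipped because the file already exists, and missing from the archive.

```python
import tempfile
from pathlib import Path, PurePosixPath
from zipfile import ZipFile


def extract_configured_members(zip_path, raster_types, dest_dir):
    """Extract only the configured archive members; return {name: status}."""
    dest_dir = Path(dest_dir)
    status = {}
    with ZipFile(zip_path, "r") as archive:
        members = set(archive.namelist())
        for raster_type in raster_types:
            # ZIP member names use forward slashes on every operating system.
            member = str(PurePosixPath(raster_type["zip_extract_direct"]) / raster_type["raster"])
            target = dest_dir / member
            if target.exists():
                status[raster_type["name"]] = "skipped"
            elif member in members:
                archive.extract(member, path=dest_dir)
                status[raster_type["name"]] = "extracted"
            else:
                status[raster_type["name"]] = "missing"
    return status


def run_demo():
    """Build a small stand-in archive and run the extraction twice."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        zip_path = tmp / "LR.zip"
        with ZipFile(zip_path, "w") as archive:
            archive.writestr("slope/slope.tif", b"placeholder bytes, not a real GeoTIFF")
            archive.writestr("unused/other.tif", b"never requested by the configuration")

        raster_types = [
            {"name": "slope", "raster": "slope.tif", "zip_extract_direct": "slope"},
            {"name": "soil", "raster": "soil.tif", "zip_extract_direct": "soil"},
        ]
        dest = tmp / "Rasters_in_use"
        first = extract_configured_members(zip_path, raster_types, dest)
        second = extract_configured_members(zip_path, raster_types, dest)
        unused_extracted = (dest / "unused" / "other.tif").exists()
    return first, second, unused_extracted


if __name__ == "__main__":
    print(run_demo())
```

Returning a status per layer, instead of only logging, lets a caller decide whether a missing layer should stop the pipeline before the clipping step tries to open a file that was never extracted.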