Data Management in DIVA

This document describes how data is handled for the DIVA chatbot, from downloading climate and geographical data to preparing it for processing.

Setting up the Python environment

Before starting, ensure that the required Python environment is prepared according to the instructions found in the “How to Use DIVA” documentation.

Make sure the necessary libraries are installed in that environment, then import them at the top of your Python script:

import xarray as xr
import os
import geopandas as gpd
import pandas as pd
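The remote reading shown further down relies on packages beyond the four imported above: `zarr` for `engine='zarr'`, `fsspec` and `aiohttp` for reading `https://` stores, `dask` for `chunks={}`, and a NetCDF backend such as `netCDF4` for writing files. The following is a minimal sketch of a check for them; the package list is an assumption and may need adjusting to your environment.

```python
import importlib.util

# Packages assumed to be needed by the workflow in this document
# (this list is an assumption; adjust it to your environment).
REQUIRED_PACKAGES = [
    "xarray", "pandas", "geopandas",
    "zarr",               # engine="zarr"
    "fsspec", "aiohttp",  # reading https:// Zarr stores
    "dask",               # chunks={} (lazy, chunked loading)
    "netCDF4",            # one possible backend for to_netcdf
]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages found.")
```

`find_spec` only locates the packages without importing them, so the check is cheap to run before starting a session.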

Downloading climate data from “Cache B”

(!) Access to this data requires special access authorization. Please refer to DestinE’s data policy for more details: https://destine-data-lake-docs.data.destination-earth.eu/en/latest/dedl-discovery-and-data-access/DestinE-Data-Policy-for-DestinE-Digital-Twin-Outputs/DestinE-Data-Policy-for-DestinE-Digital-Twin-Outputs.html

The climate data in “Cache B” is stored as Zarr stores, which can be read lazily with the xarray library in Python. Below are some of the collections available in Cache B:

cache_b_collections = [
    "https://cacheb.dcms.destine.eu/era5/reanalysis-era5-land-no-antartica-v0.zarr",
    "https://cacheb.dcms.destine.eu/era5/reanalysis-era5-single-levels-v0.zarr",
    "https://cacheb.dcms.destine.eu/d1-climate-dt/ScenarioMIP-SSP3-7.0-IFS-NEMO-0001-high-sfc-v0.zarr",
    "https://cacheb.dcms.destine.eu/d1-climate-dt/ScenarioMIP-SSP3-7.0-IFS-NEMO-0001-high-o2d-v0.zarr",
    "https://cacheb.dcms.destine.eu/d1-climate-dt/ScenarioMIP-SSP3-7.0-IFS-NEMO-0001-high-pl-v0.zarr",
    "https://cacheb.dcms.destine.eu/d1-climate-dt/ScenarioMIP-SSP3-7.0-ICON-0001-high-sfc-v0.zarr",
]

You can load a collection using xarray as follows:

data = xr.open_dataset(
    cache_b_collections[0],
    engine='zarr',
    storage_options={"client_kwargs": {"trust_env": True}},  # let the HTTP client (aiohttp) use environment settings such as proxies and ~/.netrc
    chunks={},  # open lazily with dask, using the store's own chunking
)

Once the dataset is open, you can select the variables of interest. For example:

data = data[['tp']]  # keep only the variable 'tp' (total precipitation)
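Full collections are large, so it usually pays to narrow the selection in time and space before saving. The sketch below builds a small synthetic dataset shaped like an ERA5 grid so that it runs offline; the dimension names `time`, `latitude` and `longitude`, the dates and the bounding box are assumptions, so check `data.dims` on the real collection. ERA5 grids typically store latitude from north to south, which is why the latitude slice runs from the larger value to the smaller one.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a Cache B collection (assumed dimension names:
# "time", "latitude", "longitude"; check data.dims on the real data).
times = pd.date_range("2020-01-01", periods=48, freq="h")
lats = np.arange(60.0, 39.75, -0.25)  # descending: north to south, as in ERA5
lons = np.arange(0.0, 20.25, 0.25)
rng = np.random.default_rng(0)
data = xr.Dataset(
    {"tp": (("time", "latitude", "longitude"),
            rng.random((times.size, lats.size, lons.size)))},
    coords={"time": times, "latitude": lats, "longitude": lons},
)

# Keep one variable, one day and a small (hypothetical) box.
# With descending latitude, the slice must run from north to south.
subset = data[["tp"]].sel(
    time=slice("2020-01-01", "2020-01-01"),  # a date string covers the whole day
    latitude=slice(50.0, 45.0),
    longitude=slice(5.0, 10.0),
)
```

On the real, lazily opened dataset the same `.sel(...)` call stays lazy: nothing is transferred until the subset is computed or written.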

Viewing and Saving Data Locally

At this stage, the data has not been downloaded: xarray has only read the metadata, and the values are fetched when they are computed or written. To save the data locally, write it to a NetCDF file:

path_export = "/path/to/netcdf_file.nc"  # destination of the local copy
data.to_netcdf(path_export)  # fetches the selected data and writes it to disk
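Because writing the file is what triggers the transfer, it can help to check how large the selection is first. `nbytes` is derived from the array's shape and data type, so it does not load the values. The sketch uses a small synthetic array in place of the real selection (the 48 by 81 by 81 float64 shape is arbitrary), and the helper `size_in_gb` is hypothetical. The figure is the uncompressed in-memory size, so the NetCDF file on disk may differ.

```python
import numpy as np
import xarray as xr


def size_in_gb(obj):
    """Uncompressed in-memory size of an xarray object, in gigabytes."""
    return obj.nbytes / 1e9


# Synthetic stand-in for a lazily opened selection: 48 hourly steps of
# float64 values on an 81 x 81 grid (shape chosen arbitrarily).
tp = xr.DataArray(
    np.zeros((48, 81, 81)),
    dims=("time", "latitude", "longitude"),
    name="tp",
)

print(f"Selection size: {size_in_gb(tp):.4f} GB")
```

A guard such as `if size_in_gb(data) > 5:` placed before `to_netcdf` could stop an accidentally large download; the 5 GB limit is only an example.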

The saved NetCDF file can be easily reopened later:

data = xr.open_dataset(path_export)

Downloading shapefiles for geographical zones

Shapefiles define geographical zones (e.g., cities, countries). They are available from many online sources; for this project, we use Eurostat’s GISCO (Geographic Information System of the Commission) resources.

Once downloaded, a shapefile can be loaded into Python with the geopandas library:

path_data = "/path/to/shapefile.shp"  # path to the downloaded shapefile
shapefile = gpd.read_file(path_data)

The result is a GeoDataFrame: a pandas DataFrame subclass with an additional geometry column, so standard pandas operations (filtering, merging, grouping) work on it for further processing.
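As an illustration, the sketch below filters zones by attribute and reads the bounding box of one zone, which could then be used to narrow a climate selection. It builds a tiny GeoDataFrame in memory instead of reading a GISCO file; the `NAME` column, the zone names and the coordinates are hypothetical. GISCO layers are published in several coordinate reference systems, so the sketch reprojects to EPSG:4326 (longitude/latitude), which is assumed to match the climate grid.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for a GISCO layer: two rectangular zones.
zones = gpd.GeoDataFrame(
    {"NAME": ["Zone A", "Zone B"]},
    geometry=[box(5.0, 45.0, 10.0, 50.0), box(10.0, 40.0, 15.0, 45.0)],
    crs="EPSG:4326",
)

# Ensure longitude/latitude coordinates (a no-op here, but needed
# if the downloaded layer uses a projected CRS).
zones = zones.to_crs("EPSG:4326")

# Pandas-style boolean filtering works as on a regular DataFrame.
zone = zones[zones["NAME"] == "Zone A"]

# total_bounds returns [minx, miny, maxx, maxy] for the selected rows.
lon_min, lat_min, lon_max, lat_max = zone.total_bounds
```

The bounding box covers more than the polygon itself; restricting climate values to the exact zone would need a polygon mask on top of this.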