How to use DIVA¶
DIVA is a chatbot whose underlying code is fully open source and which can also be used as an API.
Preparing the environment¶
Install Python 3 in your operating system.
Install the diva package from the Git repository using the following command (using Mews GitLab as an example):
pip install --upgrade git+https://gitlab.eurobios.com/esa/diva.git
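If the installation succeeded, importing the package should not raise an error:
# Quick installation check
import diva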
Download the needed data:
Create a folder called data on your computer that will contain all the needed data (a possible layout is sketched at the end of this step).
Climate data:
DIVA supports four climate variables: temperature, wind speed, surface pressure, and precipitation.
The data can be downloaded from Cache B (or from another source) and should be saved in NetCDF (.nc) format with three dimensions: time, longitude, and latitude.
Shapefiles:
Download the shapefiles for cities and countries. These shapefiles are used to filter data.
World cities:
Download the CSV file worldcities.csv, which contains a list of world cities. This file is used to look up cities identified by NLP when no match is found in the shapefiles (which are limited to Europe).
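For illustration, the data folder might look as follows (the climate file names are illustrative; the shapefile and CSV names match those referenced in settings.ini below):
data/
├── temperature.nc              # climate NetCDF files
├── wind_speed.nc
├── surface_pressure.nc
├── precipitation.nc
├── NUTS_RG_10M_2021_4326.shp   # country shapefile
├── cities.shp                  # city shapefile
└── worldcities/
    └── worldcities.csv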
Add the settings file in your home directory:
For Linux: Your home directory is /home/<user>/.
For Windows: Your home directory is C:/Users/<user>/.
Follow these steps:
4.1. Create a new folder in your home directory called: .diva
4.2. Add a settings file called settings.ini and configure the paths for your data and LLM models as follows:
[paths]
path_data = /home/user/diva_package/data/
shapefile_countries = NUTS_RG_10M_2021_4326.shp
shapefile_cities = cities.shp
worldcities = worldcities/worldcities.csv
path_model_mistral7B = /home/user/LLM_models/Mistral_instruct_v01
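To check that the file is found and readable, you can parse it with the standard-library configparser. This is only a verification sketch, not DIVA's own loading code:
import configparser
from pathlib import Path

# Read the settings file from the conventional location described above
settings = configparser.ConfigParser()
settings.read(Path.home() / ".diva" / "settings.ini")
print(settings["paths"]["path_data"])  # should print your data folder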
For reference, the data sources are listed in the “Information” tab of the DIVA application.
Loading and reading data¶
Before you start, ensure that the environment has been prepared as described above.
The climate data, including temperature, wind, pressure, and precipitation, are stored locally in NetCDF (.nc) format. These files contain multi-dimensional data, with key dimensions such as longitude, latitude, and time. This structure allows for the efficient representation of data across various geographic locations and time periods.
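You can inspect such a file directly before loading it through DIVA. Here is a minimal sketch using xarray (not necessarily a DIVA dependency; the file name is illustrative):
import xarray as xr

# Open one of the climate NetCDF files and inspect its structure
ds = xr.open_dataset("data/temperature.nc")
print(ds.dims)       # expected dimensions: time, longitude, latitude
print(ds.data_vars)  # e.g. a 't2m' variable for temperature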
In DIVA, we use the DataCollection class from the diva.data.dataset package to load and manage these datasets. Before using DataCollection, we process user-defined parameters (such as time range, location, and variable of interest) with the IGraphGenerator class. This step ensures that all parameters are correctly prepared for data extraction.
Here is an example of how to load the data with DataCollection:
First, import the needed libraries:
import pandas as pd
from diva.data.dataset import DataCollection
from diva import parameters
Next, define the parameters used for data extraction. In the full application, these parameters are extracted from the user prompt by the LLM:
# Define parameters for data extraction
locs = [{'location_name_orig': 'Paris',
'location_name': 'Paris',
'addresstype': 'city'}]
time_intervals = [[pd.to_datetime('2020-01-01', format="%Y-%m-%d"),
pd.to_datetime('2020-12-31', format="%Y-%m-%d")]]
Create an instance of the DataCollection class to read and filter the data:
# Load the data collection based on processed parameters
dc = DataCollection(parameters.cache_b_collections, 't2m')
dc = dc.sample_time(time_intervals)
dc = dc.apply_masks(locs)
dc = dc.spatial_aggregation()
The data can now be accessed:
# Get raw data
raw_vals = dc.get_values()
# Get aggregated data
dc = dc.temporal_aggregation(aggreg_type='MS', agreg_func='mean')  # aggreg_type: "MS" (month), "YS" (year), or "D" (day)
aggreg_vals = dc.get_values()
Overview of Steps:
Parameter Definition: User-specified parameters such as time range, location, variable of interest, and aggregation type are defined and structured.
Data Loading and Preprocessing: The DataCollection class is used to handle the data loading process. This includes applying masks for spatial filtering, sampling over specific time intervals, and performing spatial and temporal aggregations.
Data Access: Processed data can then be accessed for analysis in raw or aggregated form using built-in methods.
Key Methods
sample_time(time_intervals): Samples data within the specified time intervals.
apply_masks(locs): Filters data based on spatial location masks.
spatial_aggregation(): Aggregates data spatially.
temporal_aggregation(aggreg_type, agreg_func): Aggregates data temporally based on the aggregation type (e.g., monthly or yearly) and the specified aggregation function (mean, max, etc.).
get_values(): Retrieves the processed data in its current state (raw or aggregated).
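Since each of these methods returns a DataCollection (as the reassignments in the example above suggest), the pipeline can also be written as a single chain. A sketch reusing the parameters defined earlier:
# Equivalent chained form of the pipeline shown above
dc = (DataCollection(parameters.cache_b_collections, 't2m')
      .sample_time(time_intervals)
      .apply_masks(locs)
      .spatial_aggregation()
      .temporal_aggregation(aggreg_type='MS', agreg_func='mean'))
monthly_means = dc.get_values()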
This modular approach ensures flexibility and efficiency in processing climate data for analysis and visualization purposes.
Generating Graphs¶
This section demonstrates how to generate visualizations using the ServiceGeneratePlotlyGraph class. This approach allows users to create interactive and visually appealing graphs, such as line charts, from climate data. The process is designed to be simple and flexible, enabling users to quickly generate visualizations for their analysis.
Before you start, ensure that your Python directory is structured correctly with the diva package located in ./diva/src/.
First, import the needed libraries:
# Import needed libraries
import streamlit as st
from diva.graphs.service_graph_generation import ServiceGeneratePlotlyGraph
Define the parameters used by the graph class. In the full application, these parameters are extracted from the user prompt by the LLM:
# Define parameters for data extraction in dictionary format
params = {
"starttime": "2020-01-01", # start of period: 2020-01-01
"endtime": "2020-12-31", # end of period: 2020-12-31
"location": "Rome", # city: Rome (can also be a country)
"elementofinterest": "temperature", # variable to analyze
"graph_type": "line chart", # type of graph to generate
"aggreg_type": "month" # aggregation type (e.g., 'month', 'year')
}
Initializing the Streamlit Session:
Streamlit is used to manage the application’s state, such as saving generated plots and storing messages. Initialize the session state as follows:
# Initialize the Streamlit session state used to store figures and messages
# (the guards prevent reruns from resetting previously saved plots)
if "plots_history" not in st.session_state:
    st.session_state.plots_history = {}
if "messages" not in st.session_state:
    st.session_state.messages = {}
Create an instance of the ServiceGeneratePlotlyGraph class and use it to generate and display the graph. Here’s how:
# Create an instance of the class
gg = ServiceGeneratePlotlyGraph(params=params, langage="English")
# Generate and display the graph
gg.generate(show=True)
Explanation of Steps:
Parameters Setup: The params dictionary defines the scope of the graph, such as the variable (elementofinterest), time range (starttime and endtime), and the type of graph (graph_type).
Session Initialization: Streamlit’s session state is used to store the generated graph history (plots_history) and any accompanying messages (messages).
Graph Generation: The ServiceGeneratePlotlyGraph instance takes the parameters and creates the requested graph. The generate(show=True) method renders the graph on the screen.
This approach enables users to generate insightful visualizations effortlessly. You can adapt the parameters to focus on different elements, locations, or time periods.
Using the LLM for Prompt Processing and Responses¶
This section demonstrates how to use the llm and chat modules in DIVA to process user prompts, classify them, and generate text-based answers. This is particularly useful for interactive scenarios where natural language processing is required to extract parameters or provide answers.
Before you start, ensure that your Python directory is structured correctly with the diva package located in ./diva/src/.
# Import needed libraries
from diva.chat import ModuleChat
from diva.config import ModuleConfig
# Create instances of the required classes
module_chat = ModuleChat()
module_config = ModuleConfig(module_chat)
ModuleChat supervises the operations necessary to ensure a coherent chat experience with the user. The tasks managed by this module are:
defining whether memory of the previous user prompts and/or chatbot answers is needed.
rephrasing the user prompt 1) to remove typos without distorting the original prompt, and 2) to add the memory component, if needed.
classifying the user prompt as either a request for a graphical representation based on the parameters specified in the prompt, or mere discussion.
defining whether the user prompt is a command (demand or order) or a comment (feedback).
assessing whether the command in the user prompt is relevant given the chatbot's scope.
fetching context to guide the generation of the chatbot answer.
assessing the toxicity of the user prompt.
generating the textual answer to the user prompt.
adding disclaimers to the chatbot answer, if needed.
ModuleConfig is called only in the case of a request for a graphical representation. Its main purpose is to identify and extract the parameters needed to make the requested graph. The tasks managed by this module are:
identifying and extracting parameters from the user prompt, such as the location, the time of interest, and the name of the climate variable that the user wants to plot.
identifying missing parameters.
completing missing parameters with the values in memory, if any.
asking the user for missing parameters, if they are strictly necessary to make the graph and the memory of past parameters is empty.
ModuleConfig relies on ModuleChat, as shown by the fact that ModuleConfig takes an instance of ModuleChat as an argument at instantiation. Less explicitly, ModuleChat also relies on ModuleConfig for some specific tasks.
Discussion mode
# Define the prompt
prompt = 'Hello, how are you ?'
# Processing of the prompt
module_chat.create_user_prompt(prompt)
print("Prompt toxicity score:", module_chat.chat.prompt.toxicity) # min: 0, max: 10
print("Command?", module_chat.chat.prompt.is_command) # if False, the prompt is a comment
module_chat.prompt_classification()
print("Type of prompt: ", module_chat.chat.prompt.type)
module_chat.is_prompt_in_scope()
print("Within scope: ", module_chat.chat.prompt.within_scope)
print("Fetched context: ", module_chat.chat.prompt.context)
module_chat.prompt_rephrasing()
print("Original prompt:", module_chat.prompt.original)
print("Rephrased prompt:", module_chat.prompt.rephrased)
# Alternatively, calling the prompt object returns either the rephrased prompt (if available) or the original prompt
print("Prompt: ", module_chat.prompt)
# Display the main prompt attributes
print(module_chat.prompt.__repr__())
module_chat.create_user_prompt: registers the prompt within the chat object. It also initializes the processing of the prompt, in particular the assessment of its toxicity and of whether it is a command or a comment.
module_chat.prompt_classification: classifies the prompt as “visualisation” if the underlying request is the generation of a graph, or as “discussion” otherwise.
module_chat.is_prompt_in_scope: verifies whether the request in the prompt falls within the chatbot's scope and, depending on the request, fetches context to help generate an answer.
module_chat.prompt_rephrasing: performs two tasks: identifying whether memory of previous exchanges is needed, and rephrasing the prompt while accounting for that memory, if needed.
The output of most of these tasks depends on the output of the others, so a task later in the sequence may modify the output of a task performed earlier. Rather than calling all the prompt attributes again to check whether any has changed, one may simply call __repr__() on the prompt object to display all the major prompt attributes at once.
Now, everything is ready to generate the chatbot answer.
# Generate the answer
module_chat.generate_text_answer()
# Retrieve the answer
llm_answer = module_chat.chat.llm_answer
# Show the prompt & answer
print("- prompt :", prompt)
print("- answer :", llm_answer)
"""
output:
- prompt : Hello, how are you ?
- answer : I am DIVA, a platform and chatbot assistant created by the engineers of Mews Labs for the project DestinE (Destination Earth). My role is to answer questions about climate data and provide visual representations of evidence-based climate data. I support the variables or metrics of temperature, wind, pressure, and precipitation. My data covers most of Europe and is available in a time range between 1970 and 2039. I am unable to answer queries that are out of topic regarding my context information.
"""
Visualisation mode
Extracting the graph parameters from the user prompt can be done as follows:
# prompt creation
module_chat.create_user_prompt('Make a histogram of thedaily temperature in Paris in the summer 2023?')
# Processing of the prompt
module_chat.prompt_classification()
module_chat.is_prompt_in_scope()
module_chat.prompt_rephrasing()
# Display the main prompt attributes
print(module_chat.prompt.__repr__())
"""
output:
Prompt(original=Make a histogram of thedaily temperature in Paris in the summer 2023, rephrased="Create a histogram of the daily temperature in Paris during the summer of 2023.", type=visualisation, within_scope=True, toxicity=0, command=True, semantic_similarity=10, type_context=visualisation)
"""
# Generate the text answer
module_chat.generate_text_answer()
# Show the prompt & answer
print("- prompt :", module_chat.prompt.original)
print("- answer :", module_chat.chat.llm_answer)
"""
output:
- prompt : Make a histogram of thedaily temperature in Paris in the summer 2023
- answer : Let me provide a graph to help address your request.
"""
# Get the graph parameters from the prompt
if module_chat.chat.prompt.type == "visualisation":
module_config.prompt_to_config()
# display information related to the config
print(module_config.config)
"""
output:
{'starttime': '2023-06-21', 'endtime': '2023-09-21', 'location': 'Paris', 'climate_variable': 'temperature', 'graph_type': 'histogram', 'aggreg_type': 'day', 'agg_operator': 'mean'}
"""
print(module_config.config.__repr__())
"""
output:
Config(creation_time: 1.306, from user prompt: "Create a histogram of the daily temperature in Paris during the summer of 2023.", missings: [])
"""
# Access individual parameters
print("Location:", module_config.config.location) # 'Paris'
print("Start time:", module_config.config.start_time) # '2023-06-21'
print("End time:", module_config.config.end_time) # '2023-09-21'
print("Climate variable:", module_config.config.climate_variable) # 'temperature'
print("Graph type:", module_config.config.graph_type) # 'histogram'
print("Aggregation type:", module_config.config.aggregation_type) # 'day'
print("Aggregation operator:", module_config.config.aggregation_operator) # 'mean'
module_config.prompt_to_config: uses the LLM, keyword recognition, and NLP libraries to identify and extract the parameters from the user prompt. When building DIVA, we called the set of parameters used to generate a graph a config, hence the name of the function. The config attributes can then be written to a dictionary, for example with the function diva.tools.convert_to_dict_config; this dictionary is then used to generate the graph, as explained in a previous section.
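For example, a possible hand-off from the extracted config to graph generation might look as follows (the exact signature of convert_to_dict_config is an assumption):
from diva.tools import convert_to_dict_config
from diva.graphs.service_graph_generation import ServiceGeneratePlotlyGraph

# Convert the config object into the dictionary format expected by the graph class
# (assumed signature: takes the config object, returns a dict)
params = convert_to_dict_config(module_config.config)

# Generate and display the graph, as in the "Generating Graphs" section
gg = ServiceGeneratePlotlyGraph(params=params, langage="English")
gg.generate(show=True)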
Optional
All the tasks involving the LLM are managed internally by the chat and config modules, so the LLM never needs to be called explicitly. Yet some use cases, such as testing, may require calling the LLM directly. The code to do so is as follows:
from diva.llm import llms
generated_text = llms.generator("Hello, how are you ?")
print(generated_text)
"""
output:
I'm just a computer program, so I don't have feelings or emotions. I'm here to help you with any questions or problems you have. Is there something specific you'd like to talk about?
"""
llms is a module containing all the ready-to-use instances of the class Model. Each LLM is assigned an instance of the class Model, which loads the model, the tokenizer, and the usage pipeline. Calling an instance of Model runs the pipeline. Currently, most of DIVA's internal tasks are managed by a single text-generation Model: llms.generator. Note that using the LLM this way bypasses all the services implemented in the chat and config modules. For example, a typical answer to this prompt does not mention DIVA, because all the contexts assigned to the prompt in the chat module are bypassed.