Environmental impact
Digital activity, especially when it involves Artificial Intelligence, is a non-negligible source of greenhouse gas emissions because of its high energy demand. Moreover, computer components in cloud servers, such as CPUs and GPUs, need to be cooled, which consumes water. To give users of DIVA a rough idea of the environmental impact of their queries, DIVA displays estimated metrics for:
energy consumption
CO2 emissions
water consumption
Estimation process
Unfortunately, precisely calculating the energy consumption of a cloud virtual machine is complex, if not impossible, from the customer's side. Furthermore, the physical components are often shared between different customers, which complicates the calculation even more.
What we know:
physical CPU: We can retrieve the number of cores with the psutil library. We also know that the data center hosting DIVA likely uses an Intel Xeon CPU. We can also measure the usage percentage of the CPU with psutil.
RAM: We know the amount of RAM of our server, and we can use psutil to retrieve this information. The energy consumption of the RAM depends on the level of activity of the CPU.
GPU: Nvidia provides the NVML library for monitoring GPU activity. pyJoules, another library, relies on NVML to measure the energy consumption of the Nvidia GPU.
GHG (greenhouse gases): The data center hosting DIVA provides information about its energy efficiency, so we can convert the energy consumption into CO2 emissions.
water: The data center hosting DIVA provides information about its water usage effectiveness (WUE), so we can estimate how much water is needed in proportion to the energy used.
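The last two conversions can be sketched as follows. This is only an illustration: it assumes that the data center's figures boil down to a carbon intensity in gCO2e per kWh and a WUE in litres per kWh, and both factor values below are placeholders, not the real figures of the data center hosting DIVA. The function names are hypothetical as well.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6e6 J


def joules_to_kwh(energy_j: float) -> float:
    """Convert an energy in joules to kilowatt-hours."""
    return energy_j / JOULES_PER_KWH


def co2_grams(energy_j: float, carbon_intensity_g_per_kwh: float) -> float:
    """Estimated emissions in grams of CO2 equivalent."""
    return joules_to_kwh(energy_j) * carbon_intensity_g_per_kwh


def water_litres(energy_j: float, wue_l_per_kwh: float) -> float:
    """Estimated water use in litres, proportional to the energy used."""
    return joules_to_kwh(energy_j) * wue_l_per_kwh


# Placeholder factors (assumptions, NOT the data center's real values).
EXAMPLE_CARBON_INTENSITY = 100.0  # gCO2e per kWh
EXAMPLE_WUE = 0.5                 # litres per kWh

energy_j = 3_600.0  # 3.6 kJ, i.e. 0.001 kWh
print(co2_grams(energy_j, EXAMPLE_CARBON_INTENSITY), "g CO2e")
print(water_litres(energy_j, EXAMPLE_WUE), "L of water")
```

Both estimates scale linearly with the energy, so any error in the energy estimate carries over unchanged to the CO2 and water figures.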
What we don’t know:
physical CPU: the exact energy consumption, unlike for the GPU. Intel processors can support RAPL, which makes it possible to measure the CPU energy consumption. However, cloud servers do not always allow such measurements.
We cannot measure the energy consumption of the CPU, but we can still estimate it by other means.
To estimate the energy consumption of the CPU, we rely on the dataset provided here: https://docs.google.com/spreadsheets/d/1DqYgQnEDLQVQm5acMAhLgHLD8xXCG9BIrk-_Nv6jF3k/edit?gid=985503428#gid=985503428 . This dataset contains a list of physical CPUs, mostly Intel Xeon, with different numbers of cores. The Thermal Design Power (TDP) of each CPU, an indicator of the power it uses and dissipates as heat, is also specified in the dataset and can be used as a proxy for energy consumption per unit of time (1 W = 1 J/s). More precisely, the TDP is given in watts (W) for the lower state of activity of the CPU (SL1). The dataset also lists the power in watts associated with CPU usage percentages from 0 to 100.
Estimating CPU energy consumption
With these data, we can:
calculate an average TDP multiplier for each percentage of CPU usage mentioned in the dataset. The result is stored in diva/data/cpu_watt_grid.csv.
infer the TDP of the server's CPU by comparison with the TDP of the CPUs in the dataset that have a similar number of cores.
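The two steps above might look like the sketch below. It rests on assumptions that the text does not confirm: that the multiplier is the ratio of the listed power to the TDP, and that "similar number of cores" means the closest core count in the dataset. The rows in SAMPLE_CPUS are invented for illustration and are not taken from the real dataset; the function names are hypothetical.

```python
from statistics import mean

# Invented rows: (cores, TDP in W, {CPU usage %: power in W}).
SAMPLE_CPUS = [
    (16, 100.0, {0: 10.0, 50: 70.0, 100: 100.0}),
    (16, 120.0, {0: 18.0, 50: 90.0, 100: 132.0}),
    (32, 200.0, {0: 30.0, 50: 150.0, 100: 210.0}),
]


def average_multipliers(cpus):
    """Average power/TDP ratio for each usage percentage, across all CPUs."""
    ratios = {}
    for _cores, tdp, power_by_usage in cpus:
        for usage, watts in power_by_usage.items():
            ratios.setdefault(usage, []).append(watts / tdp)
    return {usage: mean(values) for usage, values in sorted(ratios.items())}


def infer_tdp(cpus, server_cores):
    """Mean TDP of the dataset CPUs whose core count is closest to the server's."""
    best_gap = min(abs(cores - server_cores) for cores, _tdp, _power in cpus)
    return mean(
        tdp for cores, tdp, _power in cpus
        if abs(cores - server_cores) == best_gap
    )


print(average_multipliers(SAMPLE_CPUS))
print(infer_tdp(SAMPLE_CPUS, server_cores=16))
```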
With psutil, we get the average percentage of CPU usage during the processing of the query.
The rough estimate of the energy consumption is simply the product of the estimated TDP in watts, the duration in seconds, and the TDP multiplier corresponding to the closest percentage of CPU usage.
CPU ENERGY (J) = DURATION (s) * TDP (W) * TDP MULTIPLIER BASED ON % OF CPU USED
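A minimal sketch of this formula is given below. The grid values are placeholders rather than the contents of diva/data/cpu_watt_grid.csv, whose exact layout is not described here, and the function names are hypothetical.

```python
# Placeholder grid (assumption): CPU usage (%) -> TDP multiplier.
CPU_GRID = {0: 0.12, 10: 0.32, 50: 0.75, 100: 1.02}


def closest_level(grid, usage_percent):
    """Return the grid key nearest to the measured usage percentage.

    On a tie, the key that appears first in the grid wins.
    """
    return min(grid, key=lambda level: abs(level - usage_percent))


def cpu_energy_joules(duration_s, tdp_w, usage_percent, grid=CPU_GRID):
    """CPU ENERGY (J) = DURATION (s) * TDP (W) * TDP MULTIPLIER."""
    return duration_s * tdp_w * grid[closest_level(grid, usage_percent)]


# A 10 s query at 48 % average usage on a CPU with an estimated TDP of 100 W.
print(cpu_energy_joules(duration_s=10, tdp_w=100, usage_percent=48))
```

Snapping to the closest grid point keeps the lookup simple; interpolating between neighbouring points would be an alternative, but it is not what the text describes.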
Estimating RAM energy consumption
The dataset gives estimates of the RAM power consumption in W/GB in four scenarios: CPU inactive (idle), low CPU activity, medium CPU activity and high CPU activity. We assigned a percentage of CPU usage to each level of activity found in the dataset: inactive: 0%, low: 10%, medium: 50%, high: 100% (result stored in diva/data/ram_watt_grid.csv). To estimate the RAM energy consumption, we multiply the total RAM size in GB by the duration in seconds and by the coefficient corresponding to the closest percentage of CPU usage.
RAM ENERGY (J) = RAM SIZE (GB) * DURATION (s) * COEFFICIENT BASED ON % OF CPU USED (W/GB)
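The same lookup applies to the RAM, as the sketch below shows. The four usage levels come from the text, but the W/GB coefficients are placeholders and not the values stored in diva/data/ram_watt_grid.csv; the function name is hypothetical.

```python
# Usage levels from the text; W/GB coefficients are placeholders (assumption).
RAM_GRID = {0: 0.1, 10: 0.2, 50: 0.3, 100: 0.4}


def ram_energy_joules(ram_gb, duration_s, usage_percent, grid=RAM_GRID):
    """RAM ENERGY (J) = RAM SIZE (GB) * DURATION (s) * COEFFICIENT (W/GB)."""
    level = min(grid, key=lambda p: abs(p - usage_percent))
    return ram_gb * duration_s * grid[level]


# 32 GB of RAM, 10 s query, 80 % average CPU usage (closest level: 100 %).
print(ram_energy_joules(ram_gb=32, duration_s=10, usage_percent=80))
```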
Estimating GPU energy consumption
The GPU energy consumption is measured by pyJoules, which seems to return values in mJ.
Total energy consumption
We approximate the energy consumption for a query as the sum of the energy consumption of the CPU, RAM and GPU.
TOTAL ENERGY (J) = RAM ENERGY (J) + CPU ENERGY (J) + GPU ENERGY (J)
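As a sketch, the sum can be written as below, assuming the GPU reading really is in millijoules and therefore has to be converted before it is added to the CPU and RAM terms. The function name and the example figures are illustrative only.

```python
def total_energy_joules(cpu_j, ram_j, gpu_mj):
    """TOTAL ENERGY (J) = RAM ENERGY (J) + CPU ENERGY (J) + GPU ENERGY (J).

    gpu_mj is assumed to be in millijoules, hence the division by 1000.
    """
    return ram_j + cpu_j + gpu_mj / 1000.0


# Illustrative values: 750 J (CPU), 128 J (RAM), 1,500,000 mJ (GPU).
print(total_energy_joules(cpu_j=750.0, ram_j=128.0, gpu_mj=1_500_000.0))
```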
Other hardware components need energy, but we decided to focus only on these three energy-intensive components because estimating their consumption is comparatively feasible. The purpose of the displayed metrics is not to be perfectly precise, but to give an estimate of the environmental cost of using DIVA, and to raise awareness about costs that are often hidden from the user.
To be fair, since DIVA is deployed regardless of the number of users logged in or queries sent, we subtract the idle-state consumption from the estimated result, so as to report only the additional energy and environmental cost of sending queries.
Overall, this means that our estimations are most likely lower than the true energetic and environmental impact.
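The idle correction could be sketched as follows. How the idle baseline is obtained is not detailed here, so this example assumes it is an idle power in watts multiplied by the query duration, and it clamps the result at zero so that a noisy estimate never yields a negative cost. Both choices, the function name, and the numbers are assumptions.

```python
def additional_energy_joules(total_j, idle_power_w, duration_s):
    """Energy attributable to the query, beyond what an idle server would use."""
    idle_j = idle_power_w * duration_s  # assumed idle baseline over the query
    return max(0.0, total_j - idle_j)   # clamp: never report a negative cost


# Illustrative values: 2378 J estimated in total, 50 W at idle, 10 s query.
print(additional_energy_joules(total_j=2378.0, idle_power_w=50.0, duration_s=10.0))
```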
Broader perspective
A query to DIVA typically uses between 2 kJ and 3 kJ (idle consumption not subtracted). Graph generation tends to consume less energy than text generation because text generation requires more LLM usage.
A query to ChatGPT-4 uses 0.001 to 0.01 kWh (1 to 10 Wh) on average, i.e. 3.6 to 36 kJ (https://www.livemint.com/mint-lounge/ideas/ai-carbon-footprint-openai-chatgpt-water-google-microsoft-111697802189371.html )
A search on Google uses 0.0003 kWh (0.3 Wh) on average, i.e. 1.08 kJ (data from 2009: https://googleblog.blogspot.com/2009/01/powering-google-search.html )
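To compare these figures on a common scale, watt-hours can be converted to kilojoules (1 Wh = 3600 J = 3.6 kJ). The short sketch below applies that conversion to the published figures quoted above; the helper name is hypothetical.

```python
KJ_PER_WH = 3.6  # 1 Wh = 3600 J = 3.6 kJ


def wh_to_kj(energy_wh):
    """Convert an energy in watt-hours to kilojoules."""
    return energy_wh * KJ_PER_WH


# Published per-query figures quoted above, in Wh.
for label, energy_wh in [
    ("ChatGPT-4 (low)", 1.0),
    ("ChatGPT-4 (high)", 10.0),
    ("Google search (2009)", 0.3),
]:
    print(f"{label}: {wh_to_kj(energy_wh):.2f} kJ")
```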