XArray: the power of pandas for multidimensional arrays

  14,620 views

PyCon UK


Processing thousands of satellite images to understand air quality in the UK - it's efficient and easy with XArray
Robin Wilson
Monday 17th, 12:30 (Ferrier Hall)
A talk (25 minutes)
"I wish there was a way to easily manipulate this huge multi-dimensional array in Python...", I thought, as I stared at a huge chunk of satellite data on my laptop. The data was from a satellite measuring air quality - and I wanted to slice and dice the data in some supposedly simple ways. Using pure NumPy - the go-to library when the words 'multi-dimensional', 'array' and 'Python' are mentioned in the same sentence - was just such a pain. What I wished for was something like pandas - with datetime indexes, fancy ways of selecting subsets, group-by operations and so on - but something that would work with my huge multi-dimensional array.
The solution: XArray - a wonderful library which provides the power of pandas for multi-dimensional data. In this talk I will introduce the XArray library by showing how just a few lines of code can answer questions about my data that would take a lot of complex code to answer with pure NumPy - questions like 'What is the average air quality in March?', 'What is the time series of air quality in Southampton?' and 'What is the seasonal average air quality for each census output area?'.
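As a rough illustration of the kind of question listed above, the sketch below builds a small synthetic array (random numbers standing in for the satellite data, which is not reproduced here) and answers the first two questions plus a per-pixel seasonal average. The variable name `pm25`, the grid, and the approximate Southampton coordinates are assumptions made for this example, not details from the talk. The per-census-area average would also need area boundaries and is not shown.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the satellite data: one year of daily values on a
# small lat/lon grid. Names, grid and values are illustrative assumptions.
times = pd.date_range("2016-01-01", "2016-12-31", freq="D")
lats = np.linspace(50.0, 52.0, 5)
lons = np.linspace(-2.0, 0.0, 5)
rng = np.random.default_rng(0)
values = rng.random((len(times), len(lats), len(lons)))

pm25 = xr.DataArray(
    values,
    coords={"time": times, "lat": lats, "lon": lons},
    dims=("time", "lat", "lon"),
    name="pm25",
)

# 'What is the average air quality in March?'
# A partial date string selects every day in that month.
march_mean = pm25.sel(time="2016-03").mean("time")

# 'What is the time series of air quality in Southampton?'
# Pick the grid cell nearest to roughly 50.9 N, 1.4 W.
southampton = pm25.sel(lat=50.9, lon=-1.4, method="nearest")

# Seasonal average for each pixel (DJF, MAM, JJA, SON).
seasonal = pm25.groupby("time.season").mean("time")

print(march_mean.shape, southampton.shape, seasonal.shape)
```

With pure NumPy, each of these would need manual bookkeeping of which axis is time and which integer positions correspond to March or to a given location. Here the labels carry that information.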
After demonstrating how these questions can be answered easily with XArray, I will introduce the fundamental XArray data types, and show how indexes can be added to raw arrays to fully utilise the power of XArray. I will discuss how to get data in and out of XArray, and how XArray can use dask for high-performance data processing on multiple cores, or distributed across multiple machines. Finally I will leave you with a taster of some of the advanced features of XArray - including seamless access to data via the internet using OPeNDAP, complex apply functions, and XArray extension libraries.
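The pieces mentioned above can be sketched in a few lines: wrapping a raw NumPy array in a `DataArray`, attaching indexes to it, grouping variables in a `Dataset`, moving data to and from pandas, and optionally chunking with dask. This is a minimal sketch under assumed names (`pm25`, `high`) and a toy 4x3x2 array, not the code shown in the talk. The dask step is skipped if dask is not installed.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A raw array with no labels: 4 time steps, 3 latitudes, 2 longitudes.
raw = np.arange(24, dtype=float).reshape(4, 3, 2)

# DataArray: one labelled N-dimensional array. Only dimension names so far.
da = xr.DataArray(raw, dims=("time", "lat", "lon"), name="pm25")

# Add indexes (coordinates) so that label-based selection works.
da = da.assign_coords(
    time=pd.date_range("2016-03-01", periods=4, freq="D"),
    lat=[50.0, 51.0, 52.0],
    lon=[-2.0, -1.0],
)
point = da.sel(lat=51.0, lon=-1.0)  # time series at one grid cell

# Dataset: a dict-like collection of DataArrays sharing coordinates.
ds = xr.Dataset({"pm25": da, "high": da > 10})

# Getting data out to pandas and back in again.
df = ds.to_dataframe()
back = df.to_xarray()

# Optional: chunk with dask so computations run lazily and in parallel.
try:
    import dask  # noqa: F401
    time_mean = ds["pm25"].chunk({"time": 2}).mean("time").compute()
except ImportError:
    time_mean = ds["pm25"].mean("time")

print(point.values, df.shape, time_mean.shape)
```

The chunk size of 2 is arbitrary here. For real data it would be chosen so that each chunk fits comfortably in memory.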
The speaker suggested this session is suitable for data scientists.
