tobac example: Compute bulk statistics during segmentation

This example shows how to derive some basic statistics for the segmented precipitation features associated with isolated deep convective clouds, using the same data as in our precipitation tracking example. As usual, the segmentation step is performed on the output of the feature detection, but this time we also request bulk statistics to be calculated for the segmented features.

[1]:
# Import libraries
import iris
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import shutil
import urllib.request
from pathlib import Path
%matplotlib inline
[2]:
# Disable a few warnings:
import warnings
warnings.filterwarnings('ignore', category=UserWarning, append=True)
warnings.filterwarnings('ignore', category=RuntimeWarning, append=True)
warnings.filterwarnings('ignore', category=FutureWarning, append=True)
warnings.filterwarnings('ignore',category=pd.io.pytables.PerformanceWarning)
[3]:
# Import tobac itself
import tobac
print('using tobac version', str(tobac.__version__))
using tobac version 1.5.3
[4]:
data_out=Path('../')
# Download the data: This only has to be done once for all tobac examples and can take a while
data_file = list(data_out.rglob('data/Example_input_Precip.nc'))
if len(data_file) == 0:
    file_path='https://zenodo.org/records/3195910/files/climate-processes/tobac_example_data-v1.0.1.zip'
    tempfile=Path('temp.zip')
    print('start downloading data')
    request=urllib.request.urlretrieve(file_path, tempfile)
    print('start extracting data')
    shutil.unpack_archive(tempfile, data_out)
    tempfile.unlink()
    print('data extracted')
    data_file = list(data_out.rglob('data/Example_input_Precip.nc'))
[5]:
#Set up directory to save output:
savedir=Path("Save")
if not savedir.is_dir():
    savedir.mkdir()
plot_dir=Path("Plot")
if not plot_dir.is_dir():
    plot_dir.mkdir()
[6]:
Precip=iris.load_cube(str(data_file[0]),'surface_precipitation_average')

Feature detection

[7]:
parameters_features={}
parameters_features['position_threshold']='weighted_diff'
parameters_features['sigma_threshold']=1
parameters_features['min_distance']=0
parameters_features['threshold']=[1,2,3,4,5,10,15] #mm/h
parameters_features['n_erosion_threshold']=0
parameters_features['n_min_threshold']=3

# get temporal and spatial resolution of the data
dxy,dt=tobac.get_spacings(Precip)
[8]:
# Feature detection based on the surface precipitation field and a range of thresholds
print('starting feature detection based on multiple thresholds')
Features= tobac.feature_detection_multithreshold(Precip,dxy,**parameters_features)
print('feature detection done')
Features.to_hdf(savedir / 'Features.h5','table')
print('features saved')
starting feature detection based on multiple thresholds
feature detection done
features saved

Segmentation with bulk statistics

Segmentation is performed based on a watershedding technique and a threshold value. The requested statistics are calculated for each individual feature region and added to the feature dataframe returned by the segmentation step. You decide which statistics to calculate by providing a dictionary whose keys are the metric names (these become the names of the columns added to the dataframe) and whose values are functions. Note that it is also possible to provide input parameters to these functions.
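To make the mechanism concrete, here is a minimal sketch of what "calculated for each individual feature region" means, using a small hypothetical field and label mask (not the real tobac internals): each statistic function is applied to the field values belonging to one labeled region.

```python
import numpy as np

# Hypothetical 2D precipitation field and segmentation mask
# (label 0 = background, labels 1 and 2 = two feature regions).
field = np.array([[0.0, 2.0, 3.0],
                  [1.0, 0.0, 4.0],
                  [5.0, 6.0, 0.0]])
mask = np.array([[0, 1, 1],
                 [1, 0, 2],
                 [2, 2, 0]])

# Same shape of dictionary as used for the statistics below:
stats = {'mean_precip': np.mean, 'max_precip': np.max}

# Apply every statistic to the field values of each labeled region:
rows = {}
for label in [1, 2]:
    values = field[mask == label]
    rows[label] = {name: func(values) for name, func in stats.items()}

print(rows[1])  # statistics for region 1
print(rows[2])  # statistics for region 2
```

Each entry of `rows` corresponds to one row (one feature) in the output dataframe, with one column per dictionary key.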

Setting parameters for segmentation:

[9]:
# Dictionary containing keyword arguments for segmentation step:
parameters_segmentation={}
parameters_segmentation['method']='watershed'
parameters_segmentation['threshold']=1  # mm/h

# get temporal and spatial resolution of the data
dxy,dt=tobac.get_spacings(Precip)

Defining the dictionary for the statistics to be calculated

[10]:
statistics = {}
statistics['mean_precip'] = np.mean
statistics['total_precip'] = np.sum
statistics['max_precip'] = np.max

Some functions need additional input parameters, e.g. np.percentile(). These can be provided as keyword arguments in the form of a dictionary. So instead of the function alone, you can provide a tuple containing both the function and its respective input parameters:

[11]:
statistics['percentiles'] = (np.percentile, {'q': [95,99]})
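To illustrate the tuple convention, the sketch below unpacks it and applies the function by hand to illustrative data (the values 1 to 100, not the real feature values): the function is called with the region's values as the first argument and the dictionary entries as keyword arguments.

```python
import numpy as np

# Unpack the (function, kwargs) tuple and call it as func(values, **kwargs):
func, kwargs = (np.percentile, {'q': [95, 99]})
values = np.arange(1, 101)  # illustrative data: 1..100
result = func(values, **kwargs)
print(result)  # 95th and 99th percentile of the sample
```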
[12]:
# Perform Segmentation and save resulting mask to NetCDF file:
print('Starting segmentation based on surface precipitation')
Mask,Features_Precip=tobac.segmentation_2D(Features,Precip,dxy,**parameters_segmentation, statistic=statistics)
print('segmentation based on surface precipitation performed, start saving results to files')
iris.save([Mask], savedir / 'Mask_Segmentation_precip.nc', zlib=True, complevel=4)
Features_Precip.to_hdf(savedir / 'Features_Precip.h5', 'table')
print('segmentation surface precipitation performed and saved')
Starting segmentation based on surface precipitation
segmentation based on surface precipitation performed, start saving results to files
segmentation surface precipitation performed and saved

Look at the output:

[13]:
Features_Precip.head()
[13]:
frame idx hdim_1 hdim_2 num threshold_value feature time timestr south_north ... y latitude longitude projection_x_coordinate x ncells mean_precip total_precip max_precip percentiles
0 0 1 50.065727 139.857477 9 1 1 2013-06-19 20:05:00 2013-06-19 20:05:00 331.065727 ... 331.065727 29.846362 -94.172015 210678.738492 420.857477 10 1.629695 16.296951 2.289786 ([2.221776068210602, 2.276183712482452],)
1 0 15 120.527119 172.500325 4 1 2 2013-06-19 20:05:00 2013-06-19 20:05:00 401.527119 ... 401.527119 30.166929 -93.996892 227000.162468 453.500325 10 1.409547 14.095468 1.819811 ([1.8030404090881347, 1.8164567756652832],)
2 0 18 126.779273 145.368401 15 1 3 2013-06-19 20:05:00 2013-06-19 20:05:00 407.779273 ... 407.779273 30.196499 -94.139960 213434.200454 426.368401 11 2.441526 26.856783 3.771701 ([3.710712432861328, 3.759503173828125],)
3 0 34 111.611369 155.452030 4 2 4 2013-06-19 20:05:00 2013-06-19 20:05:00 392.611369 ... 392.611369 30.126871 -94.087317 218476.015240 436.452030 19 1.938501 36.831512 4.067666 ([3.940941762924194, 4.042321195602417],)
4 0 35 111.765231 164.938866 8 2 5 2013-06-19 20:05:00 2013-06-19 20:05:00 392.765231 ... 392.765231 30.127221 -94.037226 223219.433218 445.938866 20 2.486886 49.737709 4.380943 ([4.087516045570374, 4.3222578477859495],)

5 rows × 22 columns
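The new columns can be sanity-checked against each other. For example, for any region the sum equals the mean times the cell count; the sketch below verifies this with the (rounded) values printed in the first row of the table above.

```python
# Values copied from the first row of the output above (rounded to the
# printed precision), so the check only holds to within rounding error:
mean_precip, ncells, total_precip = 1.629695, 10, 16.296951

discrepancy = abs(mean_precip * ncells - total_precip)
print(discrepancy)  # small: total_precip == mean_precip * ncells up to rounding
```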
