tobac example: Compute bulk statistics during segmentation

This example shows how to derive some basic statistics for the segmented precipitation features associated with isolated deep convective clouds, using the same data as in our precipitation tracking example. As usual, the segmentation step is performed on the output of the feature detection, but this time we also request bulk statistics to be calculated for the segmented features.

[1]:
# Import libraries
import iris
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import shutil
import urllib.request
from pathlib import Path
%matplotlib inline
[2]:
# Disable a few warnings:
import warnings
warnings.filterwarnings('ignore', category=UserWarning, append=True)
warnings.filterwarnings('ignore', category=RuntimeWarning, append=True)
warnings.filterwarnings('ignore', category=FutureWarning, append=True)
warnings.filterwarnings('ignore',category=pd.io.pytables.PerformanceWarning)
[3]:
# Import tobac itself
import tobac
print('using tobac version', str(tobac.__version__))
using tobac version 1.5.3
[4]:
data_out=Path('../')
# Download the data: This only has to be done once for all tobac examples and can take a while
data_file = list(data_out.rglob('data/Example_input_Precip.nc'))
if len(data_file) == 0:
    file_path='https://zenodo.org/records/3195910/files/climate-processes/tobac_example_data-v1.0.1.zip'
    tempfile=Path('temp.zip')
    print('start downloading data')
    request=urllib.request.urlretrieve(file_path, tempfile)
    print('start extracting data')
    shutil.unpack_archive(tempfile, data_out)
    tempfile.unlink()
    print('data extracted')
    data_file = list(data_out.rglob('data/Example_input_Precip.nc'))
[5]:
#Set up directory to save output:
savedir=Path("Save")
if not savedir.is_dir():
    savedir.mkdir()
plot_dir=Path("Plot")
if not plot_dir.is_dir():
    plot_dir.mkdir()
[6]:
Precip=iris.load_cube(str(data_file[0]),'surface_precipitation_average')

Feature detection

[7]:
parameters_features={}
parameters_features['position_threshold']='weighted_diff'
parameters_features['sigma_threshold']=1
parameters_features['min_distance']=0
parameters_features['threshold']=[1,2,3,4,5,10,15] #mm/h
parameters_features['n_erosion_threshold']=0
parameters_features['n_min_threshold']=3

# get temporal and spatial resolution of the data
dxy,dt=tobac.get_spacings(Precip)
[8]:
# Feature detection based on the surface precipitation field and a range of thresholds
print('starting feature detection based on multiple thresholds')
Features= tobac.feature_detection_multithreshold(Precip,dxy,**parameters_features)
print('feature detection done')
Features.to_hdf(savedir / 'Features.h5','table')
print('features saved')
starting feature detection based on multiple thresholds
feature detection done
features saved

Segmentation with bulk statistics

Segmentation is performed based on a watershedding technique and a threshold value. The requested statistics are calculated for each individual feature region and added to the feature dataframe returned by the segmentation step. You decide which statistics to calculate by providing a dictionary whose keys are the metric names (these become the names of the columns added to the dataframe) and whose values are functions. Note that it is also possible to provide input parameters to these functions.
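To make the mechanism concrete, here is a minimal sketch of what "calculated for each individual feature region" means, using a small hypothetical field and label mask (not the real tobac internals): each statistic function is applied to the field values belonging to one labeled region.

```python
import numpy as np

# Hypothetical 2D precipitation field and segmentation mask
# (label 0 = background, labels 1 and 2 = two feature regions).
field = np.array([[0.0, 2.0, 3.0],
                  [1.0, 0.0, 4.0],
                  [5.0, 6.0, 0.0]])
mask = np.array([[0, 1, 1],
                 [1, 0, 2],
                 [2, 2, 0]])

# Same shape of dictionary as used for the statistics below:
stats = {'mean_precip': np.mean, 'max_precip': np.max}

# Apply every statistic to the field values of each labeled region:
rows = {}
for label in [1, 2]:
    values = field[mask == label]
    rows[label] = {name: func(values) for name, func in stats.items()}

print(rows[1])  # statistics for region 1
print(rows[2])  # statistics for region 2
```

Each entry of `rows` corresponds to one row (one feature) in the output dataframe, with one column per dictionary key.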

Setting parameters for segmentation:

[9]:
# Dictionary containing keyword arguments for segmentation step:
parameters_segmentation={}
parameters_segmentation['method']='watershed'
parameters_segmentation['threshold']=1  # mm/h

# get temporal and spatial resolution of the data
dxy,dt=tobac.get_spacings(Precip)

Defining the dictionary for the statistics to be calculated

[10]:
statistics = {}
statistics['mean_precip'] = np.mean
statistics['total_precip'] = np.sum
statistics['max_precip'] = np.max

Some functions need additional input parameters, e.g. np.percentile(). These can be provided as keyword arguments in the form of a dictionary. So instead of the function alone, you can provide a tuple containing both the function and its respective input parameters:

[11]:
statistics['percentiles'] = (np.percentile, {'q': [95,99]})
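To illustrate the tuple convention, the sketch below unpacks it and applies the function by hand to illustrative data (the values 1 to 100, not the real feature values): the function is called with the region's values as the first argument and the dictionary entries as keyword arguments.

```python
import numpy as np

# Unpack the (function, kwargs) tuple and call it as func(values, **kwargs):
func, kwargs = (np.percentile, {'q': [95, 99]})
values = np.arange(1, 101)  # illustrative data: 1..100
result = func(values, **kwargs)
print(result)  # 95th and 99th percentile of the sample
```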
[12]:
# Perform Segmentation and save resulting mask to NetCDF file:
print('Starting segmentation based on surface precipitation')
Mask,Features_Precip=tobac.segmentation_2D(Features,Precip,dxy,**parameters_segmentation, statistic=statistics)
print('segmentation based on surface precipitation performed, start saving results to files')
iris.save([Mask], savedir / 'Mask_Segmentation_precip.nc', zlib=True, complevel=4)
Features_Precip.to_hdf(savedir / 'Features_Precip.h5', 'table')
print('segmentation surface precipitation performed and saved')
Starting segmentation based on surface precipitation
segmentation based on surface precipitation performed, start saving results to files
segmentation surface precipitation performed and saved

Look at the output:

[13]:
Features_Precip.head()
[13]:
frame idx hdim_1 hdim_2 num threshold_value feature time timestr south_north ... y latitude longitude projection_x_coordinate x ncells mean_precip total_precip max_precip percentiles
0 0 1 50.065727 139.857477 9 1 1 2013-06-19 20:05:00 2013-06-19 20:05:00 331.065727 ... 331.065727 29.846362 -94.172015 210678.738492 420.857477 10 1.629695 16.296951 2.289786 ([2.221776068210602, 2.276183712482452],)
1 0 15 120.527119 172.500325 4 1 2 2013-06-19 20:05:00 2013-06-19 20:05:00 401.527119 ... 401.527119 30.166929 -93.996892 227000.162468 453.500325 10 1.409547 14.095468 1.819811 ([1.8030404090881347, 1.8164567756652832],)
2 0 18 126.779273 145.368401 15 1 3 2013-06-19 20:05:00 2013-06-19 20:05:00 407.779273 ... 407.779273 30.196499 -94.139960 213434.200454 426.368401 11 2.441526 26.856783 3.771701 ([3.710712432861328, 3.759503173828125],)
3 0 34 111.611369 155.452030 4 2 4 2013-06-19 20:05:00 2013-06-19 20:05:00 392.611369 ... 392.611369 30.126871 -94.087317 218476.015240 436.452030 19 1.938501 36.831512 4.067666 ([3.940941762924194, 4.042321195602417],)
4 0 35 111.765231 164.938866 8 2 5 2013-06-19 20:05:00 2013-06-19 20:05:00 392.765231 ... 392.765231 30.127221 -94.037226 223219.433218 445.938866 20 2.486886 49.737709 4.380943 ([4.087516045570374, 4.3222578477859495],)

5 rows × 22 columns
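The new columns can be sanity-checked against each other. For example, for any region the sum equals the mean times the cell count; the sketch below verifies this with the (rounded) values printed in the first row of the table above.

```python
# Values copied from the first row of the output above (rounded to the
# printed precision), so the check only holds to within rounding error:
mean_precip, ncells, total_precip = 1.629695, 10, 16.296951

discrepancy = abs(mean_precip * ncells - total_precip)
print(discrepancy)  # small: total_precip == mean_precip * ncells up to rounding
```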
