tobac example: Compute bulk statistics during segmentation
This example shows how to derive some basic statistics for the segmented precipitation features associated with isolated deep convective clouds, using the same data as in our example for precipitation tracking. As usual, we perform the segmentation step using the output from the feature detection, but in addition we request some statistics to be calculated for the segmented features.
[1]:
# Import libraries
import iris
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import shutil
from six.moves import urllib
from pathlib import Path
%matplotlib inline
[2]:
# Disable a few warnings:
import warnings
warnings.filterwarnings('ignore', category=UserWarning, append=True)
warnings.filterwarnings('ignore', category=RuntimeWarning, append=True)
warnings.filterwarnings('ignore', category=FutureWarning, append=True)
warnings.filterwarnings('ignore', category=pd.io.pytables.PerformanceWarning)
[3]:
# Import tobac itself
import tobac
print('using tobac version', str(tobac.__version__))
using tobac version 1.5.3
[4]:
data_out=Path('../')
# Download the data: This only has to be done once for all tobac examples and can take a while
data_file = list(data_out.rglob('data/Example_input_Precip.nc'))
if len(data_file) == 0:
file_path='https://zenodo.org/records/3195910/files/climate-processes/tobac_example_data-v1.0.1.zip'
tempfile=Path('temp.zip')
print('start downloading data')
request=urllib.request.urlretrieve(file_path, tempfile)
print('start extracting data')
shutil.unpack_archive(tempfile, data_out)
tempfile.unlink()
print('data extracted')
data_file = list(data_out.rglob('data/Example_input_Precip.nc'))
[5]:
#Set up directory to save output:
savedir=Path("Save")
if not savedir.is_dir():
savedir.mkdir()
plot_dir=Path("Plot")
if not plot_dir.is_dir():
plot_dir.mkdir()
[6]:
Precip=iris.load_cube(str(data_file[0]),'surface_precipitation_average')
Feature detection
[7]:
parameters_features={}
parameters_features['position_threshold']='weighted_diff'
parameters_features['sigma_threshold']=1
parameters_features['min_distance']=0
parameters_features['threshold']=[1,2,3,4,5,10,15] #mm/h
parameters_features['n_erosion_threshold']=0
parameters_features['n_min_threshold']=3
# get temporal and spatial resolution of the data
dxy,dt=tobac.get_spacings(Precip)
[8]:
# Feature detection based on the surface precipitation field and a range of thresholds
print('starting feature detection based on multiple thresholds')
Features= tobac.feature_detection_multithreshold(Precip,dxy,**parameters_features)
print('feature detection done')
Features.to_hdf(savedir / 'Features.h5','table')
print('features saved')
starting feature detection based on multiple thresholds
feature detection done
features saved
Segmentation with bulk statistics
Segmentation is performed based on watershedding and a threshold value. The statistics are calculated for each individual feature region and added to the feature output dataframe from the segmentation process. You can decide which statistics to calculate by providing a dictionary with the names of the metrics as keys (these become the names of the columns added to the dataframe) and functions as values. Note that it is also possible to provide input parameters to these functions.
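Conceptually, each function in the dictionary is applied to the field values belonging to one feature region, and the return value is stored in the corresponding column. The following sketch illustrates this idea with a made-up 3×3 field and mask; it is a simplified illustration, not tobac's actual implementation:

```python
import numpy as np

# Made-up miniature example: a 2D precipitation field (mm/h) and a
# segmentation mask in which the value 1 marks the cells of one feature
precip = np.array([[0.2, 1.5, 2.0],
                   [0.1, 3.0, 2.5],
                   [0.0, 0.4, 0.3]])
mask = np.array([[0, 1, 1],
                 [0, 1, 1],
                 [0, 0, 0]])

statistics = {'mean_precip': np.mean,
              'total_precip': np.sum,
              'max_precip': np.max}

# Apply each statistic to the values inside the feature region
feature_values = precip[mask == 1]
results = {name: func(feature_values) for name, func in statistics.items()}
print(results)
```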
Setting parameters for segmentation:
[9]:
# Dictionary containing keyword arguments for segmentation step:
parameters_segmentation={}
parameters_segmentation['method']='watershed'
parameters_segmentation['threshold']=1 # mm/h
# get temporal and spatial resolution of the data
dxy,dt=tobac.get_spacings(Precip)
Defining the dictionary for the statistics to be calculated
[10]:
statistics = {}
statistics['mean_precip'] = np.mean
statistics['total_precip'] = np.sum
statistics['max_precip'] = np.max
For some functions, we need to provide additional input parameters, e.g. np.percentile(). These can be provided as keyword arguments in the form of a dictionary. So instead of the function alone, you can provide a tuple containing both the function and its respective input parameters:
[11]:
statistics['percentiles'] = (np.percentile, {'q': [95,99]})
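Such a tuple can simply be unpacked and the function called on a feature's values with the stored keyword arguments. A minimal sketch of this pattern, using made-up example data (not tobac's actual internals):

```python
import numpy as np

# The (function, kwargs) tuple as stored in the statistics dictionary
func, kwargs = (np.percentile, {'q': [95, 99]})

# Call the function on some example values, unpacking the stored kwargs
values = np.arange(1, 101)          # made-up data: 1..100
result = func(values, **kwargs)
print(result)
```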
[12]:
# Perform Segmentation and save resulting mask to NetCDF file:
print('Starting segmentation based on surface precipitation')
Mask,Features_Precip=tobac.segmentation_2D(Features,Precip,dxy,**parameters_segmentation, statistic=statistics)
print('segmentation based on surface precipitation performed, start saving results to files')
iris.save([Mask], savedir / 'Mask_Segmentation_precip.nc', zlib=True, complevel=4)
Features_Precip.to_hdf(savedir / 'Features_Precip.h5', 'table')
print('segmentation surface precipitation performed and saved')
Starting segmentation based on surface precipitation
segmentation based on surface precipitation performed, start saving results to files
segmentation surface precipitation performed and saved
Look at the output:
[13]:
Features_Precip.head()
[13]:
| | frame | idx | hdim_1 | hdim_2 | num | threshold_value | feature | time | timestr | south_north | ... | y | latitude | longitude | projection_x_coordinate | x | ncells | mean_precip | total_precip | max_precip | percentiles |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 50.065727 | 139.857477 | 9 | 1 | 1 | 2013-06-19 20:05:00 | 2013-06-19 20:05:00 | 331.065727 | ... | 331.065727 | 29.846362 | -94.172015 | 210678.738492 | 420.857477 | 10 | 1.629695 | 16.296951 | 2.289786 | ([2.221776068210602, 2.276183712482452],) |
| 1 | 0 | 15 | 120.527119 | 172.500325 | 4 | 1 | 2 | 2013-06-19 20:05:00 | 2013-06-19 20:05:00 | 401.527119 | ... | 401.527119 | 30.166929 | -93.996892 | 227000.162468 | 453.500325 | 10 | 1.409547 | 14.095468 | 1.819811 | ([1.8030404090881347, 1.8164567756652832],) |
| 2 | 0 | 18 | 126.779273 | 145.368401 | 15 | 1 | 3 | 2013-06-19 20:05:00 | 2013-06-19 20:05:00 | 407.779273 | ... | 407.779273 | 30.196499 | -94.139960 | 213434.200454 | 426.368401 | 11 | 2.441526 | 26.856783 | 3.771701 | ([3.710712432861328, 3.759503173828125],) |
| 3 | 0 | 34 | 111.611369 | 155.452030 | 4 | 2 | 4 | 2013-06-19 20:05:00 | 2013-06-19 20:05:00 | 392.611369 | ... | 392.611369 | 30.126871 | -94.087317 | 218476.015240 | 436.452030 | 19 | 1.938501 | 36.831512 | 4.067666 | ([3.940941762924194, 4.042321195602417],) |
| 4 | 0 | 35 | 111.765231 | 164.938866 | 8 | 2 | 5 | 2013-06-19 20:05:00 | 2013-06-19 20:05:00 | 392.765231 | ... | 392.765231 | 30.127221 | -94.037226 | 223219.433218 | 445.938866 | 20 | 2.486886 | 49.737709 | 4.380943 | ([4.087516045570374, 4.3222578477859495],) |
5 rows × 22 columns
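Note that the percentile results are stored in each row as a one-element tuple wrapping the list of values. For further analysis it can be convenient to expand them into separate columns. A sketch with a made-up miniature dataframe (the column names p95 and p99 are our own choice, not produced by tobac):

```python
import pandas as pd

# Made-up miniature of the output dataframe: each row stores the percentile
# results as a one-element tuple wrapping the list [95th, 99th]
df = pd.DataFrame({'feature': [1, 2],
                   'percentiles': [([2.22, 2.28],), ([1.80, 1.82],)]})

# Unwrap the tuple and spread the list into named columns
expanded = df['percentiles'].apply(lambda t: pd.Series(t[0], index=['p95', 'p99']))
df = df.join(expanded)
print(df[['feature', 'p95', 'p99']])
```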