Methods and Parameters for Segmentation

This notebook explores tobac's segmentation function and its parameters.

We start with the usual imports:

[1]:
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

%matplotlib inline

import seaborn as sns

sns.set_context("talk")

import warnings

warnings.filterwarnings("ignore")
[2]:
import tobac
import tobac.testing

Required Inputs

To perform a segmentation, we need a dataset in which features have already been detected. We therefore create sample data with the testing.make_sample_data_2D_3blobs_inv() utility and detect features with several thresholds:

[3]:
data = tobac.testing.make_sample_data_2D_3blobs_inv(data_type="xarray")
dxy, dt = tobac.utils.get_spacings(data)

plt.figure(figsize=(6, 9))
data.isel(time=50).plot(x="x", y="y")
[3]:
<matplotlib.collections.QuadMesh at 0x13d3275d0>
../../_images/examples4doc_Basics_Methods-and-Parameters-for-Segmentation_4_1.png
[4]:
%%capture
thresholds = [9, 14, 17]
features = tobac.feature_detection_multithreshold(
    data, dxy, thresholds, position_threshold="weighted_abs"
)

The resulting features dataset can now be passed to the segmentation function, here segmentation_2D(). The other required inputs are the original dataset, the grid spacing and a threshold.

[5]:
mask, features_mask = tobac.segmentation_2D(features, data, dxy, threshold=9)

The segments are returned as a mask, which is the first object returned by the function. The second output is the features dataset again, now with an additional ncells variable that gives the number of data points belonging to each feature's segment:

[6]:
features_mask["ncells"][1]
[6]:
67.0

Notice that this number can deviate from the num value, because watershedding works differently from simply detecting the values that exceed the threshold. For example, for the second feature, ncells contains one additional data point compared to the original feature detection:

[7]:
features_mask["num"][1]
[7]:
66
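
To make the meaning of ncells more concrete, the following minimal sketch tallies the grid points per label in a small hand-made array. It assumes that the mask stores each feature's integer label at the grid points of its segment and 0 elsewhere; the array toy_mask is purely illustrative and is not tobac output.

```python
import numpy as np

# Hand-made stand-in for one time step of a segmentation mask:
# 0 = no segment, positive integers = feature labels (assumed convention).
toy_mask = np.array(
    [
        [0, 1, 1, 0, 0],
        [1, 1, 1, 0, 2],
        [0, 1, 0, 2, 2],
        [0, 0, 0, 2, 2],
    ]
)

# Count how many grid points carry each label, ignoring the background.
labels, counts = np.unique(toy_mask[toy_mask > 0], return_counts=True)
ncells = dict(zip(labels.tolist(), counts.tolist()))
print(ncells)
```

A count obtained this way depends only on the mask, which is why it does not have to match the num value from the feature detection.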

The created segments can be visualized with a contour plot of the mask:

[8]:
plt.figure(figsize=(6, 9))
data.isel(time=50).plot(x="x", y="y")
mask.isel(time=50).plot.contour(levels=[0.5], x="x", y="y", colors="k")
plt.title("Created segments")
[8]:
Text(0.5, 1.0, 'Created segments')
../../_images/examples4doc_Basics_Methods-and-Parameters-for-Segmentation_13_1.png

Different Thresholds

Note that, in contrast to feature detection, segmentation accepts only a single threshold value. We therefore have to call the function several times with different thresholds to explore the influence of this argument:

[9]:
%%capture
mask_1, features_mask_1 = tobac.segmentation_2D(features, data, dxy, threshold=9)
mask_2, features_mask_2 = tobac.segmentation_2D(features, data, dxy, threshold=14)
mask_3, features_mask_3 = tobac.segmentation_2D(features, data, dxy, threshold=17)

To visualize the segments, we can use contour plots of the masks:

[10]:
thresholds = [9, 14, 17]
masks = [mask_1, mask_2, mask_3]
colors = ["w", "r", "b"]

fig, ax = plt.subplots(ncols=1, figsize=(6, 9))
data.isel(time=50).plot(ax=ax, x="x", y="y")

for n, mask, color in zip(thresholds, masks, colors):
    contour = mask.isel(time=50).plot.contour(levels=[n], x="x", y="y", colors=color)
    ax.clabel(contour, inline=True, fontsize=10)

ax.set_title("Segments for different threshold values")
[10]:
Text(0.5, 1.0, 'Segments for different threshold values')
../../_images/examples4doc_Basics_Methods-and-Parameters-for-Segmentation_18_1.png

As expected, a lower threshold value produces a larger segment, and if a feature does not exceed the threshold at all, no segment is associated with it.
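
The dependence of the segment size on the threshold can be reproduced without tobac. The sketch below is a simplification: it only counts the grid points of a synthetic Gaussian blob that exceed each threshold, which is the candidate region a segment can grow into, not the watershed result itself. The blob, its peak value of 15 and its width are made-up illustrative values.

```python
import numpy as np

# Synthetic Gaussian blob with a peak value of 15 (not the tobac sample data).
y, x = np.mgrid[-20:21, -20:21]
blob = 15.0 * np.exp(-(x**2 + y**2) / (2 * 8.0**2))

# Number of grid points exceeding each threshold.
area = {thr: int(np.count_nonzero(blob > thr)) for thr in (9, 14, 17)}
print(area)
```

Because the peak of this blob stays below 17, the highest threshold selects no grid points at all, mirroring the feature without a segment in the plot above.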

Choosing Method and Target

The segmentation uses a specific technique to associate an area or volume with each identified feature. Watershedding is the default and currently the only implemented option, but in future releases other methods will be selectable with the method keyword:

[11]:
%%capture
mask_1, features_mask_1 = tobac.segmentation_2D(
    features, data, dxy, threshold=9, method="watershed"
)
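
To give an intuition for marker-based watershedding, here is a simplified priority-flood sketch. It is not the routine tobac uses and makes several assumptions: each feature is represented by a single marker point, only 4-connected neighbours are considered, and segments grow only over points above the threshold. The function name seeded_flood and the two-blob test field are hypothetical.

```python
import heapq

import numpy as np


def seeded_flood(field, markers, threshold):
    """Grow each marker over connected points with field > threshold.

    Points are visited from the highest field value downwards, so a point
    between two features goes to whichever flood reaches it first.
    """
    labels = markers.copy()
    heap = []
    # Seed the queue with all marker points; negating the field value
    # turns Python's min-heap into a max-heap.
    for (i, j), label in np.ndenumerate(markers):
        if label > 0:
            heapq.heappush(heap, (-float(field[i, j]), i, j))
    while heap:
        _, i, j = heapq.heappop(heap)
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            inside = 0 <= ni < field.shape[0] and 0 <= nj < field.shape[1]
            if inside and labels[ni, nj] == 0 and field[ni, nj] > threshold:
                labels[ni, nj] = labels[i, j]
                heapq.heappush(heap, (-float(field[ni, nj]), ni, nj))
    return labels


# Two overlapping Gaussian blobs with one marker at each peak.
y, x = np.mgrid[0:21, 0:41]
field = 15.0 * np.exp(-((x - 10) ** 2 + (y - 10) ** 2) / 50.0) + 15.0 * np.exp(
    -((x - 30) ** 2 + (y - 10) ** 2) / 50.0
)
markers = np.zeros(field.shape, dtype=int)
markers[10, 10] = 1
markers[10, 30] = 2

labels = seeded_flood(field, markers, threshold=1.0)
print(np.unique(labels))
```

Flooding from the highest values downwards is what lets two touching blobs be split between their markers instead of merging into a single region.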

Analogous to the feature detection, it is also possible to apply the segmentation to minima by changing the target keyword:

[12]:
%%capture

data = -tobac.testing.make_sample_data_2D_3blobs_inv(data_type="xarray")
dxy, dt = tobac.utils.get_spacings(data)
thresholds = [-9, -14, -17]
features = tobac.feature_detection_multithreshold(
    data, dxy, thresholds, target="minimum"
)

mask_1, features_mask_1 = tobac.segmentation_2D(
    features, data, dxy, threshold=-9, target="minimum"
)
mask_2, features_mask_2 = tobac.segmentation_2D(
    features, data, dxy, threshold=-14, target="minimum"
)
mask_3, features_mask_3 = tobac.segmentation_2D(
    features, data, dxy, threshold=-17, target="minimum"
)
[13]:
masks = [mask_1, mask_2, mask_3]
colors = ["r", "b", "w"]
thresholds = [-9, -14, -17]

fig, ax = plt.subplots(ncols=1, figsize=(6, 9))
data.isel(time=50).plot(ax=ax, x="x", y="y")

for n, mask, color in zip(thresholds, masks, colors):
    contour = (
        (n * mask).isel(time=50).plot.contour(levels=[n], colors=color, x="x", y="y")
    )
    ax.clabel(contour, inline=True, fontsize=10)

ax.set_title("Segments for different threshold values")
[13]:
Text(0.5, 1.0, 'Segments for different threshold values')
../../_images/examples4doc_Basics_Methods-and-Parameters-for-Segmentation_24_1.png
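
The sign flip used above can be checked on a plain array. The sketch assumes that segmentation for maxima considers points above the threshold and segmentation for minima considers points below it; under that assumption, negating both the field and the threshold selects exactly the same grid points. The random test field is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
field = rng.normal(loc=10.0, scale=5.0, size=(50, 50))  # arbitrary test field

# Candidate region for maxima: values above the threshold.
above = field > 9
# The same region expressed for minima on the sign-flipped field.
below = -field < -9

print(np.array_equal(above, below))
```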

Setting a Maximum Distance

Another way of controlling the size of our segments is the max_distance parameter. It defines the maximum distance (in meters) that any point of a segment can have from the coordinates of its feature. This enables us, for example, to ensure that the segments of different features do not touch each other when we use a very low threshold value:

[14]:
%%capture

data = tobac.testing.make_sample_data_2D_3blobs_inv(data_type="xarray")
dxy, dt = tobac.utils.get_spacings(data)
thresh = 0.1

features = tobac.feature_detection_multithreshold(data, dxy, threshold=3)
mask_0, features_0 = tobac.segmentation_2D(features, data, dxy, threshold=thresh)

As you can see, the threshold was set to 0.1. As a result, the segments of the two upper features touch:

[15]:
fig, ax = plt.subplots(figsize=(6, 9))
data.isel(time=50).plot(ax=ax, x="x", y="y")
mask_0.isel(time=50).plot.contour(levels=[0.5], ax=ax, colors="r", x="x", y="y")
ax.set_title("Segments without maximum Distance")
[15]:
Text(0.5, 1.0, 'Segments without maximum Distance')
../../_images/examples4doc_Basics_Methods-and-Parameters-for-Segmentation_28_1.png

We can prevent this by using the max_distance parameter to specify the maximum distance, in meters, that the border of a segment can have from its feature:

[16]:
%%capture

mask_1, features_mask_1 = tobac.segmentation_2D(
    features, data, dxy, threshold=thresh, max_distance=40000
)
mask_2, features_mask_2 = tobac.segmentation_2D(
    features, data, dxy, threshold=thresh, max_distance=20000
)
mask_3, features_mask_3 = tobac.segmentation_2D(
    features, data, dxy, threshold=thresh, max_distance=5000
)
[17]:
masks = [mask_1, mask_2, mask_3]
colors = ["w", "r", "k"]
distances = [4e4, 2e4, 5e3]

fig, ax = plt.subplots(ncols=1, figsize=(6, 9))
data.isel(time=50).plot(ax=ax, x="x", y="y")

for n, mask, color in zip(distances, masks, colors):
    contour = (
        (n * mask).isel(time=50).plot.contour(levels=[n], colors=color, x="x", y="y")
    )
    ax.clabel(contour, inline=True, fontsize=10)

ax.set_title("Segments for different maximum distances")
[17]:
Text(0.5, 1.0, 'Segments for different maximum distances')
../../_images/examples4doc_Basics_Methods-and-Parameters-for-Segmentation_31_1.png
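
The effect of a distance limit can be sketched with a simple mask around a single feature. This is only an illustration of the idea; how tobac applies max_distance internally (for example, whether the limit is inclusive) may differ. The grid spacing and the feature position are hypothetical values.

```python
import numpy as np

dxy = 1000.0  # grid spacing in meters (illustrative value)
max_distance = 5000.0  # maximum allowed distance from the feature in meters

# Feature position in grid indices (hypothetical values).
feature_y, feature_x = 10, 10

# Distance of every grid point from the feature, converted to meters.
y, x = np.mgrid[0:21, 0:21]
distance = dxy * np.hypot(x - feature_x, y - feature_y)

# Points within max_distance may belong to the segment; all others are excluded.
allowed = distance <= max_distance
print(allowed.sum(), "of", allowed.size, "grid points lie within max_distance")
```

Combining such a mask with the threshold condition shows why a small max_distance keeps neighbouring segments apart even when the threshold is very low.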

Handling 3d-Data

The remaining parameters, level and vertical_coord, are only useful for the segmentation of 3-dimensional inputs and will be covered in the notebook for 3d-data (TBD).