Methods and Parameters for Feature Detection: Part 1

In this notebook, we will take a detailed look at tobac’s feature detection and examine some of its parameters. We concentrate on:

Minima/Maxima and Multiple Thresholds for Feature Identification
Feature Position
Sigma Parameter for Smoothing of Noisy Data
Band Pass Filter for Input Fields

[1]:

import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

%matplotlib inline

import seaborn as sns

sns.set_context("talk")

import warnings

warnings.filterwarnings("ignore")

[2]:

import tobac
import tobac.testing

Minima/Maxima and Multiple Thresholds for Feature Identification

Feature identification search for local maxima in the data.

When working different inputs it is sometimes necessary to switch the feature detection from finding maxima to minima. Furthermore, for more complex datasets containing multiple features differing in magnitude, a categorization according to this magnitude is desirable. Both will be demonstrated with the make_sample_data_2D_3blobs() function, which creates such a dataset. For the search for minima we will simply turn the dataset negative:

[3]:

data = tobac.testing.make_sample_data_2D_3blobs()
neg_data = -data

Let us have a look at frame number 50:

[4]:

n = 50

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(14, 10))

im1 = ax1.imshow(data.data[50], cmap="Greys")
ax1.set_title("data")
cbar = plt.colorbar(im1, ax=ax1)

im2 = ax2.imshow(neg_data.data[50], cmap="Greys")
ax2.set_title(" negative data")
cbar = plt.colorbar(im2, ax=ax2)

../../_images/examples4doc_Basics_Methods-and-Parameters-for-Feature-Detection_Part_1_8_0.png

As you can see the data has 3 maxima/minima with different extremal values. To capture these, we use list comprehensions to obtain multiple thresholds in the range of the data:

[5]:

thresholds = [i for i in range(9, 18)]
neg_thresholds = [-i for i in range(9, 18)]

These can now be passed as arguments to feature_detection_multithreshold(). With the target-keyword we can set a flag whether to search for minima or maxima. The standard is "maxima".

[6]:

%%capture

dxy, dt = tobac.utils.get_spacings(data)

features = tobac.feature_detection_multithreshold(
    data, dxy, thresholds, target="maximum"
)
features_inv = tobac.feature_detection_multithreshold(
    neg_data, dxy, neg_thresholds, target="minimum"
)

Let’s scatter the detected features onto frame 50 of the dataset and create colorbars for the threshold values:

[7]:

mask_1 = features["frame"] == n
mask_2 = features_inv["frame"] == n

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(14, 10))

ax1.imshow(data.data[50], cmap="Greys")
im1 = ax1.scatter(
    features.where(mask_1)["hdim_2"],
    features.where(mask_1)["hdim_1"],
    c=features.where(mask_1)["threshold_value"],
    cmap="tab10",
)
cbar = plt.colorbar(im1, ax=ax1)
cbar.ax.set_ylabel("threshold")

ax2.imshow(neg_data.data[50], cmap="Greys")
im2 = ax2.scatter(
    features_inv.where(mask_2)["hdim_2"],
    features_inv.where(mask_2)["hdim_1"],
    c=features_inv.where(mask_2)["threshold_value"],
    cmap="tab10",
)
cbar = plt.colorbar(im2, ax=ax2)
cbar.ax.set_ylabel("threshold")

[7]:

Text(0, 0.5, 'threshold')

../../_images/examples4doc_Basics_Methods-and-Parameters-for-Feature-Detection_Part_1_14_1.png

The three features were found in both data sets, and the color bars indicate which threshold they belong to. When using multiple thresholds, note that the order of the list is important. Each feature is assigned the threshold value that was reached last. Therefore, it makes sense to start with the lowest value in case of maxima.

Sigma Parameter for Smoothing of Noisy Data

Before the features are searched a gausian filter is applied to the data in order to smooth it. So let’s import the filter used by tobac for a demonstration:

[13]:

from scipy.ndimage import gaussian_filter

This filter works performing a convolution of a (in our case 2-dimensional) gaussian function

$h(x, y) = \frac{1}{2 \pi \sigma^2} \exp \left( - \frac{x^2+y^2}{2 \sigma^2} \right)$

with our data and with this parameter we set the value of $\sigma$ .

The effect of this filter can best be demonstrated on very sharp edges in the input. Therefore we create an array from a boolean mask of another 2d-Gaussian, which has only values of 0 or 1:

[14]:

x = np.linspace(-2, 2)
y = np.linspace(-2, 2)
xx, yy = np.meshgrid(x, y)

exp = np.exp(-(xx**2 + yy**2))

gaussian_data = np.expand_dims(exp, axis=0)

and we add some random noise to it:

[15]:

np.random.seed(54321)

noise = 0.3 * np.random.randn(*gaussian_data.shape)
data_sharp = np.array(gaussian_data > 0.5, dtype="float32")

data_sharp += noise

plt.figure(figsize=(8, 8))
plt.imshow(data_sharp[0])
plt.colorbar(shrink=0.8)

[15]:

<matplotlib.colorbar.Colorbar at 0x137d40250>

../../_images/examples4doc_Basics_Methods-and-Parameters-for-Feature-Detection_Part_1_33_1.png

If we apply this filter to the data with increasing sigmas, increasingly smoothed data will be the result:

[16]:

non_smooth_data = gaussian_filter(data_sharp, sigma=0)
smooth_data = gaussian_filter(data_sharp, sigma=1)
smoother_data = gaussian_filter(data_sharp, sigma=5)

fig, axes = plt.subplots(ncols=3, figsize=(16, 10))

im0 = axes[0].imshow(non_smooth_data[0], vmin=0, vmax=1)
axes[0].set_title(r"$\sigma = 0$")

im1 = axes[1].imshow(smooth_data[0], vmin=0, vmax=1)
axes[1].set_title(r"$\sigma = 1$")

im2 = axes[2].imshow(smoother_data[0], vmin=0, vmax=1)
axes[2].set_title(r"$\sigma = 5$")

cbar = fig.colorbar(im1, ax=axes.tolist(), shrink=0.4)
cbar.set_ticks(np.linspace(0, 1, 11))

../../_images/examples4doc_Basics_Methods-and-Parameters-for-Feature-Detection_Part_1_35_0.png

This is what happens in the background, when the feature_detection_multithreshold() function is called. The default value of sigma_threshold is 0.5. The next step is trying to detect features of the dataset with these sigma_threshold values. We first need an xarray DataArray again:

[17]:

date = np.datetime64("2022-04-01T00:00")
input_data = xr.DataArray(
    data=data_sharp, coords={"time": np.expand_dims(date, axis=0), "y": y, "x": x}
)

Now we set a threshold and detect the features:

[18]:

%%capture

threshold = 0.9
features_sharp = tobac.feature_detection_multithreshold(
    input_data, dxy, threshold, sigma_threshold=0
)
features_smooth = tobac.feature_detection_multithreshold(
    input_data, dxy, threshold, sigma_threshold=1
)
features_smoother = tobac.feature_detection_multithreshold(
    input_data, dxy, threshold, sigma_threshold=5
)

Attempting to plot the results

[19]:

fig, axes = plt.subplots(ncols=3, figsize=(14, 10))
plot_kws = dict(
    color="red",
    marker="s",
    s=100,
    edgecolors="w",
    linewidth=3,
)

im0 = axes[0].imshow(input_data[0])
axes[0].set_title(r"$\sigma = 0$")
axes[0].scatter(
    features_sharp["hdim_2"], features_sharp["hdim_1"], label="features", **plot_kws
)
axes[0].legend()

im0 = axes[1].imshow(input_data[0])
axes[1].set_title(r"$\sigma = 1$")
axes[1].scatter(features_smooth["hdim_2"], features_smooth["hdim_1"], **plot_kws)

im0 = axes[2].imshow(input_data[0])
axes[2].set_title(r"$\sigma = 5$")
try:
    axes[2].scatter(
        features_smoother["hdim_2"], features_smoother["hdim_1"], **plot_kws
    )
except:
    print("WARNING: No Feature Detected!")

WARNING: No Feature Detected!

../../_images/examples4doc_Basics_Methods-and-Parameters-for-Feature-Detection_Part_1_41_1.png

Noise may cause some false detections (left panel) that are significantly reduced when a suitable smoothing parameter is chosen (middle panel).

Band-Pass Filter for Input Fields via Parameter `wavelength_filtering`

This parameter can be understood best, when looking at real instead of snythethic data. An example of usage is given here

Methods and Parameters for Feature Detection: Part 1

Minima/Maxima and Multiple Thresholds for Feature Identification

Feature Position

Sigma Parameter for Smoothing of Noisy Data

Band-Pass Filter for Input Fields via Parameter wavelength_filtering

Band-Pass Filter for Input Fields via Parameter `wavelength_filtering`