Methods and Parameters for Feature Detection: Part 1

In this notebook, we will take a detailed look at tobac’s feature detection and examine some of its parameters. We concentrate on:

[1]:
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

%matplotlib inline

import seaborn as sns

sns.set_context("talk")

import warnings

warnings.filterwarnings("ignore")
[2]:
import tobac
import tobac.testing

Minima/Maxima and Multiple Thresholds for Feature Identification

Feature identification search for local maxima in the data.

When working different inputs it is sometimes necessary to switch the feature detection from finding maxima to minima. Furthermore, for more complex datasets containing multiple features differing in magnitude, a categorization according to this magnitude is desirable. Both will be demonstrated with the make_sample_data_2D_3blobs() function, which creates such a dataset. For the search for minima we will simply turn the dataset negative:

[3]:
data = tobac.testing.make_sample_data_2D_3blobs()
neg_data = -data

Let us have a look at frame number 50:

[4]:
n = 50

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(14, 10))

im1 = ax1.imshow(data.data[50], cmap="Greys")
ax1.set_title("data")
cbar = plt.colorbar(im1, ax=ax1)

im2 = ax2.imshow(neg_data.data[50], cmap="Greys")
ax2.set_title(" negative data")
cbar = plt.colorbar(im2, ax=ax2)
../../_images/examples4doc_Basics_Methods-and-Parameters-for-Feature-Detection_Part_1_8_0.png

As you can see the data has 3 maxima/minima with different extremal values. To capture these, we use list comprehensions to obtain multiple thresholds in the range of the data:

[5]:
thresholds = [i for i in range(9, 18)]
neg_thresholds = [-i for i in range(9, 18)]

These can now be passed as arguments to feature_detection_multithreshold(). With the target-keyword we can set a flag whether to search for minima or maxima. The standard is "maxima".

[6]:
%%capture

dxy, dt = tobac.utils.get_spacings(data)

features = tobac.feature_detection_multithreshold(
    data, dxy, thresholds, target="maximum"
)
features_inv = tobac.feature_detection_multithreshold(
    neg_data, dxy, neg_thresholds, target="minimum"
)

Let’s scatter the detected features onto frame 50 of the dataset and create colorbars for the threshold values:

[7]:
mask_1 = features["frame"] == n
mask_2 = features_inv["frame"] == n

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(14, 10))

ax1.imshow(data.data[50], cmap="Greys")
im1 = ax1.scatter(
    features.where(mask_1)["hdim_2"],
    features.where(mask_1)["hdim_1"],
    c=features.where(mask_1)["threshold_value"],
    cmap="tab10",
)
cbar = plt.colorbar(im1, ax=ax1)
cbar.ax.set_ylabel("threshold")

ax2.imshow(neg_data.data[50], cmap="Greys")
im2 = ax2.scatter(
    features_inv.where(mask_2)["hdim_2"],
    features_inv.where(mask_2)["hdim_1"],
    c=features_inv.where(mask_2)["threshold_value"],
    cmap="tab10",
)
cbar = plt.colorbar(im2, ax=ax2)
cbar.ax.set_ylabel("threshold")
[7]:
Text(0, 0.5, 'threshold')
../../_images/examples4doc_Basics_Methods-and-Parameters-for-Feature-Detection_Part_1_14_1.png

The three features were found in both data sets, and the color bars indicate which threshold they belong to. When using multiple thresholds, note that the order of the list is important. Each feature is assigned the threshold value that was reached last. Therefore, it makes sense to start with the lowest value in case of maxima.

Feature Position

To explore the influence of the position_threshold flag we need a radially asymmetric feature. Let’s create a simple one by adding two 2d-gaussians and add an extra dimension for the time, which is required for working with tobac:

[8]:
x = np.linspace(-2, 2)
y = np.linspace(-2, 2)
xx, yy = np.meshgrid(x, y)

exp1 = 1.5 * np.exp(-((xx + 0.5) ** 2 + (yy + 0.5) ** 2))
exp2 = 0.5 * np.exp(-((0.5 - xx) ** 2 + (0.5 - yy) ** 2))

asymmetric_data = np.expand_dims(exp1 + exp2, axis=0)

plt.figure(figsize=(10, 10))
plt.imshow(asymmetric_data[0])
plt.colorbar(shrink=0.8)
[8]:
<matplotlib.colorbar.Colorbar at 0x1377ed290>
../../_images/examples4doc_Basics_Methods-and-Parameters-for-Feature-Detection_Part_1_18_1.png

To feed this data into the feature detection we need to convert it into an xarray.DataArray. Before we do that we select an arbitrary time and date for the single frame of our synthetic field:

[9]:
date = np.datetime64(
    "2022-04-01T00:00",
)
assym = xr.DataArray(
    data=asymmetric_data, coords={"time": np.expand_dims(date, axis=0), "y": y, "x": x}
)
assym
[9]:
<xarray.DataArray (time: 1, y: 50, x: 50)> Size: 20kB
array([[[0.01666536, 0.02114886, 0.02648334, ..., 0.00083392,
         0.00058509, 0.00040694],
        [0.02114886, 0.02683866, 0.03360844, ..., 0.00109464,
         0.00077154, 0.0005393 ],
        [0.02648334, 0.03360844, 0.04208603, ..., 0.00142435,
         0.00100896, 0.00070907],
        ...,
        [0.00083392, 0.00109464, 0.00142435, ..., 0.01405279,
         0.01121917, 0.00883872],
        [0.00058509, 0.00077154, 0.00100896, ..., 0.01121917,
         0.00895731, 0.00705704],
        [0.00040694, 0.0005393 , 0.00070907, ..., 0.00883872,
         0.00705704, 0.00556009]]])
Coordinates:
  * time     (time) datetime64[ns] 8B 2022-04-01
  * y        (y) float64 400B -2.0 -1.918 -1.837 -1.755 ... 1.837 1.918 2.0
  * x        (x) float64 400B -2.0 -1.918 -1.837 -1.755 ... 1.837 1.918 2.0

Since we do not have a dt in this dataset, we can not use the get_spacings()-utility this time and need to calculate the dxy spacing manually:

[10]:
dxy = assym.diff("x")

Finally, we choose a threshold in the datarange and apply the feature detection with the four position_threshold flags - ‘center’ - ‘extreme’ - ‘weighted_diff’ - ‘weighted_abs’

[11]:
%%capture

threshold = 0.2
features_center = tobac.feature_detection_multithreshold(
    assym, dxy, threshold, position_threshold="center"
)
features_extreme = tobac.feature_detection_multithreshold(
    assym, dxy, threshold, position_threshold="extreme"
)
features_diff = tobac.feature_detection_multithreshold(
    assym, dxy, threshold, position_threshold="weighted_diff"
)
features_abs = tobac.feature_detection_multithreshold(
    assym, dxy, threshold, position_threshold="weighted_abs"
)
[12]:
plt.figure(figsize=(10, 10))
plt.imshow(assym[0])
plt.scatter(
    features_center["hdim_2"],
    features_center["hdim_1"],
    color="black",
    marker="x",
    label="center",
)
plt.scatter(
    features_extreme["hdim_2"],
    features_extreme["hdim_1"],
    color="red",
    marker="x",
    label="extreme",
)
plt.scatter(
    features_diff["hdim_2"],
    features_diff["hdim_1"],
    color="purple",
    marker="x",
    label="weighted_diff",
)
plt.scatter(
    features_abs["hdim_2"],
    features_abs["hdim_1"],
    color="green",
    marker="+",
    label="weighted_abs",
)
plt.colorbar(shrink=0.8)
plt.legend()
[12]:
<matplotlib.legend.Legend at 0x1377f2b10>
../../_images/examples4doc_Basics_Methods-and-Parameters-for-Feature-Detection_Part_1_25_1.png

As you can see this parameter specifies how the postion of the feature is defined. These are the descriptions given in the code:

  • extreme: get position as max/min position inside the identified region

  • center : get position as geometrical centre of identified region

  • weighted_diff: get position as centre of identified region, weighted by difference from the threshold

  • weighted_abs: get position as centre of identified region, weighted by absolute values if the field

Sigma Parameter for Smoothing of Noisy Data

Before the features are searched a gausian filter is applied to the data in order to smooth it. So let’s import the filter used by tobac for a demonstration:

[13]:
from scipy.ndimage import gaussian_filter

This filter works performing a convolution of a (in our case 2-dimensional) gaussian function

h(x, y) = \frac{1}{2 \pi \sigma^2} \exp \left( - \frac{x^2+y^2}{2 \sigma^2} \right)

with our data and with this parameter we set the value of \sigma.

The effect of this filter can best be demonstrated on very sharp edges in the input. Therefore we create an array from a boolean mask of another 2d-Gaussian, which has only values of 0 or 1:

[14]:
x = np.linspace(-2, 2)
y = np.linspace(-2, 2)
xx, yy = np.meshgrid(x, y)

exp = np.exp(-(xx**2 + yy**2))

gaussian_data = np.expand_dims(exp, axis=0)

and we add some random noise to it:

[15]:
np.random.seed(54321)

noise = 0.3 * np.random.randn(*gaussian_data.shape)
data_sharp = np.array(gaussian_data > 0.5, dtype="float32")

data_sharp += noise

plt.figure(figsize=(8, 8))
plt.imshow(data_sharp[0])
plt.colorbar(shrink=0.8)
[15]:
<matplotlib.colorbar.Colorbar at 0x137d40250>
../../_images/examples4doc_Basics_Methods-and-Parameters-for-Feature-Detection_Part_1_33_1.png

If we apply this filter to the data with increasing sigmas, increasingly smoothed data will be the result:

[16]:
non_smooth_data = gaussian_filter(data_sharp, sigma=0)
smooth_data = gaussian_filter(data_sharp, sigma=1)
smoother_data = gaussian_filter(data_sharp, sigma=5)

fig, axes = plt.subplots(ncols=3, figsize=(16, 10))

im0 = axes[0].imshow(non_smooth_data[0], vmin=0, vmax=1)
axes[0].set_title(r"$\sigma = 0$")

im1 = axes[1].imshow(smooth_data[0], vmin=0, vmax=1)
axes[1].set_title(r"$\sigma = 1$")

im2 = axes[2].imshow(smoother_data[0], vmin=0, vmax=1)
axes[2].set_title(r"$\sigma = 5$")

cbar = fig.colorbar(im1, ax=axes.tolist(), shrink=0.4)
cbar.set_ticks(np.linspace(0, 1, 11))
../../_images/examples4doc_Basics_Methods-and-Parameters-for-Feature-Detection_Part_1_35_0.png

This is what happens in the background, when the feature_detection_multithreshold() function is called. The default value of sigma_threshold is 0.5. The next step is trying to detect features of the dataset with these sigma_threshold values. We first need an xarray DataArray again:

[17]:
date = np.datetime64("2022-04-01T00:00")
input_data = xr.DataArray(
    data=data_sharp, coords={"time": np.expand_dims(date, axis=0), "y": y, "x": x}
)

Now we set a threshold and detect the features:

[18]:
%%capture

threshold = 0.9
features_sharp = tobac.feature_detection_multithreshold(
    input_data, dxy, threshold, sigma_threshold=0
)
features_smooth = tobac.feature_detection_multithreshold(
    input_data, dxy, threshold, sigma_threshold=1
)
features_smoother = tobac.feature_detection_multithreshold(
    input_data, dxy, threshold, sigma_threshold=5
)

Attempting to plot the results

[19]:
fig, axes = plt.subplots(ncols=3, figsize=(14, 10))
plot_kws = dict(
    color="red",
    marker="s",
    s=100,
    edgecolors="w",
    linewidth=3,
)

im0 = axes[0].imshow(input_data[0])
axes[0].set_title(r"$\sigma = 0$")
axes[0].scatter(
    features_sharp["hdim_2"], features_sharp["hdim_1"], label="features", **plot_kws
)
axes[0].legend()

im0 = axes[1].imshow(input_data[0])
axes[1].set_title(r"$\sigma = 1$")
axes[1].scatter(features_smooth["hdim_2"], features_smooth["hdim_1"], **plot_kws)

im0 = axes[2].imshow(input_data[0])
axes[2].set_title(r"$\sigma = 5$")
try:
    axes[2].scatter(
        features_smoother["hdim_2"], features_smoother["hdim_1"], **plot_kws
    )
except:
    print("WARNING: No Feature Detected!")
WARNING: No Feature Detected!
../../_images/examples4doc_Basics_Methods-and-Parameters-for-Feature-Detection_Part_1_41_1.png

Noise may cause some false detections (left panel) that are significantly reduced when a suitable smoothing parameter is chosen (middle panel).

Band-Pass Filter for Input Fields via Parameter wavelength_filtering

This parameter can be understood best, when looking at real instead of snythethic data. An example of usage is given here