tobac package

Submodules

tobac.analysis module

tobac.analysis.cell_analysis module

Perform analysis on the properties of tracked cells

tobac.analysis.cell_analysis.calculate_overlap(track_1, track_2, min_sum_inv_distance=None, min_mean_inv_distance=None)

Count the number of time frames in which the individual cells of two tracks are present together and calculate their mean and summed inverse distance.

Parameters:
  • track_1 (pandas.DataFrame) – The tracks containing the cells to analyze.

  • track_2 (pandas.DataFrame) – The tracks containing the cells to analyze.

  • min_sum_inv_distance (float, optional) – Minimum of the inverse net distance for two cells to be counted as overlapping. Default is None.

  • min_mean_inv_distance (float, optional) – Minimum of the inverse mean distance for two cells to be counted as overlapping. Default is None.

Returns:

overlap – DataFrame containing the columns cell_1 and cell_2 with the index of the cells from the tracks, n_overlap with the number of frames both cells are present in, mean_inv_distance with the mean inverse distance and sum_inv_distance with the summed inverse distance of the cells.

Return type:

pandas.DataFrame
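
The quantities returned by calculate_overlap can be illustrated for a single pair of cells. The following is a minimal sketch, not tobac's implementation: the helper name overlap_stats and the dict-based input (frame number mapped to an x/y position in metres) are assumptions made for this example, and plain Euclidean distance is used.

```python
import math


def overlap_stats(cell_a, cell_b):
    """Overlap statistics for two cells (hypothetical helper, not tobac API).

    cell_a, cell_b: dict mapping frame number -> (x, y) position in metres.
    """
    # Frames in which both cells are present.
    shared = sorted(set(cell_a) & set(cell_b))
    inverse_distances = []
    for frame in shared:
        (xa, ya), (xb, yb) = cell_a[frame], cell_b[frame]
        distance = math.hypot(xa - xb, ya - yb)
        # Assumption: the two cells never sit at exactly the same position.
        inverse_distances.append(1.0 / distance)
    n_overlap = len(shared)
    total = sum(inverse_distances)
    return {
        "n_overlap": n_overlap,
        "sum_inv_distance": total,
        "mean_inv_distance": total / n_overlap if n_overlap else 0.0,
    }


cell_a = {0: (0.0, 0.0), 1: (0.0, 0.0), 2: (0.0, 0.0)}
cell_b = {1: (3.0, 4.0), 2: (0.0, 10.0), 3: (5.0, 5.0)}
stats = overlap_stats(cell_a, cell_b)
```

Here the cells share frames 1 and 2, at distances of 5 m and 10 m. A threshold such as min_sum_inv_distance would then be compared against stats["sum_inv_distance"].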

tobac.analysis.cell_analysis.cell_statistics(input_cubes, track, mask, aggregators, cell, output_path='./', output_name='Profiles', width=10000, z_coord='model_level_number', dimensions=['x', 'y'], **kwargs)
Parameters:
  • input_cubes (iris.cube.Cube) –

  • track (dask.dataframe.DataFrame) –

  • mask (iris.cube.Cube) – Cube containing mask (int id for tracked volumes 0 everywhere else).

  • aggregators (list) – List of iris.analysis.Aggregator instances.

  • cell (int) – Integer id of cell to create masked cube for output.

  • output_path (str, optional) – Default is ‘./’.

  • output_name (str, optional) – Default is ‘Profiles’.

  • width (int, optional) – Default is 10000.

  • z_coord (str, optional) – Name of the vertical coordinate in the cube. Default is ‘model_level_number’.

  • dimensions (list of str, optional) – Default is [‘x’, ‘y’].

  • **kwargs

Return type:

None

tobac.analysis.cell_analysis.cell_statistics_all(input_cubes, track, mask, aggregators, output_path='./', cell_selection=None, output_name='Profiles', width=10000, z_coord='model_level_number', dimensions=['x', 'y'], **kwargs)
Parameters:
  • input_cubes (iris.cube.Cube) –

  • track (dask.dataframe.DataFrame) –

  • mask (iris.cube.Cube) – Cube containing mask (int id for tracked volumes 0 everywhere else).

  • aggregators (list) – list of iris.analysis.Aggregator instances

  • output_path (str, optional) – Default is ‘./’.

  • cell_selection (optional) – Default is None.

  • output_name (str, optional) – Default is ‘Profiles’.

  • width (int, optional) – Default is 10000.

  • z_coord (str, optional) – Name of the vertical coordinate in the cube. Default is ‘model_level_number’.

  • dimensions (list of str, optional) – Default is [‘x’, ‘y’].

  • **kwargs

Return type:

None

tobac.analysis.cell_analysis.cog_cell(cell, Tracks=None, M_total=None, M_liquid=None, M_frozen=None, Mask=None, savedir=None)
Parameters:
  • cell (int) – Integer id of cell to create masked cube for output.

  • Tracks (optional) – Default is None.

  • M_total (subset of cube, optional) – Default is None.

  • M_liquid (subset of cube, optional) – Default is None.

  • M_frozen (subset of cube, optional) – Default is None.

  • savedir (str) – Default is None.

Return type:

None

tobac.analysis.cell_analysis.histogram_cellwise(Track, variable=None, bin_edges=None, quantity='max', density=False)

Create a histogram of the maximum, minimum or mean of a variable for the cells (series of features linked together over multiple timesteps) of a track. Essentially a wrapper of the numpy.histogram() method.

Parameters:
  • Track (pandas.DataFrame) – The track containing the variable to create the histogram from.

  • variable (string, optional) – Column of the DataFrame with the variable on which the histogram is to be based on. Default is None.

  • bin_edges (int or ndarray, optional) – If bin_edges is an int, it defines the number of equal-width bins in the given range. If bin_edges is an ndarray, it defines a monotonically increasing array of bin edges, including the rightmost edge.

  • quantity ({'max', 'min', 'mean'}, optional) – Flag determining whether to use the maximum, minimum or mean of a variable from all timeframes the cell covers. Default is ‘max’.

  • density (bool, optional) – If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Default is False.

Returns:

  • hist (ndarray) – The values of the histogram

  • bin_edges (ndarray) – The edges of the histogram

  • bin_centers (ndarray) – The centers of the histogram intervals

Raises:

ValueError – If quantity is not ‘max’, ‘min’ or ‘mean’.
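
The cell-wise reduction followed by binning can be sketched in plain Python. This is an illustrative sketch only: the helper names cellwise_values and histogram, and the list-of-dicts input, are assumptions for this example and are not part of tobac, which operates on a pandas.DataFrame and numpy.histogram().

```python
from collections import defaultdict
from statistics import mean


def cellwise_values(rows, variable, quantity="max"):
    """Reduce a variable to one value per cell (hypothetical helper).

    rows: iterable of dicts, each with a 'cell' key and the chosen variable.
    """
    reducers = {"max": max, "min": min, "mean": mean}
    if quantity not in reducers:
        raise ValueError("quantity must be 'max', 'min' or 'mean'")
    per_cell = defaultdict(list)
    for row in rows:
        per_cell[row["cell"]].append(row[variable])
    return {cell: reducers[quantity](vals) for cell, vals in per_cell.items()}


def histogram(values, bin_edges):
    """Count values per bin; the last bin includes its right edge."""
    counts = [0] * (len(bin_edges) - 1)
    for value in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            in_bin = bin_edges[i] <= value < bin_edges[i + 1]
            if in_bin or (is_last and value == bin_edges[-1]):
                counts[i] += 1
                break
    centers = [(lo + hi) / 2 for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]
    return counts, centers


rows = [
    {"cell": 1, "w": 2.0},
    {"cell": 1, "w": 5.0},
    {"cell": 2, "w": 1.0},
    {"cell": 2, "w": 3.0},
]
per_cell_max = cellwise_values(rows, "w", quantity="max")
hist, bin_centers = histogram(per_cell_max.values(), [0, 2, 4, 6])
```

Each cell contributes exactly one value to the histogram, regardless of how many time steps it spans.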

tobac.analysis.cell_analysis.lifetime_histogram(Track, bin_edges=array([0, 20, 40, 60, 80, 100, 120, 140, 160, 180]), density=False, return_values=False)

Compute the lifetime histogram of tracked cells.

Parameters:
  • Track (pandas.DataFrame) – Dataframe of linked features, containing the columns ‘cell’ and ‘time_cell’.

  • bin_edges (int or ndarray, optional) – If bin_edges is an int, it defines the number of equal-width bins in the given range. If bin_edges is an ndarray, it defines a monotonically increasing array of bin edges, including the rightmost edge. The unit is minutes. Default is np.arange(0, 200, 20).

  • density (bool, optional) – If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Default is False.

  • return_values (bool, optional) – Bool determining whether the lifetimes of the features are returned from this function. Default is False.

Returns:

  • hist (ndarray) – The values of the histogram.

  • bin_edges (ndarray) – The edges of the histogram.

  • bin_centers (ndarray) – The centers of the histogram intervals.

  • minutes, optional (ndarray) – Numpy.array of the lifetime of each feature in minutes. Returned if return_values is True.
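
As a rough illustration of how lifetimes in minutes might be derived from a ‘time_cell’ column and then binned, consider the sketch below. It assumes that the lifetime of a cell is the largest ‘time_cell’ value it reaches; the helper names and the tuple-based input are hypothetical and stand in for the pandas.DataFrame that tobac uses.

```python
from bisect import bisect_right
from datetime import timedelta


def lifetimes_in_minutes(rows):
    """rows: iterable of (cell_id, time_cell), time_cell being a timedelta."""
    longest = {}
    for cell, time_cell in rows:
        if cell not in longest or time_cell > longest[cell]:
            longest[cell] = time_cell
    return {cell: t.total_seconds() / 60 for cell, t in longest.items()}


def count_in_bins(values, edges):
    """Histogram counts; the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        if value < edges[0] or value > edges[-1]:
            continue  # outside the histogram range
        index = min(bisect_right(edges, value) - 1, len(counts) - 1)
        counts[index] += 1
    return counts


rows = [
    (1, timedelta(minutes=0)),
    (1, timedelta(minutes=5)),
    (1, timedelta(minutes=10)),
    (2, timedelta(minutes=0)),
    (2, timedelta(minutes=45)),
]
lifetimes = lifetimes_in_minutes(rows)
bin_edges = list(range(0, 200, 20))  # same edges as the documented default
hist = count_in_bins(lifetimes.values(), bin_edges)
```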

tobac.analysis.cell_analysis.velocity_histogram(track, bin_edges=array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]), density=False, method_distance=None, return_values=False)

Create a velocity histogram of the tracked cells. If the DataFrame does not contain a velocity column, the velocities are calculated.

Parameters:
  • track (pandas.DataFrame) – DataFrame of the linked features, containing the columns ‘cell’, ‘time’ and either ‘projection_x_coordinate’ and ‘projection_y_coordinate’ or ‘latitude’ and ‘longitude’.

  • bin_edges (int or ndarray, optional) – If bin_edges is an int, it defines the number of equal-width bins in the given range. If bin_edges is an ndarray, it defines a monotonically increasing array of bin edges, including the rightmost edge. Default is np.arange(0, 30), as shown in the signature.

  • density (bool, optional) – If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Default is False.

  • method_distance ({None, 'xy', 'latlon'}, optional) – Method of distance calculation, used to calculate the velocity. ‘xy’ uses the length of the vector between the two features, ‘latlon’ uses the haversine distance. None checks whether the required coordinates are present and starts with ‘xy’. Default is None.

  • return_values (bool, optional) – Bool determining whether the velocities of the features are returned from this function. Default is False.

Returns:

  • hist (ndarray) – The values of the histogram.

  • bin_edges (ndarray) – The edges of the histogram.

  • velocities, optional (ndarray) – Numpy array with the velocities of each feature.

tobac.analysis.feature_analysis module

Perform analysis on the properties of detected features

tobac.analysis.feature_analysis.area_histogram(features, mask, bin_edges=array([0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, 10000, 10500, 11000, 11500, 12000, 12500, 13000, 13500, 14000, 14500, 15000, 15500, 16000, 16500, 17000, 17500, 18000, 18500, 19000, 19500, 20000, 20500, 21000, 21500, 22000, 22500, 23000, 23500, 24000, 24500, 25000, 25500, 26000, 26500, 27000, 27500, 28000, 28500, 29000, 29500]), density=False, method_area=None, return_values=False, representative_area=False)

Create an area histogram of the features. If the DataFrame does not contain an area column, the areas are calculated.

Parameters:
  • features (pandas.DataFrame) – DataFrame of the features.

  • mask (iris.cube.Cube) – Cube containing mask (int for tracked volumes 0 everywhere else). Needs to contain either projection_x_coordinate and projection_y_coordinate or latitude and longitude coordinates. The output of a segmentation should be used here.

  • bin_edges (int or ndarray, optional) – If bin_edges is an int, it defines the number of equal-width bins in the given range. If bin_edges is an ndarray, it defines a monotonically increasing array of bin edges, including the rightmost edge. Default is np.arange(0, 30000, 500).

  • density (bool, optional) – If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Default is False.

  • return_values (bool, optional) – Bool determining whether the areas of the features are returned from this function. Default is False.

  • representative_area (bool, optional) – If False, no weights will be associated with the values. If True, the weight of each area is the area itself, i.e. each bin count will be the sum of all areas within the edges of the bin. Default is False.

Returns:

  • hist (ndarray) – The values of the histogram.

  • bin_edges (ndarray) – The edges of the histogram.

  • bin_centers (ndarray) – The centers of the histogram intervals.

  • areas (ndarray, optional) – A numpy array approximating the area of each feature.
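
The effect of representative_area can be shown with a small sketch. It is written in plain Python for illustration; the helper name area_histogram_sketch is hypothetical, and the weighting mirrors what passing the areas as weights to a histogram routine would do.

```python
from bisect import bisect_right


def area_histogram_sketch(areas, edges, representative_area=False):
    """Histogram of areas, optionally weighting each area by itself."""
    hist = [0.0] * (len(edges) - 1)
    for area in areas:
        if area < edges[0] or area > edges[-1]:
            continue  # outside the histogram range
        index = min(bisect_right(edges, area) - 1, len(hist) - 1)
        # Unweighted: count one sample. Weighted: add the area itself.
        hist[index] += area if representative_area else 1
    return hist


areas = [100, 300, 400, 900]
edges = [0, 500, 1000]
counts = area_histogram_sketch(areas, edges)
area_sums = area_histogram_sketch(areas, edges, representative_area=True)
```

With weighting, the first bin holds 100 + 300 + 400 = 800 rather than a count of 3, so a few large features can dominate bins that contain many small ones.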

tobac.analysis.feature_analysis.histogram_featurewise(Track, variable=None, bin_edges=None, density=False)

Create a histogram of a variable from the features (detected objects at a single time step) of a track. Essentially a wrapper of the numpy.histogram() method.

Parameters:
  • Track (pandas.DataFrame) – The track containing the variable to create the histogram from.

  • variable (string, optional) – Column of the DataFrame with the variable on which the histogram is to be based on. Default is None.

  • bin_edges (int or ndarray, optional) – If bin_edges is an int, it defines the number of equal-width bins in the given range. If bin_edges is a sequence, it defines a monotonically increasing array of bin edges, including the rightmost edge.

  • density (bool, optional) – If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Default is False.

Returns:

  • hist (ndarray) – The values of the histogram

  • bin_edges (ndarray) – The edges of the histogram

  • bin_centers (ndarray) – The centers of the histogram intervals

tobac.analysis.feature_analysis.nearestneighbordistance_histogram(features, bin_edges=array([0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, 10000, 10500, 11000, 11500, 12000, 12500, 13000, 13500, 14000, 14500, 15000, 15500, 16000, 16500, 17000, 17500, 18000, 18500, 19000, 19500, 20000, 20500, 21000, 21500, 22000, 22500, 23000, 23500, 24000, 24500, 25000, 25500, 26000, 26500, 27000, 27500, 28000, 28500, 29000, 29500]), density=False, method_distance=None, return_values=False)

Create a nearest neighbor distance histogram of the features. If the DataFrame does not contain a ‘min_distance’ column, the distances are calculated.

Parameters:
  • bin_edges (int or ndarray, optional) – If bin_edges is an int, it defines the number of equal-width bins in the given range. If bin_edges is an ndarray, it defines a monotonically increasing array of bin edges, including the rightmost edge. Default is np.arange(0, 30000, 500).

  • density (bool, optional) – If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Default is False.

  • method_distance ({None, 'xy', 'latlon'}, optional) – Method of distance calculation. ‘xy’ uses the length of the vector between the two features, ‘latlon’ uses the haversine distance. None checks whether the required coordinates are present and starts with ‘xy’. Default is None.

  • return_values (bool, optional) – Bool determining whether the nearest neighbor distances of the features are returned from this function. Default is False.

Returns:

  • hist (ndarray) – The values of the histogram.

  • bin_edges (ndarray) – The edges of the histogram.

  • distances, optional (ndarray) – A numpy array with the nearest neighbor distances of each feature.
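
The nearest neighbor distance that feeds this histogram can be sketched as follows. This is a brute-force illustration under stated assumptions: the helper name is hypothetical, positions are x/y pairs in metres, and all points are taken to belong to the same time step.

```python
import math


def nearest_neighbor_distances(points):
    """points: list of (x, y) positions in metres for one time step."""
    result = []
    for i, (xi, yi) in enumerate(points):
        others = [
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        ]
        # A lone feature has no neighbor, so its distance is undefined.
        result.append(min(others) if others else math.nan)
    return result


distances = nearest_neighbor_distances([(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)])
```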

tobac.analysis.spatial module

Calculate spatial properties (distances, velocities, areas, volumes) of tracked objects

tobac.analysis.spatial.calculate_area(features, mask, method_area=None, vertical_coord=None)

Calculate the area of the segments for each feature.

Parameters:
  • features (pandas.DataFrame) – DataFrame of the features whose area is to be calculated.

  • mask (iris.cube.Cube) – Cube containing mask (int for tracked volumes 0 everywhere else). Needs to contain either projection_x_coordinate and projection_y_coordinate or latitude and longitude coordinates.

  • method_area ({None, 'xy', 'latlon'}, optional) – Flag determining how the area is calculated. ‘xy’ uses the areas of the individual pixels, ‘latlon’ uses the area_weights method of iris.analysis.cartography, None checks whether the required coordinates are present and starts with ‘xy’. Default is None.

  • vertical_coord (None | str, optional (default: None)) – Name of the vertical coordinate. If None, tries to auto-detect. It looks for the coordinate or the dimension name corresponding to the string.

Returns:

features – DataFrame of the features with a new column ‘area’, containing the calculated areas.

Return type:

pandas.DataFrame

Raises:

ValueError – If neither latitude/longitude nor projection_x_coordinate/projection_y_coordinate are present in mask_coords. If latitude/longitude coordinates are 2D. If latitude/longitude shapes are not supported. If method is undefined, i.e. method is neither None, ‘xy’ nor ‘latlon’.
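
For the ‘xy’ case, the idea of summing pixel areas per feature can be sketched as below. This assumes a uniform grid spacing supplied by the caller; the helper name areas_from_mask and the nested-list mask are hypothetical stand-ins for the iris cube that tobac works with.

```python
from collections import Counter


def areas_from_mask(mask, dx, dy):
    """Area per feature id from a 2D integer mask.

    mask: 2D list of ints (feature id, 0 for background).
    dx, dy: grid spacing in metres, assumed constant across the grid.
    """
    pixel_counts = Counter(
        value for row in mask for value in row if value != 0
    )
    return {feature: n * dx * dy for feature, n in pixel_counts.items()}


mask = [
    [0, 1, 1, 0],
    [2, 1, 0, 0],
    [2, 0, 0, 0],
]
areas = areas_from_mask(mask, dx=500.0, dy=500.0)
```

Feature 1 covers three pixels and feature 2 covers two, so their areas are 750000 and 500000 square metres respectively.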

tobac.analysis.spatial.calculate_areas_2Dlatlon(_2Dlat_coord, _2Dlon_coord)

Calculate an array of cell areas when given two 2D arrays of latitude and longitude values

NOTE: This currently assumes that the lat/lon grid is orthogonal, which is not strictly true! It’s close enough for most cases, but should be updated in future to use the cross product of the distances to the neighbouring cells. This will require the use of a more advanced calculation. I would advise using pyproj at some point in the future to solve this issue and replace haversine distance.

Parameters:
  • _2Dlat_coord (AuxCoord) – Iris auxiliary coordinate containing a 2D grid of latitudes for each point.

  • _2Dlon_coord (AuxCoord) – Iris auxiliary coordinate containing a 2D grid of longitudes for each point.

Returns:

area – A numpy array approximating the area of each cell.

Return type:

ndarray

tobac.analysis.spatial.calculate_distance(feature_1, feature_2, method_distance=None)

Compute the distance between two features. It is based on either lat/lon coordinates or x/y coordinates.

Parameters:
  • feature_1 (pandas.DataFrame or pandas.Series) – Dataframes containing multiple features or pandas.Series of one feature. Need to contain either projection_x_coordinate and projection_y_coordinate or latitude and longitude coordinates.

  • feature_2 (pandas.DataFrame or pandas.Series) – Dataframes containing multiple features or pandas.Series of one feature. Need to contain either projection_x_coordinate and projection_y_coordinate or latitude and longitude coordinates.

  • method_distance ({None, 'xy', 'latlon'}, optional) – Method of distance calculation. ‘xy’ uses the length of the vector between the two features, ‘latlon’ uses the haversine distance. None checks whether the required coordinates are present and starts with ‘xy’. Default is None.

Returns:

distance – Float with the distance between the two features in meters if the input are two pandas.Series containing one feature, pandas.Series of the distances if one of the inputs contains multiple features.

Return type:

float or pandas.Series
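
The behaviour described for method_distance=None (check which coordinates are present, preferring ‘xy’) can be sketched as follows. The helper names choose_method and xy_distance are hypothetical, and plain dicts stand in for a pandas.Series; only the ‘xy’ distance is implemented here.

```python
import math


def choose_method(feature, method_distance=None):
    """Pick 'xy' or 'latlon' depending on the coordinates present."""
    if method_distance is None:
        if "projection_x_coordinate" in feature and "projection_y_coordinate" in feature:
            return "xy"
        if "latitude" in feature and "longitude" in feature:
            return "latlon"
        raise ValueError("feature has neither x/y nor lat/lon coordinates")
    if method_distance not in ("xy", "latlon"):
        raise ValueError("method_distance must be None, 'xy' or 'latlon'")
    return method_distance


def xy_distance(feature_1, feature_2):
    """Length of the vector between two features, in metres."""
    return math.hypot(
        feature_1["projection_x_coordinate"] - feature_2["projection_x_coordinate"],
        feature_1["projection_y_coordinate"] - feature_2["projection_y_coordinate"],
    )


a = {"projection_x_coordinate": 0.0, "projection_y_coordinate": 0.0}
b = {"projection_x_coordinate": 3000.0, "projection_y_coordinate": 4000.0}
method = choose_method(a)
distance = xy_distance(a, b)
```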

tobac.analysis.spatial.calculate_velocity(track, method_distance=None)

Calculate the velocities of a set of linked features.

Parameters:
  • track (pandas.DataFrame) – Dataframe of linked features, containing the columns ‘cell’, ‘time’ and either ‘projection_x_coordinate’ and ‘projection_y_coordinate’ or ‘latitude’ and ‘longitude’.

  • method_distance ({None, 'xy', 'latlon'}, optional) – Method of distance calculation, used to calculate the velocity. ‘xy’ uses the length of the vector between the two features, ‘latlon’ uses the haversine distance. None checks whether the required coordinates are present and starts with ‘xy’. Default is None.

Returns:

track – DataFrame from the input, with an additional column ‘v’, containing the value of the velocity for every feature at every possible timestep.

Return type:

pandas.DataFrame

tobac.analysis.spatial.calculate_velocity_individual(feature_old, feature_new, method_distance=None)

Calculate the mean velocity of a feature between two timeframes.

Parameters:
  • feature_old (pandas.Series) – pandas.Series of a feature at a certain timeframe. Needs to contain a ‘time’ column and either projection_x_coordinate and projection_y_coordinate or latitude and longitude coordinates.

  • feature_new (pandas.Series) – pandas.Series of the same feature at a later timeframe. Needs to contain a ‘time’ column and either projection_x_coordinate and projection_y_coordinate or latitude and longitude coordinates.

  • method_distance ({None, 'xy', 'latlon'}, optional) – Method of distance calculation, used to calculate the velocity. ‘xy’ uses the length of the vector between the two features, ‘latlon’ uses the haversine distance. None checks whether the required coordinates are present and starts with ‘xy’. Default is None.

Returns:

velocity – Value of the approximate velocity.

Return type:

float
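
The mean velocity between two timeframes is the distance travelled divided by the elapsed time. A minimal sketch for the ‘xy’ case follows; the helper name velocity_between is hypothetical and dicts stand in for the pandas.Series inputs.

```python
import math
from datetime import datetime


def velocity_between(feature_old, feature_new):
    """Mean velocity in m/s between two positions of the same feature."""
    dx = feature_new["projection_x_coordinate"] - feature_old["projection_x_coordinate"]
    dy = feature_new["projection_y_coordinate"] - feature_old["projection_y_coordinate"]
    elapsed = (feature_new["time"] - feature_old["time"]).total_seconds()
    return math.hypot(dx, dy) / elapsed


old = {
    "time": datetime(2020, 1, 1, 12, 0),
    "projection_x_coordinate": 0.0,
    "projection_y_coordinate": 0.0,
}
new = {
    "time": datetime(2020, 1, 1, 12, 5),
    "projection_x_coordinate": 3000.0,
    "projection_y_coordinate": 4000.0,
}
velocity = velocity_between(old, new)
```

The feature moves 5000 m in 300 s, which gives a mean velocity of roughly 16.7 m/s.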

tobac.analysis.spatial.haversine(lat1, lon1, lat2, lon2)

Computes the Haversine distance in kilometers.

Calculates the Haversine distance between two points (based on implementation CIS https://github.com/cedadev/cis).

Parameters:
  • lat1 (array of latitude, longitude) – First point or points as array in degrees.

  • lon1 (array of latitude, longitude) – First point or points as array in degrees.

  • lat2 (array of latitude, longitude) – Second point or points as array in degrees.

  • lon2 (array of latitude, longitude) – Second point or points as array in degrees.

Returns:

arclen * RADIUS_EARTH – Array of distance(s) between the two points (or arrays of points) in kilometers.

Return type:

array
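
For reference, the standard haversine formula can be written for scalar inputs as below. This is a sketch rather than tobac's code: the Earth radius of 6371 km is an assumed mean value, and the constant used by the library may differ slightly.

```python
import math

# Assumed mean Earth radius in kilometres; the library's constant may differ.
RADIUS_EARTH_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    arclen = 2 * math.asin(math.sqrt(a))
    return arclen * RADIUS_EARTH_KM


quarter_equator = haversine_km(0.0, 0.0, 0.0, 90.0)
```

Two points on the equator 90 degrees of longitude apart are separated by a quarter of the circumference, i.e. (pi / 2) * 6371 km.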

tobac.centerofgravity module

Identify center of gravity and mass for analysis.

tobac.centerofgravity.calculate_cog(tracks, mass, mask)

Calculate center of gravity and mass for each tracked cell.

Parameters:
  • tracks (pandas.DataFrame) – DataFrame containing trajectories of cell centers.

  • mass (iris.cube.Cube) – Cube of quantity (needs coordinates ‘time’, ‘geopotential_height’, ‘projection_x_coordinate’ and ‘projection_y_coordinate’).

  • mask (iris.cube.Cube) – Cube containing mask (int > 0 where belonging to area/volume of feature, 0 elsewhere).

Returns:

tracks_out – Dataframe containing t, x, y, z positions of center of gravity and total mass of each tracked cell at each timestep.

Return type:

pandas.DataFrame

tobac.centerofgravity.calculate_cog_domain(mass)

Calculate center of gravity and mass for entire domain.

Parameters:

mass (iris.cube.Cube) – Cube of quantity (needs coordinates ‘time’, ‘geopotential_height’, ‘projection_x_coordinate’ and ‘projection_y_coordinate’).

Returns:

tracks_out – Dataframe containing t, x, y, z positions of center of gravity and total mass of the entire domain.

Return type:

pandas.DataFrame

tobac.centerofgravity.calculate_cog_untracked(mass, mask)

Calculate center of gravity and mass for untracked domain parts.

Parameters:
  • mass (iris.cube.Cube) – Cube of quantity (needs coordinates ‘time’, ‘geopotential_height’, ‘projection_x_coordinate’ and ‘projection_y_coordinate’).

  • mask (iris.cube.Cube) – Cube containing mask (int > 0 where belonging to area/volume of feature, 0 elsewhere).

Returns:

tracks_out – Dataframe containing t, x, y, z positions of center of gravity and total mass for untracked part of the domain.

Return type:

pandas.DataFrame

tobac.centerofgravity.center_of_gravity(cube_in)

Calculate center of gravity and sum of quantity.

Parameters:

cube_in (iris.cube.Cube) – Cube (potentially masked) of quantity (needs coordinates ‘geopotential_height’, ‘projection_x_coordinate’ and ‘projection_y_coordinate’).

Returns:

  • x (float) – X position of center of gravity.

  • y (float) – Y position of center of gravity.

  • z (float) – Z position of center of gravity.

  • variable_sum (float) – Sum of quantity over the unmasked part of the cube.
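
The center of gravity is the mass-weighted mean of the coordinates. The sketch below illustrates this for a list of grid cells; the helper name and the tuple-based input are hypothetical, and returning NaN positions for zero total mass is an assumption of this example.

```python
import math


def center_of_gravity_sketch(points):
    """points: list of (x, y, z, mass) tuples for the unmasked grid cells."""
    total = sum(mass for _, _, _, mass in points)
    if total == 0:
        # Assumption: position is undefined when there is no mass.
        return math.nan, math.nan, math.nan, 0.0
    x = sum(px * mass for px, _, _, mass in points) / total
    y = sum(py * mass for _, py, _, mass in points) / total
    z = sum(pz * mass for _, _, pz, mass in points) / total
    return x, y, z, total


points = [(0.0, 0.0, 100.0, 1.0), (10.0, 20.0, 300.0, 3.0)]
x, y, z, variable_sum = center_of_gravity_sketch(points)
```

The heavier cell pulls the result three quarters of the way towards its own position.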

tobac.feature_detection module

Provide feature detection.

This module can work with any two-dimensional field. To identify the features, contiguous regions above or below a threshold are determined and labelled individually. To describe the specific location of the feature at a specific point in time, different spatial properties are used to describe the identified region. [2]_

References

tobac.feature_detection.feature_detection_multithreshold(field_in: iris.cube.Cube, dxy: float = None, threshold: list[float] = None, min_num: int = 0, target: typing_extensions.Literal[maximum, minimum] = 'maximum', position_threshold: typing_extensions.Literal[center, extreme, weighted_diff, weighted abs] = 'center', sigma_threshold: float = 0.5, n_erosion_threshold: int = 0, n_min_threshold: int = 0, min_distance: float = 0, feature_number_start: int = 1, PBC_flag: typing_extensions.Literal[none, hdim_1, hdim_2, both] = 'none', vertical_coord: str = None, vertical_axis: int = None, detect_subset: dict = None, wavelength_filtering: tuple = None, dz: float | None = None, strict_thresholding: bool = False, statistic: dict[str, ~typing.Callable | tuple[~typing.Callable, dict]] | None = None) pandas.DataFrame

Perform feature detection based on contiguous regions.

The regions are above/below a threshold.

Parameters:
  • field_in (iris.cube.Cube) – 2D field to perform the tracking on (needs to have coordinate ‘time’ along one of its dimensions).

  • dxy (float) – Grid spacing of the input data (in meter).

  • threshold (list of floats, optional) – Threshold values used to select target regions to track. The feature detection is inclusive of the threshold value(s), i.e. values greater/less than or equal are included in the target region. Default is None.

  • target ({'maximum', 'minimum'}, optional) – Flag to determine if tracking is targeting minima or maxima in the data. Default is ‘maximum’.

  • position_threshold ({'center', 'extreme', 'weighted_diff', 'weighted_abs'}, optional) – Flag choosing method used for the position of the tracked feature. Default is ‘center’.

  • sigma_threshold (float, optional) – Standard deviation for initial filtering step. Default is 0.5.

  • n_erosion_threshold (int, optional) – Number of pixels by which to erode the identified features. Default is 0.

  • n_min_threshold (int, optional) – Minimum number of identified contiguous pixels for a feature to be detected. Default is 0.

  • min_distance (float, optional) – Minimum distance between detected features (in meters). Default is 0.

  • feature_number_start (int, optional) – Feature id to start with. Default is 1.

  • PBC_flag (str('none', 'hdim_1', 'hdim_2', 'both')) – Sets whether to use periodic boundaries, and if so in which directions. ‘none’ means no periodic boundaries, ‘hdim_1’ means periodic along hdim_1, ‘hdim_2’ means periodic along hdim_2, and ‘both’ means periodic along both horizontal dimensions.

  • vertical_coord (str) – Name of the vertical coordinate. If None, tries to auto-detect. It looks for the coordinate or the dimension name corresponding to the string.

  • vertical_axis (int or None.) – The vertical axis number of the data. If None, uses vertical_coord to determine axis. This must be >=0.

  • detect_subset (dict-like or None) – Whether to run feature detection on only a subset of the data. If this is not None, it will subset the grid that we run feature detection on to the range specified for each axis specified. The format of this dict is: {axis-number: (start, end)}, where axis-number is the number of the axis to subset, start is inclusive, and end is exclusive. For example, if your data are oriented as (time, z, y, x) and you want to only detect on values between z levels 10 and 29, you would set: {1: (10, 30)}.

  • wavelength_filtering (tuple, optional) – Minimum and maximum wavelength for horizontal spectral filtering in meter. Default is None.

  • dz (float) – Constant vertical grid spacing (m), optional. If not specified and the input is 3D, this function requires that altitude is available in the features input. If you specify a value here, this function assumes that it is the constant z spacing between points, even if `z_coordinate_name` is specified.

  • strict_thresholding (bool, optional) – If True, a feature can only be detected if all previous thresholds have been met. Default is False.

Returns:

features – Detected features. The structure of this dataframe is explained here

Return type:

pandas.DataFrame
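
The detect_subset format, {axis-number: (start, end)}, maps naturally onto a tuple of slice objects. The sketch below shows one way such a dict could be turned into slices; the helper name subset_slices is hypothetical and this is not necessarily how tobac applies the subset internally.

```python
def subset_slices(ndim, detect_subset):
    """Build one slice per axis; unlisted axes are kept in full."""
    return tuple(
        slice(*detect_subset[axis]) if axis in detect_subset else slice(None)
        for axis in range(ndim)
    )


# Data oriented as (time, z, y, x); restrict detection to z levels 10..29.
slices = subset_slices(4, {1: (10, 30)})
z_levels = list(range(40))[slices[1]]
```

A tuple like this can be used directly to index a NumPy array, and because the end index is exclusive, (10, 30) selects levels 10 through 29.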

tobac.feature_detection.feature_detection_multithreshold_timestep(data_i: ~numpy.array, i_time: int, threshold: list[float] | None = None, min_num: int = 0, target: typing_extensions.Literal[maximum, minimum] = 'maximum', position_threshold: typing_extensions.Literal[center, extreme, weighted_diff, weighted abs] = 'center', sigma_threshold: float = 0.5, n_erosion_threshold: int = 0, n_min_threshold: int = 0, min_distance: float = 0, feature_number_start: int = 1, PBC_flag: typing_extensions.Literal[none, hdim_1, hdim_2, both] = 'none', vertical_axis: int | None = None, dxy: float = -1, wavelength_filtering: tuple[float] | None = None, strict_thresholding: bool = False, statistic: dict[str, ~typing.Callable | tuple[~typing.Callable, dict]] | None = None) pandas.DataFrame

Find features in each timestep.

Based on iteratively finding regions above/below a set of thresholds. Smoothing the input data with the Gaussian filter makes output less sensitive to noisiness of input data.

Parameters:
  • data_i (iris.cube.Cube) – 3D field to perform the feature detection (single timestep) on.

  • i_time (int) – Number of the current timestep.

  • threshold (list of floats, optional) – Threshold value used to select target regions to track. The feature detection is inclusive of the threshold value(s), i.e. values greater/less than or equal are included in the target region. Default is None.

  • min_num (int, optional) – This parameter is not used in the function. Default is 0.

  • target ({'maximum', 'minimum'}, optional) – Flag to determine if tracking is targeting minima or maxima in the data. Default is ‘maximum’.

  • position_threshold ({'center', 'extreme', 'weighted_diff', 'weighted_abs'}, optional) – Flag choosing method used for the position of the tracked feature. Default is ‘center’.

  • sigma_threshold (float, optional) – Standard deviation for initial filtering step. Default is 0.5.

  • n_erosion_threshold (int, optional) – Number of pixels by which to erode the identified features. Default is 0.

  • n_min_threshold (int, optional) – Minimum number of identified contiguous pixels for a feature to be detected. Default is 0.

  • min_distance (float, optional) – Minimum distance between detected features (in meters). Default is 0.

  • feature_number_start (int, optional) – Feature id to start with. Default is 1.

  • PBC_flag (str('none', 'hdim_1', 'hdim_2', 'both')) – Sets whether to use periodic boundaries, and if so in which directions. ‘none’ means no periodic boundaries, ‘hdim_1’ means periodic along hdim_1, ‘hdim_2’ means periodic along hdim_2, and ‘both’ means periodic along both horizontal dimensions.

  • vertical_axis (int) – The vertical axis number of the data.

  • dxy (float) – Grid spacing in meters.

  • wavelength_filtering (tuple, optional) – Minimum and maximum wavelength for spectral filtering in meters. Default is None.

  • strict_thresholding (bool, optional) – If True, a feature can only be detected if all previous thresholds have been met. Default is False.

  • statistic (dict, optional) – Optional parameter to calculate bulk statistics within feature detection. Dictionary with callable function(s) to apply over the region of each detected feature and the name of the statistics to appear in the feature output dataframe. The functions should be the values and the names of the metrics the keys (e.g. {‘mean’: np.mean}). Default is None.

Returns:

features_threshold – Detected features for individual timestep.

Return type:

pandas.DataFrame

tobac.feature_detection.feature_detection_threshold(data_i: array, i_time: int, threshold: float | None = None, min_num: int = 0, target: typing_extensions.Literal[maximum, minimum] = 'maximum', position_threshold: typing_extensions.Literal[center, extreme, weighted_diff, weighted_abs] = 'center', sigma_threshold: float = 0.5, n_erosion_threshold: int = 0, n_min_threshold: int = 0, min_distance: float = 0, idx_start: int = 0, PBC_flag: typing_extensions.Literal[none, hdim_1, hdim_2, both] = 'none', vertical_axis: int = 0) tuple[pandas.DataFrame, dict]

Find features based on individual threshold value.

Parameters:
  • data_i (np.array) – 2D or 3D field to perform the feature detection (single timestep) on.

  • i_time (int) – Number of the current timestep.

  • threshold (float, optional) – Threshold value used to select target regions to track. The feature detection is inclusive of the threshold value(s), i.e. values greater/less than or equal are included in the target region. Default is None.

  • target ({'maximum', 'minimum'}, optional) – Flag to determine if tracking is targeting minima or maxima in the data. Default is ‘maximum’.

  • position_threshold ({'center', 'extreme', 'weighted_diff', 'weighted_abs'}, optional) – Flag choosing method used for the position of the tracked feature. Default is ‘center’.

  • sigma_threshold (float, optional) – Standard deviation for initial filtering step. Default is 0.5.

  • n_erosion_threshold (int, optional) – Number of pixels by which to erode the identified features. Default is 0.

  • n_min_threshold (int, optional) – Minimum number of identified contiguous pixels for a feature to be detected. Default is 0.

  • min_distance (float, optional) – Minimum distance between detected features (in meters). Default is 0.

  • idx_start (int, optional) – Feature id to start with. Default is 0.

  • PBC_flag ({'none', 'hdim_1', 'hdim_2', 'both'}) – Sets whether to use periodic boundaries, and if so in which directions. ‘none’ means no periodic boundaries, ‘hdim_1’ means periodic along hdim_1, ‘hdim_2’ means periodic along hdim_2, and ‘both’ means periodic along both horizontal dimensions.

  • vertical_axis (int) – The vertical axis number of the data.

Returns:

  • features_threshold (pandas DataFrame) – Detected features for individual threshold.

  • regions (dict) – Dictionary containing the regions above/below threshold used for each feature (feature ids as keys).
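
The inclusive thresholding described for the threshold parameter can be sketched as follows. This is a minimal illustration using NumPy only, not the tobac implementation; the helper name threshold_mask is hypothetical.

```python
import numpy as np

def threshold_mask(data, threshold, target="maximum"):
    """Boolean mask of the points that meet the threshold (inclusive)."""
    if target == "maximum":
        return data >= threshold  # values equal to the threshold are included
    if target == "minimum":
        return data <= threshold
    raise ValueError("target must be 'maximum' or 'minimum'")

# Small hypothetical 2D field for a single timestep.
field = np.array([[0.0, 1.0, 2.0],
                  [1.0, 2.0, 3.0]])
mask_max = threshold_mask(field, 2.0, target="maximum")
mask_min = threshold_mask(field, 1.0, target="minimum")
```

Here the points with value exactly 2.0 belong to mask_max, and the points with value exactly 1.0 belong to mask_min.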

tobac.feature_detection.feature_position(hdim1_indices: list[int], hdim2_indices: list[int], vdim_indices: list[int] | None = None, region_small: ~numpy.ndarray | None = None, region_bbox: list[int] | tuple[int] | None = None, track_data: ~numpy.ndarray | None = None, threshold_i: float | None = None, position_threshold: typing_extensions.Literal[center, extreme, weighted_diff, weighted abs] = 'center', target: typing_extensions.Literal[maximum, minimum] | None = None, PBC_flag: typing_extensions.Literal[none, hdim_1, hdim_2, both] = 'none', hdim1_min: int = 0, hdim1_max: int = 0, hdim2_min: int = 0, hdim2_max: int = 0) tuple[float]

Determine the feature position with regard to the horizontal dimensions, in pixels, from the identified region above the threshold values.

Parameters:
  • hdim1_indices (list) – indices of pixels in region along first horizontal dimension

  • hdim2_indices (list) – indices of pixels in region along second horizontal dimension

  • vdim_indices (list, optional) – List of indices of feature along optional vdim (typically `z`)

  • region_small (2D or 3D array-like) – A true/false array containing True where the threshold is met and False where the threshold isn’t met. This array should be the size specified by region_bbox, and can be a subset of the overall input array (i.e., `track_data`).

  • region_bbox (list or tuple with length of 4 or 6) – The coordinates that region_small occupies within the total track_data array. This is in the order that the coordinates come from the `get_label_props_in_dict` function. For 2D data, this should be: (hdim1 start, hdim 2 start, hdim 1 end, hdim 2 end). For 3D data, this is: (vdim start, hdim1 start, hdim 2 start, vdim end, hdim 1 end, hdim 2 end).

  • track_data (2D or 3D array-like) – 2D or 3D array containing the data

  • threshold_i (float) – The threshold value that we are testing against

  • position_threshold ({'center', 'extreme', 'weighted_diff', 'weighted_abs'}) – How to select the single point position from our data. ‘center’ picks the geometrical centre of the region, and is typically not recommended. ‘extreme’ picks the maximum or minimum value inside the region (max/min set by `target`). ‘weighted_diff’ picks the centre of the region weighted by the distance from the threshold value. ‘weighted_abs’ picks the centre of the region weighted by the absolute values of the field.

  • target ({'maximum', 'minimum'}) – Used only when position_threshold is set to ‘extreme’, this sets whether it is looking for maxima or minima.

  • PBC_flag ({'none', 'hdim_1', 'hdim_2', 'both'}) – Sets whether to use periodic boundaries, and if so in which directions. ‘none’ means no periodic boundaries, ‘hdim_1’ means periodic along hdim_1, ‘hdim_2’ means periodic along hdim_2, and ‘both’ means periodic along both horizontal dimensions.

  • hdim1_min (int) – Minimum real array index of the first horizontal dimension (for PBCs)

  • hdim1_max (int) – Maximum real array index of the first horizontal dimension (for PBCs) Note that this coordinate is INCLUSIVE, meaning that this is the maximum coordinate value, and it is not a length.

  • hdim2_min (int) – Minimum real array index of the second horizontal dimension (for PBCs)

  • hdim2_max (int) – Maximum real array index of the second horizontal dimension (for PBCs) Note that this coordinate is INCLUSIVE, meaning that this is the maximum coordinate value, and it is not a length.

Returns:

If input data is 2D, this will be a 2-element tuple of floats, where the first element is the feature position along the first horizontal dimension and the second element is the feature position along the second horizontal dimension. If input data is 3D, this will be a 3-element tuple of floats, where the first element is the feature position along the vertical dimension and the second two elements are the feature position on the first and second horizontal dimensions. Note for PBCs: the returned position can exceed hdim1_max or hdim2_max if the point lies between the maximum and the minimum index of that dimension, i.e. across the periodic boundary. For example, if a feature lies exactly between hdim1_max and hdim1_min, the output could be between hdim1_max and hdim1_max+1. While a value between hdim1_min-1 and hdim1_min would also be valid, we choose to overflow on the max side of things.

Return type:

2-element or 3-element tuple of floats
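
The four position_threshold options can be compared on a tiny region with the sketch below. It is a simplified reading of the descriptions above for 2D data without periodic boundaries, not the tobac code; the function sketch_position and its inputs are hypothetical.

```python
def sketch_position(points, values, threshold, method="center", target="maximum"):
    """Return (hdim1, hdim2) of one region under a simplified reading of each method.

    points: list of (hdim1, hdim2) index pairs inside the region
    values: field values at those points, in the same order
    """
    if method == "extreme":
        # Position of the single largest (or smallest) value in the region.
        pick = max if target == "maximum" else min
        idx = values.index(pick(values))
        return (float(points[idx][0]), float(points[idx][1]))
    if method == "center":
        weights = [1.0] * len(points)
    elif method == "weighted_diff":
        weights = [abs(v - threshold) for v in values]
    elif method == "weighted_abs":
        weights = [abs(v) for v in values]
    else:
        raise ValueError(f"unknown method: {method}")
    total = sum(weights)
    if total == 0:
        # Degenerate case (all weights zero): fall back to the unweighted mean.
        weights, total = [1.0] * len(points), float(len(points))
    h1 = sum(w * p[0] for w, p in zip(weights, points)) / total
    h2 = sum(w * p[1] for w, p in zip(weights, points)) / total
    return (h1, h2)

points = [(0, 0), (0, 1), (0, 2)]
values = [1.0, 2.0, 4.0]
center = sketch_position(points, values, threshold=1.0, method="center")
extreme = sketch_position(points, values, threshold=1.0, method="extreme")
w_diff = sketch_position(points, values, threshold=1.0, method="weighted_diff")
w_abs = sketch_position(points, values, threshold=1.0, method="weighted_abs")
```

In this example the geometrical centre sits at hdim2 = 1.0, while both weighted variants are pulled towards the strongest value at hdim2 = 2.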

tobac.feature_detection.filter_min_distance(features: pandas.DataFrame, dxy: float | None = None, dz: float | None = None, min_distance: float | None = None, x_coordinate_name: str | None = None, y_coordinate_name: str | None = None, z_coordinate_name: str | None = None, target: typing_extensions.Literal[maximum, minimum] = 'maximum', PBC_flag: typing_extensions.Literal[none, hdim_1, hdim_2, both] = 'none', min_h1: int = 0, max_h1: int = 0, min_h2: int = 0, max_h2: int = 0) pandas.DataFrame

Function to remove features that are too close together. If two features are closer than min_distance, it keeps the larger feature.

Parameters:
  • features (pandas DataFrame) – features

  • dxy (float) – Constant horizontal grid spacing (meters).

  • dz (float) – Constant vertical grid spacing (meters), optional. If not specified and the input is 3D, this function requires that z_coordinate_name is available in the features input. If you specify a value here, this function assumes that it is the constant z spacing between points, even if `z_coordinate_name` is specified.

  • min_distance (float) – minimum distance between detected features (meters)

  • x_coordinate_name (str) – The name of the x coordinate to calculate distance based on in meters. This is typically projection_x_coordinate. Currently unused.

  • y_coordinate_name (str) – The name of the y coordinate to calculate distance based on in meters. This is typically projection_y_coordinate. Currently unused.

  • z_coordinate_name (str or None) – The name of the z coordinate to calculate distance based on in meters. This is typically altitude. If None, tries to auto-detect.

  • target ({'maximum', 'minimum'}, optional) – Flag to determine if tracking is targeting minima or maxima in the data. Default is ‘maximum’.

  • PBC_flag (str('none', 'hdim_1', 'hdim_2', 'both')) – Sets whether to use periodic boundaries, and if so in which directions. ‘none’ means no periodic boundaries, ‘hdim_1’ means periodic along hdim_1, ‘hdim_2’ means periodic along hdim_2, and ‘both’ means periodic along both horizontal dimensions.

  • min_h1 (int, optional) – Minimum real point in hdim_1, for use with periodic boundaries.

  • max_h1 (int, optional) – Maximum point in hdim_1, exclusive. max_h1-min_h1 should be the size.

  • min_h2 (int, optional) – Minimum real point in hdim_2, for use with periodic boundaries.

  • max_h2 (int, optional) – Maximum point in hdim_2, exclusive. max_h2-min_h2 should be the size.

Returns:

features after filtering

Return type:

pandas DataFrame
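
How periodic boundaries change the distance test can be sketched for the 2D case as follows. This is an illustration under stated assumptions (feature positions given in grid indices, constant dxy, max_h1 and max_h2 exclusive as described above), not the tobac implementation; both helper names are hypothetical.

```python
import math

def periodic_delta(a, b, size=None):
    """Shortest separation along one axis; wraps around when size is given."""
    delta = abs(a - b)
    if size is not None:
        delta = min(delta, size - delta)
    return delta

def too_close(p1, p2, dxy, min_distance, pbc_flag="none",
              min_h1=0, max_h1=0, min_h2=0, max_h2=0):
    """True if two (hdim_1, hdim_2) index positions are closer than min_distance."""
    # The domain size is max - min because the maximum is exclusive.
    size_h1 = (max_h1 - min_h1) if pbc_flag in ("hdim_1", "both") else None
    size_h2 = (max_h2 - min_h2) if pbc_flag in ("hdim_2", "both") else None
    d1 = periodic_delta(p1[0], p2[0], size_h1)
    d2 = periodic_delta(p1[1], p2[1], size_h2)
    return math.hypot(d1, d2) * dxy < min_distance

# Two features near opposite edges of a 100-point domain, 1 km grid spacing.
plain = too_close((1, 5), (99, 5), dxy=1000.0, min_distance=5000.0)
wrapped = too_close((1, 5), (99, 5), dxy=1000.0, min_distance=5000.0,
                    pbc_flag="hdim_1", min_h1=0, max_h1=100)
```

Without periodic boundaries the two features are 98 km apart. With PBC_flag set to ‘hdim_1’ they are only 2 km apart across the boundary and fall below the 5 km minimum distance.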

tobac.feature_detection.remove_parents(features_thresholds: pandas.DataFrame, regions_i: dict, regions_old: dict, strict_thresholding: bool = False) pandas.DataFrame

Remove parents of newly detected feature regions.

Remove features whose regions surround newly detected feature regions.

Parameters:
  • features_thresholds (pandas.DataFrame) – Dataframe containing detected features.

  • regions_i (dict) – Dictionary containing the regions greater/lower than and equal to threshold for the newly detected feature (feature ids as keys).

  • regions_old (dict) – Dictionary containing the regions greater/lower than and equal to threshold from previous threshold (feature ids as keys).

  • strict_thresholding (Bool, optional) – If True, a feature can only be detected if all previous thresholds have been met. Default is False.

Returns:

features_thresholds – Dataframe containing detected features excluding those that are superseded by newly detected ones.

Return type:

pandas.DataFrame
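
The idea of dropping superseded parent features can be sketched with plain sets. This is a hedged illustration that ignores strict_thresholding and uses a list of ids in place of the DataFrame; the function remove_parents_sketch is hypothetical and is not the tobac implementation.

```python
def remove_parents_sketch(feature_ids, regions_i, regions_old):
    """Return the feature ids that survive after dropping superseded parents.

    regions_i / regions_old map feature id -> iterable of index tuples.
    """
    # Collect every point covered by a newly detected region.
    new_points = set()
    for region in regions_i.values():
        new_points.update(region)
    # A parent is an older feature whose region contains any of those points.
    parents = {
        fid for fid, region in regions_old.items()
        if not new_points.isdisjoint(region)
    }
    return [fid for fid in feature_ids if fid not in parents]

regions_old = {1: [(0, 0), (0, 1), (0, 2), (0, 3)], 2: [(5, 5)]}
regions_i = {3: [(0, 1)], 4: [(0, 3)]}
kept = remove_parents_sketch([1, 2, 3, 4], regions_i, regions_old)
```

Feature 1 is removed because the new features 3 and 4 lie inside its region. Feature 2 is kept because no new region touches it.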

tobac.feature_detection.test_overlap(region_inner: list[tuple[int]], region_outer: list[tuple[int]]) bool

Test for overlap between two regions

Parameters:
  • region_inner (list) – list of 2-element tuples defining the indices of all cells in the region

  • region_outer (list) – list of 2-element tuples defining the indices of all cells in the region

Returns:

overlap – True if there are any shared points between the two regions

Return type:

bool
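
A shared-point test of this kind can be written with a set lookup. The sketch below only illustrates the documented behaviour; the function name regions_overlap is hypothetical.

```python
def regions_overlap(region_inner, region_outer):
    """True if the two lists of index tuples share at least one point."""
    return not set(region_inner).isdisjoint(region_outer)

overlapping = regions_overlap([(1, 1), (1, 2)], [(1, 2), (3, 3)])
separate = regions_overlap([(1, 1)], [(2, 2)])
```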

tobac.merge_split module

Tobac merge and split. This submodule is a post-processing step to address tracked cells which merge or split. The first iteration of this module combines cells that merge but receive a new cell id (and are considered a new cell) once merged. In general, this submodule labels merged/split cells with a TRACK number in addition to their CELL number.

tobac.merge_split.merge_split_MEST(TRACK, dxy, distance=None, frame_len=5)

Function to postprocess tobac track data for merge/split cells using a minimum Euclidean spanning tree.

Parameters:
  • TRACK (pandas.core.frame.DataFrame) – Pandas dataframe of tobac Track information

  • dxy (float, mandatory) – The x/y grid spacing of the data. Should be in meters.

  • distance (float, optional) – Distance threshold determining how close two features must be in order to consider merge/splitting. Default is 25x the x/y grid spacing of the data, given in dxy. The distance should be in units of meters.

  • frame_len (float, optional) – Threshold for the maximum number of frames that can separate the end of cell and the start of a related cell. Default is five (5) frames.

Returns:

d – xarray dataset of tobac merge/split cells with parent and child designations.

Parent/child variables include:

  • cell_parent_track_id: The associated track id for each cell. All cells that have merged or split will have the same parent track id. If a cell never merges/splits, only one cell will have a particular track id.

  • feature_parent_cell_id: The associated parent cell id for each feature. All features in a given cell will have the same cell id. This is the original TRACK cell_id.

  • feature_parent_track_id: The associated parent track id for each feature. This is not the same as the cell id number.

  • track_child_cell_count: The total number of features belonging to all child cells of a given track id.

  • cell_child_feature_count: The total number of features for each cell.

Return type:

xarray.core.dataset.Dataset

Example usage:

    import os
    import xarray as xr
    import tobac
    from tobac.merge_split import merge_split_MEST

    d = merge_split_MEST(Track, dxy)
    ds = tobac.utils.standardize_track_dataset(Track, refl_mask)
    both_ds = xr.merge([ds, d], compat='override')
    both_ds = tobac.utils.compress_all(both_ds)
    both_ds.to_netcdf(os.path.join(savedir, 'Track_features_merges.nc'))

tobac.plotting module

Provide methods for plotting analyzed data.

Plotting routines including both visualizations for the entire dataset including all tracks, and detailed visualizations for individual cells and their properties.

tobac.plotting.animation_mask_field(track, features, field, mask, interval=500, figsize=(10, 10), **kwargs)

Create animation of field, features and segments of all timeframes.

Parameters:
  • track (pandas.DataFrame) – Output of linking_trackpy.

  • features (pandas.DataFrame) – Output of the feature detection.

  • field (iris.cube.Cube) – Original input data.

  • mask (iris.cube.Cube) – Cube containing mask (int id for tracked volumes, 0 everywhere else), output of the segmentation step.

  • interval (int, optional) – Delay between frames in milliseconds. Default is 500.

  • figsize (tuple of float, optional) – Width, height of the plot in inches. Default is (10, 10).

  • **kwargs

Returns:

animation – Created animation as object.

Return type:

matplotlib.animation.FuncAnimation

tobac.plotting.make_map(axes)

Configure the parameters of cartopy for plotting.

Parameters:

axes (cartopy.mpl.geoaxes.GeoAxesSubplot) – GeoAxesSubplot to configure.

Returns:

axes – Configured cartopy axes.

Return type:

cartopy.mpl.geoaxes.GeoAxesSubplot

tobac.plotting.map_tracks(track, axis_extent=None, figsize=None, axes=None, untracked_cell_value=-1)

Plot the trajectories of the cells on a map.

Parameters:
  • track (pandas.DataFrame) – Dataframe containing the linked features with a column ‘cell’.

  • axis_extent (matplotlib.axes, optional) – Array containing the bounds of the longitude and latitude values. The structure is [long_min, long_max, lat_min, lat_max]. Default is None.

  • figsize (tuple of floats, optional) – Width, height of the plot in inches. Default is (10, 10).

  • axes (cartopy.mpl.geoaxes.GeoAxesSubplot, optional) – GeoAxesSubplot to use for plotting. Default is None.

  • untracked_cell_value (int or np.nan, optional) – Value of untracked cells in track[‘cell’]. Default is -1.

Returns:

axes – Axes with the plotted trajectories.

Return type:

cartopy.mpl.geoaxes.GeoAxesSubplot

Raises:

ValueError – If no axes are passed.

tobac.plotting.plot_histogram_cellwise(track, bin_edges, variable, quantity, axes=None, density=False, **kwargs)

Plot the histogram of a variable based on the cells.

Parameters:
  • track (pandas.DataFrame) – DataFrame of the features containing the variable as column and a column ‘cell’.

  • bin_edges (int or ndarray) – If bin_edges is an int, it defines the number of equal-width bins in the given range. If bin_edges is a sequence, it defines a monotonically increasing array of bin edges, including the rightmost edge.

  • variable (string) – Column of the DataFrame with the variable on which the histogram is based. Default is None.

  • quantity ({'max', 'min', 'mean'}, optional) – Flag determining whether to use maximum, minimum or mean of a variable from all timeframes the cell covers. Default is ‘max’.

  • axes (matplotlib.axes.Axes, optional) – Matplotlib axes to plot on. Default is None.

  • density (bool, optional) – If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Default is False.

  • **kwargs

Returns:

plot_hist – List containing the matplotlib.lines.Line2D instance of the histogram

Return type:

list
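
The effect of the quantity flag can be sketched as a per-cell reduction before the histogram is built. This is a minimal illustration in plain Python, with a list of dicts standing in for the track DataFrame; the function aggregate_per_cell and the column name ‘w’ are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def aggregate_per_cell(rows, variable, quantity="max"):
    """Collapse per-feature rows to one value per cell."""
    reducers = {"max": max, "min": min, "mean": mean}
    if quantity not in reducers:
        raise ValueError("quantity must be 'max', 'min' or 'mean'")
    by_cell = defaultdict(list)
    for row in rows:
        by_cell[row["cell"]].append(row[variable])
    return {cell: reducers[quantity](vals) for cell, vals in by_cell.items()}

# Hypothetical features: cell 1 spans two timeframes, cell 2 spans one.
rows = [
    {"cell": 1, "w": 2.0},
    {"cell": 1, "w": 5.0},
    {"cell": 2, "w": 3.0},
]
per_cell_max = aggregate_per_cell(rows, "w", quantity="max")
per_cell_mean = aggregate_per_cell(rows, "w", quantity="mean")
```

Each cell contributes exactly one value to the histogram, whereas the featurewise variant would count every row.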

tobac.plotting.plot_histogram_featurewise(Track, bin_edges, variable, axes=None, density=False, **kwargs)

Plot the histogram of a variable based on the features.

Parameters:
  • Track (pandas.DataFrame) – DataFrame of the features containing the variable as column.

  • bin_edges (int or ndarray) – If bin_edges is an int, it defines the number of equal-width bins in the given range. If bin_edges is a sequence, it defines a monotonically increasing array of bin edges, including the rightmost edge.

  • variable (str) – Column of the DataFrame with the variable on which the histogram is based.

  • axes (matplotlib.axes.Axes, optional) – Matplotlib axes to plot on. Default is None.

  • density (bool, optional) – If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Default is False.

  • **kwargs

Returns:

plot_hist – List containing the matplotlib.lines.Line2D instance of the histogram

Return type:

list

tobac.plotting.plot_lifetime_histogram(track, axes=None, bin_edges=array([0, 20, 40, 60, 80, 100, 120, 140, 160, 180]), density=False, **kwargs)

Plot the lifetime histogram of the cells.

Parameters:
  • track (pandas.DataFrame) – DataFrame of the features containing the columns ‘cell’ and ‘time_cell’.

  • axes (matplotlib.axes.Axes, optional) – Matplotlib axes to plot on. Default is None.

  • bin_edges (int or ndarray, optional) – If bin_edges is an int, it defines the number of equal-width bins in the given range. If bin_edges is a sequence, it defines a monotonically increasing array of bin edges, including the rightmost edge. Default is np.arange(0, 200, 20).

  • density (bool, optional) – If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Default is False.

  • **kwargs

Returns:

plot_hist – List containing the matplotlib.lines.Line2D instance of the histogram

Return type:

list
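
The default bin_edges and the density flag can be illustrated with NumPy alone. The lifetimes below are hypothetical values, and treating them as minutes is an assumption; the sketch does not call tobac.

```python
import numpy as np

# Default bin edges: 0, 20, ..., 180 (the stop value 200 is excluded).
bin_edges = np.arange(0, 200, 20)

# Hypothetical cell lifetimes, assumed here to be in minutes.
lifetimes = np.array([5, 15, 25, 45, 45, 170])

# density=False gives the number of cells per bin.
counts, edges = np.histogram(lifetimes, bins=bin_edges, density=False)

# density=True normalizes so that the integral over the range is 1.
dens, _ = np.histogram(lifetimes, bins=bin_edges, density=True)
area = float(np.sum(dens * np.diff(edges)))
```

With these edges the last bin ends at 180, so longer lifetimes would fall outside the histogram unless wider bin_edges are passed.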

tobac.plotting.plot_lifetime_histogram_bar(track, axes=None, bin_edges=array([0, 20, 40, 60, 80, 100, 120, 140, 160, 180]), density=False, width_bar=1, shift=0.5, **kwargs)

Plot the lifetime histogram of the cells as a bar plot.

Parameters:
  • track (pandas.DataFrame) – DataFrame of the features containing the columns ‘cell’ and ‘time_cell’.

  • axes (matplotlib.axes.Axes, optional) – Matplotlib axes to plot on. Default is None.

  • bin_edges (int or ndarray, optional) – If bin_edges is an int, it defines the number of equal-width bins in the given range. If bin_edges is a sequence, it defines a monotonically increasing array of bin edges, including the rightmost edge.

  • density (bool, optional) – If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Default is False.

  • width_bar (float) – Width of the bars. Default is 1.

  • shift (float) – Value to shift the bin centers to the right. Default is 0.5.

  • **kwargs

Returns:

plot_hist – matplotlib.container.BarContainer instance of the histogram

Return type:

matplotlib.container.BarContainer

tobac.plotting.plot_mask_cell_individual_3Dstatic(cell_i, track, cog, features, mask_total, field_contour, field_filled, axes=None, xlim=None, ylim=None, label_field_contour=None, cmap_field_contour='Blues', norm_field_contour=None, linewidths_contour=0.8, contour_labels=False, vmin_field_contour=0, vmax_field_contour=50, levels_field_contour=None, nlevels_field_contour=10, label_field_filled=None, cmap_field_filled='summer', norm_field_filled=None, vmin_field_filled=0, vmax_field_filled=100, levels_field_filled=None, nlevels_field_filled=10, title=None, feature_number=False, ele=10.0, azim=210.0)

Make plots for a cell in a fixed frame, with one background field as filling and one background field as contours.

tobac.plotting.plot_mask_cell_individual_follow(cell_i, track, cog, features, mask_total, field_contour, field_filled, axes=None, width=10000, label_field_contour=None, cmap_field_contour='Blues', norm_field_contour=None, linewidths_contour=0.8, contour_labels=False, vmin_field_contour=0, vmax_field_contour=50, levels_field_contour=None, nlevels_field_contour=10, label_field_filled=None, cmap_field_filled='summer', norm_field_filled=None, vmin_field_filled=0, vmax_field_filled=100, levels_field_filled=None, nlevels_field_filled=10, title=None)

Make an individual plot for a cell, centred around the cell, with one background field as filling and one background field as contours.

tobac.plotting.plot_mask_cell_individual_static(cell_i, track, cog, features, mask_total, field_contour, field_filled, axes=None, xlim=None, ylim=None, label_field_contour=None, cmap_field_contour='Blues', norm_field_contour=None, linewidths_contour=0.8, contour_labels=False, vmin_field_contour=0, vmax_field_contour=50, levels_field_contour=None, nlevels_field_contour=10, label_field_filled=None, cmap_field_filled='summer', norm_field_filled=None, vmin_field_filled=0, vmax_field_filled=100, levels_field_filled=None, nlevels_field_filled=10, title=None, feature_number=False)

Make plots for a cell in a fixed frame, with one background field as filling and one background field as contours.

tobac.plotting.plot_mask_cell_track_2D3Dstatic(cell, track, cog, features, mask_total, field_contour, field_filled, width=10000, n_extend=1, name='test', plotdir='./', file_format=['png'], figsize=(3.937007874015748, 3.937007874015748), dpi=300, ele=10, azim=30, **kwargs)

Make plots for all cells with a fixed frame including the entire development of the cell, with one background field as filling and one background field as contours.

tobac.plotting.plot_mask_cell_track_3Dstatic(cell, track, cog, features, mask_total, field_contour, field_filled, width=10000, n_extend=1, name='test', plotdir='./', file_format=['png'], figsize=(3.937007874015748, 3.937007874015748), dpi=300, **kwargs)

Make plots for all cells with a fixed frame including the entire development of the cell, with one background field as filling and one background field as contours.

tobac.plotting.plot_mask_cell_track_follow(cell, track, cog, features, mask_total, field_contour, field_filled, width=10000, name='test', plotdir='./', file_format=['png'], figsize=(3.937007874015748, 3.937007874015748), dpi=300, **kwargs)

Make plots for all cells, centred around the cell, with one background field as filling and one background field as contours.

tobac.plotting.plot_mask_cell_track_static(cell, track, cog, features, mask_total, field_contour, field_filled, width=10000, n_extend=1, name='test', plotdir='./', file_format=['png'], figsize=(3.937007874015748, 3.937007874015748), dpi=300, **kwargs)

Make plots for all cells with a fixed frame including the entire development of the cell, with one background field as filling and one background field as contours.

tobac.plotting.plot_mask_cell_track_static_timeseries(cell, track, cog, features, mask_total, field_contour, field_filled, track_variable=None, variable=None, variable_ylabel=None, variable_label=[None], variable_legend=False, variable_color=None, width=10000, n_extend=1, name='test', plotdir='./', file_format=['png'], figsize=(7.874015748031496, 3.937007874015748), dpi=300, **kwargs)

Make plots for all cells with a fixed frame including the entire development of the cell, with one background field as filling and one background field as contours.

tobac.plotting.plot_tracks_mask_field(track, field, mask, features, axes=None, axis_extent=None, plot_outline=True, plot_marker=True, marker_track='x', markersize_track=4, plot_number=True, plot_features=False, marker_feature=None, markersize_feature=None, title=None, title_str=None, vmin=None, vmax=None, n_levels=50, cmap='viridis', extend='neither', orientation_colorbar='horizontal', pad_colorbar=0.05, label_colorbar=None, fraction_colorbar=0.046, rasterized=True, linewidth_contour=1)

Plot field, features and segments of a timeframe and on a map projection. It is required to pass vmin, vmax, axes and axis_extent as keyword arguments.

Parameters:
  • track (pandas.DataFrame) – One or more timeframes of a dataframe generated by linking_trackpy.

  • field (iris.cube.Cube) – One frame/time step of the original input data.

  • mask (iris.cube.Cube) – One frame/time step of the Cube containing mask (int id for tracked volumes 0 everywhere else), output of the segmentation step.

  • features (pandas.DataFrame) – Output of the feature detection, one or more frames/time steps.

  • axes (cartopy.mpl.geoaxes.GeoAxesSubplot) – GeoAxesSubplot to use for plotting. Default is None.

  • axis_extent (ndarray) – Array containing the bounds of the longitude and latitude values. The structure is [long_min, long_max, lat_min, lat_max]. Default is None.

  • plot_outline (bool, optional) – Boolean defining whether the outlines of the segments are plotted. Default is True.

  • plot_marker (bool, optional) – Boolean defining whether the positions of the features from the track dataframe are plotted. Default is True.

  • marker_track (str, optional) – String defining the shape of the marker for the feature positions from the track dataframe. Default is ‘x’.

  • markersize_track (int, optional) – Int defining the size of the marker for the feature positions from the track dataframe. Default is 4.

  • plot_number (bool, optional) – Boolean defining whether the index of the cells is plotted next to the individual feature position. Default is True.

  • plot_features (bool, optional) – Boolean defining whether the positions of the features from the features dataframe are plotted. Default is False.

  • marker_feature (optional) – String defining the shape of the marker for the feature positions from the features dataframe. Default is None.

  • markersize_feature (optional) – Int defining the size of the marker for the feature positions from the features dataframe. Default is None.

  • title (str, optional) – Flag determining the title of the plot. ‘datestr’ uses date and time of the field. None sets no title. Default is None.

  • title_str (str, optional) – Additional string added to the beginning of the title. Default is None.

  • vmin (float) – Lower bound of the colorbar. Default is None.

  • vmax (float) – Upper bound of the colorbar. Default is None.

  • n_levels (int, optional) – Number of levels of the contour plot of the field. Default is 50.

  • cmap ({'viridis',...}, optional) – Colormap of the contour plot of the field; see matplotlib.colors. Default is ‘viridis’.

  • extend (str, optional) – Determines the coloring of values that are outside the levels range. If ‘neither’, values outside the levels range are not colored. If ‘min’, ‘max’ or ‘both’, color the values below, above or below and above the levels range. Values below min(levels) and above max(levels) are mapped to the under/over values of the Colormap. Default is ‘neither’.

  • orientation_colorbar (str, optional) – Orientation of the colorbar, ‘horizontal’ or ‘vertical’. Default is ‘horizontal’.

  • pad_colorbar (float, optional) – Fraction of original axes between colorbar and new image axes. Default is 0.05.

  • label_colorbar (str, optional) – Label of the colorbar. If none, name and unit of the field are used. Default is None.

  • fraction_colorbar (float, optional) – Fraction of original axes to use for colorbar. Default is 0.046.

  • rasterized (bool, optional) – True enables, False disables rasterization. Default is True.

  • linewidth_contour (int, optional) – Linewidth of the contour plot of the segments. Default is 1.

Returns:

axes – Axes with the plot.

Return type:

cartopy.mpl.geoaxes.GeoAxesSubplot

Raises:

ValueError – If axes are not cartopy.mpl.geoaxes.GeoAxesSubplot. If mask.ndim is neither 2 nor 3.

tobac.plotting.plot_tracks_mask_field_loop(track, field, mask, features, axes=None, name=None, plot_dir='./', figsize=(3.937007874015748, 3.937007874015748), dpi=300, margin_left=0.05, margin_right=0.05, margin_bottom=0.05, margin_top=0.05, **kwargs)

Plot field, feature positions and segments onto individual maps for all timeframes and save them as pngs.

Parameters:
  • track (pandas.DataFrame) – Output of linking_trackpy.

  • field (iris.cube.Cube) – Original input data.

  • mask (iris.cube.Cube) – Cube containing mask (int id for tracked volumes, 0 everywhere else). Output of the segmentation step.

  • features (pandas.DataFrame) – Output of the feature detection.

  • axes (cartopy.mpl.geoaxes.GeoAxesSubplot, optional) – Not used. Default is None.

  • name (str, optional) – Filename without file extension. Same for all pngs. If None, the name of the field is used. Default is None.

  • plot_dir (str, optional) – Path where the plots will be saved. Default is ‘./’.

  • figsize (tuple of floats, optional) – Width, height of the plot in inches. Default is (10/2.54, 10/2.54).

  • dpi (int, optional) – Plot resolution. Default is 300.

  • margin_left (float, optional) – The position of the left edge of the axes, as a fraction of the figure width. Default is 0.05.

  • margin_right (float, optional) – The position of the right edge of the axes, as a fraction of the figure width. Default is 0.05.

  • margin_bottom (float, optional) – The position of the bottom edge of the axes, as a fraction of the figure height. Default is 0.05.

  • margin_top (float, optional) – The position of the top edge of the axes, as a fraction of the figure height. Default is 0.05.

  • **kwargs

Return type:

None
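
The default figsize of (10/2.54, 10/2.54) corresponds to a 10 cm by 10 cm figure, since matplotlib expects inches. A small sketch of the conversion, with the hypothetical helper cm_to_inches:

```python
CM_PER_INCH = 2.54

def cm_to_inches(*lengths_cm):
    """Convert lengths in centimetres to a tuple of lengths in inches."""
    return tuple(length / CM_PER_INCH for length in lengths_cm)

square = cm_to_inches(10, 10)  # matches the default figsize in the signature
wide = cm_to_inches(20, 10)    # a 20 cm by 10 cm figure
```

The resulting tuple can be passed as the figsize argument.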

tobac.segmentation module

Provide segmentation techniques.

Segmentation techniques are used to associate areas or volumes to each identified feature. The segmentation is implemented using watershedding techniques from the field of image processing with a fixed threshold value. This value has to be set specifically for every type of input data and application. The segmentation can be performed for both two-dimensional and three-dimensional data. At each timestep, a marker is set at the position (weighted mean center) of each feature identified in the detection step in an array otherwise filled with zeros. In the case of three-dimensional watershedding, all cells in the column above the weighted mean center position of the identified features fulfilling the threshold condition are set to the respective marker. The algorithm then fills the area (2D) or volume (3D) based on the input field starting from these markers until reaching the threshold. If two or more features are directly connected, the border runs along the watershed line between the two regions. This procedure creates a mask that has the same form as the input data, with the corresponding integer number at all grid points that belong to a feature, else with zero. This mask can be conveniently and efficiently used to select the volume of each feature at a specific time step for further analysis or visualization.
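
The marker-based filling described above can be sketched in plain Python as a priority flood for a 2D field with target ‘maximum’. This is a simplified stand-in for illustration, not the routine tobac calls; the function watershed_sketch, the 4-neighbour connectivity and the strict comparison against the threshold are assumptions.

```python
import heapq

def watershed_sketch(field, markers, threshold):
    """Grow labelled regions from markers, highest field values first.

    field: 2D list of floats
    markers: dict mapping (row, col) -> positive integer feature id
    Returns a 2D list with the feature id inside each region and 0 elsewhere.
    """
    n_rows, n_cols = len(field), len(field[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    heap, counter = [], 0
    for (row, col), feature_id in markers.items():
        labels[row][col] = feature_id
        # Negate the value so the largest field value is processed first.
        heapq.heappush(heap, (-field[row][col], counter, row, col))
        counter += 1
    while heap:
        _, _, row, col = heapq.heappop(heap)
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = row + d_row, col + d_col
            inside = 0 <= r < n_rows and 0 <= c < n_cols
            if inside and labels[r][c] == 0 and field[r][c] > threshold:
                labels[r][c] = labels[row][col]
                heapq.heappush(heap, (-field[r][c], counter, r, c))
                counter += 1
    return labels

markers = {(0, 0): 1, (0, 4): 2}
separated = watershed_sketch([[5.0, 4.0, 1.0, 4.0, 6.0]], markers, threshold=2.0)
connected = watershed_sketch([[5.0, 4.0, 3.0, 4.0, 6.0]], markers, threshold=2.0)
```

In the first field the point below the threshold stays 0 and separates the two features. In the second field the features are directly connected, so every point is assigned to one of the two ids.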

tobac.segmentation.add_markers(features: pandas.DataFrame, marker_arr: array, seed_3D_flag: typing_extensions.Literal[column, box], seed_3D_size: int | tuple[int] = 5, level: None | slice = None, PBC_flag: typing_extensions.Literal[none, hdim_1, hdim_2, both] = 'none') array

Adds markers for watershedding using the features dataframe to the marker_arr.

Parameters:
  • features (pandas.DataFrame) – Features for one point in time to add as markers.

  • marker_arr (2D or 3D array-like) – Array to add the markers to. Assumes a (z, y, x) configuration.

  • seed_3D_flag (str('column', 'box')) – Seed 3D field at feature positions with either the full column or a box of user-set size

  • seed_3D_size (int or tuple (dimensions equal to dimensions of field)) – This sets the size of the seed box when seed_3D_flag is ‘box’. If it’s an integer (units of number of pixels), the seed box is identical in all dimensions. If it’s a tuple, it specifies the seed area for each dimension separately, in units of pixels. Note: we strongly recommend the use of odd numbers for this. If you give an even number, your seed box will be biased and not centered around the feature. Note: if two seed boxes overlap, the feature that is seeded will be the closer feature.

  • level (slice or None) – If seed_3D_flag is ‘column’, the levels at which to seed the cells for the watershedding algorithm. If None, seeds all levels.

  • PBC_flag ({'none', 'hdim_1', 'hdim_2', 'both'}) – Sets whether to use periodic boundaries, and if so in which directions. ‘none’ means no periodic boundaries, ‘hdim_1’ means periodic along hdim_1, ‘hdim_2’ means periodic along hdim_2, and ‘both’ means periodic along both horizontal dimensions.

Returns:

The marker array

Return type:

2D or 3D array like (same type as marker_arr)
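
The recommendation to use odd values for seed_3D_size can be seen in a small sketch. The helper seed_box_bounds is hypothetical and the rounding convention (box starting at center - size // 2) is an assumption; tobac's own indexing may differ in detail.

```python
def seed_box_bounds(center, size, axis_len):
    """Return (start, end) indices (end exclusive) of a seed box along one axis."""
    start = center - size // 2
    end = start + size
    # Clip to the array; periodic boundaries are ignored in this sketch.
    return max(start, 0), min(end, axis_len)


# Odd size: indices 8..12, two points on either side of the center at 10.
odd = seed_box_bounds(center=10, size=5, axis_len=100)
# Even size: indices 8..11, two points below the center but only one above.
even = seed_box_bounds(center=10, size=4, axis_len=100)
```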

tobac.segmentation.check_add_unseeded_across_bdrys(dim_to_run: str, segmentation_mask: array, unseeded_labels: array, border_min: int, border_max: int, markers_arr: array, inplace: bool = True) array

Add new markers to unseeded but eligible regions when they are bordering an appropriate boundary.

Parameters:
  • dim_to_run ({'hdim_1', 'hdim_2'}) – what dimension to run

  • segmentation_mask (np.array) – the incoming segmentation mask

  • unseeded_labels (np.array) – The list of labels that are unseeded

  • border_min (int) – minimum real point in the dimension we are running on

  • border_max (int) – maximum real point in the dimension we are running on (inclusive)

  • markers_arr (np.array) – The array of markers to re-run segmentation with

  • inplace (bool) – whether or not to modify markers_arr in place

Return type:

markers_arr with new markers added

tobac.segmentation.segmentation(features: pandas.DataFrame, field: iris.cube.Cube, dxy: float, threshold: float = 0.003, target: typing_extensions.Literal[maximum, minimum] = 'maximum', level: None | slice = None, method: typing_extensions.Literal[watershed] = 'watershed', max_distance: None | float = None, vertical_coord: str | None = None, PBC_flag: typing_extensions.Literal[none, hdim_1, hdim_2, both] = 'none', seed_3D_flag: typing_extensions.Literal[column, box] = 'column', seed_3D_size: int | tuple[int] = 5, segment_number_below_threshold: int = 0, segment_number_unassigned: int = 0, statistic: dict[str, Callable | tuple[Callable, dict]] | None = None) tuple[iris.cube.Cube, pandas.DataFrame]

Use watershedding to determine region above a threshold value around initial seeding position for all time steps of the input data. Works both in 2D (based on single seeding point) and 3D and returns a mask with zeros everywhere around the identified regions and the feature id inside the regions.

Calls segmentation_timestep at each individual timestep of the input data.

Parameters:
  • features (pandas.DataFrame) – Output from trackpy/maketrack.

  • field (iris.cube.Cube) – Containing the field to perform the watershedding on.

  • dxy (float) – Grid spacing of the input data in meters.

  • threshold (float, optional) – Threshold for the watershedding field to be used for the mask. Default is 3e-3.

  • target ({'maximum', 'minimum'}, optional) – Flag to determine if tracking is targeting minima or maxima in the data. Default is ‘maximum’.

  • level (slice of iris.cube.Cube, optional) – Levels at which to seed the cells for the watershedding algorithm. Default is None.

  • method ({'watershed'}, optional) – Flag determining the algorithm to use (currently watershedding implemented). ‘random_walk’ could be uncommented.

  • max_distance (float, optional) – Maximum distance from a marker allowed to be classified as belonging to that cell in meters. Default is None.

  • vertical_coord ({'auto', 'z', 'model_level_number', 'altitude', 'geopotential_height'}, optional) – Name of the vertical coordinate for use in the 3D segmentation case.

  • PBC_flag ({'none', 'hdim_1', 'hdim_2', 'both'}) – Sets whether to use periodic boundaries, and if so in which directions. ‘none’: no periodic boundaries; ‘hdim_1’: periodic along hdim_1; ‘hdim_2’: periodic along hdim_2; ‘both’: periodic along both horizontal dimensions.

  • seed_3D_flag (str('column', 'box')) – Seed 3D field at feature positions with either the full column (default) or a box of user-set size

  • seed_3D_size (int or tuple (dimensions equal to dimensions of field)) – This sets the size of the seed box when seed_3D_flag is ‘box’. If it’s an integer (units of number of pixels), the seed box is identical in all dimensions. If it’s a tuple, it specifies the seed area for each dimension separately, in units of pixels. Note: we strongly recommend the use of odd numbers for this. If you give an even number, your seed box will be biased and not centered around the feature. Note: if two seed boxes overlap, the feature that is seeded will be the closer feature.

  • segment_number_below_threshold (int) – the marker to use to indicate a segmentation point is below the threshold.

  • segment_number_unassigned (int) – the marker to use to indicate a segmentation point is above the threshold but unsegmented.

  • statistic (dict, optional) – Default is None. Optional parameter to calculate bulk statistics within feature detection. Dictionary with callable function(s) to apply over the region of each detected feature and the name of the statistics to appear in the feature output dataframe. The functions should be the values and the names of the metric the keys (e.g. {‘mean’: np.mean})

Returns:

  • segmentation_out (iris.cube.Cube) – Mask, 0 outside and integer numbers according to track inside the area/volume of the feature.

  • features_out (pandas.DataFrame) – Feature dataframe including the number of cells (2D or 3D) in the segmented area/volume of the feature at the timestep.

Raises:

ValueError – If field_in.ndim is neither 3 nor 4 and ‘time’ is not included in coords.
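
The max_distance parameter is given in meters while the mask lives on the grid, so the two have to be related through dxy. The following is only a conceptual sketch for a single grid point and marker in 2D; the helper name within_max_distance and the inclusive comparison are assumptions, not tobac's code.

```python
import math


def within_max_distance(point, marker, dxy, max_distance):
    """True if a grid point lies within max_distance (meters) of a marker.

    point, marker: (row, col) index pairs; dxy: grid spacing in meters.
    """
    if max_distance is None:
        return True  # no distance limit requested
    dist_pixels = math.hypot(point[0] - marker[0], point[1] - marker[1])
    return dist_pixels * dxy <= max_distance


# A point 5 grid cells away on a 1 km grid.
near = within_max_distance((0, 0), (3, 4), dxy=1000.0, max_distance=5000.0)
far = within_max_distance((0, 0), (3, 4), dxy=1000.0, max_distance=4000.0)
```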

tobac.segmentation.segmentation_2D(features, field, dxy, threshold=0.003, target='maximum', level=None, method='watershed', max_distance=None, PBC_flag='none', seed_3D_flag='column', statistic=None)

Wrapper for the segmentation()-function.

tobac.segmentation.segmentation_3D(features, field, dxy, threshold=0.003, target='maximum', level=None, method='watershed', max_distance=None, PBC_flag='none', seed_3D_flag='column', statistic=None)

Wrapper for the segmentation()-function.

tobac.segmentation.segmentation_timestep(field_in: iris.cube.Cube, features_in: pandas.DataFrame, dxy: float, threshold: float = 0.003, target: typing_extensions.Literal[maximum, minimum] = 'maximum', level: None | slice = None, method: typing_extensions.Literal[watershed] = 'watershed', max_distance: None | float = None, vertical_coord: str | None = None, PBC_flag: typing_extensions.Literal[none, hdim_1, hdim_2, both] = 'none', seed_3D_flag: typing_extensions.Literal[column, box] = 'column', seed_3D_size: int | tuple[int] = 5, segment_number_below_threshold: int = 0, segment_number_unassigned: int = 0, statistic: dict[str, Callable | tuple[Callable, dict]] | None = None) tuple[iris.cube.Cube, pandas.DataFrame]

Perform watershedding for an individual time step of the data. Works for both 2D and 3D data

Parameters:
  • field_in (iris.cube.Cube) – Input field to perform the watershedding on (2D or 3D for one specific point in time).

  • features_in (pandas.DataFrame) – Features for one specific point in time.

  • dxy (float) – Grid spacing of the input data in metres

  • threshold (float, optional) – Threshold for the watershedding field to be used for the mask. The watershedding is exclusive of the threshold value, i.e. values greater (less) than the threshold are included in the target region, while values equal to the threshold value are excluded. Default is 3e-3.

  • target ({'maximum', 'minimum'}, optional) – Flag to determine if tracking is targeting minima or maxima in the data to determine from which direction to approach the threshold value. Default is ‘maximum’.

  • level (slice of iris.cube.Cube, optional) – Levels at which to seed the cells for the watershedding algorithm. Default is None.

  • method ({'watershed'}, optional) – Flag determining the algorithm to use (currently watershedding implemented).

  • max_distance (float, optional) – Maximum distance from a marker allowed to be classified as belonging to that cell in meters. Default is None.

  • vertical_coord (str, optional) – Vertical coordinate in 3D input data. If None, input is checked for one of {‘z’, ‘model_level_number’, ‘altitude’, ‘geopotential_height’} as a likely coordinate name.

  • PBC_flag ({'none', 'hdim_1', 'hdim_2', 'both'}) – Sets whether to use periodic boundaries, and if so in which directions. ‘none’: no periodic boundaries; ‘hdim_1’: periodic along hdim_1; ‘hdim_2’: periodic along hdim_2; ‘both’: periodic along both horizontal dimensions.

  • seed_3D_flag (str('column', 'box')) – Seed 3D field at feature positions with either the full column (default) or a box of user-set size

  • seed_3D_size (int or tuple (dimensions equal to dimensions of field)) – This sets the size of the seed box when seed_3D_flag is ‘box’. If it’s an integer (units of number of pixels), the seed box is identical in all dimensions. If it’s a tuple, it specifies the seed area for each dimension separately, in units of pixels. Note: we strongly recommend the use of odd numbers for this. If you give an even number, your seed box will be biased and not centered around the feature. Note: if two seed boxes overlap, the feature that is seeded will be the closer feature.

  • segment_number_below_threshold (int) – the marker to use to indicate a segmentation point is below the threshold.

  • segment_number_unassigned (int) – the marker to use to indicate a segmentation point is above the threshold but unsegmented. This can be the same as segment_number_below_threshold, but can also be set separately.

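  • statistic (dict, optional) – Default is None. Dictionary with callable function(s) to apply over the data points assigned to each feature as values and the names of the resulting statistics as keys (e.g. {‘mean’: np.mean}). The results are saved in the feature output dataframe.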

Returns:

  • segmentation_out (iris.cube.Cube) – Mask, 0 outside and integer numbers according to track inside the objects.

  • features_out (pandas.DataFrame) – Feature dataframe including the number of cells (2D or 3D) in the segmented area/volume of the feature at the timestep.

Raises:

ValueError – If target is neither ‘maximum’ nor ‘minimum’. If vertical_coord is not in {‘auto’, ‘z’, ‘model_level_number’, ‘altitude’, ‘geopotential_height’}. If there is more than one coordinate name. If the spatial dimension is neither 2 nor 3. If method is not ‘watershed’.
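
The exclusive treatment of the threshold and the role of target can be written as a small predicate. This is an illustrative sketch with a hypothetical function name (in_target_region), not the code used inside segmentation_timestep.

```python
def in_target_region(value, threshold, target="maximum"):
    """Return True if a field value belongs to the region to be segmented."""
    if target == "maximum":
        return value > threshold  # values equal to the threshold are excluded
    if target == "minimum":
        return value < threshold  # approach the threshold from below
    raise ValueError("target must be 'maximum' or 'minimum'")
```

With the default threshold of 3e-3 and target ‘maximum’, a value of exactly 3e-3 is therefore left outside the mask.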

tobac.segmentation.watershedding_2D(track, field_in, **kwargs)

Wrapper for the segmentation()-function.

tobac.segmentation.watershedding_3D(track, field_in, **kwargs)

Wrapper for the segmentation()-function.

tobac.testing module

Containing methods to make simple sample data for testing.

tobac.testing.generate_grid_coords(min_max_coords, lengths)

Generates a grid of coordinates, such as fake lat/lons for testing.

Parameters:
  • min_max_coords (array-like, either length 2, length 4, or length 6.) – The minimum and maximum values in each dimension as: (min_dim1, max_dim1, min_dim2, max_dim2, min_dim3, max_dim3) to use all 3 dimensions. You can omit any dimensions that you aren’t using.

  • lengths (array-like, either length 1, 2, or 3.) – The lengths of values in each dimension. Length must equal 1/2 the length of min_max_coords.

Returns:

array-like of grid coordinates in the number of dimensions requested and with the number of arrays specified (meshed coordinates)

Return type:

1, 2, or 3 array-likes
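
As an illustration of what meshed coordinates look like for two dimensions, here is a pure-Python sketch. It assumes that both the minimum and maximum are included and that the first dimension varies along rows; the helper names linspace and grid_coords_2d are hypothetical.

```python
def linspace(start, stop, num):
    """Evenly spaced values from start to stop, both included."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def grid_coords_2d(min_max_coords, lengths):
    """Meshed coordinates for two dimensions, returned as two nested lists."""
    d1 = linspace(min_max_coords[0], min_max_coords[1], lengths[0])
    d2 = linspace(min_max_coords[2], min_max_coords[3], lengths[1])
    mesh_1 = [[v1 for _ in d2] for v1 in d1]  # constant along each row
    mesh_2 = [[v2 for v2 in d2] for _ in d1]  # constant along each column
    return mesh_1, mesh_2


# (min_dim1, max_dim1, min_dim2, max_dim2) with 3 and 2 points respectively.
lats, lons = grid_coords_2d((0, 10, 20, 30), (3, 2))
```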

tobac.testing.generate_single_feature(start_h1, start_h2, start_v=None, spd_h1=1, spd_h2=1, spd_v=1, min_h1=0, max_h1=None, min_h2=0, max_h2=None, num_frames=1, dt=datetime.timedelta(seconds=300), start_date=datetime.datetime(2022, 1, 1, 0, 0), PBC_flag='none', frame_start=0, feature_num=1, feature_size=None, threshold_val=None)

Function to generate a dummy feature dataframe to test the tracking functionality

Parameters:
  • start_h1 (float) – Starting point of the feature in hdim_1 space

  • start_h2 (float) – Starting point of the feature in hdim_2 space

  • start_v (float, optional) – Starting point of the feature in vdim space (if 3D). For 2D, set to None. Default is None

  • spd_h1 (float, optional) – Speed (per frame) of the feature in hdim_1. Default is 1.

  • spd_h2 (float, optional) – Speed (per frame) of the feature in hdim_2. Default is 1.

  • spd_v (float, optional) – Speed (per frame) of the feature in vdim. Default is 1.

  • min_h1 (int, optional) – Minimum value of hdim_1 allowed. If PBC_flag is not ‘none’, then this will be used to know when to wrap around periodic boundaries. If PBC_flag is ‘none’, features will disappear if they are above/below these bounds. Default is 0

  • max_h1 (int, optional) – Similar to min_h1, but the max value of hdim_1 allowed. Default is 1000

  • min_h2 (int, optional) – Similar to min_h1, but the minimum value of hdim_2 allowed. Default is 0

  • max_h2 (int, optional) – Similar to min_h1, but the maximum value of hdim_2 allowed. Default is 1000

  • num_frames (int, optional) – Number of frames to generate. Default is 1.

  • dt (datetime.timedelta, optional) – Difference in time between each frame. Default is datetime.timedelta(minutes=5).

  • start_date (datetime.datetime, optional) – Start datetime. Default is datetime.datetime(2022, 1, 1, 0).

  • PBC_flag (str('none', 'hdim_1', 'hdim_2', 'both')) – Sets whether to use periodic boundaries, and if so in which directions. ‘none’: no periodic boundaries; ‘hdim_1’: periodic along hdim_1; ‘hdim_2’: periodic along hdim_2; ‘both’: periodic along both horizontal dimensions.

  • frame_start (int) – Number to start the frame at. Default is 0.

  • feature_num (int, optional) – What number to start the feature at. Default is 1.

  • feature_size (int or None) – ‘num’ column in output; feature size. If None, this column is not set.

  • threshold_val (float or None) – Threshold value of this feature
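
The interplay of speed, bounds and PBC_flag described above can be sketched in 2D as follows. This is a simplified stand-in, not generate_single_feature itself: it returns plain dictionaries instead of a dataframe, and it assumes that the maximum bounds are exclusive. The name feature_positions is hypothetical.

```python
import datetime


def feature_positions(start_h1, start_h2, spd_h1=1, spd_h2=1,
                      min_h1=0, max_h1=1000, min_h2=0, max_h2=1000,
                      num_frames=1, dt=datetime.timedelta(minutes=5),
                      start_date=datetime.datetime(2022, 1, 1, 0),
                      PBC_flag="none", frame_start=0):
    """Return one dict per frame in which the feature is inside the domain."""
    rows = []
    for i in range(num_frames):
        h1 = start_h1 + spd_h1 * i
        h2 = start_h2 + spd_h2 * i
        # Wrap around periodic boundaries where requested.
        if PBC_flag in ("hdim_1", "both"):
            h1 = min_h1 + (h1 - min_h1) % (max_h1 - min_h1)
        if PBC_flag in ("hdim_2", "both"):
            h2 = min_h2 + (h2 - min_h2) % (max_h2 - min_h2)
        if not (min_h1 <= h1 < max_h1 and min_h2 <= h2 < max_h2):
            continue  # the feature has left a non-periodic domain
        rows.append({"frame": frame_start + i, "hdim_1": h1, "hdim_2": h2,
                     "time": start_date + dt * i})
    return rows


# Same feature moving along hdim_1, with and without periodic boundaries.
periodic = feature_positions(8, 1, spd_h2=0, max_h1=10, max_h2=10,
                             num_frames=4, PBC_flag="hdim_1")
bounded = feature_positions(8, 1, spd_h2=0, max_h1=10, max_h2=10, num_frames=4)
```

With periodic boundaries the feature re-enters at the lower edge; without them it simply disappears from the output once it crosses max_h1.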

tobac.testing.get_single_pbc_coordinate(h1_min, h1_max, h2_min, h2_max, h1_coord, h2_coord, PBC_flag='none')

Function to get the PBC-adjusted coordinate for an original non-PBC adjusted coordinate.

Parameters:
  • h1_min (int) – Minimum point in hdim_1

  • h1_max (int) – Maximum point in hdim_1

  • h2_min (int) – Minimum point in hdim_2

  • h2_max (int) – Maximum point in hdim_2

  • h1_coord (int) – hdim_1 query coordinate

  • h2_coord (int) – hdim_2 query coordinate

  • PBC_flag (str('none', 'hdim_1', 'hdim_2', 'both')) – Sets whether to use periodic boundaries, and if so in which directions. ‘none’: no periodic boundaries; ‘hdim_1’: periodic along hdim_1; ‘hdim_2’: periodic along hdim_2; ‘both’: periodic along both horizontal dimensions.

Returns:

Returns a tuple of (hdim_1, hdim_2).

Return type:

tuple

Raises:

ValueError – Raises a ValueError if the point is invalid (e.g., h1_coord < h1_min when PBC_flag = ‘none’)
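
A minimal sketch of this kind of coordinate adjustment is shown below. It assumes that the maximum bounds are exclusive and uses a hypothetical name (wrap_coordinate); it is meant to illustrate the behavior, not to reproduce the tobac function.

```python
def wrap_coordinate(h1_min, h1_max, h2_min, h2_max, h1_coord, h2_coord,
                    PBC_flag="none"):
    """Return (hdim_1, hdim_2) mapped back into the domain where periodic."""

    def wrap(coord, lo, hi, periodic, name):
        if lo <= coord < hi:
            return coord  # already inside the domain
        if not periodic:
            raise ValueError(f"{name} coordinate {coord} is outside [{lo}, {hi})")
        return lo + (coord - lo) % (hi - lo)

    return (
        wrap(h1_coord, h1_min, h1_max, PBC_flag in ("hdim_1", "both"), "hdim_1"),
        wrap(h2_coord, h2_min, h2_max, PBC_flag in ("hdim_2", "both"), "hdim_2"),
    )
```

For a 10 x 10 domain with PBC_flag ‘both’, the query point (-1, 12) maps to (9, 2), while the same point raises a ValueError with PBC_flag ‘none’.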

tobac.testing.get_start_end_of_feat(center_point, size, axis_min, axis_max, is_pbc=False)

Gets the start and ending points for a feature given a size and PBC conditions

Parameters:
  • center_point (float) – The center point of the feature

  • size (float) – The size of the feature in this dimension

  • axis_min (int) – Minimum point on the axis (usually 0)

  • axis_max (int) – Maximum point on the axis (exclusive). This is 1 after the last real point on the axis, such that axis_max - axis_min is the size of the axis

  • is_pbc (bool) – True if we should give wrap around points, false if we shouldn’t.

Returns:

tuple (start_point, end_point). Note that if is_pbc is True, start_point can be less than axis_min and end_point can be greater than or equal to axis_max. This is designed to be used with `get_pbc_coordinates`.

tobac.testing.lists_equal_without_order(a, b)

Checks that the inner lists of a and b contain the same elements regardless of order, but does not account for duplicate groups. From: https://stackoverflow.com/questions/31501909/assert-list-of-list-equality-without-order-in-python/31502000
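
One way to write such a check, shown here as a sketch under a hypothetical name rather than as the function's actual source, is to compare element counts of the inner lists:

```python
from collections import Counter


def lists_equal_without_order_sketch(a, b):
    """True if every inner list of a matches some inner list of b, ignoring order."""
    for inner_a in a:
        target = Counter(inner_a)
        if not any(Counter(inner_b) == target for inner_b in b):
            return False
    return True


same = lists_equal_without_order_sketch([[1, 2], [3]], [[3], [2, 1]])
different = lists_equal_without_order_sketch([[1, 2]], [[1, 3]])
# Duplicate groups are not counted: [1] appears twice in a but once in b.
duplicates = lists_equal_without_order_sketch([[1], [1]], [[1], [2]])
```

The last call returns True even though the two outer lists differ, which is the limitation regarding duplicate groups mentioned above.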

tobac.testing.make_dataset_from_arr(in_arr, data_type='xarray', time_dim_num=None, z_dim_num=None, z_dim_name='altitude', y_dim_num=0, x_dim_num=1)

Makes a dataset (xarray or iris) for feature detection/segmentation from a raw numpy/dask/etc. array.

Parameters:
  • in_arr (array-like) – The input array to convert to iris/xarray

  • data_type (str('xarray' or 'iris'), optional) – Type of the dataset to return. Default is ‘xarray’.

  • time_dim_num (int or None, optional) – Axis of the time dimension; None for a single timestep. Default is None.

  • z_dim_num (int or None, optional) – Axis of the z dimension; None for a 2D array. Default is None.

  • z_dim_name (str) – Name of the z dimension. Default is ‘altitude’.

  • y_dim_num (int) – Axis of the y dimension, typically 0 for a 2D array. Default is 0.

  • x_dim_num (int, optional) – Axis of the x dimension, typically 1 for a 2D array. Default is 1.

Return type:

Iris or xarray dataset with everything we need for feature detection/tracking.

tobac.testing.make_feature_blob(in_arr, h1_loc, h2_loc, v_loc=None, h1_size=1, h2_size=1, v_size=1, shape='rectangle', amplitude=1, PBC_flag='none')

Function to make a defined “blob” in location (zloc, yloc, xloc) with user-specified shape and amplitude. Note that this function will round the size and locations to the nearest point within the array.

Parameters:
  • in_arr (array-like) – input array to add the “blob” to

  • h1_loc (float) – Center hdim_1 location of the blob, required

  • h2_loc (float) – Center hdim_2 location of the blob, required

  • v_loc (float, optional) – Center vdim location of the blob, optional. If this is None, we assume that the dataset is 2D. Default is None

  • h1_size (float, optional) – Size of the blob in array coordinates in hdim_1. Default is 1.

  • h2_size (float, optional) – Size of the blob in array coordinates in hdim_2. Default is 1.

  • v_size (float, optional) – Size of the blob in array coordinates in vdim. Default is 1.

  • shape (str('rectangle'), optional) – The shape of the blob that is added. For now, only ‘rectangle’ is supported, which adds a rectangle (2D) or rectangular prism (3D) with constant amplitude amplitude. Default is ‘rectangle’.

  • amplitude (float, optional) – Maximum amplitude of the blob. Default is 1.

  • PBC_flag (str('none', 'hdim_1', 'hdim_2', 'both')) – Sets whether to use periodic boundaries, and if so in which directions. ‘none’: no periodic boundaries; ‘hdim_1’: periodic along hdim_1; ‘hdim_2’: periodic along hdim_2; ‘both’: periodic along both horizontal dimensions.

Returns:

An array with the same type as in_arr that has the blob added.

Return type:

array-like

tobac.testing.make_sample_data_2D_3blobs(data_type='iris')

Create a simple dataset to use in tests.

The grid has a grid spacing of 1 km in both horizontal directions, with 100 grid cells in the x direction and 200 in the y direction. Time resolution is 1 minute and the total length of the dataset is 100 minutes around an arbitrary date (2000-01-01 12:00). The longitude and latitude coordinates are added as 2D aux coordinates and are arbitrary, but in a realistic range. The data contains three individual blobs travelling on a linear trajectory through the dataset for part of the time.

Parameters:

data_type ({'iris', 'xarray'}, optional) – Choose type of the dataset that will be produced. Default is ‘iris’

Returns:

sample_data

Return type:

iris.cube.Cube or xarray.DataArray

tobac.testing.make_sample_data_2D_3blobs_inv(data_type='iris')

Create a version of the dataset with switched coordinates.

Create a version of the dataset created in the function make_sample_cube_2D, but with switched coordinate order for the horizontal coordinates for tests to ensure that this does not affect the results.

Parameters:

data_type ({'iris', 'xarray'}, optional) – Choose type of the dataset that will be produced. Default is ‘iris’

Returns:

sample_data

Return type:

iris.cube.Cube or xarray.DataArray

tobac.testing.make_sample_data_3D_3blobs(data_type='iris', invert_xy=False)

Create a simple dataset to use in tests.

The grid has a grid spacing of 1 km in both horizontal directions, with 100 grid cells in the x direction and 200 in the y direction. Time resolution is 1 minute and the total length of the dataset is 100 minutes around an arbitrary date (2000-01-01 12:00). The longitude and latitude coordinates are added as 2D aux coordinates and are arbitrary, but in a realistic range. The data contains three individual blobs travelling on a linear trajectory through the dataset for part of the time.

Parameters:
  • data_type ({'iris', 'xarray'}, optional) – Choose type of the dataset that will be produced. Default is ‘iris’

  • invert_xy (bool, optional) – Flag to determine whether to switch x and y coordinates. Default is False.

Returns:

sample_data

Return type:

iris.cube.Cube or xarray.DataArray

tobac.testing.make_simple_sample_data_2D(data_type='iris')

Create a simple dataset to use in tests.

The grid has a grid spacing of 1 km in both horizontal directions, with 100 grid cells in the x direction and 500 in the y direction. Time resolution is 1 minute and the total length of the dataset is 100 minutes around an arbitrary date (2000-01-01 12:00). The longitude and latitude coordinates are added as 2D aux coordinates and are arbitrary, but in a realistic range. The data contains a single blob travelling on a linear trajectory through the dataset for part of the time.

Parameters:

data_type ({'iris', 'xarray'}, optional) – Choose type of the dataset that will be produced. Default is ‘iris’

Returns:

sample_data

Return type:

iris.cube.Cube or xarray.DataArray

tobac.testing.set_arr_2D_3D(in_arr, value, start_h1, end_h1, start_h2, end_h2, start_v=None, end_v=None)

Function to set part of in_arr for either 2D or 3D points to value. If start_v and end_v are not none, we assume that the array is 3D. If they are none, we will set the array as if it is a 2D array.

Parameters:
  • in_arr (array-like) – Array of values to set

  • value (int, float, or array-like of size (end_v-start_v, end_h1-start_h1, end_h2-start_h2)) – The value to assign to in_arr. This will work to assign an array, but the array must have the same dimensions as the size specified in the function.

  • start_h1 (int) – Start index to set for hdim_1

  • end_h1 (int) – End index to set for hdim_1 (exclusive, so it acts like [start_h1:end_h1])

  • start_h2 (int) – Start index to set for hdim_2

  • end_h2 (int) – End index to set for hdim_2

  • start_v (int, optional) – Start index to set for vdim Default is None

  • end_v (int, optional) – End index to set for vdim Default is None

Returns:

in_arr with the new values set.

Return type:

array-like

tobac.tracking module

Provide tracking methods.

The individual features and associated area/volumes identified in each timestep have to be linked into trajectories to analyse the time evolution of their properties for a better understanding of the underlying physical processes. The implementations are structured in a way that allows for the future addition of more complex tracking methods recording a more complex network of relationships between features at different points in time.

tobac.tracking.add_cell_time(t: pandas.DataFrame, cell_number_unassigned: int)

Add cell time as time since the initiation of each cell.

Parameters:
  • t (pandas.DataFrame) – trajectories with added coordinates

  • cell_number_unassigned (int) – unassigned cell value

Returns:

t – trajectories with added cell time

Return type:

pandas.DataFrame
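
The idea of a cell time can be sketched without pandas. The example below works on a list of dictionaries instead of a dataframe and uses None for unassigned cells; both choices, and the name add_cell_time_sketch, are assumptions made for illustration.

```python
import datetime


def add_cell_time_sketch(rows, cell_number_unassigned=-1):
    """Add 'time_cell' (time since the first appearance of the cell) to each row.

    rows: list of dicts with the keys 'cell' and 'time'.
    """
    first_seen = {}
    for row in rows:
        cell = row["cell"]
        if cell != cell_number_unassigned:
            first_seen[cell] = min(first_seen.get(cell, row["time"]), row["time"])
    for row in rows:
        cell = row["cell"]
        if cell == cell_number_unassigned:
            row["time_cell"] = None  # features that belong to no cell
        else:
            row["time_cell"] = row["time"] - first_seen[cell]
    return rows


t0 = datetime.datetime(2000, 1, 1, 12, 0)
step = datetime.timedelta(minutes=5)
tracks = add_cell_time_sketch([
    {"cell": 1, "time": t0},
    {"cell": 1, "time": t0 + step},
    {"cell": -1, "time": t0 + step},
])
```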

tobac.tracking.build_distance_function(min_h1, max_h1, min_h2, max_h2, PBC_flag)

Function to build a partial `calc_distance_coords_pbc` function suitable for use with trackpy

Parameters:
  • min_h1 (int) – Minimum point in hdim_1

  • max_h1 (int) – Maximum point in hdim_1

  • min_h2 (int) – Minimum point in hdim_2

  • max_h2 (int) – Maximum point in hdim_2

  • PBC_flag (str('none', 'hdim_1', 'hdim_2', 'both')) – Sets whether to use periodic boundaries, and if so in which directions. ‘none’: no periodic boundaries; ‘hdim_1’: periodic along hdim_1; ‘hdim_2’: periodic along hdim_2; ‘both’: periodic along both horizontal dimensions.

Returns:

A version of calc_distance_coords_pbc suitable to be called by just f(coords_1, coords_2)

Return type:

function object
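
The pattern of fixing the domain arguments so that only two coordinate arguments remain can be sketched with functools.partial. The distance function here is a simplified 2D stand-in for calc_distance_coords_pbc (hypothetical name distance_pbc), assuming the domain size along each axis is max minus min.

```python
import functools
import math


def distance_pbc(coords_1, coords_2, min_h1, max_h1, min_h2, max_h2, PBC_flag):
    """Euclidean distance between two (hdim_1, hdim_2) points.

    Along periodic axes the shorter of the direct and the wrapped path is used.
    """
    d1 = abs(coords_1[0] - coords_2[0])
    d2 = abs(coords_1[1] - coords_2[1])
    if PBC_flag in ("hdim_1", "both"):
        d1 = min(d1, (max_h1 - min_h1) - d1)
    if PBC_flag in ("hdim_2", "both"):
        d2 = min(d2, (max_h2 - min_h2) - d2)
    return math.hypot(d1, d2)


def build_distance_function_sketch(min_h1, max_h1, min_h2, max_h2, PBC_flag):
    """Return a function f(coords_1, coords_2) with the domain settings fixed."""
    return functools.partial(
        distance_pbc,
        min_h1=min_h1, max_h1=max_h1, min_h2=min_h2, max_h2=max_h2,
        PBC_flag=PBC_flag,
    )


dist_periodic = build_distance_function_sketch(0, 100, 0, 100, "both")
dist_plain = build_distance_function_sketch(0, 100, 0, 100, "none")
```

Here dist_periodic((0, 1), (97, 97)) evaluates to 5.0 because both separations wrap around the domain edge, whereas dist_plain uses the direct separations of 97 and 96.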

tobac.tracking.fill_gaps(t, order=1, extrapolate=0, frame_max=None, hdim_1_max=None, hdim_2_max=None)

Fill gaps in the trajectories by polynomial interpolation and optionally extrapolate them beyond their start and end points.

Parameters:
  • t (pandas.DataFrame) – Trajectories from trackpy.

  • order (int, optional) – Order of polynomial used to extrapolate trajectory into gaps and beyond start and end point. Default is 1.

  • extrapolate (int, optional) – Number of timesteps to extrapolate trajectories. Default is 0.

  • frame_max (int, optional) – Size of input data along time axis. Default is None.

  • hdim_1_max (int, optional) – Size of input data along the first horizontal axis. Default is None.

  • hdim_2_max (int, optional) – Size of input data along the second horizontal axis. Default is None.

Returns:

t – Trajectories from trackpy with filled gaps and potentially extrapolated.

Return type:

pandas.DataFrame
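
For the simplest case (order=1, no extrapolation), gap filling amounts to linear interpolation between the neighbouring known positions. The sketch below works on a list of (frame, hdim_1, hdim_2) tuples for a single trajectory; the name fill_gaps_linear and the data layout are assumptions, not the function's actual interface.

```python
def fill_gaps_linear(track):
    """Insert linearly interpolated positions for missing frames.

    track: list of (frame, hdim_1, hdim_2) tuples, sorted by frame.
    """
    filled = []
    for (f0, a0, b0), (f1, a1, b1) in zip(track, track[1:]):
        for f in range(f0, f1):
            w = (f - f0) / (f1 - f0)  # 0 at the known point, growing across the gap
            filled.append((f, a0 + w * (a1 - a0), b0 + w * (b1 - b0)))
    filled.append(track[-1])
    return filled


# Frame 2 is missing and is reconstructed halfway between frames 1 and 3.
track = [(0, 0.0, 0.0), (1, 1.0, 2.0), (3, 3.0, 6.0)]
filled = fill_gaps_linear(track)
```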

tobac.tracking.linking_trackpy(features, field_in, dt, dxy, dz=None, v_max=None, d_max=None, d_min=None, subnetwork_size=None, memory=0, stubs=1, time_cell_min=None, order=1, extrapolate=0, method_linking='random', adaptive_step=None, adaptive_stop=None, cell_number_start=1, cell_number_unassigned=-1, vertical_coord='auto', min_h1=None, max_h1=None, min_h2=None, max_h2=None, PBC_flag='none')

Perform Linking of features in trajectories.

The linking determines which of the features detected in a specific timestep is most likely identical to an existing feature in the previous timestep. For each existing feature, the movement within a time step is extrapolated based on the velocities in a number of previous time steps. The algorithm then breaks the search process down to a few candidate features by restricting the search to a circular search region centered around the predicted position of the feature in the next time step. For newly initialized trajectories, where no velocity from previous time steps is available, the algorithm resorts to the average velocity of the nearest tracked objects. v_max and d_min are given as physical quantities and then converted into pixel-based values used in trackpy. This allows for tracking that is controlled by physically-based parameters that are independent of the temporal and spatial resolution of the input data. The algorithm creates a continuous track for the feature that is the most probable based on the previous cell path.

Parameters:
  • features (pandas.DataFrame) – Detected features to be linked.

  • field_in (None) – Input field. Not currently used; can be set to None.

  • dt (float) – Time resolution of tracked features in seconds.

  • dxy (float) – Horizontal grid spacing of the input data in meters.

  • dz (float) – Constant vertical grid spacing (meters), optional. If not specified and the input is 3D, this function requires that vertical_coord is available in the features input. If you specify a value here, this function assumes that it is the constant z spacing between points, even if `vertical_coord` is specified.

  • d_max (float, optional) – Maximum search range in meters. Only one of d_max, d_min, or v_max can be set. Default is None.

  • d_min (float, optional) – Deprecated. Only one of d_max, d_min, or v_max can be set. Default is None.

  • subnetwork_size (int, optional) – Maximum size of subnetwork for linking. This parameter should be adjusted when using adaptive search. Usually a lower value is desired in that case. For a more in-depth explanation, see the trackpy documentation on adaptive search. If None, 30 is used for regular search and 15 for adaptive search. Default is None.

  • v_max (float, optional) – Speed at which features are allowed to move in meters per second. Only one of d_max, d_min, or v_max can be set. Default is None.

  • memory (int, optional) – Number of output timesteps for which a feature is allowed to vanish while still being considered tracked. Default is 0. Warning: this parameter should be used with caution, as it can lead to erroneous trajectory linking, especially for data with low time resolution.

  • stubs (int, optional) – Minimum number of timesteps of a tracked cell to be reported. Default is 1.

  • time_cell_min (float, optional) – Minimum length in time that a cell must be tracked for to be considered a valid cell in seconds. Default is None.

  • order (int, optional) – Order of polynomial used to extrapolate trajectory into gaps and beyond start and end point. Default is 1.

  • extrapolate (int, optional) – Number of timesteps to extrapolate trajectories. Currently unused. Default is 0.

  • method_linking ({'random', 'predict'}, optional) – Flag choosing method used for trajectory linking. Default is ‘random’, although we typically encourage users to use ‘predict’.

  • adaptive_step (float, optional) – Reduce search range by multiplying it by this factor. Needs to be used in combination with adaptive_stop. Default is None.

  • adaptive_stop (float, optional) – If not None, when encountering an oversize subnet, retry by progressively reducing search_range by multiplying with adaptive_step until the subnet is solvable. If search_range becomes <= adaptive_stop, give up and raise a SubnetOversizeException. Needs to be used in combination with adaptive_step. Default is None.

  • cell_number_start (int, optional) – Cell number for first tracked cell. Default is 1

  • cell_number_unassigned (int) – Number to set the unassigned/non-tracked cells to. Note that if you set this to np.nan, the data type of ‘cell’ will change to float. Default is -1

  • vertical_coord (str) – Name of the vertical coordinate. The vertical coordinate used must be in meters. If None, tries to auto-detect. It looks for the coordinate or the dimension name corresponding to the string. To use dz, set this to None.

  • min_h1 (int) – Minimum hdim_1 value, required when PBC_flag is ‘hdim_1’ or ‘both’

  • max_h1 (int) – Maximum hdim_1 value, required when PBC_flag is ‘hdim_1’ or ‘both’

  • min_h2 (int) – Minimum hdim_2 value, required when PBC_flag is ‘hdim_2’ or ‘both’

  • max_h2 (int) – Maximum hdim_2 value, required when PBC_flag is ‘hdim_2’ or ‘both’

  • PBC_flag (str('none', 'hdim_1', 'hdim_2', 'both')) – Sets whether to use periodic boundaries, and if so in which directions. ‘none’: no periodic boundaries; ‘hdim_1’: periodic along hdim_1; ‘hdim_2’: periodic along hdim_2; ‘both’: periodic along both horizontal dimensions.

Returns:

trajectories_final – Dataframe of the linked features, containing the variable ‘cell’, with integers indicating the affiliation of a feature to a specific track, and the variable ‘time_cell’ with the time the cell has already existed.

Return type:

pandas.DataFrame

Raises:

ValueError – If method_linking is neither ‘random’ nor ‘predict’.
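
The conversion of the physical parameters into a pixel-based search range can be sketched as follows. The formulas (v_max * dt / dxy and d_max / dxy) follow from the units given above but are stated here as an assumption about the conversion, and the helper name search_range_pixels is hypothetical. The deprecated d_min is left out, and the sketch requires exactly one of the two remaining parameters.

```python
def search_range_pixels(dt, dxy, v_max=None, d_max=None):
    """Convert a maximum speed or distance into a search range in grid points.

    dt: time resolution in seconds; dxy: horizontal grid spacing in meters;
    v_max: maximum speed in m/s; d_max: maximum search range in meters.
    """
    if (v_max is None) == (d_max is None):
        raise ValueError("set exactly one of v_max or d_max")
    if v_max is not None:
        return v_max * dt / dxy  # distance travelled per time step, in pixels
    return d_max / dxy
```

For data with dt = 60 s and dxy = 1000 m, v_max = 100 m/s corresponds to a search range of 6 grid points, and d_max = 5000 m to 5 grid points.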

tobac.tracking.remap_particle_to_cell_nv(particle_cell_map, input_particle)

Remaps the particles to new cells given an input map and the current particle. Helper function that is designed to be vectorized with np.vectorize

Parameters:
  • particle_cell_map (dict-like) – The dictionary mapping particle number to cell number

  • input_particle (key for particle_cell_map) – The particle number to remap

tobac.utils modules

tobac.utils.bulk_statistics module

Support functions to compute bulk statistics of features, either as a postprocessing step or within feature detection or segmentation.

tobac.utils.bulk_statistics.get_statistics(features: pandas.DataFrame, labels: numpy.ndarray[int], *fields: tuple[numpy.ndarray], statistic: dict[str, typing.Callable | tuple[typing.Callable, dict]] = {'ncells': <function count_nonzero>}, index: list[int] | None = None, default: None | float = None, id_column: str = 'feature', collapse_axis: None | int | list[int] = None) pandas.DataFrame

Get bulk statistics for objects (e.g. features or segmented features) given a labelled mask of the objects and any input field with the same dimensions or that can be broadcast with labels according to numpy-like broadcasting rules.

The statistics are added as a new column to the existing feature dataframe. Users can specify which statistics are computed by providing a dictionary with the column name of the metric and the respective function.

Parameters:
  • features (pd.DataFrame) – Dataframe with features or segmented features (output from feature detection or segmentation), which can be for the specific timestep or for the whole dataset

  • labels (np.ndarray[int]) – Mask with labels of each region to apply the function to (e.g. output of segmentation for a specific timestep)

  • *fields (tuple[np.ndarray]) – Fields to give as arguments to each function call. If the shape does not match that of labels, numpy-style broadcasting will be applied.

  • statistic (dict[str, Callable], optional (default: {'ncells':np.count_nonzero})) – Dictionary with function(s) to apply over each region as values and the name of the respective statistics as keys. Default is to just count the number of cells associated with each feature and write it to the feature dataframe.

  • index (None | list[int], optional (default: None)) – list of indices of regions in labels to apply function to. If None, will default to all integer feature labels in labels.

  • default (None | float, optional (default: None)) – default value to return in a region that has no values.

  • id_column (str, optional (default: "feature")) – Name of the column in feature dataframe that contains IDs that match with the labels in mask. The default is the column “feature”.

  • collapse_axis (None | int | list[int], optional (default: None):) – Index or indices of axes of labels to collapse. This will reduce the dimensionality of labels while allowing labelled features to overlap. This can be used, for example, to calculate the footprint area (2D) of 3D labels

Returns:

features – Updated feature dataframe with bulk statistics for each feature saved in a new column.

Return type:

pd.DataFrame
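
The per-label computation described above can be sketched with plain NumPy. This is an illustration under stated assumptions, not tobac's implementation: label_statistics_sketch is a hypothetical helper, it returns plain lists instead of updating a feature dataframe, and it treats 0 as background.

```python
import numpy as np


def label_statistics_sketch(labels, field, statistic, index=None, default=None):
    """Apply each function in `statistic` to the field values inside each label."""
    # Broadcast the field to the label shape, NumPy-style.
    field = np.broadcast_to(field, labels.shape)
    if index is None:
        # Assumption: all positive integer labels in the mask; 0 is background.
        index = [int(i) for i in np.unique(labels) if i > 0]
    results = {name: [] for name in statistic}
    for label in index:
        values = field[labels == label]
        for name, func in statistic.items():
            # Regions without any values get the default instead.
            results[name].append(func(values) if values.size else default)
    return index, results


labels = np.array([[1, 1, 0], [0, 2, 2], [0, 0, 2]])
field = np.array([[1.0, 3.0, 9.0], [9.0, 2.0, 4.0], [9.0, 9.0, 6.0]])
index, stats = label_statistics_sketch(
    labels, field, {"ncells": np.count_nonzero, "mean": np.mean}
)
```

Here np.count_nonzero counts the nonzero field values inside each label, so it equals the number of grid cells only when the field is nonzero everywhere inside the region, as in this toy field.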

tobac.utils.bulk_statistics.get_statistics_from_mask(features: pandas.DataFrame, segmentation_mask: xarray.DataArray, *fields: xarray.DataArray, statistic: dict[str, tuple[~typing.Callable]] = {'Mean': <function mean>}, index: None | list[int] = None, default: None | float = None, id_column: str = 'feature', collapse_dim: None | str | list[str] = None) pandas.DataFrame

Derives bulk statistics for each object in the segmentation mask, and returns a features Dataframe with these properties for each feature.

Parameters:
  • features (pd.DataFrame) – Dataframe with segmented features (output from feature detection or segmentation). Timesteps need not be exactly the same as in the segmentation mask, but all labels in the mask need to be present in the feature dataframe.

  • segmentation_mask (xr.DataArray) – Segmentation mask output

  • *fields (xr.DataArray[np.ndarray]) – Field(s) with input data. If field does not have a time dimension it will be considered time invariant, and the entire field will be passed for each time step in segmentation_mask. If the shape does not match that of labels, numpy-style broadcasting will be applied.

  • statistic (dict[str, Callable], optional (default: {'Mean': np.mean})) – Dictionary with function(s) to apply over each region as values and the name of the respective statistics as keys. Default is to calculate the mean value of the field over each feature.

  • index (None | list[int], optional (default: None)) – list of indexes of regions in labels to apply function to. If None, will default to all integers between 1 and the maximum value in labels

  • default (None | float, optional (default: None)) – default value to return in a region that has no values

  • id_column (str, optional (default: "feature")) – Name of the column in feature dataframe that contains IDs that match with the labels in mask. The default is the column “feature”.

  • collapse_dim (None | str | list[str], optional (default: None)) – Dimension names of labels to collapse, allowing, e.g., calculation of statistics on 2D fields for the footprint of 3D objects.

Returns:

features – Updated feature dataframe with bulk statistics for each feature saved in a new column.

Return type:

pd.DataFrame

tobac.utils.decorators module

Decorators for use with other tobac functions

tobac.utils.decorators.convert_cube_to_dataarray(cube)

Convert an iris cube to an xarray dataarray, averting error for integer dtype cubes in xarray<v2023.06

Parameters:

cube (iris.cube.Cube) – Iris data cube

Returns:

dataarray – dataarray converted from cube. If the cube’s core data is a masked array and has integer dtype, the returned dataarray will have a numpy array with masked values filled with the minimum value for that integer dtype. Otherwise the data will be identical to that produced using xr.DataArray.from_iris.

Return type:

xr.DataArray
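
The fill rule for masked integer data can be illustrated with NumPy alone. This is a sketch of the behaviour described above, not the tobac source; the small array stands in for a cube’s core data.

```python
import numpy as np

# Integer data with one masked point, standing in for a cube's core data.
data = np.ma.masked_array(
    np.array([1, 2, 3], dtype=np.int32), mask=[False, True, False]
)

if np.ma.isMaskedArray(data) and np.issubdtype(data.dtype, np.integer):
    # Fill masked points with the smallest value the integer dtype can hold.
    filled = data.filled(np.iinfo(data.dtype).min)
else:
    filled = np.asarray(data)
```

For int32 the fill value is -2147483648, so masked points remain recognizable after conversion even though the mask itself is gone.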

tobac.utils.decorators.iris_to_xarray(save_iris_info: bool = False)
tobac.utils.decorators.irispandas_to_xarray(save_iris_info: bool = False)
tobac.utils.decorators.njit_if_available(func, **kwargs)

Decorator to wrap a function with numba.njit if available. If numba isn’t available, it just returns the function.

Parameters:
  • func (function object) – Function to wrap with njit

  • kwargs – Keyword arguments to pass to numba njit
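
An optional-dependency wrapper of this kind can be sketched as below. It is an assumption-laden illustration, not the tobac implementation: njit_if_available_sketch and add_one are hypothetical names, and the only numba call used is numba.njit.

```python
def njit_if_available_sketch(func, **kwargs):
    """Return func compiled with numba.njit if numba imports, else func unchanged."""
    try:
        from numba import njit
    except ImportError:
        # numba is not installed: fall back to the plain Python function.
        return func
    return njit(**kwargs)(func)


def add_one(x):
    return x + 1


fast_add_one = njit_if_available_sketch(add_one)
result = fast_add_one(2)
```

Either way the caller gets a callable with the same behaviour; only the execution speed differs.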

tobac.utils.decorators.xarray_to_iris()
tobac.utils.decorators.xarray_to_irispandas()

tobac.utils.general module

General tobac utilities

tobac.utils.general.add_coordinates(t, variable_cube)

Add coordinates from the input cube of the feature detection to the trajectories/features.

Parameters:
  • t (pandas.DataFrame) – Trajectories/features from feature detection or linking step.

  • variable_cube (iris.cube.Cube) – Input data used for the tracking with coordinate information to transfer to the resulting DataFrame. Needs to contain the coordinate ‘time’.

Returns:

t – Trajectories with added coordinates.

Return type:

pandas.DataFrame

tobac.utils.general.add_coordinates_3D(t, variable_cube, vertical_coord=None, vertical_axis=None, assume_coords_fixed_in_time=True)

Function adding coordinates from the tracking cube to the trajectories for the 3D case: time, longitude & latitude, x & y dimensions, and altitude.

Parameters:
  • t (pandas DataFrame) – trajectories/features

  • variable_cube (iris.cube.Cube) – Cube (usually the one you are tracking on) at least containing the dimension of ‘time’. Typically, ‘longitude’, ‘latitude’, ‘x_projection_coordinate’, ‘y_projection_coordinate’, and ‘altitude’ (if 3D) are the coordinates that we expect, although this function will happily interpolate along any dimension coordinates you give.

  • vertical_coord (str or int) – Name or axis number of the vertical coordinate. If None, tries to auto-detect. If it is a string, it looks for the coordinate or the dimension name corresponding to the string. If it is an int, it assumes that it is the vertical axis. Note that if you only have a 2D or 3D coordinate for altitude, you must pass in an int.

  • vertical_axis (int or None) – Axis number of the vertical.

  • assume_coords_fixed_in_time (bool) – If true, it assumes that the coordinates are fixed in time, even if the coordinates say they vary in time. This is, by default, True, to preserve legacy functionality. If False, it assumes that if a coordinate says it varies in time, it takes the coordinate at its word.

Returns:

trajectories with added coordinates

Return type:

pandas DataFrame

tobac.utils.general.combine_feature_dataframes(feature_df_list, renumber_features=True, old_feature_column_name=None, sort_features_by=None)

Function to combine a list of tobac feature detection dataframes into one combined dataframe that can be used for tracking or segmentation.

Parameters:
  • feature_df_list – A list of dataframes (generated, for example, by running feature detection on multiple nodes).

  • renumber_features (bool, optional (default: True)) – If true, features are renumbered with contiguous integers. If false, the old feature numbers will be retained, but an exception will be raised if there are any non-unique feature numbers. If you have non-unique feature numbers and want to preserve them, use old_feature_column_name to save the old feature numbers under a different column name.

  • old_feature_column_name (str or None, optional (default: None)) – The column name to preserve old feature numbers in. If None, these old numbers will be deleted. Users may want to enable this feature if they have run segmentation with the separate dataframes and therefore rely on the old feature numbers.

  • sort_features_by (list, str or None, optional (default: None)) – The sorting order to pass to Dataframe.sort_values for the merged dataframe. If None, will default to [“frame”, “idx”] if renumber_features is True, or “feature” if renumber_features is False.

Returns:

One combined DataFrame.

Return type:

pd.DataFrame

tobac.utils.general.combine_tobac_feats(list_of_feats, preserve_old_feat_nums=None)

WARNING: This function has been deprecated and will be removed in a future release, please use ‘combine_feature_dataframes’ instead

Function to combine a list of tobac feature detection dataframes into one combined dataframe that can be used for tracking or segmentation.

Parameters:
  • list_of_feats – A list of dataframes (generated, for example, by running feature detection on multiple nodes).

  • preserve_old_feat_nums (str or None) – The column name to preserve old feature numbers in. If None, these old numbers will be deleted. Users may want to enable this feature if they have run segmentation with the separate dataframes and therefore rely on the old feature numbers.

Returns:

One combined DataFrame.

Return type:

pd.DataFrame

tobac.utils.general.get_bounding_box(x, buffer=1)

Finds the bounding box of an ndarray, i.e. the smallest bounding rectangle for nonzero values, as explained here: https://stackoverflow.com/questions/31400769/bounding-box-of-numpy-array

Parameters:
  • x (numpy.ndarray) – Array for which the bounding box is to be determined.

  • buffer – Number to set a buffer between the nonzero values and the edges of the box. Default is 1.

Returns:

bbox – Dimensionwise list of the indices representing the edges of the bounding box.

Return type:

list
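
The idea of a buffered bounding box around nonzero values can be sketched as follows. The helper bounding_box_sketch is hypothetical: its return layout (one (start, stop) pair per dimension) and its clipping at the array edges are assumptions for illustration and may differ from what tobac returns. It also assumes the array has at least one nonzero value.

```python
import numpy as np


def bounding_box_sketch(x, buffer=1):
    """Return one (start, stop) index pair per dimension around nonzero values."""
    bbox = []
    for axis in range(x.ndim):
        # Collapse all other axes to find where this axis has nonzero values.
        other_axes = tuple(a for a in range(x.ndim) if a != axis)
        nonzero = np.where(np.any(x, axis=other_axes))[0]
        # Widen by the buffer, clipped to the array edges.
        start = max(int(nonzero[0]) - buffer, 0)
        stop = min(int(nonzero[-1]) + buffer, x.shape[axis] - 1)
        bbox.append((start, stop))
    return bbox


x = np.zeros((6, 7), dtype=int)
x[2:4, 3:5] = 1
bbox = bounding_box_sketch(x, buffer=1)
```

In this toy array the nonzero block spans rows 2 to 3 and columns 3 to 4, so a buffer of 1 gives rows 1 to 4 and columns 2 to 5.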

tobac.utils.general.get_spacings(field_in, grid_spacing=None, time_spacing=None, average_method='arithmetic')

Determine spatial and temporal grid spacing of the input data.

Parameters:
  • field_in (iris.cube.Cube) – Input field where to get spacings.

  • grid_spacing (float, optional) – Manually sets the grid spacing if specified. Default is None.

  • time_spacing (float, optional) – Manually sets the time spacing if specified. Default is None.

  • average_method (string, optional) –

    Defines how spacings in x- and y-direction are combined.

    • ’arithmetic’ : standard arithmetic mean like (dx+dy)/2

    • ’geometric’ : geometric mean; conserves gridbox area

    Default is ‘arithmetic’.

Returns:

  • dxy (float) – Grid spacing in metres.

  • dt (float) – Time resolution in seconds.

Raises:

ValueError – If field_in does not contain projection_x_coord and projection_y_coord or keyword argument grid_spacing.
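
The two averaging options for combining the x and y spacings can be written out directly. combine_spacings_sketch is a hypothetical helper that covers only this averaging step, not the reading of coordinates from a cube.

```python
import math


def combine_spacings_sketch(dx, dy, average_method="arithmetic"):
    """Combine x and y grid spacings into a single spacing value."""
    if average_method == "arithmetic":
        return (dx + dy) / 2
    if average_method == "geometric":
        # sqrt(dx * dy) squared equals dx * dy, so the gridbox area is conserved.
        return math.sqrt(dx * dy)
    raise ValueError("average_method must be 'arithmetic' or 'geometric'")


dxy_arithmetic = combine_spacings_sketch(1000.0, 4000.0)
dxy_geometric = combine_spacings_sketch(1000.0, 4000.0, "geometric")
```

For a strongly anisotropic grid the two choices differ noticeably: 2500 m for the arithmetic mean against 2000 m for the geometric mean in this example.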

tobac.utils.general.spectral_filtering(dxy, field_in, lambda_min, lambda_max, return_transfer_function=False)

This function creates and applies a 2D transfer function that can be used as a bandpass filter to remove certain wavelengths of an atmospheric input field (e.g. vorticity, IVT, etc).

Parameters:

  • dxy (float) – Grid spacing in m.

  • field_in (numpy.array) – 2D field with input data.

  • lambda_min (float) – Minimum wavelength in m.

  • lambda_max (float) – Maximum wavelength in m.

  • return_transfer_function (boolean, optional) – Default is False. If set to True, then the 2D transfer function and the corresponding wavelengths are returned.

Returns:

  • filtered_field (numpy.array) – Spectrally filtered 2D field of data (with same shape as input data).

  • transfer_function (tuple) – Two 2D fields, where the first one corresponds to the wavelengths in the spectral space of the domain and the second one to the 2D transfer function of the bandpass filter. Only returned if return_transfer_function is True.
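
The general idea of a spectral bandpass filter can be sketched with NumPy’s FFT. This is not tobac’s implementation, which may use a different transform and a smooth transfer function; bandpass_sketch is a hypothetical helper with an ideal (0 or 1) response, shown only to make the roles of dxy, lambda_min and lambda_max concrete.

```python
import numpy as np


def bandpass_sketch(dxy, field_in, lambda_min, lambda_max):
    """Ideal FFT bandpass: keep wavelengths in [lambda_min, lambda_max]."""
    ny, nx = field_in.shape
    # Wavenumbers in cycles per metre along each axis.
    ky = np.fft.fftfreq(ny, d=dxy)
    kx = np.fft.fftfreq(nx, d=dxy)
    k = np.sqrt(ky[:, None] ** 2 + kx[None, :] ** 2)
    # Wavelength of each spectral component; the mean (k = 0) is infinite.
    with np.errstate(divide="ignore"):
        wavelengths = np.where(k > 0, 1.0 / k, np.inf)
    # Transfer function: 1 inside the pass band, 0 outside.
    in_band = (wavelengths >= lambda_min) & (wavelengths <= lambda_max)
    transfer = in_band.astype(float)
    filtered = np.real(np.fft.ifft2(np.fft.fft2(field_in) * transfer))
    return filtered, (wavelengths, transfer)


# Toy field on a 64 x 64 grid with 1 km spacing: a 32 km wave plus a 4 km wave.
x = np.arange(64) * 1000.0
long_wave = np.tile(np.sin(2 * np.pi * x / 32000.0), (64, 1))
short_wave = np.tile(np.sin(2 * np.pi * x / 4000.0), (64, 1))
filtered, (wavelengths, transfer) = bandpass_sketch(
    1000.0, long_wave + short_wave, lambda_min=16000.0, lambda_max=64000.0
)
```

With a pass band of 16 km to 64 km the 4 km wave is removed and the 32 km wave passes through. An ideal filter like this one can cause ringing on real data, which is one reason a smooth transfer function is often preferred.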

tobac.utils.general.standardize_track_dataset(TrackedFeatures, Mask, Projection=None)

CAUTION: this function is experimental. No data structures output are guaranteed to be supported in future versions of tobac.

Combine the feature mask returned by tobac.segmentation with the TrackedFeatures dataset returned by tobac.linking_trackpy into a common dataset. Also rename the variables to be more descriptive and comply with cf-tree. Convert the default cell parent ID to an integer table. Add a cell dimension. Projection is an xarray DataArray.

TODO: Add metadata attributes

Parameters:
  • TrackedFeatures (xarray.core.dataset.Dataset) – xarray dataset of tobac Track information, the xarray dataset returned by tobac.tracking.linking_trackpy.

  • Mask – xarray dataset of tobac segmentation mask information, the xarray dataset returned by tobac.segmentation.segmentation.

  • Projection (xarray.core.dataarray.DataArray, default = None) –

    xarray.DataArray of the original input dataset (gridded nexrad data for example). If using gridded nexrad data, this can be input as: data[‘ProjectionCoordinateSystem’]. An example of the type of information in the dataarray includes the following attributes:

    • latitude_of_projection_origin: 29.471900939941406

    • longitude_of_projection_origin: -95.0787353515625

    • _CoordinateTransformType: Projection

    • _CoordinateAxes: x y z time

    • _CoordinateAxesTypes: GeoX GeoY Height Time

    • grid_mapping_name: azimuthal_equidistant

    • semi_major_axis: 6370997.0

    • inverse_flattening: 298.25

    • longitude_of_prime_meridian: 0.0

    • false_easting: 0.0

    • false_northing: 0.0

Returns:

ds – xarray dataset of merged Track and Segmentation Mask datasets with renamed variables.

Return type:

xarray.core.dataset.Dataset

tobac.utils.general.transform_feature_points(features, new_dataset, latitude_name=None, longitude_name=None, altitude_name=None, max_time_away=None, max_space_away=None, max_vspace_away=None, warn_dropped_features=True)

Function to transform input feature dataset horizontal grid points to a different grid. The typical use case for this function is to transform detected features to perform segmentation on a different grid.

The existing feature dataset must have some latitude/longitude coordinates associated with each feature, and the new_dataset must have latitude/longitude available with the same name. Note that due to xarray/iris incompatibilities, we suggest that the input coordinates match the standard_name from Iris.

Parameters:
  • features (pd.DataFrame) – Input feature dataframe

  • new_dataset (iris.cube.Cube or xarray) – The dataset whose grid the feature points are transformed to.

  • latitude_name (str) – The name of the latitude coordinate. If None, tries to auto-detect.

  • longitude_name (str) – The name of the longitude coordinate. If None, tries to auto-detect.

  • altitude_name (str) – The name of the altitude coordinate. If None, tries to auto-detect.

  • max_time_away (datetime.timedelta) – The maximum time delta to associate feature points away from.

  • max_space_away (float) – The maximum horizontal distance (in meters) to transform features to.

  • max_vspace_away (float) – The maximum vertical distance (in meters) to transform features to.

  • warn_dropped_features (bool) – Whether or not to print a warning message if one of the max_* options is going to result in features that are dropped.

Returns:

transformed_features – A new feature dataframe, with the coordinates transformed to the new grid, suitable for use in segmentation

Return type:

pd.DataFrame

tobac.utils.mask module

Provide essential methods for masking

tobac.utils.mask.column_mask_from2D(mask_2D, cube, z_coord='model_level_number')

Turn 2D watershedding mask into a 3D mask of selected columns.

Parameters:
  • cube (iris.cube.Cube) – Data cube.

  • mask_2D (iris.cube.Cube) – 2D cube containing mask (int id for tracked volumes, 0 everywhere else).

  • z_coord (str) – Name of the vertical coordinate in the cube.

Returns:

mask_2D – 3D cube containing columns of 2D mask (int id for tracked volumes, 0 everywhere else).

Return type:

iris.cube.Cube
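
Extending a 2D mask into columns amounts to repeating it along the vertical axis. The NumPy sketch below works on bare arrays rather than iris cubes and assumes the vertical axis comes first; both are simplifications for illustration.

```python
import numpy as np

# 2D mask: feature id 3 occupies one grid point, 0 everywhere else.
mask_2d = np.array([[0, 3], [0, 0]])
n_levels = 4  # hypothetical number of vertical levels

# Repeat the 2D mask on every level; copy() makes the result writable.
mask_3d = np.broadcast_to(mask_2d, (n_levels,) + mask_2d.shape).copy()
```

Every level of mask_3d now carries the same feature ids as the 2D mask, which is what selecting whole columns means.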

tobac.utils.mask.mask_all_surface(mask, masked=False, z_coord='model_level_number')

Create surface projection of 3d-mask for all features by collapsing one coordinate.

Parameters:
  • mask (iris.cube.Cube) – Cube containing mask (int id for tracked volumes, 0 everywhere else).

  • masked (bool, optional) – Bool determining whether to mask the mask for the cell where it is 0. Default is False

  • z_coord (str, optional) – Name of the coordinate to collapse. Default is ‘model_level_number’.

Returns:

mask_i_surface – Collapsed Masked cube for the features with the maximum value along the collapsed coordinate.

Return type:

iris.cube.Cube (2D)

tobac.utils.mask.mask_cell(mask, cell, track, masked=False)

Create mask for specific cell.

Parameters:
  • mask (iris.cube.Cube) – Cube containing mask (int id for tracked volumes 0 everywhere else).

  • cell (int) – Integer id of cell to create masked cube for.

  • track (pandas.DataFrame) – Output of the linking.

  • masked (bool, optional) – Bool determining whether to mask the mask for the cell where it is 0. Default is False.

Returns:

mask_i – Mask for a specific cell.

Return type:

numpy.ndarray

tobac.utils.mask.mask_cell_surface(mask, cell, track, masked=False, z_coord='model_level_number')

Create surface projection of 3d-mask for individual cell by collapsing one coordinate.

Parameters:
  • mask (iris.cube.Cube) – Cube containing mask (int id for tracked volumes, 0 everywhere else).

  • cell (int) – Integer id of cell to create masked cube for.

  • track (pandas.DataFrame) – Output of the linking.

  • masked (bool, optional) – Bool determining whether to mask the mask for the cell where it is 0. Default is False.

  • z_coord (str, optional) – Name of the coordinate to collapse. Default is ‘model_level_number’.

Returns:

mask_i_surface – Collapsed Masked cube for the cell with the maximum value along the collapsed coordinate.

Return type:

iris.cube.Cube

tobac.utils.mask.mask_cube(cube_in, mask)

Mask cube where mask is not zero.

Parameters:
  • cube_in (iris.cube.Cube) – Unmasked data cube.

  • mask (iris.cube.Cube) – Mask to use for masking, >0 where cube is supposed to be masked.

Returns:

variable_cube_out – Masked cube.

Return type:

iris.cube.Cube
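
On bare arrays, masking data wherever the mask is nonzero can be expressed with numpy.ma. This sketch uses small NumPy arrays in place of iris cubes and is meant only to show the masking rule.

```python
import numpy as np

data = np.array([[1.0, 2.0], [3.0, 4.0]])
# Integer ids for tracked volumes, 0 everywhere else.
mask = np.array([[0, 5], [0, 7]])

# Hide the data wherever the mask is greater than zero.
masked = np.ma.masked_where(mask > 0, data)
unmasked_values = masked.compressed()
```

Only the values at points where the mask is 0 remain visible; swapping the condition to mask == 0 would keep the tracked points instead.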

tobac.utils.mask.mask_cube_all(variable_cube, mask)

Mask cube (iris.cube) for tracked volume.

Parameters:
  • variable_cube (iris.cube.Cube) – Unmasked data cube.

  • mask (iris.cube.Cube) – Cube containing mask (int id for tracked volumes, 0 everywhere else).

Returns:

variable_cube_out – Masked cube for untracked volume.

Return type:

iris.cube.Cube

tobac.utils.mask.mask_cube_cell(variable_cube, mask, cell, track)

Mask cube for tracked volume of an individual cell.

Parameters:
  • variable_cube (iris.cube.Cube) – Unmasked data cube.

  • mask (iris.cube.Cube) – Cube containing mask (int id for tracked volumes, 0 everywhere else).

  • cell (int) – Integer id of cell to create masked cube for.

  • track (pandas.DataFrame) – Output of the linking.

Returns:

variable_cube_out – Masked cube with data for respective cell.

Return type:

iris.cube.Cube

tobac.utils.mask.mask_cube_features(variable_cube, mask, feature_ids)

Mask cube for tracked volume of one or more specific features.

Parameters:
  • variable_cube (iris.cube.Cube) – Unmasked data cube.

  • mask (iris.cube.Cube) – Cube containing mask (int id for tracked volumes, 0 everywhere else).

  • feature_ids (int or list of ints) – Integer ids of features to create masked cube for.

Returns:

variable_cube_out – Masked cube with data for respective features.

Return type:

iris.cube.Cube

tobac.utils.mask.mask_cube_untracked(variable_cube, mask)

Mask cube (iris.cube) for untracked volume.

Parameters:
  • variable_cube (iris.cube.Cube) – Unmasked data cube.

  • mask (iris.cube.Cube) – Cube containing mask (int id for tracked volumes, 0 everywhere else).

Returns:

variable_cube_out – Masked cube for untracked volume.

Return type:

iris.cube.Cube

tobac.utils.mask.mask_features(mask, feature_ids, masked=False)

Create mask for specific features.

Parameters:
  • mask (iris.cube.Cube) – Cube containing mask (int id for tracked volumes, 0 everywhere else).

  • feature_ids (int or list of ints) – Integer ids of the features to create the masked cube for.

  • masked (bool, optional) – Bool determining whether to mask the mask for the cell where it is 0. Default is False.

Returns:

mask_i – Masked cube for specific features.

Return type:

numpy.ndarray

tobac.utils.mask.mask_features_surface(mask, feature_ids, masked=False, z_coord='model_level_number')

Create surface projection of 3d-mask for specific features by collapsing one coordinate.

Parameters:
  • mask (iris.cube.Cube) – Cube containing mask (int id for tracked volumes, 0 everywhere else).

  • feature_ids (int or list of ints) – Integer ids of the features to create the masked cube for.

  • masked (bool, optional) – Bool determining whether to mask the mask for the cell where it is 0. Default is False

  • z_coord (str, optional) – Name of the coordinate to collapse. Default is ‘model_level_number’.

Returns:

mask_i_surface – Collapsed Masked cube for the features with the maximum value along the collapsed coordinate.

Return type:

iris.cube.Cube
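
Selecting features and collapsing the vertical coordinate with a maximum can be sketched on a bare NumPy array. The axis order (z, y, x) and the toy mask are assumptions; the real function operates on iris cubes and a named coordinate.

```python
import numpy as np

# 3D mask with axes (z, y, x): feature 4 on two levels, feature 9 on one.
mask_3d = np.zeros((3, 2, 2), dtype=int)
mask_3d[0, 0, 0] = 4
mask_3d[2, 0, 0] = 4
mask_3d[1, 1, 1] = 9

feature_ids = [4]
# Keep only the requested feature ids, set everything else to 0.
selected = np.where(np.isin(mask_3d, feature_ids), mask_3d, 0)
# Collapse the vertical axis by taking the maximum id in each column.
surface = selected.max(axis=0)
# Equivalent of masked=True: hide the points where the projection is 0.
surface_masked = np.ma.masked_where(surface == 0, surface)
```

Feature 9 is dropped because it is not in feature_ids, and the projection keeps feature 4 at the one column where it appears.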

tobac.utils.periodic_boundaries module

tobac.utils.periodic_boundaries.adjust_pbc_point(in_dim: int, dim_min: int, dim_max: int) int

Function to adjust a point to the other boundary for PBCs

Parameters:
  • in_dim (int) – Input coordinate to adjust

  • dim_min (int) – Minimum point for the dimension

  • dim_max (int) – Maximum point for the dimension (inclusive)

Returns:

The adjusted point on the opposite boundary

Return type:

int

Raises:

ValueError – If in_dim isn’t on one of the boundary points
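
The behaviour described above, including the error case, can be sketched in a few lines. adjust_pbc_point_sketch is a hypothetical re-statement of the documented contract, not the tobac source.

```python
def adjust_pbc_point_sketch(in_dim, dim_min, dim_max):
    """Move a boundary point to the opposite boundary (dim_max is inclusive)."""
    if in_dim == dim_min:
        return dim_max
    if in_dim == dim_max:
        return dim_min
    raise ValueError("in_dim is not on one of the boundary points")


lower_to_upper = adjust_pbc_point_sketch(0, 0, 99)
upper_to_lower = adjust_pbc_point_sketch(99, 0, 99)
```

Calling the sketch with an interior point such as 50 raises ValueError, matching the documented contract.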

tobac.utils.periodic_boundaries.calc_distance_coords_pbc(coords_1, coords_2, min_h1, max_h1, min_h2, max_h2, PBC_flag)

Function to calculate the distance between cartesian coordinate set 1 and coordinate set 2. Note that we assume both coordinates are within their min/max already.

Parameters:
  • coords_1 (2D or 3D array-like) – Set of coordinates passed in from trackpy of either (vdim, hdim_1, hdim_2) coordinates or (hdim_1, hdim_2) coordinates.

  • coords_2 (2D or 3D array-like) – Similar to coords_1, but for the second pair of coordinates

  • min_h1 (int) – Minimum point in hdim_1

  • max_h1 (int) – Maximum point in hdim_1, exclusive. max_h1-min_h1 should be the size.

  • min_h2 (int) – Minimum point in hdim_2

  • max_h2 (int) – Maximum point in hdim_2, exclusive. max_h2-min_h2 should be the size.

  • PBC_flag (str('none', 'hdim_1', 'hdim_2', 'both')) – Sets whether to use periodic boundaries, and if so in which directions. ‘none’ means that we do not have periodic boundaries; ‘hdim_1’ means that we are periodic along hdim_1; ‘hdim_2’ means that we are periodic along hdim_2; ‘both’ means that we are periodic along both horizontal dimensions.

Returns:

Distance between coords_1 and coords_2 in cartesian space.

Return type:

float
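
A distance under periodic boundaries is commonly computed by taking, in each periodic dimension, the shorter of the direct and the wrapped separation. The sketch below follows that approach as an assumption; pbc_distance_sketch is a hypothetical helper and tobac’s own implementation may differ in detail.

```python
import math


def pbc_distance_sketch(coords_1, coords_2, min_h1, max_h1, min_h2, max_h2, PBC_flag):
    """Euclidean distance with wrapping in the periodic horizontal dimensions."""
    # max values are exclusive, so max - min is the domain size.
    size_h1 = max_h1 - min_h1
    size_h2 = max_h2 - min_h2
    # The last two entries are (hdim_1, hdim_2); an optional first entry is vdim.
    deltas = [abs(a - b) for a, b in zip(coords_1, coords_2)]
    if PBC_flag in ("hdim_1", "both"):
        deltas[-2] = min(deltas[-2], size_h1 - deltas[-2])
    if PBC_flag in ("hdim_2", "both"):
        deltas[-1] = min(deltas[-1], size_h2 - deltas[-1])
    return math.sqrt(sum(d * d for d in deltas))


# Two points near opposite hdim_1 edges of a 10 x 10 domain.
direct = pbc_distance_sketch((1, 1), (9, 1), 0, 10, 0, 10, "none")
wrapped = pbc_distance_sketch((1, 1), (9, 1), 0, 10, 0, 10, "hdim_1")
```

Without periodic boundaries the points are 8 grid units apart; with periodicity along hdim_1 the wrapped path is only 2 units long.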

tobac.utils.periodic_boundaries.get_pbc_coordinates(h1_min: int, h1_max: int, h2_min: int, h2_max: int, h1_start_coord: int, h1_end_coord: int, h2_start_coord: int, h2_end_coord: int, PBC_flag: str = 'none') list[tuple[int, int, int, int]]

Function to get the real (i.e., shifted away from periodic boundaries) coordinate boxes of interest given a set of coordinates that may cross periodic boundaries. This computes, for example, multiple bounding boxes to encompass the real coordinates when given periodic coordinates that loop around to the other boundary.

For example, if you pass in [as h1_start_coord, h1_end_coord, h2_start_coord, h2_end_coord] (-3, 5, 2,6) with PBC_flag of ‘both’ or ‘hdim_1’, h1_max of 10, and h1_min of 0 this function will return: [(0,5,2,6), (7,10,2,6)].

If you pass in something outside the bounds of the array, this will truncate your requested box. For example, if you pass in [as h1_start_coord, h1_end_coord, h2_start_coord, h2_end_coord] (-3, 5, 2,6) with PBC_flag of ‘none’ or ‘hdim_2’, this function will return: [(0,5,2,6)], assuming h1_min is 0.

Parameters:
  • h1_min (int) – Minimum array value in hdim_1, typically 0.

  • h1_max (int) – Maximum array value in hdim_1 (exclusive). h1_max - h1_min should be the size in h1.

  • h2_min (int) – Minimum array value in hdim_2, typically 0.

  • h2_max (int) – Maximum array value in hdim_2 (exclusive). h2_max - h2_min should be the size in h2.

  • h1_start_coord (int) – Start coordinate in hdim_1. Can be < h1_min if dealing with PBCs.

  • h1_end_coord (int) – End coordinate in hdim_1. Can be >= h1_max if dealing with PBCs.

  • h2_start_coord (int) – Start coordinate in hdim_2. Can be < h2_min if dealing with PBCs.

  • h2_end_coord (int) – End coordinate in hdim_2. Can be >= h2_max if dealing with PBCs.

  • PBC_flag (str('none', 'hdim_1', 'hdim_2', 'both')) – Sets whether to use periodic boundaries, and if so in which directions. ‘none’ means that we do not have periodic boundaries; ‘hdim_1’ means that we are periodic along hdim_1; ‘hdim_2’ means that we are periodic along hdim_2; ‘both’ means that we are periodic along both horizontal dimensions.

Returns:

A list of tuples containing (h1_start, h1_end, h2_start, h2_end) of each of the boxes needed to encompass the coordinates.

Return type:

list of tuples
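
The box-splitting behaviour in the two examples above can be reproduced with a small sketch. The helpers split_interval_sketch and pbc_boxes_sketch are hypothetical, the h2 bounds of 0 and 10 are assumed because the examples do not state them, and boxes longer than the domain are not handled.

```python
def split_interval_sketch(dim_min, dim_max, start, end, periodic):
    """Split [start, end) into in-bounds pieces, wrapping around if periodic."""
    # The in-bounds part is always the requested range truncated to the domain.
    pieces = [(max(start, dim_min), min(end, dim_max))]
    if periodic:
        size = dim_max - dim_min
        if start < dim_min:
            # The part below the lower edge reappears at the upper edge.
            pieces.append((start + size, dim_max))
        if end > dim_max:
            # The part above the upper edge reappears at the lower edge.
            pieces.append((dim_min, end - size))
    return pieces


def pbc_boxes_sketch(h1_min, h1_max, h2_min, h2_max,
                     h1_start, h1_end, h2_start, h2_end, PBC_flag="none"):
    """Combine the per-dimension pieces into (h1_start, h1_end, h2_start, h2_end)."""
    h1_pieces = split_interval_sketch(
        h1_min, h1_max, h1_start, h1_end, PBC_flag in ("hdim_1", "both"))
    h2_pieces = split_interval_sketch(
        h2_min, h2_max, h2_start, h2_end, PBC_flag in ("hdim_2", "both"))
    return [(a0, a1, b0, b1) for a0, a1 in h1_pieces for b0, b1 in h2_pieces]


boxes_periodic = pbc_boxes_sketch(0, 10, 0, 10, -3, 5, 2, 6, PBC_flag="both")
boxes_truncated = pbc_boxes_sketch(0, 10, 0, 10, -3, 5, 2, 6, PBC_flag="none")
```

Treating each dimension separately and then taking all combinations keeps the logic short, and a box that crosses a corner with PBC_flag ‘both’ naturally yields four boxes.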

tobac.utils.periodic_boundaries.transfm_pbc_point(in_dim, dim_min, dim_max)

Function to transform a PBC-feature point for contiguity

Parameters:
  • in_dim (int) – Input coordinate to adjust

  • dim_min (int) – Minimum point for the dimension

  • dim_max (int) – Maximum point for the dimension (inclusive)

Returns:

The transformed point

Return type:

int

tobac.utils.periodic_boundaries.weighted_circmean(values: np.ndarray, weights: np.ndarray, high: float = 6.283185307179586, low: float = 0, axis: int | None = None) np.ndarray

Calculate the weighted circular mean over a set of values. If all the weights are equal, this function is equivalent to scipy.stats.circmean

Parameters:
  • values (array-like) – Array of values to calculate the mean over

  • weights (array-like) – Array of weights corresponding to each value

  • high (float, optional) – Upper bound of the range of values. Defaults to 2*pi

  • low (float, optional) – Lower bound of the range of values. Defaults to 0

  • axis (int | None, optional) – Axis over which to take the average. If None, the average will be taken over the entire array. Defaults to None

Returns:

rescaled_average – The weighted, circular mean over the given values

Return type:

numpy.ndarray
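
A weighted circular mean is typically computed by mapping values to angles, averaging the weighted unit vectors, and mapping the resulting angle back. The pure-Python sketch below follows that standard recipe for 1D input only; weighted_circmean_sketch is a hypothetical helper without the axis argument.

```python
import math


def weighted_circmean_sketch(values, weights, high=2 * math.pi, low=0.0):
    """Weighted circular mean of 1D values on the periodic range [low, high)."""
    span = high - low
    # Map each value onto an angle in [0, 2*pi).
    angles = [(v - low) / span * 2 * math.pi for v in values]
    # Weighted sum of the corresponding unit vectors.
    sin_sum = sum(w * math.sin(a) for a, w in zip(angles, weights))
    cos_sum = sum(w * math.cos(a) for a, w in zip(angles, weights))
    mean_angle = math.atan2(sin_sum, cos_sum) % (2 * math.pi)
    # Map the mean angle back onto the original range.
    return mean_angle / (2 * math.pi) * span + low


# On a periodic range of 0 to 10, the values 9 and 2 straddle the boundary.
equal_weights = weighted_circmean_sketch([9.0, 2.0], [1.0, 1.0], high=10.0, low=0.0)
one_sided = weighted_circmean_sketch([9.0, 2.0], [1.0, 0.0], high=10.0, low=0.0)
```

A plain arithmetic mean of 9 and 2 would give 5.5, on the far side of the domain, whereas the circular mean lands at 0.5, halfway along the short path across the boundary.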

tobac.wrapper module

tobac.wrapper.maketrack(field_in, grid_spacing=None, time_spacing=None, target='maximum', v_max=None, d_max=None, memory=0, stubs=5, order=1, extrapolate=0, method_detection='threshold', position_threshold='center', sigma_threshold=0.5, n_erosion_threshold=0, threshold=1, min_num=0, min_distance=0, method_linking='random', cell_number_start=1, subnetwork_size=None, adaptive_stop=None, adaptive_step=None, return_intermediate=False)
tobac.wrapper.tracking_wrapper(field_in_features, field_in_segmentation, time_spacing=None, grid_spacing=None, parameters_features=None, parameters_tracking=None, parameters_segmentation=None)

Module contents