tobac.utils.bulk_statistics.get_statistics

tobac.utils.bulk_statistics.get_statistics(features, labels, *fields, statistic={'ncells': <function count_nonzero>}, index=None, default=None, id_column='feature', collapse_axis=None)

Compute bulk statistics for objects (e.g. features or segmented features), given a labelled mask of the objects and one or more input fields that either have the same dimensions as labels or can be broadcast against labels following NumPy-style broadcasting rules.

Each statistic is added as a new column to the existing feature dataframe. Users choose which statistics are computed by providing a dictionary that maps the column name of each metric to the function that computes it.
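The idea of a statistic dictionary applied per labelled region can be illustrated with a small NumPy-only sketch. This is not tobac's implementation: the helper `region_statistics` and the sample arrays are hypothetical, and the result is a plain nested dictionary rather than new columns in a feature dataframe.

```python
import numpy as np

def region_statistics(labels, field, statistic):
    """Hypothetical helper: apply each function to the field values of each region."""
    # Broadcast the field against the label mask (NumPy-style broadcasting).
    field = np.broadcast_to(field, labels.shape)
    # Treat every positive integer in the mask as one region; 0 is background.
    ids = np.unique(labels[labels > 0])
    return {
        name: {int(i): float(func(field[labels == i])) for i in ids}
        for name, func in statistic.items()
    }

labels = np.array([[1, 1, 0],
                   [0, 2, 2],
                   [0, 2, 0]])
field = np.array([[1.0, 3.0, 9.0],
                  [9.0, 2.0, 4.0],
                  [9.0, 6.0, 9.0]])

# Keys become the names of the statistics, values are the functions to apply.
stats = region_statistics(labels, field, {"mean": np.mean, "max": np.max})

# A 1D field of length 3 is broadcast along the rows of the 3x3 mask.
profile = np.array([10.0, 20.0, 30.0])
stats_broadcast = region_statistics(labels, profile, {"max": np.max})
```

In this sketch each key of the dictionary plays the role that a new dataframe column plays in get_statistics.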

Parameters:
  • features (pd.DataFrame) – Dataframe with features or segmented features (output from feature detection or segmentation), covering either a single timestep or the whole dataset.

  • labels (np.ndarray[int]) – Mask labelling each region to apply the function(s) to (e.g. output of segmentation for a specific timestep).

  • *fields (tuple[xr.DataArray]) – Fields to give as arguments to each function call. If the shape does not match that of labels, numpy-style broadcasting will be applied.

  • statistic (dict[str, Callable], optional (default: {'ncells':np.count_nonzero})) – Dictionary with function(s) to apply over each region as values and the name of the respective statistics as keys. Default is to just count the number of cells associated with each feature and write it to the feature dataframe.

  • index (None | list[int], optional (default: None)) – List of indices of regions in labels to apply the function(s) to. If None, all integer feature labels in labels are used.

  • default (None | float, optional (default: None)) – Default value to return for a region that has no values.

  • id_column (str, optional (default: "feature")) – Name of the column in the feature dataframe that contains the IDs matching the labels in the mask.

  • collapse_axis (None | int | list[int], optional (default: None)) – Index or indices of the axes of labels to collapse. This reduces the dimensionality of labels while allowing labelled features to overlap. It can be used, for example, to calculate the footprint area (2D) of 3D labels.
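To make the footprint use case for collapse_axis more concrete, here is a minimal NumPy-only sketch of the underlying idea. It does not call tobac and does not reproduce its implementation: the helper `footprint_area` and the sample mask are hypothetical, and the sketch simply projects each 3D label onto the remaining two axes and counts the cells it covers.

```python
import numpy as np

# Hypothetical 3D label mask with axes (level, y, x).
labels_3d = np.zeros((3, 4, 4), dtype=int)
labels_3d[0, 0:2, 0:2] = 1  # feature 1: 2x2 cells on the lowest level
labels_3d[1, 0:2, 0:3] = 1  # feature 1: 2x3 cells on the middle level
labels_3d[2, 1:3, 1:3] = 2  # feature 2: 2x2 cells on the top level

def footprint_area(labels, axis=0):
    """Hypothetical helper: number of 2D cells covered by each 3D label."""
    ids = np.unique(labels[labels > 0])
    # Collapsing per label lets the projected footprints overlap,
    # which a single 2D integer mask could not represent.
    return {
        int(i): int(np.count_nonzero(np.any(labels == i, axis=axis)))
        for i in ids
    }

areas = footprint_area(labels_3d, axis=0)
```

Here the two features share two (y, x) cells after projection, yet each footprint is counted in full, which is the kind of overlap the parameter description refers to.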

Returns:

features – Updated feature dataframe with bulk statistics for each feature saved in a new column.

Return type:

pd.DataFrame