tobac.utils.bulk_statistics
Description
Support functions to compute bulk statistics of features, either as a postprocessing step or within feature detection or segmentation.
- tobac.utils.bulk_statistics.get_statistics(features, labels, *fields, statistic={'ncells': <function count_nonzero>}, index=None, default=None, id_column='feature', collapse_axis=None)
Get bulk statistics for objects (e.g. features or segmented features) given a labelled mask of the objects and any input field with the same dimensions or that can be broadcast with labels according to numpy-like broadcasting rules.
The statistics are added as a new column to the existing feature dataframe. Users can specify which statistics are computed by providing a dictionary with the column name of the metric and the respective function.
- Parameters:
features (pd.DataFrame) – Dataframe with features or segmented features (output from feature detection or segmentation), which can be for the specific timestep or for the whole dataset
labels (np.ndarray[int]) – Mask with labels of each region to apply function to (e.g. output of segmentation for a specific timestep)
*fields (tuple[xr.DataArray]) – Fields to give as arguments to each function call. If the shape does not match that of labels, numpy-style broadcasting will be applied.
statistic (dict[str, Callable], optional (default: {'ncells':np.count_nonzero})) – Dictionary with function(s) to apply over each region as values and the name of the respective statistics as keys. Default is to just count the number of cells associated with each feature and write it to the feature dataframe.
index (None | list[int], optional (default: None)) – list of indices of regions in labels to apply function to. If None, will default to all integer feature labels in labels.
default (None | float, optional (default: None)) – default value to return in a region that has no values.
id_column (str, optional (default: "feature")) – Name of the column in feature dataframe that contains IDs that match with the labels in mask. The default is the column “feature”.
collapse_axis (None | int | list[int], optional (default: None)) – Index or indices of axes of labels to collapse. This will reduce the dimensionality of labels while allowing labelled features to overlap. This can be used, for example, to calculate the footprint area (2D) of 3D labels.
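The footprint idea behind collapse_axis can be illustrated with plain NumPy. This is a minimal sketch of the concept only, not tobac's implementation; the helper name `footprint_area` and the toy array are hypothetical. It shows why the collapsed labels must be allowed to overlap: two 3D features stacked at different levels can share the same horizontal cell.

```python
import numpy as np

# Toy 3D label array with shape (level, y, x); values are made up for illustration.
labels_3d = np.zeros((2, 3, 3), dtype=int)
labels_3d[0, 0:2, 0:2] = 1  # feature 1: four cells on level 0
labels_3d[1, 1:3, 1:3] = 2  # feature 2: four cells on level 1, above cell (1, 1) of feature 1


def footprint_area(labels, feature_id, axis=0):
    """Hypothetical helper: count the cells of the 2D footprint of one feature.

    The footprint is every column along `axis` that contains the feature at
    least once, so footprints of different features may overlap.
    """
    footprint = np.any(labels == feature_id, axis=axis)
    return int(np.count_nonzero(footprint))


area_1 = footprint_area(labels_3d, 1)
area_2 = footprint_area(labels_3d, 2)

# A naive collapse (maximum label along the axis) keeps only one label per
# column, so feature 1 loses the cell it shares with feature 2.
naive_area_1 = int(np.count_nonzero(labels_3d.max(axis=0) == 1))
```

In this sketch both footprints cover four cells, while the naive maximum-based collapse would assign the shared cell to feature 2 only.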
- Returns:
features – Updated feature dataframe with bulk statistics for each feature saved in a new column.
- Return type:
pd.DataFrame
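The pattern described above (a dictionary mapping column names to functions, each applied to the field values inside every labelled region) can be sketched without tobac. The function `bulk_statistics_sketch` and the toy data below are hypothetical and only mimic the documented behaviour for a single field; they are not the library's code.

```python
import numpy as np
import pandas as pd


def bulk_statistics_sketch(features, labels, field, statistic,
                           id_column="feature", default=None):
    """Hypothetical stand-in: apply each function to the field values of each region."""
    out = features.copy()
    for name, func in statistic.items():
        values = []
        for feature_id in out[id_column]:
            region = field[labels == feature_id]  # 1D array of values in this region
            # Regions without any cells receive the default value instead.
            values.append(func(region) if region.size > 0 else default)
        out[name] = values  # one new column per statistic
    return out


labels = np.array([[1, 1, 0],
                   [0, 2, 2],
                   [0, 2, 0]])
field = np.array([[1.0, 3.0, 9.0],
                  [9.0, 2.0, 4.0],
                  [9.0, 6.0, 9.0]])
features = pd.DataFrame({"feature": [1, 2, 3]})  # feature 3 is absent from the mask

result = bulk_statistics_sketch(
    features, labels, field,
    statistic={"mean": np.mean, "max": np.max},
    default=np.nan,
)
```

Here feature 3 has no cells in the mask, so both of its statistics fall back to the value passed as default.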
- tobac.utils.bulk_statistics.get_statistics_from_mask(features, segmentation_mask, *fields, statistic={'Mean': <function mean>}, index=None, default=None, id_column='feature', collapse_dim=None, time_var_name='time', time_padding=None)
Derives bulk statistics for each object in the segmentation mask, and returns a features Dataframe with these properties for each feature.
- Parameters:
features (pd.DataFrame) – Dataframe with segmented features (output from feature detection or segmentation). Timesteps do not need to be exactly the same as in the segmentation mask, but all labels in the mask need to be present in the feature dataframe.
segmentation_mask (xr.DataArray) – Segmentation mask output
*fields (tuple[xr.DataArray]) – Field(s) with input data. If field does not have a time dimension it will be considered time invariant, and the entire field will be passed for each time step in segmentation_mask. If the shape does not match that of labels, numpy-style broadcasting will be applied.
statistic (dict[str, Callable], optional (default: {'Mean':np.mean})) – Dictionary with function(s) to apply over each region as values and the name of the respective statistics as keys. Default is to calculate the mean value of the field over each feature.
index (None | list[int], optional (default: None)) – list of indexes of regions in labels to apply function to. If None, will default to all integers between 1 and the maximum value in labels
default (None | float, optional (default: None)) – default value to return in a region that has no values
id_column (str, optional (default: "feature")) – Name of the column in feature dataframe that contains IDs that match with the labels in mask. The default is the column “feature”.
collapse_dim (None | str | list[str], optional (default: None)) – Dimension names of labels to collapse, allowing, e.g., calculation of statistics on 2D fields for the footprint of 3D objects
time_var_name (str, optional (default: "time")) – The name of the time dimension in the input fields and the time column in features, by default “time”
time_padding (timedelta, optional (default: None)) – If set, allows statistics to be associated with a feature input timestep that is within time_padding of the feature. Extremely useful when converting between micro- and nanoseconds, as is common when using Pandas dataframes.
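The effect of time_padding can be pictured as a tolerance check between two timestamps. The sketch below uses only the standard library; `times_match` is a hypothetical helper written for illustration and does not reproduce how tobac pairs timesteps internally.

```python
from datetime import datetime, timedelta


def times_match(feature_time, mask_time, time_padding=None):
    """Hypothetical helper: decide whether two timestamps refer to the same step."""
    if time_padding is None:
        # Without padding, only exact equality counts as a match.
        return feature_time == mask_time
    # With padding, any difference up to time_padding counts as a match.
    return abs(feature_time - mask_time) <= time_padding


mask_time = datetime(2020, 1, 1, 12, 0, 0)
feature_time = datetime(2020, 1, 1, 12, 0, 0, 3)  # 3 microseconds later

exact = times_match(feature_time, mask_time)
padded = times_match(feature_time, mask_time, time_padding=timedelta(milliseconds=1))
```

A small padding such as one millisecond absorbs rounding differences of this kind while remaining far below any realistic spacing between model or observation timesteps.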
- Returns:
features – Updated feature dataframe with bulk statistics for each feature saved in a new column.
- Return type:
pd.DataFrame
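The handling of a field without a time dimension, as described for *fields, can be sketched with NumPy: the same 2D field is reused for every timestep of the mask. The arrays and the `means` dictionary are made up for illustration, and the loop is an assumption about the general approach rather than tobac's actual code.

```python
import numpy as np

# Toy segmentation mask with shape (time, y, x); 0 marks unlabelled cells.
mask = np.array([
    [[1, 1],
     [0, 0]],
    [[0, 2],
     [0, 2]],
])
# Time-invariant field with shape (y, x), e.g. a static surface property.
static_field = np.array([[10.0, 20.0],
                         [30.0, 40.0]])

means = {}
for t in range(mask.shape[0]):
    labels_t = mask[t]  # labels for this timestep only
    for feature_id in np.unique(labels_t[labels_t > 0]):
        # The entire static field is paired with the labels of every timestep.
        region = static_field[labels_t == feature_id]
        means[(t, int(feature_id))] = float(region.mean())
```

Each key in `means` pairs a timestep index with a feature ID, which mirrors the requirement that every label in the mask corresponds to a row in the feature dataframe.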
Classes
partial(func, *args, **keywords) - new function with partial application of the given arguments and keywords.
Difference between two datetime values.
Functions
Generator that iterates over time through a paired field dataarray and a features dataframe.
Get bulk statistics for objects (e.g. features or segmented features) given a labelled mask of the objects and any input field with the same dimensions or that can be broadcast with labels according to numpy-like broadcasting rules.
Derives bulk statistics for each object in the segmentation mask, and returns a features Dataframe with these properties for each feature.