tobac.utils.bulk_statistics.get_statistics_from_mask
- tobac.utils.bulk_statistics.get_statistics_from_mask(features, segmentation_mask, *fields, statistic={'Mean': <function mean>}, index=None, default=None, id_column='feature', collapse_dim=None, time_var_name='time', time_padding=None)
Derives bulk statistics for each object in the segmentation mask and returns the features DataFrame with these statistics added for each feature.
- Parameters:
features (pd.DataFrame) – Dataframe with segmented features (output from feature detection or segmentation). Timesteps need not be exactly the same as in the segmentation mask, but all labels in the mask must be present in the feature dataframe.
segmentation_mask (xr.DataArray) – Segmentation mask output
*fields (tuple[xr.DataArray]) – Field(s) with input data. If field does not have a time dimension it will be considered time invariant, and the entire field will be passed for each time step in segmentation_mask. If the shape does not match that of labels, numpy-style broadcasting will be applied.
statistic (dict[str, Callable], optional (default: {'Mean': np.mean})) – Dictionary with the function(s) to apply over each region as values and the names of the respective statistics as keys. Default is to calculate the mean value of the field over each feature.
index (None | list[int], optional (default: None)) – List of indexes of regions in labels to apply the function(s) to. If None, defaults to all integers between 1 and the maximum value in labels.
default (None | float, optional (default: None)) – Default value to return for a region that has no values.
id_column (str, optional (default: "feature")) – Name of the column in feature dataframe that contains IDs that match with the labels in mask. The default is the column “feature”.
collapse_dim (None | str | list[str], optional (default: None)) – Dimension names of labels to collapse, allowing, e.g., calculation of statistics on 2D fields for the footprint of 3D objects.
time_var_name (str, optional (default: "time")) – The name of the time dimension in the input fields and the time column in features, by default “time”
time_padding (timedelta, optional (default: None)) – If set, allows statistics to be associated with a feature input timestep that is within time_padding of the feature. Particularly useful when converting between micro- and nanoseconds, as is common when using pandas dataframes.
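The tolerance-based matching that time_padding describes can be illustrated with a small standard-library sketch. This is not tobac's implementation: the helper `match_time` and the nearest-timestep rule are assumptions made only to show why a small padding helps when timestamps differ by a few microseconds.

```python
from datetime import datetime, timedelta


def match_time(feature_time, mask_times, time_padding=None):
    """Hypothetical helper: return the mask time matching feature_time, or None."""
    if not mask_times:
        return None
    if time_padding is None:
        # Without padding, only an exact timestamp match is accepted.
        return feature_time if feature_time in mask_times else None
    # With padding, accept the closest mask time if it lies within the tolerance.
    closest = min(mask_times, key=lambda t: abs(t - feature_time))
    return closest if abs(closest - feature_time) <= time_padding else None


mask_times = [datetime(2020, 1, 1, 12, 0, 0), datetime(2020, 1, 1, 12, 5, 0)]
# A feature timestamp that is off by 3 microseconds, e.g. after a precision conversion.
feature_time = datetime(2020, 1, 1, 12, 0, 0, 3)

exact = match_time(feature_time, mask_times)
padded = match_time(feature_time, mask_times, time_padding=timedelta(milliseconds=1))
```

Here the exact comparison finds no match, while a padding of one millisecond associates the feature with the 12:00:00 mask timestep.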
- Returns:
features – Updated feature dataframe with bulk statistics for each feature saved in a new column
- Return type:
pd.DataFrame
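To make the roles of statistic, index and default concrete, the following is a minimal NumPy sketch of per-region statistics on a single timestep. It is not tobac's implementation and does not call tobac: the function `bulk_statistics_sketch` and the sample arrays are hypothetical, and the sketch returns a plain dictionary rather than an updated features dataframe.

```python
import numpy as np


def bulk_statistics_sketch(labels, field, statistic, index=None, default=None):
    """Hypothetical sketch: apply each function in `statistic` to the
    values of `field` that fall inside each labelled region."""
    labels = np.asarray(labels)
    # NumPy-style broadcasting if the field shape differs from the labels.
    field = np.broadcast_to(np.asarray(field), labels.shape)
    if index is None:
        # All integers between 1 and the maximum label value.
        index = range(1, int(labels.max()) + 1)
    results = {}
    for label in index:
        values = field[labels == label]
        results[label] = {
            # Regions without any values get the default instead.
            name: (func(values) if values.size > 0 else default)
            for name, func in statistic.items()
        }
    return results


# Two regions (labels 1 and 3); label 2 does not occur in the mask.
labels = np.array([[1, 1, 0],
                   [0, 3, 3]])
field = np.array([[2.0, 4.0, 9.0],
                  [9.0, 5.0, 7.0]])

stats = bulk_statistics_sketch(labels, field, {"Mean": np.mean, "Max": np.max})
```

Each key of the statistic dictionary becomes the name of one result per region, which mirrors how the function described above stores each statistic in its own column.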