vak.transforms.frame_labels.functional.postprocess

vak.transforms.frame_labels.functional.postprocess(frame_labels: ndarray[Any, dtype[_ScalarType_co]], timebin_dur: float, background_label: int = 0, min_segment_dur: float | None = None, majority_vote: bool = False, boundary_labels: ndarray[Any, dtype[_ScalarType_co]] | None = None) → ndarray[Any, dtype[_ScalarType_co]]

Apply post-processing transformations to a vector of frame labels.

Optional post-processing consists of two transforms, both of which rely on there being a label that corresponds to the background class. The first removes any segment that is shorter than a specified duration by converting the labels in that segment to the background class label. The second performs a “majority vote” transform within each run of labels that is bordered on both sides by the “background” label. That is, it counts the number of times each label occurs in that segment, and then assigns the most common label to all bins in the segment.

The function performs those steps in this order (pseudo-code):

if min_segment_dur is not None:
    frame_labels = remove_short_segments(frame_labels, timebin_dur, min_segment_dur, background_label)
if majority_vote:
    frame_labels = majority_vote(frame_labels, background_label)
return frame_labels
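
The two steps can be illustrated with a minimal, standard-library sketch. This is not vak's implementation: the names find_segments and sketch_postprocess are hypothetical, the sketch works on plain Python lists rather than NumPy arrays, and it assumes segments are recovered only from the background label (it ignores boundary_labels).

```python
from collections import Counter
from itertools import groupby


def find_segments(labels, background_label=0):
    """Return (start, stop) index pairs for runs of non-background labels.

    Hypothetical helper for illustration; not part of vak.
    """
    segments = []
    idx = 0
    for is_background, run in groupby(labels, key=lambda lbl: lbl == background_label):
        run_len = len(list(run))
        if not is_background:
            segments.append((idx, idx + run_len))
        idx += run_len
    return segments


def sketch_postprocess(labels, timebin_dur, background_label=0,
                       min_segment_dur=None, majority_vote=False):
    """Illustrative re-implementation of the two steps; not vak's actual code."""
    labels = list(labels)  # copy, so the caller's sequence is not modified
    segments = find_segments(labels, background_label)

    if min_segment_dur is not None:
        kept = []
        for start, stop in segments:
            if (stop - start) * timebin_dur < min_segment_dur:
                # Segment is too short: relabel it as background.
                labels[start:stop] = [background_label] * (stop - start)
            else:
                kept.append((start, stop))
        segments = kept

    if majority_vote:
        for start, stop in segments:
            # Most frequent label in the segment wins.
            winner = Counter(labels[start:stop]).most_common(1)[0][0]
            labels[start:stop] = [winner] * (stop - start)

    return labels


example = [0, 1, 1, 2, 1, 0, 3, 0, 0, 2, 2, 2, 0]
cleaned = sketch_postprocess(example, timebin_dur=0.002,
                             min_segment_dur=0.004, majority_vote=True)
print(cleaned)
```

In this example the one-bin segment labeled 3 lasts 0.002 s, which is shorter than min_segment_dur, so it is converted to background. The stray 2 inside the first segment is then outvoted by the surrounding 1s.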

Parameters:

frame_labels (numpy.ndarray) – A vector where each element represents a label for a frame, either a single sample in audio or a single time bin from a spectrogram. Output of a neural network.
timebin_dur (float) – Duration of a time bin in a spectrogram, e.g., as estimated from a vector of times using vak.timebins.timebin_dur_from_vec.
background_label (int) – Label that was given to segments that were not labeled in annotation, e.g. silent periods between annotated segments. Default is 0.
min_segment_dur (float) – Minimum duration of segment, in seconds. If specified, then any segment with a duration less than min_segment_dur is removed from frame_labels. Default is None, in which case no segments are removed.
majority_vote (bool) – If True, transform segments containing multiple labels into segments with a single label by taking a “majority vote”, i.e. assign all time bins in the segment the most frequently occurring label in the segment. This transform requires either a background label or a vector of boundary labels. Default is False.
boundary_labels (numpy.ndarray, optional) – Vector of integers {0, 1}, where 1 indicates a boundary, and 0 indicates no boundary. Output of one head of a frame classification model that has been trained to classify each frame as either “boundary” or “no boundary”. Default is None. If supplied, this vector is used to find segments before applying post-processing, instead of recovering them from frame_labels using the background_label.
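
As a rough illustration of how a boundary vector could define segments, the sketch below converts boundary_labels into (start, stop) index pairs. It is not vak's code: the function name segments_from_boundaries is hypothetical, and it assumes that each 1 marks the first frame of a new segment and that frame 0 is treated as an implicit boundary. The library's actual conventions may differ.

```python
def segments_from_boundaries(boundary_labels):
    """Return (start, stop) index pairs, one per segment.

    Hypothetical helper for illustration. Assumes each 1 marks the first
    frame of a segment, and that frame 0 always starts a segment.
    """
    n_frames = len(boundary_labels)
    if n_frames == 0:
        return []
    starts = [i for i, b in enumerate(boundary_labels) if b == 1]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # assumption: implicit boundary at the first frame
    # Each segment ends where the next one starts; the last ends at n_frames.
    stops = starts[1:] + [n_frames]
    return list(zip(starts, stops))


print(segments_from_boundaries([1, 0, 0, 1, 0, 1, 0, 0]))
```

Segments found this way are contiguous and need not be separated by background frames, which is why a boundary vector can stand in for the background label when locating segments.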

Returns:

frame_labels – Vector of frame labels after post-processing is applied.

Return type:

numpy.ndarray