vak.transforms.frame_labels.functional.to_segments

vak.transforms.frame_labels.functional.to_segments(frame_labels: ndarray, labelmap: dict, frame_times: ndarray, n_decimals_trunc: int = 5) → tuple[ndarray, ndarray, ndarray]

Convert a vector of frame labels into segments in the form of labels, onset times, and offset times.

Finds where continuous runs of a single label start and stop in timebins, and considers each of these runs a segment.

The function returns a vector of labels, plus vectors of onsets and offsets in units of seconds.
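The run-finding step described above can be sketched in plain NumPy. This is a minimal illustration and not vak's implementation: the function name `frame_labels_to_segments_sketch` is hypothetical, and it assumes that an onset is the time of the first frame in a run and an offset is the time of the last frame in that run. The library may place segment boundaries differently.

```python
import numpy as np


def frame_labels_to_segments_sketch(frame_labels, labelmap, frame_times):
    """Hypothetical helper for illustration only; assumes at least one frame."""
    frame_labels = np.asarray(frame_labels)
    frame_times = np.asarray(frame_times, dtype=float)

    # Invert the mapping so integer classes map back to their labels.
    inverse_labelmap = {v: k for k, v in labelmap.items()}

    # A new run starts wherever a frame's label differs from the previous frame's.
    change_inds = np.flatnonzero(np.diff(frame_labels)) + 1
    onset_inds = np.concatenate(([0], change_inds))
    offset_inds = np.concatenate((change_inds - 1, [frame_labels.shape[0] - 1]))

    # One label per run, taken from the first frame of the run.
    labels = np.array([inverse_labelmap[int(i)] for i in frame_labels[onset_inds]])

    # Assumption: boundaries are the times of the first and last frame in each run.
    onsets_s = frame_times[onset_inds]
    offsets_s = frame_times[offset_inds]
    return labels, onsets_s, offsets_s


# Eight frames, one every 0.5 s, with a made-up label mapping.
example_labelmap = {"unlabeled": 0, "a": 1, "b": 2}
example_frame_labels = np.array([0, 0, 1, 1, 1, 0, 2, 2])
example_frame_times = np.arange(8) * 0.5

labels, onsets_s, offsets_s = frame_labels_to_segments_sketch(
    example_frame_labels, example_labelmap, example_frame_times
)
```

With this input the sketch finds four runs, labeled "unlabeled", "a", "unlabeled", and "b". The single frame at index 5 forms its own one-frame segment, so its onset and offset are the same time under the assumption stated above.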

Parameters:

frame_labels (numpy.ndarray) – A vector where each element represents a label for a frame, either a single sample in audio or a single time bin from a spectrogram. Output of a neural network.
labelmap (dict) – Dictionary that maps labels to consecutive integers. The mapping is inverted to convert the integers in frame_labels back to labels.
frame_times (numpy.ndarray) – Vector of times, either the times of samples in audio or the bin centers of columns in a spectrogram, as returned by the function that generated the spectrogram. Used to convert onset and offset indices in frame_labels to seconds.
n_decimals_trunc (int) – Number of decimal places to keep when truncating the timebin duration calculated from frame_times. Default is 5.

Returns:

labels (numpy.ndarray) – Vector where each element is a label for a segment, with its onset and offset given by the corresponding elements in onsets_s and offsets_s.
onsets_s (numpy.ndarray) – Vector where each element is the onset in seconds of a segment. Each onset corresponds to the value at the same index in labels.
offsets_s (numpy.ndarray) – Vector where each element is the offset in seconds of a segment. Each offset corresponds to the value at the same index in labels.
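The n_decimals_trunc parameter refers to truncating a timebin duration computed from frame_times. One plausible way to do that is sketched below. The helper name `truncated_timebin_dur` is hypothetical, and taking the mean difference between consecutive times is an assumption, not necessarily what the library does. Differences between consecutive frame times can vary slightly because of floating-point rounding, which is one reason to truncate to a fixed number of decimals.

```python
import numpy as np


def truncated_timebin_dur(frame_times, n_decimals_trunc=5):
    """Hypothetical helper: mean spacing of frame_times, truncated (not rounded)."""
    frame_times = np.asarray(frame_times, dtype=float)
    # Assumption: the timebin duration is the mean difference between consecutive times.
    mean_dur = np.diff(frame_times).mean()
    # Truncate by scaling, dropping the fractional part, and scaling back.
    scale = 10 ** n_decimals_trunc
    return np.trunc(mean_dur * scale) / scale


example_times = np.array([0.0, 0.25, 0.5, 0.75])
dur_default = truncated_timebin_dur(example_times)      # keeps 5 decimal places
dur_one_decimal = truncated_timebin_dur(example_times, n_decimals_trunc=1)
```

Truncating differs from rounding: with one decimal place, a duration of 0.25 s becomes 0.2 s instead of 0.3 s.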