vak.datapipes.frame_classification.infer_datapipe.InferDatapipe

class vak.datapipes.frame_classification.infer_datapipe.InferDatapipe(dataset_path: str | pathlib.Path, dataset_df: pd.DataFrame, input_type: str, split: str, sample_ids: npt.NDArray, inds_in_sample: npt.NDArray, frame_dur: float, window_size: int, frames_standardizer: FramesStandardizer | None = None, frames_padval: float = 0.0, frame_labels_padval: int = -1, return_padding_mask: bool = False, subset: str | None = None)

Bases: object

A datapipe class for neural network models that perform the frame classification task, where the source data consists of audio signals or spectrograms of varying lengths.

dataset_path

Path to directory that represents a frame classification dataset, as created by vak.prep.prep_frame_classification_dataset().

Type: pathlib.Path

split

The name of a split from the dataset, one of {‘train’, ‘val’, ‘test’}.

Type: str

subset

Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.

Type: str, optional

dataset_df

A frame classification dataset, represented as a pandas.DataFrame. It contains only the rows of the dataset_df passed in when instantiating the class that correspond to subset, if one was specified, or otherwise to split.

Type: pandas.DataFrame
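The precedence rule for subset and split can be sketched with plain Python lists. This is not vak's implementation. The rows are dicts standing in for dataset_df, and the "split" and "subset" column names, the "file" key, and the "train-subset-1" value are all hypothetical.

```python
# Minimal sketch of selecting rows by subset or split.
# Plain dicts stand in for rows of dataset_df; the "split", "subset",
# and "file" keys are assumptions made for illustration.


def select_rows(rows, split, subset=None):
    """Return the rows for `subset` if it is given, otherwise the rows for `split`."""
    if subset is not None:
        # subset takes precedence over split
        return [row for row in rows if row.get("subset") == subset]
    return [row for row in rows if row["split"] == split]


rows = [
    {"file": "a.npy", "split": "train", "subset": "train-subset-1"},
    {"file": "b.npy", "split": "train", "subset": None},
    {"file": "c.npy", "split": "val", "subset": None},
]

val_rows = select_rows(rows, split="val")
subset_rows = select_rows(rows, split="val", subset="train-subset-1")
print([row["file"] for row in val_rows])     # ['c.npy']
print([row["file"] for row in subset_rows])  # ['a.npy']
```

With an actual pandas.DataFrame, the same selection is a boolean mask, for example dataset_df[dataset_df["split"] == split], again assuming a column with that name.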

frames_paths

Paths to .npy files containing frames, i.e., the spectrograms or audio signals that are input to the model.

Type: numpy.ndarray

frame_labels_paths

Paths to .npy files containing vectors with a label for each frame. These are the targets for the outputs of the model.

Type: numpy.ndarray

input_type

The type of input to the neural network model. One of {‘audio’, ‘spect’}.

Type: str

sample_ids

Indexing vector representing which sample from the dataset every frame belongs to.

Type: numpy.ndarray

inds_in_sample

Indexing vector representing, for every frame, its index within the sample from the dataset that it belongs to.

Type: numpy.ndarray
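For three samples of 3, 2, and 4 frames, sample_ids and inds_in_sample line up frame by frame as shown below. make_indexing_vectors is a hypothetical helper written only to show what the two vectors contain, following the descriptions above. It is not part of vak. Plain lists replace the numpy arrays named in the signature.

```python
# Sketch of the contents of the two indexing vectors. vak passes them as
# numpy arrays; plain lists keep this example dependency-free.


def make_indexing_vectors(n_frames_per_sample):
    """Build sample_ids and inds_in_sample from the number of frames in each sample."""
    sample_ids = []
    inds_in_sample = []
    for sample_id, n_frames in enumerate(n_frames_per_sample):
        sample_ids.extend([sample_id] * n_frames)  # which sample each frame belongs to
        inds_in_sample.extend(range(n_frames))     # position of each frame within its sample
    return sample_ids, inds_in_sample


sample_ids, inds_in_sample = make_indexing_vectors([3, 2, 4])
print(sample_ids)      # [0, 0, 0, 1, 1, 2, 2, 2, 2]
print(inds_in_sample)  # [0, 1, 2, 0, 1, 0, 1, 2, 3]

# Frame 4 of the whole dataset is frame 1 of sample 1.
frame = 4
print(sample_ids[frame], inds_in_sample[frame])  # 1 1
```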

frame_dur

Duration of a frame, i.e., a single sample in audio or a single timebin in a spectrogram.

Type: float
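Because every frame has the same duration, a frame's index converts to a time by multiplication. The sketch below assumes frame_dur is in seconds. The helper names and the example values (a 32 kHz recording, 2 ms time bins) are hypothetical.

```python
# Hypothetical helpers; times are in the units of frame_dur (assumed seconds).


def frame_onset(ind_in_sample, frame_dur):
    """Time at which a frame starts, relative to the start of its sample."""
    return ind_in_sample * frame_dur


def sample_duration(n_frames, frame_dur):
    """Total duration covered by a sample with n_frames frames."""
    return n_frames * frame_dur


# Audio input: one frame is one audio sample, so frame_dur is 1 / sampling rate.
audio_frame_dur = 1 / 32000  # hypothetical 32 kHz recording
# Spectrogram input: one frame is one time bin.
spect_frame_dur = 0.002  # hypothetical 2 ms time bins

print(sample_duration(64000, audio_frame_dur))  # duration of 64000 audio samples
print(frame_onset(500, spect_frame_dur))        # onset of time bin 500
```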

window_size¶

Size of windows to return; number of frames.

Type:: int
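This sketch assumes a sample is cut into consecutive, non-overlapping windows of window_size frames. Under that assumption, the number of windows is the ceiling of the frame count divided by window_size. n_windows and split_into_windows are hypothetical helpers, not part of vak.

```python
import math

# Sketch assuming consecutive, non-overlapping windows of window_size frames;
# the helper names are hypothetical.


def n_windows(n_frames, window_size):
    """Number of windows needed to cover n_frames, counting a final partial window."""
    return math.ceil(n_frames / window_size)


def split_into_windows(frames, window_size):
    """Consecutive, non-overlapping windows; the last one may be shorter."""
    return [frames[start:start + window_size] for start in range(0, len(frames), window_size)]


frames = list(range(10))  # a sample with 10 frames
print(n_windows(len(frames), 4))      # 3
print(split_into_windows(frames, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

A shorter final window is the case that padding (the frames_padval and frame_labels_padval parameters of __init__) is meant to handle.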

frames_standardizer

Transform applied to frames, the input to the neural network model. Optional, default is None. If supplied, will be used with the transform applied to inputs and targets, vak.transforms.defaults.frame_classification.InferItemTransform.

Type: vak.transforms.FramesStandardizer, optional
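The standardizer can be sketched as follows, assuming it z-scores each row (frequency bin) of a spectrogram with a mean and standard deviation fitted on training data, in the manner of scikit-learn's StandardScaler. fit and transform here are hypothetical stand-alone functions, not the FramesStandardizer API. vak's class may differ in its details.

```python
import statistics

# Sketch of per-row z-scoring, assuming the standardizer works this way.
# A spectrogram is a list of rows (frequency bins), each a list of values over time.


def fit(train_spects):
    """Fit a mean and standard deviation for each row over all training spectrograms."""
    n_rows = len(train_spects[0])
    means, stds = [], []
    for row in range(n_rows):
        values = [v for spect in train_spects for v in spect[row]]
        means.append(statistics.fmean(values))
        std = statistics.pstdev(values)
        # Guard against constant rows; this choice belongs to the sketch.
        stds.append(std if std > 0 else 1.0)
    return means, stds


def transform(spect, means, stds):
    """Subtract each row's mean and divide by its standard deviation."""
    return [[(v - m) / s for v in row] for row, m, s in zip(spect, means, stds)]


train_spects = [
    [[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]],
    [[3.0, 4.0, 5.0], [10.0, 10.0, 10.0]],
]
means, stds = fit(train_spects)
print(means)  # [3.0, 10.0]
print(transform(train_spects[0], means, stds))
```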

__init__(dataset_path: str | pathlib.Path, dataset_df: pd.DataFrame, input_type: str, split: str, sample_ids: npt.NDArray, inds_in_sample: npt.NDArray, frame_dur: float, window_size: int, frames_standardizer: FramesStandardizer | None = None, frames_padval: float = 0.0, frame_labels_padval: int = -1, return_padding_mask: bool = False, subset: str | None = None)

Initialize a new instance of an InferDatapipe.

Parameters:

dataset_path (str or pathlib.Path) – Path to directory that represents a frame classification dataset, as created by vak.prep.prep_frame_classification_dataset().
dataset_df (pandas.DataFrame) – A frame classification dataset, represented as a pandas.DataFrame.
input_type (str) – The type of input to the neural network model. One of {‘audio’, ‘spect’}.
split (str) – The name of a split from the dataset, one of {‘train’, ‘val’, ‘test’}.
sample_ids (numpy.ndarray) – Indexing vector representing which sample from the dataset every frame belongs to.
inds_in_sample (numpy.ndarray) – Indexing vector representing which index within each sample from the dataset that every frame belongs to.
frame_dur (float) – Duration of a frame, i.e., a single sample in audio or a single timebin in a spectrogram.
frames_standardizer (vak.transforms.FramesStandardizer, optional) – Transform applied to frames, the input to the neural network model. Optional, default is None. If supplied, will be used with the transform applied to inputs and targets, vak.transforms.defaults.frame_classification.InferItemTransform.
window_size (int) – Size of windows to return; number of frames.
frames_padval (float) – Value to pad frames with. Added to the end of the array, the “right side”. Argument to the PadToWindow transform. Default is 0.0.
frame_labels_padval (int) – Value to pad the frame labels vector with. Added to the end of the array. Argument to the PadToWindow transform. Default is -1. Used with the ignore_index argument of torch.nn.CrossEntropyLoss.
return_padding_mask (bool) – If True, the dictionary returned by ItemTransform classes will include a boolean vector, padding_mask, to use for cropping back down to the size before padding. padding_mask has size equal to the width of the padded array, i.e., the original size plus the padding at the end, and has values of 1 where columns in the padded array come from the original array and values of 0 where columns were added for padding.
subset (str, optional) – Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.
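Taken together, frames_padval, frame_labels_padval, and return_padding_mask describe two steps: right-padding a sample so it can be cut into whole windows, and producing a mask for cropping back to the original length. The sketch below assumes padding extends to the next multiple of window_size and uses 1-D lists in place of arrays. For a spectrogram, the padding would apply to columns instead. pad_to_window and crop_with_mask are hypothetical helpers, not vak's PadToWindow transform.

```python
# Sketch of right-padding to a whole number of windows, assuming padding goes
# to the next multiple of window_size. The helper names are hypothetical.


def pad_to_window(frames, frame_labels, window_size,
                  frames_padval=0.0, frame_labels_padval=-1):
    """Pad frames and labels on the right; also return the padding mask."""
    n_frames = len(frames)
    n_pad = (-n_frames) % window_size  # 0 if already a multiple of window_size
    padded_frames = frames + [frames_padval] * n_pad
    padded_labels = frame_labels + [frame_labels_padval] * n_pad
    # True where a position comes from the original array, False where it is padding
    padding_mask = [True] * n_frames + [False] * n_pad
    return padded_frames, padded_labels, padding_mask


def crop_with_mask(values, padding_mask):
    """Drop positions that were added as padding, e.g. from per-frame predictions."""
    return [v for v, keep in zip(values, padding_mask) if keep]


frames = [0.1, 0.2, 0.3, 0.4, 0.5]
labels = [1, 1, 2, 2, 0]
padded_frames, padded_labels, mask = pad_to_window(frames, labels, window_size=4)
print(padded_labels)  # [1, 1, 2, 2, 0, -1, -1, -1]
print(mask)           # [True, True, True, True, True, False, False, False]
print(crop_with_mask(padded_labels, mask) == labels)  # True
```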

Methods

`__init__`(dataset_path, dataset_df, ...[, ...])	Initialize a new instance of an `InferDatapipe`.
`from_dataset_path`(dataset_path, window_size)	Make an `InferDatapipe` instance, given the path to a frame classification dataset.

Attributes

`duration`
`shape`

classmethod from_dataset_path(dataset_path: str | pathlib.Path, window_size: int, frames_standardizer: FramesStandardizer | None = None, frames_padval: float = 0.0, frame_labels_padval: int = -1, return_padding_mask: bool = False, split: str = 'val', subset: str | None = None)

Make an InferDatapipe instance, given the path to a frame classification dataset.

Parameters:

dataset_path (str or pathlib.Path) – Path to directory that represents a frame classification dataset, as created by vak.prep.prep_frame_classification_dataset().
window_size (int) – Size of windows to return; number of frames.
frames_standardizer (vak.transforms.FramesStandardizer, optional) – Transform applied to frames, the input to the neural network model. Optional, default is None. If supplied, will be used with the transform applied to inputs and targets, vak.transforms.defaults.frame_classification.InferItemTransform.
frames_padval (float) – Value to pad frames with. Added to the end of the array, the “right side”. Argument to the PadToWindow transform. Default is 0.0.
frame_labels_padval (int) – Value to pad the frame labels vector with. Added to the end of the array. Argument to the PadToWindow transform. Default is -1. Used with the ignore_index argument of torch.nn.CrossEntropyLoss.
return_padding_mask (bool) – If True, the dictionary returned by ItemTransform classes will include a boolean vector, padding_mask, to use for cropping back down to the size before padding. padding_mask has size equal to the width of the padded array, i.e., the original size plus the padding at the end, and has values of 1 where columns in the padded array come from the original array and values of 0 where columns were added for padding.
split (str) – The name of a split from the dataset, one of {‘train’, ‘val’, ‘test’}. Default is “val”.
subset (str, optional) – Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.

Returns:

infer_datapipe

Return type:

InferDatapipe
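The default frame_labels_padval of -1 pairs with the ignore_index argument of torch.nn.CrossEntropyLoss. When the padded label vector is the target, padded frames are excluded from the loss. The plain-Python sketch below illustrates that effect, assuming PyTorch's default mean reduction and no class weights. cross_entropy is a hypothetical helper, not PyTorch code. In practice one would construct torch.nn.CrossEntropyLoss(ignore_index=-1).

```python
import math

# Plain-Python sketch of cross-entropy with ignore_index and mean reduction,
# written to mirror the documented behavior of torch.nn.CrossEntropyLoss
# without class weights. It is not PyTorch's implementation.


def cross_entropy(logits, targets, ignore_index=-1):
    """Mean cross-entropy over frames whose target is not ignore_index.

    logits: per-frame lists of unnormalized class scores.
    targets: per-frame integer class labels.
    """
    losses = []
    for scores, target in zip(logits, targets):
        if target == ignore_index:
            continue  # padded frames do not contribute to the loss
        # log-sum-exp with max subtraction for numerical stability
        m = max(scores)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        losses.append(log_sum_exp - scores[target])
    if not losses:
        return float("nan")  # no frames left to average over
    return sum(losses) / len(losses)


logits = [[2.0, 0.0], [0.0, 0.0], [5.0, -5.0]]
targets = [0, 1, -1]  # the last frame carries the default frame_labels_padval
print(cross_entropy(logits, targets))
```

Changing the scores of the padded frame leaves the loss unchanged, which is the point of padding labels with the ignore index.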