vak.datapipes.frame_classification.infer_datapipe.InferDatapipe

class vak.datapipes.frame_classification.infer_datapipe.InferDatapipe(dataset_path: str | pathlib.Path, dataset_df: pd.DataFrame, input_type: str, split: str, sample_ids: npt.NDArray, inds_in_sample: npt.NDArray, frame_dur: float, window_size: int, frames_standardizer: FramesStandardizer | None = None, frames_padval: float = 0.0, frame_labels_padval: int = -1, return_padding_mask: bool = False, subset: str | None = None)

Bases: object

A datapipe class used for neural network models with the frame classification task, where the source data consists of audio signals or spectrograms of varying lengths.

dataset_path

Path to directory that represents a frame classification dataset, as created by vak.prep.prep_frame_classification_dataset().

Type:

pathlib.Path

split

The name of a split from the dataset, one of {‘train’, ‘val’, ‘test’}.

Type:

str

subset

Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.

Type:

str, optional

dataset_df

A frame classification dataset, represented as a pandas.DataFrame. It contains only the rows that correspond to subset (if specified) or otherwise split, taken from the dataset_df that was passed in when the class was instantiated.

Type:

pandas.DataFrame
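To illustrate how the rows kept in dataset_df relate to split and subset, here is a minimal pandas sketch. It assumes the DataFrame stores split and subset names in columns called "split" and "subset"; select_rows and the subset name "train-dur-4s" are hypothetical, and this is not vak's own code.

```python
from typing import Optional

import pandas as pd


def select_rows(dataset_df: pd.DataFrame, split: str, subset: Optional[str] = None) -> pd.DataFrame:
    """Hypothetical helper: keep rows for `subset` if given, otherwise rows for `split`.

    Assumes the DataFrame has "split" and "subset" columns.
    """
    if subset is not None:
        # subset takes precedence over split
        mask = dataset_df["subset"] == subset
    else:
        mask = dataset_df["split"] == split
    # Renumber the kept rows 0..n-1
    return dataset_df[mask].reset_index(drop=True)


df = pd.DataFrame(
    {
        "frames_path": ["a.npy", "b.npy", "c.npy", "d.npy"],
        "split": ["train", "train", "val", "test"],
        "subset": ["train-dur-4s", None, None, None],
    }
)
print(select_rows(df, split="val")["frames_path"].tolist())  # ['c.npy']
print(select_rows(df, split="val", subset="train-dur-4s")["frames_path"].tolist())  # ['a.npy']
```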

frames_paths

Paths to .npy files containing frames, either spectrograms or audio signals, that are the input to the model.

Type:

numpy.ndarray

frame_labels_paths

Paths to .npy files containing vectors with a label for each frame; these are the targets for the outputs of the model.

Type:

numpy.ndarray

input_type

The type of input to the neural network model. One of {‘audio’, ‘spect’}.

Type:

str

sample_ids

Indexing vector representing which sample from the dataset every frame belongs to.

Type:

numpy.ndarray

inds_in_sample

Indexing vector representing, for every frame, its index within the sample from the dataset that it belongs to.

Type:

numpy.ndarray
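The two indexing vectors above can be illustrated with a short NumPy sketch, assuming they are built by concatenating the frames of every sample in order. The frame counts are made up, and this is not necessarily how vak.prep constructs these vectors.

```python
import numpy as np

# Hypothetical example: three samples (e.g. spectrograms) with 4, 2 and 3 frames
n_frames_per_sample = np.array([4, 2, 3])

# sample_ids[i] is the index of the sample that frame i comes from
sample_ids = np.repeat(np.arange(len(n_frames_per_sample)), n_frames_per_sample)

# inds_in_sample[i] is the position of frame i within its own sample
inds_in_sample = np.concatenate([np.arange(n) for n in n_frames_per_sample])

print(sample_ids.tolist())      # [0, 0, 0, 0, 1, 1, 2, 2, 2]
print(inds_in_sample.tolist())  # [0, 1, 2, 3, 0, 1, 0, 1, 2]

# Together they map a global frame index back to (sample, position in sample)
frame = 5
print(sample_ids[frame], inds_in_sample[frame])  # 1 1
```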

frame_dur

Duration of a frame, i.e., a single sample in audio or a single timebin in a spectrogram.

Type:

float
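As a worked example of frame_dur, assume audio sampled at a hypothetical 32 kHz and, for spectrograms, an STFT with a hypothetical hop length of 512 samples. One audio frame then lasts 1/samplerate seconds, and consecutive spectrogram time bins are hop_length/samplerate seconds apart. Combined with a sample_ids vector like the one described above, frame_dur gives the duration of each sample and of the whole split. This is an illustrative sketch, not vak's own computation.

```python
import numpy as np

samplerate = 32000  # Hz, hypothetical
hop_length = 512    # STFT hop in samples, hypothetical

# Audio input: one frame is one sample
audio_frame_dur = 1 / samplerate           # 3.125e-05 s
# Spectrogram input: one frame is one time bin, hop_length samples after the previous one
spect_frame_dur = hop_length / samplerate  # 0.016 s

# With one sample_ids entry per frame, durations follow from frame counts
sample_ids = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2])
per_sample_dur = np.bincount(sample_ids) * spect_frame_dur  # about [0.064, 0.032, 0.048] s
total_dur = sample_ids.shape[-1] * spect_frame_dur          # about 0.144 s
```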

window_size

Size of windows to return; number of frames.

Type:

int

frames_standardizer

Transform applied to frames, the input to the neural network model. Optional, default is None. If supplied, it will be used by the transform applied to inputs and targets, vak.transforms.defaults.frame_classification.InferItemTransform.

Type:

vak.transforms.FramesStandardizer, optional
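A minimal sketch of frame standardization, assuming it means z-scoring each row of the frames (for example, each frequency bin of a spectrogram) with a mean and standard deviation fit on training data and then reused on new frames. ZScoreFrames is a hypothetical class written for illustration; the actual API of vak.transforms.FramesStandardizer may differ.

```python
import numpy as np


class ZScoreFrames:
    """Hypothetical stand-in for a frames standardizer.

    Z-scores each row (e.g. each frequency bin) using statistics fit on training data.
    """

    def fit(self, frames: np.ndarray) -> "ZScoreFrames":
        # frames: (n_freq_bins, n_time_bins); statistics are taken over time (axis=1)
        self.mean_ = frames.mean(axis=1)
        self.std_ = frames.std(axis=1)
        return self

    def __call__(self, frames: np.ndarray) -> np.ndarray:
        # Broadcast the per-row statistics across the time axis
        return (frames - self.mean_[:, np.newaxis]) / self.std_[:, np.newaxis]


rng = np.random.default_rng(0)
train_spect = rng.normal(loc=5.0, scale=2.0, size=(4, 100))  # made-up training frames
standardizer = ZScoreFrames().fit(train_spect)
standardized = standardizer(train_spect)

# At inference, reuse the statistics fit on training data for new, variable-length frames
new_spect = rng.normal(loc=5.0, scale=2.0, size=(4, 30))
new_standardized = standardizer(new_spect)
```

Fitting once and reusing the statistics keeps inference inputs on the same scale the model saw during training.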

__init__(dataset_path: str | pathlib.Path, dataset_df: pd.DataFrame, input_type: str, split: str, sample_ids: npt.NDArray, inds_in_sample: npt.NDArray, frame_dur: float, window_size: int, frames_standardizer: FramesStandardizer | None = None, frames_padval: float = 0.0, frame_labels_padval: int = -1, return_padding_mask: bool = False, subset: str | None = None)

Initialize a new instance of an InferDatapipe.

Parameters:
  • dataset_path (pathlib.Path) – Path to directory that represents a frame classification dataset, as created by vak.prep.prep_frame_classification_dataset().

  • dataset_df (pandas.DataFrame) – A frame classification dataset, represented as a pandas.DataFrame.

  • input_type (str) – The type of input to the neural network model. One of {‘audio’, ‘spect’}.

  • split (str) – The name of a split from the dataset, one of {‘train’, ‘val’, ‘test’}.

  • sample_ids (numpy.ndarray) – Indexing vector representing which sample from the dataset every frame belongs to.

  • inds_in_sample (numpy.ndarray) – Indexing vector representing, for every frame, its index within the sample from the dataset that it belongs to.

  • frame_dur (float) – Duration of a frame, i.e., a single sample in audio or a single timebin in a spectrogram.

  • frames_standardizer (vak.transforms.FramesStandardizer, optional) – Transform applied to frames, the input to the neural network model. Optional, default is None. If supplied, it will be used by the transform applied to inputs and targets, vak.transforms.defaults.frame_classification.InferItemTransform.

  • window_size (int) – Size of windows to return; number of frames.

  • frames_padval (float) – Value to pad frames with. Added to the end of the array, the “right side”. Argument to the PadToWindow transform. Default is 0.0.

  • frame_labels_padval (int) – Value to pad the frame labels vector with. Added to the end of the array. Argument to the PadToWindow transform. Default is -1. Used with the ignore_index argument of torch.nn.CrossEntropyLoss.

  • return_padding_mask (bool) – If True, the dictionary returned by ItemTransform classes will include a boolean vector, padding_mask, to use for cropping back down to the size before padding. padding_mask has size equal to the width of the padded array, i.e., the original size plus the padding at the end, and has values of 1 where columns in the padded array come from the original array and values of 0 where columns were added for padding.

  • subset (str, optional) – Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.
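The padding-related parameters can be illustrated with a NumPy sketch that right-pads a variable-length input to the next multiple of window_size, builds a padding mask, and splits the result into windows. pad_to_window is a hypothetical function, and padding to a multiple of window_size is an assumption made for illustration; vak's PadToWindow transform may differ in its details.

```python
import numpy as np


def pad_to_window(frames, frame_labels, window_size, frames_padval=0.0, frame_labels_padval=-1):
    """Hypothetical sketch: right-pad the last (time) axis to a multiple of window_size.

    Returns the padded frames, the padded frame labels, and a boolean padding mask.
    """
    n_time = frames.shape[-1]
    padded_len = -(-n_time // window_size) * window_size  # ceiling division
    n_pad = padded_len - n_time
    # Pad only the end ("right side") of the time axis; leave other axes untouched
    pad_width = [(0, 0)] * (frames.ndim - 1) + [(0, n_pad)]
    frames_padded = np.pad(frames, pad_width, constant_values=frames_padval)
    labels_padded = np.pad(frame_labels, (0, n_pad), constant_values=frame_labels_padval)
    # True where a column comes from the original array, False where it was added as padding
    padding_mask = np.arange(padded_len) < n_time
    return frames_padded, labels_padded, padding_mask


window_size = 4
spect = np.ones((5, 10))  # 5 frequency bins x 10 time bins (made-up data)
labels = np.arange(10)
frames_p, labels_p, mask = pad_to_window(spect, labels, window_size)
print(frames_p.shape)          # (5, 12)
print(labels_p[-3:].tolist())  # [9, -1, -1]

# Split into non-overlapping windows with shape (n_windows, n_freq_bins, window_size)
windows = frames_p.reshape(frames_p.shape[0], -1, window_size).transpose(1, 0, 2)
print(windows.shape)  # (3, 5, 4)

# Use the mask to crop back down to the size before padding
print(np.array_equal(labels_p[mask], labels))  # True
```

Padding only the right side keeps every original frame at its original index, so cropping back down is a single boolean index with the mask.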

Methods

__init__(dataset_path, dataset_df, ...[, ...])

Initialize a new instance of an InferDatapipe.

from_dataset_path(dataset_path, window_size)

Make an InferDatapipe instance, given the path to a frame classification dataset.

Attributes

duration

shape

classmethod from_dataset_path(dataset_path: str | pathlib.Path, window_size: int, frames_standardizer: FramesStandardizer | None = None, frames_padval: float = 0.0, frame_labels_padval: int = -1, return_padding_mask: bool = False, split: str = 'val', subset: str | None = None)

Make an InferDatapipe instance, given the path to a frame classification dataset.

Parameters:
  • dataset_path (pathlib.Path) – Path to directory that represents a frame classification dataset, as created by vak.prep.prep_frame_classification_dataset().

  • window_size (int) – Size of windows to return; number of frames.

  • frames_standardizer (vak.transforms.FramesStandardizer, optional) – Transform applied to frames, the input to the neural network model. Optional, default is None. If supplied, it will be used by the transform applied to inputs and targets, vak.transforms.defaults.frame_classification.InferItemTransform.

  • frames_padval (float) – Value to pad frames with. Added to the end of the array, the “right side”. Argument to the PadToWindow transform. Default is 0.0.

  • frame_labels_padval (int) – Value to pad the frame labels vector with. Added to the end of the array. Argument to the PadToWindow transform. Default is -1. Used with the ignore_index argument of torch.nn.CrossEntropyLoss.

  • return_padding_mask (bool) – If True, the dictionary returned by ItemTransform classes will include a boolean vector, padding_mask, to use for cropping back down to the size before padding. padding_mask has size equal to the width of the padded array, i.e., the original size plus the padding at the end, and has values of 1 where columns in the padded array come from the original array and values of 0 where columns were added for padding.

  • split (str) – The name of a split from the dataset, one of {‘train’, ‘val’, ‘test’}. Default is “val”.

  • subset (str, optional) – Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.
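Padding frame labels with -1 pairs with torch.nn.CrossEntropyLoss(ignore_index=-1): targets equal to ignore_index contribute nothing to the loss, and with the default "mean" reduction the average is taken over the remaining targets only. The NumPy function below is a hypothetical sketch of that behavior, written for illustration rather than as PyTorch's implementation; it shows that padded frames leave the loss unchanged.

```python
import numpy as np


def cross_entropy_ignore(logits, targets, ignore_index=-1):
    """Hypothetical sketch: mean cross-entropy over frames, skipping ignored targets.

    logits: (n_classes, n_frames); targets: (n_frames,) integer labels.
    """
    # Numerically stable log-softmax over the class axis
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    keep = targets != ignore_index
    # Log-probability of the true class for each kept frame
    picked = log_probs[targets[keep], np.flatnonzero(keep)]
    # Average only over frames that were not ignored
    return -picked.mean()


rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 5))       # 3 classes, 5 frames (made-up values)
targets = np.array([0, 2, 1, -1, -1])  # last two frames are padding
loss_padded = cross_entropy_ignore(logits, targets)
loss_unpadded = cross_entropy_ignore(logits[:, :3], targets[:3])
print(np.isclose(loss_padded, loss_unpadded))  # True
```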

Returns:

infer_datapipe

Return type:

InferDatapipe