vak.datapipes.frame_classification.train_datapipe.TrainDatapipe¶
- class vak.datapipes.frame_classification.train_datapipe.TrainDatapipe(dataset_path: str | Path, dataset_df: DataFrame, input_type: str, split: str, sample_ids: ndarray[Any, dtype[_ScalarType_co]], inds_in_sample: ndarray[Any, dtype[_ScalarType_co]], window_size: int, frame_dur: float, stride: int = 1, subset: str | None = None, window_inds: ndarray[Any, dtype[_ScalarType_co]] | None = None, frames_standardizer: FramesStandardizer | None = None)[source]¶
Bases:
object
Dataset used for training neural network models on the frame classification task, where the source data consists of audio signals or spectrograms of varying lengths.
Unlike vak.datasets.frame_classification.InferDatapipe, this class does not return entire samples from the source dataset. Instead, each paired sample \((x_i, y_i)\) returned by this dataset class consists of a window \(x_i\) of fixed length \(w\) from the underlying data X of total length \(T\). Each \(y_i\) is a vector of the same size \(w\), containing an integer class label for each frame in the window \(x_i\). The entire dataset consists of some number of windows \(I\) determined by a stride parameter \(s\), \(I = (T - w) / s\).
The underlying data consists of single arrays for both the input to the network X and the targets for the network output Y. These single arrays X and Y are created by concatenating samples from the source data, e.g., audio files or spectrogram arrays. (This is true for vak.datasets.frame_classification.InferDatapipe as well.) The dimensions of \(X\) will be (channels, …, frames), i.e., audio will have dimensions (channels, samples) and spectrograms will have dimensions (channels, frequency bins, time bins). The signal \(X\) may be either audio or spectrogram, meaning that a frame will be either a single sample in an audio signal or a single time bin in a spectrogram. The last dimension of X will always be the total number of frames in the dataset, either audio samples or spectrogram time bins, and Y will be the same size, containing an integer class label for each frame.
- dataset_path¶
Path to directory that represents a frame classification dataset, as created by vak.prep.prep_frame_classification_dataset().
- Type:
- subset¶
Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.
- Type:
str, optional
- dataset_df¶
A frame classification dataset, represented as a pandas.DataFrame. This will be only the rows that correspond to either subset or split from the dataset_df that was passed in when instantiating the class.
- Type:
- frame_paths¶
Paths to npy files containing frames, either spectrograms or audio signals that are input to the model.
- Type:
- frame_labels_paths¶
Paths to npy files containing vectors with a label for each frame. The targets for the outputs of the model.
- Type:
- sample_ids¶
Indexing vector representing which sample from the dataset every frame belongs to.
- Type:
- inds_in_sample¶
Indexing vector giving, for every frame, its index within the sample from the dataset that it belongs to.
- Type:
- frames_standardizer¶
Transform applied to frames, the input to the neural network model. Optional, default is None. If supplied, will be used with the transform applied to inputs and targets, vak.transforms.defaults.frame_classification.TrainItemTransform.
- Type:
vak.transforms.FramesStandardizer, optional
- frame_dur¶
Duration of a frame, i.e., a single sample in audio or a single timebin in a spectrogram.
- Type:
- stride¶
The size of the stride used to determine which windows are included in the dataset. The default is 1. Used to compute window_inds, with the function vak.datasets.frame_classification.train_datapipe.get_window_inds().
- Type:
- window_inds¶
A vector of valid window indices for the dataset. If specified, this takes precedence over stride.
- Type:
numpy.ndarray, optional
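The two indexing vectors described above, sample_ids and inds_in_sample, can be pictured with a small NumPy sketch. This is only an illustration under assumed inputs: the list sample_lengths is hypothetical, and the code is not taken from vak itself.

```python
import numpy as np

# Hypothetical lengths, in frames, of three source samples
# (an assumption for illustration; not read from a real dataset).
sample_lengths = [4, 2, 3]

# For every frame in the concatenated array: which sample it came from.
sample_ids = np.concatenate(
    [np.full(n, i) for i, n in enumerate(sample_lengths)]
)

# For every frame in the concatenated array: its position inside its own sample.
inds_in_sample = np.concatenate([np.arange(n) for n in sample_lengths])

print(sample_ids)      # [0 0 0 0 1 1 2 2 2]
print(inds_in_sample)  # [0 1 2 3 0 1 0 1 2]
```

Both vectors have one entry per frame, so they have the same length as the last dimension of X and as Y.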
- __init__(dataset_path: str | Path, dataset_df: DataFrame, input_type: str, split: str, sample_ids: ndarray[Any, dtype[_ScalarType_co]], inds_in_sample: ndarray[Any, dtype[_ScalarType_co]], window_size: int, frame_dur: float, stride: int = 1, subset: str | None = None, window_inds: ndarray[Any, dtype[_ScalarType_co]] | None = None, frames_standardizer: FramesStandardizer | None = None)[source]¶
Initialize a new instance of a TrainDatapipe.
- Parameters:
dataset_path (pathlib.Path) – Path to directory that represents a frame classification dataset, as created by vak.prep.prep_frame_classification_dataset().
dataset_df (pandas.DataFrame) – A frame classification dataset, represented as a pandas.DataFrame.
input_type (str) – The type of input to the neural network model. One of {‘audio’, ‘spect’}.
split (str) – The name of a split from the dataset, one of {‘train’, ‘val’, ‘test’}.
sample_ids (numpy.ndarray) – Indexing vector representing which sample from the dataset every frame belongs to.
inds_in_sample (numpy.ndarray) – Indexing vector giving, for every frame, its index within the sample from the dataset that it belongs to.
window_size (int) – Size of windows to return; number of frames.
frame_dur (float) – Duration of a frame, i.e., a single sample in audio or a single timebin in a spectrogram.
stride (int) – The size of the stride used to determine which windows are included in the dataset. The default is 1. Used to compute window_inds, with the function vak.datasets.frame_classification.train_datapipe.get_window_inds().
subset (str, optional) – Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.
window_inds (numpy.ndarray, optional) – A vector of valid window indices for the dataset. If specified, this takes precedence over stride.
frames_standardizer (vak.transforms.FramesStandardizer, optional) – Transform applied to frames, the input to the neural network model. Optional, default is None. If supplied, will be used with the transform applied to inputs and targets, vak.transforms.defaults.frame_classification.TrainItemTransform.
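How window_size and stride select windows from the concatenated arrays can be sketched in plain NumPy. The helpers window_starts and take_window are hypothetical names introduced for this illustration, and the stop value of the range is an assumption chosen to agree with \(I = (T - w) / s\) in the class description; the actual get_window_inds() function may behave differently.

```python
import numpy as np


def window_starts(n_frames: int, window_size: int, stride: int = 1) -> np.ndarray:
    """Start indices of windows, spaced ``stride`` frames apart.

    Hypothetical helper: the stop value is an assumption made so that the
    number of windows matches I = (T - w) / s when s divides T - w.
    """
    if window_size > n_frames:
        raise ValueError("window_size must not exceed n_frames")
    return np.arange(0, n_frames - window_size, stride)


def take_window(X: np.ndarray, Y: np.ndarray, start: int, window_size: int):
    """Slice one (x_i, y_i) pair; frames are the last dimension of X."""
    stop = start + window_size
    return X[..., start:stop], Y[start:stop]


T, w, s = 10, 4, 2                        # total frames, window size, stride
X = np.arange(3 * T).reshape(1, 3, T)     # (channels, frequency bins, time bins)
Y = np.arange(T) % 2                      # one integer label per frame

starts = window_starts(T, w, s)
x_i, y_i = take_window(X, Y, starts[1], w)

print(starts)      # [0 2 4]
print(x_i.shape)   # (1, 3, 4)
print(y_i)         # [0 1 0 1]
```

Slicing with `...` keeps the sketch independent of input type: the same indexing works for audio with shape (channels, samples) and for spectrograms with shape (channels, frequency bins, time bins).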
Methods
__init__(dataset_path, dataset_df, ...[, ...]) – Initialize a new instance of a TrainDatapipe.
from_dataset_path(dataset_path, window_size) – Make a TrainDatapipe instance, given the path to a frame classification dataset.
Attributes
duration
shape
- classmethod from_dataset_path(dataset_path: str | Path, window_size: int, stride: int = 1, split: str = 'train', subset: str | None = None, frames_standardizer: FramesStandardizer | None = None)[source]¶
Make a TrainDatapipe instance, given the path to a frame classification dataset.
- Parameters:
dataset_path (pathlib.Path) – Path to directory that represents a frame classification dataset, as created by vak.prep.prep_frame_classification_dataset().
window_size (int) – Size of windows to return; number of frames.
stride (int) – The size of the stride used to determine which windows are included in the dataset. The default is 1. Used to compute window_inds, with the function vak.datasets.frame_classification.train_datapipe.get_window_inds().
split (str) – The name of a split from the dataset, one of {‘train’, ‘val’, ‘test’}.
subset (str, optional) – Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.
frames_standardizer (vak.transforms.FramesStandardizer, optional) – Transform applied to frames, the input to the neural network model. Optional, default is None. If supplied, will be used with the transform applied to inputs and targets, vak.transforms.defaults.frame_classification.TrainItemTransform.
- Returns:
dataset
- Return type:
vak.datasets.frame_classification.TrainDatapipe
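A call to from_dataset_path might look like the sketch below. The wrapper make_train_datapipe, the default window size of 176, and any path passed to it are assumptions made for illustration. The import path follows the fully qualified class name at the top of this page, and the keyword arguments follow the signature shown above. The import sits inside the function so that the block can be loaded without vak installed.

```python
from pathlib import Path


def make_train_datapipe(dataset_path, window_size=176, stride=1):
    """Hypothetical wrapper around TrainDatapipe.from_dataset_path.

    ``dataset_path`` should point to a directory created by
    vak.prep.prep_frame_classification_dataset(); the default
    window_size here is an arbitrary value chosen for illustration.
    """
    # Imported lazily so that defining this function does not require vak.
    from vak.datapipes.frame_classification.train_datapipe import TrainDatapipe

    return TrainDatapipe.from_dataset_path(
        dataset_path=Path(dataset_path),
        window_size=window_size,
        stride=stride,
        split="train",
    )
```

Leaving subset and frames_standardizer at their defaults of None means that the 'train' split is used and no standardization is applied to the frames.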