vak.datasets.frame_classification.window_dataset.WindowDataset

class vak.datasets.frame_classification.window_dataset.WindowDataset(dataset_path: str | Path, dataset_df: DataFrame, input_type: str, split: str, sample_ids: ndarray[Any, dtype[_ScalarType_co]], inds_in_sample: ndarray[Any, dtype[_ScalarType_co]], window_size: int, frame_dur: float, stride: int = 1, subset: str | None = None, window_inds: ndarray[Any, dtype[_ScalarType_co]] | None = None, transform: Callable | None = None, target_transform: Callable | None = None)

Bases: object

Dataset used for training neural network models on the frame classification task, where the source data consists of audio signals or spectrograms of varying lengths.

Unlike vak.datasets.frame_classification.FramesDataset, this class does not return entire samples from the source dataset. Instead, each paired sample \((x_i, y_i)\) returned by this dataset class consists of a window \(x_i\) of fixed length \(w\) from the underlying data \(X\) of total length \(T\). Each \(y_i\) is a vector of the same size \(w\), containing an integer class label for each frame in the window \(x_i\). The entire dataset consists of some number of windows \(I\) determined by a stride parameter \(s\): when every full-length window is kept, \(I = \lfloor (T - w) / s \rfloor + 1\).
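A minimal NumPy sketch of the windowing described above, assuming \(X\) and \(Y\) are already concatenated, that frames are the last axis of \(X\), and that only full-length windows are kept. Counting every start index from 0 through \(T - w\) in steps of \(s\) gives \(\lfloor (T - w)/s \rfloor + 1\) windows. `window_starts` and `get_window` are hypothetical helpers for illustration, not vak functions; vak's own `get_window_inds()` may differ in detail.

```python
import numpy as np


def window_starts(n_frames: int, window_size: int, stride: int = 1) -> np.ndarray:
    """Hypothetical helper: start index of every full-length window."""
    # The last valid start is n_frames - window_size, so stop one past it.
    return np.arange(0, n_frames - window_size + 1, stride)


def get_window(X: np.ndarray, Y: np.ndarray, start: int, window_size: int):
    """Hypothetical helper: slice one (x_i, y_i) pair; frames are X's last axis."""
    x_i = X[..., start:start + window_size]
    y_i = Y[start:start + window_size]
    return x_i, y_i


# Toy data: 1 channel, 4 frequency bins, T = 10 time bins.
T, w, s = 10, 4, 2
X = np.random.rand(1, 4, T)
Y = np.arange(T) % 3  # one integer label per frame

starts = window_starts(T, w, s)
print(starts)  # [0 2 4 6]
assert len(starts) == (T - w) // s + 1

x_0, y_0 = get_window(X, Y, starts[0], w)
print(x_0.shape, y_0.shape)  # (1, 4, 4) (4,)
```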

The underlying data consists of a single array X, the input to the network, and a single array Y, the targets for the network output. X and Y are created by concatenating samples from the source data, e.g., audio files or spectrogram arrays. (The same is true for vak.datasets.frame_classification.FramesDataset.) The dimensions of \(X\) are (channels, …, frames), i.e., audio has dimensions (channels, samples) and spectrograms have dimensions (channels, frequency bins, time bins). Because \(X\) may be either audio or a spectrogram, a frame is either a single sample of an audio signal or a single time bin of a spectrogram. The size of the last dimension of X is always the total number of frames in the dataset, and Y is a vector of that same length, containing an integer class label for each frame.
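A toy NumPy sketch of how per-sample arrays combine into X and Y. The shapes and labels are made up; the sketch only shows the shape bookkeeping of concatenating along the frame axis, not how vak prepares or stores the arrays on disk.

```python
import numpy as np

# Made-up spectrogram samples, shape (channels, freq_bins, time_bins),
# each with a different number of time bins (frames).
n_frames = (5, 3, 7)
spects = [np.zeros((1, 8, n)) for n in n_frames]
# One integer class label per frame of each sample.
labels = [np.full(n, i, dtype=np.int64) for i, n in enumerate(n_frames)]

# Concatenate along the last (frame) axis to get the single arrays X and Y.
X = np.concatenate(spects, axis=-1)
Y = np.concatenate(labels)
print(X.shape, Y.shape)  # (1, 8, 15) (15,)

# Audio works the same way; there, frames are samples: (channels, samples).
audios = [np.zeros((1, n)) for n in (100, 50)]
X_audio = np.concatenate(audios, axis=-1)
print(X_audio.shape)  # (1, 150)
```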

dataset_path

Path to directory that represents a frame classification dataset, as created by vak.prep.prep_frame_classification_dataset().

Type:

pathlib.Path

split

The name of a split from the dataset, one of {‘train’, ‘val’, ‘test’}.

Type:

str

subset

Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.

Type:

str, optional

dataset_df

A frame classification dataset, represented as a pandas.DataFrame. It contains only the rows of the dataset_df passed in when instantiating the class that correspond to either subset or split.

Type:

pandas.DataFrame
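A sketch of the row selection described for dataset_df, where subset takes precedence over split. The toy DataFrame and its column names ('frames_path', 'split', 'subset') are assumptions for illustration, and `select_rows` is a hypothetical helper, not part of vak.

```python
from typing import Optional

import pandas as pd

# Toy stand-in for a frame classification dataset_df; the column names
# are assumptions for this illustration.
dataset_df = pd.DataFrame(
    {
        "frames_path": ["a.npy", "b.npy", "c.npy", "d.npy"],
        "split": ["train", "train", "val", "test"],
        "subset": ["train-dur-4", None, None, None],
    }
)


def select_rows(df: pd.DataFrame, split: str, subset: Optional[str] = None) -> pd.DataFrame:
    """Hypothetical helper: rows for subset if given (it takes precedence), else for split."""
    if subset is not None:
        return df[df["subset"] == subset].reset_index(drop=True)
    return df[df["split"] == split].reset_index(drop=True)


print(len(select_rows(dataset_df, "train")))                        # 2
print(len(select_rows(dataset_df, "train", subset="train-dur-4")))  # 1
```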

input_type

The type of input to the neural network model. One of {‘audio’, ‘spect’}.

Type:

str

frame_paths

Paths to .npy files containing the frames, either spectrograms or audio signals, that are the input to the model.

Type:

numpy.ndarray

frame_labels_paths

Paths to .npy files containing vectors with a label for each frame; these are the targets for the output of the model.

Type:

numpy.ndarray

sample_ids

Indexing vector representing which sample from the dataset every frame belongs to.

Type:

numpy.ndarray

inds_in_sample

Indexing vector giving, for every frame, its index within the sample from the dataset that it belongs to.

Type:

numpy.ndarray
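A sketch of what sample_ids and inds_in_sample contain for concatenated data, assuming they are built from the number of frames in each sample, in order. This illustrates the meaning of the two vectors with made-up sizes and is not necessarily how vak computes them.

```python
import numpy as np

# Hypothetical number of frames in each of three samples.
n_frames_per_sample = np.array([5, 3, 7])

# sample_ids[t]: which sample frame t of the concatenated data came from.
sample_ids = np.repeat(np.arange(len(n_frames_per_sample)), n_frames_per_sample)

# inds_in_sample[t]: position of frame t within its own sample.
inds_in_sample = np.concatenate([np.arange(n) for n in n_frames_per_sample])

print(sample_ids)      # [0 0 0 0 0 1 1 1 2 2 2 2 2 2 2]
print(inds_in_sample)  # [0 1 2 3 4 0 1 2 0 1 2 3 4 5 6]

# Frame 6 of the concatenated data is frame 1 of sample 1.
print(sample_ids[6], inds_in_sample[6])  # 1 1

# A window of 4 frames starting at frame 3 spans samples 0 and 1.
print(np.unique(sample_ids[3:7]))  # [0 1]
```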

window_size

Size of windows to return; number of frames.

Type:

int

frame_dur

Duration of a frame, i.e., a single sample in audio or a single time bin in a spectrogram.

Type:

float
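frame_dur relates frame counts to time. A small sketch, assuming durations in seconds and a spectrogram whose time bins are spaced by a fixed hop length; the sampling rate, hop length, and window size below are made-up values, not vak defaults.

```python
# Hypothetical recording and spectrogram parameters.
samplerate = 32000  # Hz
hop_length = 64     # audio samples between successive spectrogram time bins

frame_dur_audio = 1 / samplerate           # one frame = one audio sample
frame_dur_spect = hop_length / samplerate  # one frame = one time bin

# Duration in seconds covered by one window of window_size frames.
window_size = 176
window_dur = window_size * frame_dur_spect
print(frame_dur_spect)  # 0.002
```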

stride

The size of the stride used to determine which windows are included in the dataset. The default is 1. Used to compute window_inds with the function vak.datasets.frame_classification.window_dataset.get_window_inds().

Type:

int

window_inds

A vector of valid window indices for the dataset. If specified, this takes precedence over stride.

Type:

numpy.ndarray, optional
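A sketch of the precedence rule between window_inds and stride. `resolve_window_inds` is a hypothetical helper, and deriving indices with a NumPy range over full-length window starts is an assumption about how stride is used, not a copy of vak's `get_window_inds()`.

```python
from typing import Optional, Sequence, Union

import numpy as np


def resolve_window_inds(
    n_frames: int,
    window_size: int,
    stride: int = 1,
    window_inds: Optional[Union[np.ndarray, Sequence[int]]] = None,
) -> np.ndarray:
    """Hypothetical helper: use window_inds if given, else derive them from stride."""
    if window_inds is not None:
        # Explicit indices take precedence; stride is ignored.
        return np.asarray(window_inds)
    # Otherwise, one start index every `stride` frames, full-length windows only.
    return np.arange(0, n_frames - window_size + 1, stride)


from_stride = resolve_window_inds(20, 5, stride=5)
explicit = resolve_window_inds(20, 5, stride=5, window_inds=[0, 1, 2])
print(len(from_stride), len(explicit))  # 4 3
```

The number of items in the dataset is then the length of the resolved index vector.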

transform

The transform applied to the frames, the input to the neural network \(x\).

Type:

callable

target_transform

The transform applied to the target for the output of the neural network \(y\).

Type:

callable
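Both transforms are callables applied to a window and its labels. A sketch with two hypothetical transforms, assuming NumPy arrays as input (in practice they may operate on other array types, such as tensors); neither `standardize` nor `to_int64` is a vak transform.

```python
import numpy as np


def standardize(x: np.ndarray) -> np.ndarray:
    """Hypothetical input transform: zero mean, unit variance over the window."""
    return (x - x.mean()) / (x.std() + 1e-8)


def to_int64(y: np.ndarray) -> np.ndarray:
    """Hypothetical target transform: cast frame labels to int64."""
    return y.astype(np.int64)


# A made-up window x_i of shape (channels, freq_bins, frames) and its labels y_i.
x_i = np.random.rand(1, 8, 4)
y_i = np.array([0, 0, 1, 2], dtype=np.int32)

x_t, y_t = standardize(x_i), to_int64(y_i)
print(x_t.shape, y_t.dtype)  # (1, 8, 4) int64
```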

__init__(dataset_path: str | Path, dataset_df: DataFrame, input_type: str, split: str, sample_ids: ndarray[Any, dtype[_ScalarType_co]], inds_in_sample: ndarray[Any, dtype[_ScalarType_co]], window_size: int, frame_dur: float, stride: int = 1, subset: str | None = None, window_inds: ndarray[Any, dtype[_ScalarType_co]] | None = None, transform: Callable | None = None, target_transform: Callable | None = None)

Initialize a new instance of a WindowDataset.

Parameters:
  • dataset_path (pathlib.Path) – Path to directory that represents a frame classification dataset, as created by vak.prep.prep_frame_classification_dataset().

  • dataset_df (pandas.DataFrame) – A frame classification dataset, represented as a pandas.DataFrame.

  • input_type (str) – The type of input to the neural network model. One of {‘audio’, ‘spect’}.

  • split (str) – The name of a split from the dataset, one of {‘train’, ‘val’, ‘test’}.

  • sample_ids (numpy.ndarray) – Indexing vector representing which sample from the dataset every frame belongs to.

  • inds_in_sample (numpy.ndarray) – Indexing vector giving, for every frame, its index within the sample from the dataset that it belongs to.

  • window_size (int) – Size of windows to return; number of frames.

  • frame_dur (float) – Duration of a frame, i.e., a single sample in audio or a single time bin in a spectrogram.

  • stride (int) – The size of the stride used to determine which windows are included in the dataset. The default is 1. Used to compute window_inds, with the function vak.datasets.frame_classification.window_dataset.get_window_inds().

  • subset (str, optional) – Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.

  • window_inds (numpy.ndarray, optional) – A vector of valid window indices for the dataset. If specified, this takes precedence over stride.

  • transform (callable) – The transform applied to the input to the neural network \(x\).

  • target_transform (callable) – The transform applied to the target for the output of the neural network \(y\).

Methods

__init__(dataset_path, dataset_df, ...[, ...])

Initialize a new instance of a WindowDataset.

from_dataset_path(dataset_path, window_size)

Make a WindowDataset instance, given the path to a frame classification dataset.

Attributes

duration

shape

classmethod from_dataset_path(dataset_path: str | Path, window_size: int, stride: int = 1, split: str = 'train', subset: str | None = None, transform: Callable | None = None, target_transform: Callable | None = None)

Make a WindowDataset instance, given the path to a frame classification dataset.

Parameters:
  • dataset_path (pathlib.Path) – Path to directory that represents a frame classification dataset, as created by vak.prep.prep_frame_classification_dataset().

  • window_size (int) – Size of windows to return; number of frames.

  • stride (int) – The size of the stride used to determine which windows are included in the dataset. The default is 1. Used to compute window_inds, with the function vak.datasets.frame_classification.window_dataset.get_window_inds().

  • split (str) – The name of a split from the dataset, one of {‘train’, ‘val’, ‘test’}.

  • subset (str, optional) – Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.

  • transform (callable) – The transform applied to the input to the neural network \(x\).

  • target_transform (callable) – The transform applied to the target for the output of the neural network \(y\).

Returns:

dataset

Return type:

vak.datasets.frame_classification.WindowDataset