vak.prep.unit_dataset.unit_dataset.prep_unit_dataset

vak.prep.unit_dataset.unit_dataset.prep_unit_dataset(audio_format: str, output_dir: str, spect_params: dict, data_dir: list | None = None, annot_format: str | None = None, annot_file: str | Path | None = None, labelset: set | None = None, context_s: float = 0.005) → DataFrame

Prepare a dataset of units from sequences, e.g., all syllables segmented out of a dataset of birdsong.

Parameters:

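audio_format (str) –
output_dir (str) –
spect_params (dict) –
data_dir (list | None, default None) –
annot_format (str | None, default None) –
annot_file (str | Path | None, default None) –
labelset (set | None, default None) –
context_s (float, default 0.005) –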

Returns:

unit_df (pandas.DataFrame) – A DataFrame representing all the units in the dataset.
shape (tuple) – A tuple representing the shape of all spectrograms in the dataset. The spectrograms of all units are padded so that they are all as wide as the widest unit (i.e., the one with the longest duration).
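
The padding described for shape can be illustrated with a minimal NumPy sketch. This is not vak's implementation: the helper name pad_to_widest, the choice to pad on the right, and the zero fill value are all assumptions made for illustration, and the function may pad differently (for example, symmetrically or with another value).

```python
import numpy as np


def pad_to_widest(spects, pad_value=0.0):
    """Hypothetical helper: right-pad 2-D spectrograms to a common width.

    Each array is assumed to have shape (frequency bins, time bins),
    with the same number of frequency bins in every array.
    """
    # The widest unit (longest duration) sets the target width.
    max_width = max(s.shape[1] for s in spects)
    padded = [
        np.pad(
            s,
            ((0, 0), (0, max_width - s.shape[1])),  # pad the time axis only
            mode="constant",
            constant_values=pad_value,
        )
        for s in spects
    ]
    # After padding, every spectrogram shares one shape.
    return padded, padded[0].shape


# Three toy "units" with 4 frequency bins and differing durations.
spects = [np.ones((4, 3)), np.ones((4, 5)), np.ones((4, 2))]
padded, shape = pad_to_widest(spects)
```

Here shape is the single (frequency bins, time bins) tuple shared by every padded array, which is the role the shape return value plays in the description above.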