03-data-slicing.Rmd
The observations of the training set will be fixed-length slices of the raw wave files. For instance, if an original wave file lasts 55 seconds, slicing it with an interval of 1 second and no overlap results in 55 disjoint 1-second slices.
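The arithmetic behind the 55-slice example can be sketched as follows. slice_bounds() is a hypothetical helper written only for illustration and is not part of the mestrado package. It assumes that overlap is measured in seconds and that a trailing piece shorter than the interval is discarded. slice_wavs() may handle these cases differently.

```r
# Hypothetical helper (not from mestrado): start/end times, in seconds, of
# fixed-length slices over a recording lasting `duration` seconds.
# `overlap` is how many seconds consecutive slices share; a trailing piece
# shorter than `interval` is dropped (an assumption).
slice_bounds <- function(duration, interval, overlap = 0) {
  stopifnot(interval > 0, overlap >= 0, overlap < interval)
  step <- interval - overlap
  # seq() would fail with a negative upper bound, so handle short files first
  if (duration < interval) {
    return(data.frame(start = numeric(0), end = numeric(0)))
  }
  starts <- seq(0, duration - interval, by = step)
  data.frame(start = starts, end = starts + interval)
}

bounds <- slice_bounds(duration = 55, interval = 1)
nrow(bounds)
#> [1] 55
```

With 0.5 seconds of overlap the step shrinks to 0.5 seconds, so the same file would yield 109 slices instead of 55.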
library(mestrado)
wav_dir <- system.file("wav_sample", package = "mestrado")
temp_dir <- tempdir()
slices_path <- slice_wavs(wav_dir, temp_dir)
slices_path
#> [1] "/tmp/RtmpkLsiuB"
slices <- list.files(slices_path)
slices[4:7]
#> [1] "Megascops-atricapilla-1261496@0@0@.wav"
#> [2] "Megascops-atricapilla-1393458@0@0@.wav"
#> [3] "Megascops-choliba-118111@0@0@.wav"
#> [4] "Megascops-choliba-1891062@0@0@.wav"
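tempdir() is the per-session temporary directory, which R and other packages also write to, so listing it returns more than just the slices; the metadata table below shows two such unrelated files. A minimal sketch of restricting the listing uses the pattern argument of base list.files(), assuming every slice name ends in "@.wav" as the names above do. demo_dir and the two file names are stand-ins created only for this demonstration.

```r
# Stand-in directory holding one slice-like name and one unrelated file,
# mimicking what tempdir() can contain.
demo_dir <- file.path(tempdir(), "slice_listing_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c(
  "Megascops-choliba-118111@0@0@.wav",
  "file34a27391be01.txt"
))))

# Keep only names ending in "@.wav", the suffix every slice above carries.
slice_files <- list.files(demo_dir, pattern = "@\\.wav$")
slice_files
#> [1] "Megascops-choliba-118111@0@0@.wav"
```

Applied to the real output, the same filter would be list.files(slices_path, pattern = "@\\.wav$").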
The resulting file names were designed to be “parser friendly”: they split cleanly with tidyr::separate(sep = "@"). This metadata is useful when matching slices against annotations of the presence or absence of a bird song, or of any other event of interest.
library(tidyverse)
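slices_metadata <- tibble(
  file_name = slices
) %>%
  # Split on "@". Names without "@" (unrelated temporary files) get NA, and
  # the trailing ".wav" piece is discarded; separate() warns about both.
  tidyr::separate(file_name, c("species", "start", "end"), sep = "@")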
slices_metadata %>% head()
#> # A tibble: 6 x 3
#>   species                       start end
#>   <chr>                         <chr> <chr>
#> 1 file34a2723b4d29.so           <NA>  <NA>
#> 2 file34a27391be01.txt          <NA>  <NA>
#> 3 Glaucidium-minutissimum-24426 0     0
#> 4 Megascops-atricapilla-1261496 0     0
#> 5 Megascops-atricapilla-1393458 0     0
#> 6 Megascops-choliba-118111      0     0
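The parsed start and end columns are still character, and separate() warns about the discarded ".wav" piece. Below is a sketch of one way to tidy them and then match slices against annotations, as described earlier. extra = "drop" discards the trailing piece silently, start and end become numeric, and a slice is flagged when an annotated event overlaps it. The slice names with non-zero bounds, the annotations tibble with its event_start/event_end columns, and the assumption that start and end are in seconds are all hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical slice names for one recording, following the
# <species-id>@<start>@<end>@.wav pattern of the listing above.
file_names <- c(
  "Megascops-choliba-118111@0@1@.wav",
  "Megascops-choliba-118111@1@2@.wav",
  "Megascops-choliba-118111@2@3@.wav"
)

slices_clean <- tibble(file_name = file_names) %>%
  # extra = "drop" discards the trailing ".wav" piece without a warning;
  # remove = FALSE keeps the original file name alongside the parsed columns
  separate(file_name, c("species", "start", "end"),
           sep = "@", extra = "drop", remove = FALSE) %>%
  # assumption: start and end are times in seconds
  mutate(start = as.numeric(start), end = as.numeric(end))

# Hypothetical annotation: one call between 1.4 s and 1.8 s of the recording
annotations <- tibble(
  species = "Megascops-choliba-118111",
  event_start = 1.4,
  event_end = 1.8
)

# A slice is "present" when any annotated event overlaps its time span;
# slices whose recording has no annotation get NA events and end up FALSE.
labelled <- slices_clean %>%
  left_join(annotations, by = "species") %>%
  mutate(overlaps = !is.na(event_start) &
           event_start < end & event_end > start) %>%
  group_by(file_name, species, start, end) %>%
  summarise(present = any(overlaps), .groups = "drop")

labelled$present
#> [1] FALSE  TRUE FALSE
```

Summarising per slice keeps one row per file even when a recording has several annotated events.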