02-data-prep.Rmd
library(mestrado)
All the audios donwloaded from Xeno-Canto and Wikiaves comes as mp3 files. Also, they can be stereo or mono, in potentially any sample rate and unnormalized. The first step of the data preparation standardize these properties.
Steps to apply to every mp3 file:
mp3_dir <- system.file("mp3_sample", package = "mestrado")
mp3_files <- list.files(mp3_dir, full.names = TRUE)
temp_dir <- tempdir()
wav_files <- tidy_audio_dir(mp3_files, temp_dir)
By default the files are stored in disk in a temporary folder, but it can be set by the user. Be aware because the default behaviour is to overwrite files.
mp3_files[1]
#> [1] "/tmp/Rtmp5PtfF0/temp_libpath2fc46d094fd/mestrado/mp3_sample/Glaucidium-minutissimum-24426.mp3"
tuneR::readMP3(mp3_files[1])
#>
#> Wave Object
#> Number of Samples: 1180800
#> Duration (seconds): 26.78
#> Samplingrate (Hertz): 44100
#> Channels (Mono/Stereo): Stereo
#> PCM (integer format): TRUE
#> Bit (8/16/24/32/64): 16
wav_files[1]
#> [1] "/tmp/RtmpWnrUAt/Glaucidium-minutissimum-24426.wav"
tuneR::readWave(wav_files[1])
#>
#> Wave Object
#> Number of Samples: 428409
#> Duration (seconds): 26.78
#> Samplingrate (Hertz): 16000
#> Channels (Mono/Stereo): Mono
#> PCM (integer format): TRUE
#> Bit (8/16/24/32/64): 16
tuneR::readMP3()
Sometimes tuneR::readMP3()
crashes. Pick defective mp3 files manually was needed.
# Do not run
library(mestrado)
# list of defective mp3 that make `tuneR::readMP3()` crashes
black_list <- c(
"data-raw/mp3_originals//Glaucidium-minutissimum-2693093.mp3",
"data-raw/mp3_originals//Pulsatrix-koeniswaldiana-3343722.mp3",
"data-raw/mp3_originals//Pulsatrix-koeniswaldiana-3108787.mp3",
"data-raw/mp3_originals//Megascops-choliba-3673347.mp3",
"data-raw/mp3_originals//Megascops-choliba-3769718.mp3"
)
mp3_files <- list.files("data-raw/mp3_originals/", pattern = "mp3$", full.names = TRUE)
wavs_ok <- stringr::str_replace(list.files("data-raw/wavs", pattern = "wav$", full.names = TRUE), "wav$", "mp3") %>%
stringr::str_replace("wavs", "mp3_originals/")
# filter out those that were processed already and the black listed ones.
mp3_files <- mp3_files[!mp3_files %in% c(wavs_ok, black_list)]
wav_files <- tidy_audio_dir(mp3_files, dest_dir = "data-raw/wavs")