NocMigR package (Deprecated! See NocMigR2)
This package is in a very preliminary state and provides workflows for processing large sound files (e.g., NocMig, NFC, AudioMoth recordings), with a main emphasis on automating the detection of events (i.e., extracting calls with time-stamps) that can easily be reviewed in Audacity. Note: Given recent changes to the data privacy policy and ownership of Audacity, I highly suggest sticking to version 3.0.2!
All major computation steps are carried out by sophisticated libraries called in the background, including:

- R packages
- python packages
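A minimal sketch for checking that this background machinery is reachable from R (the package names below are simply the ones used later in this README, plus reticulate for the python-based steps; this is an illustration, not the package's official requirement list):

## sketch only: check that the background R packages are installed
deps <- c("bioacoustics", "tuneR", "seewave", "reticulate")
sapply(deps, requireNamespace, quietly = TRUE)
## ... and that reticulate finds a python interpreter (used by split_wave below)
reticulate::py_available(initialize = FALSE)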
To install the package, use …
devtools::install_github("mottensmann/NocMigR")
Load the package once installed …
library(NocMigR)
The package contains an example file captured using an AudioMoth recorder. To reduce file size, a segment of five minutes was resampled at 44.1 kHz and saved as a 128 kbps mp3 file. In addition to a lot of noise, there is a short segment of interest (the scale call of a Eurasian Pygmy Owl Glaucidium passerinum).
## get path to test_audio.mp3
path <- system.file("extdata", "20211220_064253.mp3", package = "NocMigR")
## create temp folder
dir.create("example")
#> Warning in dir.create("example"): 'example' already exists
## copy to test_folder
file.copy(path, "example")
## convert to wav
bioacoustics::mp3_to_wav("example/20211220_064253.mp3", delete = T)
file.rename(from = "example/20211220_064253.wav", to = "example/20211220_064253.WAV")
Plot a spectrogram to see that there is a lot of noise and a few spikes reflecting actual signals …
## read audio
audio <- tuneR::readWave("example/20211220_064253.WAV")
## plot spectrum
bioacoustics::spectro(audio, FFT_size = 2048, flim = c(0, 5000))
Naming files with a string that combines the recording date and starting time (YYYYMMDD_HHMMSS) is convenient for archiving and analysing audio files (e.g., the default of AudioMoth). Some (most?) of the popular field recorders (e.g., Olympus LS, Tascam DR or Sony PCM) use different, rather uninformative naming schemes (date and number at best), but the relevant information to construct a proper date_time string is embedded in the metadata of the recording (accessible using file.info(), provided the internal clock was set correctly!). For instance, long recording sessions using an Olympus LS-3 will create multiple files, all of which share the same creation and modification times (with respect to the first recording). By contrast, the Sony PCM-D100 saves files individually (i.e., all have unique ctimes and mtimes). Presets to rename files are available for both types described here.
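For illustration, the underlying idea of deriving such a name from the file metadata can be sketched directly with file.info() (this assumes the recorder clock was set correctly; rename_recording() below handles the recorder-specific details):

## sketch only: build a YYYYMMDD_HHMMSS string from the modification time
info <- file.info("example/20211220_064253.WAV")
format(info$mtime, format = "%Y%m%d_%H%M%S")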
## simulate = T lets you see what would happen without altering files
rename_recording(path = "example",
format = "WAV",
recorder = "Olympus LS-3",
simulate = T)
#> old.name seconds
#> example/20211220_064253.WAV 20211220_064253.WAV 300.0686
#> example/20211220_064253_extracted.WAV 20211220_064253_extracted.WAV 10.6300
#> example/merged_events.WAV merged_events.WAV 10.6300
#> time new.name
#> example/20211220_064253.WAV 2023-09-15 21:43:47 20230915_214347.WAV
#> example/20211220_064253_extracted.WAV 2023-09-15 21:48:47 20230915_214847.WAV
#> example/merged_events.WAV 2023-09-15 21:48:58 20230915_214858.WAV
This function splits long audio recordings into smaller chunks for processing with bioacoustics::threshold_detection. To keep the time information, files are written with the corresponding starting time in their names. The task is performed by a python script queried via reticulate.
## split in segments
split_wave(file = "20211220_064253.WAV", # which file
path = "example", # where to find it
segment = 30, # cut in 30 sec segments
downsample = 32000) # resample at 32000
#>
#> Downsampling of 20211220_064253.WAV to 32000 Hz... done
#> Split ...
## show files
list.files("example/split/")
#> [1] "20211220_064253.WAV" "20211220_064323.WAV" "20211220_064353.WAV"
#> [4] "20211220_064423.WAV" "20211220_064453.WAV" "20211220_064523.WAV"
#> [7] "20211220_064553.WAV" "20211220_064623.WAV" "20211220_064653.WAV"
#> [10] "20211220_064723.WAV" "20211220_064753.WAV"
## delete folder
unlink("example/split", recursive = TRUE)
This function is a wrapper around bioacoustics::threshold_detection(), aiming at extracting calls based on the signal-to-noise ratio and some target-specific assumptions about approximate call frequencies and durations. Check ?bioacoustics::threshold_detection() for details. Note, only some of the parameters defined in bioacoustics::threshold_detection() are used right now. For long recordings (i.e., several hours) it makes sense to run the detection on segments as created before to avoid memory issues; a short sketch for segment-wise processing follows the demo below. Here we use the demo sound file as it is.
## run detection threshold algorithm
TD <- find_events(wav.file = "example/20211220_064253.WAV",
threshold = 8, # signal-to-noise ratio in dB
min_dur = 20, # min length in ms
max_dur = 300, # max length in ms
LPF = 5000, # low-pass filter at 5 kHz
HPF = 1000) # high-pass filter at 1 kHz
## Review events
head(TD$data$event_data[,c("filename", "starting_time", "duration", "freq_max_amp")])
#> filename starting_time duration freq_max_amp
#> 1 20211220_064253.WAV 00:00:46.576 168.34467 1477.762
#> 2 20211220_064253.WAV 00:00:47.045 190.11338 1646.544
#> 3 20211220_064253.WAV 00:00:47.887 116.82540 1790.127
#> 4 20211220_064253.WAV 00:00:48.277 150.92971 1827.046
#> 5 20211220_064253.WAV 00:00:48.774 91.42857 1964.311
#> 6 20211220_064253.WAV 00:00:49.332 21.04308 2264.046
## display spectrogram based on approximate location of first six events
audio <- tuneR::readWave("example/20211220_064253.WAV",
from = 46,
to = 50,
units = "seconds")
bioacoustics::spectro(audio, FFT_size = 2048, flim = c(0, 5000))
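For long recordings, the same detection could be mapped over the segments created by split_wave(); a minimal sketch (it assumes the segments in example/split/ were kept rather than deleted, and that the wav file is the first argument of find_events()):

## sketch only: run find_events() on each 30-s segment in example/split/
segments <- list.files("example/split", pattern = "\\.WAV$", full.names = TRUE)
TD_list <- lapply(segments, find_events,
threshold = 8, min_dur = 20, max_dur = 300,
LPF = 5000, HPF = 1000)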
In addition to the output shown above, a file with labels for reviewing events in Audacity is created (wrapping seewave::write.audacity()).
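Audacity label tracks are plain tab-separated text (start time, end time, label), so the same file can also be inspected in R; a minimal sketch (the label file name is an assumption here, adjust it to the actual output of find_events):

## sketch only: read the label file written next to the recording
labels <- read.table("example/20211220_064253.txt", sep = "\t",
col.names = c("start", "end", "label"))
head(labels)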
Refines the output of find_events by first adding a buffer (by default 1 second on both sides of the event) and subsequently merging overlapping selections to tidy up the output. Additionally, it allows filtering based on expected frequencies (i.e., it checks that the maximum-amplitude frequency lies within the frequency band defined by HPF:LPF).
## extract events based on object TD
df <- extract_events(threshold_detection = TD,
path = "example",
format = "WAV",
LPF = 4000,
HPF = 1000,
buffer = 1)
#>
#> Existing files '_extracted.WAV will be overwritten!
#> 6 selections overlapped
Display refined events …
## display spectrogram based on first six events
audio <- tuneR::readWave("example/20211220_064253.WAV",
from = df$from,
to = df$to,
units = "seconds")
bioacoustics::spectro(audio, FFT_size = 2048, flim = c(0, 5000))
Takes the output of the previous operation and concatenates audio signals as well as labels into files called merged_events.WAV and merged_events.txt, respectively. This option comes in handy if there are many input files in the working directory.
merge_events(path = "example")
#>
#> Existing files merged_events.WAV will be overwritten!
Process all files within a directory and run the steps shown above …
batch_process(
path = "example",
format = "WAV",
segment = NULL,
downsample = NULL,
SNR = 8,
target = data.frame(min_dur = 20, # min length in ms
max_dur = 300, # max length in ms
LPF = 5000, # low-pass filter at 5 kHz
HPF = 1000),
rename = FALSE)
#> Start processing: 2023-09-15 22:06:40 [Input audio 5 minutes @ 44100 Hz ]
#> Search for events ...
#> Warning in find_events(wav.file = x, overwrite = TRUE, threshold = SNR, : NAs
#> introduced by coercion
#> Warning in find_events(wav.file = x, overwrite = TRUE, threshold = SNR, : NAs
#> introduced by coercion
#> done
#> Extract events ...
#>
#> Existing files '_extracted.WAV will be overwritten!
#> 8 selections overlapped
#> In total 1 events detected
#> Merge events and write audio example/merged_events.WAV
#>
#> Existing files merged_events.WAV will be overwritten!
#> Finished processing: 2023-09-15 22:06:42
#> Run time: 1.77 seconds
#> filename from to starting_time event
#> 1 20211220_064253.WAV 45.576 47.62258 2021-12-20 06:43:39 46.576
#> 2 20211220_064253.WAV 46.045 48.09204 2021-12-20 06:43:40 47.045
#> 3 20211220_064253.WAV 46.887 49.32528 2021-12-20 06:43:40 47.887
#> 4 20211220_064253.WAV 47.774 49.82277 2021-12-20 06:43:41 48.774
#> 5 20211220_064253.WAV 48.332 50.38133 2021-12-20 06:43:42 49.332
#> 6 20211220_064253.WAV 152.434 156.35420 2021-12-20 06:45:26 153.434
| Recording | Sample.rate | Downsampled | Channels | Run.time |
|:---------:|:-----------:|:-----------:|:--------:|:--------:|
| 60 h | 96000 Hz | 44100 Hz | Mono | 2.02 h |
| 60 h | 96000 Hz | 44100 Hz | Mono | 1.76 h |
| 11.91 h | 96000 Hz | 44100 Hz | Stereo | 1.39 h |
| 10.6 h | 96000 Hz | 44100 Hz | Mono | 1.3 h |
| 2.73 h | 96000 Hz | 44100 Hz | Mono | 4.88 min |

Table: Run times for all steps, notebook ~ Intel i5-4210M, 2 cores ~ 8 GB RAM
| Recording | Sample.rate | Downsampled | Channels | Run.time |
|:---------:|:-----------:|:-----------:|:--------:|:--------:|
| 7.5 h | 96000 Hz | 44100 Hz | Mono | 14.52 min |

Table: Run times for event detection only, notebook ~ Intel i5-4210M, 2 cores ~ 8 GB RAM
Update:
With adequate computational power there is no need to split even large wave files into segments of one hour. This way, the event detection process is much faster (steps 3-6), usually taking less than four minutes for an entire NocMig night!
#>
#>
#> | Recording | Sample.rate | Downsampled | Channels | Run.time |
#> |:---------:|:-----------:|:-----------:|:--------:|:---------:|
#> | 114.99 h  |  48000 Hz   |  44100 Hz   |   Mono   | 26.79 min |
#>
#> Table: 115h AudioMoth recording, notebook ~ AMD RYZEN 7, 16 cores ~ 24 GB RAM
Retrieve weather data via Bright Sky (de Maeyer 2020) and compose a string describing a NocMig session from dusk to dawn for a given location. Note, the comment follows suggestions by HGON (Schütze et al. 2022).
## example for Bielefeld
## -----------------------------------------------------------------------------
NocMig_meta(date = Sys.Date() - 2,
lat = 52.032,
lon = 8.517)
#> Teilliste 1: 13.9-14.9.2023, 20:23-06:25, trocken, 12°C, ESE, 2 km/h
#> Teilliste 2: 13.9-14.9.2023, 20:23-06:25, trocken, 9°C, ESE, 3 km/h
Recently I started to play with BirdNET. First trials suggest that only a few calls of interest are missed, and that the majority is correctly labelled using the BirdNET_GLOBAL_6K_V2.4 model. Currently, it is rather difficult to run BirdNET through RStudio on a Windows computer, and hence a few lines of python code are pasted into a Linux (Ubuntu) command line.
- Set up a list of target species
## Creates a species list by subsetting from the full model
## -----------------------------------------------------------------
BirdNET_species.list(names = c("Glaucidium passerinum", "Bubo bubo"),
scientific = T,
out = "example/species.txt")
#> # A tibble: 2 × 2
#> scientific_name englisch_name
#> <chr> <chr>
#> 1 Bubo bubo Eurasian Eagle-Owl
#> 2 Glaucidium passerinum Eurasian Pygmy-Owl
Run analyze.py using a command line program (e.g., Ubuntu on Windows). See details in the documentation of BirdNET-Analyzer.
## run BirdNET-Analyzer in a bash shell
## --------------------
python3 analyze.py --i /example --o /example --slist /example/species.txt --rtype 'audacity' --threads 1 --locale 'de'
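Alternatively, the same call could be issued from an R session on a Linux machine via system2(); a minimal sketch (it assumes BirdNET-Analyzer is installed and that analyze.py lives in the working directory; paths are placeholders mirroring the command above):

## sketch only: wrap the BirdNET-Analyzer call from R
system2("python3",
args = c("analyze.py",
"--i", "example", "--o", "example",
"--slist", "example/species.txt",
"--rtype", "audacity",
"--threads", "1",
"--locale", "de"))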
The function BirdNET (see ?BirdNET for details) does the following:

- Reshape Audacity labels created by analyze.py to include the event time.
- Write records to an xlsx file (BirdNET.xlsx) as a template to simplify inspection and verification.
df <- BirdNET(path = "example/")
#> Created example//BirdNET.xlsx
df[["Records"]]
#> Taxon T1 T2 Score Verification
#> 1 Sperlingskauz 2021-12-20 06:43:38 2021-12-20 06:43:41 0.516 NA
#> 2 Sperlingskauz 2021-12-20 06:45:29 2021-12-20 06:45:32 0.378 NA
#> 3 Sperlingskauz 2021-12-20 06:45:35 2021-12-20 06:45:38 0.126 NA
#> Correction Quality Comment T0
#> 1 NA NA NA 2021-12-20 06:42:53
#> 2 NA NA NA 2021-12-20 06:42:53
#> 3 NA NA NA 2021-12-20 06:42:53
#> File
#> 1 example/20211220_064253.BirdNET.results.txt
#> 2 example/20211220_064253.BirdNET.results.txt
#> 3 example/20211220_064253.BirdNET.results.txt
## records per species and day
df[["Records.dd"]]
#> # A tibble: 1 × 3
#> # Groups: species [1]
#> species date n
#> <chr> <date> <int>
#> 1 Sperlingskauz 2021-12-20 3
## records per species and hour
df[["Records.hh"]]
#> # A tibble: 1 × 3
#> # Groups: species [1]
#> species hour n
#> <fct> <int> <int>
#> 1 Sperlingskauz 6 3
Extract detections and export them as wav files. For easier access when verifying records, files are named 'Species_Date_Time.WAV' (see below).
## extract events
BirdNET_extract(path = "example/",
hyperlink = F) ## If T: create hyperlink as excel formula
## show files
list.files("example/extracted/Sperlingskauz/")
#> [1] "Sperlingskauz_20211220_064338.WAV" "Sperlingskauz_20211220_064529.WAV"
#> [3] "Sperlingskauz_20211220_064535.WAV"
## clean-up
unlink("example", recursive = TRUE)