SurfPerch
TensorFlow 2
Model Details
SurfPerch is a domain-adapted model for classifying sounds in coral reefs. It provides embeddings that allow the efficient development of classifiers for fish, boat, and other kinds of sounds, as described in Williams et al., 2024. It is adapted from the Google Perch bird vocalization classifier.
Model Format
SavedModel file for TF 2.0.
Training Data
The model was trained on coral reef recordings, supplemented by Xeno-Canto bird recordings (xeno-canto.org) and Freesound. The coral reef recordings are fully described in the SurfPerch paper; we replicate the acknowledgements here:
- The United States Virgin Islands datasets were collected in a collaboration between Sound Ocean Science and local partner Corina Marks at Thriving Islands under a USVI Scientific Research Permit: DFW22021X.
- The Mozambique dataset was collected in a collaboration between Sound Ocean Science and local partners Dr Mario LeBrato and Karen Bowles at the Bazaruto Center for Scientific Studies under a Department of Conservation Permit: 04/GDG/ANAC/MTA/2020
- The Tanzanian dataset was collected in a collaboration between Sound Ocean Science and local partners Dr Mario LeBrato and Karen Bowles at Chumbe Island Coral Park, Zanzibar, Tanzania under CHICOP Zanzibar research permits.
- The Thailand dataset was collected under a citizen science program operated by Black Turtle Dive, Thailand.
- The Florida boats dataset was gathered under a Special Activity License (license number: SAL-21-1798-SRP) granted by the Florida Fish and Wildlife Conservation Commission. Funding was provided by a Donald R Nelson Behaviour Research Award to CW by the American Elasmobranch Society.
- The Kenyan dataset was collected in a collaboration between Mars Global and local partners Angus Roberts and Viola Roberts at the Ocean Trust, with permissions from the Lamu County Department of Fisheries.
- Soundscape data from Indonesia were collected as part of the monitoring program for the Mars Coral Reef Restoration Project, in collaboration with Universitas Hasanuddin. We thank Lily Damayanti, Pippa Mansell, David Smith and the Mars Sustainable Solutions team for support with fieldwork logistics. We also thank the Department of Marine Affairs and Fisheries of the Province of South Sulawesi, the Government Offices of the Kabupaten of Pangkep, Pulau Bontosua and Pulau Badi, and the communities of Pulau Bontosua and Pulau Badi for their support. B.W.’s fieldwork in Indonesia was conducted under an Indonesian national research permit issued by BRIN (number 109A/SIP/IV/FR/3/2023), with T.B.R. as the permit’s Indonesian researcher/counterpart, and associated ethical approval given by BRIN. We thank Prof J. Jompa and Prof R.A. Rappe at Universitas Hasanuddin for logistical assistance with permit and visa applications.
- The remaining datasets were gathered in a collaboration between Conservation Metrics Inc. and the Cornell Lab of Ornithology with funding support from Oceankind and the Cornell Atkinson Center for Sustainability, Cornell Lab of Ornithology.
Model Inputs
The model accepts 5-second audio windows, sampled at 32 kHz (160,000 samples per window).
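Longer recordings need to be cut into windows of that size before they are passed to the model. The following is a minimal sketch of one way to do that with NumPy; the helper name `frame_audio`, the zero-padding of the final window, and the default hop equal to the window length are illustrative assumptions, not part of the SurfPerch release.

```python
import numpy as np

SAMPLE_RATE_HZ = 32000  # from the model card
WINDOW_S = 5.0          # from the model card


def frame_audio(waveform, hop_s=WINDOW_S):
    """Split a mono waveform into 5-second windows, zero-padding the last one.

    Hypothetical helper: returns an array of shape [num_windows, 160000].
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim != 1:
        raise ValueError("expected a mono (1-D) waveform")
    window = int(WINDOW_S * SAMPLE_RATE_HZ)
    hop = int(hop_s * SAMPLE_RATE_HZ)
    # Number of windows needed so that every sample is covered at least once.
    extra = max(len(waveform) - window, 0)
    num_windows = int(np.ceil(extra / hop)) + 1
    padded = np.zeros((num_windows - 1) * hop + window, dtype=np.float32)
    padded[: len(waveform)] = waveform
    return np.stack(
        [padded[i * hop : i * hop + window] for i in range(num_windows)]
    )


# 12 seconds of silence -> three windows (the last one is zero-padded).
batch = frame_audio(np.zeros(12 * SAMPLE_RATE_HZ, dtype=np.float32))
print(batch.shape)  # (3, 160000)
```

Passing a smaller `hop_s` (for example 2.5) produces overlapping windows, which may be useful when events straddle window boundaries.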
Model Outputs
The model outputs a dictionary containing:
- embedding: An embedding vector with shape [B, 1280].
- frontend: The PCEN Mel-spectrogram computed from the audio.
- reef_label: Model logits [B, 38] for coral reef classes.
- fsd50k_label: Model logits [B, 200] for Freesound classes.
- label: Model logits [B, 10932] for bird species.
- genus: Model logits [B, 2333] for bird genus.
- family: Model logits [B, 249] for bird family.
- order: Model logits [B, 41] for bird order.
The meaning of each logit class is described in the CSV files alongside the model weights.
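As a rough illustration of working with these logit outputs, the sketch below converts a batch of logits into per-class scores and picks the highest-scoring class indices. It uses random numbers in place of real model output, and treating each class independently with a sigmoid is an assumption here, since this page does not say which activation the logits are meant for. The helper names `sigmoid` and `top_k` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Numerically stable logistic function."""
    x = np.asarray(x, dtype=np.float64)
    e = np.exp(-np.abs(x))
    return np.where(x >= 0, 1.0 / (1.0 + e), e / (1.0 + e))


def top_k(logits, k=3):
    """Return (indices, scores) of the k highest-scoring classes per row."""
    scores = sigmoid(logits)  # assumption: independent per-class scores
    idx = np.argsort(-scores, axis=-1)[:, :k]
    return idx, np.take_along_axis(scores, idx, axis=-1)


# Stand-in for a real model output: random logits with the reef_label shape.
rng = np.random.default_rng(0)
fake_outputs = {"reef_label": rng.normal(size=(2, 38)).astype(np.float32)}

indices, scores = top_k(fake_outputs["reef_label"], k=3)
print(indices.shape, scores.shape)  # (2, 3) (2, 3)
```

The returned indices are positions along the class axis; mapping them to names requires the class list from the CSV files shipped with the weights.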
Model Usage
The model can be used as a standard TF 2.0 SavedModel. We also provide a wrapper class in the Perch GitHub repository: chirp.inference.models.TaxonomyModelTF.
Fine-Tuning
We are not supporting fine-tuning, but feel free to jury rig something!
License
Copyright 2024 Google, LLC
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Example Use
Usage
While the models may be used as standard TF SavedModels, we suggest using our inference wrappers, provided in the Perch GitHub repository. These automatically load the label class lists and make it easier to restrict predictions to species subsets.
Example using TFHub Lib
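import numpy as np
import tensorflow_hub as hub
# SAVED_MODEL_PATH is a placeholder: set it to the downloaded SavedModel directory or the model's hub handle.
model = hub.load(SAVED_MODEL_PATH)
# Input: 5 seconds of silence as mono 32 kHz waveform samples.
waveform = np.zeros(5 * 32000, dtype=np.float32)
# Run the model, check the output.
outputs = model.infer_tf(waveform[np.newaxis, :])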
Example using Perch/Chirp library
import numpy as np
from chirp.inference import models
# Input: 5 seconds of silence as mono 32 kHz waveform samples.
waveform = np.zeros(5 * 32000, dtype=np.float32)
model = models.TaxonomyModelTF(SAVED_MODEL_PATH, 5.0, 5.0)
outputs = model.embed(waveform)
# do something with outputs.embeddings and outputs.logits['label']
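One common thing to do with the embeddings is to train a small classifier on top of them while leaving the model itself untouched. The sketch below fits a binary logistic regression with plain NumPy on synthetic 1280-dimensional vectors standing in for real embeddings; the function names, hyperparameters, and synthetic data are all illustrative assumptions, and this is not the training procedure from the SurfPerch paper.

```python
import numpy as np


def train_linear_probe(embeddings, labels, lr=0.1, steps=500, l2=1e-3):
    """Fit binary logistic regression on fixed embeddings by gradient descent."""
    x = np.asarray(embeddings, dtype=np.float64)
    y = np.asarray(labels, dtype=np.float64)
    # Standardize features so a single learning rate suits every dimension.
    mean, std = x.mean(axis=0), x.std(axis=0) + 1e-8
    x = (x - mean) / std
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        z = np.clip(x @ w + b, -50.0, 50.0)
        p = 1.0 / (1.0 + np.exp(-z))
        grad = p - y  # gradient of the log loss with respect to z
        w -= lr * (x.T @ grad / len(y) + l2 * w)
        b -= lr * grad.mean()
    return w, b, mean, std


def predict(embeddings, w, b, mean, std):
    """Return the probability of the positive class for each embedding."""
    x = (np.asarray(embeddings, dtype=np.float64) - mean) / std
    return 1.0 / (1.0 + np.exp(-np.clip(x @ w + b, -50.0, 50.0)))


# Synthetic stand-in for real embeddings: the positive class is shifted
# along the first 64 of the 1280 dimensions.
rng = np.random.default_rng(0)
n, dim = 200, 1280
labels = rng.integers(0, 2, size=n)
embeddings = rng.normal(size=(n, dim))
embeddings[:, :64] += labels[:, None] * 1.0

params = train_linear_probe(embeddings[:160], labels[:160])
held_out = predict(embeddings[160:], *params) > 0.5
accuracy = float(np.mean(held_out == labels[160:].astype(bool)))
print(f"held-out accuracy: {accuracy:.2f}")
```

With real data, `embeddings` would be the stacked embedding vectors for labeled 5-second windows, and `labels` would mark which windows contain the sound of interest.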