API Reference

Core functionality

birdvoxclassify.core.apply_hierarchical_consistency(formatted_pred_dict, taxonomy, level_threshold_dict=None, detection_threshold=0.5)

Obtain the best predicted candidate class for a prediction at all taxonomic levels, enforcing “top-down” hierarchical consistency. That is, starting from the “coarsest” taxonomic level, if the most probable class is considered “present” (estimated probability greater than a threshold), it is considered the best candidate for that level, and only taxonomic children of this class will be considered when choosing candidates for “finer” taxonomic levels. If the most probable class is not considered “present” (estimated probability below the same threshold), then the “other” class is chosen as the best candidate, with the probability assigned to be the complement of the most probable “consistent” class.

Parameters:
formatted_pred_dict : dict

Formatted dictionary of predictions.

taxonomy : dict

Taxonomy JSON object used to apply hierarchical consistency.

level_threshold_dict : dict or None [default: None]

Optional dictionary of detection thresholds for each taxonomic level.

detection_threshold : float [default: 0.5]

Detection threshold applied uniformly to all classes at all levels. If level_threshold_dict is provided, this is ignored.

Returns:
best_candidates_dict : dict

Formatted dictionary specifying the best candidate for each taxonomic level.
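
The top-down selection described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the function name top_down_candidates, the "id" key in the result, and the toy IDs and probabilities are hypothetical. Two behaviours are assumptions, since the description does not specify them: after an "other" choice, finer levels stay restricted to children of the classes that were still consistent, and a level with no consistent class falls back to "other".

```python
def top_down_candidates(formatted_pred, levels, threshold=0.5, level_thresholds=None):
    """Pick one candidate per level, from coarse to fine (illustrative sketch)."""
    best = {}
    allowed = None  # None: every class at the coarsest level is eligible
    for level in levels:
        # A per-level threshold, when given, replaces the uniform one.
        thr = level_thresholds[level] if level_thresholds is not None else threshold
        consistent = {
            cid: info for cid, info in formatted_pred[level].items()
            if allowed is None or cid in allowed
        }
        if not consistent:
            # Assumption: nothing consistent is left, so fall back to "other".
            best[level] = {"id": "other", "probability": 1.0}
            allowed = set()
            continue
        top_id = max(consistent, key=lambda cid: consistent[cid]["probability"])
        top_prob = consistent[top_id]["probability"]
        if top_prob > thr:
            best[level] = {"id": top_id, "probability": top_prob}
            allowed = set(consistent[top_id]["child_ids"])
        else:
            # "other" receives the complement of the most probable consistent class.
            best[level] = {"id": "other", "probability": 1.0 - top_prob}
            # Assumption: keep the children of all still-consistent classes eligible.
            allowed = {c for info in consistent.values() for c in info["child_ids"]}
    return best


# Toy prediction in the shape documented for format_pred (only the keys used here).
pred = {
    "coarse": {"1": {"probability": 0.875, "child_ids": ["1.1", "1.2"]}},
    "medium": {
        "1.1": {"probability": 0.75, "child_ids": ["1.1.1", "1.1.2"]},
        "1.2": {"probability": 0.125, "child_ids": ["1.2.1"]},
    },
    "fine": {
        "1.1.1": {"probability": 0.25, "child_ids": []},
        "1.1.2": {"probability": 0.125, "child_ids": []},
        "1.2.1": {"probability": 0.625, "child_ids": []},  # likely, but not under 1.1
    },
}
result = top_down_candidates(pred, ["coarse", "medium", "fine"])
print(result)
```

In this toy case the fine class 1.2.1 has the highest raw probability, but it is ignored because its parent was not selected at the medium level, so the fine level resolves to "other".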

birdvoxclassify.core.batch_generator(filepath_list, batch_size=512)

Returns a generator that, from a list of filepaths, yields batches of PCEN images and the corresponding filenames.

Parameters:
filepath_list : list[str]

(Non-empty) list of filepaths to audio files for which to generate batches of PCEN images and the corresponding filenames

batch_size : int [default: 512]

Size of yielded batches

Yields:
batch : np.ndarray [shape: (batch_size, top_freq_id, n_hops, 1)]

PCEN batch

batch_filepaths : list[str]

List of filepaths corresponding to the clips in the batch
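
As a rough illustration of the batching pattern, the sketch below yields fixed-size chunks of filepaths together with whatever a loader produces for each file. The names batch_by_filepath and load_clip are hypothetical, the loader stands in for PCEN computation, and letting the final batch be smaller than batch_size is an assumption.

```python
def batch_by_filepath(filepath_list, load_clip, batch_size=512):
    """Yield (batch, batch_filepaths) pairs (illustrative sketch)."""
    if not filepath_list:
        raise ValueError("filepath_list must be non-empty")
    for start in range(0, len(filepath_list), batch_size):
        chunk = filepath_list[start:start + batch_size]
        # load_clip is a placeholder for loading audio and computing PCEN.
        yield [load_clip(path) for path in chunk], chunk


paths = ["clip_{}.wav".format(i) for i in range(5)]
# len() is used as a stand-in loader so the example runs without audio files.
batches = list(batch_by_filepath(paths, load_clip=len, batch_size=2))
for batch, batch_paths in batches:
    print(len(batch), batch_paths)
```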

birdvoxclassify.core.compute_pcen(audio, sr, input_format=True)

Computes PCEN (per-channel energy normalization) for the given audio clip.

Parameters:
audio : np.ndarray [shape: (N,)]

Audio array

sr : int

Sample rate

input_format : bool [default: True]

If True, adds a channel dimension (of size 1) and ensures that a fixed number of PCEN frames (given by get_pcen_settings()['n_hops']) is returned. If the clip has more frames, the center frames are returned. If it has fewer, empty frames are added as padding.

Returns:
pcen : np.ndarray [shape: (top_freq_id, n_hops, 1) or (top_freq_id, num_frames)]

Mel spectrogram after per-channel energy normalization. If input_format=True, the shape is (top_freq_id, n_hops, 1). Otherwise the shape is (top_freq_id, num_frames), where num_frames is the number of PCEN frames for the entire audio clip.
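
The fixed-length behaviour of input_format=True can be sketched on plain nested lists (rows are frequency bins, columns are frames). The helper names are hypothetical. Zero-valued padding split evenly between both ends is an assumption, as is the rounding used when the surplus of frames is odd.

```python
def fix_num_frames(spec, n_hops):
    """Center-crop or zero-pad the time axis to n_hops frames (sketch)."""
    num_frames = len(spec[0])
    if num_frames >= n_hops:
        start = (num_frames - n_hops) // 2
        return [row[start:start + n_hops] for row in spec]
    pad = n_hops - num_frames
    left = pad // 2          # assumption: padding is split between both ends
    right = pad - left
    return [[0.0] * left + list(row) + [0.0] * right for row in spec]


def add_channel_dim(spec):
    """Turn shape (freq, frames) into (freq, frames, 1)."""
    return [[[value] for value in row] for row in spec]


spec = [[1.0, 2.0, 3.0, 4.0, 5.0],
        [6.0, 7.0, 8.0, 9.0, 10.0]]
cropped = fix_num_frames(spec, 3)   # more frames than needed: keep the center
padded = fix_num_frames(spec, 8)    # fewer frames than needed: pad with zeros
model_input = add_channel_dim(cropped)
print(cropped, padded, model_input, sep="\n")
```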

birdvoxclassify.core.format_pred(pred_list, taxonomy)

Formats a list of predictions for a single audio clip into a more human-readable JSON object using the given taxonomy object.

The output will be in the following format:

{
  <prediction level> : {
    <taxonomy id> : {
      "probability": <float>,
      "common_name": <str>,
      "scientific_name": <str>,
      "taxonomy_level_names": <str>,
      "taxonomy_level_aliases": <dict of aliases>,
      "child_ids": <list of children IDs>
    },
    ...
  },
  ...
}

Parameters:
pred_list : list[np.ndarray [shape (1, num_labels) or (num_labels,)] ]

List of predictions at the taxonomical levels predicted by the model for a single example. num_labels may be different for each of the different levels of the taxonomy.

taxonomy : dict

Taxonomy JSON object

Returns:
formatted_pred_dict : dict

Prediction dictionary object
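
To show how the documented output format can be consumed, the sketch below ranks the classes of one level by probability. The helper rank_classes is hypothetical, and the IDs, names, and probabilities in the toy dictionary are invented for illustration; only the key layout follows the format shown above.

```python
def rank_classes(formatted_pred_dict, level, top_k=3):
    """Return (taxonomy id, common name, probability), most probable first."""
    ranked = sorted(
        formatted_pred_dict[level].items(),
        key=lambda item: item[1]["probability"],
        reverse=True,
    )
    return [(tid, info["common_name"], info["probability"]) for tid, info in ranked[:top_k]]


# Toy data: values are made up, keys follow the documented layout.
toy_pred = {
    "fine": {
        "1.1.1": {"probability": 0.125, "common_name": "Species A", "child_ids": []},
        "1.1.2": {"probability": 0.75, "common_name": "Species B", "child_ids": []},
        "1.2.1": {"probability": 0.5, "common_name": "Species C", "child_ids": []},
    }
}
top_two = rank_classes(toy_pred, "fine", top_k=2)
print(top_two)
```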

birdvoxclassify.core.format_pred_batch(batch_pred_list, taxonomy)

Formats a list of predictions for a batch of audio clips into a more human-readable JSON object using the given taxonomy object. The output will be in the form of a list of JSON objects in the format returned by format_pred.

Parameters:
batch_pred_list : list[np.ndarray [shape (batch_size, num_labels)] ]

List of predictions at the taxonomical levels predicted by the model for a batch of examples. num_labels may be different for each of the different levels of the taxonomy.

taxonomy : dict

Taxonomy JSON object

Returns:
pred_dict_list : list[dict]

List of JSON dictionary objects
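
A batch prediction is indexed by level first and by example second, whereas format_pred expects the levels of a single example. One way to regroup the data is a transpose with zip, sketched here on nested lists. The helper name split_batch is hypothetical, and whether the library regroups predictions this way is an assumption.

```python
def split_batch(batch_pred_list):
    """Regroup [level][example][label] into [example][level][label]."""
    return [list(per_example) for per_example in zip(*batch_pred_list)]


# Two taxonomic levels (2 and 3 labels), batch of two examples; values are made up.
batch_pred_list = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]],
]
per_example = split_batch(batch_pred_list)
print(per_example)
```

Each element of per_example has the layout described for pred_list in format_pred, with one entry per taxonomic level.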

birdvoxclassify.core.get_batch_best_candidates(batch_pred_list=None, batch_formatted_pred_list=None, taxonomy=None, hierarchical_consistency=True)

Obtain the best candidate classes for each prediction in a batch.

Parameters:
batch_pred_list : list or None [default: None]

List of batch predictions. If not provided, batch_formatted_pred_list must be provided.

batch_formatted_pred_list : list or None [default: None]

List of formatted batch predictions. If not provided, batch_pred_list must be provided.

taxonomy : dict or None [default: None]

Taxonomy JSON object used to apply hierarchical consistency. If None, then hierarchical_consistency must be False.

hierarchical_consistency : bool [default: True]

If True, apply hierarchical consistency to predictions.

Returns:
batch_best_candidates_list : list

List of formatted dictionaries specifying the best candidates for each taxonomic level.

birdvoxclassify.core.get_best_candidates(pred_list=None, formatted_pred_dict=None, taxonomy=None, hierarchical_consistency=True)

Obtain the best predicted candidate class for a prediction at all taxonomic levels. The output will be in the following format:

{
  <prediction level> : {
    "probability": <float>,
    "common_name": <str>,
    "scientific_name": <str>,
    "taxonomy_level_names": <str>,
    "taxonomy_level_aliases": <dict of aliases>,
    "child_ids": <list of children IDs>
  },
  ...
}

Parameters:
pred_list : list[np.ndarray [shape (1, num_labels) or (num_labels,)] ] or None [default: None]

List of predictions at the taxonomical levels predicted by the model for a single example. If provided, taxonomy must also be provided.

If not provided, formatted_pred_dict must be provided.

formatted_pred_dict : dict or None [default: None]

Formatted dictionary of predictions. If not provided, pred_list must be provided.

taxonomy : dict or None [default: None]

Taxonomy JSON object used to apply hierarchical consistency. If None, then hierarchical_consistency must be False.

hierarchical_consistency : bool [default: True]

If True, apply hierarchical consistency to predictions.

Returns:
best_candidates_dict : dict

Formatted dictionary specifying the best candidate for each taxonomic level.
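
The constraints between these arguments can be summarised as a small validation sketch. The function name check_best_candidate_args is hypothetical, and raising ValueError is an assumption; the exception type actually used by the library is not stated here.

```python
def check_best_candidate_args(pred_list=None, formatted_pred_dict=None,
                              taxonomy=None, hierarchical_consistency=True):
    """Validate the documented argument combinations (illustrative sketch)."""
    if pred_list is None and formatted_pred_dict is None:
        raise ValueError("Provide either pred_list or formatted_pred_dict.")
    if pred_list is not None and taxonomy is None:
        raise ValueError("taxonomy is required when pred_list is given.")
    if taxonomy is None and hierarchical_consistency:
        raise ValueError("hierarchical_consistency=True requires a taxonomy.")
    return True


def is_valid(**kwargs):
    """Return True if the combination passes, False if it is rejected."""
    try:
        return check_best_candidate_args(**kwargs)
    except ValueError:
        return False


print(is_valid(formatted_pred_dict={}, hierarchical_consistency=False))
print(is_valid(formatted_pred_dict={}))           # no taxonomy, consistency requested
print(is_valid(pred_list=[[0.5]], taxonomy={}))
```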

birdvoxclassify.core.get_model_path(model_name)

Returns path to the bird species classification model of the given name.

Parameters:
model_name : str

Name of classifier model. Should be in format <model id>_<taxonomy version>-<taxonomy md5sum>. v0.3.1 UPDATE: model names with taxonomy md5 checksum 2e7e1bbd434a35b3961e315cfe3832fc or beb9234f0e13a34c7ac41db72e85addd are not available in this version but are restored in v0.3.1 for backwards compatibility. They will no longer be supported starting with v0.4. Please use model names with taxonomy md5 checksums 3c6d869456b2705ea5805b6b7d08f870 and 2f6efd9017669ef5198e48d8ec7dce4c (respectively) instead.

Returns:
model_path : str

Path to classifier model weights. Should be in format <BirdVoxClassify dir>/resources/models/<model id>_<taxonomy version>-<taxonomy md5sum>.h5

birdvoxclassify.core.get_output_path(filepath, suffix, output_dir)

Returns output path to file containing bird species classification predictions for a given audio clip file.

Parameters:
filepath : str

Path to audio file to be processed

suffix : str

String to append to filename (including extension)

output_dir : str or None

Path to directory where file will be saved. If None, will use directory of given filepath.

Returns:
output_path : str

Path to output file
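
A path of this kind can be derived with os.path, as in the sketch below. The helper build_output_path is hypothetical. Dropping the audio extension and concatenating the suffix directly onto the file stem are assumptions, since the description does not say how the stem and the suffix are joined.

```python
import os


def build_output_path(filepath, suffix, output_dir=None):
    """Derive an output path from an audio filepath (illustrative sketch)."""
    stem = os.path.splitext(os.path.basename(filepath))[0]
    if output_dir is None:
        # Fall back to the directory that contains the audio file.
        output_dir = os.path.dirname(filepath)
    # Assumption: the suffix (with its extension) is appended directly to the stem.
    return os.path.join(output_dir, stem + suffix)


audio_path = os.path.join("data", "clip.wav")
same_dir = build_output_path(audio_path, "_pred.json")
other_dir = build_output_path(audio_path, "_pred.json", output_dir="out")
print(same_dir, other_dir)
```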

birdvoxclassify.core.get_pcen_settings()

Returns dictionary of Mel spectrogram and PCEN parameters for preparing the input to the bird species classification models.

Returns:
pcen_settings : dict[str, *]

Dictionary of Mel spectrogram and PCEN parameters

birdvoxclassify.core.get_taxonomy_node(ref_id, taxonomy)

Gets node in taxonomy corresponding to the given reference ID (e.g. 1.4.1)

Parameters:
ref_id : str

Taxonomy reference ID

taxonomy : dict

Taxonomy JSON object

Returns:
node : dict[str, *]

Taxonomy node, containing information about the entity corresponding to the given taxonomy reference ID

birdvoxclassify.core.get_taxonomy_path(model_name)

Get the path to the taxonomy corresponding to the model of the given name.

Specifically, with a model name of the format:

<model id>_<taxonomy version>-<taxonomy md5sum>

the path to taxonomy file <BirdVoxClassify dir>/resources/taxonomy/<taxonomy version>.json is returned. The MD5 checksum of this file is compared to <taxonomy md5sum> to ensure that the content of the taxonomy file matches the format of the output that the model is expected to produce.

Parameters:
model_name : str

Name of model. Should be in format <model id>_<taxonomy version>-<taxonomy md5sum>. v0.3.1 UPDATE: model names with taxonomy md5 checksums 2e7e1bbd434a35b3961e315cfe3832fc or beb9234f0e13a34c7ac41db72e85addd are not available in this version but are restored in v0.3.1 for backwards compatibility. They will no longer be supported starting with v0.4. Please use model names with taxonomy md5 checksums 3c6d869456b2705ea5805b6b7d08f870 and 2f6efd9017669ef5198e48d8ec7dce4c (respectively) instead.

Returns:
taxonomy_path : str

Path to taxonomy file, which should be in format <BirdVoxClassify dir>/resources/taxonomy/<taxonomy version>.json
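
The name parsing and checksum comparison described above can be sketched with the standard library. The helper names are hypothetical. Splitting at the last hyphen and then at the last underscore is an assumption about how the three parts are separated, and the temporary file merely stands in for a taxonomy file.

```python
import hashlib
import os
import tempfile


def parse_model_name(model_name):
    """Split '<model id>_<taxonomy version>-<taxonomy md5sum>' into its parts."""
    prefix, md5sum = model_name.rsplit("-", 1)
    model_id, taxonomy_version = prefix.rsplit("_", 1)
    return model_id, taxonomy_version, md5sum


def file_md5(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def taxonomy_matches(path, model_name):
    """Check that the file's checksum equals the one embedded in the model name."""
    return file_md5(path) == parse_model_name(model_name)[2]


default_name = "birdvoxclassify-taxonet_tv1hierarchical-3c6d869456b2705ea5805b6b7d08f870"
parts = parse_model_name(default_name)

# Throwaway file standing in for a taxonomy JSON file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".json") as tmp:
    tmp.write(b'{"demo": true}')
    tmp_path = tmp.name
try:
    matches = taxonomy_matches(tmp_path, "demo-model_tvdemo-" + file_md5(tmp_path))
    mismatch = taxonomy_matches(tmp_path, "demo-model_tvdemo-" + "0" * 32)
finally:
    os.remove(tmp_path)
print(parts, matches, mismatch)
```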

birdvoxclassify.core.load_classifier(model_name)

Loads bird species classification model of the given name.

Parameters:
model_name : str

Name of classifier model. Should be in format <model id>_<taxonomy version>-<taxonomy md5sum>. v0.3.1 UPDATE: model names with taxonomy md5 checksum 2e7e1bbd434a35b3961e315cfe3832fc or beb9234f0e13a34c7ac41db72e85addd are not available in this version but are restored in v0.3.1 for backwards compatibility. They will no longer be supported starting with v0.4. Please use model names with taxonomy md5 checksums 3c6d869456b2705ea5805b6b7d08f870 and 2f6efd9017669ef5198e48d8ec7dce4c (respectively) instead.

Returns:
classifier : keras.models.Model

Bird species classification model

birdvoxclassify.core.load_taxonomy(taxonomy_path)

Loads taxonomy JSON file as an OrderedDict to ensure consistent ordering. Taxonomy files specify output encodings in order from coarse to fine by convention.

Please use this function instead of manually loading the taxonomy!

Parameters:
taxonomy_path : str

Path to taxonomy file.

Returns:
taxonomy : OrderedDict

Taxonomy object
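
For reference, loading a JSON file into an OrderedDict with the standard library looks roughly like the sketch below. The helper load_ordered_json and the toy keys are hypothetical, and the real function may do more than this, so it remains the recommended entry point.

```python
import json
import os
import tempfile
from collections import OrderedDict


def load_ordered_json(path):
    """Load a JSON file, keeping object keys in file order (sketch)."""
    with open(path, "r") as f:
        return json.load(f, object_pairs_hook=OrderedDict)


# Toy file with keys listed from coarse to fine; the content is invented.
with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as tmp:
    tmp.write('{"coarse": [], "medium": [], "fine": []}')
    tmp_path = tmp.name
try:
    toy_taxonomy = load_ordered_json(tmp_path)
finally:
    os.remove(tmp_path)
print(list(toy_taxonomy.keys()))
```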

birdvoxclassify.core.predict(pcen, classifier, logger_level=20)

Performs bird species classification on PCEN arrays using the given model.

Parameters:
pcen : np.ndarray [shape (n_mels, n_hops, 1) or (batch_size, n_mels, n_hops, 1)]

PCEN array for a single clip or a batch of clips

classifier : keras.models.Model

Bird species classification model object

logger_level : int [default: logging.INFO]

Logger level

Returns:
pred_list : list[np.ndarray [shape (batch_size or 1, num_labels)] ]

List of predictions at the taxonomical levels predicted by the model. num_labels may be different for each of the different levels of the taxonomy. If a single example is given (i.e. there is no batch dimension in the input PCEN), batch_size = 1.

birdvoxclassify.core.process_file(filepaths, output_dir=None, output_summary_path=None, classifier=None, taxonomy=None, batch_size=512, suffix='', select_best_candidates=False, hierarchical_consistency=True, logger_level=20, model_name='birdvoxclassify-taxonet_tv1hierarchical-3c6d869456b2705ea5805b6b7d08f870')

Runs bird species classification model on one or more audio clips.

Parameters:
filepaths : list or str

Filepath or list of filepaths of audio files for which to run prediction

output_dir : str or None [default: None]

Output directory used for outputting per-file prediction JSON files. If None, no per-file prediction JSON files are produced.

output_summary_path : str or None [default: None]

Output path for summary prediction JSON file for all processed audio files. If None, no summary prediction file is produced.

classifier : keras.models.Model or None [default: None]

Bird species classification model object. If None, the model corresponding to model_name is loaded.

taxonomy : dict or None [default: None]

Taxonomy JSON object. If None, the taxonomy corresponding to model_name is loaded.

batch_size : int [default: 512]

Batch size for predictions

suffix : str [default: ""]

String to append to filename

select_best_candidates : bool [default: False]

If True, best candidates will be provided in output dictionary instead of all classes and their probabilities.

hierarchical_consistency : bool [default: True]

If True and if select_best_candidates is True, apply hierarchical consistency when selecting best candidates.

logger_level : int [default: logging.INFO]

Logger level

model_name : str [default: birdvoxclassify.DEFAULT_MODEL_NAME]

Name of classifier model. Should be in format <model id>_<taxonomy version>-<taxonomy md5sum>. v0.3.1 UPDATE: model names with taxonomy md5sum 2e7e1bbd434a35b3961e315cfe3832fc or beb9234f0e13a34c7ac41db72e85addd are not available in this version but are restored in v0.3.1 for backwards compatibility. They will no longer be supported starting with v0.4. Please use model names with taxonomy md5 checksums 3c6d869456b2705ea5805b6b7d08f870 and 2f6efd9017669ef5198e48d8ec7dce4c (respectively) instead.

Returns:
output_dict : dict[str, dict]

Output dictionary mapping audio filename to prediction dictionary. If select_best_candidates is False, the dictionary is in the format produced by format_pred. Otherwise, the dictionary is in the format produced by get_best_candidates.