Pyroomacoustics

Summary

Pyroomacoustics is a software package aimed at the rapid development and testing of audio array processing algorithms. The content of the package can be divided into three main components: an intuitive Python object-oriented interface to quickly construct different simulation scenarios involving multiple sound sources and microphones in 2D and 3D rooms; a fast C implementation of the image source model for general polyhedral rooms to efficiently generate room impulse responses and simulate the propagation between sources and receivers; and finally, reference implementations of popular algorithms for beamforming, direction finding, and adaptive filtering. Together, they form a package with the potential to speed up the time to market of new algorithms by significantly reducing the implementation overhead in the performance evaluation step.

Room Acoustics Simulation

Consider the following scenario.

Suppose, for example, you wanted to produce a radio crime drama, and it so happens that, according to the scriptwriter, the story line absolutely must culminate in a satanic mass that quickly degenerates into a violent shootout, all taking place right around the altar of the highly reverberant acoustic environment of Oxford’s Christ Church cathedral. To ensure that it sounds authentic, you asked the Dean of Christ Church for permission to record the final scene inside the cathedral, but somehow he fails to be convinced of the artistic merit of your production, and declines to give you permission. But recorded in a conventional studio, the scene sounds flat. So what do you do?

—Schnupp, Nelken, and King, Auditory Neuroscience, 2010

Faced with this difficult situation, pyroomacoustics can save the day by simulating the environment of the Christ Church cathedral!

At the core of the package is a room impulse response (RIR) generator based on the image source model that can handle

  • Convex and non-convex rooms
  • 2D/3D rooms

Both a pure Python implementation and a C accelerator are included for maximum speed and compatibility.

The philosophy of the package is to abstract all necessary elements of an experiment using object-oriented programming concepts. Each of these elements is represented by a class, and an experiment can be designed by combining these elements just as one would do in a real experiment.

Let’s imagine we want to simulate a delay-and-sum beamformer that uses a linear array with four microphones in a shoebox-shaped room containing only one sound source. First, we create a room object, to which we add a microphone array object and a sound source object. The room object then has methods to compute the RIR between source and receiver. The beamformer object extends the microphone array class and has different methods to compute the weights, for example delay-and-sum weights. See the example below to get an idea of what the code looks like.

The Room class also allows one to process sound samples emitted by sources, effectively simulating the propagation of sound between sources and microphones. At the input of the microphones composing the beamformer, an STFT (short-time Fourier transform) engine makes it possible to quickly process the signals through the beamformer and evaluate the output.

Reference Implementations

In addition to its core image source model simulation, pyroomacoustics also contains a number of reference implementations of popular audio processing algorithms for beamforming, direction of arrival (DOA) estimation, adaptive filtering, and blind source separation.

We use an object-oriented approach to abstract the details of specific algorithms, making them easy to compare. Each algorithm can be tuned through optional parameters. We have tried to pre-set values for the tuning parameters so that a run with the default values will in general produce reasonable results.
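
For instance, here is a hedged sketch of running one of the DOA algorithms (MUSIC) with its default tuning parameters. The pra.doa.algorithms dictionary, the locate_sources method, and the azimuth_recon attribute are assumed from the package documentation; random STFT-domain data stands in for a real multichannel recording so that the snippet is self-contained.

import numpy as np
import pyroomacoustics as pra

fs, nfft = 16000, 256

# a 4-microphone linear array (coordinates in a 2 x 4 array)
R = pra.linear_2D_array([2, 1.5], 4, 0, 0.04)

# STFT frames of shape (n_mics, n_freq, n_frames); random data is used
# here only as a placeholder for a real recording
X = np.random.randn(4, nfft // 2 + 1, 25) + 1j * np.random.randn(4, nfft // 2 + 1, 25)

doa = pra.doa.algorithms['MUSIC'](R, fs, nfft, num_src=1)
doa.locate_sources(X, freq_range=[500, 4000])
print(doa.azimuth_recon)  # estimated azimuth(s), in radians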

Datasets

In an effort to simplify the use of datasets, we provide a few wrappers that make it possible to quickly load and sort through some popular speech corpora. At the moment, wrappers are available for the CMU ARCTIC corpus, Google’s Speech Commands Dataset, and the TIMIT corpus.

For more details, see the doc.

Quick Install

Install the package with pip:

$ pip install pyroomacoustics

The requirements are:

* numpy
* scipy
* matplotlib

Example

Here is a quick example of how to create and visualize the response of a beamformer in a room.

import numpy as np
import matplotlib.pyplot as plt
import pyroomacoustics as pra

# Create a 4 by 6 metres shoe box room
room = pra.ShoeBox([4,6])

# Add a source somewhere in the room
room.add_source([2.5, 4.5])

# Create a linear array beamformer with 4 microphones
# with angle 0 degrees and inter mic distance 4 cm
R = pra.linear_2D_array([2, 1.5], 4, 0, 0.04)
room.add_microphone_array(pra.Beamformer(R, room.fs))

# Now compute the delay and sum weights for the beamformer
room.mic_array.rake_delay_and_sum_weights(room.sources[0][:1])

# plot the room and resulting beamformer
room.plot(freq=[1000, 2000, 4000, 8000], img_order=0)
plt.show()

A comprehensive set of examples covering most of the functionalities of the package can be found in the examples folder of the GitHub repository.

Authors

  • Robin Scheibler
  • Ivan Dokmanić
  • Sidney Barthe
  • Eric Bezzam
  • Hanjie Pan

How to contribute

If you would like to contribute, please clone the repository and send a pull request.

For more details, see our CONTRIBUTING page.

Academic publications

This package was developed to support academic publications. It contains implementations of the DOA algorithms and acoustic beamformers introduced in the following papers.

  • H. Pan, R. Scheibler, I. Dokmanic, E. Bezzam and M. Vetterli. FRIDA: FRI-based DOA estimation for arbitrary array layout, ICASSP 2017, New Orleans, USA, 2017.
  • I. Dokmanić, R. Scheibler and M. Vetterli. Raking the Cocktail Party, in IEEE Journal of Selected Topics in Signal Processing, vol. 9, num. 5, p. 825 - 836, 2015.
  • R. Scheibler, I. Dokmanić and M. Vetterli. Raking Echoes in the Time Domain, ICASSP 2015, Brisbane, Australia, 2015.

If you use this package in your own research, please cite our paper describing it.

R. Scheibler, E. Bezzam, I. Dokmanić, Pyroomacoustics: A Python package for audio room simulations and array processing algorithms, Proc. IEEE ICASSP, Calgary, CA, 2018.

License

Copyright (c) 2014-2017 EPFL-LCAV

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Contributing

If you want to contribute to pyroomacoustics and make it better, your help is very welcome. Contributing is also a great way to learn more about the package itself.

Ways to contribute

  • File bug reports
  • Improvements to the documentation are always more than welcome. Keeping a good clean documentation is a challenging task and any help is appreciated.
  • Feature requests
  • If you implemented an extra DOA/adaptive filter/beamforming algorithm: that’s awesome! We’d love to add it to the package.
  • Suggestion of improvements to the code base are also welcome.

Coding style

We try to stick to PEP8 as much as possible. Variables, functions, modules and packages should be in lowercase with underscores. Class names in CamelCase.

Documentation

Docstrings should follow the numpydoc style.

We recommend the following steps for generating the documentation:

  • Create a separate environment, e.g. with Anaconda, as such: conda create -n mkdocs27 python=2.7 sphinx numpydoc mock
  • Switch to the environment: source activate mkdocs27
  • Install the theme for ReadTheDocs: pip install sphinx-rtd-theme
  • Navigate to the docs folder and run: ./make_apidoc.sh
  • Build and view the documentation locally with: make html
  • Open in your browser: docs/_build/html/index.html

Unit Tests

As much as possible, for every new function added to the code base, add a short test script in pyroomacoustics/tests. The names of the script and the functions running the test should be prefixed by test_. The tests are started by running nosetests at the root of the package.
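
For instance, a minimal test file could look like the following sketch (the file name test_shoebox.py and the exact assertions are illustrative only):

# pyroomacoustics/tests/test_shoebox.py (hypothetical)
import pyroomacoustics as pra

def test_shoebox_rir():
    # small 2D scenario: one source, four microphones
    room = pra.ShoeBox([4, 6], fs=16000, max_order=2)
    room.add_source([2.5, 4.5])
    R = pra.linear_2D_array([2, 1.5], 4, 0, 0.04)
    room.add_microphone_array(pra.MicrophoneArray(R, room.fs))
    room.compute_rir()

    # one RIR per microphone/source pair: outer list over mics, inner over sources
    assert len(room.rir) == 4
    assert all(len(h) == 1 for h in room.rir)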

How to make a clean pull request

Look for a project’s contribution instructions. If there are any, follow them.

  • Create a personal fork of the project on Github.
  • Clone the fork on your local machine. Your remote repo on Github is called origin.
  • Add the original repository as a remote called upstream.
  • If you created your fork a while ago be sure to pull upstream changes into your local repository.
  • Create a new branch to work on! Branch from develop if it exists, else from master.
  • Implement/fix your feature, comment your code.
  • Follow the code style of the project, including indentation.
  • If the project has tests run them!
  • Write or adapt tests as needed.
  • Add or change the documentation as needed.
  • Squash your commits into a single commit with git’s interactive rebase. Create a new branch if necessary.
  • Push your branch to your fork on Github, the remote origin.
  • From your fork open a pull request in the correct branch. Target the project’s develop branch if there is one, else go for master!
  • If the maintainer requests further changes just push them to your branch. The PR will be updated automatically.
  • Once the pull request is approved and merged you can pull the changes from upstream to your local repo and delete your extra branch(es).

And last but not least: Always write your commit messages in the present tense. Your commit message should describe what the commit, when applied, does to the code – not what you did to the code.

Reference

This guide is based on the nice template by @MarcDiethelm available under MIT License.

Changelog

All notable changes to pyroomacoustics will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

Unreleased

Added
  • Added noise reduction sub-package denoise with spectral subtraction class and example.
  • Renamed realtime to transform and added deprecation warning.
  • Added a cython function to efficiently compute the fractional delays in the room impulse response from time delays and attenuations
  • notebooks folder.
  • Demo IPython notebook (with WAV files) of several features of the package.
  • Wrapper for Google’s Speech Command Dataset and an example usage script in examples.
  • Lots of new features in the pyroomacoustics.realtime subpackage
    • The STFT class can now be used both for frame-by-frame processing or for bulk processing
    • The functionality will replace the methods pyroomacoustics.stft, pyroomacoustics.istft, pyroomacoustics.overlap_add, etc,
    • The new function pyroomacoustics.realtime.compute_synthesis_window computes the optimal synthesis window given an analysis window and the frame shift
    • Extensive tests for the pyroomacoustics.realtime module
    • Convenience functions pyroomacoustics.realtime.analysis and pyroomacoustics.realtime.synthesis with an interface similar to pyroomacoustics.stft and pyroomacoustics.istft (which are now deprecated and will disappear soon)
    • The ordering of axis in the output from bulk STFT is now (n_frames, n_frequencies, n_channels)
    • Support for Intel’s mkl_fft package
    • axis (along which to perform DFT) and bits parameters for DFT class.
Changed
  • Improved documentation and docstrings
  • Using now the built-in RIR generator in examples/doa_algorithms.py
  • Improved the download/uncompress function for large datasets
  • Dusted the code for plotting on the sphere in pyroomacoustics.doa.grid.GridSphere
Deprecation Notice
  • The methods pyroomacoustics.stft, pyroomacoustics.istft, pyroomacoustics.overlap_add, etc, are now deprecated and will be removed in the near future

0.1.18 - 2018-04-24

Added
  • Added AuxIVA (independent vector analysis) to bss subpackage.
  • Added BSS IVA example
Changed
  • Moved Trinicon blind source separation algorithm to bss subpackage.
Bugfix
  • Correct a bug that causes 1st order sources to be generated for max_order==0 in pure python code

0.1.17 - 2018-03-23

Bugfix
  • Fixed issue #22 on github. Added INCREF before returning Py_None in C extension.

0.1.16 - 2018-03-06

Added
  • Base classes for Dataset and Sample in pyroomacoustics.datasets
  • Methods to filter datasets according to the metadata of samples
  • Deprecation warning for the TimitCorpus interface
Changed
  • Add list of speakers and sentences from CMU ARCTIC
  • CMUArcticDatabase basedir is now the top directory where CMU_ARCTIC database should be saved. Not the directory above as it previously was.
  • Libroom C extension is now a proper package. It can be imported.
  • Libroom C extension now compiles on windows with python>=3.5.

0.1.15 - 2018-02-23

Bugfix
  • Added pyroomacoustics.datasets to list of sub-packages in setup.py

0.1.14 - 2018-02-20

Added
  • Changelog
  • CMU ARCTIC corpus wrapper in pyroomacoustics.datasets
Changed
  • Moved TIMIT corpus wrapper from pyroomacoustics.recognition module to sub-package pyroomacoustics.datasets.timit

Room Simulation

Room

The three main classes are pyroomacoustics.room.Room, pyroomacoustics.soundsource.SoundSource, and pyroomacoustics.beamforming.MicrophoneArray. At a high level, a simulation scenario is created by first defining a room to which a few sound sources and a microphone array are attached. The actual audio is attached to the sources as raw audio samples. The image source method (ISM) is then used to find all image sources up to a maximum specified order, and room impulse responses (RIR) are generated from their positions. The microphone signals are then created by convolving the audio samples associated with the sources with the appropriate RIRs. Since the simulation is done on discrete-time signals, a sampling frequency is specified for the room and the sources it contains. Microphones can optionally operate at a different sampling frequency; in this case, a rate conversion is done.

Simulating a Shoebox Room

We will first walk through the steps to simulate a shoebox-shaped room in 3D.

Create the room

So-called shoebox rooms are parallelepipedic rooms with 4 or 6 walls (in 2D and 3D, respectively), all at right angles. They are defined by a single vector that contains the lengths of the walls. They have the advantage of being simple to define and very efficient to simulate. A 9 m x 7.5 m x 3.5 m room is simply defined like this:

import pyroomacoustics as pra
room = pra.ShoeBox([9, 7.5, 3.5], fs=16000, absorption=0.35, max_order=17)

The second argument is the sampling frequency at which the RIR will be generated. Note that the default value of fs is 8 kHz. The third argument is the absorption of the walls: reflections are multiplied by (1 - absorption) for every wall they hit. The fourth argument is the maximum number of reflections allowed in the ISM.

The relationship between absorption/max_order and reverberation time (the T60 or RT60 in the acoustics literature) is not straightforward. Sabine’s formula can be used to some extent to set these parameters.
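
For instance, inverting Sabine’s formula, T60 ≈ 0.1611 V / (S a), gives a rough starting point for the absorption given a target reverberation time. The snippet below is only such a back-of-the-envelope estimate, not a calibrated procedure.

# rough absorption estimate from a target T60 via Sabine's formula
rt60 = 0.5                       # desired reverberation time in seconds
L, W, H = 9., 7.5, 3.5           # room dimensions in metres

V = L * W * H                    # room volume
S = 2 * (L * W + L * H + W * H)  # total wall surface

absorption = 0.1611 * V / (S * rt60)   # invert T60 = 0.1611 * V / (S * a)
print(absorption)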

Add sources and microphones

Sources are fairly straightforward to create. They take their location as a single mandatory argument, and a signal and start time as optional arguments. Here we create a source located at [2.5, 3.73, 1.76] within the room, which will utter the content of the WAV file speech.wav starting at 1.3 s into the simulation.

# import a mono wavfile as the source signal
# the sampling frequency should match that of the room
from scipy.io import wavfile
_, audio = wavfile.read('speech.wav')

# place the source in the room, using the documented
# signature add_source(position, signal, delay)
room.add_source([2.5, 3.73, 1.76], signal=audio, delay=1.3)

The locations of the microphones in the array should be provided in a numpy nd-array of size (ndim, nmics), that is, each column contains the coordinates of one microphone. This array is used to construct a pyroomacoustics.beamforming.MicrophoneArray object, together with the sampling frequency of the microphones. Note that it can be different from that of the room, in which case resampling will occur. Here, we create an array with two microphones placed at [6.3, 4.87, 1.2] and [6.3, 4.93, 1.2].

# define the location of the array
import numpy as np
R = np.c_[
    [6.3, 4.87, 1.2],  # mic 1
    [6.3, 4.93, 1.2],  # mic 2
    ]

# the fs of the microphones is the same as the room
mic_array = pra.MicrophoneArray(R, room.fs)

# finally place the array in the room
room.add_microphone_array(mic_array)

A number of routines exist to create regular array geometries in 2D.
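
For instance, assuming the helper circular_2D_array follows the same pattern as the linear_2D_array function used above (center, number of microphones, orientation, spacing or radius):

import pyroomacoustics as pra

# 4 microphones on a line: centre (2, 1.5), axis at 0 rad, 4 cm spacing
R_lin = pra.linear_2D_array([2, 1.5], 4, 0, 0.04)

# 6 microphones on a circle: centre (2, 1.5), first mic at 0 rad, 5 cm radius
R_circ = pra.circular_2D_array([2, 1.5], 6, 0, 0.05)

print(R_lin.shape, R_circ.shape)   # (2, 4) and (2, 6)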

Create the Room Impulse Response

At this point, the RIRs are created by invoking the ISM via pyroomacoustics.room.Room.image_source_model(). This function will generate all the image sources up to the specified maximum order and use them to generate the RIRs, which will be stored in the rir attribute of room. The attribute rir is a list of lists such that the outer list is over microphones and the inner list is over sources.

room.compute_rir()

# plot the RIR between mic 1 and source 0
import matplotlib.pyplot as plt
plt.plot(room.rir[1][0])
plt.show()

Simulate sound propagation

By calling pyroomacoustics.room.Room.simulate(), a convolution of the signal of each source (if not None) will be performed with the corresponding room impulse response. The output from the convolutions will be summed up at the microphones. The result is stored in the signals attribute of room.mic_array with each row corresponding to one microphone.

room.simulate()

# plot signal at microphone 1
plt.plot(room.mic_array.signals[1,:])

Example

import numpy as np
import matplotlib.pyplot as plt
import pyroomacoustics as pra

# Create a 4 by 6 metres shoe box room
room = pra.ShoeBox([4,6])

# Add a source somewhere in the room
room.add_source([2.5, 4.5])

# Create a linear array beamformer with 4 microphones
# with angle 0 degrees and inter mic distance 4 cm
R = pra.linear_2D_array([2, 1.5], 4, 0, 0.04)
room.add_microphone_array(pra.Beamformer(R, room.fs))

# Now compute the delay and sum weights for the beamformer
room.mic_array.rake_delay_and_sum_weights(room.sources[0][:1])

# plot the room and resulting beamformer
room.plot(freq=[1000, 2000, 4000, 8000], img_order=0)
plt.show()

class pyroomacoustics.room.Room(walls, fs=8000, t0=0.0, max_order=1, sigma2_awgn=None, sources=None, mics=None)

Bases: object

A Room object has as attributes a collection of pyroomacoustics.wall.Wall objects, a pyroomacoustics.beamforming.MicrophoneArray array, and a list of pyroomacoustics.soundsource.SoundSource. The room can be two dimensional (2D), in which case the walls are simply line segments. A factory method pyroomacoustics.room.Room.from_corners() can be used to create the room from a polygon. In three dimensions (3D), the walls are two dimensional polygons, namely a collection of points lying on a common plane. Creating rooms in 3D is more tedious and for convenience a method pyroomacoustics.room.Room.extrude() is provided to lift a 2D room into 3D space by adding vertical walls and a parallel “ceiling”.

The Room is sub-classed by pyroomacoustics.room.ShoeBox, which creates a rectangular (2D) or parallelepipedic (3D) room. Such rooms benefit from an efficient algorithm for the image source method.

Attribute walls: (Wall array) list of walls forming the room
Attribute fs: (int) sampling frequency
Attribute t0: (float) time offset
Attribute max_order: (int) the maximum computed order for images
Attribute sigma2_awgn: (float) ambient additive white gaussian noise level
Attribute sources: (SoundSource array) list of sound sources
Attribute mics: (MicrophoneArray) array of microphones
Attribute normals: (numpy.ndarray 2xN or 3xN, N=number of walls) array containing the normal vector of each wall, used for calculations
Attribute corners: (numpy.ndarray 2xN or 3xN, N=number of walls) array containing a point belonging to each wall, used for calculations
Attribute absorption: (numpy.ndarray size N, N=number of walls) array containing the absorption factor of each wall, used for calculations
Attribute dim: (int) dimension of the room (2 or 3 meaning 2D or 3D)
Attribute wallsId: (int dictionary) stores the mapping “wall name -> wall id (in the array walls)”
add_microphone_array(micArray)
add_source(position, signal=None, delay=0)
check_visibility_for_all_images(source, p, use_libroom=True)

Checks visibility from a given point for all images of the given source.

This function tests visibility for all images of the source and returns the results in an array.

Parameters:
  • source – (SoundSource) the sound source object (containing all its images)
  • p – (np.array size 2 or 3) coordinates of the point where we check visibility
Returns:

(int array) list of visibility results for each image:
  -1 : unchecked (only during execution of the function)
   0 (False) : not visible
   1 (True) : visible

compute_rir()

Compute the room impulse response between every source and microphone

convex_hull()

Finds the walls that are not in the convex hull

direct_snr(x, source=0)

Computes the direct Signal-to-Noise Ratio

extrude(height, v_vec=None, absorption=0.0)

Creates a 3D room by extruding a 2D polygon. The polygon is typically the floor of the room and will have z-coordinate zero. The ceiling is then created as a translation of the floor along the extrusion vector.

Parameters:
  • height (float) – The extrusion height
  • v_vec (array-like 1D length 3, optional) – A unit vector. An orientation for the extrusion direction. The ceiling will be placed as a translation of the floor with respect to this vector (the default is [0, 0, 1]).
  • absorption (float or array-like) – Absorption coefficients for all the walls. If a scalar, then all the walls will have the same absorption. If an array is given, it should have as many elements as there will be walls, that is the number of vertices of the polygon plus two. The two last elements are for the floor and the ceiling, respectively (default 0).
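
As a short sketch of how from_corners() and extrude() combine (the L-shaped floor plan here is arbitrary):

import numpy as np
import pyroomacoustics as pra

# corners of an L-shaped floor plan, one per column, anti-clockwise
corners = np.array([[0, 0], [6, 0], [6, 4], [3, 4], [3, 2], [0, 2]]).T
room = pra.Room.from_corners(corners, absorption=0.2, fs=16000, max_order=4)

# lift the 2D room into 3D: floor at z=0, ceiling at z=3
room.extrude(3.)
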
first_order_images(source_position)
classmethod from_corners(corners, absorption=0.0, fs=8000, t0=0.0, max_order=1, sigma2_awgn=None, sources=None, mics=None)

Creates a 2D room by giving an array of corners.

Parameters:
  • corners – (np.array dim 2xN, N>2) list of corners, must be oriented anti-clockwise
  • absorption – (float array or float) list of absorption factor for each wall or single value for all walls
Returns:

(Room) instance of a 2D room

get_bbox()

Returns a bounding box for the room

get_wall_by_name(name)

Returns the instance of the wall by giving its name.

Parameters:name – (string) name of the wall
Returns:(Wall) instance of the wall with this name
image_source_model(use_libroom=True)
is_inside(p, include_borders=True)

Checks if the given point is inside the room.

Parameters:
  • p (array_like, length 2 or 3) – point to be tested
  • include_borders (bool, optional) – set true if a point on the wall must be considered inside the room
Returns:

True if the given point is inside the room, False otherwise.

Return type:

bool

is_obstructed(source, p, imageId=0)

Checks if there is a wall obstructing the line of sight going from a source to a point.

Parameters:
  • source – (SoundSource) the sound source (containing all its images)
  • p – (np.array size 2 or 3) coordinates of the point where we check obstruction
  • imageId – (int) id of the image within the SoundSource object
Returns:

(bool) False (0) : not obstructed; True (1) : obstructed

is_visible(source, p, imageId=0)

Returns true if the given sound source (with image source id) is visible from point p.

Parameters:
  • source – (SoundSource) the sound source (containing all its images)
  • p – (np.array size 2 or 3) coordinates of the point where we check visibility
  • imageId – (int) id of the image within the SoundSource object
Returns:

(bool) False (0) : not visible; True (1) : visible

make_c_room()

Wrapper around the C libroom

plot(img_order=None, freq=None, figsize=None, no_axis=False, mic_marker_size=10, **kwargs)

Plots the room with its walls, microphones, sources and images

plot_rir(FD=False)
print_wall_sequences(source)
simulate(recompute_rir=False)

Simulates the microphone signal at every microphone in the array

class pyroomacoustics.room.ShoeBox(p, fs=8000, t0=0.0, absorption=0.0, max_order=1, sigma2_awgn=None, sources=None, mics=None)

Bases: pyroomacoustics.room.Room

This class extends Room for shoebox rooms in 2D and 3D.

extrude(height)

Overload the extrude method from 3D rooms

Dataset Wrappers

Module contents

The datasets sub-package delivers wrappers around a few popular audio datasets to make them easier to use.

Two base classes, pyroomacoustics.datasets.base.Dataset and pyroomacoustics.datasets.base.Sample, wrap together the audio samples and their metadata. The general idea is to create a sample object with an attribute containing all metadata. Dataset objects that contain a collection of samples can then be created and filtered according to the values in the metadata.

Many of the functions with match or filter in their name take an arbitrary number of keyword arguments. The keys should match some metadata in the samples. There are then three ways that a match can occur between a key/value pair and an attribute sharing the same key.

  1. value == attribute
  2. value is a list and attribute in value == True
  3. value is a callable (a function) and value(attribute) == True

Example 1
from pyroomacoustics.datasets import Dataset, Sample

# Prepare a few artificial samples
samples = [
    {
        'data' : 0.99,
        'metadata' : { 'speaker' : 'alice', 'sex' : 'female', 'age' : 37, 'number' : 'one' },
    },
    {
        'data' : 2.1,
        'metadata' : { 'speaker' : 'alice', 'sex' : 'female', 'age' : 37, 'number' : 'two' },
    },
    {
        'data' : 1.02,
        'metadata' : { 'speaker' : 'bob', 'sex' : 'male', 'age' : 48, 'number' : 'one' },
    },
    {
        'data' : 2.07,
        'metadata' : { 'speaker' : 'bob', 'sex' : 'male', 'age' : 48, 'number' : 'two' },
    },
    ]

corpus = Dataset()
for s in samples:
    new_sample = Sample(s['data'], **s['metadata'])
    corpus.add_sample(new_sample)

# Then, it is possible to display summary info about the corpus
print(corpus)

# The number of samples in the corpus is given by ``len``
print('Number of samples:', len(corpus))

# And we can access samples with the slice operator
print('Sample #2:')
print(corpus[2])    # (shortcut for `corpus.samples[2]`)

# We can obtain a new corpus with only male subjects
corpus_male_only = corpus.filter(sex='male')
print(corpus_male_only)

# Only retain speakers above 40 years old
corpus_older = corpus.filter(age=lambda a : a > 40)
print(corpus_older)

Example 2 (CMU ARCTIC)
# This example involves the CMU ARCTIC corpus available at
# http://www.festvox.org/cmu_arctic/

import matplotlib.pyplot as plt
import pyroomacoustics as pra

# Here, the corpus for speaker bdl is automatically downloaded
# if it is not available already
corpus = pra.datasets.CMUArcticCorpus(download=True, speaker=['bdl'])

# print dataset info and 10 sentences
print(corpus)
corpus.head(n=10)

# let's extract all samples containing the word 'what'
keyword = 'what'
matches = corpus.filter(text=lambda t : keyword in t)
print('The number of sentences containing "{}": {}'.format(keyword, len(matches)))
for s in matches.sentences:
    print('  *', s)

# if the sounddevice package is available, we can play the sample
matches[0].play()

# show the spectrogram
plt.figure()
matches[0].plot()
plt.show()

Example 3 (Google’s Speech Commands Dataset)
# This example involves Google's Speech Commands Dataset available at
# https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html

import matplotlib.pyplot as plt
import pyroomacoustics as pra

# The dataset is automatically downloaded if not available and 10 of each word is selected
dataset = pra.datasets.GoogleSpeechCommands(download=True, subset=10, seed=0)

# print dataset info, first 10 entries, and all sounds
print(dataset)
dataset.head(n=10)
print("All sounds in the dataset:")
print(dataset.classes)

# filter by specific word
selected_word = 'yes'
matches = dataset.filter(word=selected_word)
print("Number of '%s' samples : %d" % (selected_word, len(matches)))

# if the sounddevice package is available, we can play the sample
matches[0].play()

# show the spectrogram
plt.figure()
matches[0].plot()
plt.show()

Datasets Available

CMU ARCTIC Corpus
The CMU ARCTIC Dataset

The CMU_ARCTIC databases were constructed at the Language Technologies Institute at Carnegie Mellon University as phonetically balanced, US English single speaker databases designed for unit selection speech synthesis research. A detailed report on the structure and content of the database, the recording environment, etc., is available as Carnegie Mellon University Language Technologies Institute Tech Report CMU-LTI-03-177.

The databases consist of around 1150 utterances carefully selected from out-of-copyright texts from Project Gutenberg. The databases include US English male (bdl) and female (slt) speakers (both experienced voice talent) as well as other accented speakers.

The 1132 sentence prompt list is available from cmuarctic.data

The distributions include 16 kHz waveforms and simultaneous EGG signals. Full phonetic labelling was performed by CMU Sphinx using the FestVox-based labelling scripts. Complete runnable Festival Voices are included with the database distributions as examples, though better voices can be made by improving the labelling, etc.

License: Permissive, attribution required

Price: Free

URL: http://www.festvox.org/cmu_arctic/

class pyroomacoustics.datasets.cmu_arctic.CMUArcticCorpus(basedir=None, download=False, build=True, **kwargs)

Bases: pyroomacoustics.datasets.base.Dataset

This class will load the CMU ARCTIC corpus in a structure amenable to processing.

basedir

str, optional – The directory where the CMU ARCTIC corpus is located/downloaded. By default, this is the current directory.

info

dict – A dictionary whose keys are the labels of metadata fields attached to the samples. The values are lists of all distinct values the field takes.

sentences

list of CMUArcticSentence – The list of all utterances in the corpus

Parameters:
  • basedir (str, optional) – The directory where the CMU ARCTIC corpus is located/downloaded. By default, this is the current directory.
  • download (bool, optional) – If the corpus does not exist, download it.
  • speaker (str or list of str, optional) – A list of the CMU ARCTIC speakers labels. If provided, only those speakers are loaded. By default, all speakers are loaded.
  • sex (str or list of str, optional) – Can be ‘female’ or ‘male’
  • lang (str or list of str, optional) – The language, only ‘English’ is available here
  • accent (str or list of str, optional) – The accent of the speaker
build_corpus(**kwargs)

Build the corpus with some filters (sex, lang, accent, sentence_tag, sentence)

filter(**kwargs)

Filter the corpus and select samples that match the criteria provided. The arguments to the keyword can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True.

  1. value == attribute
  2. value is a list and attribute in value == True
  3. value is a callable (a function) and value(attribute) == True

class pyroomacoustics.datasets.cmu_arctic.CMUArcticSentence(path, **kwargs)

Bases: pyroomacoustics.datasets.base.AudioSample

Create the sentence object

Parameters:
  • path (str) – the path to the audio file
  • **kwargs – metadata as a list of keyword arguments
data

array_like – The actual audio signal

fs

int – sampling frequency

plot(**kwargs)

Plot the spectrogram

Google Speech Commands
Google’s Speech Commands Dataset

The Speech Commands Dataset has 65,000 one-second long utterances of 30 short words, by thousands of different people, contributed by members of the public through the AIY website. It’s released under a Creative Commons BY 4.0 license.

More info about the dataset can be found at the link below:

https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html

AIY website for contributing recordings:

https://aiyprojects.withgoogle.com/open_speech_recording

Tutorial on creating a word classifier:

https://www.tensorflow.org/versions/master/tutorials/audio_recognition

class pyroomacoustics.datasets.google_speech_commands.GoogleSample(path, **kwargs)

Bases: pyroomacoustics.datasets.base.AudioSample

Create the sound object.

Parameters:
  • path (str) – the path to the audio file
  • **kwargs – metadata as a list of keyword arguments
data

array_like – the actual audio signal

fs

int – sampling frequency

plot(**kwargs)

Plot the spectrogram

class pyroomacoustics.datasets.google_speech_commands.GoogleSpeechCommands(basedir=None, download=False, build=True, subset=None, seed=0, **kwargs)

Bases: pyroomacoustics.datasets.base.Dataset

This class will load the Google Speech Commands Dataset in a structure that is convenient to process.

basedir

str – The directory where the Speech Commands Dataset is located/downloaded.

size_by_samples

dict – A dictionary whose keys are the words in the dataset. The values are the number of occurrences of each particular word.

subdirs

list – The list of subdirectories in basedir, where each sound type is the name of a subdirectory.

classes

list – The list of all sounds, same as the keys of size_by_samples.

Parameters:
  • basedir (str, optional) – The directory where the Google Speech Commands dataset is located/downloaded. By default, this is the current directory.
  • download (bool, optional) – If the corpus does not exist, download it.
  • build (bool, optional) – Whether or not to build the dataset. By default, it is.
  • subset (int, optional) – Build a dataset that contains all noise samples and subset samples per word. By default, the dataset will be built with all samples.
  • seed (int, optional) – Which seed to use for the random generator when selecting a subset of samples. By default, seed=0.
build_corpus(subset=None, **kwargs)

Build the corpus with some filters (speech or not speech, sound type).

filter(**kwargs)

Filter the dataset and select samples that match the criteria provided. The arguments to the keyword can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True.

  1. value == attribute
  2. value is a list and attribute in value == True
  3. value is a callable (a function) and value(attribute) == True

TIMIT Corpus
The TIMIT Dataset

The TIMIT corpus of read speech is designed to provide speech data for acoustic-phonetic studies and for the development and evaluation of automatic speech recognition systems. TIMIT contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences. The TIMIT corpus includes time-aligned orthographic, phonetic and word transcriptions as well as a 16-bit, 16kHz speech waveform file for each utterance. Corpus design was a joint effort among the Massachusetts Institute of Technology (MIT), SRI International (SRI) and Texas Instruments, Inc. (TI). The speech was recorded at TI, transcribed at MIT and verified and prepared for CD-ROM production by the National Institute of Standards and Technology (NIST).

The TIMIT corpus transcriptions have been hand verified. Test and training subsets, balanced for phonetic and dialectal coverage, are specified. Tabular computer-searchable information is included as well as written documentation.

Unfortunately, this is a proprietary dataset. A license can be obtained for $125 to $250 depending on your status (academic or otherwise).

Deprecation Warning: The interface of TimitCorpus will change in the near future to match that of pyroomacoustics.datasets.cmu_arctic.CMUArcticCorpus

URL: https://catalog.ldc.upenn.edu/ldc93s1

class pyroomacoustics.datasets.timit.Sentence(path)

Create the sentence object

Parameters:path ((string)) – the path to the particular sample
speaker

str – Speaker initials

id

str – a digit to disambiguate identical initials

sex

str – Speaker gender (M or F)

dialect

str – Speaker dialect region number:

  1. New England
  2. Northern
  3. North Midland
  4. South Midland
  5. Southern
  6. New York City
  7. Western
  8. Army Brat (moved around)
fs

int – sampling frequency

samples

array_like (n_samples,) – the audio track

text

str – the text of the sentence

words

list – list of Word objects forming the sentence

phonems

list – List of phonemes contained in the sentence. Each element is a dictionary containing ‘bnd’ with the limits of the phoneme and ‘name’ with the phoneme transcription.

play()

Play the sound sample

plot(L=512, hop=128, zpb=0, phonems=False, **kwargs)

class pyroomacoustics.datasets.timit.TimitCorpus(basedir)

TimitCorpus class

Parameters:
  • basedir ((string)) – The location of the TIMIT database
  • directories ((list of strings)) – The subdirectories containing the data ([‘TEST’,’TRAIN’])
  • sentence_corpus ((dict)) – A dictionary that contains a list of Sentence objects for each sub-directory
  • word_corpus ((dict)) – A dictionary that contains a list of Word objects for each sub-directory and word available in the corpus
build_corpus(sentences=None, dialect_region=None, speakers=None, sex=None)

Build the corpus

The TIMIT database structure is encoded in the directory structure:

basedir
  TEST/TRAIN
    Regional accent index (1 to 8)
      Speakers (one directory per speaker)
        Sentences (one file per sentence)

Parameters:
  • sentences ((list)) – A list containing the sentences to which we want to restrict the corpus. Example: sentences=[‘SA1’,’SA2’]
  • dialect_region ((list of int)) – A list of dialect regions to which we restrict the corpus. Example: dialect_region=[1, 4, 5]
  • speakers ((list)) – A list of speaker acronyms to which we want to restrict the corpus. Example: speakers=[‘AKS0’]
  • sex ((string)) – Restrict to a single sex: ‘F’ for female, ‘M’ for male
get_word(d, w, index=0)

return instance index of word w from group (test or train) d

class pyroomacoustics.datasets.timit.Word(word, boundaries, data, fs, phonems=None)

A class used for words of the TIMIT corpus

word

str – The spelling of the word

boundaries

list – The limits of the word within the sentence

samples

array_like – A view on the sentence samples containing the word

fs

int – The sampling frequency

phonems

list – A list of phones contained in the word

features

array_like – A feature array (e.g. MFCC coefficients)

Parameters:
  • word (str) – The spelling of the word
  • boundaries (list) – The limits of the word within the sentence
  • data (array_like) – The nd-array that contains all the samples of the sentence
  • fs (int) – The sampling frequency
  • phonems (list, optional) – A list of phones contained in the word
mfcc(frame_length=1024, hop=512)

Compute the mel-frequency cepstral coefficients of the word samples

play()

Play the sound sample

plot()

Tools and Helpers

Base Class

Base class for some data corpus and the samples it contains.

class pyroomacoustics.datasets.base.AudioSample(data, fs, **kwargs)

Bases: pyroomacoustics.datasets.base.Sample

This class adds methods to display and listen to audio samples. The sampling frequency of the samples is an extra parameter.

For multichannel audio, we assume the same format used by scipy.io.wavfile (https://docs.scipy.org/doc/scipy-0.14.0/reference/io.html#module-scipy.io.wavfile), that is, data is a 2D array with each column being a channel.

data

array_like – The actual data

fs

int – The sampling frequency of the input signal

meta

pyroomacoustics.datasets.Meta – An object containing the sample metadata, which can be accessed using the dot operator

play(**kwargs)

Play the sound sample. This function uses the sounddevice package for playback.

It takes the same keyword arguments as sounddevice.play.

plot(NFFT=512, noverlap=384, **kwargs)

Plot the spectrogram of the audio sample.

It takes the same keyword arguments as matplotlib.pyplot.specgram.
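
As a minimal sketch of creating an AudioSample directly (the metadata key speaker is arbitrary):

import numpy as np
from pyroomacoustics.datasets.base import AudioSample

fs = 16000
signal = np.random.randn(fs)   # one second of noise as a stand-in signal

sample = AudioSample(signal, fs, speaker='test')
print(sample.meta.speaker)     # metadata is accessed with the dot operator
sample.plot()                  # spectrogram via matplotlib.pyplot.specgram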

class pyroomacoustics.datasets.base.Dataset

Bases: object

The base class for a data corpus. It essentially contains a list of samples and a filter function.

samples

list – A list of all the Samples in the dataset

info

dict – This dictionary keeps track of all the fields in the metadata. The keys of the dictionary are the metadata field names. The values are again dictionaries, whose keys are the possible values taken by the metadata field and whose values are the number of samples taking this value in the corpus.

add_sample(sample)

Add a sample to the Dataset and keep track of the metadata.

add_sample_matching(sample, **kwargs)

The sample is added to the corpus only if all the keyword arguments match the metadata of the sample. The match is operated by pyroomacoustics.datasets.Meta.match.

filter(**kwargs)

Filter the corpus and select samples that match the criteria provided. The arguments to the keyword can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True.

  1. value == attribute
  2. value is a list and attribute in value == True
  3. value is a callable (a function) and value(attribute) == True

head(n=5)

Print n samples from the dataset

class pyroomacoustics.datasets.base.Meta(**attr)

Bases: object

A simple class that takes a dictionary as input and puts the values in attributes named after the keys. We use it to store metadata for the samples.

The parameters can be any set of keyword arguments. They will all be transformed into attributes of the object.

as_dict()

Returns all the attribute/value pairs of the object as a dictionary

match(**kwargs)

The key/value pairs given by the keyword arguments are compared to the attribute/value pairs of the object. If the values all match, True is returned. Otherwise False is returned. If a keyword argument has no attribute counterpart, an error is raised. Attributes that do not have a keyword argument counterpart are ignored.

There are three ways to match an attribute with keyword=value:

  1. value == attribute
  2. value is a list and attribute in value == True
  3. value is a callable (a function) and value(attribute) == True
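
A small illustration of the three match modes:

from pyroomacoustics.datasets.base import Meta

m = Meta(speaker='alice', age=37)

print(m.match(speaker='alice'))           # 1. equality -> True
print(m.match(speaker=['alice', 'bob']))  # 2. membership in a list -> True
print(m.match(age=lambda a: a > 40))      # 3. callable test -> False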

class pyroomacoustics.datasets.base.Sample(data, **kwargs)

Bases: object

The base class for a dataset sample. The idea is that different corpora will have different attributes for their samples. They should at least have a data attribute.

data

array_like – The actual data

meta

pyroomacoustics.datasets.Meta – An object containing the sample metadata, which can be accessed using the dot operator

Dataset Utilities
pyroomacoustics.datasets.utils.download_uncompress(url, path='.', compression=None)

This function downloads and uncompresses on the fly a file of type tar, tar.gz, or tar.bz2.

Parameters:
  • url (str) – The URL of the file
  • path (str, optional) – The path where to uncompress the file
  • compression (str, optional) – The compression type (one of ‘bz2’, ‘gz’, ‘tar’), inferred from url if not provided

Adaptive Filtering

Module contents

Adaptive Filter Algorithms

This sub-package provides implementations of popular adaptive filter algorithms.

RLS
Recursive Least Squares
LMS
Least Mean Squares and Normalized Least Mean Squares

All these classes derive from the base class pyroomacoustics.adaptive.adaptive_filter.AdaptiveFilter, which offers a generic way of running an adaptive filter.

The above classes are applicable for time-domain processing. For frequency-domain adaptive filtering, there is the SubbandLMS class. After a DFT or STFT block, the SubbandLMS class can be used to apply LMS or NLMS to each frequency band. A shorter adaptive filter can be used on each band, as opposed to the long filter required in the time domain. Roughly, a filter of M taps applied to each of B bands corresponds to a time-domain filter with N = M x B taps.

How to use the adaptive filter module

First, an adaptive filter object is created and all the relevant options can be set (step size, regularization, etc). Then, the update function is repeatedly called to provide new samples to the algorithm.

import numpy as np
import pyroomacoustics as pra

# create a known filter and a noisy stream of samples to identify it from
w = np.random.randn(30)
x = np.random.randn(100)
d = np.convolve(x, w)[:100] + 0.001 * np.random.randn(100)

# initialize the filter
rls = pra.adaptive.RLS(30)

# run the filter on a stream of samples
for i in range(100):
    rls.update(x[i], d[i])

# the reconstructed filter is available
print('Reconstructed filter:', rls.w)

The SubbandLMS class has the same methods as the time domain approaches. However, the signal must be in the frequency domain. This can be done with the STFT block in the transform sub-package of pyroomacoustics.

import numpy as np
import pyroomacoustics as pra

# initialize STFT and SubbandLMS blocks
block_size = 128
stft_x = pra.transform.STFT(N=block_size,
    hop=block_size//2,
    analysis_window=pra.hann(block_size))
stft_d = pra.transform.STFT(N=block_size,
    hop=block_size//2,
    analysis_window=pra.hann(block_size))
nlms = pra.adaptive.SubbandLMS(num_taps=6,
    num_bands=block_size//2+1, mu=0.5, nlms=True)

# preparing input and reference signals
...

# apply block-by-block
for n in range(num_blocks):

    # obtain block
    ...

    # to frequency domain
    stft_x.analysis(x_block)
    stft_d.analysis(d_block)
    nlms.update(stft_x.X, stft_d.X)

    # estimating input convolved with unknown response
    y_hat = stft_d.synthesis(np.diag(np.dot(nlms.W.conj().T,stft_x.X)))

    # AEC output
    E = stft_d.X - np.diag(np.dot(nlms.W.conj().T,stft_x.X))
    out = stft_d.synthesis(E)

Other Available Subpackages

pyroomacoustics.adaptive.data_structures
this provides a few data structures (a ring buffer, a coin flipper, and cached powers of a number) used by the adaptive filtering routines
pyroomacoustics.adaptive.util
a few methods mainly to efficiently manipulate Toeplitz and Hankel matrices

Utilities

pyroomacoustics.adaptive.algorithms
a dictionary containing all the adaptive filter object subclasses available, indexed by the keys ['RLS', 'BlockRLS', 'BlockLMS', 'NLMS', 'SubbandLMS']

Algorithms

Adaptive Filter (Base)
class pyroomacoustics.adaptive.adaptive_filter.AdaptiveFilter(length)

The dummy base class of an adaptive filter. This class doesn’t compute anything. It merely stores values in a buffer. It is used as a template for all other algorithms.

name()
reset()

Reset the state of the adaptive filter

update(x_n, d_n)

Updates the adaptive filter with a new sample

Parameters:
  • x_n (float) – the new input sample
  • d_n (float) – the new noisy reference signal

Least Mean Squares
Least Mean Squares Family

Implementations of adaptive filters from the LMS class. These algorithms have a low complexity and reliable behavior with a somewhat slower convergence.

class pyroomacoustics.adaptive.lms.BlockLMS(length, mu=0.01, L=1, nlms=False)

Bases: pyroomacoustics.adaptive.lms.NLMS

Implementation of the least mean squares algorithm (LMS) in its block form

Parameters:
  • length (int) – the length of the filter
  • mu (float, optional) – the step size (default 0.01)
  • L (int, optional) – block size (default is 1)
  • nlms (bool, optional) – whether or not to normalize as in NLMS (default is False)
reset()

Reset the state of the adaptive filter

update(x_n, d_n)

Updates the adaptive filter with a new sample

Parameters:
  • x_n (float) – the new input sample
  • d_n (float) – the new noisy reference signal
class pyroomacoustics.adaptive.lms.NLMS(length, mu=0.5)

Bases: pyroomacoustics.adaptive.adaptive_filter.AdaptiveFilter

Implementation of the normalized least mean squares algorithm (NLMS)

Parameters:
  • length (int) – the length of the filter
  • mu (float, optional) – the step size (default 0.5)
update(x_n, d_n)

Updates the adaptive filter with a new sample

Parameters:
  • x_n (float) – the new input sample
  • d_n (float) – the new noisy reference signal

Recursive Least Squares
Recursive Least Squares Family

Implementations of adaptive filters from the RLS class. These algorithms typically have a higher computational complexity, but a faster convergence.

class pyroomacoustics.adaptive.rls.BlockRLS(length, lmbd=0.999, delta=10, dtype=np.float32, L=None)

Bases: pyroomacoustics.adaptive.rls.RLS

Block implementation of the recursive least-squares (RLS) algorithm. The difference with the vanilla implementation is that chunks of the input signals are processed in batch and some savings can be made there.

Parameters:
  • length (int) – the length of the filter
  • lmbd (float, optional) – the exponential forgetting factor (default 0.999)
  • delta (float, optional) – the regularization term (default 10)
  • dtype (numpy type) – the bit depth of the numpy arrays to use (default np.float32)
  • L (int, optional) – the block size (default to length)
reset()

Reset the state of the adaptive filter

update(x_n, d_n)

Updates the adaptive filter with a new sample

Parameters:
  • x_n (float) – the new input sample
  • d_n (float) – the new noisy reference signal
class pyroomacoustics.adaptive.rls.RLS(length, lmbd=0.999, delta=10, dtype=np.float32)

Bases: pyroomacoustics.adaptive.adaptive_filter.AdaptiveFilter

Implementation of the exponentially weighted Recursive Least Squares (RLS) adaptive filter algorithm.

Parameters:
  • length (int) – the length of the filter
  • lmbd (float, optional) – the exponential forgetting factor (default 0.999)
  • delta (float, optional) – the regularization term (default 10)
  • dtype (numpy type) – the bit depth of the numpy arrays to use (default np.float32)
reset()

Reset the state of the adaptive filter

update(x_n, d_n)

Updates the adaptive filter with a new sample

Parameters:
  • x_n (float) – the new input sample
  • d_n (float) – the new noisy reference signal

Subband LMS
pyroomacoustics.adaptive.subband_lms.Hermitian(X)

Compute and return Hermitian transpose

class pyroomacoustics.adaptive.subband_lms.SubbandLMS(num_taps, num_bands, mu=0.5, nlms=True)

Frequency domain implementation of LMS. Adaptive filter for each subband.

Parameters:
  • num_taps (int) – length of the filter
  • num_bands (int) – number of frequency bands, i.e. number of filters
  • mu (float, optional) – step size for each subband (default 0.5)
  • nlms (bool, optional) – whether or not to normalize as in NLMS (default is True)
reset()
update(X_n, D_n)

Updates the adaptive filters for each subband with the new block of input data.

Parameters:
  • X_n (numpy array, float) – new input signal (to unknown system) in frequency domain
  • D_n (numpy array, float) – new noisy reference signal in frequency domain

Tools and Helpers

Data Structures
class pyroomacoustics.adaptive.data_structures.Buffer(length=20, dtype=np.float32)

A simple buffer class with amortized cost

Parameters:
  • length (int) – buffer length
  • dtype (numpy.type) – data type
flush(n)

Removes the n oldest elements in the buffer

push(val)

Add one element at the front of the buffer

size()

Returns the number of elements in the buffer

top(n)

Returns the n elements at the front of the buffer from newest to oldest
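
A small usage sketch of the buffer:

from pyroomacoustics.adaptive.data_structures import Buffer

buf = Buffer(length=8)
for v in [1., 2., 3.]:
    buf.push(v)

print(buf.top(2))   # the two newest elements, newest first
buf.flush(1)        # remove the oldest element
print(buf.size())   # two elements remain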

class pyroomacoustics.adaptive.data_structures.CoinFlipper(p, length=10000)

This class efficiently generates large numbers of coin flips. Because each call to numpy.random.rand is a little bit costly, it is more efficient to generate many values at once. This class does this and stores the values in advance, generating fresh numbers when needed.

Parameters:
  • p (float, 0 < p < 1) – probability to output a 1
  • length (int) – the number of flips to precompute
flip(n)

Get n random binary values from the buffer

flip_all()

Regenerates all the used up values

fresh_flips(n)

Generates n binary random values now

class pyroomacoustics.adaptive.data_structures.Powers(a, length=20, dtype=np.float32)

This class allows one to store all powers of a small number and to retrieve them ‘a la numpy’ with the bracket operator. The set of stored powers is automatically extended when new values are requested.

Parameters:
  • a (float) – the number
  • length (int) – the number of integer powers
  • dtype (numpy.type, optional) – the data type (typically np.float32 or np.float64)

Example

>>> an = Powers(0.5)
>>> print(an[4])
0.0625

Utilities
pyroomacoustics.adaptive.util.autocorr(x)

Fast autocorrelation computation using the FFT

pyroomacoustics.adaptive.util.hankel_multiplication(c, r, A, mkl=True, **kwargs)

Compute numpy.dot(scipy.linalg.hankel(c,r=r), A) using the FFT.

Parameters:
  • c (ndarray) – the first column of the Hankel matrix
  • r (ndarray) – the last row of the Hankel matrix
  • A (ndarray) – the matrix to multiply on the right
  • mkl (bool, optional) – if True, use the mkl_fft package if available
pyroomacoustics.adaptive.util.hankel_stride_trick(x, shape)

Make a Hankel matrix from a vector using stride tricks

Parameters:
  • x (ndarray) – a vector that contains the concatenation of the first column and first row of the Hankel matrix to build without repetition of the lower left corner value of the matrix
  • shape (tuple) – the shape of the Hankel matrix to build, it must satisfy x.shape[0] == shape[0] + shape[1] - 1
pyroomacoustics.adaptive.util.mkl_toeplitz_multiplication(c, r, A, A_padded=False, out=None, fft_len=None)

Compute numpy.dot(scipy.linalg.toeplitz(c,r), A) using the FFT from the mkl_fft package.

Parameters:
  • c (ndarray) – the first column of the Toeplitz matrix
  • r (ndarray) – the first row of the Toeplitz matrix
  • A (ndarray) – the matrix to multiply on the right
  • A_padded (bool, optional) – the A matrix can be pre-padded with zeros by the user, if this is the case set to True
  • out (ndarray, optional) – an ndarray to store the output of the multiplication
  • fft_len (int, optional) – specify the length of the FFT to use
pyroomacoustics.adaptive.util.naive_toeplitz_multiplication(c, r, A)

Compute numpy.dot(scipy.linalg.toeplitz(c,r), A)

Parameters:
  • c (ndarray) – the first column of the Toeplitz matrix
  • r (ndarray) – the first row of the Toeplitz matrix
  • A (ndarray) – the matrix to multiply on the right
pyroomacoustics.adaptive.util.toeplitz_multiplication(c, r, A, **kwargs)

Compute numpy.dot(scipy.linalg.toeplitz(c,r), A) using the FFT.

Parameters:
  • c (ndarray) – the first column of the Toeplitz matrix
  • r (ndarray) – the first row of the Toeplitz matrix
  • A (ndarray) – the matrix to multiply on the right
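
A quick sanity check of the FFT-based product against the naive dense computation (assuming both functions can be imported from pyroomacoustics.adaptive.util as documented here):

import numpy as np
from pyroomacoustics.adaptive.util import (
    naive_toeplitz_multiplication, toeplitz_multiplication)

c = np.random.randn(6)               # first column of the Toeplitz matrix
r = np.r_[c[0], np.random.randn(4)]  # first row, r[0] must equal c[0]
A = np.random.randn(5, 3)            # the matrix to multiply, len(r) rows

assert np.allclose(
    toeplitz_multiplication(c, r, A),
    naive_toeplitz_multiplication(c, r, A),
)
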
pyroomacoustics.adaptive.util.toeplitz_opt_circ_approx(r, matrix=False)

Optimal circulant approximation of a symmetric Toeplitz matrix by Tony F. Chan

Parameters:
  • r (ndarray) – the first row of the symmetric Toeplitz matrix
  • matrix (bool, optional) – if True, the full symmetric Toeplitz matrix is returned; otherwise, only the first column
pyroomacoustics.adaptive.util.toeplitz_strang_circ_approx(r, matrix=False)

Circulant approximation to a symmetric Toeplitz matrix by Gil Strang

Parameters:
  • r (ndarray) – the first row of the symmetric Toeplitz matrix
  • matrix (bool, optional) – if True, the full symmetric Toeplitz matrix is returned; otherwise, only the first column

Blind Source Separation

Module contents

Blind Source Separation

Implementations of a few blind source separation (BSS) algorithms.

AuxIVA
Independent Vector Analysis [1]
Trinicon

A few commonly used functions, such as projection back, can be found in pyroomacoustics.bss.common.

References

[1] N. Ono, Stable and fast update rules for independent vector analysis based on auxiliary function technique, Proc. IEEE WASPAA, 2011.
[2] R. Aichner, H. Buchner, F. Yan, and W. Kellermann, A real-time blind source separation scheme and its application to reverberant and noisy acoustic environments, Signal Processing, 86(6), 1260-1277, doi:10.1016/j.sigpro.2005.06.022, 2006.

Algorithms

Independent Vector Analysis (AuxIVA)

Blind Source Separation using Independent Vector Analysis with Auxiliary Function

2018 (c) Robin Scheibler, MIT License

pyroomacoustics.bss.auxiva.auxiva(X, n_src=None, n_iter=20, proj_back=True, W0=None, f_contrast=None, f_contrast_args=[], return_filters=False, callback=None)

Implementation of AuxIVA algorithm for BSS presented in

N. Ono, Stable and fast update rules for independent vector analysis based on auxiliary function technique, Proc. IEEE, WASPAA, 2011.

Parameters:
  • X (ndarray (nframes, nfrequencies, nchannels)) – STFT representation of the signal
  • n_src (int, optional) – The number of sources or independent components
  • n_iter (int, optional) – The number of iterations (default 20)
  • proj_back (bool, optional) – Scaling on first mic by back projection (default True)
  • W0 (ndarray (nfrequencies, nchannels, nchannels), optional) – Initial value for demixing matrix
  • f_contrast (dict of functions) – A dictionary with two elements, ‘f’ and ‘df’, containing the contrast function (taking 3 arguments) and its derivative. These should be ufuncs acting element-wise on any array
  • return_filters (bool) – If true, the function will return the demixing matrix too
  • callback (func) – A callback function called every 10 iterations, allowing one to monitor convergence
Returns:

Returns an (nframes, nfrequencies, nsources) array. Also returns the demixing matrix (nfrequencies, nchannels, nsources) if the return_filters keyword is True.
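
A minimal usage sketch follows; the random two-channel mixture is a placeholder for real microphone signals, and the STFT is hand-rolled here only to obtain the documented (nframes, nfrequencies, nchannels) layout:

import numpy as np
import pyroomacoustics as pra

# placeholder two-channel mixture
n_chan, n_samples = 2, 16000
mics = np.random.randn(n_chan, n_samples)

# simple STFT giving the (nframes, nfrequencies, nchannels) layout
nfft, hop = 1024, 512
win = np.hanning(nfft)
X = np.array([
    np.fft.rfft(mics[:, i:i + nfft] * win, axis=1).T
    for i in range(0, n_samples - nfft, hop)
])

# Y has shape (nframes, nfrequencies, nsources)
Y = pra.bss.auxiva(X, n_iter=20, proj_back=True)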

Trinicon
pyroomacoustics.bss.trinicon.trinicon(signals, w0=None, filter_length=2048, block_length=None, n_blocks=8, alpha_on=None, j_max=10, delta_max=0.0001, sigma2_0=1e-07, mu=0.001, lambd_a=0.2, return_filters=False)

Implementation of the TRINICON Blind Source Separation algorithm as described in

R. Aichner, H. Buchner, F. Yan, and W. Kellermann, A real-time blind source separation scheme and its application to reverberant and noisy acoustic environments, Signal Processing, 86(6), 1260-1277. doi:10.1016/j.sigpro.2005.06.022, 2006.

Specifically, this is an adaptation of the pseudo-code from Table 1.

The implementation is hard-coded for 2 output channels.

Parameters:
  • signals (ndarray (nchannels, nsamples)) – The microphone input signals (time domain)
  • w0 (ndarray (nchannels, nsources, nsamples), optional) – Optional initial value for the demixing filters
  • filter_length (int, optional) – The length of the demixing filters, if w0 is provided, this option is ignored
  • block_length (int, optional) – Block length (default 2x filter_length)
  • n_blocks (int, optional) – Number of blocks processed at once (default 8)
  • alpha_on (int, optional) – Online overlap factor (default n_blocks)
  • j_max (int, optional) – Number of offline iterations (default 10)
  • delta_max (float, optional) – Regularization parameter, this sets the maximum value of the regularization term (default 1e-4)
  • sigma2_0 (float, optional) – Regularization parameter, this sets the reference (machine?) noise level in the regularization (default 1e-7)
  • mu (float, optional) – Offline update step size (default 0.001)
  • lambd_a (float, optional) – Online forgetting factor (default 0.2)
  • return_filters (bool) – If true, the function will return the demixing matrix too (default False)
Returns:

Returns an (nsources, nsamples) array. Also returns the demixing matrix (nchannels, nsources, nsamples) if return_filters keyword is True.

Return type:

ndarray
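
A minimal usage sketch (the random stereo mixture stands in for real recordings):

import numpy as np
import pyroomacoustics as pra

# placeholder two-channel time-domain mixture
signals = np.random.randn(2, 32000)

# y has shape (nsources, nsamples); pass return_filters=True to also
# obtain the demixing filters
y = pra.bss.trinicon(signals, filter_length=512)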

Common Tools

Common Functions used in BSS algorithms

2018 (c) Robin Scheibler, MIT License

pyroomacoustics.bss.common.projection_back(Y, ref, clip_up=None, clip_down=None)

This function computes the frequency-domain filter that minimizes the squared error to a reference signal. This is commonly used to solve the scale ambiguity in BSS.

The optimal filter z minimizes the squared error.

\[\min E[|z^* y - x|^2]\]

It should thus satisfy the orthogonality condition and can be derived as follows

\[ \begin{align}\begin{aligned}0 & = E[y^*\, (z^* y - x)]\\0 & = z^*\, E[|y|^2] - E[y^* x]\\z^* & = \frac{E[y^* x]}{E[|y|^2]}\\z & = \frac{E[y x^*]}{E[|y|^2]}\end{aligned}\end{align} \]

In practice, the expectations are replaced by the sample mean.

Parameters:
  • Y (array_like (n_frames, n_bins, n_channels)) – The STFT data to project back on the reference signal
  • ref (array_like (n_frames, n_bins)) – The reference signal
  • clip_up (float, optional) – Limits the maximum value of the gain (default no limit)
  • clip_down (float, optional) – Limits the minimum value of the gain (default no limit)
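
A minimal sketch on toy data; the assumption that the function returns one complex gain per (bin, channel), applied via its conjugate (as for the proj_back option above), should be checked against the source:

import numpy as np
from pyroomacoustics.bss.common import projection_back

# toy STFT data: 100 frames, 257 bins, 2 channels
Y = np.random.randn(100, 257, 2) + 1j * np.random.randn(100, 257, 2)
ref = Y[:, :, 0].copy()

# z holds one complex gain per (bin, channel)
z = projection_back(Y, ref)
Y_scaled = Y * np.conj(z[None, :, :])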

Direction of Arrival

Module contents

Direction of Arrival Finding

This sub-package provides implementations of popular direction of arrival finding algorithms.

MUSIC
Multiple Signal Classification [1]
SRP-PHAT
Steered Response Power – Phase Transform [2]
CSSM
Coherent Signal Subspace Method [3]
WAVES
Weighted Average of Signal Subspaces [4]
TOPS
Test of Orthogonality of Projected Subspaces [5]
FRIDA
Finite Rate of Innovation Direction of Arrival [6]

All these classes derive from the abstract base class pyroomacoustics.doa.doa.DOA that offers generic methods for finding and visualizing the locations of acoustic sources.

The constructor can be called once to build the DOA finding object. Then, the method pyroomacoustics.doa.doa.DOA.locate_sources performs DOA finding based on the time-frequency snapshots passed to it as an argument. Extra arguments can be supplied to indicate which frequency bands should be used for localization.

How to use the DOA module

Here R is a 2xQ ndarray that contains the locations of the Q microphones in the columns, fs is the sampling frequency of the input signal, and nfft the length of the FFT used.

The STFT snapshots are passed to the localization methods in the X ndarray of shape Q x (nfft // 2 + 1) x n_snapshots, where n_snapshots is the number of STFT frames to use for the localization. The option freq_bins can be provided to specify which frequency bins to use for the localization.

>>> doa = pyroomacoustics.doa.MUSIC(R, fs, nfft)
>>> doa.locate_sources(X, freq_bins=np.arange(20, 40))
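
A more complete sketch, from array geometry to localization, might look as follows; the random recordings are placeholders, and reading the estimates from the azimuth_recon attribute is an assumption based on the package’s examples:

import numpy as np
import pyroomacoustics as pra

fs, nfft = 16000, 256

# three microphones on a 5 cm circle in the x-y plane (R is 2 x Q)
R = pra.circular_2D_array(center=[0.0, 0.0], M=3, phi0=0.0, radius=0.05)

# placeholder recordings, one row per microphone
signals = np.random.randn(3, fs)

# Q x (nfft // 2 + 1) x n_snapshots tensor of STFT snapshots
hop = nfft // 2
X = np.array([
    np.array([
        np.fft.rfft(sig[i:i + nfft] * np.hanning(nfft))
        for i in range(0, len(sig) - nfft, hop)
    ]).T
    for sig in signals
])

doa = pra.doa.MUSIC(R, fs, nfft)
doa.locate_sources(X, freq_bins=np.arange(20, 40))
print(doa.azimuth_recon)  # estimated azimuths in radians
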
Other Available Subpackages
pyroomacoustics.doa.grid
this provides abstractions for computing functions on regular or irregular grids defined on circles and spheres with peak finding methods
pyroomacoustics.doa.plotters
a few methods to plot functions and points on circles or spheres
pyroomacoustics.doa.detect_peaks
1D peak detection routine from Marcos Duarte
pyroomacoustics.doa.tools_frid_doa_plane
routines implementing FRIDA algorithm
Utilities
pyroomacoustics.doa.algorithms
a dictionary containing all the DOA object subclasses available, indexed by the keys ['MUSIC', 'SRP', 'CSSM', 'WAVES', 'TOPS', 'FRIDA']

References

[1]R. Schmidt, Multiple emitter location and signal parameter estimation, IEEE Trans. Antennas Propag., Vol. 34, Num. 3, pp 276–280, 1986
[2]J. H. DiBiase, A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays, PHD Thesis, Brown University, 2000
[3]H. Wang, M. Kaveh, Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources, IEEE Trans. Acoust., Speech, Signal Process., Vol. 33, Num. 4, pp 823–831, 1985
[4]E. D. di Claudio, R. Parisi, WAVES: Weighted average of signal subspaces for robust wideband direction finding, IEEE Trans. Signal Process., Vol. 49, Num. 10, 2179–2191, 2001
[5]Y. Yeo-Sun, L. M. Kaplan, J. H. McClellan, TOPS: New DOA estimator for wideband signals, IEEE Trans. Signal Process., Vol. 54, Num 6., pp 1977–1989, 2006
[6]H. Pan, R. Scheibler, E. Bezzam, I. Dokmanic, and M. Vetterli, FRIDA: FRI-based DOA estimation for arbitrary array layouts, Proc. ICASSP, pp 3186-3190, 2017

Algorithms

CSSM
class pyroomacoustics.doa.cssm.CSSM(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, num_iter=5, **kwargs)

Bases: pyroomacoustics.doa.music.MUSIC

Class to apply the Coherent Signal-Subspace method [CSSM] for Direction of Arrival (DoA) estimation.

Note

Run locate_sources() to apply the CSSM algorithm.

Parameters:
  • L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
  • fs (float) – Sampling frequency.
  • nfft (int) – FFT length.
  • c (float) – Speed of sound. Default: 343 m/s
  • num_src (int) – Number of sources to detect. Default: 1
  • mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
  • r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
  • azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
  • colatitude (numpy array) – Candidate colatitude angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
  • num_iter (int) – Number of iterations for CSSM. Default: 5

References

[CSSM]H. Wang, M. Kaveh, Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources, IEEE Trans. Acoust., Speech, Signal Process., Vol. 33, Num. 4, pp 823–831, 1985
FRIDA
class pyroomacoustics.doa.frida.FRIDA(L, fs, nfft, max_four=None, c=343.0, num_src=1, G_iter=None, max_ini=5, n_rot=1, max_iter=50, noise_level=1e-10, low_rank_cleaning=False, stopping='max_iter', stft_noise_floor=0.0, stft_noise_margin=1.5, signal_type='visibility', use_lu=True, verbose=False, symb=True, use_cache=False, **kwargs)

Bases: pyroomacoustics.doa.doa.DOA

Implements the FRI-based direction of arrival finding algorithm [FRIDA].

Note

Run locate_sources() to apply the FRIDA algorithm.

Parameters:
  • L (ndarray) – Contains the locations of the microphones in the columns
  • fs (int or float) – Sampling frequency
  • nfft (int) – FFT size
  • max_four (int) – Maximum order of the Fourier or spherical harmonics expansion
  • c (float, optional) – Speed of sound
  • num_src (int, optional) – The number of sources to recover (default 1)
  • G_iter (int) – Number of mapping matrix refinement iterations in recovery algorithm (default 1)
  • max_ini (int, optional) – Number of random initializations to use in recovery algorithm (default 5)
  • n_rot (int, optional) – Number of random rotations to apply before recovery algorithm (default 1)
  • noise_level (float, optional) – Noise level in the visibility measurements, if available (default 1e-10)
  • stopping (str, optional) – Stopping criterion for the recovery algorithm. Can be max iterations or noise level (default max_iter)
  • stft_noise_floor (float) – The noise floor in the STFT measurements, if available (default 0)
  • stft_noise_margin (float) – When this, along with stft_noise_floor is set, we only pick frames with at least stft_noise_floor * stft_noise_margin power
  • signal_type (str) –

    Which type of measurements to use:

    • ’visibility’: Cross correlation measurements
    • ’raw’: Microphone signals
  • use_lu (bool, optional) – Whether to use LU decomposition for efficiency
  • verbose (bool, optional) – Whether to output intermediate results for debugging purposes
  • symb (bool, optional) – Whether to enforce the symmetry on the reconstructed uniform samples of sinusoids b

References

[FRIDA]H. Pan, R. Scheibler, E. Bezzam, I. Dokmanic, and M. Vetterli, FRIDA: FRI-based DOA estimation for arbitrary array layouts, Proc. ICASSP, pp 3186-3190, 2017
MUSIC
class pyroomacoustics.doa.music.MUSIC(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, **kwargs)

Bases: pyroomacoustics.doa.doa.DOA

Class to apply MUltiple SIgnal Classification (MUSIC) direction-of-arrival (DoA) for a particular microphone array.

Note

Run locate_sources() to apply the MUSIC algorithm.

Parameters:
  • L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
  • fs (float) – Sampling frequency.
  • nfft (int) – FFT length.
  • c (float) – Speed of sound. Default: 343 m/s
  • num_src (int) – Number of sources to detect. Default: 1
  • mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
  • r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
  • azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
  • colatitude (numpy array) – Candidate colatitude angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
plot_individual_spectrum()

Plot the steered response for each frequency.

SRP-PHAT
class pyroomacoustics.doa.srp.SRP(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, **kwargs)

Bases: pyroomacoustics.doa.doa.DOA

Class to apply Steered Response Power (SRP) direction-of-arrival (DoA) for a particular microphone array.

Note

Run locate_sources() to apply the SRP-PHAT algorithm.

Parameters:
  • L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
  • fs (float) – Sampling frequency.
  • nfft (int) – FFT length.
  • c (float) – Speed of sound. Default: 343 m/s
  • num_src (int) – Number of sources to detect. Default: 1
  • mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
  • r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
  • azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
  • colatitude (numpy array) – Candidate colatitude angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
TOPS
class pyroomacoustics.doa.tops.TOPS(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, **kwargs)

Bases: pyroomacoustics.doa.music.MUSIC

Class to apply Test of Orthogonality of Projected Subspaces [TOPS] for Direction of Arrival (DoA) estimation.

Note

Run locate_sources() to apply the TOPS algorithm.

Parameters:
  • L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
  • fs (float) – Sampling frequency.
  • nfft (int) – FFT length.
  • c (float) – Speed of sound. Default: 343 m/s
  • num_src (int) – Number of sources to detect. Default: 1
  • mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
  • r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
  • azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
  • colatitude (numpy array) – Candidate colatitude angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)

References

[TOPS]Y. Yeo-Sun, L. M. Kaplan, J. H. McClellan, TOPS: New DOA estimator for wideband signals, IEEE Trans. Signal Process., Vol. 54, Num 6., pp 1977–1989, 2006
WAVES
class pyroomacoustics.doa.waves.WAVES(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, num_iter=5, **kwargs)

Bases: pyroomacoustics.doa.music.MUSIC

Class to apply Weighted Average of Signal Subspaces [WAVES] for Direction of Arrival (DoA) estimation.

Note

Run locate_sources() to apply the WAVES algorithm.

Parameters:
  • L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
  • fs (float) – Sampling frequency.
  • nfft (int) – FFT length.
  • c (float) – Speed of sound. Default: 343 m/s
  • num_src (int) – Number of sources to detect. Default: 1
  • mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
  • r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
  • azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
  • colatitude (numpy array) – Candidate colatitude angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
  • num_iter (int) – Number of iterations for WAVES. Default: 5

References

[WAVES]E. D. di Claudio, R. Parisi, WAVES: Weighted average of signal subspaces for robust wideband direction finding, IEEE Trans. Signal Process., Vol. 49, Num. 10, 2179–2191, 2001

Tools and Helpers

DOA (Base)
class pyroomacoustics.doa.doa.DOA(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, n_grid=None, dim=2, *args, **kwargs)

Bases: object

Abstract parent class for Direction of Arrival (DoA) algorithms. After creating an object (SRP, MUSIC, CSSM, WAVES, or TOPS), run locate_sources to apply the corresponding algorithm.

Parameters:
  • L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
  • fs (float) – Sampling frequency.
  • nfft (int) – FFT length.
  • c (float) – Speed of sound. Default: 343 m/s
  • num_src (int) – Number of sources to detect. Default: 1
  • mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
  • r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
  • azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
  • colatitude (numpy array) – Candidate colatitude angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
  • n_grid (int) – If azimuth and colatitude are not specified, a grid with this many points is created. Default: 360.
  • dim (int) – The dimension of the problem. Set dim=2 to find sources on the circle (x-y plane). Set dim=3 to search on the whole sphere.
locate_sources(X, num_src=None, freq_range=[500.0, 4000.0], freq_bins=None, freq_hz=None)

Locate source(s) using corresponding algorithm.

Parameters:
  • X (numpy array) – Set of signals in the frequency (RFFT) domain for current frame. Size should be M x F x S, where M should correspond to the number of microphones, F to nfft/2+1, and S to the number of snapshots (user-defined). It is recommended to have S >> M.
  • num_src (int) – Number of sources to detect. Default is value given to object constructor.
  • freq_range (list of floats, length 2) – Frequency range on which to run DoA: [fmin, fmax].
  • freq_bins (list of int) – List of individual frequency bins on which to run DoA. If defined by the user, freq_range and freq_hz are ignored.
  • freq_hz (list of floats) – List of individual frequencies on which to run DoA. If defined by the user, freq_range is ignored.
polar_plt_dirac(azimuth_ref=None, alpha_ref=None, save_fig=False, file_name=None, plt_dirty_img=True)

Generate polar plot of DoA results.

Parameters:
  • azimuth_ref (numpy array) – True direction of sources (in radians).
  • alpha_ref (numpy array) – Estimated amplitude of sources.
  • save_fig (bool) – Whether or not to save figure as pdf.
  • file_name (str) – Name of file (if saved). Default is ‘polar_recon_dirac.pdf’
  • plt_dirty_img (bool) – Whether or not to plot the spatial spectrum, or the ‘dirty image’ in the case of FRI.
class pyroomacoustics.doa.doa.ModeVector(L, fs, nfft, c, grid, mode='far', precompute=False)

Bases: object

This is a class for look-up tables of mode vectors. The look-up table is an outer product of three vectors running along candidate locations, time, and frequency. When the grid becomes large, the look-up table might be too large to store in memory. In that case, this class allows computing the outer product elements only when needed, keeping only the three vectors in memory. When the table is small, the precompute option can be set to True to compute the whole table in advance.

Tools for FRIDA (azimuth only)
pyroomacoustics.doa.tools_fri_doa_plane.Rmtx_ri(coef_ri, K, D, L)

Split T matrix in real/imaginary representation

pyroomacoustics.doa.tools_fri_doa_plane.Rmtx_ri_half(coef_half, K, D, L, D_coef)

Split T matrix in real/imaginary conjugate symmetric representation

pyroomacoustics.doa.tools_fri_doa_plane.Rmtx_ri_half_out_half(coef_half, K, D, L, D_coef, mtx_shrink)

If both b and the annihilating filter coefficients are Hermitian symmetric, then the output will also be Hermitian symmetric, so the effective output is half the size

pyroomacoustics.doa.tools_fri_doa_plane.Tmtx_ri(b_ri, K, D, L)

build convolution matrix associated with b_ri

Parameters:
  • b_ri – a real-valued vector
  • K – number of Diracs
  • D1 – expansion matrix for the real-part
  • D2 – expansion matrix for the imaginary-part
pyroomacoustics.doa.tools_fri_doa_plane.Tmtx_ri_half(b_ri, K, D, L, D_coef)

Split T matrix in conjugate symmetric representation

pyroomacoustics.doa.tools_fri_doa_plane.Tmtx_ri_half_out_half(b_ri, K, D, L, D_coef, mtx_shrink)

If both b and the annihilating filter coefficients are Hermitian symmetric, then the output will also be Hermitian symmetric, so the effective output is half the size

pyroomacoustics.doa.tools_fri_doa_plane.build_mtx_amp(phi_k, p_mic_x, p_mic_y)

the matrix that maps Diracs’ amplitudes to the visibility

Parameters:
  • phi_k – Diracs’ location (azimuth)
  • p_mic_x – a vector that contains microphones’ x-coordinates
  • p_mic_y – a vector that contains microphones’ y-coordinates
pyroomacoustics.doa.tools_fri_doa_plane.build_mtx_amp_ri(p_mic_x, p_mic_y, phi_k)

builds real/imaginary amplitude matrix

pyroomacoustics.doa.tools_fri_doa_plane.build_mtx_raw_amp(p_mic_x, p_mic_y, phi_k)

the matrix that maps Diracs’ amplitudes to the visibility

Parameters:
  • phi_k – Diracs’ location (azimuth)
  • p_mic_x – a vector that contains microphones’ x-coordinates
  • p_mic_y – a vector that contains microphones’ y-coordinates
pyroomacoustics.doa.tools_fri_doa_plane.coef_expan_mtx(K)

expansion matrix for an annihilating filter of size K + 1

Parameters:K – number of Diracs. The filter size is K + 1
pyroomacoustics.doa.tools_fri_doa_plane.compute_b(G_lst, GtG_lst, beta_lst, Rc0, num_bands, a_ri, use_lu=False, GtG_inv_lst=None)

compute the uniform sinusoidal samples b from the updated annihilating filter coefficients.

Parameters:
  • GtG_lst – list of G^H G for different subbands
  • beta_lst – list of beta-s for different subbands
  • Rc0 – right-dual matrix, here it is the convolution matrix associated with c
  • num_bands – number of bands
  • a_ri – a 2D numpy array. each column corresponds to the measurements within a subband
pyroomacoustics.doa.tools_fri_doa_plane.compute_mtx_obj(GtG_lst, Tbeta_lst, Rc0, num_bands, K)
compute the matrix M in the objective function:

\[\min_c \; c^H M c \quad \text{s.t.} \quad c_0^H c = 1\]

Parameters:
  • GtG_lst – list of G^H * G
  • Tbeta_lst – list of Toeplitz matrices for the beta-s
  • Rc0 – right dual matrix for the annihilating filter (same for each block, so not a list)
pyroomacoustics.doa.tools_fri_doa_plane.compute_obj_val(GtG_inv_lst, Tbeta_lst, Rc0, c_ri_half, num_bands, K)

compute the fitting error. CAUTION: Here we assume use_lu = True

pyroomacoustics.doa.tools_fri_doa_plane.cov_mtx_est(y_mic)

estimate covariance matrix

Parameters:y_mic – received signal (complex baseband representation) at microphones
pyroomacoustics.doa.tools_fri_doa_plane.cpx_mtx2real(mtx)

extend a complex-valued matrix to an extended matrix with real-valued entries only

Parameters:mtx – input complex valued matrix
pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri(G, a_ri, K, M, noise_level, max_ini=100, stop_cri='mse')

Reconstruct point sources’ locations (azimuth) from the visibility measurements

Parameters:
  • G – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids
  • a_ri – the visibility measurements
  • K – number of Diracs
  • M – the Fourier series expansion is between -M and M
  • noise_level – level of noise (ell_2 norm) in the measurements
  • max_ini – maximum number of initialisations
  • stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_half(G, a_ri, K, M, noise_level, max_ini=100, stop_cri='mse')

Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce Hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged.

Parameters:
  • G (param) – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids
  • a_ri – the visibility measurements
  • K – number of Diracs
  • M – the Fourier series expansion is between -M and M
  • noise_level – level of noise (ell_2 norm) in the measurements
  • max_ini – maximum number of initialisations
  • stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_half_multiband(G_lst, a_ri, K, M, max_ini=100)

Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce Hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged.

Parameters:
  • G_lst – a list of the linear transformation matrices that link the visibilities to uniformly sampled sinusoids
  • a_ri – the visibility measurements
  • K – number of Diracs
  • M – the Fourier series expansion is between -M and M
  • noise_level – level of noise (ell_2 norm) in the measurements
  • max_ini – maximum number of initialisations
  • stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_half_multiband_lu(G_lst, GtG_lst, GtG_inv_lst, a_ri, K, M, max_ini=100, max_iter=50)

Here we use LU decomposition to precompute a few entries. Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce Hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged.

Parameters:
  • G_lst – a list of the linear transformation matrices that link the visibilities to uniformly sampled sinusoids
  • a_ri – the visibility measurements
  • K – number of Diracs
  • M – the Fourier series expansion is between -M and M
  • noise_level – level of noise (ell_2 norm) in the measurements
  • max_ini – maximum number of initialisations
  • stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_half_multiband_parallel(G, a_ri, K, M, max_ini=100)

Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce Hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged. We use a parallel implementation when stop_cri == ‘max_iter’.

Parameters:
  • G – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids
  • a_ri – the visibility measurements
  • K – number of Diracs
  • M – the Fourier series expansion is between -M and M
  • noise_level – level of noise (ell_2 norm) in the measurements
  • max_ini – maximum number of initialisations
  • stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_half_parallel(G, a_ri, K, M, max_ini=100)

Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce Hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged. We use a parallel implementation when stop_cri == ‘max_iter’.

Parameters:
  • G – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids
  • a_ri – the visibility measurements
  • K – number of Diracs
  • M – the Fourier series expansion is between -M and M
  • noise_level – level of noise (ell_2 norm) in the measurements
  • max_ini – maximum number of initialisations
  • stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_inner(c_ri_half, a_ri, rhs, rhs_bl, K, M, D1, D2, D_coef, mtx_shrink, Tbeta_ri, G, max_iter)

inner loop of the dirac_recon_ri_half_parallel function

pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_multiband_inner(c_ri_half, a_ri, num_bands, rhs, rhs_bl, K, M, D1, D2, D_coef, mtx_shrink, Tbeta_ri, G, GtG, max_iter)

Inner loop of the dirac_recon_ri_multiband function

pyroomacoustics.doa.tools_fri_doa_plane.extract_off_diag(mtx)

extract the off-diagonal entries of mtx. The output vector is ordered in a column-major manner.

Parameters:mtx – input matrix to extract the off diagonal entries
pyroomacoustics.doa.tools_fri_doa_plane.hermitian_expan(half_vec_len)

expand a real-valued vector to a Hermitian symmetric vector. The input vector is a concatenation of the real parts with NON-POSITIVE indices and the imaginary parts with STRICTLY-NEGATIVE indices.

Parameters:half_vec_len – length of the first half vector
pyroomacoustics.doa.tools_fri_doa_plane.lu_compute_mtx_obj(Tbeta_lst, num_bands, K, lu_R_GtGinv_Rt_lst)
compute the matrix (M) in the objective function:
min c^H M c s.t. c0^H c = 1
Parameters:
  • GtG_lst – list of G^H * G
  • Tbeta_lst – list of Teoplitz matrices for beta-s
  • Rc0 – right dual matrix for the annihilating filter (same of each block -> not a list)
pyroomacoustics.doa.tools_fri_doa_plane.lu_compute_mtx_obj_initial(GtG_inv_lst, Tbeta_lst, Rc0, num_bands, K)
compute the matrix (M) in the objective function:
min c^H M c s.t. c0^H c = 1
Parameters:
  • GtG_lst – list of G^H * G
  • Tbeta_lst – list of Teoplitz matrices for beta-s
  • Rc0 – right dual matrix for the annihilating filter (same of each block -> not a list)
pyroomacoustics.doa.tools_fri_doa_plane.make_G(p_mic_x, p_mic_y, omega_bands, sound_speed, M, signal_type='visibility')

build the list of matrices that map the multi-band visibility measurements to uniformly sampled sinusoids, one per subband.

Parameters:
  • p_mic_x – a vector that contains microphones’ x-coordinates
  • p_mic_y – a vector that contains microphones’ y-coordinates
  • omega_bands – mid-band (ANGULAR) frequencies [radian/sec]
  • sound_speed – speed of sound
  • signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs
Returns:

The list of mapping matrices from measurements to sinusoids

pyroomacoustics.doa.tools_fri_doa_plane.make_GtG_and_inv(G_lst)
pyroomacoustics.doa.tools_fri_doa_plane.mtx_freq2raw(M, p_mic_x, p_mic_y)

build the matrix that maps the Fourier series to the raw microphone signals

Parameters:
  • M – the Fourier series expansion is limited from -M to M
  • p_mic_x – a vector that contains microphones x coordinates
  • p_mic_y – a vector that contains microphones y coordinates
pyroomacoustics.doa.tools_fri_doa_plane.mtx_freq2visi(M, p_mic_x, p_mic_y)

build the matrix that maps the Fourier series to the visibility

Parameters:
  • M – the Fourier series expansion is limited from -M to M
  • p_mic_x – a vector that contains microphones x coordinates
  • p_mic_y – a vector that contains microphones y coordinates
pyroomacoustics.doa.tools_fri_doa_plane.mtx_fri2signal_ri(M, p_mic_x, p_mic_y, D1, D2, signal='visibility')

build the matrix that maps the Fourier series to the visibility in terms of REAL-VALUED entries only (the matrix size is doubled)

Parameters:
  • M – the Fourier series expansion is limited from -M to M
  • p_mic_x – a vector that contains microphones x coordinates
  • p_mic_y – a vector that contains microphones y coordinates
  • D1 – expansion matrix for the real-part
  • D2 – expansion matrix for the imaginary-part
  • signal – The type of signal considered (‘visibility’ for covariance matrix, ‘raw’ for microphone inputs)
pyroomacoustics.doa.tools_fri_doa_plane.mtx_fri2signal_ri_multiband(M, p_mic_x_all, p_mic_y_all, D1, D2, aslist=False, signal='visibility')

build the matrix that maps the Fourier series to the visibility in terms of REAL-VALUED entries only (the matrix size is doubled)

Parameters:
  • M – the Fourier series expansion is limited from -M to M
  • p_mic_x_all – a matrix that contains microphones x coordinates
  • p_mic_y_all – a matrix that contains microphones y coordinates
  • D1 – expansion matrix for the real-part
  • D2 – expansion matrix for the imaginary-part
  • aslist – whether the linear mapping for each subband is returned as a list or as a block diagonal matrix
  • signal – The type of signal considered (‘visibility’ for covariance matrix, ‘raw’ for microphone inputs)
pyroomacoustics.doa.tools_fri_doa_plane.mtx_updated_G(phi_recon, M, mtx_amp2visi_ri, mtx_fri2visi_ri)

Update the linear transformation matrix that links the FRI sequence to the visibilities by using the reconstructed Dirac locations.

Parameters:
  • phi_recon – the reconstructed Dirac locations (azimuths)
  • M – the Fourier series expansion is between -M and M
  • p_mic_x – a vector that contains microphones’ x-coordinates
  • p_mic_y – a vector that contains microphones’ y-coordinates
  • mtx_freq2visi – the linear mapping from Fourier series to visibilities

pyroomacoustics.doa.tools_fri_doa_plane.mtx_updated_G_multiband(phi_recon, M, mtx_amp2visi_ri, mtx_fri2visi_ri, num_bands)

Update the linear transformation matrix that links the FRI sequence to the visibilities by using the reconstructed Dirac locations.

Parameters:
  • phi_recon – the reconstructed Dirac locations (azimuths)
  • M – the Fourier series expansion is between -M and M
  • p_mic_x – a vector that contains microphones’ x-coordinates
  • p_mic_y – a vector that contains microphones’ y-coordinates
  • mtx_fri2visi – the linear mapping from Fourier series to visibilities
pyroomacoustics.doa.tools_fri_doa_plane.mtx_updated_G_multiband_new(phi_opt, M, p_x, p_y, G0_lst, num_bands)

Update the linear transformation matrix that links the FRI sequence to the visibilities by using the reconstructed Dirac locations.

Parameters:
  • phi_opt – the reconstructed Dirac locations (azimuths)
  • M – the Fourier series expansion is between -M and M
  • p_mic_x – a vector that contains microphones’ x-coordinates
  • p_mic_y – a vector that contains microphones’ y-coordinates
  • G0_lst – the original linear mapping from Fourier series to visibilities
  • num_bands – number of subbands
pyroomacoustics.doa.tools_fri_doa_plane.multiband_cov_mtx_est(y_mic)

estimate covariance matrix based on the received signals at microphones

Parameters:y_mic – received signal (complex baseband representation) at microphones
pyroomacoustics.doa.tools_fri_doa_plane.multiband_extract_off_diag(mtx)

extract the off-diagonal entries of mtx. The output vector is ordered in a column-major manner.

Parameters:mtx – input matrix to extract the off-diagonal entries
pyroomacoustics.doa.tools_fri_doa_plane.output_shrink(K, L)

shrink the convolution output to half the size. Used when both the annihilating filter and the uniform samples of sinusoids satisfy Hermitian symmetry.

Parameters:
  • K – the annihilating filter size: K + 1
  • L – length of the (complex-valued) b vector
pyroomacoustics.doa.tools_fri_doa_plane.polar2cart(rho, phi)

convert from polar to cartesian coordinates

Parameters:
  • rho – radius
  • phi – azimuth
pyroomacoustics.doa.tools_fri_doa_plane.pt_src_recon(a, p_mic_x, p_mic_y, omega_band, sound_speed, K, M, noise_level, max_ini=50, stop_cri='mse', update_G=False, verbose=False, signal_type='visibility', **kwargs)

reconstruct point sources on the circle from the visibility measurements

Parameters:
  • a – the measured visibilities
  • p_mic_x – a vector that contains microphones’ x-coordinates
  • p_mic_y – a vector that contains microphones’ y-coordinates
  • omega_band – mid-band (ANGULAR) frequency [radian/sec]
  • sound_speed – speed of sound
  • K – number of point sources
  • M – the Fourier series expansion is between -M and M
  • noise_level – noise level in the measured visibilities
  • max_ini – maximum number of random initialisations used
  • stop_cri – either ‘mse’ or ‘max_iter’
  • update_G – whether to update the linear mapping that links the uniformly sampled sinusoids to the visibilities
  • verbose – whether to output intermediate results for debugging
  • signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs
  • kwargs – possible optional input: G_iter, the number of iterations for the G updates
pyroomacoustics.doa.tools_fri_doa_plane.pt_src_recon_multiband(a, p_mic_x, p_mic_y, omega_bands, sound_speed, K, M, noise_level, max_ini=50, update_G=False, verbose=False, signal_type='visibility', max_iter=50, G_lst=None, GtG_lst=None, GtG_inv_lst=None, **kwargs)

reconstruct point sources on the circle from the visibility measurements from multi-bands.

Parameters:
  • a – the measured visibilities in a matrix form, where the second dimension corresponds to different subbands
  • p_mic_x – a vector that contains microphones’ x-coordinates
  • p_mic_y – a vector that contains microphones’ y-coordinates
  • omega_bands – mid-band (ANGULAR) frequencies [radian/sec]
  • sound_speed – speed of sound
  • K – number of point sources
  • M – the Fourier series expansion is between -M and M
  • noise_level – noise level in the measured visibilities
  • max_ini – maximum number of random initialisations used
  • update_G – whether to update the linear mapping that links the uniformly sampled sinusoids to the visibilities
  • verbose – whether to output intermediate results for debugging
  • signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs
  • kwargs – possible optional input: G_iter, the number of iterations for the G updates
pyroomacoustics.doa.tools_fri_doa_plane.pt_src_recon_rotate(a, p_mic_x, p_mic_y, K, M, noise_level, max_ini=50, stop_cri='mse', update_G=False, num_rotation=1, verbose=False, signal_type='visibility', **kwargs)

reconstruct point sources on the circle from the visibility measurements. Here we apply random rotations to the coordinates.

Parameters:
  • a – the measured visibilities
  • p_mic_x – a vector that contains microphones’ x-coordinates
  • p_mic_y – a vector that contains microphones’ y-coordinates
  • K – number of point sources
  • M – the Fourier series expansion is between -M and M
  • noise_level – noise level in the measured visibilities
  • max_ini – maximum number of random initialisations used
  • stop_cri – either ‘mse’ or ‘max_iter’
  • update_G – whether to update the linear mapping that links the uniformly sampled sinusoids to the visibilities
  • num_rotation – number of random rotations
  • verbose – whether to output intermediate results for debugging
  • signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs
  • kwargs – possible optional input: G_iter, the number of iterations for the G updates

Grid Objects

Routines to perform grid search on the sphere

class pyroomacoustics.doa.grid.Grid(n_points)

Bases: object

This is an abstract class with attributes and methods for grids

Parameters:n_points (int) – the number of points on the grid
apply(func, spherical=False)
find_peaks(k=1)
set_values(vals)
class pyroomacoustics.doa.grid.GridCircle(n_points=360, azimuth=None)

Bases: pyroomacoustics.doa.grid.Grid

Creates a grid on the circle.

Parameters:
  • n_points (int, optional) – The number of uniformly spaced points in the grid.
  • azimuth (ndarray, optional) – An array of azimuth (in radians) to use for grid locations. Overrides n_points.
apply(func, spherical=False)
find_peaks(k=1)
plot(mark_peaks=0)
class pyroomacoustics.doa.grid.GridSphere(n_points=1000, spherical_points=None)

Bases: pyroomacoustics.doa.grid.Grid

This class creates nearly equidistant points on the sphere using the Fibonacci method

Parameters:
  • n_points (int) – The number of points to sample
  • spherical_points (ndarray, optional) – A 2 x n_points array of spherical coordinates with azimuth in the top row and colatitude in the second row. Overrides n_points.

References

http://lgdv.cs.fau.de/uploads/publications/spherical_fibonacci_mapping.pdf
http://stackoverflow.com/questions/9600801/evenly-distributing-n-points-on-a-sphere

apply(func, spherical=False)

Apply a function to every grid point

find_peaks(k=1)

Find the largest peaks on the grid

min_max_distance()

Compute some statistics on the distribution of the points

plot(colatitude_ref=None, azimuth_ref=None, colatitude_recon=None, azimuth_recon=None, plotly=True, projection=True, points_only=False)
plot_old(plot_points=False, mark_peaks=0)

Plot the points on the sphere with their values

regrid()

Regrid the non-uniform data on a regular mesh
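
A small sketch of the grid workflow; it assumes that apply passes the cartesian coordinate arrays to the supplied function and that find_peaks returns the indices of the largest local maxima, so treat both conventions as assumptions:

import numpy as np
from pyroomacoustics.doa.grid import GridSphere

grid = GridSphere(n_points=500)

# evaluate a function peaking towards +x at every grid point
grid.apply(lambda x, y, z: np.exp(x))

# indices of the two largest local maxima (assumed return convention)
peaks = grid.find_peaks(k=2)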

Plot Helpers

A collection of functions to plot maps and points on circles and spheres.

pyroomacoustics.doa.plotters.polar_plt_dirac(self, azimuth_ref=None, alpha_ref=None, save_fig=False, file_name=None, plt_dirty_img=True)

Generate polar plot of DoA results.

Parameters:
  • azimuth_ref (numpy array) – True direction of sources (in radians).
  • alpha_ref (numpy array) – Estimated amplitude of sources.
  • save_fig (bool) – Whether or not to save figure as pdf.
  • file_name (str) – Name of file (if saved). Default is ‘polar_recon_dirac.pdf’
  • plt_dirty_img (bool) – Whether or not to plot the spatial spectrum, or the ‘dirty image’ in the case of FRI.
pyroomacoustics.doa.plotters.sph_plot_diracs(colatitude_ref=None, azimuth_ref=None, colatitude=None, azimuth=None, dirty_img=None, colatitude_grid=None, azimuth_grid=None, file_name='sph_recon_2d_dirac.pdf', **kwargs)

This function plots the dirty image with source locations on a flat projection of the sphere

Parameters:
  • colatitude_ref (ndarray, optional) – The colatitudes of a collection of reference points
  • azimuth_ref (ndarray, optional) – The azimuths of a collection of reference points for the Diracs
  • colatitude (ndarray, optional) – The colatitudes of the collection of points to visualize
  • azimuth (ndarray, optional) – The azimuths of the collection of points to visualize
  • dirty_img (ndarray) – A 2D map for displaying a pattern on the sphere under the points
  • azimuth_grid (ndarray) – The azimuths indexing the dirty_img 2D map
  • colatitude_grid (ndarray) – The colatitudes indexing the dirty_img 2D map
pyroomacoustics.doa.plotters.sph_plot_diracs_plotly(colatitude_ref=None, azimuth_ref=None, colatitude=None, azimuth=None, dirty_img=None, azimuth_grid=None, colatitude_grid=None, surface_base=1, surface_height=0.0)

Plots a 2D map on a sphere as well as a collection of diracs using the plotly library

Parameters:
  • colatitude_ref (ndarray, optional) – The colatitudes of a collection of reference points
  • azimuth_ref (ndarray, optional) – The azimuths of a collection of reference points for the Diracs
  • colatitude (ndarray, optional) – The colatitudes of the collection of points to visualize
  • azimuth (ndarray, optional) – The azimuths of the collection of points to visualize
  • dirty_img (ndarray) – A 2D map for displaying a pattern on the sphere under the points
  • azimuth_grid (ndarray) – The azimuths indexing the dirty_img 2D map
  • colatitude_grid (ndarray) – The colatitudes indexing the dirty_img 2D map
  • surface_base – radius corresponding to the lowest height on the map
  • surface_height – radius difference between the lowest and highest points on the map
Peak Detection

Detect peaks in data based on their amplitude and other features.

Author: Marcos Duarte, https://github.com/demotu/BMC. Version: 1.0.4. License: MIT.

pyroomacoustics.doa.detect_peaks.detect_peaks(x, mph=None, mpd=1, threshold=0, edge='rising', kpsh=False, valley=False, show=False, ax=None)

Detect peaks in data based on their amplitude and other features.

Parameters:
  • x (1D array_like) – data.
  • mph ({None, number}, optional (default = None)) – detect peaks that are greater than minimum peak height.
  • mpd (positive integer, optional (default = 1)) – detect peaks that are at least separated by minimum peak distance (in number of data).
  • threshold (positive number, optional (default = 0)) – detect peaks (valleys) that are greater (smaller) than threshold in relation to their immediate neighbors.
  • edge ({None, 'rising', 'falling', 'both'}, optional (default = 'rising')) – for a flat peak, keep only the rising edge (‘rising’), only the falling edge (‘falling’), both edges (‘both’), or don’t detect a flat peak (None).
  • kpsh (bool, optional (default = False)) – keep peaks with same height even if they are closer than mpd.
  • valley (bool, optional (default = False)) – if True (1), detect valleys (local minima) instead of peaks.
  • show (bool, optional (default = False)) – if True (1), plot data in matplotlib figure.
  • ax (a matplotlib.axes.Axes instance, optional (default = None)) –
Returns:

ind – indices of the peaks in x.

Return type:

1D array_like

Notes

The detection of valleys instead of peaks is performed internally by simply negating the data: ind_valleys = detect_peaks(-x)

The function can handle NaN’s

See this IPython Notebook [1].

References

[1]http://nbviewer.ipython.org/github/demotu/BMC/blob/master/notebooks/DetectPeaks.ipynb

Examples

>>> import numpy as np
>>> from pyroomacoustics.doa.detect_peaks import detect_peaks
>>> x = np.random.randn(100)
>>> x[60:81] = np.nan
>>> # detect all peaks and plot data
>>> ind = detect_peaks(x, show=True)
>>> print(ind)
>>> x = np.sin(2*np.pi*5*np.linspace(0, 1, 200)) + np.random.randn(200)/5
>>> # set minimum peak height = 0 and minimum peak distance = 20
>>> detect_peaks(x, mph=0, mpd=20, show=True)
>>> x = [0, 1, 0, 2, 0, 3, 0, 2, 0, 1, 0]
>>> # set minimum peak distance = 2
>>> detect_peaks(x, mpd=2, show=True)
>>> x = np.sin(2*np.pi*5*np.linspace(0, 1, 200)) + np.random.randn(200)/5
>>> # detection of valleys instead of peaks
>>> detect_peaks(x, mph=0, mpd=20, valley=True, show=True)
>>> x = [0, 1, 1, 0, 1, 1, 0]
>>> # detect both edges
>>> detect_peaks(x, edge='both', show=True)
>>> x = [-2, 1, -2, 2, 1, 1, 3, 0]
>>> # set threshold = 2
>>> detect_peaks(x, threshold = 2, show=True)
DOA Utilities

This module contains useful functions to compute distances and errors on circles and spheres.

pyroomacoustics.doa.utils.circ_dist(azimuth1, azimuth2, r=1.0)

Returns the shortest distance between two points on a circle

Parameters:
  • azimuth1 – azimuth of point 1
  • azimuth2 – azimuth of point 2
  • r (optional) – radius of the circle (Default 1)
pyroomacoustics.doa.utils.great_circ_dist(r, colatitude1, azimuth1, colatitude2, azimuth2)

calculate great circle distance for points located on a sphere

Parameters:
  • r – radius of the sphere
  • colatitude1 – colatitude of point 1
  • azimuth1 – azimuth of point 1
  • colatitude2 – colatitude of point 2
  • azimuth2 – azimuth of point 2
Returns:

great-circle distance

Return type:

float or ndarray

pyroomacoustics.doa.utils.polar_distance(x1, x2)

Given two arrays of numbers x1 and x2, pairs the cells that are the closest and provides the pairing matrix index: x1(index(1,:)) should be as close as possible to x2(index(2,:)). The function outputs the average of the absolute value of the differences abs(x1(index(1,:))-x2(index(2,:))).

Parameters:
  • x1 – vector 1
  • x2 – vector 2
Returns:

  • d – the minimum average distance between x1 and x2
  • index – the permutation matrix

pyroomacoustics.doa.utils.spher2cart(r, azimuth, colatitude)

Convert a spherical point to cartesian coordinates.

Parameters:
  • r – radius
  • azimuth – azimuth
  • colatitude – colatitude
Returns:

An ndarray containing the Cartesian coordinates of the points in its columns

Return type:

ndarray
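
Two small worked examples of these helpers:

import numpy as np
from pyroomacoustics.doa.utils import circ_dist, spher2cart

# pi/4 and 7*pi/4 are only pi/2 apart when going through zero
print(circ_dist(np.pi / 4, 7 * np.pi / 4))  # ~1.5708

# radius 1, azimuth pi/2, colatitude pi/2 lies on the y-axis
print(spher2cart(1.0, np.pi / 2, np.pi / 2))  # ~[0, 1, 0]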

Single Channel Denoising

Module contents

Single Channel Noise Reduction

Collection of single channel noise reduction (SCNR) algorithms for speech. At the moment, only a spectral subtraction method, similar to [1], is implemented.

A deep learning approach in Python is also available in a separate repository.

Other methods for speech enhancement/noise reduction employ Wiener filtering [2] and subspace approaches [3].

References

[1]M. Berouti, R. Schwartz, and J. Makhoul, Enhancement of speech corrupted by acoustic noise, ICASSP ‘79. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1979, pp. 208-211.
[2]J. Lim and A. Oppenheim, All-Pole Modeling of Degraded Speech, IEEE Transactions on Acoustics, Speech, and Signal Processing 26.3 (1978): 197-210.
[3]Y. Ephraim and H. L. Van Trees, A signal subspace approach for speech enhancement, IEEE Transactions on Speech and Audio Processing, vol. 3, no. 4, pp. 251-266, Jul 1995.

Algorithms

pyroomacoustics.denoise.spectral_subtraction module
class pyroomacoustics.denoise.spectral_subtraction.SpectralSub(nfft, db_reduc, lookback, beta, alpha=1)

Bases: object

Here we have a class for performing single channel noise reduction via spectral subtraction. The instantaneous signal energy and noise floor are estimated at each time instance (for each frequency bin), and this is used to compute a gain filter with which to perform spectral subtraction.

For a given frame n, the gain for frequency bin k is given by:

\[G[k, n] = \max \left \{ \left ( \dfrac{P[k, n]-\beta P_N[k, n]}{P[k, n]} \right )^\alpha, G_{min} \right \},\]

where \(G_{min} = 10^{-(db\_reduc/20)}\) and \(db\_reduc\) is the maximum reduction (in dB) that we are willing to perform for each bin (a high value can actually be detrimental, see below). The instantaneous energy \(P[k,n]\) is computed by simply squaring the frequency amplitude at the bin k. The time-frequency decomposition of the input signal is typically done with the STFT and overlapping frames. The noise estimate \(P_N[k, n]\) for frequency bin k is given by looking back a certain number of frames \(L\) and selecting the bin with the lowest energy:

\[P_N[k, n] = \min_{[n-L, n]} P[k, n]\]

This approach works best when the SNR is positive and the noise is rather stationary. An alternative approach for the noise estimate (also in the case of stationary noise) would be to apply a lowpass filter for each frequency bin.

With a large suppression, i.e. large values for \(db\_reduc\), we can observe a typical artefact of such spectral subtraction approaches, namely “musical noise”. A nice article about noise reduction and musical noise can be found online.

Adjusting the constants \(\beta\) and \(\alpha\) also presents a trade-off between suppression and undesirable artefacts, i.e. more noticeable musical noise.

Below is an example of how to use this class. A full example can be found in the “examples” folder of the repository.

import numpy as np
import pyroomacoustics as pra

# initialize STFT and SpectralSub objects
nfft = 512
hop = nfft // 2
stft = pra.transform.STFT(nfft, hop=hop, analysis_window=pra.hann(nfft))
scnr = pra.denoise.SpectralSub(nfft, db_reduc=10, lookback=5, beta=20, alpha=3)

# a placeholder noisy signal, processed one hop at a time
noisy_signal = np.random.randn(100 * hop)
num_blocks = len(noisy_signal) // hop

# apply block-by-block
for n in range(num_blocks):

    mono_noisy = noisy_signal[n * hop:(n + 1) * hop]

    # go to frequency domain for noise reduction
    stft.analysis(mono_noisy)
    gain_filt = scnr.compute_gain_filter(stft.X)

    # apply the gain filter and go back to the time domain
    mono_denoised = stft.synthesis(gain_filt * stft.X)
Parameters:
  • nfft (int) – FFT size. Length of gain filter, i.e. the number of frequency bins, is given by nfft//2+1.
  • db_reduc (float) – Maximum reduction in dB for each bin.
  • lookback (int) – How many frames to look back for the noise estimate.
  • beta (float) – Overestimation factor to “push” the gain filter value (at each frequency) closer to the dB reduction specified by db_reduc.
  • alpha (float, optional) – Exponent factor to modify transition behavior towards the dB reduction specified by db_reduc. Default is 1.
compute_gain_filter(X)
Parameters:X (numpy array) – Complex spectrum of length nfft//2+1.
Returns:Gain filter to multiply given spectrum with.
Return type:numpy array

Pyroomacoustics API

Subpackages

pyroomacoustics.experimental package
Submodules
pyroomacoustics.experimental.deconvolution module
pyroomacoustics.experimental.deconvolution.deconvolve(y, s, length=None, thresh=0.0)

Deconvolve an excitation signal from an impulse response

Parameters:
  • y (ndarray) – The recording
  • s (ndarray) – The excitation signal
  • length (int, optional) – the length of the impulse response to deconvolve
  • thresh (float, optional) – ignore frequency bins with power lower than this
pyroomacoustics.experimental.deconvolution.wiener_deconvolve(y, x, length=None, noise_variance=1.0, let_n_points=15, let_div_base=2)

Deconvolve an excitation signal from an impulse response

We use Wiener filter

Parameters:
  • y (ndarray) – The recording
  • x (ndarray) – The excitation signal
  • length (int, optional) – the length of the impulse response to deconvolve
  • noise_variance (float, optional) – estimate of the noise variance
  • let_n_points (int) – number of points to use in the LET approximation
  • let_div_base (float) – the divider used for the LET grid
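
A round-trip sketch for deconvolve on synthetic data (no noise, so the recovery should be essentially exact):

import numpy as np
from pyroomacoustics.experimental.deconvolution import deconvolve

# a synthetic impulse response and a random excitation signal
h = np.zeros(64)
h[0], h[17] = 1.0, 0.5
s = np.random.randn(2048)
y = np.convolve(s, h)  # the "recording"

h_hat = deconvolve(y, s, length=64)
print(np.max(np.abs(h_hat - h)))  # should be tiny
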
pyroomacoustics.experimental.delay_calibration module
class pyroomacoustics.experimental.delay_calibration.DelayCalibration(fs, pad_time=0.0, mls_bits=16, repeat=1, temperature=25.0, humidity=50.0, pressure=1000.0)
run(distance=0.0, ch_in=None, ch_out=None, oversampling=1)

Run the calibration. Plays a maximum length sequence and cross-correlates the signals to find the time delay.

Parameters:
  • distance (float, optional) – Distance between the speaker and microphone
  • ch_in (int, optional) – The input channel to use. If not specified, all channels are calibrated
  • ch_out (int, optional) – The output channel to use. If not specified, all channels are calibrated
pyroomacoustics.experimental.localization module

We have a number of points with known locations and TDOA measurements from an unknown location to the known points. We perform an EDM line search to find the unknown offset that turns the TDOAs into TOAs.

Parameters:
  • R (ndarray) – An ndarray of 3xN where each column is the location of a point
  • tdoa (ndarray) – A length N vector containing the tdoa measurements from the unknown location to the known ones
  • bounds (ndarray) – Bounds for the line search
  • step (float) – Step size for the line search
pyroomacoustics.experimental.localization.tdoa(x1, x2, interp=1, fs=1, phat=True)

This function computes the time difference of arrival (TDOA) of the signal at the two microphones. This in turn is used to infer the direction of arrival (DOA) of the signal.

Specifically if s(k) is the signal at the reference microphone and s_2(k) at the second microphone, then for signal arriving with DOA theta we have

s_2(k) = s(k - tau)

with

tau = fs*d*sin(theta)/c

where d is the distance between the two microphones and c the speed of sound.

We recover tau using the Generalized Cross Correlation - Phase Transform (GCC-PHAT) method. The reference is

Knapp, C., & Carter, G. C. (1976). The generalized correlation method for estimation of time delay.

Parameters:
  • x1 (nd-array) – The signal of the reference microphone
  • x2 (nd-array) – The signal of the second microphone
  • interp (int, optional (default 1)) – The interpolation value for the cross-correlation, it can improve the time resolution (and hence DOA resolution)
  • fs (int, optional (default 1)) – The sampling frequency of the input signal
Returns:

  • theta (float) – the angle of arrival (in radians)
  • pwr (float) – the magnitude of the maximum cross correlation coefficient
  • delay (float) – the delay between the two microphones (in seconds)
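
A minimal sketch with a synthetically delayed signal; note that the exact return value (the delay alone versus the tuple documented above) and the sign convention may vary by version, so treat both as assumptions:

import numpy as np
from pyroomacoustics.experimental.localization import tdoa

fs = 16000
x1 = np.random.randn(fs)  # reference microphone
x2 = np.roll(x1, 20)      # same signal delayed by 20 samples

delay = tdoa(x1, x2, interp=4, fs=fs, phat=True)
print(delay * fs)         # approximately 20 (up to sign convention)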

pyroomacoustics.experimental.localization.tdoa_loc(R, tdoa, c, x0=None)

TDOA based localization

Parameters:
  • R (ndarray) – A 3xN array of 3D points
  • tdoa (ndarray) – A length N array of tdoa measurements
  • c (float) – The speed of sound

References

Steven Li, TDOA localization
pyroomacoustics.experimental.measure_ir module
pyroomacoustics.experimental.measure_ir.measure_ir(sweep_length=1.0, sweep_type='exponential', fs=48000, f_lo=0.0, f_hi=None, volume=0.9, pre_delay=0.0, post_delay=0.1, fade_in_out=0.0, dev_in=None, dev_out=None, channels_input_mapping=None, channels_output_mapping=None, ascending=False, deconvolution=True, plot=True)

Measures an impulse response by playing a sweep and recording it using the sounddevice package.

Parameters:
  • sweep_length (float, optional) – length of the sweep in seconds
  • sweep_type (SweepType, optional) – type of sweep to use: linear or exponential (default)
  • fs (int, optional) – sampling frequency (default 48 kHz)
  • f_lo (float, optional) – lowest frequency in the sweep
  • f_hi (float, optional) – highest frequency in the sweep, can be a negative offset from fs/2
  • volume (float, optional) – multiply the sweep by this number before playing (default 0.9)
  • pre_delay (float, optional) – delay in seconds before playing the sweep
  • post_delay (float, optional) – delay in seconds before stopping the recording after playing the sweep
  • fade_in_out (float, optional) – length in seconds of the fade in and out of the sweep (default 0.)
  • dev_in (int, optional) – input device number
  • dev_out (int, optional) – output device number
  • channels_input_mapping (array_like, optional) – List of channel numbers (starting with 1) to record. If mapping is given, channels is silently ignored.
  • channels_output_mapping (array_like, optional) – List of channel numbers (starting with 1) where the columns of data shall be played back on. Must have the same length as number of channels in data (except if data is mono, in which case the signal is played back on all given output channels). Each channel number may only appear once in mapping.
  • ascending (bool, optional) – whether the sweep goes from low to high frequencies (True) or from high to low (default False)
  • deconvolution (bool, optional) – if True, apply deconvolution to the recorded signal to remove the sweep (default True)
  • plot (bool, optional) – plot the resulting signal
Returns:

The impulse response if deconvolution == True, and the recorded signal otherwise.
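
For reference, the deconvolution step can be sketched as a regularized spectral division (an illustrative helper with an arbitrary regularization constant, not necessarily what measure_ir does internally):

    import numpy as np

    def deconvolve_sweep(recorded, sweep, reg=1e-6):
        # H(f) = R(f) S*(f) / (|S(f)|^2 + reg): regularized spectral division
        n = len(recorded) + len(sweep) - 1
        S = np.fft.rfft(sweep, n=n)
        R = np.fft.rfft(recorded, n=n)
        H = R * np.conj(S) / (np.abs(S) ** 2 + reg)
        return np.fft.irfft(H, n=n)  # estimate of the impulse response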

pyroomacoustics.experimental.physics module
pyroomacoustics.experimental.physics.calculate_speed_of_sound(t, h, p)

Compute the speed of sound as a function of temperature, humidity and pressure

Parameters:
  • t (float) – temperature [Celsius]
  • h (float) – relative humidity [%]
  • p (float) – atmospheric pressure [kPa]
Returns:

The speed of sound in [m/s].
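
As a rough guide, a common first-order approximation (a sketch of ours; it neglects the pressure dependence, so it may differ slightly from the exact formula used here) is:

    def speed_of_sound_approx(t, h, p=100.0):
        # c ~ 331.4 + 0.6*t + 0.0124*h [m/s]; the pressure p [kPa] has a
        # negligible effect and is ignored in this approximation
        return 331.4 + 0.6 * t + 0.0124 * h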

pyroomacoustics.experimental.point_cloud module
Point Clouds

Contains PointCloud class.

Given a number of points and their relative distances, this class aims at reconstructing their relative coordinates.

class pyroomacoustics.experimental.point_cloud.PointCloud(m=1, dim=3, diameter=0.0, X=None, labels=None, EDM=None)
EDM()

Computes the EDM corresponding to the marker set

align(marker, axis)

Rotate the marker set around the given axis until it is aligned onto the given marker

Parameters:
  • marker (int or str) – the index or label of the marker onto which to align the set
  • axis (int) – the axis around which the rotation happens
center(marker)

Translate the marker set so that the argument is the origin.

classical_mds(D)

Classical multidimensional scaling

Parameters:D (square 2D ndarray) – Euclidean Distance Matrix (matrix containing squared distances between points)
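
Classical MDS double-centers the EDM to obtain a Gram matrix and reads coordinates off its top eigenpairs; a minimal sketch (our own helper, recovering the points only up to a rigid motion):

    import numpy as np

    def classical_mds_sketch(D, dim=3):
        # Double centering turns squared distances into a Gram matrix
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n
        G = -0.5 * J @ D @ J
        # Coordinates from the 'dim' largest eigenpairs
        w, V = np.linalg.eigh(G)
        idx = np.argsort(w)[::-1][:dim]
        return (V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))).T  # dim x n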
copy()

Return a deep copy of this marker set object

correct(corr_dic)

correct a marker location by a given vector

doa(receiver, source)

Computes the direction of arrival wrt a source and receiver

flatten(ind)

Transform the set of points so that the subset of markers given as argument is as close to flat (wrt the z-axis) as possible.

Parameters:ind (list of bools) – List of marker indices that should all be in the same subspace
fromEDM(D, labels=None, method='mds')

Compute the position of markers from their Euclidean Distance Matrix

Parameters:
  • D (square 2D ndarray) – Euclidean Distance Matrix (matrix containing squared distances between points)
  • labels (list, optional) – A list of human friendly labels for the markers (e.g. ‘east’, ‘west’, etc)
  • method (str, optional) – The method to use * ‘mds’ for multidimensional scaling (default) * ‘tri’ for trilateration
key2ind(ref)

Get the index location from a label

normalize(refs=None)

Reposition the points such that x0 is at the origin, x1 lies on the x-axis, and x2 lies above the x-axis, keeping their relative positions to each other. The z-axis is defined according to the right-hand rule by default.

Parameters:
  • refs (list of 3 ints or str) – The index or label of three markers used to define (origin, x-axis, y-axis)
  • left_hand (bool, optional (default False)) – Normally the z-axis is defined using the right-hand rule; this flag allows overriding that behavior
plot(axes=None, show_labels=True, **kwargs)
trilateration(D)

Find the location of points based on their distance matrix using trilateration

Parameters:D (square 2D ndarray) – Euclidean Distance Matrix (matrix containing squared distances between points)
trilateration_single_point(c, Dx, Dy)

Given x at the origin (0,0), y at (0,c), and the distances Dx, Dy from a point at an unknown location to x and y, respectively, finds the position of the point.

pyroomacoustics.experimental.signals module

A few test signals, such as sine sweeps.

pyroomacoustics.experimental.signals.exponential_sweep(T, fs, f_lo=0.0, f_hi=None, fade=None, ascending=False)

Exponential sine sweep

Parameters:
  • T (float) – length in seconds
  • fs – sampling frequency
  • f_lo (float) – lowest frequency in fraction of fs (default 0)
  • f_hi (float) – highest frequency in fraction of fs (default 1)
  • fade (float, optional) – length of fade in and out in seconds (default 0)
  • ascending (bool, optional) – if True, the sweep goes from low to high frequencies; default False (high to low)
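
An exponential sine sweep is commonly parametrized following Farina; a self-contained sketch using absolute frequencies instead of fractions of fs (our simplification):

    import numpy as np

    def exp_sweep_sketch(T, fs, f_lo=20.0, f_hi=20000.0):
        # Instantaneous frequency grows exponentially from f_lo to f_hi
        t = np.arange(int(T * fs)) / fs
        L = T / np.log(f_hi / f_lo)
        return np.sin(2.0 * np.pi * f_lo * L * (np.exp(t / L) - 1.0))
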
pyroomacoustics.experimental.signals.linear_sweep(T, fs, f_lo=0.0, f_hi=None, fade=None, ascending=False)

Linear sine sweep

Parameters:
  • T (float) – length in seconds
  • fs – sampling frequency
  • f_lo (float) – lowest frequency in fraction of fs (default 0)
  • f_hi (float) – highest frequency in fraction of fs (default 1)
  • fade (float, optional) – length of fade in and out in seconds (default 0)
  • ascending (bool, optional) – if True, the sweep goes from low to high frequencies; default False (high to low)
pyroomacoustics.experimental.signals.window(signal, n_win)

Window the signal at the beginning and at the end with windows of size n_win/2.

Module contents
Experimental

A bunch of routines useful when doing measurements and experiments.

pyroomacoustics.experimental.measure_ir(sweep_length=1.0, sweep_type='exponential', fs=48000, f_lo=0.0, f_hi=None, volume=0.9, pre_delay=0.0, post_delay=0.1, fade_in_out=0.0, dev_in=None, dev_out=None, channels_input_mapping=None, channels_output_mapping=None, ascending=False, deconvolution=True, plot=True)

Measures an impulse response by playing a sweep and recording it using the sounddevice package.

Parameters:
  • sweep_length (float, optional) – length of the sweep in seconds
  • sweep_type (SweepType, optional) – type of sweep to use, ‘linear’ or ‘exponential’ (default)
  • fs (int, optional) – sampling frequency (default 48 kHz)
  • f_lo (float, optional) – lowest frequency in the sweep
  • f_hi (float, optional) – highest frequency in the sweep, can be a negative offset from fs/2
  • volume (float, optional) – multiply the sweep by this number before playing (default 0.9)
  • pre_delay (float, optional) – delay in seconds before playing the sweep
  • post_delay (float, optional) – delay in seconds before stopping the recording after playing the sweep
  • fade_in_out (float, optional) – length in seconds of the fade in and out of the sweep (default 0.)
  • dev_in (int, optional) – input device number
  • dev_out (int, optional) – output device number
  • channels_input_mapping (array_like, optional) – List of channel numbers (starting with 1) to record. If mapping is given, channels is silently ignored.
  • channels_output_mapping (array_like, optional) – List of channel numbers (starting with 1) where the columns of data shall be played back on. Must have the same length as number of channels in data (except if data is mono, in which case the signal is played back on all given output channels). Each channel number may only appear once in mapping.
  • ascending (bool, optional) – whether the sweep goes from low to high frequencies (True) or from high to low (default False)
  • deconvolution (bool, optional) – if True, apply deconvolution to the recorded signal to remove the sweep (default True)
  • plot (bool, optional) – plot the resulting signal
Returns:

The impulse response if deconvolution == True, and the recorded signal otherwise.

Submodules

pyroomacoustics.acoustics module
pyroomacoustics.acoustics.bands_hz2s(bands_hz, Fs, N, transform='dft')

Converts bands given in Hertz to samples with respect to a given sampling frequency Fs and a transform size N. An optional transform type is used to handle the DCT case.

pyroomacoustics.acoustics.binning(S, bands)

This function computes the sum of all columns of S in the subbands enumerated in bands

pyroomacoustics.acoustics.critical_bands()

Compute the critical bands as defined in the book Psychoacoustics by Zwicker and Fastl, Table 6.1, p. 159.

pyroomacoustics.acoustics.invmelscale(b)

Converts from melscale to frequency in Hertz according to Huang-Acero-Hon (6.143)

pyroomacoustics.acoustics.melfilterbank(M, N, fs=1, fl=0.0, fh=0.5)

Returns a filter bank of triangular filters spaced according to mel scale

We follow Huang-Acero-Hon 6.5.2

Parameters:
  • M ((int)) – The number of filters in the bank
  • N ((int)) – The length of the DFT
  • fs ((float) optional) – The sampling frequency (default 1)
  • fl ((float)) – Lowest frequency in filter bank as a fraction of fs (default 0.)
  • fh ((float)) – Highest frequency in filter bank as a fraction of fs (default 0.5)
Returns:

An M times int(N/2)+1 ndarray that contains one filter per row.

pyroomacoustics.acoustics.melscale(f)

Converts f (in Hertz) to the melscale defined according to Huang-Acero-Hon (2.6)
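
In one common convention (equivalent to the Huang-Acero-Hon definition up to the choice of logarithm base), the forward and inverse maps are:

    import numpy as np

    def melscale_sketch(f):
        # mel = 2595 log10(1 + f/700), equivalently 1127 ln(1 + f/700)
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def invmelscale_sketch(b):
        # inverse of the map above
        return 700.0 * (10.0 ** (b / 2595.0) - 1.0)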

pyroomacoustics.acoustics.mfcc(x, L=128, hop=64, M=14, fs=8000, fl=0.0, fh=0.5)

Computes the Mel-Frequency Cepstrum Coefficients (MFCC) according to the description in Huang-Acero-Hon 6.5.2 (2001). The MFCC are features mimicking human perception, commonly used for learning tasks.

This function will first split the signal into frames, overlapping or not, and then compute the MFCC for each frame.

Parameters:
  • x ((nd-array)) – Input signal
  • L ((int)) – Frame size (default 128)
  • hop ((int)) – Number of samples to skip between two frames (default 64)
  • M ((int)) – Number of mel-frequency filters (default 14)
  • fs ((int)) – Sampling frequency (default 8000)
  • fl ((float)) – Lowest frequency in filter bank as a fraction of fs (default 0.)
  • fh ((float)) – Highest frequency in filter bank as a fraction of fs (default 0.5)
Returns:

The MFCC of the input signal.

pyroomacoustics.acoustics.octave_bands(fc=1000, third=False)

Create a bank of octave bands

Parameters:
  • fc (float, optional) – The center frequency
  • third (bool, optional) – Use third octave bands (default False)
pyroomacoustics.beamforming module
class pyroomacoustics.beamforming.Beamformer(R, fs, N=1024, Lg=None, hop=None, zpf=0, zpb=0)

Bases: pyroomacoustics.beamforming.MicrophoneArray

At some point, in some nice way, the design methods should also go here. Probably with generic arguments.

Parameters:
  • R (numpy.ndarray) – Mics positions
  • fs (int) – Sampling frequency
  • N (int, optional) – Length of FFT, i.e. number of FD beamforming weights, equally spaced. Defaults to 1024.
  • Lg (int, optional) – Length of time-domain filters. Defaults to N.
  • hop (int, optional) – Hop length for frequency domain processing. Defaults to N/2.
  • zpf (int, optional) – Front zero padding length for frequency domain processing. Default is 0.
  • zpb (int, optional) – Zero padding length for frequency domain processing. Default is 0.
far_field_weights(phi)

This method computes the weights for a far-field source at infinity.

phi: direction of the beam
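
For intuition, far-field delay-and-sum weights phase-align the microphones toward the chosen direction; a standalone sketch (our own helper, not the class method, and correct only up to the sign convention of the steering vector):

    import numpy as np

    def far_field_ds_weights(R, phi, freqs, c=343.0):
        # R: 2xM microphone positions [m]; phi: beam direction [rad]
        u = np.array([np.cos(phi), np.sin(phi)])  # propagation direction
        delays = R.T @ u / c                      # relative delays [s]
        # One weight vector per frequency; dividing by M normalizes the gain
        return np.exp(2j * np.pi * np.outer(freqs, delays)) / R.shape[1]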

filters_from_weights(non_causal=0.0)

Compute time-domain filters from frequency domain weights.

Parameters:non_causal (float, optional) – ratio of filter coefficients used for non-causal part
plot(sum_ir=False, FD=True)
plot_beam_response()
plot_response_from_point(x, legend=None)
process(FD=False)
rake_delay_and_sum_weights(source, interferer=None, R_n=None, attn=True, ff=False)
rake_distortionless_filters(source, interferer, R_n, delay=0.03, epsilon=0.005)

Compute time-domain filters of a beamformer minimizing noise and interference while forcing a distortionless response towards the source.

rake_max_sinr_filters(source, interferer, R_n, epsilon=0.005, delay=0.0)

Compute the time-domain filters of SINR maximizing beamformer.

rake_max_sinr_weights(source, interferer=None, R_n=None, rcond=0.0, ff=False, attn=True)

This method computes a beamformer focusing on a number of specific sources and ignoring a number of interferers.

Parameters:
  • source – source locations
  • interferer – interferer locations
rake_max_udr_filters(source, interferer=None, R_n=None, delay=0.03, epsilon=0.005)

Compute directly the time-domain filters maximizing the Useful-to-Detrimental Ratio (UDR).

This beamformer is not practical. It maximizes the UDR ratio in the time domain directly without imposing flat response towards the source of interest. This results in severe distortion of the desired signal.

Parameters:
  • source (pyroomacoustics.SoundSource) – the desired source
  • interferer (pyroomacoustics.SoundSource, optional) – the interfering source
  • R_n (ndarray, optional) – the noise covariance matrix, it should be (M * Lg)x(M * Lg) where M is the number of sensors and Lg the filter length
  • delay (float, optional) – the signal delay introduced by the beamformer (default 0.03 s)
  • epsilon (float) –
rake_max_udr_weights(source, interferer=None, R_n=None, ff=False, attn=True)
rake_mvdr_filters(source, interferer, R_n, delay=0.03, epsilon=0.005)

Compute the time-domain filters of the minimum variance distortionless response beamformer.

rake_one_forcing_filters(sources, interferers, R_n, epsilon=0.005)

Compute the time-domain filters of a beamformer with unit response towards multiple sources.

rake_one_forcing_weights(source, interferer=None, R_n=None, ff=False, attn=True)
rake_perceptual_filters(source, interferer=None, R_n=None, delay=0.03, d_relax=0.035, epsilon=0.005)

Compute directly the time-domain filters for a perceptually motivated beamformer. The beamformer minimizes noise and interference, but relaxes the response of the filter within the 30 ms following the delay.

response(phi_list, frequency)
response_from_point(x, frequency)
snr(source, interferer, f, R_n=None, dB=False)
steering_vector_2D(frequency, phi, dist, attn=False)
steering_vector_2D_from_point(frequency, source, attn=True, ff=False)

Creates a steering vector for a particular frequency and source

Parameters:
  • frequency
  • source – location in cartesian coordinates
  • attn – include attenuation factor if True
  • ff – uses far-field distance if true
Returns:

A 2x1 ndarray containing the steering vector.

udr(source, interferer, f, R_n=None, dB=False)
weights_from_filters()
pyroomacoustics.beamforming.H(A, **kwargs)

Returns the conjugate (Hermitian) transpose of a matrix.

class pyroomacoustics.beamforming.MicrophoneArray(R, fs)

Bases: object

Microphone array class.

record(signals, fs)

This simulates the recording of the signals by the microphones. In particular, if the microphones and the room simulation do not use the same sampling frequency, down/up-sampling is done here.

Parameters:
  • signals – An ndarray with one row per microphone.
  • fs – the sampling frequency of the signals.
to_wav(filename, mono=False, norm=False, bitdepth=np.float)

Save all the signals to wav files.

Parameters:
  • filename (str) – the name of the file
  • mono (bool, optional) – if true, records only the center channel floor(M / 2) (default False)
  • norm (bool, optional) – if true, normalize the signal to fit in the dynamic range (default False)
  • bitdepth (int, optional) – the format of output samples [np.int8/16/32/64 or np.float (default)]
pyroomacoustics.beamforming.circular_2D_array(center, M, phi0, radius)

Creates an array of uniformly spaced circular points in 2D

Parameters:
  • center (array_like) – The center of the array
  • M (int) – The number of points
  • phi0 (float) – The counterclockwise rotation of the first element in the array (from the x-axis)
  • radius (float) – The radius of the array
Returns:

The array of points

Return type:

ndarray (2, M)

pyroomacoustics.beamforming.distance(x, y)

Computes the distance matrix E.

E[i,j] = sqrt(sum((x[:,i]-y[:,j])**2)). x and y are DxN ndarrays containing N D-dimensional vectors.

pyroomacoustics.beamforming.fir_approximation_ls(weights, T, n1, n2)
pyroomacoustics.beamforming.linear_2D_array(center, M, phi, d)

Creates an array of uniformly spaced linear points in 2D

Parameters:
  • center (array_like) – The center of the array
  • M (int) – The number of points
  • phi (float) – The counterclockwise rotation of the array (from the x-axis)
  • d (float) – The distance between neighboring points
Returns:

The array of points

Return type:

ndarray (2, M)
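
A short usage sketch of the array constructors documented above:

>>> import pyroomacoustics as pra
>>> # 8-mic linear array centered at (2, 1.5), along the x-axis, 8 cm spacing
>>> R = pra.beamforming.linear_2D_array(center=[2.0, 1.5], M=8, phi=0.0, d=0.08)
>>> R.shape
(2, 8)
>>> # 6-mic circular array of radius 5 cm, first mic at angle 0
>>> C = pra.beamforming.circular_2D_array(center=[2.0, 1.5], M=6, phi0=0.0, radius=0.05)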

pyroomacoustics.beamforming.mdot(*args)

Left-to-right associative matrix multiplication of multiple 2D ndarrays.

pyroomacoustics.beamforming.poisson_2D_array(center, M, d)

Create an array of 2D positions drawn from a Poisson process.

Parameters:
  • center (array_like) – The center of the array
  • M (int) – The number of points
  • d (float) – The distance between neighboring points
Returns:

The array of points

Return type:

ndarray (2, M)

pyroomacoustics.beamforming.spiral_2D_array(center, M, radius=1.0, divi=3, angle=None)

Generate an array of points placed on a spiral

Parameters:
  • center (array_like) – location of the center of the array
  • M (int) – number of microphones
  • radius (float) – microphones are contained within a circle of this radius (default 1)
  • divi (int) – number of rotations of the spiral (default 3)
  • angle (float) – the angle offset of the spiral (default random)
Returns:

The array of points

Return type:

ndarray (2, M)

pyroomacoustics.beamforming.square_2D_array(center, M, N, phi, d)

Creates an array of uniformly spaced grid points in 2D

Parameters:
  • center (array_like) – The center of the array
  • M (int) – The number of points in the first dimension
  • N (int) – The number of points in the second dimension
  • phi (float) – The counterclockwise rotation of the array (from the x-axis)
  • d (float) – The distance between neighboring points
Returns:

The array of points

Return type:

ndarray (2, M * N)

pyroomacoustics.beamforming.sumcols(A)

Sums the columns of a matrix (np.array).

The output is a 2D np.array of dimensions M x 1.

pyroomacoustics.beamforming.unit_vec2D(phi)
pyroomacoustics.build_rir module
pyroomacoustics.geometry module
pyroomacoustics.geometry.area(corners)

Computes the signed area of a 2D surface represented by its corners.

Parameters:corners – (np.array 2xN, N>2) list of coordinates of the corners forming the surface
Returns:(float) signed area of the surface; a positive area means the corners are ordered anti-clockwise, a negative area means they are ordered clockwise.
pyroomacoustics.geometry.ccw3p(p1, p2, p3)

Computes the orientation of three 2D points.

Parameters:
  • p1 – (ndarray size 2) coordinates of a 2D point
  • p2 – (ndarray size 2) coordinates of a 2D point
  • p3 – (ndarray size 2) coordinates of a 2D point
Returns:

(int) orientation of the given triangle:
  • 1 if the triangle vertices are counter-clockwise
  • -1 if the triangle vertices are clockwise
  • 0 if the vertices are collinear

Ref:

https://en.wikipedia.org/wiki/Curve_orientation
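
The test is the standard sign of the 2D cross product; a minimal sketch:

    def ccw3p_sketch(p1, p2, p3):
        # z-component of the cross product (p2 - p1) x (p3 - p1)
        d = (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p2[1] - p1[1]) * (p3[0] - p1[0])
        return 0 if d == 0 else (1 if d > 0 else -1)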

pyroomacoustics.geometry.intersection_2D_segments(a1, a2, b1, b2)

Computes the intersection between two 2D line segments.

This function computes the intersection between two 2D segments (defined by the coordinates of their endpoints) and returns the coordinates of the intersection point. If there is no intersection, None is returned. If segments are collinear, None is returned. Two booleans are also returned to indicate if the intersection happened at extremities of the segments, which can be useful for limit cases computations.

Parameters:
  • a1 – (ndarray size 2) coordinates of the first endpoint of segment a
  • a2 – (ndarray size 2) coordinates of the second endpoint of segment a
  • b1 – (ndarray size 2) coordinates of the first endpoint of segment b
  • b2 – (ndarray size 2) coordinates of the second endpoint of segment b
Returns:

(tuple of 3 elements) results of the computation:
  • (ndarray size 2 or None) coordinates of the intersection point
  • (bool) True if the intersection is at a boundary of segment a
  • (bool) True if the intersection is at a boundary of segment b

pyroomacoustics.geometry.intersection_segment_plane(a1, a2, p, normal)

Computes the intersection between a line segment and a plane in 3D.

This function computes the intersection between a line segment (defined by the coordinates of two points) and a plane (defined by a point belonging to it and a normal vector). If there is no intersection, None is returned. If the segment belongs to the surface, None is returned. A boolean is also returned to indicate if the intersection happened at extremities of the segment, which can be useful for limit cases computations.

Parameters:
  • a1 – (ndarray size 3) coordinates of the first endpoint of the segment
  • a2 – (ndarray size 3) coordinates of the second endpoint of the segment
  • p – (ndarray size 3) coordinates of a point belonging to the plane
  • normal – (ndarray size 3) normal vector of the plane
Returns:

(tuple of 2 elements) results of the computation:
  • (ndarray size 3 or None) coordinates of the intersection point
  • (bool) True if the intersection is at a boundary of the segment

pyroomacoustics.geometry.intersection_segment_polygon_surface(a1, a2, corners_2d, normal, plane_point, plane_basis)

Computes the intersection between a line segment and a polygon surface in 3D.

This function computes the intersection between a line segment (defined by the coordinates of two points) and a surface (defined by an array of coordinates of corners of the polygon and a normal vector) If there is no intersection, None is returned. If the segment belongs to the surface, None is returned. Two booleans are also returned to indicate if the intersection happened at extremities of the segment or at a border of the polygon, which can be useful for limit cases computations.

Parameters:
  • a1 – (ndarray size 3) coordinates of the first endpoint of the segment
  • a2 – (ndarray size 3) coordinates of the second endpoint of the segment
  • corners – (ndarray size 3xN, N>2) coordinates of the corners of the polygon
  • normal – (ndarray size 3) normal vector of the surface
Returns:

(tuple of 3 elements) results of the computation:
  • (ndarray size 3 or None) coordinates of the intersection point
  • (bool) True if the intersection is at a boundary of the segment
  • (bool) True if the intersection is at a boundary of the polygon

pyroomacoustics.geometry.is_inside_2D_polygon(p, corners)

Checks if a given point is inside a given polygon in 2D.

This function checks if a point (defined by its coordinates) is inside a polygon (defined by an array of coordinates of its corners) by counting the number of intersections between the borders and a segment linking the given point with a computed point outside the polygon. A boolean is also returned to indicate if a point is on a border of the polygon (the point is still considered inside), which can be useful for limit cases computations.

Parameters:
  • p – (ndarray size 2) coordinates of the point
  • corners – (ndarray size 2xN, N>2) coordinates of the corners of the polygon
Returns:

(tuple of 2 elements) results of the computation:
  • (bool) True if the point is inside
  • (bool) True if the point is on a boundary of the polygon
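
The underlying idea is the classic even-odd (ray casting) test; a compact sketch that ignores the on-border bookkeeping performed by the package function:

    def point_in_polygon_sketch(p, corners):
        # corners: 2xN array; cast a horizontal ray from p, count crossings
        n = corners.shape[1]
        inside = False
        for i in range(n):
            x1, y1 = corners[0, i], corners[1, i]
            x2, y2 = corners[0, (i + 1) % n], corners[1, (i + 1) % n]
            if (y1 > p[1]) != (y2 > p[1]):
                x_cross = x1 + (p[1] - y1) * (x2 - x1) / (y2 - y1)
                if p[0] < x_cross:
                    inside = not inside
        return inside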

pyroomacoustics.geometry.side(p, p0, vector)

Compute on which side of a given point another given point lies, according to a vector.

Parameters:
  • p – (ndarray size 2 or 3) point to be tested
  • p0 – (ndarray size 2 or 3) origin point
  • vector – (ndarray size 2 or 3) directional vector
Returns:

(int) direction of the point:
  • 1 : p is on the side pointed to by the vector
  • 0 : p is in the middle (on the same line as p0)
  • -1 : p is on the opposite side of the one pointed to by the vector

pyroomacoustics.metrics module
pyroomacoustics.metrics.itakura_saito(x1, x2, sigma2_n, stft_L=128, stft_hop=128)
pyroomacoustics.metrics.median(x, alpha=None, axis=-1, keepdims=False)

Computes the median of the data, and optionally a confidence interval for it.

Parameters:
  • x (array_like) – the data array
  • alpha (float, optional) – the confidence level of the interval; confidence intervals are only computed when this argument is provided
  • axis (int, optional) – the axis of the data on which to operate, by default the last axis

Returns:

A tuple (m, [le, ue]). The confidence interval is [m-le, m+ue].

pyroomacoustics.metrics.mse(x1, x2)

A short hand to compute the mean-squared error of two signals.

\[MSE = \frac{1}{n}\sum_{i=0}^{n-1} (x_i - y_i)^2\]
Parameters:
  • x1 – (ndarray)
  • x2 – (ndarray)
Returns:

(float) The mean of the squared differences of x1 and x2.

pyroomacoustics.metrics.pesq(ref_file, deg_files, Fs=8000, swap=False, wb=False, bin='./bin/pesq')

Computes the perceptual evaluation of speech quality (PESQ) metric of degraded files with respect to a reference file. Uses the utility obtained from ITU P.862 http://www.itu.int/rec/T-REC-P.862-200511-I!Amd2/en

Parameters:
  • ref_file – The filename of the reference file.
  • deg_files – A list of degraded sound files names.
  • Fs – Sample rate of the sound files [8 kHz or 16 kHz, default 8 kHz].
  • swap – Swap the byte order [default: False].
  • wb – Use wideband algorithm [default: False].
  • bin – Location of pesq executable [default: ./bin/pesq].
Returns:

(ndarray size 2xN) ndarray containing Raw MOS and MOS LQO in rows 0 and 1, respectively, and has one column per degraded file name in deg_files.

pyroomacoustics.metrics.snr(ref, deg)
pyroomacoustics.multirate module
pyroomacoustics.multirate.frac_delay(delta, N, w_max=0.9, C=4)

Compute the optimal fractional delay filter according to

William Putnam and Julius Smith, “Design of Fractional Delay Filters Using Convex Optimization”

Parameters:
  • delta – delay of the filter in (fractional) samples
  • N – number of taps
  • w_max – Bandwidth of the filter (in fraction of pi) (default 0.9)
  • C – sets the number of constraints to C*N (default 4)
pyroomacoustics.multirate.low_pass(numtaps, B, epsilon=0.1)
pyroomacoustics.multirate.resample(x, p, q)
pyroomacoustics.parameters module

This file defines the main physical constants of the system

class pyroomacoustics.parameters.Constants

A class to provide easy package-wide access to user-settable constants.

Avoid using this in tight loops, since it relies on exceptions.

get(name)
set(name, val)
pyroomacoustics.parameters.calculate_speed_of_sound(t, h, p)

Compute the speed of sound as a function of temperature, humidity and pressure

Parameters:
  • t – temperature [Celsius]
  • h – relative humidity [%]
  • p – atmospheric pressure [kPa]
Returns:

The speed of sound in [m/s].

pyroomacoustics.recognition module
class pyroomacoustics.recognition.CircularGaussianEmission(nstates, odim=1, examples=None)
get_pdfs()

Return the pdf of all the emission probabilities

prob_x_given_state(examples)

Recompute the probability of the observation given the state of the latent variables

update_parameters(examples, gamma)
class pyroomacoustics.recognition.GaussianEmission(nstates, odim=1, examples=None)
get_pdfs()

Return the pdf of all the emission probabilities

prob_x_given_state(examples)

Recompute the probability of the observation given the state of the latent variables

update_parameters(examples, gamma)
class pyroomacoustics.recognition.HMM(nstates, emission, model='full', leftright_jump_max=3)

Hidden Markov Model with Gaussian emissions

K (int) – Number of states in the model

O (int) – Number of dimensions of the Gaussian emission distribution

A (ndarray) – KxK transition matrix of the Markov chain

pi (ndarray) – K-dim vector of the initial probabilities of the Markov chain

emission (GaussianEmission or CircularGaussianEmission) – An instance of emission_class

model (string, optional) – The model used for the chain, can be ‘full’ or ‘left-right’

leftright_jump_max (int, optional) – The number of non-zero upper diagonals in a ‘left-right’ model

backward(X, p_x_given_z, c)

The backward recursion for HMM as described in Bishop Ch. 13

fit(examples, tol=0.1, max_iter=10, verbose=False)

Training of the HMM using the EM algorithm

Parameters:
  • examples ((list)) – A list of examples used to train the model. Each example is an array of feature vectors, each row is a feature vector, the sequence runs on axis 0
  • tol ((float)) – The training stops when the progress between two steps is less than this number (default 0.1)
  • max_iter ((int)) – Alternatively the algorithm stops when a maximum number of iterations is reached (default 10)
  • verbose (bool, optional) – When True, prints extra information about convergence
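
A minimal usage sketch based on the signatures documented here (the random training data is only a placeholder):

>>> import numpy as np
>>> import pyroomacoustics as pra
>>> emission = pra.recognition.CircularGaussianEmission(nstates=2, odim=2)
>>> hmm = pra.recognition.HMM(2, emission, model='left-right')
>>> examples = [np.random.randn(100, 2) for _ in range(5)]
>>> hmm.fit(examples, max_iter=10)
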
forward(X, p_x_given_z)

The forward recursion for HMM as described in Bishop Ch. 13

generate(N)

Generate a random sample of length N using the model

loglikelihood(X)

Compute the log-likelihood of a sample vector using the sum-product algorithm

update_parameters(examples, gamma, xhi)

Update the parameters of the Markov Chain

viterbi()
pyroomacoustics.soundsource module
class pyroomacoustics.soundsource.SoundSource(position, images=None, damping=None, generators=None, walls=None, orders=None, signal=None, delay=0)

Bases: object

A class to represent sound sources.

This object represents a sound source in a room by a list containing the original source position as well as all the image sources, up to some maximum order.

It also keeps track of the sequence of generated images and the index of the walls (in the original room) that generated the reflection.

add_signal()
distance(ref_point)
get_damping(max_order=None)
get_images(max_order=None, max_distance=None, n_nearest=None, ref_point=None)

Kept for compatibility. Now replaced by the bracket operator and the set_ordering function.

get_rir(mic, visibility, Fs, t0=0.0, t_max=None)

Compute the room impulse response between the source and the microphone whose position is given as an argument.

set_ordering(ordering, ref_point=None)

Set the order in which image sources are retrieved. Can be: ‘nearest’, ‘strongest’, ‘order’. Optional argument: ref_point.

wall_sequence(i)

Print the wall sequence for the image source indexed by i

pyroomacoustics.soundsource.build_rir_matrix(mics, sources, Lg, Fs, epsilon=0.005, unit_damping=False)

A function to build the channel matrix for many sources and microphones

Parameters:
  • mics (ndarray) – a dim-by-M ndarray where each column is the position of a microphone
  • sources (list of pyroomacoustics.SoundSource) – list of sound sources for which we want to build the matrix
  • Lg (int) – the length of the beamforming filters
  • Fs (int) – the sampling frequency
  • epsilon (float, optional) – minimum decay of the sinc before truncation. Defaults to epsilon=5e-3
  • unit_damping (bool, optional) – determines if the wall damping parameters are used or not. Default to false.
Returns:

The RIR matrix

    H = | H_11 H_12 ... |
        | H_21 H_22 ... |
        | ...           |

where H_{ij} is the channel matrix between microphone i and source j. H is of size (M*Lg)x((Lg+Lh-1)*S), where Lh is the channel length (determined by epsilon), and M, S are the numbers of microphones and sources, respectively.

pyroomacoustics.stft module

Collection of spectral estimation methods.

This module is deprecated. It is replaced by the methods of pyroomacoustics.transform

pyroomacoustics.stft.freqvec(N, fs, centered=False)

Compute the vector of frequencies corresponding to the DFT bins.

Parameters:
  • N (int) – FFT length
  • fs (int) – sampling rate of the signal
  • centered (bool) – False if the DC bin is at the beginning, True if it is centered

pyroomacoustics.stft.istft(X, L, hop, transform=np.fft.ifft, win=None, zp_back=0, zp_front=0)
pyroomacoustics.stft.overlap_add(in1, in2, L)
pyroomacoustics.stft.spectroplot(Z, N, hop, fs, fdiv=None, tdiv=None, vmin=None, vmax=None, cmap=None, interpolation='none', colorbar=True)
pyroomacoustics.stft.stft(x, L, hop, transform=np.fft.fft, win=None, zp_back=0, zp_front=0)
Parameters:
  • x – input signal
  • L – frame size
  • hop – shift size between frames
  • transform – the transform routine to apply (default FFT)
  • win – the window to apply (default None)
  • zp_back – zero padding to apply at the end of the frame
  • zp_front – zero padding to apply at the beginning of the frame
Returns:

The STFT of x.

pyroomacoustics.sync module
pyroomacoustics.sync.correlate(x1, x2, interp=1, phat=False)

Compute the cross-correlation between x1 and x2

Parameters:
  • x1,x2 (array_like) – The data arrays
  • interp (int, optional) – The interpolation factor for the output array, default 1.
  • phat (bool, optional) – Apply the PHAT weighting (default False)
Returns:

The cross-correlation between the two arrays.

pyroomacoustics.sync.delay_estimation(x1, x2, L)

Estimate the delay between x1 and x2. L is the block length used for PHAT.

pyroomacoustics.sync.tdoa(signal, reference, interp=1, phat=False, fs=1, t_max=None)

Estimates the shift of array signal with respect to reference using generalized cross-correlation

Parameters:
  • signal (array_like) – The array whose tdoa is measured
  • reference (array_like) – The reference array
  • interp (int, optional) – The interpolation factor for the output array, default 1.
  • phat (bool, optional) – Apply the PHAT weighting (default False)
  • fs (int or float, optional) – The sampling frequency of the input arrays, default=1
Returns:

The estimated delay between the two arrays.

pyroomacoustics.sync.time_align(ref, deg, L=4096)

Return a copy of deg that is time-aligned with, and of the same length as, ref. L is the block length used for the correlations.

pyroomacoustics.utilities module
pyroomacoustics.utilities.angle_from_points(x1, x2)
pyroomacoustics.utilities.clip(signal, high, low)

Clip a signal from above at high and from below at low.

pyroomacoustics.utilities.compare_plot(signal1, signal2, Fs, fft_size=512, norm=False, equal=False, title1=None, title2=None)
pyroomacoustics.utilities.convmtx(x, n)

Create a convolution matrix H of size (len(x)+n-1) times n for the vector x. Then, the result of np.dot(H, v), where v is a vector of length n, is the same as np.convolve(x, v).
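
Equivalently, as a sketch:

    import numpy as np

    def convmtx_sketch(x, n):
        # Column i holds x shifted down by i rows, so H @ v == np.convolve(x, v)
        H = np.zeros((len(x) + n - 1, n))
        for i in range(n):
            H[i:i + len(x), i] = x
        return H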

pyroomacoustics.utilities.dB(signal, power=False)
pyroomacoustics.utilities.fractional_delay(t0)

Creates a fractional delay filter using a windowed sinc function. The length of the filter is fixed by the module-wide constant frac_delay_length (default 81).

Parameters:t0 (float) – The delay in fractions of a sample. Typically between 0 and 1.
Returns:

A fractional delay filter with the specified delay.
pyroomacoustics.utilities.fractional_delay_filter_bank(delays)

Creates a fractional delay filter bank of windowed sinc filters

Parameters:delays (1d ndarray) – The delays corresponding to each filter, in fractional samples
Returns:

An ndarray where the i-th row contains the fractional delay filter corresponding to the i-th delay. The number of columns of the matrix is proportional to the maximum delay.
pyroomacoustics.utilities.goertzel(x, k)

Goertzel algorithm to compute DFT coefficients
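
The Goertzel recursion evaluates a single DFT coefficient with one second-order filter; a minimal sketch:

    import numpy as np

    def goertzel_bin(x, k):
        # Computes X[k] = sum_n x[n] exp(-2j pi k n / N) by recursion
        N = len(x)
        w = 2.0 * np.pi * k / N
        coeff = 2.0 * np.cos(w)
        s1 = s2 = 0.0
        for sample in x:
            s1, s2 = sample + coeff * s1 - s2, s1
        return np.exp(1j * w) * s1 - s2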

pyroomacoustics.utilities.highpass(signal, Fs, fc=None, plot=False)

Filter out the very low frequencies; the default cutoff is 50 Hz.

pyroomacoustics.utilities.levinson(r, b)

Solve a system of the form Rx = b, where R is a Hermitian Toeplitz matrix and b is any vector, using the generalized Levinson recursion as described in M.H. Hayes, Statistical Signal Processing and Modeling, p. 268.

Parameters:
  • r – First column of R, a Hermitian Toeplitz matrix.
  • b – The right-hand argument. If b is a matrix, the system is solved for every column vector in b.
Returns:

The solution of the linear system Rx = b.
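
For a quick cross-check of such a Toeplitz solve, SciPy's solve_toeplitz can serve as a reference (illustrative values):

>>> import numpy as np
>>> from scipy.linalg import solve_toeplitz
>>> r = np.array([2.0, 0.5, 0.1])          # first column of R
>>> b = np.array([1.0, 0.0, 0.0])
>>> x = solve_toeplitz((r, r.conj()), b)   # (first column, first row)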

pyroomacoustics.utilities.low_pass_dirac(t0, alpha, Fs, N)

Creates a vector containing a lowpass Dirac of duration T sampled at Fs with delay t0 and attenuation alpha.

If t0 and alpha are 2D column vectors of the same size, then the function returns a matrix with each row corresponding to a pair of t0/alpha values.

pyroomacoustics.utilities.normalize(signal, bits=None)

Normalize the signal to be in a given range. The default is to normalize the maximum amplitude to one. An optional argument allows normalizing the signal to fit within the range of a given signed integer representation with the specified number of bits.

pyroomacoustics.utilities.normalize_pwr(sig1, sig2)

Normalize sig1 to have the same power as sig2.

pyroomacoustics.utilities.prony(x, p, q)

Prony’s Method from Monson H. Hayes’ Statistical Signal Processing, p. 154

Parameters:
  • x – signal to model
  • p – order of denominator
  • q – order of numerator
Returns:

  • a – numerator coefficients
  • b – denominator coefficients
  • err – the squared error of the approximation

pyroomacoustics.utilities.real_spectrum(signal, axis=-1, **kwargs)
pyroomacoustics.utilities.shanks(x, p, q)

Shanks’ Method from Monson H. Hayes’ Statistical Signal Processing, p. 154

Parameters:
  • x – signal to model
  • p – order of denominator
  • q – order of numerator
Returns:

  • a – numerator coefficients
  • b – denominator coefficients
  • err – the squared error of approximation

pyroomacoustics.utilities.spectrum(signal, Fs, N)
pyroomacoustics.utilities.time_dB(signal, Fs, bits=16)

Compute the signed dB amplitude of the oscillating signal, normalized with respect to the number of bits used for the signal.

pyroomacoustics.utilities.to_16b(signal)

Converts a 32-bit float signal (-1 to 1) to a signed 16-bit representation. No clipping is performed; you are responsible for ensuring the signal is within the correct interval.

pyroomacoustics.wall module
class pyroomacoustics.wall.Wall(corners, absorption=1.0, name=None)

Bases: object

This class represents a wall instance. A room instance is formed by these.

Attribute corners: (np.array dim 2x2 or 3xN, N>2) endpoints forming the wall
Attribute absorption: (float) attenuation reflection factor
Attribute name: (string) name given to the wall, which can be reused to reference it in the Room object
Attribute normal: (np.array dim 2 or 3) normal vector pointing outward from the room
Attribute dim: (int) dimension of the wall (2 or 3, meaning 2D or 3D)
intersection(p1, p2)

Returns the intersection point between the wall and a line segment.

Parameters:
  • p1 – (np.array dim 2 or 3) first end point of the line segment
  • p2 – (np.array dim 2 or 3) second end point of the line segment
Returns:

(np.array dim 2 or 3 or None) intersection point between the wall and the line segment

intersects(p1, p2)

Tests if the given line segment intersects the wall.

Parameters:
  • p1 – (ndarray size 2 or 3) first endpoint of the line segment
  • p2 – (ndarray size 2 or 3) second endpoint of the line segment
Returns:

(tuple size 3)
  • (bool) True if the line segment intersects the wall
  • (bool) True if the intersection happens at a border of the wall
  • (bool) True if the intersection happens at an extremity of the segment

side(p)

Computes on which side of the wall the point p is.

Parameters:p – (np.array dim 2 or 3) coordinates of the point
Returns:(int) integer representing on which side the point is:
  • -1 : opposite to the normal vector (going inside the room)
  • 0 : on the wall
  • 1 : in the direction of the normal vector (going outside of the room)
pyroomacoustics.windows module

A collection of windowing functions.

pyroomacoustics.windows.blackman_harris(N, flag='asymmetric', length='full')

The Blackman-Harris window function

\[w[n] = a_0 - a_1 \cos(2\pi n/M) + a_2 \cos(4\pi n/M) + a_3 \cos(6\pi n/M), n=0,\ldots,N-1\]
Parameters:
  • N (int) – the window length
  • flag (string, optional) –

    Possible values

    • asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
    • symmetric: the window is symmetric (\(M=N-1\))
  • length (string, optional) –

    Possible values

    • full: the full length window is computed
    • right: the right half of the window is computed
    • left: the left half of the window is computed
pyroomacoustics.windows.cosine(N, flag='asymmetric', length='full')

The cosine window function

\[w[n] = \cos(\pi (n/M - 0.5))^2\]
Parameters:
  • N (int) – the window length
  • flag (string, optional) –

    Possible values

    • asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
    • symmetric: the window is symmetric (\(M=N-1\))
    • mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
  • length (string, optional) –

    Possible values

    • full: the full length window is computed
    • right: the right half of the window is computed
    • left: the left half of the window is computed
pyroomacoustics.windows.hann(N, flag='asymmetric', length='full')

The Hann window function

\[w[n] = 0.5 (1 - \cos(2 \pi n / M)), n=0,\ldots,N-1\]
Parameters:
  • N (int) – the window length
  • flag (string, optional) –

    Possible values

    • asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
    • symmetric: the window is symmetric (\(M=N-1\))
    • mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
  • length (string, optional) –

    Possible values

    • full: the full length window is computed
    • right: the right half of the window is computed
    • left: the left half of the window is computed
pyroomacoustics.windows.rect(N)

The rectangular window

\[w[n] = 1, n=0,\ldots,N-1\]
Parameters:N (int) – the window length
pyroomacoustics.windows.triang(N, flag='asymmetric', length='full')

The triangular window function

\[w[n] = 1 - | 2 n / M - 1 |, n=0,\ldots,N-1\]
Parameters:
  • N (int) – the window length
  • flag (string, optional) –

    Possible values

    • asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
    • symmetric: the window is symmetric (\(M=N-1\))
    • mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
  • length (string, optional) –

    Possible values

    • full: the full length window is computed
    • right: the right half of the window is computed
    • left: the left half of the window is computed

Module contents

pyroomacoustics
Provides
  1. Room impulse simulations via the image source model
  2. Simulation of sound propagation using STFT engine
  3. Reference implementations of popular algorithms for
  • beamforming
  • direction of arrival
  • adaptive filtering
  • etc
How to use the documentation

Documentation is available in two forms: docstrings provided with the code, and a standalone reference guide available from the pyroomacoustics readthedocs page.

We recommend exploring the docstrings using IPython, an advanced Python shell with TAB-completion and introspection capabilities. See below for further instructions.

The docstring examples assume that pyroomacoustics has been imported as pra:

>>> import pyroomacoustics as pra

Code snippets are indicated by three greater-than signs:

>>> x = 42
>>> x = x + 1

Use the built-in help function to view a function’s docstring:

>>> help(pra.stft.STFT)
... 
Available submodules
pyroomacoustics.acoustics
Acoustics and psychoacoustics routines, mel scale, critical bands, etc.
pyroomacoustics.beamforming
Microphone arrays and beamforming routines.
pyroomacoustics.geometry
Core geometry routines for the image source model.
pyroomacoustics.metrics
Performance metrics like mean-squared error, median, Itakura-Saito, etc.
pyroomacoustics.multirate
Rate conversion routines.
pyroomacoustics.parameters
Global parameters, e.g. physical constants.
pyroomacoustics.recognition
Hidden Markov Model and TIMIT database structure.
pyroomacoustics.room
Abstraction of room and image source model.
pyroomacoustics.soundsource
Abstraction for a sound source.
pyroomacoustics.stft
Deprecated. Replaced by the methods in pyroomacoustics.transform.
pyroomacoustics.sync
A few routines to help synchronize signals.
pyroomacoustics.utilities
Assorted utility routines (filters, fractional delays, normalization, etc.).
pyroomacoustics.wall
Abstraction for walls of a room.
pyroomacoustics.windows
Tapering windows for spectral analysis.
Available subpackages
pyroomacoustics.adaptive
Adaptive filter algorithms
pyroomacoustics.bss
Blind source separation.
pyroomacoustics.datasets
Wrappers around a few popular speech datasets
pyroomacoustics.denoise
Single channel noise reduction methods
pyroomacoustics.doa
Direction of arrival finding algorithms
pyroomacoustics.transform
Block frequency domain processing tools
Utilities
__version__
pyroomacoustics version string

Indices and tables