Pyroomacoustics¶
Summary¶
Pyroomacoustics is a software package aimed at the rapid development and testing of audio array processing algorithms. The content of the package can be divided into three main components: an intuitive Python object-oriented interface to quickly construct different simulation scenarios involving multiple sound sources and microphones in 2D and 3D rooms; a fast C implementation of the image source model for general polyhedral rooms to efficiently generate room impulse responses and simulate the propagation between sources and receivers; and finally, reference implementations of popular algorithms for beamforming, direction finding, and adaptive filtering. Together, they form a package with the potential to speed up the time to market of new algorithms by significantly reducing the implementation overhead in the performance evaluation step.
Room Acoustics Simulation¶
Consider the following scenario.
Suppose, for example, you wanted to produce a radio crime drama, and it so happens that, according to the scriptwriter, the story line absolutely must culminate in a satanic mass that quickly degenerates into a violent shootout, all taking place right around the altar of the highly reverberant acoustic environment of Oxford’s Christ Church cathedral. To ensure that it sounds authentic, you asked the Dean of Christ Church for permission to record the final scene inside the cathedral, but somehow he fails to be convinced of the artistic merit of your production, and declines to give you permission. But recorded in a conventional studio, the scene sounds flat. So what do you do?
—Schnupp, Nelken, and King, Auditory Neuroscience, 2010
Faced with this difficult situation, pyroomacoustics can save the day by simulating the environment of the Christ Church cathedral!
At the core of the package is a room impulse response (RIR) generator based on the image source model that can handle:
- Convex and non-convex rooms
- 2D/3D rooms
Both a pure python implementation and a C accelerator are included for maximum speed and compatibility.
The philosophy of the package is to abstract all necessary elements of an experiment using object-oriented programming concepts. Each of these elements is represented using a class, and an experiment can be designed by combining these elements just as one would do in a real experiment.
Let’s imagine we want to simulate a delay-and-sum beamformer that uses a linear array with four microphones in a shoe box shaped room that contains only one source of sound. First, we create a room object, to which we add a microphone array object, and a sound source object. Then, the room object has methods to compute the RIR between source and receiver. The beamformer object then extends the microphone array class and has different methods to compute the weights, for example delay-and-sum weights. See the example below to get an idea of what the code looks like.
The Room class also allows one to process sound samples emitted by sources, effectively simulating the propagation of sound between sources and microphones. At the input of the microphones composing the beamformer, an STFT (short time Fourier transform) engine makes it possible to quickly process the signals through the beamformer and evaluate the output.
Reference Implementations¶
In addition to its core image source model simulation, pyroomacoustics also contains a number of reference implementations of popular audio processing algorithms for
- beamforming
- direction of arrival (DOA) finding
- adaptive filtering (NLMS, RLS)
- blind source separation (AuxIVA, Trinicon)
We use an object-oriented approach to abstract the details of specific algorithms, making them easy to compare. Each algorithm can be tuned through optional parameters. We have tried to pre-set values for the tuning parameters so that a run with the default values will in general produce reasonable results.
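For instance, a direction-of-arrival estimator can be run on the STFT of the microphone signals in just a few lines. The sketch below is a minimal, hypothetical example: the array geometry, the random data standing in for a real multichannel STFT, and the choice of the MUSIC algorithm are all illustrative.
import numpy as np
import pyroomacoustics as pra

fs, nfft = 16000, 256
# 4-microphone linear array (2D coordinates, one microphone per column)
R = pra.linear_2D_array([0, 0], 4, 0, 0.04)
# stand-in for the STFT of the microphone signals,
# shape (n_mics, nfft // 2 + 1, n_frames)
X = np.random.randn(4, nfft // 2 + 1, 20) + 1j * np.random.randn(4, nfft // 2 + 1, 20)

doa = pra.doa.MUSIC(R, fs, nfft)
doa.locate_sources(X)
print('Estimated azimuth:', doa.azimuth_recon)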
Quick Install¶
Install the package with pip:
$ pip install pyroomacoustics
The requirements are:
* numpy
* scipy
* matplotlib
Example¶
Here is a quick example of how to create and visualize the response of a beamformer in a room.
import numpy as np
import matplotlib.pyplot as plt
import pyroomacoustics as pra
# Create a 4 by 6 metres shoe box room
room = pra.ShoeBox([4,6])
# Add a source somewhere in the room
room.add_source([2.5, 4.5])
# Create a linear array beamformer with 4 microphones
# with angle 0 degrees and inter mic distance 4 cm
R = pra.linear_2D_array([2, 1.5], 4, 0, 0.04)
room.add_microphone_array(pra.Beamformer(R, room.fs))
# Now compute the delay and sum weights for the beamformer
room.mic_array.rake_delay_and_sum_weights(room.sources[0][:1])
# plot the room and resulting beamformer
room.plot(freq=[1000, 2000, 4000, 8000], img_order=0)
plt.show()
A comprehensive set of examples covering most of the functionalities of the package can be found in the examples folder of the GitHub repository.
Authors¶
- Robin Scheibler
- Ivan Dokmanić
- Sidney Barthe
- Eric Bezzam
- Hanjie Pan
How to contribute¶
If you would like to contribute, please clone the repository and send a pull request.
For more details, see our CONTRIBUTING page.
Academic publications¶
This package was developed to support academic publications. The package contains implementations of DOA algorithms and acoustic beamformers introduced in the following papers.
- H. Pan, R. Scheibler, I. Dokmanic, E. Bezzam and M. Vetterli. FRIDA: FRI-based DOA estimation for arbitrary array layout, ICASSP 2017, New Orleans, USA, 2017.
- I. Dokmanić, R. Scheibler and M. Vetterli. Raking the Cocktail Party, in IEEE Journal of Selected Topics in Signal Processing, vol. 9, num. 5, p. 825 - 836, 2015.
- R. Scheibler, I. Dokmanić and M. Vetterli. Raking Echoes in the Time Domain, ICASSP 2015, Brisbane, Australia, 2015.
If you use this package in your own research, please cite our paper describing it.
R. Scheibler, E. Bezzam, I. Dokmanić, Pyroomacoustics: A Python package for audio room simulations and array processing algorithms, Proc. IEEE ICASSP, Calgary, CA, 2018.
License¶
Copyright (c) 2014-2017 EPFL-LCAV
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Table of contents¶
Contributing¶
If you want to contribute to pyroomacoustics
and make it better,
your help is very welcome. Contributing is also a great way to learn
more about the package itself.
Ways to contribute¶
- File bug reports
- Improvements to the documentation are always more than welcome. Keeping a good clean documentation is a challenging task and any help is appreciated.
- Feature requests
- If you implemented an extra DOA/adaptive filter/beamforming algorithm: that’s awesome! We’d love to add it to the package.
- Suggestions of improvements to the code base are also welcome.
Coding style¶
We try to stick to PEP8 as much as possible. Variables, functions, modules and packages should be in lowercase with underscores. Class names in CamelCase.
Documentation¶
Docstrings should follow the numpydoc style.
We recommend the following steps for generating the documentation:
- Create a separate environment, e.g. with Anaconda: conda create -n mkdocs27 python=2.7 sphinx numpydoc mock
- Switch to the environment: source activate mkdocs27
- Install the theme for ReadTheDocs: pip install sphinx-rtd-theme
- Navigate to the docs folder and run: ./make_apidoc.sh
- Build and view the documentation locally with: make html
- Open docs/_build/html/index.html in your browser.
Unit Tests¶
As much as possible, for every new function added to the code base, add a short test script in pyroomacoustics/tests. The names of the script and the functions running the test should be prefixed by test_. The tests are started by running nosetests at the root of the package.
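A minimal sketch of what such a test file might look like (the file name test_shoebox.py and the checked quantities are illustrative, not an existing test):
import numpy as np
import pyroomacoustics as pra

def test_shoebox_rir_is_created():
    room = pra.ShoeBox([4, 6], fs=16000, max_order=4)
    room.add_source([2.5, 4.5])
    room.add_microphone_array(pra.MicrophoneArray(np.c_[[2.0, 1.5]], room.fs))
    room.compute_rir()
    # one list of RIRs per microphone, one RIR per source
    assert len(room.rir) == 1
    assert len(room.rir[0]) == 1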
How to make a clean pull request¶
Look for a project’s contribution instructions. If there are any, follow them.
- Create a personal fork of the project on Github.
- Clone the fork on your local machine. Your remote repo on Github is called origin.
- Add the original repository as a remote called upstream.
- If you created your fork a while ago, be sure to pull upstream changes into your local repository.
- Create a new branch to work on! Branch from develop if it exists, else from master.
- Implement/fix your feature, comment your code.
- Follow the code style of the project, including indentation.
- If the project has tests, run them!
- Write or adapt tests as needed.
- Add or change the documentation as needed.
- Squash your commits into a single commit with git’s interactive rebase. Create a new branch if necessary.
- Push your branch to your fork on Github, the remote origin.
- From your fork, open a pull request in the correct branch. Target the project’s develop branch if there is one, else go for master!
- …
- If the maintainer requests further changes, just push them to your branch. The PR will be updated automatically.
- Once the pull request is approved and merged, you can pull the changes from upstream to your local repo and delete your extra branch(es).
And last but not least: Always write your commit messages in the present tense. Your commit message should describe what the commit, when applied, does to the code – not what you did to the code.
Reference¶
This guide is based on the nice template by @MarcDiethelm available under MIT License.
Changelog¶
All notable changes to pyroomacoustics will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
Unreleased¶
Added¶
- Added noise reduction sub-package denoise with spectral subtraction class and example.
- Renamed realtime to transform and added a deprecation warning.
- Added a cython function to efficiently compute the fractional delays in the room impulse response from time delays and attenuations.
- notebooks folder.
- Demo IPython notebook (with WAV files) of several features of the package.
- Wrapper for Google’s Speech Commands Dataset and an example usage script in examples.
- Lots of new features in the pyroomacoustics.realtime subpackage:
- The STFT class can now be used both for frame-by-frame processing and for bulk processing
- The functionality will replace the methods pyroomacoustics.stft, pyroomacoustics.istft, pyroomacoustics.overlap_add, etc.
- The new function pyroomacoustics.realtime.compute_synthesis_window computes the optimal synthesis window given an analysis window and the frame shift
- Extensive tests for the pyroomacoustics.realtime module
- Convenience functions pyroomacoustics.realtime.analysis and pyroomacoustics.realtime.synthesis with an interface similar to pyroomacoustics.stft and pyroomacoustics.istft (which are now deprecated and will disappear soon)
- The ordering of axes in the output from the bulk STFT is now (n_frames, n_frequencies, n_channels)
- Support for Intel’s mkl_fft package
- The axis (along which to perform the DFT) and bits parameters for the DFT class.
Changed¶
- Improved documentation and docstrings
- Now using the built-in RIR generator in examples/doa_algorithms.py
- Improved the download/uncompress function for large datasets
- Dusted off the code for plotting on the sphere in pyroomacoustics.doa.grid.GridSphere
Deprecation Notice¶
- The methods pyroomacoustics.stft, pyroomacoustics.istft, pyroomacoustics.overlap_add, etc., are now deprecated and will be removed in the near future
0.1.18 - 2018-04-24¶
Added¶
- Added AuxIVA (independent vector analysis) to the bss subpackage.
- Added BSS IVA example
Changed¶
- Moved the Trinicon blind source separation algorithm to the bss subpackage.
Bugfix¶
- Corrected a bug that caused 1st order sources to be generated for max_order==0 in the pure python code
0.1.17 - 2018-03-23¶
Bugfix¶
- Fixed issue #22 on github. Added INCREF before returning Py_None in C extension.
0.1.16 - 2018-03-06¶
Added¶
- Base classes for Dataset and Sample in pyroomacoustics.datasets
- Methods to filter datasets according to the metadata of samples
- Deprecation warning for the TimitCorpus interface
Changed¶
- Added list of speakers and sentences from CMU ARCTIC
- CMUArcticDatabase basedir is now the top directory where the CMU_ARCTIC database should be saved, not the directory above as it previously was.
- Libroom C extension is now a proper package. It can be imported.
- Libroom C extension now compiles on Windows with Python >= 3.5.
Room Simulation¶
Room¶
The three main classes are pyroomacoustics.room.Room, pyroomacoustics.soundsource.SoundSource, and pyroomacoustics.beamforming.MicrophoneArray. On a high level, a
simulation scenario is created by first defining a room to which a few sound
sources and a microphone array are attached. The actual audio is attached to
the source as raw audio samples. The image source method (ISM) is then used to
find all image sources up to a maximum specified order and room impulse
responses (RIR) are generated from their positions. The microphone signals are
then created by convolving the audio samples emitted by the sources with the
appropriate RIR. Since the simulation is done on discrete-time signals, a
sampling frequency is specified for the room and the sources it contains.
Microphones can optionally operate at a different sampling frequency; a rate
conversion is done in this case.
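As a quick illustration of the rate conversion, the microphone array can be created with its own sampling frequency (a minimal sketch; the values are illustrative):
import numpy as np
import pyroomacoustics as pra

# the room and its sources run at 16 kHz
room = pra.ShoeBox([4, 6], fs=16000)
# the microphones run at 8 kHz; a rate conversion will be performed
mics = pra.MicrophoneArray(np.c_[[2.0, 1.5], [2.1, 1.5]], 8000)
room.add_microphone_array(mics)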
Simulating a Shoebox Room¶
We will first walk through the steps to simulate a shoebox-shaped room in 3D.
Create the room¶
So-called shoebox rooms are parallelepipedic rooms with 4 or 6 walls (in 2D and 3D, respectively), all at right angles. They are defined by a single vector that contains the lengths of the walls. They have the advantage of being simple to define and very efficient to simulate. A 9 m x 7.5 m x 3.5 m room is simply defined like this:
import pyroomacoustics as pra
room = pra.ShoeBox([9, 7.5, 3.5], fs=16000, absorption=0.35, max_order=17)
The second argument is the sampling frequency at which the RIR will be generated. Note that the default value of fs is 8 kHz. The third argument is the absorption of the walls, namely reflections are multiplied by (1 - absorption) for every wall they hit. The fourth argument is the maximum number of reflections allowed in the ISM.
The relationship between absorption/max_order and the reverberation time (the T60 or RT60 in the acoustics literature) is not straightforward. Sabine’s formula can be used to some extent to set these parameters.
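As a rough illustration, Sabine’s formula estimates T60 = 0.1611 * V / (S * absorption), where V is the room volume and S the total wall surface. The snippet below is a back-of-the-envelope sketch, not a pyroomacoustics function; the constant 0.1611 s/m assumes a speed of sound of about 343 m/s.
import numpy as np

dims = np.array([9.0, 7.5, 3.5])   # room dimensions in metres
V = np.prod(dims)                  # volume
S = 2 * (dims[0] * dims[1] + dims[1] * dims[2] + dims[0] * dims[2])  # total wall surface
absorption = 0.35
t60 = 0.1611 * V / (S * absorption)
print('Sabine estimate of T60: {:.2f} s'.format(t60))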
Add sources and microphones¶
Sources are fairly straightforward to create. They take their location as a single mandatory argument, and a signal and start time as optional arguments. Here we create a source located at [2.5, 3.73, 1.76] within the room, that will utter the content of the wav file speech.wav starting at 1.3 s into the simulation.
# import a mono wavfile as the source signal
# the sampling frequency should match that of the room
from scipy.io import wavfile
_, audio = wavfile.read('speech.wav')
my_source = pra.SoundSource([2.5, 3.73, 1.76], signal=audio, delay=1.3)
# place the source in the room
room.add_source(my_source)
The locations of the microphones in the array should be provided in a numpy nd-array of size (ndim, nmics), that is, each column contains the coordinates of one microphone. This array is used to construct a pyroomacoustics.beamforming.MicrophoneArray object, together with the sampling frequency for the microphone. Note that it can be different from that of the room, in which case resampling will occur. Here, we create an array with two microphones placed at [6.3, 4.87, 1.2] and [6.3, 4.93, 1.2].
# define the location of the array
import numpy as np
R = np.c_[
[6.3, 4.87, 1.2], # mic 1
[6.3, 4.93, 1.2], # mic 2
]
# the fs of the microphones is the same as the room
mic_array = pra.MicrophoneArray(R, room.fs)
# finally place the array in the room
room.add_microphone_array(mic_array)
A number of routines exist to create regular array geometries in 2D.
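For example, linear and circular geometries can be created as follows (a short sketch; the positions and spacings are illustrative):
import pyroomacoustics as pra

# 4 microphones on a line centered at [2, 1.5], at angle 0, spaced 4 cm apart
R_lin = pra.linear_2D_array([2.0, 1.5], 4, 0.0, 0.04)
# 8 microphones on a circle of radius 5 cm centered at [2, 1.5],
# with the first microphone at angle 0
R_circ = pra.circular_2D_array([2.0, 1.5], 8, 0.0, 0.05)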
Create the Room Impulse Response¶
At this point, the RIRs are simply created by invoking the ISM via pyroomacoustics.room.Room.image_source_model(). This function will generate all the image sources up to the order required and use them to generate the RIRs, which will be stored in the rir attribute of room.
The attribute rir is a list of lists such that the outer list is over microphones and the inner list over sources.
room.compute_rir()
# plot the RIR between mic 1 and source 0
import matplotlib.pyplot as plt
plt.plot(room.rir[1][0])
plt.show()
Simulate sound propagation¶
By calling pyroomacoustics.room.Room.simulate(), a convolution of the signal of each source (if not None) will be performed with the corresponding room impulse response. The outputs from the convolutions will be summed up at the microphones. The result is stored in the signals attribute of room.mic_array, with each row corresponding to one microphone.
room.simulate()
# plot signal at microphone 1
plt.plot(room.mic_array.signals[1,:])
Example¶
import numpy as np
import matplotlib.pyplot as plt
import pyroomacoustics as pra
# Create a 4 by 6 metres shoe box room
room = pra.ShoeBox([4,6])
# Add a source somewhere in the room
room.add_source([2.5, 4.5])
# Create a linear array beamformer with 4 microphones
# with angle 0 degrees and inter mic distance 4 cm
R = pra.linear_2D_array([2, 1.5], 4, 0, 0.04)
room.add_microphone_array(pra.Beamformer(R, room.fs))
# Now compute the delay and sum weights for the beamformer
room.mic_array.rake_delay_and_sum_weights(room.sources[0][:1])
# plot the room and resulting beamformer
room.plot(freq=[1000, 2000, 4000, 8000], img_order=0)
plt.show()
class pyroomacoustics.room.Room(walls, fs=8000, t0=0.0, max_order=1, sigma2_awgn=None, sources=None, mics=None)¶
Bases: object
A Room object has as attributes a collection of pyroomacoustics.wall.Wall objects, a pyroomacoustics.beamforming.MicrophoneArray array, and a list of pyroomacoustics.soundsource.SoundSource. The room can be two dimensional (2D), in which case the walls are simply line segments. A factory method pyroomacoustics.room.Room.from_corners() can be used to create the room from a polygon. In three dimensions (3D), the walls are two dimensional polygons, namely a collection of points lying on a common plane. Creating rooms in 3D is more tedious and for convenience a method pyroomacoustics.room.Room.extrude() is provided to lift a 2D room into 3D space by adding vertical walls and a parallel “ceiling” (see Figure 4b).
The Room class is sub-classed by pyroomacoustics.room.ShoeBox, which creates a rectangular (2D) or parallelepipedic (3D) room. Such rooms benefit from an efficient algorithm for the image source method.
Attributes:
- walls: (Wall array) list of walls forming the room
- fs: (int) sampling frequency
- t0: (float) time offset
- max_order: (int) the maximum computed order for images
- sigma2_awgn: (float) ambient additive white gaussian noise level
- sources: (SoundSource array) list of sound sources
- mics: (MicrophoneArray) array of microphones
- normals: (numpy.ndarray 2xN or 3xN, N=number of walls) array containing the normal vector of each wall, used for calculations
- corners: (numpy.ndarray 2xN or 3xN, N=number of walls) array containing a point belonging to each wall, used for calculations
- absorption: (numpy.ndarray size N, N=number of walls) array containing the absorption factor of each wall, used for calculations
- dim: (int) dimension of the room (2 or 3, meaning 2D or 3D)
- wallsId: (int dictionary) stores the mapping “wall name -> wall id (in the array walls)”
- add_microphone_array(micArray)¶
- add_source(position, signal=None, delay=0)¶
- check_visibility_for_all_images(source, p, use_libroom=True)¶ Checks visibility from a given point for all images of the given source.
This function tests visibility for all images of the source and returns the results in an array.
Parameters: - source – (SoundSource) the sound source object (containing all its images)
- p – (np.array size 2 or 3) coordinates of the point where we check visibility
Returns: (int array) list of visibility results for each image: -1 = unchecked (only during execution of the function), 0 (False) = not visible, 1 (True) = visible
- compute_rir()¶ Compute the room impulse response between every source and microphone
- convex_hull()¶ Finds the walls that are not in the convex hull
- direct_snr(x, source=0)¶ Computes the direct Signal-to-Noise Ratio
- extrude(height, v_vec=None, absorption=0.0)¶ Creates a 3D room by extruding a 2D polygon. The polygon is typically the floor of the room and will have z-coordinate zero. The ceiling is created as a copy of the floor translated along the extrusion direction.
Parameters: - height (float) – The extrusion height
- v_vec (array-like 1D length 3, optional) – A unit vector giving the orientation of the extrusion direction. The ceiling will be placed as a translation of the floor with respect to this vector (the default is [0,0,1]).
- absorption (float or array-like) – Absorption coefficients for all the walls. If a scalar, then all the walls will have the same absorption. If an array is given, it should have as many elements as there will be walls, that is the number of vertices of the polygon plus two. The last two elements are for the floor and the ceiling, respectively. (default 0)
- first_order_images(source_position)¶
- classmethod from_corners(corners, absorption=0.0, fs=8000, t0=0.0, max_order=1, sigma2_awgn=None, sources=None, mics=None)¶ Creates a 2D room by giving an array of corners.
Parameters: - corners – (np.array dim 2xN, N>2) list of corners, must be anti-clockwise oriented
- absorption – (float array or float) list of absorption factors, one for each wall, or a single value for all walls
Returns: (Room) instance of a 2D room
- get_bbox()¶ Returns a bounding box for the room
- get_wall_by_name(name)¶ Returns the instance of the wall by giving its name.
Parameters: name – (string) name of the wall
Returns: (Wall) instance of the wall with this name
- image_source_model(use_libroom=True)¶
- is_inside(p, include_borders=True)¶ Checks if the given point is inside the room.
Parameters: - p (array_like, length 2 or 3) – point to be tested
- include_borders (bool, optional) – set to True if a point on the wall must be considered inside the room
Returns: True if the given point is inside the room, False otherwise.
- is_obstructed(source, p, imageId=0)¶ Checks if there is a wall obstructing the line of sight going from a source to a point.
Parameters: - source – (SoundSource) the sound source (containing all its images)
- p – (np.array size 2 or 3) coordinates of the point where we check obstruction
- imageId – (int) id of the image within the SoundSource object
Returns: (bool) True (1) if obstructed, False (0) otherwise
- is_visible(source, p, imageId=0)¶ Returns True if the given sound source (with image source id) is visible from point p.
Parameters: - source – (SoundSource) the sound source (containing all its images)
- p – (np.array size 2 or 3) coordinates of the point where we check visibility
- imageId – (int) id of the image within the SoundSource object
Returns: (bool) True (1) if visible, False (0) otherwise
- make_c_room()¶ Wrapper around the C libroom
- plot(img_order=None, freq=None, figsize=None, no_axis=False, mic_marker_size=10, **kwargs)¶ Plots the room with its walls, microphones, sources and images
- plot_rir(FD=False)¶
- print_wall_sequences(source)¶
- simulate(recompute_rir=False)¶ Simulates the microphone signal at every microphone in the array
class pyroomacoustics.room.ShoeBox(p, fs=8000, t0=0.0, absorption=0.0, max_order=1, sigma2_awgn=None, sources=None, mics=None)¶
Bases: pyroomacoustics.room.Room
This class extends Room for shoebox rooms in 2D and 3D space.
- extrude(height)¶ Overloads the extrude method for 3D rooms
Dataset Wrappers¶
Module contents¶
The datasets sub-package delivers wrappers around a few popular audio datasets to make them easier to use.
Two base classes, pyroomacoustics.datasets.base.Dataset and pyroomacoustics.datasets.base.Sample, wrap together the audio samples and their metadata.
The general idea is to create a sample object with an attribute
containing all metadata. Dataset objects that have a collection
of samples can then be created and can be filtered according
to the values in the metadata.
Many of the functions with match or filter will take an arbitrary number of keyword arguments. The keys should match some metadata in the samples. Then there are three ways that a match occurs between a key/value pair and an attribute sharing the same key:
1. value == attribute
2. value is a list and attribute in value == True
3. value is a callable (a function) and value(attribute) == True
Example 1¶
from pyroomacoustics.datasets import Dataset, Sample

# Prepare a few artificial samples
samples = [
{
'data' : 0.99,
'metadata' : { 'speaker' : 'alice', 'sex' : 'female', 'age' : 37, 'number' : 'one' },
},
{
'data' : 2.1,
'metadata' : { 'speaker' : 'alice', 'sex' : 'female', 'age' : 37, 'number' : 'two' },
},
{
'data' : 1.02,
'metadata' : { 'speaker' : 'bob', 'sex' : 'male', 'age' : 48, 'number' : 'one' },
},
{
'data' : 2.07,
'metadata' : { 'speaker' : 'bob', 'sex' : 'male', 'age' : 48, 'number' : 'two' },
},
]
corpus = Dataset()
for s in samples:
    new_sample = Sample(s['data'], **s['metadata'])
    corpus.add_sample(new_sample)
# Then, it is possible to display summary info about the corpus
print(corpus)
# The number of samples in the corpus is given by ``len``
print('Number of samples:', len(corpus))
# And we can access samples with the slice operator
print('Sample #2:')
print(corpus[2]) # (shortcut for `corpus.samples[2]`)
# We can obtain a new corpus with only male subjects
corpus_male_only = corpus.filter(sex='male')
print(corpus_male_only)
# Only retain speakers above 40 years old
corpus_older = corpus.filter(age=lambda a : a > 40)
print(corpus_older)
Example 2 (CMU ARCTIC)¶
# This example involves the CMU ARCTIC corpus available at
# http://www.festvox.org/cmu_arctic/
import matplotlib.pyplot as plt
import pyroomacoustics as pra
# Here, the corpus for speaker bdl is automatically downloaded
# if it is not available already
corpus = pra.datasets.CMUArcticCorpus(download=True, speaker=['bdl'])
# print dataset info and 10 sentences
print(corpus)
corpus.head(n=10)
# let's extract all samples containing the word 'what'
keyword = 'what'
matches = corpus.filter(text=lambda t : keyword in t)
print('The number of sentences containing "{}": {}'.format(keyword, len(matches)))
for s in matches.sentences:
    print(' *', s)
# if the sounddevice package is available, we can play the sample
matches[0].play()
# show the spectrogram
plt.figure()
matches[0].plot()
plt.show()
Example 3 (Google’s Speech Commands Dataset)¶
# This example involves Google's Speech Commands Dataset available at
# https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html
import matplotlib.pyplot as plt
import pyroomacoustics as pra
# The dataset is automatically downloaded if not available and 10 of each word is selected
dataset = pra.datasets.GoogleSpeechCommands(download=True, subset=10, seed=0)
# print dataset info, first 10 entries, and all sounds
print(dataset)
dataset.head(n=10)
print("All sounds in the dataset:")
print(dataset.classes)
# filter by specific word
selected_word = 'yes'
matches = dataset.filter(word=selected_word)
print("Number of '%s' samples : %d" % (selected_word, len(matches)))
# if the sounddevice package is available, we can play the sample
matches[0].play()
# show the spectrogram
plt.figure()
matches[0].plot()
plt.show()
Datasets Available¶
CMU ARCTIC Corpus¶
The CMU ARCTIC Dataset¶
The CMU_ARCTIC databases were constructed at the Language Technologies Institute at Carnegie Mellon University as phonetically balanced, US English single speaker databases designed for unit selection speech synthesis research. A detailed report on the structure and content of the database and the recording environment etc is available as a Carnegie Mellon University, Language Technologies Institute Tech Report CMU-LTI-03-177 and is also available here.
The databases consist of around 1150 utterances carefully selected from out-of-copyright texts from Project Gutenberg. The databases include US English male (bdl) and female (slt) speakers (both experienced voice talent) as well as other accented speakers.
The 1132 sentence prompt list is available from cmuarctic.data
The distributions include 16 kHz waveforms and simultaneous EGG signals. Full phonetic labelling was performed by CMU Sphinx using the FestVox-based labelling scripts. Complete runnable Festival Voices are included with the database distributions as examples, though better voices can be made by improving the labelling, etc.
License: Permissive, attribution required
Price: Free
URL: http://www.festvox.org/cmu_arctic/
class pyroomacoustics.datasets.cmu_arctic.CMUArcticCorpus(basedir=None, download=False, build=True, **kwargs)¶
Bases: pyroomacoustics.datasets.base.Dataset
This class will load the CMU ARCTIC corpus in a structure amenable to processing.
- basedir¶ str, optional – The directory where the CMU ARCTIC corpus is located/downloaded. By default, this is the current directory.
- info¶ dict – A dictionary whose keys are the labels of metadata fields attached to the samples. The values are lists of all distinct values the field takes.
- sentences¶ list of CMUArcticSentence – The list of all utterances in the corpus
Parameters: - basedir (str, optional) – The directory where the CMU ARCTIC corpus is located/downloaded. By default, this is the current directory.
- download (bool, optional) – If the corpus does not exist, download it.
- speaker (str or list of str, optional) – A list of the CMU ARCTIC speakers labels. If provided, only those speakers are loaded. By default, all speakers are loaded.
- sex (str or list of str, optional) – Can be ‘female’ or ‘male’
- lang (str or list of str, optional) – The language, only ‘English’ is available here
- accent (str of list of str, optional) – The accent of the speaker
- build_corpus(**kwargs)¶ Build the corpus with some filters (sex, lang, accent, sentence_tag, sentence)
- filter(**kwargs)¶ Filter the corpus and select samples that match the criteria provided. The argument to each keyword can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True:
1. value == attribute
2. value is a list and attribute in value == True
3. value is a callable (a function) and value(attribute) == True
class pyroomacoustics.datasets.cmu_arctic.CMUArcticSentence(path, **kwargs)¶
Bases: pyroomacoustics.datasets.base.AudioSample
Create the sentence object
Parameters: - path (str) – the path to the audio file
- **kwargs – metadata as a list of keyword arguments
- data¶ array_like – The actual audio signal
- fs¶ int – sampling frequency
- plot(**kwargs)¶ Plot the spectrogram
Google Speech Commands¶
Google’s Speech Commands Dataset¶
The Speech Commands Dataset has 65,000 one-second long utterances of 30 short words, by thousands of different people, contributed by members of the public through the AIY website. It’s released under a Creative Commons BY 4.0 license.
More info about the dataset can be found at the link below:
https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html
AIY website for contributing recordings:
https://aiyprojects.withgoogle.com/open_speech_recording
Tutorial on creating a word classifier:
https://www.tensorflow.org/versions/master/tutorials/audio_recognition
class pyroomacoustics.datasets.google_speech_commands.GoogleSample(path, **kwargs)¶
Bases: pyroomacoustics.datasets.base.AudioSample
Create the sound object.
Parameters: - path (str) – the path to the audio file
- **kwargs – metadata as a list of keyword arguments
- data¶ array_like – the actual audio signal
- fs¶ int – sampling frequency
- plot(**kwargs)¶ Plot the spectrogram
class pyroomacoustics.datasets.google_speech_commands.GoogleSpeechCommands(basedir=None, download=False, build=True, subset=None, seed=0, **kwargs)¶
Bases: pyroomacoustics.datasets.base.Dataset
This class will load the Google Speech Commands Dataset in a structure that is convenient to process.
- basedir¶ str – The directory where the Speech Commands Dataset is located/downloaded.
- size_by_samples¶ dict – A dictionary whose keys are the words in the dataset. The values are the number of occurrences of each particular word.
- subdirs¶ list – The list of subdirectories in basedir, where each sound type is the name of a subdirectory.
- classes¶ list – The list of all sounds, same as the keys of size_by_samples.
Parameters: - basedir (str, optional) – The directory where the Google Speech Commands dataset is located/downloaded. By default, this is the current directory.
- download (bool, optional) – If the corpus does not exist, download it.
- build (bool, optional) – Whether or not to build the dataset. By default, it is.
- subset (int, optional) – Build a dataset that contains all noise samples and subset samples per word. By default, the dataset will be built with all samples.
- seed (int, optional) – Which seed to use for the random generator when selecting a subset of samples. By default, seed=0.
- build_corpus(subset=None, **kwargs)¶ Build the corpus with some filters (speech or not speech, sound type).
- filter(**kwargs)¶ Filter the dataset and select samples that match the criteria provided. The argument to each keyword can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True:
1. value == attribute
2. value is a list and attribute in value == True
3. value is a callable (a function) and value(attribute) == True
TIMIT Corpus¶
The TIMIT Dataset¶
The TIMIT corpus of read speech is designed to provide speech data for acoustic-phonetic studies and for the development and evaluation of automatic speech recognition systems. TIMIT contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences. The TIMIT corpus includes time-aligned orthographic, phonetic and word transcriptions as well as a 16-bit, 16kHz speech waveform file for each utterance. Corpus design was a joint effort among the Massachusetts Institute of Technology (MIT), SRI International (SRI) and Texas Instruments, Inc. (TI). The speech was recorded at TI, transcribed at MIT and verified and prepared for CD-ROM production by the National Institute of Standards and Technology (NIST).
The TIMIT corpus transcriptions have been hand verified. Test and training subsets, balanced for phonetic and dialectal coverage, are specified. Tabular computer-searchable information is included as well as written documentation.
Unfortunately, this is a proprietary dataset. A license can be obtained for $125 to $250, depending on your status (academic or otherwise).
Deprecation Warning: The interface of TimitCorpus will change in the near future to match that of pyroomacoustics.datasets.cmu_arctic.CMUArcticCorpus
URL: https://catalog.ldc.upenn.edu/ldc93s1
class pyroomacoustics.datasets.timit.Sentence(path)¶
Create the sentence object
Parameters: path (string) – the path to the particular sample
- speaker¶ str – Speaker initials
- id¶ str – a digit to disambiguate identical initials
- sex¶ str – Speaker gender (M or F)
- dialect¶ str – Speaker dialect region number:
1. New England
2. Northern
3. North Midland
4. South Midland
5. Southern
6. New York City
7. Western
8. Army Brat (moved around)
- fs¶ int – sampling frequency
- samples¶ array_like (n_samples,) – the audio track
- text¶ str – the text of the sentence
- words¶ list – list of Word objects forming the sentence
- phonems¶ list – List of phonemes contained in the sentence. Each element is a dictionary containing ‘bnd’ with the limits of the phoneme and ‘name’ with the phoneme transcription.
- play()¶ Play the sound sample
- plot(L=512, hop=128, zpb=0, phonems=False, **kwargs)¶
class pyroomacoustics.datasets.timit.TimitCorpus(basedir)¶
TimitCorpus class
Parameters: - basedir ((string)) – The location of the TIMIT database
- directories ((list of strings)) – The subdirectories containing the data ([‘TEST’,’TRAIN’])
- sentence_corpus ((dict)) – A dictionary that contains a list of Sentence objects for each sub-directory
- word_corpus ((dict)) – A dictionary that contains a list of Word objects for each sub-directory and word available in the corpus
- build_corpus(sentences=None, dialect_region=None, speakers=None, sex=None)¶ Build the corpus
The TIMIT database structure is encoded in the directory structure:
- basedir
- TEST/TRAIN
- Regional accent index (1 to 8)
- Speakers (one directory per speaker)
- Sentences (one file per sentence)
Parameters: - sentences ((list)) – A list containing the sentences to which we want to restrict the corpus Example: sentences=[‘SA1’,’SA2’]
- dialect_region ((list of int)) – A list to which we restrict the dialect regions Example: dialect_region=[1, 4, 5]
- speakers ((list)) – A list of speakers acronym to which we want to restrict the corpus Example: speakers=[‘AKS0’]
- sex ((string)) – Restrict to a single sex: ‘F’ for female, ‘M’ for male
- get_word(d, w, index=0)¶ Return instance index of word w from group (test or train) d
class pyroomacoustics.datasets.timit.Word(word, boundaries, data, fs, phonems=None)¶
A class used for words of the TIMIT corpus
- word¶ str – The spelling of the word
- boundaries¶ list – The limits of the word within the sentence
- samples¶ array_like – A view on the sentence samples containing the word
- fs¶ int – The sampling frequency
- phonems¶ list – A list of phones contained in the word
- features¶ array_like – A feature array (e.g. MFCC coefficients)
Parameters: - word (str) – The spelling of the word
- boundaries (list) – The limits of the word within the sentence
- data (array_like) – The nd-array that contains all the samples of the sentence
- fs (int) – The sampling frequency
- phonems (list, optional) – A list of phones contained in the word
- mfcc(frame_length=1024, hop=512)¶ Compute the mel-frequency cepstrum coefficients of the word samples
- play()¶ Play the sound sample
- plot()¶
Tools and Helpers¶
Base Class¶
Base class for some data corpus and the samples it contains.
class pyroomacoustics.datasets.base.AudioSample(data, fs, **kwargs)¶
Bases: pyroomacoustics.datasets.base.Sample
We add some methods specific to display and listen to audio samples. The sampling frequency of the samples is an extra parameter.
For multichannel audio, we assume the same format used by scipy.io.wavfile, that is, data is a 2D array with each column being a channel.
- data¶ array_like – The actual data
- fs¶ int – The sampling frequency of the input signal
- meta¶ pyroomacoustics.datasets.Meta – An object containing the sample metadata. They can be accessed using the dot operator
- play(**kwargs)¶ Play the sound sample. This function uses the sounddevice package for playback. It takes the same keyword arguments as sounddevice.play.
- plot(NFFT=512, noverlap=384, **kwargs)¶ Plot the spectrogram of the audio sample. It takes the same keyword arguments as matplotlib.pyplot.specgram.
class pyroomacoustics.datasets.base.Dataset¶
Bases: object
The base class for a data corpus. It is essentially a list of samples and a filter function.
- samples¶ list – A list of all the Samples in the dataset
- info¶ dict – This dictionary keeps track of all the fields in the metadata. The keys of the dictionary are the metadata field names. The values are again dictionaries, but with the keys being the possible values taken by the metadata and the associated value being the number of samples with this value in the corpus.
- add_sample(sample)¶ Add a sample to the Dataset and keep track of the metadata.
- add_sample_matching(sample, **kwargs)¶ The sample is added to the corpus only if all the keyword arguments match the metadata of the sample. The match is operated by pyroomacoustics.datasets.Meta.match.
- filter(**kwargs)¶ Filter the corpus and select samples that match the criteria provided. The argument to each keyword can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True:
1. value == attribute
2. value is a list and attribute in value == True
3. value is a callable (a function) and value(attribute) == True
- head(n=5)¶ Print n samples from the dataset
class pyroomacoustics.datasets.base.Meta(**attr)¶
Bases: object
A simple class that will take a dictionary as input and put the values in attributes named after the keys. We use it to store metadata for the samples.
The parameters can be any set of keyword arguments. They will all be transformed into attributes of the object.
- as_dict()¶ Returns all the attribute/value pairs of the object as a dictionary
- match(**kwargs)¶ The key/value pairs given by the keyword arguments are compared to the attribute/value pairs of the object. If the values all match, True is returned. Otherwise, False is returned. If a keyword argument has no attribute counterpart, an error is raised. Attributes that do not have a keyword argument counterpart are ignored.
There are three ways to match an attribute with keyword=value:
1. value == attribute
2. value is a list and attribute in value == True
3. value is a callable (a function) and value(attribute) == True
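A small illustration of these three matching rules (the metadata values are hypothetical):
from pyroomacoustics.datasets import Meta

m = Meta(speaker='alice', age=37)
print(m.match(speaker='alice'))           # 1. exact value match: True
print(m.match(speaker=['alice', 'bob']))  # 2. list membership: True
print(m.match(age=lambda a: a > 30))      # 3. callable: True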
class pyroomacoustics.datasets.base.Sample(data, **kwargs)¶
Bases: object
The base class for a dataset sample. The idea is that different corpora will have different attributes for their samples. They should at least have a data attribute.
- data¶ array_like – The actual data
- meta¶ pyroomacoustics.datasets.Meta – An object containing the sample metadata. They can be accessed using the dot operator
Dataset Utilities¶
pyroomacoustics.datasets.utils.download_uncompress(url, path='.', compression=None)¶ This function downloads and uncompresses on the fly a file of type tar, tar.gz, or tar.bz2.
Parameters: - url (str) – The URL of the file
- path (str, optional) – The path where to uncompress the file
- compression (str, optional) – The compression type (one of ‘bz2’, ‘gz’, ‘tar’), inferred from the url if not provided
Adaptive Filtering¶
Module contents¶
Adaptive Filter Algorithms¶
This sub-package provides implementations of popular adaptive filter algorithms.
- RLS: Recursive Least Squares
- LMS: Least Mean Squares and Normalized Least Mean Squares
All these classes derive from the base class pyroomacoustics.adaptive.adaptive_filter.AdaptiveFilter that offers a generic way of running an adaptive filter.
The above classes are applicable for time domain processing. For frequency domain adaptive filtering, there is the SubbandLMS class. After a DFT or STFT block, the SubbandLMS class can be used to apply LMS or NLMS to each frequency band. A shorter adaptive filter can be used on each band, as opposed to the long filter required in the time domain version. Roughly, a filter of M taps applied to each of B bands corresponds to a time domain filter with N = M x B taps.
How to use the adaptive filter module¶
First, an adaptive filter object is created and all the relevant options can be set (step size, regularization, etc). Then, the update function is repeatedly called to provide new samples to the algorithm.
import numpy as np
import pyroomacoustics

# example input: a random unknown filter and a noisy observation of its output
w = np.random.randn(30)
x = np.random.randn(100)
d = np.convolve(x, w)[:100] + 0.01 * np.random.randn(100)
# initialize the filter
rls = pyroomacoustics.adaptive.RLS(30)
# run the filter on a stream of samples
for i in range(100):
    rls.update(x[i], d[i])
# the reconstructed filter is available
print('Reconstructed filter:', rls.w)
The SubbandLMS class has the same methods as the time domain approaches. However, the signal must be in the frequency domain. This can be done with the STFT block in the transform sub-package of pyroomacoustics.
# initialize STFT and SubbandLMS blocks
block_size = 128
stft_x = pra.transform.STFT(N=block_size,
hop=block_size//2,
analysis_window=pra.hann(block_size))
stft_d = pra.transform.STFT(N=block_size,
hop=block_size//2,
analysis_window=pra.hann(block_size))
nlms = pra.adaptive.SubbandLMS(num_taps=6,
num_bands=block_size//2+1, mu=0.5, nlms=True)
# preparing input and reference signals
...
# apply block-by-block
for n in range(num_blocks):
    # obtain block
    ...
    # to frequency domain
    stft_x.analysis(x_block)
    stft_d.analysis(d_block)
    nlms.update(stft_x.X, stft_d.X)
# estimating input convolved with unknown response
y_hat = stft_d.synthesis(np.diag(np.dot(nlms.W.conj().T,stft_x.X)))
# AEC output
E = stft_d.X - np.diag(np.dot(nlms.W.conj().T,stft_x.X))
out = stft_d.synthesis(E)
Other Available Subpackages¶
pyroomacoustics.adaptive.data_structures
- provides data structures used by the adaptive filters, such as buffers, coin flippers, and precomputed powers (see Tools and Helpers below)
pyroomacoustics.adaptive.util
- a few methods, mainly to efficiently manipulate Toeplitz and Hankel matrices
Utilities¶
pyroomacoustics.adaptive.algorithms
- a dictionary containing all the available adaptive filter classes, indexed by the keys ['RLS', 'BlockRLS', 'BlockLMS', 'NLMS', 'SubbandLMS']
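For instance, an adaptive filter can be instantiated by name through this dictionary (a small sketch; the filter length is illustrative):
import pyroomacoustics as pra

# instantiate an NLMS filter of length 30 by looking up its name
nlms = pra.adaptive.algorithms['NLMS'](30)
print(type(nlms).__name__)  # 'NLMS'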
Algorithms¶
Adaptive Filter (Base)¶
class pyroomacoustics.adaptive.adaptive_filter.AdaptiveFilter(length)¶
The dummy base class of an adaptive filter. This class doesn’t compute anything. It merely stores values in a buffer. It is used as a template for all other algorithms.
- name()¶
- reset()¶ Reset the state of the adaptive filter
- update(x_n, d_n)¶ Updates the adaptive filter with a new sample
Parameters: - x_n (float) – the new input sample
- d_n (float) – the new noisy reference signal
Least Mean Squares¶
Least Mean Squares Family¶
Implementations of adaptive filters from the LMS class. These algorithms have a low complexity and reliable behavior with a somewhat slower convergence.
class pyroomacoustics.adaptive.lms.BlockLMS(length, mu=0.01, L=1, nlms=False)¶
Bases: pyroomacoustics.adaptive.lms.NLMS
Implementation of the least mean squares algorithm (LMS) in its block form
Parameters: - length (int) – the length of the filter
- mu (float, optional) – the step size (default 0.01)
- L (int, optional) – block size (default is 1)
- nlms (bool, optional) – whether or not to normalize as in NLMS (default is False)
- reset()¶ Reset the state of the adaptive filter
- update(x_n, d_n)¶ Updates the adaptive filter with a new sample
Parameters: - x_n (float) – the new input sample
- d_n (float) – the new noisy reference signal
class pyroomacoustics.adaptive.lms.NLMS(length, mu=0.5)¶
Bases: pyroomacoustics.adaptive.adaptive_filter.AdaptiveFilter
Implementation of the normalized least mean squares algorithm (NLMS)
Parameters: - length (int) – the length of the filter
- mu (float, optional) – the step size (default 0.5)
- update(x_n, d_n)¶ Updates the adaptive filter with a new sample
Parameters: - x_n (float) – the new input sample
- d_n (float) – the new noisy reference signal
Recursive Least Squares¶
Recursive Least Squares Family¶
Implementations of adaptive filters from the RLS class. These algorithms typically have a higher computational complexity, but a faster convergence.
class pyroomacoustics.adaptive.rls.BlockRLS(length, lmbd=0.999, delta=10, dtype=np.float32, L=None)¶
Bases: pyroomacoustics.adaptive.rls.RLS
Block implementation of the recursive least-squares (RLS) algorithm. The difference with the vanilla implementation is that chunks of the input signals are processed in batch and some savings can be made there.
Parameters: - length (int) – the length of the filter
- lmbd (float, optional) – the exponential forgetting factor (default 0.999)
- delta (float, optional) – the regularization term (default 10)
- dtype (numpy type) – the bit depth of the numpy arrays to use (default np.float32)
- L (int, optional) – the block size (defaults to length)
- reset()¶ Reset the state of the adaptive filter
- update(x_n, d_n)¶ Updates the adaptive filter with a new sample
Parameters: - x_n (float) – the new input sample
- d_n (float) – the new noisy reference signal
class pyroomacoustics.adaptive.rls.RLS(length, lmbd=0.999, delta=10, dtype=np.float32)¶
Bases: pyroomacoustics.adaptive.adaptive_filter.AdaptiveFilter
Implementation of the exponentially weighted Recursive Least Squares (RLS) adaptive filter algorithm.
Parameters: - length (int) – the length of the filter
- lmbd (float, optional) – the exponential forgetting factor (default 0.999)
- delta (float, optional) – the regularization term (default 10)
- dtype (numpy type) – the bit depth of the numpy arrays to use (default np.float32)
- reset()¶ Reset the state of the adaptive filter
- update(x_n, d_n)¶ Updates the adaptive filter with a new sample
Parameters: - x_n (float) – the new input sample
- d_n (float) – the new noisy reference signal
Subband LMS¶
pyroomacoustics.adaptive.subband_lms.Hermitian(X)¶ Compute and return the Hermitian transpose
class pyroomacoustics.adaptive.subband_lms.SubbandLMS(num_taps, num_bands, mu=0.5, nlms=True)¶
Frequency domain implementation of LMS. Adaptive filter for each subband.
Parameters: - num_taps (int) – length of the filter
- num_bands (int) – number of frequency bands, i.e. number of filters
- mu (float, optional) – step size for each subband (default 0.5)
- nlms (bool, optional) – whether or not to normalize as in NLMS (default is True)
- reset()¶
- update(X_n, D_n)¶ Updates the adaptive filters for each subband with the new block of input data.
Parameters: - X_n (numpy array, float) – new input signal (to the unknown system) in the frequency domain
- D_n (numpy array, float) – new noisy reference signal in the frequency domain
Tools and Helpers¶
Data Structures¶
class pyroomacoustics.adaptive.data_structures.Buffer(length=20, dtype=np.float64)¶
A simple buffer class with amortized cost
Parameters: - length (int) – buffer length
- dtype (numpy.type) – data type
- flush(n)¶ Removes the n oldest elements in the buffer
- push(val)¶ Add one element at the front of the buffer
- size()¶ Returns the number of elements in the buffer
- top(n)¶ Returns the n elements at the front of the buffer, from newest to oldest
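A quick usage sketch of the buffer (the values are illustrative):
from pyroomacoustics.adaptive.data_structures import Buffer

buf = Buffer(length=5)
for v in [1.0, 2.0, 3.0]:
    buf.push(v)       # newest element goes to the front
print(buf.top(2))     # the two most recent values, newest first
buf.flush(1)          # drop the oldest element
print(buf.size())     # 2 elements remain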
class pyroomacoustics.adaptive.data_structures.CoinFlipper(p, length=10000)¶
This class efficiently generates a large number of coin flips. Because each call to numpy.random.rand is a little bit costly, it is more efficient to generate many values at once. This class does this and stores the values in advance. It generates new fresh numbers when needed.
Parameters: - p (float, 0 < p < 1) – probability to output a 1
- length (int) – the number of flips to precompute
- flip(n)¶ Get n random binary values from the buffer
- flip_all()¶ Regenerates all the used-up values
- fresh_flips(n)¶ Generates n binary random values now
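A short usage sketch (the probability and number of flips are illustrative):
from pyroomacoustics.adaptive.data_structures import CoinFlipper

flipper = CoinFlipper(0.3)  # each flip is 1 with probability 0.3
flips = flipper.flip(10)    # draw 10 precomputed flips
print(flips)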
class pyroomacoustics.adaptive.data_structures.Powers(a, length=20, dtype=np.float32)¶
This class allows one to store all the powers of a small number and get them ‘a la numpy’ with the bracket operator. The set of stored powers is automatically extended when new values are requested.
Parameters: - a (float) – the number
- length (int) – the number of integer powers
- dtype (numpy.type, optional) – the data type (typically np.float32 or np.float64)
Example
>>> an = Powers(0.5)
>>> print(an[4])
0.0625
Utilities¶
pyroomacoustics.adaptive.util.autocorr(x)¶ Fast autocorrelation computation using the FFT
pyroomacoustics.adaptive.util.hankel_multiplication(c, r, A, mkl=True, **kwargs)¶ Compute numpy.dot(scipy.linalg.hankel(c, r=r), A) using the FFT.
Parameters: - c (ndarray) – the first column of the Hankel matrix
- r (ndarray) – the last row of the Hankel matrix
- A (ndarray) – the matrix to multiply on the right
- mkl (bool, optional) – if True, use the mkl_fft package if available
pyroomacoustics.adaptive.util.hankel_stride_trick(x, shape)¶ Make a Hankel matrix from a vector using stride tricks
Parameters: - x (ndarray) – a vector that contains the concatenation of the first column and first row of the Hankel matrix to build without repetition of the lower left corner value of the matrix
- shape (tuple) – the shape of the Hankel matrix to build, it must satisfy
x.shape[0] == shape[0] + shape[1] - 1
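A small sketch illustrating the shape constraint above (x.shape[0] == shape[0] + shape[1] - 1):
import numpy as np
from pyroomacoustics.adaptive.util import hankel_stride_trick

x = np.arange(5)  # first column [0, 1, 2] then first row [2, 3, 4], corner not repeated
H = hankel_stride_trick(x, (3, 3))
print(H)  # H[i, j] == x[i + j], i.e. constant anti-diagonals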
pyroomacoustics.adaptive.util.mkl_toeplitz_multiplication(c, r, A, A_padded=False, out=None, fft_len=None)¶ Compute numpy.dot(scipy.linalg.toeplitz(c, r), A) using the FFT from the mkl_fft package.
Parameters: - c (ndarray) – the first column of the Toeplitz matrix
- r (ndarray) – the first row of the Toeplitz matrix
- A (ndarray) – the matrix to multiply on the right
- A_padded (bool, optional) – the A matrix can be pre-padded with zeros by the user, if this is the case set to True
- out (ndarray, optional) – an ndarray to store the output of the multiplication
- fft_len (int, optional) – specify the length of the FFT to use
pyroomacoustics.adaptive.util.naive_toeplitz_multiplication(c, r, A)¶ Compute numpy.dot(scipy.linalg.toeplitz(c, r), A)
Parameters: - c (ndarray) – the first column of the Toeplitz matrix
- r (ndarray) – the first row of the Toeplitz matrix
- A (ndarray) – the matrix to multiply on the right
pyroomacoustics.adaptive.util.toeplitz_multiplication(c, r, A, **kwargs)¶ Compute numpy.dot(scipy.linalg.toeplitz(c, r), A) using the FFT.
Parameters: - c (ndarray) – the first column of the Toeplitz matrix
- r (ndarray) – the first row of the Toeplitz matrix
- A (ndarray) – the matrix to multiply on the right
pyroomacoustics.adaptive.util.toeplitz_opt_circ_approx(r, matrix=False)¶ Optimal circulant approximation of a symmetric Toeplitz matrix, by Tony F. Chan
Parameters: - r (ndarray) – the first row of the symmetric Toeplitz matrix
- matrix (bool, optional) – if True, the full symmetric Toeplitz matrix is returned; otherwise, only the first column
pyroomacoustics.adaptive.util.toeplitz_strang_circ_approx(r, matrix=False)¶ Circulant approximation of a symmetric Toeplitz matrix, by Gil Strang
Parameters: - r (ndarray) – the first row of the symmetric Toeplitz matrix
- matrix (bool, optional) – if True, the full symmetric Toeplitz matrix is returned; otherwise, only the first column
Blind Source Separation¶
Module contents¶
Blind Source Separation¶
Implementations of a few blind source separation (BSS) algorithms.
A few commonly used functions, such as projection back, can be found in pyroomacoustics.bss.common.
References
[1] N. Ono, Stable and fast update rules for independent vector analysis based on auxiliary function technique, Proc. IEEE WASPAA, 2011.
[2] R. Aichner, H. Buchner, F. Yan, and W. Kellermann, A real-time blind source separation scheme and its application to reverberant and noisy acoustic environments, Signal Processing, 86(6), 1260-1277, doi:10.1016/j.sigpro.2005.06.022, 2006.
Algorithms¶
Independent Vector Analysis (AuxIVA)¶
Blind Source Separation using Independent Vector Analysis with Auxiliary Function
2018 (c) Robin Scheibler, MIT License
pyroomacoustics.bss.auxiva.auxiva(X, n_src=None, n_iter=20, proj_back=True, W0=None, f_contrast=None, f_contrast_args=[], return_filters=False, callback=None)¶ Implementation of the AuxIVA algorithm for BSS presented in
N. Ono, Stable and fast update rules for independent vector analysis based on auxiliary function technique, Proc. IEEE WASPAA, 2011.
Parameters: - X (ndarray (nframes, nfrequencies, nchannels)) – STFT representation of the signal
- n_src (int, optional) – The number of sources or independent components
- n_iter (int, optional) – The number of iterations (default 20)
- proj_back (bool, optional) – Scaling on first mic by back projection (default True)
- W0 (ndarray (nfrequencies, nchannels, nchannels), optional) – Initial value for demixing matrix
- f_contrast (dict of functions) – A dictionary with two elements, ‘f’ and ‘df’, containing the contrast function and its derivative, taking 3 arguments. This should be a ufunc acting element-wise on any array
- return_filters (bool) – If true, the function will return the demixing matrix too
- callback (func) – A callback function called every 10 iterations, useful to monitor convergence
Returns: An (nframes, nfrequencies, nsources) array. Also returns the demixing matrix (nfrequencies, nchannels, nsources) if the return_filters keyword is True.
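A minimal usage sketch, with random data standing in for the STFT of an actual multichannel recording:

import numpy as np
from pyroomacoustics.bss.auxiva import auxiva

# random stand-in for the STFT of a 3-channel mixture
nframes, nfreq, nchannels = 100, 257, 3
X = np.random.randn(nframes, nfreq, nchannels) \
    + 1j * np.random.randn(nframes, nfreq, nchannels)

# separate as many sources as channels; Y has shape (nframes, nfreq, nsources)
Y = auxiva(X, n_iter=20, proj_back=True)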
Trinicon¶
-
pyroomacoustics.bss.trinicon.
trinicon
(signals, w0=None, filter_length=2048, block_length=None, n_blocks=8, alpha_on=None, j_max=10, delta_max=0.0001, sigma2_0=1e-07, mu=0.001, lambd_a=0.2, return_filters=False)¶ Implementation of the TRINICON Blind Source Separation algorithm as described in
R. Aichner, H. Buchner, F. Yan, and W. Kellermann, A real-time blind source separation scheme and its application to reverberant and noisy acoustic environments, Signal Processing, 86(6), 1260-1277, doi:10.1016/j.sigpro.2005.06.022, 2006.
Specifically, adaptation of the pseudo-code from Table 1.
The implementation is hard-coded for 2 output channels.
Parameters: - signals (ndarray (nchannels, nsamples)) – The microphone input signals (time domain)
- w0 (ndarray (nchannels, nsources, nsamples), optional) – Optional initial value for the demixing filters
- filter_length (int, optional) – The length of the demixing filters, if w0 is provided, this option is ignored
- block_length (int, optional) – Block length (default 2x filter_length)
- n_blocks (int, optional) – Number of blocks processed at once (default 8)
- alpha_on (int, optional) – Online overlap factor (default n_blocks)
- j_max (int, optional) – Number of offline iterations (default 10)
- delta_max (float, optional) – Regularization parameter, this sets the maximum value of the regularization term (default 1e-4)
- sigma2_0 (float, optional) – Regularization parameter, this sets the reference (machine) noise level in the regularization (default 1e-7)
- mu (float, optional) – Offline update step size (default 0.001)
- lambd_a (float, optional) – Online forgetting factor (default 0.2)
- return_filters (bool) – If true, the function will return the demixing matrix too (default False)
Returns: An (nsources, nsamples) array. Also returns the demixing matrix (nchannels, nsources, nsamples) if the return_filters keyword is True.
Return type: ndarray
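A minimal usage sketch; random noise stands in for a real two-channel recording since the algorithm operates directly on time-domain signals:

import numpy as np
from pyroomacoustics.bss.trinicon import trinicon

# stand-in for a two-channel, four-second recording at 16 kHz
signals = np.random.randn(2, 4 * 16000)

# y has shape (2, nsamples); the implementation is hard-coded for 2 outputs
y = trinicon(signals)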
Common Tools¶
Common Functions used in BSS algorithms
2018 (c) Robin Scheibler, MIT License
-
pyroomacoustics.bss.common.
projection_back
(Y, ref, clip_up=None, clip_down=None)¶ This function computes the frequency-domain filter that minimizes the squared error to a reference signal. This is commonly used to solve the scale ambiguity in BSS.
The optimal filter z minimizes the squared error.
\[\min E[|z^* y - x|^2]\]
It should thus satisfy the orthogonality condition and can be derived as follows
\[\begin{aligned}
0 &= E[y^*\, (z^* y - x)]\\
0 &= z^*\, E[|y|^2] - E[y^* x]\\
z^* &= \frac{E[y^* x]}{E[|y|^2]}\\
z &= \frac{E[y x^*]}{E[|y|^2]}
\end{aligned}\]
In practice, the expectations are replaced by the sample mean.
Parameters: - Y (array_like (n_frames, n_bins, n_channels)) – The STFT data to project back on the reference signal
- ref (array_like (n_frames, n_bins)) – The reference signal
- clip_up (float, optional) – Limits the maximum value of the gain (default no limit)
- clip_down (float, optional) – Limits the minimum value of the gain (default no limit)
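The function is typically called after separation to fix the arbitrary per-frequency scaling. A sketch, under the assumption that the function returns per-bin filters z of shape (n_bins, n_channels) that are then applied by conjugate multiplication:

import numpy as np
from pyroomacoustics.bss.common import projection_back

nframes, nbins, nchannels = 100, 257, 2
Y = np.random.randn(nframes, nbins, nchannels) \
    + 1j * np.random.randn(nframes, nbins, nchannels)  # separated STFT data
ref = Y[:, :, 0].copy()  # e.g., the STFT of the first microphone channel

# assumption: projection_back returns the per-bin scaling filters z
z = projection_back(Y, ref)
Y *= np.conj(z[None, :, :])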
Direction of Arrival¶
Module contents¶
Direction of Arrival Finding¶
This sub-package provides implementations of popular direction of arrival finding algorithms.
- MUSIC
- Multiple Signal Classification [1]
- SRP-PHAT
- Steered Response Power – Phase Transform [2]
- CSSM
- Coherent Signal Subspace Method [3]
- WAVES
- Weighted Average of Signal Subspaces [4]
- TOPS
- Test of Orthogonality of Projected Subspaces [5]
- FRIDA
- Finite Rate of Innovation Direction of Arrival [6]
All these classes derive from the abstract base class
pyroomacoustics.doa.doa.DOA
that offers generic methods for finding
and visualizing the locations of acoustic sources.
The constructor can be called once to build the DOA finding object. Then, the method pyroomacoustics.doa.doa.DOA.locate_sources performs DOA finding based on the time-frequency snapshots passed to it as an argument. Extra arguments can be supplied to indicate which frequency bands should be used for localization.
How to use the DOA module¶
Here R
is a 2xQ ndarray that contains the locations of the Q microphones
in the columns, fs
is the sampling frequency of the input signal, and
nfft
the length of the FFT used.
The STFT snapshots are passed to the localization methods in the X ndarray of
shape Q x (nfft // 2 + 1) x n_snapshots
, where n_snapshots
is the
number of STFT frames to use for the localization. The option freq_bins
can be provided to specify which frequency bins to use for the localization.
>>> doa = pyroomacoustics.doa.MUSIC(R, fs, nfft)
>>> doa.locate_sources(X, freq_bins=np.arange(20, 40))
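Putting it together, here is a hedged end-to-end sketch where the snapshots are built with a plain FFT over non-overlapping frames; the circular_2D_array helper and the azimuth_recon attribute reflect common pyroomacoustics usage, and random noise stands in for real microphone signals:

import numpy as np
import pyroomacoustics as pra

fs, nfft, n_snapshots = 16000, 256, 100

# hypothetical geometry: Q = 4 microphones on a 10 cm radius circle (2 x Q)
R = pra.circular_2D_array(center=[0., 0.], M=4, phi0=0., radius=0.1)

# stand-in microphone signals of shape (Q, nsamples)
audio = np.random.randn(4, nfft * n_snapshots)

# Q x (nfft // 2 + 1) x n_snapshots snapshot array
X = np.array([
    np.fft.rfft(audio[:, n * nfft:(n + 1) * nfft], axis=1)
    for n in range(n_snapshots)
]).transpose(1, 2, 0)

doa = pra.doa.MUSIC(R, fs, nfft)
doa.locate_sources(X, freq_bins=np.arange(20, 40))
print(doa.azimuth_recon)  # assumed attribute holding the estimated azimuths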
Other Available Subpackages¶
pyroomacoustics.doa.grid
- this provides abstractions for computing functions on regular or irregular grids defined on circles and spheres with peak finding methods
pyroomacoustics.doa.plotters
- a few methods to plot functions and points on circles or spheres
pyroomacoustics.doa.detect_peaks
- 1D peak detection routine from Marcos Duarte
pyroomacoustics.doa.tools_frid_doa_plane
- routines implementing FRIDA algorithm
Utilities¶
pyroomacoustics.doa.algorithms
- a dictionary containing all the available DOA object subclasses, indexed by the keys
['MUSIC', 'SRP', 'CSSM', 'WAVES', 'TOPS', 'FRIDA']
References
[1] | R. Schmidt, Multiple emitter location and signal parameter estimation, IEEE Trans. Antennas Propag., Vol. 34, Num. 3, pp 276–280, 1986 |
[2] | J. H. DiBiase, A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays, PHD Thesis, Brown University, 2000 |
[3] | H. Wang, M. Kaveh, Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources, IEEE Trans. Acoust., Speech, Signal Process., Vol. 33, Num. 4, pp 823–831, 1985 |
[4] | E. D. di Claudio, R. Parisi, WAVES: Weighted average of signal subspaces for robust wideband direction finding, IEEE Trans. Signal Process., Vol. 49, Num. 10, 2179–2191, 2001 |
[5] | Y. Yeo-Sun, L. M. Kaplan, J. H. McClellan, TOPS: New DOA estimator for wideband signals, IEEE Trans. Signal Process., Vol. 54, Num 6., pp 1977–1989, 2006 |
[6] | H. Pan, R. Scheibler, E. Bezzam, I. Dokmanic, and M. Vetterli, FRIDA: FRI-based DOA estimation for arbitrary array layouts, Proc. ICASSP, pp 3186-3190, 2017 |
Algorithms¶
CSSM¶
-
class
pyroomacoustics.doa.cssm.
CSSM
(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, num_iter=5, **kwargs)¶ Bases:
pyroomacoustics.doa.music.MUSIC
Class to apply the Coherent Signal-Subspace method [CSSM] for Direction of Arrival (DoA) estimation.
Note
Run locate_sources() to apply the CSSM algorithm.
Parameters: - L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
- fs (float) – Sampling frequency.
- nfft (int) – FFT length.
- c (float) – Speed of sound. Default: 343 m/s
- num_src (int) – Number of sources to detect. Default: 1
- mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
- r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
- azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
- colatitude (numpy array) – Candidate colatitude angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
- num_iter (int) – Number of iterations for CSSM. Default: 5
References
[CSSM] H. Wang, M. Kaveh, Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources, IEEE Trans. Acoust., Speech, Signal Process., Vol. 33, Num. 4, pp 823–831, 1985
FRIDA¶
-
class
pyroomacoustics.doa.frida.
FRIDA
(L, fs, nfft, max_four=None, c=343.0, num_src=1, G_iter=None, max_ini=5, n_rot=1, max_iter=50, noise_level=1e-10, low_rank_cleaning=False, stopping='max_iter', stft_noise_floor=0.0, stft_noise_margin=1.5, signal_type='visibility', use_lu=True, verbose=False, symb=True, use_cache=False, **kwargs)¶ Bases:
pyroomacoustics.doa.doa.DOA
Implements the FRI-based direction of arrival finding algorithm [FRIDA].
Note
Run locate_sources() to apply the FRIDA algorithm.
Parameters: - L (ndarray) – Contains the locations of the microphones in the columns
- fs (int or float) – Sampling frequency
- nfft (int) – FFT size
- max_four (int) – Maximum order of the Fourier or spherical harmonics expansion
- c (float, optional) – Speed of sound
- num_src (int, optional) – The number of sources to recover (default 1)
- G_iter (int) – Number of mapping matrix refinement iterations in recovery algorithm (default 1)
- max_ini (int, optional) – Number of random initializations to use in recovery algorithm (default 5)
- n_rot (int, optional) – Number of random rotations to apply before recovery algorithm (default 1)
- noise_level (float, optional) – Noise level in the visibility measurements, if available (default 1e-10)
- stopping (str, optional) – Stopping criteria for the recovery algorithm. Can be max iterations or noise level (default max_iter)
- stft_noise_floor (float) – The noise floor in the STFT measurements, if available (default 0)
- stft_noise_margin (float) – When this, along with stft_noise_floor is set, we only pick frames with at least stft_noise_floor * stft_noise_margin power
- signal_type (str) –
Which type of measurements to use:
- ’visibility’: Cross correlation measurements
- ’raw’: Microphone signals
- use_lu (bool, optional) – Whether to use LU decomposition for efficiency
- verbose (bool, optional) – Whether to output intermediate result for debugging purposes
- symb (bool, optional) – Whether to enforce the symmetry on the reconstructed uniform samples of sinusoids b
References
[FRIDA] H. Pan, R. Scheibler, E. Bezzam, I. Dokmanic, and M. Vetterli, FRIDA: FRI-based DOA estimation for arbitrary array layouts, Proc. ICASSP, pp 3186-3190, 2017
MUSIC¶
-
class
pyroomacoustics.doa.music.
MUSIC
(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, **kwargs)¶ Bases:
pyroomacoustics.doa.doa.DOA
Class to apply MUltiple SIgnal Classification (MUSIC) direction-of-arrival (DoA) for a particular microphone array.
Note
Run locate_sources() to apply the MUSIC algorithm.
Parameters: - L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
- fs (float) – Sampling frequency.
- nfft (int) – FFT length.
- c (float) – Speed of sound. Default: 343 m/s
- num_src (int) – Number of sources to detect. Default: 1
- mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
- r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
- azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
- colatitude (numpy array) – Candidate colatitude angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
-
plot_individual_spectrum
()¶ Plot the steered response for each frequency.
SRP-PHAT¶
-
class
pyroomacoustics.doa.srp.
SRP
(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, **kwargs)¶ Bases:
pyroomacoustics.doa.doa.DOA
Class to apply Steered Response Power (SRP) direction-of-arrival (DoA) for a particular microphone array.
Note
Run locate_sources() to apply the SRP-PHAT algorithm.
Parameters: - L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
- fs (float) – Sampling frequency.
- nfft (int) – FFT length.
- c (float) – Speed of sound. Default: 343 m/s
- num_src (int) – Number of sources to detect. Default: 1
- mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
- r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
- azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
- colatitude (numpy array) – Candidate colatitude angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
TOPS¶
-
class
pyroomacoustics.doa.tops.
TOPS
(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, **kwargs)¶ Bases:
pyroomacoustics.doa.music.MUSIC
Class to apply Test of Orthogonality of Projected Subspaces [TOPS] for Direction of Arrival (DoA) estimation.
Note
Run locate_sources() to apply the TOPS algorithm.
Parameters: - L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
- fs (float) – Sampling frequency.
- nfft (int) – FFT length.
- c (float) – Speed of sound. Default: 343 m/s
- num_src (int) – Number of sources to detect. Default: 1
- mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
- r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
- azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
- colatitude (numpy array) – Candidate colatitude angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
References
[TOPS] Y. Yeo-Sun, L. M. Kaplan, J. H. McClellan, TOPS: New DOA estimator for wideband signals, IEEE Trans. Signal Process., Vol. 54, Num 6., pp 1977–1989, 2006
WAVES¶
-
class
pyroomacoustics.doa.waves.
WAVES
(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, num_iter=5, **kwargs)¶ Bases:
pyroomacoustics.doa.music.MUSIC
Class to apply Weighted Average of Signal Subspaces [WAVES] for Direction of Arrival (DoA) estimation.
Note
Run locate_sources() to apply the WAVES algorithm.
Parameters: - L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
- fs (float) – Sampling frequency.
- nfft (int) – FFT length.
- c (float) – Speed of sound. Default: 343 m/s
- num_src (int) – Number of sources to detect. Default: 1
- mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
- r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
- azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
- colatitude (numpy array) – Candidate colatitude angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
- num_iter (int) – Number of iterations for WAVES. Default: 5
References
[WAVES] E. D. di Claudio, R. Parisi, WAVES: Weighted average of signal subspaces for robust wideband direction finding, IEEE Trans. Signal Process., Vol. 49, Num. 10, 2179–2191, 2001
Tools and Helpers¶
DOA (Base)¶
-
class
pyroomacoustics.doa.doa.
DOA
(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, n_grid=None, dim=2, *args, **kwargs)¶ Bases:
object
Abstract parent class for Direction of Arrival (DoA) algorithms. After creating an object (e.g., SRP, MUSIC, CSSM, WAVES, TOPS, or FRIDA), run locate_sources to apply the corresponding algorithm.
Parameters: - L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
- fs (float) – Sampling frequency.
- nfft (int) – FFT length.
- c (float) – Speed of sound. Default: 343 m/s
- num_src (int) – Number of sources to detect. Default: 1
- mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
- r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
- azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
- colatitude (numpy array) – Candidate colatitude angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
- n_grid (int) – If azimuth and colatitude are not specified, we will create a grid with so many points. Default is 360.
- dim (int) – The dimension of the problem. Set dim=2 to find sources on the circle (x-y plane). Set dim=3 to search on the whole sphere.
-
locate_sources
(X, num_src=None, freq_range=[500.0, 4000.0], freq_bins=None, freq_hz=None)¶ Locate source(s) using corresponding algorithm.
Parameters: - X (numpy array) – Set of signals in the frequency (RFFT) domain for current frame. Size should be M x F x S, where M should correspond to the number of microphones, F to nfft/2+1, and S to the number of snapshots (user-defined). It is recommended to have S >> M.
- num_src (int) – Number of sources to detect. Default is value given to object constructor.
- freq_range (list of floats, length 2) – Frequency range on which to run DoA: [fmin, fmax].
- freq_bins (list of int) – List of individual frequency bins on which to run DoA. If defined by user, it will not take into consideration freq_range or freq_hz.
- freq_hz (list of floats) – List of individual frequencies on which to run DoA. If defined by user, it will not take into consideration freq_range.
-
polar_plt_dirac
(azimuth_ref=None, alpha_ref=None, save_fig=False, file_name=None, plt_dirty_img=True)¶ Generate polar plot of DoA results.
Parameters: - azimuth_ref (numpy array) – True direction of sources (in radians).
- alpha_ref (numpy array) – Estimated amplitude of sources.
- save_fig (bool) – Whether or not to save figure as pdf.
- file_name (str) – Name of file (if saved). Default is ‘polar_recon_dirac.pdf’
- plt_dirty_img (bool) – Whether or not to plot spatial spectrum or ‘dirty image’ in the case of FRI.
-
class
pyroomacoustics.doa.doa.
ModeVector
(L, fs, nfft, c, grid, mode='far', precompute=False)¶ Bases:
object
This is a class for look-up tables of mode vectors. The look-up table is an outer product of three vectors running along candidate locations, time, and frequency. When the grid becomes large, the look-up table might be too large to store in memory. In that case, this class computes the outer product elements only when needed, keeping just the three vectors in memory. When the table is small, a precompute option can be set to True to compute the whole table in advance.
Tools for FRIDA (azimuth only)¶
-
pyroomacoustics.doa.tools_fri_doa_plane.
Rmtx_ri
(coef_ri, K, D, L)¶ Split T matrix in real/imaginary representation
-
pyroomacoustics.doa.tools_fri_doa_plane.
Rmtx_ri_half
(coef_half, K, D, L, D_coef)¶ Split T matrix in real/imaginary conjugate symmetric representation
-
pyroomacoustics.doa.tools_fri_doa_plane.
Rmtx_ri_half_out_half
(coef_half, K, D, L, D_coef, mtx_shrink)¶ If both b and the annihilating filter coefficients are Hermitian symmetric, then the output will also be Hermitian symmetric, so the effective output is half the size
-
pyroomacoustics.doa.tools_fri_doa_plane.
Tmtx_ri
(b_ri, K, D, L)¶ build convolution matrix associated with b_ri
Parameters: - b_ri – a real-valued vector
- K – number of Diracs
- D1 – expansion matrix for the real-part
- D2 – expansion matrix for the imaginary-part
-
pyroomacoustics.doa.tools_fri_doa_plane.
Tmtx_ri_half
(b_ri, K, D, L, D_coef)¶ Split T matrix in conjugate symmetric representation
-
pyroomacoustics.doa.tools_fri_doa_plane.
Tmtx_ri_half_out_half
(b_ri, K, D, L, D_coef, mtx_shrink)¶ If both b and the annihilating filter coefficients are Hermitian symmetric, then the output will also be Hermitian symmetric, so the effective output is half the size
-
pyroomacoustics.doa.tools_fri_doa_plane.
build_mtx_amp
(phi_k, p_mic_x, p_mic_y)¶ the matrix that maps Diracs’ amplitudes to the visibility
Parameters: - phi_k – Diracs’ location (azimuth)
- p_mic_x – a vector that contains microphones’ x-coordinates
- p_mic_y – a vector that contains microphones’ y-coordinates
-
pyroomacoustics.doa.tools_fri_doa_plane.
build_mtx_amp_ri
(p_mic_x, p_mic_y, phi_k)¶ builds real/imaginary amplitude matrix
-
pyroomacoustics.doa.tools_fri_doa_plane.
build_mtx_raw_amp
(p_mic_x, p_mic_y, phi_k)¶ the matrix that maps Diracs’ amplitudes to the visibility
Parameters: - phi_k – Diracs’ location (azimuth)
- p_mic_x – a vector that contains microphones’ x-coordinates
- p_mic_y – a vector that contains microphones’ y-coordinates
-
pyroomacoustics.doa.tools_fri_doa_plane.
coef_expan_mtx
(K)¶ expansion matrix for an annihilating filter of size K + 1
Parameters: K – number of Diracs. The filter size is K + 1
-
pyroomacoustics.doa.tools_fri_doa_plane.
compute_b
(G_lst, GtG_lst, beta_lst, Rc0, num_bands, a_ri, use_lu=False, GtG_inv_lst=None)¶ compute the uniform sinusoidal samples b from the updated annihilating filter coefficients.
Parameters: - GtG_lst – list of G^H G for different subbands
- beta_lst – list of beta-s for different subbands
- Rc0 – right-dual matrix, here it is the convolution matrix associated with c
- num_bands – number of bands
- a_ri – a 2D numpy array. each column corresponds to the measurements within a subband
-
pyroomacoustics.doa.tools_fri_doa_plane.
compute_mtx_obj
(GtG_lst, Tbeta_lst, Rc0, num_bands, K)¶ compute the matrix M in the objective function:
min c^H M c s.t. c0^H c = 1
Parameters: - GtG_lst – list of G^H * G
- Tbeta_lst – list of Toeplitz matrices for beta-s
- Rc0 – right dual matrix for the annihilating filter (same for each block -> not a list)
-
pyroomacoustics.doa.tools_fri_doa_plane.
compute_obj_val
(GtG_inv_lst, Tbeta_lst, Rc0, c_ri_half, num_bands, K)¶ compute the fitting error. CAUTION: Here we assume use_lu = True
-
pyroomacoustics.doa.tools_fri_doa_plane.
cov_mtx_est
(y_mic)¶ estimate covariance matrix
Parameters: y_mic – received signal (complex baseband representation) at microphones
-
pyroomacoustics.doa.tools_fri_doa_plane.
cpx_mtx2real
(mtx)¶ extend a complex-valued matrix to a matrix of real values only
Parameters: mtx – input complex valued matrix
-
pyroomacoustics.doa.tools_fri_doa_plane.
dirac_recon_ri
(G, a_ri, K, M, noise_level, max_ini=100, stop_cri='mse')¶ Reconstruct point sources’ locations (azimuth) from the visibility measurements
Parameters: - G – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids
- a_ri – the visibility measurements
- K – number of Diracs
- M – the Fourier series expansion is between -M and M
- noise_level – level of noise (ell_2 norm) in the measurements
- max_ini – maximum number of initialisations
- stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
-
pyroomacoustics.doa.tools_fri_doa_plane.
dirac_recon_ri_half
(G, a_ri, K, M, noise_level, max_ini=100, stop_cri='mse')¶ Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged.
Parameters: - G (param) – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids
- a_ri – the visibility measurements
- K – number of Diracs
- M – the Fourier series expansion is between -M and M
- noise_level – level of noise (ell_2 norm) in the measurements
- max_ini – maximum number of initialisations
- stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
-
pyroomacoustics.doa.tools_fri_doa_plane.
dirac_recon_ri_half_multiband
(G_lst, a_ri, K, M, max_ini=100)¶ Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged.
Parameters: - G_lst – a list of the linear transformation matrices that links the visibilities to uniformly sampled sinusoids
- a_ri – the visibility measurements
- K – number of Diracs
- M – the Fourier series expansion is between -M and M
- noise_level – level of noise (ell_2 norm) in the measurements
- max_ini – maximum number of initialisations
- stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
-
pyroomacoustics.doa.tools_fri_doa_plane.
dirac_recon_ri_half_multiband_lu
(G_lst, GtG_lst, GtG_inv_lst, a_ri, K, M, max_ini=100, max_iter=50)¶ Here we use LU decomposition to precompute a few entries. Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged.
Parameters: - G_lst – a list of the linear transformation matrices that links the visibilities to uniformly sampled sinusoids
- a_ri – the visibility measurements
- K – number of Diracs
- M – the Fourier series expansion is between -M and M
- noise_level – level of noise (ell_2 norm) in the measurements
- max_ini – maximum number of initialisations
- stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
-
pyroomacoustics.doa.tools_fri_doa_plane.
dirac_recon_ri_half_multiband_parallel
(G, a_ri, K, M, max_ini=100)¶ Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged. We use a parallel implementation when stop_cri == ‘max_iter’
Parameters: - G – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids
- a_ri – the visibility measurements
- K – number of Diracs
- M – the Fourier series expansion is between -M and M
- noise_level – level of noise (ell_2 norm) in the measurements
- max_ini – maximum number of initialisations
- stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
-
pyroomacoustics.doa.tools_fri_doa_plane.
dirac_recon_ri_half_parallel
(G, a_ri, K, M, max_ini=100)¶ Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged. We use a parallel implementation when stop_cri == ‘max_iter’
Parameters: - G – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids
- a_ri – the visibility measurements
- K – number of Diracs
- M – the Fourier series expansion is between -M and M
- noise_level – level of noise (ell_2 norm) in the measurements
- max_ini – maximum number of initialisations
- stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
-
pyroomacoustics.doa.tools_fri_doa_plane.
dirac_recon_ri_inner
(c_ri_half, a_ri, rhs, rhs_bl, K, M, D1, D2, D_coef, mtx_shrink, Tbeta_ri, G, max_iter)¶ inner loop of the dirac_recon_ri_half_parallel function
-
pyroomacoustics.doa.tools_fri_doa_plane.
dirac_recon_ri_multiband_inner
(c_ri_half, a_ri, num_bands, rhs, rhs_bl, K, M, D1, D2, D_coef, mtx_shrink, Tbeta_ri, G, GtG, max_iter)¶ Inner loop of the dirac_recon_ri_multiband function
-
pyroomacoustics.doa.tools_fri_doa_plane.
extract_off_diag
(mtx)¶ extract off-diagonal entries in mtx. The output vector is ordered in a column-major manner.
Parameters: mtx – input matrix to extract the off diagonal entries
-
pyroomacoustics.doa.tools_fri_doa_plane.
hermitian_expan
(half_vec_len)¶ expand a real-valued vector to a Hermitian symmetric vector. The input vector is a concatenation of the real parts with NON-POSITIVE indices and the imaginary parts with STRICTLY-NEGATIVE indices.
Parameters: half_vec_len – length of the first half vector
-
pyroomacoustics.doa.tools_fri_doa_plane.
lu_compute_mtx_obj
(Tbeta_lst, num_bands, K, lu_R_GtGinv_Rt_lst)¶ compute the matrix M in the objective function:
min c^H M c s.t. c0^H c = 1
Parameters: - GtG_lst – list of G^H * G
- Tbeta_lst – list of Toeplitz matrices for beta-s
- Rc0 – right dual matrix for the annihilating filter (same for each block -> not a list)
-
pyroomacoustics.doa.tools_fri_doa_plane.
lu_compute_mtx_obj_initial
(GtG_inv_lst, Tbeta_lst, Rc0, num_bands, K)¶ compute the matrix M in the objective function:
min c^H M c s.t. c0^H c = 1
Parameters: - GtG_lst – list of G^H * G
- Tbeta_lst – list of Toeplitz matrices for beta-s
- Rc0 – right dual matrix for the annihilating filter (same for each block -> not a list)
-
pyroomacoustics.doa.tools_fri_doa_plane.
make_G
(p_mic_x, p_mic_y, omega_bands, sound_speed, M, signal_type=’visibility’)¶ build the linear transformation matrices that map the measurements to uniformly sampled sinusoids, for multiple subbands.
Parameters: - p_mic_x – a vector that contains microphones’ x-coordinates
- p_mic_y – a vector that contains microphones’ y-coordinates
- omega_bands – mid-band (ANGULAR) frequencies [radian/sec]
- sound_speed – speed of sound
- signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs
Returns: The list of mapping matrices from measurements to sinusoids
-
pyroomacoustics.doa.tools_fri_doa_plane.
make_GtG_and_inv
(G_lst)¶
-
pyroomacoustics.doa.tools_fri_doa_plane.
mtx_freq2raw
(M, p_mic_x, p_mic_y)¶ build the matrix that maps the Fourier series to the raw microphone signals
Parameters: - M – the Fourier series expansion is limited from -M to M
- p_mic_x – a vector that contains microphones x coordinates
- p_mic_y – a vector that contains microphones y coordinates
-
pyroomacoustics.doa.tools_fri_doa_plane.
mtx_freq2visi
(M, p_mic_x, p_mic_y)¶ build the matrix that maps the Fourier series to the visibility
Parameters: - M – the Fourier series expansion is limited from -M to M
- p_mic_x – a vector that contains microphones x coordinates
- p_mic_y – a vector that contains microphones y coordinates
-
pyroomacoustics.doa.tools_fri_doa_plane.
mtx_fri2signal_ri
(M, p_mic_x, p_mic_y, D1, D2, signal=’visibility’)¶ build the matrix that maps the Fourier series to the visibility in terms of REAL-VALUED entries only (the matrix size is doubled).
Parameters: - M – the Fourier series expansion is limited from -M to M
- p_mic_x – a vector that contains microphones x coordinates
- p_mic_y – a vector that contains microphones y coordinates
- D1 – expansion matrix for the real-part
- D2 – expansion matrix for the imaginary-part
- signal – The type of signal considered (‘visibility’ for covariance matrix, ‘raw’ for microphone inputs)
-
pyroomacoustics.doa.tools_fri_doa_plane.
mtx_fri2signal_ri_multiband
(M, p_mic_x_all, p_mic_y_all, D1, D2, aslist=False, signal=’visibility’)¶ build the matrix that maps the Fourier series to the visibility in terms of REAL-VALUED entries only (the matrix size is doubled).
Parameters: - M – the Fourier series expansion is limited from -M to M
- p_mic_x_all – a matrix that contains microphones x coordinates
- p_mic_y_all – a matrix that contains microphones y coordinates
- D1 – expansion matrix for the real-part
- D2 – expansion matrix for the imaginary-part
- aslist – whether the linear mapping for each subband is returned as a list or a block diagonal matrix
- signal – The type of signal considered (‘visibility’ for covariance matrix, ‘raw’ for microphone inputs)
-
pyroomacoustics.doa.tools_fri_doa_plane.
mtx_updated_G
(phi_recon, M, mtx_amp2visi_ri, mtx_fri2visi_ri)¶ Update the linear transformation matrix that links the FRI sequence to the visibilities by using the reconstructed Dirac locations.
Parameters: - phi_recon – the reconstructed Dirac locations (azimuths)
- M – the Fourier series expansion is between -M and M
- p_mic_x – a vector that contains microphones’ x-coordinates
- p_mic_y – a vector that contains microphones’ y-coordinates
- mtx_freq2visi – the linear mapping from Fourier series to visibilities
-
pyroomacoustics.doa.tools_fri_doa_plane.
mtx_updated_G_multiband
(phi_recon, M, mtx_amp2visi_ri, mtx_fri2visi_ri, num_bands)¶ Update the linear transformation matrix that links the FRI sequence to the visibilities by using the reconstructed Dirac locations.
Parameters: - phi_recon – the reconstructed Dirac locations (azimuths)
- M – the Fourier series expansion is between -M and M
- p_mic_x – a vector that contains microphones’ x-coordinates
- p_mic_y – a vector that contains microphones’ y-coordinates
- mtx_fri2visi – the linear mapping from Fourier series to visibilities
-
pyroomacoustics.doa.tools_fri_doa_plane.
mtx_updated_G_multiband_new
(phi_opt, M, p_x, p_y, G0_lst, num_bands)¶ Update the linear transformation matrix that links the FRI sequence to the visibilities by using the reconstructed Dirac locations.
Parameters: - phi_opt – the reconstructed Dirac locations (azimuths)
- M – the Fourier series expansion is between -M and M
- p_mic_x – a vector that contains microphones’ x-coordinates
- p_mic_y – a vector that contains microphones’ y-coordinates
- G0_lst – the original linear mapping from Fourier series to visibilities
- num_bands – number of subbands
-
pyroomacoustics.doa.tools_fri_doa_plane.
multiband_cov_mtx_est
(y_mic)¶ estimate covariance matrix based on the received signals at microphones
Parameters: y_mic – received signal (complex base-band representation) at microphones
-
pyroomacoustics.doa.tools_fri_doa_plane.
multiband_extract_off_diag
(mtx)¶ extract off-diagonal entries in mtx. The output vector is ordered in a column-major manner.
Parameters: mtx (input matrix to extract the off-diagonal entries) –
-
pyroomacoustics.doa.tools_fri_doa_plane.
output_shrink
(K, L)¶ shrink the convolution output to half the size; used when both the annihilating filter and the uniform samples of sinusoids are Hermitian symmetric.
Parameters: - K – the annihilating filter size: K + 1
- L – length of the (complex-valued) b vector
-
pyroomacoustics.doa.tools_fri_doa_plane.
polar2cart
(rho, phi)¶ convert from polar to cartesian coordinates
Parameters: - rho – radius
- phi – azimuth
-
pyroomacoustics.doa.tools_fri_doa_plane.
pt_src_recon
(a, p_mic_x, p_mic_y, omega_band, sound_speed, K, M, noise_level, max_ini=50, stop_cri='mse', update_G=False, verbose=False, signal_type='visibility', **kwargs)¶ reconstruct point sources on the circle from the visibility measurements
Parameters: - a – the measured visibilities
- p_mic_x – a vector that contains microphones’ x-coordinates
- p_mic_y – a vector that contains microphones’ y-coordinates
- omega_band – mid-band (ANGULAR) frequency [radian/sec]
- sound_speed – speed of sound
- K – number of point sources
- M – the Fourier series expansion is between -M and M
- noise_level – noise level in the measured visibilities
- max_ini – maximum number of random initialisation used
- stop_cri – either ‘mse’ or ‘max_iter’
- update_G – whether to update the linear mapping that links the uniformly sampled sinusoids to the visibility.
- verbose – whether to output intermediate results for debugging
- signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs
- kwargs (possible optional input: G_iter: number of iterations for the G updates) –
-
pyroomacoustics.doa.tools_fri_doa_plane.
pt_src_recon_multiband
(a, p_mic_x, p_mic_y, omega_bands, sound_speed, K, M, noise_level, max_ini=50, update_G=False, verbose=False, signal_type='visibility', max_iter=50, G_lst=None, GtG_lst=None, GtG_inv_lst=None, **kwargs)¶ reconstruct point sources on the circle from the visibility measurements from multi-bands.
Parameters: - a – the measured visibilities in a matrix form, where the second dimension corresponds to different subbands
- p_mic_x – a vector that contains microphones’ x-coordinates
- p_mic_y – a vector that contains microphones’ y-coordinates
- omega_bands – mid-band (ANGULAR) frequencies [radian/sec]
- sound_speed – speed of sound
- K – number of point sources
- M – the Fourier series expansion is between -M and M
- noise_level – noise level in the measured visibilities
- max_ini – maximum number of random initialisation used
- update_G – whether to update the linear mapping that links the uniformly sampled sinusoids to the visibility.
- verbose – whether to output intermediate results for debugging
- signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs
- kwargs – possible optional input: G_iter: number of iterations for the G updates
-
pyroomacoustics.doa.tools_fri_doa_plane.
pt_src_recon_rotate
(a, p_mic_x, p_mic_y, K, M, noise_level, max_ini=50, stop_cri=’mse’, update_G=False, num_rotation=1, verbose=False, signal_type=’visibility’, **kwargs)¶ reconstruct point sources on the circle from the visibility measurements. Here we apply random rotations to the coordinates.
Parameters: - a – the measured visibilities
- p_mic_x – a vector that contains microphones’ x-coordinates
- p_mic_y – a vector that contains microphones’ y-coordinates
- K – number of point sources
- M – the Fourier series expansion is between -M and M
- noise_level – noise level in the measured visibilities
- max_ini – maximum number of random initialisation used
- stop_cri – either ‘mse’ or ‘max_iter’
- update_G – whether to update the linear mapping that links the uniformly sampled sinusoids to the visibility.
- num_rotation – number of random rotations
- verbose – whether to output intermediate results for debugging
- signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs
- kwargs – possible optional input: G_iter: number of iterations for the G updates
Grid Objects¶
Routines to perform grid search on the sphere
-
class
pyroomacoustics.doa.grid.
Grid
(n_points)¶ Bases:
object
This is an abstract class with attributes and methods for grids
Parameters: n_points (int) – the number of points on the grid -
apply
(func, spherical=False)¶
-
find_peaks
(k=1)¶
-
set_values
(vals)¶
-
-
class
pyroomacoustics.doa.grid.
GridCircle
(n_points=360, azimuth=None)¶ Bases:
pyroomacoustics.doa.grid.Grid
Creates a grid on the circle.
Parameters: - n_points (int, optional) – The number of uniformly spaced points in the grid.
- azimuth (ndarray, optional) – An array of azimuth (in radians) to use for grid locations. Overrides n_points.
-
apply
(func, spherical=False)¶
-
find_peaks
(k=1)¶
-
plot
(mark_peaks=0)¶
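A small sketch of the grid workflow using only the methods listed above; the exact return value of find_peaks is an assumption (taken here to be indices into the grid):

import numpy as np
from pyroomacoustics.doa.grid import GridCircle

# explicit azimuth grid, overriding n_points
az = np.linspace(0, 2 * np.pi, 360, endpoint=False)
grid = GridCircle(azimuth=az)

# a synthetic spatial spectrum with a single lobe at 45 degrees
grid.set_values(np.exp(np.cos(az - np.pi / 4)))

peaks = grid.find_peaks(k=1)  # assumption: indices of the k largest peaks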
-
class
pyroomacoustics.doa.grid.
GridSphere
(n_points=1000, spherical_points=None)¶ Bases:
pyroomacoustics.doa.grid.Grid
This class creates a grid of nearly equidistant points on the sphere using the Fibonacci method
Parameters: - n_points (int) – The number of points to sample
- spherical_points (ndarray, optional) – A 2 x n_points array of spherical coordinates with azimuth in the top row and colatitude in the second row. Overrides n_points.
References
http://lgdv.cs.fau.de/uploads/publications/spherical_fibonacci_mapping.pdf http://stackoverflow.com/questions/9600801/evenly-distributing-n-points-on-a-sphere
-
apply
(func, spherical=False)¶ Apply a function to every grid point
-
find_peaks
(k=1)¶ Find the largest peaks on the grid
-
min_max_distance
()¶ Compute some statistics on the distribution of the points
-
plot
(colatitude_ref=None, azimuth_ref=None, colatitude_recon=None, azimuth_recon=None, plotly=True, projection=True, points_only=False)¶
-
plot_old
(plot_points=False, mark_peaks=0)¶ Plot the points on the sphere with their values
-
regrid
()¶ Regrid the non-uniform data on a regular mesh
Plot Helpers¶
A collection of functions to plot maps and points on circles and spheres.
-
pyroomacoustics.doa.plotters.
polar_plt_dirac
(self, azimuth_ref=None, alpha_ref=None, save_fig=False, file_name=None, plt_dirty_img=True)¶ Generate polar plot of DoA results.
Parameters: - azimuth_ref (numpy array) – True direction of sources (in radians).
- alpha_ref (numpy array) – Estimated amplitude of sources.
- save_fig (bool) – Whether or not to save figure as pdf.
- file_name (str) – Name of file (if saved). Default is ‘polar_recon_dirac.pdf’
- plt_dirty_img (bool) – Whether or not to plot spatial spectrum or ‘dirty image’ in the case of FRI.
-
pyroomacoustics.doa.plotters.
sph_plot_diracs
(colatitude_ref=None, azimuth_ref=None, colatitude=None, azimuth=None, dirty_img=None, colatitude_grid=None, azimuth_grid=None, file_name='sph_recon_2d_dirac.pdf', **kwargs)¶ This function plots the dirty image with the source locations on a flat projection of the sphere
Parameters: - colatitude_ref (ndarray, optional) – The colatitudes of a collection of reference points
- azimuth_ref (ndarray, optional) – The azimuths of a collection of reference points for the Diracs
- colatitude (ndarray, optional) – The colatitudes of the collection of points to visualize
- azimuth (ndarray, optional) – The azimuths of the collection of points to visualize
- dirty_img (ndarray) – A 2D map for displaying a pattern on the sphere under the points
- azimuth_grid (ndarray) – The azimuths indexing the dirty_img 2D map
- colatitude_grid (ndarray) – The colatitudes indexing the dirty_img 2D map
-
pyroomacoustics.doa.plotters.
sph_plot_diracs_plotly
(colatitude_ref=None, azimuth_ref=None, colatitude=None, azimuth=None, dirty_img=None, azimuth_grid=None, colatitude_grid=None, surface_base=1, surface_height=0.0)¶ Plots a 2D map on a sphere as well as a collection of diracs using the plotly library
Parameters: - colatitude_ref (ndarray, optional) – The colatitudes of a collection of reference points
- azimuth_ref (ndarray, optional) – The azimuths of a collection of reference points for the Diracs
- colatitude (ndarray, optional) – The colatitudes of the collection of points to visualize
- azimuth (ndarray, optional) – The azimuths of the collection of points to visualize
- dirty_img (ndarray) – A 2D map for displaying a pattern on the sphere under the points
- azimuth_grid (ndarray) – The azimuths indexing the dirty_img 2D map
- colatitude_grid (ndarray) – The colatitudes indexing the dirty_img 2D map
- surface_base – radius corresponding to lowest height on the map
- surface_height – radius difference between the lowest and highest point on the map
Peak Detection¶
Detect peaks in data based on their amplitude and other features.
Author: Marcos Duarte, https://github.com/demotu/BMC Version: 1.0.4 License: MIT
-
pyroomacoustics.doa.detect_peaks.
detect_peaks
(x, mph=None, mpd=1, threshold=0, edge='rising', kpsh=False, valley=False, show=False, ax=None)¶ Detect peaks in data based on their amplitude and other features.
Parameters: - x (1D array_like) – data.
- mph ({None, number}, optional (default = None)) – detect peaks that are greater than minimum peak height.
- mpd (positive integer, optional (default = 1)) – detect peaks that are at least separated by minimum peak distance (in number of data).
- threshold (positive number, optional (default = 0)) – detect peaks (valleys) that are greater (smaller) than threshold in relation to their immediate neighbors.
- edge ({None, 'rising', 'falling', 'both'}, optional (default = 'rising')) – for a flat peak, keep only the rising edge (‘rising’), only the falling edge (‘falling’), both edges (‘both’), or don’t detect a flat peak (None).
- kpsh (bool, optional (default = False)) – keep peaks with same height even if they are closer than mpd.
- valley (bool, optional (default = False)) – if True (1), detect valleys (local minima) instead of peaks.
- show (bool, optional (default = False)) – if True (1), plot data in matplotlib figure.
- ax (a matplotlib.axes.Axes instance, optional (default = None)) –
Returns: ind – indices of the peaks in x.
Return type: 1D array_like
Notes
The detection of valleys instead of peaks is performed internally by simply negating the data: ind_valleys = detect_peaks(-x)
The function can handle NaN’s
See this IPython Notebook [1].
References
[1] http://nbviewer.ipython.org/github/demotu/BMC/blob/master/notebooks/DetectPeaks.ipynb
Examples
>>> from detect_peaks import detect_peaks
>>> x = np.random.randn(100)
>>> x[60:81] = np.nan
>>> # detect all peaks and plot data
>>> ind = detect_peaks(x, show=True)
>>> print(ind)
>>> x = np.sin(2*np.pi*5*np.linspace(0, 1, 200)) + np.random.randn(200)/5
>>> # set minimum peak height = 0 and minimum peak distance = 20
>>> detect_peaks(x, mph=0, mpd=20, show=True)
>>> x = [0, 1, 0, 2, 0, 3, 0, 2, 0, 1, 0]
>>> # set minimum peak distance = 2
>>> detect_peaks(x, mpd=2, show=True)
>>> x = np.sin(2*np.pi*5*np.linspace(0, 1, 200)) + np.random.randn(200)/5
>>> # detection of valleys instead of peaks
>>> detect_peaks(x, mph=0, mpd=20, valley=True, show=True)
>>> x = [0, 1, 1, 0, 1, 1, 0]
>>> # detect both edges
>>> detect_peaks(x, edge='both', show=True)
>>> x = [-2, 1, -2, 2, 1, 1, 3, 0]
>>> # set threshold = 2
>>> detect_peaks(x, threshold = 2, show=True)
DOA Utilities¶
This module contains useful functions to compute distances and errors on circles and spheres.
-
pyroomacoustics.doa.utils.
circ_dist
(azimuth1, azimuth2, r=1.0)¶ Returns the shortest distance between two points on a circle
Parameters: - azimuth1 – azimuth of point 1
- azimuth2 – azimuth of point 2
- r (optional) – radius of the circle (Default 1)
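For example, two azimuths on either side of zero are closer through zero than through pi; a quick check:

import numpy as np
from pyroomacoustics.doa.utils import circ_dist

# pi/6 and 11*pi/6 are separated by pi/3 when wrapping through zero
d = circ_dist(np.pi / 6, 11 * np.pi / 6)
print(np.isclose(d, np.pi / 3))  # expected: True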
-
pyroomacoustics.doa.utils.
great_circ_dist
(r, colatitude1, azimuth1, colatitude2, azimuth2)¶ calculate great circle distance for points located on a sphere
Parameters: - r (radius of the sphere) –
- colatitude1 (colatitude of point 1) –
- azimuth1 (azimuth of point 1) –
- colatitude2 (colatitude of point 2) –
- azimuth2 (azimuth of point 2) –
Returns: great-circle distance
Return type: float or ndarray
-
pyroomacoustics.doa.utils.
polar_distance
(x1, x2)¶ Given two arrays of numbers x1 and x2, pairs the cells that are the closest and provides the pairing matrix index: x1(index(1,:)) should be as close as possible to x2(index(2,:)). The function outputs the average of the absolute value of the differences abs(x1(index(1,:))-x2(index(2,:))).
Parameters: - x1 – vector 1
- x2 – vector 2
Returns: - d – the minimum average distance between the paired elements
- index – the permutation matrix
-
pyroomacoustics.doa.utils.
spher2cart
(r, azimuth, colatitude)¶ Convert a spherical point to cartesian coordinates.
Parameters: - r – radius
- azimuth – azimuth
- colatitude – colatitude
Returns: An ndarray containing the Cartesian coordinates of the points in its columns
Return type: ndarray
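A quick check of the convention (azimuth measured from the x-axis, colatitude from the z-axis):

import numpy as np
from pyroomacoustics.doa.utils import spher2cart

# unit radius, azimuth 90 degrees, colatitude 90 degrees -> the y-axis
p = spher2cart(1.0, np.pi / 2, np.pi / 2)
print(np.round(p, 6))  # expected: [0. 1. 0.]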
Single Channel Denoising¶
Module contents¶
Single Channel Noise Reduction¶
Collection of single channel noise reduction (SCNR) algorithms for speech. At the moment, only a spectral subtraction method, similar to [1], is implemented.
A deep learning approach in Python can also be found online.
Other methods for speech enhancement/noise reduction employ Wiener filtering [2] and subspace approaches [3].
References
[1] | M. Berouti, R. Schwartz, and J. Makhoul, Enhancement of speech corrupted by acoustic noise, ICASSP ‘79. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1979, pp. 208-211. |
[2] | J. Lim and A. Oppenheim, All-Pole Modeling of Degraded Speech, IEEE Transactions on Acoustics, Speech, and Signal Processing 26.3 (1978): 197-210. |
[3] | Y. Ephraim and H. L. Van Trees, A signal subspace approach for speech enhancement, IEEE Transactions on Speech and Audio Processing, vol. 3, no. 4, pp. 251-266, Jul 1995. |
Algorithms¶
pyroomacoustics.denoise.spectral_subtraction module¶
-
class
pyroomacoustics.denoise.spectral_subtraction.
SpectralSub
(nfft, db_reduc, lookback, beta, alpha=1)¶ Bases:
object
Here we have a class for performing single channel noise reduction via spectral subtraction. The instantaneous signal energy and noise floor are estimated at each time instance (for each frequency bin), and this is used to compute a gain filter with which to perform spectral subtraction.
For a given frame n, the gain for frequency bin k is given by:
\[G[k, n] = \max \left\{ \left( \dfrac{P[k, n]-\beta P_N[k, n]}{P[k, n]} \right)^\alpha, G_{min} \right\},\]
where \(G_{min} = 10^{-(db\_reduc/20)}\) and \(db\_reduc\) is the maximum reduction (in dB) that we are willing to perform for each bin (a high value can actually be detrimental, see below). The instantaneous energy \(P[k,n]\) is computed by simply squaring the frequency amplitude at the bin k. The time-frequency decomposition of the input signal is typically done with the STFT and overlapping frames. The noise estimate \(P_N[k, n]\) for frequency bin k is given by looking back a certain number of frames \(L\) and selecting the bin with the lowest energy:
\[P_N[k, n] = \min_{[n-L, n]} P[k, n]\]
This approach works best when the SNR is positive and the noise is rather stationary. An alternative approach for the noise estimate (also in the case of stationary noise) would be to apply a lowpass filter for each frequency bin.
With a large suppression, i.e. large values for \(db\_reduc\), we can observe a typical artefact of such spectral subtraction approaches, namely “musical noise”. Here is a nice article about noise reduction and musical noise.
Adjusting the constants \(\beta\) and \(\alpha\) also presents a trade-off between suppression and undesirable artefacts, i.e. more noticeable musical noise.
Below is an example of how to use this class. A full example can be found in the “examples” folder of the repository.
# initialize STFT and SpectralSub objects
nfft = 512
stft = pra.transform.STFT(nfft, hop=nfft//2, analysis_window=pra.hann(nfft))
scnr = pra.denoise.SpectralSub(nfft, db_reduc=10, lookback=5, beta=20, alpha=3)

# apply block-by-block
for n in range(num_blocks):
    # go to frequency domain for noise reduction
    stft.analysis(mono_noisy)
    gain_filt = scnr.compute_gain_filter(stft.X)

    # back to the time domain
    mono_denoised = stft.synthesis(gain_filt*stft.X)
Parameters: - nfft (int) – FFT size. Length of gain filter, i.e. the number of frequency bins, is given by nfft//2+1.
- db_reduc (float) – Maximum reduction in dB for each bin.
- lookback (int) – How many frames to look back for the noise estimate.
- beta (float) – Overestimation factor to “push” the gain filter value (at each frequency) closer to the dB reduction specified by db_reduc.
- alpha (float, optional) – Exponent factor to modify transition behavior towards the dB reduction specified by db_reduc. Default is 1.
-
compute_gain_filter
(X)¶ Parameters: X (numpy array) – Complex spectrum of length nfft//2+1.
Returns: Gain filter to multiply the given spectrum with.
Return type: numpy array
Pyroomacoustics API¶
Subpackages¶
pyroomacoustics.experimental package¶
Submodules¶
-
pyroomacoustics.experimental.deconvolution.
deconvolve
(y, s, length=None, thresh=0.0)¶ Deconvolve an excitation signal from an impulse response
Parameters: - y (ndarray) – The recording
- s (ndarray) – The excitation signal
- length (int, optional) – the length of the impulse response to deconvolve
- thresh (float, optional) – ignore frequency bins with power lower than this
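A noiseless round trip illustrates the intended use; the excitation and the short response below are synthetic stand-ins:

import numpy as np
from pyroomacoustics.experimental.deconvolution import deconvolve

# a known excitation convolved with a short impulse response
s = np.random.randn(4096)
h = np.array([0.0, 0.8, 0.3, 0.1])
y = np.convolve(s, h)

h_hat = deconvolve(y, s, length=len(h))
print(np.allclose(h, h_hat, atol=1e-6))  # near-exact recovery without noise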
-
pyroomacoustics.experimental.deconvolution.
wiener_deconvolve
(y, x, length=None, noise_variance=1.0, let_n_points=15, let_div_base=2)¶ Deconvolve an excitation signal from an impulse response
We use a Wiener filter.
Parameters: - y (ndarray) – The recording
- x (ndarray) – The excitation signal
- length (int, optional) – the length of the impulse response to deconvolve
- noise_variance (float, optional) – estimate of the noise variance
- let_n_points (int) – number of points to use in the LET approximation
- let_div_base (float) – the divider used for the LET grid
-
class
pyroomacoustics.experimental.delay_calibration.
DelayCalibration
(fs, pad_time=0.0, mls_bits=16, repeat=1, temperature=25.0, humidity=50.0, pressure=1000.0)¶ -
run
(distance=0.0, ch_in=None, ch_out=None, oversampling=1)¶ Run the calibration. Plays a maximum length sequence and cross-correlates the signals to find the time delay.
Parameters: - distance (float, optional) – Distance between the speaker and microphone
- ch_in (int, optional) – The input channel to use. If not specified, all channels are calibrated
- ch_out (int, optional) – The output channel to use. If not specified, all channels are calibrated
-
-
pyroomacoustics.experimental.localization.
edm_line_search
(R, tdoa, bounds, steps)¶ We have a number of points of known locations and the TDOA measurements from an unknown location to the known points. We perform an EDM line search to find the unknown offset that turns TDOA into TOA.
Parameters: - R (ndarray) – An ndarray of 3xN where each column is the location of a point
- tdoa (ndarray) – A length N vector containing the tdoa measurements from the unknown location to the known ones
- bounds (ndarray) – Bounds for the line search
- steps (float) – Step size for the line search
-
pyroomacoustics.experimental.localization.
tdoa
(x1, x2, interp=1, fs=1, phat=True)¶ This function computes the time difference of arrival (TDOA) of the signal at the two microphones. This in turns is used to infer the direction of arrival (DOA) of the signal.
Specifically, if s(k) is the signal at the reference microphone and s_2(k) at the second microphone, then for a signal arriving with DOA theta we have
s_2(k) = s(k - tau)
with
tau = fs*d*sin(theta)/c
where d is the distance between the two microphones and c the speed of sound.
We recover tau using the Generalized Cross Correlation - Phase Transform (GCC-PHAT) method. The reference is
Knapp, C., & Carter, G. C. (1976). The generalized correlation method for estimation of time delay.
Parameters: - x1 (nd-array) – The signal of the reference microphone
- x2 (nd-array) – The signal of the second microphone
- interp (int, optional (default 1)) – The interpolation value for the cross-correlation, it can improve the time resolution (and hence DOA resolution)
- fs (int, optional (default 1)) – The sampling frequency of the input signal
Returns: - theta (float) – the angle of arrival (in radians)
- pwr (float) – the magnitude of the maximum cross correlation coefficient
- delay (float) – the delay between the two microphones (in seconds)
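A sketch following the three documented return values, with a synthetic integer delay between the two channels (if the installed version returns only the delay, the unpacking should be adjusted accordingly):

import numpy as np
from pyroomacoustics.experimental.localization import tdoa

fs = 16000
s = np.random.randn(fs)  # one second of broadband noise

# x2 is x1 delayed by 5 samples
x1 = s
x2 = np.concatenate((np.zeros(5), s[:-5]))

# GCC-PHAT estimate; interp > 1 refines the time resolution
theta, pwr, delay = tdoa(x1, x2, interp=4, fs=fs, phat=True)
print(delay)  # expected to be close to 5 / fs seconds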
-
pyroomacoustics.experimental.localization.
tdoa_loc
(R, tdoa, c, x0=None)¶ TDOA based localization
Parameters: - R (ndarray) – A 3xN array of 3D points
- tdoa (ndarray) – A length N array of tdoa
- c (float) – The speed of sound
References
Steven Li, TDOA localization
-
pyroomacoustics.experimental.measure_ir.
measure_ir
(sweep_length=1.0, sweep_type='exponential', fs=48000, f_lo=0.0, f_hi=None, volume=0.9, pre_delay=0.0, post_delay=0.1, fade_in_out=0.0, dev_in=None, dev_out=None, channels_input_mapping=None, channels_output_mapping=None, ascending=False, deconvolution=True, plot=True)¶ Measures an impulse response by playing a sweep and recording it using the sounddevice package.
Parameters: - sweep_length (float, optional) – length of the sweep in seconds
- sweep_type (SweepType, optional) – type of sweep to use linear or exponential (default)
- fs (int, optional) – sampling frequency (default 48 kHz)
- f_lo (float, optional) – lowest frequency in the sweep
- f_hi (float, optional) – highest frequency in the sweep, can be a negative offset from fs/2
- volume (float, optional) – multiply the sweep by this number before playing (default 0.9)
- pre_delay (float, optional) – delay in second before playing sweep
- post_delay (float, optional) – delay in second before stopping recording after playing the sweep
- fade_in_out (float, optional) – length in seconds of the fade in and out of the sweep (default 0.)
- dev_in (int, optional) – input device number
- dev_out (int, optional) – output device number
- channels_input_mapping (array_like, optional) – List of channel numbers (starting with 1) to record. If mapping is given, channels is silently ignored.
- channels_output_mapping (array_like, optional) – List of channel numbers (starting with 1) where the columns of data shall be played back on. Must have the same length as number of channels in data (except if data is mono, in which case the signal is played back on all given output channels). Each channel number may only appear once in mapping.
- ascending (bool, optional) – whether the sweep is from high to low (default) or low to high frequencies
- deconvolution (bool, optional) – if True, apply deconvolution to the recorded signal to remove the sweep (default True)
- plot (bool, optional) – plot the resulting signal
Returns: The impulse response if deconvolution == True, and the recorded signal otherwise
-
pyroomacoustics.experimental.physics.
calculate_speed_of_sound
(t, h, p)¶ Compute the speed of sound as a function of temperature, humidity and pressure
Parameters: - t (temperature [Celsius]) –
- h (relative humidity [%]) –
- p (atmospheric pressure [kPa]) –
Returns: Speed of sound in [m/s]
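For instance, at typical room conditions (the printed value is approximate):

from pyroomacoustics.experimental.physics import calculate_speed_of_sound

# 25 C, 50 % relative humidity, pressure 1000 (same units as above)
c = calculate_speed_of_sound(25.0, 50.0, 1000.0)
print(c)  # roughly 347 m/s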
Contains PointCloud class.
Given a number of points and their relative distances, this class aims at reconstructing their relative coordinates.
-
class
pyroomacoustics.experimental.point_cloud.
PointCloud
(m=1, dim=3, diameter=0.0, X=None, labels=None, EDM=None)¶ -
EDM
()¶ Computes the EDM corresponding to the marker set
-
align
(marker, axis)¶ Rotate the marker set around the given axis until it is aligned onto the given marker
Parameters: - marker (int or str) – the index or label of the marker onto which to align the set
- axis (int) – the axis around which the rotation happens
-
center
(marker)¶ Translate the marker set so that the argument is the origin.
-
classical_mds
(D)¶ Classical multidimensional scaling
Parameters: D (square 2D ndarray) – Euclidean Distance Matrix (matrix containing squared distances between points)
-
copy
()¶ Return a deep copy of this marker set object
-
correct
(corr_dic)¶ correct a marker location by a given vector
-
doa
(receiver, source)¶ Computes the direction of arrival wrt a source and receiver
-
flatten
(ind)¶ Transform the set of points so that the subset of markers given as argument is as close to flat (wrt the z-axis) as possible.
Parameters: ind (list of bools) – List of marker indices that should all lie in the same subspace
-
fromEDM
(D, labels=None, method='mds')¶ Compute the position of markers from their Euclidean Distance Matrix
Parameters: - D (square 2D ndarray) – Euclidean Distance Matrix (matrix containing squared distances between points
- labels (list, optional) – A list of human friendly labels for the markers (e.g. ‘east’, ‘west’, etc)
- method (str, optional) – The method to use * ‘mds’ for multidimensional scaling (default) * ‘tri’ for trilateration
-
key2ind
(ref)¶ Get the index location from a label
-
normalize
(refs=None)¶ Reposition points such that x0 is at the origin, x1 lies on the x-axis, and x2 lies above the x-axis, keeping their relative positions. The z-axis is defined according to the right-hand rule by default.
Parameters: - refs (list of 3 ints or str) – The index or label of three markers used to define (origin, x-axis, y-axis)
- left_hand (bool, optional (default False)) – Normally the z-axis is defined using right-hand rule, this flag allows to override this behavior
-
plot
(axes=None, show_labels=True, **kwargs)¶
-
trilateration
(D)¶ Find the location of points based on their distance matrix using trilateration
Parameters: D (square 2D ndarray) – Euclidean Distance Matrix (matrix containing squared distances between points
-
trilateration_single_point
(c, Dx, Dy)¶ Given x at origin (0,0) and y at (0,c) the distances from a point at unknown location Dx, Dy to x, y, respectively, finds the position of the point.
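Example (a minimal sketch; the constructor behavior with the X and dim arguments is inferred from the signature above):
>>> import numpy as np
>>> from pyroomacoustics.experimental.point_cloud import PointCloud
>>> X = np.array([[0., 1., 0.5], [0., 0., 1.]])  # three 2D points, one per column
>>> pc = PointCloud(X=X)
>>> D = pc.EDM()  # squared-distance matrix of the marker set
>>> pc2 = PointCloud(m=3, dim=2)
>>> pc2.fromEDM(D)  # recover positions up to a rigid motion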
-
A few test signals, such as sine sweeps.
-
pyroomacoustics.experimental.signals.
exponential_sweep
(T, fs, f_lo=0.0, f_hi=None, fade=None, ascending=False)¶ Exponential sine sweep
Parameters: - T (float) – length in seconds
- fs – sampling frequency
- f_lo (float) – lowest frequency in fraction of fs (default 0)
- f_hi (float) – highest frequency in fraction of fs (default 1)
- fade (float, optional) – length of fade in and out in seconds (default 0)
- ascending (bool, optional) – if True, the sweep goes from low to high frequencies (default False, i.e. high to low)
-
pyroomacoustics.experimental.signals.
linear_sweep
(T, fs, f_lo=0.0, f_hi=None, fade=None, ascending=False)¶ Linear sine sweep
Parameters: - T (float) – length in seconds
- fs – sampling frequency
- f_lo (float) – lowest frequency in fraction of fs (default 0)
- f_hi (float) – highest frequency in fraction of fs (default 1)
- fade (float, optional) – length of fade in and out in seconds (default 0)
- ascending (bool, optional) – if True, the sweep goes from low to high frequencies (default False, i.e. high to low)
-
pyroomacoustics.experimental.signals.
window
(signal, n_win)¶ Windows the signal at its beginning and end with a window of size n_win/2 on each side.
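To close this module, a short sketch of generating a measurement sweep with exponential_sweep (documented above); recall that f_lo and f_hi are given as fractions of fs:
>>> from pyroomacoustics.experimental.signals import exponential_sweep
>>> fs = 48000
>>> sweep = exponential_sweep(T=1.0, fs=fs, f_lo=20. / fs, f_hi=20000. / fs, fade=0.01)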
Module contents¶
A bunch of routines useful when doing measurements and experiments.
-
pyroomacoustics.experimental.
measure_ir
(sweep_length=1.0, sweep_type='exponential', fs=48000, f_lo=0.0, f_hi=None, volume=0.9, pre_delay=0.0, post_delay=0.1, fade_in_out=0.0, dev_in=None, dev_out=None, channels_input_mapping=None, channels_output_mapping=None, ascending=False, deconvolution=True, plot=True)¶ Measures an impulse response by playing a sweep and recording it using the sounddevice package.
Parameters and return value are identical to pyroomacoustics.experimental.measure_ir.measure_ir, documented above.
Submodules¶
pyroomacoustics.acoustics module¶
-
pyroomacoustics.acoustics.
bands_hz2s
(bands_hz, Fs, N, transform='dft')¶ Converts bands given in Hertz to samples with respect to a given sampling frequency Fs and a transform size N. An optional transform type is used to handle the DCT case.
-
pyroomacoustics.acoustics.
binning
(S, bands)¶ This function computes the sum of all columns of S in the subbands enumerated in bands
-
pyroomacoustics.acoustics.
critical_bands
()¶ Compute the critical bands as defined in the book Psychoacoustics by Zwicker and Fastl (Table 6.1, p. 159).
-
pyroomacoustics.acoustics.
invmelscale
(b)¶ Converts from melscale to frequency in Hertz according to Huang-Acero-Hon (6.143)
-
pyroomacoustics.acoustics.
melfilterbank
(M, N, fs=1, fl=0.0, fh=0.5)¶ Returns a filter bank of triangular filters spaced according to mel scale
We follow Huang-Acero-Hon 6.5.2
Parameters: - M ((int)) – The number of filters in the bank
- N ((int)) – The length of the DFT
- fs ((float) optional) – The sampling frequency (default 1, per the signature above)
- fl ((float)) – Lowest frequency in filter bank as a fraction of fs (default 0.)
- fh ((float)) – Highest frequency in filter bank as a fraction of fs (default 0.5)
Returns: Return type: An M times int(N/2)+1 ndarray that contains one filter per row
-
pyroomacoustics.acoustics.
melscale
(f)¶ Converts f (in Hertz) to the melscale defined according to Huang-Acero-Hon (2.6)
-
pyroomacoustics.acoustics.
mfcc
(x, L=128, hop=64, M=14, fs=8000, fl=0.0, fh=0.5)¶ Computes the Mel-Frequency Cepstrum Coefficients (MFCC) following the description in Huang-Acero-Hon 6.5.2 (2001). MFCC are features that mimic human perception and are commonly used for learning tasks.
This function will first split the signal into frames, overlapping or not, and then compute the MFCC for each frame.
Parameters: - x ((nd-array)) – Input signal
- L ((int)) – Frame size (default 128)
- hop ((int)) – Number of samples to skip between two frames (default 64)
- M ((int)) – Number of mel-frequency filters (default 14)
- fs ((int)) – Sampling frequency (default 8000)
- fl ((float)) – Lowest frequency in filter bank as a fraction of fs (default 0.)
- fh ((float)) – Highest frequency in filter bank as a fraction of fs (default 0.5)
Returns: Return type: The MFCC of the input signal
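Example (a minimal sketch; the shape of the returned array is not specified above):
>>> import numpy as np
>>> from pyroomacoustics.acoustics import mfcc
>>> fs = 8000
>>> x = np.random.randn(fs)  # one second of noise as a stand-in signal
>>> features = mfcc(x, L=256, hop=128, M=14, fs=fs)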
-
pyroomacoustics.acoustics.
octave_bands
(fc=1000, third=False)¶ Create a bank of octave bands
Parameters: - fc (float, optional) – The center frequency
- third (bool, optional) – Use third octave bands (default False)
pyroomacoustics.beamforming module¶
-
class
pyroomacoustics.beamforming.
Beamformer
(R, fs, N=1024, Lg=None, hop=None, zpf=0, zpb=0)¶ Bases:
pyroomacoustics.beamforming.MicrophoneArray
Extends MicrophoneArray with methods to compute beamforming weights and the corresponding time-domain filters.
Parameters: - R (numpy.ndarray) – Mics positions
- fs (int) – Sampling frequency
- N (int, optional) – Length of FFT, i.e. number of FD beamforming weights, equally spaced. Defaults to 1024.
- Lg (int, optional) – Length of time-domain filters. Defaults to N.
- hop (int, optional) – Hop length for frequency domain processing. Defaults to N/2.
- zpf (int, optional) – Front zero padding length for frequency domain processing. Default is 0.
- zpb (int, optional) – Back zero padding length for frequency domain processing. Default is 0.
-
far_field_weights
(phi)¶ Computes weights for a far-field source at infinity.
Parameters: phi (float) – direction of the beam
-
filters_from_weights
(non_causal=0.0)¶ Compute time-domain filters from frequency domain weights.
Parameters: non_causal (float, optional) – ratio of filter coefficients used for non-causal part
-
plot
(sum_ir=False, FD=True)¶
-
plot_beam_response
()¶
-
plot_response_from_point
(x, legend=None)¶
-
process
(FD=False)¶
-
rake_delay_and_sum_weights
(source, interferer=None, R_n=None, attn=True, ff=False)¶
-
rake_distortionless_filters
(source, interferer, R_n, delay=0.03, epsilon=0.005)¶ Compute time-domain filters of a beamformer minimizing noise and interference while forcing a distortionless response towards the source.
-
rake_max_sinr_filters
(source, interferer, R_n, epsilon=0.005, delay=0.0)¶ Compute the time-domain filters of SINR maximizing beamformer.
-
rake_max_sinr_weights
(source, interferer=None, R_n=None, rcond=0.0, ff=False, attn=True)¶ This method computes a beamformer focusing on a number of specific sources and ignoring a number of interferers.
Parameters: - source – source locations
- interferer – interferer locations
-
rake_max_udr_filters
(source, interferer=None, R_n=None, delay=0.03, epsilon=0.005)¶ Compute directly the time-domain filters maximizing the Useful-to-Detrimental Ratio (UDR).
This beamformer is not practical. It maximizes the UDR ratio in the time domain directly without imposing flat response towards the source of interest. This results in severe distortion of the desired signal.
Parameters: - source (pyroomacoustics.SoundSource) – the desired source
- interferer (pyroomacoustics.SoundSource, optional) – the interfering source
- R_n (ndarray, optional) – the noise covariance matrix, it should be (M * Lg)x(M * Lg) where M is the number of sensors and Lg the filter length
- delay (float, optional) – the signal delay introduced by the beamformer (default 0.03 s)
- epsilon (float) –
-
rake_max_udr_weights
(source, interferer=None, R_n=None, ff=False, attn=True)¶
-
rake_mvdr_filters
(source, interferer, R_n, delay=0.03, epsilon=0.005)¶ Compute the time-domain filters of the minimum variance distortionless response beamformer.
-
rake_one_forcing_filters
(sources, interferers, R_n, epsilon=0.005)¶ Compute the time-domain filters of a beamformer with unit response towards multiple sources.
-
rake_one_forcing_weights
(source, interferer=None, R_n=None, ff=False, attn=True)¶
-
rake_perceptual_filters
(source, interferer=None, R_n=None, delay=0.03, d_relax=0.035, epsilon=0.005)¶ Compute directly the time-domain filters for a perceptually motivated beamformer. The beamformer minimizes noise and interference, but relaxes the response of the filter within the 30 ms following the delay.
-
response
(phi_list, frequency)¶
-
response_from_point
(x, frequency)¶
-
snr
(source, interferer, f, R_n=None, dB=False)¶
-
steering_vector_2D
(frequency, phi, dist, attn=False)¶
-
steering_vector_2D_from_point
(frequency, source, attn=True, ff=False)¶ Creates a steering vector for a particular frequency and source
Parameters: - frequency –
- source – location in cartesian coordinates
- attn – include attenuation factor if True
- ff – uses far-field distance if true
Returns: A 2x1 ndarray containing the steering vector.
-
udr
(source, interferer, f, R_n=None, dB=False)¶
-
weights_from_filters
()¶
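Example (a brief sketch of the weight-design workflow; the array geometry and parameter values are arbitrary illustrations):
>>> import numpy as np
>>> import pyroomacoustics as pra
>>> fs = 16000
>>> R = pra.beamforming.linear_2D_array(center=[2., 1.5], M=4, phi=0., d=0.1)
>>> bf = pra.beamforming.Beamformer(R, fs, N=1024, Lg=350)
>>> bf.far_field_weights(np.pi / 4.)  # steer the beam towards 45 degrees
>>> bf.filters_from_weights()  # derive the corresponding time-domain filters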
-
pyroomacoustics.beamforming.
H
(A, **kwargs)¶ Returns the conjugate (Hermitian) transpose of a matrix.
-
class
pyroomacoustics.beamforming.
MicrophoneArray
(R, fs)¶ Bases:
object
Microphone array class.
-
record
(signals, fs)¶ This simulates the recording of the signals by the microphones. In particular, if the microphones and the room simulation do not use the same sampling frequency, down/up-sampling is done here.
Parameters: - signals – An ndarray with as many lines as there are microphones.
- fs – the sampling frequency of the signals.
-
to_wav
(filename, mono=False, norm=False, bitdepth=np.float)¶ Save all the signals to wav files.
Parameters: - filename (str) – the name of the file
- mono (bool, optional) – if true, records only the center channel floor(M / 2) (default False)
- norm (bool, optional) – if true, normalize the signal to fit in the dynamic range (default False)
- bitdepth (int, optional) – the format of output samples [np.int8/16/32/64 or np.float (default)]
-
-
pyroomacoustics.beamforming.
circular_2D_array
(center, M, phi0, radius)¶ Creates an array of points uniformly spaced on a circle in 2D
Parameters: - center (array_like) – The center of the array
- M (int) – The number of points
- phi0 (float) – The counterclockwise rotation of the first element in the array (from the x-axis)
- radius (float) – The radius of the array
Returns: The array of points
Return type: ndarray (2, M)
-
pyroomacoustics.beamforming.
distance
(x, y)¶ Computes the distance matrix E.
E[i,j] = sqrt(sum((x[:,i]-y[:,j])**2)). x and y are DxN ndarray containing N D-dimensional vectors.
-
pyroomacoustics.beamforming.
fir_approximation_ls
(weights, T, n1, n2)¶
-
pyroomacoustics.beamforming.
linear_2D_array
(center, M, phi, d)¶ Creates an array of points uniformly spaced on a line in 2D
Parameters: - center (array_like) – The center of the array
- M (int) – The number of points
- phi (float) – The counterclockwise rotation of the array (from the x-axis)
- d (float) – The distance between neighboring points
Returns: The array of points
Return type: ndarray (2, M)
-
pyroomacoustics.beamforming.
mdot
(*args)¶ Left-to-right associative matrix multiplication of multiple 2D ndarrays.
-
pyroomacoustics.beamforming.
poisson_2D_array
(center, M, d)¶ Create array of 2D positions drawn from Poisson process.
Parameters: - center (array_like) – The center of the array
- M (int) – The number of points
- d (float) – The distance between neighboring points
Returns: The array of points
Return type: ndarray (2, M)
-
pyroomacoustics.beamforming.
spiral_2D_array
(center, M, radius=1.0, divi=3, angle=None)¶ Generate an array of points placed on a spiral
Parameters: - center (array_like) – location of the center of the array
- M (int) – number of microphones
- radius (float) – microphones are contained within a circle of this radius (default 1)
- divi (int) – number of rotations of the spiral (default 3)
- angle (float) – the angle offset of the spiral (default random)
Returns: The array of points
Return type: ndarray (2, M)
-
pyroomacoustics.beamforming.
square_2D_array
(center, M, N, phi, d)¶ Creates an array of uniformly spaced grid points in 2D
Parameters: - center (array_like) – The center of the array
- M (int) – The number of points in the first dimension
- N (int) – The number of points in the second dimension
- phi (float) – The counterclockwise rotation of the array (from the x-axis)
- d (float) – The distance between neighboring points
Returns: The array of points
Return type: ndarray (2, M * N)
-
pyroomacoustics.beamforming.
sumcols
(A)¶ Sums the columns of a matrix (np.array).
The output is a 2D np.array of dimensions M x 1.
-
pyroomacoustics.beamforming.
unit_vec2D
(phi)¶
pyroomacoustics.build_rir module¶
pyroomacoustics.geometry module¶
-
pyroomacoustics.geometry.
area
(corners)¶ Computes the signed area of a 2D surface represented by its corners.
Parameters: corners – (np.array 2xN, N>2) list of coordinates of the corners forming the surface
Returns: (float) signed area of the surface; a positive area means anti-clockwise ordered corners, a negative area means clockwise ordered corners
-
pyroomacoustics.geometry.
ccw3p
(p1, p2, p3)¶ Computes the orientation of three 2D points.
Parameters: - p1 – (ndarray size 2) coordinates of a 2D point
- p2 – (ndarray size 2) coordinates of a 2D point
- p3 – (ndarray size 2) coordinates of a 2D point
Returns: (int) orientation of the given triangle 1 if triangle vertices are counter-clockwise -1 if triangle vertices are clockwise 0 if vertices are collinear
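Example (a counter-clockwise triangle yields 1, per the return convention above):
>>> import numpy as np
>>> from pyroomacoustics.geometry import ccw3p
>>> ccw3p(np.r_[0., 0.], np.r_[1., 0.], np.r_[0., 1.])
1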
-
pyroomacoustics.geometry.
intersection_2D_segments
(a1, a2, b1, b2)¶ Computes the intersection between two 2D line segments.
This function computes the intersection between two 2D segments (defined by the coordinates of their endpoints) and returns the coordinates of the intersection point. If there is no intersection, None is returned. If segments are collinear, None is returned. Two booleans are also returned to indicate if the intersection happened at extremities of the segments, which can be useful for limit cases computations.
Parameters: - a1 – (ndarray size 2) coordinates of the first endpoint of segment a
- a2 – (ndarray size 2) coordinates of the second endpoint of segment a
- b1 – (ndarray size 2) coordinates of the first endpoint of segment b
- b2 – (ndarray size 2) coordinates of the second endpoint of segment b
Returns: (tuple of 3 elements) results of the computation (ndarray size 2 or None) coordinates of the intersection point (bool) True if the intersection is at boundaries of segment a (bool) True if the intersection is at boundaries of segment b
-
pyroomacoustics.geometry.
intersection_segment_plane
(a1, a2, p, normal)¶ Computes the intersection between a line segment and a plane in 3D.
This function computes the intersection between a line segment (defined by the coordinates of two points) and a plane (defined by a point belonging to it and a normal vector). If there is no intersection, None is returned. If the segment belongs to the surface, None is returned. A boolean is also returned to indicate if the intersection happened at extremities of the segment, which can be useful for limit cases computations.
Parameters: - a1 – (ndarray size 3) coordinates of the first endpoint of the segment
- a2 – (ndarray size 3) coordinates of the second endpoint of the segment
- p – (ndarray size 3) coordinates of a point belonging to the plane
- normal – (ndarray size 3) normal vector of the plane
Returns: (tuple of 2 elements) results of the computation (ndarray size 3 or None) coordinates of the intersection point (bool) True if the intersection is at boundaries of the segment
-
pyroomacoustics.geometry.
intersection_segment_polygon_surface
(a1, a2, corners_2d, normal, plane_point, plane_basis)¶ Computes the intersection between a line segment and a polygon surface in 3D.
This function computes the intersection between a line segment (defined by the coordinates of two points) and a surface (defined by an array of coordinates of corners of the polygon and a normal vector) If there is no intersection, None is returned. If the segment belongs to the surface, None is returned. Two booleans are also returned to indicate if the intersection happened at extremities of the segment or at a border of the polygon, which can be useful for limit cases computations.
Parameters: - a1 – (ndarray size 3) coordinates of the first endpoint of the segment
- a2 – (ndarray size 3) coordinates of the second endpoint of the segment
- corners – (ndarray size 3xN, N>2) coordinates of the corners of the polygon
- normal – (ndarray size 3) normal vector of the surface
Returns: (tuple of 3 elements) results of the computation (ndarray size 3 or None) coordinates of the intersection point (bool) True if the intersection is at boundaries of the segment (bool) True if the intersection is at boundaries of the polygon
-
pyroomacoustics.geometry.
is_inside_2D_polygon
(p, corners)¶ Checks if a given point is inside a given polygon in 2D.
This function checks if a point (defined by its coordinates) is inside a polygon (defined by an array of coordinates of its corners) by counting the number of intersections between the borders and a segment linking the given point with a computed point outside the polygon. A boolean is also returned to indicate if a point is on a border of the polygon (the point is still considered inside), which can be useful for limit cases computations.
Parameters: - p – (ndarray size 2) coordinates of the point
- corners – (ndarray size 2xN, N>2) coordinates of the corners of the polygon
Returns: (tuple of 2 elements) results of the computation (bool) True if the point is inside (bool) True if the intersection is at boundaries of the polygon
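Example (return convention as described above):
>>> import numpy as np
>>> from pyroomacoustics.geometry import is_inside_2D_polygon
>>> corners = np.array([[0., 2., 2., 0.], [0., 0., 2., 2.]])  # a 2x2 square
>>> is_inside_2D_polygon(np.r_[1., 1.], corners)  # strictly inside, not on a border
(True, False)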
-
pyroomacoustics.geometry.
side
(p, p0, vector)¶ Computes on which side of a given point another given point lies, with respect to a vector.
Parameters: - p – (ndarray size 2 or 3) point to be tested
- p0 – (ndarray size 2 or 3) origin point
- vector – (ndarray size 2 or 3) directional vector
Returns: (int) direction of the point 1 : p is at the side pointed by the vector 0 : p is in the middle (on the same line as p0) -1 : p is at the opposite side of the one pointed by the vector
pyroomacoustics.metrics module¶
-
pyroomacoustics.metrics.
itakura_saito
(x1, x2, sigma2_n, stft_L=128, stft_hop=128)¶
-
pyroomacoustics.metrics.
median
(x, alpha=None, axis=-1, keepdims=False)¶ Computes the median of the data along the given axis, and optionally a confidence interval for it.
Parameters: - x (array_like) – the data array
- alpha (float, optional) – the confidence level of the interval, confidence intervals are only computed when this argument is provided
- axis (int, optional) – the axis of the data on which to operate, by default the last axis
Returns: A tuple (m, [le, ue]). The confidence interval is [m - le, m + ue].
-
pyroomacoustics.metrics.
mse
(x1, x2)¶ A shorthand to compute the mean-squared error between two signals.
\[MSE = \frac{1}{n}\sum_{i=0}^{n-1} (x_i - y_i)^2\]
Parameters: - x1 – (ndarray)
- x2 – (ndarray)
Returns: (float) The mean of the squared differences of x1 and x2.
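Example (with two length-4 signals differing by 1 in a single sample, the formula gives 1/4):
>>> import numpy as np
>>> from pyroomacoustics.metrics import mse
>>> x1 = np.ones(4)
>>> x2 = np.array([1., 1., 1., 0.])
>>> mse(x1, x2)
0.25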
-
pyroomacoustics.metrics.
pesq
(ref_file, deg_files, Fs=8000, swap=False, wb=False, bin='./bin/pesq')¶ Computes the perceptual evaluation of speech quality (PESQ) metric of degraded files with respect to a reference file. Uses the utility obtained from ITU P.862 http://www.itu.int/rec/T-REC-P.862-200511-I!Amd2/en
Parameters: - ref_file – The filename of the reference file.
- deg_files – A list of degraded sound files names.
- Fs – Sample rate of the sound files [8 kHz or 16 kHz, default 8 kHz].
- swap – Swap the byte order of the input files [default: False].
- wb – Use wideband algorithm [default: False].
- bin – Location of pesq executable [default: ./bin/pesq].
Returns: (ndarray size 2xN) ndarray containing Raw MOS and MOS LQO in rows 0 and 1, respectively, and has one column per degraded file name in deg_files.
-
pyroomacoustics.metrics.
snr
(ref, deg)¶
pyroomacoustics.multirate module¶
-
pyroomacoustics.multirate.
frac_delay
(delta, N, w_max=0.9, C=4)¶ Compute the optimal fractional delay filter according to
Design of Fractional Delay Filters Using Convex Optimization William Putnam and Julius Smith
Parameters: - delta – delay of the filter in (fractional) samples
- N – number of taps
- w_max – Bandwidth of the filter (in fraction of pi) (default 0.9)
- C – sets the number of constraints to C*N (default 4)
-
pyroomacoustics.multirate.
low_pass
(numtaps, B, epsilon=0.1)¶
-
pyroomacoustics.multirate.
resample
(x, p, q)¶
pyroomacoustics.parameters module¶
This file defines the main physical constants of the system
-
class
pyroomacoustics.parameters.
Constants
¶ A class to provide easy, package-wide access to user-settable constants.
Be careful not to use this in tight loops, since it relies on exceptions.
-
get
(name)¶
-
set
(name, val)¶
-
-
pyroomacoustics.parameters.
calculate_speed_of_sound
(t, h, p)¶ Compute the speed of sound as a function of temperature, humidity and pressure
Parameters: - t – temperature [Celsius]
- h – relative humidity [%]
- p – atmospheric pressure [kPa]
Returns: Return type: Speed of sound in [m/s]
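Example (a sketch assuming the module-level Constants instance is exposed as pra.constants):
>>> import pyroomacoustics as pra
>>> c = pra.parameters.calculate_speed_of_sound(t=25., h=40., p=101.325)
>>> pra.constants.set('c', c)  # subsequent simulations use this value
>>> pra.constants.get('c')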
pyroomacoustics.recognition module¶
-
class
pyroomacoustics.recognition.
CircularGaussianEmission
(nstates, odim=1, examples=None)¶ -
get_pdfs
()¶ Return the pdf of all the emission probabilities
-
prob_x_given_state
(examples)¶ Recompute the probability of the observation given the state of the latent variables
-
update_parameters
(examples, gamma)¶
-
-
class
pyroomacoustics.recognition.
GaussianEmission
(nstates, odim=1, examples=None)¶ -
get_pdfs
()¶ Return the pdf of all the emission probabilities
-
prob_x_given_state
(examples)¶ Recompute the probability of the observation given the state of the latent variables
-
update_parameters
(examples, gamma)¶
-
-
class
pyroomacoustics.recognition.
HMM
(nstates, emission, model='full', leftright_jump_max=3)¶ Hidden Markov Model with Gaussian emissions
-
K
¶ int – Number of states in the model
-
O
¶ int – Number of dimensions of the Gaussian emission distribution
-
A
¶ ndarray – KxK transition matrix of the Markov chain
-
pi
¶ ndarray – K dim vector of the initial probabilities of the Markov chain
-
emission
¶ (GaussianEmission or CircularGaussianEmission) – An instance of emission_class
-
model
¶ string, optional – The model used for the chain, can be ‘full’ or ‘left-right’
-
leftright_jump_max
¶ int, optional – The number of non-zero upper diagonals in a ‘left-right’ model
-
backward
(X, p_x_given_z, c)¶ The backward recursion for HMM as described in Bishop Ch. 13
-
fit
(examples, tol=0.1, max_iter=10, verbose=False)¶ Training of the HMM using the EM algorithm
Parameters: - examples ((list)) – A list of examples used to train the model. Each example is an array of feature vectors, each row is a feature vector, the sequence runs on axis 0
- tol ((float)) – The training stops when the progress between two steps is less than this number (default 0.1)
- max_iter ((int)) – Alternatively the algorithm stops when a maximum number of iterations is reached (default 10)
- verbose (bool, optional) – When True, prints extra information about convergence
-
forward
(X, p_x_given_z)¶ The forward recursion for HMM as described in Bishop Ch. 13
-
generate
(N)¶ Generate a random sample of length N using the model
-
loglikelihood
(X)¶ Compute the log-likelihood of a sample vector using the sum-product algorithm
-
update_parameters
(examples, gamma, xhi)¶ Update the parameters of the Markov Chain
-
viterbi
()¶
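Example (a toy sketch; the example shapes follow the fit docstring, and the default initialization of the emission class without training data is assumed to work):
>>> import numpy as np
>>> from pyroomacoustics.recognition import HMM, CircularGaussianEmission
>>> K, O = 3, 2  # number of states and emission dimension
>>> emission = CircularGaussianEmission(K, odim=O)
>>> hmm = HMM(K, emission, model='left-right')
>>> examples = [np.random.randn(50, O) for _ in range(5)]  # toy feature sequences
>>> hmm.fit(examples, max_iter=5)
>>> sample = hmm.generate(20)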
-
pyroomacoustics.soundsource module¶
-
class
pyroomacoustics.soundsource.
SoundSource
(position, images=None, damping=None, generators=None, walls=None, orders=None, signal=None, delay=0)¶ Bases:
object
A class to represent sound sources.
This object represents a sound source in a room by a list containing the original source position as well as all the image sources, up to some maximum order.
It also keeps track of the sequence of generated images and the index of the walls (in the original room) that generated the reflection.
-
add_signal
()¶
-
distance
(ref_point)¶
-
get_damping
(max_order=None)¶
-
get_images
(max_order=None, max_distance=None, n_nearest=None, ref_point=None)¶ Kept for compatibility. Now replaced by the bracket operator and the set_ordering method.
-
get_rir
(mic, visibility, Fs, t0=0.0, t_max=None)¶ Compute the room impulse response between the source and the microphone whose position is given as an argument.
-
set_ordering
(ordering, ref_point=None)¶ Set the order in which we retrieve image sources. Can be: ‘nearest’, ‘strongest’, ‘order’. Optional argument: ref_point
-
wall_sequence
(i)¶ Print the wall sequence for the image source indexed by i
-
-
pyroomacoustics.soundsource.
build_rir_matrix
(mics, sources, Lg, Fs, epsilon=0.005, unit_damping=False)¶ A function to build the channel matrix for many sources and microphones
Parameters: - mics (ndarray) – a dim-by-M ndarray where each column is the position of a microphone
- sources (list of pyroomacoustics.SoundSource) – list of sound sources for which we want to build the matrix
- Lg (int) – the length of the beamforming filters
- Fs (int) – the sampling frequency
- epsilon (float, optional) – minimum decay of the sinc before truncation. Defaults to epsilon=5e-3
- unit_damping (bool, optional) – determines if the wall damping parameters are used or not. Defaults to False.
Returns: The RIR matrix H = [ H_11 H_12 ... ; H_21 H_22 ... ; ... ] where H_{ij} is the channel matrix between microphone i and source j. H is of size (M*Lg) x ((Lg+Lh-1)*S), where Lh is the channel length (determined by epsilon), and M, S are the numbers of microphones and sources, respectively.
pyroomacoustics.stft module¶
Collection of spectral estimation methods.
This module is deprecated. It is replaced by the methods of pyroomacoustics.transform
-
pyroomacoustics.stft.
freqvec
(N, fs, centered=False)¶ Compute the vector of frequencies corresponding to DFT bins.
Parameters: - N (int) – FFT length
- fs (int) – sampling rate of the signal
- centered (bool) – False if the DC bin is at the beginning, True if it is centered
-
pyroomacoustics.stft.
istft
(X, L, hop, transform=np.fft.ifft, win=None, zp_back=0, zp_front=0)¶
-
pyroomacoustics.stft.
overlap_add
(in1, in2, L)¶
-
pyroomacoustics.stft.
spectroplot
(Z, N, hop, fs, fdiv=None, tdiv=None, vmin=None, vmax=None, cmap=None, interpolation='none', colorbar=True)¶
-
pyroomacoustics.stft.
stft
(x, L, hop, transform=np.fft.fft, win=None, zp_back=0, zp_front=0)¶ Parameters: - x – input signal
- L – frame size
- hop – shift size between frames
- transform – the transform routine to apply (default FFT)
- win – the window to apply (default None)
- zp_back – zero padding to apply at the end of the frame
- zp_front – zero padding to apply at the beginning of the frame
Returns: Return type: The STFT of x
pyroomacoustics.sync module¶
-
pyroomacoustics.sync.
correlate
(x1, x2, interp=1, phat=False)¶ Compute the cross-correlation between x1 and x2
Parameters: - x1,x2 (array_like) – The data arrays
- interp (int, optional) – The interpolation factor for the output array, default 1.
- phat (bool, optional) – Apply the PHAT weighting (default False)
Returns: Return type: The cross-correlation between the two arrays
-
pyroomacoustics.sync.
delay_estimation
(x1, x2, L)¶ Estimate the delay between x1 and x2. L is the block length used for phat
-
pyroomacoustics.sync.
tdoa
(signal, reference, interp=1, phat=False, fs=1, t_max=None)¶ Estimates the shift of array signal with respect to reference using generalized cross-correlation
Parameters: - signal (array_like) – The array whose tdoa is measured
- reference (array_like) – The reference array
- interp (int, optional) – The interpolation factor for the output array, default 1.
- phat (bool, optional) – Apply the PHAT weighting (default False)
- fs (int or float, optional) – The sampling frequency of the input arrays, default=1
Returns: Return type: The estimated delay between the two arrays
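Example (a minimal sketch with a known integer delay of 40 samples):
>>> import numpy as np
>>> import pyroomacoustics as pra
>>> fs = 16000
>>> x = np.random.randn(fs)
>>> delayed = np.concatenate((np.zeros(40), x))[:fs]  # x delayed by 40 samples
>>> pra.sync.tdoa(delayed, x, phat=True, fs=fs)  # should be close to 40 / fs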
-
pyroomacoustics.sync.
time_align
(ref, deg, L=4096)¶ Return a copy of deg, time-aligned with and of the same length as ref. L is the block length used for correlations.
pyroomacoustics.utilities module¶
-
pyroomacoustics.utilities.
angle_from_points
(x1, x2)¶
-
pyroomacoustics.utilities.
clip
(signal, high, low)¶ Clip a signal from above at high and from below at low.
-
pyroomacoustics.utilities.
compare_plot
(signal1, signal2, Fs, fft_size=512, norm=False, equal=False, title1=None, title2=None)¶
-
pyroomacoustics.utilities.
convmtx
(x, n)¶ Create a convolution matrix H for the vector x, such that for any vector v of length n, np.dot(H, v) is the same as np.convolve(x, v); H therefore has shape (len(x) + n - 1, n).
-
pyroomacoustics.utilities.
dB
(signal, power=False)¶
-
pyroomacoustics.utilities.
fractional_delay
(t0)¶ Creates a fractional delay filter using a windowed sinc function. The length of the filter is fixed by the module wide constant frac_delay_length (default 81).
Parameters: t0 (float) – The delay in fractions of a sample. Typically between 0 and 1. Returns: Return type: A fractional delay filter with the specified delay.
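Example (a short sketch; note that the windowed sinc also introduces its own integer group delay):
>>> import numpy as np
>>> from pyroomacoustics.utilities import fractional_delay
>>> h = fractional_delay(0.3)  # windowed-sinc filter delaying by 0.3 samples
>>> x = np.random.randn(1000)
>>> y = np.convolve(x, h)  # x delayed by 0.3 samples (plus the filter's integer group delay)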
-
pyroomacoustics.utilities.
fractional_delay_filter_bank
(delays)¶ Creates a fractional delay filter bank of windowed sinc filters
Parameters: delays (1d ndarray) – The delays corresponding to each filter in fractional samples
Returns: An ndarray where the i-th row contains the fractional delay filter corresponding to the i-th delay. The number of columns of the matrix is proportional to the maximum delay.
-
pyroomacoustics.utilities.
goertzel
(x, k)¶ Goertzel algorithm to compute DFT coefficients
-
pyroomacoustics.utilities.
highpass
(signal, Fs, fc=None, plot=False)¶ Filter out the very low frequencies; by default, those below 50 Hz.
-
pyroomacoustics.utilities.
levinson
(r, b)¶ Solve a system of the form Rx = b, where R is a Hermitian Toeplitz matrix and b is any vector, using the generalized Levinson recursion as described in M.H. Hayes, Statistical Digital Signal Processing and Modeling, p. 268.
Parameters: - r – First column of R, toeplitz hermitian matrix.
- b – The right-hand argument. If b is a matrix, the system is solved for every column vector in b.
Returns: Return type: The solution of the linear system Rx = b.
-
pyroomacoustics.utilities.
low_pass_dirac
(t0, alpha, Fs, N)¶ Creates a vector of length N containing a lowpass Dirac sampled at Fs with delay t0 and attenuation alpha.
If t0 and alpha are 2D column vectors of the same size, then the function returns a matrix with each line corresponding to a pair of t0/alpha values.
-
pyroomacoustics.utilities.
normalize
(signal, bits=None)¶ Normalize a signal to be in a given range. The default is to normalize the maximum amplitude to one. An optional argument allows normalizing the signal to fit within the range of a signed integer representation with the given number of bits.
-
pyroomacoustics.utilities.
normalize_pwr
(sig1, sig2)¶ Normalize sig1 to have the same power as sig2.
-
pyroomacoustics.utilities.
prony
(x, p, q)¶ Prony’s method, from Monson H. Hayes, Statistical Digital Signal Processing and Modeling, p. 154
Parameters: - x – signal to model
- p – order of denominator
- q – order of numerator
Returns: - a – numerator coefficients
- b – denominator coefficients
- err – the squared error of the approximation
-
pyroomacoustics.utilities.
real_spectrum
(signal, axis=-1, **kwargs)¶
-
pyroomacoustics.utilities.
shanks
(x, p, q)¶ Shanks’ method, from Monson H. Hayes, Statistical Digital Signal Processing and Modeling, p. 154
Parameters: - x – signal to model
- p – order of denominator
- q – order of numerator
Returns: - a – numerator coefficients
- b – denominator coefficients
- err – the squared error of approximation
-
pyroomacoustics.utilities.
spectrum
(signal, Fs, N)¶
-
pyroomacoustics.utilities.
time_dB
(signal, Fs, bits=16)¶ Compute the signed dB amplitude of the oscillating signal normalized wrt the number of bits used for the signal.
-
pyroomacoustics.utilities.
to_16b
(signal)¶ Converts a 32-bit float signal (in the range -1 to 1) to a signed 16-bit representation. No clipping is performed; you are responsible for ensuring the signal is within the correct interval.
pyroomacoustics.wall module¶
-
class
pyroomacoustics.wall.
Wall
(corners, absorption=1.0, name=None)¶ Bases:
object
This class represents a wall instance. A room instance is formed by these.
Attributes:
- corners – (np.array dim 2x2 or 3xN, N>2) endpoints forming the wall
- absorption – (float) attenuation reflection factor
- name – (string) name given to the wall, which can be reused to reference it in the Room object
- normal – (np.array dim 2 or 3) normal vector pointing outward from the room
- dim – (int) dimension of the wall (2 or 3, meaning 2D or 3D)
-
intersection
(p1, p2)¶ Returns the intersection point between the wall and a line segment.
Parameters: - p1 – (np.array dim 2 or 3) first end point of the line segment
- p2 – (np.array dim 2 or 3) second end point of the line segment
Returns: (np.array dim 2 or 3 or None) intersection point between the wall and the line segment
-
intersects
(p1, p2)¶ Tests if the given line segment intersects the wall.
Parameters: - p1 – (ndarray size 2 or 3) first endpoint of the line segment
- p2 – (ndarray size 2 or 3) second endpoint of the line segment
Returns: (tuple size 3) (bool) True if the line segment intersects the wall (bool) True if the intersection happens at a border of the wall (bool) True if the intersection happens at the extremity of the segment
-
side
(p)¶ Computes on which side of the wall the point p is.
Parameters: p – (np.array dim 2 or 3) coordinates of the point
Returns: (int) integer representing on which side the point is
-1 : opposite to the normal vector (going inside the room)
0 : on the wall
1 : in the direction of the normal vector (going outside of the room)
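Example (a sketch assuming corners are given as a 2x2 array with one endpoint per column, as the attribute description above suggests):
>>> import numpy as np
>>> from pyroomacoustics.wall import Wall
>>> w = Wall(np.array([[0., 4.], [0., 0.]]), absorption=0.85)  # wall from (0, 0) to (4, 0)
>>> hit, at_border, at_end = w.intersects(np.r_[1., -1.], np.r_[1., 1.])  # segment crossing the wall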
-
pyroomacoustics.windows module¶
A collection of windowing functions.
-
pyroomacoustics.windows.
blackman_harris
(N, flag='asymmetric', length='full')¶ The Blackman-Harris window function
\[w[n] = a_0 - a_1 \cos(2\pi n/M) + a_2 \cos(4\pi n/M) + a_3 \cos(6\pi n/M), n=0,\ldots,N-1\]
Parameters: - N (int) – the window length
- flag (string, optional) –
Possible values
- asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
- symmetric: the window is symmetric (\(M=N-1\))
- length (string, optional) –
Possible values
- full: the full length window is computed
- right: the right half of the window is computed
- left: the left half of the window is computed
-
pyroomacoustics.windows.
cosine
(N, flag='asymmetric', length='full')¶ The cosine window function
\[w[n] = \cos(\pi (n/M - 0.5))^2\]
Parameters: - N (int) – the window length
- flag (string, optional) –
Possible values
- asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
- symmetric: the window is symmetric (\(M=N-1\))
- mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
- length (string, optional) –
Possible values
- full: the full length window is computed
- right: the right half of the window is computed
- left: the left half of the window is computed
-
pyroomacoustics.windows.
hann
(N, flag='asymmetric', length='full')¶ The Hann window function
\[w[n] = 0.5 (1 - \cos(2 \pi n / M)), n=0,\ldots,N-1\]
Parameters: - N (int) – the window length
- flag (string, optional) –
Possible values
- asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
- symmetric: the window is symmetric (\(M=N-1\))
- mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
- length (string, optional) –
Possible values
- full: the full length window is computed
- right: the right half of the window is computed
- left: the left half of the window is computed
-
pyroomacoustics.windows.
rect
(N)¶ The rectangular window
\[w[n] = 1, n=0,\ldots,N-1\]
Parameters: N (int) – the window length
-
pyroomacoustics.windows.
triang
(N, flag='asymmetric', length='full')¶ The triangular window function
\[w[n] = 1 - | 2 n / M - 1 |, n=0,\ldots,N-1\]
Parameters: - N (int) – the window length
- flag (string, optional) –
Possible values
- asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
- symmetric: the window is symmetric (\(M=N-1\))
- mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
- length (string, optional) –
Possible values
- full: the full length window is computed
- right: the right half of the window is computed
- left: the left half of the window is computed
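Example (the defaults documented above give an asymmetric, full-length window suitable for overlapping transforms):
>>> from pyroomacoustics.windows import hann
>>> w = hann(512)  # asymmetric, full-length Hann window of 512 samples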
Module contents¶
pyroomacoustics¶
- Provides
- Room impulse response simulation via the image source model
- Simulation of sound propagation using STFT engine
- Reference implementations of popular algorithms for
- beamforming
- direction of arrival
- adaptive filtering
- etc
How to use the documentation¶
Documentation is available in two forms: docstrings provided with the code, and a loose standing reference guide, available from the pyroomacoustics readthedocs page.
We recommend exploring the docstrings using IPython, an advanced Python shell with TAB-completion and introspection capabilities. See below for further instructions.
The docstring examples assume that pyroomacoustics has been imported as pra:
>>> import pyroomacoustics as pra
Code snippets are indicated by three greater-than signs:
>>> x = 42
>>> x = x + 1
Use the built-in help
function to view a function’s docstring:
>>> help(pra.stft.STFT)
...
Available submodules¶
pyroomacoustics.acoustics
- Acoustics and psychoacoustics routines, mel scale, critical bands, etc.
pyroomacoustics.beamforming
- Microphone arrays and beamforming routines.
pyroomacoustics.geometry
- Core geometry routines for the image source model.
pyroomacoustics.metrics
- Performance metrics like mean-squared error, median, Itakura-Saito, etc.
pyroomacoustics.multirate
- Rate conversion routines.
pyroomacoustics.parameters
- Global parameters, e.g. physical constants.
pyroomacoustics.recognition
- Hidden Markov Model and TIMIT database structure.
pyroomacoustics.room
- Abstraction of room and image source model.
pyroomacoustics.soundsource
- Abstraction for a sound source.
pyroomacoustics.stft
- Deprecated. Replaced by the methods in
pyroomacoustics.transform
pyroomacoustics.sync
- A few routines to help synchronize signals.
pyroomacoustics.utilities
- Miscellaneous signal processing and helper routines (filtering, normalization, fractional delays, plotting, etc.).
pyroomacoustics.wall
- Abstraction for walls of a room.
pyroomacoustics.windows
- Tapering windows for spectral analysis.
Available subpackages¶
pyroomacoustics.adaptive
- Adaptive filter algorithms
pyroomacoustics.bss
- Blind source separation.
pyroomacoustics.datasets
- Wrappers around a few popular speech datasets
pyroomacoustics.denoise
- Single channel noise reduction methods
pyroomacoustics.doa
- Direction of arrival finding algorithms
pyroomacoustics.transform
- Block frequency domain processing tools
Utilities¶
- __version__
- pyroomacoustics version string