pysptk¶
A Python wrapper for the Speech Signal Processing Toolkit (SPTK).
https://github.com/r9y9/pysptk
The wrapper is based on a modified version of SPTK (r9y9/SPTK)
Full documentation¶
Full documentation of SPTK is available at http://sp-tk.sourceforge.net.
If you are not familiar with SPTK, I recommend reading that documentation before using pysptk.
Demonstration notebooks¶
- Introduction notebook: a brief introduction to pysptk
- Speech analysis and re-synthesis notebook: a demonstration of speech analysis and re-synthesis. Synthesized audio examples (English) are available in the notebook.
Installation guide¶
The latest release is available on PyPI. Assuming you already have numpy installed, you can install pysptk with:
pip install pysptk
If you want the latest development version, run:
pip install git+https://github.com/r9y9/pysptk
or:
git clone https://github.com/r9y9/pysptk
cd pysptk
python setup.py develop # or install
This should resolve the package dependencies and install pysptk properly.
Note
If you use the development version, you need cython (and a C compiler) installed to compile the cython module(s).
For Windows users¶
There are some binary wheels available on PyPI, so you can install pysptk via pip without cython or a C compiler if a binary wheel exists that matches your environment (this depends on whether the system is 32 or 64 bit and on the Python version). For now, wheels are available for:
- Python 2.7 on 32 bit system
- Python 2.7 on 64 bit system
- Python 3.4 on 32 bit system
If no binary wheel is available for your environment, you can build pysptk from the source distribution, which is also available on PyPI. Note that to compile pysptk from source on Windows, it is highly recommended to use Anaconda, since it makes installing numpy, cython and other scientific packages easy. In fact, continuous integration for Windows on AppVeyor uses Anaconda to build and test pysptk. See pysptk/appveyor.yml for the exact build steps.
API documentation¶
API¶
Core SPTK API¶
All functionality in pysptk.sptk (the core API) is directly accessible from the top-level pysptk.* namespace.
For convenience, vector-to-vector functions (pysptk.mcep, pysptk.mc2b, etc.) that take an input vector as the first argument can also accept a matrix. For matrix inputs, vector-to-vector functions are applied along the last axis internally; e.g.
mc = pysptk.mcep(frames) # frames.shape == (num_frames, frame_len)
is equivalent to:
mc = np.apply_along_axis(pysptk.mcep, -1, frames)
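The dispatch described above can be sketched without pysptk itself. In the following minimal example, `frame_stats` and `apply_last_axis` are hypothetical stand-ins (neither is part of pysptk); the sketch only shows that applying a vector-to-vector function along the last axis gives the same result as an explicit loop over frames.

```python
import numpy as np


def frame_stats(frame):
    """Hypothetical vector-to-vector function (not part of pysptk)."""
    return np.array([frame.mean(), frame.max() - frame.min()])


def apply_last_axis(func, x):
    """Apply func to a vector, or to every last-axis vector of a matrix."""
    x = np.asarray(x, dtype=np.float64)
    if x.ndim == 1:
        return func(x)
    return np.apply_along_axis(func, -1, x)


# frames.shape == (num_frames, frame_len)
frames = np.arange(12, dtype=np.float64).reshape(3, 4)

batched = apply_last_axis(frame_stats, frames)       # one call on the matrix
looped = np.stack([frame_stats(f) for f in frames])  # explicit per-frame loop
```

Both `batched` and `looped` have shape `(num_frames, 2)`, one output vector per input frame.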
Warning
The core APIs in the pysptk.sptk package are based on SPTK’s internal APIs (e.g. code in _mgc2sp.c), so their behavior is not exactly the same as that of SPTK’s CLI. If you find any inconsistency that should be addressed, please file an issue.
Note
Almost all pysptk functions assume that the input array is C-contiguous and has float64 element type. For vector-to-vector functions, the input array is automatically converted to a float64 array, the function is executed on it, and the output array is then converted back to the type of the input you provided.
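The conversion described in this note could look roughly like the decorator below. This is only a sketch of the idea: `float64_roundtrip` and `double_it` are hypothetical names, and pysptk's actual implementation may differ.

```python
import functools

import numpy as np


def float64_roundtrip(func):
    """Run func on a C-contiguous float64 array, then restore the caller's dtype."""
    @functools.wraps(func)
    def wrapper(x, *args, **kwargs):
        x = np.asarray(x)
        converted = np.ascontiguousarray(x, dtype=np.float64)
        y = func(converted, *args, **kwargs)
        return np.asarray(y).astype(x.dtype)
    return wrapper


@float64_roundtrip
def double_it(x):
    """Hypothetical stand-in for a wrapped routine that requires float64 input."""
    assert x.dtype == np.float64 and x.flags["C_CONTIGUOUS"]
    return 2.0 * x


out = double_it(np.array([1.0, 2.0, 3.0], dtype=np.float32))
```

Here `out` keeps the float32 type of the input even though the computation itself ran on a float64 array.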
Library routines¶
agexp(r, x, y) | Magnitude squared generalized exponential function
gexp(r, x) | Generalized exponential function
glog(r, x) | Generalized logarithmic function
mseq() | M-sequence
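For readers unfamiliar with the generalized logarithm and exponential, here is a pure-Python reference sketch. It assumes the usual definitions, (x^r - 1)/r and (1 + r·x)^(1/r), which reduce to log and exp at r = 0; the C routines wrapped by pysptk may handle edge cases differently, and raising on a non-positive base is a choice made for this sketch only.

```python
import math


def glog(r, x):
    """Generalized logarithm: (x**r - 1) / r, reducing to log(x) at r == 0."""
    if r == 0.0:
        return math.log(x)
    return (x ** r - 1.0) / r


def gexp(r, x):
    """Generalized exponential: (1 + r*x)**(1/r), reducing to exp(x) at r == 0."""
    if r == 0.0:
        return math.exp(x)
    base = 1.0 + r * x
    if base <= 0.0:
        # Assumption of this sketch: reject inputs outside the real-valued domain.
        raise ValueError("1 + r*x must be positive")
    return base ** (1.0 / r)


# gexp inverts glog for the same r
roundtrip = gexp(-0.5, glog(-0.5, 2.0))
```

Up to floating-point error, `roundtrip` recovers the original value 2.0.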
Adaptive cepstrum analysis¶
acep(x, c[, lambda_coef, step, tau, pd, eps]) | Adaptive cepstral analysis
agcep(x, c[, stage, lambda_coef, step, tau, eps]) | Adaptive generalized cepstral analysis
amcep(x, b[, alpha, lambda_coef, step, tau, …]) | Adaptive mel-cepstral analysis
Mel-generalized cepstrum analysis¶
mcep(windowed[, order, alpha, miniter, …]) | Mel-cepstrum analysis
gcep(windowed[, order, gamma, miniter, …]) | Generalized-cepstrum analysis
mgcep(windowed[, order, alpha, gamma, …]) | Mel-generalized cepstrum analysis
uels(windowed[, order, miniter, maxiter, …]) | Unbiased estimation of log spectrum
fftcep(logsp[, order, num_iter, …]) | FFT-based cepstrum analysis
lpc(windowed[, order, min_det]) | Linear prediction analysis
LPC, LSP and PARCOR conversions¶
lpc2c(lpc[, order]) | LPC to cepstrum
lpc2lsp(lpc[, numsp, maxiter, eps, loggain, …]) | LPC to LSP
lpc2par(lpc) | LPC to PARCOR
par2lpc(par) | PARCOR to LPC
lsp2sp(lsp[, fftlen]) | LSP to spectrum
Mel-generalized cepstrum conversions¶
mc2b(mc[, alpha]) | Mel-cepstrum to MLSA filter coefficients
b2mc(b[, alpha]) | MLSA filter coefficients to mel-cepstrum
c2acr(c[, order, fftlen]) | Cepstrum to autocorrelation
c2ir(c[, length]) | Cepstrum to impulse response
ic2ir(h[, order]) | Impulse response to cepstrum
c2ndps(c[, fftlen]) | Cepstrum to Negative Derivative of Phase Spectrum (NDPS)
ndps2c(ndps[, order]) | Negative Derivative of Phase Spectrum (NDPS) to cepstrum
gc2gc(src_ceps[, src_gamma, dst_order, …]) | Generalized cepstrum transform
gnorm(ceps[, gamma]) | Gain normalization
ignorm(ceps[, gamma]) | Inverse gain normalization
freqt(ceps[, order, alpha]) | Frequency transform
mgc2mgc(src_ceps[, src_alpha, src_gamma, …]) | Mel-generalized cepstrum transform
mgc2sp(ceps[, alpha, gamma, fftlen]) | Mel-generalized cepstrum to spectrum
mgclsp2sp(lsp[, alpha, gamma, fftlen, gain]) | MGC-LSP to spectrum
F0 analysis¶
swipe(x, fs, hopsize[, min, max, threshold, …]) | SWIPE’ - A Saw-tooth Waveform Inspired Pitch Estimation
rapt(x, fs, hopsize[, min, max, voice_bias, …]) | RAPT - a robust algorithm for pitch tracking
Window functions¶
blackman(n[, normalize]) | Blackman window
hamming(n[, normalize]) | Hamming window
hanning(n[, normalize]) | Hanning window
bartlett(n[, normalize]) | Bartlett window
trapezoid(n[, normalize]) | Trapezoid window
rectangular(n[, normalize]) | Rectangular window
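The table does not spell out what the normalize argument does. The sketch below shows one plausible reading for a Hamming window, assuming 0 means no normalization, 1 means normalization by power and 2 means normalization by magnitude, with 1 as the default. These meanings, the default, and the name `hamming_sketch` are assumptions of this example and should be checked against the function docstrings.

```python
import numpy as np


def hamming_sketch(n, normalize=1):
    """Hamming window of length n under the assumed normalize convention."""
    if n < 2:
        raise ValueError("n must be at least 2")
    k = np.arange(n)
    w = 0.54 - 0.46 * np.cos(2.0 * np.pi * k / (n - 1))
    if normalize == 1:
        w = w / np.sqrt(np.sum(w ** 2))  # unit power: sum(w**2) == 1
    elif normalize == 2:
        w = w / np.sum(w)                # unit magnitude: sum(w) == 1
    elif normalize != 0:
        raise ValueError("normalize must be 0, 1 or 2")
    return w


raw = hamming_sketch(8, normalize=0)
power_normalized = hamming_sketch(8)
magnitude_normalized = hamming_sketch(8, normalize=2)
```

Under these assumptions, a power-normalized window keeps the energy of a windowed frame comparable across different frame lengths.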
Waveform generation filters¶
poledf(x, a, delay) | All-pole digital filter
lmadf(x, b, pd, delay) | LMA digital filter
lspdf(x, f, delay) | LSP synthesis digital filter
ltcdf(x, k, delay) | All-pole lattice digital filter
glsadf(x, c, stage, delay) | GLSA digital filter
mlsadf(x, b, alpha, pd, delay) | MLSA digital filter
mglsadf(x, b, alpha, stage, delay) | MGLSA digital filter
Utilities for waveform generation filters¶
poledf_delay(order) | Delay for poledf
lmadf_delay(order, pd) | Delay for lmadf
lspdf_delay(order) | Delay for lspdf
ltcdf_delay(order) | Delay for ltcdf
glsadf_delay(order, stage) | Delay for glsadf
mlsadf_delay(order, pd) | Delay for mlsadf
mglsadf_delay(order, stage) | Delay for mglsadf
Other conversions¶
These functions do not exist in SPTK itself, but can be used with the core API.
Functions in the pysptk.conversion module are also directly accessible as pysptk.*.
mgc2b(mgc[, alpha, gamma]) | Mel-generalized cepstrum to MGLSA filter coefficients
sp2mc(powerspec, order, alpha) | Convert spectrum envelope to mel-cepstrum
mc2sp(mc, alpha, fftlen) | Convert mel-cepstrum back to power spectrum
mc2e(mc[, alpha, irlen]) | Compute energy from mel-cepstrum
High-level interface for waveform synthesis¶
The pysptk.synthesis module provides a high-level interface that wraps low-level SPTK waveform synthesis functions (e.g. mlsadf).
Synthesizer¶
class pysptk.synthesis.Synthesizer(filt, hopsize)¶
Speech waveform synthesizer
Attributes:
- filt : SynthesisFilter
  A speech synthesis filter
- hopsize : int
  Hop size
synthesis(source, b)¶
Synthesize a waveform given a source excitation and a sequence of filter coefficients (e.g. cepstrum).
Parameters:
- source : array
  Source excitation
- b : array
  Filter coefficients
Returns:
- y : array, shape (same as source)
  Synthesized waveform
synthesis_one_frame(source, prev_b, curr_b)¶
Synthesize one frame of a waveform
Parameters:
- source : array
  Source excitation
- prev_b : array
  Filter coefficients of the previous frame
- curr_b : array
  Filter coefficients of the current frame
Returns:
- y : array
  Synthesized waveform
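To make the relationship between synthesis and synthesis_one_frame concrete, here is a self-contained sketch of frame-wise synthesis. It is not pysptk's code: `GainOnlyFilter`, `synthesize` and `synthesize_one_frame` are hypothetical names, and the linear interpolation of coefficients between the previous and current frame is an assumption about how such a synthesizer might behave.

```python
import numpy as np


class GainOnlyFilter:
    """Hypothetical stand-in for a SynthesisFilter: scales a sample by exp(coef[0])."""

    def filt(self, x, coef):
        return x * np.exp(coef[0])


def synthesize_one_frame(filt, source, prev_b, curr_b):
    """Filter one frame sample by sample, moving linearly from prev_b toward curr_b."""
    slope = (curr_b - prev_b) / len(source)
    coef = prev_b.astype(np.float64)  # astype returns a copy, so prev_b is untouched
    y = np.zeros(len(source))
    for i, sample in enumerate(source):
        y[i] = filt.filt(sample, coef)
        coef = coef + slope
    return y


def synthesize(filt, source, b, hopsize):
    """Split source into hopsize-long frames and filter each with its coefficients."""
    y = np.zeros(len(source))
    prev_b = b[0]
    for i, curr_b in enumerate(b):
        start, end = i * hopsize, (i + 1) * hopsize
        if end > len(source):
            break
        y[start:end] = synthesize_one_frame(filt, source[start:end], prev_b, curr_b)
        prev_b = curr_b
    return y


source = np.ones(8)                # toy excitation
b = np.full((4, 1), np.log(2.0))   # 4 frames, constant log-gain
y = synthesize(GainOnlyFilter(), source, b, hopsize=2)
```

With constant coefficients the interpolation has no effect, so every output sample is simply the input scaled by the gain; the output has the same shape as source.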
SynthesisFilters¶
LMADF¶
class pysptk.synthesis.LMADF(order=25, pd=4)¶
LMA digital filter that wraps lmadf
Attributes:
- pd : int
  Order of Padé approximation. Default is 4.
- delay : array
  Delay
filt(x, coef)¶
Filter one sample using lmadf
Parameters:
- x : float
  An input sample
- coef : array
  LMA filter coefficients (i.e. cepstrum)
Returns:
- y : float
  A filtered sample
MLSADF¶
class pysptk.synthesis.MLSADF(order=25, alpha=0.35, pd=4)¶
MLSA digital filter that wraps mlsadf
Attributes:
- alpha : float
  All-pass constant
- pd : int
  Order of Padé approximation. Default is 4.
- delay : array
  Delay
filt(x, coef)¶
Filter one sample using mlsadf
Parameters:
- x : float
  An input sample
- coef : array
  MLSA filter coefficients
Returns:
- y : float
  A filtered sample
MGLSADF¶
class pysptk.synthesis.MGLSADF(order=25, alpha=0.35, stage=1)¶
MGLSA digital filter that wraps mglsadf
Attributes:
- alpha : float
  All-pass constant
- stage : int
  -1/gamma
- delay : array
  Delay
filt(x, coef)¶
Filter one sample using mglsadf
Parameters:
- x : float
  An input sample
- coef : array
  MGLSA filter coefficients
Returns:
- y : float
  A filtered sample
AllPoleDF¶
AllPoleLatticeDF¶
class pysptk.synthesis.AllPoleLatticeDF(order=25)¶
All-pole lattice digital filter that wraps ltcdf
Attributes:
- delay : array
  Delay
filt(x, coef)¶
Filter one sample using ltcdf
Parameters:
- x : float
  An input sample
- coef : array
  PARCOR coefficients (with loggain)
Returns:
- y : float
  A filtered sample
Utilities¶
Audio files¶
example_audio_file() | Get the path to an included audio example file.
Developer Documentation¶
Design principle¶
pysptk is a thin Python wrapper of SPTK. It is designed to keep its API as consistent as possible with the original SPTK while offering a better interface. A few design principles guide how the C interface is wrapped:
Avoid really short names for variables (e.g. a, b, c, aa, bb, dd)
Variable names should be informative. If the C functions have such short names, use self-descriptive names instead for python interfaces, unless they have clear meanings in their context.
Avoid too many function arguments
Less is better. If the C functions have too many function arguments, use keyword arguments with proper default values for optional ones in python.
Handle errors in python
Since C functions might (unfortunately) call exit internally on unexpected inputs, the Python layer should check whether the inputs are supported before calling them.
Cython is used throughout to wrap the C interface.
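The second and third principles can be illustrated with a small sketch. Everything here is hypothetical: `_c_like_analysis` stands in for a wrapped C routine, and `analysis`, its arguments and its default values are invented for illustration. The point is only the pattern of keyword arguments with defaults plus validation in Python before the low-level call.

```python
def _c_like_analysis(samples, order):
    """Hypothetical stand-in for a wrapped C routine that would abort on bad input."""
    return [sum(samples) / len(samples)] + [0.0] * order


def analysis(samples, order=2, scale=1.0):
    """Python-facing wrapper: validate inputs first, then call the low-level routine."""
    if order < 0:
        raise ValueError("order must be non-negative")
    if len(samples) <= order:
        raise ValueError("input must be longer than order")
    return [scale * v for v in _c_like_analysis(samples, order)]


result = analysis([1.0, 2.0, 3.0, 4.0], order=1)

# Unsupported input raises a Python exception instead of terminating the process.
try:
    analysis([1.0], order=3)
except ValueError as err:
    message = str(err)
```

Raising ValueError keeps the interpreter alive and gives the caller a chance to recover, which a C-level exit would not.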
How to build pysptk¶
You have to install numpy and cython first, and then:
git clone https://github.com/r9y9/pysptk
cd pysptk
git submodule update --init
python setup.py develop
should work.
Note
The dependency on SPTK is added as a submodule. You have to check out the supported SPTK version with git submodule update --init before running setup.py.
How to build docs¶
pysptk docs are managed with Sphinx. Docs-related dependencies can be installed with:
pip install .[docs]
at the top of the pysptk directory.
To build docs, go to the docs directory and then:
make html
You will see the generated docs in the _build directory as follows (the layout might differ depending on the Sphinx version):
% tree _build/ -d
_build/
├── doctrees
│   └── generated
├── html
│   ├── _images
│   ├── _modules
│   │   └── pysptk
│   ├── _sources
│   │   └── generated
│   ├── _static
│   │   ├── css
│   │   ├── fonts
│   │   └── js
│   └── generated
└── plot_directive
    └── generated
See _build/html/index.html for the top page of the generated docs.
How to add a new function¶
Many SPTK functions are not yet exposed in pysptk. To add a new function to pysptk, the typical steps are:
- Add the function signature to _sptk.pxd
- Add the cython implementation to _sptk.pyx
- Add the Python interface (with docstrings) to sptk.py (or another suitable module)
As you can see in setup.py, _sptk.pyx and the SPTK sources are compiled into a single extension module.
Note
You might wonder why the cython implementation and the Python interface are kept separate, given that a cython module can be accessed directly from Python. The reasons are 1) to avoid rebuilding the cython module when docstrings change in the source, and 2) to make the docs look good, since Sphinx currently seems unable to collect function arguments correctly from a cython module. Relevant issue: pysptk/#33
An example¶
In _sptk.pxd:
cdef extern from "SPTK.h":
    double _agexp "agexp"(double r, double x, double y)
In _sptk.pyx:
def agexp(r, x, y):
    return _agexp(r, x, y)
In sptk.py:
def agexp(r, x, y):
    """Magnitude squared generalized exponential function

    Parameters
    ----------
    r : float
        Gamma
    x : float
        Real part
    y : float
        Imaginary part

    Returns
    -------
    Value
    """
    return _sptk.agexp(r, x, y)