pyXpcm: Ocean Profile Classification Model¶
pyXpcm is a Python package for creating and working with ocean Profile Classification Models that consume and produce Xarray objects. Xarray objects are N-D labeled arrays and datasets in Python.
An ocean Profile Classification Model automatically assembles ocean profiles into clusters according to the similarity of their vertical structure. The geospatial properties of these clusters can be used to address a large variety of oceanographic problems: front detection, water mass identification, natural region contouring (gyres, eddies), reference profile selection for QC validation, etc. The vertical structure of these clusters furthermore provides a highly synthetic representation of large ocean areas that can be used for dimensionality reduction and for coherent intercomparisons of ocean data (re)analyses or simulations.
Documentation¶
Getting Started
Overview¶
What is an ocean PCM?¶
An ocean PCM is a Profile Classification Model for ocean data, a statistical procedure to classify ocean vertical profiles into a finite set of “clusters”. Depending on the dataset, such clusters can show space/time coherence that can be used in many different ways to study the ocean.
Statistic method¶
It consists of conducting unsupervised classification (clustering) of vertical profiles of one or more ocean variables.
Each level of the vertical axis of each ocean variable is considered a feature. One ocean vertical profile, with all its variables, is considered a sample.
All the details of the Profile Classification Modelling (PCM) statistical methodology can be found in Maze et al, 2017.
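In scikit-learn terms, a collection of profiles of one variable is thus handled as a 2-D array of samples by features. A minimal sketch with synthetic data (shapes are illustrative):

import numpy as np
n_profiles, n_levels = 100, 50             # 100 profiles on 50 vertical levels
X = np.random.randn(n_profiles, n_levels)  # rows are samples (profiles), columns are features (levels)
print(X.shape)                             # (100, 50)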
Illustration¶
Given a collection of Argo temperature profiles in the North Atlantic, a PCM analysis is applied and produces an optimal set of 8 ocean temperature profile classes. The PCM clusters synthesize the structural information of heat distribution in the North Atlantic. Each cluster objectively defines an ocean region where the dynamics give rise to a unique vertical stratification pattern.

Maze et al, 2017 applied it to the North Atlantic with Argo temperature data. Jones et al, 2019 later applied it to the Southern Ocean, also with Argo temperature data. Rosso et al (in prep) applied it to the Southern Indian Ocean using both temperature and salinity Argo data.
pyXpcm¶
pyXpcm is a Python implementation of the PCM method that consumes and produces Xarray objects (xarray.Dataset and xarray.DataArray), hence the x.
With pyXpcm you can conduct a PCM analysis of a collection of profiles (gridded or not), of one or more ocean variables, stored in an xarray.Dataset.
pyXpcm also provides basic statistics and plotting functions to get you started with your analysis.
The philosophy of the pyXpcm toolbox is to create a PCM from one ocean dataset and to be able to use it on others. To achieve this, a PCM is created with information about the ocean variables to classify and the vertical axis of these variables. The PCM can then be fitted on, and subsequently used to classify, ocean profiles from any dataset that contains the PCM variables.
The pyXpcm procedure is to preprocess data (stack, scale, reduce and combine) and then to fit a classifier on them. Once the model is fitted, pyXpcm can classify data. The library borrows much of its vocabulary and logic from Scikit-learn but does not inherit from a sklearn.BaseEstimator.
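In practice the workflow boils down to a few calls. Here is a minimal sketch (variable names are illustrative; see the Standard procedure section for a complete, working example):

import numpy as np
from pyxpcm.models import pcm

z = np.arange(0., -1000., -10.)            # vertical axis of the model features
m = pcm(K=8, features={'temperature': z})  # create a model with 8 classes
# Given a dataset ds with a 'TEMP' variable on a 'DEPTH' dimension:
# m.fit(ds, features={'temperature': 'TEMP'}, dim='DEPTH')       # fit the classifier
# m.predict(ds, features={'temperature': 'TEMP'}, inplace=True)  # add 'PCM_LABELS' to ds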
Installation¶
Required dependencies¶
- Python 3.6
- Xarray 0.12
- Dask 0.16
- Scikit-learn 0.19
Note that Scikit-learn is the default statistic backend, but if Dask-ml is installed you can use it as well (see the API reference).
For full plotting functionality (see the Plotting API) the following packages are required:
- Matplotlib 3.0 (mandatory)
- Cartopy 0.17 (for some methods only)
- Seaborn 0.9.0 (for some methods only)
Instructions¶
For the development version:
pip install git+https://github.com/gmaze/pyxpcm.git@nfeatures
For the latest public release:
pip install pyxpcm
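You can then check that the package imports correctly (a quick check, assuming pyXpcm exposes the conventional __version__ attribute):

import pyxpcm
print(pyxpcm.__version__)  # assumes a standard __version__ attribute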
Pre-trained PCM¶
We’ll try to list here pre-trained PCM models that are ready for you to use to classify your data.
Features | K [1] | Relevant domain | Training data | Access link | Reference |
---|---|---|---|---|---|
Temperature (0-1400m,5m) | 8 | North Atlantic | Argo (2000-2014) | Archimer | Maze et al (2017) |
Temperature (15-980db,5db) | 8 | Southern Ocean | Argo (2001-2017) | [2] | Jones et al (2019) |
[1] Number of classes
[2] Coming up soon!
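Once downloaded, such a pre-trained model could be loaded and used directly. A sketch ('my_pretrained_pcm.nc' is a hypothetical local copy; see Save and load PCM from local files):

import pyxpcm
m = pyxpcm.load_netcdf('my_pretrained_pcm.nc')  # hypothetical local copy of a pre-trained model
# m.predict(ds, features=..., inplace=True)     # then classify your own profiles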
User guide
Standard procedure¶
Here is a standard procedure based on pyXpcm. This will show you how to create a model, how to fit/train it, how to classify data and how to visualise results.
Create a model¶
Let’s import the Profile Classification Model (PCM) constructor:
[2]:
from pyxpcm.models import pcm
A PCM can be created independently of any dataset using the class constructor.
To be created, a PCM requires a number of classes (or clusters) and a dictionary defining the list of features and their vertical axis:
[3]:
import numpy as np

z = np.arange(0., -1000, -10.)
pcm_features = {'temperature': z, 'salinity': z}
We can now instantiate a PCM, say with 8 classes:
[4]:
m = pcm(K=8, features=pcm_features)
m
[4]:
<pcm 'gmm' (K: 8, F: 2)>
Number of class: 8
Number of feature: 2
Feature names: odict_keys(['temperature', 'salinity'])
Fitted: False
Feature: 'temperature'
Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
Scaler: 'normal', <class 'sklearn.preprocessing.data.StandardScaler'>
Reducer: True, <class 'sklearn.decomposition.pca.PCA'>
Feature: 'salinity'
Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
Scaler: 'normal', <class 'sklearn.preprocessing.data.StandardScaler'>
Reducer: True, <class 'sklearn.decomposition.pca.PCA'>
Classifier: 'gmm', <class 'sklearn.mixture.gaussian_mixture.GaussianMixture'>
Here we created a PCM with 8 classes (K=8) and 2 features (F=2) that are temperature and salinity profiles defined between the surface and 1000m depth.
We furthermore note the list of transform methods that will be used to preprocess each of the features (see the preprocessing documentation page for more details).
Note that the number of classes and features are PCM properties, accessible as pyxpcm.pcm.K and pyxpcm.pcm.F.
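For instance, with the model created above:

print(m.K, m.F)  # 8 2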
Load training data¶
pyXpcm is able to work with both gridded datasets (e.g. model outputs with longitude/latitude/time dimensions) and collections of profiles (e.g. Argo, XBT, CTD section profiles).
In this example, let's import a sample of North Atlantic Argo data that comes with pyxpcm:
[5]:
import pyxpcm
ds = pyxpcm.tutorial.open_dataset('argo').load()
print(ds)
<xarray.Dataset>
Dimensions: (DEPTH: 282, N_PROF: 7560)
Coordinates:
* DEPTH (DEPTH) float32 0.0 -5.0 -10.0 -15.0 ... -1395.0 -1400.0 -1405.0
Dimensions without coordinates: N_PROF
Data variables:
LATITUDE (N_PROF) float32 ...
LONGITUDE (N_PROF) float32 ...
TIME (N_PROF) datetime64[ns] ...
DBINDEX (N_PROF) float64 ...
TEMP (N_PROF, DEPTH) float32 ...
PSAL (N_PROF, DEPTH) float32 ...
SIG0 (N_PROF, DEPTH) float32 ...
BRV2 (N_PROF, DEPTH) float32 ...
Attributes:
Sample test prepared by: G. Maze
Institution: Ifremer/LOPS
Data source DOI: 10.17882/42182
Fit the model on data¶
Fitting can be done on any dataset coherent with the PCM definition, in the sense that it must contain the feature variables of the PCM.
To tell the PCM how to identify features in any xarray.Dataset, we need to provide a dictionary mapping PCM feature names to dataset variable names:
[6]:
features_in_ds = {'temperature': 'TEMP', 'salinity': 'PSAL'}
which means that the PCM feature temperature is to be found in the dataset variable TEMP.
We also need to specify the vertical dimension of the dataset variables:
[7]:
features_zdim='DEPTH'
Now we're ready to fit the model on this dataset:
[8]:
m.fit(ds, features=features_in_ds, dim=features_zdim)
m
[8]:
<pcm 'gmm' (K: 8, F: 2)>
Number of class: 8
Number of feature: 2
Feature names: odict_keys(['temperature', 'salinity'])
Fitted: True
Feature: 'temperature'
Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
Scaler: 'normal', <class 'sklearn.preprocessing.data.StandardScaler'>
Reducer: True, <class 'sklearn.decomposition.pca.PCA'>
Feature: 'salinity'
Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
Scaler: 'normal', <class 'sklearn.preprocessing.data.StandardScaler'>
Reducer: True, <class 'sklearn.decomposition.pca.PCA'>
Classifier: 'gmm', <class 'sklearn.mixture.gaussian_mixture.GaussianMixture'>
log likelihood of the training set: 37.953397
Note
pyXpcm can also identify PCM features and axes within an xarray.Dataset using variable attributes. From the example above we can set:
ds['TEMP'].attrs['feature_name'] = 'temperature'
ds['PSAL'].attrs['feature_name'] = 'salinity'
ds['DEPTH'].attrs['axis'] = 'Z'
And then simply call the fit method without arguments:
m.fit(ds)
Note that if the data follow the CF conventions, the vertical dimension's axis attribute should already be set to Z.
Classify data¶
Now that the PCM is fitted, we can predict classification labels:
[9]:
m.predict(ds, features=features_in_ds, inplace=True)
ds
[9]:
<xarray.Dataset>
Dimensions: (DEPTH: 282, N_PROF: 7560)
Coordinates:
* N_PROF (N_PROF) int64 0 1 2 3 4 5 6 ... 7554 7555 7556 7557 7558 7559
* DEPTH (DEPTH) float32 0.0 -5.0 -10.0 -15.0 ... -1395.0 -1400.0 -1405.0
Data variables:
LATITUDE (N_PROF) float32 ...
LONGITUDE (N_PROF) float32 ...
TIME (N_PROF) datetime64[ns] ...
DBINDEX (N_PROF) float64 ...
TEMP (N_PROF, DEPTH) float32 27.422163 27.422163 ... 4.391791
PSAL (N_PROF, DEPTH) float32 36.35267 36.35267 ... 34.910286
SIG0 (N_PROF, DEPTH) float32 ...
BRV2 (N_PROF, DEPTH) float32 ...
PCM_LABELS (N_PROF) int64 4 4 4 4 4 4 4 4 4 4 4 4 ... 3 3 3 3 3 3 3 3 3 3 3
Attributes:
Sample test prepared by: G. Maze
Institution: Ifremer/LOPS
Data source DOI: 10.17882/42182
Prediction labels are automatically added to the dataset as PCM_LABELS because the option inplace was set to True. We didn't specify the dim option because our dataset is CF compliant.
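If you prefer not to modify the dataset, leave inplace to its default value (False) and collect the labels as an array instead (a sketch based on the call above):

labels = m.predict(ds, features=features_in_ds)  # inplace=False by default
# labels is an xarray.DataArray of integer class labels along N_PROF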
pyXpcm uses a GMM classifier by default, which is a fuzzy classifier, so we can also predict the probability of each class for all profiles, the so-called posteriors:
[10]:
m.predict_proba(ds, features=features_in_ds, inplace=True)
ds
[10]:
<xarray.Dataset>
Dimensions: (DEPTH: 282, N_PROF: 7560, pcm_class: 8)
Coordinates:
* N_PROF (N_PROF) int64 0 1 2 3 4 5 6 ... 7554 7555 7556 7557 7558 7559
* DEPTH (DEPTH) float32 0.0 -5.0 -10.0 -15.0 ... -1395.0 -1400.0 -1405.0
Dimensions without coordinates: pcm_class
Data variables:
LATITUDE (N_PROF) float32 ...
LONGITUDE (N_PROF) float32 ...
TIME (N_PROF) datetime64[ns] ...
DBINDEX (N_PROF) float64 ...
TEMP (N_PROF, DEPTH) float32 27.422163 27.422163 ... 4.391791
PSAL (N_PROF, DEPTH) float32 36.35267 36.35267 ... 34.910286
SIG0 (N_PROF, DEPTH) float32 ...
BRV2 (N_PROF, DEPTH) float32 ...
PCM_LABELS (N_PROF) int64 4 4 4 4 4 4 4 4 4 4 4 4 ... 3 3 3 3 3 3 3 3 3 3 3
PCM_POST (pcm_class, N_PROF) float64 3.822e-09 3.296e-05 ... 2.269e-70
Attributes:
Sample test prepared by: G. Maze
Institution: Ifremer/LOPS
Data source DOI: 10.17882/42182
which are added to the dataset as the PCM_POST variable. The class probabilities for each profile come with a new dimension, pcm_class, which by default runs from 0 to K-1.
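Since posteriors are regular dataset variables, standard xarray selection applies. For instance, to extract the posterior probabilities of a single class (mirroring the plotting code later in this guide):

post_k0 = ds['PCM_POST'].sel(pcm_class=0)  # probability of class 0 for each profile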
Note
You can delete variables added by pyXpcm to the xarray.Dataset with the pyxpcm.xarray.pyXpcmDataSetAccessor.drop_all() method:
ds.pyxpcm.drop_all()
Or you can split pyXpcm variables out of the original xarray.Dataset:
ds_pcm, ds = ds.pyxpcm.split()
It is important to note that once the PCM is fitted, you can predict labels for any dataset, as long as it has the PCM features.
For instance, let’s predict labels for a gridded dataset:
[12]:
ds_gridded = pyxpcm.tutorial.open_dataset('isas_snapshot').load()
ds_gridded
[12]:
<xarray.Dataset>
Dimensions: (depth: 152, latitude: 53, longitude: 61)
Coordinates:
* latitude (latitude) float32 30.023445 30.455408 ... 49.41288 49.737103
* longitude (longitude) float32 -70.0 -69.5 -69.0 ... -41.0 -40.5 -40.0
* depth (depth) float32 -1.0 -3.0 -5.0 ... -1960.0 -1980.0 -2000.0
Data variables:
TEMP (depth, latitude, longitude) float32 dask.array<shape=(152, 53, 61), chunksize=(152, 53, 61)>
TEMP_ERR (depth, latitude, longitude) float32 dask.array<shape=(152, 53, 61), chunksize=(152, 53, 61)>
TEMP_PCTVAR (depth, latitude, longitude) float32 dask.array<shape=(152, 53, 61), chunksize=(152, 53, 61)>
PSAL (depth, latitude, longitude) float32 dask.array<shape=(152, 53, 61), chunksize=(152, 53, 61)>
PSAL_ERR (depth, latitude, longitude) float32 dask.array<shape=(152, 53, 61), chunksize=(152, 53, 61)>
PSAL_PCTVAR (depth, latitude, longitude) float32 dask.array<shape=(152, 53, 61), chunksize=(152, 53, 61)>
SST (latitude, longitude) float32 dask.array<shape=(53, 61), chunksize=(53, 61)>
[13]:
m.predict(ds_gridded, features={'temperature':'TEMP','salinity':'PSAL'}, dim='depth', inplace=True)
ds_gridded
[13]:
<xarray.Dataset>
Dimensions: (depth: 152, latitude: 53, longitude: 61)
Coordinates:
* latitude (latitude) float64 30.02 30.46 30.89 ... 49.09 49.41 49.74
* longitude (longitude) float64 -70.0 -69.5 -69.0 ... -41.0 -40.5 -40.0
* depth (depth) float32 -1.0 -3.0 -5.0 ... -1960.0 -1980.0 -2000.0
Data variables:
TEMP (depth, latitude, longitude) float32 dask.array<shape=(152, 53, 61), chunksize=(152, 53, 61)>
TEMP_ERR (depth, latitude, longitude) float32 dask.array<shape=(152, 53, 61), chunksize=(152, 53, 61)>
TEMP_PCTVAR (depth, latitude, longitude) float32 dask.array<shape=(152, 53, 61), chunksize=(152, 53, 61)>
PSAL (depth, latitude, longitude) float32 dask.array<shape=(152, 53, 61), chunksize=(152, 53, 61)>
PSAL_ERR (depth, latitude, longitude) float32 dask.array<shape=(152, 53, 61), chunksize=(152, 53, 61)>
PSAL_PCTVAR (depth, latitude, longitude) float32 dask.array<shape=(152, 53, 61), chunksize=(152, 53, 61)>
SST (latitude, longitude) float32 dask.array<shape=(53, 61), chunksize=(53, 61)>
PCM_LABELS (latitude, longitude) float64 4.0 4.0 4.0 4.0 ... 1.0 1.0 1.0
where you can see the addition of the PCM_LABELS variable.
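Since labels on a gridded dataset form a regular 2-D variable, a quick-look map is possible with xarray's built-in plotting (a sketch; requires Matplotlib):

ds_gridded['PCM_LABELS'].plot()  # quick-look map of class labels on the latitude/longitude grid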
Vertical structure of classes¶
One key outcome of the PCM analysis is the vertical structure of each class.
This can be computed using the pyxpcm.stat.quantile() method.
Below we compute the 5, 50 and 95% quantiles of temperature and salinity for each class:
[14]:
for vname in ['TEMP', 'PSAL']:
    ds = ds.pyxpcm.quantile(m, q=[0.05, 0.5, 0.95], of=vname, outname=vname + '_Q', keep_attrs=True, inplace=True)
ds
[14]:
<xarray.Dataset>
Dimensions: (DEPTH: 282, N_PROF: 7560, pcm_class: 8, quantile: 3)
Coordinates:
* pcm_class (pcm_class) int64 0 1 2 3 4 5 6 7
* N_PROF (N_PROF) int64 0 1 2 3 4 5 6 ... 7554 7555 7556 7557 7558 7559
* DEPTH (DEPTH) float32 0.0 -5.0 -10.0 -15.0 ... -1395.0 -1400.0 -1405.0
* quantile (quantile) float64 0.05 0.5 0.95
Data variables:
LATITUDE (N_PROF) float32 27.122 27.818 27.452 26.976 ... 4.243 4.15 4.44
LONGITUDE (N_PROF) float32 -74.86 -75.6 -74.949 ... -1.263 -0.821 -0.002
TIME (N_PROF) datetime64[ns] 2008-06-23T13:07:30 ... 2013-03-09T14:52:58.124999936
DBINDEX (N_PROF) float64 1.484e+04 1.622e+04 ... 8.557e+03 1.063e+04
TEMP (N_PROF, DEPTH) float32 27.422163 27.422163 ... 4.391791
PSAL (N_PROF, DEPTH) float32 36.35267 36.35267 ... 34.910286
SIG0 (N_PROF, DEPTH) float32 23.601229 23.601229 ... 27.685583
BRV2 (N_PROF, DEPTH) float32 0.00029447526 ... 4.500769e-06
PCM_LABELS (N_PROF) int64 4 4 4 4 4 4 4 4 4 4 4 4 ... 3 3 3 3 3 3 3 3 3 3 3
PCM_POST (pcm_class, N_PROF) float64 3.822e-09 3.296e-05 ... 2.269e-70
TEMP_Q (pcm_class, quantile, DEPTH) float64 12.73 12.73 ... 5.287 5.273
PSAL_Q (pcm_class, quantile, DEPTH) float64 35.12 35.12 ... 35.09 35.09
Attributes:
Sample test prepared by: G. Maze
Institution: Ifremer/LOPS
Data source DOI: 10.17882/42182
Quantiles can be plotted using the pyxpcm.plot.quantile() method.
[15]:
fig, ax = m.plot.quantile(ds['TEMP_Q'], maxcols=4, figsize=(10, 8), sharey=True)

Geographic distribution of classes¶
Warning
To follow this section you’ll need to have Cartopy installed and working.
A map of labels can now easily be plotted:
[16]:
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature

proj = ccrs.PlateCarree()
subplot_kw={'projection': proj, 'extent': np.array([-80,1,-1,66]) + np.array([-0.1,+0.1,-0.1,+0.1])}
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(5,5), dpi=120, facecolor='w', edgecolor='k', subplot_kw=subplot_kw)
kmap = m.plot.cmap()
sc = ax.scatter(ds['LONGITUDE'], ds['LATITUDE'], s=3, c=ds['PCM_LABELS'], cmap=kmap, transform=proj, vmin=0, vmax=m.K)
cl = m.plot.colorbar(ax=ax)
gl = m.plot.latlongrid(ax, dx=10)
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.COASTLINE)
ax.set_title('LABELS of the training set')
plt.show()

Since we predicted labels for two datasets, we can superimpose them:
[17]:
proj = ccrs.PlateCarree()
subplot_kw={'projection': proj, 'extent': np.array([-75,-35,25,55]) + np.array([-0.1,+0.1,-0.1,+0.1])}
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(5,5), dpi=120, facecolor='w', edgecolor='k', subplot_kw=subplot_kw)
kmap = m.plot.cmap()
sc = ax.pcolor(ds_gridded['longitude'], ds_gridded['latitude'], ds_gridded['PCM_LABELS'], cmap=kmap, transform=proj, vmin=0, vmax=m.K)
sc = ax.scatter(ds['LONGITUDE'], ds['LATITUDE'], s=10, c=ds['PCM_LABELS'], cmap=kmap, transform=proj, vmin=0, vmax=m.K, edgecolors=[0.3]*3, linewidths=0.3)
cl = m.plot.colorbar(ax=ax)
gl = m.plot.latlongrid(ax, dx=10)
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.COASTLINE)
ax.set_title('LABELS of the training set (dots) and another product (shade)')
plt.show()

Posteriors are defined for each data point and give the probability that the point belongs to each of the classes. They can be plotted this way:
[18]:
import seaborn as sns

cmap = sns.light_palette("blue", as_cmap=True)
proj = ccrs.PlateCarree()
subplot_kw={'projection': proj, 'extent': np.array([-80,1,-1,66]) + np.array([-0.1,+0.1,-0.1,+0.1])}
fig, ax = m.plot.subplots(figsize=(10,22), maxcols=2, subplot_kw=subplot_kw)
for k in m:
    sc = ax[k].scatter(ds['LONGITUDE'], ds['LATITUDE'], s=3, c=ds['PCM_POST'].sel(pcm_class=k),
                       cmap=cmap, transform=proj, vmin=0, vmax=1)
    cl = plt.colorbar(sc, ax=ax[k], fraction=0.03)
    gl = m.plot.latlongrid(ax[k], fontsize=8, dx=20, dy=10)
    ax[k].add_feature(cfeature.LAND)
    ax[k].add_feature(cfeature.COASTLINE)
    ax[k].set_title('PCM Posteriors k=%i' % k)

PCM properties¶
The PCM class does a lot of data preprocessing under the hood in order to classify profiles.
Here is how to access PCM preprocessed results and data.
Import and set-up
Import the library and toy data
[2]:
import pyxpcm
from pyxpcm.models import pcm
Let's work with a standard PCM of temperature and salinity, from the surface down to 1000m depth:
[3]:
import numpy as np

# Define a vertical axis to work with
z = np.arange(0., -1000, -10.)
# Define features to use
features_pcm = {'temperature': z, 'salinity': z}
# Instantiate the PCM
m = pcm(K=4, features=features_pcm, maxvar=2)
print(m)
<pcm 'gmm' (K: 4, F: 2)>
Number of class: 4
Number of feature: 2
Feature names: odict_keys(['temperature', 'salinity'])
Fitted: False
Feature: 'temperature'
Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
Scaler: 'normal', <class 'sklearn.preprocessing.data.StandardScaler'>
Reducer: True, <class 'sklearn.decomposition.pca.PCA'>
Feature: 'salinity'
Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
Scaler: 'normal', <class 'sklearn.preprocessing.data.StandardScaler'>
Reducer: True, <class 'sklearn.decomposition.pca.PCA'>
Classifier: 'gmm', <class 'sklearn.mixture.gaussian_mixture.GaussianMixture'>
Note that here we used a strong dimensionality reduction to limit the dimensions and size of the plots to come (maxvar=2 tells the PCM to retain only the first 2 principal components of each variable).
Now we can load a dataset to be used for fitting.
[4]:
ds = pyxpcm.tutorial.open_dataset('argo').load()
Fit the model and predict labels on data:
[5]:
features_in_ds = {'temperature': 'TEMP', 'salinity': 'PSAL'}
ds = ds.pyxpcm.fit_predict(m, features=features_in_ds, inplace=True)
print(ds)
<xarray.Dataset>
Dimensions: (DEPTH: 282, N_PROF: 7560)
Coordinates:
* N_PROF (N_PROF) int64 0 1 2 3 4 5 6 ... 7554 7555 7556 7557 7558 7559
* DEPTH (DEPTH) float32 0.0 -5.0 -10.0 -15.0 ... -1395.0 -1400.0 -1405.0
Data variables:
LATITUDE (N_PROF) float32 ...
LONGITUDE (N_PROF) float32 ...
TIME (N_PROF) datetime64[ns] ...
DBINDEX (N_PROF) float64 ...
TEMP (N_PROF, DEPTH) float32 27.422163 27.422163 ... 4.391791
PSAL (N_PROF, DEPTH) float32 36.35267 36.35267 ... 34.910286
SIG0 (N_PROF, DEPTH) float32 ...
BRV2 (N_PROF, DEPTH) float32 ...
PCM_LABELS (N_PROF) int64 1 1 1 1 1 1 1 1 1 1 1 1 ... 3 3 3 3 3 3 3 3 3 3 3
Attributes:
Sample test prepared by: G. Maze
Institution: Ifremer/LOPS
Data source DOI: 10.17882/42182
Scaler properties¶
[6]:
fig, ax = m.plot.scaler()
# More options:
# m.plot.scaler(style='darkgrid')
# m.plot.scaler(style='darkgrid', subplot_kw={'ylim':[-1000,0]})

Reducer properties¶
Plot the eigenvectors of a PCA reducer (nothing is plotted if no reducer is used).
[7]:
fig, ax = m.plot.reducer()
# Equivalent to:
# pcmplot.reducer(m)
# More options:
# m.plot.reducer(pcalist = range(0,4));
# m.plot.reducer(pcalist = [0], maxcols=1);
# m.plot.reducer(pcalist = range(0,4), style='darkgrid', plot_kw={'linewidth':1.5}, subplot_kw={'ylim':[-1400,0]}, figsize=(12,10));

Scatter plot of features, as seen by the classifier¶
You can access the pre-processed data for your own plots and analyses through the pyxpcm.pcm.preprocessing() method:
[8]:
X, sampling_dims = m.preprocessing(ds, features=features_in_ds)
X
[8]:
<xarray.DataArray (n_samples: 7560, n_features: 4)>
array([[ 1.928166, -0.091499, 1.7341 , -0.270248],
[ 2.314077, 0.106842, 2.083683, -0.18765 ],
[ 1.675512, -0.17313 , 1.563701, -0.432449],
...,
[-0.802601, -0.578377, -1.576134, -0.311841],
[-0.955218, -0.609439, -1.804922, -0.427322],
[-0.892514, -0.623732, -1.792266, -0.465512]], dtype=float32)
Coordinates:
* n_samples (n_samples) int64 0 1 2 3 4 5 ... 7554 7555 7556 7557 7558 7559
* n_features (n_features) <U13 'temperature_0' ... 'salinity_1'
pyXpcm returns a 2-dimensional xarray.DataArray, whose pairwise relationships can easily be visualised with the pyxpcm.plot.preprocessed() method (this requires Seaborn):
[9]:
g = m.plot.preprocessed(ds, features=features_in_ds, style='darkgrid')
# A posteriori adjustements:
# g.set(xlim=(-3,3),ylim=(-3,3))
# g.savefig('toto.png')

[10]:
# Combine KDE with histograms (very slow plot):
g = m.plot.preprocessed(ds, features=features_in_ds, kde=True)
/Users/gmaze/anaconda/envs/obidam36/lib/python3.6/site-packages/scipy/stats/stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval

Save and load PCM from local files¶
PCM instances are lightweight Python objects and can easily be saved to and loaded from files. pyXpcm uses the netCDF file format because it makes it easy to attach metadata to numerical arrays.
Import and set-up
Import the library and toy data
[2]:
import pyxpcm
from pyxpcm.models import pcm
# Load tutorial data:
ds = pyxpcm.tutorial.open_dataset('argo').load()
Saving a model¶
Let's first create a PCM and fit it on the tutorial dataset:
[3]:
import numpy as np

# Define a vertical axis to work with
z = np.arange(0., -1000, -10.)
# Define features to use
features_pcm = {'temperature': z, 'salinity': z}
# Instantiate the PCM:
m = pcm(K=4, features=features_pcm)
# Fit:
m.fit(ds, features={'temperature': 'TEMP', 'salinity': 'PSAL'})
[3]:
<pcm 'gmm' (K: 4, F: 2)>
Number of class: 4
Number of feature: 2
Feature names: odict_keys(['temperature', 'salinity'])
Fitted: True
Feature: 'temperature'
Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
Scaler: 'normal', <class 'sklearn.preprocessing.data.StandardScaler'>
Reducer: True, <class 'sklearn.decomposition.pca.PCA'>
Feature: 'salinity'
Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
Scaler: 'normal', <class 'sklearn.preprocessing.data.StandardScaler'>
Reducer: True, <class 'sklearn.decomposition.pca.PCA'>
Classifier: 'gmm', <class 'sklearn.mixture.gaussian_mixture.GaussianMixture'>
log likelihood of the training set: 33.234426
We can now save the fitted model to a local file:
[4]:
m.to_netcdf('my_pcm.nc')
Loading a model¶
To load a PCM from a file, use:
[5]:
m_loaded = pyxpcm.load_netcdf('my_pcm.nc')
m_loaded
[5]:
<pcm 'gmm' (K: 4, F: 2)>
Number of class: 4
Number of feature: 2
Feature names: odict_keys(['temperature', 'salinity'])
Fitted: True
Feature: 'temperature'
Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
Scaler: 'normal', <class 'sklearn.preprocessing.data.StandardScaler'>
Reducer: True, <class 'sklearn.decomposition.pca.PCA'>
Feature: 'salinity'
Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
Scaler: 'normal', <class 'sklearn.preprocessing.data.StandardScaler'>
Reducer: True, <class 'sklearn.decomposition.pca.PCA'>
Classifier: 'gmm', <class 'sklearn.mixture.gaussian_mixture.GaussianMixture'>
log likelihood of the training set: 33.234426
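The loaded instance is fitted and ready to use, e.g. to classify the tutorial dataset (a sketch based on the predict method seen above):

m_loaded.predict(ds, features={'temperature': 'TEMP', 'salinity': 'PSAL'}, inplace=True)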
Debugging and performance¶
Import and set-up
Import the library and toy data
[2]:
import numpy as np
import pyxpcm
from pyxpcm.models import pcm
# Load a dataset to work with:
ds = pyxpcm.tutorial.open_dataset('argo').load()
# Define vertical axis and features to use:
z = np.arange(0.,-1000.,-10.)
features_pcm = {'temperature': z, 'salinity': z}
features_in_ds = {'temperature': 'TEMP', 'salinity': 'PSAL'}
Debugging¶
Use the debug option to print log messages:
[3]:
# Instantiate a new PCM:
m = pcm(K=8, features=features_pcm, debug=True)
# Fit with log:
m.fit(ds, features=features_in_ds);
> Start preprocessing for action 'fit'
> Preprocessing xarray dataset 'TEMP' as PCM feature 'temperature'
[<class 'xarray.core.dataarray.DataArray'>, <class 'dask.array.core.Array'>, ((7560,), (282,))] X RAVELED with success
Output axis is in the input axis, not need to interpolate, simple intersection
[<class 'xarray.core.dataarray.DataArray'>, <class 'dask.array.core.Array'>, ((7560,), (100,))] X INTERPOLATED with success)
[<class 'xarray.core.dataarray.DataArray'>, <class 'numpy.ndarray'>, None] X SCALED with success)
[<class 'xarray.core.dataarray.DataArray'>, <class 'numpy.ndarray'>, None] X REDUCED with success)
temperature pre-processed with success, [<class 'xarray.core.dataarray.DataArray'>, <class 'numpy.ndarray'>, None]
Homogenisation for fit of temperature
> Preprocessing xarray dataset 'PSAL' as PCM feature 'salinity'
[<class 'xarray.core.dataarray.DataArray'>, <class 'dask.array.core.Array'>, ((7560,), (282,))] X RAVELED with success
Output axis is in the input axis, not need to interpolate, simple intersection
[<class 'xarray.core.dataarray.DataArray'>, <class 'dask.array.core.Array'>, ((7560,), (100,))] X INTERPOLATED with success)
[<class 'xarray.core.dataarray.DataArray'>, <class 'numpy.ndarray'>, None] X SCALED with success)
[<class 'xarray.core.dataarray.DataArray'>, <class 'numpy.ndarray'>, None] X REDUCED with success)
salinity pre-processed with success, [<class 'xarray.core.dataarray.DataArray'>, <class 'numpy.ndarray'>, None]
Homogenisation for fit of salinity
Features array shape and type for xarray: (7560, 30) <class 'numpy.ndarray'> <class 'memoryview'>
> Preprocessing done, working with final X (<class 'xarray.core.dataarray.DataArray'>) array of shape: (7560, 30) and sampling dimensions: ['N_PROF']
Performance / Optimisation¶
Use the timeit and timeit_verb options to measure the execution time of PCM operations.
Times are accessible as a pandas.DataFrame through the timeit property of a pyXpcm instance.
The m.plot.timeit() plot method provides a simple visualisation of these timings.
Time readings during execution¶
[4]:
# Create a PCM and execute methods:
m = pcm(K=8, features=features_pcm, timeit=True, timeit_verb=1)
m.fit(ds, features=features_in_ds);
fit.1-preprocess.1-mask: 62 ms
fit.1-preprocess.2-feature_temperature.1-ravel: 27 ms
fit.1-preprocess.2-feature_temperature.2-interp: 2 ms
fit.1-preprocess.2-feature_temperature.3-scale_fit: 15 ms
fit.1-preprocess.2-feature_temperature.4-scale_transform: 6 ms
fit.1-preprocess.2-feature_temperature.5-reduce_fit: 21 ms
fit.1-preprocess.2-feature_temperature.6-reduce_transform: 7 ms
fit.1-preprocess.2-feature_temperature.total: 80 ms
fit.1-preprocess: 80 ms
fit.1-preprocess.3-homogeniser: 5 ms
fit.1-preprocess.2-feature_salinity.1-ravel: 32 ms
fit.1-preprocess.2-feature_salinity.2-interp: 1 ms
fit.1-preprocess.2-feature_salinity.3-scale_fit: 11 ms
fit.1-preprocess.2-feature_salinity.4-scale_transform: 5 ms
fit.1-preprocess.2-feature_salinity.5-reduce_fit: 18 ms
fit.1-preprocess.2-feature_salinity.6-reduce_transform: 4 ms
fit.1-preprocess.2-feature_salinity.total: 75 ms
fit.1-preprocess: 75 ms
fit.1-preprocess.3-homogeniser: 1 ms
fit.1-preprocess.4-xarray: 1 ms
fit.1-preprocess: 228 ms
fit.fit: 3400 ms
fit.score: 12 ms
fit: 3642 ms
A posteriori execution time analysis¶
[5]:
# Create a PCM and execute methods:
m = pcm(K=8, features=features_pcm, timeit=True, timeit_verb=0)
m.fit(ds, features=features_in_ds);
m.predict(ds, features=features_in_ds);
m.fit_predict(ds, features=features_in_ds);
Execution times are accessible as a dataframe through the pyxpcm.pcm.timeit property:
[6]:
m.timeit
[6]:
Method Sub-method Sub-sub-method Sub-sub-sub-method
fit 1-preprocess 1-mask total 60.623884
2-feature_temperature 1-ravel 29.070854
2-interp 1.361847
3-scale_fit 24.303198
4-scale_transform 5.542994
5-reduce_fit 17.215014
6-reduce_transform 4.530907
total 82.225800
total 405.465841
3-homogeniser total 3.330231
2-feature_salinity 1-ravel 33.647060
2-interp 1.427889
3-scale_fit 19.104004
4-scale_transform 16.283989
5-reduce_fit 13.432264
6-reduce_transform 3.180981
total 87.301970
4-xarray total 1.182079
fit total 1668.042660
score total 14.346838
total 1918.222189
predict 1-preprocess 1-mask total 64.723015
2-feature_temperature 1-ravel 28.513908
2-interp 1.239061
3-scale_fit 0.003099
4-scale_transform 7.060051
5-reduce_fit 0.002146
6-reduce_transform 2.730846
total 39.700031
total 235.766172
...
2-feature_salinity 6-reduce_transform 2.788067
total 44.227123
4-xarray total 1.113892
predict total 10.058880
score total 11.398077
xarray total 11.323929
total 184.562922
fit_predict 1-preprocess 1-mask total 64.216852
2-feature_temperature 1-ravel 26.321888
2-interp 1.183033
3-scale_fit 0.001907
4-scale_transform 5.228996
5-reduce_fit 0.000954
6-reduce_transform 2.723217
total 35.592079
total 224.620104
3-homogeniser total 2.858639
2-feature_salinity 1-ravel 29.989958
2-interp 1.201153
3-scale_fit 0.000954
4-scale_transform 5.232811
5-reduce_fit 0.001907
6-reduce_transform 4.884958
total 41.451693
4-xarray total 1.657963
fit total 2717.261076
score total 11.775970
predict total 10.827065
xarray total 10.989189
total 2898.393869
Length: 66, dtype: float64
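Since timeit is a standard pandas object, it can be aggregated further. For instance, to sum the recorded times per top-level method (a sketch, assuming the index levels displayed above):

m.timeit.groupby(level='Method').sum()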
Visualisation help¶
To facilitate your analysis of execution times, you can use pyxpcm.plot.timeit().
Main steps by method¶
[7]:
fig, ax, df = m.plot.timeit(group='Method', split='Sub-method', style='darkgrid') # Default group/split
df
[7]:
Method \ Sub-method | 1-preprocess | fit | predict | score | xarray |
---|---|---|---|---|---|
fit | 809.230804 | 1668.042660 | NaN | 14.346838 | NaN |
fit_predict | 447.169065 | 2717.261076 | 10.827065 | 11.775970 | 10.989189 |
predict | 469.947577 | NaN | 10.058880 | 11.398077 | 11.323929 |

Preprocessing main steps by method¶
[8]:
fig, ax, df = m.plot.timeit(group='Method', split='Sub-sub-method')
df
[8]:
Method \ Sub-sub-method | 1-mask | 2-feature_salinity | 2-feature_temperature | 3-homogeniser | 4-xarray |
---|---|---|---|---|---|
fit | 60.623884 | 174.378157 | 164.250612 | 3.330231 | 1.182079 |
fit_predict | 64.216852 | 82.763433 | 71.052074 | 2.858639 | 1.657963 |
predict | 64.723015 | 88.269234 | 79.249144 | 0.826120 | 1.113892 |

Preprocessing details by method¶
[9]:
fig, ax, df = m.plot.timeit(group='Method', split='Sub-sub-sub-method')
df
[9]:
Method \ Sub-sub-sub-method | 1-ravel | 2-interp | 3-scale_fit | 4-scale_transform | 5-reduce_fit | 6-reduce_transform |
---|---|---|---|---|---|---|
fit | 62.717915 | 2.789736 | 43.407202 | 21.826982 | 30.647278 | 7.711887 |
fit_predict | 56.311846 | 2.384186 | 0.002861 | 10.461807 | 0.002861 | 7.608175 |
predict | 60.415030 | 4.472017 | 0.005245 | 13.175964 | 0.004053 | 5.518913 |

Preprocessing details by features¶
[10]:
fig, ax, df = m.plot.timeit(split='Sub-sub-sub-method', group='Sub-sub-method', unit='s')
df
[10]:
Sub-sub-method \ Sub-sub-sub-method | 1-ravel | 2-interp | 3-scale_fit | 4-scale_transform | 5-reduce_fit | 6-reduce_transform |
---|---|---|---|---|---|---|
2-feature_salinity | 0.095538 | 0.005862 | 0.019107 | 0.027633 | 0.013436 | 0.010854 |
2-feature_temperature | 0.083907 | 0.003784 | 0.024308 | 0.017832 | 0.017218 | 0.009985 |

Help & reference
Bibliography¶
- Maze, G., et al. (2017). Coherent heat patterns revealed by unsupervised classification of Argo temperature profiles in the North Atlantic Ocean. Progress in Oceanography. http://dx.doi.org/10.1016/j.pocean.2016.12.008
- Maze, G., et al. (2017). Profile Classification Models. Mercator Ocean Journal. http://archimer.ifremer.fr/doc/00387/49816
- Jones, D., et al. (2019). Unsupervised Clustering of Southern Ocean Argo Float Temperature Profiles. Journal of Geophysical Research: Oceans. http://dx.doi.org/10.1029/2018JC014629
What’s New¶
v0.4 (1 Nov. 2019)¶
Warning
The API has changed, breaking backward compatibility.
Enhancements:
- Multiple-features classification
- ND-array classification (so that you can classify profiles directly from gridded products, e.g. a latitude/longitude/time grid, and not only collections of profiles already in a 2-D array)
- pyXpcm methods can be accessed through the xarray.Dataset accessor namespace pyxpcm
- Statistic backend can be chosen among sklearn, dask_ml or user-defined
- Save/load PCM to/from netcdf files
pyXpcm now consumes xarray/dask objects all along, not only at the user front-end. This adds a small overhead with small datasets but allows a PCM to handle larger and more complex datasets.
v0.3 (5 Apr. 2019)¶
- Removed support for Python 2.7
- Added more data input consistency checks
- Fixed bugs in interpolation and plotting methods
- Added custom colormap and colorbar to the plot module
v0.2 (26 Mar. 2019)¶
- Upgraded to Python 3.6 (2.7 compatible)
- Added tests for continuous coverage
- Added score and bic methods
- Improved vocabulary consistency in methods
v0.1.3 (12 Nov. 2018)¶
- Initial release.
API reference¶
This page provides an auto-generated summary of pyXpcm’s API. For more details and examples, refer to the relevant chapters in the main part of the documentation.
Top-level PCM functions¶
Creating a PCM¶
pcm(K, features[, scaling, reduction, …]) | Profile Classification Model class constructor
pyxpcm.load_netcdf(ncfile) | Load a PCM model from a netcdf file
Attributes¶
pcm.K | Return the number of classes
pcm.F | Return the number of features
pcm.features | Return the features definition dictionary
Computation¶
pcm.fit(self, ds[, features, dim]) | Estimate PCM parameters
pcm.fit_predict(self, ds[, features, dim, …]) | Estimate PCM parameters and predict classes
pcm.predict(self, ds[, features, dim, …]) | Predict labels for profile samples
pcm.predict_proba(self, ds[, features, dim, …]) | Predict the posterior probability of each component given the data
pcm.score(self, ds[, features, dim]) | Compute the per-sample average log-likelihood of the given data
pcm.bic(self, ds[, features, dim]) | Compute the Bayesian information criterion for the current model on the input dataset
Low-level PCM properties and functions¶
pcm.timeit | Return a pandas.DataFrame with the execution times of methods called on this instance
pcm.ravel(self, da[, dim, feature_name]) | Extract from an N-d array a 2-d X(feature, sample) array and the vertical dimension z
pcm.unravel(self, ds, sampling_dims, X) | Create a DataArray from a numpy array and sampling dimensions
Plotting¶
pcm.plot | Access plotting functions
Plot PCM Contents¶
plot.quantile(m, da[, xlim, classdimname, …]) | Plot the q-th quantiles of a DataArray for each PCM component
plot.scaler(m[, style, plot_kw, subplot_kw]) | Plot PCM scaler properties
plot.reducer(m[, pcalist, style, maxcols, …]) | Plot PCM reducer properties
plot.preprocessed(m, ds[, features, dim, n, …]) | Plot preprocessed features as pairwise scatter plots
plot.timeit(m[, group, split, subplot_kw, …]) | Plot PCM registered timings of operations
Tools¶
plot.cmap(m, name[, palette, usage]) | Return categorical colormaps
plot.colorbar(m[, cmap]) | Add a colorbar to the current plot with ticks centered on discrete colors
plot.subplots(m[, maxcols, K, subplot_kw]) | Return (figure, axis) with one subplot per cluster
plot.latlongrid(ax[, dx, dy, fontsize]) | Add latitude/longitude grid lines and labels to a cartopy geoaxes
Statistics¶
pcm.stat | Access statistics functions
stat.quantile(ds[, q, of, using, outname, …]) | Compute the q-th quantile of a xarray.DataArray for each PCM component
stat.robustness(ds[, name, classdimname, …]) | Compute classification robustness
stat.robustness_digit(ds[, name, …]) | Digitize classification robustness
Save/load PCM models¶
pcm.to_netcdf(self, ncfile, **ka) | Save a PCM to a netcdf file
pyxpcm.load_netcdf(ncfile) | Load a PCM model from a netcdf file
Helper¶
tutorial.open_dataset(name) | Open a dataset from the pyXpcm online data repository (requires internet)
Xarray pyxpcm name space¶
Provides accessors to enhance interoperability between xarray and pyxpcm. A scope named pyxpcm is registered as an accessor to xarray.Dataset objects.
class pyxpcm.xarray.pyXpcmDataSetAccessor
Class registered under the scope pyxpcm to access xarray.Dataset objects.

add(self, da)
Add a xarray.DataArray to this xarray.Dataset.
bic(self, this_pcm, **kwargs)
Compute the Bayesian information criterion for the current model on the input dataset. Only for a GMM classifier.

Parameters:
- ds: xarray.Dataset
  The dataset to work with
- features: dict
  Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes 'feature_name'.
- dim: str
  Name of the vertical dimension in the input xarray.Dataset

Returns:
- bic: float
  The lower the better
drop_all(self)
Remove the xarray.DataArray variables created with pyXpcm from this xarray.Dataset.
feature_dict(self, this_pcm, features=None)
Return a dictionary of features for this xarray.Dataset and a PCM.

Parameters:
- pcm: pyxpcm.pcmmodel.pcm
- features: dict
  Keys are PCM feature names, values are the corresponding xarray.Dataset variable names

Returns:
- dict
  Dictionary where keys are PCM feature names and values are the corresponding xarray.Dataset variables
fit(self, this_pcm, **kwargs)
Estimate PCM parameters.

For a PCM, the fit method consists of the following operations:
- pre-processing:
  - interpolation to the feature_axis levels of the model
  - scaling
  - reduction
- estimation of the classifier parameters

Parameters:
- ds: xarray.Dataset
  The dataset to work with
- features: dict
  Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes 'feature_name'.
- dim: str
  Name of the vertical dimension in the input xarray.Dataset

Returns:
- self
fit_predict(self, this_pcm, **kwargs)
Estimate PCM parameters and predict classes.

This method adds the llh property to the PCM object: the log likelihood of the model with regard to new data.

Parameters:
- ds: xarray.Dataset
  The dataset to work with
- features: dict
  Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes 'feature_name'.
- dim: str
  Name of the vertical dimension in the input xarray.Dataset
- inplace: boolean, False by default
  If False, return a xarray.DataArray with predicted labels. If True, return the input xarray.Dataset with labels added as a new xarray.DataArray.
- name: str, default is 'PCM_LABELS'
  Name of the DataArray holding labels

Returns:
- xarray.DataArray with component labels (if inplace=False), or
- xarray.Dataset with component labels added as a new 'PCM_LABELS' xarray.DataArray (if inplace=True)
mask(self, this_pcm, features=None, dim=None)
Create a mask where all PCM features are defined, i.e. where all feature profiles are not null over the this_pcm feature axis.

Parameters:
- this_pcm: pyxpcm.pcmmodel.pcm
- features: dict
  Definitions of this_pcm features in the xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes 'feature_name'.
- dim: str
  Name of the vertical dimension in the xarray.Dataset. If not specified or set to None, dim is identified as the xarray.DataArray variable with the attribute 'axis' set to 'z'.

Returns:
predict(self, this_pcm, inplace=False, **kwargs)
Predict labels for profile samples.

This method adds the llh property to the PCM object: the log likelihood of the model with regard to new data.

Parameters:
- ds: xarray.Dataset
  The dataset to work with
- features: dict
  Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes 'feature_name'.
- dim: str
  Name of the vertical dimension in the input xarray.Dataset
- inplace: boolean, False by default
  If False, return a xarray.DataArray with predicted labels. If True, return the input xarray.Dataset with labels added as a new xarray.DataArray.
- name: str, default is 'PCM_LABELS'
  Name of the xarray.DataArray with labels

Returns:
- xarray.DataArray with component labels (if inplace=False), or
- xarray.Dataset with component labels added as a new 'PCM_LABELS' xarray.DataArray (if inplace=True)
predict_proba(self, this_pcm, **kwargs)
Predict the posterior probability of each component given the data.

This method adds the llh property to the PCM instance: the log likelihood of the model with regard to new data.

Parameters:
- ds: xarray.Dataset
  The dataset to work with
- features: dict
  Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes 'feature_name'.
- dim: str
  Name of the vertical dimension in the input xarray.Dataset
- inplace: boolean, False by default
  If False, return a xarray.DataArray with predicted probabilities. If True, return the input xarray.Dataset with probabilities added as a new xarray.DataArray.
- name: str, default is 'PCM_POST'
  Name of the DataArray with prediction probabilities (posteriors)
- classdimname: str, default is 'pcm_class'
  Name of the dimension holding classes

Returns:
- xarray.DataArray with the probability of each Gaussian (state) in the model given each sample (if inplace=False), or
- xarray.Dataset with component probabilities added as a new 'PCM_POST' xarray.DataArray (if inplace=True)
quantile(self, this_pcm, inplace=False, **kwargs)
Compute the q-th quantile of a xarray.DataArray for each PCM component.

Parameters:
- q: float in the range [0, 1] (or sequence of floats)
  Quantiles to compute, which must be between 0 and 1 inclusive.
- of: str
  Name of the xarray.Dataset variable to compute quantiles for.
- using: str
  Name of the xarray.Dataset variable with the classification labels to use. 'PCM_LABELS' by default.
- outname: 'PCM_QUANT' or str
  Name of the xarray.DataArray with quantiles
- keep_attrs: boolean, False by default
  Whether to preserve the attributes of the 'of' xarray.Dataset variable in the new quantile variable.

Returns:
- xarray.Dataset or xarray.DataArray with shape (K, n_quantiles, N_z=n_features)
robustness(self, this_pcm, inplace=False, **kwargs)
Compute classification robustness.

Parameters:
- name: str, default is 'PCM_POST'
  Name of the xarray.DataArray with prediction probabilities (posteriors)
- classdimname: str, default is 'pcm_class'
  Name of the dimension holding classes
- outname: 'PCM_ROBUSTNESS' or str
  Name of the xarray.DataArray with robustness
- inplace: boolean, False by default
  If False, return a xarray.DataArray with robustness. If True, return the input xarray.Dataset with robustness added as a new xarray.DataArray.

Returns:
- xarray.Dataset if inplace=True, or xarray.DataArray if inplace=False
robustness_digit(self, this_pcm, inplace=False, **kwargs)
Digitize classification robustness.

Parameters:
- ds: xarray.Dataset
  Input dataset
- name: str, default is 'PCM_POST'
  Name of the xarray.DataArray with prediction probabilities (posteriors)
- classdimname: str, default is 'pcm_class'
  Name of the dimension holding classes
- outname: 'PCM_ROBUSTNESS_CAT' or str
  Name of the xarray.DataArray with robustness categories
- inplace: boolean, False by default
  If False, return a xarray.DataArray with robustness categories. If True, return the input xarray.Dataset with robustness categories added as a new xarray.DataArray.

Returns:
- xarray.Dataset if inplace=True, or xarray.DataArray if inplace=False
sampling_dim(self, this_pcm, features=None, dim=None)
Return the list of dimensions to be stacked for sampling.

Parameters:
- pcm: pyxpcm.pcm
- features: None (default) or dict
  Keys are PCM feature names, values are the corresponding xarray.Dataset variable names. If set to None, all PCM features are used.
- dim: None (default) or str
  The xarray.Dataset dimension to use as the vertical axis in all features. If set to None, it is automatically set to the dimension with an attribute 'axis' set to 'Z'.

Returns:
- dict
  Dictionary where keys are xarray.Dataset variable names of features and values are another dictionary with the list of sampling dimensions under the DIM_SAMPLING key and the name of the vertical axis under the DIM_VERTICAL key.
score(self, this_pcm, **kwargs)
Compute the per-sample average log-likelihood of the given data.

Parameters:
- ds: xarray.Dataset
  The dataset to work with
- features: dict
  Definitions of PCM features in the input xarray.Dataset. If not specified or set to None, features are identified using xarray.DataArray attributes 'feature_name'.
- dim: str
  Name of the vertical dimension in the input xarray.Dataset

Returns:
- log_likelihood: float
  In the case of a GMM classifier, this is the log likelihood of the Gaussian mixture given the data
split(self)
Split pyXpcm variables from the original xarray.Dataset.

Returns:
- xarray.Dataset, xarray.Dataset
  Two datasets: one with the pyXpcm variables, one with the original dataset