
N-dimensional labelled arrays¶
LArray is an open source Python library that aims to provide tools for easy exploration and manipulation of N-dimensional labelled data structures.
Library Highlights¶
N-dimensional labelled array objects to store and manipulate multi-dimensional data
I/O functions for reading and writing arrays in different formats: CSV, Microsoft Excel, HDF5, pickle
Arrays can be grouped into Session objects and loaded/dumped at once
User interface with an IPython console for rapid exploration of data
Compatible with the pandas library: LArray objects can be converted into pandas DataFrame and vice versa.

Documentation¶
The official documentation is hosted on ReadTheDocs at http://larray.readthedocs.io/en/stable/
Get in touch¶
To be informed of each new release, please subscribe to the announce mailing list.
For questions, ideas or general discussion, please use the Google Users Group.
To report bugs, suggest features or view the source code, please go to our GitHub website.
Contents¶
Installation¶
Pre-built binaries¶
The easiest route to installing larray is through Conda. For all platforms installing larray can be done with:
conda install -c gdementen larray
This will install a lightweight version of larray that depends only on the NumPy and pandas libraries. Additional libraries are required to use the included graphical user interface, make plots or use special I/O functions for easy dump/load from Excel or HDF files. Optional dependencies are described below.
Installing larray with all optional dependencies can be done with
conda install -c gdementen larrayenv
You can also first add the channel gdementen to your channel list
conda config --add channels gdementen
and then install larray (or larrayenv) as
conda install larray
Building from source¶
The latest release of LArray is available from https://github.com/larray-project/larray.git
Once you have satisfied the requirements detailed below, simply run:
python setup.py install
Optional Dependencies¶
For IO (HDF, Excel)¶
pytables: for working with files in HDF5 format.
xlwings: recommended package to benefit from all Excel features of LArray. Only available on Windows and Mac platforms.
xlrd: for reading data and formatting information from older Excel files (i.e. .xls)
openpyxl: recommended package for reading and writing Excel 2010 files (i.e. .xlsx)
xlsxwriter: alternative package for writing data, formatting information and, in particular, charts in the Excel 2010 format (i.e. .xlsx)
larray_eurostat: provides functions to easily download EUROSTAT files as larray objects. Currently limited to TSV files.
For Graphical User Interface¶
LArray includes a graphical user interface to view, edit and compare arrays.
pyqt (4 or 5): required by larray-editor (see below).
pyside: alternative to PyQt.
qtpy: required by larray-editor. Provides support for PyQt5, PyQt4 and PySide using the PyQt5 layout.
larray-editor: required to use the graphical user interface associated with larray. It assumes that qtpy and pyqt or pyside are installed. On Windows, it also creates an LArray menu in the Windows Start Menu.
For plotting¶
matplotlib: required for plotting.
Update¶
If larray has been installed using conda, update is done via
conda update larray
Be careful if you have installed optional dependencies. In that case, you may have to update some of them.
If larray has been installed using conda via larrayenv, simply run
conda update larrayenv
For Windows users who have larrayenv (>= 0.25) installed, simply click on the Update LArray link in the Windows Start Menu > LArray.

Tutorial¶
This is an overview of the LArray library. It is not intended to be a fully comprehensive manual. It is mainly meant to help new users become familiar with the library and to remind other users of the essentials.
Getting Started¶
The purpose of this Getting Started section is to give a quick overview of the main objects and features of the LArray library. For a more detailed presentation of all capabilities of LArray, read the next sections of the tutorial. The API Reference section of the documentation gives you the list of all objects, methods and functions with their individual documentation and examples.
To use the LArray library, the first thing to do is to import it:
In [1]: from larray import *
Create an array¶
Working with the LArray library mainly consists of manipulating LArray data structures. They represent N-dimensional labelled arrays and are composed of raw data (NumPy ndarray), axes and optionally some metadata.
An axis represents a dimension of an array. It contains a list of labels and has a name:
# define some axes to be used later
In [2]: age = Axis(['0-9', '10-17', '18-66', '67+'], 'age')
In [3]: sex = Axis(['F', 'M'], 'sex')
In [4]: year = Axis([2015, 2016, 2017], 'year')
Labels allow you to select subsets and to manipulate the data without working with the positions of array elements directly.
To create an array from scratch, you need to supply data and axes:
# define some data. This is the belgian population (in thousands). Source: eurostat.
In [5]: data = [[[633, 635, 634],
...: [663, 665, 664]],
...: [[484, 486, 491],
...: [505, 511, 516]],
...: [[3572, 3581, 3583],
...: [3600, 3618, 3616]],
...: [[1023, 1038, 1053],
...: [756, 775, 793]]]
...:
# create an LArray object
In [6]: pop = LArray(data, axes=[age, sex, year])
In [7]: pop
Out[7]:
age sex\year 2015 2016 2017
0-9 F 633 635 634
0-9 M 663 665 664
10-17 F 484 486 491
10-17 M 505 511 516
18-66 F 3572 3581 3583
18-66 M 3600 3618 3616
67+ F 1023 1038 1053
67+ M 756 775 793
You can optionally attach some metadata to an array:
# attach some metadata to the pop array
In [8]: pop.meta.title = 'population by age, sex and year'
In [9]: pop.meta.source = 'Eurostat'
# display metadata
In [10]: pop.meta
Out[10]:
title: population by age, sex and year
source: Eurostat
To get a short summary of an array, type:
# Array summary: metadata + dimensions + description of axes
In [11]: pop.info
Out[11]:
title: population by age, sex and year
source: Eurostat
4 x 2 x 3
age [4]: '0-9' '10-17' '18-66' '67+'
sex [2]: 'F' 'M'
year [3]: 2015 2016 2017
dtype: int64
memory used: 192 bytes
Create an array filled with predefined values¶
Arrays filled with predefined values can be generated through dedicated functions:
zeros(): creates an array filled with 0
In [12]: zeros([age, sex])
Out[12]:
age\sex F M
0-9 0.0 0.0
10-17 0.0 0.0
18-66 0.0 0.0
67+ 0.0 0.0
ones(): creates an array filled with 1
In [13]: ones([age, sex])
Out[13]:
age\sex F M
0-9 1.0 1.0
10-17 1.0 1.0
18-66 1.0 1.0
67+ 1.0 1.0
full(): creates an array filled with a given value
In [14]: full([age, sex], fill_value=10.0)
Out[14]:
age\sex F M
0-9 10.0 10.0
10-17 10.0 10.0
18-66 10.0 10.0
67+ 10.0 10.0
sequence(): creates an array by sequentially applying modifications to the array along axis.
In [15]: sequence(age)
Out[15]:
age 0-9 10-17 18-66 67+
0 1 2 3
ndtest(): creates a test array with increasing numbers as data
In [16]: ndtest([age, sex])
Out[16]:
age\sex F M
0-9 0 1
10-17 2 3
18-66 4 5
67+ 6 7
Save/Load an array¶
The LArray library offers many I/O functions to read and write arrays in various formats (CSV, Excel, HDF5). For example, to save an array in a CSV file, call the method to_csv():
# save our pop array to a CSV file
In [17]: pop.to_csv('belgium_pop.csv')
The content of the CSV file is then:
age,sex\year,2015,2016,2017
0-9,F,633,635,634
0-9,M,663,665,664
10-17,F,484,486,491
10-17,M,505,511,516
18-66,F,3572,3581,3583
18-66,M,3600,3618,3616
67+,F,1023,1038,1053
67+,M,756,775,793
Note
In CSV or Excel files, the last dimension is horizontal and the names of the last two dimensions are separated by a \.
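To make the layout described in this note concrete, here is a minimal plain-Python sketch that recovers the axis names from the header row of a wide-format file. It uses only the standard csv module and is not LArray's own reader; the helper name split_wide_header and the three-row sample are assumptions made for illustration.

```python
import csv
import io

# A small wide-format sample in the layout described in the note
# (hypothetical data, same shape as the file written above).
CSV_TEXT = (
    "age,sex\\year,2015,2016,2017\n"
    "0-9,F,633,635,634\n"
    "0-9,M,663,665,664\n"
)


def split_wide_header(header):
    """Return (axis_names, last_axis_labels) for a wide-format header row."""
    # The cell containing the backslash names the last two axes.
    pos = next(i for i, cell in enumerate(header) if "\\" in cell)
    row_axis, column_axis = header[pos].split("\\")
    axis_names = header[:pos] + [row_axis, column_axis]
    # Every cell after it is a label of the last (horizontal) axis.
    return axis_names, header[pos + 1:]


rows = list(csv.reader(io.StringIO(CSV_TEXT)))
axis_names, year_labels = split_wide_header(rows[0])
print(axis_names)   # ['age', 'sex', 'year']
print(year_labels)  # ['2015', '2016', '2017']
```

Note that the csv module returns every cell as a string; converting the year labels to integers is left out of this sketch.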
To load a saved array, call the function read_csv():
In [18]: pop = read_csv('belgium_pop.csv')
In [19]: pop
Out[19]:
age sex\year 2015 2016 2017
0-9 F 633 635 634
0-9 M 663 665 664
10-17 F 484 486 491
10-17 M 505 511 516
18-66 F 3572 3581 3583
18-66 M 3600 3618 3616
67+ F 1023 1038 1053
67+ M 756 775 793
Other input/output functions are described in the Input/Output section of the API documentation.
Selecting a subset¶
To select an element or a subset of an array, use brackets [ ]. In Python we usually use the term indexing for this operation.
Let us start by selecting a single element:
In [20]: pop['67+', 'F', 2017]
Out[20]: 1053
Labels can be given in arbitrary order:
In [21]: pop[2017, 'F', '67+']
Out[21]: 1053
When selecting a larger subset the result is an array:
In [22]: pop[2017]
Out[22]:
age\sex F M
0-9 634 664
10-17 491 516
18-66 3583 3616
67+ 1053 793
In [23]: pop['M']
Out[23]:
age\year 2015 2016 2017
0-9 663 665 664
10-17 505 511 516
18-66 3600 3618 3616
67+ 756 775 793
When selecting several labels for the same axis, they must be given as a list (enclosed by [ ]):
In [24]: pop['F', ['0-9', '10-17']]
Out[24]:
age\year 2015 2016 2017
0-9 633 635 634
10-17 484 486 491
You can also select slices, which are all labels between two bounds (we usually call them the start and stop bounds). Specifying the start and stop bounds of a slice is optional: when not given, start is the first label of the corresponding axis, stop the last one:
# in this case '10-17':'67+' is equivalent to ['10-17', '18-66', '67+']
In [25]: pop['F', '10-17':'67+']
Out[25]:
age\year 2015 2016 2017
10-17 484 486 491
18-66 3572 3581 3583
67+ 1023 1038 1053
# :'18-66' selects all labels between the first one and '18-66'
# 2017: selects all labels between 2017 and the last one
In [26]: pop[:'18-66', 2017:]
Out[26]:
age sex\year 2017
0-9 F 634
0-9 M 664
10-17 F 491
10-17 M 516
18-66 F 3583
18-66 M 3616
Note
Contrary to slices on normal Python lists, the stop bound is included in the selection.
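The difference with ordinary Python slices can be sketched in a few lines of plain Python. This is only an illustration of the inclusive-stop rule, not LArray's implementation; the helper name label_slice is hypothetical.

```python
def label_slice(labels, start=None, stop=None):
    """Translate an inclusive label slice into a positional Python slice."""
    # A missing start bound means "from the first label".
    first = 0 if start is None else labels.index(start)
    # A missing stop bound means "up to the last label".
    last = len(labels) - 1 if stop is None else labels.index(stop)
    # +1 because the stop bound of a positional slice is exclusive.
    return slice(first, last + 1)


age_labels = ['0-9', '10-17', '18-66', '67+']
print(age_labels[label_slice(age_labels, '10-17', '67+')])  # ['10-17', '18-66', '67+']
print(age_labels[label_slice(age_labels, stop='18-66')])    # ['0-9', '10-17', '18-66']
```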
Warning
Selecting by labels as above only works as long as there is no ambiguity. When several axes have some labels in common and you do not specify explicitly on which axis to work, it fails with an error ending with something like ValueError: <somelabel> is ambiguous (valid in <axis1>, <axis2>).
For example, let us create a test array with an ambiguous label. We first create an axis (some kind of status code) with an ‘F’ label (remember we already have an ‘F’ label on the sex axis).
In [27]: status = Axis(['A', 'C', 'F'], 'status')
Then create a test array using both axes ‘sex’ and ‘status’:
In [28]: ambiguous_arr = ndtest([sex, status, year])
In [29]: ambiguous_arr
Out[29]:
sex status\year 2015 2016 2017
F A 0 1 2
F C 3 4 5
F F 6 7 8
M A 9 10 11
M C 12 13 14
M F 15 16 17
If we try to get the subset of our array concerning women (represented by the ‘F’ label in our array), we might try something like:
In [30]: ambiguous_arr[2017, 'F']
… but we receive back a volley of insults
[some long error message ending with the line below]
[...]
ValueError: F is ambiguous (valid in sex, status)
In that case, we have to specify explicitly which axis the ‘F’ label we want to select belongs to:
In [31]: ambiguous_arr[2017, sex['F']]
Out[31]:
status A C F
2 5 8
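The reason for the error can be illustrated with a minimal plain-Python sketch: when a bare label is given, each axis has to be searched for it, and a label found on more than one axis cannot be resolved. This is a simplified model and not LArray's actual lookup code; the function name find_axis and the dictionary representation of the axes are assumptions.

```python
def find_axis(axes, label):
    """Return the name of the only axis containing label, or raise ValueError."""
    matches = [name for name, labels in axes.items() if label in labels]
    if not matches:
        raise ValueError(f"{label} is not a valid label for any axis")
    if len(matches) > 1:
        raise ValueError(f"{label} is ambiguous (valid in {', '.join(matches)})")
    return matches[0]


# Same axes as ambiguous_arr above, modelled as name -> labels.
axes = {
    'sex': ['F', 'M'],
    'status': ['A', 'C', 'F'],
    'year': [2015, 2016, 2017],
}

print(find_axis(axes, 2017))  # year

try:
    find_axis(axes, 'F')
except ValueError as error:
    message = str(error)
    print(message)  # F is ambiguous (valid in sex, status)
```

Writing sex['F'] removes the need for this search because the axis is stated explicitly.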
Aggregation¶
The LArray library includes many aggregation methods: sum, mean, min, max, std, var, …
For example, assuming we still have an array in the pop variable:
In [32]: pop
Out[32]:
age sex\year 2015 2016 2017
0-9 F 633 635 634
0-9 M 663 665 664
10-17 F 484 486 491
10-17 M 505 511 516
18-66 F 3572 3581 3583
18-66 M 3600 3618 3616
67+ F 1023 1038 1053
67+ M 756 775 793
We can sum along the ‘sex’ axis using:
In [33]: pop.sum(sex)
Out[33]:
age\year 2015 2016 2017
0-9 1296 1300 1298
10-17 989 997 1007
18-66 7172 7199 7199
67+ 1779 1813 1846
Or sum along both ‘age’ and ‘sex’:
In [34]: pop.sum(age, sex)
Out[34]:
year 2015 2016 2017
11236 11309 11350
It is sometimes more convenient to aggregate along all axes except some. In that case, use the aggregation methods ending with _by. For example:
In [35]: pop.sum_by(year)
Out[35]:
year 2015 2016 2017
11236 11309 11350
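The relationship between the two families of methods can be sketched in plain Python: aggregating "by" some axes is the same as aggregating over all the other axes. The sketch below stores each cell of the pop array in a dictionary keyed by its labels; it is an illustration only, and the helper names sum_over and sum_by are hypothetical, not LArray functions.

```python
from itertools import product

AXES = ('age', 'sex', 'year')
LABELS = (['0-9', '10-17', '18-66', '67+'], ['F', 'M'], [2015, 2016, 2017])
VALUES = [633, 635, 634, 663, 665, 664,
          484, 486, 491, 505, 511, 516,
          3572, 3581, 3583, 3600, 3618, 3616,
          1023, 1038, 1053, 756, 775, 793]

# One entry per cell: (age, sex, year) -> value, in the same order as pop.
cells = dict(zip(product(*LABELS), VALUES))


def sum_over(cells, removed):
    """Sum the cells along the axes named in removed."""
    kept = [i for i, name in enumerate(AXES) if name not in removed]
    totals = {}
    for key, value in cells.items():
        reduced = tuple(key[i] for i in kept)
        totals[reduced] = totals.get(reduced, 0) + value
    return totals


def sum_by(cells, kept_names):
    """Sum the cells along every axis except those named in kept_names."""
    removed = [name for name in AXES if name not in kept_names]
    return sum_over(cells, removed)


print(sum_over(cells, ['age', 'sex']))  # {(2015,): 11236, (2016,): 11309, (2017,): 11350}
print(sum_by(cells, ['year']) == sum_over(cells, ['age', 'sex']))  # True
```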
Groups¶
A Group represents a subset of labels or positions of an axis:
In [36]: children = age['0-9', '10-17']
In [37]: children
Out[37]: age['0-9', '10-17']
It is often useful to give them an explicit name using the >> operator:
In [38]: working = age['18-66'] >> 'working'
In [39]: working
Out[39]: age['18-66'] >> 'working'
In [40]: nonworking = age['0-9', '10-17', '67+'] >> 'nonworking'
In [41]: nonworking
Out[41]: age['0-9', '10-17', '67+'] >> 'nonworking'
Still using the same pop array:
In [42]: pop
Out[42]:
age sex\year 2015 2016 2017
0-9 F 633 635 634
0-9 M 663 665 664
10-17 F 484 486 491
10-17 M 505 511 516
18-66 F 3572 3581 3583
18-66 M 3600 3618 3616
67+ F 1023 1038 1053
67+ M 756 775 793
Groups can be used in selections:
In [43]: pop[working]
Out[43]:
sex\year 2015 2016 2017
F 3572 3581 3583
M 3600 3618 3616
In [44]: pop[nonworking]
Out[44]:
age sex\year 2015 2016 2017
0-9 F 633 635 634
0-9 M 663 665 664
10-17 F 484 486 491
10-17 M 505 511 516
67+ F 1023 1038 1053
67+ M 756 775 793
or aggregations:
In [45]: pop.sum(nonworking)
Out[45]:
sex\year 2015 2016 2017
F 2140 2159 2178
M 1924 1951 1973
When aggregating several groups, the names we set above using >> determine the labels on the aggregated axis.
Since we did not give a name to the children group, the resulting label is generated automatically:
In [46]: pop.sum((children, working, nonworking))
Out[46]:
age sex\year 2015 2016 2017
0-9,10-17 F 1117 1121 1125
0-9,10-17 M 1168 1176 1180
working F 3572 3581 3583
working M 3600 3618 3616
nonworking F 2140 2159 2178
nonworking M 1924 1951 1973
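As a plain-Python sketch of what this aggregation computes, the block below sums the female rows of pop for each group and derives a label for unnamed groups by joining their labels with commas, as in the output above. It is an illustration, not LArray code; sum_groups and the (name, labels) representation of a group are assumptions.

```python
# Female population by age and year (2015, 2016, 2017), taken from pop above.
pop_f = {
    '0-9': [633, 635, 634],
    '10-17': [484, 486, 491],
    '18-66': [3572, 3581, 3583],
    '67+': [1023, 1038, 1053],
}


def sum_groups(rows, groups):
    """Sum rows per group; groups is a list of (name or None, labels) pairs."""
    result = {}
    for name, labels in groups:
        if name is None:
            # Unnamed group: build a label from the labels it contains.
            name = ','.join(labels)
        columns = zip(*(rows[label] for label in labels))
        result[name] = [sum(column) for column in columns]
    return result


groups = [
    (None, ['0-9', '10-17']),                  # children (no explicit name)
    ('working', ['18-66']),
    ('nonworking', ['0-9', '10-17', '67+']),
]
totals = sum_groups(pop_f, groups)
for label, values in totals.items():
    print(label, values)
```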
Grouping arrays in a Session¶
Arrays may be grouped in Session objects. A session is an ordered dict-like container of LArray objects with special I/O methods. To create a session, pass the arrays as keyword arguments (array_name=array) or, on Python versions prior to 3.6, as a list of pairs (array_name, array):
In [47]: pop = zeros([age, sex, year])
In [48]: births = zeros([age, sex, year])
In [49]: deaths = zeros([age, sex, year])
# create a session containing the three arrays 'pop', 'births' and 'deaths'
In [50]: demo = Session(pop=pop, births=births, deaths=deaths)
# displays names of arrays contained in the session
In [51]: demo.names
Out[51]: ['births', 'deaths', 'pop']
# get an array
In [52]: demo['pop']
Out[52]:
age sex\year 2015 2016 2017
0-9 F 0.0 0.0 0.0
0-9 M 0.0 0.0 0.0
10-17 F 0.0 0.0 0.0
10-17 M 0.0 0.0 0.0
18-66 F 0.0 0.0 0.0
18-66 M 0.0 0.0 0.0
67+ F 0.0 0.0 0.0
67+ M 0.0 0.0 0.0
# add/modify an array
In [53]: demo['foreigners'] = zeros([age, sex, year])
Warning
If you are using a Python version prior to 3.6, you will have to pass a list of pairs to the Session constructor, otherwise the arrays will be stored in an arbitrary order in the new session. For example, the session above must be created using the syntax: demo = Session([('pop', pop), ('births', births), ('deaths', deaths)]).
One of the main interests of using sessions is to save and load many arrays at once:
# dump all arrays contained in the session 'demo' in one HDF5 file
In [54]: demo.save('demo.h5')
# load all arrays saved in the HDF5 file 'demo.h5' and store them in the session 'demo'
In [55]: demo = Session('demo.h5')
Graphical User Interface (viewer)¶
The LArray project provides an optional package called larray-editor allowing users to explore and edit arrays through a graphical interface. The larray-editor tool is automatically available when installing the larrayenv metapackage from conda.
To explore the content of arrays in read-only mode, import larray_editor and call view():
In [56]: from larray_editor import *
# shows the arrays of a given session in a graphical user interface
In [57]: view(ses)
# the session may be directly loaded from a file
In [58]: view('my_session.h5')
# creates a session with all existing arrays from the current namespace
# and shows its content
In [59]: view()
To open the user interface in edit mode, call edit() instead.

Once open, you can save and load any session using the File menu.
Finally, you can also visually compare two arrays or sessions using the compare() function.
In [60]: arr0 = ndtest((3, 3))
In [61]: arr1 = ndtest((3, 3))
In [62]: arr1[['a1', 'a2']] = -arr1[['a1', 'a2']]
In [63]: compare(arr0, arr1)

When comparing two arrays, they must have compatible axes.
For Windows Users¶
Installing the larray-editor package on Windows will create an LArray menu in the Windows Start Menu. This menu contains:
a shortcut to open the documentation of the last stable version of the library
a shortcut to open the graphical interface in edit mode.
a shortcut to update larrayenv.


Once the graphical interface is open, all LArray objects and functions are directly accessible. No need to start by from larray import *.
Presenting LArray objects (Axis, Groups, LArray, Session)¶
Import the LArray library:
[2]:
from larray import *
Check the version of LArray:
[3]:
from larray import __version__
__version__
[3]:
'0.31-dev'
Axis¶
An Axis represents a dimension of an LArray object. It consists of a name and a list of labels.
There are several ways to create an axis:
[4]:
# create a wildcard axis
age = Axis(3, 'age')
# labels given as a list
time = Axis([2007, 2008, 2009], 'time')
# create an axis using one string
sex = Axis('sex=M,F')
# labels generated using a special syntax
other = Axis('other=A01..C03')
age, sex, time, other
[4]:
(Axis(3, 'age'),
Axis(['M', 'F'], 'sex'),
Axis([2007, 2008, 2009], 'time'),
Axis(['A01', 'A02', 'A03', 'B01', 'B02', 'B03', 'C01', 'C02', 'C03'], 'other'))
See the Axis section of the API Reference to explore all methods of Axis objects.
Groups¶
A Group represents a selection of labels from an Axis. It can optionally have a name (using operator >>). Groups can be used when selecting a subset of an array and in aggregations.
Group objects are created as follows:
[5]:
# define an Axis object 'age'
age = Axis('age=0..100')
# create an anonymous Group object 'teens'
teens = age[10:20]
# create a Group object 'pensioners' with a name
pensioners = age[67:] >> 'pensioners'
teens
[5]:
age[10:20]
It is possible to set a name or to rename a group after its declaration:
[6]:
# method 'named' returns a new group with the given name
teens = teens.named('teens')
# operator >> is just a shortcut for the call of the method named
teens = teens >> 'teens'
teens
[6]:
age[10:20] >> 'teens'
See the Group section of the API Reference to explore all methods of Group objects.
LArray¶
An LArray object represents a multidimensional array with labeled axes.
Create an array from scratch¶
To create an array from scratch, you need to provide the data and a list of axes. Optionally, metadata (title, description, creation date, authors, …) can be associated with the array:
[7]:
import numpy as np
# list of the axes
axes = [age, sex, time, other]
# data (the shape of data array must match axes lengths)
data = np.random.randint(100, size=[len(axis) for axis in axes])
# metadata
meta = [('title', 'random array')]
arr = LArray(data, axes, meta=meta)
arr
[7]:
age sex time\other A01 A02 A03 B01 B02 B03 C01 C02 C03
0 M 2007 29 72 28 96 79 25 16 30 87
0 M 2008 15 75 64 12 15 31 57 22 0
0 M 2009 38 25 48 91 13 56 35 55 3
0 F 2007 34 64 88 90 52 7 68 36 7
0 F 2008 22 30 23 62 19 31 33 33 48
... ... ... ... ... ... ... ... ... ... ... ...
100 M 2008 32 97 47 25 18 86 27 61 89
100 M 2009 87 42 21 76 63 37 23 55 69
100 F 2007 21 54 65 38 87 32 42 82 66
100 F 2008 85 56 93 96 77 10 7 16 56
100 F 2009 95 41 83 66 76 25 30 12 70
Metadata can be added to an array at any time using:
[8]:
arr.meta.description = 'array containing random values between 0 and 100'
arr.meta
[8]:
title: random array
description: array containing random values between 0 and 100
Warning:
Currently, only the HDF (.h5) file format supports saving and loading array metadata.
Metadata is not kept when actions or methods are applied on an array, except for operations modifying the object in-place, such as pop[age < 10] = 0, and when the method copy() is called. Do not add metadata to an array if you know you will apply actions or methods on it before dumping it.
Array creation functions¶
Arrays can also be generated in an easier way through creation functions:
ndtest: creates a test array with increasing numbers as data
empty: creates an array but leaves its allocated memory unchanged (i.e., it contains "garbage". Be careful!)
zeros: fills an array with 0
ones: fills an array with 1
full: fills an array with a given value
sequence: creates an array from an axis by iteratively applying a function to a given initial value.
Except for ndtest, a list of axes must be provided. Axes can be passed in different ways:
as Axis objects
as integers defining the lengths of auto-generated wildcard axes
as a string: 'sex=M,F;time=2007,2008,2009' (name is optional)
as pairs (name, labels)
Optionally, the type of data stored by the array can be specified using argument dtype.
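To give an idea of what the string form encodes, here is a deliberately simplified plain-Python sketch that splits such a definition into (name, labels) pairs. It is not LArray's parser: the function names parse_axis and parse_axes are hypothetical, only comma-separated labels and integer ranges written start..stop are handled, and alphanumeric ranges such as A01..C03 are out of scope.

```python
def parse_axis(text):
    """Parse 'name=labels' (name optional) into a (name, labels) pair."""
    name, sep, labels = text.partition('=')
    if not sep:
        # No '=' found: the axis is anonymous and text holds only labels.
        name, labels = None, text
    if '..' in labels:
        # Integer range; both bounds are included.
        start, stop = labels.split('..')
        return name, list(range(int(start), int(stop) + 1))

    def convert(label):
        return int(label) if label.isdigit() else label

    return name, [convert(label) for label in labels.split(',')]


def parse_axes(text):
    """Parse several axis definitions separated by ';'."""
    return [parse_axis(part) for part in text.split(';')]


print(parse_axes('sex=M,F;time=2007,2008,2009'))
# [('sex', ['M', 'F']), ('time', [2007, 2008, 2009])]
print(parse_axis('0..2'))
# (None, [0, 1, 2])
```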
[9]:
# start defines the starting value of data
ndtest(['age=0..2', 'sex=M,F', 'time=2007..2009'], start=-1)
[9]:
age sex\time 2007 2008 2009
0 M -1 0 1
0 F 2 3 4
1 M 5 6 7
1 F 8 9 10
2 M 11 12 13
2 F 14 15 16
[10]:
# start defines the starting value of data
# label_start defines the starting index of labels
ndtest((3, 3), start=-1, label_start=2)
[10]:
a\b b2 b3 b4
a2 -1 0 1
a3 2 3 4
a4 5 6 7
[11]:
# empty generates uninitialised array with correct axes
# (much faster but use with care!).
# This is not really random either, it just reuses a portion
# of memory that is available, with whatever content is there.
# Use it only if performance matters and make sure all data
# will be overridden.
empty(['age=0..2', 'sex=M,F', 'time=2007..2009'])
[11]:
age sex\time ...
0 M ...
0 F ...
1 M ...
1 F ...
2 M ...
2 F ...
[12]:
# example with anonymous axes
zeros(['0..2', 'M,F', '2007..2009'])
[12]:
{0} {1}\{2} 2007 2008 2009
0 M 0.0 0.0 0.0
0 F 0.0 0.0 0.0
1 M 0.0 0.0 0.0
1 F 0.0 0.0 0.0
2 M 0.0 0.0 0.0
2 F 0.0 0.0 0.0
[13]:
# dtype=int forces to store int data instead of default float
ones(['age=0..2', 'sex=M,F', 'time=2007..2009'], dtype=int)
[13]:
age sex\time 2007 2008 2009
0 M 1 1 1
0 F 1 1 1
1 M 1 1 1
1 F 1 1 1
2 M 1 1 1
2 F 1 1 1
[14]:
full(['age=0..2', 'sex=M,F', 'time=2007..2009'], 1.23)
[14]:
age sex\time 2007 2008 2009
0 M 1.23 1.23 1.23
0 F 1.23 1.23 1.23
1 M 1.23 1.23 1.23
1 F 1.23 1.23 1.23
2 M 1.23 1.23 1.23
2 F 1.23 1.23 1.23
All the above functions exist in (func)_like variants, which take their axes from another array:
[15]:
ones_like(arr)
[15]:
age sex time\other A01 A02 A03 B01 B02 B03 C01 C02 C03
0 M 2007 1 1 1 1 1 1 1 1 1
0 M 2008 1 1 1 1 1 1 1 1 1
0 M 2009 1 1 1 1 1 1 1 1 1
0 F 2007 1 1 1 1 1 1 1 1 1
0 F 2008 1 1 1 1 1 1 1 1 1
... ... ... ... ... ... ... ... ... ... ... ...
100 M 2008 1 1 1 1 1 1 1 1 1
100 M 2009 1 1 1 1 1 1 1 1 1
100 F 2007 1 1 1 1 1 1 1 1 1
100 F 2008 1 1 1 1 1 1 1 1 1
100 F 2009 1 1 1 1 1 1 1 1 1
Create an array using the special sequence function (see the documentation of sequence in the API reference for more examples):
[16]:
# With initial=1.0 and inc=0.5, we generate the sequence 1.0, 1.5, 2.0, 2.5, 3.0, ...
sequence('sex=M,F', initial=1.0, inc=0.5)
[16]:
sex M F
1.0 1.5
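The values produced for this additive case can be reproduced with a short plain-Python loop. The sketch below is only a model of the initial/inc behaviour shown above, not LArray's implementation; the name sequence_values is hypothetical, and the defaults initial=0 and inc=1 are an assumption chosen to match the output of sequence(age) in the Getting Started section.

```python
def sequence_values(labels, initial=0, inc=1):
    """Assign initial to the first label, then add inc for each next label."""
    values = {}
    current = initial
    for label in labels:
        values[label] = current
        current = current + inc
    return values


print(sequence_values(['M', 'F'], initial=1.0, inc=0.5))
# {'M': 1.0, 'F': 1.5}
print(sequence_values(['0-9', '10-17', '18-66', '67+']))
# {'0-9': 0, '10-17': 1, '18-66': 2, '67+': 3}
```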
Inspecting LArray objects¶
[17]:
# create a test array
arr = ndtest([age, sex, time, other])
Get array summary : metadata + dimensions + description of axes + dtype + size in memory
[18]:
arr.info
[18]:
101 x 2 x 3 x 9
age [101]: 0 1 2 ... 98 99 100
sex [2]: 'M' 'F'
time [3]: 2007 2008 2009
other [9]: 'A01' 'A02' 'A03' ... 'C01' 'C02' 'C03'
dtype: int64
memory used: 42.61 Kb
Get axes
[19]:
arr.axes
[19]:
AxisCollection([
Axis([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100], 'age'),
Axis(['M', 'F'], 'sex'),
Axis([2007, 2008, 2009], 'time'),
Axis(['A01', 'A02', 'A03', 'B01', 'B02', 'B03', 'C01', 'C02', 'C03'], 'other')
])
Get number of dimensions
[20]:
arr.ndim
[20]:
4
Get length of each dimension
[21]:
arr.shape
[21]:
(101, 2, 3, 9)
Get total number of elements of the array
[22]:
arr.size
[22]:
5454
Get type of internal data (int, float, …)
[23]:
arr.dtype
[23]:
dtype('int64')
Get size in memory
[24]:
arr.memory_used
[24]:
'42.61 Kb'
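The figures reported above are related by simple arithmetic, which the following plain-Python check reproduces: the size is the product of the shape, and the memory used is the size times the number of bytes per element. The sketch assumes 8 bytes per int64 value and 1 Kb = 1024 bytes, which is the convention consistent with the value displayed.

```python
import math

shape = (101, 2, 3, 9)
itemsize = 8  # bytes per element for dtype int64

# total number of elements = product of the axis lengths
size = math.prod(shape)
# memory used by the raw data, in bytes
memory_bytes = size * itemsize

print(size)                             # 5454
print(memory_bytes)                     # 43632
print(f"{memory_bytes / 1024:.2f} Kb")  # 42.61 Kb
```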
Display the array in the viewer (graphical user interface) in read-only mode. This will open a new window and block execution of the rest of the code until the window is closed! This requires PyQt to be installed.
view(arr)
Or load it in Excel:
arr.to_excel()
More on LArray objects¶
To know how to save and load arrays in CSV, Excel or HDF format, please refer to the Loading and Dumping Arrays section of the tutorial.
See the LArray section of the API Reference to explore all methods of LArray objects.
Session¶
A Session object is a dictionary-like object used to gather several arrays, axes and groups. A session is particularly suited to gathering all input objects of a model or the output arrays from different scenarios. As with arrays, it is possible to associate metadata with sessions.
Creating Sessions¶
To create a session, you can first create an empty session and then populate it with arrays, axes and groups:
[25]:
# create an empty session
s_pop = Session()
# add axes to the session
gender = Axis("gender=Male,Female")
s_pop.gender = gender
time = Axis("time=2013,2014,2015")
s_pop.time = time
# add arrays to the session
s_pop.pop = zeros((gender, time))
s_pop.births = zeros((gender, time))
s_pop.deaths = zeros((gender, time))
# add metadata after creation
s_pop.meta.title = 'Demographic Model of Belgium'
s_pop.meta.description = 'Models the demography of Belgium'
# print content of the session
print(s_pop.summary())
Metadata:
title: Demographic Model of Belgium
description: Models the demography of Belgium
gender: gender ['Male' 'Female'] (2)
time: time [2013 2014 2015] (3)
pop: gender, time (2 x 3) [float64]
births: gender, time (2 x 3) [float64]
deaths: gender, time (2 x 3) [float64]
or you can create and populate a session in one step:
[26]:
gender = Axis("gender=Male,Female")
time = Axis("time=2013,2014,2015")
# create and populate a new session in one step
# Python <= 3.5
s_pop = Session([('gender', gender), ('time', time), ('pop', zeros((gender, time))),
('births', zeros((gender, time))), ('deaths', zeros((gender, time)))],
meta=[('title', 'Demographic Model of Belgium'),('description', 'Modelize the demography of Belgium')])
# Python 3.6+
s_pop = Session(gender=gender, time=time, pop=zeros((gender, time)),
births=zeros((gender, time)), deaths=zeros((gender, time)),
meta=Metadata(title='Demographic Model of Belgium', description='Modelize the demography of Belgium'))
# print content of the session
print(s_pop.summary())
Metadata:
title: Demographic Model of Belgium
description: Modelize the demography of Belgium
gender: gender ['Male' 'Female'] (2)
time: time [2013 2014 2015] (3)
pop: gender, time (2 x 3) [float64]
births: gender, time (2 x 3) [float64]
deaths: gender, time (2 x 3) [float64]
Warning:
Contrary to array metadata, saving and loading session metadata is supported for all current session file formats: Excel, CSV and HDF (.h5).
Metadata is not kept when actions or methods are applied on a session, except for operations modifying a session in-place, such as s.arr1 = 0. Do not add metadata to a session if you know you will apply actions or methods on it before dumping it.
More on Session objects¶
To know how to save and load sessions in CSV, Excel or HDF format, please refer to the Loading and Dumping Sessions section of the tutorial.
To see how to work with sessions, please read the Working With Sessions section of the tutorial.
Finally, see the Session section of the API Reference to explore all methods of Session objects.
Load And Dump Arrays, Sessions, Axes And Groups¶
LArray provides methods and functions to load and dump LArray, Session, Axis and Group objects to several formats such as Excel, CSV and HDF5. The HDF5 file format is designed to store and organize large amounts of data. It allows reading and writing data much faster than when working with CSV and Excel files.
[2]:
# first of all, import the LArray library
from larray import *
Check the version of LArray:
[3]:
from larray import __version__
__version__
[3]:
'0.31-dev'
Loading and Dumping Arrays¶
Loading Arrays - Basic Usage (CSV, Excel, HDF5)¶
To read an array from a CSV file, you must use the read_csv function:
[4]:
csv_dir = get_example_filepath('examples')
# read the array pop from the file 'pop.csv'.
# The data of the array below is derived from a subset of the demo_pjan table from Eurostat
pop = read_csv(csv_dir + '/pop.csv')
pop
[4]:
country gender\time 2013 2014 2015
Belgium Male 5472856 5493792 5524068
Belgium Female 5665118 5687048 5713206
France Male 31772665 31936596 32175328
France Female 33827685 34005671 34280951
Germany Male 39380976 39556923 39835457
Germany Female 41142770 41210540 41362080
To read an array from a sheet of an Excel file, you can use the read_excel function:
[5]:
filepath_excel = get_example_filepath('examples.xlsx')
# read the array from the sheet 'births' of the Excel file 'examples.xlsx'
# The data of the array below is derived from a subset of the demo_fasec table from Eurostat
births = read_excel(filepath_excel, 'births')
births
[5]:
country gender\time 2013 2014 2015
Belgium Male 64371 64173 62561
Belgium Female 61235 60841 59713
France Male 415762 418721 409145
France Female 396581 400607 390526
Germany Male 349820 366835 378478
Germany Female 332249 348092 359097
The open_excel function in combination with the load method allows you to load several arrays from the same Workbook without opening and closing it several times:
# open the Excel file 'examples.xlsx' and keep it open for the duration of the indented block.
# The Python keyword ``with`` ensures that the Excel file is properly closed even if an error occurs
with open_excel(filepath_excel) as wb:
    # load the array 'pop' from the sheet 'pop'
    pop = wb['pop'].load()
    # load the array 'births' from the sheet 'births'
    births = wb['births'].load()
    # load the array 'deaths' from the sheet 'deaths'
    deaths = wb['deaths'].load()
# the Workbook is automatically closed when leaving the block defined by the with statement
Warning: open_excel only works on Windows and requires the xlwings library to be installed.
The HDF5 file format is specifically designed to store and organize large amounts of data. Reading and writing data in this file format is much faster than with CSV or Excel. An HDF5 file can contain multiple arrays, each array being associated with a key. To read an array from an HDF5 file, you must use the read_hdf function and provide the key associated with the array:
[6]:
filepath_hdf = get_example_filepath('examples.h5')
# read the array from the file 'examples.h5' associated with the key 'deaths'
# The data of the array below is derived from a subset of the demo_magec table from Eurostat
deaths = read_hdf(filepath_hdf, 'deaths')
deaths
[6]:
country gender\time 2013 2014 2015
Belgium Male 53908 51579 53631
Belgium Female 55426 53176 56910
France Male 287410 282381 297028
France Female 281955 277054 296779
Germany Male 429645 422225 449512
Germany Female 464180 446131 475688
Dumping Arrays - Basic Usage (CSV, Excel, HDF5)¶
To write an array in a CSV file, you must use the to_csv method:
[7]:
# save the array pop in the file 'pop.csv'
pop.to_csv('pop.csv')
To write an array to a sheet of an Excel file, you can use the to_excel method:
[8]:
# save the array pop in the sheet 'pop' of the Excel file 'population.xlsx'
pop.to_excel('population.xlsx', 'pop')
Note that to_excel creates a new Excel file if it does not exist yet. If the file already exists, a new sheet is added after the existing ones, provided that a sheet with that name does not already exist:
[9]:
# add a new sheet 'births' to the file 'population.xlsx' and save the array births in it
births.to_excel('population.xlsx', 'births')
To reset an Excel file, you simply need to set the overwrite_file argument to True:
[10]:
# 1. reset the file 'population.xlsx' (all sheets are removed)
# 2. create a sheet 'pop' and save the array pop in it
pop.to_excel('population.xlsx', 'pop', overwrite_file=True)
The open_excel
function in combination with the dump()
method allows you to open a Workbook and to export several arrays at once. If the Excel file doesn’t exist, the overwrite_file
argument must be set to True.
Warning: The save
method must be called at the end of the block defined by the with statement to actually write data in the Excel file, otherwise you will end up with an empty file.
# to create a new Excel file, argument overwrite_file must be set to True
with open_excel('population.xlsx', overwrite_file=True) as wb:
    # add a new sheet 'pop' and dump the array pop in it
    wb['pop'] = pop.dump()
    # add a new sheet 'births' and dump the array births in it
    wb['births'] = births.dump()
    # add a new sheet 'deaths' and dump the array deaths in it
    wb['deaths'] = deaths.dump()
    # actually write data in the Workbook
    wb.save()
# the Workbook is automatically closed when leaving the block defined by the with statement
To write an array to an HDF5 file, you must use the to_hdf method and provide the key that will be associated with the array:
[11]:
# save the array pop in the file 'population.h5' and associate it with the key 'pop'
pop.to_hdf('population.h5', 'pop')
Specifying Wide VS Narrow format (CSV, Excel)¶
By default, all reading functions assume that arrays are stored in the wide
format, meaning that their last axis is represented horizontally:
| country \ time | 2013 | 2014 | 2015 |
|---|---|---|---|
| Belgium | 11137974 | 11180840 | 11237274 |
| France | 65600350 | 65942267 | 66456279 |
By setting the wide
argument to False, reading functions will assume instead that arrays are stored in the narrow
format, i.e. one column per axis plus one value column:
| country | time | value |
|---|---|---|
| Belgium | 2013 | 11137974 |
| Belgium | 2014 | 11180840 |
| Belgium | 2015 | 11237274 |
| France | 2013 | 65600350 |
| France | 2014 | 65942267 |
| France | 2015 | 66456279 |
[12]:
# set 'wide' argument to False to indicate that the array is stored in the 'narrow' format
pop_BE_FR = read_csv(csv_dir + '/pop_narrow_format.csv', wide=False)
pop_BE_FR
[12]:
country\time 2013 2014 2015
Belgium 11137974 11180840 11237274
France 65600350 65942267 66456279
[13]:
# same for the read_excel function
pop_BE_FR = read_excel(filepath_excel, sheet='pop_narrow_format', wide=False)
pop_BE_FR
[13]:
country\time 2013 2014 2015
Belgium 11137974 11180840 11237274
France 65600350 65942267 66456279
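The relationship between the two layouts can be sketched in plain Python, independently of larray. The helper `narrow_to_wide` below is hypothetical (it is not part of the larray API) and only illustrates how the records of the narrow format map onto the wide table; it assumes every label combination is present.

```python
# Illustrative only: `narrow_to_wide` is a hypothetical helper, not an larray function.
narrow_rows = [
    ('Belgium', 2013, 11137974),
    ('Belgium', 2014, 11180840),
    ('Belgium', 2015, 11237274),
    ('France', 2013, 65600350),
    ('France', 2014, 65942267),
    ('France', 2015, 66456279),
]


def narrow_to_wide(rows):
    """Pivot (row_label, column_label, value) records into a wide table."""
    row_labels, col_labels, cells = [], [], {}
    for row_label, col_label, value in rows:
        # keep labels in order of first appearance
        if row_label not in row_labels:
            row_labels.append(row_label)
        if col_label not in col_labels:
            col_labels.append(col_label)
        cells[row_label, col_label] = value
    # one output row per row label, one column per label of the last axis
    table = [[cells[r, c] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


countries, years, wide = narrow_to_wide(narrow_rows)
for country, values in zip(countries, wide):
    print(country, values)
```

In the narrow layout each record carries all of its labels, so the number of columns does not depend on the number of labels of the last axis.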
By default, writing functions will set the name of the column containing the data to ‘value’. You can choose the name of this column by using the value_name
argument. For example, using value_name='population'
you can export the previous array as:
| country | time | population |
|---|---|---|
| Belgium | 2013 | 11137974 |
| Belgium | 2014 | 11180840 |
| Belgium | 2015 | 11237274 |
| France | 2013 | 65600350 |
| France | 2014 | 65942267 |
| France | 2015 | 66456279 |
[14]:
# dump the array pop_BE_FR in a narrow format (one column per axis plus one value column).
# By default, the name of the column containing data is set to 'value'
pop_BE_FR.to_csv('pop_narrow_format.csv', wide=False)
# same but replace 'value' by 'population'
pop_BE_FR.to_csv('pop_narrow_format.csv', wide=False, value_name='population')
[15]:
# same for the to_excel method
pop_BE_FR.to_excel('population.xlsx', 'pop_narrow_format', wide=False, value_name='population')
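As a rough illustration of what a narrow export with a custom value column looks like, the sketch below writes a small wide table as narrow CSV text using only the standard library. The function `wide_to_narrow_csv` and its parameters are assumptions made for this example; it does not reproduce larray's to_csv implementation or its exact output formatting.

```python
import csv
import io


def wide_to_narrow_csv(row_axis, col_axis, row_labels, col_labels, table,
                       value_name='value'):
    """Hypothetical helper: serialize a 2D table in narrow format as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    # header: one column per axis plus the value column
    writer.writerow([row_axis, col_axis, value_name])
    for row_label, row in zip(row_labels, table):
        for col_label, value in zip(col_labels, row):
            writer.writerow([row_label, col_label, value])
    return buffer.getvalue()


text = wide_to_narrow_csv(
    'country', 'time',
    ['Belgium', 'France'], [2013, 2014, 2015],
    [[11137974, 11180840, 11237274], [65600350, 65942267, 66456279]],
    value_name='population',
)
lines = text.splitlines()
print(text)
```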
Like with the to_excel
method, it is possible to export arrays in a narrow
format using open_excel
. To do so, you must set the wide
argument of the dump
method to False:
with open_excel('population.xlsx') as wb:
    # dump the array pop_BE_FR in a narrow format:
    # one column per axis plus one value column.
    # Argument value_name can be used to change the name of the
    # column containing the data (default name is 'value')
    wb['pop_narrow_format'] = pop_BE_FR.dump(wide=False, value_name='population')
    # don't forget to call save()
    wb.save()
# in the sheet 'pop_narrow_format', data is written as:
# | country | time | population |
# | ------- | ---- | ---------- |
# | Belgium | 2013 | 11137974   |
# | Belgium | 2014 | 11180840   |
# | Belgium | 2015 | 11237274   |
# | France  | 2013 | 65600350   |
# | France  | 2014 | 65942267   |
# | France  | 2015 | 66456279   |
Specifying Position in Sheet (Excel)¶
If you want to read an array from an Excel sheet which does not start at cell A1
(when there is more than one array stored in the same sheet for example), you will need to use the range
argument.
Warning: Note that the range
argument is only available if you have the library xlwings
installed (Windows).
# the 'range' argument must be used to load data not starting at cell A1.
# This is useful when there are several arrays stored in the same sheet
births = read_excel(filepath_excel, sheet='pop_births_deaths', range='A9:E15')
Using open_excel
, ranges are passed in brackets:
with open_excel(filepath_excel) as wb:
    # store sheet 'pop_births_deaths' in a temporary variable sh
    sh = wb['pop_births_deaths']
    # load the array pop from range A1:E7
    pop = sh['A1:E7'].load()
    # load the array births from range A9:E15
    births = sh['A9:E15'].load()
    # load the array deaths from range A17:E23
    deaths = sh['A17:E23'].load()
# the Workbook is automatically closed when leaving the block defined by the with statement
When exporting arrays to Excel files, data is written starting at cell A1 by default. Using the position argument of the to_excel method, it is possible to specify the top left cell of the dumped data. This can be useful when you want to export several arrays in the same sheet, for example.
Warning: Note that the position
argument is only available if you have the library xlwings
installed (Windows).
filename = 'population.xlsx'
sheetname = 'pop_births_deaths'
# save the arrays pop, births and deaths in the same sheet 'pop_births_deaths'.
# The 'position' argument is used to shift the location of the second and third arrays to be dumped
pop.to_excel(filename, sheetname)
births.to_excel(filename, sheetname, position='A9')
deaths.to_excel(filename, sheetname, position='A17')
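The positions A1, A9 and A17 follow from the size of each dumped block: one header row plus six data rows, with one empty row left between blocks. The sketch below computes such positions; `stacked_positions` is a hypothetical helper (not part of larray or xlwings) and the one-row gap is an assumption matching the ranges A1:E7, A9:E15 and A17:E23 used above.

```python
def stacked_positions(nb_rows, nb_cols, nb_blocks, gap=1):
    """Return (top-left cell, full range) pairs for blocks stacked in column A.

    Hypothetical helper: only handles up to 26 columns (single-letter names).
    """
    if not 1 <= nb_cols <= 26:
        raise ValueError('this sketch only supports 1 to 26 columns')
    last_col = chr(ord('A') + nb_cols - 1)
    result = []
    first_row = 1
    for _ in range(nb_blocks):
        last_row = first_row + nb_rows - 1
        top_left = 'A{}'.format(first_row)
        full_range = '{}:{}{}'.format(top_left, last_col, last_row)
        result.append((top_left, full_range))
        # skip `gap` empty rows before the next block
        first_row = last_row + 1 + gap
    return result


# each array: 1 header row + 6 data rows; 2 label columns + 3 year columns
blocks = stacked_positions(nb_rows=7, nb_cols=5, nb_blocks=3)
print(blocks)
```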
Using open_excel, the position is passed in brackets (this also allows you to add extra information):
with open_excel('population.xlsx') as wb:
    # add a new sheet 'pop_births_deaths' and write 'population' in the first cell
    # note: you can use wb['new_sheet_name'] = '' to create an empty sheet
    wb['pop_births_deaths'] = 'population'
    # store sheet 'pop_births_deaths' in a temporary variable sh
    sh = wb['pop_births_deaths']
    # dump the array pop in sheet 'pop_births_deaths' starting at cell A2
    sh['A2'] = pop.dump()
    # add 'births' in cell A10
    sh['A10'] = 'births'
    # dump the array births in sheet 'pop_births_deaths' starting at cell A11
    sh['A11'] = births.dump()
    # add 'deaths' in cell A19
    sh['A19'] = 'deaths'
    # dump the array deaths in sheet 'pop_births_deaths' starting at cell A20
    sh['A20'] = deaths.dump()
    # don't forget to call save()
    wb.save()
# the Workbook is automatically closed when leaving the block defined by the with statement
Exporting data without headers (Excel)¶
For some reason, you may want to export only the data of an array, without its axes. For example, you may want to insert a new column containing extra information. As an exercise, let us assume we want to add the capital city of each country present in the array containing the total population by country:
| country | capital city | 2013 | 2014 | 2015 |
|---|---|---|---|---|
| Belgium | Brussels | 11137974 | 11180840 | 11237274 |
| France | Paris | 65600350 | 65942267 | 66456279 |
| Germany | Berlin | 80523746 | 80767463 | 81197537 |
Assuming you have prepared an Excel sheet as below:
| country | capital city | 2013 | 2014 | 2015 |
|---|---|---|---|---|
| Belgium | Brussels | | | |
| France | Paris | | | |
| Germany | Berlin | | | |
you can then dump the data at the right place by setting the header argument of to_excel to False and specifying the position of the data in the sheet:
pop_by_country = pop.sum('gender')
# export only the data of the array pop_by_country starting at cell C2
pop_by_country.to_excel('population.xlsx', 'pop_by_country', header=False, position='C2')
Using open_excel, you can easily prepare the sheet and then export only the data at the right place, either by setting the header argument of the dump method to False or by not calling dump at all:
with open_excel('population.xlsx') as wb:
    # create new empty sheet 'pop_by_country'
    wb['pop_by_country'] = ''
    # store sheet 'pop_by_country' in a temporary variable sh
    sh = wb['pop_by_country']
    # write extra information (description)
    sh['A1'] = 'Population at 1st January by country'
    # export column names
    sh['A2'] = ['country', 'capital city']
    sh['C2'] = pop_by_country.time.labels
    # export countries as first column
    sh['A3'].options(transpose=True).value = pop_by_country.country.labels
    # export capital cities as second column
    sh['B3'].options(transpose=True).value = ['Brussels', 'Paris', 'Berlin']
    # export only data of pop_by_country
    sh['C3'] = pop_by_country.dump(header=False)
    # or equivalently
    sh['C3'] = pop_by_country
    # don't forget to call save()
    wb.save()
# the Workbook is automatically closed when leaving the block defined by the with statement
Specifying the Number of Axes at Reading (CSV, Excel)¶
By default, read_csv and read_excel search for the first cell containing the special character \ in the header line in order to determine the number of axes of the array to read. The special character \ separates the names of the last two axes. If there is no special character \, read_csv and read_excel will consider that the array to read has only one dimension. For an array stored as:
| country | gender \ time | 2013 | 2014 | 2015 |
|---|---|---|---|---|
| Belgium | Male | 5472856 | 5493792 | 5524068 |
| Belgium | Female | 5665118 | 5687048 | 5713206 |
| France | Male | 31772665 | 31936596 | 32175328 |
| France | Female | 33827685 | 34005671 | 34280951 |
| Germany | Male | 39380976 | 39556923 | 39835457 |
| Germany | Female | 41142770 | 41210540 | 41362080 |
read_csv
and read_excel
will find the special character \
in the second cell meaning it expects three axes (country, gender and time).
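The detection rule described above can be sketched in a few lines. `guess_nb_axes` is a hypothetical helper written for this illustration, not larray's actual implementation; it returns None when no \ is found and leaves the fallback behaviour to the caller.

```python
def guess_nb_axes(header):
    """Guess the number of axes from the cells of a header line.

    Hypothetical sketch: the first cell containing a backslash holds the names
    of the last two axes, and every cell before it names one more axis.
    """
    for position, cell in enumerate(header):
        if '\\' in cell:
            # `position` axes before this cell + the two axes named in it
            return position + 2
    # no separator found: the number of axes cannot be deduced this way
    return None


print(guess_nb_axes(['country', 'gender\\time', '2013', '2014', '2015']))
print(guess_nb_axes(['country\\time', '2013', '2014', '2015']))
print(guess_nb_axes(['country', 'gender', '2013', '2014', '2015']))
```

With the first header, the separator sits in the second cell, which gives three axes (country, gender and time).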
Sometimes, you need to read an array for which the name of the last axis is implicit:
| country | gender | 2013 | 2014 | 2015 |
|---|---|---|---|---|
| Belgium | Male | 5472856 | 5493792 | 5524068 |
| Belgium | Female | 5665118 | 5687048 | 5713206 |
| France | Male | 31772665 | 31936596 | 32175328 |
| France | Female | 33827685 | 34005671 | 34280951 |
| Germany | Male | 39380976 | 39556923 | 39835457 |
| Germany | Female | 41142770 | 41210540 | 41362080 |
In such cases, you have to tell read_csv and read_excel the number of axes of the output array by setting the nb_axes argument:
[16]:
# read the 3 x 2 x 3 array stored in the file 'pop_missing_axis_name.csv' without using the 'nb_axes' argument.
pop = read_csv(csv_dir + '/pop_missing_axis_name.csv')
# shape and data type of the output array are not what we expected
pop.info
[16]:
6 x 4
country [6]: 'Belgium' 'Belgium' 'France' 'France' 'Germany' 'Germany'
{1} [4]: 'gender' '2013' '2014' '2015'
dtype: object
memory used: 192 bytes
[17]:
# by setting the 'nb_axes' argument, you can indicate to read_csv the number of axes of the output array
pop = read_csv(csv_dir + '/pop_missing_axis_name.csv', nb_axes=3)
# give a name to the last axis
pop = pop.rename(-1, 'time')
# shape and data type of the output array are what we expected
pop.info
[17]:
3 x 2 x 3
country [3]: 'Belgium' 'France' 'Germany'
gender [2]: 'Male' 'Female'
time [3]: 2013 2014 2015
dtype: int64
memory used: 144 bytes
[18]:
# same for the read_excel function
pop = read_excel(filepath_excel, sheet='pop_missing_axis_name', nb_axes=3)
pop = pop.rename(-1, 'time')
pop.info
[18]:
3 x 2 x 3
country [3]: 'Belgium' 'France' 'Germany'
gender [2]: 'Male' 'Female'
time [3]: 2013 2014 2015
dtype: int64
memory used: 144 bytes
NaNs and Missing Data Handling at Reading (CSV, Excel)¶
Sometimes, there is no data available for some label combinations. In the example below, the rows corresponding to France - Male
and Germany - Female
are missing:
| country | gender \ time | 2013 | 2014 | 2015 |
|---|---|---|---|---|
| Belgium | Male | 5472856 | 5493792 | 5524068 |
| Belgium | Female | 5665118 | 5687048 | 5713206 |
| France | Female | 33827685 | 34005671 | 34280951 |
| Germany | Male | 39380976 | 39556923 | 39835457 |
By default, read_csv
and read_excel
will fill cells associated with missing label combinations with nans. Be aware that, in that case, an int array will be converted to a float array.
[19]:
# by default, cells associated with missing label combinations are filled with nans.
# In that case, the output array is converted to a float array
read_csv(csv_dir + '/pop_missing_values.csv')
[19]:
country gender\time 2013 2014 2015
Belgium Male 5472856.0 5493792.0 5524068.0
Belgium Female 5665118.0 5687048.0 5713206.0
France Male nan nan nan
France Female 33827685.0 34005671.0 34280951.0
Germany Male 39380976.0 39556923.0 39835457.0
Germany Female nan nan nan
However, it is possible to choose which value to use to fill missing cells using the fill_value
argument:
[20]:
read_csv(csv_dir + '/pop_missing_values.csv', fill_value=0)
[20]:
country gender\time 2013 2014 2015
Belgium Male 5472856 5493792 5524068
Belgium Female 5665118 5687048 5713206
France Male 0 0 0
France Female 33827685 34005671 34280951
Germany Male 39380976 39556923 39835457
Germany Female 0 0 0
[21]:
# same for the read_excel function
read_excel(filepath_excel, sheet='pop_missing_values', fill_value=0)
[21]:
country gender\time 2013 2014 2015
Belgium Male 5472856 5493792 5524068
Belgium Female 5665118 5687048 5713206
France Male 0 0 0
France Female 33827685 34005671 34280951
Germany Male 39380976 39556923 39835457
Germany Female 0 0 0
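The filling step can be pictured as completing the Cartesian product of the row labels. The sketch below is plain Python and does not show how larray does it internally; `fill_missing` and its arguments are hypothetical names. It also hints at why the default changes the data type: nan is a floating point value, so it cannot be stored in an integer array.

```python
import math
from itertools import product


def fill_missing(cells, row_axes, nb_columns, fill_value=float('nan')):
    """Hypothetical helper: return one row per label combination.

    `cells` maps label tuples to rows of values; combinations absent from
    `cells` get a row made of `fill_value`.
    """
    filled = {}
    for key in product(*row_axes):
        filled[key] = cells.get(key, [fill_value] * nb_columns)
    return filled


cells = {
    ('Belgium', 'Male'): [5472856, 5493792, 5524068],
    ('Belgium', 'Female'): [5665118, 5687048, 5713206],
    ('France', 'Female'): [33827685, 34005671, 34280951],
    ('Germany', 'Male'): [39380976, 39556923, 39835457],
}
axes = [['Belgium', 'France', 'Germany'], ['Male', 'Female']]

with_nan = fill_missing(cells, axes, nb_columns=3)
with_zero = fill_missing(cells, axes, nb_columns=3, fill_value=0)
print(with_zero['France', 'Male'], with_zero['Germany', 'Female'])
```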
Sorting Axes at Reading (CSV, Excel, HDF5)¶
The sort_rows and sort_columns arguments of the reading functions allow you to sort rows and columns alphabetically:
[22]:
# sort labels at reading --> Male and Female labels are inverted
read_csv(csv_dir + '/pop.csv', sort_rows=True)
[22]:
country gender\time 2013 2014 2015
Belgium Female 5665118 5687048 5713206
Belgium Male 5472856 5493792 5524068
France Female 33827685 34005671 34280951
France Male 31772665 31936596 32175328
Germany Female 41142770 41210540 41362080
Germany Male 39380976 39556923 39835457
[23]:
read_excel(filepath_excel, sheet='births', sort_rows=True)
[23]:
country gender\time 2013 2014 2015
Belgium Female 61235 60841 59713
Belgium Male 64371 64173 62561
France Female 396581 400607 390526
France Male 415762 418721 409145
Germany Female 332249 348092 359097
Germany Male 349820 366835 378478
[24]:
read_hdf(filepath_hdf, key='deaths', sort_rows=True)
[24]:
country gender\time 2013 2014 2015
Belgium Female 55426 53176 56910
Belgium Male 53908 51579 53631
France Female 281955 277054 296779
France Male 287410 282381 297028
Germany Female 464180 446131 475688
Germany Male 429645 422225 449512
Metadata (HDF5)¶
Since version 0.29 of LArray, it is possible to add metadata to arrays:
[25]:
pop.meta.title = 'Population at 1st January'
pop.meta.origin = 'Table demo_jpan from Eurostat'
pop.info
[25]:
title: Population at 1st January
origin: Table demo_jpan from Eurostat
3 x 2 x 3
country [3]: 'Belgium' 'France' 'Germany'
gender [2]: 'Male' 'Female'
time [3]: 2013 2014 2015
dtype: int64
memory used: 144 bytes
These metadata are automatically saved and loaded when working with the HDF5 file format:
[26]:
pop.to_hdf('population.h5', 'pop')
new_pop = read_hdf('population.h5', 'pop')
new_pop.info
[26]:
title: Population at 1st January
origin: Table demo_jpan from Eurostat
3 x 2 x 3
country [3]: 'Belgium' 'France' 'Germany'
gender [2]: 'Male' 'Female'
time [3]: 2013 2014 2015
dtype: int64
memory used: 144 bytes
Warning: Currently, metadata associated with arrays cannot be saved and loaded when working with CSV and Excel files. This restriction does not apply however to metadata associated with sessions.
Loading and Dumping Sessions¶
One of the main advantages of grouping arrays, axes and groups in session objects is that you can load and save all of them in one shot. Like arrays, it is possible to associate metadata to a session. These can be saved and loaded in all file formats.
Loading Sessions (CSV, Excel, HDF5)¶
To load the items of a session, you have two options:
Instantiate a new session and pass the path to the Excel/HDF5 file or to the directory containing CSV files to the Session constructor:
[27]:
# create a new Session object and load all arrays, axes, groups and metadata
# from all CSV files located in the passed directory
csv_dir = get_example_filepath('population_session')
session = Session(csv_dir)
# create a new Session object and load all arrays, axes, groups and metadata
# stored in the passed Excel file
filepath_excel = get_example_filepath('population_session.xlsx')
session = Session(filepath_excel)
# create a new Session object and load all arrays, axes, groups and metadata
# stored in the passed HDF5 file
filepath_hdf = get_example_filepath('population_session.h5')
session = Session(filepath_hdf)
print(session.summary())
country: country ['Belgium' 'France' 'Germany'] (3)
gender: gender ['Male' 'Female'] (2)
time: time [2013 2014 2015] (3)
even_years: time['2014'] >> even_years (1)
odd_years: time[2013 2015] >> odd_years (2)
births: country, gender, time (3 x 2 x 3) [int32]
deaths: country, gender, time (3 x 2 x 3) [int32]
pop: country, gender, time (3 x 2 x 3) [int32]
Call the load method on an existing session and pass the path to the Excel/HDF5 file or to the directory containing CSV files as first argument:
[28]:
# create a session containing 3 axes, 2 groups and one array 'pop'
filepath = get_example_filepath('pop_only.xlsx')
session = Session(filepath)
print(session.summary())
country: country ['Belgium' 'France' 'Germany'] (3)
gender: gender ['Male' 'Female' nan] (3)
time: time [2013 2014 2015] (3)
even_years: time[ 2014. nan] >> even_years (2)
odd_years: time[2013 2015] >> odd_years (2)
pop: country, gender, time (3 x 2 x 3) [int64]
[29]:
# call the load method on the previous session and add the 'births' and 'deaths' arrays to it
filepath = get_example_filepath('births_and_deaths.xlsx')
session.load(filepath)
print(session.summary())
country: country ['Belgium' 'France' 'Germany'] (3)
gender: gender ['Male' 'Female' nan] (3)
time: time [2013 2014 2015] (3)
even_years: time[ 2014. nan] >> even_years (2)
odd_years: time[2013 2015] >> odd_years (2)
pop: country, gender, time (3 x 2 x 3) [int64]
births: country, gender, time (3 x 2 x 3) [int64]
deaths: country, gender, time (3 x 2 x 3) [int64]
The load
method offers some options:
Using the names argument, you can specify which items to load:
[30]:
session = Session()
# use the names argument to only load births and deaths arrays
session.load(filepath_hdf, names=['births', 'deaths'])
print(session.summary())
births: country, gender, time (3 x 2 x 3) [int32]
deaths: country, gender, time (3 x 2 x 3) [int32]
Setting the display argument to True, the load method will print a message each time a new item is loaded:
[31]:
session = Session()
# with display=True, the load method will print a message
# each time a new item is loaded
session.load(filepath_hdf, display=True)
opening /home/docs/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/tests/data/population_session.h5
loading Axis object country ... done
loading Axis object gender ... done
loading Axis object time ... done
loading Group object even_years ... done
loading Group object odd_years ... done
loading Array object births ... done
loading Array object deaths ... done
loading Array object pop ... done
Dumping Sessions (CSV, Excel, HDF5)¶
To save a session, you need to call the save method. The first argument is the path to an Excel/HDF5 file or to a directory if items are saved to CSV files:
[32]:
# save items of a session in CSV files.
# Here, the save method will create a 'population' directory in which CSV files will be written
session.save('population')
# save session to an HDF5 file
session.save('population.h5')
# save session to an Excel file
session.save('population.xlsx')
# load session saved in 'population.h5' to see its content
Session('population.h5')
[32]:
Session(country, gender, time, even_years, odd_years, births, deaths, pop)
Note: Concerning the CSV and Excel formats:
all Axis objects are saved together in the same Excel sheet (CSV file) named __axes__(.csv)
all Group objects are saved together in the same Excel sheet (CSV file) named __groups__(.csv)
metadata is saved in one Excel sheet (CSV file) named __metadata__(.csv)
These sheet (CSV file) names cannot be changed.
The save
method has several arguments:
Using the names argument, you can specify which items to save:
[33]:
# use the names argument to only save births and deaths arrays
session.save('population.h5', names=['births', 'deaths'])
# load session saved in 'population.h5' to see its content
Session('population.h5')
[33]:
Session(births, deaths)
By default, dumping a session to an Excel or HDF5 file will overwrite it. By setting the overwrite argument to False, you can choose to update the existing Excel or HDF5 file:
[34]:
pop = read_csv('./population/pop.csv')
ses_pop = Session([('pop', pop)])
# by setting overwrite to False, the destination file is updated instead of overwritten.
# The items already stored in the file but not present in the session are left intact.
# On the contrary, the items that exist in both the file and the session are completely overwritten.
ses_pop.save('population.h5', overwrite=False)
# load session saved in 'population.h5' to see its content
Session('population.h5')
[34]:
Session(births, deaths, pop)
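The difference between overwriting and updating can be modelled with plain dictionaries standing in for the file content. This is only a conceptual sketch: `save_items` is a hypothetical function and the string values are placeholders for arrays.

```python
def save_items(stored, session_items, overwrite=True):
    """Return the content of the 'file' after saving `session_items` into it.

    Hypothetical model: `stored` is what the file contained before saving.
    """
    if overwrite:
        # previous content is discarded
        return dict(session_items)
    # items only in the file are left intact, common items are replaced
    updated = dict(stored)
    updated.update(session_items)
    return updated


stored = {'births': 'births (old)', 'deaths': 'deaths (old)'}
to_save = {'pop': 'pop (new)'}

replaced = save_items(stored, to_save, overwrite=True)
merged = save_items(stored, to_save, overwrite=False)
print(list(replaced), list(merged))
```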
Setting the display argument to True, the save method will print a message each time an item is dumped:
[35]:
# with display=True, the save method will print a message
# each time an item is dumped
session.save('population.h5', display=True)
dumping country ... done
dumping gender ... done
dumping time ... done
dumping even_years ... done
dumping odd_years ... done
dumping births ... done
dumping deaths ... done
dumping pop ... done
Transforming Arrays (Relabeling, Renaming, Reordering, Combining, Extending, Sorting, …)¶
Import the LArray library:
[2]:
from larray import *
Check the version of LArray:
[3]:
from larray import __version__
__version__
[3]:
'0.31-dev'
Manipulating axes¶
[4]:
# let's start with
pop = load_example_data('demography').pop[2016, 'BruCap', 90:95]
pop
[4]:
age sex\nat BE FO
90 M 539 74
90 F 1477 136
91 M 499 49
91 F 1298 105
92 M 332 35
92 F 1141 78
93 M 287 27
93 F 906 74
94 M 237 23
94 F 739 65
95 M 154 19
95 F 566 53
Relabeling¶
Replace all labels of one axis
[5]:
# returns a copy by default
pop_new_labels = pop.set_labels('sex', ['Men', 'Women'])
pop_new_labels
[5]:
age sex\nat BE FO
90 Men 539 74
90 Women 1477 136
91 Men 499 49
91 Women 1298 105
92 Men 332 35
92 Women 1141 78
93 Men 287 27
93 Women 906 74
94 Men 237 23
94 Women 739 65
95 Men 154 19
95 Women 566 53
[6]:
# the inplace flag avoids creating a copy
pop.set_labels('sex', ['M', 'F'], inplace=True)
[6]:
age sex\nat BE FO
90 M 539 74
90 F 1477 136
91 M 499 49
91 F 1298 105
92 M 332 35
92 F 1141 78
93 M 287 27
93 F 906 74
94 M 237 23
94 F 739 65
95 M 154 19
95 F 566 53
Renaming axes¶
Rename one axis
[7]:
pop.info
[7]:
6 x 2 x 2
age [6]: 90 91 92 93 94 95
sex [2]: 'M' 'F'
nat [2]: 'BE' 'FO'
dtype: int64
memory used: 192 bytes
[8]:
# 'rename' returns a copy of the array
pop2 = pop.rename('sex', 'gender')
pop2
[8]:
age gender\nat BE FO
90 M 539 74
90 F 1477 136
91 M 499 49
91 F 1298 105
92 M 332 35
92 F 1141 78
93 M 287 27
93 F 906 74
94 M 237 23
94 F 739 65
95 M 154 19
95 F 566 53
Rename several axes at once
[9]:
# No x. here because sex and nat are keywords and not actual axes
pop2 = pop.rename(sex='gender', nat='nationality')
pop2
[9]:
age gender\nationality BE FO
90 M 539 74
90 F 1477 136
91 M 499 49
91 F 1298 105
92 M 332 35
92 F 1141 78
93 M 287 27
93 F 906 74
94 M 237 23
94 F 739 65
95 M 154 19
95 F 566 53
Reordering axes¶
Axes can be reordered using the transpose method. By default, transpose reverses the axes; otherwise, it permutes them according to the list given as argument. Axes not mentioned come after those which are mentioned (and keep their relative order). Finally, transpose returns a copy of the array.
[10]:
# starting order : age, sex, nat
pop
[10]:
age sex\nat BE FO
90 M 539 74
90 F 1477 136
91 M 499 49
91 F 1298 105
92 M 332 35
92 F 1141 78
93 M 287 27
93 F 906 74
94 M 237 23
94 F 739 65
95 M 154 19
95 F 566 53
[11]:
# no argument --> reverse axes
pop.transpose()
# .T is a shortcut for .transpose()
pop.T
[11]:
nat sex\age 90 91 92 93 94 95
BE M 539 499 332 287 237 154
BE F 1477 1298 1141 906 739 566
FO M 74 49 35 27 23 19
FO F 136 105 78 74 65 53
[12]:
# reorder according to list
pop.transpose('age', 'nat', 'sex')
[12]:
age nat\sex M F
90 BE 539 1477
90 FO 74 136
91 BE 499 1298
91 FO 49 105
92 BE 332 1141
92 FO 35 78
93 BE 287 906
93 FO 27 74
94 BE 237 739
94 FO 23 65
95 BE 154 566
95 FO 19 53
[13]:
# axes not mentioned come after those which are mentioned (and keep their relative order)
pop.transpose('sex')
[13]:
sex age\nat BE FO
M 90 539 74
M 91 499 49
M 92 332 35
M 93 287 27
M 94 237 23
M 95 154 19
F 90 1477 136
F 91 1298 105
F 92 1141 78
F 93 906 74
F 94 739 65
F 95 566 53
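The ordering rule used by transpose can be summarised by a small function operating on axis names only. `transposed_order` is a hypothetical helper written for this illustration; it computes the resulting axis order and does not move any data.

```python
def transposed_order(axes, *mentioned):
    """Return the axis order resulting from a transpose (names only)."""
    if not mentioned:
        # no argument: reverse the axes
        return list(reversed(axes))
    unknown = [name for name in mentioned if name not in axes]
    if unknown:
        raise ValueError('unknown axes: {}'.format(unknown))
    # axes not mentioned come last and keep their relative order
    remaining = [name for name in axes if name not in mentioned]
    return list(mentioned) + remaining


axes = ['age', 'sex', 'nat']
print(transposed_order(axes))
print(transposed_order(axes, 'age', 'nat', 'sex'))
print(transposed_order(axes, 'sex'))
```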
Combining arrays¶
Append/Prepend¶
Append/prepend one element to an axis of an array
[14]:
pop = load_example_data('demography').pop[2016, 'BruCap', 90:95]
# imagine that you now have access to the number of non-EU foreigners
data = [[25, 54], [15, 33], [12, 28], [11, 37], [5, 21], [7, 19]]
pop_non_eu = LArray(data, pop['FO'].axes)
# you can do something like this
pop = pop.append('nat', pop_non_eu, 'NEU')
pop
[14]:
age sex\nat BE FO NEU
90 M 539 74 25
90 F 1477 136 54
91 M 499 49 15
91 F 1298 105 33
92 M 332 35 12
92 F 1141 78 28
93 M 287 27 11
93 F 906 74 37
94 M 237 23 5
94 F 739 65 21
95 M 154 19 7
95 F 566 53 19
[15]:
# you can also add something at the start of an axis
pop = pop.prepend('sex', pop.sum('sex'), 'B')
pop
[15]:
age sex\nat BE FO NEU
90 B 2016 210 79
90 M 539 74 25
90 F 1477 136 54
91 B 1797 154 48
91 M 499 49 15
91 F 1298 105 33
92 B 1473 113 40
92 M 332 35 12
92 F 1141 78 28
93 B 1193 101 48
93 M 287 27 11
93 F 906 74 37
94 B 976 88 26
94 M 237 23 5
94 F 739 65 21
95 B 720 72 26
95 M 154 19 7
95 F 566 53 19
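To make the prepend step concrete, the sketch below reproduces it for age 90 only, using nested lists instead of an array. `prepend_row` is a hypothetical helper; the total row 'B' is the sum over the sex axis, as in the cell above.

```python
sex_labels = ['M', 'F']
# values for age 90; rows: sex (M, F), columns: nat (BE, FO, NEU)
data_age_90 = [[539, 74, 25],
               [1477, 136, 54]]


def prepend_row(labels, rows, new_row, new_label):
    """Hypothetical helper: return new labels and rows, inputs are not modified."""
    return [new_label] + labels, [new_row] + rows


# sum over the sex axis: add up the values column by column
total = [sum(column) for column in zip(*data_age_90)]
new_labels, new_rows = prepend_row(sex_labels, data_age_90, total, 'B')
print(new_labels)
print(new_rows)
```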
The value being appended/prepended can have missing (or even extra) axes as long as common axes are compatible
[16]:
aliens = zeros(pop.axes['sex'])
aliens
[16]:
sex B M F
0.0 0.0 0.0
[17]:
pop = pop.append('nat', aliens, 'AL')
pop
[17]:
age sex\nat BE FO NEU AL
90 B 2016.0 210.0 79.0 0.0
90 M 539.0 74.0 25.0 0.0
90 F 1477.0 136.0 54.0 0.0
91 B 1797.0 154.0 48.0 0.0
91 M 499.0 49.0 15.0 0.0
91 F 1298.0 105.0 33.0 0.0
92 B 1473.0 113.0 40.0 0.0
92 M 332.0 35.0 12.0 0.0
92 F 1141.0 78.0 28.0 0.0
93 B 1193.0 101.0 48.0 0.0
93 M 287.0 27.0 11.0 0.0
93 F 906.0 74.0 37.0 0.0
94 B 976.0 88.0 26.0 0.0
94 M 237.0 23.0 5.0 0.0
94 F 739.0 65.0 21.0 0.0
95 B 720.0 72.0 26.0 0.0
95 M 154.0 19.0 7.0 0.0
95 F 566.0 53.0 19.0 0.0
Extend¶
Extend an array along an axis with another array with that axis (but other labels)
[18]:
_pop = load_example_data('demography').pop
pop = _pop[2016, 'BruCap', 90:95]
pop_next = _pop[2016, 'BruCap', 96:100]
# concatenate along age axis
pop.extend('age', pop_next)
[18]:
age sex\nat BE FO
90 M 539 74
90 F 1477 136
91 M 499 49
91 F 1298 105
92 M 332 35
92 F 1141 78
93 M 287 27
93 F 906 74
94 M 237 23
94 F 739 65
95 M 154 19
95 F 566 53
96 M 80 9
96 F 327 25
97 M 43 9
97 F 171 21
98 M 23 4
98 F 135 9
99 M 20 2
99 F 92 8
100 M 12 0
100 F 60 3
Stack¶
Stack several arrays together to create an entirely new dimension
[19]:
# imagine you have loaded data for each nationality in different arrays (e.g. loaded from different Excel sheets)
pop_be, pop_fo = pop['BE'], pop['FO']
# first way to stack them
nat = Axis('nat=BE,FO,NEU')
pop = stack([pop_be, pop_fo, pop_non_eu], nat)
# second way
pop = stack([('BE', pop_be), ('FO', pop_fo), ('NEU', pop_non_eu)], 'nat')
pop
[19]:
age sex\nat BE FO NEU
90 M 539 74 25
90 F 1477 136 54
91 M 499 49 15
91 F 1298 105 33
92 M 332 35 12
92 F 1141 78 28
93 M 287 27 11
93 F 906 74 37
94 M 237 23 5
94 F 739 65 21
95 M 154 19 7
95 F 566 53 19
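Conceptually, stacking places arrays that share the same axes side by side along a new labelled axis. The plain-Python sketch below shows this for the two values of age 90 (M and F); `stack_columns` is a hypothetical helper, not the larray stack function.

```python
def stack_columns(named_columns):
    """Stack (label, values) pairs along a new last axis.

    Hypothetical helper: all value lists must have the same length.
    """
    labels = [label for label, _ in named_columns]
    columns = [values for _, values in named_columns]
    # row i of the result gathers the i-th value of each stacked column
    rows = [list(values) for values in zip(*columns)]
    return labels, rows


# values for age 90, in the order M, F
pop_be_90 = [539, 1477]
pop_fo_90 = [74, 136]
pop_non_eu_90 = [25, 54]

nat_labels, stacked = stack_columns(
    [('BE', pop_be_90), ('FO', pop_fo_90), ('NEU', pop_non_eu_90)]
)
print(nat_labels)
print(stacked)
```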
Sorting¶
Sort an axis (alphabetically if labels are strings)
[20]:
pop_sorted = pop.sort_axes('nat')
pop_sorted
[20]:
age sex\nat BE FO NEU
90 M 539 74 25
90 F 1477 136 54
91 M 499 49 15
91 F 1298 105 33
92 M 332 35 12
92 F 1141 78 28
93 M 287 27 11
93 F 906 74 37
94 M 237 23 5
94 F 739 65 21
95 M 154 19 7
95 F 566 53 19
Give labels which would sort the axis
[21]:
pop_sorted.labelsofsorted('sex')
[21]:
age sex\nat BE FO NEU
90 0 M M M
90 1 F F F
91 0 M M M
91 1 F F F
92 0 M M M
92 1 F F F
93 0 M M M
93 1 F F F
94 0 M M M
94 1 F F F
95 0 M M M
95 1 F F F
Sort according to values
[22]:
pop_sorted.sort_values((90, 'F'))
[22]:
age sex\nat NEU FO BE
90 M 25 74 539
90 F 54 136 1477
91 M 15 49 499
91 F 33 105 1298
92 M 12 35 332
92 F 28 78 1141
93 M 11 27 287
93 F 37 74 906
94 M 5 23 237
94 F 21 65 739
95 M 7 19 154
95 F 19 53 566
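Sorting according to values reorders the labels of the remaining axis using the values found at the given key. The sketch below mimics the result above for two rows only; `sort_columns_by_row` is a hypothetical helper and ascending order is assumed, as in the output shown.

```python
def sort_columns_by_row(labels, rows, key):
    """Reorder columns so that the values of rows[key] are ascending."""
    order = sorted(range(len(labels)), key=lambda i: rows[key][i])
    sorted_labels = [labels[i] for i in order]
    # apply the same column order to every row
    sorted_rows = {k: [values[i] for i in order] for k, values in rows.items()}
    return sorted_labels, sorted_rows


nat_labels = ['BE', 'FO', 'NEU']
rows = {
    (90, 'M'): [539, 74, 25],
    (90, 'F'): [1477, 136, 54],
}
sorted_labels, sorted_rows = sort_columns_by_row(nat_labels, rows, (90, 'F'))
print(sorted_labels)
print(sorted_rows)
```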
Indexing, Selecting and Assigning¶
Import the LArray library:
[2]:
from larray import *
Check the version of LArray:
[3]:
from larray import __version__
__version__
[3]:
'0.31-dev'
Import the test array pop
:
[4]:
# let's start with
pop = load_example_data('demography').pop
pop
[4]:
time geo age sex\nat BE FO
1991 BruCap 0 M 4182 2377
1991 BruCap 0 F 4052 2188
1991 BruCap 1 M 3904 2316
1991 BruCap 1 F 3769 2241
1991 BruCap 2 M 3790 2365
... ... ... ... ... ...
2016 Wal 118 F 0 0
2016 Wal 119 M 0 0
2016 Wal 119 F 0 0
2016 Wal 120 M 0 0
2016 Wal 120 F 0 0
Selecting (Subsets)¶
LArray lets you select a subset of an array either by labels or by indices (positions).
Selecting by Labels¶
To take a subset of an array using labels, use brackets [ ].
Let’s start by selecting a single element:
[5]:
# here we select the value associated with Belgian women
# of age 50 from Brussels region for the year 2015
pop[2015, 'BruCap', 50, 'F', 'BE']
[5]:
4813
Continue with selecting a subset using slices and lists of labels
[6]:
# here we select the subset associated with Belgian women of age 50, 51 and 52
# from Brussels region for the years 2010 to 2016
pop[2010:2016, 'BruCap', 50:52, 'F', 'BE']
[6]:
time\age 50 51 52
2010 4869 4811 4699
2011 5015 4860 4792
2012 4722 5014 4818
2013 4711 4727 5007
2014 4788 4702 4730
2015 4813 4767 4676
2016 4814 4792 4740
[7]:
# slices bounds are optional:
# if not given start is assumed to be the first label and stop is the last one.
# Here we select all years starting from 2010
pop[2010:, 'BruCap', 50:52, 'F', 'BE']
[7]:
time\age 50 51 52
2010 4869 4811 4699
2011 5015 4860 4792
2012 4722 5014 4818
2013 4711 4727 5007
2014 4788 4702 4730
2015 4813 4767 4676
2016 4814 4792 4740
[8]:
# Slices can also have a step (defaults to 1), to take every Nth label
# Here we select all even years starting from 2010
pop[2010::2, 'BruCap', 50:52, 'F', 'BE']
[8]:
time\age 50 51 52
2010 4869 4811 4699
2012 4722 5014 4818
2014 4788 4702 4730
2016 4814 4792 4740
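As the outputs above show, a slice of labels includes its stop label, unlike a standard Python slice on positions. The sketch below translates a label slice into a position slice; `select_labels` is a hypothetical helper that assumes unique labels and a positive step.

```python
def select_labels(labels, start=None, stop=None, step=1):
    """Return the labels selected by a label slice (stop label included)."""
    first = 0 if start is None else labels.index(start)
    last = len(labels) - 1 if stop is None else labels.index(stop)
    # +1 because the stop label is part of the selection
    return labels[first:last + 1:step]


years = list(range(1991, 2017))
ages = list(range(0, 121))

print(select_labels(years, 2010, 2016))
print(select_labels(years, 2010))
print(select_labels(years, 2010, step=2))
print(select_labels(ages, 50, 52))
```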
[9]:
# one can also use list of labels to take non-contiguous labels.
# Here we select years 2008, 2010, 2013 and 2015
pop[[2008, 2010, 2013, 2015], 'BruCap', 50:52, 'F', 'BE']
[9]:
time\age 50 51 52
2008 4731 4735 4724
2010 4869 4811 4699
2013 4711 4727 5007
2015 4813 4767 4676
The order of indexing does not matter either, so you usually do not have to care about or remember the positions of the axes during computation. It only matters for output.
[10]:
# order of index doesn't matter
pop['F', 'BE', 'BruCap', [2008, 2010, 2013, 2015], 50:52]
[10]:
time\age 50 51 52
2008 4731 4735 4724
2010 4869 4811 4699
2013 4711 4727 5007
2015 4813 4767 4676
Warning: Selecting by labels as above works well as long as there is no ambiguity. When two or more axes have common labels, the selection may raise an error. The solution is then to specify which axis the labels belong to.
[11]:
# let us now create an array with the same labels on several axes
age, weight, size = Axis('age=0..80'), Axis('weight=0..120'), Axis('size=0..200')
arr_ws = ndtest([age, weight, size])
[12]:
# let's try to select teenagers with size between 1 m 60 and 1 m 65 and weight up to 80 kg.
# In this case the subset is ambiguous and this results in an error:
arr_ws[10:18, :80, 160:165]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-139cd48d3ba8> in <module>
1 # let's try to select teenagers with size between 1 m 60 and 1 m 65 and weight up to 80 kg.
2 # In this case the subset is ambiguous and this results in an error:
----> 3 arr_ws[10:18, :80, 160:165]
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/array.py in __getitem__(self, key, collapse_slices, translate_key)
2088 # FIXME: I have a huge problem with boolean axis labels + non points
2089 raw_broadcasted_key, res_axes, transpose_indices = self.axes._key_to_raw_and_axes(key, collapse_slices,
-> 2090 translate_key)
2091 res_data = data[raw_broadcasted_key]
2092 if res_axes:
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/axis.py in _key_to_raw_and_axes(self, key, collapse_slices, translate_key)
2806
2807 if translate_key:
-> 2808 key = self._translated_key(key)
2809 assert isinstance(key, tuple) and len(key) == self.ndim
2810
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/axis.py in _translated_key(self, key)
2766 """
2767 # any key -> (IGroup, IGroup, ...)
-> 2768 igroup_key = self._key_to_igroups(key)
2769
2770 # extract axis from Group keys
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/axis.py in _key_to_igroups(self, key)
2746
2747 # translate all keys to IGroup
-> 2748 return tuple(self._translate_axis_key(axis_key) for axis_key in key)
2749
2750 def _translated_key(self, key):
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/axis.py in <genexpr>(.0)
2746
2747 # translate all keys to IGroup
-> 2748 return tuple(self._translate_axis_key(axis_key) for axis_key in key)
2749
2750 def _translated_key(self, key):
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/axis.py in _translate_axis_key(self, axis_key)
2686 return self._translate_axis_key_chunk(axis_key)
2687 else:
-> 2688 return self._translate_axis_key_chunk(axis_key)
2689
2690 def _key_to_igroups(self, key):
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/axis.py in _translate_axis_key_chunk(self, axis_key)
2618 valid_axes = ', '.join(a.name if a.name is not None else '{{{}}}'.format(self.index(a))
2619 for a in valid_axes)
-> 2620 raise ValueError('%s is ambiguous (valid in %s)' % (axis_key, valid_axes))
2621 return valid_axes[0].i[axis_pos_key]
2622
ValueError: slice(10, 18, None) is ambiguous (valid in age, weight, size)
[13]:
# the solution is simple. You need to specify the axes on which you make a selection
arr_ws[age[10:18], weight[:80], size[160:165]]
[13]:
age weight\size 160 161 162 163 164 165
10 0 243370 243371 243372 243373 243374 243375
10 1 243571 243572 243573 243574 243575 243576
10 2 243772 243773 243774 243775 243776 243777
10 3 243973 243974 243975 243976 243977 243978
10 4 244174 244175 244176 244177 244178 244179
... ... ... ... ... ... ... ...
18 76 453214 453215 453216 453217 453218 453219
18 77 453415 453416 453417 453418 453419 453420
18 78 453616 453617 453618 453619 453620 453621
18 79 453817 453818 453819 453820 453821 453822
18 80 454018 454019 454020 454021 454022 454023
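The ambiguity error shown earlier follows a simple rule: a bare label is accepted only when exactly one axis contains it. The plain-Python sketch below mimics that rule. The `axes` dictionary and the `find_axis` helper are hypothetical names used for illustration and are not part of the LArray API; the error messages are modelled on those visible in the traceback.

```python
# Hypothetical stand-in for the axes of arr_ws: axis name -> labels
axes = {
    'age': range(0, 81),
    'weight': range(0, 121),
    'size': range(0, 201),
}

def find_axis(label, axes):
    """Return the name of the only axis containing `label`, else raise."""
    valid = [name for name, labels in axes.items() if label in labels]
    if not valid:
        raise ValueError('%s is not a valid label for any axis' % label)
    if len(valid) > 1:
        raise ValueError('%s is ambiguous (valid in %s)' % (label, ', '.join(valid)))
    return valid[0]

print(find_axis(150, axes))   # only the 'size' axis has the label 150

try:
    find_axis(10, axes)       # 10 exists on all three axes
except ValueError as err:
    print(err)
```

Naming the axis explicitly, as in age[10:18], removes the need for this lookup altogether.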
Ambiguous Cases - Specifying Axes Using The Special Variable X¶
When selecting, assigning or using aggregate functions, an axis can be referred to via the special variable X:
pop[X.age[:20]]
pop.sum(X.age)
This gives you access to the axes of the array you are manipulating. The main drawback of using X is that you lose the autocompletion available in many editors. It only works with non-anonymous axes whose names do not contain whitespace or special characters.
[14]:
# the previous example could also have been written as
arr_ws[X.age[10:18], X.weight[:80], X.size[160:165]]
[14]:
age weight\size 160 161 162 163 164 165
10 0 243370 243371 243372 243373 243374 243375
10 1 243571 243572 243573 243574 243575 243576
10 2 243772 243773 243774 243775 243776 243777
10 3 243973 243974 243975 243976 243977 243978
10 4 244174 244175 244176 244177 244178 244179
... ... ... ... ... ... ... ...
18 76 453214 453215 453216 453217 453218 453219
18 77 453415 453416 453417 453418 453419 453420
18 78 453616 453617 453618 453619 453620 453621
18 79 453817 453818 453819 453820 453821 453822
18 80 454018 454019 454020 454021 454022 454023
Selecting by Indices¶
Sometimes it is more practical to use indices (positions) along the axis instead of labels. You need to add the character i before the brackets: .i[indices]. As with selection by labels, you can use a single index, a slice or a list of indices. Indices can also be negative (-1 represents the last element of an axis).
Note: Remember that indices (positions) are always 0-based in Python. So the first element is at index 0, the second is at index 1, etc.
[15]:
# here we select the subset associated with Belgian women of age 50, 51 and 52
# from Brussels region for the first 3 years
pop[X.time.i[:3], 'BruCap', 50:52, 'F', 'BE']
[15]:
time\age 50 51 52
1991 3739 4138 4101
1992 3373 3665 4088
1993 3648 3335 3615
[16]:
# same but for the last 3 years
pop[X.time.i[-3:], 'BruCap', 50:52, 'F', 'BE']
[16]:
time\age 50 51 52
2014 4788 4702 4730
2015 4813 4767 4676
2016 4814 4792 4740
[17]:
# using list of indices
pop[X.time.i[-9,-7,-4,-2], 'BruCap', 50:52, 'F', 'BE']
[17]:
time\age 50 51 52
2008 4731 4735 4724
2010 4869 4811 4699
2013 4711 4727 5007
2015 4813 4767 4676
Warning: The end index (position) is EXCLUSIVE while the end label is INCLUSIVE.
[18]:
# with labels (3 is included)
pop[2015, 'BruCap', X.age[:3], 'F', 'BE']
[18]:
age 0 1 2 3
6020 5882 6023 5861
[19]:
# with indices (3 is out)
pop[2015, 'BruCap', X.age.i[:3], 'F', 'BE']
[19]:
age 0 1 2
6020 5882 6023
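The inclusive/exclusive difference is easier to see on an axis whose labels differ from their positions. The sketch below uses plain Python lists; `label_slice` and `index_slice` are hypothetical helpers written for this illustration, not LArray functions.

```python
years = [2010, 2011, 2012, 2013, 2014, 2015, 2016]

def label_slice(labels, start, stop):
    """Slice by labels: the stop label is part of the result."""
    i, j = labels.index(start), labels.index(stop)
    return labels[i:j + 1]

def index_slice(labels, start, stop):
    """Slice by positions: the stop position is left out, as usual in Python."""
    return labels[start:stop]

print(label_slice(years, 2010, 2012))  # [2010, 2011, 2012]
print(index_slice(years, 0, 2))        # [2010, 2011]
```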
You can use .i[] selection directly on an array instead of on its axes. In this context, if you want to select a subset of the first and third axes for example, you must use a full slice : for the second one.
[20]:
# here we select the last year and first 3 ages
# equivalent to: pop.i[-1, :, :3, :, :]
pop.i[-1, :, :3]
[20]:
geo age sex\nat BE FO
BruCap 0 M 6155 3104
BruCap 0 F 5900 2817
BruCap 1 M 6165 3068
BruCap 1 F 5916 2946
BruCap 2 M 6053 2918
BruCap 2 F 5736 2776
Fla 0 M 29993 3717
Fla 0 F 28483 3587
Fla 1 M 31292 3716
Fla 1 F 29721 3575
Fla 2 M 31718 3597
Fla 2 F 30353 3387
Wal 0 M 17869 1472
Wal 0 F 17242 1454
Wal 1 M 18820 1432
Wal 1 F 17604 1443
Wal 2 M 19076 1444
Wal 2 F 18189 1358
Using Groups In Selections¶
[21]:
teens = pop.age[10:20]
pop[2015, 'BruCap', teens, 'F', 'BE']
[21]:
age 10 11 12 13 14 15 16 17 18 19 20
5124 4865 4758 4807 4587 4593 4429 4466 4517 4461 4464
Assigning subsets¶
Assigning A Value¶
Assign a value to a subset
[22]:
# let's take a smaller array
pop = load_example_data('demography').pop[2016, 'BruCap', 100:105]
pop2 = pop
pop2
[22]:
age sex\nat BE FO
100 M 12 0
100 F 60 3
101 M 12 2
101 F 66 5
102 M 8 0
102 F 26 1
103 M 2 1
103 F 17 2
104 M 2 1
104 F 14 0
105 M 0 0
105 F 2 2
[23]:
# set all data corresponding to age >= 102 to 0
pop2[102:] = 0
pop2
[23]:
age sex\nat BE FO
100 M 12 0
100 F 60 3
101 M 12 2
101 F 66 5
102 M 0 0
102 F 0 0
103 M 0 0
103 F 0 0
104 M 0 0
104 F 0 0
105 M 0 0
105 F 0 0
One very important gotcha though…
Warning: Modifying a slice of an array in-place, as we did above, should be done with care, otherwise you may get unexpected side effects. The reason is that taking a slice subset of an array does not return a copy of that array, but rather a view on that array. To avoid this behavior, use the .copy() method.
Remember:
taking a slice subset of an array is extremely fast (no data is copied)
if one modifies that subset in-place, one also modifies the original array
.copy() returns a copy of the subset (which costs time and memory) but allows you to change the subset without modifying the original array at the same time
[24]:
# indeed, data from the original array have also changed
pop
[24]:
age sex\nat BE FO
100 M 12 0
100 F 60 3
101 M 12 2
101 F 66 5
102 M 0 0
102 F 0 0
103 M 0 0
103 F 0 0
104 M 0 0
104 F 0 0
105 M 0 0
105 F 0 0
[25]:
# the right way
pop = load_example_data('demography').pop[2016, 'BruCap', 100:105]
pop2 = pop.copy()
pop2[102:] = 0
pop2
[25]:
age sex\nat BE FO
100 M 12 0
100 F 60 3
101 M 12 2
101 F 66 5
102 M 0 0
102 F 0 0
103 M 0 0
103 F 0 0
104 M 0 0
104 F 0 0
105 M 0 0
105 F 0 0
[26]:
# now, data from the original array have not changed this time
pop
[26]:
age sex\nat BE FO
100 M 12 0
100 F 60 3
101 M 12 2
101 F 66 5
102 M 8 0
102 F 26 1
103 M 2 1
103 F 17 2
104 M 2 1
104 F 14 0
105 M 0 0
105 F 2 2
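The view/copy distinction is not specific to LArray. As an analogy only, the standard library shows the same behavior: a memoryview slice shares memory with its source, whereas slicing an array.array produces an independent copy. The variable names below are illustrative.

```python
from array import array

original = array('i', [12, 60, 12, 66, 8, 26])

# A memoryview slice shares memory with `original` (no data is copied).
view = memoryview(original)[4:]
view[0] = 0
view[1] = 0
print(original.tolist())   # the last two values of `original` are now 0

# Slicing the array itself yields an independent copy.
fresh = array('i', [12, 60, 12, 66, 8, 26])
subset = fresh[4:]
subset[0] = 0
print(fresh.tolist())      # `fresh` is unchanged
```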
Assigning Arrays And Broadcasting¶
Instead of a value, we can also assign an array to a subset. In that case, the array can have fewer axes than the target, but those that are present must be compatible with the subset being targeted.
[27]:
sex, nat = Axis('sex=M,F'), Axis('nat=BE,FO')
new_value = LArray([[1, -1], [2, -2]],[sex, nat])
new_value
[27]:
sex\nat BE FO
M 1 -1
F 2 -2
[28]:
# this assigns 1, -1 to Belgian, Foreigner men
# and 2, -2 to Belgian, Foreigner women for all
# people aged 102 and over
pop[102:] = new_value
pop
[28]:
age sex\nat BE FO
100 M 12 0
100 F 60 3
101 M 12 2
101 F 66 5
102 M 1 -1
102 F 2 -2
103 M 1 -1
103 F 2 -2
104 M 1 -1
104 F 2 -2
105 M 1 -1
105 F 2 -2
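Conceptually, the assigned array is repeated along the axis it lacks (here age). The sketch below reproduces that idea with plain dictionaries keyed by labels. This representation is an assumption made for illustration and says nothing about how LArray stores its data.

```python
# Target: keys are (age, sex, nat) labels
pop = {
    (101, 'M', 'BE'): 12, (101, 'M', 'FO'): 2,
    (101, 'F', 'BE'): 66, (101, 'F', 'FO'): 5,
    (102, 'M', 'BE'): 8,  (102, 'M', 'FO'): 0,
    (102, 'F', 'BE'): 26, (102, 'F', 'FO'): 1,
    (103, 'M', 'BE'): 2,  (103, 'M', 'FO'): 1,
    (103, 'F', 'BE'): 17, (103, 'F', 'FO'): 2,
}
# Value to assign: it has no age axis, only (sex, nat)
new_value = {('M', 'BE'): 1, ('M', 'FO'): -1, ('F', 'BE'): 2, ('F', 'FO'): -2}

# For every targeted age, look up the value by its (sex, nat) labels
for age, sex, nat in pop:
    if age >= 102:
        pop[age, sex, nat] = new_value[sex, nat]

print(pop[103, 'F', 'FO'])   # -2
print(pop[101, 'M', 'BE'])   # 12 (age 101 is outside the targeted subset)
```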
Warning: The array being assigned must have axes compatible with the target subset (i.e. the same axis names and the same labels).
[29]:
# assume we define the following array with shape 3 x 2 x 2
new_value = zeros(['age=100..102', sex, nat])
new_value
[29]:
age sex\nat BE FO
100 M 0.0 0.0
100 F 0.0 0.0
101 M 0.0 0.0
101 F 0.0 0.0
102 M 0.0 0.0
102 F 0.0 0.0
[30]:
# now let's try to assign the previous array in a subset from age 103 to 105
pop[103:105] = new_value
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-30-63d0ef0af080> in <module>
1 # now let's try to assign the previous array in a subset from age 103 to 105
----> 2 pop[103:105] = new_value
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/array.py in __setitem__(self, key, value, collapse_slices, translate_key)
2108 # TODO: the check_compatible should be included in broadcast_with
2109 value = value.broadcast_with(target_axes)
-> 2110 value.axes.check_compatible(target_axes)
2111
2112 # replace incomprehensible error message "could not broadcast input array from shape XX into shape YY"
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/axis.py in check_compatible(self, axes)
1986 local_axis = self.get_by_pos(axis, i)
1987 if not local_axis.iscompatible(axis):
-> 1988 raise ValueError("incompatible axes:\n{!r}\nvs\n{!r}".format(axis, local_axis))
1989
1990 # XXX: deprecate method (functionality is duplicated in union)?
ValueError: incompatible axes:
Axis([103, 104, 105], 'age')
vs
Axis([100, 101, 102], 'age')
[31]:
# but this works
pop[100:102] = new_value
pop
[31]:
age sex\nat BE FO
100 M 0 0
100 F 0 0
101 M 0 0
101 F 0 0
102 M 0 0
102 F 0 0
103 M 1 -1
103 F 2 -2
104 M 1 -1
104 F 2 -2
105 M 1 -1
105 F 2 -2
Boolean Filtering¶
Boolean filtering can be used to extract subsets.
[32]:
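# Let's focus on population living in Brussels during the year 2016
pop = load_example_data('demography').pop[2016, 'BruCap']
# here we try to select males aged 5 or less and females aged 10 or less.
# Note that 'H' is not a label of the sex axis (its labels are 'M' and 'F'),
# so only the female part of the filter matches in the output below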
subset = pop[((X.sex == 'H') & (X.age <= 5)) | ((X.sex == 'F') & (X.age <= 10))]
subset
[32]:
sex_age\nat BE FO
F_0 5900 2817
F_1 5916 2946
F_2 5736 2776
F_3 5883 2734
F_4 5784 2523
F_5 5780 2521
F_6 5759 2290
F_7 5518 2234
F_8 5474 2066
F_9 5354 1896
F_10 5200 1785
Note: Be aware that after boolean filtering, several axes may have merged.
[33]:
# 'age' and 'sex' axes have been merged together
subset.info
[33]:
11 x 2
sex_age [11]: 'F_0' 'F_1' 'F_2' ... 'F_8' 'F_9' 'F_10'
nat [2]: 'BE' 'FO'
dtype: int64
memory used: 176 bytes
This may not be what you want, because previous selections on merged axes are no longer valid:
[34]:
# now let's try to calculate the proportion of females aged 10 or less
subset['F'].sum() / pop['F'].sum()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-34-d9f443e5c9e1> in <module>
1 # now let's try to calculate the proportion of females aged 10 or less
----> 2 subset['F'].sum() / pop['F'].sum()
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/array.py in __getitem__(self, key, collapse_slices, translate_key)
2088 # FIXME: I have a huge problem with boolean axis labels + non points
2089 raw_broadcasted_key, res_axes, transpose_indices = self.axes._key_to_raw_and_axes(key, collapse_slices,
-> 2090 translate_key)
2091 res_data = data[raw_broadcasted_key]
2092 if res_axes:
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/axis.py in _key_to_raw_and_axes(self, key, collapse_slices, translate_key)
2806
2807 if translate_key:
-> 2808 key = self._translated_key(key)
2809 assert isinstance(key, tuple) and len(key) == self.ndim
2810
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/axis.py in _translated_key(self, key)
2766 """
2767 # any key -> (IGroup, IGroup, ...)
-> 2768 igroup_key = self._key_to_igroups(key)
2769
2770 # extract axis from Group keys
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/axis.py in _key_to_igroups(self, key)
2746
2747 # translate all keys to IGroup
-> 2748 return tuple(self._translate_axis_key(axis_key) for axis_key in key)
2749
2750 def _translated_key(self, key):
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/axis.py in <genexpr>(.0)
2746
2747 # translate all keys to IGroup
-> 2748 return tuple(self._translate_axis_key(axis_key) for axis_key in key)
2749
2750 def _translated_key(self, key):
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/axis.py in _translate_axis_key(self, axis_key)
2686 return self._translate_axis_key_chunk(axis_key)
2687 else:
-> 2688 return self._translate_axis_key_chunk(axis_key)
2689
2690 def _key_to_igroups(self, key):
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/axis.py in _translate_axis_key_chunk(self, axis_key)
2612 continue
2613 if not valid_axes:
-> 2614 raise ValueError("%s is not a valid label for any axis" % axis_key)
2615 elif len(valid_axes) > 1:
2616 # TODO: make an AxisCollection.display_name(axis) method out of this
ValueError: F is not a valid label for any axis
Therefore, it is sometimes more useful not to select, but rather to set the non-matching elements to 0 (or another value):
[35]:
subset = pop.copy()
subset[((X.sex == 'F') & (X.age > 10))] = 0
subset['F', :20]
[35]:
age\nat BE FO
0 5900 2817
1 5916 2946
2 5736 2776
3 5883 2734
4 5784 2523
5 5780 2521
6 5759 2290
7 5518 2234
8 5474 2066
9 5354 1896
10 5200 1785
11 0 0
12 0 0
13 0 0
14 0 0
15 0 0
16 0 0
17 0 0
18 0 0
19 0 0
20 0 0
[36]:
# now we can calculate the proportion of females aged 10 or less
subset['F'].sum() / pop['F'].sum()
[36]:
0.14618110657051941
Boolean filtering can also mix axes and arrays. The example above could also have been written as:
[37]:
age_limit = sequence('sex=M,F', initial=5, inc=5)
age_limit
[37]:
sex M F
5 10
[38]:
age = pop.axes['age']
(age <= age_limit)[:20]
[38]:
age\sex M F
0 True True
1 True True
2 True True
3 True True
4 True True
5 True True
6 False True
7 False True
8 False True
9 False True
10 False True
11 False False
12 False False
13 False False
14 False False
15 False False
16 False False
17 False False
18 False False
19 False False
20 False False
[39]:
subset = pop.copy()
subset[X.age > age_limit] = 0
subset['F'].sum() / pop['F'].sum()
[39]:
0.14618110657051941
Finally, you can choose to filter on data instead of axes
[40]:
# let's focus on females aged 90 and over
subset = pop['F', 90:110].copy()
subset
[40]:
age\nat BE FO
90 1477 136
91 1298 105
92 1141 78
93 906 74
94 739 65
95 566 53
96 327 25
97 171 21
98 135 9
99 92 8
100 60 3
101 66 5
102 26 1
103 17 2
104 14 0
105 2 2
106 3 3
107 1 2
108 1 0
109 0 0
110 0 0
[41]:
# here we set to 0 all data < 10
subset[subset < 10] = 0
subset
[41]:
age\nat BE FO
90 1477 136
91 1298 105
92 1141 78
93 906 74
94 739 65
95 566 53
96 327 25
97 171 21
98 135 0
99 92 0
100 60 0
101 66 0
102 26 0
103 17 0
104 14 0
105 0 0
106 0 0
107 0 0
108 0 0
109 0 0
110 0 0
Arithmetic Operations And Aggregations¶
Import the LArray library:
[2]:
from larray import *
Check the version of LArray:
[3]:
from larray import __version__
__version__
[3]:
'0.31-dev'
Arithmetic operations¶
Import a subset of the test array pop:
[4]:
# import a 6 x 2 x 2 subset of the 'pop' example array
pop = load_example_data('demography').pop[2016, 'BruCap', 90:95]
pop
[4]:
age sex\nat BE FO
90 M 539 74
90 F 1477 136
91 M 499 49
91 F 1298 105
92 M 332 35
92 F 1141 78
93 M 287 27
93 F 906 74
94 M 237 23
94 F 739 65
95 M 154 19
95 F 566 53
All the usual arithmetic operations can be applied to an array; the operation is applied to each element individually:
[5]:
# addition
pop + 200
[5]:
age sex\nat BE FO
90 M 739 274
90 F 1677 336
91 M 699 249
91 F 1498 305
92 M 532 235
92 F 1341 278
93 M 487 227
93 F 1106 274
94 M 437 223
94 F 939 265
95 M 354 219
95 F 766 253
[6]:
# multiplication
pop * 2
[6]:
age sex\nat BE FO
90 M 1078 148
90 F 2954 272
91 M 998 98
91 F 2596 210
92 M 664 70
92 F 2282 156
93 M 574 54
93 F 1812 148
94 M 474 46
94 F 1478 130
95 M 308 38
95 F 1132 106
[7]:
# ** means raising to the power (squaring in this case)
pop ** 2
[7]:
age sex\nat BE FO
90 M 290521 5476
90 F 2181529 18496
91 M 249001 2401
91 F 1684804 11025
92 M 110224 1225
92 F 1301881 6084
93 M 82369 729
93 F 820836 5476
94 M 56169 529
94 F 546121 4225
95 M 23716 361
95 F 320356 2809
[8]:
# % means modulo (aka remainder of division)
pop % 10
[8]:
age sex\nat BE FO
90 M 9 4
90 F 7 6
91 M 9 9
91 F 8 5
92 M 2 5
92 F 1 8
93 M 7 7
93 F 6 4
94 M 7 3
94 F 9 5
95 M 4 9
95 F 6 3
More interestingly, it also works between two arrays
[9]:
# load mortality equivalent array
mortality = load_example_data('demography').qx[2016, 'BruCap', 90:95]
# compute number of deaths
death = pop * mortality
death
[9]:
age sex\nat BE FO
90 M 94.00000000000001 13.000000000000004
90 F 204.00000000000003 19.000000000000004
91 M 95.0 9.0
91 F 200.00000000000006 16.0
92 M 70.0 7.0
92 F 195.00000000000006 13.000000000000004
93 M 66.00000000000001 6.0
93 F 171.99999999999997 14.0
94 M 59.0 6.0
94 F 155.00000000000003 14.0
95 M 41.0 5.0
95 F 130.0 12.000000000000004
Note: Be careful when mixing different data types. You can use the method astype to change the data type of an array.
[10]:
# to be sure to get number of deaths as integers
# one can use .astype() method
death = (pop * mortality).astype(int)
death
[10]:
age sex\nat BE FO
90 M 94 13
90 F 204 19
91 M 95 9
91 F 200 16
92 M 70 7
92 F 195 13
93 M 66 6
93 F 171 14
94 M 59 6
94 F 155 14
95 M 41 5
95 F 130 12
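Note that in the output above the value 171.99999999999997 (women aged 93, BE) became 171, not 172: the conversion to integer truncates instead of rounding. Plain Python's int() behaves the same way, as the short check below shows. Rounding before converting would give 172.

```python
raw = 171.99999999999997   # value shown above for Belgian women aged 93

print(int(raw))     # 171: the fractional part is dropped
print(round(raw))   # 172: rounded to the nearest integer
```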
Warning: Operations between two arrays only work when they have compatible axes (i.e. the same labels). This check can be overridden with ignore_labels, at your own risk: in that case only the position on the axis is used, not the labels.
[11]:
pop[90:92] * mortality[93:95]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-11-3e6b95e7cc66> in <module>
----> 1 pop[90:92] * mortality[93:95]
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/array.py in opmethod(self, other)
5439 if isinstance(other, LArray):
5440 # TODO: first test if it is not already broadcastable
-> 5441 (self, other), res_axes = make_numpy_broadcastable([self, other])
5442 other = other.data
5443 return LArray(super_method(self.data, other), res_axes)
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/array.py in make_numpy_broadcastable(values, min_axes)
9350 Axis.iscompatible : tests if axes are compatible between them.
9351 """
-> 9352 all_axes = AxisCollection.union(*[get_axes(v) for v in values])
9353 if min_axes is not None:
9354 if not isinstance(min_axes, AxisCollection):
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/axis.py in union(self, *args, **kwargs)
1705 if not isinstance(a, AxisCollection):
1706 a = AxisCollection(a)
-> 1707 result.extend(a, validate=validate, replace_wildcards=replace_wildcards)
1708 return result
1709 __or__ = union
~/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/larray-0.31.dev0-py3.6.egg/larray/core/axis.py in extend(self, axes, validate, replace_wildcards)
2050 # check that common axes are the same
2051 if validate and not old_axis.iscompatible(axis):
-> 2052 raise ValueError("incompatible axes:\n%r\nvs\n%r" % (axis, old_axis))
2053 if replace_wildcards and old_axis.iswildcard:
2054 self[old_axis] = axis
ValueError: incompatible axes:
Axis([93, 94, 95], 'age')
vs
Axis([90, 91, 92], 'age')
[12]:
pop[90:92] * mortality[93:95].ignore_labels('age')
[12]:
age sex\nat BE FO
90 M 123.95121951219514 16.444444444444443
90 F 280.401766004415 25.72972972972973
91 M 124.22362869198312 12.782608695652174
91 F 272.24627875507446 22.615384615384617
92 M 88.38961038961038 9.210526315789473
92 F 262.06713780918733 17.66037735849057
Boolean Operations¶
[13]:
pop2 = pop.copy()
pop2['F'] = -pop2['F']
pop2
[13]:
age sex\nat BE FO
90 M 539 74
90 F -1477 -136
91 M 499 49
91 F -1298 -105
92 M 332 35
92 F -1141 -78
93 M 287 27
93 F -906 -74
94 M 237 23
94 F -739 -65
95 M 154 19
95 F -566 -53
[14]:
# testing for equality is done using == (a single = assigns the value)
pop == pop2
[14]:
age sex\nat BE FO
90 M True True
90 F False False
91 M True True
91 F False False
92 M True True
92 F False False
93 M True True
93 F False False
94 M True True
94 F False False
95 M True True
95 F False False
[15]:
# testing for inequality
pop != pop2
[15]:
age sex\nat BE FO
90 M False False
90 F True True
91 M False False
91 F True True
92 M False False
92 F True True
93 M False False
93 F True True
94 M False False
94 F True True
95 M False False
95 F True True
[16]:
# what was our original array like again?
pop
[16]:
age sex\nat BE FO
90 M 539 74
90 F 1477 136
91 M 499 49
91 F 1298 105
92 M 332 35
92 F 1141 78
93 M 287 27
93 F 906 74
94 M 237 23
94 F 739 65
95 M 154 19
95 F 566 53
[17]:
# & means (boolean array) and
(pop >= 500) & (pop <= 1000)
[17]:
age sex\nat BE FO
90 M True False
90 F False False
91 M False False
91 F False False
92 M False False
92 F False False
93 M False False
93 F True False
94 M False False
94 F True False
95 M False False
95 F True False
[18]:
# | means (boolean array) or
(pop < 500) | (pop > 1000)
[18]:
age sex\nat BE FO
90 M False True
90 F True True
91 M True True
91 F True True
92 M True True
92 F True True
93 M True True
93 F False True
94 M True True
94 F False True
95 M True True
95 F False True
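The parentheses around each comparison in the cells above are required. In Python, & and | bind more tightly than comparison operators, so omitting the parentheses changes the meaning of the expression. The sketch below shows this with a single integer rather than an array.

```python
x = 1477   # Belgian women aged 90 in the array above

with_parentheses = (x >= 500) & (x <= 1000)
# Without parentheses this is parsed as the chained comparison
# x >= (500 & x) <= 1000, which is a different test altogether.
without_parentheses = x >= 500 & x <= 1000

print(with_parentheses)      # False
print(without_parentheses)   # True
```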
Arithmetic operations with missing axes¶
[19]:
pop.sum('age')
[19]:
sex\nat BE FO
M 2048 227
F 6127 511
[20]:
# pop has 3 dimensions
pop.info
[20]:
6 x 2 x 2
age [6]: 90 91 92 93 94 95
sex [2]: 'M' 'F'
nat [2]: 'BE' 'FO'
dtype: int64
memory used: 192 bytes
[21]:
# and pop.sum('age') has two
pop.sum('age').info
[21]:
2 x 2
sex [2]: 'M' 'F'
nat [2]: 'BE' 'FO'
dtype: int64
memory used: 32 bytes
[22]:
# you can do operations with missing axes so this works
pop / pop.sum('age')
[22]:
age sex\nat BE FO
90 M 0.26318359375 0.32599118942731276
90 F 0.2410641423208748 0.26614481409001955
91 M 0.24365234375 0.21585903083700442
91 F 0.2118491921005386 0.2054794520547945
92 M 0.162109375 0.15418502202643172
92 F 0.18622490615309287 0.15264187866927592
93 M 0.14013671875 0.11894273127753303
93 F 0.14787008323812634 0.14481409001956946
94 M 0.11572265625 0.1013215859030837
94 F 0.12061367716663947 0.12720156555772993
95 M 0.0751953125 0.08370044052863436
95 F 0.09237799902072792 0.10371819960861056
Axis order does not matter much (except for output)¶
You can do operations between arrays whose axes are in a different order. The axis order of the result is that of the left array.
[23]:
pop
[23]:
age sex\nat BE FO
90 M 539 74
90 F 1477 136
91 M 499 49
91 F 1298 105
92 M 332 35
92 F 1141 78
93 M 287 27
93 F 906 74
94 M 237 23
94 F 739 65
95 M 154 19
95 F 566 53
[24]:
# let us change the order of axes
pop_transposed = pop.T
pop_transposed
[24]:
nat sex\age 90 91 92 93 94 95
BE M 539 499 332 287 237 154
BE F 1477 1298 1141 906 739 566
FO M 74 49 35 27 23 19
FO F 136 105 78 74 65 53
[25]:
# mind blowing
pop_transposed + pop
[25]:
nat sex\age 90 91 92 93 94 95
BE M 1078 998 664 574 474 308
BE F 2954 2596 2282 1812 1478 1132
FO M 148 98 70 54 46 38
FO F 272 210 156 148 130 106
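One way to picture why axis order does not matter is that values are matched by axis name and label, not by position. The sketch below illustrates this with dictionaries keyed by labels. The `add_aligned` helper and the dictionary representation are hypothetical and only meant to convey the idea.

```python
pop_axes = ('age', 'sex', 'nat')
pop = {(90, 'M', 'BE'): 539, (90, 'F', 'FO'): 136}

transposed_axes = ('nat', 'sex', 'age')
transposed = {('BE', 'M', 90): 539, ('FO', 'F', 90): 136}

def add_aligned(left_axes, left, right_axes, right):
    """Add two label-keyed mappings; the result keeps the left axis order."""
    result = {}
    for key, value in left.items():
        labels = dict(zip(left_axes, key))                   # axis name -> label
        right_key = tuple(labels[name] for name in right_axes)
        result[key] = value + right[right_key]
    return result

print(add_aligned(transposed_axes, transposed, pop_axes, pop))
# {('BE', 'M', 90): 1078, ('FO', 'F', 90): 272}
```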
Aggregates¶
Calculate the sum along an axis:
[26]:
pop = load_example_data('demography').pop[2016, 'BruCap']
pop.sum('age')
[26]:
sex\nat BE FO
M 375261 204534
F 401554 206541
or along all axes except one by appending _by to the aggregation function:
[27]:
pop[90:95].sum_by('age')
# is equivalent to
pop[90:95].sum('sex', 'nat')
[27]:
age 90 91 92 93 94 95
2226 1951 1586 1294 1064 792
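The idea behind the _by variant is to keep one axis and sum over all the others. The sketch below does this in plain Python for the ages 90 and 91, using the values of the pop subset shown earlier. The local `sum_by` function is a hypothetical re-implementation for illustration, not LArray's own code.

```python
axes = ('age', 'sex', 'nat')
data = {
    (90, 'M', 'BE'): 539,  (90, 'M', 'FO'): 74,
    (90, 'F', 'BE'): 1477, (90, 'F', 'FO'): 136,
    (91, 'M', 'BE'): 499,  (91, 'M', 'FO'): 49,
    (91, 'F', 'BE'): 1298, (91, 'F', 'FO'): 105,
}

def sum_by(data, axes, kept):
    """Sum over every axis except `kept`."""
    pos = axes.index(kept)
    totals = {}
    for key, value in data.items():
        totals[key[pos]] = totals.get(key[pos], 0) + value
    return totals

print(sum_by(data, axes, 'age'))   # {90: 2226, 91: 1951}
```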
Calculate the sum along one group:
[28]:
teens = pop.age[10:20]
pop.sum(teens)
[28]:
sex\nat BE FO
M 53834 19145
F 51740 18871
Calculate the sum along two groups:
[29]:
pensioners = pop.age[67:]
# groups from the same axis must be grouped in a tuple
pop.sum((teens, pensioners))
[29]:
age sex\nat BE FO
10:20 M 53834 19145
10:20 F 51740 18871
67: M 44138 9939
67: F 70314 13241
Mixing axes and groups in aggregations:
[30]:
pop.sum((teens, pensioners), 'nat')
[30]:
age\sex M F
10:20 72979 70611
67: 54077 83555
More On Aggregations¶
There are many other aggregation functions:
mean, min, max, median, percentile, var (variance), std (standard deviation)
labelofmin, labelofmax (label indirect minimum/maximum: labels where the value is minimum/maximum)
indexofmin, indexofmax (positional indirect minimum/maximum: position along axis where the value is minimum/maximum)
cumsum, cumprod (cumulative sum, cumulative product)
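The difference between the "label" and "index" variants can be sketched in plain Python, using the totals by age computed above. This is only an illustration of the two notions and does not call LArray.

```python
labels = [90, 91, 92, 93, 94, 95]
totals = [2226, 1951, 1586, 1294, 1064, 792]

# position along the axis where the value is smallest
index_of_min = min(range(len(totals)), key=totals.__getitem__)
# label found at that position
label_of_min = labels[index_of_min]

print(index_of_min, label_of_min)   # 5 95
```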
Plotting¶
Import a subset of the test array pop:
[4]:
# import a 6 x 2 x 2 subset of the 'pop' example array
pop = load_example_data('demography').pop[2016, 'BruCap', 90:95]
pop
[4]:
age sex\nat BE FO
90 M 539 74
90 F 1477 136
91 M 499 49
91 F 1298 105
92 M 332 35
92 F 1141 78
93 M 287 27
93 F 906 74
94 M 237 23
94 F 739 65
95 M 154 19
95 F 566 53
Inline matplotlib:
[5]:
%matplotlib inline
Create a plot (the last axis defines the different curves to draw):
[6]:
pop.plot()
[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fe47ff1e5f8>

[7]:
# plot total of both sexes
pop.sum('sex').plot()
[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fe47dde52b0>

Miscellaneous (other interesting array functions)¶
Import a subset of the test array pop:
[4]:
# import a 6 x 2 x 2 subset of the 'pop' example array
pop = load_example_data('demography').pop[2016, 'BruCap', 100:105]
pop
[4]:
age sex\nat BE FO
100 M 12 0
100 F 60 3
101 M 12 2
101 F 66 5
102 M 8 0
102 F 26 1
103 M 2 1
103 F 17 2
104 M 2 1
104 F 14 0
105 M 0 0
105 F 2 2
with total¶
Add totals to one axis
[5]:
pop.with_total('sex', label='B')
[5]:
age sex\nat BE FO
100 M 12 0
100 F 60 3
100 B 72 3
101 M 12 2
101 F 66 5
101 B 78 7
102 M 8 0
102 F 26 1
102 B 34 1
103 M 2 1
103 F 17 2
103 B 19 3
104 M 2 1
104 F 14 0
104 B 16 1
105 M 0 0
105 F 2 2
105 B 2 2
Add totals to all axes at once
[6]:
# by default label is 'total'
pop.with_total()
[6]:
age sex\nat BE FO total
100 M 12 0 12
100 F 60 3 63
100 total 72 3 75
101 M 12 2 14
101 F 66 5 71
101 total 78 7 85
102 M 8 0 8
102 F 26 1 27
102 total 34 1 35
103 M 2 1 3
103 F 17 2 19
103 total 19 3 22
104 M 2 1 3
104 F 14 0 14
104 total 16 1 17
105 M 0 0 0
105 F 2 2 4
105 total 2 2 4
total M 36 4 40
total F 185 13 198
total total 221 17 238
where¶
where can be used to apply some computation depending on a condition
[7]:
# where(condition, value if true, value if false)
where(pop < 10, 0, -pop)
[7]:
age sex\nat BE FO
100 M -12 0
100 F -60 0
101 M -12 0
101 F -66 0
102 M 0 0
102 F -26 0
103 M 0 0
103 F -17 0
104 M 0 0
104 F -14 0
105 M 0 0
105 F 0 0
clip¶
Restrict all data to a given range
[8]:
# clip(min, max)
# values below 10 are set to 10 and values above 50 are set to 50
pop.clip(10, 50)
[8]:
age sex\nat BE FO
100 M 12 10
100 F 50 10
101 M 12 10
101 F 50 10
102 M 10 10
102 F 26 10
103 M 10 10
103 F 17 10
104 M 10 10
104 F 14 10
105 M 10 10
105 F 10 10
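Clipping a single value amounts to combining min and max. The sketch below applies that rule to the BE column of the array above. The `clip` function here is a plain-Python stand-in written for illustration.

```python
def clip(value, minimum, maximum):
    """Return value limited to the range [minimum, maximum]."""
    return max(minimum, min(value, maximum))

belgians = [12, 60, 12, 66, 8, 26, 2, 17, 2, 14, 0, 2]
print([clip(v, 10, 50) for v in belgians])
# [12, 50, 12, 50, 10, 26, 10, 17, 10, 14, 10, 10]
```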
divnot0¶
Replace the result of divisions by 0 with 0
[9]:
pop['BE'] / pop['FO']
/home/docs/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered during operation
"""Entry point for launching an IPython kernel.
/home/docs/checkouts/readthedocs.org/user_builds/larray-test/conda/documentation/lib/python3.6/site-packages/ipykernel_launcher.py:1: RuntimeWarning: invalid value (NaN) encountered during operation (this is typically caused by a 0 / 0)
"""Entry point for launching an IPython kernel.
[9]:
age\sex M F
100 inf 20.0
101 6.0 13.2
102 inf 26.0
103 2.0 8.5
104 2.0 inf
105 nan 1.0
[10]:
# divnot0 replaces results of division by 0 by 0.
# Using it should be done with care though
# because it can hide a real error in your data.
pop['BE'].divnot0(pop['FO'])
[10]:
age\sex M F
100 0.0 20.0
101 6.0 13.2
102 0.0 26.0
103 2.0 8.5
104 2.0 0.0
105 0.0 1.0
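The behavior shown above can be sketched element by element in plain Python. The local `divnot0` function is a hypothetical re-implementation for illustration; it is applied to the values for men (BE divided by FO) from the array above.

```python
def divnot0(numerator, denominator):
    """Divide, but return 0.0 when the denominator is 0."""
    return numerator / denominator if denominator != 0 else 0.0

be_men = [12, 12, 8, 2, 2, 0]
fo_men = [0, 2, 0, 1, 1, 0]
print([divnot0(n, d) for n, d in zip(be_men, fo_men)])
# [0.0, 6.0, 0.0, 2.0, 2.0, 0.0]
```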
diff¶
The diff method calculates the n-th order discrete difference along a given axis. The first order difference is given by out[n + 1] = in[n + 1] - in[n] along the given axis.
[11]:
pop = load_example_data('demography').pop[2005:2015, 'BruCap', 50]
pop
[11]:
time sex\nat BE FO
2005 M 4289 1591
2005 F 4661 1584
2006 M 4335 1761
2006 F 4781 1580
2007 M 4291 1806
2007 F 4719 1650
2008 M 4349 1773
2008 F 4731 1680
2009 M 4429 2003
2009 F 4824 1722
2010 M 4582 2085
2010 F 4869 1928
2011 M 4677 2294
2011 F 5015 2104
2012 M 4463 2450
2012 F 4722 2186
2013 M 4610 2604
2013 F 4711 2254
2014 M 4725 2709
2014 F 4788 2349
2015 M 4841 2891
2015 F 4813 2498
[12]:
# calculates 'pop[year+1] - pop[year]'
pop.diff('time')
[12]:
time sex\nat BE FO
2006 M 46 170
2006 F 120 -4
2007 M -44 45
2007 F -62 70
2008 M 58 -33
2008 F 12 30
2009 M 80 230
2009 F 93 42
2010 M 153 82
2010 F 45 206
2011 M 95 209
2011 F 146 176
2012 M -214 156
2012 F -293 82
2013 M 147 154
2013 F -11 68
2014 M 115 105
2014 F 77 95
2015 M 116 182
2015 F 25 149
[13]:
# calculates 'pop[year+2] - pop[year]'
pop.diff('time', d=2)
[13]:
time sex\nat BE FO
2007 M 2 215
2007 F 58 66
2008 M 14 12
2008 F -50 100
2009 M 138 197
2009 F 105 72
2010 M 233 312
2010 F 138 248
2011 M 248 291
2011 F 191 382
2012 M -119 365
2012 F -147 258
2013 M -67 310
2013 F -304 150
2014 M 262 259
2014 F 66 163
2015 M 231 287
2015 F 102 244
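The lagged difference shown in the two cells above can be reproduced by hand. The sketch below is plain Python, not LArray: diff_sketch is a hypothetical helper, and the values are the BE male counts for 2005 to 2008 taken from the table above.

```python
# Plain-Python sketch of a lagged difference along one axis.
def diff_sketch(values, d=1):
    """Return values[i] - values[i - d] for every position that has a lag-d predecessor."""
    return [values[i] - values[i - d] for i in range(d, len(values))]

be_males = [4289, 4335, 4291, 4349]  # BE, M, 2005..2008 from the table above
print(diff_sketch(be_males))       # [46, -44, 58]  -> labels 2006, 2007, 2008
print(diff_sketch(be_males, d=2))  # [2, 14]        -> labels 2007, 2008
```

The result has d fewer entries than the input, which matches the first labels disappearing from the time axis in the outputs above.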
ratio¶
[14]:
pop.ratio('nat')
# which is equivalent to
pop / pop.sum('nat')
[14]:
time sex\nat BE FO
2005 M 0.729421768707483 0.270578231292517
2005 F 0.7463570856685349 0.2536429143314652
2006 M 0.7111220472440944 0.2888779527559055
2006 F 0.7516113818581984 0.2483886181418016
2007 M 0.703788748564868 0.29621125143513205
2007 F 0.7409326424870466 0.25906735751295334
2008 M 0.7103887618425351 0.28961123815746487
2008 F 0.7379503977538605 0.26204960224613943
2009 M 0.6885883084577115 0.31141169154228854
2009 F 0.7369385884509624 0.26306141154903756
2010 M 0.6872656367181641 0.3127343632818359
2010 F 0.7163454465205238 0.2836545534794762
2011 M 0.6709223927700474 0.32907760722995266
2011 F 0.7044528725944655 0.29554712740553446
2012 M 0.6455952553160712 0.35440474468392885
2012 F 0.6835552982049797 0.31644470179502027
2013 M 0.6390352093152204 0.3609647906847796
2013 F 0.6763819095477387 0.3236180904522613
2014 M 0.635593220338983 0.3644067796610169
2014 F 0.6708701134930644 0.3291298865069357
2015 M 0.6260993274702535 0.3739006725297465
2015 F 0.6583230748187663 0.34167692518123377
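The equivalence stated in the cell above (each value divided by the sum along the axis) can be checked by hand for a single row. This is a plain-Python sketch with a hypothetical helper ratio_sketch. The two values are the 2005 male counts for BE and FO from the earlier table.

```python
# Plain-Python sketch of a ratio along one axis: value / sum of the axis.
def ratio_sketch(values):
    """Divide every value by the total of all values."""
    total = sum(values)
    return [v / total for v in values]

row_2005_m = [4289, 1591]  # BE and FO, year 2005, sex M
ratios = ratio_sketch(row_2005_m)
print([round(r, 4) for r in ratios])  # [0.7294, 0.2706]
```

By construction the ratios along the axis add up to 1.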
percents¶
[15]:
# or, if you want the previous ratios in percents
pop.percent('nat')
[15]:
time sex\nat BE FO
2005 M 72.9421768707483 27.0578231292517
2005 F 74.63570856685348 25.364291433146516
2006 M 71.11220472440945 28.887795275590552
2006 F 75.16113818581984 24.83886181418016
2007 M 70.3788748564868 29.621125143513204
2007 F 74.09326424870466 25.906735751295336
2008 M 71.03887618425351 28.96112381574649
2008 F 73.79503977538606 26.204960224613945
2009 M 68.85883084577114 31.141169154228855
2009 F 73.69385884509624 26.30614115490376
2010 M 68.72656367181641 31.273436328183593
2010 F 71.63454465205237 28.365455347947623
2011 M 67.09223927700474 32.90776072299526
2011 F 70.44528725944654 29.554712740553448
2012 M 64.55952553160712 35.440474468392885
2012 F 68.35552982049798 31.644470179502026
2013 M 63.90352093152204 36.09647906847796
2013 F 67.63819095477388 32.36180904522613
2014 M 63.559322033898304 36.440677966101696
2014 F 67.08701134930644 32.91298865069357
2015 M 62.60993274702535 37.39006725297465
2015 F 65.83230748187663 34.167692518123374
growth_rate¶
The growth_rate method follows the same principle as diff: it calculates (pop[year+1] - pop[year]) / pop[year] along the given axis.
[16]:
pop.growth_rate('time')
[16]:
time sex\nat BE FO
2006 M 0.010725110748426206 0.10685103708359522
2006 F 0.025745548165629694 -0.0025252525252525255
2007 M -0.010149942329873126 0.02555366269165247
2007 F -0.012967998326709893 0.04430379746835443
2008 M 0.013516662782568165 -0.018272425249169437
2008 F 0.0025429116338207248 0.01818181818181818
2009 M 0.01839503334099793 0.12972363226170333
2009 F 0.019657577679137603 0.025
2010 M 0.03454504402799729 0.040938592111832255
2010 F 0.009328358208955223 0.11962833914053426
2011 M 0.02073330423395897 0.10023980815347722
2011 F 0.029985623331279524 0.0912863070539419
2012 M -0.04575582638443447 0.06800348735832606
2012 F -0.0584247258225324 0.03897338403041825
2013 M 0.03293748599596684 0.06285714285714286
2013 F -0.002329521389241847 0.03110704483074108
2014 M 0.024945770065075923 0.04032258064516129
2014 F 0.01634472511144131 0.04214729370008873
2015 M 0.02455026455026455 0.06718346253229975
2015 F 0.0052213868003341685 0.06343124733929331
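The first values of the output above can be recomputed by hand as the difference between two consecutive years divided by the earlier year. The sketch below is plain Python with a hypothetical helper growth_rate_sketch, using the BE male counts for 2005 to 2007 from the earlier table.

```python
# Plain-Python sketch of a growth rate between consecutive positions.
def growth_rate_sketch(values):
    """Return (values[i] - values[i - 1]) / values[i - 1] for i >= 1."""
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]

be_males = [4289, 4335, 4291]  # BE, M, 2005..2007
rates = growth_rate_sketch(be_males)
print([round(r, 6) for r in rates])  # [0.010725, -0.01015]
```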
shift¶
The shift method shifts the cells of the array along an axis: the first label of the axis is dropped and each remaining label receives the value of the label before it.
[17]:
pop.shift('time')
[17]:
time sex\nat BE FO
2006 M 4289 1591
2006 F 4661 1584
2007 M 4335 1761
2007 F 4781 1580
2008 M 4291 1806
2008 F 4719 1650
2009 M 4349 1773
2009 F 4731 1680
2010 M 4429 2003
2010 F 4824 1722
2011 M 4582 2085
2011 F 4869 1928
2012 M 4677 2294
2012 F 5015 2104
2013 M 4463 2450
2013 F 4722 2186
2014 M 4610 2604
2014 F 4711 2254
2015 M 4725 2709
2015 F 4788 2349
[18]:
# when shift is applied on an (increasing) time axis,
# it effectively brings "past" data into the future
pop.shift('time').ignore_labels('time') == pop[2005:2014].ignore_labels('time')
[18]:
time* sex\nat BE FO
0 M True True
0 F True True
1 M True True
1 F True True
2 M True True
2 F True True
3 M True True
3 F True True
4 M True True
4 F True True
5 M True True
5 F True True
6 M True True
6 F True True
7 M True True
7 F True True
8 M True True
8 F True True
9 M True True
9 F True True
[19]:
# this is mostly useful when you want to do operations between the past and now
# as an example, here is an alternative implementation of the .diff method seen above:
pop.i[1:] - pop.shift('time')
[19]:
time sex\nat BE FO
2006 M 46 170
2006 F 120 -4
2007 M -44 45
2007 F -62 70
2008 M 58 -33
2008 F 12 30
2009 M 80 230
2009 F 93 42
2010 M 153 82
2010 F 45 206
2011 M 95 209
2011 F 146 176
2012 M -214 156
2012 F -293 82
2013 M 147 154
2013 F -11 68
2014 M 115 105
2014 F 77 95
2015 M 116 182
2015 F 25 149
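The relation between shift and diff can be followed step by step on a single series. The sketch below is plain Python, not LArray: shift_sketch is a hypothetical helper working on a list of labels and a list of values, and the numbers are the BE male counts for 2005 to 2008 from the table above.

```python
# Plain-Python sketch of shifting values along a labelled axis.
def shift_sketch(labels, values, n=1):
    """Drop the first n labels and pair each remaining label with the
    value found n positions earlier (n must be at least 1)."""
    return labels[n:], values[:-n]

years = [2005, 2006, 2007, 2008]
be_males = [4289, 4335, 4291, 4349]

shifted_years, shifted_values = shift_sketch(years, be_males)
print(shifted_years)   # [2006, 2007, 2008]
print(shifted_values)  # [4289, 4335, 4291]

# Subtracting the shifted series from the original (first label dropped)
# gives the first-order difference.
diff_via_shift = [cur - prev for cur, prev in zip(be_males[1:], shifted_values)]
print(diff_via_shift)  # [46, -44, 58]
```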
Misc other interesting functions¶
There are a lot more interesting functions available:
round, floor, ceil, trunc,
exp, log, log10,
sqrt, absolute, nan_to_num, isnan, isinf, inverse,
sin, cos, tan, arcsin, arccos, arctan
and many many more…
Working With Sessions¶
Import the LArray library:
[2]:
from larray import *
Check the version of LArray:
[3]:
from larray import __version__
__version__
[3]:
'0.31-dev'
Before Continuing¶
If you are not yet comfortable with creating, saving and loading sessions, please first read the Creating Sessions and Loading and Dumping Sessions sections of the tutorial before going further.
Exploring Content¶
To get the list of item names of a session, use the names shortcut (note that the list is sorted alphabetically and does not follow the internal order):
[4]:
# load a session representing the results of a demographic model
filepath_hdf = get_example_filepath('population_session.h5')
s_pop = Session(filepath_hdf)
# print the content of the session
print(s_pop.names)
['births', 'country', 'deaths', 'even_years', 'gender', 'odd_years', 'pop', 'time']
To get more information about the items of a session, use the summary method. It provides not only the names of the items but also the list of labels in the case of axes or groups, and the list of axes, the shape and the dtype in the case of arrays:
[5]:
# print the content of the session
print(s_pop.summary())
country: country ['Belgium' 'France' 'Germany'] (3)
gender: gender ['Male' 'Female'] (2)
time: time [2013 2014 2015] (3)
even_years: time['2014'] >> even_years (1)
odd_years: time[2013 2015] >> odd_years (2)
births: country, gender, time (3 x 2 x 3) [int32]
deaths: country, gender, time (3 x 2 x 3) [int32]
pop: country, gender, time (3 x 2 x 3) [int32]
Selecting And Filtering Items¶
To select an item, simply use the syntax <session_var>.<item_name>:
[6]:
s_pop.pop
[6]:
country gender\time 2013 2014 2015
Belgium Male 5472856 5493792 5524068
Belgium Female 5665118 5687048 5713206
France Male 31772665 31936596 32175328
France Female 33827685 34005671 34280951
Germany Male 39380976 39556923 39835457
Germany Female 41142770 41210540 41362080
To return a new session with selected items, use the syntax <session_var>[list, of, item, names]:
[7]:
s_pop_new = s_pop['pop', 'births', 'deaths']
s_pop_new.names
[7]:
['births', 'deaths', 'pop']
The filter method allows you to select all items of the same kind (i.e. all axes, or groups or arrays) or all items with names satisfying a given pattern:
[8]:
# select only arrays of a session
s_pop.filter(kind=LArray)
[8]:
Session(births, deaths, pop)
[9]:
# select all items with a name starting with a letter between a and k
s_pop.filter(pattern='[a-k]*')
[9]:
Session(country, gender, even_years, births, deaths)
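The pattern '[a-k]*' reads like a shell-style wildcard. Assuming that interpretation (this is an assumption, not a statement about how filter is implemented), the selection above can be reproduced with the standard fnmatch module on the item names in their internal order:

```python
# Sketch: shell-style pattern matching on item names with the stdlib.
# Assumption: the pattern syntax behaves like fnmatch wildcards.
from fnmatch import fnmatchcase

# Item names in the internal order shown by summary() above.
names = ['country', 'gender', 'time', 'even_years', 'odd_years',
         'births', 'deaths', 'pop']

selected = [name for name in names if fnmatchcase(name, '[a-k]*')]
print(selected)       # ['country', 'gender', 'even_years', 'births', 'deaths']
print(sorted(names))  # alphabetical order, as returned by the names shortcut
```

The selected names keep the internal order, whereas sorted(names) gives the alphabetical listing.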
Arithmetic Operations On Sessions¶
Session objects accept binary operations with a scalar:
[10]:
# get population, births and deaths in millions
s_pop_div = s_pop / 1e6
s_pop_div.pop
[10]:
country gender\time 2013 2014 2015
Belgium Male 5.472856 5.493792 5.524068
Belgium Female 5.665118 5.687048 5.713206
France Male 31.772665 31.936596 32.175328
France Female 33.827685 34.005671 34.280951
Germany Male 39.380976 39.556923 39.835457
Germany Female 41.14277 41.21054 41.36208
with an array (please read the documentation of the random.choice function first if you don’t know it):
[11]:
from larray import random
random_multiplicator = random.choice([0.98, 1.0, 1.02], p=[0.15, 0.7, 0.15], axes=s_pop.pop.axes)
random_multiplicator
[11]:
country gender\time 2013 2014 2015
Belgium Male 0.98 1.0 1.0
Belgium Female 1.0 1.02 1.02
France Male 0.98 0.98 1.0
France Female 1.0 1.0 1.0
Germany Male 1.0 1.0 1.0
Germany Female 1.0 1.0 1.0
[12]:
# multiply all variables of a session by a common array
s_pop_rand = s_pop * random_multiplicator
s_pop_rand.pop
with another session:
[13]:
# compute the difference between each array of the two sessions
s_diff = s_pop - s_pop_rand
s_diff.births
Applying Functions On All Arrays¶
In addition to the classical arithmetic operations, the apply method can be used to apply the same function to all arrays. This function should take a single item as its argument and return a single value:
[14]:
# force conversion to type int
def as_type_int(array):
    return array.astype(int)
s_pop_rand_int = s_pop_rand.apply(as_type_int)
print('pop array before calling apply:')
print(s_pop_rand.pop)
print()
print('pop array after calling apply:')
print(s_pop_rand_int.pop)
It is possible to pass a function with additional arguments:
[15]:
# passing the LArray.astype method directly with argument
# dtype defined as int
s_pop_rand_int = s_pop_rand.apply(LArray.astype, dtype=int)
print('pop array before calling apply:')
print(s_pop_rand.pop)
print()
print('pop array after calling apply:')
print(s_pop_rand_int.pop)
It is also possible to apply a function to non-LArray objects of a session. Please refer to the documentation of the apply method.
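The idea of applying one function to every item, optionally with extra arguments, can be sketched with a plain dictionary standing in for a session. Everything in this sketch is hypothetical (apply_sketch, scale and the sample data). Unlike the apply method described above, this simplified version makes no distinction between arrays and other kinds of items.

```python
# Plain-Python sketch: apply a function to every item of a dict-based "session".
def apply_sketch(session, func, *args, **kwargs):
    """Return a new dict with func(item, *args, **kwargs) for every item."""
    return {name: func(item, *args, **kwargs) for name, item in session.items()}

def to_int(values):
    """Truncate every value to an integer."""
    return [int(v) for v in values]

def scale(values, factor=1.0):
    """Multiply every value by factor."""
    return [v * factor for v in values]

session = {'births': [10.5, 20.25], 'deaths': [9.5, 18.75]}  # hypothetical data

print(apply_sketch(session, to_int))           # {'births': [10, 20], 'deaths': [9, 18]}
print(apply_sketch(session, scale, factor=2))  # {'births': [21.0, 40.5], 'deaths': [19.0, 37.5]}
```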
Comparing Sessions¶
Being able to compare two sessions may be useful when you want to compare two different models expected to give the same results, or when you have updated your model and want to see the consequences of the recent changes.
Session objects provide two methods to compare two sessions: equals and element_equals.
The equals method returns True if all items from both sessions are identical, False otherwise:
[16]:
# load a session representing the results of a demographic model
filepath_hdf = get_example_filepath('population_session.h5')
s_pop = Session(filepath_hdf)
# create a copy of the original session
s_pop_copy = Session(filepath_hdf)
# 'equals' returns True if the two sessions contain exactly the same items
s_pop.equals(s_pop_copy)
[16]:
True
[17]:
# create a copy of the original session but with the array
# 'births' slightly modified for some labels combination
s_pop_alternative = Session(filepath_hdf)
s_pop_alternative.births *= random_multiplicator
# 'equals' returns False if at least one item differs in values or axes between the two sessions
s_pop.equals(s_pop_alternative)
[17]:
False
[18]:
# add an array to the session
s_pop_new_output = Session(filepath_hdf)
s_pop_new_output.gender_ratio = s_pop_new_output.pop.ratio('gender')
# 'equals' returns False if at least one item is not present in both sessions
s_pop.equals(s_pop_new_output)
[18]:
False
The element_equals method compares the items of two sessions one by one and returns an array of boolean values:
[19]:
# 'element_equals' compares items one by one
s_pop.element_equals(s_pop_copy)
[19]:
name country gender time even_years odd_years births deaths pop
True True True True True True True True
[20]:
# array 'births' is different between the two sessions
s_pop.element_equals(s_pop_alternative)
[20]:
name country gender time even_years odd_years births deaths pop
True True True True True False True True
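The difference between the two comparison methods can be illustrated with plain dictionaries standing in for sessions. This is only a sketch under that assumption: element_equals_sketch and equals_sketch are hypothetical helpers, and the data is made up.

```python
# Plain-Python sketch: per-item comparison versus a single overall answer.
def element_equals_sketch(first, second):
    """Return one boolean per item name: True if present in both and equal."""
    names = list(first) + [name for name in second if name not in first]
    return {name: name in first and name in second and first[name] == second[name]
            for name in names}

def equals_sketch(first, second):
    """True only if every item is present in both sessions and equal."""
    return all(element_equals_sketch(first, second).values())

baseline = {'births': [10, 20], 'pop': [100, 200]}      # hypothetical data
variant = {'births': [10, 21], 'pop': [100, 200]}       # 'births' modified
extended = dict(baseline, gender_ratio=[0.5, 0.5])      # one extra item

print(element_equals_sketch(baseline, variant))  # {'births': False, 'pop': True}
print(equals_sketch(baseline, dict(baseline)))   # True
print(equals_sketch(baseline, variant))          # False
print(equals_sketch(baseline, extended))         # False
```

The per-item result tells you which item differs, while the overall result only tells you that something does.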
The == operator returns a new session of boolean arrays, with items compared element-wise:
[21]:
s_same_values = s_pop == s_pop_alternative
s_same_values.births
[21]:
country gender\time 2013 2014 2015
Belgium Male False True True
Belgium Female True False False
France Male False False True
France Female True True True
Germany Male True True True
Germany Female True True True
This also works for axes and groups:
[22]:
s_same_values.country
[22]:
country Belgium France Germany
True True True
The != operator does the opposite of the == operator:
[23]:
s_different_values = s_pop != s_pop_alternative
s_different_values.births
[23]:
country gender\time 2013 2014 2015
Belgium Male True False False
Belgium Female False True True
France Male True True False
France Female False False False
Germany Male False False False
Germany Female False False False
A more visual way is to use the compare function, which will open the Editor.
compare(s_pop, s_pop_alternative, names=['baseline', 'lower_birth_rate'])
Compatibility with pandas¶
To convert a LArray object into a pandas DataFrame, the method to_frame() can be used:
In [1]: df = pop.to_frame()
In [2]: df
Out[2]:
year 2015 2016 2017
age sex
0-9 F 0.0 0.0 0.0
M 0.0 0.0 0.0
10-17 F 0.0 0.0 0.0
M 0.0 0.0 0.0
18-66 F 0.0 0.0 0.0
M 0.0 0.0 0.0
67+ F 0.0 0.0 0.0
M 0.0 0.0 0.0
Inversely, to convert a DataFrame into a LArray object, use the function aslarray():
In [3]: pop = aslarray(df)
In [4]: pop
Out[4]:
age sex\year 2015 2016 2017
0-9 F 0.0 0.0 0.0
0-9 M 0.0 0.0 0.0
10-17 F 0.0 0.0 0.0
10-17 M 0.0 0.0 0.0
18-66 F 0.0 0.0 0.0
18-66 M 0.0 0.0 0.0
67+ F 0.0 0.0 0.0
67+ M 0.0 0.0 0.0
API Reference¶
Axis¶
Represents an axis.
Exploring¶
Axis.name: Name of the axis. None in the case of an anonymous axis.
Labels of the axis.
Short representation of the labels.
Data type for the axis labels.
Searching¶
Translates a label key to its numerical index counterpart.
Returns a group with all the labels containing the specified substring.
Returns a group with the labels starting with the specified string.
Returns a group with the labels ending with the specified string.
Returns a group with all the labels matching the specified pattern or regular expression.
Modifying/Selecting¶
Returns a group (list or unique element) of label(s) usable in .sum or .filter
Allows defining a subset using positions along the axis instead of labels.
Split axis into several groups of specified length.
Renames the axis.
Returns an axis for a sub-array.
Append new labels to an axis or increase its length in case of wildcard axis.
Return a new axis with new_labels inserted before before or after after.
Returns a new axis with some labels replaced.
Returns a new axis with the labels transformed by func.
Returns axis with the union of this axis labels and other labels.
Returns axis with the (set) intersection of this axis labels and other labels.
Returns axis with the (set) difference of this axis labels and other labels.
Align axis with other object using specified join method.
Splits axis and returns a list of Axis.
Returns a wildcard axis with the same name and length as this axis.
Testing¶
Checks if self is compatible with another axis.
Checks if self is equal to another axis.
Save¶
Writes axis to a HDF file.
Group¶
IGroup¶
Index Group.
Returns group with a different name.
Returns group with a different axis.
Split group into several groups of specified length.
Checks if this group is equal to another group.
Compute position(s) of group.
Returns (set) union of this label group and other.
Returns (set) intersection of this label group and other.
Returns (set) difference of this label group and other.
Returns a group with all the labels containing the specified substring.
Returns a group with the labels starting with the specified string.
Returns a group with the labels ending with the specified string.
Returns a group with all the labels matching the specified pattern or regular expression.
Writes group to a HDF file.
LGroup¶
Label group.
Returns group with a different name.
Returns group with a different axis.
Split group into several groups of specified length.
Checks if this group is equal to another group.
Compute position(s) of group.
Returns (set) union of this label group and other.
Returns (set) intersection of this label group and other.
Returns (set) difference of this label group and other.
Returns a group with all the labels containing the specified substring.
Returns a group with the labels starting with the specified string.
Returns a group with the labels ending with the specified string.
Returns a group with all the labels matching the specified pattern or regular expression.
Writes group to a HDF file.
AxisCollection¶
Returns the list of (raw) names of the axes.
Returns the list of (display) names of the axes.
Returns the list of labels of the axes.
Returns the shape of the collection.
Returns the size of the collection, i.e.
Describes the collection (shape and labels for each axis).
Returns a copy.
Searching¶
Returns list of all axis names.
Returns the index of axis.
Returns the id of an axis.
Returns the list of ids of the axes.
Returns a view of the axes labels.
Modifying/Selecting¶
Returns axis corresponding to key.
Returns axis corresponding to a key, or to position i if the key has no name and key object not found.
Returns all axes from key if present and length 1 wildcard axes otherwise.
Removes and returns an axis.
Appends axis at the end of the collection.
Extends the collection by appending the axes from axes.
Inserts axis before index.
Renames axes of the collection.
Replace one, several or all axes of the collection.
Replaces the labels of one or several axes.
Returns a new collection without some axes.
Combine several axes into one.
Split axes and returns a new collection
Align this axis collection with another.
Testing¶
Tests if input is an Axis object or the name of an axis contained in self.
Checks if axes passed as argument are compatible with those contained in the collection.
LArray¶
Overview¶
A LArray object represents a multidimensional, homogeneous array of fixed-size items with labeled axes.
Array Creation Functions¶
Creates an array by sequentially applying modifications to the array along axis.
Returns test array with given shape.
Returns an array with the specified axes and filled with zeros.
Returns an array with the same axes as array and filled with zeros.
Returns an array with the specified axes and filled with ones.
Returns an array with the same axes as array and filled with ones.
Returns an array with the specified axes and uninitialized (arbitrary) data.
Returns an array with the same axes as array and uninitialized (arbitrary) data.
Returns an array with the specified axes and filled with fill_value.
Returns an array with the same axes and type as input array and filled with fill_value.
Copying¶
Returns a copy of the array.
Copy of the array, cast to a specified type.
Inspecting¶
LArray.data: Data of the array (Numpy ndarray)
LArray.axes: Axes of the array (AxisCollection)
LArray.title: Title of the array (str)
Describes a LArray (metadata + shape and labels for each axis).
Returns the shape of the array as a tuple.
Returns the number of dimensions of the array.
Returns the type of the data of the array.
Returns the number of elements in array.
Returns the number of bytes used to store the array in memory.
Returns the memory consumed by the array in human readable form.
Modifying/Selecting¶
Allows selection of a subset using indices of labels.
Allows selection of arbitrary items in the array based on their N-dimensional label index.
Allows selection of arbitrary items in the array based on their N-dimensional index.
Access the array by index as if it was flat (one dimensional) and all its axes were combined.
Sets a subset of array to value.
Return array without some labels or indices along an axis.
Ignore labels from axes (replace those axes by “wildcard” axes).
Filters the array along the axes given as keyword arguments.
Apply a transformation function to array elements.
Apply a transformation mapping to array elements.
Changing Axes or Labels¶
Replace one, several or all axes of the array.
Renames axes of the array.
Replaces the labels of one or several axes of the array.
Combine several axes into one.
Split axes and returns a new array
Reverse axes of an array
Aggregation Functions¶
Computes the sum of array elements along given axes/groups.
Computes the sum of array elements for the given axes/groups.
Computes the product of array elements along given axes/groups.
Computes the product of array elements for the given axes/groups.
Returns the cumulative sum of array elements along an axis.
Returns the cumulative product of array elements.
Computes the arithmetic mean.
Computes the arithmetic mean.
Computes the arithmetic median.
Computes the arithmetic median.
Computes the unbiased variance.
Computes the unbiased variance.
Computes the sample standard deviation.
Computes the sample standard deviation.
Computes the qth percentile of the data along the specified axis.
Computes the qth percentile of the data for the specified axis.
Returns the range of values (maximum - minimum).
Add aggregated values (sum by default) along each axis.
Returns an array with values given as percent of the total of all values along given axes.
Returns an array with all values divided by the sum of values along given axes.
Returns a LArray with values array / array.sum(axes) where the sum is not 0, 0 otherwise.
Calculates the growth along a given axis.
Descriptive summary statistics, excluding NaN values.
Descriptive summary statistics, excluding NaN values, along axes or for groups.
Sorting¶
Sorts axes of the array.
Sorts values of the array.
Returns the labels that would sort this array.
Returns the indices that would sort this array.
Reshaping/Extending/Reordering¶
Given a list of new axes, changes the shape of the array.
Same as reshape but with an array as input.
Detects and removes “useless” axes (ie axes for which values are constant over the whole axis)
Reorder and/or add new labels in axes.
Reorder axes.
Expands array to target_axes.
Adds an array before self along an axis.
Adds an array to self along an axis.
Adds an array to self along an axis.
Inserts value in array along an axis.
Returns an array that is (NumPy) broadcastable with target.
Align two arrays on their axes with the specified join method.
Testing/Searching¶
Compares self with another array and returns True if they have the same axes and elements, False otherwise.
Compares self with another array element-wise and returns an array of booleans.
Computes whether each element of this array is in test_values.
Returns the indices of the elements that are non-zero.
Test whether all selected elements evaluate to True.
Test whether all selected elements evaluate to True.
Test whether any selected elements evaluate to True.
Test whether any selected elements evaluate to True.
Get minimum of array elements along given axes/groups.
Get minimum of array elements for the given axes/groups.
Get maximum of array elements along given axes/groups.
Get maximum of array elements for the given axes/groups.
Returns labels of the minimum values along a given axis.
Returns indices of the minimum values along a given axis.
Returns labels of the maximum values along a given axis.
Returns indices of the maximum values along a given axis.
Iterating¶
Returns a view on the array labels along axes.
Returns a view on the values of the array along axes.
Returns a (label, value) view of the array along axes.
Operators¶
Matrix multiplication
Miscellaneous¶
Divides array by other, but returns 0.0 where other is 0.
Clip (limit) the values in an array.
Shifts the cells of the array n-times to the right along axis.
Rolls the cells of the array n-times to the right along axis.
Calculates the n-th order discrete difference along a given axis.
Returns unique values (optionally along axes)
Sends the content of the array to clipboard.
Converting to Pandas objects¶
Converts LArray into Pandas Series.
Converts LArray into Pandas DataFrame.
Plotting¶
Plots the data of the array into a graph (window pop-up).
Utility Functions¶
Miscellaneous¶
Return elements, either from x or y, depending on condition.
Element-wise maximum of array elements.
Element-wise minimum of array elements.
Compute the (multiplicative) inverse of a matrix.
One-dimensional linear interpolation.
Returns the discrete, linear convolution of two one-dimensional sequences.
Calculate the absolute value element-wise.
Compute the absolute values element-wise.
Test element-wise for NaN and return result as a boolean array.
Test element-wise for positive or negative infinity.
Replace NaN with zero and infinity with large finite numbers (default behaviour) or with the numbers defined by the user using the nan, posinf and/or neginf keywords.
Return the non-negative square-root of an array, element-wise.
Modified Bessel function of the first kind, order 0.
Return the sinc function.
Rounding¶
Round an array to the given number of decimals.
Return the floor of the input, element-wise.
Return the ceiling of the input, element-wise.
Return the truncated value of the input, element-wise.
Round elements of the array to the nearest integer.
Round to nearest integer towards zero.
Exponents And Logarithms¶
Calculate the exponential of all elements in the input array.
Calculate
Calculate 2**p for all p in the input array.
Natural logarithm, element-wise.
Return the base 10 logarithm of the input array, element-wise.
Base-2 logarithm of x.
Return the natural logarithm of one plus the input array, element-wise.
Logarithm of the sum of exponentiations of the inputs.
Logarithm of the sum of exponentiations of the inputs in base-2.
Trigonometric functions¶
Trigonometric sine, element-wise.
Cosine element-wise.
Compute tangent element-wise.
Inverse sine, element-wise.
Trigonometric inverse cosine, element-wise.
Trigonometric inverse tangent, element-wise.
Given the “legs” of a right triangle, return its hypotenuse.
Element-wise arc tangent of
Convert angles from radians to degrees.
Convert angles from degrees to radians.
Unwrap by changing deltas between values to 2*pi complement.
Hyperbolic functions¶
Hyperbolic sine, element-wise.
Hyperbolic cosine, element-wise.
Compute hyperbolic tangent element-wise.
Inverse hyperbolic sine element-wise.
Inverse hyperbolic cosine, element-wise.
Inverse hyperbolic tangent element-wise.
Complex Numbers¶
Return the angle of the complex argument.
Return the real part of the complex argument.
Return the imaginary part of the complex argument.
Return the complex conjugate, element-wise.
Floating Point Routines¶
Returns element-wise True where signbit is set (less than zero).
Change the sign of x1 to that of x2, element-wise.
Decompose the elements of x into mantissa and twos exponent.
Returns x1 * 2**x2, element-wise.
Metadata¶
An ordered dictionary allowing key-values accessibly using attribute notation (AttributeDict.attribute) instead of key notation (Dict[“key”]).
Input/Output¶
Read¶
Reads csv file and returns an array with the contents.
Reads excel file from sheet name and returns an LArray with the contents
Reads an axis or group or array named key from a HDF5 file in filepath (path+name)
Reads EUROSTAT TSV (tab-separated) file into an array.
Reads sas file and returns an LArray with the contents
Reads Stata .dta file and returns an LArray with the contents
Write¶
Writes array to a csv file.
Writes array in the specified sheet of specified excel workbook.
Writes array to a HDF file.
Writes array to a Stata .dta file.
Dump array as a 2D nested list.
Excel¶
Open an Excel workbook
Excel Workbook.
Returns the names of the Excel sheets.
Saves the Workbook.
Close the workbook in Excel.
Return the Excel instance this workbook is attached to.
ExcelReport¶
Automate the generation of multiple graphs in an Excel file.
Set the path to the directory containing the Excel template files (with ‘.crtx’ extension).
Set a default Excel template file.
Override the default ‘width’ and ‘height’ values for the given kind of item.
Default number of graphs per row.
Add a new empty output sheet.
Returns the names of the output sheets.
Generate the report Excel file.
ReportSheet¶
Represents a sheet dedicated to containing only graphical items (title banners, graphs).
Set the path to the directory containing the Excel template files (with ‘.crtx’ extension).
Set a default Excel template file.
Override the default ‘width’ and ‘height’ values for the given kind of item.
Default number of graphs per row.
Add a title item to the current sheet.
Add a graph item to the current sheet.
Add multiple graph items to the current sheet.
Force a new row of graphs.
Miscellaneous¶
Converts input to an LArray if possible.
Converts a Pandas DataFrame into an LArray.
Converts a Pandas Series into an LArray.
Return the absolute path to an example file if it exists.
Set options for larray in a controlled context.
Return the current options.
Returns an array with specified axes and the combination of corresponding labels as values.
Returns the union of several “value strings” as a list.
Combines several arrays or sessions along an axis.
Extracts a diagonal or constructs a diagonal array.
Returns a 2-D array with ones on the diagonal and zeros elsewhere.
Apply Iterative Proportional Fitting Procedure (also known as bi-proportional fitting in statistics, RAS algorithm in economics) to array a, with target_sums as targets.
Wrap a function using numpy arrays to work with LArray arrays instead.
Returns a sequence as if simultaneously iterating on several arrays.
Returns a sequence as if simultaneously iterating on several arrays as well as the current iteration “key”.
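The iterative proportional fitting procedure mentioned above can be sketched for the two-dimensional case with plain nested lists. This is an illustration of the general algorithm under simplifying assumptions (2D only, strictly positive seed, consistent targets); the function name ipf_2d and its parameters are hypothetical and this is not larray's implementation.

```python
def ipf_2d(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Rescale a 2D table until its row and column sums match the targets."""
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Step 1: scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total:
                table[i] = [v * target / total for v in table[i]]
        # Step 2: scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total:
                for row in table:
                    row[j] *= target / total
        # Columns match after step 2; stop once the rows match as well.
        if all(abs(sum(row) - t) <= tol for row, t in zip(table, row_targets)):
            break
    return table


fitted = ipf_2d([[1, 2], [3, 4]], row_targets=[40, 60], col_targets=[30, 70])
row_sums = [sum(row) for row in fitted]
col_sums = [sum(row[j] for row in fitted) for j in range(2)]
```

Each pass only multiplies cells by positive factors, so zero cells in the seed stay zero and the relative structure of the seed is preserved as far as the targets allow.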
Session¶
Groups several objects together.
Returns a session containing all available arrays (whether they are defined in local or global variables) sorted in alphabetical order.
Returns a session containing all local arrays sorted in alphabetical order.
Returns a session containing all global arrays sorted in alphabetical order.
Load arrays used in the tutorial so that all examples in it can be reproduced.
Exploring¶
Returns the list of names of the objects in the session.
Returns a view on the session’s keys.
Returns a view on the session’s values.
Returns a view of the session’s items ((key, value) pairs).
Returns a summary of the content of the session.
Copying¶
Returns a copy of the session.
Testing¶
Test if each element (group, axis and array) of the current session equals the corresponding element of another session.
Test if all elements (groups, axes and arrays) of the current session are equal to those of another session.
Selecting¶
Returns the object corresponding to the key.
Modifying¶
Adds objects to the current session.
Update the session with the key/value pairs from other or passed keyword arguments, overwriting existing keys.
Returns the object corresponding to the key.
Apply function func on elements of the session and return a new session.
Reorder axes of arrays in session, ignoring missing axes for each array.
Filtering/Cleaning¶
Returns a new session with objects which match some criteria.
Detects and removes “useless” axes (i.e. axes for which values are constant over the whole axis) for all array objects in the session.
Load/Save¶
Load LArray, Axis and Group objects from a file, or several .csv files.
Dumps LArray, Axis and Group objects from the current session to a file.
Dumps LArray, Axis and Group objects from the current session to CSV files.
Dumps LArray, Axis and Group objects from the current session to an Excel file.
Dumps LArray, Axis and Group objects from the current session to an HDF file.
Dumps LArray, Axis and Group objects from the current session to a file using pickle.
Editor¶
Opens a new viewer window.
Opens a new editor window.
Opens a new comparator window, comparing arrays or sessions.
Random¶
Return random integers from low (inclusive) to high (exclusive).
Draw random samples from a normal (Gaussian) distribution.
Draw samples from a uniform distribution.
Randomly permute a sequence along an axis, or return a permuted range.
Generates a random sample from given choices.
Indices and tables¶
Appendix¶
Change log¶
Version 0.31¶
In development.
Syntax changes¶
renamed LArray.old_method_name() to LArray.new_method_name() (closes issue 1).
renamed old_argument_name argument of LArray.method_name() to new_argument_name.
Backward incompatible changes¶
other backward incompatible changes
New features¶
added the ExcelReport class allowing to generate multiple graphs in an Excel file at once (closes issue 676).
Miscellaneous improvements¶
improved something.
Version 0.30¶
Released on 2019-06-27.
Syntax changes¶
stack() axis argument was renamed to axes to reflect the fact that the function can now stack along multiple axes at once (see below).
to accommodate for the “simpler pattern language” now supported for those functions, using a regular expression in Axis.matching() or Group.matching() now requires passing the pattern as an explicit regex keyword argument instead of just the first argument of those methods. For example my_axis.matching('test.*') becomes my_axis.matching(regex='test.*').
LArray.as_table() is deprecated because it duplicated functionality found in LArray.dump(). Please only use LArray.dump() from now on.
renamed a_min and a_max arguments of LArray.clip() to minval and maxval respectively and made them optional (closes issue 747).
Backward incompatible changes¶
modified the behavior of the pattern argument of Session.filter() to actually support patterns instead of only checking if the object names start with the pattern. Special characters include ? for matching any single character and * for matching any number of characters. Closes issue 703.
Warning
If you were using Session.filter, you must add a * to your pattern to keep your code working. For example, my_session.filter('test') must be changed to my_session.filter('test*').
LArray.equals() now returns True for arrays even when axes are in a different order or some axes are missing on either side (but the data is constant over that axis on the other side). Closes issue 237.
Warning
If you were using LArray.equals() and want to keep the old, stricter, behavior, you must add check_axes=True.
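The difference between the old prefix check and the new pattern matching of Session.filter() can be illustrated with shell-style wildcards from the standard fnmatch module. This is only an analogy under the assumption that ? and * behave as described above: fnmatch also supports [seq] character classes, and the helper names below are hypothetical rather than part of larray.

```python
from fnmatch import fnmatchcase

names = ['test', 'test_1', 'testing', 'other']


def filter_by_prefix(names, prefix):
    """Old behavior: keep names starting with the given string."""
    return [name for name in names if name.startswith(prefix)]


def filter_by_pattern(names, pattern):
    """New behavior: the whole name must match the wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


old_result = filter_by_prefix(names, 'test')        # three names
strict_result = filter_by_pattern(names, 'test')    # exact match only
star_result = filter_by_pattern(names, 'test*')     # same as old_result
single_result = filter_by_pattern(names, 'test_?')  # one extra character
```

Appending * to an old-style pattern restores the previous selection, which is what the warning above asks for.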
New features¶
added set_options() and get_options() functions to respectively set and get options for larray. Available options currently include display_precision for controlling the number of decimal digits used when showing floating point numbers, display_maxlines to control the maximum number of lines to use when displaying an array, etc. set_options() can be used either like a normal function to set the options globally or within a with block to set them only temporarily. Closes issue 274.
implemented read_stata() and LArray.to_stata() to read arrays from and write arrays to Stata .dta files.
implemented LArray.isin() method to check whether each value of an array is contained in a list (or array) of values.
implemented LArray.unique() method to compute unique values (or sub-arrays) for an array, optionally along axes.
implemented LArray.apply() method to apply a python function to all values of an array or to all sub-arrays along some axes of an array and return the result. This is an extremely versatile method as it can be used with both aggregating functions and element-wise functions.
implemented LArray.apply_map() method to apply a transformation mapping to array elements. For example, this can be used to transform some numeric codes to labels.
implemented LArray.reverse() method to reverse one or several axes of an array (closes issue 631).
implemented LArray.roll() method to roll the cells of an array n-times to the right along an axis. This is similar to LArray.shift(), except that cells which are pushed “outside of the axis” are reintroduced on the opposite side of the axis instead of being dropped.
implemented Axis.apply() method to transform the labels of an axis with a function and return a new Axis.
added Session.update() method to add and modify items from an existing session by passing either another session or a dict-like object or an iterable object with (key, value) pairs (closes issue 754).
implemented AxisCollection.rename() to rename axes of an AxisCollection, independently of any array.
implemented AxisCollection.set_labels() (closes issue 782).
implemented wrap_elementwise_array_func() function to make a function defined in another library work with LArray arguments instead of with numpy arrays.
implemented LArray.keys(), LArray.values() and LArray.items() methods to respectively loop on an array's labels, values or (key, value) pairs.
implemented zip_array_values() and zip_array_items() to loop respectively on several arrays values or (key, value) pairs.
implemented AxisCollection.iter_labels() to iterate over all (possible combinations of) labels of the axes of the collection.
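The contrast between rolling and shifting described in the list above can be sketched on a plain list standing in for the cells along one axis. The helper names are hypothetical, and how larray handles axis labels in each case is deliberately left out of this sketch.

```python
def roll(cells, n=1):
    """Move cells n positions to the right; cells pushed out re-enter on the left."""
    if not cells:
        return []
    n %= len(cells)
    return cells[-n:] + cells[:-n] if n else list(cells)


def shift_dropping(cells, n=1):
    """Move cells n positions to the right; cells pushed out are dropped.

    Only the surviving cells are returned (assumes 0 <= n <= len(cells)).
    """
    return cells[:len(cells) - n]


cells = [0, 1, 2, 3]
rolled = roll(cells, 1)             # the last cell wraps around to the front
shifted = shift_dropping(cells, 1)  # the last cell is lost
```

Rolling by the full length of the axis returns the original order, whereas shifting by the full length leaves nothing.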
Miscellaneous improvements¶
improved speed of read_hdf() function when reading a stored LArray object dumped with the current and future version of larray. To benefit from the speedup when reading arrays dumped with older versions of larray, please read and re-dump them. Closes issue 563.
allowed to not specify the axes in LArray.set_labels() (closes issue 634):
>>> a = ndtest('nat=BE,FO;sex=M,F')
>>> a
nat\sex  M  F
     BE  0  1
     FO  2  3
>>> a.set_labels({'M': 'Men', 'BE': 'Belgian'})
nat\sex  Men  F
Belgian    0  1
     FO    2  3
LArray.set_labels() can now take functions to transform axes labels (closes issue 536).
>>> arr = ndtest((2, 2))
>>> arr
a\b  b0  b1
 a0   0   1
 a1   2   3
>>> arr.set_labels('a', str.upper)
a\b  b0  b1
 A0   0   1
 A1   2   3
implemented the same “simpler pattern language” in Axis.matching() and Group.matching() as in Session.filter(), in addition to regular expressions (which now require using the regex argument).
stack() can now stack along several axes at once (closes issue 56).
>>> country = Axis('country=BE,FR,DE')
>>> gender = Axis('gender=M,F')
>>> stack({('BE', 'M'): 0,
...        ('BE', 'F'): 1,
...        ('FR', 'M'): 2,
...        ('FR', 'F'): 3,
...        ('DE', 'M'): 4,
...        ('DE', 'F'): 5},
...       (country, gender))
country\gender  M  F
            BE  0  1
            FR  2  3
            DE  4  5
stack() using a dictionary as elements can now use a simple axis name instead of requiring a full axis object. This will print a warning on Python < 3.7 though because the ordering of labels is not guaranteed in that case. Closes issue 755 and issue 581.
stack() using keyword arguments can now use a simple axis name instead of requiring a full axis object, even on Python < 3.6. This will print a warning though because the ordering of labels is not guaranteed in that case.
added password argument to Workbook.save() to allow protecting Excel files with a password.
added option exact to join argument of Axis.align() and LArray.align() methods. Instead of aligning, passing join='exact' to the align method will raise an error when axes are not equal. Closes issue 338.
made Axis.by() and Group.by() return a list of named groups instead of anonymous groups. By default, group names are defined as <start>:<end>. This can be changed via the new template argument:
>>> age = Axis('age=0..6')
>>> age
Axis([0, 1, 2, 3, 4, 5, 6], 'age')
>>> age.by(3)
(age.i[0:3] >> '0:2', age.i[3:6] >> '3:5', age.i[6:7] >> '6')
>>> age.by(3, step=2)
(age.i[0:3] >> '0:2', age.i[2:5] >> '2:4', age.i[4:7] >> '4:6', age.i[6:7] >> '6')
>>> age.by(3, template='{start}-{end}')
(age.i[0:3] >> '0-2', age.i[3:6] >> '3-5', age.i[6:7] >> '6')
Closes issue 669.
allowed to specify an axis by its position when selecting a subset of an array using the string notation:
>>> pop_mouv = ndtest('geo_from=BE,FR,UK;geo_to=BE,FR,UK')
>>> pop_mouv
geo_from\geo_to  BE  FR  UK
             BE   0   1   2
             FR   3   4   5
             UK   6   7   8
>>> pop_mouv['0[BE, UK]']   # equivalent to pop_mouv[pop_mouv.geo_from['BE,UK']]
geo_from\geo_to  BE  FR  UK
             BE   0   1   2
             UK   6   7   8
>>> pop_mouv['1.i[0, 2]']   # equivalent to pop_mouv[pop_mouv.geo_to.i[0, 2]]
geo_from\geo_to  BE  UK
             BE   0   2
             FR   3   5
             UK   6   8
Closes issue 671.
added documentation and examples for where(), maximum() and minimum() functions (closes issue 700)
updated the Working With Sessions section of the tutorial (closes issue 568).
added dtype argument to LArray to set the type of the array explicitly instead of relying on auto-detection.
added dtype argument to stack to set the type of the resulting array explicitly instead of relying on auto-detection.
allowed to pass a single axis or group as axes_to_reindex argument of the LArray.reindex() method (closes issue 712).
LArray.dump() gained a few extra arguments to further customize output:
- axes_names: to specify whether or not the output should contain the axes names (and which)
- maxlines and edgeitems: to dump only the start and end of large arrays
- light: to output axes labels only when they change instead of repeating them on each line
- na_repr: to specify how to represent N/A (NaN) values
substantially improved performance of creating, iterating, and doing a few other operations over larray objects. This solves a few pathological cases of slow operations, especially those involving many small-ish arrays but sadly the overall performance improvement is negligible over most of the real-world models using larray that we tested these changes on.
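The group naming used by Axis.by() and Group.by() in the list above can be sketched with plain lists of labels. The function below is a hypothetical illustration that reproduces the names shown in the age.by() examples; it is not larray's implementation and returns (name, labels) tuples rather than group objects.

```python
def by(labels, length, step=None, template='{start}:{end}'):
    """Split labels into groups of `length` taken every `step` labels."""
    step = length if step is None else step
    groups = []
    for start in range(0, len(labels), step):
        chunk = labels[start:start + length]
        if len(chunk) == 1:
            # A group holding a single label is named after that label.
            name = str(chunk[0])
        else:
            name = template.format(start=chunk[0], end=chunk[-1])
        groups.append((name, chunk))
    return groups


age = list(range(7))
default_names = [name for name, _ in by(age, 3)]
overlapping_names = [name for name, _ in by(age, 3, step=2)]
custom_names = [name for name, _ in by(age, 3, template='{start}-{end}')]
```

With a step smaller than the length, consecutive groups overlap, which is why label 2 appears in both '0:2' and '2:4'.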
Fixes¶
fixed dumping to Excel arrays of “object” dtype containing NaN values using numpy float types (fixes the infamous 65535 bug).
fixed LArray.divnot0() being slow when the divisor has many axes and many zeros (closes issue 705).
fixed maximum length of sheet names (31 characters instead of 30 characters) when adding a new sheet to an Excel Workbook (closes issue 713).
fixed missing documentation of many functions in Utility Functions section of the API Reference (closes issue 698).
fixed arithmetic operations between two sessions returning a nan value for each axis and group (closes issue 725).
fixed dumping sessions with metadata in HDF format (closes issue 702).
fixed minimum version of pandas to install. The minimum version is now 0.20.0.
fixed from_frame for dataframes with non string index names.
fixed creating an LSet from an IGroup with a (single) scalar key
>>> a = Axis('a=a0,a1,a2')
>>> a.i[1].set()
a['a1'].set()
Version 0.29¶
Released on 2018-09-07.
deprecated title attribute of LArray objects and title argument of array creation functions. A title is now considered metadata and must be added as:
>>> # add title at array creation
>>> arr = ndtest((3, 3), meta=[('title', 'array for testing')])
>>> # or after array creation
>>> arr = ndtest((3, 3))
>>> arr.meta.title = 'array for testing'
See below for more information about metadata handling.
renamed LArray.drop_labels() to LArray.ignore_labels() to avoid confusion with the new LArray.drop() method (closes issue 672).
renamed Session.array_equals() to Session.element_equals() because this method now also compares axes and groups in addition to arrays.
renamed Sheet.load() and Range.load() nb_index argument to nb_axes to be consistent with all other input functions (read_*). Sheet and Range are the objects one gets when taking subsets of the excel Workbook objects obtained via open_excel() (closes issue 648).
deprecated the element_equal() function in favor of the LArray.eq() method (closes issue 630) to be consistent with other future methods for operations between two arrays.
renamed nan_equals argument of LArray.equals() and LArray.eq() methods to nans_equal because it is grammatically more correct and is explained more naturally as “whether two nans should be considered equal”.
LArray.insert() pos and axis arguments are deprecated because those were only useful for very specific cases and those can easily be rewritten by using an indices group (axis.i[pos]) for the before argument instead (closes issue 652).
allowed arrays to have metadata (e.g. title, description, authors, …).
Metadata can be added when creating arrays:
>>> # for Python <= 3.5
>>> arr = ndtest((3, 3), meta=[('title', 'array for testing'), ('author', 'John Smith')])
>>> # for Python >= 3.6
>>> arr = ndtest((3, 3), meta=Metadata(title='array for testing', author='John Smith'))
To access all existing metadata, use array.meta, for example:
>>> arr.meta
title: array for testing
author: John Smith
To access some specific existing metadata, use array.meta.<name>, for example:
>>> arr.meta.author
'John Smith'
Updating some existing metadata, or creating new metadata (the metadata is added if there was no metadata using that name) should be done using array.meta.<name> = <value>. For example:
>>> arr.meta.city = 'London'
To remove some metadata, use del array.meta.<name>, for example:
>>> del arr.meta.city
Note
Currently, only the HDF (.h5) file format supports saving and loading array metadata.
Metadata is not kept when actions or methods are applied on an array except for operations modifying the object in-place, such as pop[age < 10] = 0, and when the method copy() is called. Do not add metadata to an array if you know you will apply actions or methods on it before dumping it.
allowed sessions to have metadata. Session metadata is created and accessed using the same syntax as for arrays (session.meta.<name>), for example to add metadata to a session at creation:
>>> # Python <= 3.5
>>> s = Session([('arr1', ndtest(2)), ('arr2', ndtest(3))], meta=[('title', 'my title'), ('author', 'John Smith')])
>>> # Python 3.6+
>>> s = Session(arr1=ndtest(2), arr2=ndtest(3), meta=Metadata(title='my title', author='John Smith'))
Note
Contrary to array metadata, saving and loading session metadata is supported for all current session file formats: Excel, CSV and HDF (.h5)
Metadata is not kept when actions or methods are applied on a session except for operations modifying a specific array, such as: s[‘arr1’] = 0. Do not add metadata to a session if you know you will apply actions or methods on it before dumping it.
Closes issue 640.
implemented LArray.drop() to return an array without some labels or indices along an axis (closes issue 506).
>>> arr1 = ndtest((2, 4))
>>> arr1
a\b  b0  b1  b2  b3
 a0   0   1   2   3
 a1   4   5   6   7
>>> a, b = arr1.axes
Dropping a single label
>>> arr1.drop('b1')
a\b  b0  b2  b3
 a0   0   2   3
 a1   4   6   7
Dropping multiple labels
>>> # arr1.drop('b1,b3')
>>> arr1.drop(['b1', 'b3'])
a\b  b0  b2
 a0   0   2
 a1   4   6
Dropping a slice
>>> # arr1.drop('b1:b3')
>>> arr1.drop(b['b1':'b3'])
a\b  b0
 a0   0
 a1   4
Dropping labels by position requires specifying the axis
>>> # arr1.drop('b.i[1]')
>>> arr1.drop(b.i[1])
a\b  b0  b2  b3
 a0   0   2   3
 a1   4   6   7
added new module to create arrays with values generated randomly following a few different distributions, or shuffle an existing array along an axis:
>>> from larray.random import *
Generate integers between two bounds (0 and 10 in this example)
>>> randint(0, 10, axes='a=a0..a2')
a  a0  a1  a2
    3   6   2
Generate values following a uniform distribution
>>> uniform(axes='a=a0..a2')
a                   a0                  a1                  a2
   0.33293756929238394  0.5331412592583252  0.6748786766763107
Generate values following a normal distribution (\(\mu\) = 1 and \(\sigma\) = 2 in this example)
>>> normal(1, scale=2, axes='a=a0..a2')
a                   a0                 a1                  a2
   -0.9216651561025018  5.119734598931103  4.4467876992838935
Randomly shuffle an existing array along one axis
>>> arr = ndtest((3, 3))
>>> arr
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
 a2   6   7   8
>>> permutation(arr, axis='b')
a\b  b1  b2  b0
 a0   1   2   0
 a1   4   5   3
 a2   7   8   6
Generate values by randomly choosing between specified values (5, 10 and 15 in this example), potentially with a specified probability for each value (respectively a 30%, 50%, 20% probability of occurring in this example).
>>> choice([5, 10, 15], p=[0.3, 0.5, 0.2], axes='a=a0,a1;b=b0..b2')
a\b  b0  b1  b2
 a0  15  10  10
 a1  10   5  10
Same as above with labels and probabilities given as a one dimensional LArray
>>> proba = LArray([0.3, 0.5, 0.2], Axis([5, 10, 15], 'outcome'))
>>> proba
outcome    5   10   15
         0.3  0.5  0.2
>>> choice(p=proba, axes='a=a0,a1;b=b0..b2')
a\b  b0  b1  b2
 a0  10  15   5
 a1  10   5  10
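The weighted choice described above can be sketched with the standard random module, producing a flat list instead of a labelled array. The function name weighted_choice and its size and seed parameters are assumptions made for this illustration; they are not part of larray.random.

```python
import random


def weighted_choice(values, p, size, seed=None):
    """Draw `size` values, each picked with the probability given in `p`."""
    rng = random.Random(seed)  # a private generator keeps the sketch reproducible
    return rng.choices(values, weights=p, k=size)


sample = weighted_choice([5, 10, 15], p=[0.3, 0.5, 0.2], size=6, seed=0)
same_sample = weighted_choice([5, 10, 15], p=[0.3, 0.5, 0.2], size=6, seed=0)
```

Reshaping the six draws into two rows of three would give the same layout as the a\b arrays shown in the examples above.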
made a few useful constants accessible directly from the larray module: nan, inf, pi, e and euler_gamma. Like for any Python functionality, you can choose how to import and use them. For example, for pi:
>>> from larray import *
>>> pi
3.141592653589793
OR
>>> from larray import pi
>>> pi
3.141592653589793
OR
>>> import larray as la
>>> la.pi
3.141592653589793
added Group.equals() method which compares group names, associated axis names and labels between two groups:
>>> a = Axis('a=a0..a3')
>>> a02 = a['a0:a2'] >> 'group_a'
>>> # different group name
>>> a02.equals(a['a0:a2'])
False
>>> # different axis name
>>> other_axis = a.rename('other_name')
>>> a02.equals(other_axis['a0:a2'] >> 'group_a')
False
>>> # different labels
>>> a02.equals(a['a1:a3'] >> 'group_a')
False
completely rewritten the ‘Load And Dump Arrays, Sessions, Axes And Groups’ section of the tutorial (closes issue 645)
saving or loading a session from a file now includes Axis and Group objects in addition to arrays (closes issue 578).
Create a session containing axes, groups and arrays
>>> a, b = Axis("a=a0..a2"), Axis("b=b0..b2") >>> a01 = a['a0,a1'] >> 'a01' >>> arr1, arr2 = ndtest((a, b)), ndtest(a) >>> s = Session([('a', a), ('b', b), ('a01', a01), ('arr1', arr1), ('arr2', arr2)])
Saving a session will save axes, groups and arrays
>>> s.save('session.h5')
Loading a session will load axes, groups and arrays
>>> s2 = s.load('session.h5') >>> s2 Session(arr1, arr2, a, b, a01)
Note
All axes and groups of a session are stored in the same CSV file/Excel sheet/HDF group named respectively __axes__ and __groups__.
vastly improved indexing using arrays (of labels, indices or booleans). Many advanced cases did not work, including when combining several indexing arrays, or when (one of) the indexing array(s) had an axis present in the array.
First let’s create some test axes
>>> a, b, c = ndtest((2, 3, 2)).axes
Then create a test array.
>>> arr = ndtest((a, b))
>>> arr
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
If the key array has an axis not already present in arr (e.g. c), the target axis (a) is replaced by the extra axis (c). This already worked previously.
>>> key = LArray(['a1', 'a0'], c)
>>> key
c  c0  c1
   a1  a0
>>> arr[key]
c\b  b0  b1  b2
 c0   3   4   5
 c1   0   1   2
If the key array has the target axis, the axis stays the same, but the data is reordered (this also worked previously):
>>> key = LArray(['b1', 'b0', 'b2'], b)
>>> key
b  b0  b1  b2
   b1  b0  b2
>>> arr[key]
a\b  b0  b1  b2
 a0   1   0   2
 a1   4   3   5
From here on, the examples shown did not work previously…
Now, if the key contains another axis present in the array (b) which is not the target axis (a), the target axis completely disappears (both axes are replaced by the key axis):
>>> key = LArray(['a0', 'a1', 'a0'], b)
>>> key
b  b0  b1  b2
   a0  a1  a0
>>> arr[key]
b  b0  b1  b2
    0   4   2
If the key has both the target axis (a) and another existing axis (b)
>>> key
a\b  b0  b1  b2
 a0  a0  a1  a0
 a1  a1  a0  a1
>>> arr[key]
a\b  b0  b1  b2
 a0   0   4   2
 a1   3   1   5
If the key has both another existing axis (a) and an extra axis (c)
>>> key
a\c  c0  c1
 a0  b0  b1
 a1  b2  b0
>>> arr[key]
a\c  c0  c1
 a0   0   1
 a1   5   3
It also works if the key has the target axis (a), another existing axis (b) and an extra axis (c), but this is not shown for brevity.
updated Session.summary() so as to display all kinds of objects and allowed to pass a function returning a string representation of an object instead of passing a pre-defined string template (closes issue 608):
>>> axis1 = Axis("a=a0..a2")
>>> group1 = axis1['a0,a1'] >> 'a01'
>>> arr1 = ndtest((2, 2), title='array 1', dtype=np.int64)
>>> arr2 = ndtest(4, title='array 2', dtype=np.int64)
>>> arr3 = ndtest((3, 2), title='array 3', dtype=np.int64)
>>> s = Session([('axis1', axis1), ('group1', group1), ('arr1', arr1), ('arr2', arr2), ('arr3', arr3)])
Using the default template
>>> print(s.summary())
axis1: a ['a0' 'a1' 'a2'] (3)
group1: a['a0', 'a1'] >> a01 (2)
arr1: a, b (2 x 2) [int64]
    array 1
arr2: a (4) [int64]
    array 2
arr3: a, b (3 x 2) [int64]
    array 3
Using a specific template
>>> def print_array(key, array):
...     axes_names = ', '.join(array.axes.display_names)
...     shape = ' x '.join(str(i) for i in array.shape)
...     return "{} -> {} ({})\n  title = {}\n  dtype = {}".format(key, axes_names, shape,
...                                                               array.title, array.dtype)
>>> template = {Axis: "{key} -> {name} [{labels}] ({length})",
...             Group: "{key} -> {name}: {axis_name} {labels} ({length})",
...             LArray: print_array}
>>> print(s.summary(template))
axis1 -> a ['a0' 'a1' 'a2'] (3)
group1 -> a01: a ['a0', 'a1'] (2)
arr1 -> a, b (2 x 2)
  title = array 1
  dtype = int64
arr2 -> a (4)
  title = array 2
  dtype = int64
arr3 -> a, b (3 x 2)
  title = array 3
  dtype = int64
methods Session.equals() and Session.element_equals() now also compare axes and groups in addition to arrays (closes issue 610):
>>> a = Axis('a=a0..a2')
>>> a01 = a['a0,a1'] >> 'a01'
>>> s1 = Session([('a', a), ('a01', a01), ('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))])
>>> s2 = Session([('a', a), ('a01', a01), ('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))])
Identical sessions
>>> s1.element_equals(s2)
name     a   a01  arr1  arr2
      True  True  True  True
Different value(s) between two arrays
>>> s2.arr1['a1'] = 0
>>> s1.element_equals(s2)
name     a   a01   arr1  arr2
      True  True  False  True
Different label(s)
>>> s2.arr2 = ndtest("b=b0,b1; a=a0,a1")
>>> s2.a = Axis('a=a0,a1')
>>> s1.element_equals(s2)
name      a   a01   arr1   arr2
      False  True  False  False
Extra/missing objects
>>> s2.arr3 = ndtest((3, 3))
>>> del s2.a
>>> s1.element_equals(s2)
name      a   a01   arr1   arr2   arr3
      False  True  False  False  False
added arguments wide and value_name to methods LArray.as_table() and LArray.dump() like in LArray.to_excel() and LArray.to_csv() (closes issue 653).
the from_series() function supports Pandas series with a MultiIndex (closes issue 465)
the stack() function supports any array-like object instead of only LArray objects.
>>> stack(a0=[1, 2, 3], a1=[4, 5, 6], axis='a')
{0}*\a  a0  a1
     0   1   4
     1   2   5
     2   3   6
made some operations on Excel Workbooks a bit faster by telling Excel to avoid updating the screen when the Excel instance is not visible anyway. This affects all workbooks opened via open_excel() as well as read_excel() and LArray.to_excel() when using the default xlwings engine.
made the documentation link in Windows start menu version-specific (instead of always pointing to the latest release) so that users do not inadvertently use the latest release syntax when using an older version of larray (closes issue 142).
added menu bar with undo/redo when editing single arrays (as a byproduct of issue 133).
fixed Copy(to Excel)/Paste/Plot in the editor not working for 1D and 2D arrays (closes issue 140).
fixed Excel add-ins not loaded when opening an Excel Workbook by calling the LArray.to_excel() method with no path or via “Copy to Excel (CTRL+E)” in the editor (closes issue 154).
made LArray support Pandas versions >= 0.21 (closes issue 569)
fixed current active Excel Workbook being closed when calling the LArray.to_excel() method on an array with -1 as filepath argument (closes issue 473).
fixed LArray.split_axes() when splitting a single axis and using the names argument (e.g. arr.split_axes('bd', names=('b', 'd'))).
fixed splitting an anonymous axis without specifying the names argument.
>>> combined = ndtest('a0_b0,a0_b1,a0_b2,a1_b0,a1_b1,a1_b2')
>>> combined
{0}  a0_b0  a0_b1  a0_b2  a1_b0  a1_b1  a1_b2
         0      1      2      3      4      5
>>> combined.split_axes(0)
{0}\{1}  b0  b1  b2
     a0   0   1   2
     a1   3   4   5
fixed LArray.combine_axes() with wildcard=True.
fixed taking a subset of an array by giving an index along a specific axis using a string (strings like "axisname.i[pos]").
fixed the editor not working with Python 2 or recent Qt4 versions.
Version 0.28¶
Released on 2018-03-15.
changed behavior of operators session1 == session2 and session1 != session2: returns a session of boolean arrays (closes issue 516):
>>> s1 = Session([('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))]) >>> s2 = Session([('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))]) >>> (s1 == s2).arr1 a a0 a1 True True >>> s2.arr1['a1'] = 0 >>> (s1 == s2).arr1 a a0 a1 True False >>> (s1 != s2).arr1 a a0 a1 False True
made it possible to run the tutorial online (as a Jupyter notebook) by clicking on the launch|binder badge on top of the tutorial web page (closes issue 73)
added methods array_equals and equals to Session object to compare arrays from two sessions. The method array_equals returns a boolean value for each array while the method equals returns a single boolean value (True if all arrays of both sessions are equal, False otherwise):
>>> s1 = Session([('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))])
>>> s2 = Session([('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))])
>>> s1.array_equals(s2)
name  arr1  arr2
      True  True
>>> s1.equals(s2)
True
Different value(s)
>>> s2.arr1['a1'] = 0 >>> s1.array_equals(s2) name arr1 arr2 False True >>> s1.equals(s2) False
Different label(s)
>>> from larray import ndrange >>> s2.arr2 = ndrange("b=b0,b1; a=a0,a1") >>> s1.array_equals(s2) name arr1 arr2 False False >>> s1.equals(s2) False
Extra/missing array(s)
>>> s2.arr3 = ndtest((3, 3)) >>> s1.array_equals(s2) name arr1 arr2 arr3 False False False >>> s1.equals(s2) False
Closes issue 517.
added method equals to LArray object to compare two arrays:
>>> arr1 = ndtest((2, 3)) >>> arr1 a\b b0 b1 b2 a0 0 1 2 a1 3 4 5 >>> arr2 = arr1.copy() >>> arr1.equals(arr2) True >>> arr2['b1'] += 1 >>> arr1.equals(arr2) False >>> arr3 = arr1.set_labels('a', ['x0', 'x1']) >>> arr1.equals(arr3) False
Arrays with nan values
>>> arr1 = ndtest((2, 3), dtype=float) >>> arr1['a1', 'b1'] = nan >>> arr1 a\b b0 b1 b2 a0 0.0 1.0 2.0 a1 3.0 nan 5.0 >>> arr2 = arr1.copy() >>> # By default, an array containing nan values is never equal to another array, >>> # even if that other array also contains nan values at the same positions. >>> # The reason is that a nan value is different from *anything*, including itself. >>> arr1.equals(arr2) False >>> # set flag nan_equal to True to override this behavior >>> arr1.equals(arr2, nan_equal=True) True
This method also includes the arguments rtol (relative tolerance) and atol (absolute tolerance) allowing to test the equality between two arrays within a given relative or absolute tolerance:
>>> arr1 = LArray([6., 8.], "a=a0,a1") >>> arr1 a a0 a1 6.0 8.0 >>> arr2 = LArray([5.999, 8.001], "a=a0,a1") >>> arr2 a a0 a1 5.999 8.001 >>> arr1.equals(arr2) False >>> # equals returns True if abs(array1 - array2) <= (atol + rtol * abs(array2)) >>> arr1.equals(arr2, atol=0.01) True >>> arr1.equals(arr2, rtol=0.01) True
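The tolerance rule quoted in the example above, abs(array1 - array2) <= (atol + rtol * abs(array2)), together with the nan handling can be sketched on plain lists of floats. The function name values_equal and its zero default tolerances are assumptions of this sketch; axes and labels, which LArray.equals() also compares, are ignored here.

```python
import math


def values_equal(x, y, rtol=0.0, atol=0.0, nan_equal=False):
    """True when every pair satisfies abs(a - b) <= atol + rtol * abs(b)."""
    if len(x) != len(y):
        return False
    for a, b in zip(x, y):
        if math.isnan(a) or math.isnan(b):
            # nan differs from everything, itself included, unless nan_equal is set.
            if not (nan_equal and math.isnan(a) and math.isnan(b)):
                return False
        elif abs(a - b) > atol + rtol * abs(b):
            return False
    return True


exact = values_equal([6., 8.], [5.999, 8.001])
with_atol = values_equal([6., 8.], [5.999, 8.001], atol=0.01)
with_rtol = values_equal([6., 8.], [5.999, 8.001], rtol=0.01)
nan = float('nan')
nan_strict = values_equal([3., nan], [3., nan])
nan_lenient = values_equal([3., nan], [3., nan], nan_equal=True)
```

Note that the relative tolerance is scaled by the second operand, so the comparison is not symmetric when rtol is non-zero.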
added Load from Script in the File menu of the editor allowing to load commands from an existing Python file (closes issue 96).
added Edit menu allowing to undo and redo changes of array values by editing cells and removed Apply and Discard buttons. Changes are now kept when switching from an array to another instead of losing them as previously (closes issue 32).
allowed to provide an absolute or relative tolerance value when comparing arrays through the compare function (closes issue 131).
made the editor able to detect and display plot objects stored in tuple, list or arrays. For example, arrays of plot objects are returned when using subplots=True option in calls of plot method:
>>> a = ndtest('sex=M,F; nat=BE,FO; year=2000..2017') >>> # display 4 plots vertically placed (one plot for each pair (sex, nationality)) >>> a.plot(subplots=True) >>> # display 4 plots ordered in a 2 x 2 grid >>> a.plot(subplots=True, layout=(2, 2))
Closes issue 135.
functions local_arrays, global_arrays and arrays return a session excluding arrays starting with an underscore by default. To include them, set the flag include_private to True (closes issue 513):
>>> global_arr1 = ndtest((2, 2))
>>> _global_arr2 = ndtest((3, 3))
>>> def foo():
...     local_arr1 = ndtest(2)
...     _local_arr2 = ndtest(3)
...
...     # exclude arrays starting with '_' by default
...     s = arrays()
...     print(s.names)
...
...     # use flag 'include_private' to include arrays starting with '_'
...     s = arrays(include_private=True)
...     print(s.names)
>>> foo()
['global_arr1', 'local_arr1']
['_global_arr2', '_local_arr2', 'global_arr1', 'local_arr1']
implemented binary operations between sessions and non-session objects (closes issue 514 and issue 515):
>>> s = Session(arr1=ndtest((2, 2)), arr2=ndtest((3, 3))) >>> s.arr1 a\b b0 b1 a0 0 1 a1 2 3 >>> s.arr2 a\b b0 b1 b2 a0 0 1 2 a1 3 4 5 a2 6 7 8
Add a scalar to all arrays
>>> # equivalent to s2 = 3 + s >>> s2 = s + 3 >>> s2.arr1 a\b b0 b1 a0 3 4 a1 5 6 >>> s2.arr2 a\b b0 b1 b2 a0 3 4 5 a1 6 7 8 a2 9 10 11
Apply binary operations between two sessions
>>> sdiff = (s2 - s) / s >>> sdiff.arr1 a\b b0 b1 a0 inf 3.0 a1 1.5 1.0 >>> sdiff.arr2 a\b b0 b1 b2 a0 inf 3.0 1.5 a1 1.0 0.75 0.6 a2 0.5 0.43 0.375
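To illustrate the idea of applying one operation to every array of a session, here is a small pure-Python sketch. MiniSession is a hypothetical stand-in that stores plain lists of numbers; it is not the Session class and only supports addition and subtraction.

```python
import operator


class MiniSession:
    """Hypothetical stand-in for a session: maps names to lists of numbers."""

    def __init__(self, **arrays):
        self.arrays = arrays

    def _apply(self, op, other, reflected=False):
        result = {}
        for name, values in self.arrays.items():
            if isinstance(other, MiniSession):
                # session with session: pair arrays by name
                other_values = other.arrays[name]
            else:
                # session with scalar: repeat the scalar for every cell
                other_values = [other] * len(values)
            if reflected:
                result[name] = [op(o, v) for v, o in zip(values, other_values)]
            else:
                result[name] = [op(v, o) for v, o in zip(values, other_values)]
        return MiniSession(**result)

    def __add__(self, other):
        return self._apply(operator.add, other)

    def __radd__(self, other):
        # called for `scalar + session`
        return self._apply(operator.add, other, reflected=True)

    def __sub__(self, other):
        return self._apply(operator.sub, other)


s = MiniSession(arr1=[0, 1, 2, 3], arr2=[0, 1, 2, 3, 4, 5, 6, 7, 8])
s2 = s + 3
s2_reflected = 3 + s
difference = s2 - s
print(s2.arrays['arr1'])
```

Defining the reflected method is what makes `3 + s` equivalent to `s + 3`.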
added possibility to call the method reindex with a group (closes issue 531):
>>> arr = ndtest((2, 2)) >>> arr a\b b0 b1 a0 0 1 a1 2 3 >>> b = Axis("b=b2..b0") >>> arr.reindex('b', b['b1':]) a\b b1 b0 a0 1 0 a1 3 2
added possibility to call the methods diff and growth_rate with a group (closes issue 532):
>>> data = [[2, 4, 5, 4, 6], [4, 6, 3, 6, 9]]
>>> a = LArray(data, "sex=M,F; year=2016..2020")
>>> a
sex\year  2016  2017  2018  2019  2020
       M     2     4     5     4     6
       F     4     6     3     6     9
>>> a.diff(a.year[2017:])
sex\year  2018  2019  2020
       M     1    -1     2
       F    -3     3     3
>>> a.growth_rate(a.year[2017:])
sex\year  2018  2019  2020
       M  0.25  -0.2   0.5
       F  -0.5   1.0   0.5
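The numbers in this example can be reproduced with plain lists. The sketch below assumes that diff is the difference between consecutive values and growth_rate is that difference divided by the previous value, which is consistent with the output shown; the function names are local helpers, not library calls.

```python
def diff(values):
    """Difference between each value and the previous one."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


def growth_rate(values):
    """Relative change between each value and the previous one."""
    return [(curr - prev) / prev for prev, curr in zip(values, values[1:])]


years = [2016, 2017, 2018, 2019, 2020]
men = [2, 4, 5, 4, 6]
women = [4, 6, 3, 6, 9]

# restrict to the group year[2017:], i.e. labels 2017 to 2020
start = years.index(2017)
men_diff = diff(men[start:])
women_diff = diff(women[start:])
men_growth = growth_rate(men[start:])
women_growth = growth_rate(women[start:])
print(men_diff, men_growth)
```

The first label of the group (2017) only serves as the reference point, which is why the results start at 2018.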
function ndrange has been deprecated in favor of sequence or ndtest. Also, an Axis or a list/tuple/collection of axes can be passed to the ndtest function (closes issue 534):
>>> ndtest("nat=BE,FO;sex=M,F") nat\sex M F BE 0 1 FO 2 3
allowed to pass a group for argument axis of stack function (closes issue 535):
>>> b = Axis('b=b0..b2') >>> stack(b0=ndtest(2), b1=ndtest(2), axis=b[:'b1']) a\b b0 b1 a0 0 0 a1 1 1
renamed argument nb_index of read_csv, read_excel, read_sas, from_lists and from_string functions to nb_axes. The relation between nb_index and nb_axes is given by nb_axes = nb_index + 1.
For a given file ‘arr.csv’ with content
a,b\c,c0,c1
a0,b0,0,1
a0,b1,2,3
a1,b0,4,5
a1,b1,6,7
previous code to read this array such as:
>>> # deprecated
>>> arr = read_csv('arr.csv', nb_index=2)
must be updated as follows:
>>> arr = read_csv('arr.csv', nb_axes=3)
Closes issue 548.
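The relation nb_axes = nb_index + 1 can be read directly from the header of the example file. The following sketch only illustrates that relation on a header string; count_axes is a hypothetical helper and says nothing about how the reading functions detect axes internally.

```python
def count_axes(header_line, sep=','):
    """Return (nb_index, nb_axes) for a wide-format header line.

    Every cell up to and including the one containing a backslash names an
    index column; the name after the backslash is the extra axis spread
    over the remaining columns.
    """
    cells = header_line.split(sep)
    for position, cell in enumerate(cells):
        if '\\' in cell:
            nb_index = position + 1
            return nb_index, nb_index + 1
    raise ValueError("no 'rows\\columns' cell found in header")


nb_index, nb_axes = count_axes('a,b\\c,c0,c1')
print(nb_index, nb_axes)
```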
deprecated nan_equal function in favor of element_equal function. The element_equal function has the same optional arguments as the LArray.equals method but compares two arrays element-wise and returns an array of booleans:
>>> arr1 = LArray([6., np.nan, 8.], "a=a0..a2") >>> arr1 a a0 a1 a2 6.0 nan 8.0 >>> arr2 = LArray([5.999, np.nan, 8.001], "a=a0..a2") >>> arr2 a a0 a1 a2 5.999 nan 8.001 >>> element_equal(arr1, arr2) a a0 a1 a2 False False False >>> element_equal(arr1, arr2, nan_equals=True) a a0 a1 a2 False True False >>> element_equal(arr1, arr2, atol=0.01, nan_equals=True) a a0 a1 a2 True True True >>> element_equal(arr1, arr2, rtol=0.01, nan_equals=True) a a0 a1 a2 True True True
Closes issue 593.
renamed argument transpose by wide in to_csv method.
added argument wide in to_excel method. When argument wide is set to False, the array is exported in “narrow” format, i.e. one column per axis plus one value column:
>>> arr = ndtest((2, 3)) >>> arr a\b b0 b1 b2 a0 0 1 2 a1 3 4 5
Default behavior (wide=True):
>>> arr.to_excel('my_file.xlsx') a\b b0 b1 b2 a0 0 1 2 a1 3 4 5
With wide=False:
>>> arr.to_excel('my_file.xlsx', wide=False)
 a   b  value
a0  b0      0
a0  b1      1
a0  b2      2
a1  b0      3
a1  b1      4
a1  b2      5
Argument transpose has a different purpose than wide and is mainly useful to allow multiple axes as header when exporting arrays with more than 2 dimensions. Closes issue 575 and issue 371.
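The difference between the wide and narrow layouts can be sketched with plain lists. This is an illustration only: to_narrow is a hypothetical helper that assumes the values are given in row-major order, and it does not write any file.

```python
from itertools import product


def to_narrow(axes, values, value_name='value'):
    """Build narrow-format rows: one column per axis plus one value column.

    axes is a list of (name, labels) pairs and values is a flat list in
    row-major order (last axis varying fastest).
    """
    names = [name for name, _ in axes]
    label_lists = [labels for _, labels in axes]
    header = names + [value_name]
    rows = [list(combo) + [value]
            for combo, value in zip(product(*label_lists), values)]
    return [header] + rows


axes = [('a', ['a0', 'a1']), ('b', ['b0', 'b1', 'b2'])]
wide_values = [[0, 1, 2], [3, 4, 5]]
flat_values = [value for row in wide_values for value in row]

narrow = to_narrow(axes, flat_values)
for row in narrow:
    print(*row)
```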
added argument wide to read_csv and read_excel functions. If False, the array to be loaded is assumed to be stored in “narrow” format:
>>> # assuming the array was saved using command: arr.to_excel('my_file.xlsx', wide=False) >>> read_excel('my_file.xlsx', wide=False) a\b b0 b1 b2 a0 0 1 2 a1 3 4 5
Closes issue 574.
added argument name to to_series method allowing to set a name to the Pandas Series returned by the method.
added argument value_name to to_csv and to_excel to change the default name (‘value’) of the column containing the values when the argument wide is set to False:
>>> arr.to_csv('my_file.csv', wide=False, value_name='data')
a,b,data
a0,b0,0
a0,b1,1
a0,b2,2
a1,b0,3
a1,b1,4
a1,b2,5
Closes issue 549.
renamed argument sheetname of read_excel function as sheet (closes issue 587).
Renamed sheet_name of LArray.to_excel to sheet since it can also be an index (closes issue 580).
allowed to create axes with zero padded string labels (closes issue 533):
>>> Axis('zero_padding=01,02,03,10,11,12') Axis(['01', '02', '03', '10', '11', '12'], 'zero_padding')
added a dropdown menu containing recently used files in dialog boxes of Save Command History To Script and Load from Script from File menu.
fixed passing a scalar group from an external axis to get a subset of an array (closes issue 178):
>>> arr = ndtest((3, 2)) >>> arr['a1'] b b0 b1 2 3 >>> alt_a = Axis("alt_a=a1..a2") >>> arr[alt_a['a1']] b b0 b1 2 3 >>> arr[alt_a.i[0]] b b0 b1 2 3
fixed subscripting a string LGroup key (closes issue 437):
>>> axis = Axis("a=a0,a1") >>> axis['a0'][0] 'a'
fixed Axis.union, Axis.intersection and Axis.difference when passed value is a single string (closes issue 489):
>>> a = Axis('a=a0..a2') >>> a.union('a1') Axis(['a0', 'a1', 'a2'], 'a') >>> a.union('a3') Axis(['a0', 'a1', 'a2', 'a3'], 'a') >>> a.union('a1..a3') Axis(['a0', 'a1', 'a2', 'a3'], 'a') >>> a.intersection('a1..a3') Axis(['a1', 'a2'], 'a') >>> a.difference('a1..a3') Axis(['a0'], 'a')
fixed to_excel applied on >= 2D arrays using transpose=True (closes issue 579):
>>> arr = ndtest((2, 3)) >>> arr.to_excel('my_file.xlsx', transpose=True) b\a a0 a1 b0 0 3 b1 1 4 b2 2 5
fixed aggregation on arrays containing zero padded string labels (closes issue 522):
>>> arr = ndtest('zero_padding=01,02,03,10,11,12') >>> arr zero_padding 01 02 03 10 11 12 0 1 2 3 4 5 >>> arr.sum('01,02,03 >> 01_03; 10') zero_padding 01_03 10 3 3
Version 0.27¶
Released on 2017-11-30.
renamed Axis.translate to Axis.index (closes issue 479).
deprecated reverse argument of sort_values and sort_axes methods in favor of ascending argument (defaults to True). Closes issue 540.
labels are checked during array subset assignment (closes issue 269):
>>> arr = ndtest(4) >>> arr a a0 a1 a2 a3 0 1 2 3 >>> arr['a0,a1'] = arr['a2,a3'] ValueError: incompatible axes: Axis(['a0', 'a1'], 'a') vs Axis(['a2', 'a3'], 'a')
previous behavior can be recovered through drop_labels or by changing labels via set_labels or set_axes:
>>> arr['a0,a1'] = arr['a2,a3'].drop_labels('a') >>> arr['a0,a1'] = arr['a2,a3'].set_labels('a', {'a2': 'a0', 'a3': 'a1'})
from_frame parse_header argument defaults to False instead of True.
implemented Axis.insert and LArray.insert to add values at a given position of an axis (closes issue 54).
>>> arr1 = ndtest((2, 3))
>>> arr1
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
>>> arr1.insert(42, before='b1', label='b0.5')
a\b  b0  b0.5  b1  b2
 a0   0    42   1   2
 a1   3    42   4   5
insert an array
>>> arr2 = ndtest(2)
>>> arr2
a  a0  a1
    0   1
>>> arr1.insert(arr2, after='b0', label='b0.5')
a\b  b0  b0.5  b1  b2
 a0   0     0   1   2
 a1   3     1   4   5
insert an array which already has the axis
>>> arr3 = ndrange('a=a0,a1;b=b0.1,b0.2') + 42
>>> arr3
a\b  b0.1  b0.2
 a0    42    43
 a1    44    45
>>> arr1.insert(arr3, before='b1')
a\b  b0  b0.1  b0.2  b1  b2
 a0   0    42    43   1   2
 a1   3    44    45   4   5
added new items in the Help menu of the editor:
Report Issue…: to report an issue on the Github project website.
Users Discussion…: redirect to the LArray Users Google Group (you need to be registered to participate).
New Releases And Announces Mailing List…: redirect to the LArray Announce mailing list.
About: give information about the editor and the versions of packages currently installed on your computer (closes issue 88).
added Save Command History To Script in the File menu of the editor allowing to save executed commands in a new or existing Python file.
added possibility to show only rows with differences when comparing arrays or sessions through the compare function in the editor (closes issue 102).
added ascending argument to methods indicesofsorted and labelsofsorted. Values are sorted in ascending order by default. Set to False to sort values in descending order:
>>> arr = LArray([[1, 5], [3, 2], [0, 4]], "nat=BE,FR,IT; sex=M,F")
>>> arr
nat\sex  M  F
     BE  1  5
     FR  3  2
     IT  0  4
>>> arr.indicesofsorted("nat", ascending=False)
nat\sex  M  F
      0  1  0
      1  0  2
      2  2  1
>>> arr.labelsofsorted("nat", ascending=False)
nat\sex   M   F
      0  FR  BE
      1  BE  IT
      2  IT  FR
Closes issue 490.
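The results above can be verified column by column with the standard library. This is a sketch of the idea (sorting positions by value), with hypothetical helper names; it is not how the methods are implemented.

```python
def indices_of_sorted(values, ascending=True):
    """Positions of the values, ordered by the value they point to."""
    return sorted(range(len(values)), key=lambda i: values[i],
                  reverse=not ascending)


def labels_of_sorted(labels, values, ascending=True):
    """Same as indices_of_sorted, but returns labels instead of positions."""
    return [labels[i] for i in indices_of_sorted(values, ascending)]


nat = ['BE', 'FR', 'IT']
men = [1, 3, 0]      # column M of the array above
women = [5, 2, 4]    # column F of the array above

men_indices = indices_of_sorted(men, ascending=False)
women_indices = indices_of_sorted(women, ascending=False)
men_labels = labels_of_sorted(nat, men, ascending=False)
women_labels = labels_of_sorted(nat, women, ascending=False)
print(men_indices, men_labels)
```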
allowed to sort values of an array along an axis (closes issue 225):
>>> a = LArray([[10, 2, 4], [3, 7, 1]], "sex=M,F; nat=EU,FO,BE") >>> a sex\nat EU FO BE M 10 2 4 F 3 7 1 >>> a.sort_values(axis='sex') sex*\nat EU FO BE 0 3 2 1 1 10 7 4 >>> a.sort_values(axis='nat') sex\nat* 0 1 2 M 2 4 10 F 1 3 7
method LArray.sort_values can be called without argument (closes issue 478):
>>> arr = LArray([0, 1, 6, 3, -1], "a=a0..a4") >>> arr a a0 a1 a2 a3 a4 0 1 6 3 -1 >>> arr.sort_values() a a4 a0 a1 a3 a2 -1 0 1 3 6
If the array has more than one dimension, axes are combined together:
>>> a = LArray([[10, 2, 4], [3, 7, 1]], "sex=M,F; nat=EU,FO,BE")
>>> a
sex\nat  EU  FO  BE
      M  10   2   4
      F   3   7   1
>>> a.sort_values()
sex_nat  F_BE  M_FO  F_EU  M_BE  F_FO  M_EU
            1     2     3     4     7    10
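Combining axes before sorting amounts to flattening the array into (label, value) pairs. The following sketch reproduces the output above with plain lists; the underscore separator is taken from the example and the variable names are illustrative.

```python
from itertools import product

sex = ['M', 'F']
nat = ['EU', 'FO', 'BE']
data = [[10, 2, 4],
        [3, 7, 1]]

# one (combined label, value) pair per cell of the 2D array
cells = [
    (f"{s}_{n}", data[i][j])
    for (i, s), (j, n) in product(enumerate(sex), enumerate(nat))
]

# sort the flattened cells by value
sorted_cells = sorted(cells, key=lambda cell: cell[1])
sorted_labels = [label for label, _ in sorted_cells]
sorted_values = [value for _, value in sorted_cells]
print(sorted_labels)
print(sorted_values)
```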
when appending/prepending/extending an array, both the original array and the added values will be converted to a data type which can hold both without loss of information. It used to convert the added values to the type of the original array. For example, given an array of integers like:
>>> arr = ndtest(3)
>>> arr
a  a0  a1  a2
    0   1   2
Trying to add a floating point number to that array used to result in:
>>> arr.append('a', 2.5, 'a3') a a0 a1 a2 a3 0 1 2 2
Now it will result in:
>>> arr.append('a', 2.5, 'a3') a a0 a1 a2 a3 0.0 1.0 2.0 2.5
made the editor more responsive when switching to or changing the filter of large arrays (closes issue 93).
added support for coloring numeric values for object arrays (e.g. arrays containing both strings and numbers).
documentation links in the Help menu of the editor point to the version of the documentation corresponding to the installed version of larray (closes issue 105).
fixed array values being editable in view() (instead of only in edit()).
Version 0.26.1¶
Released on 2017-10-25.
Made handling Excel sheets with many blank columns/rows after the data much faster (but still slower than sheets without such blank cells).
fixed reading from and writing to Excel sheets with 16384 columns or 1048576 rows (Excel’s maximum).
fixed LArray.split_axes using a custom separator and not using sort=True or when the split labels are ambiguous with labels from other axes (closes issue 485).
fixed reading 1D arrays with non-string labels (closes issue 495).
fixed read_csv(sort_columns=True) for 1D arrays (closes issue 497).
Version 0.26¶
Released on 2017-10-13.
renamed special variable x to X to let users define an x variable in their code without breaking all subsequent code using that special variable (closes issue 167).
renamed Axis.startswith, endswith and matches to startingwith, endingwith and matching to avoid a possible confusion with str.startswith and endswith which return booleans (closes issue 432).
renamed na argument of read_csv, read_excel, read_hdf and read_sas functions to fill_value to avoid confusion as to what the argument does and to be consistent with reindex and align (closes issue 394).
renamed split_axis to split_axes to reflect the fact that it can now split several axes at once (see below).
renamed sort_axis to sort_axes to reflect the fact that it can sort multiple axes at once (and does so by default).
renamed several methods with more explicit names (closes issue 50):
argmax, argmin, argsort to labelofmax, labelofmin, labelsofsorted
posargmax, posargmin, posargsort to indexofmax, indexofmin, indicesofsorted
renamed PGroup to IGroup to be consistent with other methods, especially the .i methods on axes and arrays (I is for Index – P was for Position).
getting a subset using a boolean selection returns an array with labels combined with an underscore by default (for consistency with split_axes and combine_axes). Closes issue 376:
>>> arr = ndtest((2, 2)) >>> arr a\b b0 b1 a0 0 1 a1 2 3 >>> arr[arr < 3] a_b a0_b0 a0_b1 a1_b0 0 1 2
added global_arrays() and arrays() functions to complement the local_arrays() function. They return a Session containing respectively all arrays defined in global variables and all available arrays (whether they are defined in local or global variables).
When used outside of a function, these three functions should have the same results, but inside a function local_arrays() will return only arrays local to the function, global_arrays() will return only arrays defined globally and arrays() will return arrays defined either locally or globally. Closes issue 416.
a * symbol is appended to the window title when unsaved changes are detected in the viewer (closes issue 21).
implemented Axis.containing to create a Group with all labels of an axis containing some substring (closes issue 402).
>>> people = Axis(['Bruce Wayne', 'Bruce Willis', 'Arthur Dent'], 'people') >>> people.containing('Will') people['Bruce Willis']
implemented Group.containing, startingwith, endingwith and matching to create a group with all labels of a group matching some criterion (closes issue 108).
>>> group = people.startingwith('Bru') >>> group people['Bruce Wayne', 'Bruce Willis'] >>> group.containing('Will') people['Bruce Willis']
implemented nan_equal() function to create an array of booleans telling whether each cell of the first array is equal to the corresponding cell in the other array, even in the presence of NaN.
>>> arr1 = ndtest(3, dtype=float) >>> arr1['a1'] = nan >>> arr1 a a0 a1 a2 0.0 nan 2.0 >>> arr2 = arr1.copy() >>> arr1 == arr2 a a0 a1 a2 True False True >>> nan_equal(arr1, arr2) a a0 a1 a2 True True True
implemented from_frame() to convert a Pandas DataFrame to an array:
>>> df = ndtest((2, 2, 2)).to_frame()
>>> df
c      c0  c1
a  b
a0 b0   0   1
   b1   2   3
a1 b0   4   5
   b1   6   7
>>> from_frame(df)
 a  b\c  c0  c1
a0   b0   0   1
a0   b1   2   3
a1   b0   4   5
a1   b1   6   7
implemented Axis.split to split an axis into several.
>>> a_b = Axis('a_b=a0_b0,a0_b1,a0_b2,a1_b0,a1_b1,a1_b2')
>>> a_b.split()
[Axis(['a0', 'a1'], 'a'), Axis(['b0', 'b1', 'b2'], 'b')]
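Splitting a combined axis comes down to splitting both the axis name and each label on the separator, then keeping the unique parts in order of appearance. A minimal sketch with plain strings, where split_labels is a hypothetical helper and the separator defaults to an underscore as in the example:

```python
def split_labels(name, labels, sep='_'):
    """Split a combined axis into (name, labels) pairs, one per part."""
    names = name.split(sep)
    parts = [[] for _ in names]
    for label in labels:
        for collected, part in zip(parts, label.split(sep)):
            # keep each part once, in order of first appearance
            if part not in collected:
                collected.append(part)
    return list(zip(names, parts))


combined_labels = ['a0_b0', 'a0_b1', 'a0_b2', 'a1_b0', 'a1_b1', 'a1_b2']
split_result = split_labels('a_b', combined_labels)
print(split_result)
```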
added the possibility to load the example dataset used in the tutorial via the File > Load Example menu in the viewer.
view() and edit() without argument now display global arrays in addition to local ones (closes issue 54).
using the mouse scrollwheel on filter combo boxes will switch to the previous/next label.
implemented a combobox to choose which color gradient to use and provide a few gradients.
inverted background colors in the viewer (red for low values and blue for high values). Closes issue 18.
allowed to pass an array of labels as new_axis argument to reindex method (closes issue 384):
>>> arr = ndrange('a=v0..v1;b=v0..v2') >>> arr a\b v0 v1 v2 v0 0 1 2 v1 3 4 5 >>> arr.reindex('a', arr.b.labels) a\b v0 v1 v2 v0 0 1 2 v1 3 4 5 v2 nan nan nan
allowed to call the reindex method using a differently named axis for labels (closes issue 386):
>>> arr = ndrange('a=v0..v1;b=v0..v2') >>> arr a\b v0 v1 v2 v0 0 1 2 v1 3 4 5 >>> arr.reindex('a', arr.b) a\b v0 v1 v2 v0 0 1 2 v1 3 4 5 v2 nan nan nan
arguments fill_value, sort_rows and sort_columns of read_excel function are also supported by the default xlwings engine (closes issue 393).
allowed to pass a label or group as sheet_name argument of the method to_excel or to a Workbook (open_excel). Same for key argument of the method to_hdf. Closes issue 328.
>>> arr = ndtest((4, 4, 4))
>>> # iterate over labels of a given axis
>>> with open_excel('my_file.xlsx') as wb:
...     for label in arr.a:
...         wb[label] = arr[label].dump()
...     wb.save()
>>> for label in arr.a:
...     arr[label].to_hdf('my_file.h5', label)
>>> # create and use a group >>> even = arr.a['a0,a2'] >> 'even' >>> arr[even].to_excel('my_file.xlsx', even) >>> arr[even].to_hdf('my_file.h5', even)
>>> # special characters : \ / ? * [ or ] in labels or groups are replaced by an _ when exporting to excel >>> # sheet names cannot exceed 31 characters >>> g = arr.a['a1,a3,a4'] >> '?name:with*special\/[char]' >>> arr[g].to_excel('my_file.xlsx', g) >>> print(open_excel('my_file.xlsx').sheet_names()) ['_name_with_special___char_'] >>> # special characters \ or / in labels or groups are replaced by an _ when exporting to HDF file
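The replacement rule for Excel sheet names described in the comment can be sketched with a regular expression. sanitize_sheet_name is a hypothetical helper covering only the character replacement; it does not truncate names to the 31-character limit.

```python
import re

# characters listed in the comment above: : \ / ? * [ ]
_FORBIDDEN = re.compile(r'[:\\/?*\[\]]')


def sanitize_sheet_name(name):
    """Replace each forbidden character by an underscore."""
    return _FORBIDDEN.sub('_', name)


# raw string so that the backslash is kept as a literal character
raw_name = r'?name:with*special\/[char]'
clean_name = sanitize_sheet_name(raw_name)
print(clean_name)
```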
allowed to pass a Group to read_excel/read_hdf as sheetname/key argument (closes issue 439).
>>> a, b, c = arr.a, arr.b, arr.c
>>> # For Excel >>> new_from_excel = zeros((a, b, c), dtype=int) >>> for label in a: ... new_from_excel[label] = read_excel('my_file.xlsx', label) >>> # But, to avoid loading the file in Excel repeatedly (which is very inefficient), >>> # this particular example should rather be written like this: >>> new_from_excel = zeros((a, b, c), dtype=int) >>> with open_excel('my_file.xlsx') as wb: ... for label in a: ... new_from_excel[label] = wb[label].load()
>>> # For HDF >>> new_from_hdf = zeros((a, b, c), dtype=int) >>> for label in a: ... new_from_hdf[label] = read_hdf('my_file.h5', label)
allowed setting the name of a Group using another Group or Axis (closes issue 341):
>>> arr = ndrange('axis=a,a0..a3,b,b0..b3,c,c0..c3') >>> arr axis a a0 a1 a2 a3 b b0 b1 b2 b3 c c0 c1 c2 c3 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 >>> # matches('^.$') will select labels with only one character: 'a', 'b' and 'c' >>> groups = tuple(arr.axis.startswith(code) >> code for code in arr.axis.matches('^.$')) >>> groups (axis['a', 'a0', 'a1', 'a2', 'a3'] >> 'a', axis['b', 'b0', 'b1', 'b2', 'b3'] >> 'b', axis['c', 'c0', 'c1', 'c2', 'c3'] >> 'c') >>> arr.sum(groups) axis a b c 10 35 60
allowed to test if an array contains a label using the in operator (closes issue 343):
>>> arr = ndrange('age=0..99;sex=M,F') >>> 'M' in arr True >>> 'Male' in arr False >>> # this can be useful for example in an 'if' statement >>> if 102 not in arr: ... # with 'reindex', we extend 'age' axis to 102 ... arr = arr.reindex('age', Axis('age=0..102'), fill_value=0) >>> arr.info 103 x 2 age [103]: 0 1 2 ... 100 101 102 sex [2]: 'M' 'F'
allowed to create a group on an axis using labels of another axis (closes issue 362):
>>> year = Axis('year=2000..2017') >>> even_year = Axis(range(2000, 2017, 2), 'even_year') >>> group_even_year = year[even_year] >>> group_even_year year[2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016]
split_axes (formerly split_axis) now allows to split several axes at once (closes issue 366):
>>> combined = ndrange('a_b = a0_b0..a1_b1; c_d = c0_d0..c1_d1') >>> combined a_b\c_d c0_d0 c0_d1 c1_d0 c1_d1 a0_b0 0 1 2 3 a0_b1 4 5 6 7 a1_b0 8 9 10 11 a1_b1 12 13 14 15 >>> combined.split_axes(['a_b', 'c_d']) a b c\d d0 d1 a0 b0 c0 0 1 a0 b0 c1 2 3 a0 b1 c0 4 5 a0 b1 c1 6 7 a1 b0 c0 8 9 a1 b0 c1 10 11 a1 b1 c0 12 13 a1 b1 c1 14 15 >>> combined.split_axes({'a_b': ('A', 'B'), 'c_d': ('C', 'D')}) A B C\D d0 d1 a0 b0 c0 0 1 a0 b0 c1 2 3 a0 b1 c0 4 5 a0 b1 c1 6 7 a1 b0 c0 8 9 a1 b0 c1 10 11 a1 b1 c0 12 13 a1 b1 c1 14 15
argument axes of split_axes has become optional: defaults to all axes whose name contains the specified delimiter (closes issue 365):
>>> combined = ndrange('a_b = a0_b0..a1_b1; c_d = c0_d0..c1_d1') >>> combined a_b\c_d c0_d0 c0_d1 c1_d0 c1_d1 a0_b0 0 1 2 3 a0_b1 4 5 6 7 a1_b0 8 9 10 11 a1_b1 12 13 14 15 >>> combined.split_axes() a b c\d d0 d1 a0 b0 c0 0 1 a0 b0 c1 2 3 a0 b1 c0 4 5 a0 b1 c1 6 7 a1 b0 c0 8 9 a1 b0 c1 10 11 a1 b1 c0 12 13 a1 b1 c1 14 15
allowed to perform several axes combinations at once with the combine_axes() method (closes issue 382):
>>> arr = ndtest((2, 2, 2, 2)) >>> arr a b c\d d0 d1 a0 b0 c0 0 1 a0 b0 c1 2 3 a0 b1 c0 4 5 a0 b1 c1 6 7 a1 b0 c0 8 9 a1 b0 c1 10 11 a1 b1 c0 12 13 a1 b1 c1 14 15 >>> arr.combine_axes([('a', 'c'), ('b', 'd')]) a_c\b_d b0_d0 b0_d1 b1_d0 b1_d1 a0_c0 0 1 4 5 a0_c1 2 3 6 7 a1_c0 8 9 12 13 a1_c1 10 11 14 15 >>> # set output axes names by passing a dictionary >>> arr.combine_axes({('a', 'c'): 'ac', ('b', 'd'): 'bd'}) ac\bd b0_d0 b0_d1 b1_d0 b1_d1 a0_c0 0 1 4 5 a0_c1 2 3 6 7 a1_c0 8 9 12 13 a1_c1 10 11 14 15
allowed to use keyword arguments in set_labels (closes issue 383):
>>> a = ndrange('nat=BE,FO;sex=M,F') >>> a nat\sex M F BE 0 1 FO 2 3 >>> a.set_labels(sex='Men,Women', nat='Belgian,Foreigner') nat\sex Men Women Belgian 0 1 Foreigner 2 3
allowed passing an axis to set_labels as ‘labels’ argument (closes issue 408).
added data type (dtype) to array.info (closes issue 454):
>>> arr = ndtest((2, 2), dtype=float) >>> arr a\b b0 b1 a0 0.0 1.0 a1 2.0 3.0 >>> arr.info 2 x 2 a [2]: 'a0' 'a1' b [2]: 'b0' 'b1' dtype: float64
To create a 1D array using from_string() and the default separator ” “, a tabulation character \t (instead of - previously) must be added in front of the data line:
>>> from_string('''sex M F
...                \t 0 1''')
sex  M  F
     0  1
viewer window title also includes the dtype of the current displayed array (closes issue 85)
viewer window title uses only the file name instead of the entire file path as it made titles too long in some cases.
when editing .csv files, the viewer window title will be “directory\fname.csv - axes_info” instead of having the file name repeated as before (“dir\fname.csv - fname: axes_info”).
the viewer will not update digits/scientific notation nor colors when the filter changes, so that numbers are more easily comparable when quickly changing the filter, especially using the scrollwheel on filter boxes.
NaN values display as grey in the viewer so that they stand out more.
compare() will color values depending on relative difference instead of absolute difference as this is usually more useful.
compare(sessions) uses nan_equal to compare arrays so that identical arrays are not marked different when they contain NaN values.
changed compare() “stacked axis” names: arrays -> array and sessions -> session because that reads a bit more naturally.
fixed array creation with axis(es) given as string containing only one label (axis name and label were inverted).
fixed reading an array from a CSV or Excel file when the columns axis is not explicitly named (via \). For example, let’s say we want to read a CSV file ‘pop.csv’ with the following content (indented for clarity)
sex, 2015, 2016
  F,   11,   13
  M,   12,   10
The result of function read_csv is:
>>> pop = read_csv('pop.csv') >>> pop sex\{1} 2015 2016 F 11 13 M 12 10
Closes issue 372.
fixed converting a 1xN Pandas DataFrame to an array using aslarray (closes issue 427):
>>> df = pd.DataFrame([[1, 2, 3]], index=['a0'], columns=['b0', 'b1', 'b2']) >>> df b0 b1 b2 a0 1 2 3 >>> aslarray(df) {0}\{1} b0 b1 b2 a0 1 2 3
>>> # setting name to index and columns >>> df.index.name = 'a' >>> df.columns.name = 'b' >>> df b b0 b1 b2 a a0 1 2 3 >>> aslarray(df) a\b b0 b1 b2 a0 1 2 3
fixed original file being deleted when trying to overwrite a file via Session.save or open_excel failed (closes issue 441)
fixed loading arrays from Excel sheets containing blank cells below or right of the array to read (closes issue 443)
fixed unary and binary operations between sessions failing entirely when the operation failed/was invalid on any array. Now the result will be nan for that array but the operation will carry on for other arrays.
fixed stacking sessions failing entirely when the stacking failed on any array. Now the result will be nan for that array but the operation will carry on for other arrays.
fixed stacking arrays with anonymous axes.
fixed applying split_axes on an array with labels of type ‘Object’ (could happen when an array is read from a file).
fixed background color in the viewer when using filters in the compare() dialog (closes issue 66)
fixed autoresize of columns by double clicking between column headers (closes issue 43)
fixed representing a 0D array (scalar) in the viewer (closes issue 71)
fixed viewer not displaying an error message when saving or loading a file failed (closes issue 75)
fixed array.split_axis when the combined axis does not contain all the combination of labels resulting from the split (closes issue 369).
fixed array.split_axis when combined labels are not sorted by the first part then second part (closes issue 364).
fixed the names of variables created when opening .csv files in the editor: they are now named using only the filename without extension (instead of the full path of the file, which made them almost useless). Closes issue 90.
fixed deleting a variable (using the del key in the list) not marking the session/file as being modified.
fixed the link to the tutorial (Help->Online Tutorial) (closes issue 92).
fixed inplace modifications of arrays in the console (via array[xxx] = value) not updating the view (closes issue 94).
fixed background color in compare() being wrong after changing axes order by drag-and-dropping them (closes issue 89).
fixed the whole array/compare being the same color in the presence of -inf or +inf in the array.
Version 0.25.2¶
Released on 2017-09-06.
Excel Workbooks opened with open_excel(visible=False) will use the global Excel instance by default and those using visible=True will use a new Excel instance by default (closes issue 405).
fixed view() which did not show any array (closes issue 57).
fixed exceptions in the viewer crashing it when a Qt app was created (e.g. from a plot) before the viewer was started (closes issue 58).
fixed compare() arrays names not being determined correctly (closes issue 61).
fixed filters and title not being updated when displaying array created via the console (closes issue 55).
fixed array grid not being updated when selecting a variable when no variable was selected (closes issue 56).
fixed copying or plotting multiple rows in the editor when they were selected via drag and drop on headers (closes issue 59).
fixed digits not being automatically updated when changing filters.
Version 0.25.1¶
Released on 2017-09-04.
Deprecated methods display a warning message when they are still used (replaced DeprecationWarning by FutureWarning). Closes issue 310.
updated documentation of method with_total (closes issue 89).
trying to set values of a subset by passing an array with incompatible axes displays a better error message (closes issue 268).
fixed error raised in viewer when switching between arrays when a filter was set.
fixed displaying empty array when starting the viewer or a new session in it.
fixed Excel instance created via to_excel() and open_excel() without any filename being closed at the end of the Python program (closes issue 390).
fixed the view(), edit() and compare() functions not being available in the viewer console.
fixed row and column resizing by double clicking on the edge of a header cell.
fixed New and Open in the menu File of the viewer when IPython console is not available.
fixed getting a subset of an array by mixing boolean filters and other filters (closes issue 246):
>>> arr = ndrange('a=a0..a2;b=0..3') >>> arr a\b 0 1 2 3 a0 0 1 2 3 a1 4 5 6 7 a2 8 9 10 11 >>> arr['a0,a2', x.b < 2] a\b 0 1 a0 0 1 a2 8 9
Warning: when mixed with other filters, boolean filters are limited to one dimension.
fixed setting array values using array.points[key] = value when value is an LArray (closes issue 368).
fixed using syntax ‘int..int’ in a selection (closes issue 350):
>>> arr = ndrange('a=2017..2012') >>> arr a 2017 2016 2015 2014 2013 2012 0 1 2 3 4 5 >>> arr['2012..2015'] a 2012 2013 2014 2015 5 4 3 2
fixed mixing ‘..’ sequences and spaces in an indexing string (closes issue 389):
>>> arr = ndtest(7)
>>> arr
a  a0  a1  a2  a3  a4  a5  a6
    0   1   2   3   4   5   6
>>> arr['a0, a2, a4..a6']
a  a0  a2  a4  a5  a6
    0   2   4   5   6
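To make the indexing-string syntax concrete, here is a rough sketch of how such a string could be expanded into a list of labels. It is an assumption-laden illustration, not the library's parser: expand_labels and expand_range are hypothetical helpers, labels are always returned as strings, and zero-padded numbers are not handled.

```python
import re

# a label made of an optional prefix followed by an integer, e.g. 'a4' or '2012'
_LABEL = re.compile(r'^(.*?)(\d+)$')


def expand_range(start, stop):
    """Expand 'a4'..'a6' or '2017'..'2012' into the full list of labels."""
    start_match = _LABEL.match(start)
    stop_match = _LABEL.match(stop)
    if (not start_match or not stop_match
            or start_match.group(1) != stop_match.group(1)):
        raise ValueError(f"cannot expand range {start}..{stop}")
    prefix = start_match.group(1)
    first, last = int(start_match.group(2)), int(stop_match.group(2))
    step = 1 if last >= first else -1
    return [f"{prefix}{n}" for n in range(first, last + step, step)]


def expand_labels(key):
    """Expand a comma-separated key, tolerating spaces around each part."""
    labels = []
    for part in key.split(','):
        part = part.strip()
        if '..' in part:
            start, stop = part.split('..')
            labels.extend(expand_range(start.strip(), stop.strip()))
        else:
            labels.append(part)
    return labels


mixed = expand_labels('a0, a2, a4..a6')
ascending_years = expand_labels('2012..2015')
descending_years = expand_labels('2017..2012')
print(mixed)
```

Both bounds of a `..` sequence are included, and a descending sequence is produced when the first bound is larger than the second.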
fixed indexing/aggregating using groups with renaming (using >>) when the axis has mixed type labels (object dtype).
Version 0.25¶
Released on 2017-08-22.
viewer functions (view, edit and compare) have been moved to the separate larray-editor package, which needs to be installed separately, unless you are using larrayenv. Closes issue 332.
installing larray-editor (or larrayenv) in a conda environment creates a new menu ‘LArray’ in the Windows start menu. It contains a link to open the documentation, a shortcut to launch the user interface in edit mode and a shortcut to update larrayenv. Closes issue 281.
added possibility to transpose an array in the viewer by dragging and dropping axes’ names in the filter bar.
implemented array.align(other_array) which makes two arrays compatible with each other (by making all common axes compatible). This is done by adding, removing or reordering labels for each common axis according to the join method used:
outer: will use a label if it is in either array's axis (ordered like the first array). This is the default as it results in no information loss.
inner: will use a label if it is in both arrays' axes (ordered like the first array)
left: will use the first array axis labels
right: will use the other array axis labels
The fill value for missing labels defaults to nan.
>>> arr1 = ndtest((2, 3))
>>> arr1
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
>>> arr2 = -ndtest((3, 2))
>>> # reorder array to make the test more interesting
>>> arr2 = arr2[['b1', 'b0']]
>>> arr2
a\b  b1  b0
 a0  -1   0
 a1  -3  -2
 a2  -5  -4
Align arr1 and arr2
>>> aligned1, aligned2 = arr1.align(arr2)
>>> aligned1
a\b   b0   b1   b2
 a0  0.0  1.0  2.0
 a1  3.0  4.0  5.0
 a2  nan  nan  nan
>>> aligned2
a\b    b0    b1   b2
 a0   0.0  -1.0  nan
 a1  -2.0  -3.0  nan
 a2  -4.0  -5.0  nan
After aligning all common axes, one can then do operations between the two arrays
>>> aligned1 + aligned2
a\b   b0   b1   b2
 a0  0.0  0.0  nan
 a1  1.0  1.0  nan
 a2  nan  nan  nan
The fill value for missing labels defaults to nan but can be changed to any compatible value.
>>> aligned1, aligned2 = arr1.align(arr2, fill_value=0)
>>> aligned1
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
 a2   0   0   0
>>> aligned2
a\b  b0  b1  b2
 a0   0  -1   0
 a1  -2  -3   0
 a2  -4  -5   0
>>> aligned1 + aligned2
a\b  b0  b1  b2
 a0   0   0   2
 a1   1   1   5
 a2  -4  -5   0
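The four join methods can be sketched on plain lists of labels. This is an illustration under assumptions: align_labels and reindex are hypothetical helpers, and for the outer join the labels found only in the second list are assumed to be appended after those of the first.

```python
def align_labels(left, right, join='outer'):
    """Return the labels of the aligned axis for the given join method."""
    if join == 'outer':
        # assumption: labels only present in `right` go at the end
        return list(left) + [label for label in right if label not in left]
    if join == 'inner':
        return [label for label in left if label in right]
    if join == 'left':
        return list(left)
    if join == 'right':
        return list(right)
    raise ValueError(f"unknown join method: {join}")


def reindex(labels, values, new_labels, fill_value=float('nan')):
    """Reorder values to follow new_labels, filling missing labels."""
    lookup = dict(zip(labels, values))
    return [lookup.get(label, fill_value) for label in new_labels]


b1 = ['b0', 'b1', 'b2']   # b axis of arr1
b2 = ['b1', 'b0']         # b axis of arr2 after reordering

outer = align_labels(b1, b2)
inner = align_labels(b1, b2, join='inner')
right = align_labels(b1, b2, join='right')

# row a0 of arr2 is [-1, 0] for labels ['b1', 'b0']
aligned_row = reindex(b2, [-1, 0], outer, fill_value=0)
print(outer, inner, right, aligned_row)
```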
implemented Session.transpose(axes) to reorder axes of all arrays within a session, ignoring missing axes for each array. For example, let us first create a test session and a small helper function to display sessions as a short summary.
>>> arr1 = ndtest((2, 2, 2))
>>> arr2 = ndtest((2, 2))
>>> sess = Session([('arr1', arr1), ('arr2', arr2)])
>>> def print_summary(s):
...     print(s.summary("{name} -> {axes_names}"))
>>> print_summary(sess)
arr1 -> a, b, c
arr2 -> a, b
Put the ‘b’ axis in front of all arrays
>>> print_summary(sess.transpose('b'))
arr1 -> b, a, c
arr2 -> b, a
Axes missing on an array are ignored (‘c’ for arr2 in this case)
>>> print_summary(sess.transpose('c', 'b'))
arr1 -> c, b, a
arr2 -> b, a
Use … to move axes to the end
>>> print_summary(sess.transpose(..., 'a'))
arr1 -> b, c, a
arr2 -> b, a
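The axis ordering rules shown in these three calls can be reproduced on lists of axis names. The sketch below is only an approximation of the behaviour illustrated above; reorder_axes is a hypothetical helper that ignores requested axes missing from the array and treats Ellipsis as "all remaining axes".

```python
def reorder_axes(axes, *requested):
    """Return the axis names in their new order."""
    present = [axis for axis in requested
               if axis is not Ellipsis and axis in axes]
    remaining = [axis for axis in axes if axis not in present]
    if Ellipsis in requested:
        position = requested.index(Ellipsis)
        before = [axis for axis in requested[:position] if axis in axes]
        after = [axis for axis in requested[position + 1:] if axis in axes]
        return before + remaining + after
    # without Ellipsis, requested axes come first, the others keep their order
    return present + remaining


arr1_axes = ['a', 'b', 'c']
arr2_axes = ['a', 'b']

front_b = [reorder_axes(axes, 'b') for axes in (arr1_axes, arr2_axes)]
front_cb = [reorder_axes(axes, 'c', 'b') for axes in (arr1_axes, arr2_axes)]
a_last = [reorder_axes(axes, ..., 'a') for axes in (arr1_axes, arr2_axes)]
print(front_b, front_cb, a_last)
```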
implemented unary operations on Session, which means one can negate all arrays in a Session or take the absolute value of all arrays in a Session without writing an explicit loop for that.
>>> arr1 = ndtest(2) >>> arr1 a a0 a1 0 1 >>> arr2 = ndtest(4) - 1 >>> arr2 a a0 a1 a2 a3 -1 0 1 2 >>> sess1 = Session([('arr1', arr1), ('arr2', arr2)]) >>> sess2 = -sess1 >>> sess2.arr1 a a0 a1 0 -1 >>> sess2.arr2 a a0 a1 a2 a3 1 0 -1 -2 >>> sess3 = abs(sess1) >>> sess3.arr2 a a0 a1 a2 a3 1 0 1 2
implemented stacking sessions using stack().
Let us first create two test sessions. For example suppose we have a session storing the results of a baseline simulation:
>>> arr1 = ndtest(2) >>> arr1 a a0 a1 0 1 >>> arr2 = ndtest(3) >>> arr2 a a0 a1 a2 0 1 2 >>> baseline = Session([('arr1', arr1), ('arr2', arr2)])
and another session with a variant
>>> arr1variant = arr1 * 2 >>> arr1variant a a0 a1 0 2 >>> arr2variant = 2 - arr2 / 2 >>> arr2variant a a0 a1 a2 2.0 1.5 1.0 >>> variant = Session([('arr1', arr1variant), ('arr2', arr2variant)])
then we stack them together
>>> stacked = stack([('baseline', baseline), ('variant', variant)], 'sessions') >>> stacked Session(arr1, arr2) >>> stacked.arr1 a\sessions baseline variant a0 0 0 a1 1 2 >>> stacked.arr2 a\sessions baseline variant a0 0.0 2.0 a1 1.0 1.5 a2 2.0 1.0
Combined with the fact that we can compute some very simple expressions on sessions, this can be extremely useful to quickly compare all arrays of several sessions (e.g. simulation variants):
>>> diff = variant - baseline
>>> # compute the absolute difference and relative difference for each array of the sessions
>>> stacked = stack([('baseline', baseline), ('variant', variant), ('diff', diff),
...                  ('abs diff', abs(diff)), ('rel diff', diff / baseline)], 'sessions')
>>> stacked
Session(arr1, arr2)
>>> stacked.arr2
a\sessions  baseline  variant  diff  abs diff  rel diff
        a0       0.0      2.0   2.0       2.0       inf
        a1       1.0      1.5   0.5       0.5       0.5
        a2       2.0      1.0  -1.0       1.0      -0.5
implemented Axis.align(other_axis) and AxisCollection.align(other_collection), which make two axes / axis collections compatible with each other, see LArray.align above.
implemented Session.apply(function) to apply a function to all elements (arrays) of a Session and return a new Session.
Let us first create a test session
>>> arr1 = ndtest(2)
>>> arr1
a  a0  a1
    0   1
>>> arr2 = ndtest(3)
>>> arr2
a  a0  a1  a2
    0   1   2
>>> sess1 = Session([('arr1', arr1), ('arr2', arr2)])
>>> sess1
Session(arr1, arr2)
Then define the function we want to apply to all arrays of our session
>>> def increment(element):
...     return element + 1
Apply it
>>> sess2 = sess1.apply(increment)
>>> sess2.arr1
a  a0  a1
    1   2
>>> sess2.arr2
a  a0  a1  a2
    1   2   3
implemented setting the value of multiple points using array.points[labels] = value
>>> arr = ndtest((3, 4))
>>> arr
a\b  b0  b1  b2  b3
 a0   0   1   2   3
 a1   4   5   6   7
 a2   8   9  10  11
Now, suppose you want to retrieve several specific combinations of labels, for example (a0, b1), (a0, b3), (a1, b0) and (a2, b2). You could write a loop like this:
>>> values = []
>>> for a, b in [('a0', 'b1'), ('a0', 'b3'), ('a1', 'b0'), ('a2', 'b2')]:
...     values.append(arr[a, b])
>>> values
[1, 3, 4, 10]
but you could also (this already worked in previous versions) use array.points like:
>>> arr.points[['a0', 'a0', 'a1', 'a2'], ['b1', 'b3', 'b0', 'b2']]
a,b  a0,b1  a0,b3  a1,b0  a2,b2
         1      3      4     10
which has the advantages of being much faster and of keeping more information (the result is a labelled array instead of a plain list). Now suppose you want to set the value of those points. You could write:
>>> for a, b in [('a0', 'b1'), ('a0', 'b3'), ('a1', 'b0'), ('a2', 'b2')]:
...     arr[a, b] = 42
>>> arr
a\b  b0  b1  b2  b3
 a0   0  42   2  42
 a1  42   5   6   7
 a2   8   9  42  11
but now you can also use the faster alternative:
>>> arr.points[['a0', 'a0', 'a1', 'a2'], ['b1', 'b3', 'b0', 'b2']] = 42
added icon to display in Windows start menu and editor windows.
viewer keeps labels visible even when scrolling (label rows and columns are now frozen).
added ‘Getting Started’ section in documentation.
implemented axes argument to ipfp to specify on which axes the fitting procedure should be applied (closes issue 185). For example, let us assume you have a 3D array, such as:
>>> initial = ndrange('a=a0..a9;b=b0..b9;year=2000..2016')
and you want to apply a 2D fitting procedure for each value of the year axis. Previously, you had to loop on that year axis explicitly and call ipfp within the loop, like:
>>> result = zeros(initial.axes)
>>> for year in initial.year:
...     current = initial[year]
...     # assume you have some targets for each year
...     current_targets = [current.sum(x.a) + 1, current.sum(x.b) + 1]
...     result[year] = ipfp(current_targets, current)
Now you can apply the procedure to all years at once by specifying the axes on which the fitting should be done. This is a bit shorter to type and also much faster.
>>> all_targets = [initial.sum(x.a) + 1, initial.sum(x.b) + 1]
>>> result = ipfp(all_targets, initial, axes=(x.a, x.b))
made ipfp 10 to 20% faster (even without using the axes argument).
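For readers unfamiliar with the procedure behind ipfp, here is a minimal plain-Python sketch of classic iterative proportional fitting on a single 2D table. It is not larray's implementation: ipfp_2d, its argument names and the fixed number of iterations are assumptions made for this illustration, and the table is a simple list of lists.

```python
def ipfp_2d(target_rows, target_cols, initial, n_iter=100):
    """Scale a 2D table until its row and column sums match the targets.

    Illustration only: the targets must have the same grand total and the
    initial table must contain strictly positive values.
    """
    table = [list(row) for row in initial]
    for _ in range(n_iter):
        # step 1: scale each row so that it sums to its target
        for i, row in enumerate(table):
            factor = target_rows[i] / sum(row)
            table[i] = [value * factor for value in row]
        # step 2: scale each column so that it sums to its target
        for j in range(len(table[0])):
            factor = target_cols[j] / sum(row[j] for row in table)
            for row in table:
                row[j] *= factor
    return table


fitted = ipfp_2d([4.0, 8.0], [5.0, 7.0], [[1.0, 2.0], [3.0, 4.0]])
row_sums = [sum(row) for row in fitted]
col_sums = [sum(col) for col in zip(*fitted)]
print(row_sums, col_sums)
```

Applying the procedure "for each year" then amounts to running this 2D fitting once per label of the remaining axis, which is what the explicit loop shown earlier does.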
implemented Session.to_globals(inplace=True) which will update the content of existing arrays instead of creating new variables and overwriting them. This ensures the arrays in the session have the same axes as the existing variables.
added the ability to provide a pattern when loading several .csv files as a session. Among others, patterns can use * to match any number of characters and ? to match any single character.
>>> s = Session()
>>> # load all .csv files starting with "output" in the data directory
>>> s.load('data/output*.csv')
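To get a feel for which file names such a pattern selects, the wildcards can be tried out with Python's standard fnmatch module. This sketch assumes the patterns follow the usual shell-style rules that fnmatch implements; whether Session.load relies on this module internally is not stated here, and the file names are made up.

```python
from fnmatch import fnmatchcase

# hypothetical file names, for illustration only
filenames = ['output_2016.csv', 'output_2017.csv', 'input.csv', 'output.txt', 'out1.csv']

# * matches any number of characters (including none)
matching_star = [name for name in filenames if fnmatchcase(name, 'output*.csv')]
# ? matches exactly one character
matching_qmark = [name for name in filenames if fnmatchcase(name, 'out?.csv')]

print(matching_star)
print(matching_qmark)
```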
stack can be used with keyword arguments when labels are “simple strings” (i.e. no integers, no punctuation, no string starting with integers, etc.). This is an attractive alternative, but since it does not work in all cases, it is not recommended outside the interactive console.
>>> arr1 = ones('nat=BE,FO')
>>> arr1
nat   BE   FO
     1.0  1.0
>>> arr2 = zeros('nat=BE,FO')
>>> arr2
nat   BE   FO
     0.0  0.0
>>> stack(M=arr1, F=arr2, axis='sex=M,F')
nat\sex    M    F
     BE  1.0  0.0
     FO  1.0  0.0
Without passing an explicit order for labels like above (or an axis object), it should only be used on Python 3.6 or later because keyword arguments are NOT ordered on earlier Python versions.
>>> # use this only on Python 3.6 and later
>>> stack(M=arr1, F=arr2, axis='sex')
nat\sex    M    F
     BE  1.0  0.0
     FO  1.0  0.0
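The ordering caveat comes from the Python language itself: since Python 3.6, the dictionary received through **kwargs preserves the order in which the keyword arguments were written at the call site. The small function below (collect_labels, a hypothetical name unrelated to larray) only demonstrates that language behavior.

```python
def collect_labels(**kwargs):
    # on Python 3.6+, kwargs keeps the order in which arguments were passed
    return list(kwargs)


labels = collect_labels(M=1.0, F=0.0)
reversed_labels = collect_labels(F=0.0, M=1.0)
print(labels, reversed_labels)
```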
binary operations between sessions now ignore type errors. For example, if you are comparing two sessions with many arrays by computing the difference between them but a few arrays contain strings, the whole operation will not fail. Instead, the result for the arrays concerned will be nan.
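The idea can be sketched with plain dictionaries standing in for sessions. This is only an illustration of the "catch the type error and store nan" behavior described above; subtract_all is a hypothetical helper, the values are scalars rather than arrays, and both dictionaries are assumed to have the same keys.

```python
import math


def subtract_all(left, right):
    """Subtract values name by name, storing nan when the types do not allow it."""
    result = {}
    for name in left:
        try:
            result[name] = left[name] - right[name]
        except TypeError:
            # e.g. str - str is not defined: do not fail, record nan instead
            result[name] = math.nan
    return result


diff = subtract_all({'pop': 10, 'country': 'BE'}, {'pop': 7, 'country': 'FO'})
print(diff)
```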
added optional argument ignore_exceptions to Session.load to ignore exceptions during load. This is mostly useful when trying to load many .csv files in a Session and some of them have an invalid format but you want to load the others.
fixed disambiguating an ambiguous key by adding the axis within the string, for example arr['axis_name[ambiguouslabel]'] (closes issue 331).
fixed converting a string group to integer or float using int() and float() (when that makes sense).
>>> a = Axis('a=10,20,30,total')
>>> a
Axis(['10', '20', '30', 'total'], 'a')
>>> str(a.i[0])
'10'
>>> int(a.i[0])
10
>>> float(a.i[0])
10.0
Version 0.24¶
Released on 2017-06-14.
implemented Session.to_globals which creates global variables from variables stored in the session (closes issue 276). Note that this should usually only be used in an interactive console and not in a script. Code editors are confused by this kind of manipulation and will likely flag code using variables created in this way as invalid. Additionally, when using this method, auto-completion, “show definition”, “go to declaration” and other similar code editor features will probably not work for the variables created in this way or for any variable derived from them.
>>> s = Session(arr1=ndtest(3), arr2=ndtest((2, 2)))
>>> s.to_globals()
>>> arr1
a  a0  a1  a2
    0   1   2
>>> arr2
a\b  b0  b1
 a0   0   1
 a1   2   3
added new boolean argument ‘overwrite’ to Session.save, Session.to_hdf, Session.to_excel and Session.to_pickle methods (closes issue 293). If overwrite=True and the target file already exists, it is deleted and replaced by a new one. This is the new default behavior. If overwrite=False, an existing file is updated (as in previous larray versions):
>>> arr1, arr2, arr3 = ndtest((2, 2)), ndtest(4), ndtest((3, 2))
>>> s = Session([('arr1', arr1), ('arr2', arr2), ('arr3', arr3)])
>>> # save arr1, arr2 and arr3 in file output.h5
>>> s.save('output.h5')
>>> # replace arr1 and create arr4 + put them in a second session
>>> arr1, arr4 = ndtest((3, 3)), ndtest((2, 3))
>>> s2 = Session([('arr1', arr1), ('arr4', arr4)])
>>> # replace arr1 and add arr4 in file output.h5
>>> s2.save('output.h5', overwrite=False)
>>> # erase content of 'output.h5' and save only arrays contained in the second session
>>> s2.save('output.h5')
renamed create_sequential() to sequence() (closes issue 212).
improved auto-completion in IPython interactive consoles (e.g. the viewer console) for Axis, AxisCollection, Group and Workbook objects. These objects can now complete keys within [].
>>> gender = Axis('gender=Male,Female')
>>> gender
Axis(['Male', 'Female'], 'gender')
>>> gender['Fe<tab>  # will be completed to `gender['Female`
>>> arr = ndrange(gender)
>>> arr.axes['gen<tab>  # will be completed to `arr.axes['gender`
>>> wb = open_excel()
>>> wb['Sh<tab>  # will be completed to `wb['Sheet1`
added documentation for Session methods (closes issue 277).
allowed to provide explicit names for arrays or sessions in compare(). Closes issue 307.
fixed title argument of ndtest creation function: title was not passed to the returned array.
fixed create_sequential when arguments initial and inc are array and scalar respectively (closes issue 288).
fixed auto-completion of attributes of LArray and Group objects (closes issue 302).
fixed name of arrays/sessions in compare() not being inferred correctly (closes issue 306).
fixed indexing Excel sheets by position to always yield the requested shape even when bounds are outside the range of used cells. Closes issue 273.
fixed the array() method on excel.Sheet returning float labels when int labels are expected.
fixed getting float data instead of int when converting an Excel Sheet or Range to an larray or numpy array.
fixed some warning messages to point to the correct line in user code.
fixed crash of Session.save method when it contained 0D arrays. They are now skipped when saving a session (closes issue 291).
fixed Session.save and Session.to_excel failing to create new Excel files (it only worked if the file already existed). Closes issue 313.
fixed Session.load(file, engine='pandas_excel'): axes were considered as anonymous.
Version 0.23¶
Released on 2017-05-30.
changed display of arrays (closes issue 243):
>>> ndtest((2, 3))
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
instead of
>>> ndtest((2, 3))
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  2
 a1 |  3 |  4 |  5
.. can now be used within keys (between []). Previously it could only be used to define new axes. As a reminder, it generates increasing values between the two bounds. It is slightly different from : which takes everything between the two bounds in the axis order.
>>> arr = ndrange('a=a1,a0,a2,a3')
>>> arr
a  a1  a0  a2  a3
    0   1   2   3
>>> arr['a1..a3']
a  a1  a2  a3
    0   2   3
this is different from : which takes everything in between the two bounds:
>>> arr['a1:a3']
a  a1  a0  a2  a3
    0   1   2   3
in both axes definitions and keys (within []) .. can now be mixed with , and other .. :
>>> arr = ndrange('code=A,C..E,G,X..Z')
>>> arr
code  A  C  D  E  G  X  Y  Z
      0  1  2  3  4  5  6  7
>>> arr['A,Z..X,G']
code  A  Z  Y  X  G
      0  7  6  5  4
within .. extra zeros are only padded to numbers if zeros are present in the pattern.
>>> ndrange('code=A1..A12')
code  A1  A2  A3  A4  A5  A6  A7  A8  A9  A10  A11  A12
       0   1   2   3   4   5   6   7   8    9   10   11
>>> ndrange('code=A01..A12')
code  A01  A02  A03  A04  A05  A06  A07  A08  A09  A10  A11  A12
        0    1    2    3    4    5    6    7    8    9   10   11
in previous larray versions, the two above definitions returned the second array.
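The padding rule can be made concrete with a small plain-Python sketch. The function below (expand_range is a hypothetical name, not larray code) handles only the simple "prefix + number" case and assumes that "zeros are present in the pattern" means the start bound is written with leading zeros; larray's own parser supports more cases than this.

```python
import re


def expand_range(pattern):
    """Expand 'A1..A12' or 'A01..A12' into a list of string labels (illustration only)."""
    start, stop = pattern.split('..')
    prefix, first = re.fullmatch(r'(\D*)(\d+)', start).groups()
    last = re.fullmatch(r'(\D*)(\d+)', stop).group(2)
    # pad with zeros only when the start bound itself has leading zeros
    width = len(first) if len(first) > 1 and first.startswith('0') else 0
    return [prefix + str(n).zfill(width) for n in range(int(first), int(last) + 1)]


unpadded = expand_range('A1..A12')
padded = expand_range('A01..A12')
print(unpadded)
print(padded)
```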
set sep argument of from_string function to ' ' by default (closes issue 271). For 1D arrays, a "-" must be added in front of the data line.
>>> from_string('''sex M F
...                - 0 1''')
sex  M  F
     0  1
>>> from_string('''nat\\sex M F
...                BE 0 1
...                FO 2 3''')
nat\sex  M  F
     BE  0  1
     FO  2  3
improved error message when trying to access nonexistent sheet in an Excel workbook (closes issue 266).
when creating an Axis from a Group and no explicit name was given, reuse the name of the group axis.
>>> a = Axis('a=a0..a2')
>>> Axis(a[:'a1'])
Axis(['a0', 'a1'], 'a')
allowed to create an array using a single group as if it was an Axis.
>>> a = Axis('a=a0..a2')
>>> ndrange(a)
a  a0  a1  a2
    0   1   2
>>> # using a group as an axis
>>> ndrange(a[:'a1'])
a  a0  a1
    0   1
allowed to use axes (Axis objects) to subset arrays (part of issue 210).
>>> arr = ndtest((2, 3))
>>> arr
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
>>> b2 = Axis('b=b0,b2')
>>> arr[b2]
a\b  b0  b2
 a0   0   2
 a1   3   5
improved string representation of Excel workbooks and sheets (they mention the actual file/sheet they correspond to). This is mostly useful in the interactive console to check what an object corresponds to.
>>> wb = open_excel()
>>> wb
<larray.io.excel.Workbook [Book1]>
>>> wb[0]
<larray.io.excel.Sheet [Book1]Sheet1>
open_excel('non existent file') will raise an explicit error immediately when overwrite_file is False, instead of failing at a seemingly random point later on (closes issue 265).
integer-like strings in axis definition strings using , are now converted to integers, to be consistent with string definitions using .. (previously, ndrange('a=1,2,3') did not create the same array as ndrange('a=1..3')).
fixed reading a single cell from an Excel sheet.
fixed script execution not resuming after quitting the viewer when it was called using view(a_single_array).
fixed opening the viewer after showing a plot window.
do not display an error when setting the value of an element of a non LArray sequence in the viewer console:
>>> l = [1, 2, 3]
>>> l[0] = 42
Version 0.22¶
Released on 2017-05-11.
viewer: added a menu bar with the ability to clear the current session, save all its arrays to a file (.h5, .xlsx, or a directory containing multiple .csv files), and load arrays from such a file (closes issue 88).
WARNING: Only array objects are currently saved. It means that scalars, functions or other non-LArray objects defined in the console are not saved in the file.
implemented a new describe() method on arrays to give quick summary statistics. By default, it includes the number of non-NaN values, the mean, standard deviation, minimum, 25, 50 and 75 percentiles and maximum.
>>> arr = ndrange('gender=Male,Female;year=2014..2020').astype(float)
>>> arr
gender\year | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020
       Male |  0.0 |  1.0 |  2.0 |  3.0 |  4.0 |  5.0 |  6.0
     Female |  7.0 |  8.0 |  9.0 | 10.0 | 11.0 | 12.0 | 13.0
>>> arr.describe()
statistic | count | mean |               std | min |  25% | 50% |  75% |  max
          |  14.0 |  6.5 | 4.031128874149275 | 0.0 | 3.25 | 6.5 | 9.75 | 13.0
an optional keyword argument allows specifying different percentiles to include
>>> arr.describe(percentiles=[20, 40, 60, 80])
statistic | count | mean |               std | min | 20% | 40% | 60% |  80% |  max
          |  14.0 |  6.5 | 4.031128874149275 | 0.0 | 2.6 | 5.2 | 7.8 | 10.4 | 13.0
its sister method, describe_by() was also implemented to give quick summary statistics along axes or groups.
>>> arr.describe_by('gender')
gender\statistic | count | mean | std | min | 25% |  50% |  75% |  max
            Male |   7.0 |  3.0 | 2.0 | 0.0 | 1.5 |  3.0 |  4.5 |  6.0
          Female |   7.0 | 10.0 | 2.0 | 7.0 | 8.5 | 10.0 | 11.5 | 13.0
>>> arr.describe_by('gender', (x.year[:2015], x.year[2019:]))
gender | year\statistic | count | mean | std |  min |   25% |  50% |   75% |  max
  Male |          :2015 |   2.0 |  0.5 | 0.5 |  0.0 |  0.25 |  0.5 |  0.75 |  1.0
  Male |          2019: |   2.0 |  5.5 | 0.5 |  5.0 |  5.25 |  5.5 |  5.75 |  6.0
Female |          :2015 |   2.0 |  7.5 | 0.5 |  7.0 |  7.25 |  7.5 |  7.75 |  8.0
Female |          2019: |   2.0 | 12.5 | 0.5 | 12.0 | 12.25 | 12.5 | 12.75 | 13.0
This closes issue 184.
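The percentile values reported by describe() in the examples above (3.25, 6.5, 9.75 and 2.6, 5.2, 7.8, 10.4 for the values 0 to 13) are consistent with linear interpolation between the sorted values. The plain-Python sketch below reproduces them under that assumption; percentile is a hypothetical helper written for this illustration, and the interpolation rule actually used by describe() is not stated here.

```python
import math


def percentile(values, q):
    """Return the q-th percentile (0 <= q <= 100) using linear interpolation."""
    data = sorted(values)
    # fractional position of the percentile within the sorted data
    pos = (len(data) - 1) * q / 100
    lower = math.floor(pos)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (pos - lower) * (data[upper] - data[lower])


values = [float(v) for v in range(14)]
quartiles = [percentile(values, q) for q in (25, 50, 75)]
quintiles = [percentile(values, q) for q in (20, 40, 60, 80)]
print(quartiles)
print(quintiles)
```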
implemented reindex, which allows changing the order of labels and adding/removing some of them on one or several axes:
>>> arr = ndtest((2, 2))
>>> arr
a\b | b0 | b1
 a0 |  0 |  1
 a1 |  2 |  3
>>> arr.reindex(x.b, ['b1', 'b2', 'b0'], fill_value=-1)
a\b | b1 | b2 | b0
 a0 |  1 | -1 |  0
 a1 |  3 | -1 |  2
>>> a = Axis('a', ['a1', 'a2', 'a0'])
>>> b = Axis('b', ['b2', 'b1', 'b0'])
>>> arr.reindex({'a': a, 'b': b}, fill_value=-1)
a\b | b2 | b1 | b0
 a1 | -1 |  3 |  2
 a2 | -1 | -1 | -1
 a0 | -1 |  1 |  0
using reindex one can make an array compatible with another array which has more/less labels or with labels in a different order:
>>> arr2 = ndtest((3, 3))
>>> arr2
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  2
 a1 |  3 |  4 |  5
 a2 |  6 |  7 |  8
>>> arr.reindex(arr2.axes, fill_value=0)
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  0
 a1 |  2 |  3 |  0
 a2 |  0 |  0 |  0
>>> arr.reindex(arr2.axes, fill_value=0) + arr2
a\b | b0 | b1 | b2
 a0 |  0 |  2 |  2
 a1 |  5 |  7 |  5
 a2 |  6 |  7 |  8
This closes issue 18.
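Conceptually, reindexing along one axis is a lookup of each new label in the old labels, with a fill value for labels that did not exist before. A minimal one-dimensional sketch in plain Python (reindex_1d is a hypothetical helper, not the larray method) could look like this:

```python
def reindex_1d(labels, values, new_labels, fill_value=None):
    """Return values reordered to follow new_labels; unknown labels get fill_value."""
    by_label = dict(zip(labels, values))
    return [by_label.get(label, fill_value) for label in new_labels]


# row a0 of the 2 x 2 test array, reindexed on axis b
row_a0 = reindex_1d(['b0', 'b1'], [0, 1], ['b1', 'b2', 'b0'], fill_value=-1)
row_a1 = reindex_1d(['b0', 'b1'], [2, 3], ['b1', 'b2', 'b0'], fill_value=-1)
print(row_a0, row_a1)
```

The two rows match the result displayed for arr.reindex(x.b, ['b1', 'b2', 'b0'], fill_value=-1).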
added load_example_data function to load datasets used in tutorial and be able to reproduce examples. The name of the dataset must be provided as argument (there is currently only one available dataset). Datasets are returned as Session objects:
>>> demo = load_example_data('demography')
>>> demo.pop.info
26 x 3 x 121 x 2 x 2
 time [26]: 1991 1992 1993 ... 2014 2015 2016
 geo [3]: 'BruCap' 'Fla' 'Wal'
 age [121]: 0 1 2 ... 118 119 120
 sex [2]: 'M' 'F'
 nat [2]: 'BE' 'FO'
>>> demo.qx.info
26 x 3 x 121 x 2 x 2
 time [26]: 1991 1992 1993 ... 2014 2015 2016
 geo [3]: 'BruCap' 'Fla' 'Wal'
 age [121]: 0 1 2 ... 118 119 120
 sex [2]: 'M' 'F'
 nat [2]: 'BE' 'FO'
(closes issue 170)
implemented Axis.union, intersection and difference which produce new axes by combining the labels of the axis with the other labels.
>>> letters = Axis('letters=a,b')
>>> letters.union(Axis('letters=b,c'))
Axis(['a', 'b', 'c'], 'letters')
>>> letters.union(['b', 'c'])
Axis(['a', 'b', 'c'], 'letters')
>>> letters.intersection(['b', 'c'])
Axis(['b'], 'letters')
>>> letters.difference(['b', 'c'])
Axis(['a'], 'letters')
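Unlike Python's built-in set operations, these methods return labels in a well-defined order. The plain-Python sketch below shows order-preserving versions of the three operations on simple label lists; the function names are hypothetical and the code only assumes the behavior visible in the examples above (labels of the axis first, then new labels in the order given).

```python
def union(labels, other):
    """Labels of `labels`, followed by those of `other` not already present."""
    result = list(labels)
    for label in other:
        if label not in result:
            result.append(label)
    return result


def intersection(labels, other):
    """Labels of `labels` which are also in `other`, in their original order."""
    return [label for label in labels if label in other]


def difference(labels, other):
    """Labels of `labels` which are not in `other`, in their original order."""
    return [label for label in labels if label not in other]


letters = ['a', 'b']
print(union(letters, ['b', 'c']))
print(intersection(letters, ['b', 'c']))
print(difference(letters, ['b', 'c']))
```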
implemented Group.union, intersection and difference which produce new groups by combining the labels of the group with the other labels.
>>> letters = Axis('letters=a..d')
>>> letters['a', 'b'].union(letters['b', 'c'])
letters['a', 'b', 'c'].set()
>>> letters['a', 'b'].union(['b', 'c'])
letters['a', 'b', 'c'].set()
>>> letters['a', 'b'].intersection(['b', 'c'])
letters['b'].set()
>>> letters['a', 'b'].difference(['b', 'c'])
letters['a'].set()
viewer: added possibility to delete an array by pressing Delete on keyboard (closes issue 116).
Excel sheets in workbooks opened via open_excel can be renamed by changing their .name attribute:
>>> wb = open_excel()
>>> wb['old_sheet_name'].name = 'new_sheet_name'
Excel sheets in workbooks opened via open_excel can be deleted using “del”:
>>> wb = open_excel()
>>> del wb['sheet_name']
implemented PGroup.set() to transform a positional group to an LSet.
>>> a = Axis('a=a0..a5')
>>> a.i[:2].set()
a['a0', 'a1'].set()
inverted name and labels arguments when creating an Axis and made name argument optional (to create anonymous axes). Now, it is also possible to create an Axis by passing a single string of the kind ‘name=labels’:
>>> anonymous = Axis('0..100')
>>> age = Axis('age=0..100')
>>> gender = Axis('M,F', 'gender')
(closes issue 152)
renamed Session.dump, dump_hdf, dump_excel and dump_csv to save, to_hdf, to_excel and to_csv (closes issue 217).
changed default value of ddof argument for var and std functions from 0 to 1 (closes issue 190).
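As a reminder, ddof ("delta degrees of freedom") changes the divisor used when averaging the squared deviations: N - ddof instead of N. With ddof=0 the result is the population variance, with ddof=1 the unbiased sample variance. The plain-Python sketch below illustrates the difference on seven values; variance is a hypothetical helper written for this note, not larray's function.

```python
def variance(values, ddof=1):
    """Variance with divisor N - ddof (illustration only)."""
    n = len(values)
    mean = sum(values) / n
    return sum((value - mean) ** 2 for value in values) / (n - ddof)


values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
population_var = variance(values, ddof=0)  # old default
sample_var = variance(values, ddof=1)      # new default
print(population_var, sample_var)
```

Code which relied on the previous default and needs identical results should therefore pass ddof=0 explicitly.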
implemented a new syntax for stack(): stack({label1: value1, label2: value2}, axis)
>>> nat = Axis('nat', 'BE, FO')
>>> sex = Axis('sex', 'M, F')
>>> males = ones(nat)
>>> males
nat |  BE |  FO
    | 1.0 | 1.0
>>> females = zeros(nat)
>>> females
nat |  BE |  FO
    | 0.0 | 0.0
In the case the axis has already been defined in a variable, this gives:
>>> stack({'M': males, 'F': females}, sex)
nat\sex |   M |   F
     BE | 1.0 | 0.0
     FO | 1.0 | 0.0
Additionally, axis can now be an axis string definition in addition to an Axis object, which means one can write this:
>>> stack({'M': males, 'F': females}, 'sex=M,F')
It is better than the simpler but highly discouraged alternative:
>>> stack([males, females], sex)
because it is all too easy to invert labels. It is very hard to spot the error in the following line, and larray cannot spot it for you either:
>>> stack([females, males], sex)
nat\sex |   M |   F
     BE | 0.0 | 1.0
     FO | 0.0 | 1.0
When creating an axis from scratch (it does not already exist in a variable), one might want to use this:
>>> stack([males, females], 'sex=M,F')
even if this could suffer, to a lesser extent, the same problem as above when stacking many arrays.
handle ... in transpose method to avoid having to list all axes. This can be useful, for example, to change which axis is displayed in columns (closes issue 188).
>>> arr.transpose(..., 'time')
>>> arr.transpose('gender', ..., 'time')
made scalar Groups behave even more like their value: any method available on the value is available on the Group. For example, if the Group has a string value, the string methods are available on it (closes issue 202).
>>> test = Axis('test', ['abc', 'a1-a2'])
>>> test.i[0].upper()
'ABC'
>>> test.i[1].split('-')
['a1', 'a2']
updated AxisCollection.replace so as to replace one, several or all axes and to accept axis definitions as new axes.
>>> arr = ndtest((2, 3))
>>> axes = arr.axes
>>> axes
AxisCollection([
    Axis(['a0', 'a1'], 'a'),
    Axis(['b0', 'b1', 'b2'], 'b')
])
>>> row = Axis(['r0', 'r1'], 'row')
>>> column = Axis(['c0', 'c1', 'c2'], 'column')
Replace several axes (keywords, list of tuple or dictionary)
>>> axes.replace(a=row, b=column)
>>> # or
>>> axes.replace(a="row=r0,r1", b="column=c0,c1,c2")
>>> # or
>>> axes.replace([(x.a, row), (x.b, column)])
>>> # or
>>> axes.replace({x.a: row, x.b: column})
AxisCollection([
    Axis(['r0', 'r1'], 'row'),
    Axis(['c0', 'c1', 'c2'], 'column')
])
added possibility to delete an array from a session:
>>> s = Session({'a': ndtest((3, 3)), 'b': ndtest((2, 4)), 'c': ndtest((4, 2))})
>>> s.names
['a', 'b', 'c']
>>> del s.b
>>> del s['c']
>>> s.names
['a']
made create_sequential axis argument accept axis definitions in addition to Axis objects like, for example, using a string definition (closes issue 160).
>>> create_sequential('year=2016..2019')
year | 2016 | 2017 | 2018 | 2019
     |    0 |    1 |    2 |    3
replaced *args, **kwargs by explicit arguments in documentation of aggregation functions (sum, prod, mean, std, var, …). Closes issue 41.
improved documentation of plot method (closes issue 169).
improved auto-completion in IPython interactive consoles for both LArray and Session objects. LArray objects can now complete keys within [].
>>> a = ndrange('sex=Male,Female')
>>> a
sex | Male | Female
    |    0 |      1
>>> a['Fe<tab>
will autocomplete to a['Female. Sessions will now auto-complete both attributes (using session.) and keys (using session[).
>>> s = Session({'a_nice_test_array': ndtest(10)})
>>> s.a_<tab>
will autocomplete to s.a_nice_test_array and s['a_<tab> will be completed to s['a_nice_test_array
made warning messages for division by 0 and invalid values (usually caused by 0 / 0) point to the user code line, instead of the corresponding line in the larray module.
preserve order of arrays in a session when saving to/loading from an .xlsx file.
when creating a session from a directory containing CSV files, the directory may now contain other (non-CSV) files.
several calls to open_excel from within the same program/script will now reuse a single global Excel instance. This makes Excel I/O much faster without having to create an instance manually using xlwings.App, and still without risking interfering with other instances of Excel opened manually (closes issue 245).
improved error message when trying to copy a sheet from one instance of Excel to another (closes issue 231).
fixed keyword arguments such as out, ddof, … for aggregation functions (closes issue 189).
fixed percentile(_by) with multiple percentiles values, i.e. when argument q is a list/tuple (closes issue 192).
fixed group aggregates on integer arrays for median, percentile, var and std (closes issue 193).
fixed group sum over boolean arrays (closes issue 194).
fixed set_labels when inplace=True.
fixed array creation functions not raising an exception when called with wrong syntax func(axis1, axis2, …) instead of func([axis1, axis2, …]) (closes issue 203).
fixed position of added sheets in excel workbook: new sheets are appended instead of prepended (closes issue 229).
fixed Workbook behavior in case of new workbook: the first added sheet replaces the default sheet Sheet1 (closes issue 230).
fixed name of Workbook sheets created by copying another sheet (closes issue 244).
>>> wb = open_excel()
>>> wb['name_of_new_sheet'] = wb['name_of_sheet_to_copy']
fixed with_axes warning to refer to set_axes instead of replace_axes.
fixed displayed title in viewer: shows path to file associated with current session + current array info + extra info (closes issue 181)
Version 0.21¶
Released on 2017-03-28.
implemented set_axes() method to replace one, several or all axes of an array (closes issue 67). The method with_axes() is now deprecated (set_axes() must be used instead).
>>> arr = ndtest((2, 3))
>>> arr
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  2
 a1 |  3 |  4 |  5
>>> row = Axis('row', ['r0', 'r1'])
>>> column = Axis('column', ['c0', 'c1', 'c2'])
Replace one axis (second argument new_axis must be provided)
>>> arr.set_axes(x.a, row)
row\b | b0 | b1 | b2
   r0 |  0 |  1 |  2
   r1 |  3 |  4 |  5
Replace several axes (keywords, list of tuple or dictionary)
>>> arr.set_axes(a=row, b=column)
>>> # or
>>> arr.set_axes([(x.a, row), (x.b, column)])
>>> # or
>>> arr.set_axes({x.a: row, x.b: column})
row\column | c0 | c1 | c2
        r0 |  0 |  1 |  2
        r1 |  3 |  4 |  5
Replace all axes (list of axes or AxisCollection)
>>> arr.set_axes([row, column])
row\column | c0 | c1 | c2
        r0 |  0 |  1 |  2
        r1 |  3 |  4 |  5
>>> arr2 = ndrange([row, column])
>>> arr.set_axes(arr2.axes)
row\column | c0 | c1 | c2
        r0 |  0 |  1 |  2
        r1 |  3 |  4 |  5
implemented Axis.replace to replace some labels from an axis:
>>> sex = Axis('sex', ['M', 'F'])
>>> sex
Axis('sex', ['M', 'F'])
>>> sex.replace('M', 'Male')
Axis('sex', ['Male', 'F'])
>>> sex.replace({'M': 'Male', 'F': 'Female'})
Axis('sex', ['Male', 'Female'])
implemented from_string() method to create an array from a string (closes issue 96).
>>> from_string('''age,nat\\sex, M, F
... 0, BE, 0, 1
... 0, FO, 2, 3
... 1, BE, 4, 5
... 1, FO, 6, 7''')
age | nat\sex | M | F
  0 |      BE | 0 | 1
  0 |      FO | 2 | 3
  1 |      BE | 4 | 5
  1 |      FO | 6 | 7
allowed to use a regular expression in split_axis method (closes issue 106):
>>> combined = ndrange('a_b = a0b0..a1b2')
>>> combined
a_b | a0b0 | a0b1 | a0b2 | a1b0 | a1b1 | a1b2
    |    0 |    1 |    2 |    3 |    4 |    5
>>> combined.split_axis(x.a_b, regex='(\w{2})(\w{2})')
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  2
 a1 |  3 |  4 |  5
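What the regular expression does to each combined label can be checked with Python's standard re module: every capture group yields the label for one of the resulting axes. The sketch below only illustrates the label splitting on plain strings; how split_axis rearranges the data afterwards is not shown.

```python
import re

labels = ['a0b0', 'a0b1', 'a0b2', 'a1b0', 'a1b1', 'a1b2']
pattern = re.compile(r'(\w{2})(\w{2})')

# one (a_label, b_label) pair per combined label
parts = [pattern.fullmatch(label).groups() for label in labels]

# unique labels of each resulting axis, in order of first appearance
a_labels = list(dict.fromkeys(a for a, b in parts))
b_labels = list(dict.fromkeys(b for a, b in parts))
print(a_labels, b_labels)
```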
one can assign a new axis to several groups at the same time by using axis[groups]:
>>> group1 = year[2001:2004]
>>> group2 = year[2008,2009]
>>> # let us change the year axis by time
>>> x.time[group1, group2]
(x.time[2001:2004], x.time[2008, 2009])
implemented Axis.by() which is equivalent to axis[:].by() and divides the axis into several groups of specified length:
>>> year = Axis('year', '2010..2016')
>>> year.by(3)
(year.i[0:3], year.i[3:6], year.i[6:7])
which is equivalent to (year[2010:2012], year[2013:2015], year[2016]). Like for groups, the optional second argument specifies the step between groups
>>> year.by(3, step=4)
(year.i[0:3], year.i[4:7])
which is equivalent to (year[2010:2012], year[2014:2016]). And if step is smaller than length, we get overlapping groups, which can be useful for example for moving averages.
>>> year.by(3, 2)
(year.i[0:3], year.i[2:5], year.i[4:7], year.i[6:7])
which is equivalent to (year[2010:2012], year[2012:2014], year[2014:2016], year[2016])
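The positional bounds shown in these examples follow a simple rule: a new group starts every step positions, and each group is cut off at the end of the axis. A plain-Python sketch of that rule (group_bounds is a hypothetical helper returning (start, stop) pairs, not the larray method) is:

```python
def group_bounds(n_labels, length, step=None):
    """Return (start, stop) positions of successive groups along an axis."""
    if step is None:
        # by default, groups follow each other without gap or overlap
        step = length
    return [(start, min(start + length, n_labels))
            for start in range(0, n_labels, step)]


# the year axis above has 7 labels (2010..2016)
print(group_bounds(7, 3))
print(group_bounds(7, 3, step=4))
print(group_bounds(7, 3, 2))
```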
implemented larray_nan_equal to test whether two arrays are identical even in the presence of nan values. Two arrays are considered identical by larray_equal if they have exactly the same axes and data. However, since a nan value has the odd property of not being equal to itself, larray_equal returns False if either array contains a nan value. larray_nan_equal returns True if all non-nan data is equal and both arrays have nans at the same place.
>>> arr1 = ndtest((2, 3), dtype=float)
>>> arr1['a1', 'b1'] = nan
>>> arr1
a\b |  b0 |  b1 |  b2
 a0 | 0.0 | 1.0 | 2.0
 a1 | 3.0 | nan | 5.0
>>> arr2 = arr1.copy()
>>> arr2
a\b |  b0 |  b1 |  b2
 a0 | 0.0 | 1.0 | 2.0
 a1 | 3.0 | nan | 5.0
>>> larray_equal(arr1, arr2)
False
>>> larray_nan_equal(arr1, arr2)
True
>>> arr2['b1'] = 0.0
>>> larray_nan_equal(arr1, arr2)
False
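The "nan is not equal to itself" property, and the way a nan-aware comparison works around it, can be reproduced with plain Python floats. The two helpers below are hypothetical and compare flat lists of floats element by element; they ignore axes entirely and are only meant to illustrate the comparison rule described above.

```python
import math


def plain_equal(a, b):
    """Element-wise equality: any nan makes the result False."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


def nan_equal(a, b):
    """Element-wise equality where two nans at the same place count as equal."""
    if len(a) != len(b):
        return False
    return all(x == y or (math.isnan(x) and math.isnan(y))
               for x, y in zip(a, b))


nan = float('nan')
data1 = [0.0, 1.0, 2.0, 3.0, nan, 5.0]
data2 = list(data1)
data3 = [0.0, 1.0, 2.0, 3.0, 0.0, 5.0]
print(plain_equal(data1, data2), nan_equal(data1, data2), nan_equal(data1, data3))
```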
viewer: make keyboard shortcuts work even when the focus is not on the array editor widget. It means that, for example, plotting an array (via Ctrl-P) or opening it in Excel (Ctrl-E) can be done directly even when interacting with the list of arrays or within the interactive console (closes issue 102).
viewer: automatically display plots done in the viewer console in a separate window (see example below), unless “%matplotlib inline” is used.
>>> arr = ndtest((3, 3))
>>> arr.plot()
viewer: when calling view(an_array) from within the viewer, the new window opened does not block the initial window, which means you can have several windows open at the same time. view() without argument can still result in odd behavior though.
improved LArray.set_labels to make it possible to replace only some labels of an axis, instead of all of them and to replace labels from several axes at the same time.
>>> a = ndrange('nat=BE,FO;sex=M,F')
>>> a
nat\sex | M | F
     BE | 0 | 1
     FO | 2 | 3
to replace only some labels, one must give a mapping giving the new label for each label to replace
>>> a.set_labels(x.sex, {'M': 'Men'})
nat\sex | Men | F
     BE |   0 | 1
     FO |   2 | 3
to replace labels for several axes at the same time, one should give a mapping giving the new labels for each changed axis
>>> a.set_labels({'sex': 'Men,Women', 'nat': 'Belgian,Foreigner'})
  nat\sex | Men | Women
  Belgian |   0 |     1
Foreigner |   2 |     3
one can also replace some labels in several axes by giving a mapping of mappings
>>> a.set_labels({'sex': {'M': 'Men'}, 'nat': {'BE': 'Belgian'}})
nat\sex | Men | F
Belgian |   0 | 1
     FO |   2 | 3
allowed matrix multiplication (@ operator) between arrays with dimension != 2 (closes issue 122).
improved LArray.plot to get nicer plots by default. The axes are transposed compared to what they used to be, because the last axis is often used for time series. Also, it considers a 1D array as a single series, not N series of 1 point.
added installation instructions (closes issue 101).
Axis.group and Axis.all are now deprecated (closes issue 148).
>>> city.group(['London', 'Brussels'], name='capitals')
>>> # should be written as:
>>> city[['London', 'Brussels']] >> 'capitals'
and
>>> city.all()
>>> # should be written as:
>>> city[:] >> 'all'
viewer: allow changing the number of displayed digits even for integer arrays as that makes sense when using scientific notation (closes issue 100).
viewer: fixed opening a viewer via view() edit() or compare() from within the viewer (closes issue 109)
viewer: fixed compare() colors when arrays have values which are very close but not exactly equal (closes issue 123)
viewer: fixed legend when plotting arbitrary rows (it always displayed the labels of the first rows) (closes issue 136).
viewer: fixed labels on the x axis when zooming on a plot (closes issue 143)
viewer: fixed storing an array in a variable with a name which existed previously but which was not displayable in the viewer, such as the name of any function or special object. In some cases, this error led to a crash of the viewer. For example, this code failed when run in the viewer console, because x is already defined (for the x. syntax):
>>> x = ndtest(3)
fixed indexing an array using a positional group with a position which corresponds to a label on that axis. This used to return the wrong data (the data corresponding to the position as if it was the key).
>>> a = Axis('a', '1..3')
>>> arr = ndrange(a)
>>> arr
a | 1 | 2 | 3
  | 0 | 1 | 2
>>> # this used to return 0 !
>>> arr[a.i[1]]
1
fixed == for positional groups (closes issue 93)
>>> years = Axis('years', '1995..1997')
>>> years
Axis('years', [1995, 1996, 1997])
>>> # this used to return False
>>> years.i[0] == 1995
True
fixed using positional groups for their value in many cases (slice bounds, within list of values, within other groups, etc.). For example, this used to fail:
>>> arr = ndtest((2, 4))
>>> arr
a\b | b0 | b1 | b2 | b3
 a0 |  0 |  1 |  2 |  3
 a1 |  4 |  5 |  6 |  7
>>> b = arr.b
>>> start = b.i[0]  # equivalent to start = 'b0'
>>> stop = b.i[2]  # equivalent to stop = 'b2'
>>> arr[start:stop]
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  2
 a1 |  4 |  5 |  6
>>> arr[[b.i[0], b.i[2]]]
a\b | b0 | b2
 a0 |  0 |  2
 a1 |  4 |  6
fixed posargsort labels (closes issue 137).
fixed labels when doing group aggregates using positional groups. Previously, it used the positions as labels. This was most visible when using the Group.by() method (which creates positional groups).
>>> years = Axis('years', '2010..2015')
>>> arr = ndrange(years)
>>> arr
years | 2010 | 2011 | 2012 | 2013 | 2014 | 2015
      |    0 |    1 |    2 |    3 |    4 |    5
>>> arr.sum(years.by(3))
years | 2010:2012 | 2013:2015
      |         3 |        12
While this used to return:
>>> arr.sum(years.by(3))
years | 0:3 | 3:6
      |   3 |  12
fixed Group.by() when the group was a slice with either bound unspecified. For example, years[2010:2015].by(3) worked but years[:].by(3), years[2010:].by(3) and years[:2015].by(3) did not.
fixed a speed regression in version 0.18 and later versions compared to 0.17. In some cases, it was up to 40% slower than it should have been (closes issue 165).
Version 0.20¶
Released on 2017-02-09.
To make sure all users have all optional dependencies installed and use the same version of packages, and to simplify the update process, we created a new “larrayenv” package which will install larray itself AND all its dependencies (including the optional ones). This means that this version needs to be installed using:
conda install larrayenv
in the future, to update from one version to the next, it should always be enough to do:
conda update larrayenv
implemented from_lists() to create constant arrays (instead of using LArray directly as that is very error prone). We are not really happy with its name though, so it might change in the future. Any suggestion of a better name is very welcome (closes issue 30).
>>> from_lists([['sex\\year', 1991, 1992, 1993],
...             [        'M',    0,    1,    2],
...             [        'F',    3,    4,    5]])
sex\year | 1991 | 1992 | 1993
       M |    0 |    1 |    2
       F |    3 |    4 |    5
added support for loading sparse arrays via open_excel().
For example, assuming you have a sheet like this:
age | sex\year | 2015 | 2016
 10 |        F |  0.0 |  1.0
 10 |        M |  2.0 |  3.0
 20 |        M |  4.0 |  5.0
loading it will yield:
>>> wb = open_excel('test_sparse.xlsx')
>>> arr = wb['Sheet1'].load()
>>> arr
age | sex\year | 2015 | 2016
 10 |        F |  0.0 |  1.0
 10 |        M |  2.0 |  3.0
 20 |        F |  nan |  nan
 20 |        M |  4.0 |  5.0
allowed to get an axis from an array by using array.axis_name in addition to array.axes.axis_name:
>>> arr = ndtest((2, 3))
>>> arr.axes
AxisCollection([
    Axis('a', ['a0', 'a1']),
    Axis('b', ['b0', 'b1', 'b2'])
])
>>> arr.a
Axis('a', ['a0', 'a1'])
viewer: several rows/columns can be plotted together. It draws a separate line for each row except if only one column has been selected.
viewer: the array labels are used as “ticks” in plots.
‘_by’ aggregation methods accept groups in addition to axes (closes issue 59). They keep only the mentioned groups and aggregate all other dimensions:
>>> arr = ndtest((2, 3, 4))
>>> arr
 a | b\c | c0 | c1 | c2 | c3
a0 |  b0 |  0 |  1 |  2 |  3
a0 |  b1 |  4 |  5 |  6 |  7
a0 |  b2 |  8 |  9 | 10 | 11
a1 |  b0 | 12 | 13 | 14 | 15
a1 |  b1 | 16 | 17 | 18 | 19
a1 |  b2 | 20 | 21 | 22 | 23
>>> arr.sum_by('c0,c1;c1:c3')
c | c0,c1 | c1:c3
  |   126 |   216
viewer: view() and edit() now accept as argument a path to a file containing arrays.
>>> view('myfile.h5')
this is a shortcut for:
>>> view(Session('myfile.h5'))
AxisCollection.without now accepts a single integer position (to exclude an axis by position).
>>> a = ndtest((2, 3))
>>> a.axes
AxisCollection([
    Axis('a', ['a0', 'a1']),
    Axis('b', ['b0', 'b1', 'b2'])
])
>>> a.axes.without(0)
AxisCollection([
    Axis('b', ['b0', 'b1', 'b2'])
])
nicer display (repr) for LSet (closes issue 44).
>>> x.b['b0,b2'].set()
x.b['b0', 'b2'].set()
implemented sep argument for LArray & AxisCollection.combine_axes() to allow using a custom delimiter (closes issue 53).
added a check that ipfp target sums have the expected axes (closes issue 42).
when the nb_index argument is not provided explicitly in read_excel(engine=’xlrd’), it is autodetected from the position of the first “” (closes issue 66).
allow any special character except “.” and whitespace when creating axes labels using “..” syntax (previously only _ was allowed).
added many more I/O tests to hopefully lower our regression rate in the future (closes issue 70).
viewer: selection of entire rows/columns will load any remaining data, if any (closes issue 37). Previously if you selected entire rows or columns of a large dataset (which is not loaded entirely from the start), it only selected (and thus copied/plotted) the part of the data which was already loaded.
viewer: filtering on anonymous axes is now possible (closes issue 33).
fixed loading sparse files using read_excel() (fixes issue 29).
fixed nb_index argument for read_excel().
fixed creating range axes with a negative start bound using string notation (e.g. Axis(‘name’, ‘-1..10’)) (fixes issue 51).
fixed ptp() function.
fixed with_axes() to copy the title of the array.
fixed Group >> ‘name’.
fixed workbook[sheet_position] when using open_excel().
fixed plotting in the viewer when using Qt4.
Version 0.19¶
Released on 2017-01-19.
Implemented a “by” variant to all aggregate methods (e.g. sum_by, mean_by, etc.). These methods aggregate all axes except those listed, which means the only axes remaining after the aggregate operation will be those listed. For example: arr.sum_by(x.a) is equivalent to arr.sum(arr.axes - x.a)
>>> arr = ndtest((2, 3, 4))
>>> arr
 a | b\c | c0 | c1 | c2 | c3
a0 |  b0 |  0 |  1 |  2 |  3
a0 |  b1 |  4 |  5 |  6 |  7
a0 |  b2 |  8 |  9 | 10 | 11
a1 |  b0 | 12 | 13 | 14 | 15
a1 |  b1 | 16 | 17 | 18 | 19
a1 |  b2 | 20 | 21 | 22 | 23
>>> arr.sum_by(x.b)
b | b0 | b1 |  b2
  | 60 | 92 | 124
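The relationship between a “by” aggregate and a regular aggregate can be illustrated without LArray itself. The sketch below is plain Python and is not the library's implementation: cells, axis_names, axis_labels and the helper sum_by are hypothetical names which model the 2 x 3 x 4 test array as a dictionary keyed by label tuples.

```python
from itertools import product

# Hypothetical stand-in for ndtest((2, 3, 4)): one value per label combination,
# numbered in row-major order (the last axis varies fastest).
axis_names = ['a', 'b', 'c']
axis_labels = [['a0', 'a1'], ['b0', 'b1', 'b2'], ['c0', 'c1', 'c2', 'c3']]
cells = {labels: value for value, labels in enumerate(product(*axis_labels))}


def sum_by(cells, axis_names, kept):
    """Sum over every axis except those listed in `kept`."""
    positions = [axis_names.index(name) for name in kept]
    totals = {}
    for labels, value in cells.items():
        # Only the labels of the kept axes identify the output cell.
        key = tuple(labels[pos] for pos in positions)
        totals[key] = totals.get(key, 0) + value
    return totals


by_b = sum_by(cells, axis_names, ['b'])
print(by_b)
```

Keeping only the b axis gives the same three totals as the arr.sum_by(x.b) example above.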
Added .extend() method to Axis class
>>> a = Axis('a', 'a0..a2')
>>> a
Axis('a', ['a0', 'a1', 'a2'])
>>> other = Axis('other', 'a3..a5')
>>> a.extend(other)
Axis('a', ['a0', 'a1', 'a2', 'a3', 'a4', 'a5'])
or directly specify the extra labels as a list or as a “label string”:
>>> a.extend('a3..a5')
Axis('a', ['a0', 'a1', 'a2', 'a3', 'a4', 'a5'])
Added title argument to all array creation functions (ndrange, zeros, ones, …) and display it in the .info of array objects.
>>> a = ndrange(3, title='a simple test array')
>>> a.info
a simple test array
3
 {0}* [3]: 0 1 2
implemented creating an Axis using a group:
>>> arr = ndtest((2, 3))
>>> arr
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  2
 a1 |  3 |  4 |  5
>>> a, b = arr.axes
>>> zeros((a, b[:'b1']))
a\b |  b0 |  b1
 a0 | 0.0 | 0.0
 a1 | 0.0 | 0.0
made Axis.startswith, .endswith and .matches accept Group instances
>>> a = Axis('a', 'a0..b2')
>>> a
Axis('a', ['a0', 'a1', 'a2', 'b0', 'b1', 'b2'])
>>> prefix = Axis('prefix', 'a,b')
>>> a.startswith(prefix['a'])
a['a0', 'a1', 'a2']
>>> a.startswith(prefix.i[1])
a['b0', 'b1', 'b2']
implemented all usual binary operations (+, -, *, /, …) on Group
>>> year = Axis('year', '2011..2016')
>>> year[2013] + 1
2014
>>> year.i[2] + 1
2014
made the viewer much more useful as a debugger in the middle of a function by generalizing SessionEditor to handle any mapping instead of only Session objects, while still listing and displaying only array objects. To view the value of a non-array variable, type its name in the console. Given those changes, view() will superficially behave as before, but behind the scenes, all variables which were defined in the scope where view() was called will be available in the viewer console, even though they will not appear in the list on the left. This means that the viewer console will be able to use scalars defined at that point and call other functions of your code. In other words, you are more likely to be able to execute some code from the function calling view() by simply copy-pasting the code line.
LGroup lost set-like operations (intersection and union) in favor of a specific subclass (LSet). In other words, this no longer works:
>>> letters = Axis('letters', 'a..z')
>>> letters[':c'] & letters['b:']
To make it work, we need to convert the LGroup(s) to LSets explicitly:
>>> letters[':c'].set() & letters['b:d'].set()
letters.set[OrderedSet(['b', 'c'])]
>>> letters[':c'].set() | letters['b:d'].set()
letters.set[OrderedSet(['a', 'b', 'c', 'd'])]
>>> letters[':c'].set() - 'b'
letters.set[OrderedSet(['a', 'c'])]
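The three results above behave like ordered-set operations on the underlying labels: duplicates are dropped and the order of first appearance is kept. A rough plain-Python sketch of that behaviour follows. The helper names are hypothetical and this is not how LSet is implemented.

```python
def ordered_union(first, second):
    """Labels of both groups, without duplicates, in order of first appearance."""
    return list(dict.fromkeys([*first, *second]))


def ordered_intersection(first, second):
    """Labels of `first` which are also present in `second`."""
    members = set(second)
    return [label for label in first if label in members]


def ordered_difference(first, second):
    """Labels of `first` which are absent from `second`."""
    members = set(second)
    return [label for label in first if label not in members]


up_to_c = ['a', 'b', 'c']   # labels selected by letters[':c']
b_to_d = ['b', 'c', 'd']    # labels selected by letters['b:d']

print(ordered_intersection(up_to_c, b_to_d))
print(ordered_union(up_to_c, b_to_d))
print(ordered_difference(up_to_c, ['b']))
```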
group aggregates produce simple string labels for the new aggregated axis instead of using the groups themselves as labels. This means one can no longer know where a group comes from but this simplifies the code and fixes a few issues, most notably export of aggregated arrays to Excel, and some operations between two aggregated arrays.
>>> arr = ndtest((3, 4))
>>> arr
a\b | b0 | b1 | b2 | b3
 a0 |  0 |  1 |  2 |  3
 a1 |  4 |  5 |  6 |  7
 a2 |  8 |  9 | 10 | 11
>>> agg = arr.sum(':b2 >> tob2;b2,b3 >> other')
>>> agg
a\b | tob2 | other
 a0 |    3 |     5
 a1 |   15 |    13
 a2 |   27 |    21
>>> agg.info
3 x 2
 a [3]: 'a0' 'a1' 'a2'
 b [2]: 'tob2' 'other'
>>> agg.axes.b.labels[0]
'tob2'
In previous versions this would have returned:
>>> agg.axes.b.labels[0]
LGroup(':b2', name='tob2', axis=Axis('b', ['b0', 'b1', 'b2', 'b3']))
a string containing only a single “integer-like” is no longer transformed to an integer e.g. “10” will evaluate to (the string) “10” (like in version 0.17 and earlier) while “10,20” will evaluate to the list of integers: [10, 20]
changed how Group instances are displayed.
>>> a = Axis('a', 'a0..a2')
>>> a['a1,a2']
a['a1', 'a2']
fixed > and >= on Group using slices
avoid a division by 0 warning when using divnot0
viewer: fixed plots when Qt5 is installed. This also removes the matplotlib warning people got when running the viewer with Qt5 installed.
viewer: display array when typing its name in the console even when no array was selected previously
misc code cleanup, improved docstrings, …
Version 0.18¶
Released on 2016-12-20.
the documentation (docstrings) of many functions was vastly improved (thanks to Alix)
implemented a new optional syntax to generate sequences of labels for axes by using patterns
integer strings generate integers
>>> ndrange('age=0..10')
age | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
you can combine letters and numbers. The number part is treated like increasing (or decreasing) numbers
>>> ndrange('lipro=P01..P12')
lipro | P01 | P02 | P03 | P04 | P05 | P06 | P07 | P08 | P09 | P10 | P11 | P12
      |   0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 |   8 |   9 |  10 |  11
letter patterns generate all combinations of letters between the start and end:
>>> ndrange('test=AA..CC')
test | AA | AB | AC | BA | BB | BC | CA | CB | CC
     |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8
other characters are left intact (and should be the same in the start and end patterns):
>>> ndrange('test=A_1..C_2')
test | A_1 | A_2 | B_1 | B_2 | C_1 | C_2
     |   0 |   1 |   2 |   3 |   4 |   5
this also works within Axis()
>>> Axis('age', '0..10')
Axis('age', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
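To make the pattern rules more concrete, here is a simplified plain-Python sketch which reproduces the four examples above. It is an illustration only, not LArray's parser: expand_labels and TOKEN are hypothetical names, and the sketch assumes increasing ranges and non-negative numbers, so decreasing or negative bounds are not handled.

```python
import re
from itertools import product

# One token per run of digits, single ASCII letter, or other single character.
TOKEN = re.compile(r'[0-9]+|[A-Za-z]|[^A-Za-z0-9]')


def expand_labels(pattern):
    """Expand a 'start..stop' pattern into a list of labels (simplified)."""
    start, stop = pattern.split('..')
    start_tokens = TOKEN.findall(start)
    stop_tokens = TOKEN.findall(stop)
    if len(start_tokens) != len(stop_tokens):
        raise ValueError('start and stop patterns must have the same structure')
    choices = []
    for low, high in zip(start_tokens, stop_tokens):
        if low.isdigit() and high.isdigit():
            # Keep zero padding when the start bound is padded (e.g. '01').
            width = len(low) if len(low) > 1 and low.startswith('0') else 0
            choices.append([str(n).zfill(width)
                            for n in range(int(low), int(high) + 1)])
        elif low.isalpha() and high.isalpha():
            # Each letter position is its own range of letters.
            choices.append([chr(code) for code in range(ord(low), ord(high) + 1)])
        elif low == high:
            # Other characters are left intact.
            choices.append([low])
        else:
            raise ValueError(f'incompatible pattern parts: {low!r} and {high!r}')
    labels = [''.join(parts) for parts in product(*choices)]
    # A pattern made of a single number yields integers rather than strings.
    if len(start_tokens) == 1 and start_tokens[0].isdigit():
        return [int(label) for label in labels]
    return labels


print(expand_labels('0..10'))
print(expand_labels('P01..P12'))
print(expand_labels('AA..CC'))
print(expand_labels('A_1..C_2'))
```

Treating each run of digits as one number and each letter as its own position is what makes 'P01..P12' produce 12 labels while 'AA..CC' produces all 9 letter combinations.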
implemented new syntax for defining groups using strings:
>>> arr = ndtest((3, 4))
>>> arr
a\b | b0 | b1 | b2 | b3
 a0 |  0 |  1 |  2 |  3
 a1 |  4 |  5 |  6 |  7
 a2 |  8 |  9 | 10 | 11
groups can be named using “>>” instead of “=” previously
>>> arr.sum('b1,b3 >> b13;b0:b2 >> b012')
a\b | b13 | b012
 a0 |   4 |    3
 a1 |  12 |   15
 a2 |  20 |   27
if some labels are ambiguous, one can specify the axis by using “axis_name[labels]”:
>>> arr.sum('b[b1,b3] >> b13;b[b0:b2] >> b012')
a\b | b13 | b012
 a0 |   4 |    3
 a1 |  12 |   15
 a2 |  20 |   27
groups can also be defined by position using this syntax:
>>> arr.sum('b.i[1,3] >> b13;b.i[0:3] >> b012')
a\b | b13 | b012
 a0 |   4 |    3
 a1 |  12 |   15
 a2 |  20 |   27
A few notes:
the goal was to have that syntax as close to the “normal” syntax as possible (just remove the “x.” and all inner quotes).
in models, the normal syntax should be preferred, so that the groups can be stored in a variable and reused in several places
strings representing integers are evaluated as integers.
there is experimental support for evaluating expressions within string groups by using “{expr}”, but this is fragile and might be removed in the future.
implemented combine_axes & split_axis on arrays:
>>> arr = ndtest((2, 3, 4))
>>> arr
 a | b\c | c0 | c1 | c2 | c3
a0 |  b0 |  0 |  1 |  2 |  3
a0 |  b1 |  4 |  5 |  6 |  7
a0 |  b2 |  8 |  9 | 10 | 11
a1 |  b0 | 12 | 13 | 14 | 15
a1 |  b1 | 16 | 17 | 18 | 19
a1 |  b2 | 20 | 21 | 22 | 23
>>> arr2 = arr.combine_axes((x.a, x.b))
>>> arr2
a_b\c | c0 | c1 | c2 | c3
a0_b0 |  0 |  1 |  2 |  3
a0_b1 |  4 |  5 |  6 |  7
a0_b2 |  8 |  9 | 10 | 11
a1_b0 | 12 | 13 | 14 | 15
a1_b1 | 16 | 17 | 18 | 19
a1_b2 | 20 | 21 | 22 | 23
>>> arr2.split_axis(x.a_b)
 a | b\c | c0 | c1 | c2 | c3
a0 |  b0 |  0 |  1 |  2 |  3
a0 |  b1 |  4 |  5 |  6 |  7
a0 |  b2 |  8 |  9 | 10 | 11
a1 |  b0 | 12 | 13 | 14 | 15
a1 |  b1 | 16 | 17 | 18 | 19
a1 |  b2 | 20 | 21 | 22 | 23
implemented .by() method on groups which splits them into subgroups of specified length
>>> arr = ndtest((5, 2))
>>> arr
a\b | b0 | b1
 a0 |  0 |  1
 a1 |  2 |  3
 a2 |  4 |  5
 a3 |  6 |  7
 a4 |  8 |  9
>>> arr.sum(a['a0':'a4'].by(2))
         a\b | b0 | b1
a['a0' 'a1'] |  2 |  4
a['a2' 'a3'] | 10 | 12
     a['a4'] |  8 |  9
there is also an optional second argument to specify the “step” between groups
>>> arr.sum(a['a0':'a4'].by(2, step=3))
         a\b | b0 | b1
a['a0' 'a1'] |  2 |  4
a['a3' 'a4'] | 14 | 16
if the step is < the group size, you get overlapping groups:
>>> arr.sum(a['a0':'a4'].by(2, step=1))
         a\b | b0 | b1
a['a0' 'a1'] |  2 |  4
a['a1' 'a2'] |  6 |  8
a['a2' 'a3'] | 10 | 12
a['a3' 'a4'] | 14 | 16
     a['a4'] |  8 |  9
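The way the length and step arguments interact can be sketched with ordinary list slicing. The helper below is hypothetical (split_by is not an LArray function) and only mimics the grouping of labels shown in the three examples above, not the aggregation itself.

```python
def split_by(labels, length, step=None):
    """Split labels into subgroups of `length`, starting a new one every `step`."""
    if step is None:
        # Without an explicit step, subgroups follow each other without overlap.
        step = length
    return [labels[start:start + length] for start in range(0, len(labels), step)]


labels = ['a0', 'a1', 'a2', 'a3', 'a4']
print(split_by(labels, 2))           # consecutive groups, the last one is shorter
print(split_by(labels, 2, step=3))   # 'a2' is skipped
print(split_by(labels, 2, step=1))   # overlapping groups
```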
groups can be renamed using >> (in addition to the “named” method)
>>> arr = ndtest((2, 3))
>>> arr
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  2
 a1 |  3 |  4 |  5
>>> arr.sum((x.b['b0,b1'] >> 'b01', x.b['b1,b2'] >> 'b12'))
a\b | b01 | b12
 a0 |   1 |   3
 a1 |   7 |   9
implemented rationot0
>>> a = Axis('a', 'a0,a1')
>>> b = Axis('b', 'b0,b1,b2')
>>> arr = LArray([[6, 0, 2],
...               [4, 0, 8]], [a, b])
>>> arr
a\b | b0 | b1 | b2
 a0 |  6 |  0 |  2
 a1 |  4 |  0 |  8
>>> arr.sum()
20
>>> arr.rationot0()
a\b |  b0 |  b1 |  b2
 a0 | 0.3 | 0.0 | 0.1
 a1 | 0.2 | 0.0 | 0.4
>>> arr.rationot0(x.a)
a\b |  b0 |  b1 |  b2
 a0 | 0.6 | 0.0 | 0.2
 a1 | 0.4 | 0.0 | 0.8
for reference, the normal ratio method would return:
>>> arr.ratio(x.a)
a\b |  b0 |  b1 |  b2
 a0 | 0.6 | nan | 0.2
 a1 | 0.4 | nan | 0.8
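The difference between the two methods comes down to what happens when the total used as denominator is zero. The sketch below shows the idea on plain nested lists. The function names are hypothetical stand-ins and this is not the library's code: each cell is divided by the total of its column (the sum along the first axis), and a zero total yields 0 instead of nan.

```python
def div_not0(numerator, denominator):
    """Divide, returning 0.0 instead of failing when the denominator is 0."""
    return numerator / denominator if denominator != 0 else 0.0


def ratio_not0_along_rows(table):
    """Share of each cell in its column total, with 0.0 for all-zero columns."""
    column_totals = [sum(column) for column in zip(*table)]
    return [[div_not0(value, total) for value, total in zip(row, column_totals)]
            for row in table]


table = [[6, 0, 2],
         [4, 0, 8]]
shares = ratio_not0_along_rows(table)
print(shares)
```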
implemented [] on groups so that you can further subset them
added a new “condensed” option for ipfp’s display_progress argument to get back the old behavior
changed how named groups are displayed (only the name is displayed)
positional groups gained a few features and are almost on par with label groups now
when iterating over an axis (for example when doing “for y in year_axis:”), it yields groups (instead of raw labels) so that it works even in the presence of ambiguous labels.
Axis.startswith, endswith, matches create groups which include the axis (so that those groups work even if the labels exist on several axes)
fixed Session.summary() when arrays in the session have axes without name
fixed full() and full_like() with an explicit dtype (the dtype was ignored)
Version 0.17¶
Released on 2016-11-29.
added ndtest function to create n-dimensional test arrays (of given shape). Axes are named by single letters starting from ‘a’. Axes labels are constructed using a ‘{axis_name}{label_pos}’ pattern (e.g. ‘a0’).
>>> ndtest(6)
a | a0 | a1 | a2 | a3 | a4 | a5
  |  0 |  1 |  2 |  3 |  4 |  5
>>> ndtest((2, 3))
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  2
 a1 |  3 |  4 |  5
>>> ndtest((2, 3), label_start=1)
a\b | b1 | b2 | b3
 a1 |  0 |  1 |  2
 a2 |  3 |  4 |  5
allow naming “one-shot” groups in group aggregates.
>>> arr = ndtest((2, 3))
>>> arr
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  2
 a1 |  3 |  4 |  5
>>> arr.sum('g1=b0;g2=b1,b2;g3=b0:b2')
a\b | 'g1' ('b0') | 'g2' (['b1' 'b2']) | 'g3' ('b0':'b2')
 a0 |           0 |                  3 |                3
 a1 |           3 |                  9 |               12
implemented argmin, argmax, posargmin, posargmax without an axis argument (works on the full array).
>>> arr = ndtest((2, 3))
>>> arr
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  2
 a1 |  3 |  4 |  5
>>> arr.argmin()
('a0', 'b0')
added preliminary code to add a title attribute to LArray.
This needs a lot more work to be really useful though, as it can currently only be used in the LArray() function itself and is only used in Session.summary() (see below). There are many places where this should be used, but this is not done yet.
added Session.summary() which displays a list of all arrays, their dimension names and title if any.
This can be used in combination with local_arrays() to produce some kind of codebook with all the arrays of a function.
>>> arr = LArray([[1, 2], [3, 4]], 'sex=M,F;nat=BE,FO', title='a test array')
>>> arr
sex\nat | BE | FO
      M |  1 |  2
      F |  3 |  4
>>> s = Session({'arr': arr})
>>> s
Session(arr)
>>> print(s.summary())
arr: sex, nat
    a test array
fixed using groups from other (compatible) axis
fixed group aggregates using groups without axis
fixed axis[another_label_group] when said group had a non-string Axis
fixed axis.group(another_label_group, name=’a_name’) (name was not set correctly)
fixed ipfp progress message when progress is negative
when setting part of an array in the console (by using e.g. arr[‘M’] = 10), display that array
when typing in the console the name of an existing array, select it in the list
fixed missing tooltips for arrays added to the session from within the session viewer
fixed window title (with axes info) not updating in many cases
fixed the filters bar not being cleared when displaying a non-LArray object after an LArray object
improved messages in ipfp(display_progress=True)
improved tests, docstrings, …
Version 0.16.1¶
Released on 2016-11-04.
renamed “Ok” button in array/session viewer to “Close”.
added apply and discard buttons in session editor, which permanently apply or discard changes to the current array.
fixed array[sequence, scalar] = value
fixed array.to_excel() which was broken in 0.16 (by the upgrade to xlwings 0.9+).
improved a few tests
Version 0.16¶
Released on 2016-10-26.
Warning: this release needs to be installed using:
conda update larray
conda update xlwings
implemented support for xlwings 0.9+. This allowed us to change the way we interact with Excel:
by default, the Excel instance we use is configured to be both hidden and silent (for example, it does not prompt to update/edit links).
by default, we now use a dedicated Excel instance for each call to open_excel, instead of reusing any existing instance if there was any open. In practice, it means input/output from/to Excel is more reliable and does not risk altering any workbook you had open (except if you ask for that explicitly). The cost of this is that it is slower by default. If you open many different workbooks, it is recommended that you create a single Excel instance and reuse it. This can be done with:
>>> from larray import *
>>> import xlwings as xw
>>> app = xw.App(visible=False, add_book=False)
>>> wb1 = open_excel('workbook1.xlsx', app=app)
>>> # use wb1 as before
>>> wb1.close()
>>> wb2 = open_excel('workbook2.xlsx', app=app)
>>> # use wb2 as before
>>> wb2.close()
>>> app.quit()
added ipfp function which performs the Iterative Proportional Fitting Procedure (also known as bi-proportional fitting in statistics or RAS algorithm in economics). Note that this new function is currently not in the core module, so it needs a specific import command:
>>> from larray.ipfp import ipfp
>>> a = Axis('a', 2)
>>> b = Axis('b', 2)
>>> initial = LArray([[2, 1],
...                   [1, 2]], [a, b])
>>> initial
a*\b* | 0 | 1
    0 | 2 | 1
    1 | 1 | 2
>>> target_sum_along_a = LArray([2, 1], b)
>>> target_sum_along_b = LArray([1, 2], a)
>>> ipfp([target_sum_along_a, target_sum_along_b], initial, threshold=0.01)
a*\b* |                  0 |                   1
    0 | 0.8450704225352113 | 0.15492957746478875
    1 | 1.1538461538461537 |  0.8461538461538463
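For readers unfamiliar with the procedure, the following plain-Python sketch shows the general idea for a two-dimensional table: alternately rescale columns and rows so that their sums match the targets, until the remaining error is below a threshold. It is a textbook-style illustration under stated assumptions, not the code of larray.ipfp: the function name ipfp_2d, its max_iter argument and the exact stopping test are hypothetical, and all sums are assumed to be strictly positive.

```python
def ipfp_2d(initial, row_targets, col_targets, threshold=0.01, max_iter=100):
    """Alternately rescale columns and rows until the sums match the targets."""
    table = [[float(value) for value in row] for row in initial]
    for _ in range(max_iter):
        # Scale each column so that the column sums match col_targets.
        col_sums = [sum(column) for column in zip(*table)]
        table = [[value * target / total
                  for value, target, total in zip(row, col_targets, col_sums)]
                 for row in table]
        # Scale each row so that the row sums match row_targets.
        row_sums = [sum(row) for row in table]
        table = [[value * target / total for value in row]
                 for row, target, total in zip(table, row_targets, row_sums)]
        # Rows are now exact; stop once the columns are close enough too.
        col_sums = [sum(column) for column in zip(*table)]
        if max(abs(total - target)
               for total, target in zip(col_sums, col_targets)) < threshold:
            break
    return table


fitted = ipfp_2d([[2, 1], [1, 2]], row_targets=[1, 2], col_targets=[2, 1])
for row in fitted:
    print(row)
```

Here col_targets plays the role of target_sum_along_a (sums over the rows, one per column) and row_targets that of target_sum_along_b.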
made it possible to create arrays more succinctly in some usual cases (especially for quick arrays for testing purposes). Previously, when one created an array from scratch, one had to provide Axis object(s) (or another array). Note that the following examples use zeros() but this change affects all array creation functions (ones, zeros, ndrange, full, empty):
>>> nat = Axis('nat', ['BE', 'FO'])
>>> sex = Axis('sex', ['M', 'F'])
>>> zeros([nat, sex])
nat\sex |   M |   F
     BE | 0.0 | 0.0
     FO | 0.0 | 0.0
Now, when you have axis names and axis labels but do not have/want to reuse an existing axis, you can use this syntax:
>>> zeros([('nat', ['BE', 'FO']),
...        ('sex', ['M', 'F'])])
nat\sex |   M |   F
     BE | 0.0 | 0.0
     FO | 0.0 | 0.0
If additionally all axis names and labels are strings (not integers or other types) which do not contain any special character (“=”, “,” or “;”) you can use:
>>> zeros('nat=BE,FO;sex=M,F')
nat\sex |   M |   F
     BE | 0.0 | 0.0
     FO | 0.0 | 0.0
See below (*) for some more alternate syntaxes and an explanation of how this works.
added additional, less error-prone syntax for stack:
>>> nat = Axis('nat', 'BE,FO')
>>> arr1 = ones(nat)
>>> arr1
nat |  BE |  FO
    | 1.0 | 1.0
>>> arr2 = zeros(nat)
>>> arr2
nat |  BE |  FO
    | 0.0 | 0.0
>>> stack([('M', arr1), ('F', arr2)], 'sex')
nat\sex |   M |   F
     BE | 1.0 | 0.0
     FO | 1.0 | 0.0
in addition to the still supported but discouraged (because one has to remember the order of labels):
>>> sex = Axis('sex', ['M', 'F'])
>>> stack((arr1, arr2), sex)
nat\sex |   M |   F
     BE | 1.0 | 0.0
     FO | 1.0 | 0.0
added LArray.compact and Session.compact() to detect and remove “useless” axes (ie axes for which values are constant over the whole axis)
>>> a = LArray([[1, 2], [1, 2]], [Axis('sex', 'M,F'), Axis('nat', 'BE,FO')])
>>> a
sex\nat | BE | FO
      M |  1 |  2
      F |  1 |  2
>>> a.compact()
nat | BE | FO
    |  1 |  2
made Session keep the order in which arrays were added to it. The main goal was to make this work:
>>> b, a = s['b', 'a']
Previously, since sessions were always traversed alphabetically, this was a dangerous operation because if the keys (a and b) were not sorted alphabetically, the result would not be in the expected order:
s[‘b’, ‘a’] previously returned a, b instead of b, a !!
Session.names is still sorted alphabetically though (Session.keys() is not)
added LArray.with_axes(axes) to return a new LArray with the same data but different axes
>>> a = ndrange(2)
>>> a
{0}* | 0 | 1
     | 0 | 1
>>> a.with_axes([Axis('sex', 'H,F')])
sex | H | F
    | 0 | 1
changed width from which an LArray is summarized (using “…”) from 80 characters to 200.
implemented memory_used property which displays nbytes in human-readable form
>>> a = ndrange('sex=H,F;nat=BE,FO')
>>> a.memory_used
'16 bytes'
>>> a = ndrange(100000)
>>> a.memory_used
'390.62 Kb'
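The two results above are consistent with 4-byte integers (4 elements give 16 bytes and 100000 elements give 400000 bytes) and a conversion using multiples of 1024. That is an inference from the numbers shown, not a statement about the implementation. A hypothetical formatting helper along those lines could look like this:

```python
def human_size(nbytes):
    """Format a number of bytes using multiples of 1024 (illustrative only)."""
    units = ['bytes', 'Kb', 'Mb', 'Gb', 'Tb']
    size = nbytes
    index = 0
    # Move to the next unit while the value is too large for the current one.
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    if index == 0:
        return f'{size} bytes'
    return f'{size:.2f} {units[index]}'


print(human_size(4 * 4))        # 4 elements of 4 bytes each
print(human_size(100000 * 4))   # 100000 elements of 4 bytes each
```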
implemented Axis + AxisCollection
>>> a = ndrange('sex=M,F;type=t1,t2')
>>> Axis('nat', 'BE,FO') + a.axes
AxisCollection([
    Axis('nat', ['BE', 'FO']),
    Axis('sex', ['M', 'F']),
    Axis('type', ['t1', 't2'])
])
(*) For the curious, there are also many syntaxes supported for array creation functions. In fact, during array creation, at any place a list or tuple of values is expected, you can specify it using a single string, which will be split successively at the following characters if present: “;” then “=” then “,”. If you apply that algorithm to ‘nat=BE,FO;sex=M,F’, you get:
‘nat=BE,FO;sex=M,F’
(‘nat=BE,FO’, ‘sex=M,F’)
((‘nat’, ‘BE,FO’), (‘sex’, ‘M,F’))
((‘nat’, (‘BE’, ‘FO’)), (‘sex’, (‘M’, ‘F’)))
Recognise this last syntax? This is the same as above, except above we replaced some () with [] for clarity. In fact all the intermediate forms here above are valid (and equivalent) in array creation functions.
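The successive splitting described above can be written as a small recursive function. This is a sketch of the algorithm as explained in this note, with a hypothetical name (split_nested), and not the function LArray actually uses:

```python
def split_nested(text, separators=(';', '=', ',')):
    """Split `text` at each separator in turn, but only where it is present."""
    if not separators:
        return text
    first, remaining = separators[0], separators[1:]
    if first in text:
        # Split at this level, then handle each part with the next separators.
        return tuple(split_nested(part, remaining) for part in text.split(first))
    # Separator absent: keep the string whole and try the next separator.
    return split_nested(text, remaining)


print(split_nested('nat=BE,FO;sex=M,F'))
print(split_nested('BE,FO'))
```

Because a separator is only applied when present, a single label such as 'nat' stays a plain string instead of becoming a one-element tuple.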
Version 0.15¶
Released on 2016-09-23.
added new methods on axes: matches, startswith, endswith
>>> country = Axis('country', ['FR', 'BE', 'DE', 'BR'])
>>> country.matches('BE|FR')
LGroup(['FR', 'BE'])
>>> country.matches('^..$')  # labels 2 characters long
LGroup(['FR', 'BE', 'DE', 'BR'])
>>> country.startswith('B')
LGroup(['BE', 'BR'])
>>> country.endswith('R')
LGroup(['FR', 'BR'])
implemented set-like operations on LGroup: & (intersection), | (union), - (difference). Slice groups do not work yet on axes references (x.) but that will come in the future…
>>> alpha = Axis('alpha', 'a,b,c,d')
>>> alpha['a', 'b'] | alpha['c', 'd']
LGroup(['a', 'b', 'c', 'd'], axis=…)
>>> alpha['a', 'b', 'c'] | alpha['c', 'd']
LGroup(['a', 'b', 'c', 'd'], axis=…)
a name is computed automatically when both operands are named
>>> r = alpha['a', 'b'].named('ab') | alpha['c', 'd'].named('cd')
>>> r.name
'ab | cd'
>>> r.key
['a', 'b', 'c', 'd']
numeric axes work too
>>> num = Axis('num', range(10))
>>> num[:2] | num[8:]
num[0, 1, 2, 8, 9]
>>> num[:2] | num[5]
num[0, 1, 2, 5]
intersection
>>> LGroup(['a', 'b', 'c']) & LGroup(['c', 'd'])
LGroup(['c'])
difference
>>> LGroup(['a', 'b', 'c']) - LGroup(['c', 'd'])
LGroup(['a', 'b'])
>>> LGroup(['a', 'b', 'c']) - 'b'
LGroup(['a', 'c'])
fixed loading 1D arrays using open_excel
added tooltip with the axes labels corresponding to each cell of the array viewer
added name and dimensions of the current array to the window title bar in the session viewer
added tooltip with each array .info() in the list of arrays of the session viewer
fixed eval box throwing an exception when trying to set a new variable (if qtconsole is not present)
fixed group aggregates using LGroups defined using axes references (x.), for example:
>>> arr.sum(x.age[:10])
fixed group aggregates using anonymous axes
Version 0.14.1¶
Released on 2016-08-12.
fixed support for loading arrays without axis names from Excel files (in that case index_col/nb_index are necessary)
fixed using a single int for index_col in read_excel() and sheet.load()
fixed loading empty Excel sheets via xlwings correctly (ie do not crash)
fixed dumping a session loaded from an H5 file to Excel
Version 0.14¶
Released on 2016-08-10.
This version is not compatible with the new version of xlwings that just came out. Consequently, upgrading to this version is different from the usual “conda update larray”. You should rather use:
conda update larray --no-update-deps
To get the most of this release, you should also install the “qtconsole” package via:
conda install qtconsole
upgraded session viewer/editor to work like a super-calculator. The input box below the array view can be used to type any expression. eg array1.sum(x.age) / array2, which will be displayed in the viewer. One can also type assignment commands, like: array3 = array1.sum(x.age) / array2 In which case, the new array will be displayed in the viewer AND added to the session (appear on the list on the left), so that you can use it in other expressions.
If you have the “qtconsole” package installed (see above), that input box will be a full ipython console. This means:
history of typed commands,
tab-completion (for example, type “nd<tab>” and it will change to “ndrange”),
syntax highlighting,
calltips (show the documentation of functions when typing commands using them),
help on functions using “?” (for example, type “ndrange?<enter>” to get the full documentation about ndrange; use <ESC> or <q> to quit that screen),
etc.
When having the “qtconsole” package installed, you might get a warning when starting the viewer:
WARNING:root:Message signing is disabled. This is insecure and not recommended!
This is totally harmless and can be safely ignored !
made view() and edit() without argument equivalent to view(local_arrays()) and edit(local_arrays()) respectively.
made the viewer on large arrays start a lot faster by using a small subset of the array to guess the number of decimals to display and whether or not to use scientific notation.
improved compare():
added support for comparing sessions. Arrays with differences between sessions are colored in red.
use a single array widget instead of 3. This is done by stacking arrays together to create a new dimension. This has the following advantages:
the filter and scrollbars are de-facto automatically synchronized.
any number of arrays can be compared, not just 2. All arrays are compared to the first one.
arrays with different sets of compatible axes can be compared (eg compare an array with its mean along an axis).
added label to show maximum absolute difference.
implemented edit(session) in addition to view(session).
added support for copying sheets via: wb[‘x’] = wb[‘y’]. If the ‘x’ sheet already exists, it is completely overwritten.
improved performance. My test models run about 10% faster than with 0.13.
made cumsum and cumprod aggregate on the last axis by default so that the axis does not need to be specified when there is only one.
implemented much better support for operations using arrays of different types. For example,
fixed create_sequential when mult, inc and initial are of different types eg create_sequential(…, initial=1, inc=0.1) had an unexpected integer result because it always used the type of the initial value for the output
when appending a string label to an integer axis (eg adding total to an age axis by using with_total()), the resulting axis should have a mixed type, and not be suddenly all string.
stack() now supports arrays with different types.
made stack support arrays with different axes (the result has the union of all axes)
use xlwings (ie live Excel instance) by default for all Excel input/output, including read_excel(), session.dump and session.load/Session(filename). This has the advantage of more coherent results among the different ways to load/save data to Excel and that simple sessions correctly survive a round-trip to an .xlsx workbook (ie (named) axes are detected properly). However, given the very different library involved, we lose most options that read_excel used to provide (courtesy of pandas.read_excel) and some bugs were probably introduced in the conversion.
fixed creating a new file via open_excel()
fixed loading 1D arrays (ranges with height 1 or width 1) via open_excel()
fixed sheet[‘A1’] = array in some cases
wb.close() only really closes the workbook if it was not already open in Excel when open_excel was called (so that we do not close a workbook a user is actually viewing).
added support for wb.save(filename), or actually for using any relative path, instead of a full absolute path.
when dumping a session to Excel, sort sheets alphabetically instead of dumping them in a “random” order.
try to convert float to int in more situations
added support for using stack() without providing an axis. It creates an anonymous wildcard axis of the correct length.
added aslarray() top-level function to translate anything into an LArray if it is not already one
made labels_array available via from larray import *
fixed binary operations between an array and an axis where the array appeared first (eg array > axis). Confusingly, axis < array already worked.
added check in “a[bool_larray_key]” to make sure key.axes are compatible with a.axes
made create_sequential a lot faster when mult or inc are constants
made axes without name compatible with any name (this is the equivalent of a wildcard name for labels)
misc cleanup/docstring improvements/improved tests/improved error messages
Version 0.13¶
Released on 2016-07-11.
implemented a new way to do input/output from/to Excel
>>> a = ndrange((2, 3))
>>> wb = open_excel('c:/tmp/y.xlsx')
>>> # put a at A1 in Sheet1, excluding headers (labels)
>>> wb['Sheet1'] = a
>>> # dump a at A1 in Sheet2, including headers (labels)
>>> wb['Sheet2'] = a.dump()
>>> # save the file to disk
>>> wb.save()
>>> # close it
>>> wb.close()
>>> wb = open_excel('c:/tmp/y.xlsx')
>>> # load a from the data starting at A1 in Sheet1, assuming the absence of headers.
>>> a1 = wb['Sheet1']
>>> # load a from the data starting at A1 in Sheet2, assuming the presence of (correctly formatted) headers.
>>> a2 = wb['Sheet2'].load()
>>> wb.close()
>>> wb = open_excel('c:/tmp/y.xlsx')
>>> # note that Sheet2 must exist
>>> sheet2 = wb['Sheet2']
>>> # write a without labels starting at C5
>>> sheet2['C5'] = a
>>> # write a with its labels starting at A10
>>> sheet2['A10'] = a.dump()
load an array with its axes information from a range. As you might have guessed, we could also use the sheet2 variable here
>>> b = wb['Sheet2']['A10:D12'].load()
>>> b
{0}*\{1}* | 0 | 1 | 2
        0 | 0 | 1 | 2
        1 | 3 | 4 | 5
load an array (raw data) with no axis information from a range.
>>> c = sheet2['B11:D12']
>>> # in fact, this is not really an LArray ...
>>> c
<larray.excel.Range at 0x1ff1bae22e8>
>>> # but it can be used as such (this is currently very experimental)
>>> c.sum(axis=0)
{0}* |   0 |   1 |   2
     | 3.0 | 5.0 | 7.0
>>> # ... and it can be used for other stuff, like setting the formula instead of the value:
>>> c.formula = '=D10+1'
>>> # in the future, we should also be able to set font name, size, style, etc.
implemented LArray.rename({axis: new_name}) as well as using kwargs to rename several axes at once
>>> nat = Axis('nat', ['BE', 'FO'])
>>> sex = Axis('sex', ['M', 'F'])
>>> a = ndrange([nat, sex])
>>> a
nat\sex | M | F
     BE | 0 | 1
     FO | 2 | 3
>>> a.rename(nat='nat2', sex='gender')
nat2\gender | M | F
         BE | 0 | 1
         FO | 2 | 3
>>> a.rename({'nat': 'nat2', 'sex': 'gender'})
nat2\gender | M | F
         BE | 0 | 1
         FO | 2 | 3
made tab-completion of axes names possible in an interactive console
taking a subset of an array with wildcard axes now returns an array with wildcard axes
fixed a case where wildcard axes were considered incompatible when they actually were compatible
better support for anonymous axes
fix for obscure bugs, better doctests, cleaner implementation for a few functions, …
Version 0.12¶
Released on 2016-06-21.
implemented boolean indexing by using axes objects:
>>> sex = Axis('sex', 'M,F')
>>> age = Axis('age', range(5))
>>> a = ndrange((sex, age))
>>> a
sex\age | 0 | 1 | 2 | 3 | 4
      M | 0 | 1 | 2 | 3 | 4
      F | 5 | 6 | 7 | 8 | 9
>>> a[age < 3]
sex\age | 0 | 1 | 2
      M | 0 | 1 | 2
      F | 5 | 6 | 7
This new syntax is equivalent to (but currently much slower than):
>>> a[age[:2]]
sex\age | 0 | 1 | 2
      M | 0 | 1 | 2
      F | 5 | 6 | 7
However, the power of this new syntax comes from the fact that you are not limited to scalar constants
>>> age_limit = LArray([2, 3], sex)
>>> age_limit
sex | M | F
    | 2 | 3
>>> a[age < age_limit]
sex,age | M,0 | M,1 | F,0 | F,1 | F,2
        |   0 |   1 |   5 |   6 |   7
Notice that the concerned axes are merged, so you cannot do as much with them. For example, a[age < age_limit].sum(x.age) would not work since there is no “age” axis anymore.
To keep axes intact, one can often set the values of the corresponding cells to 0 or nan instead.
>>> a[age < age_limit] = 0
>>> a
sex\age | 0 | 1 | 2 | 3 | 4
      M | 0 | 0 | 2 | 3 | 4
      F | 0 | 0 | 0 | 8 | 9
>>> # in this case, the sum *is* valid (but the mean would not -- one should use nan for that)
>>> a.sum(x.age)
sex | M |  F
    | 9 | 17
To keep axes intact, this idiom is also often useful:
>>> b = a * (age >= age_limit)
>>> b
sex\age | 0 | 1 | 2 | 3 | 4
      M | 0 | 0 | 2 | 3 | 4
      F | 0 | 0 | 0 | 8 | 9
This also works with axes references (x.axis_name), though this is experimental and the filter value is only computed as late as possible (during []), so you cannot display it before that, like you can with “real” axes.
Using “real” axes:
>>> filter1 = age < age_limit
>>> filter1
age\sex |     M |     F
      0 |  True |  True
      1 |  True |  True
      2 | False |  True
      3 | False | False
      4 | False | False
>>> a[filter1]
sex,age | M,0 | M,1 | F,0 | F,1 | F,2
        |   0 |   1 |   5 |   6 |   7
With axes references:
>>> filter2 = x.age < age_limit
>>> filter2
<larray.core.BinaryOp at 0x1332ae3b588>
>>> a[filter2]
sex,age | M,0 | M,1 | F,0 | F,1 | F,2
        |   0 |   1 |   5 |   6 |   7
>>> a * ~filter2
sex\age | 0 | 1 | 2 | 3 | 4
      M | 0 | 0 | 2 | 3 | 4
      F | 0 | 0 | 0 | 8 | 9
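The masking idiom from this entry (multiplying by a boolean condition so that the axes stay intact) can be pictured without larray. The following is only a plain-Python sketch of the arithmetic, using dictionaries and lists as stand-ins for labelled arrays; it is not how larray implements it.

```python
# Stand-ins for the labelled data used in this entry (assumed layout).
ages = [0, 1, 2, 3, 4]
age_limit = {'M': 2, 'F': 3}
a = {'M': [0, 1, 2, 3, 4], 'F': [5, 6, 7, 8, 9]}

# Multiplying by a boolean keeps the cell (True == 1) or zeroes it (False == 0),
# so every (sex, age) cell still exists afterwards.
b = {sex: [value * (age >= age_limit[sex]) for age, value in zip(ages, row)]
     for sex, row in a.items()}

# Because the "age" dimension is intact, summing along it is still possible.
totals = {sex: sum(row) for sex, row in b.items()}
print(b)
print(totals)
```

The per-sex totals match the result of a.sum(x.age) shown earlier in this entry.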
implemented LArray.divnot0
>>> nat = Axis('nat', ['BE', 'FO'])
>>> sex = Axis('sex', ['M', 'F'])
>>> a = ndrange((nat, sex))
>>> a
nat\sex | M | F
     BE | 0 | 1
     FO | 2 | 3
>>> b = ndrange(sex)
>>> b
sex | M | F
    | 0 | 1
>>> a / b
nat\sex |   M |   F
     BE | nan | 1.0
     FO | inf | 3.0
>>> a.divnot0(b)
nat\sex |   M |   F
     BE | 0.0 | 1.0
     FO | 0.0 | 3.0
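The semantics shown above (divide where the divisor is nonzero, return 0 elsewhere) can be sketched with plain Python lists. The helper name divnot0_rows is hypothetical and this is an illustration of the rule, not larray's implementation.

```python
def divnot0_rows(rows, divisor):
    """Divide each row element-wise by divisor, giving 0.0 where the divisor is 0."""
    return [[value / d if d != 0 else 0.0 for value, d in zip(row, divisor)]
            for row in rows]

a = [[0, 1], [2, 3]]   # rows BE, FO; columns M, F
b = [0, 1]             # one divisor per column (M, F)
result = divnot0_rows(a, b)
print(result)
```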
implemented .named() on groups to name groups after the fact
>>> a = ndrange(Axis('age', range(100)))
>>> a
age | 0 | 1 | 2 | 3 | 4 | 5 | 6 | ... | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99
    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | ... | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99
>>> a.sum((x.age[10:19].named('teens'), x.age[20:29].named('twenties')))
age | 'teens' (10:19) | 'twenties' (20:29)
    |             145 |                245
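The totals above only add up if both bounds of a label slice are included (10:19 covers ages 10 through 19). A plain-Python check of that arithmetic, with a hypothetical helper sum_labels standing in for the group aggregate:

```python
ages = list(range(100))      # axis labels
values = list(range(100))    # cell values (same as the labels in this example)

def sum_labels(first, last):
    """Sum the values whose label lies in [first, last], both bounds included."""
    return sum(v for age, v in zip(ages, values) if first <= age <= last)

teens = sum_labels(10, 19)
twenties = sum_labels(20, 29)
print(teens, twenties)
```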
made all array creation functions (ndrange, zeros, ones, full, LArray, …) more flexible:
They accept a single Axis argument instead of requiring a tuple/list of them
>>> sex = Axis('sex', 'M,F')
>>> a = ndrange(sex)
>>> a
sex | M | F
    | 0 | 1
Shortcut definitions for axes work
>>> ndrange("a,b,c")
{0} | a | b | c
    | 0 | 1 | 2
>>> ndrange(["1:3", "d,e"])
{0}\{1} | d | e
      1 | 0 | 1
      2 | 2 | 3
      3 | 4 | 5
>>> LArray([1, 5, 7], "a,b,c")
{0} | a | b | c
    | 1 | 5 | 7
One can mix Axis objects and ints (for axes without labels)
>>> sex = Axis('sex', 'M,F')
>>> ndrange([sex, 3])
sex\{1}* | 0 | 1 | 2
       M | 0 | 1 | 2
       F | 3 | 4 | 5
made it possible to iterate on labels of a group (eg a slice of an Axis):
>>> for year in a.axes.year[2010:]:
...     # do stuff
changed representation of anonymous axes from “axisN” (where N is the position of the axis) to “{N}”. The problem was that “axisN” was not recognizable enough as an anonymous axis, and it was thus misleading. For example “a[x.axis0[…]]” would not work.
better overall support for arrays with anonymous axes or several axes with the same name
fixed all output functions (to_csv, to_excel, to_hdf, …) when the last axis has no name but other axes have one
implemented eye() which creates 2D arrays with ones on the diagonal and zeros elsewhere.
>>> eye(sex)
sex\sex |   M |   F
      M | 1.0 | 0.0
      F | 0.0 | 1.0
implemented the @ operator to do matrix multiplication (Python3.5+ only)
implemented inverse() to return the (matrix) inverse of a (square) 2D array
>>> a = eye(sex) * 2
>>> a
sex\sex |   M |   F
      M | 2.0 | 0.0
      F | 0.0 | 2.0
>>> a @ inverse(a)
sex\sex |   M |   F
      M | 1.0 | 0.0
      F | 0.0 | 1.0
implemented diag() to extract a diagonal or construct a diagonal array.
>>> nat = Axis('nat', ['BE', 'FO'])
>>> sex = Axis('sex', ['M', 'F'])
>>> a = ndrange([nat, sex], start=1)
>>> a
nat\sex | M | F
     BE | 1 | 2
     FO | 3 | 4
>>> d = diag(a)
>>> d
nat,sex | BE,M | FO,F
        |    1 |    4
>>> diag(d)
nat\sex | M | F
     BE | 1 | 0
     FO | 0 | 4
>>> a = ndrange(sex, start=1)
>>> a
sex | M | F
    | 1 | 2
>>> diag(a)
sex\sex | M | F
      M | 1 | 0
      F | 0 | 2
added Axis.rename method which returns a copy of the axis with a different name and deprecated Axis._rename
added labels_array as a generalized version of identity (which is deprecated)
implemented LArray.ipoints[…] to do point selection using coordinates instead of labels (aka numpy indexing)
raise an error when trying to do a[key_with_more_axes_than_a] = value instead of silently ignoring extra axes.
allow using a single int for index_col in read_csv in addition to a list of ints
implemented __getitem__ for “x”. You can now write stuff like:
>>> a = ndrange((3, 4))
>>> a[x[0][1:]]
{0}\{1}* | 0 | 1 |  2 |  3
       1 | 4 | 5 |  6 |  7
       2 | 8 | 9 | 10 | 11
>>> a[x[1][2:]]
{0}*\{1} |  2 |  3
       0 |  2 |  3
       1 |  6 |  7
       2 | 10 | 11
>>> a.sum(x[0])
{0}* |  0 |  1 |  2 |  3
     | 12 | 15 | 18 | 21
produce normal axes instead of wildcard axes on LArray.points[…]. This is (much) slower but more correct/informative.
changed the way we store axes internally, which has several consequences
better overall support for anonymous axes
better support for arrays with several axes with the same name
small performance improvement
the same axis object cannot be added twice in an array (one should use axis.copy() if that need arises)
changed the way groups with an axis are displayed
fixed sum, min, max functions on non-LArray arguments
changed __repr__ for wildcard axes to not display their labels but their length
>>> ndrange(3).axes[0]
Axis(None, 3)
fixed aggregates on several groups “forgetting” the name of groups which had been created using axis.all()
allow Axis(…, long) in addition to int (Python2 only)
better docstrings/tests/comments/error messages/thoughts/…
Version 0.11.1¶
Released on 2016-05-25.
fixed new functions full, full_like and create_sequential not being available when using from larray import *
Version 0.11¶
Released on 2016-05-25.
implemented “Copy to Excel” in context menu (Ctrl+E), to open the selection in a new Excel sheet directly, without the need to use paste. If nothing is selected, copies the whole array.
when nothing is selected, Ctrl+C selects & copies the whole array to the clipboard.
when nothing is selected, Ctrl+V pastes at the top-left corner
implemented view(dict_with_array_values)
>>> view({'a': array1, 'b': array2})
fixed copy (ctrl-C) when viewing a 2D array: it did not include labels from the first axis in that case
implemented LArray.growth_rate to compute the growth along an axis
>>> sex = Axis('sex', ['M', 'F'])
>>> year = Axis('year', [2015, 2016, 2017])
>>> a = ndrange([sex, year]).cumsum(x.year)
>>> a
sex\year | 2015 | 2016 | 2017
       M |    0 |    1 |    3
       F |    3 |    7 |   12
>>> a.growth_rate()
sex\year |          2016 |           2017
       M |           inf |            2.0
       F | 1.33333333333 | 0.714285714286
>>> a.growth_rate(d=2)
sex\year | 2017
       M |  inf
       F |  3.0
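The numbers above are consistent with the usual definition (value[t] - value[t - d]) / value[t - d]. A plain-Python sketch of that formula follows; the function below is a stand-in written for illustration and assumes this definition, it is not taken from larray's source.

```python
import math

def growth_rate(values, d=1):
    """Return (values[t] - values[t - d]) / values[t - d] for every t >= d."""
    rates = []
    for t in range(d, len(values)):
        previous = values[t - d]
        change = values[t] - previous
        if previous == 0:
            # mimic floating point division by zero (inf, or nan for 0 / 0)
            rates.append(math.copysign(math.inf, change) if change else math.nan)
        else:
            rates.append(change / previous)
    return rates

males = [0, 1, 3]
females = [3, 7, 12]
print(growth_rate(males), growth_rate(females))
print(growth_rate(males, d=2), growth_rate(females, d=2))
```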
implemented LArray.diff (difference along an axis)
>>> sex = Axis('sex', ['M', 'F'])
>>> xtype = Axis('type', ['type1', 'type2', 'type3'])
>>> a = ndrange([sex, xtype]).cumsum(x.type)
>>> a
sex\type | type1 | type2 | type3
       M |     0 |     1 |     3
       F |     3 |     7 |    12
>>> a.diff()
sex\type | type2 | type3
       M |     1 |     2
       F |     4 |     5
>>> a.diff(n=2)
sex\type | type3
       M |     1
       F |     1
>>> a.diff(x.sex)
sex\type | type1 | type2 | type3
       F |     3 |     6 |     9
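Judging from the output above, n is the number of times the difference is applied (as in NumPy's diff), not a lag. A small plain-Python sketch under that assumption; repeated_diff is a hypothetical name:

```python
def repeated_diff(values, n=1):
    """Apply the first difference n times."""
    for _ in range(n):
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values

males = [0, 1, 3]
females = [3, 7, 12]
print(repeated_diff(males), repeated_diff(females))
print(repeated_diff(males, n=2), repeated_diff(females, n=2))
```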
implemented round() (as a nicer alias to around() and round_())
>>> a = ndrange(5) + 0.5
>>> a
axis0 |   0 |   1 |   2 |   3 |   4
      | 0.5 | 1.5 | 2.5 | 3.5 | 4.5
>>> round(a)
axis0 |   0 |   1 |   2 |   3 |   4
      | 0.0 | 2.0 | 2.0 | 4.0 | 4.0
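The result above may look surprising (2.5 becomes 2.0, not 3.0): ties are rounded to the nearest even value, which is what NumPy's around does. Python 3's built-in round follows the same rule for floats, so the behaviour can be checked without larray:

```python
values = [0.5, 1.5, 2.5, 3.5, 4.5]
# Ties go to the nearest even integer ("banker's rounding").
rounded = [round(v) for v in values]
print(rounded)
```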
implemented Session[[‘list’, ‘of’, ‘str’]] to get a subset of a Session
>>> s = Session({'a': ndrange(3), 'b': ndrange(4), 'c': ndrange(5)})
>>> s
Session(a, b, c)
>>> s['a', 'c']
Session(a, c)
implemented LArray.points to do pointwise indexing instead of the default orthogonal indexing when indexing several dimensions at the same time.
>>> a = Axis('a', ['a1', 'a2', 'a3'])
>>> b = Axis('b', ['b1', 'b2', 'b3'])
>>> arr = ndrange((a, b))
>>> arr
a\b | b1 | b2 | b3
 a1 |  0 |  1 |  2
 a2 |  3 |  4 |  5
 a3 |  6 |  7 |  8
>>> arr[['a1', 'a3'], ['b1', 'b2']]
a\b | b1 | b2
 a1 |  0 |  1
 a3 |  6 |  7
# this selects the points ('a1', 'b1') and ('a3', 'b2')
>>> arr.points[['a1', 'a3'], ['b1', 'b2']]
a,b* | 0 | 1
     | 0 | 7
Note that .ipoints (to do pointwise indexing with positions instead of labels – aka numpy indexing) is planned but not functional yet.
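The difference between the two indexing modes can be pictured with nested lists: orthogonal indexing takes every combination of the requested labels, while pointwise indexing pairs them up. This is a plain-Python sketch of the idea (the helper cell is hypothetical), not larray code.

```python
labels_a = ['a1', 'a2', 'a3']
labels_b = ['b1', 'b2', 'b3']
data = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

def cell(a_label, b_label):
    """Look up one value by its two labels."""
    return data[labels_a.index(a_label)][labels_b.index(b_label)]

keys_a = ['a1', 'a3']
keys_b = ['b1', 'b2']

# Orthogonal: the cross product of the two key lists (a 2 x 2 result).
orthogonal = [[cell(a, b) for b in keys_b] for a in keys_a]
# Pointwise: keys are paired, giving ('a1', 'b1') and ('a3', 'b2').
pointwise = [cell(a, b) for a, b in zip(keys_a, keys_b)]
print(orthogonal, pointwise)
```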
made “arr1.drop_labels() * arr2” use the labels from arr2 if any
>>> a = Axis('a', ['a1', 'a2'])
>>> b = Axis('b', ['b1', 'b2'])
>>> b2 = Axis('b', ['b2', 'b3'])
>>> arr1 = ndrange([a, b])
>>> arr1
a\b | b1 | b2
 a1 |  0 |  1
 a2 |  2 |  3
>>> arr1.drop_labels(b)
a\b* | 0 | 1
  a1 | 0 | 1
  a2 | 2 | 3
>>> arr1.drop_labels([a, b])
a*\b* | 0 | 1
    0 | 0 | 1
    1 | 2 | 3
>>> arr2 = ndrange([a, b2])
>>> arr2
a\b | b2 | b3
 a1 |  0 |  1
 a2 |  2 |  3
>>> arr1 * arr2
Traceback (most recent call last):
...
ValueError: incompatible axes: Axis('b', ['b2', 'b3']) vs Axis('b', ['b1', 'b2'])
>>> arr1 * arr2.drop_labels()
a\b | b1 | b2
 a1 |  0 |  1
 a2 |  4 |  9
# in versions < 0.11, it used to return:
# >>> arr1.drop_labels() * arr2
# a*\b* | 0 | 1
#     0 | 0 | 1
#     1 | 2 | 3
>>> arr1.drop_labels() * arr2
a\b | b2 | b3
 a1 |  0 |  1
 a2 |  4 |  9
>>> arr1.drop_labels('a') * arr2.drop_labels('b')
a\b | b1 | b2
 a1 |  0 |  1
 a2 |  4 |  9
made .plot a property, like in Pandas, so that we can do stuff like:
>>> a.plot.bar()
# instead of
>>> a.plot(kind='bar')
made labels from different types not match against each other even if their value is the same. This might break some code but it is both more efficient and more convenient in some cases, so let us see how it goes:
>>> a = ndrange(4)
>>> a
axis0 | 0 | 1 | 2 | 3
      | 0 | 1 | 2 | 3
>>> a[1]
1
>>> # This used to "work" (and return 1)
>>> a[True]
…
ValueError: True is not a valid label for any axis
>>> a[1.0]
…
ValueError: 1.0 is not a valid label for any axis
implemented read_csv(dialect='liam2') to read .csv files formatted like in LIAM2 (with the axes names on a separate line than the last axis labels)
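Why a[True] used to "work" is easy to see in plain Python: True, 1 and 1.0 compare equal and hash identically, so any dictionary-based label lookup treats them as the same key. The sketch below shows that effect and one possible stricter lookup; strict_position is a hypothetical helper and not necessarily how larray does it.

```python
labels = [0, 1, 2, 3]
positions = {label: pos for pos, label in enumerate(labels)}

# A naive dictionary lookup cannot tell 1, True and 1.0 apart.
naive_hits = (positions[1], positions[True], positions[1.0])

def strict_position(label):
    """Return the position of label, requiring the types to match as well."""
    pos = positions.get(label)
    if pos is None or type(labels[pos]) is not type(label):
        raise ValueError(f"{label!r} is not a valid label for any axis")
    return pos

print(naive_hits, strict_position(1))
```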
implemented Session[boolean LArray]
>>> a = ndrange(3)
>>> b = ndrange(4)
>>> s1 = Session({'a': a, 'b': b})
>>> s2 = Session({'a': a + 1, 'b': b})
>>> s1 == s2
name |     a |    b
     | False | True
>>> s1[s1 == s2]
Session(b)
>>> s1[s1 != s2]
Session(a)
implemented experimental support for creating an array sequentially. Comments on the name of the function and syntax (especially compared to ndrange) would be appreciated.
>>> year = Axis('year', range(2016, 2020))
>>> sex = Axis('sex', ['M', 'F'])
>>> create_sequential(year)
year | 2016 | 2017 | 2018 | 2019
     |    0 |    1 |    2 |    3
>>> create_sequential(year, 1.0, 0.1)
year | 2016 | 2017 | 2018 | 2019
     |  1.0 |  1.1 |  1.2 |  1.3
>>> create_sequential(year, 1.0, mult=1.1)
year | 2016 | 2017 | 2018 |  2019
     |  1.0 |  1.1 | 1.21 | 1.331
>>> inc = LArray([1, 2], [sex])
>>> inc
sex | M | F
    | 1 | 2
>>> create_sequential(year, 1.0, inc)
sex\year | 2016 | 2017 | 2018 | 2019
       M |  1.0 |  2.0 |  3.0 |  4.0
       F |  1.0 |  3.0 |  5.0 |  7.0
>>> mult = LArray([2, 3], [sex])
>>> mult
sex | M | F
    | 2 | 3
>>> create_sequential(year, 1.0, mult=mult)
sex\year | 2016 | 2017 | 2018 | 2019
       M |  1.0 |  2.0 |  4.0 |  8.0
       F |  1.0 |  3.0 |  9.0 | 27.0
>>> initial = LArray([3, 4], [sex])
>>> initial
sex | M | F
    | 3 | 4
>>> create_sequential(year, initial, inc, mult)
sex\year | 2016 | 2017 | 2018 | 2019
       M |    3 |    7 |   15 |   31
       F |    4 |   14 |   44 |  134
>>> def modify(prev_value):
...     return prev_value / 2
>>> create_sequential(year, 8, func=modify)
year | 2016 | 2017 | 2018 | 2019
     |    8 |    4 |    2 |    1
>>> create_sequential(3)
axis0* | 0 | 1 | 2
       | 0 | 1 | 2
>>> create_sequential(x.year, axes=(sex, year))
sex\year | 2016 | 2017 | 2018 | 2019
       M |    0 |    1 |    2 |    3
       F |    0 |    1 |    2 |    3
implemented full and full_like to create arrays initialized to something other than zeros or ones
>>> nat = Axis('nat', ['BE', 'FO'])
>>> sex = Axis('sex', ['M', 'F'])
>>> full([nat, sex], 42.0)
nat\sex |    M |    F
     BE | 42.0 | 42.0
     FO | 42.0 | 42.0
>>> initial_value = ndrange([sex])
>>> initial_value
sex | M | F
    | 0 | 1
>>> full([nat, sex], initial_value)
nat\sex | M | F
     BE | 0 | 1
     FO | 0 | 1
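The create_sequential results with both inc and mult are consistent with the recurrence value[t] = value[t - 1] * mult + inc. The following plain-Python sketch assumes that recurrence; the function sequence is hypothetical, takes all its arguments explicitly, and does not reproduce the defaults of create_sequential.

```python
def sequence(length, initial, inc, mult):
    """Build [v0, v1, ...] with v[t] = v[t - 1] * mult + inc."""
    values = [initial]
    for _ in range(length - 1):
        values.append(values[-1] * mult + inc)
    return values

males = sequence(4, initial=3, inc=1, mult=2)
females = sequence(4, initial=4, inc=2, mult=3)
print(males, females)
```

The two lists correspond to the M and F rows of create_sequential(year, initial, inc, mult) above.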
performance improvements when using label keys: a[key] is faster, especially if key is large
to_excel(filepath) only closes the file if it was not open before
removed code which forced labels from .csv files to be strings (as it caused problems in many cases, e.g. ages in LIAM2 files)
made LGroups usable in Python’s builtin range() and convertible to int and float
implemented AxisCollection.union (equivalent to AxisCollection | Axis)
fixed boolean array keys (boolean filter) in combination with scalar keys (for other dimensions)
fixed support for older numpy
fixed LArray.shift(n=0)
still more work on making arrays with anonymous axes usable (not there yet)
added more tests
better docstrings/error messages…
misc. code cleanup/simplification/improved comments
Version 0.10.1¶
Released on 2016-03-25.
A single change in this release: a much more powerful to_excel function which (by default) uses Excel itself to write files. Additional functionality includes:
write in an existing file without overwriting existing data/sheet/…
write at a precise position
view an array in a live Excel instance (a new OR an existing workbook)
See to_excel() documentation for details.
Version 0.10¶
Released on 2016-03-22.
implemented dropna argument for to_csv, to_frame and to_series to avoid writing lines with either ‘all’ or ‘any’ NA values.
implemented read_sas. Needs pandas >= 0.18 (though it seems still buggy on some files).
implemented experimental support for __getattr__ and __setattr__ on LArray. One can use arr.H instead of arr[‘H’]. It only works for single string labels though (not for slices or list of labels nor integer labels). Not sure it is a good idea :).
implemented Session +-*/
E.g. sess1 - sess2 will compute the difference on each array present in either session. If an array is present in one session and not in the other, it is replaced by “NaN”.
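The rule for arrays missing from one side can be sketched with dictionaries standing in for sessions and plain numbers standing in for arrays. This is only an illustration of the described behaviour; subtract_sessions is a hypothetical helper.

```python
import math

def subtract_sessions(left, right):
    """Subtract item by item; names present on one side only yield NaN."""
    names = sorted(set(left) | set(right))
    return {name: left[name] - right[name] if name in left and name in right
            else math.nan
            for name in names}

sess1 = {'a': 10, 'b': 5}
sess2 = {'a': 4, 'c': 1}
difference = subtract_sessions(sess1, sess2)
print(difference)
```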
added .nbytes property to LArray objects (to know how many bytes of memory the array uses)
made sort_axis accept a tuple of axes
raises an error on a.i[tuple_with_len_greater_than_array_ndim]
slightly better support for axes with no name (no, still no complete support yet ;-))
improved AxisCollection: implemented __delitem__(slice), __setitem__(list), __setitem__(slice)
fixed exception on AxisCollection.index(invalid_index)
better docstrings for a few functions
misc code cleanups, refactoring & improved tests
added .dirty property on ArrayEditorWidget
fixed viewing arrays with “inf” (infinite)
fixed a few edge cases for the ndigit detection code
fixed colors in some cases in edit()
made copy-paste of large regions faster in some cases
Version 0.9.2¶
Released on 2016-03-02.
much better support for unnamed axes overall. Still a long way to go for full support, but it’s getting there…
fixed edit() for arrays with the same labels on several axes
Version 0.9.1¶
Released on 2016-03-01.
better .info for arrays with groups in axes
>>> # example using groups without a name
>>> reg = la.sum((fla, wal, bru, belgium))
>>> reg.info
4 x 15
 geo [4]: ['A11' ... 'A73'] ['A25' ... 'A93'] 'A21' ['A11' ... 'A21']
 lipro [15]: 'P01' 'P02' 'P03' ... 'P13' 'P14' 'P15'
>>> # example using groups with a name
>>> fla = geo.group(fla_str, name='Flanders')
>>> wal = geo.group(wal_str, name='Wallonia')
>>> bru = geo.group(bru_str, name='Brussels')
>>> reg = la.sum((fla, wal, bru))
>>> reg.info
3 x 15
 geo [3]: 'Flanders' (['A11' ... 'A73']) 'Wallonia' (['A25' ... 'A93']) 'Brussels' ('A21')
 lipro [15]: 'P01' 'P02' 'P03' ... 'P13' 'P14' 'P15'
fixed edit() with non-string labels in axes
fixed edit() with filters in some more cases
fixed ArrayEditorWidget.reject_changes and accept_changes to update the model & view accordingly (in case the widget is kept open)
avoid (harmless) error messages in some cases
Version 0.9¶
Released on 2016-02-25.
A minor but backward incompatible version (hence the bump in version number)!
fixed int_array.mean() to return floats instead of int (regression in 0.8)
larray_equal returns False when either value is not an LArray, instead of raising an exception
changed Session == Session to return an array of booleans instead of a single boolean, so that we know which array(s) differ. Code like session1 == session2 should be changed to all(session1 == session2).
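A rough picture of the new comparison, using dictionaries of lists as stand-ins for sessions of arrays (compare_sessions is a hypothetical helper, not part of larray): the comparison yields one boolean per name, and all() collapses it back to a single answer.

```python
def compare_sessions(left, right):
    """Return one boolean per name instead of a single overall boolean."""
    return {name: left[name] == right.get(name) for name in left}

session1 = {'a': [0, 1, 2], 'b': [0, 1, 2, 3]}
session2 = {'a': [1, 2, 3], 'b': [0, 1, 2, 3]}

equal = compare_sessions(session1, session2)
everything_equal = all(equal.values())
differing = [name for name, same in equal.items() if not same]
print(equal, everything_equal, differing)
```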
implemented Session != Session
implemented Session.get(k, default) (returns default if k does not exist in Session)
implemented len() for Session objects to know how many objects are in the Session
fixed view() (regression in 0.8.1)
fixed edit() to actually apply changes on “OK”/accept_changes even when no filter change occurred after the last edit.
Version 0.8.1¶
Released on 2016-02-24.
implemented min/maxvalue arguments for edit()
do not close the window when pressing Enter
allow to start editing cells by pressing Enter
fixed copy of changed cells (copy the changed value)
fixed pasted values to not be accepted directly (they go to “changes” like for manual edits)
fixed color updates on paste
disabled experimental tooltips on headers
better error message when entering invalid values
implemented indexing by position on several dimensions at once (like numpy)
>>> # takes the first item in the first and third dimensions, leave the second dimension intact
>>> arr.i[0, :, 0]
<some result>
>>> # sets all the cells corresponding to the first item in the first dimension and the second item in the fourth
>>> # dimension
>>> arr.i[0, :, :, 1] = 42
added optional ‘readonly’ argument to expand() to produce a readonly view (much faster since no copying is done)
Version 0.8¶
Released on 2016-02-16.
implemented skipna argument for most aggregate functions. It defaults to True.
implemented LArray.sort_values(key)
implemented percentile and median
added isnan and isinf toplevel functions
made axis argument optional for argsort & posargsort on 1D arrays
fixed a[key] = value when key corresponds to a single cell of the array
fixed keepaxes argument for aggregate functions
fixed a[int_array] (when the axis needs to be guessed)
fixed empty_like
fixed aggregates on several axes given as integers e.g. arr.sum(axis=(0, 2))
fixed “kind” argument in posargsort
added title argument to edit() (set automatically if not provided, like for view())
fixed edit() on filtered arrays
fixed view(expression). Anything which was not stored in a variable was broken in 0.7.1
reset background color when setting values if necessary (still buggy in some cases, but much less so ;-))
background color for headers is always on
view() => array cells are not editable, instead of being editable and ignoring entered values
fixed compare() colors when arrays are entirely equal
fixed error message for compare() when PyQt is not available
bump numpy requirement to 1.10, implicitly dropping support for python 3.3
renamed view module to editor to not collide with view function
improved/added a few tests
Version 0.7.1¶
Released on 2016-01-29.
implemented paste (ctrl-V)
implemented experimental array comparator:
>>> compare(array1, array2)
Known limitation: the arrays must have exactly the same axes and the background color is buggy when using filters
when no title is specified in view(), it is determined automatically by inspecting the local variables of the function where view() is called and using the names of the ones matching the object passed. If several matches, up to 3 are displayed.
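The idea of deriving a title from the caller's local variables can be sketched with the standard inspect module. The names guess_names and demo are hypothetical and the actual viewer code may well differ; this only illustrates the mechanism described above.

```python
import inspect

def guess_names(obj, max_names=3):
    """Return the names of the caller's local variables bound to obj."""
    caller_locals = inspect.currentframe().f_back.f_locals
    names = [name for name, value in caller_locals.items() if value is obj]
    return names[:max_names]

def demo():
    population = [1, 2, 3]
    alias = population        # a second name for the same object
    unrelated = [1, 2, 3]     # equal content, but a different object
    return guess_names(population)

print(demo())
```

Matching is done on identity (is), so an equal but distinct object such as unrelated is not reported.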
added axes names to copy (ctrl-C)
fixed copy (ctrl-C) of 0d array
added ‘dialect’ argument to to_csv. For example, dialect='classic' does not include the last (horizontal) axis name.
fixed loading .csv files without (ie ‘classic’ .csv files), though one needs to specify nb_index in that case if ndim > 2
strip spaces around axes names so that you can use “axis0<space>\<space>axis1” instead of “axis0\axis1” in .csv files
fixed 1d arrays I/O
more precise parsing of input headers: 1 and 0 come out as int, not bool
nicer error message when using invalid axes names
changed LArray .df property to a to_frame() method so that we can pass options to it
Version 0.7¶
Released on 2016-01-26.
implemented view() on Session objects
added axes length in window title and add axes info even if title is provided manually (concatenate both)
ndecimals are recomputed when toggling the scientific checkbox
allow viewing (some) non-ndarray stuff (e.g. python lists)
refactored viewer code so that the filter drop downs can be reused too
Known regression: the viewer is slow on large arrays (this will be fixed in a later release, obviously)
implemented local_arrays() to return all LArray in locals() as a Session
implemented Session.__getitem__(int_position)
implemented Session(filename) to directly load all arrays from a file. Equivalent to:
>>> s = Session()
>>> s.load(filename)
implemented Session.__eq__, so that you can compare two sessions and see if all arrays are equal. Suppose you want to refactor your code and make sure you get the same results.
>>> # put results in a Session
>>> res = Session({'array1': array1, 'array2': array2})
>>> # before refactoring
>>> res.dump('results.h5')
>>> # after refactoring
>>> assert Session('results.h5') == res
you can load all sheets/arrays of a file (if you do not specify which ones you want, it takes all)
loading several sheets from an excel file is now MUCH faster because the same file is kept open (apparently xlrd parses the whole file each time we open it).
you can specify a subset of arrays to dump
implemented rudimentary session I/O for .csv files, usage is a bit different from .h5 & excel files
>>> # need to specify format manually
>>> s.dump('directory_name', fmt='csv')
>>> # need to specify format manually
>>> s = Session()
>>> s.load('directory_name', fmt='csv')
pass *args and **kwargs to lower level functions in Session.load
fail when trying to read a nonexistent H5 file through Session, instead of creating it
added start argument in ndrange to specify starting value
implemented Axis._rename. Not sure it’s a good idea though…
implemented identity function which takes an Axis and returns an LArray with the axis labels as values
implemented size property on AxisCollection
allow a single int in AxisCollection.without
fixed broadcast_with when other_axes contains 0-len axes
fixed a[bool_array] = value when the first axis of a is not in bool_array
fixed view() on arrays with unnamed axes
fixed view() on arrays of Python objects
various other small bugs fixed
Version 0.6.1¶
Released on 2016-01-13.
added dtype argument to all array creation functions to override default data type
aggregates can take an explicit “axis” keyword argument which can be used to target an axis by index
>>> arr.sum(axis=0)
implemented LGroup.__getitem__ & LGroup.__iter__, so that for list-based groups (ie not slices) you can write:
>>> for v in my_group:
...     # some code
or
>>> my_group[0]
renamed LabelGroup to LGroup and PositionalKey to PGroup. We might want to rename the latter to IGroup (to be consistent with axis.i[…]).
slightly better support for axes without name
better docstrings for a few functions
misc cleanup
fixed XXX_like(a) functions to use the same dtype as a instead of always float
fixed to_XXX with 1d arrays (e.g. to_clipboard())
fixed all() and any() toplevel functions without argument
fixed LArray without axes in some cases
fixed array creation functions with only shapes on python2
Version 0.6¶
Released on 2016-01-12.
a[bool_array_key] broadcasts missing/differently ordered dimensions and returns an LArray with combined axes
a[bool_array_key] = value broadcasts missing/differently ordered dimensions on both key and value
implemented argmin, argmax, argsort, posargmin, posargmax, posargsort.
They do indirect operations along an axis. E.g. argmin gives the label of the minimum value, argsort gives the labels which would sort the array along that dimension. posargXXX gives the positions/indexes instead of the labels.
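The distinction between the label-based and position-based variants can be shown on a single labelled axis in plain Python. The labels and values below are made up for illustration, and the variable names simply mirror the function names.

```python
labels = ['BE', 'FR', 'DE']   # hypothetical axis labels
values = [5, 2, 9]            # one value per label

# Position-based results: indexes into the axis.
posargmin = min(range(len(values)), key=values.__getitem__)
posargsort = sorted(range(len(values)), key=values.__getitem__)

# Label-based results: the same positions translated to labels.
argmin = labels[posargmin]
argsort = [labels[i] for i in posargsort]
print(posargmin, argmin, posargsort, argsort)
```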
implemented Axis.__iter__ so that one can write:
>>> for label in an_array.axes.an_axis:
...     <some code>
instead of
>>> for label in an_array.axes.an_axis.labels:
...     <some code>
implemented the .info property on AxisCollection
implemented all/any top level functions, so that you can use them in with_total.
renamed ValueGroup to LabelGroup. We might want to rename it to LGroup to be consistent with LArray?
allow a single int as argument to LArray creation functions (ndrange et al.)
e.g. ndrange(10) is now allowed instead of ndrange([10])
use display_name in .info (ie add * next to wildcard axes in .info).
allow specifying a custom window title in view()
viewer displays booleans as True/False instead of 1/0
slightly better support for axes with no name (None). There is still a long way to go for full support though.
improved a few docstrings
nicer errors when tests results are different from expected
removed debug prints from viewer
misc cleanups
fixed view() on all-negative arrays
fixed view() on string arrays
Version 0.5¶
Released on 2015-12-15.
experimental support for indexing an LArray by another (integer) LArray
>>> array[other_array]
experimental support for LArray.drop_labels and the concept of wildcard axes
added LArray.display_name and AxisCollection.display_names which add ‘*’ next to wildcard axes
implemented where(cond, array1, array2)
implemented LArray.__iter__ so that this works:
>>> for value in array:
...     <some code>
implemented keepaxes=label or keepaxes=True for aggregate functions on full axes
>>> array.sum(x.age, keepaxes='total')
AxisCollection.replace can replace several axes in one call
implemented .expand(out=) to expand into an existing array
removed Axis.sorted()
removed LArray.axes_names & axes_labels. One should use .axes.names & .axes.labels instead.
raise an error when trying to convert an array with more than one value to a Boolean. For example, this will fail:
>>> arr = ndrange([sex])
>>> if arr:
...     <some code>
convert value to self.dtype in append/prepend
faster .extend, .append, .prepend and .expand
some code cleanup, better tests, …
fixed .extend when other has longer axes than self
Version 0.4¶
Released on 2015-12-09.
implemented LArray.expand to add dimensions
implemented prepend
implemented sort_axis
allow creating 0d (scalar) LArrays
made extend expand its arguments
made .append expand its value before appending
changed read_* to not sort data by default
more minor stuff :)
fixed loading 1d arrays
Version 0.3¶
Released on 2015-11-26.
implemented LArray.with_total(): appends axes or group aggregates to the array.
Without argument, it adds totals on all axes. It has optional keyword only arguments:
label: specify the label (“total” by default)
op: specify the aggregate function (sum by default, all other aggregates should work too)
With multiple arguments, it adds totals sequentially. There are some tricky cases. For example when, for the same axis, you add group aggregates and axis aggregates:
>>> # works but "wrong" for x.geo (double what is expected because the total also
>>> # includes fla wal & bru)
>>> la.with_total(x.sex, (fla, wal, bru), x.geo, x.lipro)
>>> # correct total but the order is not very nice
>>> la.with_total(x.sex, x.geo, (fla, wal, bru), x.lipro)
>>> # the correct way to do it, but it is probably not entirely obvious
>>> la.with_total(x.sex, (fla, wal, bru, x.geo.all()), x.lipro)
>>> # we probably want to display a warning (or even an error?) in that case.
>>> # If the user really wants that behavior, he can split the operation:
>>> # .with_total((fla, wal, bru)).with_total(x.geo)
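The doubling mentioned in the first case can be reproduced with plain numbers. In this sketch the region codes and values are made up and each group holds a single region for brevity; it only illustrates why adding group aggregates before the axis total counts every region twice.

```python
# Hypothetical values per region on the "geo" axis.
geo = {'A11': 1, 'A21': 2, 'A25': 3}
groups = {'fla': ['A11'], 'bru': ['A21'], 'wal': ['A25']}

# Step 1: append the group aggregates to the axis.
with_groups = dict(geo)
for name, members in groups.items():
    with_groups[name] = sum(geo[m] for m in members)

# Step 2: a total computed over the extended axis also sums the group rows.
naive_total = sum(with_groups.values())
# Computing the total from the original labels only avoids the problem.
correct_total = sum(geo.values())
print(naive_total, correct_total)
```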
implemented group aggregates without using keyword arguments. As a consequence of this, one can no longer use axis numbers in aggregates. Eg. a.sum(0) does not sum on the first axis anymore (but you can do a.sum(a.axes[0]) if needed)
implemented LArray.percent: equivalent to ratio * 100
implemented Session.filter -> returns a new Session with only objects matching the filter
implemented Session.dump -> dumps all LArray in the Session to a file
implemented Session.load -> load several LArrays from a file to a Session
Version 0.2.6¶
Released on 2015-11-24.
fixed LArray.cumsum and cumprod.
fixed all doctests just enough so that they run.
Version 0.2.5¶
Released on 2015-10-29.
many methods got (improved) docstrings (Thanks to Johan).
fixed mixing keys without axis (e.g. arr[10:15]) with key with axes (e.g. arr[x.age[10:15]]).
Version 0.2.4¶
Released on 2015-10-27.
includes an experimental (slightly inefficient) version of guess axis, so that one can write:
>>> arr[10:20]
instead of
>>> arr[age[10:20]]
Version 0.2.3¶
Released on 2015-10-19.
positional slicing via “x.” syntax (x.axis.i[:5])
view(array) is usable when doing from larray import *
fixed a nasty bug for doing “group” aggregates when there is only one dimension
Version 0.2.2¶
Released on 2015-10-15.
implemented AxisCollection.replace(old_axis, new_axis)
implemented positional indexing
more powerful AxisCollection.pop: added support for .pop(name) or .pop(Axis object)
LArray.set_labels returns a new LArray by default; use inplace=True to get the previous behavior
include ndrange and __version__ in __all__
fixed shift with n <= 0
Version 0.2.1¶
Released on 2015-10-14.
implemented LArray.shift(axis, n=1)
change set_labels API (axis, new_labels)
transform Axis.labels into a property so that _mapping is kept in sync
hopefully fix build
Version 0.2¶
Released on 2015-10-13.
added to_clipboard.
added embryonic documentation.
added sort_columns and na arguments to read_hdf.
added sort_rows, sort_columns and na arguments to read_excel.
added setup.py to install the module.
IO functions (to_*/read_*) now support unnamed axes. The set of supported operations is very limited with such arrays though.
to_excel sheet_name defaults to “Sheet1” like in Pandas.
reorganised files.
somewhat automated releases (added a rudimentary release script).
column titles are no longer converted to lowercase.
Version 0.1¶
Released on 2014-10-22.
How to contribute¶
Before Starting¶
Tools¶
To contribute you will need to sign up for a free GitHub account.
We use Git for version control to allow many people to work together on the project.
The documentation is written partly using reStructuredText and partly using Jupyter notebooks (for the tutorial). It is built to various formats using Sphinx and nbsphinx.
The unit tests are written using the pytest library. The compliance with the PEP8 conventions is tested using the extension pytest-pep8.
Many editors and IDE exist to edit Python code and provide integration with version control tools (like git). A good IDE, such as PyCharm, can make many of the steps below much more efficient.
Licensing¶
LArray is licensed under the GPLv3. Before starting to work on any issue, make sure you accept and are allowed to have your contributions released under that license.
Creating a development environment¶
Getting started with Git¶
GitHub has instructions for installing and configuring git.
Getting the code (for the first time)¶
You will need your own fork to work on the code. Go to the larray project page and hit the Fork button.
You will want to clone your fork to your machine. To do it manually, follow these steps:
git clone https://github.com/your-user-name/larray.git
cd larray
git remote add upstream https://github.com/larray-project/larray.git
This creates the directory larray and connects your repository to the upstream (main project) larray repository. You can see the remote repositories:
git remote -v
If you added the upstream repository as described above you will see something like:
origin git@github.com:yourname/larray.git (fetch)
origin git@github.com:yourname/larray.git (push)
upstream git://github.com/larray-project/larray.git (fetch)
upstream git://github.com/larray-project/larray.git (push)
Creating a Python Environment¶
Before starting any development, you will need a working Python installation. It is recommended (but not required) to create an isolated larray development environment. One of the easiest ways to do this is via Anaconda or Miniconda:
Install either Anaconda or Miniconda as suggested earlier
Make sure your conda is up to date (conda update conda)
Make sure that you have cloned the repository
cd to the larray source directory
We’ll now kick off a two-step process:
Install the build dependencies
# add 'conda-forge' channel (required to install some dependencies)
conda config --add channels conda-forge
# Create and activate the build environment
conda create -n larray_dev numpy pandas pytables pyqt qtpy matplotlib xlrd openpyxl xlsxwriter pytest pytest-pep8
conda activate larray_dev
This will create the new environment, and not touch any of your existing environments, nor any existing Python installation.
To view your environments:
conda info -e
To return to your root environment:
conda deactivate
See the full conda docs here.
Build and install larray
Install larray using the following command:
python setup.py develop
This installs larray in development mode: rather than copying the package, it links your Python installation's site-packages directory to your repository, so that any change in your local copy is immediately usable by other modules.
At this point you should be able to import larray from your local version:
$ python # start an interpreter
>>> import larray
>>> larray.__version__
'0.29-dev'
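To double-check that the imported package really comes from your clone rather than from a previously installed release, you can look at where Python loads it from. The following is only a minimal sketch: the helper names module_directory and is_imported_from are made up for this illustration and are not part of larray.

```python
import importlib
import os


def module_directory(module_name):
    """Return the directory a module or package is imported from."""
    module = importlib.import_module(module_name)
    return os.path.dirname(os.path.abspath(module.__file__))


def is_imported_from(module_name, directory):
    """Return True if the module is loaded from `directory` or below it."""
    location = module_directory(module_name)
    directory = os.path.abspath(directory)
    return location == directory or location.startswith(directory + os.sep)


# Hypothetical usage, assuming your clone lives in ~/projects/larray:
# is_imported_from('larray', os.path.expanduser('~/projects/larray'))
```

If the check returns False for your clone directory, another installed copy of larray is probably shadowing the development version.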
Starting to contribute¶
With your local version of larray, you are now ready to contribute to the project. To make a contribution, please follow the steps described below.
Step 1: Create a new branch¶
You want your master branch to reflect only production-ready code, so create a feature branch for making your changes. For example:
git checkout -b issue123
This changes your working directory to the issue123 branch.
Keep any changes in this branch specific to one bug or feature so it is clear what the branch brings to the project. You can have many different branches and switch between them using the git checkout command.
To update this branch, you need to retrieve the changes from the master branch:
git fetch upstream
git rebase upstream/master
This will replay your commits on top of the latest larray git master. If this leads to merge conflicts, you must resolve these before submitting your pull request. If you have uncommitted changes, you will need to stash them prior to updating. This will effectively store your changes and they can be reapplied after updating.
Step 2: Write your code¶
When writing your code, please follow the PEP8 code conventions. Among others, this means:
lines of at most 120 characters
4-space indentation
lowercase (with underscores if needed) variable, function, method and module names
CamelCase class names
all uppercase constant names
whitespace around binary operators
no whitespace before a comma, semicolon, colon or opening parenthesis
whitespace after commas
This summary should not prevent you from reading the PEP!
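The conventions listed above can be seen together in a short sketch. All names below (MAX_LABEL_LENGTH, LabelFormatter, format_labels) are hypothetical and exist only for this illustration; none of them is part of larray.

```python
# Hypothetical names, for illustration only; none of them exist in larray.

MAX_LABEL_LENGTH = 20  # constant: all uppercase, with underscores


class LabelFormatter(object):  # class: CamelCase
    """Shortens labels that are too long to display."""

    def __init__(self, max_length=MAX_LABEL_LENGTH):
        self.max_length = max_length

    def format_label(self, label):  # method: lowercase with underscores
        text = str(label)
        if len(text) > self.max_length:  # whitespace around binary operators
            # keep the first characters and mark the cut with three dots
            return text[:self.max_length - 3] + '...'
        return text


def format_labels(labels, max_length=MAX_LABEL_LENGTH):  # whitespace after commas
    formatter = LabelFormatter(max_length)
    return [formatter.format_label(label) for label in labels]
```

Note that keyword defaults such as max_length=MAX_LABEL_LENGTH are written without spaces around the equals sign, which is also what PEP8 recommends.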
LArray is currently compatible with both Python 2 and 3, so make sure your code works with both versions.
Step 3: Document your code¶
We use Numpy conventions for docstrings. Here is a template:
def funcname(arg1, arg2=default2, arg3=default3):
    """Summary line.

    Extended description of function.

    .. versionadded:: 0.2.0

    Parameters
    ----------
    arg1 : type1
        Description of arg1.
    arg2 : {value1, value2, value3}, optional
        Description of arg2.

        * value1 -- description of value1 (default2)
        * value2 -- description of value2
        * value3 -- description of value3
    arg3 : type3 or type3bis, optional
        Description of arg3. Default is default3.

        .. versionadded:: 0.3.0

    Returns
    -------
    type
        Description of return value.

    Notes
    -----
    Some interesting facts about this function.

    See Also
    --------
    LArray.otherfunc : How other function or method is related.

    Examples
    --------
    >>> funcname(arg)
    result
    """
For example:
def check_number_string(number, string="1"):
    """Compares the string representation of a number to a string.

    Parameters
    ----------
    number : int
        The number to test.
    string : str, optional
        The string to test against. Default is "1".

    Returns
    -------
    bool
        Whether the string representation of the number is equal to the string.

    Examples
    --------
    >>> check_number_string(42, "42")
    True
    >>> check_number_string(25, "2")
    False
    >>> check_number_string(1)
    True
    """
    return str(number) == string
Step 4: Test your code¶
Our unit tests are written using the pytest library and our test modules are located in /larray/tests/. We also use its extension pytest-pep8 to check whether the code is PEP8 compliant. The pytest library is able to automatically detect and run unit tests as long as you respect some conventions:
pytest will search for test_*.py or *_test.py files.
From those files, collect test items:
test_ prefixed test functions or methods outside of class.
test_ prefixed test functions or methods inside Test prefixed test classes (without an __init__ method).
For more details, please read the section Conventions for Python test discovery from the pytest documentation.
Here is an example of a unit test function using pytest:
from larray.core.axis import _to_key


def test_key_string_split():
    assert _to_key('M,F') == ['M', 'F']
    assert _to_key('M,') == ['M']
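The discovery conventions listed earlier also cover tests grouped in a class. The sketch below shows both forms side by side. It deliberately avoids importing larray: split_labels is a hypothetical stand-in helper written for this illustration, not a function of the library.

```python
def split_labels(text):
    """Hypothetical helper: split a comma-separated string into labels."""
    return [label.strip() for label in text.split(',') if label.strip()]


# collected by pytest: module-level function with the test_ prefix
def test_split_labels_simple():
    assert split_labels('M,F') == ['M', 'F']


# collected by pytest: Test-prefixed class without an __init__ method
class TestSplitLabels(object):
    def test_trailing_comma(self):
        assert split_labels('M,') == ['M']

    def test_spaces_are_stripped(self):
        assert split_labels(' M , F ') == ['M', 'F']

    def helper_not_collected(self):
        # no test_ prefix, so pytest does not treat this method as a test
        return split_labels('')
```

Saved in a file named along the lines of test_split_labels.py, this would give pytest three test items to collect. The helper method is ignored because its name does not start with test_.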
To run unit tests for a given test module:
> pytest larray/tests/test_array.py
We also use doctests for some tests. A doctest is specially formatted code within the docstring of a function which embeds the result of calling said function with a particular set of arguments. This can be used both as documentation and testing. We only use doctests when the test is simple enough to fit on one line and helps the reader understand what the function does. For example:
def slice_to_str(key):
    """Converts a slice to a string

    >>> slice_to_str(slice(None))
    ':'
    """
    # some clever code here
    return ':'
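To see what happens when a doctest is run, the examples in a docstring can also be executed by hand with the standard doctest module. The sketch below is only an illustration: the body of slice_to_str is an assumed implementation (it ignores the slice step) and run_doctests is a helper invented for this example. Neither is the actual larray code.

```python
import doctest


def slice_to_str(key):
    """Converts a slice to a string (assumed implementation, step ignored)

    >>> slice_to_str(slice(None))
    ':'
    >>> slice_to_str(slice(1, 5))
    '1:5'
    """
    start = '' if key.start is None else str(key.start)
    stop = '' if key.stop is None else str(key.stop)
    return start + ':' + stop


def run_doctests(func):
    """Run the examples found in the docstring of `func`.

    Returns the (failed, attempted) counts reported by doctest.
    """
    parser = doctest.DocTestParser()
    # the examples only need to see the function itself
    globs = {func.__name__: func}
    test = parser.get_doctest(func.__doc__, globs, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)
```

Calling run_doctests(slice_to_str) executes both examples and compares what they print with the expected output written in the docstring. This is essentially what a test runner does for each docstring it finds.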
To run doc tests:
> pytest larray/core/array.py
To run all the tests, simply go to root directory and type:
> pytest
pytest will automatically detect all existing unit tests and doctests and run them all.
Step 5: Add a change log¶
Changes should be reflected in the release notes located in doc/source/changes/version_<next_release_version>.inc.
This file contains an ongoing change log for the next release.
Add an entry to this file to document your fix, enhancement or (unavoidable) breaking change.
If you are unsure which section your change log entry belongs in, feel free to ask.
Make sure to include the GitHub issue number when adding your entry (using closes :issue:`123` where 123 is the number associated with the fixed issue).
Step 6: Commit your changes¶
When all the above is done, commit your changes. Make sure that one of your commit messages starts with fix #123 : (where 123 is the issue number) before starting any pull request (see this github page for more details).
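If you want to check this before opening a pull request, a few lines of Python are enough. This is only a sketch: the exact pattern (optional spaces before the colon) is an assumption based on the example above, and fixed_issues is a hypothetical helper, not a tool shipped with larray.

```python
import re

# Assumed pattern: "fix #<number>", optional spaces, then a colon.
FIX_PREFIX = re.compile(r'^fix #(\d+)\s*:')


def fixed_issues(commit_messages):
    """Return the issue numbers announced by a 'fix #N :' prefix."""
    issues = []
    for message in commit_messages:
        match = FIX_PREFIX.match(message)
        if match:
            issues.append(int(match.group(1)))
    return issues


# Hypothetical commit messages for a branch named issue123
messages = ['fix #123 : handle n <= 0 in shift', 'update change log']
print(fixed_issues(messages))
```

An empty result means that none of the commit messages follows the convention yet.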
Step 7: Push your changes¶
When you want your changes to appear publicly on the web page of your fork on GitHub, push your forked feature branch’s commits:
git push origin issue123
Here origin is the default name given to your remote repository on GitHub.
Step 8: Start a pull request¶
You are ready to request your changes to be included in the master branch (so that they will be available in the next release). To submit a pull request:
Navigate to your repository on GitHub
Click on the Pull Request button
You can then click on Commits and Files Changed to make sure everything looks okay one last time
Write a description of your changes in the Preview Discussion tab
If this is your first pull request, please state explicitly that you accept and are allowed to have your contribution (and any future contribution) licensed under the GPL license (See section Licensing above).
Click Send Pull Request.
This request then goes to the repository maintainers, and they will review the code. Your modifications will also be automatically tested by running the larray test suite on the Travis-CI continuous integration service. A pull request will only be considered for merging when you have an all ‘green’ build. If any tests are failing, you will get a red ‘X’, which you can click through to see the individual failed tests.
If you need to make more changes to fix test failures or to take our comments into account, you can make them in your branch, add them to a new commit and push them to GitHub using:
git push origin issue123
This will automatically update your pull request with the latest code and trigger the automated tests again.
Warning: Please do not rebase your local branch during the review process.
Documentation¶
The documentation is written using reStructuredText and built to various formats using Sphinx. See the reStructuredText Primer for a first introduction of the syntax.
Installing Requirements¶
Basic requirements (to generate an .html version of the documentation) can be installed using:
> conda install sphinx numpydoc nbsphinx
To build the .pdf version, you need a LaTeX processor. We use MiKTeX.
To build the .chm version, you need HTML Help Workshop.
Generating the documentation¶
Open a command prompt and go to the documentation directory:
> cd doc
If you just want to check that there is no syntax error in the documentation and that it formats properly, it is usually enough to only generate the .html version, by using:
> make html
Open the result in your favourite web browser. It is located in:
build/html/index.html
If you want to also generate the .pdf and .chm (and you have the extra requirements to generate those), you could use:
> buildall