fntools’s documentation¶
Functional programming tools for data processing
fntools is a simple library providing the user with functional programming functions to transform, filter and inspect Python data structures. It introduces no new class or data structures but instead emphasizes the use of:
- pure
- composable
- lightweight
functions to solve common problems while processing your data.
API¶
Transformation¶
-
fntools.
use_with
(data, fn, *attrs)[source]¶ Apply a function on the attributes of the data
Parameters: - data – an object
- fn – a function
- attrs – some attributes of the object
Returns: an object
Let’s create some data first:
>>> from collections import namedtuple >>> Person = namedtuple('Person', ('name', 'age', 'gender')) >>> alice = Person('Alice', 30, 'F')
Usage:
>>> make_csv_row = lambda n, a, g: '%s,%d,%s' % (n, a, g) >>> use_with(alice, make_csv_row, 'name', 'age', 'gender') 'Alice,30,F'
-
fntools.
zip_with
(fn, *colls)[source]¶ Return the result of the function applied on the zip of the collections
Parameters: - fn – a function
- colls – collections
Returns: an iterator
>>> list(zip_with(lambda x, y: x-y, [10, 20, 30], [42, 19, 43])) [-32, 1, -13]
-
fntools.
unzip
(colls)[source]¶ Unzip collections
Parameters: colls – collections Returns: unzipped collections >>> unzip([[1, 2, 3], [10, 20, 30], [100, 200, 300]]) [(1, 10, 100), (2, 20, 200), (3, 30, 300)]
-
fntools.
concat
(colls)[source]¶ Concatenate a list of collections
Parameters: colls – a list of collections Returns: the concatenation of the collections >>> concat(([1, 2], [3, 4])) [1, 2, 3, 4]
-
fntools.
mapcat
(fn, colls)[source]¶ Concatenate the result of a map
Parameters: - fn – a function
- colls – a list of collections
Returns: a list
>>> mapcat(reversed, [[3, 2, 1, 0], [6, 5, 4], [9, 8, 7]]) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
-
fntools.
dmap
(fn, record)[source]¶ map for a directory
Parameters: - fn – a function
- record – a dictionary
Returns: a dictionary
>>> grades = [{'math': 13, 'biology': 17, 'chemistry': 18}, ... {'math': 15, 'biology': 12, 'chemistry': 13}, ... {'math': 16, 'biology': 17, 'chemistry': 11}]
>>> def is_greater_than(x): ... def func(y): ... return y > x ... return func
>>> dmap(is_greater_than(15), grades[0]) {'biology': True, 'chemistry': True, 'math': False}
-
fntools.
rmap
(fn, coll, is_iterable=None)[source]¶ A recursive map
Parameters: - fn – a function
- coll – a list
- isiterable – a predicate function determining whether a value is iterable.
Returns: a list
>>> rmap(lambda x: 2*x, [1, 2, [3, 4]]) [2, 4, [6, 8]]
-
fntools.
replace
(x, old, new, fn=<built-in function eq>)[source]¶ Replace x with new if fn(x, old) is True.
Parameters: - x – Any value
- old – The old value we want to replace
- new – The value replacing old
- fn – The predicate function determining the relation between x and old. By default fn is the equality function.
Returns: x or new
>>> map(lambda x: replace(x, None, -1), [None, 1, 2, None]) [-1, 1, 2, -1]
-
fntools.
compose
(*fns)[source]¶ Return the function composed with the given functions
Parameters: fns – functions Returns: a function >>> add2 = lambda x: x+2 >>> mult3 = lambda x: x*3 >>> new_fn = compose(add2, mult3) >>> new_fn(2) 8
Note
compose(fn1, fn2, fn3) is the same as fn1(fn2(fn3)) which means that the last function provided is the first to be applied.
-
fntools.
groupby
(fn, coll)[source]¶ Group elements in sub-collections by fn
Parameters: - fn – a function
- coll – a collection
Returns: a dictionary
>>> groupby(len, ['John', 'Terry', 'Eric', 'Graham', 'Mickael']) {4: ['John', 'Eric'], 5: ['Terry'], 6: ['Graham'], 7: ['Mickael']}
-
fntools.
reductions
(fn, seq, acc=None)[source]¶ Return the intermediate values of a reduction
Parameters: - fn – a function
- seq – a sequence
- acc – the accumulator
Returns: a list
>>> reductions(lambda x, y: x + y, [1, 2, 3]) [1, 3, 6]
>>> reductions(lambda x, y: x + y, [1, 2, 3], 10) [11, 13, 16]
-
fntools.
split
(coll, factor)[source]¶ Split a collection by using a factor
Parameters: - coll – a collection
- factor – a collection of factors
Returns: a dictionary
>>> bands = ('Led Zeppelin', 'Debussy', 'Metallica', 'Iron Maiden', 'Bach') >>> styles = ('rock', 'classic', 'rock', 'rock', 'classic') >>> split(bands, styles) {'classic': ['Debussy', 'Bach'], 'rock': ['Led Zeppelin', 'Metallica', 'Iron Maiden']}
-
fntools.
assoc
(_d, key, value)[source]¶ Associate a key with a value in a dictionary
Parameters: - _d – a dictionary
- key – a key in the dictionary
- value – a value for the key
Returns: a new dictionary
>>> data = {} >>> new_data = assoc(data, 'name', 'Holy Grail') >>> new_data {'name': 'Holy Grail'} >>> data {}
Note
the original dictionary is not modified
-
fntools.
dispatch
(data, fns)[source]¶ Apply the functions on the data
Parameters: - data – the data
- fns – a list of functions
Returns: a collection
>>> x = (1, 42, 5, 79) >>> dispatch(x, (min, max)) [1, 79]
-
fntools.
multimap
(fn, colls)[source]¶ Apply a function on multiple collections
Parameters: - fn – a function
- colls – collections
Returns: a collection
>>> multimap(operator.add, ((1, 2, 3), (4, 5, 6))) [5, 7, 9]
>>> f = lambda x, y, z: 2*x + 3*y - z >>> result = multimap(f, ((1, 2), (4, 1), (1, 1))) >>> result[0] == f(1, 4, 1) True >>> result[1] == f(2, 1, 1) True
-
fntools.
multistarmap
(fn, *colls)[source]¶ Apply a function on multiple collections
Parameters: - fn – a function
- colls – collections
Returns: a collection
>>> multistarmap(operator.add, (1, 2, 3), (4, 5, 6)) [5, 7, 9]
>>> f = lambda x, y, z: 2*x + 3*y - z >>> result = multistarmap(f, (1, 2), (4, 1), (1, 1)) >>> result[0] == f(1, 4, 1) True >>> result[1] == f(2, 1, 1) True
-
fntools.
pipe
(data, *fns)[source]¶ Apply functions recursively on your data
Parameters: - data – the data
- fns – functions
Returns: an object
>>> inc = lambda x: x + 1 >>> pipe(42, inc, str) '43'
-
fntools.
pipe_each
(coll, *fns)[source]¶ Apply functions recursively on your collection of data
Parameters: - coll – a collection
- fns – functions
Returns: a list
>>> inc = lambda x: x + 1 >>> pipe_each([0, 1, 1, 2, 3, 5], inc, str) ['1', '2', '2', '3', '4', '6']
-
fntools.
shift
(func, *args, **kwargs)[source]¶ This function is basically a beefed up lambda x: func(x, *args, **kwargs)
shift()
comes in handy when it is used in a pipeline with a function that needs the passed value as its first argument.Parameters: - func – a function
- args – objects
- kwargs – keywords
>>> def div(x, y): return float(x) / y
This is equivalent to div(42, 2):
>>> shift(div, 2)(42) 21.0
which is different from div(2, 42):
>>> from functools import partial >>> partial(div, 2)(42) 0.047619047619047616
-
fntools.
repeatedly
(func)[source]¶ Repeat a function taking no argument
Parameters: func – a function Returns: a generator >>> import random as rd >>> rd.seed(123) >>> take(3, repeatedly(rd.random)) [0.052363598850944326, 0.08718667752263232, 0.4072417636703983]
-
fntools.
update
(records, column, values)[source]¶ Update the column of records
Parameters: - records – a list of dictionaries
- column – a string
- values – an iterable or a function
Returns: new records with the columns updated
>>> movies = [ ... {'title': 'The Holy Grail', 'year': 1975, 'budget': 4E5, 'total_gross': 5E6}, ... {'title': 'Life of Brian', 'year': 1979, 'budget': 4E6, 'total_gross': 20E6}, ... {'title': 'The Meaning of Life', 'year': 1983, 'budget': 9E6, 'total_gross': 14.9E6} ... ] >>> new_movies = update(movies, 'budget', lambda x: 2*x) >>> [new_movies[i]['budget'] for i,_ in enumerate(movies)] [800000.0, 8000000.0, 18000000.0]
>>> new_movies2 = update(movies, 'budget', (40, 400, 900)) >>> [new_movies2[i]['budget'] for i,_ in enumerate(movies)] [40, 400, 900]
-
fntools.
use
(data, attrs)[source]¶ Return the values of the attributes for the given data
Parameters: - data – the data
- attrs – strings
Returns: a list
With a dict:
>>> band = {'name': 'Metallica', 'singer': 'James Hetfield', 'guitarist': 'Kirk Hammet'} >>> use(band, ('name', 'date', 'singer')) ['Metallica', None, 'James Hetfield']
With a non dict data structure:
>>> from collections import namedtuple >>> Person = namedtuple('Person', ('name', 'age', 'gender')) >>> alice = Person('Alice', 30, 'F') >>> use(alice, ('name', 'gender')) ['Alice', 'F']
-
fntools.
get_in
(record, *keys, **kwargs)[source]¶ Return the value corresponding to the keys in a nested record
Parameters: - record – a dictionary
- keys – strings
- kwargs – keywords
Returns: the value for the keys
>>> d = {'id': {'name': 'Lancelot', 'actor': 'John Cleese', 'color': 'blue'}} >>> get_in(d, 'id', 'name') 'Lancelot'
>>> get_in(d, 'id', 'age', default='?') '?'
-
fntools.
valueof
(records, key)[source]¶ Extract the value corresponding to the given key in all the dictionaries
>>> bands = [{'name': 'Led Zeppelin', 'singer': 'Robert Plant', 'guitarist': 'Jimmy Page'}, ... {'name': 'Metallica', 'singer': 'James Hetfield', 'guitarist': 'Kirk Hammet'}] >>> valueof(bands, 'singer') ['Robert Plant', 'James Hetfield']
Filtering¶
-
fntools.
duplicates
(coll)[source]¶ Return the duplicated items in the given collection
Parameters: coll – a collection Returns: a list of the duplicated items in the collection >>> duplicates([1, 1, 2, 3, 3, 4, 1, 1]) [1, 3]
-
fntools.
pluck
(record, *keys, **kwargs)[source]¶ Return the record with the selected keys
Parameters: - record – a list of dictionaries
- keys – some keys from the record
- kwargs – keywords determining how to deal with the keys
>>> d = {'name': 'Lancelot', 'actor': 'John Cleese', 'color': 'blue'} >>> pluck(d, 'name', 'color') {'color': 'blue', 'name': 'Lancelot'}
The keyword ‘default’ allows to replace a
None
value:>>> d = {'year': 2014, 'movie': 'Bilbo'} >>> pluck(d, 'year', 'movie', 'nb_aliens', default=0) {'movie': 'Bilbo', 'nb_aliens': 0, 'year': 2014}
-
fntools.
pluck_each
(records, columns)[source]¶ Return the records with the selected columns
Parameters: - records – a list of dictionaries
- columns – a list or a tuple
Returns: a list of dictionaries with the selected columns
>>> movies = [ ... {'title': 'The Holy Grail', 'year': 1975, 'budget': 4E5, 'total_gross': 5E6}, ... {'title': 'Life of Brian', 'year': 1979, 'budget': 4E6, 'total_gross': 20E6}, ... {'title': 'The Meaning of Life', 'year': 1983, 'budget': 9E6, 'total_gross': 14.9E6} ... ] >>> pluck_each(movies, ('title', 'year')) [{'year': 1975, 'title': 'The Holy Grail'}, {'year': 1979, 'title': 'Life of Brian'}, {'year': 1983, 'title': 'The Meaning of Life'}]
-
fntools.
take
(n, seq)[source]¶ Return the n first items in the sequence
Parameters: - n – an integer
- seq – a sequence
Returns: a list
>>> take(3, xrange(10000)) [0, 1, 2]
-
fntools.
drop
(n, seq)[source]¶ Return the n last items in the sequence
Parameters: - n – an integer
- seq – a sequence
Returns: a list
>>> drop(9997, xrange(10000)) [9997, 9998, 9999]
-
fntools.
find
(fn, record)[source]¶ Apply a function on the record and return the corresponding new record
Parameters: - fn – a function
- record – a dictionary
Returns: a dictionary
>>> find(max, {'Terry': 30, 'Graham': 35, 'John': 27}) {'Graham': 35}
-
fntools.
find_each
(fn, records)[source]¶ Apply a function on the records and return the corresponding new record
Parameters: - fn – a function
- records – a collection of dictionaries
Returns: new records
>>> grades = [{'math': 13, 'biology': 17, 'chemistry': 18}, ... {'math': 15, 'biology': 12, 'chemistry': 13}, ... {'math': 16, 'biology': 17, 'chemistry': 11}] >>> find_each(max, grades) [{'chemistry': 18}, {'math': 15}, {'biology': 17}]
Inspection¶
-
fntools.
isiterable
(coll)[source]¶ Return True if the collection is any iterable except a string
Parameters: coll – a collection Returns: a boolean >>> isiterable(1) False >>> isiterable('iterable') False >>> isiterable([1, 2, 3]) True
-
fntools.
are_in
(items, collection)[source]¶ Return True for each item in the collection
Parameters: - items – a sub-collection
- collection – a collection
Returns: a list of booleans
>>> are_in(['Terry', 'James'], ['Terry', 'John', 'Eric']) [True, False]
-
fntools.
any_in
(items, collection)[source]¶ Return True if any of the items are in the collection
Parameters: - items – items that may be in the collection
- collection – a collection
Returns: a boolean
>>> any_in(2, [1, 3, 2]) True >>> any_in([1, 2], [1, 3, 2]) True >>> any_in([1, 2], [1, 3]) True
-
fntools.
all_in
(items, collection)[source]¶ Return True if all of the items are in the collection
Parameters: - items – items that may be in the collection
- collection – a collection
Returns: a boolean
>>> all_in(2, [1, 3, 2]) True >>> all_in([1, 2], [1, 3, 2]) True >>> all_in([1, 2], [1, 3]) False
-
fntools.
monotony
(seq)[source]¶ Determine the monotony of a sequence
Parameters: seq – a sequence Returns: 1 if the sequence is sorted (increasing) Returns: 0 if it is not sorted Returns: -1 if it is sorted in reverse order (decreasing) >>> monotony([1, 2, 3]) 1 >>> monotony([1, 3, 2]) 0 >>> monotony([3, 2, 1]) -1
-
fntools.
occurrences
(coll, value=None, **options)[source]¶ Return the occurrences of the elements in the collection
Parameters: - coll – a collection
- value – a value in the collection
- options – an optional keyword used as a criterion to filter the values in the collection
Returns: the frequency of the values in the collection as a dictionary
>>> occurrences((1, 1, 2, 3)) {1: 2, 2: 1, 3: 1} >>> occurrences((1, 1, 2, 3), 1) 2
Filter the values of the occurrences that are <, <=, >, >=, == or != than a given number:
>>> occurrences((1, 1, 2, 3), lt=3) {1: 2, 2: 1, 3: 1} >>> occurrences((1, 1, 2, 3), gt=1) {1: 2} >>> occurrences((1, 1, 2, 3), ne=1) {1: 2}
-
fntools.
attributes
(data)[source]¶ Return all the non callable and non special attributes of the input data
Parameters: data – an object Returns: a list >>> class table: ... def __init__(self, name, rows, cols): ... self.name = name ... self.rows = rows ... self.cols = cols
>>> t = table('people', 100, 3) >>> attributes(t) ['cols', 'name', 'rows']
-
fntools.
indexof
(coll, item, start=0, default=None)[source]¶ Return the index of the item in the collection
Parameters: - coll – iterable
- item – scalar
- start – (optional) The start index
Default: The default value of the index if the item is not in the collection
Returns: idx – The index of the item in the collection
>>> monties = ['Eric', 'John', 'Terry', 'Terry', 'Graham', 'Mickael'] >>> indexof(monties, 'Terry') 2
>>> indexof(monties, 'Terry', start=3) 3
>>> indexof(monties, 'Terry', start=4) is None True
-
fntools.
indexesof
(coll, item)[source]¶ Return all the indexes of the item in the collection
Parameters: - coll – the collection
- item – a value
Returns: a list of indexes
>>> monties = ['Eric', 'John', 'Terry', 'Terry', 'Graham', 'Mickael'] >>> indexesof(monties, 'Terry') [2, 3]
-
fntools.
count
(fn, coll)[source]¶ Return the count of True values returned by the predicate function applied to the collection
Parameters: - fn – a predicate function
- coll – a collection
Returns: an integer
>>> count(lambda x: x % 2 == 0, [11, 22, 31, 24, 15]) 2
-
fntools.
isdistinct
(coll)[source]¶ Return True if all the items in the collections are distinct.
Parameters: coll – a collection Returns: a boolean >>> isdistinct([1, 2, 3]) True >>> isdistinct([1, 2, 2]) False
-
fntools.
nrow
(records)[source]¶ Return the number of rows in the records
Parameters: records – a list of dictionaries Returns: an integer >>> movies = [ ... {'title': 'The Holy Grail', 'year': 1975, 'budget': 4E5, 'total_gross': 5E6}, ... {'title': 'Life of Brian', 'year': 1979, 'budget': 4E6, 'total_gross': 20E6}, ... {'title': 'The Meaning of Life', 'year': 1983, 'budget': 9E6, 'total_gross': 14.9E6} ... ] >>> nrow(movies) 3
-
fntools.
ncol
(records)[source]¶ Return the number of columns in the records
Parameters: records – a list of dictionaries Returns: an integer >>> movies = [ ... {'title': 'The Holy Grail', 'year': 1975, 'budget': 4E5, 'total_gross': 5E6}, ... {'title': 'Life of Brian', 'year': 1979, 'budget': 4E6, 'total_gross': 20E6}, ... {'title': 'The Meaning of Life', 'year': 1983, 'budget': 9E6, 'total_gross': 14.9E6} ... ] >>> ncol(movies) 4
-
fntools.
names
(records)[source]¶ Return the column names of the records
Parameters: records – a list of dictionaries Returns: a list of strings >>> movies = [ ... {'title': 'The Holy Grail', 'year': 1975, 'budget': 4E5, 'total_gross': 5E6}, ... {'title': 'Life of Brian', 'year': 1979, 'budget': 4E6, 'total_gross': 20E6}, ... {'title': 'The Meaning of Life', 'year': 1983, 'budget': 9E6, 'total_gross': 14.9E6} ... ] >>> names(movies) ['total_gross', 'year', 'budget', 'title']
CHANGELOG¶
v1.1.2 / 2016-03-04¶
- Bump to version 1.1.2
- Add Makefile
- Update documentation
- Minor fix
- Update documentation
- Fix issue in __init__.py
- Fix issue in __init__.py
- Remove valuesof function and include it in use
- Fix compose
- Fix indexof
- Rename function: select -> pluck_each
- Fix mapcat
- Fix find_each
- Update __init__
- Ignore .projections.json for Vim
- Add new functions for inspecting data
- Remove useless whitespaces
- Add new functions
v1.1.1 / 2016-03-01¶
- Fix rmap
v1.1.0 / 2016-02-23¶
- Bump to version 1.1.0
- Update import in __init__.py
- Minor fix in setup.py and README.rst
- Add 2 new functions
- Implement a new function
- Implement 2 new functions
- Minor fixes.
- Minor fix
- Enhance occurrences
- Minor fix
- Add some informations into __init__.py
- Add more metadata for the package on PyPi
- Add MIT license and MANIFEST
- Add examples to the README.
- Ignore some files related to the packaging
v1.0.4 / 2014-12-06¶
- Bump version to 1.0.4
- Add functions into __init__.py
v1.0.3 / 2014-12-06¶
- Bump version to 1.0.3
- Fix setup.py
v1.0.2 / 2014-12-06¶
- Bump version to 1.0.2
- Minor fix
- Minor fix in README.md
- Change README.md to README.rst
- Add __init__.py
v1.0.1 / 2014-12-06¶
- Bump version to 1.0.1
- Add history
- Minor fixes
- Simplify setup.py
- Ignore experimental.py
- Minor changes
- Change path to module
- fntools is on pypi
- Improve setup.py
- Introduce new pipeline functions
- Rename issorted -> monotony
- [minor changes] Minor changes in README
- [minor change] Add readthedocs badge in README.md
- [minor change] Respect PEP8
- [fix doc] Remove files in _build
- [doc] Remove quickstart from the index
- [doc] Use default theme for doc
- [new] Add documentation
- [new file] Add .gitignore
- [minor change] Update README.md
- [new] Implement new functions
- [new functions] Implement new predicate functions
- [enhancement] pluck accepts keyword arguments for default value
- [new functions] Implement functions for inspecting data
- [new function] Implement dispatch
- Initial commit