!! EXCELerator is Deprecated !!¶
Superseded by fuzzytable, which includes an improved data model and csv compatibility.
- Docs: fuzzytable.readthedocs.io
- PyPI: pypi.org/project/fuzzytable
- github: github.com/jonathanchukinas/fuzzytable
pip install fuzzytable
EXCELerator documentation:¶
Readme¶
!! EXCELerator is Deprecated !!¶
Superseded by fuzzytable, which includes an improved data model and csv compatibility.
- Docs: fuzzytable.readthedocs.io
- PyPI: pypi.org/project/fuzzytable
- github: github.com/jonathanchukinas/fuzzytable
pip install fuzzytable
Anatomy of a Project¶
Excel Worksheet¶
- HEADER
- HEADER ROW
- VALUES
Tabular Data¶
(As interpreted by excelerator.TableReader
)
- FIELD
- FIELD NAME
- HEADER NAME
- COLUMN NUMBER
- DATA
- RECORD
API Reference¶
TableParser class¶
WorksheetParser class¶
Todo
This class has yet to be implemented.
FieldParser class¶
Normalizing Data¶
Basic Use¶
Let’s say you have this very simple table:
Name | Birthday |
---|---|
John Doe | 1995 |
123 | 2003OCt |
Jane Doe | 1964 |
River Song | 2019 |
You want to ensure the first field (Name) returns only strings and the Birthday column returns integers.
from excelerator import TableReader
from excelerator import normalize as n
tr = TableReader(
path='path/to/excel.xlsx',
sheetname='names and birthdays',
fields='Name Birthday'.split(),
normalize=[n.STRING(), n.INTEGER()],
)
fields = tr.get_fields()
fields['Name']
returns ['John Doe', '123', 'Jane Doe', 'River Song']
fields['Birthday']
returns [1995, 2003, 1964, 2019]
Note
Here’s a common “gotcha”: Make sure to instantiate the normalization classes.
That is, normalize=[n.STRING(), n.INTEGER()]
instead of normalize=[n.STRING, n.INTEGER]
Create Custom Normalizing Classes¶
But let’s say we don’t want the full string from Names
, but just the first name.
We could subclass either NormalizeBase
or one of its subclasses. Let’s subclass STRING
.
# Continuing our code from above...
class FirstString(n.STRING)
norm_func = n.STRING().normalize
# Note the lack of parentheses after normalize
# We do this here instead of in the normalize method
# so that n.STRING gets instantiated only once.
def normalize(self, value):
# This is the function that gets called to norm your data.
strings = norm_func(value).split()
return strings[0]
tr.normalize = [n.FirstString(), n.INTEGER()]
fields = tr.get_fields()
fields['Name']
returns ['John', '123', 'Jane', 'River']
Pretty easy, right?
Classes¶
Exceptions¶
Lists of these functions can be passed as the normalize
argument of TableReader.
Change Log¶
v0.8¶
- library deprecated; superseded by fuzzytable.
v0.7¶
- terminology change:
TableParser
,FieldParser
,WorksheetParser
v0.6¶
- TableReader:
- moved
path
andsheetname
parameters fromget_fields
andget_records
methods toTableReader()
- converted normalize functions to classes.
- updated documentation
- moved
v0.5¶
- TableReader
read_from()
method renamed toget_fields()
- new method:
get_records()
v0.4¶
- excelerator.normalize:
n.STRING()
n.INTEGER()
n.INTEGER_LIST()
- add
normalize
parameter to TableReader
v0.3¶
- add
header_row_num
param toTableReader