Welcome to weaver’s documentation!¶
Contents:
User Guide¶
The Data Weaver Project¶
The Data Weaver is a Python tool that offers a simple, clean, and robust data integration platform.
The Data Weaver supports data integration of spatial datasets (raster and vector data), as well as tabular datasets.
Problem solving in science requires studying entities through a broad range of associations among them. These associations are obtained by collecting and integrating data from various sources and in various forms.
Since these heterogeneous datasets are collected by various scientists, the datasets are domain based or centered around a unique subset of problems.
The Data Weaver bridges the gap scientists face when unified datasets for multi-dimensional feature analysis are not readily available. It handles finding and integrating heterogeneous datasets to form a new dataset.
Dependencies¶
This package requires Python 3.3+, recommends Python 3.6+ and depends on the following packages:
retriever
PyMySQL>=0.4
psycopg2>=2.0
gdal
future
numpydoc
pandas
They can be installed using pip:
sudo pip install -r requirements.txt
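As a quick check after installation, the following sketch reports which of the listed dependencies cannot be found in the current environment. It is not part of weaver; the helper name `missing_dependencies` is hypothetical, and the only outside knowledge it relies on is that PyMySQL is imported as `pymysql` and the GDAL Python bindings as `osgeo`.

```python
import importlib.util

# pip name -> import name. Two packages are imported under a different name:
# PyMySQL as "pymysql" and the GDAL bindings as "osgeo".
IMPORT_NAMES = {
    "retriever": "retriever",
    "PyMySQL": "pymysql",
    "psycopg2": "psycopg2",
    "gdal": "osgeo",
    "future": "future",
    "numpydoc": "numpydoc",
    "pandas": "pandas",
}


def missing_dependencies(import_names=IMPORT_NAMES):
    """Return the pip names whose import name cannot be located."""
    return [
        pip_name
        for pip_name, module_name in import_names.items()
        if importlib.util.find_spec(module_name) is None
    ]


print("Missing packages:", missing_dependencies() or "none")
```

This only checks that each package can be located; it does not verify the minimum versions given in the list.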
The package supports the following database management systems (DBMS):
DBMS | Spatial Datasets | Tabular Datasets |
---|---|---|
PostgreSQL | Yes | Yes |
SQLite | No | Yes |
Installing From Source¶
Either use pip to install directly from GitHub:
pip install git+https://git@github.com/weecology/weaver.git
or:
- Clone the repository
- From the directory containing setup.py, run the following command:
pip install .
You may need to include sudo at the beginning of the command depending on your system (i.e., sudo pip install .).
Using the Command Line¶
After installing the package, run weaver update to download the latest available dataset scripts. To see the full list of command line options and datasets, run weaver --help.
$ weaver --help
usage: weaver [-h] [-v] [-q] {help,ls,citation,license,join,update} ...
positional arguments:
{help,ls,citation,license,join,update}
sub-command help
help
ls display a list all available datasets
citation view citation
license view dataset licenses
join integrate data using a data package script
update download updated versions of data package scripts
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-q, --quiet suppress command-line output
To get a list of available datasets, use weaver ls:
$ weaver ls
Available datasets : 11
breed-bird-routes-bioclim
mammal-community-bioclim
mammal-community-masses
mammal-community-sites-all-bioclim
mammal-community-sites-bioclim
mammal-community-sites-harvard-linear-features
mammal-community-sites-harvard-linear-features-soils
mammal-community-sites-harvard-soil
mammal-diet-mammal-life-history
mammal-sites-bioclim-1-2
portal-plot-species
To view the citation of a dataset, use weaver citation [dataset-name]. Running weaver citation with no dataset name will provide the citation for the tool.
$ weaver citation mammal-diet-mammal-life-history
Dataset: mammal-diet-mammal-life-history
Description: Integrated data set of mammal-life-hist and mammal-diet
Citations:
mammal-life-hist: S. K. Morgan Ernest. 2003. ....
mammal-diet: Kissling WD, Dalby L, Flojgaard C, Lenoir J, ...
Integrating Data¶
Examples
Integrating data with the join command
To integrate data, run weaver join [data package name] and provide the connection configurations.
weaver join postgres -h
usage: weaver join postgres [-h] [--user [USER]] [--password [PASSWORD]]
[--host [HOST]] [--port [PORT]]
[--database [DATABASE]]
[--database_name [DATABASE_NAME]]
[--table_name [TABLE_NAME]]
dataset
positional arguments:
dataset file name
optional arguments:
-h, --help show this help message and exit
--user [USER], -u [USER]
Enter your PostgreSQL username
--password [PASSWORD], -p [PASSWORD]
Enter your password
--host [HOST], -o [HOST]
Enter your PostgreSQL host
--port [PORT], -r [PORT]
Enter your PostgreSQL port
--database [DATABASE], -d [DATABASE]
Enter your PostgreSQL database name
--database_name [DATABASE_NAME], -a [DATABASE_NAME]
Format of schema name
--table_name [TABLE_NAME], -t [TABLE_NAME]
Format of table name
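Note that -h is taken by the help flag, which is why the host option uses -o. The sketch below rebuilds part of this interface with argparse to show how such options parse. It is a reconstruction from the help text, not weaver's actual parser; the default values are taken from the required_opts listing in the API section, and the user and host names are hypothetical.

```python
import argparse

# Reconstruction of the options shown in the help output above.
# nargs="?" mirrors the "[USER]" style in the usage line: the value is optional.
parser = argparse.ArgumentParser(prog="weaver join postgres")
parser.add_argument("--user", "-u", nargs="?", default="postgres",
                    help="Enter your PostgreSQL username")
parser.add_argument("--password", "-p", nargs="?", default="",
                    help="Enter your password")
# -h is reserved by argparse for --help, so the host option uses -o.
parser.add_argument("--host", "-o", nargs="?", default="localhost",
                    help="Enter your PostgreSQL host")
parser.add_argument("--port", "-r", nargs="?", default=5432, type=int,
                    help="Enter your PostgreSQL port")
parser.add_argument("--database", "-d", nargs="?", default="postgres",
                    help="Enter your PostgreSQL database name")
parser.add_argument("dataset", help="file name")

# Hypothetical user and host; the dataset name comes from the weaver ls output.
args = parser.parse_args(
    ["-u", "alice", "-o", "db.example.org", "mammal-community-masses"]
)
print(args.user, args.host, args.port, args.database, args.dataset)
```

Options that are not supplied fall back to their defaults, so only the values that differ from a local default installation need to be given.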
To use weaver with a PostgreSQL .pgpass file already configured:
$ weaver join postgres
or with command line configurations supplied
$ weaver join postgres -u name-of-user -o host-name -d database-to-use
Contribution¶
If you find any operation that is not supported by this package, feel free to create a GitHub issue. Additionally, you are more than welcome to submit a pull request for a bug fix or additional feature.
Please take a look at the Code of Conduct governing contributions to this project.
Acknowledgments¶
Development of this software was funded by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative to Ethan White.
Developer’s guide¶
Required Modules¶
Python 3.3+ installation and the following modules:
setuptools
retriever
future
xlrd>=0.7
argcomplete
PyMySQL>=0.4
psycopg2>=2.0
numpydoc
pandas
sphinx_py3doc_enhanced_theme
sphinxcontrib-napoleon
Datasets Available¶
1. mammal-community-masses¶
name: | mammal-community-masses |
---|---|
citation: | [{‘mammal-masses’: ‘Felisa A. Smith, S. Kathleen Lyons, S. K. Morgan Ernest, Kate E. Jones, Dawn M. Kaufman, Tamar Dayan, Pablo A. Marquet, James H. Brown, and John P. Haskell. 2003. Body mass of late Quaternary mammals. Ecology 84:3403.’, ‘mammal-community-db’: ‘Katherine M. Thibault, Sarah R. Supp, Mikaelle Giffin, Ethan P. White, and S. K. Morgan Ernest. 2011. Species composition and abundance of mammalian communities. Ecology 92:2316.’}] |
description: | Integrated dataset of mammal body mass and mammal communities |
2. mammal-community-bioclim¶
name: | mammal-community-bioclim |
---|---|
citation: | [{‘bioclim’: ‘Hijmans, R.J., S.E. Cameron, J.L. Parra, P.G. Jones and A. Jarvis, 2005. Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology 25: 1965-1978.’, ‘mammal-community-db’: ‘Katherine M. Thibault, Sarah R. Supp, Mikaelle Giffin, Ethan P. White, and S. K. Morgan Ernest. 2011. Species composition and abundance of mammalian communities. Ecology 92:2316.’}] |
description: | Integrated dataset of Bioclim bio1 and mammal communities datasets |
3. mammal-diet-mammal-life-history¶
name: | mammal-diet-mammal-life-history |
---|---|
citation: | [{‘mammal-life-hist’: ‘S. K. Morgan Ernest. 2003. Life history characteristics of placental non-volant mammals. Ecology 84:3402.’, ‘mammal-diet’: ‘Kissling WD, Dalby L, Flojgaard C, Lenoir J, Sandel B, Sandom C, Trojelsgaard K, Svenning J-C (2014) Establishing macroecological trait datasets:digitalization, extrapolation, and validation of diet preferences in terrestrial mammals worldwide. Ecology and Evolution, online in advance of print. doi:10.1002/ece3.1136’}] |
description: | Integrated data set of mammal-life-hist and mammal-diet |
4. mammal-community-sites-harvard-soil¶
name: | mammal-community-sites-harvard-soil |
---|---|
citation: | [{‘harvard-forest’: ‘Hall B. 2017. Historical GIS Data for Harvard Forest Properties from 1908 to Present. Harvard Forest Data Archive: HF110.’, ‘mammal-community-db’: ‘Katherine M. Thibault, Sarah R. Supp, Mikaelle Giffin, Ethan P. White, and S. K. Morgan Ernest. 2011. Species composition and abundance of mammalian communities. Ecology 92:2316.’}] |
description: | Integrated dataset of mammal communities and harvard-soil |
5. mammal-community-sites-harvard-linear-features¶
name: | mammal-community-sites-harvard-linear-features |
---|---|
citation: | [{‘harvard-forest’: ‘Hall B. 2017. Historical GIS Data for Harvard Forest Properties from 1908 to Present. Harvard Forest Data Archive: HF110.’, ‘mammal-community-db’: ‘Katherine M. Thibault, Sarah R. Supp, Mikaelle Giffin, Ethan P. White, and S. K. Morgan Ernest. 2011. Species composition and abundance of mammalian communities. Ecology 92:2316.’}] |
description: | Integrated dataset of mammal communities and harvard-soil |
6. breed-bird-routes-bioclim¶
name: | breed-bird-routes-bioclim |
---|---|
citation: | [{‘bioclim’: ‘Hijmans, R.J., S.E. Cameron, J.L. Parra, P.G. Jones and A. Jarvis, 2005. Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology 25: 1965-1978.’, ‘breed-bird-survey’: ‘Pardieck, K.L., D.J. Ziolkowski Jr., M.-A.R. Hudson. 2015. North American Breeding Bird Survey Dataset 1966 - 2014, version 2014.0. U.S. Geological Survey, Patuxent Wildlife Research Center’}] |
description: | Integrated dataset of mammal communities and harvard-soil |
7. portal-plot-species¶
name: | portal-plot-species |
---|---|
citation: | [{‘portal’: ‘S. K. Morgan Ernest, Thomas J. Valone, and James H. Brown. 2009. Long-term monitoring and experimental manipulation of a Chihuahuan Desert ecosystem near Portal, Arizona, USA. Ecology 90:1708.’}] |
description: | Integrated portal data with species and plot information |
8. mammal-community-sites-harvard-linear-features-soils¶
name: | mammal-community-sites-harvard-linear-features-soils |
---|---|
citation: | [{‘harvard-forest’: ‘Hall B. 2017. Historical GIS Data for Harvard Forest Properties from 1908 to Present. Harvard Forest Data Archive: HF110.’, ‘mammal-community-db’: ‘Katherine M. Thibault, Sarah R. Supp, Mikaelle Giffin, Ethan P. White, and S. K. Morgan Ernest. 2011. Species composition and abundance of mammalian communities. Ecology 92:2316.’}] |
description: | Integrated dataset of mammal communities and harvard-soil |
9. mammal-sites-bioclim-1-2¶
name: | mammal-sites-bioclim-1-2 |
---|---|
citation: | [{‘bioclim’: ‘Hijmans, R.J., S.E. Cameron, J.L. Parra, P.G. Jones and A. Jarvis, 2005. Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology 25: 1965-1978.’, ‘mammal-community-db’: ‘Katherine M. Thibault, Sarah R. Supp, Mikaelle Giffin, Ethan P. White, and S. K. Morgan Ernest. 2011. Species composition and abundance of mammalian communities. Ecology 92:2316.’}] |
description: | Integrated dataset of Bioclim bio1, bio2 and mammal communities datasets |
10. mammal-community-sites-all-bioclim¶
name: | mammal-community-sites-all-bioclim |
---|---|
citation: | [{‘bioclim’: ‘Hijmans, R.J., S.E. Cameron, J.L. Parra, P.G. Jones and A. Jarvis, 2005. Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology 25: 1965-1978.’, ‘mammal-community-db’: ‘Katherine M. Thibault, Sarah R. Supp, Mikaelle Giffin, Ethan P. White, and S. K. Morgan Ernest. 2011. Species composition and abundance of mammalian communities. Ecology 92:2316.’}] |
description: | Integrated dataset of Bioclim bio 1 and mammal communities |
11. mammal-community-sites-bioclim¶
name: | mammal-community-sites-bioclim |
---|---|
citation: | [{‘bioclim’: ‘Hijmans, R.J., S.E. Cameron, J.L. Parra, P.G. Jones and A. Jarvis, 2005. Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology 25: 1965-1978.’, ‘mammal-community-db’: ‘Katherine M. Thibault, Sarah R. Supp, Mikaelle Giffin, Ethan P. White, and S. K. Morgan Ernest. 2011. Species composition and abundance of mammalian communities. Ecology 92:2316.’}] |
description: | Integrated dataset of Bioclim bio 1 and mammal communities |
Contributor Code of Conduct¶
Our Pledge¶
In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
Our Standards¶
Examples of behavior that contributes to creating a positive environment include:
- Using welcoming and inclusive language
- Being respectful of differing viewpoints and experiences
- Gracefully accepting constructive criticism
- Focusing on what is best for the community
- Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
- The use of sexualized language or imagery and unwelcome sexual attention or advances
- Trolling, insulting/derogatory comments, and personal or political attacks
- Public or private harassment
- Publishing others’ private information, such as a physical or electronic address, without explicit permission
- Other conduct which could reasonably be considered inappropriate in a professional setting
Our Responsibilities¶
Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
Scope¶
This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
Enforcement¶
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at ethan@weecology.org. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project’s leadership.
Attribution¶
This Code of Conduct is adapted from the Contributor Covenant, version 1.4.
Weaver API¶
weaver package¶
Subpackages¶
weaver.engines package¶
Submodules¶
weaver.engines.postgres module¶
class weaver.engines.postgres.engine¶
Bases: weaver.lib.engine.Engine
Engine instance for PostgreSQL.
- abbreviation = 'postgres'¶
- create_db()¶ Create Engine database.
- create_db_statement()¶ In PostgreSQL, the equivalent of a SQL database is a schema.
  CREATE SCHEMA table_name;
- drop_statement(objecttype, objectname)¶ In PostgreSQL, the equivalent of a SQL database is a schema.
- get_connection()¶ Get db connection. Please update the encoding lookup table if the required encoding is not present.
- max_int = 2147483647¶
- name = 'PostgreSQL'¶
- placeholder = '%s'¶
- required_opts = [('user', 'Enter your PostgreSQL username', 'postgres'), ('password', 'Enter your password', ''), ('host', 'Enter your PostgreSQL host', 'localhost'), ('port', 'Enter your PostgreSQL port', 5432), ('database', 'Enter your PostgreSQL database name', 'postgres'), ('database_name', 'Format of schema name', '{db}'), ('table_name', 'Format of table name', '{db}.{table}')]¶
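The database_name and table_name defaults in required_opts above are name templates. A minimal sketch of how such templates could expand, assuming they are filled in with Python's str.format (an assumption, not confirmed by this page); the helper `expand` and the names `demo` and `species` are hypothetical.

```python
# Default templates copied from the PostgreSQL required_opts listing above.
SCHEMA_TEMPLATE = "{db}"
TABLE_TEMPLATE = "{db}.{table}"


def expand(template, db, table=None):
    """Fill a name template; fields the template does not use are ignored."""
    return template.format(db=db, table=table)


# "demo" and "species" are hypothetical names used only for illustration.
print(expand(SCHEMA_TEMPLATE, "demo"))            # demo
print(expand(TABLE_TEMPLATE, "demo", "species"))  # demo.species
```

Under this reading, the default table template places each table inside a schema named after the dataset, which matches the note that a PostgreSQL schema plays the role of a database.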
weaver.engines.sqlite module¶
class weaver.engines.sqlite.engine¶
Bases: weaver.lib.engine.Engine
Engine instance for SQLite.
- abbreviation = 'sqlite'¶
- create_db()¶ Don’t create database for SQLite.
  SQLite doesn’t create databases. Each database is a file and needs a separate connection. This overloads `create_db` to do nothing in this case.
- datatypes = {'auto': ('INTEGER', 'AUTOINCREMENT'), 'bigint': 'INTEGER', 'bool': 'INTEGER', 'char': 'TEXT', 'decimal': 'REAL', 'double': 'REAL', 'int': 'INTEGER'}¶
- get_connection()¶ Get db connection.
- name = 'SQLite'¶
- placeholder = '?'¶
- required_opts = [('file', 'Enter the filename of your SQLite database', './sqlite.db', ''), ('table_name', 'Format of table name', '{db}_{table}')]¶
- to_csv()¶
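To illustrate what the datatypes mapping and the '?' placeholder above mean in practice, here is a small sketch using only the standard sqlite3 module. It is not weaver's code: the helper `column_sql`, the table `demo_masses`, and its columns are hypothetical, and treating the 'auto' type as an autoincrementing primary key is an assumption based on the pkformat attribute of the base Engine class.

```python
import sqlite3

# Mapping copied from the datatypes attribute above.
DATATYPES = {
    "auto": ("INTEGER", "AUTOINCREMENT"),
    "bigint": "INTEGER",
    "bool": "INTEGER",
    "char": "TEXT",
    "decimal": "REAL",
    "double": "REAL",
    "int": "INTEGER",
}


def column_sql(name, generic_type):
    """Translate a generic column type into a SQLite column definition."""
    sql_type = DATATYPES[generic_type]
    if isinstance(sql_type, tuple):
        # Assumption: 'auto' columns become autoincrementing primary keys.
        base, modifier = sql_type
        return "{} {} PRIMARY KEY {}".format(name, base, modifier)
    return "{} {}".format(name, sql_type)


# Hypothetical table layout used only for illustration.
columns = [("record_id", "auto"), ("species", "char"), ("mass", "double")]
create = "CREATE TABLE demo_masses ({})".format(
    ", ".join(column_sql(name, kind) for name, kind in columns)
)

conn = sqlite3.connect(":memory:")
conn.execute(create)
# SQLite uses '?' as the parameter placeholder, as the engine declares.
conn.executemany(
    "INSERT INTO demo_masses (species, mass) VALUES (?, ?)",
    [("DM", 42.5), ("PP", 17.0)],
)
rows = conn.execute(
    "SELECT record_id, species, mass FROM demo_masses ORDER BY record_id"
).fetchall()
conn.close()
print(rows)
```

The same insert against PostgreSQL would use '%s' in place of '?', which is the difference the two placeholder attributes record.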
weaver.lib package¶
Submodules¶
weaver.lib.engine module¶
class weaver.lib.engine.Engine¶
Bases: object
A generic database system. Specific database platforms will inherit from this class.
- connect(force_reconnect=False)¶
- connection¶
- create_db()¶ Create a new database based on settings supplied in Database object engine.db.
- create_db_statement()¶ Return SQL statement to create a database.
- cursor¶ Get db cursor.
- database_name(name=None)¶ Return name of the database.
- datatypes = []¶
- db = None¶
- debug = False¶
- disconnect()¶
- drop_statement(objecttype, objectname)¶ Return drop table or database SQL statement.
- execute(statement, commit=True)¶ Execute given statement.
- executemany(statement, values, commit=True)¶ Execute given statement with multiple values.
- exists(database, table_name)¶ Check to see if the given table exists.
- final_cleanup()¶ Close the database connection.
- get_connection()¶ This method should be overloaded by specific implementations of Engine.
- get_cursor()¶ Get db cursor.
- get_input()¶ Manually get user input for connection information when script is run from terminal.
- gis_import(table)¶
- instructions = 'Enter your database connection information:'¶
- name = ''¶
- pkformat = '%s PRIMARY KEY %s '¶
- required_opts = []¶
- script = None¶
- set_engine_encoding()¶
- set_table_delimiter(file_path)¶
- table = None¶
- table_exists(dbname, tablename)¶ This can be overridden to return True if a table exists. It returns False by default.
- table_name(name=None, dbname=None)¶ Return full tablename.
- to_csv(sort=True, path='/home/docs/checkouts/readthedocs.org/user_builds/weaver/checkouts/latest/docs', table_name=None)¶
- use_cache = True¶
- warning(warning)¶
- warnings = []¶
- weaver.lib.engine.file_exists(path)¶ Return true if a file exists and its size is greater than 0.
- weaver.lib.engine.filename_from_url(url)¶ Extract and return the filename from the url.
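The two helpers above are described in a single sentence each. The following sketch shows one way to get the described behavior with the standard library. It is written from the descriptions, not from weaver's source, and the example URL is hypothetical.

```python
import os
import posixpath
import tempfile
from urllib.parse import urlsplit


def file_exists(path):
    """Return True if path is a file and its size is greater than 0."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


def filename_from_url(url):
    """Return the last path segment of a URL, ignoring any query string."""
    return posixpath.basename(urlsplit(url).path)


# Hypothetical URL used only for illustration.
print(filename_from_url("https://example.org/data/species.csv?download=1"))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "empty.csv")
    open(path, "w").close()          # create an empty file
    empty_result = file_exists(path)  # empty files do not count as existing
    with open(path, "w") as handle:
        handle.write("species,mass\n")
    filled_result = file_exists(path)

print(empty_result, filled_result)
```

Treating an empty file as missing is useful for downloads, since an interrupted transfer can leave a zero-byte file behind.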
- weaver.lib.engine.gen_from_source(source)¶ Return generator from a source tuple.
  Source tuples are of the form (callable, args) where callable(*args) returns either a generator or another source tuple. This allows indefinite regeneration of data sources.
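The source-tuple idea can be sketched in a few lines. This follows the description above and is not copied from weaver's source; `count_up` and `indirect` are hypothetical helpers.

```python
def gen_from_source(source):
    """Resolve a source tuple (callable, args) into a generator."""
    # Keep calling until the result is no longer a source tuple.
    while isinstance(source, tuple):
        func, args = source
        source = func(*args)
    return source


def count_up(limit):
    """Hypothetical data source: yield 0 .. limit - 1."""
    for value in range(limit):
        yield value


def indirect(limit):
    """Hypothetical callable that returns another source tuple."""
    return (count_up, (limit,))


source = (indirect, (3,))
first_pass = list(gen_from_source(source))
second_pass = list(gen_from_source(source))  # the tuple can be resolved again
print(first_pass, second_pass)
```

Because the tuple stores how to build the generator instead of the generator itself, an exhausted data stream can be recreated as often as needed.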
- weaver.lib.engine.reporthook(count, block_size, total_size)¶ Generate the progress bar.
  Uses file size to calculate the percentage of file size downloaded. If the total_size of the file being downloaded is not in the header, provide progress as size of bytes downloaded in KB, MB, or GB.
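A minimal sketch of the progress logic described above. The (count, block_size, total_size) signature matches the callback that urllib.request.urlretrieve accepts; the helper `format_progress` and the exact output format are assumptions, not weaver's implementation.

```python
import sys


def format_progress(count, block_size, total_size):
    """Return a short progress string for a download in progress."""
    downloaded = count * block_size
    if total_size > 0:
        # Size known: report a percentage, capped at 100 because the last
        # block may overshoot the real file size.
        return "{}%".format(min(100, downloaded * 100 // total_size))
    # Size unknown: report the amount downloaded in the largest fitting unit.
    for unit, factor in (("GB", 1024 ** 3), ("MB", 1024 ** 2)):
        if downloaded >= factor:
            return "{:.1f} {}".format(downloaded / factor, unit)
    return "{:.1f} KB".format(downloaded / 1024)


def reporthook(count, block_size, total_size):
    """Progress callback with the signature urlretrieve expects."""
    sys.stdout.write(
        "\rDownload progress: " + format_progress(count, block_size, total_size)
    )
    sys.stdout.flush()


print(format_progress(5, 1024, 10240))  # 50%
print(format_progress(3, 1024, -1))     # 3.0 KB
```

A hook like this would be passed as `urllib.request.urlretrieve(url, filename, reporthook=reporthook)`.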
- weaver.lib.engine.skip_rows(rows, source)¶ Skip over the header lines by reading them before processing.
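The header-skipping step can be illustrated as follows. This sketch accepts any iterable of lines, which is an assumption made for simplicity; the documented function takes a source, and its exact handling is not shown on this page. The sample lines are hypothetical.

```python
from itertools import islice


def skip_rows(rows, lines):
    """Consume the first `rows` items and return the remaining iterator."""
    iterator = iter(lines)
    # Read and discard the header lines before any processing happens.
    for _ in islice(iterator, rows):
        pass
    return iterator


# Hypothetical file contents: one comment line, then a header and data.
raw_lines = ["# exported 2017", "species,mass", "DM,42.5"]
remaining = list(skip_rows(1, raw_lines))
print(remaining)
```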
weaver.lib.get_opts module¶
weaver.lib.models module¶
Data Retriever Data Model
This module contains basic class definitions for the Retriever platform.
weaver.lib.tools module¶
- weaver.lib.tools.open_csvw(csv_file, encode=True)¶ Open a csv writer forcing the use of Linux line endings on Windows.
  Also sets dialect to ‘excel’ and escape characters to ‘’
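Forcing Linux line endings matters because the csv module's 'excel' dialect ends rows with "\r\n" by default. A sketch of the idea with the standard library; the helper name `make_csv_writer` is hypothetical, and the escape character setting is left out because its value is not recoverable from the text above.

```python
import csv
import io


def make_csv_writer(handle):
    """Return an 'excel' dialect writer that ends every row with '\\n'."""
    return csv.writer(handle, dialect="excel", lineterminator="\n")


buffer = io.StringIO()
writer = make_csv_writer(buffer)
writer.writerow(["species", "mass"])
writer.writerow(["DM", 42.5])
output = buffer.getvalue()
print(repr(output))
```

When writing to a real file, the file should also be opened with newline set explicitly (for example `newline=""`), otherwise Windows translates the "\n" back into "\r\n".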
- weaver.lib.tools.open_fr(file_name, encoding='ISO-8859-1', encode=True)¶ Open file for reading respecting Python version and OS differences.
  Sets newline to Linux line endings on Windows and Python 3. When encode=False, does not set encoding on nix and Python 3, to keep as bytes.
- weaver.lib.tools.open_fw(file_name, encoding='ISO-8859-1', encode=True)¶ Open file for writing respecting Python version and OS differences.
  Sets newline to Linux line endings on Python 3. When encode=False, does not set encoding on nix and Python 3, to keep as bytes.
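For Python 3, the text-mode behavior described above roughly corresponds to passing newline and encoding to the built-in open. The sketch below covers only the encode=True case and is written from the descriptions, not from weaver's source; the `_sketch` function names and the file name are hypothetical.

```python
import os
import tempfile


def open_fw_sketch(file_name, encoding="ISO-8859-1"):
    """Open a text file for writing without newline translation."""
    return open(file_name, "w", newline="\n", encoding=encoding)


def open_fr_sketch(file_name, encoding="ISO-8859-1"):
    """Open a text file for reading without newline translation."""
    return open(file_name, "r", newline="\n", encoding=encoding)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open_fw_sketch(path) as handle:
        handle.write("caf\u00e9\n")
    with open(path, "rb") as handle:
        raw = handle.read()   # bytes as stored on disk
    with open_fr_sketch(path) as handle:
        text = handle.read()  # decoded back to str

print(raw, repr(text))
```

With newline="\n" the line ending is written as a single byte on every platform, and ISO-8859-1 stores the accented character as one byte (0xE9).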
- weaver.lib.tools.to_str(object, object_encoding=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)¶
Module contents¶
- weaver.lib.check_for_updates()¶ Check for updates to datasets.
  This updates the HOME_DIR scripts directory with the latest script versions.
- weaver.lib.join_postgres(dataset, user='postgres', password='', host='localhost', port=5432, database='postgres', database_name=None, table_name=None, compile=False, debug=False, quiet=False, use_cache=True)¶ Install scripts in postgres.
- weaver.lib.join_sqlite(dataset, file=None, table_name=None, compile=False, debug=False, quiet=False, use_cache=True)¶ Install scripts in sqlite.
- weaver.lib.datasets(keywords=None, licenses=None)¶ Return list of all available datasets.
- weaver.lib.dataset_names()¶ Return list of all available dataset names.
- weaver.lib.reload_scripts()¶ Load scripts from scripts directory and return list of modules.
- weaver.lib.reset_weaver(scope='scripts', ask_permission=True)¶ Remove stored information on scripts, data, and connections.