kaggledatasets Documentation¶
Installation¶
kaggledatasets requires Python 3.5+.
kaggledatasets is available from the Python Package Index via pip:
$ pip install kaggledatasets
The easiest way to get started on most systems is to create a virtualenv:
$ python3 -m venv kd_venv
$ source kd_venv/bin/activate
(kd_venv) $ pip install kaggledatasets
This installs versions of the TensorFlow (TF) and PyTorch dependencies suited to your system. See kaggledatasets for more information.
If you need a different version of kaggledatasets, follow the instructions on the kaggledatasets website (kaggledatasets.github.io) to install the appropriate version.
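Before installing, it can help to confirm that the interpreter meets the Python 3.5+ requirement and that a virtual environment is active. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `in_virtualenv` are illustrative and are not part of kaggledatasets.

```python
import sys

MIN_PYTHON = (3, 5)  # minimum version stated in this guide


def python_is_supported(version_info=None):
    """Return True if the given (or running) interpreter is at least MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


def in_virtualenv():
    """Return True when running inside an environment created by venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("Inside a virtualenv:", in_virtualenv())
```

Running this script with the interpreter from `kd_venv` should report that a virtualenv is active; running it with the system interpreter should not.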
Install From Source¶
$ git clone git@github.com:kaggledatasets/kaggledatasets.git
$ cd kaggledatasets
$ python3 -m venv kd_venv
$ source kd_venv/bin/activate
(kd_venv) $ python3 setup.py install
Once that is installed, you can run the unit tests. We recommend using X as a runner.
To resume development in an already checked-out repo:
$ cd kaggledatasets
$ source kd_venv/bin/activate
To exit the virtual environment:
(kd_venv) $ deactivate
Coming Soon
Cloud VM Setup¶
This guide covers the setup needed to install kaggledatasets on a cloud VM. Note that while these instructions worked when they were written, they may become incorrect or out of date. If they do, please send us a Pull Request!
After following these instructions, you should be ready to follow either the Installation instructions or the Install From Source instructions.
Amazon Web Services¶
Coming Soon
Google Cloud Engine¶
Coming Soon
Microsoft Azure¶
Coming Soon
Configuration¶
Local System (Windows/Linux/macOS)¶
This guide covers the setup needed to install and configure kaggledatasets on your local machine.
Google Colab Setup¶
This guide covers the setup needed to install and configure kaggledatasets on Google Colab.
Frequently Asked Questions¶
Contributing Guide¶
👍🎉 First off, thanks for taking the time to contribute! 🎉👍
Before you start¶
- Comment on the issue that you plan to work on so we can assign it to you and there isn’t unnecessary duplication of work.
- When you plan to work on something larger (for example, adding new features to Dataset Class), please respond on the issue (or create one if there isn’t one) to explain your plan and give others a chance to discuss.
- If you’re fixing some smaller issue - please check the list of pending Pull Requests to avoid unnecessary duplication.
How can you help¶
You can help in multiple ways:
- Reproducing bugs, finding their root causes, and providing fixes. This is greatly appreciated (see the issues with label: bug).
- Sending Pull Requests for new Kaggle datasets and/or requested features (see the issues with label: dataset request or enhancement).
- Doing Code Reviews on Pull Requests from the developers of this community and verifying that the PRs work correctly.
Datasets¶
Adding a Kaggle Dataset is a great way of making it more accessible to the various communities. An "Add a new Kaggle Dataset" guide will be available soon.
Tests¶
We use pylint to keep kaggledatasets easy to use and to work on over the long term. All modules should have clear tests for their public members.
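As an illustration of what a clear test for a public member might look like, here is a minimal sketch. This guide does not name a test framework, so the standard library's `unittest` is assumed. The function `normalize_dataset_name` is a hypothetical stand-in for a public member and is not part of kaggledatasets.

```python
import unittest


def normalize_dataset_name(name):
    """Hypothetical public helper: trim whitespace and lowercase a dataset name."""
    if not isinstance(name, str):
        raise TypeError("name must be a string")
    return name.strip().lower()


class NormalizeDatasetNameTest(unittest.TestCase):
    """One test case per public member, one test method per behavior."""

    def test_strips_and_lowercases(self):
        self.assertEqual(normalize_dataset_name("  Titanic "), "titanic")

    def test_rejects_non_strings(self):
        with self.assertRaises(TypeError):
            normalize_dataset_name(42)


def run_tests():
    """Load and run the test case above, returning the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(NormalizeDatasetNameTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Each test method covers one documented behavior, including the error path, so a failure points directly at the behavior that broke.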
Pull Requests¶
All contributions are done through Pull Requests here on GitHub.
Code Reviews¶
All submissions, including submissions by project members, require review. We use GitHub pull requests for this purpose. Consult GitHub Help for more information on using pull requests.
Persons of Interest¶
Looking for contributors who can be our Persons of Interest¶
General Maintainers¶
- Omkar Prabhu (prabhuomkar)
Module-level maintainers¶
Core¶
- Omkar Prabhu (prabhuomkar)
Structured¶
- Omkar Prabhu (prabhuomkar)
Tutorials¶
kaggledatasets¶
kaggledatasets
kaggledatasets.core¶
kaggledatasets.core.dataset¶
kaggledatasets.core.config¶
kaggledatasets.core.downloader¶
kaggledatasets.core.fileops¶
kaggledatasets.structured¶
The kaggledatasets.structured subpackage consists of popular datasets and common functions for structured datasets like CSV, JSON, SQLite, etc.
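The subpackage's API is not documented on this page, so the following is only a sketch of how the three formats mentioned above can be read into a common list-of-dicts shape using the Python standard library. The function names are illustrative and are not part of kaggledatasets.structured.

```python
import csv
import io
import json
import sqlite3


def rows_from_csv(text):
    """Parse CSV text with a header row into a list of dicts (values stay strings)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def rows_from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def rows_from_sqlite(connection, table):
    """Read every row of a table into a list of dicts."""
    connection.row_factory = sqlite3.Row
    # Table names cannot be bound as SQL parameters, so pass trusted names only.
    cursor = connection.execute('SELECT * FROM "{}"'.format(table))
    return [dict(row) for row in cursor]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE passengers (name TEXT, age INTEGER)")
    conn.execute("INSERT INTO passengers VALUES (?, ?)", ("Ada", 36))

    print(rows_from_csv("name,age\nAda,36\n"))
    print(rows_from_json('[{"name": "Ada", "age": 36}]'))
    print(rows_from_sqlite(conn, "passengers"))
```

Note that CSV carries no type information, so the CSV reader returns `"36"` as a string while JSON and SQLite return the integer `36`.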
kaggledatasets.image¶
The kaggledatasets.image subpackage consists of popular datasets and common image transformations for image datasets.
kaggledatasets.audio¶
The kaggledatasets.audio subpackage consists of popular datasets and common audio transformations.
kaggledatasets.text¶
The kaggledatasets.text subpackage consists of data processing utilities and popular datasets for natural language.