Welcome to Get Weather Data’s documentation!
Get Data from Weather Station Nearest to a Zip Code using the NOAA Web Service
Given a list of zip codes and dates (see the sample input file for the expected format), the script gets data from the nearest weather station using the NOAA web service. It appends all the weather data returned by NOAA, along with the GHCND id, name, latitude, and longitude of the weather station that supplied the data, to each input row (see the sample output file).
Using NOAA-Web
To get started, clone this subfolder from the repository:
git clone https://github.com/mfbx9da4/git-sub-dir.git
cd git-sub-dir
python get_git_sub_dir.py soodoku/get-weather-data/noaaweb
cd noaaweb
The script needs an API token from NOAA. You can get a token from the NCDC site.
Before running the file, open noaaweb.py in a text editor and replace NCDC_TOKEN with your NCDC token.
The default output file name is output.csv. To specify a custom output file name, pass -o outfilename_of_choice.
The script keeps track of the rows that have been processed. (It does so by taking the row number from the output file as the start.) Thus, if halted midway, it resumes from the last processed row.
python noaaweb.py samplein.csv -o sampleout.csv
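The resume behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in noaaweb.py: the function names are hypothetical, and it assumes the output file has one header row and one row per processed input row.

```python
import csv
import os


def count_processed_rows(out_path):
    """Return the number of data rows already written to the output file."""
    if not os.path.exists(out_path):
        return 0
    with open(out_path, newline="") as f:
        rows = sum(1 for _ in csv.reader(f))
    # Assumption: the first row of the output file is a header.
    return max(rows - 1, 0)


def rows_to_process(in_path, out_path):
    """Yield only the input rows that have no counterpart in the output yet."""
    done = count_processed_rows(out_path)
    with open(in_path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f)):
            if i >= done:
                yield row
```

Counting output rows keeps the bookkeeping in the output file itself, so no separate checkpoint file is needed.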
Note
Requests to NOAA API “often return nothing. It isn’t clear why. The documentation doesn’t say whether the search for the closest weather station is limited to X kilometers because without that, one should have data for all zip codes and all dates. Nor does the API bother to return how far the weather station is from which it got the data.” (From Bad Weather: Getting weather data by zip and date). See for instance sample output file produced using the NOAA API.
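Because requests can come back with nothing, calling code has to treat "no data" as a normal outcome rather than an error. The sketch below assumes the response has already been decoded from JSON and that observations, when present, sit in a list under a "results" key; the helper name is hypothetical and is not taken from noaaweb.py.

```python
def extract_observations(payload):
    """Return the list of observations in a decoded JSON payload, or []."""
    # Assumption: an empty or missing response decodes to {} or None.
    if not isinstance(payload, dict):
        return []
    results = payload.get("results")
    return results if isinstance(results, list) else []
```

Returning an empty list lets the caller write the input row with blank weather columns instead of stopping.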
License
The script is under the MIT License.
zip2ws: Find weather stations ‘nearest’ to zip codes
What it does:
- Finds (certain kinds of) weather stations “nearest” (within a certain distance, or X number of closest) to each zip code centroid
- Finds centroids of zip codes using Google API
Additional Details
Weather stations come in lots of varieties. We limit ourselves to weather stations of the following four kinds:
- GHCND stations list. For current list, see: NCDC GHCND Stations List
- ASOS stations list. For current list, see: NCDC ASOS Stations List
- COOP stations list (Active only). For current list, see: NCDC COOP Stations List
- USAF-WBAN stations list. For current list, see: NCDC ISH Stations List
Fields
- zip, lat, long, city, state, zipcodetype, locationtype, location, decommisioned, taxreturns, estimatedpopulation, totalwages: from federalgovernmentzipcodes.us
- gm_lat/gm_long: lat./long. of centroids of zip codes via Google API.
- diff: distance in meters between Google API estimated centroid of zip code and lat/long that comes with the database.
- list of stations: ordered from closest to furthest
  - stX_id: station id
  - stX_name: name of station
  - stX_distance: distance to zip centroid
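Both diff and stX_distance are distances between two lat/long points. The documentation does not say which formula zip2ws uses, so the following is only a sketch using the standard haversine (great-circle) formula; the function name and the Earth radius constant are assumptions.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (assumed value)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/long points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

For example, haversine_m(gm_lat, gm_long, lat, long) would give a value comparable to the diff field.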
Running the script
To run the script, you will need to install two Python libraries:
- pygeocoder. To install, run:
$ pip install pygeocoder
- requests. To install, run:
$ pip install requests
The script also needs the inventories directory, which contains the station files and the zip code CSV that it imports. Keep the inventories folder in the same folder as the script.
Usage: zip2ws.py [options]

Options:
  -h, --help            show this help message and exit
  -D DATABASE, --database=DATABASE
                        Database name (default: zip2ws.sqlite)
  -i, --import          Create and import database
  -g, --geocode         Query and update Lat/Lon by Google Maps Geocoding API
  -c, --closest         Calculate and update closest table
  --ghcn=GHCN           Number of closest stations for GHCN (default: 3)
  --coop=COOP           Number of closest stations for COOP (default: 1)
  --usaf=USAF           Number of closest stations for USAF (default: 1)
  -d DISTANCE, --distance=DISTANCE
                        Maximum distance of stations from Zip location
                        (Default: 0)
  -e, --export          Export closest stations for each Zip to CSV file
  -o OUTFILE, --outfile=OUTFILE
                        CSV Output file name (default: zip-stations.csv)
  --drop-closest        Drop closet table
  --clear-glatlon       Clear Google Maps Geocoding API Lat/Lon
  --use-zlatlon         Use Zip Lat/Lon instead of Google Geocoding Lat/Lon
Start using the script by creating and importing the database. Do so by running:
python zip2ws.py -i
Next, update the closest weather stations table by running:
python zip2ws.py -c
This task uses the Google lat/long. To use the lat/long that comes with the zip code database instead, run:
python zip2ws.py -c --use-zlatlon
NOTE: If you interrupt the script midway and restart it, it resumes processing from where it left off.
If you want to find a set number of closest stations, specify the type and number of weather stations. For instance, to find 5 GHCND stations, 3 COOP stations, and 2 USAF stations, run:
python zip2ws.py -c --ghcn=5 --coop=3 --usaf=2
To find all weather stations within 30 km, ordered from closest to furthest, run:
python zip2ws.py -c -d 30
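The two ways of limiting the search (a fixed number of closest stations, or every station within a distance) can be sketched as one selection step. This is an illustration only: the function name and the (station_id, distance_km) tuple layout are hypothetical, and how zip2ws combines the two limits internally is an assumption here.

```python
def select_stations(stations, max_count=0, max_km=0):
    """Return stations ordered closest first, optionally capped.

    stations: iterable of (station_id, distance_km) pairs.
    max_count, max_km: 0 means no limit (assumed convention).
    """
    ranked = sorted(stations, key=lambda s: s[1])
    if max_km > 0:
        ranked = [s for s in ranked if s[1] <= max_km]
    if max_count > 0:
        ranked = ranked[:max_count]
    return ranked
```

With max_km=30 and no count limit, this mirrors the -d 30 command; with max_count=5, it mirrors --ghcn=5 for a list holding only GHCND stations.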
To export results to a CSV file, “closest.csv”, run:
python zip2ws.py -e -o closest.csv
To find the centroids of zip codes using the Google Maps Geocoding API, run:
python zip2ws.py -g
Keep in mind that the Google Maps Geocoding API usage limit is 2,500 queries per day per IP address, so you can quickly run into the limit. The script raises the exception “OVER_QUERY_LIMIT” when the limit is breached. You can run the script multiple times to geocode a greater number of zip codes. If you are unhappy with the results, use the --clear-glatlon option to clear the existing data.
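Running the script repeatedly works because zip codes that are already geocoded can be skipped on the next run. A rough sketch of that pattern follows; the QuotaExceeded exception, the geocode callable, and the done dictionary are hypothetical stand-ins, not the names used by zip2ws or pygeocoder.

```python
class QuotaExceeded(Exception):
    """Hypothetical stand-in for the geocoder's over-quota error."""


def geocode_pending(zips, geocode, done):
    """Geocode zips not yet in `done`; stop when the quota runs out.

    geocode: callable taking a zip code and returning (lat, long).
    done: dict of results from earlier runs, updated in place.
    Returns True if every zip code has been processed.
    """
    for z in zips:
        if z in done:
            continue  # already geocoded in an earlier run
        try:
            done[z] = geocode(z)
        except QuotaExceeded:
            return False  # keep partial results; the next run resumes here
    return True
```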
zip2wd_mp: Get Weather Data For a List of Zip Codes For a Range of Dates (Multi-processing version)
Given a zip code and a date or a range of dates, the script gets weather data (you specify which columns) from the closest weather station for which the data are available. If given a range of dates, it fetches all the specified columns for each day in the period.
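Handling a date range amounts to expanding it into individual days and looking each one up. A minimal sketch, assuming both endpoints are included (the documentation does not state this explicitly) and using a hypothetical function name:

```python
from datetime import date, timedelta


def days_in_range(start, end):
    """Yield each date from start to end, inclusive (assumed behavior)."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)
```

For an extended input row, start would be built from from.year, from.month, from.day and end from to.year, to.month, to.day.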
How it does it:
This script builds on the script that finds the nearest weather stations based on a variety of metrics.
You can use a variety of options to choose the kinds of weather stations from which you want data. For instance, you can get data only from USAF stations.
The script downloads data on demand. It checks the local directory for weather data for a particular day and time and, if they are missing, tries to download them from the NOAA website. On occasion the script may run into bandwidth bottlenecks, in which case you may want to run it again to download the remaining data.
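The on-demand download amounts to a check-then-fetch step. The sketch below shows the idea with a hypothetical helper; the download argument is a caller-supplied function, since the actual NOAA URLs and file layout used by the script are not described here.

```python
import os


def ensure_local(path, download):
    """Return path, calling download(path) first only if the file is missing."""
    if not os.path.exists(path):
        # Create the parent directory so the download has somewhere to write.
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        download(path)
    return path
```

Because files already on disk are skipped, rerunning after a failed download only fetches what is still missing.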
Prerequisites:
The zip2ws.sqlite database comes from the finding-the-nearest-weather-station project (zip2ws).
Input File Types:
- Basic: The input file should be a CSV with the following column names:
  uniqid, zip, year, month, day
  See sample-input-basic.csv for a sample input file.
- Extended: The input file should be a CSV with the following column names:
  uniqid, zip, from.year, from.month, from.day, to.year, to.month, to.day
  See sample-input-extend.csv for a sample input file.
- Column Name File: This file contains the list of weather data columns chosen for the output file. Column names beginning with the character ‘#’ will not appear in the output file (see column-names.txt for a sample file). For what these column names stand for, see column-names-info.txt.
GHCN Weather Data in SQLite3 database: These files are created, one per year, by the script import-db.sh. For example, for the year 2000:
cd data
./import-db.sh 2000
The script downloads daily weather data (GHCN-Daily) from the NOAA server for the year 2000 and imports them into a SQLite3 database file (e.g. ghcn_2000.sqlite3).
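Reading the column name file described above comes down to dropping the names that start with ‘#’. A minimal sketch, assuming one column name per line (the exact file layout is an assumption) and a hypothetical function name:

```python
def read_column_names(lines):
    """Return column names, skipping blank lines and names starting with '#'."""
    names = []
    for line in lines:
        name = line.strip()
        if name and not name.startswith("#"):
            names.append(name)
    return names
```

It accepts any iterable of lines, so it can be called on an open file, for example read_column_names(open("column-names.txt")).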
Configuration file
Script settings live in the configuration file, zip2wd.cfg:
[manager]
ip = 127.0.0.1
port = 9999
authkey = 1234
batch_size = 10
[worker]
uses_sqlite = yes
processes = 4
nth = 0
distance = 30
[output]
columns = column-names.txt
[db]
zip2ws = zip2ws.sqlite
path = ./data/
- ip and port: IP address and port of the manager process that the worker connects to.
- authkey: A shared password used to authenticate between the manager and worker processes.
- batch_size: The number of zip codes that the manager process dispatches to a worker process each time.
- uses_sqlite: If yes, uses weather data from the imported SQLite3 database (recommended for speed); if no, downloads weather data for individual weather stations on demand.
- processes: The number of processes forked by the worker process.
- nth: Search within the n-th closest stations [set to 0 for unlimited].
- distance: Search within this distance (km) [set to 0 for unlimited].
- columns: A column file that lists the weather data columns to output.
- zip2ws: SQLite3 database of zip codes and weather stations.
- path: Path to the database files.
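A file in this format can be read with Python's standard configparser module. The sketch below is not the script's own loader: the load_settings function and the keys of the returned dictionary are hypothetical, and only the [manager] and [worker] sections are shown.

```python
import configparser

SAMPLE = """
[manager]
ip = 127.0.0.1
port = 9999
authkey = 1234
batch_size = 10

[worker]
uses_sqlite = yes
processes = 4
nth = 0
distance = 30
"""


def load_settings(text):
    """Parse configuration text into a plain dict of typed values."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    return {
        "address": (cfg.get("manager", "ip"), cfg.getint("manager", "port")),
        "authkey": cfg.get("manager", "authkey").encode(),
        "batch_size": cfg.getint("manager", "batch_size"),
        # getboolean accepts yes/no, true/false, on/off and 1/0.
        "uses_sqlite": cfg.getboolean("worker", "uses_sqlite"),
        "processes": cfg.getint("worker", "processes"),
        "nth": cfg.getint("worker", "nth"),
        "distance_km": cfg.getint("worker", "distance"),
    }
```

To read the real file instead of the sample text, cfg.read("zip2wd.cfg") can replace cfg.read_string(text).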
Usage
Manager process
usage: manager.py [-h] [--config CONFIG] [-o OUTFILE] [-v] inputs [inputs ...]

Weather search by ZIP (Manager)

positional arguments:
  inputs                CSV input file(s) name

optional arguments:
  -h, --help            show this help message and exit
  --config CONFIG       Default configuration file (default: zip2wd.cfg)
  -o OUTFILE, --out OUTFILE
                        Search results in CSV (default: output.csv)
  -v, --verbose         Verbose message
Worker process
usage: worker.py [-h] [--config CONFIG] [-v]

Weather search by ZIP (Worker)

optional arguments:
  -h, --help            show this help message and exit
  --config CONFIG       Default configuration file (default: zip2wd.cfg)
  -v, --verbose         Verbose message
Example:
Run the manager process to search weather data for the input file sample-input-extend.csv:
python manager.py sample-input-extend.csv
The default output file is output.csv.
Run the worker process:
python worker.py
The manager dispatches jobs (lists of zip codes and date ranges) to the connected workers. Each worker process also forks a number of processes (specified by processes in the configuration file) to search the weather data for each zip code and returns the results to the manager process. Multiple workers can run on the same or different machines.
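The dispatch step can be pictured as cutting the input rows into batches of batch_size. A minimal sketch of that batching follows; the function name is hypothetical, and the networking between manager and workers is left out.

```python
def batches(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

With batch_size = 10, a file of 25 rows would be handed out as batches of 10, 10, and 5 rows.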
Output
in meters