gamma-cat

gamma-cat is an open data collection and source catalog for very-high-energy gamma-ray astronomy.

Overview

Gamma-Cat provides the gamma-ray data as

  1. a full data collection (often multiple measurements, e.g. spectra, for a given source)
  2. a source catalog (a simple table with one source per row, and a small subset of available data)

The latest version of the files can be found on Github: https://github.com/gammapy/gamma-cat/tree/master/output/data

Source catalog

The source catalog is available as a single table in a single file. It contains only part of the data available in gamma-cat.

Why multiple formats?

  • FITS is the main format for the catalog we release. It supports vector columns, which we use for spectral points.
  • The ECSV and YAML variants are mainly for those of us working on gamma-cat: they are text-based and version-control friendly, so it’s easy to see what changed from one version to the next.

Data Collection

Here you can access the whole data collection.

At the moment we provide two “views”, either by source or by reference.

Data by source

List of sources in gamma-cat:

Data by reference

List of references in gamma-cat:

Stats

Basics

  • Version: e4d09c0

Source locations

TODO:

  • AIT plot of sky locations
  • Add Bokeh or Aladin Lite interactive Figure?

Kifune plot

The so-called Kifune plot shows the cumulative number of detected TeV sources versus time.


_images/plot_kifune.png
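
The cumulative count behind such a plot can be sketched in a few lines. This is only an illustration, not the code that produced the figure above: the function name cumulative_counts is hypothetical, and the input years are placeholder values, not real catalog data.

```python
from collections import Counter
from itertools import accumulate


def cumulative_counts(years):
    """Return (year, cumulative number of sources) pairs, sorted by year."""
    # Count how many sources were detected in each year.
    per_year = Counter(years)
    ordered_years = sorted(per_year)
    # Running total of detections up to and including each year.
    running_totals = accumulate(per_year[year] for year in ordered_years)
    return list(zip(ordered_years, running_totals))


# Placeholder years for illustration only, not real catalog data.
example_years = [2000, 2002, 2002, 2005]
print(cumulative_counts(example_years))
# [(2000, 1), (2002, 3), (2005, 4)]
```

Plotting the second element of each pair against the first gives the step-like curve of a Kifune plot.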

Source classes

Plot source class pie chart

Changes

This is the changelog for gamma-cat.

It lists releases and major changes, not every small data update.

See also: Contributors and Acknowledgements.

Version 0.1 (unreleased)

  • tbd

Overview

Gamma-Cat is organised as a Github repository, which mainly consists of the catalog and data collection, the raw input data, the Sphinx source of this webpage and a Python package to work with the data. The project is BSD licensed (like Astropy and Gammapy). This license applies only to the code; it has nothing to do with the data.

Gamma-Cat is under development and aims to provide data taken by HESS, MAGIC, VERITAS, WHIPPLE, HEGRA, CANGAROO, HAWC, MILAGRO, CRIMEA, ARGO, CAT and DURHAM. In the future we may try to ingest and interconnect other catalogs like e.g. the Fermi-LAT GeV catalogs.

Unfortunately, the catalog is still incomplete. If you want to help us, you are very welcome to look at the Github issue tracker (requires a Github account) or feel free to contact one of the Contributors.

For further information, take a look at either the Gamma-Cat poster presented at the ADASS 2016 conference or the follow-up Gamma-Cat poster from 2017, which has more recent information on Gamma-Cat (and gamma-sky.net).

Terms of use

All data collected here was originally generated and published by others.

First and foremost, when using data, you have to cite the original publication!

In addition, if you used gamma-cat, we ask that you acknowledge this via

This research made use of gamma-cat (https://github.com/gammapy/gamma-cat),
an open data collection and source catalog for gamma-ray astronomy.

(we plan to write a paper on gamma-cat; once that is available, we’ll mention the reference here)

Otherwise, you are free to use this data as you like.

Contributors

The following people have directly contributed to gamma-cat (alphabetical order by first name)

Many others have contributed indirectly, e.g. given data or feedback via private communication.

Thank you!

Acknowledgements

The following tools and services were used to produce this catalog:

Example

TODO: Needs to be done!

Introduction

So you want to contribute to gamma-cat?

Really? Are you sure?

Great! Read on …

There are many ways to contribute to gamma-cat. We need more people to help with data entry, to review the data in gamma-cat for completeness and accuracy, and to improve the documentation, the data format schemas and the Python scripts.

We will try to describe and explain how everything works here in the contributor documentation. But we realise that in the end we will probably fall short, and if you try to contribute you will have questions about YAML or ECSV or schemas, or get stuck with a git or Python question. If that happens, please don’t give up, but contact us and we will help!

Everything happens on Github here: https://www.github.com/gammapy/gamma-cat

This is a git repository that contains everything related to gamma-cat: the data entry files in the input folder, the script make.py and the gammacat folder with the Python scripts that generate the output files, and the output folder with those files. The documentation at https://gamma-cat.readthedocs.io is generated from the RST files in the webpage folder in that repo. Github is not just where the files and version control live. It is also the place to file “issues” for questions, discussion, feature or data entry requests and bug reports (see https://github.com/gammapy/gamma-cat/issues), and the place where all changes and additions happen, via “pull requests” (see https://github.com/gammapy/gamma-cat/pulls ).

So if you want to contribute to gamma-cat, you have to make an account on Github. It’s free and should just take a minute. You can then find a lot of information about git and Github here: https://help.github.com/

There you will find documentation on how to open “issues” and “pull requests”, resources for learning git, and instructions for basic things like editing or adding a file directly via the Github web interface. This means that you can do some data entry or documentation improvements for gamma-cat in a simple way. If you’re new to Github and git, and the explanations below aren’t clear to you, open a new issue in the gamma-cat issue tracker describing what you want to do (i.e. add or change something), and we’ll try to help you do it, i.e. make your first pull request.

The following pages give you more information, focusing mostly on how to do data entry for gamma-cat, since this will be the most common way for people to contribute.

Input

We’ve already mentioned it in the introduction: all data entry for gamma-cat happens by editing or adding text files in the input folder. We use file formats that are both human- and machine-readable:

  • YAML files for hierarchical data
  • ECSV files for tabular data

This section describes the format and content of the data entry files for gamma-cat.

All data entry is done in the folder named input. It contains three sub-folders of interest:

  • sources contains YAML files with basic information about the gamma-ray sources.
  • data contains the data from publications, stored in YAML and ECSV files. It has one subfolder per year, and within each of these one subfolder per reference_id. E.g. the data from the publication with reference 2015ApJ...802...65A is stored in the folder input/data/2015/2015ApJ...802...65A. The files in these folders are named after the source_id of the gamma-ray source, as given in its source definition file.
  • schemas contains files which define the structure of the data entry files and descriptions of the properties in the data files.
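
As a rough illustration of this folder layout, the following sketch builds the folder path for a given reference. The helper name dataset_folder is hypothetical (it is not part of the gammacat package). It assumes that the year is the first four characters of the ADS bibcode and that the folder name is the URL-quoted bibcode.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def dataset_folder(reference_id, base="input/data"):
    """Return the folder for one publication (hypothetical helper)."""
    # Assumption: ADS bibcodes start with the four-digit publication year.
    year = reference_id[:4]
    # Characters such as "&" are URL-quoted so they are safe in folder names.
    return PurePosixPath(base) / year / quote(reference_id)


print(dataset_folder("2015ApJ...802...65A"))
# input/data/2015/2015ApJ...802...65A
```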

These input files are now discussed in more detail, starting with the source definition files in sources:

The properties that can be stored in such a file (each with a short description) are defined in basic_source_info.schema.yaml. The schema starts with properties such as the common_name, the source_id used in gamma-cat and the tevcat_name, and continues with information about the experiments that have observed the source. Two important properties are reference_ids, the list of ADS references to publications that deal with this source, and source_id, from which the file names in the data folder are built. At the end of basic_source_info.schema.yaml, the keyword required lists those of the above properties that must be defined in every source definition file.

A good way to get familiar with this is to look at an example such as tev_000049.yaml and compare it with basic_source_info.schema.yaml.

The folders in /input/data/<year>/<reference> contain ECSV files with measured data, e.g. tev-000034-sed.ecsv, YAML files with model parameters, e.g. tev-000034.yaml, and finally an info YAML file that summarises all data corresponding to the publication, e.g. info.yaml.

The ECSV files contain either measured spectral fluxes or lightcurves. The units of the data and additional information such as source_id or telescope are stored as metadata in the header of the file. The naming convention is tev-<source_id>-sed.ecsv and tev-<source_id>-lc.ecsv, respectively.
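
A minimal sketch of this naming convention follows. The helper data_filename is hypothetical, and the zero-padding of the source_id to six digits is an assumption inferred from the example file name tev-000034-sed.ecsv.

```python
def data_filename(source_id, kind):
    """Build an ECSV file name such as 'tev-000034-sed.ecsv' (hypothetical helper)."""
    # "sed" marks spectral flux points, "lc" marks lightcurves.
    if kind not in ("sed", "lc"):
        raise ValueError(f"kind must be 'sed' or 'lc', got {kind!r}")
    # Assumption: the integer source_id is zero-padded to six digits,
    # as in the example file name tev-000034-sed.ecsv.
    return f"tev-{source_id:06d}-{kind}.ecsv"


print(data_filename(34, "sed"))
# tev-000034-sed.ecsv
```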

The YAML files contain the model parameters given in the publication and are referred to within gamma-cat as dataset files. The properties that can or must be stored in such a YAML file are defined in dataset_source_info.schemas.yaml.

The info.yaml file gives an overview of all stored data related to the publication; its layout is defined in dataset_info.schema.yaml. One important property in an info file is data-entry, with its sub-properties status, reviewed and notes.

Add/ Change data:

When you add or change input data, you have to do three things. First, add or change the data itself. Second, update the data status in the corresponding info.yaml file. Finally, tell gamma-cat that there is new data: this is done in gamma_cat_dataset.yaml, where you must add the reference_id of the publication the data comes from. Gamma-cat will only contain data whose reference_id is listed in gamma_cat_dataset.yaml.

Workflow

This page gives an overview of how to make changes or additions to Gamma-Cat, e.g. adding data or fixing bugs. It is not a tutorial that explains the tools (git, Github, Sphinx etc.) or the code (Python etc.).

If you want to run the gamma-cat Python scripts locally, go through the Installation chapter. But you can contribute without installing anything, because we have continuous integration tests set up on travis-ci that check that everything is working OK. If that is what you want to do, skip the Installation chapter. For information about the gammacat Python package and the structure of the input files, please go to Code.

Pull requests

Contributions are made via pull requests on Github (hence you need a Github account). We like them small and easy to review. To get familiar with git and Github you can look at https://help.github.com or simply search the web for more information.

The general contribution cycle is roughly as follows:

  1. Get the latest version of the master branch
  2. Check out a new feature branch for your changes / additions
  3. Make fixes, changes and additions locally
  4. Make a pull request
  5. One of us reviews the pull request, gives feedback and finally merges it
  6. Update to the new latest version of the master branch

Then you are done and you can start using the new Gamma-Cat version or make further improvements in a new pull request. It is possible and normal to work on different tasks in parallel using git branches.

So how large should one pull request be?

Our experience is that the smaller the better, and each pull request should only handle one task, e.g. make a separate pull request for every data entry or every bug fix. Working on a pull request for an hour, or a day at most, and having a diff of around 100 lines to review is pleasant.

Pull requests that drag on for a few days or have a diff of 1000 lines of code are almost always painful and inefficient for both the person who makes them and the reviewer.

If your pull request is related to an issue, we recommend naming it accordingly, e.g. Fix bug in issue 45. This will make things easier for us.

Installation (optional)

If you want to run the gamma-cat Python scripts locally, you need to install Python 3.6 and some Python packages. We recommend that you download Anaconda and then run the following commands in the gamma-cat folder:

conda config --set always_yes yes --set changeps1 no
conda update -q conda
conda info -a
# Now install our dependencies
conda env create -f environment.yml
# Activation of the installed environment
source activate gamma-cat

Code

This page contains information about the whole code stored in the gamma-cat repository.

make.py

There is a command line interface to run the gamma-cat scripts, the make.py file in the top-level folder.

To see the available sub-commands and options:

$ ./make.py --help

To run the full pipeline, i.e. generate all output files and run all checks:

$ ./make.py all

After adding/ changing data in the input folder, one should always execute:

$ ./make.py checks

which checks the format/ structure of the input files.

gammacat package

The make.py command line interface just imports and executes functions and classes from the gammacat Python package (i.e. the .py files in the folder with name gammacat).

We list the modules in gammacat and comment on the code organisation.

The following are more basic modules:

  • utils.py has some helper utility functions (e.g. for JSON / YAML / ECSV I/O)
  • modeling.py has a Parameter and ParameterList class to help process input spatial and spectral source models from the YAML files.
  • info.py has some helpers for versions, filenames, …
  • sed.py has a class to process and validate the spectral energy distributions (SEDs) in the input folder. The SEDs in the output folder can be read directly with gammapy.spectrum.FluxPoints.
  • lightcurve.py has a class to process and validate the lightcurves in the input folder. The lightcurves in the output folder can be read directly with gammapy.time.LightCurve.

In addition, there are classes in gammapy.catalog.gammacat that are used in the gammacat scripts to process the data: GammaCatResource, GammaCatResourceIndex, GammaCatDatasetCollection.

Then there is a hierarchy of higher-level modules (that import from the basic modules and modules representing lower-level steps in the processing pipeline):

  • input.py has classes to read / clean up / process the data in the input folder.
  • collection.py has classes to create the files in the output folder (only the dataset files and index files, not the catalog files).
  • cat.py is the code to create the catalog files
  • checks.py is the code to run checks. At the moment the methods there just dispatch to methods called validate or check in lower-level modules (such as gammacat.input), and the actual checks are thus scattered throughout the gammacat modules. There are also checks on data content in gammacat/tests (which is probably a bad idea, but pytest makes it convenient to write asserts).

Tests

There is a folder gammacat/tests with some unit tests for the code in the gammacat package, which can be executed via python -m pytest gammacat/tests

We haven’t quite figured out yet what goes where:

  • gammacat/tests
  • gammapy/catalog/tests/test_gammacat.py
  • The various check / validate methods throughout gammacat, executed via ./make.py checks.

Tools

The following tool is helpful to lint YAML files:

TODO: it’s too picky, showing errors for things that are OK. Figure out how to make it less picky and document that here.

Website build

The gamma-cat website is a static website generated by Python and Sphinx.

We have a Sphinx test page where we can try out things locally and check if they also work on ReadTheDocs. It’s an orphan page, i.e. doesn’t show up for normal users.

We use several Sphinx extensions, and also have our own in gammacat/sphinx/exts.

More info soon … for now this is just a link collection:

Details

This page contains some notes with details about gamma-cat.

Data

Reference identifiers

For paper identifiers, we use the ADS identifiers.

These are unique, well-known and stable. They correspond to the bibcode in https://github.com/andycasey/ads/, i.e. they can be used to obtain further info from ADS.

Note that some bibcodes contain characters that don’t work well (or at all) in directory names or filenames, e.g. 2011A&A...531L..18H with the & character. What ADS does in that case for URLs is to quote them, i.e. the URL is http://adsabs.harvard.edu/abs/2011A%26A...531L..18H . In gamma-cat we do the same: we URL quote the bibcode for filenames, folder names and URLs, and otherwise use the normal bibcode.

>>> import ads
>>> papers = list(ads.SearchQuery(bibcode='2011A&A...531L..18H'))
>>> print(papers[0].bibcode)
2011A&A...531L..18H
>>> import urllib.parse
>>> urllib.parse.quote('2011A&A...531L..18H')
'2011A%26A...531L..18H'
>>> urllib.parse.unquote('2011A%26A...531L..18H')
'2011A&A...531L..18H'

Source identifiers

  • We use integer source identifiers. In many cases they are the same as in the TeGeV catalog, but this is not guaranteed.
  • We do not introduce new source “names” (for now). TBD: how should people reference sources from our catalog? Maybe we should use a position-based identifier like TeVCat or the TeGeV catalog?
  • TBD: How do we handle sources that split out into multiple sources with deeper observations?

Source classes

TODO: document properly.

For now, see the list of source classes we’re using at the end of this schema file:

https://github.com/gammapy/gamma-cat/blob/master/input/schemas/basic_source_info.schema.yaml

Positions

Sometimes source positions aren’t measured or given in the paper. This is commonly the case for AGN. In those cases, we use the position from SIMBAD and look it up like this:

>>> from astropy.coordinates import SkyCoord
>>> SkyCoord.from_name('Crab nebula').to_string(precision=7)
'83.6330830 22.0145000'

and store it like this:

position:
  simbad_id: Crab nebula
  ra: 83.6330830
  dec: 22.0145000

The presence of the simbad_id key means that it’s a position from SIMBAD.
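
The convention can be sketched as follows, using plain dictionaries to stand in for the parsed YAML. Both helper names are hypothetical and not part of the gammacat package; the coordinate values are the ones from the example above.

```python
def simbad_position(simbad_id, ra, dec):
    """Build a position entry that records its SIMBAD origin (hypothetical helper)."""
    return {"simbad_id": simbad_id, "ra": ra, "dec": dec}


def is_from_simbad(position):
    """A position counts as a SIMBAD position if the simbad_id key is present."""
    return "simbad_id" in position


crab = simbad_position("Crab nebula", 83.6330830, 22.0145000)
measured = {"ra": 83.6330830, "dec": 22.0145000}  # no simbad_id: taken from a paper
print(is_from_simbad(crab), is_from_simbad(measured))
# True False
```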