Welcome to pyHML’s documentation!¶
Copyright (c) 2017 Be The Match operated by National Marrow Donor Program. All Rights Reserved.
pyHML¶
Python HML parser
- Free software: LGPL 3.0
- Documentation: https://pyhml.readthedocs.io.
- Jupyter Notebook
Features¶
import pyhml
hml_file = "hml_example.xml"
hmlparser = pyhml.HmlParser()
hml = hmlparser.parse(hml_file)
outdir = 'output/directory'
# Print out each subject in fasta format
hml.tobiotype(outdir, dtype='fasta', by='subject')
# Print out the full HML file in IMGT dat file format
hml.tobiotype(outdir, dtype='imgt', by='file')
# Get pandas DF from HML object
pandasdf = hml.toPandas()
print(pandasdf)
ID Locus glstring dbversion \
0 1367-7150-8 HLA-A HLA-A*01:01:01+HLA-A*24:02:01 3.14.0
1 1367-7150-8 HLA-A HLA-A*01:01:01+HLA-A*24:02:01 3.14.0
2 1367-7150-8 HLA-A HLA-A*01:01:01+HLA-A*24:02:01 3.14.0
3 1367-7150-8 HLA-A HLA-A*01:01:01+HLA-A*24:02:01 3.14.0
4 1367-7150-8 HLA-B HLA-B*08:01:01+HLA-B*57:01:01 3.14.0
5 1367-7150-8 HLA-B HLA-B*08:01:01+HLA-B*57:01:01 3.14.0
6 1367-7150-8 HLA-B HLA-B*08:01:01+HLA-B*57:01:01 3.14.0
7 1367-7150-8 HLA-B HLA-B*08:01:01+HLA-B*57:01:01 3.14.0
8 1367-7150-8 HLA-C HLA-C*06:02:01+HLA-C*07:01:01 3.14.0
9 1367-7150-8 HLA-C HLA-C*06:02:01+HLA-C*07:01:01 3.14.0
10 1367-7150-8 HLA-C HLA-C*06:02:01+HLA-C*07:01:01 3.14.0
11 1367-7150-8 HLA-C HLA-C*06:02:01+HLA-C*07:01:01 3.14.0
12 1367-7150-8 HLA-DPB1 HLA-DPB1*02:01:02+HLA-DPB1*04:01:01 3.14.0
13 1367-7150-8 HLA-DPB1 HLA-DPB1*02:01:02+HLA-DPB1*04:01:01 3.14.0
14 1367-7150-8 HLA-DRB1 HLA-DRB1*03:01:01+HLA-DRB1*07:01:01 3.15.0
15 1367-7150-8 HLA-DRB1 HLA-DRB1*03:01:01+HLA-DRB1*07:01:01 3.15.0
sequence
0 TTCCTGGATACTCACGACGCGGACCCAGTTCTCACTCCCATTGGGT...
1 TTCCCGTCAGACCCCCCCAAGACACATATGACCCACCACCCCATCT...
2 TTCCTGGATACTCACGACGCGGACCCAGTTCTCACTCCCATTGGGT...
3 GTGCCTGTGTCCAGGCTGGTGTCTGGGTTCTGTGCTCTCTTCCCCA...
4 CCATGGTGAGTTTCCCTGTACAAGAGTCCAAGGGGAGAGGTAAGTG...
5 GGCCTCTGCGGAGAGGAGCGAGGGGCCCGCCCGGCGAGGGCGCAGG...
6 CCATGGTGAGTTTCCCTGTACAAGAGTCCAAGGGGAGAGGTAAGTG...
7 GGCCTCTGCGGAGAGGAGCGAGGGGCCCGCCCGGCGAGGGCGCAGG...
8 AGGGATCAGGACGAAGTCCCAGGTCCCGGACGGGGCTCTCAGGGTC...
9 CGCATCCCCACTTCCCACTCCCATTGGGTGTCGGATATCTAGAGAA...
10 AGGGATCAGGACGAAGTCCCAGGTCCCGGACGGGGCTCTCAGGGTC...
11 CGCATCCCCACTTCCCACTCCCATTGGGTGTCGGATATCTAGAGAA...
12 CCAATTGGCCAATTGGCCAATTGGCCAATTGGCCAATTGGCCAATT...
13 CCAATTGGCCAATTGGCCAATTGGCCAATTGGCCAATTGGCCAATT...
14 CATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCA...
15 CATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCA...
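The `glstring` column above holds GL Strings such as `HLA-A*01:01:01+HLA-A*24:02:01`. As a minimal sketch, and not part of pyHML itself, the hypothetical helper `split_glstring` below breaks such a string into its gene copies (`+`) and any ambiguous alleles (`/`). It deliberately ignores the other GL String delimiters (`^`, `|`, `~`), so treat it as an illustration only.

```python
def split_glstring(glstring):
    """Split a GL String into gene copies, each a list of candidate alleles.

    Hypothetical helper for illustration; handles only '+' and '/'.
    """
    # '+' separates the gene copies; '/' separates ambiguous alleles.
    return [gene_copy.split("/") for gene_copy in glstring.split("+")]


copies = split_glstring("HLA-A*01:01:01+HLA-A*24:02:01")
print(copies)
```

Each inner list holds the alleles that are possible for one gene copy, so an unambiguous typing yields single-element lists.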
Install¶
pip install pyhml
Credits¶
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
Installation¶
Stable release¶
To install pyHML, run this command in your terminal:
$ pip install pyhml
This is the preferred method to install pyHML, as it will always install the most recent stable release.
If you don’t have pip installed, this Python installation guide can guide you through the process.
From sources¶
The sources for pyHML can be downloaded from the GitHub repo.
You can either clone the public repository:
$ git clone https://github.com/mhalagan-nmdp/pyhml
Or download the tarball:
$ curl -OL https://github.com/mhalagan-nmdp/pyhml/tarball/master
Once you have a copy of the source, you can install it with:
$ python setup.py install
History¶
0.0.5 (2017-04-16)¶
- Improved documentation
- Fixed issues with parsing HML files with NMDP-CORRECTION
0.0.4 (2017-04-15)¶
- Fixed dependency issues.
- Moved tobiotype to HML object.
- Moved toDF to HML object and renamed it to toPandas().
- Added tests and linked to Travis CI.
0.0.3 (2017-04-14)¶
- Added the ability to parse .gz files
- Added the ability to parse HML files with bad tags.
0.0.2 (2017-11-14)¶
- Fixed issues with parsing HML files with missing data
0.0.1 (2017-10-19)¶
- First release on PyPI.
pyhml package¶
pyhml¶
class pyhml.pyhml.HmlParser(hmlversion: str = None, verbose: bool = False)
Bases: object
A Python HML parser that converts any valid HML file into a Python object, so users can interact with HML data as Python objects and easily convert it to a pandas DataFrame. If no hmlversion is provided, the schemas for all HML versions are loaded.
Examples:
>>> import pyhml
>>> hmlparser = pyhml.HmlParser(verbose=True)
>>> hml = hmlparser.parse(hml_file)
>>> hml_df = hml.toPandas()
Parameters:
- hmlversion (str) – A specific HML version to load.
- verbose (bool) – Flag for running in verbose mode.
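A rough sketch of applying the parser to a whole directory might look like the following. `find_hml_files` and `parse_directory` are hypothetical helper names, not part of pyHML. The sketch assumes that `parse()` accepts a path string, as in the examples above, and that `.xml.gz` paths can be passed the same way (the History section notes that `.gz` parsing was added in 0.0.3, but the exact call is not documented here).

```python
from pathlib import Path


def find_hml_files(directory):
    """Return sorted paths of files ending in .xml or .xml.gz (hypothetical helper)."""
    root = Path(directory)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.name.endswith((".xml", ".xml.gz"))
    )


def parse_directory(directory, verbose=False):
    """Parse every HML candidate in a directory; returns {file name: HML object}."""
    import pyhml  # imported lazily so find_hml_files works without pyhml installed

    parser = pyhml.HmlParser(verbose=verbose)
    # Assumption: parse() takes a file path string, as in the documented examples.
    return {path.name: parser.parse(str(path)) for path in find_hml_files(directory)}
```

Reusing a single `HmlParser` instance for all files avoids constructing the parser repeatedly; whether that also avoids reloading schemas is an assumption about the implementation.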
Data Objects¶
HML¶
-
class
pyhml.models.hml.
HML
(project_name: str = None, version: str = None, schema_location: str = None, reporting_center: pyhml.models.reporting_center.ReportingCenter = None, sample: List[pyhml.models.sample.Sample] = None)[source]¶ Bases:
pyhml.models.base_model_.Model
NOTE: This class is auto generated by the swagger code generator program. Do not edit the class manually.
-
classmethod
from_dict
(dikt) → pyhml.models.hml.HML[source]¶ Returns the dict as a model
Parameters: dikt – A dict. Type: dict Returns: The HML of this HML. Return type: HML
-
project_name
¶ Gets the project_name of this HML.
Returns: The project_name of this HML. Return type: str
-
reporting_center
¶ Gets the reporting_center of this HML.
Returns: The reporting_center of this HML. Return type: ReportingCenter
-
schema_location
¶ Gets the schema_location of this HML.
Returns: The schema_location of this HML. Return type: str
-
toPandas
() → pandas.core.frame.DataFrame[source]¶ Returns all the HML data as a pandas DataFrame.
Examples:
>>> import pyhml
>>> hmlparser = pyhml.HmlParser(verbose=True)
>>> hml = hmlparser.parse(hml_file)
>>> hml_df = hml.toPandas()
Returns: Pandas dataframe Return type: DataFrame
-
tobiotype(outdir, dtype='fasta', by='file')
Converts an HML object to a BioPython data format.
Examples:
>>> import pyhml
>>> hmlparser = pyhml.HmlParser(verbose=True)
>>> hml = hmlparser.parse(hml_file)
>>> hml.tobiotype("output/directory", dtype='imgt', by='subject')
Parameters:
- outdir (str) – The output directory.
- dtype (str) – The BioPython output type, e.g. 'fasta' or 'imgt'.
- by (str) – How to split the output, e.g. 'subject' for one file per subject or 'file' for the whole HML file.
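The wrapper below is a hedged sketch of calling `tobiotype` defensively. `export_hml`, `KNOWN_DTYPES`, and `KNOWN_BY` are hypothetical names. The accepted values are limited to those that appear in this documentation, and the real method may accept others. Whether `tobiotype` creates `outdir` itself is not documented, so the sketch creates it first.

```python
import os

# Values that appear in this documentation; the real method may accept more.
KNOWN_DTYPES = ("fasta", "imgt")
KNOWN_BY = ("subject", "file")


def export_hml(hml, outdir, dtype="fasta", by="file"):
    """Check arguments, make sure outdir exists, then call hml.tobiotype()."""
    if dtype not in KNOWN_DTYPES:
        raise ValueError(f"unexpected dtype {dtype!r}; documented values: {KNOWN_DTYPES}")
    if by not in KNOWN_BY:
        raise ValueError(f"unexpected by {by!r}; documented values: {KNOWN_BY}")
    # Assumption: creating the directory up front is harmless if tobiotype also does it.
    os.makedirs(outdir, exist_ok=True)
    hml.tobiotype(outdir, dtype=dtype, by=by)
    return outdir
```

With a parsed `hml` object, `export_hml(hml, "output/directory", dtype="imgt", by="subject")` would mirror the documented example while failing early on a mistyped argument.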
-
version
¶ Gets the version of this HML.
Returns: The version of this HML. Return type: str
-
Sample¶
-
class
pyhml.models.sample.
Sample
(center_code: int = None, id: str = None, collection_method: str = None, typing: List[pyhml.models.typing.Typing] = None)[source]¶ Bases:
pyhml.models.base_model_.Model
Examples:
>>> from pyhml.models.typing import Typing
>>> from pyhml.models.sample import Sample
-
center_code
¶ Gets the center_code of this Sample.
Returns: The center_code of this Sample. Return type: int
-
collection_method
¶ Gets the collection_method of this Sample.
Returns: The collection_method of this Sample. Return type: str
-
classmethod
from_dict
(dikt) → pyhml.models.sample.Sample[source]¶ Returns the dict as a model
Parameters: dikt – A dict. Type: dict Returns: The Sample of this Sample. Return type: Sample
-
id
¶ Gets the id of this Sample.
Returns: The id of this Sample. Return type: str
-
seq_records
¶ Gets the seq_records of this Sample.
Returns: The seq_records of this Sample. Return type: Dict
-
Typing¶
-
class
pyhml.models.typing.
Typing
(date: str = None, gene_family: str = None, allele_assignment: List[pyhml.models.allele_assignment.AlleleAssignment] = None, consensus_sequence: List[pyhml.models.consensus.Consensus] = None, typing_method: pyhml.models.typing_method.TypingMethod = None)[source]¶ Bases:
pyhml.models.base_model_.Model
NOTE: This class is auto generated by the swagger code generator program. Do not edit the class manually.
-
allele_assignment
¶ Gets the allele_assignment of this Typing.
Returns: The allele_assignment of this Typing. Return type: List[AlleleAssignment]
-
consensus_sequence
¶ Gets the consensus_sequence of this Typing.
Returns: The consensus_sequence of this Typing. Return type: List[Consensus]
-
date
¶ Gets the date of this Typing.
Returns: The date of this Typing. Return type: str
-
classmethod
from_dict
(dikt) → pyhml.models.typing.Typing[source]¶ Returns the dict as a model
Parameters: dikt – A dict. Type: dict Returns: The Typing of this Typing. Return type: Typing
-
gene_family
¶ Gets the gene_family of this Typing.
Returns: The gene_family of this Typing. Return type: str
-
seq_records
¶ Gets the seq_records of this Typing.
Returns: The seq_records of this Typing. Return type: List[SeqRecord]
-
typing_method
¶ Gets the typing_method of this Typing.
Returns: The typing_method of this Typing. Return type: TypingMethod
-
Consensus¶
-
class
pyhml.models.consensus.
Consensus
(date: str = None, consensus_sequence_block: List[pyhml.models.consensus_seq_block.ConsensusSeqBlock] = None, reference_database: List[pyhml.models.ref_database.RefDatabase] = None)[source]¶ Bases:
pyhml.models.base_model_.Model
NOTE: This class is auto generated by the swagger code generator program. Do not edit the class manually.
-
consensus_sequence_block
¶ Gets the consensus_sequence_block of this Consensus.
Returns: The consensus_sequence_block of this Consensus. Return type: List[ConsensusSeqBlock]
-
date
¶ Gets the date of this Consensus.
Returns: The date of this Consensus. Return type: str
-
classmethod
from_dict
(dikt) → pyhml.models.consensus.Consensus[source]¶ Returns the dict as a model
Parameters: dikt – A dict. Type: dict Returns: The Consensus of this Consensus. Return type: Consensus
-
reference_database
¶ Gets the reference_database of this Consensus.
Returns: The reference_database of this Consensus. Return type: List[RefDatabase]
-
Consensus Block¶
-
class
pyhml.models.consensus_seq_block.
ConsensusSeqBlock
(continuity: bool = None, description: str = None, end: int = None, expected_copy_number: int = None, phase_set: str = None, reference_sequence_id: str = None, start: int = None, strand: str = None, sequence: Bio.Seq.Seq = None, sequence_quality: List[pyhml.models.seq_quality.SeqQuality] = None, variant: List[pyhml.models.variant.Variant] = None)[source]¶ Bases:
pyhml.models.base_model_.Model
NOTE: This class is auto generated by the swagger code generator program. Do not edit the class manually.
-
continuity
¶ Gets the continuity of this ConsensusSeqBlock.
Returns: The continuity of this ConsensusSeqBlock. Return type: bool
-
description
¶ Gets the description of this ConsensusSeqBlock.
Returns: The description of this ConsensusSeqBlock. Return type: str
-
end
¶ Gets the end of this ConsensusSeqBlock.
Returns: The end of this ConsensusSeqBlock. Return type: int
-
expected_copy_number
¶ Gets the expected_copy_number of this ConsensusSeqBlock.
Returns: The expected_copy_number of this ConsensusSeqBlock. Return type: int
-
classmethod
from_dict
(dikt) → pyhml.models.consensus_seq_block.ConsensusSeqBlock[source]¶ Returns the dict as a model
Parameters: dikt – A dict. Type: dict Returns: The ConsensusSeqBlock of this ConsensusSeqBlock. Return type: ConsensusSeqBlock
-
phase_set
¶ Gets the phase_set of this ConsensusSeqBlock.
Returns: The phase_set of this ConsensusSeqBlock. Return type: str
-
reference_sequence_id
¶ Gets the reference_sequence_id of this ConsensusSeqBlock.
Returns: The reference_sequence_id of this ConsensusSeqBlock. Return type: str
-
sequence
¶ Gets the sequence of this ConsensusSeqBlock.
Returns: The sequence of this ConsensusSeqBlock. Return type: Seq
-
sequence_quality
¶ Gets the sequence_quality of this ConsensusSeqBlock.
Returns: The sequence_quality of this ConsensusSeqBlock. Return type: List[SeqQuality]
-
start
¶ Gets the start of this ConsensusSeqBlock.
Returns: The start of this ConsensusSeqBlock. Return type: int
-
strand
¶ Gets the strand of this ConsensusSeqBlock.
Returns: The strand of this ConsensusSeqBlock. Return type: str
-
variant
¶ Gets the variant of this ConsensusSeqBlock.
Returns: The variant of this ConsensusSeqBlock. Return type: List[Variant]
-
Allele Assignment¶
-
class
pyhml.models.allele_assignment.
AlleleAssignment
(allele_db: str = None, allele_version: str = None, date: str = None, glstring: List[str] = None, haploid: List[pyhml.models.haploid.Haploid] = None)[source]¶ Bases:
pyhml.models.base_model_.Model
NOTE: This class is auto generated by the swagger code generator program. Do not edit the class manually.
-
allele_db
¶ Gets the allele_db of this AlleleAssignment.
Returns: The allele_db of this AlleleAssignment. Return type: str
-
allele_version
¶ Gets the allele_version of this AlleleAssignment.
Returns: The allele_version of this AlleleAssignment. Return type: str
-
date
¶ Gets the date of this AlleleAssignment.
Returns: The date of this AlleleAssignment. Return type: str
-
classmethod
from_dict
(dikt) → pyhml.models.allele_assignment.AlleleAssignment[source]¶ Returns the dict as a model
Parameters: dikt – A dict. Type: dict Returns: The AlleleAssignment of this AlleleAssignment. Return type: AlleleAssignment
-
glstring
¶ Gets the glstring of this AlleleAssignment.
Returns: The glstring of this AlleleAssignment. Return type: List[str]
-
haploid
¶ Gets the haploid of this AlleleAssignment.
Returns: The haploid of this AlleleAssignment. Return type: List[Haploid]
-
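The data objects above nest as HML → Sample → Typing → AlleleAssignment. As a sketch of walking that hierarchy, the hypothetical generator `iter_glstrings` below collects every GL String together with its sample id and gene family. It assumes the constructor arguments `sample`, `typing`, `allele_assignment`, and `glstring` are exposed as attributes of the same name, which the listings above only partly confirm. The stand-in objects and the `gene_family` value are placeholders so the sketch runs without an HML file.

```python
from types import SimpleNamespace


def iter_glstrings(hml):
    """Yield (sample id, gene family, GL String) for every allele assignment."""
    # 'or []' guards against attributes left at their documented default of None.
    for sample in hml.sample or []:
        for typing_obj in sample.typing or []:
            for assignment in typing_obj.allele_assignment or []:
                for glstring in assignment.glstring or []:
                    yield sample.id, typing_obj.gene_family, glstring


# Stand-in objects that mimic the attribute layout (placeholder values).
assignment = SimpleNamespace(glstring=["HLA-A*01:01:01+HLA-A*24:02:01"])
typing_obj = SimpleNamespace(gene_family="HLA", allele_assignment=[assignment])
sample = SimpleNamespace(id="1367-7150-8", typing=[typing_obj])
demo_hml = SimpleNamespace(sample=[sample])

for row in iter_glstrings(demo_hml):
    print(row)
```

With a real parsed object, `list(iter_glstrings(hml))` would give a flat view similar to the ID and glstring columns of `hml.toPandas()`, provided the attribute assumption holds.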
Contributing¶
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
You can contribute in many ways:
Types of Contributions¶
Report Bugs¶
Report bugs at https://github.com/mhalagan-nmdp/pyhml/issues.
If you are reporting a bug, please include:
- Your operating system name and version.
- Any details about your local setup that might be helpful in troubleshooting.
- Detailed steps to reproduce the bug.
Fix Bugs¶
Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.
Implement Features¶
Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.
Write Documentation¶
pyHML could always use more documentation, whether as part of the official pyHML docs, in docstrings, or even on the web in blog posts, articles, and such.
Submit Feedback¶
The best way to send feedback is to file an issue at https://github.com/mhalagan-nmdp/pyhml/issues.
If you are proposing a feature:
- Explain in detail how it would work.
- Keep the scope as narrow as possible, to make it easier to implement.
- Remember that this is a volunteer-driven project, and that contributions are welcome :)
Get Started!¶
Ready to contribute? Here’s how to set up pyhml for local development.
Fork the pyhml repo on GitHub.
Clone your fork locally:
$ git clone git@github.com:your_name_here/pyhml.git
Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:
$ mkvirtualenv pyhml
$ cd pyhml/
$ python setup.py develop
Create a branch for local development:
$ git checkout -b name-of-your-bugfix-or-feature
Now you can make your changes locally.
When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:
$ flake8 pyhml tests
$ python setup.py test  # or: py.test
$ tox
To get flake8 and tox, just pip install them into your virtualenv.
Commit your changes and push your branch to GitHub:
$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature
Submit a pull request through the GitHub website.
Pull Request Guidelines¶
Before you submit a pull request, check that it meets these guidelines:
- The pull request should include tests.
- If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.rst.
- The pull request should work for Python 2.6, 2.7, 3.3, 3.4 and 3.5, and for PyPy. Check https://travis-ci.org/mhalagan-nmdp/pyhml/pull_requests and make sure that the tests pass for all supported Python versions.
Usage¶
To use pyHML in a project:
import pyhml
hmlparser = pyhml.HmlParser()
hml = hmlparser.parse("hml_example.xml")
pandasdf = hml.toPandas()
# Output the HML data as an IPD-IMGT/HLA .dat file for each subject
hml.tobiotype("output/directory", dtype='imgt', by='subject')
# Output the whole HML file as one fasta file
hml.tobiotype("output/directory", dtype='fasta', by='file')
# Defaults to dtype='fasta' and by='file' (per the tobiotype signature)
hml.tobiotype("output/directory")