Welcome to Fred2’s documentation!

Welcome to the class and function documentation of FRED2.

Tutorials on how to use FRED2 can be found at:

https://github.com/FRED-2/Fred2/tree/master/Fred2/tutorials.

Basic

Fred2.Core Module

Core.Allele

class Fred2.Core.Allele.Allele(name, prob=None)

Bases: Fred2.Core.Base.MetadataLogger

This class represents an HLA allele and stores additional information.

get_metadata(label, only_first=False)

Getter for the saved metadata with the key label

Parameters:
  • label (str) – key for the metadata that is inferred
  • only_first (bool) – True if only the first element of the metadata list is to be returned
log_metadata(label, value)

Inserts a new metadata entry

Parameters:
  • label (str) – key for the metadata that will be added
  • value (list(object)) – any kind of additional value that should be kept

Core.Base

https://docs.python.org/3/library/abc.html

class Fred2.Core.Base.ACleavageFragmentPrediction

Bases: object

cleavagePos

Parameter specifying the position of the amino acid (within the prediction window) after which the sequence is cleaved

name

The name of the predictor

predict(aa_seq, **kwargs)

Predicts the probability that the fragment can be produced by the proteasome

Parameters:aa_seq (Peptide) – The sequence to be cleaved
Returns:Returns an AResult object for the specified Bio.Seq
Return type:AResult
supportedLength

The supported lengths of the predictor

version

Parameter specifying the version of the prediction method
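The attributes above belong to an abstract interface (see the abc documentation linked at the top of Core.Base). The following is a minimal sketch of how a concrete fragment predictor might satisfy such an interface, assuming the attributes are declared as abstract properties. FragmentPredictorInterface, ToyFragmentPredictor, its scoring rule and the dict return value are all hypothetical; the real base class returns an AResult.

```python
import abc


class FragmentPredictorInterface(abc.ABC):
    """Hypothetical stand-in for the abstract interface described above."""

    @property
    @abc.abstractmethod
    def name(self):
        """The name of the predictor."""

    @property
    @abc.abstractmethod
    def version(self):
        """The version of the prediction method."""

    @property
    @abc.abstractmethod
    def supportedLength(self):
        """The supported fragment lengths."""

    @property
    @abc.abstractmethod
    def cleavagePos(self):
        """Window position after which the sequence is cleaved."""

    @abc.abstractmethod
    def predict(self, aa_seq, **kwargs):
        """Score how likely the proteasome produces the fragment."""


class ToyFragmentPredictor(FragmentPredictorInterface):
    """Hypothetical predictor: scores a fragment by its C-terminal residue."""

    # Plain class attributes are enough to override the abstract properties.
    name = "toy"
    version = "0.1"
    supportedLength = frozenset([9])
    cleavagePos = 9

    def predict(self, aa_seq, **kwargs):
        if len(aa_seq) not in self.supportedLength:
            raise ValueError("unsupported fragment length: %d" % len(aa_seq))
        # Toy rule only: hydrophobic C-termini get a high score.
        score = 1.0 if aa_seq[-1] in "LIVFMY" else 0.1
        return {aa_seq: score}
```

Because the base class still has abstract members, instantiating it directly raises a TypeError; only subclasses that provide every attribute can be created.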

class Fred2.Core.Base.ACleavageSitePrediction

Bases: object

cleavagePos

Parameter specifying the position of the amino acid (within the prediction window) after which the sequence is cleaved (starting from 1)

name

The name of the predictor

predict(aa_seq, **kwargs)

Predicts the proteasomal cleavage site of the given sequences

Parameters:aa_seq (Peptide or Protein) – The sequence to be cleaved
Returns:Returns an AResult object for the specified Bio.Seq
Return type:AResult
supportedLength

The supported lengths of the predictor

version

Parameter specifying the version of the prediction method
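To make the 1-based cleavagePos concrete: the following sketch slides a window of supportedLength residues along a sequence and reports which bond each window scores. It assumes a window starting at index i scores the cut after its cleavagePos-th residue, that is between seq[i + cleavagePos - 1] and seq[i + cleavagePos]. The helper cleavage_sites is hypothetical and not part of FRED2.

```python
def cleavage_sites(seq, window_length, cleavage_pos):
    """Map each prediction window to the sequence index it scores.

    Hypothetical helper: returns (window, index of the last residue before
    the cut) pairs, using 0-based sequence indices and a 1-based cleavage_pos.
    """
    if not 1 <= cleavage_pos < window_length:
        raise ValueError("cleavage_pos must lie inside the window")
    sites = []
    for i in range(len(seq) - window_length + 1):
        window = seq[i:i + window_length]
        sites.append((window, i + cleavage_pos - 1))
    return sites
```

For example, cleavage_sites("ABCDEFG", 4, 2) pairs the window "ABCD" with index 1, i.e. the bond between B and C.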

class Fred2.Core.Base.AEpitopePrediction

Bases: object

convert_alleles(alleles)

Converts alleles into the internal allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The alleles for which the internal predictor representation is needed
Returns:Returns a list of string representations of the input alleles
Return type:list(str)
name

The name of the predictor

predict(peptides, alleles=None, **kwargs)

Predicts the binding affinity for a given peptide or list of peptides for a given list of alleles. If alleles is not given, predictions are performed for all valid alleles of the predictor. If, however, a list of alleles is given, predictions are performed only for the valid subset of those alleles.

Parameters:
  • peptides (Peptide or list(Peptide)) – The peptide objects for which predictions should be performed
  • alleles (Allele/list(Allele)) – An Allele or list of Allele for which prediction models should be used
Returns:

Returns an AResult object for the specified Peptide and Allele

Return type:

AResult

supportedAlleles

A list of valid allele models

supportedLength

A list of supported peptide lengths

version

The version of the predictor
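The allele rule of predict() can be illustrated with a small sketch that decides which (peptide, allele) pairs a predictor would score. It assumes alleles and peptides are plain strings (FRED2 uses Allele and Peptide objects) and that peptides of unsupported length are skipped; resolve_prediction_jobs is a hypothetical helper.

```python
def resolve_prediction_jobs(peptides, supported_alleles, supported_lengths, alleles=None):
    """Pair peptides with the allele models a predictor would use.

    Hypothetical helper mirroring the documented rule: with alleles=None every
    supported allele is used; otherwise only the supported subset of the
    requested alleles. Peptides of unsupported length are skipped.
    """
    if isinstance(peptides, str):
        peptides = [peptides]
    if alleles is None:
        chosen = list(supported_alleles)
    else:
        if isinstance(alleles, str):
            alleles = [alleles]
        chosen = [a for a in alleles if a in supported_alleles]
    return [(p, a) for p in peptides if len(p) in supported_lengths for a in chosen]
```

With supported alleles ["A*02:01", "B*07:02"] and length 9, requesting ["A*02:01", "C*07:01"] keeps only A*02:01, and the 8-mer SIINFEKL is dropped.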

class Fred2.Core.Base.AExternal

Bases: object

Base class for external tools

command

Defines the command-line call for the external tool

get_external_version(path=None)

Returns the external version of the tool by executing {command} --version

The exact call may depend on the tool and therefore has to be overridden; the method is declared abstract to force subclasses to override it. The base-class implementation can still be called with super()

Parameters:path (str) – Optional path to the executable, if it differs from self.__command
Returns:The external version of the tool or None if tool does not support versioning
Return type:str
is_in_path()

Checks whether the specified execution command can be found in PATH

Returns:Whether or not command could be found in PATH
Return type:bool
parse_external_result(file)

Parses external results and returns the result

Parameters:file (str) – The file path of the external prediction results
Returns:A dictionary containing the prediction results
Return type:dict
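The PATH check and the version call described for AExternal can be sketched with the standard library alone. This is a minimal sketch assuming the tool is a plain executable that prints its version on --version; the free functions below only mirror the documented method names and are not FRED2's implementation.

```python
import shutil
import subprocess
import sys


def is_in_path(command):
    """Return True if `command` resolves to an executable on PATH."""
    return shutil.which(command) is not None


def get_external_version(command, path=None):
    """Run `<path or command> --version` and return its first output line.

    Returns None if the tool cannot be started or exits with an error,
    analogous to the documented behaviour for tools without versioning.
    """
    executable = path or command
    try:
        proc = subprocess.run([executable, "--version"], capture_output=True,
                              text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    if proc.returncode != 0:
        return None
    # Some tools print their version to stderr instead of stdout.
    output = (proc.stdout or proc.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    # Usage: query the running Python interpreter, passed as an explicit path.
    print(get_external_version("python", path=sys.executable))
```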
class Fred2.Core.Base.AHLATyping

Bases: object

name

The name of the predictor

predict(ngsFile, output, **kwargs)

Prediction method for inferring the HLA typing

Parameters:
  • ngsFile (str) – The path to the input file containing the NGS reads
  • output (str) – The path to the output file or directory
Returns:

A list of HLA alleles representing the genotype predicted by the algorithm

Return type:

list(Allele)

version

Parameter specifying the version of the prediction method

class Fred2.Core.Base.APluginRegister(name, bases, nmspc)

Bases: abc.ABCMeta

This class allows automatic registration of new plugins.

mro() → list

return a type’s method resolution order

register(subclass)

Register a virtual subclass of an ABC.
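Automatic plugin registration is typically done with a metaclass whose __init__ runs once per class definition. The following is a minimal sketch of that pattern, assuming plugins are keyed by class name and that only fully implemented (non-abstract) subclasses should be registered; PluginRegister, APredictor and MyPredictor are hypothetical names, not FRED2's actual registry layout.

```python
import abc


class PluginRegister(abc.ABCMeta):
    """Hypothetical metaclass: collects concrete subclasses in a shared registry."""

    def __init__(cls, name, bases, nmspc):
        super().__init__(name, bases, nmspc)
        if not hasattr(cls, "registry"):
            # First class built with this metaclass: create the shared registry.
            cls.registry = {}
        elif not cls.__abstractmethods__:
            # Subclasses with no remaining abstract methods are registered by name.
            cls.registry[name] = cls


class APredictor(metaclass=PluginRegister):
    @property
    @abc.abstractmethod
    def name(self):
        """The name of the predictor."""


class IntermediateBase(APredictor):
    """Still abstract (name not implemented), so it is not registered."""


class MyPredictor(APredictor):
    @property
    def name(self):
        return "mypredictor"
```

Defining MyPredictor is enough to make it discoverable through APredictor.registry; no explicit registration call is needed.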

class Fred2.Core.Base.ASVM

Bases: object

Base class for SVM prediction tools

encode(peptides)

Returns the feature encoding for peptides

Parameters:peptides (list(Peptide)/Peptide) – List of or a single Peptide object
Returns:Feature encoding of the Peptide objects
Return type:list(Object)
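As an illustration of what encode() might return, here is a sparse (one-hot) encoding, a common choice for SVM-based peptide predictors. This is a sketch under that assumption only; the encoding actually used depends on the concrete SVM tool, and one_hot_encode is a hypothetical helper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_encode(peptides):
    """Sparse (one-hot) encoding: 20 binary features per residue position.

    Accepts a single peptide string or a list of them and always returns a
    list of feature vectors (lists of 0/1 ints).
    """
    if isinstance(peptides, str):
        peptides = [peptides]
    encoded = []
    for pep in peptides:
        vec = []
        for residue in pep.upper():
            if residue not in AMINO_ACIDS:
                raise ValueError("non-standard residue: %r" % residue)
            vec.extend(1 if aa == residue else 0 for aa in AMINO_ACIDS)
        encoded.append(vec)
    return encoded
```

A 9-mer is thus mapped to 180 features, exactly 9 of which are 1.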
class Fred2.Core.Base.ATAPPrediction

Bases: object

name

The name of the predictor

predict(peptides, **kwargs)

Predicts the TAP affinity for the given sequences

Parameters:peptides (list(Peptide)/Peptide) – Peptide for which TAP affinity should be predicted
Returns:Returns a TAPResult object
Return type:TAPResult
supportedLength

The supported lengths of the predictor

version

Parameter specifying the version of the prediction method

class Fred2.Core.Base.MetadataLogger

Bases: object

This class provides a simple interface for assigning additional metadata to any object in our data model. Examples: storing ANNOVAR columns like depth, base count, dbSNP ID and quality information for variants, or additional prediction information for peptides. This functionality is not used by core methods of FRED2.

The saved values are accessed via log_metadata() and get_metadata()

get_metadata(label, only_first=False)

Getter for the saved metadata with the key label

Parameters:
  • label (str) – key for the metadata that is inferred
  • only_first (bool) – True if only the first element of the metadata list is to be returned
log_metadata(label, value)

Inserts a new metadata entry

Parameters:
  • label (str) – key for the metadata that will be added
  • value (list(object)) – any kind of additional value that should be kept
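The behaviour of log_metadata() and get_metadata() can be pictured as a dictionary of lists. The following minimal stand-in assumes that each log_metadata() call appends one value under its label and that only_first returns the first logged value (None if there is none); SimpleMetadataLogger is hypothetical and not the FRED2 class.

```python
from collections import defaultdict


class SimpleMetadataLogger:
    """Hypothetical stand-in: every label maps to a list of logged values."""

    def __init__(self):
        self._metadata = defaultdict(list)

    def log_metadata(self, label, value):
        # Values accumulate; logging the same label twice keeps both entries.
        self._metadata[label].append(value)

    def get_metadata(self, label, only_first=False):
        values = self._metadata.get(label, [])
        if only_first:
            return values[0] if values else None
        return list(values)
```

For example, logging a read depth twice under "depth" keeps both values, and get_metadata("depth", only_first=True) returns the first one.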

Core.Generator

Fred2.Core.Generator.generate_peptides_from_proteins(proteins, window_size, peptides=None)

Creates all Peptide objects of a given window size from the given Protein(s).

The result is a generator.

Parameters:
  • proteins (list(Protein) or Protein) – (Iterable of) protein(s) from which a list of unique peptides should be generated
  • window_size (int) – Size of peptide fragments
  • peptides (list(Peptide)) – A list of peptides to update during peptide generation (use case: adding and updating Peptides of newly generated Proteins)
Returns:

A generator of unique peptides

Return type:

Generator(Peptide)
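The sliding-window idea behind generate_peptides_from_proteins can be sketched on plain strings. This sketch assumes proteins are given as a mapping from a transcript ID to a sequence string and records 0-based start positions per protein, matching the convention of Peptide.get_protein_positions; peptides_from_proteins is a hypothetical helper, not the FRED2 function.

```python
from collections import defaultdict


def peptides_from_proteins(proteins, window_size):
    """Yield (peptide, origins) for every unique window of `window_size`.

    `proteins` maps a hypothetical transcript ID to a protein sequence string;
    `origins` maps each transcript ID to the 0-based start positions of the
    peptide in that protein.
    """
    if window_size < 1:
        raise ValueError("window_size must be positive")
    origins = defaultdict(lambda: defaultdict(list))
    for transcript_id, seq in proteins.items():
        for start in range(len(seq) - window_size + 1):
            origins[seq[start:start + window_size]][transcript_id].append(start)
    # Uniqueness requires seeing all windows first, then yielding each peptide once.
    for peptide, per_protein in origins.items():
        yield peptide, {tid: list(pos) for tid, pos in per_protein.items()}
```

A peptide shared by several proteins is yielded once, with all its origins attached.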

Fred2.Core.Generator.generate_peptides_from_variants(vars, length, dbadapter, peptides=None, table='Standard', stop_symbol='*', to_stop=True, cds=False)

Generates Peptide objects from Variant objects and avoids constructing all possible combinations of heterozygous variants by considering only those within the peptide sequence window. This reduces the number of combinations from 2^m, with m = number of heterozygous variants, to 2^k, with k << m and k = number of heterozygous variants within the peptide window (plus all frame-shift mutations that occurred before the current peptide window).

The result is a generator.

Parameters:
  • vars (list(Variant)) – A list of variant objects to construct peptides from
  • length (int) – The length of the peptides to construct
  • dbadapter (ADBAdapter) – An ADBAdapter to extract relevant transcript information
  • peptides (list(Peptide)) – A list of pre-existing peptides that should be updated
  • table (str) – Which codon table to use? This can be either a name (string), an NCBI identifier (integer), or a CodonTable object (useful for non-standard genetic codes). Defaults to the ‘Standard’ table
  • stop_symbol (str) – Single character string, what to use for any terminators, defaults to the asterisk, ‘*’
  • to_stop (bool) – Boolean, defaults to True, meaning translation is terminated at the first in-frame stop codon (and the stop_symbol is not appended to the returned protein sequence). If False, a full translation is done, continuing past any stop codons (translated as the specified stop_symbol)
  • cds (bool) – Boolean, indicates this is a complete CDS. If True, this checks that the sequence starts with a valid alternative start codon (which will be translated as methionine, M), that the sequence length is a multiple of three, and that there is a single in-frame stop codon at the end (this will be excluded from the protein sequence, regardless of the to_stop option). If these tests fail, an exception is raised
Returns:

A generator of unique (polymorphic) peptides

Return type:

Generator(Peptide)

Raises:
  • ValueError – If an incorrect table argument is passed
  • TranslationError – If sequence is not multiple of three, or first codon is not a start codon, or last codon is not a stop codon, or an extra stop codon was found in frame, or codon is non-valid
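The saving from considering only variants inside the peptide window can be checked numerically. This simplified sketch assumes heterozygous variant positions are already given in 0-based protein coordinates and ignores frame-shifts; combinations_per_window is a hypothetical helper that reports 2^k for every window.

```python
def combinations_per_window(het_positions, seq_length, peptide_length):
    """Number of haplotype combinations (2**k) each peptide window must consider.

    Simplified model: het_positions are 0-based protein positions of
    heterozygous variants; frame-shifts are ignored. Window i covers
    positions i .. i + peptide_length - 1.
    """
    counts = []
    for start in range(seq_length - peptide_length + 1):
        end = start + peptide_length - 1
        k = sum(1 for pos in het_positions if start <= pos <= end)
        counts.append(2 ** k)
    return counts
```

For a 60-residue protein with heterozygous variants at positions 2, 10 and 50, no 9-mer window needs more than 4 combinations, whereas combining all three variants globally gives 2^3 = 8; the gap widens exponentially as more variants are added.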
Fred2.Core.Generator.generate_proteins_from_transcripts(transcripts, table='Standard', stop_symbol='*', to_stop=True, cds=False)

Enables the translation from a Transcript to a Protein instance.

The result is a generator.

Parameters:
  • transcripts (list(Transcript) or Transcript) – A list of or a single transcripts to translate
  • table (str) – Which codon table to use? This can be either a name (string), an NCBI identifier (integer), or a CodonTable object (useful for non-standard genetic codes). Defaults to the ‘Standard’ table
  • stop_symbol (str) – Single character string, what to use for any terminators, defaults to the asterisk, ‘*’
  • to_stop (bool) – Boolean, defaults to True, meaning translation is terminated at the first in-frame stop codon (and the stop_symbol is not appended to the returned protein sequence). If False, a full translation is done, continuing past any stop codons (translated as the specified stop_symbol)
  • cds (bool) – Boolean, indicates this is a complete CDS. If True, this checks the sequence starts with a valid alternative start codon (which will be translated as methionine, M), that the sequence length is a multiple of three, and that there is a single in frame stop codon at the end (this will be excluded from the protein sequence, regardless of the to_stop option). If these tests fail, an exception is raised
Returns:

A generator of the proteins corresponding to the transcripts

Return type:

Generator(Protein)

Raises:
  • ValueError – If an incorrect table argument is passed
  • TranslationError – If sequence is not multiple of three, or first codon is not a start codon, or last codon is not a stop codon, or an extra stop codon was found in frame, or codon is non-valid
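The effect of stop_symbol and to_stop can be reproduced with a self-contained translator for the standard genetic code (NCBI table 1). This is a minimal sketch for illustration, not the code path FRED2 uses (which relies on Biopython); unlike Biopython, it rejects ambiguous codons such as "NNN" with a ValueError instead of translating them as "X".

```python
BASES = "TCAG"
# NCBI table 1 ("Standard"): amino acids for codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(nucleotides, stop_symbol="*", to_stop=True):
    """Translate DNA/RNA with the standard code; a trailing partial codon is ignored."""
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if any(base not in BASES for base in codon):
            raise ValueError("invalid codon: %r" % codon)
        index = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
        aa = STANDARD_AAS[index]
        if aa == "*":
            if to_stop:
                break  # stop at the first in-frame stop codon, symbol not appended
            aa = stop_symbol
        protein.append(aa)
    return "".join(protein)
```

With the coding sequence used in the Biopython translate() examples, the default to_stop=True gives "VAIVMGR", while to_stop=False gives "VAIVMGR*KGAR*".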
Fred2.Core.Generator.generate_transcripts_from_tumor_variants(normal, tumor, dbadapter)

Generates all possible Transcript variations of the given Variant.

The result is a generator.

Parameters:
  • normal (list(Variant)) – A list of variants of the normal tissue
  • tumor (list(Variant)) – A list of variants of the cancer tissue for which transcripts should be generated
  • dbadapter (ADBAdapter) – a DBAdapter to fetch the transcript sequences
Returns:

A generator of transcripts with all possible variations determined by the given variant list

Return type:

Generator(Transcript)

Fred2.Core.Generator.generate_transcripts_from_variants(vars, dbadapter)

Generates all possible Transcript objects based on the given Variant objects.

The result is a generator.

Parameters:
  • vars (list(Variant)) – A list of variants for which transcripts should be built
  • dbadapter – A DBAdapter to fetch the transcript sequences
Returns:A generator of transcripts with all possible variations determined by the given variant list
Return type:Generator(Transcript)
Invariant:Variants are considered to be annotated from the forward strand, regardless of the transcript's real orientation
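"All possible variations" here means one sequence per combination of heterozygous variants. This is a minimal sketch of that enumeration, assuming only single-nucleotide substitutions given as hypothetical (position, alt_base, heterozygous) tuples with 0-based forward-strand positions; homozygous variants are always applied. It is not FRED2's implementation, which works on Transcript and Variant objects fetched through the DBAdapter.

```python
from itertools import product


def variant_sequences(reference, variants):
    """Yield every sequence reachable from `reference` via the given SNVs.

    Homozygous variants are always applied; each heterozygous one is either
    applied or not, giving 2**k sequences for k heterozygous variants.
    """
    hom = [v for v in variants if not v[2]]
    het = [v for v in variants if v[2]]
    for choice in product((False, True), repeat=len(het)):
        seq = list(reference)
        applied = hom + [v for v, use in zip(het, choice) if use]
        for pos, alt, _ in applied:
            seq[pos] = alt
        yield "".join(seq)
```

With one homozygous and one heterozygous variant, two sequences are produced; each further heterozygous variant doubles the count.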

Core.Peptide

class Fred2.Core.Peptide.Peptide(seq, protein_pos=None)

Bases: Fred2.Core.Base.MetadataLogger, Bio.Seq.Seq

This class encapsulates a Peptide, which can belong to one or several Protein objects.

Note

For accessing and manipulating the sequence see also Bio.Seq.Seq (from Biopython)

back_transcribe()

Returns the DNA sequence from an RNA sequence. New Seq object.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG",
...                     IUPAC.unambiguous_rna)
>>> messenger_rna
Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', IUPACUnambiguousRNA())
>>> messenger_rna.back_transcribe()
Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', IUPACUnambiguousDNA())

Trying to back-transcribe a protein or DNA sequence raises an exception:

>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.back_transcribe()
Traceback (most recent call last):
   ...
ValueError: Proteins cannot be back transcribed!
complement()

Returns the complement sequence. New Seq object.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_dna = Seq("CCCCCGATAG", IUPAC.unambiguous_dna)
>>> my_dna
Seq('CCCCCGATAG', IUPACUnambiguousDNA())
>>> my_dna.complement()
Seq('GGGGGCTATC', IUPACUnambiguousDNA())

You can of course use mixed case sequences:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("CCCCCgatA-GD", generic_dna)
>>> my_dna
Seq('CCCCCgatA-GD', DNAAlphabet())
>>> my_dna.complement()
Seq('GGGGGctaT-CH', DNAAlphabet())

Note in the above example, ambiguous character D denotes G, A or T so its complement is H (for C, T or A).

Trying to complement a protein sequence raises an exception.

>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.complement()
Traceback (most recent call last):
   ...
ValueError: Proteins do not have complements!
count(sub, start=0, end=9223372036854775807)

Non-overlapping count method, like that of a python string.

This behaves like the python string method of the same name, which does a non-overlapping count!

Returns an integer, the number of occurrences of substring argument sub in the (sub)sequence given by [start:end]. Optional arguments start and end are interpreted as in slice notation.

Arguments:
  • sub - a string or another Seq object to look for
  • start - optional integer, slice start
  • end - optional integer, slice end

e.g.

>>> from Bio.Seq import Seq
>>> my_seq = Seq("AAAATGA")
>>> print(my_seq.count("A"))
5
>>> print(my_seq.count("ATG"))
1
>>> print(my_seq.count(Seq("AT")))
1
>>> print(my_seq.count("AT", 2, -1))
1

HOWEVER, please note that because python strings and Seq objects (and MutableSeq objects) do a non-overlapping search, this may not give the answer you expect:

>>> "AAAA".count("AA")
2
>>> print(Seq("AAAA").count("AA"))
2

An overlapping search would give the answer as three!
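Since count() is non-overlapping, an overlapping count has to be computed separately. A minimal sketch in plain Python that works on str(my_seq); count_overlapping is a hypothetical helper, not part of Biopython or FRED2.

```python
def count_overlapping(text, sub):
    """Count occurrences of `sub` in `text`, allowing matches to overlap."""
    if not sub:
        raise ValueError("empty substring")
    count, start = 0, 0
    while True:
        idx = text.find(sub, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1  # advance by one so the next match may overlap
```

Here count_overlapping("AAAA", "AA") gives 3, compared with 2 from "AAAA".count("AA").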

endswith(suffix, start=0, end=9223372036854775807)

Does the Seq end with the given suffix? Returns True/False.

This behaves like the python string method of the same name.

Return True if the sequence ends with the specified suffix (a string or another Seq object), False otherwise. With optional start, test sequence beginning at that position. With optional end, stop comparing sequence at that position. suffix can also be a tuple of strings to try. e.g.

>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.endswith("UUG")
True
>>> my_rna.endswith("AUG")
False
>>> my_rna.endswith("AUG", 0, 18)
True
>>> my_rna.endswith(("UCC", "UCA", "UUG"))
True
find(sub, start=0, end=9223372036854775807)

Find method, like that of a python string.

This behaves like the python string method of the same name.

Returns an integer, the index of the first occurrence of substring argument sub in the (sub)sequence given by [start:end].

Arguments:
  • sub - a string or another Seq object to look for
  • start - optional integer, slice start
  • end - optional integer, slice end

Returns -1 if the subsequence is NOT found.

e.g. Locating the first typical start codon, AUG, in an RNA sequence:

>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.find("AUG")
3
get_all_proteins()

Returns all Protein objects associated with the Peptide

Returns:A list of Protein
Return type:list(Protein)
get_all_transcripts()

Returns a list of Transcript objects that are associated with the Peptide

Returns:A list of Transcript
Return type:list(Transcript)
get_metadata(label, only_first=False)

Getter for the saved metadata with the key label

Parameters:
  • label (str) – key for the metadata that is inferred
  • only_first (bool) – True if only the first element of the metadata list is to be returned
get_protein(transcript_id)

Returns a specific protein object identified by a unique transcript-ID

Parameters:transcript_id (str) – A Transcript ID
Returns:A Protein
Return type:Protein
get_protein_positions(transcript_id)

Returns all positions of origin for a given Protein identified by its transcript-ID

Parameters:transcript_id (str) – The unique transcript ID of the Protein in question
Returns:A list of positions within the protein from which the Peptide originated (starts at 0)
Return type:list(int)
get_transcript(transcript_id)

Returns a specific Transcript object identified by a unique transcript-ID

Parameters:transcript_id (str) – A Transcript ID
Returns:A Transcript
Return type:Transcript
get_variants_by_protein(transcript_id)

Returns all Variant of a Protein that have influenced the Peptide sequence

Parameters:transcript_id (str) – Transcript ID of the specific protein in question
Returns:A list of variants that influenced the peptide sequence
Return type:list(Variant)
Raises:KeyError – If the Peptide does not originate from the specified Protein
get_variants_by_protein_position(transcript_id, protein_pos)

Returns all Variant objects that influenced the Peptide sequence, together with their positions relative to the peptide, for a given Protein and protein position

Parameters:
  • transcript_id (str) – A Transcript ID of the specific protein in question
  • protein_pos (int) – The Protein position at which the peptide's sequence starts in the protein
Returns:

Dictionary of relative position of variants in peptide (starts at 0) and associated variants that influenced the peptide sequence

Return type:

dict(int,list(Variant))

Raises:
  • ValueError – If the Peptide does not start at the specified position
  • KeyError – If the Peptide does not originate from the specified Protein
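The returned dictionary maps 0-based positions inside the peptide to variants. This simplified sketch shows the coordinate shift, assuming variant positions are already given in 0-based protein coordinates and variants are identified by hypothetical string IDs; the real method derives positions from Transcript and Variant objects and raises the errors listed above.

```python
def variants_by_peptide_position(variant_positions, peptide_start, peptide_length):
    """Map 0-based positions inside a peptide to the variants affecting them.

    `variant_positions` maps a hypothetical variant ID to its 0-based position
    in the protein; the peptide occupies protein positions
    peptide_start .. peptide_start + peptide_length - 1.
    """
    result = {}
    for variant_id, pos in variant_positions.items():
        rel = pos - peptide_start  # shift from protein to peptide coordinates
        if 0 <= rel < peptide_length:
            result.setdefault(rel, []).append(variant_id)
    return result
```

For a 9-mer starting at protein position 10, a variant at protein position 12 is reported at peptide position 2, and variants outside the window are ignored.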
log_metadata(label, value)

Inserts a new metadata entry

Parameters:
  • label (str) – key for the metadata that will be added
  • value (list(object)) – any kind of additional value that should be kept
lower()

Returns a lower case copy of the sequence.

This will adjust the alphabet if required. Note that the IUPAC alphabets are upper case only, and thus a generic alphabet must be substituted.

>>> from Bio.Alphabet import Gapped, generic_dna
>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> my_seq = Seq("CGGTACGCTTATGTCACGTAG*AAAAAA", Gapped(IUPAC.unambiguous_dna, "*"))
>>> my_seq
Seq('CGGTACGCTTATGTCACGTAG*AAAAAA', Gapped(IUPACUnambiguousDNA(), '*'))
>>> my_seq.lower()
Seq('cggtacgcttatgtcacgtag*aaaaaa', Gapped(DNAAlphabet(), '*'))

See also the upper method.

lstrip(chars=None)

Returns a new Seq object with leading (left) end stripped.

This behaves like the python string method of the same name.

Optional argument chars defines which characters to remove. If omitted or None (default) then as for the python string method, this defaults to removing any white space.

e.g. print(my_seq.lstrip("-"))

See also the strip and rstrip methods.

reverse_complement()

Returns the reverse complement sequence. New Seq object.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_dna = Seq("CCCCCGATAGNR", IUPAC.ambiguous_dna)
>>> my_dna
Seq('CCCCCGATAGNR', IUPACAmbiguousDNA())
>>> my_dna.reverse_complement()
Seq('YNCTATCGGGGG', IUPACAmbiguousDNA())

Note in the above example, since R = G or A, its complement is Y (which denotes C or T).

You can of course use mixed case sequences:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("CCCCCgatA-G", generic_dna)
>>> my_dna
Seq('CCCCCgatA-G', DNAAlphabet())
>>> my_dna.reverse_complement()
Seq('C-TatcGGGGG', DNAAlphabet())

Trying to complement a protein sequence raises an exception:

>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.reverse_complement()
Traceback (most recent call last):
   ...
ValueError: Proteins do not have complements!
rfind(sub, start=0, end=9223372036854775807)

Find from right method, like that of a python string.

This behaves like the python string method of the same name.

Returns an integer, the index of the last (right most) occurrence of substring argument sub in the (sub)sequence given by [start:end].

Arguments:
  • sub - a string or another Seq object to look for
  • start - optional integer, slice start
  • end - optional integer, slice end

Returns -1 if the subsequence is NOT found.

e.g. Locating the last typical start codon, AUG, in an RNA sequence:

>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.rfind("AUG")
15
rsplit(sep=None, maxsplit=-1)

Right split method, like that of a python string.

This behaves like the python string method of the same name.

Return a list of the ‘words’ in the string (as Seq objects), using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done COUNTING FROM THE RIGHT. If maxsplit is omitted, all splits are made.

Following the python string method, sep will by default be any white space (tabs, spaces, newlines) but this is unlikely to apply to biological sequences.

e.g. print(my_seq.rsplit("*", 1))

See also the split method.

rstrip(chars=None)

Returns a new Seq object with trailing (right) end stripped.

This behaves like the python string method of the same name.

Optional argument chars defines which characters to remove. If omitted or None (default) then as for the python string method, this defaults to removing any white space.

e.g. Removing a nucleotide sequence’s polyadenylation (poly-A tail):

>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> my_seq = Seq("CGGTACGCTTATGTCACGTAGAAAAAA", IUPAC.unambiguous_dna)
>>> my_seq
Seq('CGGTACGCTTATGTCACGTAGAAAAAA', IUPACUnambiguousDNA())
>>> my_seq.rstrip("A")
Seq('CGGTACGCTTATGTCACGTAG', IUPACUnambiguousDNA())

See also the strip and lstrip methods.

split(sep=None, maxsplit=-1)

Split method, like that of a python string.

This behaves like the python string method of the same name.

Return a list of the ‘words’ in the string (as Seq objects), using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done. If maxsplit is omitted, all splits are made.

Following the python string method, sep will by default be any white space (tabs, spaces, newlines) but this is unlikely to apply to biological sequences.

e.g.

>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_aa = my_rna.translate()
>>> my_aa
Seq('VMAIVMGR*KGAR*L', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> my_aa.split("*")
[Seq('VMAIVMGR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KGAR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*'))]
>>> my_aa.split("*", 1)
[Seq('VMAIVMGR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KGAR*L', HasStopCodon(ExtendedIUPACProtein(), '*'))]

See also the rsplit method:

>>> my_aa.rsplit("*", 1)
[Seq('VMAIVMGR*KGAR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*'))]
startswith(prefix, start=0, end=9223372036854775807)

Does the Seq start with the given prefix? Returns True/False.

This behaves like the python string method of the same name.

Return True if the sequence starts with the specified prefix (a string or another Seq object), False otherwise. With optional start, test sequence beginning at that position. With optional end, stop comparing sequence at that position. prefix can also be a tuple of strings to try. e.g.

>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.startswith("GUC")
True
>>> my_rna.startswith("AUG")
False
>>> my_rna.startswith("AUG", 3)
True
>>> my_rna.startswith(("UCC", "UCA", "UCG"), 1)
True
strip(chars=None)

Returns a new Seq object with leading and trailing ends stripped.

This behaves like the python string method of the same name.

Optional argument chars defines which characters to remove. If omitted or None (default) then as for the python string method, this defaults to removing any white space.

e.g. print(my_seq.strip("-"))

See also the lstrip and rstrip methods.

tomutable()

Returns the full sequence as a MutableSeq object.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq("MKQHKAMIVALIVICITAVVAAL",
...              IUPAC.protein)
>>> my_seq
Seq('MKQHKAMIVALIVICITAVVAAL', IUPACProtein())
>>> my_seq.tomutable()
MutableSeq('MKQHKAMIVALIVICITAVVAAL', IUPACProtein())

Note that the alphabet is preserved.

tostring()

Returns the full sequence as a python string (DEPRECATED).

You are now encouraged to use str(my_seq) instead of my_seq.tostring().

transcribe()

Returns the RNA sequence from a DNA sequence. New Seq object.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
...                  IUPAC.unambiguous_dna)
>>> coding_dna
Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', IUPACUnambiguousDNA())
>>> coding_dna.transcribe()
Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', IUPACUnambiguousRNA())

Trying to transcribe a protein or RNA sequence raises an exception:

>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.transcribe()
Traceback (most recent call last):
   ...
ValueError: Proteins cannot be transcribed!
translate(table='Standard', stop_symbol='*', to_stop=False, cds=False)

Turns a nucleotide sequence into a protein sequence. New Seq object.

This method will translate DNA or RNA sequences, and those with a nucleotide or generic alphabet. Trying to translate a protein sequence raises an exception.

Arguments:
  • table - Which codon table to use? This can be either a name (string), an NCBI identifier (integer), or a CodonTable object (useful for non-standard genetic codes). This defaults to the “Standard” table.
  • stop_symbol - Single character string, what to use for terminators. This defaults to the asterisk, “*”.
  • to_stop - Boolean, defaults to False meaning do a full translation continuing on past any stop codons (translated as the specified stop_symbol). If True, translation is terminated at the first in frame stop codon (and the stop_symbol is not appended to the returned protein sequence).
  • cds - Boolean, indicates this is a complete CDS. If True, this checks the sequence starts with a valid alternative start codon (which will be translated as methionine, M), that the sequence length is a multiple of three, and that there is a single in frame stop codon at the end (this will be excluded from the protein sequence, regardless of the to_stop option). If these tests fail, an exception is raised.

e.g. Using the standard table:

>>> coding_dna = Seq("GTGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
>>> coding_dna.translate()
Seq('VAIVMGR*KGAR*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> coding_dna.translate(stop_symbol="@")
Seq('VAIVMGR@KGAR@', HasStopCodon(ExtendedIUPACProtein(), '@'))
>>> coding_dna.translate(to_stop=True)
Seq('VAIVMGR', ExtendedIUPACProtein())

Now using NCBI table 2, where TGA is not a stop codon:

>>> coding_dna.translate(table=2)
Seq('VAIVMGRWKGAR*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> coding_dna.translate(table=2, to_stop=True)
Seq('VAIVMGRWKGAR', ExtendedIUPACProtein())

In fact, GTG is an alternative start codon under NCBI table 2, meaning this sequence could be a complete CDS:

>>> coding_dna.translate(table=2, cds=True)
Seq('MAIVMGRWKGAR', ExtendedIUPACProtein())

It isn’t a valid CDS under NCBI table 1, due to both the start codon and the in-frame stop codons:

>>> coding_dna.translate(table=1, cds=True)
Traceback (most recent call last):
    ...
TranslationError: First codon 'GTG' is not a start codon

If the sequence has no in-frame stop codon, then the to_stop argument has no effect:

>>> coding_dna2 = Seq("TTGGCCATTGTAATGGGCCGC")
>>> coding_dna2.translate()
Seq('LAIVMGR', ExtendedIUPACProtein())
>>> coding_dna2.translate(to_stop=True)
Seq('LAIVMGR', ExtendedIUPACProtein())

NOTE - Ambiguous codons like “TAN” or “NNN” could be an amino acid or a stop codon. These are translated as “X”. Any invalid codon (e.g. “TA?” or “T-A”) will throw a TranslationError.

NOTE - Does NOT support gapped sequences.

NOTE - This does NOT behave like the python string’s translate method. For that use str(my_seq).translate(...) instead.

ungap(gap=None)

Return a copy of the sequence without the gap character(s).

The gap character can be specified in two ways - either as an explicit argument, or via the sequence’s alphabet. For example:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("-ATA--TGAAAT-TTGAAAA", generic_dna)
>>> my_dna
Seq('-ATA--TGAAAT-TTGAAAA', DNAAlphabet())
>>> my_dna.ungap("-")
Seq('ATATGAAATTTGAAAA', DNAAlphabet())

If the gap character is not given as an argument, it will be taken from the sequence’s alphabet (if defined). Notice that the returned sequence’s alphabet is adjusted since it no longer requires a gapped alphabet:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC, Gapped, HasStopCodon
>>> my_pro = Seq("MVVLE=AD*", HasStopCodon(Gapped(IUPAC.protein, "=")))
>>> my_pro
Seq('MVVLE=AD*', HasStopCodon(Gapped(IUPACProtein(), '='), '*'))
>>> my_pro.ungap()
Seq('MVVLEAD*', HasStopCodon(IUPACProtein(), '*'))

Or, with a simpler gapped DNA example:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC, Gapped
>>> my_seq = Seq("CGGGTAG=AAAAAA", Gapped(IUPAC.unambiguous_dna, "="))
>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap()
Seq('CGGGTAGAAAAAA', IUPACUnambiguousDNA())

As long as it is consistent with the alphabet, although it is redundant, you can still supply the gap character as an argument to this method:

>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap("=")
Seq('CGGGTAGAAAAAA', IUPACUnambiguousDNA())

However, if the gap character given as the argument disagrees with that declared in the alphabet, an exception is raised:

>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap("-")
Traceback (most recent call last):
   ...
ValueError: Gap '-' does not match '=' from alphabet

Finally, if a gap character is not supplied, and the alphabet does not define one, an exception is raised:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("ATA--TGAAAT-TTGAAAA", generic_dna)
>>> my_dna
Seq('ATA--TGAAAT-TTGAAAA', DNAAlphabet())
>>> my_dna.ungap()
Traceback (most recent call last):
   ...
ValueError: Gap character not given and not defined in alphabet
upper()

Returns an upper case copy of the sequence.

>>> from Bio.Alphabet import HasStopCodon, generic_protein
>>> from Bio.Seq import Seq
>>> my_seq = Seq("VHLTPeeK*", HasStopCodon(generic_protein))
>>> my_seq
Seq('VHLTPeeK*', HasStopCodon(ProteinAlphabet(), '*'))
>>> my_seq.lower()
Seq('vhltpeek*', HasStopCodon(ProteinAlphabet(), '*'))
>>> my_seq.upper()
Seq('VHLTPEEK*', HasStopCodon(ProteinAlphabet(), '*'))

This will adjust the alphabet if required. See also the lower method.

Core.Protein

class Fred2.Core.Protein.Protein(_seq, gene_id='unknown', transcript_id=None, orig_transcript=None, vars=None)

Bases: Fred2.Core.Base.MetadataLogger, Bio.Seq.Seq

Protein corresponding to exactly one transcript.

Note

For accessing and manipulating the sequence see also Bio.Seq.Seq (from Biopython)

back_transcribe()

Returns the DNA sequence from an RNA sequence. New Seq object.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG",
...                     IUPAC.unambiguous_rna)
>>> messenger_rna
Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', IUPACUnambiguousRNA())
>>> messenger_rna.back_transcribe()
Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', IUPACUnambiguousDNA())

Trying to back-transcribe a protein or DNA sequence raises an exception:

>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.back_transcribe()
Traceback (most recent call last):
   ...
ValueError: Proteins cannot be back transcribed!
complement()

Returns the complement sequence. New Seq object.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_dna = Seq("CCCCCGATAG", IUPAC.unambiguous_dna)
>>> my_dna
Seq('CCCCCGATAG', IUPACUnambiguousDNA())
>>> my_dna.complement()
Seq('GGGGGCTATC', IUPACUnambiguousDNA())

You can of course use mixed case sequences:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("CCCCCgatA-GD", generic_dna)
>>> my_dna
Seq('CCCCCgatA-GD', DNAAlphabet())
>>> my_dna.complement()
Seq('GGGGGctaT-CH', DNAAlphabet())

Note in the above example, ambiguous character D denotes G, A or T so its complement is H (for C, T or A).

Trying to complement a protein sequence raises an exception.

>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.complement()
Traceback (most recent call last):
   ...
ValueError: Proteins do not have complements!
count(sub, start=0, end=9223372036854775807)

Non-overlapping count method, like that of a python string.

This behaves like the python string method of the same name, which does a non-overlapping count!

Returns an integer, the number of occurrences of substring argument sub in the (sub)sequence given by [start:end]. Optional arguments start and end are interpreted as in slice notation.

Arguments:
  • sub - a string or another Seq object to look for
  • start - optional integer, slice start
  • end - optional integer, slice end

e.g.

>>> from Bio.Seq import Seq
>>> my_seq = Seq("AAAATGA")
>>> print(my_seq.count("A"))
5
>>> print(my_seq.count("ATG"))
1
>>> print(my_seq.count(Seq("AT")))
1
>>> print(my_seq.count("AT", 2, -1))
1

HOWEVER, please note that because Python strings and Seq objects (and MutableSeq objects) do a non-overlapping search, this may not give the answer you expect:

>>> "AAAA".count("AA")
2
>>> print(Seq("AAAA").count("AA"))
2

An overlapping search would give the answer as three!
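If overlapping matches are what you need, one option is to search manually and restart one position after each hit. A minimal sketch on a plain string (pass str(my_seq) for a Seq object); the helper name overlapping_count is hypothetical and not part of Biopython:

```python
def overlapping_count(seq, sub):
    """Count occurrences of sub in seq, allowing matches to overlap."""
    if not sub:
        raise ValueError("sub must be non-empty")
    count = 0
    start = seq.find(sub)
    while start != -1:
        count += 1
        # Advance by a single position so the next match may overlap this one.
        start = seq.find(sub, start + 1)
    return count


print("AAAA".count("AA"))               # 2 (non-overlapping, like Seq.count)
print(overlapping_count("AAAA", "AA"))  # 3
```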

endswith(suffix, start=0, end=9223372036854775807)

Does the Seq end with the given suffix? Returns True/False.

This behaves like the python string method of the same name.

Return True if the sequence ends with the specified suffix (a string or another Seq object), False otherwise. With optional start, test sequence beginning at that position. With optional end, stop comparing sequence at that position. suffix can also be a tuple of strings to try. e.g.

>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.endswith("UUG")
True
>>> my_rna.endswith("AUG")
False
>>> my_rna.endswith("AUG", 0, 18)
True
>>> my_rna.endswith(("UCC", "UCA", "UUG"))
True
find(sub, start=0, end=9223372036854775807)

Find method, like that of a python string.

This behaves like the python string method of the same name.

Returns an integer, the index of the first occurrence of substring argument sub in the (sub)sequence given by [start:end].

Arguments:
  • sub - a string or another Seq object to look for
  • start - optional integer, slice start
  • end - optional integer, slice end

Returns -1 if the subsequence is NOT found.

e.g. Locating the first typical start codon, AUG, in an RNA sequence:

>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.find("AUG")
3
get_metadata(label, only_first=False)

Getter for the saved metadata with the key label

Parameters:
  • label (str) – key of the metadata to retrieve
  • only_first (bool) – True if only the first element of the metadata list is to be returned
log_metadata(label, value)

Inserts a new metadata entry

Parameters:
  • label (str) – key for the metadata that will be added
  • value (list(object)) – any kind of additional value that should be kept
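The two metadata methods work as a pair. The sketch below is a hypothetical stand-in (SimpleMetadataLogger is not a Fred2 class). It assumes, as the only_first description suggests, that each log_metadata call appends one value to a per-label list; Fred2's actual storage may differ.

```python
from collections import defaultdict


class SimpleMetadataLogger(object):
    """Hypothetical stand-in for Fred2.Core.Base.MetadataLogger (assumed behaviour)."""

    def __init__(self):
        # Each label maps to the list of values logged under it.
        self._metadata = defaultdict(list)

    def log_metadata(self, label, value):
        self._metadata[label].append(value)

    def get_metadata(self, label, only_first=False):
        values = self._metadata.get(label, [])
        if only_first:
            # Assumption: an unknown label yields None when only_first is set.
            return values[0] if values else None
        return list(values)


logger = SimpleMetadataLogger()
logger.log_metadata("source", "UniProt")
logger.log_metadata("source", "Ensembl")
print(logger.get_metadata("source"))                   # ['UniProt', 'Ensembl']
print(logger.get_metadata("source", only_first=True))  # UniProt
```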
lower()

Returns a lower case copy of the sequence.

This will adjust the alphabet if required. Note that the IUPAC alphabets are upper case only, and thus a generic alphabet must be substituted.

>>> from Bio.Alphabet import Gapped, generic_dna
>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> my_seq = Seq("CGGTACGCTTATGTCACGTAG*AAAAAA", Gapped(IUPAC.unambiguous_dna, "*"))
>>> my_seq
Seq('CGGTACGCTTATGTCACGTAG*AAAAAA', Gapped(IUPACUnambiguousDNA(), '*'))
>>> my_seq.lower()
Seq('cggtacgcttatgtcacgtag*aaaaaa', Gapped(DNAAlphabet(), '*'))

See also the upper method.

lstrip(chars=None)

Returns a new Seq object with leading (left) end stripped.

This behaves like the python string method of the same name.

Optional argument chars defines which characters to remove. If omitted or None (default) then as for the python string method, this defaults to removing any white space.

e.g. print(my_seq.lstrip("-"))

See also the strip and rstrip methods.

newid = <method-wrapper 'next' of itertools.count object>
reverse_complement()

Returns the reverse complement sequence. New Seq object.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_dna = Seq("CCCCCGATAGNR", IUPAC.ambiguous_dna)
>>> my_dna
Seq('CCCCCGATAGNR', IUPACAmbiguousDNA())
>>> my_dna.reverse_complement()
Seq('YNCTATCGGGGG', IUPACAmbiguousDNA())

Note in the above example, since R = G or A, its complement is Y (which denotes C or T).

You can of course use mixed case sequences:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("CCCCCgatA-G", generic_dna)
>>> my_dna
Seq('CCCCCgatA-G', DNAAlphabet())
>>> my_dna.reverse_complement()
Seq('C-TatcGGGGG', DNAAlphabet())

Trying to complement a protein sequence raises an exception:

>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.reverse_complement()
Traceback (most recent call last):
   ...
ValueError: Proteins do not have complements!
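The complement rules used above, including IUPAC ambiguity codes and case preservation, can be written as a translation table on plain strings. This sketch covers DNA only and passes any other character (such as the gap "-") through unchanged. Unlike Seq, it does not reject protein input. The helper names are hypothetical.

```python
# IUPAC nucleotide codes (ambiguity codes included) and their complements.
_FORWARD = "ACGTMRWSYKVHDBXN"
_REVERSE = "TGCAKYWSRMBDHVXN"
_COMPLEMENT = str.maketrans(_FORWARD + _FORWARD.lower(),
                            _REVERSE + _REVERSE.lower())


def complement(dna):
    """Complement a DNA string, preserving case; unmapped characters are kept."""
    return dna.translate(_COMPLEMENT)


def reverse_complement(dna):
    """Complement, then reverse, so the result reads 5' to 3' on the other strand."""
    return complement(dna)[::-1]


print(complement("CCCCCgatA-GD"))          # GGGGGctaT-CH
print(reverse_complement("CCCCCGATAGNR"))  # YNCTATCGGGGG
```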
rfind(sub, start=0, end=9223372036854775807)

Find from right method, like that of a python string.

This behaves like the python string method of the same name.

Returns an integer, the index of the last (right most) occurrence of substring argument sub in the (sub)sequence given by [start:end].

Arguments:
  • sub - a string or another Seq object to look for
  • start - optional integer, slice start
  • end - optional integer, slice end

Returns -1 if the subsequence is NOT found.

e.g. Locating the last typical start codon, AUG, in an RNA sequence:

>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.rfind("AUG")
15
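Because find and rfind follow the str methods, the indices in the examples above can be reproduced on a plain string. To list every occurrence rather than only the first or last, one approach is to call find repeatedly with an advancing start position:

```python
rna = "GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG"

first = rna.find("AUG")       # same result as Seq.find
last = rna.rfind("AUG")       # same result as Seq.rfind
missing = rna.find("AAAAAA")  # -1 when the subsequence is absent

# Collect every start position of the codon (matches may overlap).
positions = []
pos = rna.find("AUG")
while pos != -1:
    positions.append(pos)
    pos = rna.find("AUG", pos + 1)

print(first, last, missing, positions)  # 3 15 -1 [3, 15]
```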
rsplit(sep=None, maxsplit=-1)

Right split method, like that of a python string.

This behaves like the python string method of the same name.

Return a list of the ‘words’ in the string (as Seq objects), using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done COUNTING FROM THE RIGHT. If maxsplit is omitted, all splits are made.

Following the python string method, sep will by default be any white space (tabs, spaces, newlines) but this is unlikely to apply to biological sequences.

e.g. print(my_seq.rsplit("*", 1))

See also the split method.

rstrip(chars=None)

Returns a new Seq object with trailing (right) end stripped.

This behaves like the python string method of the same name.

Optional argument chars defines which characters to remove. If omitted or None (default) then as for the python string method, this defaults to removing any white space.

e.g. Removing a nucleotide sequence’s polyadenylation (poly-A tail):

>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> my_seq = Seq("CGGTACGCTTATGTCACGTAGAAAAAA", IUPAC.unambiguous_dna)
>>> my_seq
Seq('CGGTACGCTTATGTCACGTAGAAAAAA', IUPACUnambiguousDNA())
>>> my_seq.rstrip("A")
Seq('CGGTACGCTTATGTCACGTAG', IUPACUnambiguousDNA())

See also the strip and lstrip methods.

split(sep=None, maxsplit=-1)

Split method, like that of a python string.

This behaves like the python string method of the same name.

Return a list of the ‘words’ in the string (as Seq objects), using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done. If maxsplit is omitted, all splits are made.

Following the python string method, sep will by default be any white space (tabs, spaces, newlines) but this is unlikely to apply to biological sequences.

e.g.

>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_aa = my_rna.translate()
>>> my_aa
Seq('VMAIVMGR*KGAR*L', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> my_aa.split("*")
[Seq('VMAIVMGR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KGAR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*'))]
>>> my_aa.split("*", 1)
[Seq('VMAIVMGR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KGAR*L', HasStopCodon(ExtendedIUPACProtein(), '*'))]

See also the rsplit method:

>>> my_aa.rsplit("*", 1)
[Seq('VMAIVMGR*KGAR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*'))]
startswith(prefix, start=0, end=9223372036854775807)

Does the Seq start with the given prefix? Returns True/False.

This behaves like the python string method of the same name.

Return True if the sequence starts with the specified prefix (a string or another Seq object), False otherwise. With optional start, test sequence beginning at that position. With optional end, stop comparing sequence at that position. prefix can also be a tuple of strings to try. e.g.

>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.startswith("GUC")
True
>>> my_rna.startswith("AUG")
False
>>> my_rna.startswith("AUG", 3)
True
>>> my_rna.startswith(("UCC", "UCA", "UCG"), 1)
True
strip(chars=None)

Returns a new Seq object with leading and trailing ends stripped.

This behaves like the python string method of the same name.

Optional argument chars defines which characters to remove. If omitted or None (default) then as for the python string method, this defaults to removing any white space.

e.g. print(my_seq.strip("-"))

See also the lstrip and rstrip methods.

tomutable()

Returns the full sequence as a MutableSeq object.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq("MKQHKAMIVALIVICITAVVAAL",
...              IUPAC.protein)
>>> my_seq
Seq('MKQHKAMIVALIVICITAVVAAL', IUPACProtein())
>>> my_seq.tomutable()
MutableSeq('MKQHKAMIVALIVICITAVVAAL', IUPACProtein())

Note that the alphabet is preserved.

tostring()

Returns the full sequence as a python string (DEPRECATED).

You are now encouraged to use str(my_seq) instead of my_seq.tostring().

transcribe()

Returns the RNA sequence from a DNA sequence. New Seq object.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
...                  IUPAC.unambiguous_dna)
>>> coding_dna
Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', IUPACUnambiguousDNA())
>>> coding_dna.transcribe()
Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', IUPACUnambiguousRNA())

Trying to transcribe a protein or RNA sequence raises an exception:

>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.transcribe()
Traceback (most recent call last):
   ...
ValueError: Proteins cannot be transcribed!
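For a coding-strand DNA sequence, transcription amounts to replacing thymine with uracil, and back-transcription reverses that substitution. A plain-string sketch of the relationship; the helpers are hypothetical and, unlike Seq, perform no alphabet checks, so they will not reject proteins:

```python
def transcribe(dna):
    """Coding-strand DNA -> mRNA: replace T with U, keeping case."""
    return dna.replace("T", "U").replace("t", "u")


def back_transcribe(rna):
    """mRNA -> coding-strand DNA: replace U with T, keeping case."""
    return rna.replace("U", "T").replace("u", "t")


coding_dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
mrna = transcribe(coding_dna)
print(mrna)                                 # AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG
print(back_transcribe(mrna) == coding_dna)  # True
```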
translate(table='Standard', stop_symbol='*', to_stop=False, cds=False)

Turns a nucleotide sequence into a protein sequence. New Seq object.

This method will translate DNA or RNA sequences, and those with a nucleotide or generic alphabet. Trying to translate a protein sequence raises an exception.

Arguments:
  • table - Which codon table to use? This can be either a name (string), an NCBI identifier (integer), or a CodonTable object (useful for non-standard genetic codes). This defaults to the “Standard” table.
  • stop_symbol - Single character string, what to use for terminators. This defaults to the asterisk, “*”.
  • to_stop - Boolean, defaults to False meaning do a full translation continuing on past any stop codons (translated as the specified stop_symbol). If True, translation is terminated at the first in frame stop codon (and the stop_symbol is not appended to the returned protein sequence).
  • cds - Boolean, indicates this is a complete CDS. If True, this checks the sequence starts with a valid alternative start codon (which will be translated as methionine, M), that the sequence length is a multiple of three, and that there is a single in frame stop codon at the end (this will be excluded from the protein sequence, regardless of the to_stop option). If these tests fail, an exception is raised.

e.g. Using the standard table:

>>> coding_dna = Seq("GTGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
>>> coding_dna.translate()
Seq('VAIVMGR*KGAR*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> coding_dna.translate(stop_symbol="@")
Seq('VAIVMGR@KGAR@', HasStopCodon(ExtendedIUPACProtein(), '@'))
>>> coding_dna.translate(to_stop=True)
Seq('VAIVMGR', ExtendedIUPACProtein())

Now using NCBI table 2, where TGA is not a stop codon:

>>> coding_dna.translate(table=2)
Seq('VAIVMGRWKGAR*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> coding_dna.translate(table=2, to_stop=True)
Seq('VAIVMGRWKGAR', ExtendedIUPACProtein())

In fact, GTG is an alternative start codon under NCBI table 2, meaning this sequence could be a complete CDS:

>>> coding_dna.translate(table=2, cds=True)
Seq('MAIVMGRWKGAR', ExtendedIUPACProtein())

It isn’t a valid CDS under NCBI table 1, due to both the start codon and also the in frame stop codons:

>>> coding_dna.translate(table=1, cds=True)
Traceback (most recent call last):
    ...
TranslationError: First codon 'GTG' is not a start codon

If the sequence has no in-frame stop codon, then the to_stop argument has no effect:

>>> coding_dna2 = Seq("TTGGCCATTGTAATGGGCCGC")
>>> coding_dna2.translate()
Seq('LAIVMGR', ExtendedIUPACProtein())
>>> coding_dna2.translate(to_stop=True)
Seq('LAIVMGR', ExtendedIUPACProtein())

NOTE - Ambiguous codons like “TAN” or “NNN” could be an amino acid or a stop codon. These are translated as “X”. Any invalid codon (e.g. “TA?” or “T-A”) will throw a TranslationError.

NOTE - Does NOT support gapped sequences.

NOTE - This does NOT behave like the python string’s translate method. For that use str(my_seq).translate(...) instead.
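The core of translation is a codon lookup. The sketch below builds NCBI table 1 (the "Standard" table) from its compact 64-letter form in TCAG codon order and mimics the stop_symbol and to_stop arguments. It is a simplified illustration, not Biopython's code: it supports only the standard table, skips the cds checks, and raises KeyError on ambiguous or invalid codons instead of returning "X" or raising TranslationError.

```python
from itertools import product

# NCBI table 1 ("Standard"): amino acids for codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(seq, stop_symbol="*", to_stop=False):
    """Translate an unambiguous DNA or RNA string using the standard table only."""
    seq = seq.upper().replace("U", "T")
    protein = []
    # Only complete codons are translated; a trailing partial codon is ignored.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = STANDARD_TABLE[seq[i:i + 3]]  # KeyError for ambiguous/invalid codons
        if aa == "*":
            if to_stop:
                break
            aa = stop_symbol
        protein.append(aa)
    return "".join(protein)


coding_dna = "GTGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(translate(coding_dna))                   # VAIVMGR*KGAR*
print(translate(coding_dna, stop_symbol="@"))  # VAIVMGR@KGAR@
print(translate(coding_dna, to_stop=True))     # VAIVMGR
```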

ungap(gap=None)

Return a copy of the sequence without the gap character(s).

The gap character can be specified in two ways - either as an explicit argument, or via the sequence’s alphabet. For example:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("-ATA--TGAAAT-TTGAAAA", generic_dna)
>>> my_dna
Seq('-ATA--TGAAAT-TTGAAAA', DNAAlphabet())
>>> my_dna.ungap("-")
Seq('ATATGAAATTTGAAAA', DNAAlphabet())

If the gap character is not given as an argument, it will be taken from the sequence’s alphabet (if defined). Notice that the returned sequence’s alphabet is adjusted since it no longer requires a gapped alphabet:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC, Gapped, HasStopCodon
>>> my_pro = Seq("MVVLE=AD*", HasStopCodon(Gapped(IUPAC.protein, "=")))
>>> my_pro
Seq('MVVLE=AD*', HasStopCodon(Gapped(IUPACProtein(), '='), '*'))
>>> my_pro.ungap()
Seq('MVVLEAD*', HasStopCodon(IUPACProtein(), '*'))

Or, with a simpler gapped DNA example:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC, Gapped
>>> my_seq = Seq("CGGGTAG=AAAAAA", Gapped(IUPAC.unambiguous_dna, "="))
>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap()
Seq('CGGGTAGAAAAAA', IUPACUnambiguousDNA())

As long as it is consistent with the alphabet, although it is redundant, you can still supply the gap character as an argument to this method:

>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap("=")
Seq('CGGGTAGAAAAAA', IUPACUnambiguousDNA())

However, if the gap character given as the argument disagrees with that declared in the alphabet, an exception is raised:

>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap("-")
Traceback (most recent call last):
   ...
ValueError: Gap '-' does not match '=' from alphabet

Finally, if a gap character is not supplied, and the alphabet does not define one, an exception is raised:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("ATA--TGAAAT-TTGAAAA", generic_dna)
>>> my_dna
Seq('ATA--TGAAAT-TTGAAAA', DNAAlphabet())
>>> my_dna.ungap()
Traceback (most recent call last):
   ...
ValueError: Gap character not given and not defined in alphabet
upper()

Returns an upper case copy of the sequence.

>>> from Bio.Alphabet import HasStopCodon, generic_protein
>>> from Bio.Seq import Seq
>>> my_seq = Seq("VHLTPeeK*", HasStopCodon(generic_protein))
>>> my_seq
Seq('VHLTPeeK*', HasStopCodon(ProteinAlphabet(), '*'))
>>> my_seq.lower()
Seq('vhltpeek*', HasStopCodon(ProteinAlphabet(), '*'))
>>> my_seq.upper()
Seq('VHLTPEEK*', HasStopCodon(ProteinAlphabet(), '*'))

This will adjust the alphabet if required. See also the lower method.

Core.Result

class Fred2.Core.Result.AResult(data=None, index=None, columns=None, dtype=None, copy=False)

Bases: pandas.core.frame.DataFrame

An AResult object is a pandas.DataFrame with multi-indexing.

This class is used as an interface and can be extended with custom short-cuts for the sometimes tedious calls in pandas.

T

Transpose index and columns

abs()

Return an object with absolute value taken. Only applicable to objects that are all numeric

abs: type of caller

add(other, axis='columns', level=None, fill_value=None)

Binary operator add with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
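The fill_value rule is easiest to see on a small example. The following plain-Python model is not pandas code; the helper add_with_fill is hypothetical, dict keys play the role of index labels, and None marks a missing value:

```python
def add_with_fill(left, right, fill_value=None):
    """Model of DataFrame.add(other, fill_value=...) on dict-based columns."""
    result = {}
    for key in sorted(set(left) | set(right)):  # mismatched indices are unioned
        a = left.get(key)
        b = right.get(key)
        if a is None and b is None:
            result[key] = None  # missing on both sides: the result stays missing
            continue
        if a is None:
            a = fill_value
        if b is None:
            b = fill_value
        result[key] = None if a is None or b is None else a + b
    return result


left = {"x": 1.0, "y": None, "z": 3.0}
right = {"x": 10.0, "y": 20.0, "w": None}
print(add_with_fill(left, right))                  # {'w': None, 'x': 11.0, 'y': None, 'z': None}
print(add_with_fill(left, right, fill_value=0.0))  # {'w': None, 'x': 11.0, 'y': 20.0, 'z': 3.0}
```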

add_prefix(prefix)

Concatenate prefix string with panel items names.

prefix : string

with_prefix : type of caller

add_suffix(suffix)

Concatenate suffix string with panel items names

suffix : string

with_suffix : type of caller

align(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0)

Align two objects on their axes with the specified join method for each axis Index

other : DataFrame or Series
join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
axis : allowed axis of the other object, default None
Align on index (0), columns (1), or both (None)
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level
copy : boolean, default True
Always returns new objects. If copy=False and no reindexing is required then original objects are returned.
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any “compatible” value

method : str, default None
limit : int, default None
fill_axis : {0, 1}, default 0
Filling axis, method and limit
(left, right) : (type of input, type of other)
Aligned objects
all(axis=None, bool_only=None, skipna=True, level=None, **kwargs)

Return whether all elements are True over requested axis.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
bool_only : boolean, default None
Only include boolean data.

all : Series (or DataFrame if level specified)

any(axis=None, bool_only=None, skipna=True, level=None, **kwargs)

Return whether any element is True over requested axis.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
bool_only : boolean, default None
Only include boolean data.

any : Series (or DataFrame if level specified)

append(other, ignore_index=False, verify_integrity=False)

Append rows of other to the end of this frame, returning a new object. Columns not in this frame are added as new columns.

other : DataFrame or list of Series/dict-like objects
ignore_index : boolean, default False
If True do not use the index labels. Useful for gluing together record arrays
verify_integrity : boolean, default False
If True, raise ValueError on creating index with duplicates

If a list of dict is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged

appended : DataFrame

apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)

Applies function along input axis of DataFrame.

Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty.

func : function
Function to apply to each column/row
axis : {0, 1}
  • 0 : apply function to each column
  • 1 : apply function to each row
broadcast : boolean, default False
For aggregation functions, return object of same size with values propagated
reduce : boolean or None, default None
Try to apply reduction procedures. If the DataFrame is empty, apply will use reduce to determine whether the result should be a Series or a DataFrame. If reduce is None (the default), apply’s return value will be guessed by calling func on an empty Series (note: while guessing, exceptions raised by func will be ignored). If reduce is True a Series will always be returned, and if False a DataFrame will always be returned.
raw : boolean, default False
If False, convert each row or column into a Series. If raw=True the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance
args : tuple
Positional arguments to pass to function in addition to the array/series

Additional keyword arguments will be passed as keywords to the function

>>> df.apply(numpy.sqrt) # returns DataFrame
>>> df.apply(numpy.sum, axis=0) # equiv to df.sum(0)
>>> df.apply(numpy.sum, axis=1) # equiv to df.sum(1)

DataFrame.applymap: For elementwise operations

applied : Series or DataFrame

applymap(func)

Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame

func : function
Python function, returns a single value from a single value

applied : DataFrame

DataFrame.apply : For operations on rows/columns

as_blocks(columns=None)

Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype. Columns are presented in sorted order unless a specific list of columns is provided.

NOTE: the dtypes of the blocks WILL BE PRESERVED HERE (unlike in as_matrix)
columns : array-like
Specific column order

values : a list of Object

as_matrix(columns=None)

Convert the frame to its Numpy-array matrix representation. Columns are presented in sorted order unless a specific list of columns is provided.

NOTE: the dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say, if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.

e.g. if the dtypes are float16 and float32, the result is float32; float16, float32 and float64 give float64; int32 and uint8 give int32.
values : ndarray
If the caller is heterogeneous and contains booleans or objects, the result will be of dtype=object
asfreq(freq, method=None, how=None, normalize=False)

Convert all TimeSeries inside to specified frequency using DateOffset objects. Optionally provide fill method to pad/backfill missing values.

freq : DateOffset object, or string
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}
Method to use for filling holes in reindexed Series:
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill
how : {‘start’, ‘end’}, default end
For PeriodIndex only, see PeriodIndex.asfreq
normalize : bool, default False
Whether to reset output index to midnight

converted : type of caller

astype(dtype, copy=True, raise_on_error=True)

Cast object to input numpy.dtype. Return a copy when copy = True (be really careful with this!)

dtype : numpy.dtype or Python type
raise_on_error : raise on invalid input

casted : type of caller

at
at_time(time, asof=False)

Select values at particular time of day (e.g. 9:30AM)

time : datetime.time or string

values_at_time : type of caller

axes
between_time(start_time, end_time, include_start=True, include_end=True)

Select values between particular times of the day (e.g., 9:00-9:30 AM)

start_time : datetime.time or string
end_time : datetime.time or string
include_start : boolean, default True
include_end : boolean, default True

values_between_time : type of caller

bfill(axis=0, inplace=False, limit=None, downcast=None)

Synonym for NDFrame.fillna(method='bfill')

blocks

Internal property, property synonym for as_blocks()

bool()

Return the bool of a single element PandasObject. This must be a boolean scalar value, either True or False.

Raise a ValueError if the PandasObject does not have exactly 1 element, or that element is not boolean

boxplot(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, **kwds)

Make a box plot from DataFrame column/columns optionally grouped (stratified) by one or more columns

data : DataFrame
column : column names or list of names, or vector
Can be any valid input to groupby
by : string or sequence
Column in the DataFrame to group by
ax : matplotlib axis object, default None
fontsize : int or string
rot : int, default 0
Rotation for ticks
grid : boolean, default True
Axis grid lines

ax : matplotlib.axes.AxesSubplot

clip(lower=None, upper=None, out=None)

Trim values at input threshold(s)

lower : float, default None
upper : float, default None

clipped : Series

clip_lower(threshold)

Return copy of the input with values below given value truncated

clip

clipped : same type as input

clip_upper(threshold)

Return copy of input with values above given value truncated

clip

clipped : same type as input

combine(other, func, fill_value=None, overwrite=True)

Combine two DataFrame objects column by column, using func to merge each pair of columns. If fill_value is given, missing (NaN) values are filled with it before the columns are passed to func.

other : DataFrame
func : function
fill_value : scalar value
overwrite : boolean, default True
If True then overwrite values for common keys in the calling frame

result : DataFrame

combineAdd(other)

Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)

other : DataFrame

DataFrame

combineMult(other)

Multiply two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)

other : DataFrame

DataFrame

combine_first(other)

Combine two DataFrame objects and default to non-null values in frame calling the method. Result index columns will be the union of the respective indexes and columns

other : DataFrame

a’s values prioritized, use values from b to fill holes:

>>> a.combine_first(b)

combined : DataFrame
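The same precedence rule can be modelled with dicts standing in for single columns (keys as index labels, None as a missing value). This is a hypothetical plain-Python sketch, not the pandas implementation:

```python
def combine_first(a, b):
    """Model of a.combine_first(b) for dict-based columns (None = missing)."""
    result = {}
    for key in sorted(set(a) | set(b)):  # union of both indexes
        value = a.get(key)
        # Keep the caller's value unless it is missing; then fall back to b.
        result[key] = value if value is not None else b.get(key)
    return result


a = {"r1": 1.0, "r2": None}
b = {"r1": 5.0, "r2": 6.0, "r3": 7.0}
print(combine_first(a, b))  # {'r1': 1.0, 'r2': 6.0, 'r3': 7.0}
```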

compound(axis=None, skipna=None, level=None, **kwargs)

Return the compound percentage of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

compounded : Series or DataFrame (if level specified)

consolidate(inplace=False)

Compute NDFrame with “consolidated” internals (data of each dtype grouped together in a single ndarray). Mainly an internal API function, but available here to the savvy user

inplace : boolean, default False
If False return new object, otherwise modify existing object

consolidated : type of caller

convert_objects(convert_dates=True, convert_numeric=False, convert_timedeltas=True, copy=True)

Attempt to infer better dtype for object columns

convert_dates : if True, attempt to soft convert dates; if ‘coerce’, force conversion (and non-convertibles get NaT)
convert_numeric : if True, attempt to coerce to numbers (including strings); non-convertibles get NaN
convert_timedeltas : if True, attempt to soft convert timedeltas; if ‘coerce’, force conversion (and non-convertibles get NaT)

copy : Boolean, if True, return copy, default is True

converted : same as input object

copy(deep=True)

Make a copy of this object

deep : boolean, default True
Make a deep copy, i.e. also copy data

copy : type of caller

corr(method='pearson', min_periods=1)

Compute pairwise correlation of columns, excluding NA/null values

method : {‘pearson’, ‘kendall’, ‘spearman’}
  • pearson : standard correlation coefficient
  • kendall : Kendall Tau correlation coefficient
  • spearman : Spearman rank correlation
min_periods : int, optional
Minimum number of observations required per pair of columns to have a valid result. Currently only available for pearson and spearman correlation

y : DataFrame
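For the default method='pearson', each pair of columns is correlated over the rows where both values are present. A plain-Python sketch of that computation for one pair of columns; the helper is hypothetical, and the choice to return NaN for too few observations or a constant column is the sketch's own convention:

```python
import math


def pearson(x, y, min_periods=1):
    """Pearson correlation of two columns, dropping pairs where either value is NaN."""
    pairs = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    if len(pairs) < max(min_periods, 2):
        return math.nan  # too few complete observations
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return math.nan  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


print(pearson([1.0, 2.0, 3.0, math.nan], [2.0, 4.0, 6.0, 1.0]))  # 1.0
```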

corrwith(other, axis=0, drop=False)

Compute pairwise correlation between rows or columns of two DataFrame objects.

other : DataFrame
axis : {0, 1}
0 to compute column-wise, 1 for row-wise
drop : boolean, default False
Drop missing indices from result, default returns union of all

correls : Series

count(axis=0, level=None, numeric_only=False)

Return Series with number of non-NA/null observations over requested axis. Works with non-floating point data as well (detects NaN and None)

axis : {0, 1}
0 for row-wise, 1 for column-wise
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
numeric_only : boolean, default False
Include only float, int, boolean data

count : Series (or DataFrame if level specified)

cov(min_periods=None)

Compute pairwise covariance of columns, excluding NA/null values

min_periods : int, optional
Minimum number of observations required per pair of columns to have a valid result.

y : DataFrame

y contains the covariance matrix of the DataFrame’s time series. The covariance is normalized by N-1 (unbiased estimator).
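To check the N-1 normalization by hand, compare a diagonal entry against the sample variance computed explicitly. This is a minimal sketch that assumes a recent pandas release, and the data is illustrative.

```python
import pandas as pd

# Illustrative (hypothetical) data: column "b" is exactly 2 * "a"
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.0, 6.0, 8.0]})
cov = df.cov()

# Manual unbiased variance of "a": deviations from the mean 2.5 are
# -1.5, -0.5, 0.5, 1.5, so the sum of squares is 5.0, divided by N-1 = 3
n = len(df)
manual_var_a = ((df["a"] - df["a"].mean()) ** 2).sum() / (n - 1)

print(cov.loc["a", "a"], manual_var_a)  # both about 1.667
```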

cummax(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative max over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

cummax : type of caller

cummin(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative min over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

cummin : type of caller

cumprod(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative prod over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

cumprod : type of caller

cumsum(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative sum over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

cumsum : type of caller
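The cumulative methods return an object with the same shape as the caller. With skipna=True, a NaN stays NaN at its own position, and the running value carries on past it. This is a minimal sketch that assumes a recent pandas release, and the data is illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative (hypothetical) data with one missing value
df = pd.DataFrame({"v": [1.0, 3.0, np.nan, 2.0]})

running_max = df.cummax()  # still a DataFrame, same shape as df
running_sum = df.cumsum()

print(running_max["v"].tolist())  # [1.0, 3.0, nan, 3.0]
print(running_sum["v"].tolist())  # [1.0, 4.0, nan, 6.0]
```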

delevel(*args, **kwargs)
describe(percentile_width=50)

Generate various summary statistics of each column, excluding NaN values. These include: count, mean, std, min, max, and lower%/50%/upper% percentiles

percentile_width : float, optional
width of the desired uncertainty interval, default is 50, which corresponds to lower=25, upper=75

DataFrame of summary statistics

diff(periods=1)

1st discrete difference of object

periods : int, default 1
Periods to shift for forming difference

diffed : DataFrame
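The periods argument controls how far back each value is compared. The leading rows have no predecessor, so they become NaN. This is a minimal sketch that assumes a recent pandas release, and the data is illustrative.

```python
import pandas as pd

# Illustrative (hypothetical) data: perfect squares
df = pd.DataFrame({"v": [1, 4, 9, 16]})

first_diff = df.diff()           # v[i] - v[i-1]; row 0 has no predecessor -> NaN
lag2_diff = df.diff(periods=2)   # v[i] - v[i-2]; rows 0 and 1 -> NaN

print(first_diff["v"].tolist())  # [nan, 3.0, 5.0, 7.0]
print(lag2_diff["v"].tolist())   # [nan, nan, 8.0, 12.0]
```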

div(other, axis='columns', level=None, fill_value=None)

Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
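The effect of fill_value shows up when one operand is missing and the other is not. This is a minimal sketch that assumes a recent pandas release. The frames are illustrative, and the same pattern applies to the other flexible arithmetic wrappers such as mul and mod.

```python
import numpy as np
import pandas as pd

# Illustrative (hypothetical) operands
a = pd.DataFrame({"x": [1.0, np.nan, np.nan]})
b = pd.DataFrame({"x": [2.0, 4.0, np.nan]})

plain = a / b                      # any NaN operand -> NaN
filled = a.div(b, fill_value=8.0)  # a NaN in only one input is replaced by 8.0 first

print(plain["x"].tolist())   # [0.5, nan, nan]
print(filled["x"].tolist())  # [0.5, 2.0, nan]  (row 2 is missing in both)
```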

divide(other, axis='columns', level=None, fill_value=None)

Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

dot(other)

Matrix multiplication with DataFrame or Series objects

other : DataFrame or Series

dot_product : DataFrame or Series
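Matrix multiplication aligns the columns of the caller with the index of other, which means the labels matter as well as the shapes. This is a minimal sketch that assumes a recent pandas release. The labels are illustrative.

```python
import pandas as pd

# Illustrative (hypothetical) operands: columns of `left` match the index of `right`
left = pd.DataFrame([[1, 2], [3, 4]], columns=["p", "q"])
right = pd.DataFrame([[5, 6], [7, 8]], index=["p", "q"], columns=["r", "s"])

product = left.dot(right)  # DataFrame result with right's columns
vec_product = left.dot(pd.Series([1, 1], index=["p", "q"]))  # Series result

print(product.values.tolist())  # [[19, 22], [43, 50]]
print(vec_product.tolist())     # [3, 7]
```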

drop(labels, axis=0, level=None, inplace=False, **kwargs)

Return new object with labels in requested axis removed

labels : single label or list-like
axis : int or axis name
level : int or name, default None
For MultiIndex
inplace : bool, default False
If True, do operation inplace and return None.

dropped : type of caller

drop_duplicates(cols=None, take_last=False, inplace=False)

Return DataFrame with duplicate rows removed, optionally only considering certain columns

cols : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns
take_last : boolean, default False
If True, keep the last occurrence of each set of duplicate rows; by default the first occurrence is kept
inplace : boolean, default False
Whether to drop duplicates in place or to return a copy

deduplicated : DataFrame
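The following is a minimal sketch of restricting the duplicate check to some columns and choosing which occurrence survives. It assumes a recent pandas release, in which the cols and take_last parameters documented above are named subset and keep. The data is illustrative.

```python
import pandas as pd

# Illustrative (hypothetical) data: rows 0 and 1 share the key "a"
df = pd.DataFrame({"k": ["a", "a", "b"], "v": [1, 2, 3]})

first_kept = df.drop_duplicates(subset=["k"])              # keeps row 0 for "a"
last_kept = df.drop_duplicates(subset=["k"], keep="last")  # keeps row 1 for "a"
dup_mask = df.duplicated(subset=["k"])                     # marks later repeats

print(first_kept["v"].tolist())  # [1, 3]
print(last_kept["v"].tolist())   # [2, 3]
print(dup_mask.tolist())         # [False, True, False]
```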

dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Return object with labels on given axis omitted where alternately any or all of the data are missing

axis : {0, 1}, or tuple/list thereof
Pass tuple or list to drop on multiple axes
how : {‘any’, ‘all’}
  • any : if any NA values are present, drop that label
  • all : if all values are NA, drop that label
thresh : int, default None
int value : require that many non-NA values
subset : array-like
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include
inplace : boolean, default False
If True, do operation inplace and return None.

dropped : DataFrame
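Here is how, thresh and subset each change which labels are dropped. This is a minimal sketch that assumes a recent pandas release, and the data is illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative (hypothetical) data: row 0 complete, row 1 partly NA, row 2 all NA
df = pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [1.0, 2.0, np.nan]})

any_na = df.dropna()                      # drop rows with any NA -> row 0
all_na = df.dropna(how="all")             # drop rows that are entirely NA -> rows 0, 1
at_least_two = df.dropna(thresh=2)        # keep rows with >= 2 non-NA values -> row 0
cols_kept = df.dropna(axis=1, subset=[1]) # drop columns that are NA in row 1 -> "b" stays

print(any_na.index.tolist(), all_na.index.tolist(), list(cols_kept.columns))
```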

dtypes

Return the dtypes in this object

duplicated(cols=None, take_last=False)

Return boolean Series denoting duplicate rows, optionally only considering certain columns

cols : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns
take_last : boolean, default False
If True, mark every occurrence except the last as a duplicate; by default every occurrence except the first is marked

duplicated : Series

empty

True if NDFrame is entirely empty [no items]

eq(other, axis='columns', level=None)

Wrapper for flexible comparison methods eq

equals(other)

Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.
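This behavior differs from an element-wise comparison, in which NaN never equals NaN. This is a minimal sketch that assumes a recent pandas release, and the frames are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative (hypothetical) frames with a NaN in the same position
df1 = pd.DataFrame({"a": [1.0, np.nan]})
df2 = pd.DataFrame({"a": [1.0, np.nan]})

same = df1.equals(df2)                        # True: aligned NaNs count as equal
elementwise = bool((df1 == df2).all().all())  # False: NaN == NaN is False element-wise

print(same, elementwise)  # True False
```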

eval(expr, **kwargs)

Evaluate an expression in the context of the calling DataFrame instance.

expr : string
The expression string to evaluate.
kwargs : dict
See the documentation for eval() for complete details on the keyword arguments accepted by this method.

ret : ndarray, scalar, or pandas object

See also: pandas.DataFrame.query, pandas.eval

For more details see the API documentation for eval(). For detailed examples see enhancing performance with eval.

>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.eval('a + b')
>>> df.eval('c=a + b')
ffill(axis=0, inplace=False, limit=None, downcast=None)

Synonym for NDFrame.fillna(method='ffill')

fillna(value=None, method=None, axis=0, inplace=False, limit=None, downcast=None)

Fill NA/NaN values using the specified method

method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed Series:
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill gap
value : scalar, dict, or Series
Value to use to fill holes (e.g. 0), alternately a dict/Series of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series will not be filled). This value cannot be a list.
axis : {0, 1}, default 0
  • 0: fill column-by-column
  • 1: fill row-by-row
inplace : boolean, default False
If True, fill in place. Note: this will modify any other views on this object, (e.g. a no-copy slice for a column in a DataFrame).
limit : int, default None
Maximum size gap to forward or backward fill
downcast : dict, default is None
a dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible)

See also: reindex, asfreq

filled : same type as caller
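The following is a minimal sketch of a scalar fill, a per-column dict fill, and a bounded forward fill. It assumes a recent pandas release, which deprecates fillna(method=...) in favor of the ffill()/bfill() methods, so the sketch calls ffill() directly. The data is illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative (hypothetical) data with gaps in both columns
df = pd.DataFrame({"a": [1.0, np.nan, np.nan, 4.0],
                   "b": [np.nan, 2.0, np.nan, np.nan]})

zeros = df.fillna(0)                 # one scalar for every hole
per_column = df.fillna({"a": -1.0})  # only column "a" is filled; "b" keeps its NaNs
forward = df.ffill(limit=1)          # propagate forward, at most one step per gap

print(forward["a"].tolist())  # [1.0, 1.0, nan, 4.0]
print(forward["b"].tolist())  # [nan, 2.0, 2.0, nan]
```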

filter(items=None, like=None, regex=None, axis=None)

Restrict the info axis to set of items or wildcard

items : list-like
List of info axis labels to restrict to (they need not all be present)
like : string
Keep info axis where “arg in col == True”
regex : string (regular expression)
Keep info axis with re.search(regex, col) == True

Arguments are mutually exclusive, but this is not checked for
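Each of the three selectors is shown separately below, on the column (info) axis of a DataFrame. Labels in items that are absent from the frame are simply skipped. This is a minimal sketch that assumes a recent pandas release, and the column names are illustrative.

```python
import pandas as pd

# Illustrative (hypothetical) frame
df = pd.DataFrame({"score_a": [1], "score_b": [2], "name": ["x"]})

by_items = df.filter(items=["name", "missing"])  # "missing" is silently ignored
by_like = df.filter(like="score")                # substring match on labels
by_regex = df.filter(regex=r"_b$")               # re.search on labels

print(list(by_items.columns), list(by_like.columns), list(by_regex.columns))
```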

filter_result(expressions)

Filter result based on a list of expressions

Parameters:expressions (list((str,comparator,float))) – A list of triples consisting of (method_name, comparator, threshold)
Returns:A new filtered AResult object
Return type:AResult
first(offset)

Convenience method for subsetting initial periods of time series data based on a date offset

offset : string, DateOffset, dateutil.relativedelta

ts.first('10D') -> First 10 days

subset : type of caller

first_valid_index()

Return label for first non-NA/null value

floordiv(other, axis='columns', level=None, fill_value=None)

Binary operator floordiv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

from_csv(path, header=0, sep=',', index_col=0, parse_dates=True, encoding=None, tupleize_cols=False, infer_datetime_format=False)

Read delimited file into DataFrame

path : string file path or file handle / StringIO
header : int, default 0
Row to use as header (skip prior rows)
sep : string, default ‘,’
Field delimiter
index_col : int or sequence, default 0
Column to use for index. If a sequence is given, a MultiIndex is used. Different default from read_table
parse_dates : boolean, default True
Parse dates. Different default from read_table
tupleize_cols : boolean, default False
If True, leave MultiIndex columns as a list of tuples; if False, convert them to a MultiIndex (the expanded format)
infer_datetime_format : boolean, default False
If True and parse_dates is True for a column, try to infer the datetime format based on the first datetime string. If the format can be inferred, there often will be a large parsing speed-up.

Preferable to use read_table for most general purposes but from_csv makes for an easy roundtrip to and from file, especially with a DataFrame of time series data

y : DataFrame

from_dict(data, orient='columns', dtype=None)

Construct DataFrame from dict of array-like or dicts

data : dict
{field : array-like} or {field : dict}
orient : {‘columns’, ‘index’}, default ‘columns’
The “orientation” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.

DataFrame
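The orient argument decides whether the outer keys of a nested dict become columns or rows. This is a minimal sketch that assumes a recent pandas release, and the keys are illustrative.

```python
import pandas as pd

# Illustrative (hypothetical) nested dict: outer keys r1/r2, inner keys a/b
data = {"r1": {"a": 1, "b": 2}, "r2": {"a": 3, "b": 4}}

by_columns = pd.DataFrame.from_dict(data)              # outer keys -> columns
by_index = pd.DataFrame.from_dict(data, orient="index")  # outer keys -> rows

print(by_columns.loc["a", "r2"], by_index.loc["r2", "a"])  # 3 3
```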

from_items(items, columns=None, orient='columns')

Convert (key, value) pairs to DataFrame. The keys will be the axis index (usually the columns, but depends on the specified orientation). The values should be arrays or Series.

items : sequence of (key, value) pairs
Values should be arrays or Series.
columns : sequence of column labels, optional
Must be passed if orient=’index’.
orient : {‘columns’, ‘index’}, default ‘columns’
The “orientation” of the data. If the keys of the input correspond to column labels, pass ‘columns’ (default). Otherwise if the keys correspond to the index, pass ‘index’.

frame : DataFrame

from_records(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)

Convert structured or record ndarray to DataFrame

data : ndarray (structured dtype), list of tuples, dict, or DataFrame
index : string, list of fields, array-like
Field of array to use as the index, alternately a specific set of input labels to use
exclude : sequence, default None
Columns or fields to exclude
columns : sequence, default None
Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns)
coerce_float : boolean, default False
Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets

df : DataFrame

ftypes

Return the ftypes (indication of sparse/dense and dtype) in this object.

ge(other, axis='columns', level=None)

Wrapper for flexible comparison methods ge

get(key, default=None)

Get item from object for given key (DataFrame column, Panel slice, etc.). Returns default value if not found

key : object

value : type of items contained in object

get_dtype_counts()

Return the counts of dtypes in this object

get_ftype_counts()

Return the counts of ftypes in this object

get_value(index, col)

Quickly retrieve single value at passed column and index

index : row label
col : column label

value : scalar value

get_values()

same as values (but handles sparseness conversions)

groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False)

Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns

by : mapping function / list of functions, dict, Series, or tuple / list of column names
Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups
axis : int, default 0
level : int, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels
as_index : boolean, default True
For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output
sort : boolean, default True
Sort group keys. Get better performance by turning this off
group_keys : boolean, default True
When calling apply, add group keys to index to identify pieces
squeeze : boolean, default False
reduce the dimensionality of the return type if possible, otherwise return a consistent type

# DataFrame result
>>> data.groupby(func, axis=0).mean()

# Series result (a single column is selected)
>>> data.groupby(['col1', 'col2'])['col3'].mean()

# DataFrame with hierarchical index
>>> data.groupby(['col1', 'col2']).mean()

GroupBy object
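The following is a runnable version of the column-based patterns above, which also shows the effect of as_index=False. It is a minimal sketch that assumes a recent pandas release. The frame named data and its columns col1/col2/col3 are illustrative stand-ins for the placeholders used in the examples.

```python
import pandas as pd

# Illustrative (hypothetical) data matching the placeholder column names
data = pd.DataFrame({"col1": ["x", "x", "y"],
                     "col2": ["p", "q", "p"],
                     "col3": [1.0, 3.0, 5.0]})

per_col1 = data.groupby("col1")["col3"].mean()            # Series indexed by col1
per_pair = data.groupby(["col1", "col2"])["col3"].mean()  # Series with a MultiIndex
sql_style = data.groupby("col1", as_index=False)["col3"].mean()  # col1 stays a column

print(per_col1.to_dict())         # {'x': 2.0, 'y': 5.0}
print(list(sql_style.columns))    # ['col1', 'col3']
```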

gt(other, axis='columns', level=None)

Wrapper for flexible comparison methods gt

head(n=5)

Returns first n rows

hist(data, column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, **kwds)

Draw histogram of the DataFrame’s series using matplotlib / pylab.

data : DataFrame
column : string or sequence
If passed, will be used to limit data to a subset of columns
by : object, optional
If passed, then used to form histograms for separate groups
grid : boolean, default True
Whether to show axis grid lines
xlabelsize : int, default None
If specified changes the x-axis label size
xrot : float, default None
rotation of x axis labels
ylabelsize : int, default None
If specified changes the y-axis label size
yrot : float, default None
rotation of y axis labels

ax : matplotlib axes object, default None
sharex : bool, default False
If True, the X axis will be shared amongst all subplots
sharey : bool, default False
If True, the Y axis will be shared amongst all subplots
figsize : tuple
The size of the figure to create in inches by default
layout : tuple (rows, columns), optional
The layout of the histograms
kwds : other plotting keyword arguments
To be passed to hist function
iat
icol(i)
idxmax(axis=0, skipna=True)

Return index of first occurrence of maximum over requested axis. NA/null values are excluded.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

idxmax : Series

This method is the DataFrame version of ndarray.argmax.

See also: Series.idxmax
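The idxmax/idxmin methods return labels, whereas max/min return the values themselves. This is a minimal sketch that assumes a recent pandas release, and the labels are illustrative.

```python
import pandas as pd

# Illustrative (hypothetical) data with string row labels
df = pd.DataFrame({"a": [1, 3, 2], "b": [5, 4, 6]}, index=["x", "y", "z"])

where_max = df.idxmax()  # row label of each column's maximum
max_values = df.max()    # the maxima themselves
where_min = df.idxmin()  # row label of each column's minimum

print(where_max.to_dict())   # {'a': 'y', 'b': 'z'}
print(max_values.to_dict())  # {'a': 3, 'b': 6}
print(where_min.to_dict())   # {'a': 'x', 'b': 'y'}
```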

idxmin(axis=0, skipna=True)

Return index of first occurrence of minimum over requested axis. NA/null values are excluded.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

idxmin : Series

This method is the DataFrame version of ndarray.argmin.

See also: Series.idxmin

iget_value(i, j)
iloc
info(verbose=True, buf=None, max_cols=None)

Concise summary of a DataFrame.

verbose : boolean, default True
If False, don’t print column count summary

buf : writable buffer, defaults to sys.stdout
max_cols : int, default None
Determines whether full summary or short summary is printed
insert(loc, column, value, allow_duplicates=False)

Insert column into DataFrame at specified location.

If allow_duplicates is False, raises Exception if column is already contained in the DataFrame.

loc : int
Must have 0 <= loc <= len(columns)

column : object
value : int, Series, or array-like
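The insertion happens in place, and a repeated label is rejected unless allow_duplicates=True. This is a minimal sketch that assumes a recent pandas release, which raises ValueError (a subclass of Exception) in that case. The column names are illustrative.

```python
import pandas as pd

# Illustrative (hypothetical) frame with a gap between "a" and "c"
df = pd.DataFrame({"a": [1, 2], "c": [5, 6]})
df.insert(1, "b", [3, 4])  # modifies df in place and returns None

# Inserting "b" a second time is refused because allow_duplicates defaults to False
try:
    df.insert(0, "b", [0, 0])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(list(df.columns), duplicate_rejected)  # ['a', 'b', 'c'] True
```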

interpolate(method='linear', axis=0, limit=None, inplace=False, downcast='infer', **kwargs)

Interpolate values according to different methods.

method : {‘linear’, ‘time’, ‘values’, ‘index’, ‘nearest’, ‘zero’,
‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘krogh’, ‘polynomial’, ‘spline’, ‘piecewise_polynomial’, ‘pchip’}
  • ‘linear’: ignore the index and treat the values as equally spaced. Default
  • ‘time’: interpolation works on daily and higher resolution data to interpolate given length of interval
  • ‘index’: use the actual numerical values of the index
  • ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘polynomial’ are passed to scipy.interpolate.interp1d (‘polynomial’ with the order given). Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=4)
  • ‘krogh’, ‘piecewise_polynomial’, ‘spline’, and ‘pchip’ are all wrappers around the scipy interpolation methods of similar names. See the scipy documentation for more on their behavior: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
axis : {0, 1}, default 0
  • 0: fill column-by-column
  • 1: fill row-by-row
limit : int, default None.
Maximum number of consecutive NaNs to fill.
inplace : bool, default False
Update the NDFrame in place if possible.
downcast : optional, ‘infer’ or None, defaults to ‘infer’
Downcast dtypes if possible.

Series or DataFrame of same shape interpolated at the NaNs

See also: reindex, replace, fillna

Filling in NaNs:

>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0    0
1    1
2    2
3    3
dtype: float64
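The limit argument caps how many consecutive NaNs are filled. With the default forward direction, only the first NaNs of each run are filled. This is a minimal sketch that assumes a recent pandas release, and the data is illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative (hypothetical) series with a run of three NaNs
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

full = s.interpolate()           # linear: 1, 2, 3, 4, 5
capped = s.interpolate(limit=1)  # only the first NaN of the run is filled

print(full.tolist())    # [1.0, 2.0, 3.0, 4.0, 5.0]
print(capped.tolist())  # [1.0, 2.0, nan, nan, 5.0]
```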

irow(i, copy=False)
is_copy = None
isin(values)

Return boolean DataFrame showing whether each element in the DataFrame is contained in values.

values : iterable, Series, DataFrame or dictionary
The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dictionary, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.

DataFrame of booleans

When values is a list:

>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> df.isin([1, 3, 12, 'a'])
       A      B
0   True   True
1  False  False
2   True  False

When values is a dict:

>>> df = DataFrame({'A': [1, 2, 3], 'B': [1, 4, 7]})
>>> df.isin({'A': [1, 3], 'B': [4, 7, 12]})
       A      B
0   True  False  # Note that B didn't match the 1 here.
1  False   True
2   True   True

When values is a Series or DataFrame:

>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> other = DataFrame({'A': [1, 3, 3, 2], 'B': ['e', 'f', 'f', 'e']})
>>> df.isin(other)
       A      B
0   True  False
1  False  False  # Column A in `other` has a 3, but not at index 1.
2   True   True
isnull()

Return a boolean same-sized object indicating if the values are null

iteritems()

Iterator over (column, series) pairs

iterkv(*args, **kwargs)

iteritems alias used to get around 2to3. Deprecated

iterrows()

Iterate over rows of DataFrame as (index, Series) pairs.

  • iterrows does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). For example,

    >>> df = DataFrame([[1, 1.0]], columns=['x', 'y'])
    >>> row = next(df.iterrows())[1]
    >>> print(row['x'].dtype)
    float64
    >>> print(df['x'].dtype)
    int64
    
it : generator
A generator that iterates over the rows of the frame.
itertuples(index=True)

Iterate over rows of DataFrame as tuples, with index value as first element of the tuple
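Unlike iterrows, itertuples does not build a Series per row, so each value keeps its column's type. This is a minimal sketch that assumes a recent pandas release, in which the tuples are namedtuples with the index first. The frame mirrors the iterrows example above and is illustrative.

```python
import pandas as pd

# Illustrative (hypothetical) frame: one int column, one float column
df = pd.DataFrame([[1, 1.0]], columns=["x", "y"])

first_tuple = next(df.itertuples())  # (Index, x, y)
_, first_row = next(df.iterrows())   # the row as a Series, upcast to float64

# itertuples keeps x integral; iterrows turns it into a float
print(first_tuple[1], first_row["x"])  # 1 1.0
```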

ix
join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

other : DataFrame, Series with name field set, or list of DataFrame
Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame
on : column name, tuple/list of column names, or array-like
Column(s) to use for joining, otherwise join on index. If multiple columns are given, the passed DataFrame must have a MultiIndex. Can pass an array as the join key if not already contained in the calling DataFrame. Like an Excel VLOOKUP operation
how : {‘left’, ‘right’, ‘outer’, ‘inner’}

How to handle indexes of the two objects. Default: ‘left’ for joining on index, None otherwise

  • left: use calling frame’s index
  • right: use input frame’s index
  • outer: form union of indexes
  • inner: use intersection of indexes
lsuffix : string
Suffix to use from left frame’s overlapping columns
rsuffix : string
Suffix to use from right frame’s overlapping columns
sort : boolean, default False
Order result DataFrame lexicographically by the join key. If False, preserves the index order of the calling (left) DataFrame

on, lsuffix, and rsuffix options are not supported when passing a list of DataFrame objects

joined : DataFrame
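When both frames share a column name, suffixes are needed. The how argument then decides which index labels survive. This is a minimal sketch that assumes a recent pandas release, and the labels are illustrative.

```python
import pandas as pd

# Illustrative (hypothetical) frames that both have a column "v"
left = pd.DataFrame({"v": [1, 2]}, index=["a", "b"])
right = pd.DataFrame({"v": [10, 30]}, index=["a", "c"])

joined = left.join(right, lsuffix="_l", rsuffix="_r")              # how="left": a, b
outer = left.join(right, how="outer", lsuffix="_l", rsuffix="_r")  # union: a, b, c

print(joined)  # v_r is NaN for "b", which has no match in `right`
```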

keys()

Get the ‘info axis’ (see Indexing for more)

This is index for Series, columns for DataFrame and major_axis for Panel.

kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased kurtosis over requested axis, normalized by N-1

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

kurt : Series or DataFrame (if level specified)

kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased kurtosis over requested axis, normalized by N-1

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

kurt : Series or DataFrame (if level specified)

last(offset)

Convenience method for subsetting final periods of time series data based on a date offset

offset : string, DateOffset, dateutil.relativedelta

ts.last('5M') -> Last 5 months

subset : type of caller

last_valid_index()

Return label for last non-NA/null value

le(other, axis='columns', level=None)

Wrapper for flexible comparison methods le

load(path)

Deprecated. Use read_pickle instead.

loc
lookup(row_labels, col_labels)

Label-based “fancy indexing” function for DataFrame. Given equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair.

row_labels : sequence
The row labels to use for lookup
col_labels : sequence
The column labels to use for lookup

Akin to:

result = []
for row, col in zip(row_labels, col_labels):
    result.append(df.get_value(row, col))
values : ndarray
The found values
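Recent pandas releases no longer provide lookup. The same pairwise selection can be expressed by converting labels to positions and indexing the underlying array. The following is a minimal sketch under that assumption, and the frame and labels are illustrative.

```python
import pandas as pd

# Illustrative (hypothetical) frame and label pairs
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])
row_labels = ["x", "z", "y"]
col_labels = ["b", "a", "b"]

# Translate labels to integer positions, then pick one element per (row, col) pair
rows = df.index.get_indexer(row_labels)
cols = df.columns.get_indexer(col_labels)
values = df.to_numpy()[rows, cols]

# Equivalent (slower) loop using label-based scalar access
via_loop = [df.at[r, c] for r, c in zip(row_labels, col_labels)]

print(values.tolist())  # [4, 3, 5]
```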
lt(other, axis='columns', level=None)

Wrapper for flexible comparison methods lt

mad(axis=None, skipna=None, level=None, **kwargs)

Return the mean absolute deviation of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

mad : Series or DataFrame (if level specified)

mask(cond)

Returns a copy whose values are replaced with NaN where cond is True (the inverse of where)

cond : boolean NDFrame or array

wh : same type as caller
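The methods mask and where are complements: mask blanks the positions where the condition holds, and where blanks the positions where it fails. This is a minimal sketch that assumes a recent pandas release, and the data is illustrative.

```python
import pandas as pd

# Illustrative (hypothetical) data
s = pd.Series([1, 2, 3, 4])

masked = s.mask(s > 2)  # values where the condition is True become NaN
kept = s.where(s > 2)   # values where the condition is False become NaN

print(masked.tolist())  # [1.0, 2.0, nan, nan]
print(kept.tolist())    # [nan, nan, 3.0, 4.0]
```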

max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

This method returns the maximum of the values in the object. If you want the index of the maximum, use idxmax. This is the equivalent of the numpy.ndarray method max.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

max : Series or DataFrame (if level specified)

mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the mean of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

mean : Series or DataFrame (if level specified)

median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the median of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

median : Series or DataFrame (if level specified)

merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True)

Merge DataFrame objects by performing a database-style join operation by columns or indexes.

If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on.

right : DataFrame
how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’
  • left: use only keys from left frame (SQL: left outer join)
  • right: use only keys from right frame (SQL: right outer join)
  • outer: use union of keys from both frames (SQL: full outer join)
  • inner: use intersection of keys from both frames (SQL: inner join)
on : label or list
Field names to join on. Must be found in both DataFrames. If on is None and not merging on indexes, then it merges on the intersection of the columns by default.
left_on : label or list, or array-like
Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns
right_on : label or list, or array-like
Field names to join on in right DataFrame or vector/list of vectors per left_on docs
left_index : boolean, default False
Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels
right_index : boolean, default False
Use the index from the right DataFrame as the join key. Same caveats as left_index
sort : boolean, default False
Sort the join keys lexicographically in the result DataFrame
suffixes : 2-length sequence (tuple, list, ...)
Suffix to apply to overlapping column names in the left and right side, respectively
copy : boolean, default True
If False, do not copy data unnecessarily
>>> A              >>> B
    lkey value         rkey value
0   foo  1         0   foo  5
1   bar  2         1   bar  6
2   baz  3         2   qux  7
3   foo  4         3   bar  8
>>> merge(A, B, left_on='lkey', right_on='rkey', how='outer')
   lkey  value_x  rkey  value_y
0  bar   2        bar   6
1  bar   2        bar   8
2  baz   3        NaN   NaN
3  foo   1        foo   5
4  foo   4        foo   5
5  NaN   NaN      qux   7

merged : DataFrame
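Here is the example above in runnable form, using the method call on A. It is a minimal sketch that assumes a recent pandas release. The frames reproduce the A and B tables shown, and the row order of the result may differ from the listing.

```python
import pandas as pd

# The A and B tables from the example above
A = pd.DataFrame({"lkey": ["foo", "bar", "baz", "foo"], "value": [1, 2, 3, 4]})
B = pd.DataFrame({"rkey": ["foo", "bar", "qux", "bar"], "value": [5, 6, 7, 8]})

# The overlapping column "value" gets the default suffixes _x (left) and _y (right)
outer = A.merge(B, left_on="lkey", right_on="rkey", how="outer")
inner = A.merge(B, left_on="lkey", right_on="rkey")  # how="inner" by default

print(list(outer.columns))    # ['lkey', 'value_x', 'rkey', 'value_y']
print(len(outer), len(inner)) # 6 4
```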

merge_results(others)

Merges results of the same type and returns a merged result

Parameters:others (list(AResult)/AResult) – A (list of) AResult object(s) of the same class
Returns:A new merged AResult object
Return type:AResult
min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

This method returns the minimum of the values in the object. If you want the index of the minimum, use idxmin. This is the equivalent of the numpy.ndarray method min.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

min : Series or DataFrame (if level specified)

mod(other, axis='columns', level=None, fill_value=None)

Binary operator mod with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

mode(axis=0, numeric_only=False)

Gets the mode of each element along the axis selected. Empty if nothing has 2+ occurrences. Adds a row for each mode per label, fills in gaps with nan.

axis : {0, 1, ‘index’, ‘columns’} (default 0)
  • 0/’index’ : get mode of each column
  • 1/’columns’ : get mode of each row
numeric_only : boolean, default False
if True, only apply to numeric columns

modes : DataFrame (sorted)

mul(other, axis='columns', level=None, fill_value=None)

Binary operator mul with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

multiply(other, axis='columns', level=None, fill_value=None)

Binary operator mul with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

ndim

Number of axes / array dimensions

ne(other, axis='columns', level=None)

Wrapper for the flexible comparison method ne (element-wise not equal)

notnull()

Return a boolean same-sized object indicating if the values are not null

pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwds)

Percent change over given number of periods

periods : int, default 1
Periods to shift for forming percent change
fill_method : str, default ‘pad’
How to handle NAs before computing percent changes
limit : int, default None
The number of consecutive NAs to fill before stopping
freq : DateOffset, timedelta, or offset alias string, optional
Increment to use from time series API (e.g. ‘M’ or BDay())

chg : same type as caller
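The computation can be sketched with the standard library under a few assumptions: the input is a list with None for missing values, fill_method is the default 'pad', and periods is positive. forward_fill and pct_change here are hypothetical helpers. Gaps are padded first, at most limit in a row. Each value is then divided by the value periods positions earlier, and 1 is subtracted.

```python
# Hypothetical stdlib sketch of pct_change with fill_method='pad'; not the pandas implementation.
from typing import List, Optional, Sequence


def forward_fill(values: Sequence[Optional[float]], limit: Optional[int] = None) -> List[Optional[float]]:
    """Pad missing values (None) with the last valid value, at most `limit` in a row."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    run = 0  # consecutive gaps seen since the last valid value
    for v in values:
        if v is not None:
            filled.append(v)
            last, run = v, 0
            continue
        run += 1
        if last is not None and (limit is None or run <= limit):
            filled.append(last)
        else:
            filled.append(None)
    return filled


def pct_change(values: Sequence[Optional[float]], periods: int = 1,
               limit: Optional[int] = None) -> List[Optional[float]]:
    """Fractional change versus the value `periods` steps earlier (periods >= 1)."""
    if periods < 1:
        raise ValueError("this sketch only handles positive periods")
    filled = forward_fill(values, limit)
    out: List[Optional[float]] = []
    for i, current in enumerate(filled):
        previous = filled[i - periods] if i >= periods else None
        if current is None or previous is None:
            out.append(None)
        else:
            # Assumes previous != 0; pandas would yield inf/NaN instead of raising.
            out.append(current / previous - 1)
    return out


prices = [100.0, 110.0, None, 121.0]
print(pct_change(prices))
```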

pivot(index=None, columns=None, values=None)

Reshape data (produce a “pivot” table) based on column values. Uses unique values from index / columns to form axes and returns either a DataFrame or a Panel, depending on whether you request a single value column (DataFrame) or all columns (Panel)

index : string or object
Column name to use to make new frame’s index
columns : string or object
Column name to use to make new frame’s columns
values : string or object, optional
Column name to use for populating new frame’s values

For finer-tuned control, see hierarchical indexing documentation along with the related stack/unstack methods

>>> df
    foo   bar  baz
0   one   A    1.
1   one   B    2.
2   one   C    3.
3   two   A    4.
4   two   B    5.
5   two   C    6.
>>> df.pivot('foo', 'bar', 'baz')
     A   B   C
one  1   2   3
two  4   5   6
>>> df.pivot('foo', 'bar')['baz']
     A   B   C
one  1   2   3
two  4   5   6
pivoted : DataFrame
If no values column specified, will have hierarchically indexed columns
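The reshaping in the example above can be sketched with plain dicts, assuming each row is a dict keyed by column name. pivot here is a hypothetical stdlib helper. Because pivot does no aggregation, an (index, columns) pair that occurs twice is an error, and pandas raises ValueError in that case. pivot_table is the method that aggregates duplicates.

```python
# Hypothetical stdlib sketch of pivot (no aggregation); not the pandas implementation.
from typing import Any, Dict, Hashable, Iterable, Mapping


def pivot(rows: Iterable[Mapping[str, Any]], index: str, columns: str,
          values: str) -> Dict[Hashable, Dict[Hashable, Any]]:
    """Return {index value: {column value: cell value}} without aggregating."""
    table: Dict[Hashable, Dict[Hashable, Any]] = {}
    for row in rows:
        r, c = row[index], row[columns]
        cells = table.setdefault(r, {})
        if c in cells:
            # A repeated (index, columns) pair cannot be reshaped without aggregation.
            raise ValueError(f"duplicate entry for ({r!r}, {c!r})")
        cells[c] = row[values]
    return table


df = [
    {"foo": "one", "bar": "A", "baz": 1.0},
    {"foo": "one", "bar": "B", "baz": 2.0},
    {"foo": "one", "bar": "C", "baz": 3.0},
    {"foo": "two", "bar": "A", "baz": 4.0},
    {"foo": "two", "bar": "B", "baz": 5.0},
    {"foo": "two", "bar": "C", "baz": 6.0},
]
print(pivot(df, "foo", "bar", "baz"))
# {'one': {'A': 1.0, 'B': 2.0, 'C': 3.0}, 'two': {'A': 4.0, 'B': 5.0, 'C': 6.0}}
```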
pivot_table(data, values=None, rows=None, cols=None, aggfunc='mean', fill_value=None, margins=False, dropna=True)

Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame

data : DataFrame
values : column to aggregate, optional
rows : list of column names or arrays to group on
Keys to group on the x-axis of the pivot table
cols : list of column names or arrays to group on
Keys to group on the y-axis of the pivot table
aggfunc : function, default numpy.mean, or list of functions
If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)
fill_value : scalar, default None
Value to replace missing values with
margins : boolean, default False
Add all row / columns (e.g. for subtotal / grand totals)
dropna : boolean, default True
Do not include columns whose entries are all NaN
>>> df
   A   B   C      D
0  foo one small  1
1  foo one large  2
2  foo one large  2
3  foo two small  3
4  foo two small  3
5  bar one large  4
6  bar one small  5
7  bar two small  6
8  bar two large  7
>>> table = pivot_table(df, values='D', rows=['A', 'B'],
...                     cols=['C'], aggfunc=np.sum)
>>> table
          small  large
foo  one  1      4
     two  6      NaN
bar  one  5      4
     two  6      7

table : DataFrame
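The aggregation in the example above can be sketched with the standard library. The sketch assumes rows are dicts and aggfunc is applied to the list of values in each cell. pivot_table here is a hypothetical helper, not pandas. Cells with no matching rows stay missing (None), which is where NaN appears in the table above. Later pandas releases spell the rows and cols arguments as index and columns.

```python
# Hypothetical stdlib sketch of pivot_table with a single aggfunc; not the pandas implementation.
from collections import defaultdict
from typing import Any, Callable, Dict, List, Mapping, Sequence, Tuple


def pivot_table(data: Sequence[Mapping[str, Any]], values: str,
                rows: Sequence[str], cols: Sequence[str],
                aggfunc: Callable[[List[Any]], Any] = sum) -> Dict[Tuple, Dict[Tuple, Any]]:
    """Group `values` by (rows, cols) keys and aggregate each group with aggfunc."""
    groups: Dict[Tuple[Tuple, Tuple], List[Any]] = defaultdict(list)
    for record in data:
        row_key = tuple(record[k] for k in rows)
        col_key = tuple(record[k] for k in cols)
        groups[(row_key, col_key)].append(record[values])

    row_keys = sorted({rk for rk, _ in groups})
    col_keys = sorted({ck for _, ck in groups})
    # Combinations with no data are left missing (None), like NaN in pandas.
    return {
        rk: {ck: aggfunc(groups[(rk, ck)]) if (rk, ck) in groups else None
             for ck in col_keys}
        for rk in row_keys
    }


raw = [
    ("foo", "one", "small", 1), ("foo", "one", "large", 2), ("foo", "one", "large", 2),
    ("foo", "two", "small", 3), ("foo", "two", "small", 3), ("bar", "one", "large", 4),
    ("bar", "one", "small", 5), ("bar", "two", "small", 6), ("bar", "two", "large", 7),
]
df = [dict(zip("ABCD", r)) for r in raw]
table = pivot_table(df, values="D", rows=["A", "B"], cols=["C"], aggfunc=sum)
print(table[("foo", "one")])  # {('large',): 4, ('small',): 1}
```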

plot(frame=None, x=None, y=None, subplots=False, sharex=True, sharey=False, use_index=True, figsize=None, grid=None, legend=True, rot=None, ax=None, style=None, title=None, xlim=None, ylim=None, logx=False, logy=False, xticks=None, yticks=None, kind='line', sort_columns=False, fontsize=None, secondary_y=False, **kwds)

Make line, bar, or scatter plots of DataFrame series with the index on the x-axis using matplotlib / pylab.

frame : DataFrame
x : label or position, default None
y : label or position, default None
Allows plotting of one column versus another
subplots : boolean, default False
Make separate subplots for each time series
sharex : boolean, default True
In case subplots=True, share x axis
sharey : boolean, default False
In case subplots=True, share y axis
use_index : boolean, default True
Use index as ticks for x axis
stacked : boolean, default False
If True, create stacked bar plot. Only valid for DataFrame input
sort_columns: boolean, default False
Sort column names to determine plot ordering
title : string
Title to use for the plot
grid : boolean, default None (matlab style default)
Axis grid lines
legend : boolean, default True
Place legend on axis subplots

ax : matplotlib axis object, default None
style : list or dict
matplotlib line style per column
kind : {‘line’, ‘bar’, ‘barh’, ‘kde’, ‘density’, ‘scatter’}
  • line : line plot (default)
  • bar : vertical bar plot
  • barh : horizontal bar plot
  • kde/density : Kernel Density Estimation plot
  • scatter : scatter plot
logx : boolean, default False
For line plots, use log scaling on x axis
logy : boolean, default False
For line plots, use log scaling on y axis
xticks : sequence
Values to use for the xticks
yticks : sequence
Values to use for the yticks

xlim : 2-tuple/list
ylim : 2-tuple/list
rot : int, default None
Rotation for ticks
secondary_y : boolean or sequence, default False
Whether to plot on the secondary y-axis. If a list/tuple, which columns to plot on the secondary y-axis
mark_right: boolean, default True
When using a secondary_y axis, should the legend label the axis of the various columns automatically
colormap : str or matplotlib colormap object, default None
Colormap to select colors from. If string, load colormap with that name from matplotlib.
kwds : keywords
Options to pass to matplotlib plotting method

ax_or_axes : matplotlib.AxesSubplot or list of them

pop(item)

Return item and drop from frame. Raise KeyError if not found.

pow(other, axis='columns', level=None, fill_value=None)

Binary operator pow with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

prod(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the product of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

prod : Series or DataFrame (if level specified)

product(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the product of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

prod : Series or DataFrame (if level specified)

quantile(q=0.5, axis=0, numeric_only=True)

Return values at the given quantile over requested axis, a la scoreatpercentile in scipy.stats

q : quantile, default 0.5 (50% quantile)
0 <= q <= 1
axis : {0, 1}
0 for row-wise, 1 for column-wise

quantiles : Series
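For one column, the default behavior can be sketched as linear interpolation between the two nearest ranks. This assumes it matches the default of scipy.stats.scoreatpercentile referenced above. quantile here is a hypothetical stdlib helper that skips None values.

```python
# Hypothetical stdlib sketch of a linearly interpolated quantile; not the pandas implementation.
from typing import Iterable, Optional


def quantile(values: Iterable[Optional[float]], q: float = 0.5) -> Optional[float]:
    """Linearly interpolated q-quantile of the non-missing values."""
    if not 0 <= q <= 1:
        raise ValueError("q must satisfy 0 <= q <= 1")
    data = sorted(v for v in values if v is not None)
    if not data:
        return None
    position = q * (len(data) - 1)  # fractional rank, 0-based
    lower = int(position)           # floor, since position >= 0
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (position - lower)


print(quantile([4.0, 1.0, 3.0, 2.0]))        # 2.5
print(quantile([4.0, 1.0, 3.0, 2.0], 0.25))  # 1.75
```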

query(expr, **kwargs)

Query the columns of a frame with a boolean expression.

expr : string
The query string to evaluate. The result of the evaluation of this expression is first passed to loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to __getitem__().
kwargs : dict
See the documentation for eval() for complete details on the keyword arguments accepted by query().

q : DataFrame or Series

This method uses the top-level eval() function to evaluate the passed query.

The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the same precedence as their boolean counterparts, ‘and’ and ‘or’. This is syntactically valid Python; however, the semantics are different.
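The precedence difference can be seen in plain Python. In ordinary Python, & binds more tightly than the comparison operators. As a result, the text a > 1 & b < 2 parses as the chained comparison a > (1 & b) < 2. Under query()'s default syntax, the same text means (a > 1) and (b < 2). The two scalars below are illustrative values only.

```python
# Illustrative values only; shows plain-Python precedence, not pandas internals.
a, b = 3, 5

# Ordinary Python: evaluated as a > (1 & b) < 2, i.e. (3 > 1) and (1 < 2).
python_semantics = a > 1 & b < 2

# query() reading of the same text: (a > 1) and (b < 2).
query_semantics = (a > 1) and (b < 2)

print(python_semantics, query_semantics)  # True False
```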

You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.

The index and columns attributes of the DataFrame instance are placed in the namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for this variable, and you can also use the name of the index to identify it in a query.

For further details and examples see the query documentation in indexing.

See also: pandas.eval, DataFrame.eval

>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.query('a > b')
>>> df[df.a > df.b]  # same result as the previous expression
radd(other, axis='columns', level=None, fill_value=None)

Binary operator radd with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rank(axis=0, numeric_only=None, method='average', na_option='keep', ascending=True)

Compute numerical data ranks (1 through n) along axis. Equal values are assigned a rank that is the average of the ranks of those values

axis : {0, 1}, default 0
Ranks over columns (0) or rows (1)
numeric_only : boolean, default None
Include only float, int, boolean data
method : {‘average’, ‘min’, ‘max’, ‘first’}
  • average: average rank of group
  • min: lowest rank in group
  • max: highest rank in group
  • first: ranks assigned in order they appear in the array
na_option : {‘keep’, ‘top’, ‘bottom’}
  • keep: leave NA values where they are
  • top: smallest rank if ascending
  • bottom: smallest rank if descending
ascending : boolean, default True
False for ranks by high (1) to low (N)

ranks : DataFrame
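For a single column, the tie-breaking methods can be sketched with the standard library. The sketch assumes None marks missing values and na_option='keep'; the top and bottom options are not modeled. rank here is a hypothetical helper, not the pandas implementation.

```python
# Hypothetical stdlib sketch of rank with na_option='keep'; not the pandas implementation.
from itertools import groupby
from typing import List, Optional, Sequence


def rank(values: Sequence[Optional[float]], method: str = "average",
         ascending: bool = True) -> List[Optional[float]]:
    """Rank non-missing values 1..n; missing values (None) keep a missing rank."""
    if method not in ("average", "min", "max", "first"):
        raise ValueError(f"unknown method {method!r}")
    # Python's sort is stable (also with reverse=True), so ties keep their
    # original order, which is what method='first' relies on.
    order = sorted(
        (i for i, v in enumerate(values) if v is not None),
        key=lambda i: values[i],
        reverse=not ascending,
    )
    ranks: List[Optional[float]] = [None] * len(values)
    taken = 0  # number of positions ranked so far
    for _, group in groupby(order, key=lambda i: values[i]):
        members = list(group)
        lowest, highest = taken + 1, taken + len(members)
        for offset, i in enumerate(members):
            if method == "average":
                ranks[i] = (lowest + highest) / 2
            elif method == "min":
                ranks[i] = float(lowest)
            elif method == "max":
                ranks[i] = float(highest)
            else:  # "first"
                ranks[i] = float(lowest + offset)
        taken = highest
    return ranks


print(rank([3, 1, 4, 1, None]))  # [3.0, 1.5, 4.0, 1.5, None]
```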

rdiv(other, axis='columns', level=None, fill_value=None)

Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

reindex(index=None, columns=None, **kwargs)

Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False

index, columns : array-like, optional (can be specified in order, or as keywords)
New labels / index to conform to. Preferably an Index object to avoid duplicating data
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed DataFrame
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill gap
copy : boolean, default True
Return a new object, even if the passed indexes are the same
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any “compatible” value
limit : int, default None
Maximum size gap to forward or backward fill
takeable : boolean, default False
Treat the passed values as positions rather than labels
>>> df.reindex(index=[date1, date2, date3], columns=['A', 'B', 'C'])

reindexed : DataFrame
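For one axis, the filling methods can be sketched with bisect on a sorted index. The sketch assumes a dict maps labels to values and that labels are mutually comparable. reindex here is a hypothetical stdlib helper. It applies fill_value only when no method is given, and it does not model limit or copy.

```python
# Hypothetical stdlib sketch of reindex filling on one axis; not the pandas implementation.
import bisect
from typing import Any, Dict, Iterable, Optional


def reindex(series: Dict[Any, Any], new_index: Iterable[Any],
            method: Optional[str] = None, fill_value: Any = None) -> Dict[Any, Any]:
    """Conform `series` to `new_index`, filling labels that were not present before."""
    old = sorted(series)
    out: Dict[Any, Any] = {}
    for label in new_index:
        if label in series:
            out[label] = series[label]
        elif method in ("pad", "ffill"):
            # Last old label before `label`: propagate its value forward.
            pos = bisect.bisect_right(old, label) - 1
            out[label] = series[old[pos]] if pos >= 0 else None
        elif method in ("backfill", "bfill"):
            # First old label after `label`: use the NEXT valid observation.
            pos = bisect.bisect_left(old, label)
            out[label] = series[old[pos]] if pos < len(old) else None
        elif method is None:
            out[label] = fill_value
        else:
            raise ValueError(f"unknown method {method!r}")
    return out


s = {1: "a", 3: "b", 5: "c"}
print(reindex(s, [0, 2, 3, 6], method="ffill"))  # {0: None, 2: 'a', 3: 'b', 6: 'c'}
```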

reindex_axis(labels, axis=0, method=None, level=None, copy=True, limit=None, fill_value=nan)

Conform input object to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False

labels : array-like
New labels / index to conform to. Preferably an Index object to avoid duplicating data

axis : {0, 1, ‘index’, ‘columns’}
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed object
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill gap
copy : boolean, default True
Return a new object, even if the passed indexes are the same
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level
limit : int, default None
Maximum size gap to forward or backward fill
>>> df.reindex_axis(['A', 'B', 'C'], axis=1)

See also: reindex, reindex_like

reindexed : DataFrame

reindex_like(other, method=None, copy=True, limit=None)

Return an object with indices matching those of other.

other : Object
method : string or None
copy : boolean, default True
limit : int, default None
Maximum size gap to forward or backward fill

Like calling s.reindex(index=other.index, columns=other.columns, method=...)

reindexed : same as input

rename(index=None, columns=None, **kwargs)

Alter axes using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.

index, columns : dict-like or function, optional
Transformation to apply to that axis values
copy : boolean, default True
Also copy underlying data
inplace : boolean, default False
Whether to return a new DataFrame. If True then value of copy is ignored.

renamed : DataFrame (new object)

rename_axis(mapper, axis=0, copy=True, inplace=False)

Alter index and / or columns using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.

mapper : dict-like or function, optional
axis : int or string, default 0
copy : boolean, default True
Also copy underlying data
inplace : boolean, default False

renamed : type of caller

reorder_levels(order, axis=0)

Rearrange index levels using input order. May not drop or duplicate levels

order : list of int or list of str
List representing new level order. Reference level by number (position) or by key (label).
axis : int
Where to reorder levels.

type of caller (new object)

replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None)

Replace values given in ‘to_replace’ with ‘value’.

to_replace : str, regex, list, dict, Series, numeric, or None

  • str or regex:

    • str: string exactly matching to_replace will be replaced with value
    • regex: regexs matching to_replace will be replaced with value
  • list of str, regex, or numeric:

    • First, if to_replace and value are both lists, they must be the same length.
    • Second, if regex=True then all of the strings in both lists will be interpreted as regexs otherwise they will match directly. This doesn’t matter much for value since there are only a few possible substitution regexes you can use.
    • str and regex rules apply as above.
  • dict:

    • Nested dictionaries, e.g., {‘a’: {‘b’: nan}}, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with nan. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.
    • Keys map to column names and values map to substitution values. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
  • None:

    • This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also None then this must be a nested dictionary or Series.

See the examples section for examples of each of these.

value : scalar, dict, list, str, regex, default None
Value to use to fill holes (e.g. 0), alternately a dict of values specifying which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.
inplace : boolean, default False
If True, replace in place. Note: this will modify any other views on this object (e.g. a column from a DataFrame). Returns the caller if this is True.
limit : int, default None
Maximum size gap to forward or backward fill
regex : bool or same types as to_replace, default False
Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Alternatively, regex can itself be a regular expression or a list, dict, or array of regular expressions, in which case to_replace must be None.
method : string, optional, {‘pad’, ‘ffill’, ‘bfill’}
The method to use for replacement when to_replace is a list.

See also: NDFrame.reindex, NDFrame.asfreq, NDFrame.fillna

filled : NDFrame

AssertionError
  • If regex is not a bool and to_replace is not None.
TypeError
  • If to_replace is a dict and value is not a list, dict, ndarray, or Series
  • If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.
ValueError
  • If to_replace and value are lists or ndarrays, but they are not the same length.

Notes:
  • Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.
  • Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.
  • This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
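The point that regular expressions only touch strings can be sketched with re.sub on a single column, assuming None marks missing data. replace_regex is a hypothetical helper. Numeric values pass through unchanged even when their text form would match. The same numbers stored as strings are replaced.

```python
# Hypothetical stdlib sketch of string-only regex replacement; not the pandas implementation.
import re
from typing import Any, List, Sequence


def replace_regex(values: Sequence[Any], pattern: str, repl: str) -> List[Any]:
    """Apply re.sub to string entries only; leave every other value untouched."""
    compiled = re.compile(pattern)
    return [compiled.sub(repl, v) if isinstance(v, str) else v for v in values]


column = ["1.5", 1.5, "2.25", None, "n/a"]
print(replace_regex(column, r"^\d+\.\d+$", "<float>"))
# ['<float>', 1.5, '<float>', None, 'n/a']
```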
resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0)

Convenience method for frequency conversion and resampling of regular time-series data.

rule : string
the offset string or object representing target conversion
how : string
Method for down- or re-sampling; defaults to ‘mean’ for downsampling

axis : int, optional, default 0
fill_method : string, default None
fill_method for upsampling
closed : {‘right’, ‘left’}
Which side of bin interval is closed
label : {‘right’, ‘left’}
Which bin edge label to label bucket with

convention : {‘start’, ‘end’, ‘s’, ‘e’}
kind : ‘period’ / ‘timestamp’
loffset : timedelta
Adjust the resampled time labels
limit : int, default None
Maximum size gap to fill when reindexing with fill_method
base : int, default 0
For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0
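The binning behind how and base can be sketched with the standard library. The sketch treats timestamps as whole minutes and assumes left-closed, left-labeled bins, which is the default for minute frequencies. resample_minutes is a hypothetical helper. Each bin starts at base plus a multiple of the frequency. Unlike pandas, the sketch omits empty bins instead of emitting them as missing.

```python
# Hypothetical stdlib sketch of downsampling with `base`; not the pandas implementation.
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple


def resample_minutes(samples: Iterable[Tuple[int, float]], freq: int = 5, base: int = 0,
                     how: Callable[[List[float]], float] = mean) -> Dict[int, float]:
    """Aggregate (minute, value) samples into bins of `freq` minutes offset by `base`."""
    bins: Dict[int, List[float]] = defaultdict(list)
    for minute, value in samples:
        # Start of the left-closed bin [start, start + freq) containing `minute`.
        start = base + ((minute - base) // freq) * freq
        bins[start].append(value)
    return {start: how(vals) for start, vals in sorted(bins.items())}


data = [(0, 1.0), (1, 3.0), (4, 5.0), (6, 7.0)]
print(resample_minutes(data))          # {0: 3.0, 5: 7.0}
print(resample_minutes(data, base=2))  # {-3: 2.0, 2: 6.0}
```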
reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')

For DataFrame with multi-level index, return new DataFrame with labeling information in the columns under the index names, defaulting to ‘level_0’, ‘level_1’, etc. if any are None. For a standard index, the index name will be used (if set), otherwise a default ‘index’ or ‘level_0’ (if ‘index’ is already taken) will be used.

level : int, str, tuple, or list, default None
Only remove the given levels from the index. Removes all levels by default
drop : boolean, default False
Do not try to insert index into dataframe columns. This resets the index to the default integer index.
inplace : boolean, default False
Modify the DataFrame in place (do not create a new object)
col_level : int or str, default 0
If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.
col_fill : object, default ‘’
If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.

reset : DataFrame

rfloordiv(other, axis='columns', level=None, fill_value=None)

Binary operator rfloordiv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rmod(other, axis='columns', level=None, fill_value=None)

Binary operator rmod with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rmul(other, axis='columns', level=None, fill_value=None)

Binary operator rmul with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rpow(other, axis='columns', level=None, fill_value=None)

Binary operator rpow with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rsub(other, axis='columns', level=None, fill_value=None)

Binary operator rsub with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rtruediv(other, axis='columns', level=None, fill_value=None)

Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

save(path)

Deprecated. Use to_pickle instead

select(crit, axis=0)

Return data corresponding to axis labels matching criteria

crit : function
To be called on each index (label). Should return True or False

axis : int

selection : type of caller

set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Set the DataFrame index (row labels) using one or more existing columns. By default yields a new object.

keys : column label or list of column labels / arrays
drop : boolean, default True
Delete columns to be used as the new index
append : boolean, default False
Whether to append columns to existing index
inplace : boolean, default False
Modify the DataFrame in place (do not create a new object)
verify_integrity : boolean, default False
Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method
>>> indexed_df = df.set_index(['A', 'B'])
>>> indexed_df2 = df.set_index(['A', [0, 1, 2, 0, 1, 2]])
>>> indexed_df3 = df.set_index([[0, 1, 2, 0, 1, 2]])

dataframe : DataFrame

set_value(index, col, value)

Put single value at passed column and index

index : row label
col : column label
value : scalar value

frame : DataFrame
If label pair is contained, will be reference to calling DataFrame, otherwise a new object
shape
shift(periods=1, freq=None, axis=0, **kwds)

Shift index by desired number of periods with an optional time freq

periods : int
Number of periods to move, can be positive or negative
freq : DateOffset, timedelta, or time rule string, optional
Increment to use from datetools module or time rule (e.g. ‘EOM’)

If freq is specified then the index values are shifted but the data is not realigned

shifted : same type as caller
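The difference between shifting data and shifting the index (when freq is given) can be sketched with the standard library. The sketch assumes a daily index of datetime.date labels. shift_data and shift_index are hypothetical helpers. Without freq, the labels stay put and the values move, leaving gaps (None). With freq, the values stay paired and each label moves by periods * freq.

```python
# Hypothetical stdlib sketch of shift with and without freq; not the pandas implementation.
from datetime import date, timedelta
from typing import Any, List, Optional, Sequence, Tuple


def shift_data(index: Sequence[date], values: Sequence[Any],
               periods: int = 1) -> List[Tuple[date, Optional[Any]]]:
    """Move values by `periods` positions; the index is unchanged."""
    n = len(values)
    k = min(abs(periods), n)
    if periods >= 0:
        moved = [None] * k + list(values[: n - k])
    else:
        moved = list(values[k:]) + [None] * k
    return list(zip(index, moved))


def shift_index(index: Sequence[date], values: Sequence[Any], periods: int = 1,
                freq: timedelta = timedelta(days=1)) -> List[Tuple[date, Any]]:
    """Move every label by periods * freq; the data is not realigned."""
    return [(label + periods * freq, v) for label, v in zip(index, values)]


idx = [date(2014, 1, 1), date(2014, 1, 2), date(2014, 1, 3)]
print(shift_data(idx, [10, 20, 30]))
print(shift_index(idx, [10, 20, 30]))
```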

skew(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased skew over requested axis. Normalized by N-1

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

skew : Series or DataFrame (if level specified)

sort(columns=None, column=None, axis=0, ascending=True, inplace=False)

Sort DataFrame either by labels (along either axis) or by the values in column(s)

columns : object
Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
ascending : boolean or list, default True
Sort ascending vs. descending. Specify list for multiple sort orders
axis : {0, 1}
Sort index/rows versus columns
inplace : boolean, default False
Sort the DataFrame without creating a new instance
>>> result = df.sort(['A', 'B'], ascending=[1, 0])

sorted : DataFrame
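A multi-column sort with mixed directions, as in ascending=[1, 0] above, can be sketched with the standard library. The approach applies stable sorts from the least significant column to the most significant one. sort_records is a hypothetical helper that operates on a list of dicts.

```python
# Hypothetical stdlib sketch of a multi-key, mixed-direction sort; not the pandas implementation.
from operator import itemgetter
from typing import Any, Dict, List, Sequence


def sort_records(records: Sequence[Dict[str, Any]], columns: Sequence[str],
                 ascending: Sequence[bool]) -> List[Dict[str, Any]]:
    """Sort by several columns, each with its own direction."""
    result = list(records)
    # list.sort is stable, so sorting by the last key first and the first key
    # last yields a lexicographic order over all keys.
    for column, asc in reversed(list(zip(columns, ascending))):
        result.sort(key=itemgetter(column), reverse=not asc)
    return result


rows = [{"A": 1, "B": 1}, {"A": 0, "B": 5}, {"A": 1, "B": 3}, {"A": 0, "B": 2}]
print(sort_records(rows, ["A", "B"], [1, 0]))
# [{'A': 0, 'B': 5}, {'A': 0, 'B': 2}, {'A': 1, 'B': 3}, {'A': 1, 'B': 1}]
```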

sort_index(axis=0, by=None, ascending=True, inplace=False, kind='quicksort')

Sort DataFrame either by labels (along either axis) or by the values in a column

axis : {0, 1}
Sort index/rows versus columns
by : object
Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
ascending : boolean or list, default True
Sort ascending vs. descending. Specify list for multiple sort orders
inplace : boolean, default False
Sort the DataFrame without creating a new instance
>>> result = df.sort_index(by=['A', 'B'], ascending=[True, False])

sorted : DataFrame

sortlevel(level=0, axis=0, ascending=True, inplace=False)

Sort multilevel index by chosen axis and primary level. Data will be lexicographically sorted by the chosen level followed by the other levels (in order)

level : int
axis : {0, 1}
ascending : boolean, default True
inplace : boolean, default False
Sort the DataFrame without creating a new instance

sorted : DataFrame

squeeze()

Squeeze length-1 dimensions

stack(level=-1, dropna=True)

Pivot a level of the (possibly hierarchical) column labels, returning a DataFrame (or Series in the case of an object with a single level of column labels) having a hierarchical index with a new inner-most level of row labels.

level : int, string, or list of these, default last level
Level(s) to stack, can pass level name
dropna : boolean, default True
Whether to drop rows in the resulting Frame/Series with no valid values
>>> s
     a   b
one  1.  2.
two  3.  4.
>>> s.stack()
one a    1
    b    2
two a    3
    b    4

stacked : DataFrame or Series
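For a frame with a single level of column labels, stacking can be sketched as a dict transformation. A nested dict of rows becomes a dict keyed by (row label, column label) pairs, assuming None marks missing values. stack here is a hypothetical stdlib helper and covers only the single-level case.

```python
# Hypothetical stdlib sketch of stack for single-level columns; not the pandas implementation.
from typing import Any, Dict, Hashable, Tuple


def stack(frame: Dict[Hashable, Dict[Hashable, Any]],
          dropna: bool = True) -> Dict[Tuple[Hashable, Hashable], Any]:
    """Move the column labels into a new inner-most level of the row index."""
    stacked: Dict[Tuple[Hashable, Hashable], Any] = {}
    for row_label, row in frame.items():
        for col_label, value in row.items():
            if dropna and value is None:
                continue  # skip missing cells when dropna=True
            stacked[(row_label, col_label)] = value
    return stacked


s = {"one": {"a": 1.0, "b": 2.0}, "two": {"a": 3.0, "b": 4.0}}
print(stack(s))
# {('one', 'a'): 1.0, ('one', 'b'): 2.0, ('two', 'a'): 3.0, ('two', 'b'): 4.0}
```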

std(axis=None, skipna=None, level=None, ddof=1, **kwargs)

Return unbiased standard deviation over requested axis. Normalized by N-1

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

stdev : Series or DataFrame (if level specified)
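The N-1 normalization (ddof=1) can be written out for one column with the standard library, assuming None marks missing values. std here is a hypothetical helper. With ddof=1 it agrees with statistics.stdev on the non-missing data.

```python
# Hypothetical stdlib sketch of std with a configurable ddof; not the pandas implementation.
import math
from typing import Iterable, Optional


def std(values: Iterable[Optional[float]], ddof: int = 1,
        skipna: bool = True) -> Optional[float]:
    """Standard deviation with divisor N - ddof; None plays the role of NaN."""
    values = list(values)
    if not skipna and any(v is None for v in values):
        return None  # a missing value makes the result missing when skipna=False
    data = [v for v in values if v is not None]
    n = len(data)
    if n - ddof <= 0:
        return None  # not enough observations for the requested divisor
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - ddof))


sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(std(sample))          # sqrt(32 / 7), about 2.138
print(std(sample, ddof=0))  # 2.0
```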

sub(other, axis='columns', level=None, fill_value=None)

Binary operator sub with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

subtract(other, axis='columns', level=None, fill_value=None)

Binary operator sub with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

sum(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the sum of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

sum : Series or DataFrame (if level specified)

swapaxes(axis1, axis2, copy=True)

Interchange axes and swap values axes appropriately

y : same as input

swaplevel(i, j, axis=0)

Swap levels i and j in a MultiIndex on a particular axis

i, j : int, string (can be mixed)
Level of index to be swapped. Can pass level name as string.

swapped : type of caller (new object)
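A minimal sketch of swaplevel on a two-level row index, assuming a recent pandas release; the level names outer and inner are made up for illustration.

```python
import pandas as pd

# Two-level row index with named levels.
idx = pd.MultiIndex.from_tuples(
    [("one", "a"), ("one", "b"), ("two", "a")], names=["outer", "inner"]
)
df = pd.DataFrame({"v": [1, 2, 3]}, index=idx)

# Swap level 0 and level 1 of the row index; the data are unchanged.
swapped = df.swaplevel(0, 1)

print(swapped)
```

Levels can also be passed by name, e.g. df.swaplevel("outer", "inner").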

tail(n=5)

Return the last n rows

take(indices, axis=0, convert=True, is_copy=True)

Analogous to ndarray.take

indices : list / array of ints
axis : int, default 0
convert : translate negative to positive indices (default)
is_copy : mark the returned frame as a copy

taken : type of caller
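A minimal sketch of positional selection with take, assuming a recent pandas release; only indices and axis are passed because the convert and is_copy options shown above are not available in newer releases. The frame is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"v": [10, 20, 30, 40], "w": [1, 2, 3, 4]}, index=list("abcd"))

# Positions, not labels: -1 is the last row, 0 the first.
rows = df.take([-1, 0])

# axis=1 selects columns by position.
cols = df.take([1], axis=1)

print(rows, cols, sep="\n")
```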

to_clipboard(excel=None, sep=None, **kwargs)

Attempt to write a text representation of the object to the system clipboard. This can be pasted into Excel, for example.

excel : boolean, defaults to True
if True, use the provided separator, writing in a csv format for allowing easy pasting into Excel. If False, write a string representation of the object to the clipboard

sep : optional, defaults to tab
Other keywords are passed to to_csv

Requirements for your platform
  • Linux: xclip, or xsel (with gtk or PyQt4 modules)
  • Windows: none
  • OS X: none
to_csv(path_or_buf, sep=',', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, mode='w', nanRep=None, encoding=None, quoting=None, line_terminator='\n', chunksize=None, tupleize_cols=False, date_format=None, **kwds)

Write DataFrame to a comma-separated values (csv) file

path_or_buf : string or file handle / StringIO
File path
sep : character, default ‘,’
Field delimiter for the output file.
na_rep : string, default ‘’
Missing data representation
float_format : string, default None
Format string for floating point numbers
cols : sequence, optional
Columns to write
header : boolean or list of string, default True
Write out column names. If a list of string is given it is assumed to be aliases for the column names
index : boolean, default True
Write row names (index)
index_label : string or sequence, or False, default None
Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R
nanRep : None
deprecated, use na_rep
mode : str
Python write mode, default ‘w’
encoding : string, optional
a string representing the encoding to use if the contents are non-ascii, for python versions prior to 3
line_terminator : string, default ‘\n’
The newline character or character sequence to use in the output file
quoting : optional constant from csv module
defaults to csv.QUOTE_MINIMAL
chunksize : int or None
rows to write at a time
tupleize_cols : boolean, default False
write MultiIndex columns as a list of tuples (if True) or in the new, expanded format (if False)
date_format : string, default None
Format string for datetime objects.
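A minimal round-trip sketch, assuming a recent pandas release: it writes to an in-memory buffer instead of a file path and uses sep, na_rep and index from the parameter list above. The column names and values are illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["A", "B"], "score": [1.5, None]})

# Write to an in-memory text buffer, using ';' and an explicit missing-value marker.
buf = io.StringIO()
df.to_csv(buf, sep=";", na_rep="NA", index=False)
text = buf.getvalue()

# Read it back; "NA" is recognised as missing by read_csv's default na_values.
back = pd.read_csv(io.StringIO(text), sep=";")

print(text)
```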
to_dense()

Return dense representation of NDFrame (as opposed to sparse)

to_dict(outtype='dict')

Convert DataFrame to dictionary.

outtype : str {‘dict’, ‘list’, ‘series’, ‘records’}
Determines the type of the values of the dictionary. The default dict is a nested dictionary {column -> {index -> value}}. list returns {column -> list(values)}. series returns {column -> Series(values)}. records returns [{columns -> value}]. Abbreviations are allowed.

result : dict like {column -> {index -> value}}
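A short sketch of the output shapes, assuming a recent pandas release. This older signature calls the argument outtype, while later releases call it orient, so the sketch passes it positionally. The frame is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]}, index=["r1", "r2"])

nested = df.to_dict()             # {column -> {index -> value}}
as_lists = df.to_dict("list")     # {column -> [values]}
records = df.to_dict("records")   # [{column -> value}, ...], one dict per row

print(nested, as_lists, records, sep="\n")
```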

to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True)

Write DataFrame to an Excel sheet

excel_writer : string or ExcelWriter object
File path or existing ExcelWriter
sheet_name : string, default ‘Sheet1’
Name of sheet which will contain DataFrame
na_rep : string, default ‘’
Missing data representation
float_format : string, default None
Format string for floating point numbers
cols : sequence, optional
Columns to write
header : boolean or list of string, default True
Write out column names. If a list of string is given it is assumed to be aliases for the column names
index : boolean, default True
Write row names (index)
index_label : string or sequence, default None
Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
startrow :
upper left cell row to dump data frame
startcol :
upper left cell column to dump data frame
engine : string, default None
write engine to use - you can also set this via the options io.excel.xlsx.writer, io.excel.xls.writer, and io.excel.xlsm.writer.
merge_cells : boolean, default True
Write MultiIndex and Hierarchical Rows as merged cells.

If passing an existing ExcelWriter object, then the sheet will be added to the existing workbook. This can be used to save different DataFrames to one workbook:

>>> from pandas import ExcelWriter
>>> writer = ExcelWriter('output.xlsx')
>>> df1.to_excel(writer, sheet_name='Sheet1')
>>> df2.to_excel(writer, sheet_name='Sheet2')
>>> writer.save()
to_gbq(destination_table, schema=None, col_order=None, if_exists='fail', **kwargs)

Write a DataFrame to a Google BigQuery table.

If the table exists, the DataFrame will be appended. If not, a new table will be created, in which case the schema will have to be specified. By default, rows will be written in the order they appear in the DataFrame, though the user may specify an alternative order.

destination_table : string
name of table to be written, in the form ‘dataset.tablename’
schema : sequence (optional)
list of column types in order for data to be inserted, e.g. [‘INTEGER’, ‘TIMESTAMP’, ‘BOOLEAN’]
col_order : sequence (optional)
order which columns are to be inserted, e.g. [‘primary_key’, ‘birthday’, ‘username’]
if_exists : {‘fail’, ‘replace’, ‘append’} (optional)
  • fail: If table exists, do nothing.
  • replace: If table exists, drop it, recreate it, and insert data.
  • append: If table exists, insert data. Create it if it does not exist.

kwargs are passed to the Client constructor

SchemaMissing :
Raised if the ‘if_exists’ parameter is set to ‘replace’, but no schema is specified
TableExists :
Raised if the specified ‘destination_table’ exists but the ‘if_exists’ parameter is set to ‘fail’ (the default)
InvalidSchema :
Raised if the ‘schema’ parameter does not match the provided DataFrame
to_hdf(path_or_buf, key, **kwargs)

Write the contained data to an HDF5 file using HDFStore

path_or_buf : the path (string) or buffer to put the store
key : string
identifier for the group in the store

mode : optional, {‘a’, ‘w’, ‘r’, ‘r+’}, default ‘a’

'r'
Read-only; no data can be modified.
'w'
Write; a new file is created (an existing file with the same name would be deleted).
'a'
Append; an existing file is opened for reading and writing, and if the file does not exist it is created.
'r+'
It is similar to 'a', but the file must already exist.
format : ‘fixed(f)|table(t)’, default is ‘fixed’
fixed(f) : Fixed format
Fast writing/reading. Not-appendable, nor searchable
table(t) : Table format
Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data
append : boolean, default False
For Table formats, append the input data to the existing table
complevel : int, 1-9, default 0
If a complib is specified compression will be applied where possible
complib : {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’, None}, default None
If complevel is > 0 apply compression to objects written in the store wherever possible
fletcher32 : bool, default False
If applying compression use the fletcher32 checksum
to_html(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, force_unicode=None, bold_rows=True, classes=None, escape=True, max_rows=None, max_cols=None, show_dimensions=False)

Render a DataFrame as an HTML table.

to_html-specific options:

bold_rows : boolean, default True
Make the row labels bold in the output
classes : str or list or tuple, default None
CSS class(es) to apply to the resulting html table
escape : boolean, default True
Convert the characters <, >, and & to HTML-safe sequences.
max_rows : int, optional
Maximum number of rows to show before truncating. If None, show all.
max_cols : int, optional
Maximum number of columns to show before truncating. If None, show all.
frame : DataFrame
object to render
buf : StringIO-like, optional
buffer to write to
columns : sequence, optional
the subset of columns to write; default None writes all columns
col_space : int, optional
the minimum width of each column
header : bool, optional
whether to print column labels, default True
index : bool, optional
whether to print index (row) labels, default True
na_rep : string, optional
string representation of NaN to use, default ‘NaN’
formatters : list or dict of one-parameter functions, optional
formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
float_format : one-parameter function, optional
formatter function to apply to columns’ elements if they are floats, default None
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
justify : {‘left’, ‘right’}, default None
Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
index_names : bool, optional
Prints the names of the indexes, default True
force_unicode : bool, default False
Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.

formatted : string (or unicode, depending on data and options)

to_json(path_or_buf=None, orient=None, date_format='epoch', double_precision=10, force_ascii=True, date_unit='ms', default_handler=None)

Convert the object to a JSON string.

Note: NaNs and None will be converted to null, and datetime objects will be converted to UNIX timestamps.

path_or_buf : the path or buffer to write the result string
if this is None, return the converted string

orient : string

  • Series
    • default is ‘index’
    • allowed values are: {‘split’,’records’,’index’}
  • DataFrame
    • default is ‘columns’
    • allowed values are: {‘split’,’records’,’index’,’columns’,’values’}
  • The format of the JSON string
    • split : dict like {index -> [index], columns -> [columns], data -> [values]}
    • records : list like [{column -> value}, ... , {column -> value}]
    • index : dict like {index -> {column -> value}}
    • columns : dict like {column -> {index -> value}}
    • values : just the values array
date_format : {‘epoch’, ‘iso’}
Type of date conversion. epoch = epoch milliseconds, iso = ISO8601, default is epoch.
double_precision : The number of decimal places to use when encoding
floating point values, default 10.

force_ascii : force encoded string to be ASCII, default True.
date_unit : string, default ‘ms’ (milliseconds)
The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.
default_handler : callable, default None
Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serialisable object.

result : the JSON string if path_or_buf is None, otherwise None
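A sketch of the main orient layouts listed above, assuming a recent pandas release. It parses the output with the standard json module so the checks do not depend on key order or whitespace; the frame is illustrative.

```python
import json

import pandas as pd

df = pd.DataFrame({"x": [1, 2]}, index=["a", "b"])

default = json.loads(df.to_json())                  # DataFrame default: orient='columns'
split = json.loads(df.to_json(orient="split"))      # index / columns / data
records = json.loads(df.to_json(orient="records"))  # one object per row

# Missing values are written as JSON null, which json.loads turns into None.
with_nan = json.loads(pd.DataFrame({"v": [1.0, None]}).to_json(orient="values"))

print(default, split, records, with_nan, sep="\n")
```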

to_latex(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=True, force_unicode=None)

Render a DataFrame to a tabular environment table. You can splice this into a LaTeX document.

to_latex-specific options:

bold_rows : boolean, default True
Make the row labels bold in the output
frame : DataFrame
object to render
buf : StringIO-like, optional
buffer to write to
columns : sequence, optional
the subset of columns to write; default None writes all columns
col_space : int, optional
the minimum width of each column
header : bool, optional
whether to print column labels, default True
index : bool, optional
whether to print index (row) labels, default True
na_rep : string, optional
string representation of NaN to use, default ‘NaN’
formatters : list or dict of one-parameter functions, optional
formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
float_format : one-parameter function, optional
formatter function to apply to columns’ elements if they are floats, default None
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
justify : {‘left’, ‘right’}, default None
Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
index_names : bool, optional
Prints the names of the indexes, default True
force_unicode : bool, default False
Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.

formatted : string (or unicode, depending on data and options)

to_msgpack(path_or_buf=None, **kwargs)

msgpack (serialize) object to input file path

THIS IS AN EXPERIMENTAL LIBRARY and the storage format may not be stable until a future release.

path : string File path, buffer-like, or None
if None, return generated string
append : boolean whether to append to an existing msgpack
(default is False)
compress : type of compressor (zlib or blosc), default to None (no
compression)
to_panel()

Transform long (stacked) format (DataFrame) into wide (3D, Panel) format.

Currently the index of the DataFrame must be a 2-level MultiIndex. This may be generalized later

panel : Panel

to_period(freq=None, axis=0, copy=True)

Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed)

freq : string, default None (inferred from the index)
axis : {0, 1}, default 0
The axis to convert (the index by default)
copy : boolean, default True
If False then underlying input data is not copied

ts : TimeSeries with PeriodIndex

to_pickle(path)

Pickle (serialize) object to input file path

path : string
File path
to_records(index=True, convert_datetime64=True)

Convert DataFrame to record array. Index will be put in the ‘index’ field of the record array if requested

index : boolean, default True
Include index in resulting record array, stored in ‘index’ field
convert_datetime64 : boolean, default True
Whether to convert the index to datetime.datetime if it is a DatetimeIndex

y : recarray
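A minimal sketch of the index option, assuming a recent pandas release; when the index is unnamed, the sketch relies on it being stored in a field called index, as described above. The frame is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2]}, index=["a", "b"])

rec = df.to_records()                 # index included as the 'index' field
rec_no_index = df.to_records(index=False)

print(rec.dtype.names, rec_no_index.dtype.names)
```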

to_sparse(fill_value=None, kind='block')

Convert to SparseDataFrame

fill_value : float, default NaN
kind : {‘block’, ‘integer’}

y : SparseDataFrame

to_sql(name, con, flavor='sqlite', if_exists='fail', **kwargs)

Write records stored in a DataFrame to a SQL database.

name : str
Name of SQL table

con : an open SQL database connection object
flavor : {‘sqlite’, ‘mysql’, ‘oracle’}, default ‘sqlite’
if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’

  • fail: If table exists, do nothing.
  • replace: If table exists, drop it, recreate it, and insert data.
  • append: If table exists, insert data. Create it if it does not exist.
to_stata(fname, convert_dates=None, write_index=True, encoding='latin-1', byteorder=None)

Write the DataFrame to a Stata binary dta file

fname : file path or buffer
Where to save the dta file.
convert_dates : dict
Dictionary mapping column of datetime types to the stata internal format that you want to use for the dates. Options are ‘tc’, ‘td’, ‘tm’, ‘tw’, ‘th’, ‘tq’, ‘ty’. Column can be either a number or a name.
encoding : str
Default is latin-1. Note that Stata does not support unicode.
byteorder : str
Can be “>”, “<”, “little”, or “big”. The default is None which uses sys.byteorder
>>> from pandas.io.stata import StataWriter
>>> writer = StataWriter('./data_file.dta', data)
>>> writer.write_file()

Or with dates

>>> writer = StataWriter('./date_data_file.dta', data, {2 : 'tw'})
>>> writer.write_file()
to_string(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, nanRep=None, index_names=True, justify=None, force_unicode=None, line_width=None, max_rows=None, max_cols=None, show_dimensions=False)

Render a DataFrame to a console-friendly tabular output.

frame : DataFrame
object to render
buf : StringIO-like, optional
buffer to write to
columns : sequence, optional
the subset of columns to write; default None writes all columns
col_space : int, optional
the minimum width of each column
header : bool, optional
whether to print column labels, default True
index : bool, optional
whether to print index (row) labels, default True
na_rep : string, optional
string representation of NaN to use, default ‘NaN’
formatters : list or dict of one-parameter functions, optional
formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
float_format : one-parameter function, optional
formatter function to apply to columns’ elements if they are floats, default None
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
justify : {‘left’, ‘right’}, default None
Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
index_names : bool, optional
Prints the names of the indexes, default True
force_unicode : bool, default False
Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.

formatted : string (or unicode, depending on data and options)

to_timestamp(freq=None, how='start', axis=0, copy=True)

Cast to DatetimeIndex of timestamps, at beginning of period

freq : string, default frequency of PeriodIndex
Desired frequency
how : {‘s’, ‘e’, ‘start’, ‘end’}
Convention for converting period to timestamp; start of period vs. end
axis : {0, 1} default 0
The axis to convert (the index by default)
copy : boolean, default True
If False then underlying input data is not copied

df : DataFrame with DatetimeIndex

to_wide(*args, **kwargs)
transpose()

Transpose index and columns

truediv(other, axis='columns', level=None, fill_value=None)

Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

truncate(before=None, after=None, axis=None, copy=True)

Truncates a sorted NDFrame before and/or after some particular dates.

before : date
Truncate before date
after : date
Truncate after date

axis : the truncation axis, defaults to the stat axis
copy : boolean, default True
Return a copy of the truncated section

truncated : type of caller
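A minimal sketch of truncating a sorted DatetimeIndex, assuming a recent pandas release; the dates are illustrative. Both bounds are inclusive.

```python
import pandas as pd

# Five consecutive days; truncate expects a sorted index.
idx = pd.date_range("2024-01-01", periods=5, freq="D")
df = pd.DataFrame({"v": range(5)}, index=idx)

# Keep rows from 2 January through 4 January, inclusive.
out = df.truncate(before=pd.Timestamp("2024-01-02"), after=pd.Timestamp("2024-01-04"))

print(out)
```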

tshift(periods=1, freq=None, axis=0, **kwds)

Shift the time index, using the index’s frequency if available

periods : int
Number of periods to move, can be positive or negative
freq : DateOffset, timedelta, or time rule string, default None
Increment to use from datetools module or time rule (e.g. ‘EOM’)
axis : int or basestring
Corresponds to the axis that contains the Index

If freq is not specified then tries to use the freq or inferred_freq attributes of the index. If neither of those attributes exist, a ValueError is raised

shifted : NDFrame

tz_convert(tz, axis=0, copy=True)

Convert TimeSeries to target time zone. If it is time zone naive, it will be localized to the passed time zone.

tz : string or pytz.timezone object
copy : boolean, default True
Also make a copy of the underlying data
tz_localize(tz, axis=0, copy=True, infer_dst=False)

Localize tz-naive TimeSeries to target time zone

tz : string or pytz.timezone object
copy : boolean, default True
Also make a copy of the underlying data
infer_dst : boolean, default False
Attempt to infer fall dst-transition times based on order
unstack(level=-1)

Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels. If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex)

level : int, string, or list of these, default -1 (last level)
Level(s) of index to unstack, can pass level name

DataFrame.pivot : Pivot a table based on column values.
DataFrame.stack : Pivot a level of the column labels (inverse operation from unstack).
>>> import numpy as np
>>> import pandas as pd
>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
...                                    ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one  a    1.0
     b    2.0
two  a    3.0
     b    4.0
dtype: float64
>>> s.unstack(level=-1)
       a    b
one  1.0  2.0
two  3.0  4.0
>>> s.unstack(level=0)
   one  two
a  1.0  3.0
b  2.0  4.0
>>> df = s.unstack(level=0)
>>> df.unstack()
one  a    1.0
     b    2.0
two  a    3.0
     b    4.0
dtype: float64

unstacked : DataFrame or Series

update(other, join='left', overwrite=True, filter_func=None, raise_conflict=False)

Modify DataFrame in place using non-NA values from passed DataFrame. Aligns on indices

other : DataFrame, or object coercible into a DataFrame
join : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
overwrite : boolean, default True
If True then overwrite values for common keys in the calling frame
filter_func : callable(1d-array) -> 1d-array<boolean>, default None
Can choose to replace values other than NA. Return True for values that should be updated
raise_conflict : boolean
If True, will raise an error if the DataFrame and other both contain data in the same place.
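A minimal sketch of the in-place behaviour and the overwrite flag, assuming a recent pandas release; the frames are illustrative. NA values in other never replace existing data.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
other = pd.DataFrame({"a": [None, 20.0, None]})

# Modifies df in place (returns None); only the non-NA value in other is used,
# and column b is untouched because other has no such column.
ret = df.update(other)

# With overwrite=False, only cells that are NA in the caller are filled.
df2 = pd.DataFrame({"a": [None, 2.0]})
df2.update(pd.DataFrame({"a": [5.0, 6.0]}), overwrite=False)

print(df, df2, sep="\n")
```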
values

Numpy representation of NDFrame

var(axis=None, skipna=None, level=None, ddof=1, **kwargs)

Return unbiased variance over requested axis. Normalized by N-1 by default; this can be changed using the ddof argument.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

variance : Series or DataFrame (if level specified)
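A small worked example of the ddof argument, assuming a recent pandas release. For the values 1, 2, 3, 4 the mean is 2.5 and the sum of squared deviations is 2.25 + 0.25 + 0.25 + 2.25 = 5, so the variance is 5/3 with ddof=1 (N-1) and 5/4 with ddof=0 (N).

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

unbiased = df.var()["x"]          # ddof=1: divide by N - 1 = 3
population = df.var(ddof=0)["x"]  # ddof=0: divide by N = 4

print(unbiased, population)
```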

where(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)

Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.

cond : boolean NDFrame or array
other : scalar or NDFrame
inplace : boolean, default False
Whether to perform the operation in place on the data
axis : alignment axis if needed, default None
level : alignment level if needed, default None
try_cast : boolean, default False
Try to cast the result back to the input type (if possible).
raise_on_error : boolean, default True
Whether to raise on invalid data types (e.g. trying to where on strings)

wh : same type as caller
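A minimal sketch of where, assuming a recent pandas release; the frame is illustrative. Entries where cond is False are taken from other, which defaults to NaN.

```python
import pandas as pd

df = pd.DataFrame({"v": [-1, 2, -3]})

clipped = df.where(df > 0, 0)  # keep positives, replace the rest with 0
masked = df.where(df > 0)      # default other: NaN where cond is False

print(clipped, masked, sep="\n")
```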

xs(key, axis=0, level=None, copy=True, drop_level=True)

Returns a cross-section (row(s) or column(s)) from the Series/DataFrame. Defaults to cross-section on the rows (axis=0).

key : object
Some label contained in the index, or partially in a MultiIndex
axis : int, default 0
Axis to retrieve cross-section on
level : object, defaults to first n levels (n=1 or len(key))
In case of a key partially contained in a MultiIndex, indicate which levels are used. Levels can be referred by label or position.
copy : boolean, default True
Whether to make a copy of the data
drop_level : boolean, default True
If False, returns object with same levels as self.
>>> df
   A  B  C
a  4  5  2
b  4  0  9
c  9  7  3
>>> df.xs('a')
A    4
B    5
C    2
Name: a
>>> df.xs('C', axis=1)
a    2
b    9
c    3
Name: C
>>> s = df.xs('a', copy=False)
>>> s['A'] = 100
>>> df
     A  B  C
a  100  5  2
b    4  0  9
c    9  7  3
>>> df
                    A  B  C  D
first second third
bar   one    1      4  1  8  9
      two    1      7  5  5  0
baz   one    1      6  6  8  0
      three  2      5  3  5  3
>>> df.xs(('baz', 'three'))
       A  B  C  D
third
2      5  3  5  3
>>> df.xs('one', level=1)
             A  B  C  D
first third
bar   1      4  1  8  9
baz   1      6  6  8  0
>>> df.xs(('baz', 2), level=[0, 'third'])
        A  B  C  D
second
three   5  3  5  3

xs : Series or DataFrame

class Fred2.Core.Result.CleavageFragmentPredictionResult(data=None, index=None, columns=None, dtype=None, copy=False)

Bases: Fred2.Core.Result.AResult

A CleavageFragmentPredictionResult object is a pandas.DataFrame with single-indexing, where the column IDs are the names of the prediction methods (holding their prediction scores) and the row IDs are the Peptide objects.

CleavageFragmentPredictionResult:

  Peptide Obj   Method Name
  Peptide1      -15.34
  Peptide2       23.34
T

Transpose index and columns

abs()

Return an object with absolute value taken. Only applicable to objects that are all numeric

abs: type of caller

add(other, axis='columns', level=None, fill_value=None)

Binary operator add with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
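A short sketch of the axis argument with a Series operand, assuming a recent pandas release; the labels r1 and r2 are illustrative. With axis='index' the Series is matched against the row labels and added to every column.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]}, index=["r1", "r2"])
s = pd.Series({"r1": 100, "r2": 200})

# Match s against the row index and broadcast it across the columns.
out = df.add(s, axis="index")

print(out)
```

With the default axis='columns', s would instead be matched against the column labels a and b, which it does not share.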

add_prefix(prefix)

Concatenate the prefix string with the info-axis labels (the column names, for a DataFrame).

prefix : string

with_prefix : type of caller

add_suffix(suffix)

Concatenate the suffix string with the info-axis labels (the column names, for a DataFrame).

suffix : string

with_suffix : type of caller

align(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0)

Align two objects on their axes with the specified join method for each axis Index

other : DataFrame or Series
join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
axis : allowed axis of the other object, default None
Align on index (0), columns (1), or both (None)
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level
copy : boolean, default True
Always returns new objects. If copy=False and no reindexing is required then original objects are returned.
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any “compatible” value

method : str, default None
limit : int, default None
fill_axis : {0, 1}, default 0
Filling axis, method and limit
(left, right) : (type of input, type of other)
Aligned objects
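A minimal sketch of inner and outer alignment, assuming a recent pandas release; the two frames share only the label b.

```python
import pandas as pd

left = pd.DataFrame({"v": [1, 2]}, index=["a", "b"])
right = pd.DataFrame({"v": [3, 4]}, index=["b", "c"])

# Inner join keeps only the shared label.
l_in, r_in = left.align(right, join="inner")

# Outer join keeps all labels; fill_value replaces the introduced gaps.
l_out, r_out = left.align(right, join="outer", fill_value=0)

print(l_out, r_out, sep="\n")
```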
all(axis=None, bool_only=None, skipna=True, level=None, **kwargs)

Return whether all elements are True over requested axis.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
bool_only : boolean, default None
Only include boolean data.

all : Series (or DataFrame if level specified)

any(axis=None, bool_only=None, skipna=True, level=None, **kwargs)

Return whether any element is True over requested axis.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
bool_only : boolean, default None
Only include boolean data.

any : Series (or DataFrame if level specified)
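A short sketch contrasting all and any on a boolean frame, assuming a recent pandas release; the data are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [True, True], "b": [True, False]})

all_cols = df.all()        # per column: a -> True, b -> False
any_cols = df.any()        # per column: a -> True, b -> True
all_rows = df.all(axis=1)  # per row: True, False

print(all_cols, any_cols, all_rows, sep="\n")
```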

append(other, ignore_index=False, verify_integrity=False)

Append the rows of other to the end of this frame, returning a new object. Columns not in this frame are added as new columns.

other : DataFrame or list of Series/dict-like objects
ignore_index : boolean, default False
If True do not use the index labels. Useful for gluing together record arrays
verify_integrity : boolean, default False
If True, raise ValueError on creating index with duplicates

If a list of dict is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged

appended : DataFrame

apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)

Applies function along input axis of DataFrame.

Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty.

func : function
Function to apply to each column/row
axis : {0, 1}
  • 0 : apply function to each column
  • 1 : apply function to each row
broadcast : boolean, default False
For aggregation functions, return object of same size with values propagated
reduce : boolean or None, default None
Try to apply reduction procedures. If the DataFrame is empty, apply will use reduce to determine whether the result should be a Series or a DataFrame. If reduce is None (the default), apply’s return value will be guessed by calling func on an empty Series (note: while guessing, exceptions raised by func will be ignored). If reduce is True a Series will always be returned, and if False a DataFrame will always be returned.
raw : boolean, default False
If False, convert each row or column into a Series. If raw=True the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance
args : tuple
Positional arguments to pass to function in addition to the array/series

Additional keyword arguments will be passed as keywords to the function

>>> import numpy
>>> df.apply(numpy.sqrt) # returns DataFrame
>>> df.apply(numpy.sum, axis=0) # equiv to df.sum(0)
>>> df.apply(numpy.sum, axis=1) # equiv to df.sum(1)

DataFrame.applymap: For elementwise operations

applied : Series or DataFrame
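A sketch of apply with user-defined functions, assuming a recent pandas release; the lambdas are illustrative. With axis=0 each column is passed as a Series, and with axis=1 each row is, so row values can be accessed by column label.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# axis=0 (default): the function receives one column at a time.
ranges = df.apply(lambda col: col.max() - col.min())

# axis=1: the function receives one row at a time.
products = df.apply(lambda row: row["a"] * row["b"], axis=1)

print(ranges, products, sep="\n")
```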

applymap(func)

Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame

func : function
Python function, returns a single value from a single value

applied : DataFrame

DataFrame.apply : For operations on rows/columns

as_blocks(columns=None)

Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype.

Columns are presented in sorted order unless a specific list of columns is provided.

NOTE: the dtypes of the blocks WILL BE PRESERVED HERE (unlike in as_matrix)
columns : array-like
Specific column order

values : a dict of dtype -> Constructor Types

as_matrix(columns=None)

Convert the frame to its Numpy-array matrix representation. Columns are presented in sorted order unless a specific list of columns is provided.

NOTE: the dtype will be a lowest-common-denominator dtype (implicit upcasting); that is to say, if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.

e.g. float16 and float32 give float32; float16, float32 and float64 give float64; int32 and uint8 give int32.
values : ndarray
If the caller is heterogeneous and contains booleans or objects, the result will be of dtype=object
asfreq(freq, method=None, how=None, normalize=False)

Convert all TimeSeries inside to specified frequency using DateOffset objects. Optionally provide fill method to pad/backfill missing values.

freq : DateOffset object, or string
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}

Method to use for filling holes in reindexed Series:
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill gap
how : {‘start’, ‘end’}, default end
For PeriodIndex only, see PeriodIndex.asfreq
normalize : bool, default False
Whether to reset output index to midnight

converted : type of caller
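A sketch of upsampling with and without a fill method, assuming a current pandas; the daily series is illustrative, and pd.offsets.Hour(12) is used as the DateOffset.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=3, freq="D")
daily = pd.DataFrame({"v": [1.0, 2.0, 3.0]}, index=idx)

# Upsample to 12-hour steps: the new 12:00 rows are NaN by default
upsampled = daily.asfreq(pd.offsets.Hour(12))
# method="pad" carries the last valid observation forward into the new rows
padded = daily.asfreq(pd.offsets.Hour(12), method="pad")

print(padded)
```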

astype(dtype, copy=True, raise_on_error=True)

Cast object to input numpy.dtype. Return a copy when copy=True (be really careful with this!)

dtype : numpy.dtype or Python type
raise_on_error : raise on invalid input

casted : type of caller

at
at_time(time, asof=False)

Select values at particular time of day (e.g. 9:30AM)

time : datetime.time or string

values_at_time : type of caller

axes
between_time(start_time, end_time, include_start=True, include_end=True)

Select values between particular times of the day (e.g., 9:00-9:30 AM)

start_time : datetime.time or string
end_time : datetime.time or string
include_start : boolean, default True
include_end : boolean, default True

values_between_time : type of caller
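A sketch of at_time and between_time on an intraday index, assuming a current pandas with default arguments (both endpoints included); the timestamps and the price column are illustrative.

```python
import pandas as pd

# 09:00, 09:15, ..., 10:15
idx = pd.date_range("2024-01-01 09:00", periods=6, freq="15min")
df = pd.DataFrame({"price": range(6)}, index=idx)

at_930 = df.at_time("09:30")                 # exactly 09:30
window = df.between_time("09:00", "09:30")   # 09:00, 09:15 and 09:30

print(window)
```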

bfill(axis=0, inplace=False, limit=None, downcast=None)

Synonym for NDFrame.fillna(method='bfill')

blocks

Internal property; synonym for as_blocks()

bool()

Return the bool of a single-element PandasObject. This must be a boolean scalar value, either True or False.

Raise a ValueError if the PandasObject does not have exactly 1 element, or if that element is not boolean.

boxplot(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, **kwds)

Make a box plot from DataFrame column/columns optionally grouped (stratified) by one or more columns

data : DataFrame
column : column names or list of names, or vector

Can be any valid input to groupby
by : string or sequence
Column in the DataFrame to group by

ax : matplotlib axis object, default None
fontsize : int or string
rot : int, default 0

Rotation for ticks
grid : boolean, default True
Axis grid lines

ax : matplotlib.axes.AxesSubplot

clip(lower=None, upper=None, out=None)

Trim values at input threshold(s)

lower : float, default None
upper : float, default None

clipped : Series

clip_lower(threshold)

Return copy of the input with values below given value truncated

clip

clipped : same type as input

clip_upper(threshold)

Return copy of input with values above given value truncated

clip

clipped : same type as input
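A sketch of the three trimming variants, assuming a current pandas: clip_lower and clip_upper were removed in pandas 1.0, and passing only lower= or only upper= to clip gives the same result. The values are illustrative.

```python
import pandas as pd

s = pd.Series([-5, 3, 12])

both = s.clip(lower=0, upper=10)  # trim at both thresholds
lower_only = s.clip(lower=0)      # equivalent of clip_lower(0)
upper_only = s.clip(upper=10)     # equivalent of clip_upper(10)

print(both.tolist())  # [0, 3, 10]
```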

combine(other, func, fill_value=None, overwrite=True)

Combine two DataFrame objects column by column, using func to merge each pair of columns element-wise. The row and column indexes of the result are the union of those of the two frames.

other : DataFrame
func : function
Takes two Series and returns a Series (or scalar) combining them
fill_value : scalar value
Value used to fill NaNs before each pair of columns is passed to func
overwrite : boolean, default True

If True then overwrite values for common keys in the calling frame

result : DataFrame
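A sketch of combine with a custom func, assuming a current pandas; the lambda (element-wise minimum via Series.where) and the data are illustrative.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 5], "y": [7, 2]})
b = pd.DataFrame({"x": [3, 4], "y": [6, 9]})

# func gets one column from each frame and returns the combined column;
# here: keep a's value where it is smaller, otherwise take b's
smaller = a.combine(b, lambda s1, s2: s1.where(s1 < s2, s2))

print(smaller)
```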

combineAdd(other)

Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)

other : DataFrame

DataFrame

combineMult(other)

Multiply two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)

other : DataFrame

DataFrame

combine_first(other)

Combine two DataFrame objects and default to non-null values in the frame calling the method. The result's index and columns will be the union of the respective indexes and columns.

other : DataFrame

a’s values prioritized, use values from b to fill holes:

>>> a.combine_first(b)

combined : DataFrame
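A runnable version of the a.combine_first(b) call above, assuming a current pandas; the frames are illustrative. Values from a win wherever they are non-null, holes are filled from b, and the row index becomes the union of both.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"x": [1.0, np.nan], "y": [np.nan, 4.0]})
b = pd.DataFrame({"x": [10.0, 20.0, 30.0], "y": [40.0, 50.0, 60.0]})

combined = a.combine_first(b)
# row 0: x from a, y from b; row 1: x from b, y from a; row 2 exists only in b
print(combined)
```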

compound(axis=None, skipna=None, level=None, **kwargs)

Return the compound percentage of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

compounded : Series or DataFrame (if level specified)

consolidate(inplace=False)

Compute NDFrame with “consolidated” internals (data of each dtype grouped together in a single ndarray). Mainly an internal API function, but available here to the savvy user

inplace : boolean, default False
If False return new object, otherwise modify existing object

consolidated : type of caller

convert_objects(convert_dates=True, convert_numeric=False, convert_timedeltas=True, copy=True)

Attempt to infer better dtype for object columns

convert_dates : if True, attempt to soft convert dates, if ‘coerce’,
force conversion (and non-convertibles get NaT)
convert_numeric : if True attempt to coerce to numbers (including
strings), non-convertibles get NaN
convert_timedeltas : if True, attempt to soft convert timedeltas, if ‘coerce’,
force conversion (and non-convertibles get NaT)

copy : Boolean, if True, return copy, default is True

converted : same as input object

copy(deep=True)

Make a copy of this object

deep : boolean, default True
Make a deep copy, i.e. also copy data

copy : type of caller

corr(method='pearson', min_periods=1)

Compute pairwise correlation of columns, excluding NA/null values

method : {‘pearson’, ‘kendall’, ‘spearman’}
  • pearson : standard correlation coefficient
  • kendall : Kendall Tau correlation coefficient
  • spearman : Spearman rank correlation
min_periods : int, optional
Minimum number of observations required per pair of columns to have a valid result. Currently only available for pearson and spearman correlation

y : DataFrame
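A sketch contrasting Pearson and Spearman, assuming a current pandas (and SciPy not required for these two methods); the columns are illustrative. Column c is a monotone but non-linear function of a, so its rank correlation is 1 while its linear correlation is slightly below 1.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, 4, 9, 16]})

pearson = df.corr()                    # method="pearson" by default
spearman = df.corr(method="spearman")  # correlation of ranks

print(round(pearson.loc["a", "c"], 4))  # 0.9844
print(spearman.loc["a", "c"])           # 1.0
```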

corrwith(other, axis=0, drop=False)

Compute pairwise correlation between rows or columns of two DataFrame objects.

other : DataFrame
axis : {0, 1}

0 to compute column-wise, 1 for row-wise
drop : boolean, default False
Drop missing indices from result, default returns union of all

correls : Series

count(axis=0, level=None, numeric_only=False)

Return Series with number of non-NA/null observations over requested axis. Works with non-floating point data as well (detects NaN and None)

axis : {0, 1}
0 for row-wise, 1 for column-wise
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
numeric_only : boolean, default False
Include only float, int, boolean data

count : Series (or DataFrame if level specified)

cov(min_periods=None)

Compute pairwise covariance of columns, excluding NA/null values

min_periods : int, optional
Minimum number of observations required per pair of columns to have a valid result.

y : DataFrame

y contains the covariance matrix of the DataFrame’s time series. The covariance is normalized by N-1 (unbiased estimator).
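A quick check of the N-1 normalization, assuming a current pandas; the data are illustrative. The manual sum of products of deviations divided by n - 1 matches the corresponding entry of cov().

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.0, 6.0, 8.0]})
cov = df.cov()

n = len(df)
manual_ab = ((df["a"] - df["a"].mean()) * (df["b"] - df["b"].mean())).sum() / (n - 1)

print(cov.loc["a", "a"], cov.loc["a", "b"], manual_ab)  # 5/3, 10/3, 10/3
```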

cummax(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative max over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA

cummax : same type as caller

cummin(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative min over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA

cummin : same type as caller

cumprod(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative prod over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA

cumprod : same type as caller

cumsum(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative sum over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA

cumsum : same type as caller
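A sketch of how skipna affects the cumulative methods, assuming a current pandas; the series is illustrative. With skipna=True a NaN stays NaN in place and the accumulation continues past it; with skipna=False everything after the first NaN becomes NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, np.nan, 2.0, 5.0])

running_sum = s.cumsum()             # [3, NaN, 5, 10]
strict_sum = s.cumsum(skipna=False)  # [3, NaN, NaN, NaN]
running_max = s.cummax()             # [3, NaN, 3, 5]
running_min = s.cummin()             # [3, NaN, 2, 2]

# On a DataFrame the result is a DataFrame of the same shape
frame_result = s.to_frame("v").cumsum()
```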

delevel(*args, **kwargs)
describe(percentile_width=50)

Generate various summary statistics of each column, excluding NaN values. These include: count, mean, std, min, max, and lower%/50%/upper% percentiles

percentile_width : float, optional
width of the desired uncertainty interval, default is 50, which corresponds to lower=25, upper=75

DataFrame of summary statistics

diff(periods=1)

1st discrete difference of object

periods : int, default 1
Periods to shift for forming difference

diffed : DataFrame

div(other, axis='columns', level=None, fill_value=None)

Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
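A sketch of the fill_value rule, assuming a current pandas; the frames are illustrative. A NaN on one side is replaced by fill_value before dividing, but a location that is NaN on both sides stays NaN.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"x": [1.0, np.nan, np.nan]})
b = pd.DataFrame({"x": [2.0, 4.0, np.nan]})

plain = a / b                      # [0.5, NaN, NaN]
filled = a.div(b, fill_value=1.0)  # [0.5, 1/4, NaN]; row 2 is missing in both

print(filled)
```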

divide(other, axis='columns', level=None, fill_value=None)

Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

dot(other)

Matrix multiplication with DataFrame or Series objects

other : DataFrame or Series

dot_product : DataFrame or Series

drop(labels, axis=0, level=None, inplace=False, **kwargs)

Return new object with labels in requested axis removed

labels : single label or list-like
axis : int or axis name
level : int or name, default None

For MultiIndex
inplace : bool, default False
If True, do operation inplace and return None.

dropped : type of caller

drop_duplicates(cols=None, take_last=False, inplace=False)

Return DataFrame with duplicate rows removed, optionally only considering certain columns

cols : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns
take_last : boolean, default False
If True, keep the last occurrence of each set of duplicate rows; by default the first occurrence is kept
inplace : boolean, default False
Whether to drop duplicates in place or to return a copy

deduplicated : DataFrame

dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Return object with labels on given axis omitted where alternately any or all of the data are missing

axis : {0, 1}, or tuple/list thereof
Pass tuple or list to drop on multiple axes
how : {‘any’, ‘all’}
  • any : if any NA values are present, drop that label
  • all : if all values are NA, drop that label
thresh : int, default None
int value : require that many non-NA values
subset : array-like
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include
inplace : boolean, default False
If True, do operation inplace and return None.

dropped : DataFrame
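A sketch of how, thresh and subset interact, assuming a current pandas; the frame is illustrative (row 0 is complete, row 1 has one value, row 2 is all NaN).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [3.0, np.nan, np.nan],
})

drop_any = df.dropna()                # how="any": drop rows with any NaN
drop_all = df.dropna(how="all")       # drop only rows that are entirely NaN
need_two = df.dropna(thresh=2)        # keep rows with at least 2 non-NA values
only_b = df.dropna(subset=["b"])      # look at column b only

print(only_b)
```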

dtypes

Return the dtypes in this object

duplicated(cols=None, take_last=False)

Return boolean Series denoting duplicate rows, optionally only considering certain columns

cols : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns
take_last : boolean, default False
If True, mark duplicates as True except for the last occurrence; by default all duplicates except the first occurrence are marked

duplicated : Series

empty

True if NDFrame is entirely empty [no items]

eq(other, axis='columns', level=None)

Wrapper for flexible comparison methods eq

equals(other)

Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.

eval(expr, **kwargs)

Evaluate an expression in the context of the calling DataFrame instance.

expr : string
The expression string to evaluate.
kwargs : dict
See the documentation for eval() for complete details on the keyword arguments accepted by this method.

ret : ndarray, scalar, or pandas object

pandas.DataFrame.query pandas.eval

For more details see the API documentation for eval(). For detailed examples see enhancing performance with eval.

>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.eval('a + b')
>>> df.eval('c=a + b')
ffill(axis=0, inplace=False, limit=None, downcast=None)

Synonym for NDFrame.fillna(method='ffill')

fillna(value=None, method=None, axis=0, inplace=False, limit=None, downcast=None)

Fill NA/NaN values using the specified method

method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed Series:
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill gap
value : scalar, dict, or Series
Value to use to fill holes (e.g. 0), alternately a dict/Series of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series will not be filled). This value cannot be a list.
axis : {0, 1}, default 0
  • 0: fill column-by-column
  • 1: fill row-by-row
inplace : boolean, default False
If True, fill in place. Note: this will modify any other views on this object, (e.g. a no-copy slice for a column in a DataFrame).
limit : int, default None
Maximum size gap to forward or backward fill
downcast : dict, default is None
a dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible)

reindex, asfreq

filled : same type as caller
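A sketch of per-column values and limited forward filling, assuming a current pandas. Because pandas 2.1 deprecated fillna(method=...), the forward fill uses the equivalent ffill method; the data are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [np.nan, 2.0, np.nan, np.nan],
})

# Dict value: only column "a" is filled; "b" is not in the dict and is left alone
by_column = df.fillna(value={"a": 0.0})
# limit=1: fill at most one NaN in each run of consecutive NaNs
forward = df.ffill(limit=1)

print(forward)
```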

filter(items=None, like=None, regex=None, axis=None)

Restrict the info axis to set of items or wildcard

items : list-like
List of info axis labels to restrict to (they need not all be present)
like : string
Keep info axis where “arg in col == True”
regex : string (regular expression)
Keep info axis with re.search(regex, col) == True

Arguments are mutually exclusive, but this is not checked for
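A sketch of the three selectors, assuming a current pandas; the column names are illustrative. Pass only one of items, like or regex per call, since the mutual exclusion is not checked.

```python
import pandas as pd

df = pd.DataFrame({"one": [1], "two": [2], "three": [3]})

by_items = df.filter(items=["one", "four"])  # "four" is absent and simply ignored
by_like = df.filter(like="o")                # labels containing "o"
by_regex = df.filter(regex="e$")             # labels ending in "e"

print(by_regex.columns.tolist())  # ['one', 'three']
```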

filter_result(expressions)

Filters a result data frame based on a specified expression consisting of a list of triples (method_name, comparator, threshold). The expression is applied to each row; if any of the columns fulfills the criteria, the row remains.

Parameters:expressions (list((str,comparator,float))) – A list of triples consisting of (method_name, comparator, threshold)
Returns:A new filtered result object
Return type:CleavageFragmentPredictionResult
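The filtering rule described for filter_result can be illustrated with plain pandas. This is a hypothetical helper (filter_rows), not Fred2's implementation: it assumes each method's scores live in a column named after the method, reads "if any of the columns fulfills the criteria" as an OR across the triples, and uses comparators from the operator module. The method names and scores are made up.

```python
import operator

import pandas as pd


def filter_rows(df, expressions):
    """Hypothetical sketch: keep rows where at least one
    (column, comparator, threshold) triple holds. Not Fred2 code."""
    keep = pd.Series(False, index=df.index)
    for column, comparator, threshold in expressions:
        keep |= comparator(df[column], threshold)
    return df[keep]


# Illustrative scores for two hypothetical prediction methods
scores = pd.DataFrame(
    {"proteasome": [0.9, 0.2, 0.4], "netchop": [0.1, 0.3, 0.8]},
    index=["P1", "P2", "P3"],
)
kept = filter_rows(scores, [("proteasome", operator.ge, 0.5),
                            ("netchop", operator.ge, 0.5)])
print(kept.index.tolist())  # ['P1', 'P3']
```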
first(offset)

Convenience method for subsetting initial periods of time series data based on a date offset

offset : string, DateOffset, dateutil.relativedelta

ts.first('10D') -> First 10 days

subset : type of caller

first_valid_index()

Return label for first non-NA/null value

floordiv(other, axis='columns', level=None, fill_value=None)

Binary operator floordiv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

from_csv(path, header=0, sep=', ', index_col=0, parse_dates=True, encoding=None, tupleize_cols=False, infer_datetime_format=False)

Read delimited file into DataFrame

path : string file path or file handle / StringIO
header : int, default 0

Row to use as header (skip prior rows)
sep : string, default ‘,’
Field delimiter
index_col : int or sequence, default 0
Column to use for index. If a sequence is given, a MultiIndex is used. Different default from read_table
parse_dates : boolean, default True
Parse dates. Different default from read_table
tupleize_cols : boolean, default False
Write MultiIndex columns as a list of tuples (if True) or in the new, expanded format (if False)
infer_datetime_format: boolean, default False
If True and parse_dates is True for a column, try to infer the datetime format based on the first datetime string. If the format can be inferred, there often will be a large parsing speed-up.

Preferable to use read_table for most general purposes but from_csv makes for an easy roundtrip to and from file, especially with a DataFrame of time series data

y : DataFrame

from_dict(data, orient='columns', dtype=None)

Construct DataFrame from dict of array-like or dicts

data : dict
{field : array-like} or {field : dict}
orient : {‘columns’, ‘index’}, default ‘columns’
The “orientation” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.

DataFrame
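A sketch of the two orientations, assuming a current pandas; the nested dict is illustrative. With the default orient="columns" the outer keys become columns; with orient="index" they become row labels.

```python
import pandas as pd

data = {"row1": {"x": 1, "y": 2}, "row2": {"x": 3, "y": 4}}

by_col = pd.DataFrame.from_dict(data)                  # outer keys -> columns
by_row = pd.DataFrame.from_dict(data, orient="index")  # outer keys -> index

print(by_row)
```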

from_items(items, columns=None, orient='columns')

Convert (key, value) pairs to DataFrame. The keys will be the axis index (usually the columns, but depends on the specified orientation). The values should be arrays or Series.

items : sequence of (key, value) pairs
Values should be arrays or Series.
columns : sequence of column labels, optional
Must be passed if orient=’index’.
orient : {‘columns’, ‘index’}, default ‘columns’
The “orientation” of the data. If the keys of the input correspond to column labels, pass ‘columns’ (default). Otherwise if the keys correspond to the index, pass ‘index’.

frame : DataFrame

from_records(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)

Convert structured or record ndarray to DataFrame

data : ndarray (structured dtype), list of tuples, dict, or DataFrame
index : string, list of fields, array-like

Field of array to use as the index, alternately a specific set of input labels to use
exclude : sequence, default None
Columns or fields to exclude
columns : sequence, default None
Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns)
coerce_float : boolean, default False
Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets

df : DataFrame

ftypes

Return the ftypes (indication of sparse/dense and dtype) in this object.

ge(other, axis='columns', level=None)

Wrapper for flexible comparison methods ge

get(key, default=None)

Get item from object for given key (DataFrame column, Panel slice, etc.). Returns default value if not found

key : object

value : type of items contained in object

get_dtype_counts()

Return the counts of dtypes in this object

get_ftype_counts()

Return the counts of ftypes in this object

get_value(index, col)

Quickly retrieve single value at passed column and index

index : row label
col : column label

value : scalar value

get_values()

Same as values (but handles sparseness conversions)

groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False)

Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns

by : mapping function / list of functions, dict, Series, or tuple / list of column names
Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups

axis : int, default 0
level : int, level name, or sequence of such, default None

If the axis is a MultiIndex (hierarchical), group by a particular level or levels
as_index : boolean, default True
For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output
sort : boolean, default True
Sort group keys. Get better performance by turning this off
group_keys : boolean, default True
When calling apply, add group keys to index to identify pieces
squeeze : boolean, default False
Reduce the dimensionality of the return type if possible, otherwise return a consistent type

# DataFrame result
>>> data.groupby(func, axis=0).mean()

# Series result with hierarchical index
>>> data.groupby(['col1', 'col2'])['col3'].mean()

# DataFrame with hierarchical index
>>> data.groupby(['col1', 'col2']).mean()

GroupBy object
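A runnable variant of the col1/col2/col3 examples, assuming a current pandas and illustrative data; it also shows how as_index=False turns the group keys into an ordinary column ("SQL-style" output).

```python
import pandas as pd

data = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": ["x", "y", "x"],
    "col3": [1.0, 3.0, 5.0],
})

per_col1 = data.groupby("col1")["col3"].mean()                 # Series indexed by col1
flat = data.groupby("col1", as_index=False)["col3"].mean()     # col1 kept as a column
hier = data.groupby(["col1", "col2"])["col3"].mean()           # MultiIndex (col1, col2)

print(flat)
```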

gt(other, axis='columns', level=None)

Wrapper for flexible comparison methods gt

head(n=5)

Returns first n rows

hist(data, column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, **kwds)

Draw histogram of the DataFrame’s series using matplotlib / pylab.

data : DataFrame
column : string or sequence

If passed, will be used to limit data to a subset of columns
by : object, optional
If passed, then used to form histograms for separate groups
grid : boolean, default True
Whether to show axis grid lines
xlabelsize : int, default None
If specified changes the x-axis label size
xrot : float, default None
rotation of x axis labels
ylabelsize : int, default None
If specified changes the y-axis label size
yrot : float, default None
rotation of y axis labels

ax : matplotlib axes object, default None
sharex : bool
If True, the X axis will be shared amongst all subplots.
sharey : bool
If True, the Y axis will be shared amongst all subplots.
figsize : tuple
The size of the figure to create in inches by default
layout : (optional) a tuple (rows, columns) for the layout of the histograms
kwds : other plotting keyword arguments
To be passed to hist function
iat
icol(i)
idxmax(axis=0, skipna=True)

Return index of first occurrence of maximum over requested axis. NA/null values are excluded.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be first index.

idxmax : Series

This method is the DataFrame version of ndarray.argmax.

Series.idxmax

idxmin(axis=0, skipna=True)

Return index of first occurrence of minimum over requested axis. NA/null values are excluded.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

idxmin : Series

This method is the DataFrame version of ndarray.argmin.

Series.idxmin
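A sketch of idxmax/idxmin returning labels rather than values, assuming a current pandas; the frame is illustrative. Ties resolve to the first occurrence.

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 7, 7], "b": [9, 1, 4]}, index=["r0", "r1", "r2"])

col_argmax = df.idxmax()         # per column: row label of the maximum
row_argmin = df.idxmin(axis=1)   # per row: column label of the minimum

print(col_argmax.to_dict())  # {'a': 'r1', 'b': 'r0'}  ("a" ties at r1/r2; first wins)
```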

iget_value(i, j)
iloc
info(verbose=True, buf=None, max_cols=None)

Concise summary of a DataFrame.

verbose : boolean, default True
If False, don’t print column count summary

buf : writable buffer, defaults to sys.stdout
max_cols : int, default None

Determines whether full summary or short summary is printed
insert(loc, column, value, allow_duplicates=False)

Insert column into DataFrame at specified location.

If allow_duplicates is False, raises an exception if the column is already contained in the DataFrame.

loc : int
Must have 0 <= loc <= len(columns)

column : object
value : int, Series, or array-like
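A sketch of insert, assuming a current pandas, where the method works in place (returning None) and a duplicate column name raises ValueError unless allow_duplicates=True; the frame is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "c": [5, 6]})

# Insert column "b" at position 1; df itself is modified
df.insert(1, "b", [3, 4])
print(df.columns.tolist())  # ['a', 'b', 'c']

# "a" already exists, so this is rejected with the default allow_duplicates=False
try:
    df.insert(0, "a", [0, 0])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```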

interpolate(method='linear', axis=0, limit=None, inplace=False, downcast='infer', **kwargs)

Interpolate values according to different methods.

method : {‘linear’, ‘time’, ‘values’, ‘index’, ‘nearest’, ‘zero’,
‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘krogh’, ‘polynomial’, ‘spline’, ‘piecewise_polynomial’, ‘pchip’}
  • ‘linear’: ignore the index and treat the values as equally spaced (default)
  • ‘time’: interpolation works on daily and higher resolution data to interpolate given length of interval
  • ‘index’: use the actual numerical values of the index
  • ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘polynomial’ are passed to scipy.interpolate.interp1d. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=4)
  • ‘krogh’, ‘piecewise_polynomial’, ‘spline’, and ‘pchip’ are all wrappers around the scipy interpolation methods of similar names. See the scipy documentation for more on their behavior: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
axis : {0, 1}, default 0
  • 0: fill column-by-column
  • 1: fill row-by-row
limit : int, default None.
Maximum number of consecutive NaNs to fill.
inplace : bool, default False
Update the NDFrame in place if possible.
downcast : optional, ‘infer’ or None, defaults to ‘infer’
Downcast dtypes if possible.

Series or DataFrame of same shape interpolated at the NaNs

reindex, replace, fillna

Filling in NaNs:

>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0    0
1    1
2    2
3    3
dtype: float64
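A sketch of the difference between ‘linear’ and ‘index’ on an unevenly spaced numeric index, assuming a current pandas; the series is illustrative. ‘linear’ treats the points as equally spaced, while ‘index’ weights by the index values.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 4])

linear = s.interpolate()                  # midpoint of 0 and 10 -> 5.0
by_index = s.interpolate(method="index")  # 0 + (1 - 0) / (4 - 0) * 10 -> 2.5

print(linear.tolist(), by_index.tolist())
```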

irow(i, copy=False)
is_copy = None
isin(values)

Return boolean DataFrame showing whether each element in the DataFrame is contained in values.

values : iterable, Series, DataFrame or dictionary
The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dictionary, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.

DataFrame of booleans

When values is a list:

>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> df.isin([1, 3, 12, 'a'])
       A      B
0   True   True
1  False  False
2   True  False

When values is a dict:

>>> df = DataFrame({'A': [1, 2, 3], 'B': [1, 4, 7]})
>>> df.isin({'A': [1, 3], 'B': [4, 7, 12]})
       A      B
0   True  False  # Note that B didn't match the 1 here.
1  False   True
2   True   True

When values is a Series or DataFrame:

>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> other = DataFrame({'A': [1, 3, 3, 2], 'B': ['e', 'f', 'f', 'e']})
>>> df.isin(other)
       A      B
0   True  False
1  False  False  # Column A in `other` has a 3, but not at index 1.
2   True   True
isnull()

Return a boolean same-sized object indicating if the values are null

iteritems()

Iterator over (column, series) pairs

iterkv(*args, **kwargs)

iteritems alias used to get around 2to3. Deprecated

iterrows()

Iterate over rows of DataFrame as (index, Series) pairs.

  • iterrows does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). For example,

    >>> df = DataFrame([[1, 1.0]], columns=['x', 'y'])
    >>> row = next(df.iterrows())[1]
    >>> print(row['x'].dtype)
    float64
    >>> print(df['x'].dtype)
    int64
    
it : generator
A generator that iterates over the rows of the frame.
itertuples(index=True)

Iterate over rows of DataFrame as tuples, with index value as first element of the tuple

ix
join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

other : DataFrame, Series with name field set, or list of DataFrame
Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame
on : column name, tuple/list of column names, or array-like
Column(s) to use for joining, otherwise join on index. If multiple columns are given, the passed DataFrame must have a MultiIndex. Can pass an array as the join key if not already contained in the calling DataFrame. Like an Excel VLOOKUP operation
how : {‘left’, ‘right’, ‘outer’, ‘inner’}

How to handle indexes of the two objects. Default: ‘left’ for joining on index, None otherwise

  • left: use calling frame’s index
  • right: use input frame’s index
  • outer: form union of indexes
  • inner: use intersection of indexes
lsuffix : string
Suffix to use from left frame’s overlapping columns
rsuffix : string
Suffix to use from right frame’s overlapping columns
sort : boolean, default False
Order result DataFrame lexicographically by the join key. If False, preserves the index order of the calling (left) DataFrame

on, lsuffix, and rsuffix options are not supported when passing a list of DataFrame objects

joined : DataFrame
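A sketch of an index join with overlapping column names, assuming a current pandas; the frames and suffixes are illustrative. Without lsuffix/rsuffix the overlapping "value" column would make the join fail.

```python
import pandas as pd

left = pd.DataFrame({"value": [1, 2, 3]}, index=["a", "b", "c"])
right = pd.DataFrame({"value": [10, 30]}, index=["a", "c"])

# how="left" (default): keep every row of the calling frame; "b" gets NaN
joined = left.join(right, lsuffix="_left", rsuffix="_right")
# how="inner": keep only index labels present in both frames
inner = left.join(right, how="inner", lsuffix="_l", rsuffix="_r")

print(joined)
```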

keys()

Get the ‘info axis’ (see Indexing for more)

This is index for Series, columns for DataFrame and major_axis for Panel.

kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased kurtosis over requested axis, normalized by N-1.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

kurt : Series or DataFrame (if level specified)

kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased kurtosis over requested axis, normalized by N-1.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

kurt : Series or DataFrame (if level specified)

last(offset)

Convenience method for subsetting final periods of time series data based on a date offset

offset : string, DateOffset, dateutil.relativedelta

ts.last('5M') -> Last 5 months

subset : type of caller

last_valid_index()

Return label for last non-NA/null value

le(other, axis='columns', level=None)

Wrapper for flexible comparison methods le

load(path)

Deprecated. Use read_pickle instead.

loc
lookup(row_labels, col_labels)

Label-based “fancy indexing” function for DataFrame. Given equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair.

row_labels : sequence
The row labels to use for lookup
col_labels : sequence
The column labels to use for lookup

Akin to:

result = []
for row, col in zip(row_labels, col_labels):
    result.append(df.get_value(row, col))
values : ndarray
The found values
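The "Akin to" loop above relies on get_value, which later pandas removed (lookup itself was removed in pandas 2.0). A sketch of the same label-pair lookup with the .at accessor, assuming a current pandas; the frame and label lists are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["r0", "r1", "r2"])
row_labels = ["r0", "r2", "r1"]
col_labels = ["y", "x", "y"]

# One scalar per (row, col) pair, as lookup(row_labels, col_labels) returned
values = np.array([df.at[row, col] for row, col in zip(row_labels, col_labels)])
print(values)  # [4 3 5]
```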
lt(other, axis='columns', level=None)

Wrapper for flexible comparison methods lt

mad(axis=None, skipna=None, level=None, **kwargs)

Return the mean absolute deviation of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

mad : Series or DataFrame (if level specified)

mask(cond)

Return a copy of the object whose values are replaced with NaN where cond is True (the inverse of where)

cond : boolean NDFrame or array

wh : same type as input
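A sketch contrasting mask with where, assuming a current pandas; the frame is illustrative. mask blanks out the cells where the condition holds, where keeps exactly those cells.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5], "b": [3, 0]})
cond = df > 2

masked = df.mask(cond)   # NaN where cond is True
kept = df.where(cond)    # NaN where cond is False

print(masked)
```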

max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

This method returns the maximum of the values in the object. If you want the index of the maximum, use idxmax, which is the equivalent of the numpy.ndarray method argmax.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

max : Series or DataFrame (if level specified)

mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the mean of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

mean : Series or DataFrame (if level specified)

median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the median of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

median : Series or DataFrame (if level specified)

merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True)

Merge DataFrame objects by performing a database-style join operation by columns or indexes.

If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on.

right : DataFrame
how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’

  • left: use only keys from left frame (SQL: left outer join)
  • right: use only keys from right frame (SQL: right outer join)
  • outer: use union of keys from both frames (SQL: full outer join)
  • inner: use intersection of keys from both frames (SQL: inner join)
on : label or list
Field names to join on. Must be found in both DataFrames. If on is None and not merging on indexes, then it merges on the intersection of the columns by default.
left_on : label or list, or array-like
Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns
right_on : label or list, or array-like
Field names to join on in right DataFrame or vector/list of vectors per left_on docs
left_index : boolean, default False
Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels
right_index : boolean, default False
Use the index from the right DataFrame as the join key. Same caveats as left_index
sort : boolean, default False
Sort the join keys lexicographically in the result DataFrame
suffixes : 2-length sequence (tuple, list, ...)
Suffix to apply to overlapping column names in the left and right side, respectively
copy : boolean, default True
If False, do not copy data unnecessarily
>>> A              >>> B
    lkey value         rkey value
0   foo  1         0   foo  5
1   bar  2         1   bar  6
2   baz  3         2   qux  7
3   foo  4         3   bar  8
>>> A.merge(B, left_on='lkey', right_on='rkey', how='outer')
   lkey  value_x  rkey  value_y
0  bar   2        bar   6
1  bar   2        bar   8
2  baz   3        NaN   NaN
3  foo   1        foo   5
4  foo   4        foo   5
5  NaN   NaN      qux   7

merged : DataFrame
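To make the outer-join rule concrete, the sketch below reproduces the example above with lists of row dicts instead of DataFrames. It is a simplified model, not pandas' merge: it assumes the join keys have different names (lkey and rkey), appends the suffixes to any other shared column names, and does not sort the output, so rows may appear in a different order than in the table above.

```python
def merge_outer(left, right, left_on, right_on, suffixes=("_x", "_y")):
    """Full outer join of two lists of row dicts (a sketch, not pandas).

    Assumes left_on and right_on are different column names; other
    overlapping names get the suffixes.
    """
    left_cols, right_cols = list(left[0]), list(right[0])
    overlap = set(left_cols) & set(right_cols)
    lmap = {c: (c + suffixes[0] if c in overlap else c) for c in left_cols}
    rmap = {c: (c + suffixes[1] if c in overlap else c) for c in right_cols}
    out_cols = list(lmap.values()) + list(rmap.values())

    def combine(lrow, rrow):
        row = dict.fromkeys(out_cols)  # a missing side stays None (NaN in pandas)
        for c, v in (lrow or {}).items():
            row[lmap[c]] = v
        for c, v in (rrow or {}).items():
            row[rmap[c]] = v
        return row

    result, used_right = [], set()
    for lrow in left:
        hits = [i for i, rrow in enumerate(right) if rrow[right_on] == lrow[left_on]]
        used_right.update(hits)
        # Every matching pair yields a row; an unmatched left row is kept alone.
        result.extend(combine(lrow, right[i]) for i in hits)
        if not hits:
            result.append(combine(lrow, None))
    # Outer join: also keep right rows that never matched.
    result.extend(combine(None, r) for i, r in enumerate(right) if i not in used_right)
    return result


A = [{"lkey": k, "value": v} for k, v in [("foo", 1), ("bar", 2), ("baz", 3), ("foo", 4)]]
B = [{"rkey": k, "value": v} for k, v in [("foo", 5), ("bar", 6), ("qux", 7), ("bar", 8)]]
for row in merge_outer(A, B, "lkey", "rkey"):
    print(row)
```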

merge_results(others)

Merges results of type CleavageFragmentPredictionResult and returns the merged result

Parameters:others (list(CleavageFragmentPredictionResult) or CleavageFragmentPredictionResult) – A (list of) CleavageFragmentPredictionResult object(s)
Returns:new merged CleavageFragmentPredictionResult object
Return type:CleavageFragmentPredictionResult
min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the minimum of the values in the object. To get the index of the minimum, use idxmin, which is the equivalent of the numpy.ndarray method argmin.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

min : Series or DataFrame (if level specified)

mod(other, axis='columns', level=None, fill_value=None)

Binary operator mod with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
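The fill_value and alignment rules shared by mod and the other binary operators in this section can be sketched with label-to-value dicts standing in for aligned rows. This is an illustrative model under that assumption, with None marking a missing value; the helper name binary_op is hypothetical.

```python
import operator


def binary_op(left, right, op, fill_value=None):
    """Align two {label: value} dicts and apply op (a sketch of the rules above).

    Labels are unioned. If exactly one side is missing, fill_value stands in
    for it; if both sides are missing, the result stays missing (None).
    """
    result = {}
    for label in sorted(set(left) | set(right)):  # mismatched labels are unioned
        a, b = left.get(label), right.get(label)
        if fill_value is not None:
            if a is None and b is not None:
                a = fill_value
            elif b is None and a is not None:
                b = fill_value
        result[label] = None if a is None or b is None else op(a, b)
    return result


result = binary_op({"a": 7, "b": 8, "c": None}, {"a": 3, "d": 5}, operator.mod, fill_value=4)
print(result)  # {'a': 1, 'b': 0, 'c': None, 'd': 4}
```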

mode(axis=0, numeric_only=False)

Get the mode(s) of each column or row along the selected axis. The result is empty if no value occurs two or more times. A row is added for each mode per label, and gaps are filled with NaN.

axis : {0, 1, ‘index’, ‘columns’} (default 0)
  • 0 or ‘index’ : get mode of each column
  • 1 or ‘columns’ : get mode of each row
numeric_only : boolean, default False
if True, only apply to numeric columns

modes : DataFrame (sorted)
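A minimal sketch of the per-column rule, using plain lists in place of DataFrame columns (an assumption for illustration). It also assumes that missing values (None) are ignored when counting; the helper names are hypothetical.

```python
from collections import Counter


def column_modes(values):
    """Modes of one column: every value tied for the highest count, sorted.

    Empty when no value occurs at least twice. Ignoring None is an
    assumption of this sketch.
    """
    counts = Counter(v for v in values if v is not None)
    top = max(counts.values(), default=0)
    if top < 2:
        return []
    return sorted(v for v, n in counts.items() if n == top)


def frame_modes(columns):
    """Column-wise modes, padded with None (NaN) to a common number of rows."""
    modes = {name: column_modes(vals) for name, vals in columns.items()}
    height = max((len(m) for m in modes.values()), default=0)
    return {name: m + [None] * (height - len(m)) for name, m in modes.items()}


print(frame_modes({"x": [1, 2, 2, 3, 3], "y": [1, 1, 2, 3, 4]}))
# {'x': [2, 3], 'y': [1, None]}
```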

mul(other, axis='columns', level=None, fill_value=None)

Binary operator mul with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

multiply(other, axis='columns', level=None, fill_value=None)

Binary operator mul with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

ndim

Number of axes / array dimensions

ne(other, axis='columns', level=None)

Wrapper for the flexible comparison method ne (not equal to)

notnull()

Return a boolean same-sized object indicating if the values are not null

pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwds)

Percent change over given number of periods

periods : int, default 1
Periods to shift for forming percent change
fill_method : str, default ‘pad’
How to handle NAs before computing percent changes
limit : int, default None
The number of consecutive NAs to fill before stopping
freq : DateOffset, timedelta, or offset alias string, optional
Increment to use from time series API (e.g. ‘M’ or BDay())

chg : same type as caller
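The computation can be sketched in pure Python for one column. This models fill_method='pad' only; limit and freq are not modelled, and the sketch assumes no zero values (a zero denominator would raise here rather than produce inf). The helper is illustrative, not the pandas code.

```python
def pct_change(values, periods=1):
    """Percent change between each element and the one `periods` positions earlier.

    Missing values (None) are forward-filled first ('pad'), then the ratio
    current / previous - 1 is taken.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v  # 'pad': remember the last valid observation
        filled.append(last)
    out = []
    for i, cur in enumerate(filled):
        j = i - periods
        prev = filled[j] if 0 <= j < len(filled) else None
        out.append(None if prev is None or cur is None else cur / prev - 1)
    return out


print(pct_change([100, 110, None, 121]))  # approximately [None, 0.1, 0.0, 0.1]
```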

pivot(index=None, columns=None, values=None)

Reshape data (produce a “pivot” table) based on column values. Uses unique values from index / columns to form the axes of the resulting DataFrame. If a single values column is requested, the result has flat columns; if values is omitted, all remaining columns are used and the result has hierarchically indexed columns.

index : string or object
Column name to use to make new frame’s index
columns : string or object
Column name to use to make new frame’s columns
values : string or object, optional
Column name to use for populating new frame’s values

For finer-tuned control, see hierarchical indexing documentation along with the related stack/unstack methods

>>> df
    foo   bar  baz
0   one   A    1.
1   one   B    2.
2   one   C    3.
3   two   A    4.
4   two   B    5.
5   two   C    6.
>>> df.pivot('foo', 'bar', 'baz')
     A   B   C
one  1   2   3
two  4   5   6
>>> df.pivot('foo', 'bar')['baz']
     A   B   C
one  1   2   3
two  4   5   6
pivoted : DataFrame
If no values column specified, will have hierarchically indexed columns
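The reshaping in the pivot example can be modelled with a dict of dicts. The sketch below is illustrative only, with a hypothetical pivot helper operating on a list of row dicts. It assumes each (index, columns) pair occurs at most once and raises ValueError otherwise.

```python
def pivot(rows, index, columns, values):
    """Return {index value: {column value: cell}} built from a list of row dicts."""
    table = {}
    for r in rows:
        cells = table.setdefault(r[index], {})
        if r[columns] in cells:
            # Two rows would land in the same cell, so the reshape is ambiguous.
            raise ValueError("duplicate entry for (%r, %r)" % (r[index], r[columns]))
        cells[r[columns]] = r[values]
    return table


data = [
    {"foo": f, "bar": b, "baz": z}
    for f, b, z in [("one", "A", 1.0), ("one", "B", 2.0), ("one", "C", 3.0),
                    ("two", "A", 4.0), ("two", "B", 5.0), ("two", "C", 6.0)]
]
print(pivot(data, "foo", "bar", "baz"))
# {'one': {'A': 1.0, 'B': 2.0, 'C': 3.0}, 'two': {'A': 4.0, 'B': 5.0, 'C': 6.0}}
```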
pivot_table(data, values=None, rows=None, cols=None, aggfunc='mean', fill_value=None, margins=False, dropna=True)

Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame

data : DataFrame
values : column to aggregate, optional
rows : list of column names or arrays to group on

Keys to group on the x-axis of the pivot table
cols : list of column names or arrays to group on
Keys to group on the y-axis of the pivot table
aggfunc : function, default numpy.mean, or list of functions
If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)
fill_value : scalar, default None
Value to replace missing values with
margins : boolean, default False
Add all row / columns (e.g. for subtotal / grand totals)
dropna : boolean, default True
Do not include columns whose entries are all NaN
>>> df
   A   B   C      D
0  foo one small  1
1  foo one large  2
2  foo one large  2
3  foo two small  3
4  foo two small  3
5  bar one large  4
6  bar one small  5
7  bar two small  6
8  bar two large  7
>>> table = pivot_table(df, values='D', rows=['A', 'B'],
...                     cols=['C'], aggfunc=np.sum)
>>> table
          small  large
foo  one  1      4
     two  6      NaN
bar  one  5      4
     two  6      7

table : DataFrame
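For comparison with the pivot_table example, the grouping and summing can be sketched in pure Python. The helper pivot_table_sum is hypothetical and hard-codes the aggregation as a sum (the example's aggfunc=np.sum). A missing key in the result corresponds to a NaN cell in the table above.

```python
from collections import defaultdict


def pivot_table_sum(rows, values, row_keys, col_key):
    """Sum `values` for each (row-key tuple, column-key) group."""
    sums = defaultdict(int)
    for r in rows:
        group = (tuple(r[k] for k in row_keys), r[col_key])
        sums[group] += r[values]
    return dict(sums)


records = [
    ("foo", "one", "small", 1), ("foo", "one", "large", 2), ("foo", "one", "large", 2),
    ("foo", "two", "small", 3), ("foo", "two", "small", 3), ("bar", "one", "large", 4),
    ("bar", "one", "small", 5), ("bar", "two", "small", 6), ("bar", "two", "large", 7),
]
df = [dict(zip("ABCD", rec)) for rec in records]
table = pivot_table_sum(df, "D", ["A", "B"], "C")
print(table[(("foo", "one"), "large")])  # 4
print((("foo", "two"), "large") in table)  # False (NaN in the table above)
```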

plot(frame=None, x=None, y=None, subplots=False, sharex=True, sharey=False, use_index=True, figsize=None, grid=None, legend=True, rot=None, ax=None, style=None, title=None, xlim=None, ylim=None, logx=False, logy=False, xticks=None, yticks=None, kind='line', sort_columns=False, fontsize=None, secondary_y=False, **kwds)

Make line, bar, or scatter plots of DataFrame series with the index on the x-axis using matplotlib / pylab.

frame : DataFrame
x : label or position, default None
y : label or position, default None

Allows plotting of one column versus another
subplots : boolean, default False
Make separate subplots for each time series
sharex : boolean, default True
In case subplots=True, share x axis
sharey : boolean, default False
In case subplots=True, share y axis
use_index : boolean, default True
Use index as ticks for x axis
stacked : boolean, default False
If True, create stacked bar plot. Only valid for DataFrame input
sort_columns : boolean, default False
Sort column names to determine plot ordering
title : string
Title to use for the plot
grid : boolean, default None (matlab style default)
Axis grid lines
legend : boolean, default True
Place legend on axis subplots

ax : matplotlib axis object, default None
style : list or dict

matplotlib line style per column
kind : {‘line’, ‘bar’, ‘barh’, ‘kde’, ‘density’, ‘scatter’}
  • bar : vertical bar plot
  • barh : horizontal bar plot
  • kde/density : Kernel Density Estimation plot
  • scatter : scatter plot
logx : boolean, default False
For line plots, use log scaling on x axis
logy : boolean, default False
For line plots, use log scaling on y axis
xticks : sequence
Values to use for the xticks
yticks : sequence
Values to use for the yticks

xlim : 2-tuple/list
ylim : 2-tuple/list
rot : int, default None

Rotation for ticks
secondary_y : boolean or sequence, default False
Whether to plot on the secondary y-axis. If a list/tuple, which columns to plot on the secondary y-axis
mark_right : boolean, default True
When using a secondary_y axis, should the legend label the axis of the various columns automatically
colormap : str or matplotlib colormap object, default None
Colormap to select colors from. If string, load colormap with that name from matplotlib.
kwds : keywords
Options to pass to matplotlib plotting method

ax_or_axes : matplotlib.AxesSubplot or list of them

pop(item)

Return item and drop from frame. Raise KeyError if not found.

pow(other, axis='columns', level=None, fill_value=None)

Binary operator pow with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

prod(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the product of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

prod : Series or DataFrame (if level specified)

product(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the product of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

prod : Series or DataFrame (if level specified)

quantile(q=0.5, axis=0, numeric_only=True)

Return values at the given quantile over requested axis, a la scoreatpercentile in scipy.stats

q : quantile, default 0.5 (50% quantile)
0 <= q <= 1
axis : {0, 1}
0 for row-wise, 1 for column-wise

quantiles : Series
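The interpolation rule can be sketched as follows, assuming linear interpolation between the two nearest sorted values (the default behaviour of scipy.stats.scoreatpercentile, which the description above cites). The helper works on a single list and is not the pandas implementation.

```python
import math


def quantile(values, q=0.5):
    """q-th quantile of a list, interpolating linearly between closest ranks."""
    if not 0 <= q <= 1:
        raise ValueError("q must satisfy 0 <= q <= 1")
    data = sorted(v for v in values if v is not None)
    if not data:
        return float("nan")
    pos = q * (len(data) - 1)  # fractional position within the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


print(quantile([1, 2, 3, 4]))        # 2.5
print(quantile([1, 2, 3, 4], 0.25))  # 1.75
```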

query(expr, **kwargs)

Query the columns of a frame with a boolean expression.

expr : string
The query string to evaluate. The result of the evaluation of this expression is first passed to loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to __getitem__().
kwargs : dict
See the documentation for eval() for complete details on the keyword arguments accepted by query().

q : DataFrame or Series

This method uses the top-level eval() function to evaluate the passed query.

The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins, ‘and’ and ‘or’. This is syntactically valid Python, however the semantics are different.

You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.

The index and columns attributes of the DataFrame instance are placed in the namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for this variable, and you can also use the name of the index to identify it in a query.

For further details and examples see the query documentation in indexing.

pandas.eval DataFrame.eval

>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.query('a > b')
>>> df[df.a > df.b]  # same result as the previous expression
radd(other, axis='columns', level=None, fill_value=None)

Binary operator radd with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rank(axis=0, numeric_only=None, method='average', na_option='keep', ascending=True)

Compute numerical data ranks (1 through n) along axis. Equal values are assigned a rank that is the average of the ranks of those values

axis : {0, 1}, default 0
Ranks over columns (0) or rows (1)
numeric_only : boolean, default None
Include only float, int, boolean data
method : {‘average’, ‘min’, ‘max’, ‘first’}
  • average: average rank of group
  • min: lowest rank in group
  • max: highest rank in group
  • first: ranks assigned in order they appear in the array
na_option : {‘keep’, ‘top’, ‘bottom’}
  • keep: leave NA values where they are
  • top: smallest rank if ascending
  • bottom: smallest rank if descending
ascending : boolean, default True
False for ranks by high (1) to low (N)

ranks : DataFrame
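The four tie-breaking methods can be compared with a pure-Python sketch for one column. It assumes na_option='keep' (None stays None) and models ascending=False by reversing a stable sort, so 'first' still follows the order of appearance. The function is illustrative, not the pandas code.

```python
def rank(values, method="average", ascending=True):
    """Rank a list 1..n; None stays None (na_option='keep')."""
    if method not in ("average", "min", "max", "first"):
        raise ValueError("unknown method: %r" % method)
    # (value, original position) pairs for non-missing entries, in order of appearance.
    present = [(v, i) for i, v in enumerate(values) if v is not None]
    # Stable sort by value; reverse=True still keeps ties in their original order.
    present.sort(key=lambda p: p[0], reverse=not ascending)
    ranks = [None] * len(values)
    start = 0
    while start < len(present):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(present) and present[end + 1][0] == present[start][0]:
            end += 1
        # The tied group occupies ranks start + 1 .. end + 1.
        for k in range(start, end + 1):
            if method == "average":
                r = (start + end + 2) / 2
            elif method == "min":
                r = start + 1
            elif method == "max":
                r = end + 1
            else:  # "first": order of appearance within the tie
                r = k + 1
            ranks[present[k][1]] = r
        start = end + 1
    return ranks


print(rank([3, 1, 3, None, 2]))  # [3.5, 1.0, 3.5, None, 2.0]
```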

rdiv(other, axis='columns', level=None, fill_value=None)

Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

reindex(index=None, columns=None, **kwargs)

Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False

index, columns : array-like, optional (can be specified in order, or as keywords)
New labels / index to conform to. Preferably an Index object to avoid duplicating data
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed DataFrame:
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill gap
copy : boolean, default True
Return a new object, even if the passed indexes are the same
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any “compatible” value
limit : int, default None
Maximum size gap to forward or backward fill
takeable : boolean, default False
Treat the passed labels as positional values
>>> df.reindex(index=[date1, date2, date3], columns=['A', 'B', 'C'])

reindexed : DataFrame
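The pad and backfill rules can be sketched with a dict keyed by sortable labels standing in for a Series. This is an illustrative model, not pandas' reindex: limit is not modelled, fill_value is applied only when method is None, and a label with no earlier (pad) or later (backfill) old label is left as None.

```python
import bisect


def reindex(series, new_index, method=None, fill_value=None):
    """Conform a {label: value} dict with sortable labels to new_index (sketch)."""
    old = sorted(series)
    out = {}
    for label in new_index:
        if label in series:
            out[label] = series[label]
        elif method in ("pad", "ffill"):
            # Last old label <= label, if any.
            i = bisect.bisect_right(old, label) - 1
            out[label] = series[old[i]] if i >= 0 else None
        elif method in ("backfill", "bfill"):
            # First old label >= label, if any.
            i = bisect.bisect_left(old, label)
            out[label] = series[old[i]] if i < len(old) else None
        else:
            out[label] = fill_value
    return out


s = {1: "a", 3: "b", 5: "c"}
print(reindex(s, [0, 1, 2, 4, 6], method="pad"))
# {0: None, 1: 'a', 2: 'a', 4: 'b', 6: 'c'}
```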

reindex_axis(labels, axis=0, method=None, level=None, copy=True, limit=None, fill_value=nan)

Conform input object to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False

labels : array-like
New labels / index to conform to. Preferably an Index object to avoid duplicating data

axis : {0, 1, ‘index’, ‘columns’}
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None

Method to use for filling holes in reindexed object:
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill gap
copy : boolean, default True
Return a new object, even if the passed indexes are the same
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level
limit : int, default None
Maximum size gap to forward or backward fill
>>> df.reindex_axis(['A', 'B', 'C'], axis=1)

reindex, reindex_like

reindexed : DataFrame

reindex_like(other, method=None, copy=True, limit=None)

Return an object whose indices match those of other

other : Object
method : string or None
copy : boolean, default True
limit : int, default None

Maximum size gap to forward or backward fill
Like calling s.reindex(index=other.index, columns=other.columns, method=...)

reindexed : same as input

rename(index=None, columns=None, **kwargs)

Alter axes input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.

index, columns : dict-like or function, optional
Transformation to apply to that axis values
copy : boolean, default True
Also copy underlying data
inplace : boolean, default False
Whether to return a new DataFrame. If True then value of copy is ignored.

renamed : DataFrame (new object)

rename_axis(mapper, axis=0, copy=True, inplace=False)

Alter index and / or columns using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.

mapper : dict-like or function, optional
axis : int or string, default 0
copy : boolean, default True

Also copy underlying data

inplace : boolean, default False

renamed : type of caller

reorder_levels(order, axis=0)

Rearrange index levels using input order. May not drop or duplicate levels

order : list of int or list of str
List representing new level order. Reference level by number (position) or by key (label).
axis : int
Where to reorder levels.

type of caller (new object)

replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None)

Replace values given in ‘to_replace’ with ‘value’.

to_replace : str, regex, list, dict, Series, numeric, or None

  • str or regex:

    • str: string exactly matching to_replace will be replaced with value
    • regex: regular expressions matching to_replace will be replaced with value
  • list of str, regex, or numeric:

    • First, if to_replace and value are both lists, they must be the same length.
    • Second, if regex=True then all of the strings in both lists will be interpreted as regular expressions; otherwise they will match directly. This doesn’t matter much for value since there are only a few possible substitution regexes you can use.
    • str and regex rules apply as above.
  • dict:

    • Nested dictionaries, e.g., {‘a’: {‘b’: nan}}, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with nan. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.
    • Keys map to column names and values map to substitution values. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
  • None:

    • This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also None then this must be a nested dictionary or Series.

See the examples section for examples of each of these.

value : scalar, dict, list, str, regex, default None
Value to use to fill holes (e.g. 0), alternately a dict of values specifying which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.
inplace : boolean, default False
If True, perform the replacement in place. Note: this will modify any other views on this object (e.g. a column from a DataFrame). Returns the caller if this is True.
limit : int, default None
Maximum size gap to forward or backward fill
regex : bool or same types as to_replace, default False
Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Alternatively, regex can itself be a regular expression or a list, dict, or array of regular expressions, in which case to_replace must be None.
method : string, optional, {‘pad’, ‘ffill’, ‘bfill’}
The method to use for replacement when to_replace is a list.

NDFrame.reindex NDFrame.asfreq NDFrame.fillna

filled : NDFrame

AssertionError
  • If regex is not a bool and to_replace is not None.
TypeError
  • If to_replace is a dict and value is not a list, dict, ndarray, or Series
  • If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.
ValueError
  • If to_replace and value are lists or ndarrays, but they are not the same length.

  • Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.
  • Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.
  • This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
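The note that regular expressions only act on strings can be demonstrated with re.sub directly. The helper below is a hypothetical stand-in for replace(..., regex=True) applied to one column held as a list; it is not the pandas implementation.

```python
import re


def replace_regex(values, pattern, repl):
    """Apply re.sub to string entries only; other entries pass through unchanged."""
    regex = re.compile(pattern)
    return [regex.sub(repl, v) if isinstance(v, str) else v for v in values]


# The float 1.5 is not a string, so the float-matching pattern cannot touch it.
print(replace_regex(["1.5", 1.5, "n/a", None], r"^\d+\.\d+$", "FLOAT"))
# ['FLOAT', 1.5, 'n/a', None]
```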
resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0)

Convenience method for frequency conversion and resampling of regular time-series data.

rule : string
the offset string or object representing target conversion
how : string
Method for down- or re-sampling; defaults to ‘mean’ for downsampling

axis : int, optional, default 0
fill_method : string, default None

fill_method for upsampling
closed : {‘right’, ‘left’}
Which side of bin interval is closed
label : {‘right’, ‘left’}
Which bin edge label to label bucket with

convention : {‘start’, ‘end’, ‘s’, ‘e’}
kind : “period” or “timestamp”
loffset : timedelta

Adjust the resampled time labels
limit : int, default None
Maximum size gap when reindexing with fill_method
base : int, default 0
For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0
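A minimal sketch of downsampling with how='mean', using (timestamp, value) pairs in place of a time-indexed Series. Bins are assumed to be closed on the left, labelled by their left edge and aligned to midnight of the first day; pandas' actual defaults depend on the frequency and on the closed, label and base arguments, and empty bins are simply omitted here. The helper name is hypothetical.

```python
from datetime import datetime, timedelta


def resample_mean(points, width):
    """Downsample (timestamp, value) pairs to the mean of each bin of `width`."""
    origin = min(t for t, _ in points).replace(hour=0, minute=0, second=0, microsecond=0)
    bins = {}
    for t, v in points:
        k = (t - origin) // width  # index of the left-closed bin containing t
        bins.setdefault(origin + k * width, []).append(v)
    return {label: sum(vs) / len(vs) for label, vs in sorted(bins.items())}


pts = [
    (datetime(2014, 1, 1, 0, 0), 1), (datetime(2014, 1, 1, 0, 2), 3),
    (datetime(2014, 1, 1, 0, 5), 10), (datetime(2014, 1, 1, 0, 9), 20),
]
print(resample_mean(pts, timedelta(minutes=5)))
```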
reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')

For DataFrame with multi-level index, return new DataFrame with labeling information in the columns under the index names, defaulting to ‘level_0’, ‘level_1’, etc. if any are None. For a standard index, the index name will be used (if set), otherwise a default ‘index’ or ‘level_0’ (if ‘index’ is already taken) will be used.

level : int, str, tuple, or list, default None
Only remove the given levels from the index. Removes all levels by default
drop : boolean, default False
Do not try to insert index into dataframe columns. This resets the index to the default integer index.
inplace : boolean, default False
Modify the DataFrame in place (do not create a new object)
col_level : int or str, default 0
If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.
col_fill : object, default ‘’
If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.

resetted : DataFrame

rfloordiv(other, axis='columns', level=None, fill_value=None)

Binary operator rfloordiv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rmod(other, axis='columns', level=None, fill_value=None)

Binary operator rmod with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rmul(other, axis='columns', level=None, fill_value=None)

Binary operator rmul with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rpow(other, axis='columns', level=None, fill_value=None)

Binary operator rpow with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rsub(other, axis='columns', level=None, fill_value=None)

Binary operator rsub with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
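The reflected operators swap the operands, so df.rsub(other) computes other - df. A minimal sketch for a scalar other, with a dict standing in for one row and None marking a missing value (assumptions for illustration; the helper is hypothetical):

```python
def rsub(row, other):
    """Scalar case of row.rsub(other): other - value, with None kept as missing."""
    return {k: None if v is None else other - v for k, v in row.items()}


print(rsub({"a": 1, "b": 2.5, "c": None}, 10))  # {'a': 9, 'b': 7.5, 'c': None}
```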

rtruediv(other, axis='columns', level=None, fill_value=None)

Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

save(path)

Deprecated. Use to_pickle instead

select(crit, axis=0)

Return data corresponding to axis labels matching criteria

crit : function
To be called on each index (label). Should return True or False

axis : int

selection : type of caller

set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Set the DataFrame index (row labels) using one or more existing columns. By default yields a new object.

keys : column label or list of column labels / arrays
drop : boolean, default True
Delete columns to be used as the new index
append : boolean, default False
Whether to append columns to existing index
inplace : boolean, default False
Modify the DataFrame in place (do not create a new object)
verify_integrity : boolean, default False
Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method
>>> indexed_df = df.set_index(['A', 'B'])
>>> indexed_df2 = df.set_index(['A', [0, 1, 2, 0, 1, 2]])
>>> indexed_df3 = df.set_index([[0, 1, 2, 0, 1, 2]])

dataframe : DataFrame
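The snippets above rely on an existing df. The self-contained sketch below shows how drop, append and verify_integrity change the result. It assumes a recent pandas release, and the column names and values are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["x", "x", "y"],
    "B": [1, 2, 1],
    "C": [0.1, 0.2, 0.3],
})

# Move A and B into a two-level row index (drop=True removes them as columns)
indexed = df.set_index(["A", "B"])

# drop=False keeps A as a regular column as well
kept = df.set_index("A", drop=False)

# append=True adds B as an extra level next to the existing RangeIndex
appended = df.set_index("B", append=True)

# verify_integrity=True checks for duplicate keys immediately;
# column A contains "x" twice, so the check fails
duplicates_detected = False
try:
    df.set_index("A", verify_integrity=True)
except ValueError:
    duplicates_detected = True
```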

set_value(index, col, value)

Put single value at passed column and index

index : row label
col : column label
value : scalar value

frame : DataFrame
If label pair is contained, will be reference to calling DataFrame, otherwise a new object
shape
shift(periods=1, freq=None, axis=0, **kwds)

Shift index by desired number of periods with an optional time freq

periods : int
Number of periods to move, can be positive or negative
freq : DateOffset, timedelta, or time rule string, optional
Increment to use from datetools module or time rule (e.g. ‘EOM’)

If freq is specified then the index values are shifted but the data is not realigned

shifted : same type as caller
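The difference between shifting the data and shifting the index with freq is easiest to see on a daily DatetimeIndex. This is a minimal sketch that assumes a recent pandas release, where "D" is the daily frequency alias:

```python
import pandas as pd

idx = pd.date_range("2014-01-01", periods=3, freq="D")
df = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=idx)

# Without freq: the index stays put and the data move down one row
data_shifted = df.shift(1)

# With freq: the index labels move forward one day and the data keep their values
index_shifted = df.shift(1, freq="D")
```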

skew(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased skew over requested axis. Normalized by N-1.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

skew : Series or DataFrame (if level specified)

sort(columns=None, column=None, axis=0, ascending=True, inplace=False)

Sort DataFrame either by labels (along either axis) or by the values in column(s)

columns : object
Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
ascending : boolean or list, default True
Sort ascending vs. descending. Specify list for multiple sort orders
axis : {0, 1}
Sort index/rows versus columns
inplace : boolean, default False
Sort the DataFrame without creating a new instance
>>> result = df.sort(['A', 'B'], ascending=[1, 0])

sorted : DataFrame

sort_index(axis=0, by=None, ascending=True, inplace=False, kind='quicksort')

Sort DataFrame either by labels (along either axis) or by the values in a column

axis : {0, 1}
Sort index/rows versus columns
by : object
Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
ascending : boolean or list, default True
Sort ascending vs. descending. Specify list for multiple sort orders
inplace : boolean, default False
Sort the DataFrame without creating a new instance
>>> result = df.sort_index(by=['A', 'B'], ascending=[True, False])

sorted : DataFrame

sortlevel(level=0, axis=0, ascending=True, inplace=False)

Sort multilevel index by chosen axis and primary level. Data will be lexicographically sorted by the chosen level followed by the other levels (in order)

level : int
axis : {0, 1}
ascending : boolean, default True
inplace : boolean, default False
Sort the DataFrame without creating a new instance

sorted : DataFrame

squeeze()

Squeeze length-1 dimensions
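A minimal sketch of what squeezing does to one-column and one-cell frames, assuming a recent pandas release:

```python
import pandas as pd

one_col = pd.DataFrame({"a": [1, 2, 3]})
one_cell = pd.DataFrame({"a": [42]})

# A single-column DataFrame squeezes to a Series ...
as_series = one_col.squeeze()

# ... and a 1x1 DataFrame squeezes all the way down to a scalar
as_scalar = one_cell.squeeze()
```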

stack(level=-1, dropna=True)

Pivot a level of the (possibly hierarchical) column labels, returning a DataFrame (or Series in the case of an object with a single level of column labels) having a hierarchical index with a new inner-most level of row labels.

level : int, string, or list of these, default last level
Level(s) to stack, can pass level name
dropna : boolean, default True
Whether to drop rows in the resulting Frame/Series with no valid values
>>> s
     a   b
one  1.  2.
two  3.  4.
>>> s.stack()
one a    1
    b    2
two a    3
    b    4

stacked : DataFrame or Series
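The example above can be reproduced as a self-contained script. The sketch below assumes a recent pandas release (some versions emit a FutureWarning about the stack implementation, which does not change this result). It also checks that unstack reverses the operation:

```python
import pandas as pd

df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                  index=["one", "two"], columns=["a", "b"])

# Move the column labels into a new inner-most row level
stacked = df.stack()

# unstack() is the inverse operation and recovers the original frame
restored = stacked.unstack()
```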

std(axis=None, skipna=None, level=None, ddof=1, **kwargs)

Return unbiased standard deviation over requested axis. Normalized by N-1 by default; this can be changed with the ddof argument.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

stdev : Series or DataFrame (if level specified)
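The N-1 normalization can be checked by hand on a small column. The squared deviations of [1, 2, 3, 4] from their mean 2.5 sum to 5.0, so the sample standard deviation is sqrt(5 / 3). This is a minimal sketch assuming a recent pandas release:

```python
import math
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

# Default ddof=1: divide the sum of squared deviations (5.0) by N - 1 = 3
sample_std = df["x"].std()

# ddof=0 divides by N = 4 instead (population standard deviation)
population_std = df["x"].std(ddof=0)

# On a DataFrame the result is one value per column (a Series)
per_column = df.std()

print(sample_std, math.sqrt(5 / 3))
```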

sub(other, axis='columns', level=None, fill_value=None)

Binary operator sub with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
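The fill_value rule described above only fills a value that is missing on one side. If both operands are missing at a location, the result stays NaN. This is a minimal sketch assuming a recent pandas release, with made-up data:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"v": [10.0, np.nan, np.nan]}, index=["a", "b", "c"])
right = pd.DataFrame({"v": [1.0, 2.0, np.nan]}, index=["a", "b", "c"])

# Without fill_value, any NaN operand propagates into the result
plain = left.sub(right)

# With fill_value=0, a value missing on only one side is treated as 0;
# row "c" is missing on both sides and therefore stays NaN
filled = left.sub(right, fill_value=0)
```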

subtract(other, axis='columns', level=None, fill_value=None)

Binary operator sub with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

sum(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the sum of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

sum : Series or DataFrame (if level specified)
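A minimal sketch of skipna and axis, assuming a recent pandas release. In recent releases the sum of an all-NA column is 0 rather than NA unless min_count is set, which differs from the older behaviour described above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

col_sums = df.sum()               # NaN skipped by default: a -> 4.0, b -> 15.0
strict = df.sum(skipna=False)     # NaN propagates: a -> NaN
row_sums = df.sum(axis=1)         # one sum per row, across the columns

# Recent pandas: an all-NA sum is 0.0; min_count=1 restores NaN
all_na = pd.Series([np.nan, np.nan]).sum()
all_na_min_count = pd.Series([np.nan, np.nan]).sum(min_count=1)
```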

swapaxes(axis1, axis2, copy=True)

Interchange axes and swap values axes appropriately

y : same as input

swaplevel(i, j, axis=0)

Swap levels i and j in a MultiIndex on a particular axis

i, j : int, string (can be mixed)
Level of index to be swapped. Can pass level name as string.

swapped : type of caller (new object)
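A minimal sketch of swapping the two levels of a row MultiIndex, addressed both by position and by name. It assumes a recent pandas release, and the labels are illustrative:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one")],
    names=["first", "second"],
)
df = pd.DataFrame({"v": [1, 2, 3]}, index=index)

# Levels can be addressed by position ...
swapped = df.swaplevel(0, 1)

# ... or by name; the rows are not re-sorted in either case
swapped_by_name = df.swaplevel("first", "second")
```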

tail(n=5)

Returns last n rows

take(indices, axis=0, convert=True, is_copy=True)

Analogous to ndarray.take

indices : list / array of ints
axis : int, default 0
convert : translate negative to positive indices (default)
is_copy : mark the returned frame as a copy

taken : type of caller
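Like ndarray.take, this method selects by integer position rather than by label. This is a minimal sketch assuming a recent pandas release:

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30]}, index=["x", "y", "z"])

# Positions 2 and 0, in that order, regardless of the index labels
picked = df.take([2, 0])

# Negative positions count from the end
last = df.take([-1])

# axis=1 selects columns by position
first_col = df.take([0], axis=1)
```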

to_clipboard(excel=None, sep=None, **kwargs)

Attempt to write text representation of object to the system clipboard. This can be pasted into Excel, for example.

excel : boolean, defaults to True
If True, use the provided separator, writing in a csv format for allowing easy pasting into excel. If False, write a string representation of the object to the clipboard

sep : optional, defaults to tab
Other keywords are passed to to_csv

Requirements for your platform
  • Linux: xclip, or xsel (with gtk or PyQt4 modules)
  • Windows: none
  • OS X: none
to_csv(path_or_buf, sep=',', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, mode='w', nanRep=None, encoding=None, quoting=None, line_terminator='\n', chunksize=None, tupleize_cols=False, date_format=None, **kwds)

Write DataFrame to a comma-separated values (csv) file

path_or_buf : string or file handle / StringIO
File path
sep : character, default ‘,’
Field delimiter for the output file.
na_rep : string, default ‘’
Missing data representation
float_format : string, default None
Format string for floating point numbers
cols : sequence, optional
Columns to write
header : boolean or list of string, default True
Write out column names. If a list of string is given it is assumed to be aliases for the column names
index : boolean, default True
Write row names (index)
index_label : string or sequence, or False, default None
Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R
nanRep : None
deprecated, use na_rep
mode : str
Python write mode, default ‘w’
encoding : string, optional
a string representing the encoding to use if the contents are non-ascii, for python versions prior to 3
line_terminator : string, default ‘\n’
The newline character or character sequence to use in the output file
quoting : optional constant from csv module
defaults to csv.QUOTE_MINIMAL
chunksize : int or None
rows to write at a time
tupleize_cols : boolean, default False
Write multi_index columns as a list of tuples (if True) or in the new, expanded format (if False)
date_format : string, default None
Format string for datetime objects.
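Writing to an in-memory buffer is a quick way to inspect what sep, index and na_rep produce. This is a minimal sketch assuming a recent pandas release. The lines are compared with splitlines() so the check does not depend on the platform line terminator:

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [0.5, np.nan]})

buf = io.StringIO()
# Semicolon delimiter, no index column, and "NA" written for missing values
df.to_csv(buf, sep=";", index=False, na_rep="NA")

lines = buf.getvalue().splitlines()
print(lines)
```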
to_dense()

Return dense representation of NDFrame (as opposed to sparse)

to_dict(outtype='dict')

Convert DataFrame to dictionary.

outtype : str {‘dict’, ‘list’, ‘series’, ‘records’}
Determines the type of the values of the dictionary. The default dict is a nested dictionary {column -> {index -> value}}. list returns {column -> list(values)}. series returns {column -> Series(values)}. records returns [{columns -> value}]. Abbreviations are allowed.

result : dict like {column -> {index -> value}}
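A minimal sketch of the main orientations. The orientation is passed positionally because the keyword is called outtype in the release documented here but orient in recent pandas, which this sketch assumes:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

nested = df.to_dict()               # {column -> {index -> value}}
as_lists = df.to_dict("list")       # {column -> [values]}
as_records = df.to_dict("records")  # [{column -> value}, ...]
```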

to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True)

Write DataFrame to an Excel sheet

excel_writer : string or ExcelWriter object
File path or existing ExcelWriter
sheet_name : string, default ‘Sheet1’
Name of sheet which will contain DataFrame
na_rep : string, default ‘’
Missing data representation
float_format : string, default None
Format string for floating point numbers
cols : sequence, optional
Columns to write
header : boolean or list of string, default True
Write out column names. If a list of string is given it is assumed to be aliases for the column names
index : boolean, default True
Write row names (index)
index_label : string or sequence, default None
Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
startrow :
upper left cell row to dump data frame
startcol :
upper left cell column to dump data frame
engine : string, default None
write engine to use - you can also set this via the options io.excel.xlsx.writer, io.excel.xls.writer, and io.excel.xlsm.writer.
merge_cells : boolean, default True
Write MultiIndex and Hierarchical Rows as merged cells.

If passing an existing ExcelWriter object, then the sheet will be added to the existing workbook. This can be used to save different DataFrames to one workbook:

>>> writer = ExcelWriter('output.xlsx')
>>> df1.to_excel(writer,'Sheet1')
>>> df2.to_excel(writer,'Sheet2')
>>> writer.save()
to_gbq(destination_table, schema=None, col_order=None, if_exists='fail', **kwargs)

Write a DataFrame to a Google BigQuery table.

If the table does not exist, a new table will be created, in which case the schema has to be specified. If the table already exists, the behaviour is controlled by if_exists. By default, rows will be written in the order they appear in the DataFrame, though the user may specify an alternative order.

destination_table : string
name of table to be written, in the form ‘dataset.tablename’
schema : sequence (optional)
list of column types in order for data to be inserted, e.g. [‘INTEGER’, ‘TIMESTAMP’, ‘BOOLEAN’]
col_order : sequence (optional)
order which columns are to be inserted, e.g. [‘primary_key’, ‘birthday’, ‘username’]
if_exists : {‘fail’, ‘replace’, ‘append’} (optional)
  • fail: If table exists, do nothing.
  • replace: If table exists, drop it, recreate it, and insert data.
  • append: If table exists, insert data. Create if does not exist.

kwargs are passed to the Client constructor

SchemaMissing :
Raised if the ‘if_exists’ parameter is set to ‘replace’, but no schema is specified
TableExists :
Raised if the specified ‘destination_table’ exists but the ‘if_exists’ parameter is set to ‘fail’ (the default)
InvalidSchema :
Raised if the ‘schema’ parameter does not match the provided DataFrame
to_hdf(path_or_buf, key, **kwargs)

Write the contained data to an HDF5 file using HDFStore

path_or_buf : the path (string) or buffer to put the store
key : string
Identifier for the group in the store

mode : optional, {‘a’, ‘w’, ‘r’, ‘r+’}, default ‘a’

'r'
Read-only; no data can be modified.
'w'
Write; a new file is created (an existing file with the same name would be deleted).
'a'
Append; an existing file is opened for reading and writing, and if the file does not exist it is created.
'r+'
It is similar to 'a', but the file must already exist.
format : ‘fixed(f)|table(t)’, default is ‘fixed’
fixed(f) : Fixed format
Fast writing/reading. Not-appendable, nor searchable
table(t) : Table format
Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data
append : boolean, default False
For Table formats, append the input data to the existing table
complevel : int, 1-9, default 0
If a complib is specified compression will be applied where possible
complib : {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’, None}, default None
If complevel is > 0 apply compression to objects written in the store wherever possible
fletcher32 : bool, default False
If applying compression use the fletcher32 checksum
to_html(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, force_unicode=None, bold_rows=True, classes=None, escape=True, max_rows=None, max_cols=None, show_dimensions=False)

Render a DataFrame as an HTML table.

to_html-specific options:

bold_rows : boolean, default True
Make the row labels bold in the output
classes : str or list or tuple, default None
CSS class(es) to apply to the resulting html table
escape : boolean, default True
Convert the characters <, >, and & to HTML-safe sequences.
max_rows : int, optional
Maximum number of rows to show before truncating. If None, show all.
max_cols : int, optional
Maximum number of columns to show before truncating. If None, show all.
frame : DataFrame
object to render
buf : StringIO-like, optional
buffer to write to
columns : sequence, optional
the subset of columns to write; default None writes all columns
col_space : int, optional
the minimum width of each column
header : bool, optional
whether to print column labels, default True
index : bool, optional
whether to print index (row) labels, default True
na_rep : string, optional
string representation of NAN to use, default ‘NaN’
formatters : list or dict of one-parameter functions, optional
formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
float_format : one-parameter function, optional
formatter function to apply to columns’ elements if they are floats default None
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
justify : {‘left’, ‘right’}, default None
Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
index_names : bool, optional
Prints the names of the indexes, default True
force_unicode : bool, default False
Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.

formatted : string (or unicode, depending on data and options)
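A minimal sketch of the escape option, assuming a recent pandas release. When path_or_buf/buf is omitted, the HTML is returned as a string:

```python
import pandas as pd

df = pd.DataFrame({"tag": ["<b>bold</b>"]})

# escape=True (the default) turns <, > and & into HTML entities
escaped = df.to_html()

# escape=False passes the raw markup through unchanged
raw = df.to_html(escape=False)
```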

to_json(path_or_buf=None, orient=None, date_format='epoch', double_precision=10, force_ascii=True, date_unit='ms', default_handler=None)

Convert the object to a JSON string.

Note: NaN and None will be converted to null and datetime objects will be converted to UNIX timestamps.

path_or_buf : the path or buffer to write the result string
If this is None, return the converted string

orient : string

  • Series
    • default is ‘index’
    • allowed values are: {‘split’,’records’,’index’}
  • DataFrame
    • default is ‘columns’
    • allowed values are: {‘split’,’records’,’index’,’columns’,’values’}
  • The format of the JSON string
    • split : dict like {index -> [index], columns -> [columns], data -> [values]}
    • records : list like [{column -> value}, ... , {column -> value}]
    • index : dict like {index -> {column -> value}}
    • columns : dict like {column -> {index -> value}}
    • values : just the values array
date_format : {‘epoch’, ‘iso’}
Type of date conversion. epoch = epoch milliseconds, iso = ISO8601, default is epoch.
double_precision : int, default 10
The number of decimal places to use when encoding floating point values.
force_ascii : boolean, default True
Force the encoded string to be ASCII.
date_unit : string, default ‘ms’ (milliseconds)
The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.
default_handler : callable, default None
Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serialisable object.

result : string if path_or_buf is None, otherwise None
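The orient values map onto JSON structures as listed above. This minimal sketch assumes a recent pandas release and parses the output back with the standard json module, so the comparison does not depend on key order or whitespace:

```python
import json

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]}, index=["r1", "r2"])

# With path_or_buf omitted, the JSON text is returned as a string
records = df.to_json(orient="records")
split = df.to_json(orient="split")

print(records)
```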

to_latex(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=True, force_unicode=None)

Render a DataFrame to a tabular environment table. You can splice this into a LaTeX document.

to_latex-specific options:

bold_rows : boolean, default True
Make the row labels bold in the output
frame : DataFrame
object to render
buf : StringIO-like, optional
buffer to write to
columns : sequence, optional
the subset of columns to write; default None writes all columns
col_space : int, optional
the minimum width of each column
header : bool, optional
whether to print column labels, default True
index : bool, optional
whether to print index (row) labels, default True
na_rep : string, optional
string representation of NAN to use, default ‘NaN’
formatters : list or dict of one-parameter functions, optional
formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
float_format : one-parameter function, optional
formatter function to apply to columns’ elements if they are floats default None
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
justify : {‘left’, ‘right’}, default None
Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
index_names : bool, optional
Prints the names of the indexes, default True
force_unicode : bool, default False
Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.

formatted : string (or unicode, depending on data and options)

to_msgpack(path_or_buf=None, **kwargs)

msgpack (serialize) object to input file path

THIS IS AN EXPERIMENTAL LIBRARY and the storage format may not be stable until a future release.

path_or_buf : string File path, buffer-like, or None
If None, return generated string
append : boolean, default False
Whether to append to an existing msgpack
compress : type of compressor (zlib or blosc), default None (no compression)
to_panel()

Transform long (stacked) format (DataFrame) into wide (3D, Panel) format.

Currently the index of the DataFrame must be a 2-level MultiIndex. This may be generalized later

panel : Panel

to_period(freq=None, axis=0, copy=True)

Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed)

freq : string, default None
axis : {0, 1}, default 0
The axis to convert (the index by default)
copy : boolean, default True
If False then underlying input data is not copied

ts : TimeSeries with PeriodIndex

to_pickle(path)

Pickle (serialize) object to input file path

path : string
File path
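A minimal round-trip sketch, assuming a recent pandas release. The frame is written to a temporary file and read back with pandas.read_pickle:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame.pkl")
    df.to_pickle(path)               # serialize to disk
    restored = pd.read_pickle(path)  # and load it back
```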
to_records(index=True, convert_datetime64=True)

Convert DataFrame to record array. Index will be put in the ‘index’ field of the record array if requested

index : boolean, default True
Include index in resulting record array, stored in ‘index’ field
convert_datetime64 : boolean, default True
Whether to convert the index to datetime.datetime if it is a DatetimeIndex

y : recarray
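A minimal sketch of the index option, assuming a recent pandas release. The convert_datetime64 argument documented above is no longer available there, so it is not used:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]}, index=["x", "y"])

# The unnamed index ends up in a field called "index"
with_index = df.to_records()

# index=False keeps only the data columns
without_index = df.to_records(index=False)
```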

to_sparse(fill_value=None, kind='block')

Convert to SparseDataFrame

fill_value : float, default NaN
kind : {‘block’, ‘integer’}

y : SparseDataFrame

to_sql(name, con, flavor='sqlite', if_exists='fail', **kwargs)

Write records stored in a DataFrame to a SQL database.

name : str
Name of SQL table

con : an open SQL database connection object
flavor : {‘sqlite’, ‘mysql’, ‘oracle’}, default ‘sqlite’
if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’

  • fail: If table exists, do nothing.
  • replace: If table exists, drop it, recreate it, and insert data.
  • append: If table exists, insert data. Create if does not exist.
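A minimal sketch using an in-memory SQLite database from the standard library, assuming a recent pandas release. Two differences from the text above apply there: the flavor argument is gone, and with the default if_exists='fail' writing to an existing table raises a ValueError instead of silently doing nothing. The table name and data are hypothetical:

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"name": ["p1", "p2"], "score": [0.9, 0.4]})

con = sqlite3.connect(":memory:")
df.to_sql("scores", con, index=False)                      # table is created
df.to_sql("scores", con, index=False, if_exists="append")  # rows are inserted again

# With the default if_exists="fail", writing to the existing table is refused
refused = False
try:
    df.to_sql("scores", con, index=False)
except ValueError:
    refused = True

count = con.execute("SELECT COUNT(*) FROM scores").fetchone()[0]
con.close()
```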
to_stata(fname, convert_dates=None, write_index=True, encoding='latin-1', byteorder=None)

Write the DataFrame to a Stata binary dta file (via the StataWriter class)

fname : file path or buffer
Where to save the dta file.
convert_dates : dict
Dictionary mapping column of datetime types to the stata internal format that you want to use for the dates. Options are ‘tc’, ‘td’, ‘tm’, ‘tw’, ‘th’, ‘tq’, ‘ty’. Column can be either a number or a name.
encoding : str
Default is latin-1. Note that Stata does not support unicode.
byteorder : str
Can be “>”, “<”, “little”, or “big”. The default is None which uses sys.byteorder
>>> writer = StataWriter('./data_file.dta', data)
>>> writer.write_file()

Or with dates

>>> writer = StataWriter('./date_data_file.dta', data, {2 : 'tw'})
>>> writer.write_file()
to_string(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, nanRep=None, index_names=True, justify=None, force_unicode=None, line_width=None, max_rows=None, max_cols=None, show_dimensions=False)

Render a DataFrame to a console-friendly tabular output.

frame : DataFrame
object to render
buf : StringIO-like, optional
buffer to write to
columns : sequence, optional
the subset of columns to write; default None writes all columns
col_space : int, optional
the minimum width of each column
header : bool, optional
whether to print column labels, default True
index : bool, optional
whether to print index (row) labels, default True
na_rep : string, optional
string representation of NAN to use, default ‘NaN’
formatters : list or dict of one-parameter functions, optional
formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
float_format : one-parameter function, optional
formatter function to apply to columns’ elements if they are floats default None
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
justify : {‘left’, ‘right’}, default None
Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
index_names : bool, optional
Prints the names of the indexes, default True
force_unicode : bool, default False
Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.

formatted : string (or unicode, depending on data and options)

to_timestamp(freq=None, how='start', axis=0, copy=True)

Cast to DatetimeIndex of timestamps, at beginning of period

freq : string, default frequency of PeriodIndex
Desired frequency
how : {‘s’, ‘e’, ‘start’, ‘end’}
Convention for converting period to timestamp; start of period vs. end
axis : {0, 1} default 0
The axis to convert (the index by default)
copy : boolean, default True
If False then underlying input data is not copied

df : DataFrame with DatetimeIndex

to_wide(*args, **kwargs)
transpose()

Transpose index and columns

truediv(other, axis='columns', level=None, fill_value=None)

Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

truncate(before=None, after=None, axis=None, copy=True)

Truncates a sorted NDFrame before and/or after some particular dates.

before : date
Truncate before date
after : date
Truncate after date

axis : the truncation axis, defaults to the stat axis
copy : boolean, default is True
Return a copy of the truncated section

truncated : type of caller
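The bounds are inclusive and refer to index labels, so the index must be sorted. This is a minimal sketch assuming a recent pandas release, using an integer index for brevity (dates work the same way on a sorted DatetimeIndex):

```python
import pandas as pd

df = pd.DataFrame({"v": range(5)}, index=[10, 20, 30, 40, 50])

# Keep rows whose index label lies between 20 and 40, inclusive
middle = df.truncate(before=20, after=40)
```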

tshift(periods=1, freq=None, axis=0, **kwds)

Shift the time index, using the index’s frequency if available

periods : int
Number of periods to move, can be positive or negative
freq : DateOffset, timedelta, or time rule string, default None
Increment to use from datetools module or time rule (e.g. ‘EOM’)
axis : int or basestring
Corresponds to the axis that contains the Index

If freq is not specified then tries to use the freq or inferred_freq attributes of the index. If neither of those attributes exist, a ValueError is raised

shifted : NDFrame

tz_convert(tz, axis=0, copy=True)

Convert TimeSeries to target time zone. If it is time zone naive, it will be localized to the passed time zone.

tz : string or pytz.timezone object
copy : boolean, default True
Also make a copy of the underlying data
tz_localize(tz, axis=0, copy=True, infer_dst=False)

Localize tz-naive TimeSeries to target time zone

tz : string or pytz.timezone object
copy : boolean, default True
Also make a copy of the underlying data
infer_dst : boolean, default False
Attempt to infer fall dst-transition times based on order
unstack(level=-1)

Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels. If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex)

level : int, string, or list of these, default -1 (last level)
Level(s) of index to unstack, can pass level name

DataFrame.pivot : Pivot a table based on column values.
DataFrame.stack : Pivot a level of the column labels (inverse operation from unstack).
>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
...                                    ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one  a    1.0
     b    2.0
two  a    3.0
     b    4.0
dtype: float64
>>> s.unstack(level=-1)
       a    b
one  1.0  2.0
two  3.0  4.0
>>> s.unstack(level=0)
   one  two
a  1.0  3.0
b  2.0  4.0
>>> df = s.unstack(level=0)
>>> df.unstack()
one  a    1.0
     b    2.0
two  a    3.0
     b    4.0
dtype: float64

unstacked : DataFrame or Series

update(other, join='left', overwrite=True, filter_func=None, raise_conflict=False)

Modify DataFrame in place using non-NA values from passed DataFrame. Aligns on indices

other : DataFrame, or object coercible into a DataFrame
join : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
overwrite : boolean, default True
If True then overwrite values for common keys in the calling frame
filter_func : callable(1d-array) -> 1d-array<boolean>, default None
Can choose to replace values other than NA. Return True for values that should be updated
raise_conflict : boolean
If True, will raise an error if the DataFrame and other both contain data in the same place.
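A minimal sketch of the in-place behaviour, assuming a recent pandas release. The method returns None, and NaN entries in the other frame leave the original values untouched:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [400.0, 500.0, 600.0]})
new = pd.DataFrame({"b": [4.0, np.nan, 6.0]})

# Modifies df in place; column "a" is absent from `new` and is left alone
result = df.update(new)
```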
values

Numpy representation of NDFrame

var(axis=None, skipna=None, level=None, ddof=1, **kwargs)

Return unbiased variance over requested axis. Normalized by N-1 by default; this can be changed with the ddof argument.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

variance : Series or DataFrame (if level specified)

where(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)

Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.

cond : boolean NDFrame or array
other : scalar or NDFrame
inplace : boolean, default False
Whether to perform the operation in place on the data
axis : alignment axis if needed, default None
level : alignment level if needed, default None
try_cast : boolean, default False
Try to cast the result back to the input type (if possible)
raise_on_error : boolean, default True
Whether to raise on invalid data types (e.g. trying to where on strings)

wh : same type as caller
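A minimal sketch of where with and without other, assuming a recent pandas release and a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [-1, 2, -3], "b": [4, -5, 6]})

# Keep entries where the condition holds; elsewhere take the value of `other`
clipped = df.where(df > 0, 0)

# Without `other`, the replaced entries become NaN
masked = df.where(df > 0)
```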

xs(key, axis=0, level=None, copy=True, drop_level=True)

Returns a cross-section (row(s) or column(s)) from the Series/DataFrame. Defaults to cross-section on the rows (axis=0).

key : object
Some label contained in the index, or partially in a MultiIndex
axis : int, default 0
Axis to retrieve cross-section on
level : object, defaults to first n levels (n=1 or len(key))
In case of a key partially contained in a MultiIndex, indicate which levels are used. Levels can be referred by label or position.
copy : boolean, default True
Whether to make a copy of the data
drop_level : boolean, default True
If False, returns object with same levels as self.
>>> df
   A  B  C
a  4  5  2
b  4  0  9
c  9  7  3
>>> df.xs('a')
A    4
B    5
C    2
Name: a
>>> df.xs('C', axis=1)
a    2
b    9
c    3
Name: C
>>> s = df.xs('a', copy=False)
>>> s['A'] = 100
>>> df
     A  B  C
a  100  5  2
b    4  0  9
c    9  7  3
>>> df
                    A  B  C  D
first second third
bar   one    1      4  1  8  9
      two    1      7  5  5  0
baz   one    1      6  6  8  0
      three  2      5  3  5  3
>>> df.xs(('baz', 'three'))
       A  B  C  D
third
2      5  3  5  3
>>> df.xs('one', level=1)
             A  B  C  D
first third
bar   1      4  1  8  9
baz   1      6  6  8  0
>>> df.xs(('baz', 2), level=[0, 'third'])
        A  B  C  D
second
three   5  3  5  3

xs : Series or DataFrame

class Fred2.Core.Result.CleavageSitePredictionResult(data=None, index=None, columns=None, dtype=None, copy=False)

Bases: Fred2.Core.Result.AResult

A CleavageSitePredictionResult object is a pandas.DataFrame with multi-indexing. The column IDs are the prediction scores of the different prediction methods, plus the amino acid at a specific position. The row ID is the Protein ID together with the position in the sequence (starting at 0).

CleavageSitePredictionResult:

ID          Pos  Seq  Method_name
protein_ID  0    S    0.56
            1    Y    15
            2    F    0.36
            3    P    10
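The layout in the table above can be mimicked with a plain pandas.DataFrame. This sketch shows only the shape of the data; it does not use Fred2's own constructor. The protein ID, the method column name and the scores are the placeholder values from the table.

```python
import pandas as pd

# Row index: (protein ID, position), mirroring the ID/Pos levels of the table.
index = pd.MultiIndex.from_tuples(
    [("protein_ID", 0), ("protein_ID", 1), ("protein_ID", 2), ("protein_ID", 3)],
    names=["ID", "Pos"],
)

# Columns: the amino acid at each position plus one score column per method.
result = pd.DataFrame(
    {"Seq": ["S", "Y", "F", "P"], "Method_name": [0.56, 15, 0.36, 10]},
    index=index,
)

# All positions of one protein can be selected with xs() on the ID level.
protein_rows = result.xs("protein_ID", level="ID")
print(protein_rows)
```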
T

Transpose index and columns

abs()

Return an object with absolute value taken. Only applicable to objects that are all numeric

abs: type of caller

add(other, axis='columns', level=None, fill_value=None)

Binary operator add with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
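The following sketch shows how fill_value changes add(), using two hypothetical frames. With the plain + operator, any label missing on either side gives NaN. With fill_value=0, a value missing on only one side is treated as 0. Locations missing on both sides still give NaN.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"x": [1.0, np.nan, 3.0]}, index=["a", "b", "c"])
right = pd.DataFrame({"x": [10.0, 20.0, np.nan]}, index=["a", "b", "d"])

plain = left + right                    # NaN wherever either side is missing
filled = left.add(right, fill_value=0)  # 0 substituted when only one side is missing

print(filled)
```

Both results are indexed by the union a, b, c, d. In filled, only label d stays NaN because both of its inputs are missing.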

add_prefix(prefix)

Concatenate prefix string with panel items names.

prefix : string

with_prefix : type of caller

add_suffix(suffix)

Concatenate suffix string with panel items names

suffix : string

with_suffix : type of caller

align(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0)

Align two objects on their axes with the specified join method for each axis Index

other : DataFrame or Series
join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
axis : allowed axis of the other object, default None

Align on index (0), columns (1), or both (None)
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level
copy : boolean, default True
Always returns new objects. If copy=False and no reindexing is required then original objects are returned.
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any “compatible” value

method : str, default None
limit : int, default None
fill_axis : {0, 1}, default 0

Filling axis, method and limit
(left, right) : (type of input, type of other)
Aligned objects
all(axis=None, bool_only=None, skipna=True, level=None, **kwargs)

Return whether all elements are True over requested axis.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
bool_only : boolean, default None
Only include boolean data.

all : Series (or DataFrame if level specified)

any(axis=None, bool_only=None, skipna=True, level=None, **kwargs)

Return whether any element is True over requested axis.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
bool_only : boolean, default None
Only include boolean data.

any : Series (or DataFrame if level specified)

append(other, ignore_index=False, verify_integrity=False)

Append rows of other to the end of this frame, returning a new object. Columns not in this frame are added as new columns.

other : DataFrame or list of Series/dict-like objects
ignore_index : boolean, default False

If True do not use the index labels. Useful for gluing together record arrays
verify_integrity : boolean, default False
If True, raise ValueError on creating index with duplicates

If a list of dict is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged

appended : DataFrame

apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)

Applies function along input axis of DataFrame.

Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty.

func : function
Function to apply to each column/row
axis : {0, 1}
  • 0 : apply function to each column
  • 1 : apply function to each row
broadcast : boolean, default False
For aggregation functions, return object of same size with values propagated
reduce : boolean or None, default None
Try to apply reduction procedures. If the DataFrame is empty, apply will use reduce to determine whether the result should be a Series or a DataFrame. If reduce is None (the default), apply's return value will be guessed by calling func on an empty Series (note: while guessing, exceptions raised by func will be ignored). If reduce is True a Series will always be returned, and if False a DataFrame will always be returned.
raw : boolean, default False
If False, convert each row or column into a Series. If raw=True the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance
args : tuple
Positional arguments to pass to function in addition to the array/series

Additional keyword arguments will be passed as keywords to the function

>>> import numpy
>>> df.apply(numpy.sqrt) # returns DataFrame
>>> df.apply(numpy.sum, axis=0) # equiv to df.sum(0)
>>> df.apply(numpy.sum, axis=1) # equiv to df.sum(1)

DataFrame.applymap: For elementwise operations

applied : Series or DataFrame

applymap(func)

Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame

func : function
Python function, returns a single value from a single value

applied : DataFrame

DataFrame.apply : For operations on rows/columns

as_blocks(columns=None)

Convert the frame to a dict of dtype -> Constructor Types, each of which has a homogeneous dtype.

Columns are presented in sorted order unless a specific list of columns is provided.

NOTE: the dtypes of the blocks WILL BE PRESERVED HERE (unlike in as_matrix)
columns : array-like
Specific column order

values : a list of Object

as_matrix(columns=None)

Convert the frame to its Numpy-array matrix representation. Columns are presented in sorted order unless a specific list of columns is provided.

NOTE: the dtype will be a lower-common-denominator dtype (implicit upcasting). If the dtypes (even of numeric types) are mixed, the one that accommodates all of them will be chosen. Use this with care if you are not dealing with the blocks.

e.g. if the dtypes are float16 and float32 -> float32; float16, float32 and float64 -> float64; int32 and uint8 -> int32
values : ndarray
If the caller is heterogeneous and contains booleans or objects, the result will be of dtype=object
asfreq(freq, method=None, how=None, normalize=False)

Convert all TimeSeries inside to specified frequency using DateOffset objects. Optionally provide fill method to pad/backfill missing values.

freq : DateOffset object, or string
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}

Method to use for filling holes in reindexed Series. pad / ffill: propagate last valid observation forward to next valid; backfill / bfill: use NEXT valid observation to fill gap
how : {‘start’, ‘end’}, default end
For PeriodIndex only, see PeriodIndex.asfreq
normalize : bool, default False
Whether to reset output index to midnight

converted : type of caller

astype(dtype, copy=True, raise_on_error=True)

Cast object to input numpy.dtype. Return a copy when copy = True (be really careful with this!)

dtype : numpy.dtype or Python type
raise_on_error : raise on invalid input

casted : type of caller

at
at_time(time, asof=False)

Select values at particular time of day (e.g. 9:30AM)

time : datetime.time or string

values_at_time : type of caller

axes
between_time(start_time, end_time, include_start=True, include_end=True)

Select values between particular times of the day (e.g., 9:00-9:30 AM)

start_time : datetime.time or string
end_time : datetime.time or string
include_start : boolean, default True
include_end : boolean, default True

values_between_time : type of caller

bfill(axis=0, inplace=False, limit=None, downcast=None)

Synonym for NDFrame.fillna(method='bfill')

blocks

Internal property; synonym for as_blocks()

bool()

Return the bool of a single element PandasObject. This must be a boolean scalar value, either True or False

Raise a ValueError if the PandasObject does not have exactly 1 element, or that element is not boolean

boxplot(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, **kwds)

Make a box plot from DataFrame column/columns optionally grouped (stratified) by one or more columns

data : DataFrame
column : column names or list of names, or vector

Can be any valid input to groupby
by : string or sequence
Column in the DataFrame to group by

ax : matplotlib axis object, default None
fontsize : int or string
rot : int, default 0

Rotation for ticks
grid : boolean, default True
Axis grid lines

ax : matplotlib.axes.AxesSubplot

clip(lower=None, upper=None, out=None)

Trim values at input threshold(s)

lower : float, default None
upper : float, default None

clipped : Series
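Here is a minimal sketch of clip() on a hypothetical score column that bounds every value to the interval [0, 1]. Keyword arguments are used so the call does not depend on argument order.

```python
import pandas as pd

df = pd.DataFrame({"score": [-1.5, 0.2, 7.0]})

# Values below 0 become 0, values above 1 become 1, the rest are unchanged.
bounded = df.clip(lower=0, upper=1)
print(bounded)
```

clip_lower(t) and clip_upper(t) correspond to passing only lower=t or only upper=t.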

clip_lower(threshold)

Return copy of the input with values below given value truncated

clip

clipped : same type as input

clip_upper(threshold)

Return copy of input with values above given value truncated

clip

clipped : same type as input

combine(other, func, fill_value=None, overwrite=True)

Combine two DataFrame objects column by column, using func to combine each pair of matching columns. If fill_value is given, missing values are filled with it before func is applied

other : DataFrame
func : function
fill_value : scalar value
overwrite : boolean, default True

If True then overwrite values for common keys in the calling frame

result : DataFrame

combineAdd(other)

Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)

other : DataFrame

DataFrame

combineMult(other)

Multiply two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)

other : DataFrame

DataFrame

combine_first(other)

Combine two DataFrame objects and default to non-null values in the frame calling the method. The resulting index and columns will be the union of the respective indexes and columns

other : DataFrame

a’s values prioritized, use values from b to fill holes:

>>> a.combine_first(b)

combined : DataFrame
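Here is a small sketch of combine_first() with two hypothetical frames a and b. Values from a take priority wherever they are non-null, and holes are filled from b. The result spans the union of both indexes and both column sets.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"x": [1.0, np.nan], "y": [np.nan, 4.0]}, index=[0, 1])
b = pd.DataFrame({"x": [10.0, 20.0], "z": [5.0, 6.0]}, index=[1, 2])

# a's non-null values take priority; b fills the holes.
combined = a.combine_first(b)
print(combined)
```

Some cells stay NaN, for example row 0 of column z, because neither frame has a value there.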

compound(axis=None, skipna=None, level=None, **kwargs)

Return the compound percentage of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

compounded : Series or DataFrame (if level specified)

consolidate(inplace=False)

Compute NDFrame with “consolidated” internals (data of each dtype grouped together in a single ndarray). Mainly an internal API function, but available here to the savvy user

inplace : boolean, default False
If False return new object, otherwise modify existing object

consolidated : type of caller

convert_objects(convert_dates=True, convert_numeric=False, convert_timedeltas=True, copy=True)

Attempt to infer better dtype for object columns

convert_dates : if True, attempt to soft convert dates; if ‘coerce’, force conversion (and non-convertibles get NaT)
convert_numeric : if True, attempt to coerce to numbers (including strings); non-convertibles get NaN
convert_timedeltas : if True, attempt to soft convert timedeltas; if ‘coerce’, force conversion (and non-convertibles get NaT)

copy : Boolean, if True, return copy, default is True

converted : same as input object

copy(deep=True)

Make a copy of this object

deep : boolean, default True
Make a deep copy, i.e. also copy data

copy : type of caller

corr(method='pearson', min_periods=1)

Compute pairwise correlation of columns, excluding NA/null values

method : {‘pearson’, ‘kendall’, ‘spearman’}
  • pearson : standard correlation coefficient
  • kendall : Kendall Tau correlation coefficient
  • spearman : Spearman rank correlation
min_periods : int, optional
Minimum number of observations required per pair of columns to have a valid result. Currently only available for pearson and spearman correlation

y : DataFrame
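Here is a quick sketch of corr() on hypothetical columns. Column b is a linear function of a, and c reverses a's order. Both the Pearson and the Spearman coefficients therefore come out at +1 for (a, b) and -1 for (a, c), up to floating-point rounding.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]})

pearson = df.corr()                    # default method='pearson'
spearman = df.corr(method="spearman")  # rank-based correlation

print(pearson.round(3))
```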

corrwith(other, axis=0, drop=False)

Compute pairwise correlation between rows or columns of two DataFrame objects.

other : DataFrame
axis : {0, 1}

0 to compute column-wise, 1 for row-wise
drop : boolean, default False
Drop missing indices from result, default returns union of all

correls : Series

count(axis=0, level=None, numeric_only=False)

Return Series with number of non-NA/null observations over requested axis. Works with non-floating point data as well (detects NaN and None)

axis : {0, 1}
0 for row-wise, 1 for column-wise
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
numeric_only : boolean, default False
Include only float, int, boolean data

count : Series (or DataFrame if level specified)

cov(min_periods=None)

Compute pairwise covariance of columns, excluding NA/null values

min_periods : int, optional
Minimum number of observations required per pair of columns to have a valid result.

y : DataFrame

y contains the covariance matrix of the DataFrame’s time series. The covariance is normalized by N-1 (unbiased estimator).

cummax(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative max over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA

max : Series

cummin(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative min over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA

min : Series

cumprod(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative prod over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA

prod : Series

cumsum(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative sum over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA

sum : Series
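Here is a minimal sketch of cumsum() with the default skipna=True on a hypothetical frame. The NaN position stays NaN in the output, and the running total continues past it. cummax, cummin and cumprod treat NaN the same way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, np.nan, 5]})

running = df.cumsum()  # column-wise running totals (axis=0)
print(running)
```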

delevel(*args, **kwargs)
describe(percentile_width=50)

Generate various summary statistics of each column, excluding NaN values. These include: count, mean, std, min, max, and lower%/50%/upper% percentiles

percentile_width : float, optional
width of the desired uncertainty interval, default is 50, which corresponds to lower=25, upper=75

DataFrame of summary statistics

diff(periods=1)

First discrete difference of the object

periods : int, default 1
Periods to shift for forming difference

diffed : DataFrame
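Here is a short sketch of diff() on a hypothetical column of squares. The first periods rows have no earlier row to subtract, so they become NaN.

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 4, 9, 16]})

step = df.diff()               # v[i] - v[i-1]; the first row becomes NaN
two_step = df.diff(periods=2)  # v[i] - v[i-2]; the first two rows become NaN
print(step)
```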

div(other, axis='columns', level=None, fill_value=None)

Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

divide(other, axis='columns', level=None, fill_value=None)

Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

dot(other)

Matrix multiplication with DataFrame or Series objects

other : DataFrame or Series

dot_product : DataFrame or Series

drop(labels, axis=0, level=None, inplace=False, **kwargs)

Return new object with labels in requested axis removed

labels : single label or list-like
axis : int or axis name
level : int or name, default None

For MultiIndex
inplace : bool, default False
If True, do operation inplace and return None.

dropped : type of caller

drop_duplicates(cols=None, take_last=False, inplace=False)

Return DataFrame with duplicate rows removed, optionally only considering certain columns

cols : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns
take_last : boolean, default False
Take the last observed row among duplicates. Defaults to the first row
inplace : boolean, default False
Whether to drop duplicates in place or to return a copy

deduplicated : DataFrame

dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Return object with labels on given axis omitted where alternately any or all of the data are missing

axis : {0, 1}, or tuple/list thereof
Pass tuple or list to drop on multiple axes
how : {‘any’, ‘all’}
  • any : if any NA values are present, drop that label
  • all : if all values are NA, drop that label
thresh : int, default None
int value : require that many non-NA values
subset : array-like
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include
inplace : boolean, default False
If True, do operation inplace and return None.

dropped : DataFrame
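Here is a minimal sketch contrasting the how and thresh options of dropna() on a hypothetical frame. Row 0 has two values, row 1 has one, row 2 is entirely NaN, and row 3 is complete. Each option is used on its own in a separate call.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 5.0],
    "b": [2.0, 3.0, np.nan, 6.0],
    "c": [np.nan, np.nan, np.nan, 7.0],
})

complete_rows = df.dropna()            # how='any': drop rows with any NaN
non_empty_rows = df.dropna(how="all")  # drop only rows that are entirely NaN
mostly_filled = df.dropna(thresh=2)    # keep rows with at least 2 non-NaN values
```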

dtypes

Return the dtypes in this object

duplicated(cols=None, take_last=False)

Return boolean Series denoting duplicate rows, optionally only considering certain columns

cols : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns
take_last : boolean, default False
Take the last observed row among duplicates. Defaults to the first row

duplicated : Series
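Here is a minimal sketch of duplicated() and drop_duplicates() with their defaults on hypothetical peptide rows. By default all columns are considered and the first occurrence is kept. Only default arguments are used, because the cols and take_last parameters documented here are not available in later pandas releases.

```python
import pandas as pd

df = pd.DataFrame({
    "peptide": ["SYFPEITHI", "SYFPEITHI", "GILGFVFTL"],
    "score": [0.5, 0.5, 0.9],
})

is_dup = df.duplicated()            # True for rows identical to an earlier row
unique_rows = df.drop_duplicates()  # keeps the first of each identical group
print(unique_rows)
```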

empty

True if NDFrame is entirely empty [no items]

eq(other, axis='columns', level=None)

Wrapper for flexible comparison methods eq

equals(other)

Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.

eval(expr, **kwargs)

Evaluate an expression in the context of the calling DataFrame instance.

expr : string
The expression string to evaluate.
kwargs : dict
See the documentation for eval() for complete details on the keyword arguments accepted by query().

ret : ndarray, scalar, or pandas object

pandas.DataFrame.query pandas.eval

For more details see the API documentation for eval(). For detailed examples see enhancing performance with eval.

>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.eval('a + b')
>>> df.eval('c=a + b')
ffill(axis=0, inplace=False, limit=None, downcast=None)

Synonym for NDFrame.fillna(method='ffill')

fillna(value=None, method=None, axis=0, inplace=False, limit=None, downcast=None)

Fill NA/NaN values using the specified method

method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed Series. pad / ffill: propagate last valid observation forward to next valid; backfill / bfill: use NEXT valid observation to fill gap
value : scalar, dict, or Series
Value to use to fill holes (e.g. 0), alternately a dict/Series of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series will not be filled). This value cannot be a list.
axis : {0, 1}, default 0
  • 0: fill column-by-column
  • 1: fill row-by-row
inplace : boolean, default False
If True, fill in place. Note: this will modify any other views on this object, (e.g. a no-copy slice for a column in a DataFrame).
limit : int, default None
Maximum size gap to forward or backward fill
downcast : dict, default is None
a dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible)

reindex, asfreq

filled : same type as caller
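Here is a minimal sketch of fillna() with a scalar and with a per-column dict, using a hypothetical frame. Columns absent from the dict are left untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [np.nan, 2.0, np.nan]})

all_zero = df.fillna(0)                # every NaN becomes 0
only_a = df.fillna(value={"a": -1.0})  # only column "a" is filled
print(only_a)
```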

filter(items=None, like=None, regex=None, axis=None)

Restrict the info axis to set of items or wildcard

items : list-like
List of info axis to restrict to (need not all be present)
like : string
Keep info axis where “arg in col == True”
regex : string (regular expression)
Keep info axis with re.search(regex, col) == True

Arguments are mutually exclusive, but this is not checked for

filter_result(expressions)

Filters a result data frame based on a specified expression consisting of a list of triples (method_name, comparator, threshold). The expression is applied to each row. If any of the columns fulfills its criterion, the row remains.

Parameters:expressions (list((str,comparator,float))) – A list of triples consisting of (method_name, comparator, threshold)
Returns:A new filtered result object
Return type:CleavageSitePredictionResult
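The "any of" semantics described for filter_result can be illustrated on a plain pandas.DataFrame. The sketch below defines a hypothetical helper, filter_rows_any, that is not Fred2's implementation. It assumes each comparator is a two-argument callable such as operator.ge. The method column names and scores are made up.

```python
import operator

import pandas as pd


def filter_rows_any(df, expressions):
    """Hypothetical helper: keep a row if ANY (column, comparator, threshold)
    triple is satisfied, mirroring the behaviour described for filter_result."""
    keep = pd.Series(False, index=df.index)
    for column, comparator, threshold in expressions:
        # comparator(Series, scalar) yields a boolean Series; OR it into the mask.
        keep |= comparator(df[column], threshold)
    return df[keep]


scores = pd.DataFrame({
    "Seq": ["S", "Y", "F", "P"],
    "method_a": [0.56, 0.10, 0.36, 0.90],
    "method_b": [0.2, 0.7, 0.1, 0.3],
})

# Row 0 and row 3 pass on method_a, row 1 passes on method_b, row 2 fails both.
filtered = filter_rows_any(
    scores, [("method_a", operator.ge, 0.5), ("method_b", operator.gt, 0.6)]
)
print(filtered)
```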
first(offset)

Convenience method for subsetting initial periods of time series data based on a date offset

offset : string, DateOffset, dateutil.relativedelta

ts.first('10D') -> First 10 days

subset : type of caller

first_valid_index()

Return label for first non-NA/null value

floordiv(other, axis='columns', level=None, fill_value=None)

Binary operator floordiv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

from_csv(path, header=0, sep=',', index_col=0, parse_dates=True, encoding=None, tupleize_cols=False, infer_datetime_format=False)

Read delimited file into DataFrame

path : string file path or file handle / StringIO
header : int, default 0

Row to use as header (skip prior rows)
sep : string, default ‘,’
Field delimiter
index_col : int or sequence, default 0
Column to use for index. If a sequence is given, a MultiIndex is used. Different default from read_table
parse_dates : boolean, default True
Parse dates. Different default from read_table
tupleize_cols : boolean, default False
read MultiIndex columns as a list of tuples (if True) or in the new, expanded format (if False)
infer_datetime_format: boolean, default False
If True and parse_dates is True for a column, try to infer the datetime format based on the first datetime string. If the format can be inferred, there often will be a large parsing speed-up.

Preferable to use read_table for most general purposes but from_csv makes for an easy roundtrip to and from file, especially with a DataFrame of time series data

y : DataFrame

from_dict(data, orient='columns', dtype=None)

Construct DataFrame from dict of array-like or dicts

data : dict
{field : array-like} or {field : dict}
orient : {‘columns’, ‘index’}, default ‘columns’
The “orientation” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.

DataFrame

from_items(items, columns=None, orient='columns')

Convert (key, value) pairs to DataFrame. The keys will be the axis index (usually the columns, but depends on the specified orientation). The values should be arrays or Series.

items : sequence of (key, value) pairs
Values should be arrays or Series.
columns : sequence of column labels, optional
Must be passed if orient='index'.
orient : {‘columns’, ‘index’}, default ‘columns’
The “orientation” of the data. If the keys of the input correspond to column labels, pass ‘columns’ (default). Otherwise if the keys correspond to the index, pass ‘index’.

frame : DataFrame

from_records(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)

Convert structured or record ndarray to DataFrame

data : ndarray (structured dtype), list of tuples, dict, or DataFrame
index : string, list of fields, array-like

Field of array to use as the index, alternately a specific set of input labels to use
exclude : sequence, default None
Columns or fields to exclude
columns : sequence, default None
Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns)
coerce_float : boolean, default False
Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets

df : DataFrame

ftypes

Return the ftypes (indication of sparse/dense and dtype) in this object.

ge(other, axis='columns', level=None)

Wrapper for flexible comparison methods ge

get(key, default=None)

Get item from object for given key (DataFrame column, Panel slice, etc.). Returns default value if not found

key : object

value : type of items contained in object

get_dtype_counts()

Return the counts of dtypes in this object

get_ftype_counts()

Return the counts of ftypes in this object

get_value(index, col)

Quickly retrieve single value at passed column and index

index : row label
col : column label

value : scalar value

get_values()

Same as values (but handles sparseness conversions)

groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False)

Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns

by : mapping function / list of functions, dict, Series, or tuple / list of column names
Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups

axis : int, default 0
level : int, level name, or sequence of such, default None

If the axis is a MultiIndex (hierarchical), group by a particular level or levels
as_index : boolean, default True
For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output
sort : boolean, default True
Sort group keys. Get better performance by turning this off
group_keys : boolean, default True
When calling apply, add group keys to index to identify pieces
squeeze : boolean, default False
Reduce the dimensionality of the return type if possible, otherwise return a consistent type

# DataFrame result
>>> data.groupby(func, axis=0).mean()

# Series result (indexed by the col1/col2 groups)
>>> data.groupby(['col1', 'col2'])['col3'].mean()

# DataFrame with hierarchical index
>>> data.groupby(['col1', 'col2']).mean()

GroupBy object
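Here is a runnable variant of the patterns above. It assumes a hypothetical frame of allele names and integer scores. Grouping on one column and averaging another gives a Series indexed by the sorted group keys. With as_index=False, the group keys stay as an ordinary column ("SQL-style" output).

```python
import pandas as pd

df = pd.DataFrame({
    "allele": ["A*02:01", "A*02:01", "B*07:02"],
    "score": [1, 3, 5],
})

means = df.groupby("allele")["score"].mean()                  # Series keyed by allele
flat = df.groupby("allele", as_index=False)["score"].mean()   # DataFrame with an allele column
print(means)
```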

gt(other, axis='columns', level=None)

Wrapper for flexible comparison methods gt

head(n=5)

Returns first n rows

hist(data, column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, **kwds)

Draw histogram of the DataFrame’s series using matplotlib / pylab.

data : DataFrame
column : string or sequence

If passed, will be used to limit data to a subset of columns
by : object, optional
If passed, then used to form histograms for separate groups
grid : boolean, default True
Whether to show axis grid lines
xlabelsize : int, default None
If specified changes the x-axis label size
xrot : float, default None
rotation of x axis labels
ylabelsize : int, default None
If specified changes the y-axis label size
yrot : float, default None
rotation of y axis labels

ax : matplotlib axes object, default None
sharex : bool, if True, the X axis will be shared amongst all subplots.
sharey : bool, if True, the Y axis will be shared amongst all subplots.
figsize : tuple

The size of the figure to create in inches by default

layout : (optional) a tuple (rows, columns) for the layout of the histograms
kwds : other plotting keyword arguments

To be passed to hist function
iat
icol(i)
idxmax(axis=0, skipna=True)

Return index of first occurrence of maximum over requested axis. NA/null values are excluded.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be first index.

idxmax : Series

This method is the DataFrame version of ndarray.argmax.

Series.idxmax

idxmin(axis=0, skipna=True)

Return index of first occurrence of minimum over requested axis. NA/null values are excluded.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

idxmin : Series

This method is the DataFrame version of ndarray.argmin.

Series.idxmin
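Here is a small sketch of idxmax() and idxmin() on a hypothetical score frame. Each call returns one row label per column. In method_b the maximum 0.7 appears twice, and the first occurrence (p0) is reported.

```python
import pandas as pd

df = pd.DataFrame(
    {"method_a": [0.2, 0.9, 0.4], "method_b": [0.7, 0.1, 0.7]},
    index=["p0", "p1", "p2"],
)

best = df.idxmax()   # row label of each column's maximum
worst = df.idxmin()  # row label of each column's minimum
print(best)
```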

iget_value(i, j)
iloc
info(verbose=True, buf=None, max_cols=None)

Concise summary of a DataFrame.

verbose : boolean, default True
If False, don’t print column count summary

buf : writable buffer, defaults to sys.stdout
max_cols : int, default None

Determines whether full summary or short summary is printed
insert(loc, column, value, allow_duplicates=False)

Insert column into DataFrame at specified location.

If allow_duplicates is False, raises Exception if column is already contained in the DataFrame.

loc : int
Must have 0 <= loc <= len(columns)

column : object
value : int, Series, or array-like
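Here is a minimal sketch of insert() on a hypothetical score frame. insert() modifies the frame in place and returns None. With the default allow_duplicates=False, inserting an existing column name raises an error. Current pandas releases raise ValueError, which is what the sketch catches.

```python
import pandas as pd

df = pd.DataFrame({"score": [0.5, 0.9]})

# Insert a new first column; insert() works in place and returns None.
df.insert(0, "Seq", ["S", "Y"])

# A second column named "Seq" is rejected because allow_duplicates=False.
try:
    df.insert(1, "Seq", ["F", "P"])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(df)
```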

interpolate(method='linear', axis=0, limit=None, inplace=False, downcast='infer', **kwargs)

Interpolate values according to different methods.

method : {‘linear’, ‘time’, ‘values’, ‘index’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘krogh’, ‘polynomial’, ‘spline’, ‘piecewise_polynomial’, ‘pchip’}
  • ‘linear’: ignore the index and treat the values as equally spaced. default
  • ‘time’: interpolation works on daily and higher resolution data to interpolate given length of interval
  • ‘index’: use the actual numerical values of the index
  • ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘polynomial’ are passed to scipy.interpolate.interp1d (with the given order). Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=4)
  • ‘krogh’, ‘piecewise_polynomial’, ‘spline’, and ‘pchip’ are all wrappers around the scipy interpolation methods of similar names. See the scipy documentation for more on their behavior: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
axis : {0, 1}, default 0
  • 0: fill column-by-column
  • 1: fill row-by-row
limit : int, default None
Maximum number of consecutive NaNs to fill.
inplace : bool, default False
Update the NDFrame in place if possible.
downcast : optional, ‘infer’ or None, defaults to ‘infer’
Downcast dtypes if possible.

Series or DataFrame of same shape interpolated at the NaNs

reindex, replace, fillna

Filling in NaNs:

>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0    0
1    1
2    2
3    3
dtype: float64
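The difference between ‘linear’ and ‘index’ only shows up when the index is unevenly spaced. The sketch below uses a made-up series with a gap in its numeric index, and also shows limit capping the number of consecutive NaNs that get filled. Neither method needs SciPy.

```python
import numpy as np
import pandas as pd

# Numeric index with uneven spacing: labels 0, 1, 4
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 4])

# 'linear' ignores the index: the NaN sits halfway between 0 and 10
linear = s.interpolate()
print(linear.tolist())  # [0.0, 5.0, 10.0]

# 'index' weights by index distance: 0 + (1 - 0) / (4 - 0) * 10 = 2.5
by_index = s.interpolate(method="index")
print(by_index.tolist())  # [0.0, 2.5, 10.0]

# limit=1 fills at most one consecutive NaN
t = pd.Series([1.0, np.nan, np.nan, 4.0])
limited = t.interpolate(limit=1)
print(limited.tolist())  # [1.0, 2.0, nan, 4.0]
```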

irow(i, copy=False)
is_copy = None
isin(values)

Return boolean DataFrame showing whether each element in the DataFrame is contained in values.

values : iterable, Series, DataFrame or dictionary
The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dictionary, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.

DataFrame of booleans

When values is a list:

>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> df.isin([1, 3, 12, 'a'])
       A      B
0   True   True
1  False  False
2   True  False

When values is a dict:

>>> df = DataFrame({'A': [1, 2, 3], 'B': [1, 4, 7]})
>>> df.isin({'A': [1, 3], 'B': [4, 7, 12]})
       A      B
0   True  False  # Note that B didn't match the 1 here.
1  False   True
2   True   True

When values is a Series or DataFrame:

>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> other = DataFrame({'A': [1, 3, 3, 2], 'B': ['e', 'f', 'f', 'e']})
>>> df.isin(other)
       A      B
0   True  False
1  False  False  # Column A in `other` has a 3, but not at index 1.
2   True   True
isnull()

Return a boolean same-sized object indicating if the values are null

iteritems()

Iterator over (column, series) pairs

iterkv(*args, **kwargs)

Alias of iteritems, used to get around 2to3. Deprecated.

iterrows()

Iterate over rows of DataFrame as (index, Series) pairs.

  • iterrows does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). For example,

    >>> df = DataFrame([[1, 1.0]], columns=['x', 'y'])
    >>> row = next(df.iterrows())[1]
    >>> print(row['x'].dtype)
    float64
    >>> print(df['x'].dtype)
    int64
    
it : generator
A generator that iterates over the rows of the frame.
itertuples(index=True)

Iterate over rows of DataFrame as tuples, with index value as first element of the tuple
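A sketch contrasting the two row iterators on the same one-row frame used in the iterrows note above. iterrows packs each row into a Series, so mixed int/float columns are upcast; itertuples yields plain tuples with the index value first and keeps each column's own type. Recent pandas versions return namedtuples here, which still support positional access.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1.0]], columns=["x", "y"])

# iterrows: the row becomes a float64 Series, so "x" is no longer an integer
_, row = next(df.iterrows())
print(row["x"].dtype)  # float64

# itertuples: index value first, then one element per column
first = next(df.itertuples(index=True))
print(first[0], first[1], first[2])  # 0 1 1.0
```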

ix
join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

other : DataFrame, Series with name field set, or list of DataFrame
Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame
on : column name, tuple/list of column names, or array-like
Column(s) to use for joining, otherwise join on index. If multiple columns are given, the passed DataFrame must have a MultiIndex. Can pass an array as the join key if not already contained in the calling DataFrame. Like an Excel VLOOKUP operation.
how : {‘left’, ‘right’, ‘outer’, ‘inner’}

How to handle indexes of the two objects. Default: ‘left’ for joining on index, None otherwise

  • left: use calling frame’s index
  • right: use input frame’s index
  • outer: form union of indexes
  • inner: use intersection of indexes
lsuffix : string
Suffix to use from left frame’s overlapping columns
rsuffix : string
Suffix to use from right frame’s overlapping columns
sort : boolean, default False
Order result DataFrame lexicographically by the join key. If False, preserves the index order of the calling (left) DataFrame

on, lsuffix, and rsuffix options are not supported when passing a list of DataFrame objects

joined : DataFrame
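Two illustrative joins. The first matches a key column of the caller against the index of other, keeping every left row (how='left'); the second joins two frames that share a column name, which requires lsuffix/rsuffix, and uses how='outer' to keep the union of both indexes.

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": [1, 2, 3]})
right = pd.DataFrame({"B": [10, 20]}, index=["K0", "K2"])

# Key column of `left` against the index of `right`; K1 has no match -> NaN
joined = left.join(right, on="key")
print(joined)

# Overlapping column "v" needs suffixes; outer keeps labels x, y and z
a = pd.DataFrame({"v": [1, 2]}, index=["x", "y"])
b = pd.DataFrame({"v": [3, 4]}, index=["y", "z"])
outer = a.join(b, how="outer", lsuffix="_a", rsuffix="_b")
print(outer)
```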

keys()

Get the ‘info axis’ (see Indexing for more)

This is index for Series, columns for DataFrame and major_axis for Panel.

kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased kurtosis over requested axis, normalized by N-1.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

kurt : Series or DataFrame (if level specified)
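We read "unbiased kurtosis" as the bias-corrected sample excess kurtosis (often written G2), for which a normal sample scores close to 0. The sketch recomputes that estimator with NumPy from the sums of squared and fourth-power deviations and compares it with kurt; the formula is our reading of the implementation, checked here numerically rather than taken from the text above.

```python
import numpy as np
import pandas as pd

x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
n = len(x)
dev = x - x.mean()
m2 = (dev ** 2).sum()  # sum of squared deviations
m4 = (dev ** 4).sum()  # sum of fourth-power deviations

# Bias-corrected sample excess kurtosis (G2)
g2 = (n * (n + 1) * (n - 1) * m4) / ((n - 2) * (n - 3) * m2 ** 2) - (
    3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
)

print(round(g2, 3), round(pd.Series(x).kurt(), 3))  # 3.152 3.152
```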

kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased kurtosis over requested axis, normalized by N-1.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

kurt : Series or DataFrame (if level specified)

last(offset)

Convenience method for subsetting final periods of time series data based on a date offset

offset : string, DateOffset, dateutil.relativedelta

ts.last('5M') -> last 5 months

subset : type of caller

last_valid_index()

Return label for last non-NA/null value

le(other, axis='columns', level=None)

Wrapper for flexible comparison methods le

load(path)

Deprecated. Use read_pickle instead.

loc
lookup(row_labels, col_labels)

Label-based “fancy indexing” function for DataFrame. Given equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair.

row_labels : sequence
The row labels to use for lookup
col_labels : sequence
The column labels to use for lookup

Akin to:

result = []
for row, col in zip(row_labels, col_labels):
    result.append(df.get_value(row, col))
values : ndarray
The found values
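Later pandas releases deprecated and then removed DataFrame.lookup, so the sketch below reproduces the documented behaviour with a hypothetical helper named lookup. Instead of calling get_value once per pair as in the loop above, it converts all labels to positions with Index.get_indexer and reads the values with one NumPy fancy-indexing step. It assumes unique row and column labels.

```python
import numpy as np
import pandas as pd


def lookup(df, row_labels, col_labels):
    """Hypothetical helper: value at each (row_label, col_label) pair."""
    rows = df.index.get_indexer(row_labels)
    cols = df.columns.get_indexer(col_labels)
    # get_indexer returns -1 for labels it cannot find
    if (rows < 0).any() or (cols < 0).any():
        raise KeyError("label not found")
    return df.to_numpy()[rows, cols]


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])
values = lookup(df, ["x", "z", "y"], ["b", "a", "b"])
print(values)  # [4 3 5]
```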
lt(other, axis='columns', level=None)

Wrapper for flexible comparison methods lt

mad(axis=None, skipna=None, level=None, **kwargs)

Return the mean absolute deviation of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

mad : Series or DataFrame (if level specified)
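The mean absolute deviation here is taken around the mean. Spelling it out explicitly, as in this sketch with made-up data, makes the definition visible and also works on newer pandas versions, where DataFrame.mad and Series.mad have been removed.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])

# mean(|x - mean(x)|): deviations from 4.0 are 3, 2, 1, 0, 6 -> 12 / 5
mad = (s - s.mean()).abs().mean()
print(mad)  # 2.4
```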

mask(cond)

Return a copy whose values are replaced with NaN wherever cond is True (the inverse of where).

cond : boolean NDFrame or array

wh: same as input
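A small sketch of mask on illustrative data, checked against where with the inverted condition.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, -2.0, 3.0])
cond = s < 0

# NaN wherever cond is True, original value elsewhere
masked = s.mask(cond)
print(masked.tolist())  # [1.0, nan, 3.0]

# Equivalent to where() with the condition inverted
print(masked.equals(s.where(~cond)))  # True
```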

max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

This method returns the maximum of the values in the object. If you want the index of the maximum, use idxmax, which is the equivalent of the numpy.ndarray method argmax.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

max : Series or DataFrame (if level specified)

mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the mean of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

mean : Series or DataFrame (if level specified)
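A sketch of how axis and skipna interact, using mean on a made-up frame with one NaN. The other reductions in this section (max, median, min, prod, and so on) document the same two parameters.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

per_column = df.mean()           # axis=0, NaN skipped
strict = df.mean(skipna=False)   # a NaN anywhere makes that column's result NaN
per_row = df.mean(axis=1)        # one value per row

print(per_column.to_dict())  # {'a': 2.0, 'b': 5.0}
print(strict["a"])           # nan
print(per_row.tolist())      # [2.5, 5.0, 4.5]
```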

median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the median of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

median : Series or DataFrame (if level specified)

merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True)

Merge DataFrame objects by performing a database-style join operation by columns or indexes.

If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on.

right : DataFrame
how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’
  • left: use only keys from left frame (SQL: left outer join)
  • right: use only keys from right frame (SQL: right outer join)
  • outer: use union of keys from both frames (SQL: full outer join)
  • inner: use intersection of keys from both frames (SQL: inner join)
on : label or list
Field names to join on. Must be found in both DataFrames. If on is None and not merging on indexes, then it merges on the intersection of the columns by default.
left_on : label or list, or array-like
Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns
right_on : label or list, or array-like
Field names to join on in right DataFrame or vector/list of vectors per left_on docs
left_index : boolean, default False
Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels
right_index : boolean, default False
Use the index from the right DataFrame as the join key. Same caveats as left_index
sort : boolean, default False
Sort the join keys lexicographically in the result DataFrame
suffixes : 2-length sequence (tuple, list, ...)
Suffix to apply to overlapping column names in the left and right side, respectively
copy : boolean, default True
If False, do not copy data unnecessarily
>>> A              >>> B
    lkey value         rkey value
0   foo  1         0   foo  5
1   bar  2         1   bar  6
2   baz  3         2   qux  7
3   foo  4         3   bar  8
>>> merge(A, B, left_on='lkey', right_on='rkey', how='outer')
   lkey  value_x  rkey  value_y
0  bar   2        bar   6
1  bar   2        bar   8
2  baz   3        NaN   NaN
3  foo   1        foo   5
4  foo   4        foo   5
5  NaN   NaN      qux   7

merged : DataFrame
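A sketch of a left merge on a shared key column, with illustrative frames. A key that appears twice on the right produces one output row per match, a key with no match keeps its left row with NaN, and the overlapping value column receives the default suffixes ('_x', '_y').

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "value": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "a", "c"], "value": [10, 20, 30]})

# how='left': every left key is kept; "a" matches twice, "b" not at all
merged = left.merge(right, on="key", how="left")
print(merged)
```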

merge_results(others)

Merges results of type CleavageSitePredictionResult and returns the merged result

Parameters:others (list(CleavageSitePredictionResult) or CleavageSitePredictionResult) – A (list of) CleavageSitePredictionResult object(s)
Returns:A new merged CleavageSitePredictionResult object
Return type:CleavageSitePredictionResult
min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

This method returns the minimum of the values in the object. If you want the index of the minimum, use idxmin, which is the equivalent of the numpy.ndarray method argmin.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

min : Series or DataFrame (if level specified)

mod(other, axis='columns', level=None, fill_value=None)

Binary operator mod with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

mode(axis=0, numeric_only=False)

Get the mode(s) of each column or row along the selected axis. Empty if nothing has 2+ occurrences. Adds a row for each mode per label and fills in gaps with NaN.

axis : {0, 1, ‘index’, ‘columns’} (default 0)
  • 0 or ‘index’ : get mode of each column
  • 1 or ‘columns’ : get mode of each row
numeric_only : boolean, default False
if True, only apply to numeric columns

modes : DataFrame (sorted)
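A sketch of the padding behaviour with made-up data: column "a" has two modes and column "b" only one, so "b" is padded with NaN in the second row. Every column here contains repeated values, so the result does not depend on how a particular pandas version treats columns without repeats.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2], "b": [3, 3, 3, 4]})

# One row per mode; shorter columns are filled with NaN
modes = df.mode()
print(modes)
```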

mul(other, axis='columns', level=None, fill_value=None)

Binary operator mul with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
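The fill_value rule shared by mod, mul, pow, radd, rdiv and the other flexible arithmetic methods can be seen in a small sketch with illustrative frames. The indexes are first unioned (0 through 3); a location missing on only one side is filled before multiplying, while a location missing on both sides stays NaN.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"x": [1.0, 2.0, np.nan]}, index=[0, 1, 2])
b = pd.DataFrame({"x": [10.0, np.nan]}, index=[1, 3])

plain = a.mul(b)                  # any missing side -> NaN
filled = a.mul(b, fill_value=1)   # one missing side is replaced by 1

print(plain["x"].tolist())   # [nan, 20.0, nan, nan]
print(filled["x"].tolist())  # [1.0, 20.0, nan, nan]
```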

multiply(other, axis='columns', level=None, fill_value=None)

Binary operator mul with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

ndim

Number of axes / array dimensions

ne(other, axis='columns', level=None)

Wrapper for flexible comparison methods ne

notnull()

Return a boolean same-sized object indicating if the values are not null

pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwds)

Percent change over given number of periods

periods : int, default 1
Periods to shift for forming percent change
fill_method : str, default ‘pad’
How to handle NAs before computing percent changes
limit : int, default None
The number of consecutive NAs to fill before stopping
freq : DateOffset, timedelta, or offset alias string, optional
Increment to use from time series API (e.g. ‘M’ or BDay())

chg : same type as caller
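A sketch of the computation on made-up data without missing values, so fill_method plays no role: each entry is (x[t] - x[t - periods]) / x[t - periods], and the first periods entries have no predecessor and are NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series([100.0, 110.0, 121.0])

one_step = s.pct_change()           # 110/100 - 1, 121/110 - 1
two_step = s.pct_change(periods=2)  # 121/100 - 1

print(one_step.tolist())  # approximately [nan, 0.1, 0.1]
print(two_step.tolist())  # approximately [nan, nan, 0.21]
```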

pivot(index=None, columns=None, values=None)

Reshape data (produce a “pivot” table) based on column values. Uses unique values from index / columns to form axes and return either DataFrame or Panel, depending on whether you request a single value column (DataFrame) or all columns (Panel)

index : string or object
Column name to use to make new frame’s index
columns : string or object
Column name to use to make new frame’s columns
values : string or object, optional
Column name to use for populating new frame’s values

For finer-tuned control, see hierarchical indexing documentation along with the related stack/unstack methods

>>> df
    foo   bar  baz
0   one   A    1.
1   one   B    2.
2   one   C    3.
3   two   A    4.
4   two   B    5.
5   two   C    6.
>>> df.pivot('foo', 'bar', 'baz')
     A   B   C
one  1   2   3
two  4   5   6
>>> df.pivot('foo', 'bar')['baz']
     A   B   C
one  1   2   3
two  4   5   6
pivoted : DataFrame
If no values column specified, will have hierarchically indexed columns
pivot_table(data, values=None, rows=None, cols=None, aggfunc='mean', fill_value=None, margins=False, dropna=True)

Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame

data : DataFrame
values : column to aggregate, optional
rows : list of column names or arrays to group on
Keys to group on the x-axis of the pivot table
cols : list of column names or arrays to group on
Keys to group on the y-axis of the pivot table
aggfunc : function, default numpy.mean, or list of functions
If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)
fill_value : scalar, default None
Value to replace missing values with
margins : boolean, default False
Add all row / columns (e.g. for subtotal / grand totals)
dropna : boolean, default True
Do not include columns whose entries are all NaN
>>> df
   A   B   C      D
0  foo one small  1
1  foo one large  2
2  foo one large  2
3  foo two small  3
4  foo two small  3
5  bar one large  4
6  bar one small  5
7  bar two small  6
8  bar two large  7
>>> table = pivot_table(df, values='D', rows=['A', 'B'],
...                     cols=['C'], aggfunc=np.sum)
>>> table
          small  large
foo  one  1      4
     two  6      NaN
bar  one  5      4
     two  6      7

table : DataFrame
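The example above can be rebuilt as a runnable sketch. Newer pandas versions replaced the rows and cols keywords with index and columns, so the sketch uses that spelling. The row and column order of the printed table may differ from the listing above, but the aggregated cells are the same.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# index/columns correspond to the rows/cols keywords documented above
table = pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"],
                       aggfunc="sum")
print(table)
```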

plot(frame=None, x=None, y=None, subplots=False, sharex=True, sharey=False, use_index=True, figsize=None, grid=None, legend=True, rot=None, ax=None, style=None, title=None, xlim=None, ylim=None, logx=False, logy=False, xticks=None, yticks=None, kind='line', sort_columns=False, fontsize=None, secondary_y=False, **kwds)

Make line, bar, or scatter plots of DataFrame series with the index on the x-axis using matplotlib / pylab.

frame : DataFrame
x : label or position, default None
y : label or position, default None
Allows plotting of one column versus another
subplots : boolean, default False
Make separate subplots for each time series
sharex : boolean, default True
In case subplots=True, share x axis
sharey : boolean, default False
In case subplots=True, share y axis
use_index : boolean, default True
Use index as ticks for x axis
stacked : boolean, default False
If True, create stacked bar plot. Only valid for DataFrame input
sort_columns : boolean, default False
Sort column names to determine plot ordering
title : string
Title to use for the plot
grid : boolean, default None (matlab style default)
Axis grid lines
legend : boolean, default True
Place legend on axis subplots

ax : matplotlib axis object, default None
style : list or dict
matplotlib line style per column
kind : {‘line’, ‘bar’, ‘barh’, ‘kde’, ‘density’, ‘scatter’}
  • bar : vertical bar plot
  • barh : horizontal bar plot
  • kde/density : Kernel Density Estimation plot
  • scatter : scatter plot
logx : boolean, default False
For line plots, use log scaling on x axis
logy : boolean, default False
For line plots, use log scaling on y axis
xticks : sequence
Values to use for the xticks
yticks : sequence
Values to use for the yticks

xlim : 2-tuple/list
ylim : 2-tuple/list
rot : int, default None
Rotation for ticks
secondary_y : boolean or sequence, default False
Whether to plot on the secondary y-axis. If a list/tuple, which columns to plot on the secondary y-axis.
mark_right : boolean, default True
When using a secondary_y axis, should the legend label the axis of the various columns automatically
colormap : str or matplotlib colormap object, default None
Colormap to select colors from. If string, load colormap with that name from matplotlib.
kwds : keywords
Options to pass to matplotlib plotting method

ax_or_axes : matplotlib.AxesSubplot or list of them

pop(item)

Return item and drop from frame. Raise KeyError if not found.

pow(other, axis='columns', level=None, fill_value=None)

Binary operator pow with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

prod(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the product of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

prod : Series or DataFrame (if level specified)

product(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the product of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

prod : Series or DataFrame (if level specified)

quantile(q=0.5, axis=0, numeric_only=True)

Return values at the given quantile over requested axis, a la scoreatpercentile in scipy.stats

q : quantile, default 0.5 (50% quantile)
0 <= q <= 1
axis : {0, 1}
0: compute the quantile of each column; 1: compute the quantile of each row

quantiles : Series
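A worked sketch on made-up data, assuming linear interpolation between sorted values (the default of scipy's scoreatpercentile and of later pandas versions). For n values, quantile q falls at position q * (n - 1) in the sorted data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# q=0.5 (default): position 0.5 * 3 = 1.5, halfway between 2nd and 3rd values
median = df.quantile()
print(median.to_dict())  # {'a': 2.5, 'b': 25.0}

# q=0.25: position 0.25 * 3 = 0.75, so a -> 1 + 0.75 * (2 - 1) = 1.75
q25 = df.quantile(0.25)
print(q25["a"], q25["b"])  # 1.75 17.5
```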

query(expr, **kwargs)

Query the columns of a frame with a boolean expression.

expr : string
The query string to evaluate. The result of the evaluation of this expression is first passed to loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to __getitem__().
kwargs : dict
See the documentation for eval() for complete details on the keyword arguments accepted by query().

q : DataFrame or Series

This method uses the top-level eval() function to evaluate the passed query.

The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins, and and or. This is syntactically valid Python, however the semantics are different.

You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.

The index and columns attributes of the DataFrame instance are placed in the namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for the frame's index, and you can also use the name of the index to identify it in a query.

For further details and examples see the query documentation in indexing.

pandas.eval DataFrame.eval

>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.query('a > b')
>>> df[df.a > df.b]  # same result as the previous expression
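A slightly larger sketch with made-up data. It combines conditions with and, refers to the index through the identifier index as described above, and uses the @ prefix to reference a local Python variable. The @ syntax is part of pandas' query language but is not described in the text above.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 3], "b": [2, 4, 6]})
threshold = 2

# `and` works in query strings; @threshold reads the local variable
hits = df.query("a > b and a > @threshold")
print(hits.index.tolist())  # [1]

# `index` refers to the frame's index
tail = df.query("index >= 1")
print(tail.index.tolist())  # [1, 2]
```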
radd(other, axis='columns', level=None, fill_value=None)

Binary operator radd with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rank(axis=0, numeric_only=None, method='average', na_option='keep', ascending=True)

Compute numerical data ranks (1 through n) along axis. Equal values are assigned a rank that is the average of the ranks of those values

axis : {0, 1}, default 0
Ranks over columns (0) or rows (1)
numeric_only : boolean, default None
Include only float, int, boolean data
method : {‘average’, ‘min’, ‘max’, ‘first’}
  • average: average rank of group
  • min: lowest rank in group
  • max: highest rank in group
  • first: ranks assigned in order they appear in the array
na_option : {‘keep’, ‘top’, ‘bottom’}
  • keep: leave NA values where they are
  • top: smallest rank if ascending
  • bottom: smallest rank if descending
ascending : boolean, default True
False for ranks by high (1) to low (N)

ranks : DataFrame
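A sketch of the tie-breaking methods on a made-up column with one tie: the two 20s share ranks 2 and 3, and each method resolves that tie differently.

```python
import pandas as pd

df = pd.DataFrame({"score": [10, 20, 20, 30]})

ranks = {m: df.rank(method=m)["score"].tolist()
         for m in ["average", "min", "max", "first"]}
for method, values in ranks.items():
    print(method, values)

# ascending=False ranks the largest value as 1
descending = df.rank(ascending=False)["score"].tolist()
print(descending)  # [4.0, 2.5, 2.5, 1.0]
```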

rdiv(other, axis='columns', level=None, fill_value=None)

Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

reindex(index=None, columns=None, **kwargs)

Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False

index, columns : array-like, optional
New labels / index to conform to (can be specified in order or as keywords). Preferably an Index object to avoid duplicating data
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed DataFrame:
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill gap
copy : boolean, default True
Return a new object, even if the passed indexes are the same
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any “compatible” value
limit : int, default None
Maximum size gap to forward or backward fill
takeable : boolean, default False
Treat the passed labels as positional values
>>> df.reindex(index=[date1, date2, date3], columns=['A', 'B', 'C'])

reindexed : DataFrame
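A sketch of the three filling behaviours on an illustrative frame whose index has gaps: the default NaN, forward filling with method='ffill' (which needs a monotonic index), and a constant fill_value.

```python
import pandas as pd

df = pd.DataFrame({"v": [1.0, 2.0]}, index=[0, 2])
new_index = [0, 1, 2, 3]

as_nan = df.reindex(new_index)                    # new labels get NaN
forward = df.reindex(new_index, method="ffill")   # last valid value carried forward
constant = df.reindex(new_index, fill_value=0)    # constant instead of NaN

print(as_nan["v"].tolist())    # [1.0, nan, 2.0, nan]
print(forward["v"].tolist())   # [1.0, 1.0, 2.0, 2.0]
print(constant["v"].tolist())  # [1.0, 0.0, 2.0, 0.0]
```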

reindex_axis(labels, axis=0, method=None, level=None, copy=True, limit=None, fill_value=nan)

Conform input object to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False

labels : array-like
New labels / index to conform to. Preferably an Index object to avoid duplicating data

axis : {0, 1, ‘index’, ‘columns’}
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed object:
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill gap
copy : boolean, default True
Return a new object, even if the passed indexes are the same
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level
limit : int, default None
Maximum size gap to forward or backward fill
>>> df.reindex_axis(['A', 'B', 'C'], axis=1)

reindex, reindex_like

reindexed : DataFrame

reindex_like(other, method=None, copy=True, limit=None)

Return an object with indices matching those of other.

other : Object
method : string or None
copy : boolean, default True
limit : int, default None
Maximum size gap to forward or backward fill
Like calling s.reindex(index=other.index, columns=other.columns, method=...)

reindexed : same as input

rename(index=None, columns=None, **kwargs)

Alter axis labels using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.

index, columns : dict-like or function, optional
Transformation to apply to that axis values
copy : boolean, default True
Also copy underlying data
inplace : boolean, default False
Whether to return a new DataFrame. If True then value of copy is ignored.

renamed : DataFrame (new object)
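A sketch of the two mapper forms on an illustrative frame: a dict changes only the labels it lists, while a function is applied to every label. With the default inplace=False the original frame is left unchanged.

```python
import pandas as pd

df = pd.DataFrame({"A": [1], "B": [2]})

renamed = df.rename(columns={"A": "a"})   # "B" is not in the dict, left as-is
lowered = df.rename(columns=str.lower)    # function applied to every label

print(list(renamed.columns), list(lowered.columns))  # ['a', 'B'] ['a', 'b']
```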

rename_axis(mapper, axis=0, copy=True, inplace=False)

Alter index and / or columns using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.

mapper : dict-like or function, optional
axis : int or string, default 0
copy : boolean, default True
Also copy underlying data

inplace : boolean, default False

renamed : type of caller

reorder_levels(order, axis=0)

Rearrange index levels using input order. May not drop or duplicate levels

order : list of int or list of str
List representing new level order. Reference level by number (position) or by key (label).
axis : int
Where to reorder levels.

type of caller (new object)
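A sketch with an illustrative two-level index, showing that levels can be referenced by name or by position and that the row data travels with its reordered labels.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)], names=["letter", "number"])
df = pd.DataFrame({"v": [10, 20]}, index=idx)

by_name = df.reorder_levels(["number", "letter"])  # reference levels by label
by_pos = df.reorder_levels([1, 0])                 # or by position

print(list(by_name.index.names))  # ['number', 'letter']
print(by_name.index[0])           # (1, 'a')
```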

replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None)

Replace values given in ‘to_replace’ with ‘value’.

to_replace : str, regex, list, dict, Series, numeric, or None

  • str or regex:

    • str: string exactly matching to_replace will be replaced with value
    • regex: regexes matching to_replace will be replaced with value
  • list of str, regex, or numeric:

    • First, if to_replace and value are both lists, they must be the same length.
    • Second, if regex=True then all of the strings in both lists will be interpreted as regexes, otherwise they will match directly. This doesn’t matter much for value since there are only a few possible substitution regexes you can use.
    • str and regex rules apply as above.
  • dict:

    • Nested dictionaries, e.g., {‘a’: {‘b’: nan}}, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with nan. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.
    • Keys map to column names and values map to substitution values. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
  • None:

    • This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also None then this must be a nested dictionary or Series.

See the examples section for examples of each of these.

value : scalar, dict, list, str, regex, default None
Value to use to fill holes (e.g. 0), alternately a dict of values specifying which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.
inplace : boolean, default False
If True, in place. Note: this will modify any other views on this object (e.g. a column from a DataFrame). Returns the caller if this is True.
limit : int, default None
Maximum size gap to forward or backward fill
regex : bool or same types as to_replace, default False
Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Otherwise, to_replace must be None because this parameter will be interpreted as a regular expression or a list, dict, or array of regular expressions.
method : string, optional, {‘pad’, ‘ffill’, ‘bfill’}
The method to use for replacement when to_replace is a list.

See also: NDFrame.reindex, NDFrame.asfreq, NDFrame.fillna

filled : NDFrame

AssertionError
  • If regex is not a bool and to_replace is not None.
TypeError
  • If to_replace is a dict and value is not a list, dict, ndarray, or Series
  • If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.
ValueError
  • If to_replace and value are lists or ndarrays, but they are not the same length.

Notes
  • Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.
  • Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.
  • This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
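The following sketch illustrates the nested-dict and regex forms of replace described above, including the rule that regular expressions only act on strings. It assumes a recent pandas release (the call forms used here are unchanged from the documented version); the frame and column names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['b', 'c', 'b'],
                   'b': ['bat', 'foo', 'bar'],
                   'x': [1.5, 2.5, 1.5]})

# Nested dict: look in column 'a' for the value 'b' and replace it with NaN
nested = df.replace({'a': {'b': np.nan}})

# Regex: every string that fully matches 'ba.' becomes 'new'
regexed = df.replace(to_replace=r'^ba.$', value='new', regex=True)

# A pattern that looks like a float still only matches strings,
# so the numeric column 'x' is left unchanged
numeric = df.replace(to_replace=r'^1\.5$', value=0.0, regex=True)
```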
resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0)

Convenience method for frequency conversion and resampling of regular time-series data.

rule : string
the offset string or object representing target conversion
how : string
method for down- or re-sampling, defaults to ‘mean’ for downsampling
axis : int, optional, default 0
fill_method : string, default None
Filling method for upsampling
closed : {‘right’, ‘left’}
Which side of bin interval is closed
label : {‘right’, ‘left’}
Which bin edge label to label bucket with

convention : {‘start’, ‘end’, ‘s’, ‘e’}
kind : {‘period’, ‘timestamp’}
loffset : timedelta
Adjust the resampled time labels
limit : int, default None
Maximum size gap when reindexing with fill_method
base : int, default 0
For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0
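A minimal downsampling sketch, assuming a recent pandas release. In later versions the how= argument was replaced by calling an aggregation method such as sum() or mean() on the resampler, which is the form used here; the series is illustrative.

```python
import pandas as pd

# Nine one-minute observations starting at midnight
idx = pd.date_range('2000-01-01', periods=9, freq='min')
s = pd.Series(range(9), index=idx)

# Downsample into 3-minute bins; the aggregation method replaces how=
totals = s.resample('3min').sum()
means = s.resample('3min').mean()
```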
reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')

For DataFrame with multi-level index, return new DataFrame with labeling information in the columns under the index names, defaulting to ‘level_0’, ‘level_1’, etc. if any are None. For a standard index, the index name will be used (if set), otherwise a default ‘index’ or ‘level_0’ (if ‘index’ is already taken) will be used.

level : int, str, tuple, or list, default None
Only remove the given levels from the index. Removes all levels by default
drop : boolean, default False
Do not try to insert index into dataframe columns. This resets the index to the default integer index.
inplace : boolean, default False
Modify the DataFrame in place (do not create a new object)
col_level : int or str, default 0
If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.
col_fill : object, default ‘’
If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.

resetted : DataFrame
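A short sketch of reset_index with a named index, with drop=True, and with a partial reset of a MultiIndex via level, assuming a recent pandas release; the labels are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'val': [10, 20]}, index=pd.Index(['x', 'y'], name='key'))

# Default: the named index becomes an ordinary column called 'key'
flat = df.reset_index()

# drop=True discards the old index instead of inserting it as a column
dropped = df.reset_index(drop=True)

# MultiIndex: level= resets only the named level and keeps the other as index
mi = pd.DataFrame({'val': [1, 2]},
                  index=pd.MultiIndex.from_tuples([('a', 1), ('b', 2)],
                                                  names=['letter', 'number']))
partial = mi.reset_index(level='number')
```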

rfloordiv(other, axis='columns', level=None, fill_value=None)

Binary operator rfloordiv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rmod(other, axis='columns', level=None, fill_value=None)

Binary operator rmod with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rmul(other, axis='columns', level=None, fill_value=None)

Binary operator rmul with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rpow(other, axis='columns', level=None, fill_value=None)

Binary operator rpow with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rsub(other, axis='columns', level=None, fill_value=None)

Binary operator rsub with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rtruediv(other, axis='columns', level=None, fill_value=None)

Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
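The reflected operators above swap the operands, so df.rsub(other) computes other - df and df.rpow(2) computes 2 ** df. A minimal sketch, assuming a recent pandas release, showing this together with fill_value; the frames are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, np.nan]})
other = pd.DataFrame({'a': [10.0, 10.0, 10.0]})

# rsub computes other - df; fill_value=0 stands in for the NaN in df
diff = df.rsub(other, fill_value=0)

# Reflected operators with a scalar: 7 // df, 7 % df and 2 ** df
floored = df.rfloordiv(7)
remainder = df.rmod(7)
powers = df.rpow(2)  # the NaN in df stays NaN because no fill_value is given
```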

save(path)

Deprecated. Use to_pickle instead

select(crit, axis=0)

Return data corresponding to axis labels matching criteria

crit : function
To be called on each index (label). Should return True or False

axis : int

selection : type of caller

set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Set the DataFrame index (row labels) using one or more existing columns. By default yields a new object.

keys : column label or list of column labels / arrays
drop : boolean, default True
Delete columns to be used as the new index
append : boolean, default False
Whether to append columns to existing index
inplace : boolean, default False
Modify the DataFrame in place (do not create a new object)
verify_integrity : boolean, default False
Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method
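>>> import pandas as pd
>>> df = pd.DataFrame({'A': list('aabbcc'), 'B': [1, 2, 1, 2, 1, 2], 'C': range(6)})
>>> indexed_df = df.set_index(['A', 'B'])
>>> indexed_df2 = df.set_index(['A', [0, 1, 2, 0, 1, 2]])
>>> indexed_df3 = df.set_index([[0, 1, 2, 0, 1, 2]])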

dataframe : DataFrame

set_value(index, col, value)

Put single value at passed column and index

index : row label
col : column label
value : scalar value

frame : DataFrame
If label pair is contained, will be reference to calling DataFrame, otherwise a new object
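set_value was deprecated and later removed from pandas; in recent releases the .at accessor performs the same label-based scalar assignment. A minimal sketch of that replacement, assuming a recent pandas release; the labels are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2]}, index=['r1', 'r2'])

# Equivalent of the older df.set_value('r2', 'x', 20): label-based scalar assignment
df.at['r2', 'x'] = 20
value = df.at['r2', 'x']
```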
shape
shift(periods=1, freq=None, axis=0, **kwds)

Shift index by desired number of periods with an optional time freq

periods : int
Number of periods to move, can be positive or negative
freq : DateOffset, timedelta, or time rule string, optional
Increment to use from datetools module or time rule (e.g. ‘EOM’)

If freq is specified then the index values are shifted but the data is not realigned

shifted : same type as caller
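A small sketch of the difference between shifting the data and shifting the index with freq, assuming a recent pandas release:

```python
import pandas as pd

idx = pd.date_range('2000-01-01', periods=3, freq='D')
s = pd.Series([1, 2, 3], index=idx)

# Without freq: the data moves, the index stays, and the gap is filled with NaN
lagged = s.shift(1)

# With freq: the index moves forward by one day and the data is left as-is
moved = s.shift(1, freq='D')
```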

skew(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased skew over requested axis, normalized by N-1

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

skew : Series or DataFrame (if level specified)
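A minimal sketch of skew on a symmetric and a right-tailed sample, and of skipna dropping a missing value, assuming a recent pandas release; the values are illustrative.

```python
import pandas as pd

symmetric = pd.Series([1.0, 2.0, 3.0])
right_tailed = pd.Series([1.0, 1.0, 1.0, 10.0])

sym_skew = symmetric.skew()      # equal deviations on both sides of the mean give zero skew
tail_skew = right_tailed.skew()  # a long right tail gives a positive value

# skipna=True (the default) ignores the missing value
with_na = pd.Series([1.0, 2.0, 3.0, None]).skew()
```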

sort(columns=None, column=None, axis=0, ascending=True, inplace=False)

Sort DataFrame either by labels (along either axis) or by the values in column(s)

columns : object
Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
ascending : boolean or list, default True
Sort ascending vs. descending. Specify list for multiple sort orders
axis : {0, 1}
Sort index/rows versus columns
inplace : boolean, default False
Sort the DataFrame without creating a new instance
>>> result = df.sort(['A', 'B'], ascending=[1, 0])

sorted : DataFrame

sort_index(axis=0, by=None, ascending=True, inplace=False, kind='quicksort')

Sort DataFrame either by labels (along either axis) or by the values in a column

axis : {0, 1}
Sort index/rows versus columns
by : object
Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
ascending : boolean or list, default True
Sort ascending vs. descending. Specify list for multiple sort orders
inplace : boolean, default False
Sort the DataFrame without creating a new instance
>>> result = df.sort_index(by=['A', 'B'], ascending=[True, False])

sorted : DataFrame
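DataFrame.sort and the by= argument of sort_index were later deprecated and removed in favour of DataFrame.sort_values. The sketch below shows the same two-key sort as the examples above with that newer method, assuming a recent pandas release; the data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 1, 1], 'B': [5, 3, 4]})

# Sort by A ascending, then by B descending within equal A values
result = df.sort_values(['A', 'B'], ascending=[True, False])

# Sorting by the row labels themselves is still done with sort_index
by_label = result.sort_index()
```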

sortlevel(level=0, axis=0, ascending=True, inplace=False)

Sort multilevel index by chosen axis and primary level. Data will be lexicographically sorted by the chosen level followed by the other levels (in order)

level : int
axis : {0, 1}
ascending : boolean, default True
inplace : boolean, default False
Sort the DataFrame without creating a new instance

sorted : DataFrame

squeeze()

Squeeze length-1 dimensions

stack(level=-1, dropna=True)

Pivot a level of the (possibly hierarchical) column labels, returning a DataFrame (or Series in the case of an object with a single level of column labels) having a hierarchical index with a new inner-most level of row labels.

level : int, string, or list of these, default last level
Level(s) to stack, can pass level name
dropna : boolean, default True
Whether to drop rows in the resulting Frame/Series with no valid values
>>> s
     a   b
one  1.  2.
two  3.  4.
>>> s.stack()
one a    1
    b    2
two a    3
    b    4

stacked : DataFrame or Series
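A runnable version of the stack example above, assuming a recent pandas release. The column labels become the inner level of a new row MultiIndex, and unstack reverses the operation:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=['one', 'two'], columns=['a', 'b'])

# Result is a Series indexed by (row label, column label) pairs
stacked = df.stack()

# unstack moves the inner index level back into the columns
restored = stacked.unstack()
```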

std(axis=None, skipna=None, level=None, ddof=1, **kwargs)

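Return the sample standard deviation over the requested axis, normalized by N-ddof (N-1 with the default ddof=1)

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
ddof : int, default 1
Delta degrees of freedom; the divisor used in the calculation is N - ddof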
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

stdev : Series or DataFrame (if level specified)
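To make the normalization concrete: with ddof=1 the sum of squared deviations is divided by N - 1, with ddof=0 by N. For the illustrative sample below the mean is 5 and the squared deviations sum to 32 over N = 8 values. A minimal sketch assuming a recent pandas release:

```python
import pandas as pd

s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

sample_std = s.std()            # ddof=1: sqrt(32 / 7)
population_std = s.std(ddof=0)  # ddof=0: sqrt(32 / 8) = 2.0
```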

sub(other, axis='columns', level=None, fill_value=None)

Binary operator sub with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

subtract(other, axis='columns', level=None, fill_value=None)

Binary operator sub with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
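A sketch of how axis and label alignment behave for sub/subtract with a Series argument, assuming a recent pandas release; the row and column labels are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['r1', 'r2'])

# Default axis='columns': the Series index is matched against the column labels
by_col = df.sub(pd.Series({'x': 1, 'y': 1}))

# axis='index': the Series index is matched against the row labels instead
by_row = df.sub(pd.Series({'r1': 1, 'r2': 10}), axis='index')

# Mismatched labels are unioned; columns without a partner become NaN
union = df.subtract(pd.Series({'x': 1, 'z': 5}))
```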

sum(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the sum of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

sum : Series or DataFrame (if level specified)
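A minimal sketch of sum along both axes and of the effect of skipna, assuming a recent pandas release:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, 6.0]})

col_totals = df.sum()          # NaN is skipped: a -> 4.0, b -> 15.0
row_totals = df.sum(axis=1)    # per row: 5.0, 5.0, 9.0
strict = df.sum(skipna=False)  # the NaN now propagates: a -> NaN
```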

swapaxes(axis1, axis2, copy=True)

Interchange axes and swap values axes appropriately

y : same as input

swaplevel(i, j, axis=0)

Swap levels i and j in a MultiIndex on a particular axis

i, j : int, string (can be mixed)
Level of index to be swapped. Can pass level name as string.

swapped : type of caller (new object)
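A minimal swaplevel sketch using level names, assuming a recent pandas release; the level names are illustrative.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([('x', 1), ('y', 2)], names=['letter', 'number'])
df = pd.DataFrame({'v': [10, 20]}, index=idx)

# Levels can be given by position or, as here, by name
swapped = df.swaplevel('letter', 'number')
```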

tail(n=5)

Returns last n rows

take(indices, axis=0, convert=True, is_copy=True)

Analogous to ndarray.take

indices : list / array of ints
axis : int, default 0
convert : translate negative to positive indices (default)
is_copy : mark the returned frame as a copy

taken : type of caller
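take selects rows by position rather than by label. A minimal sketch, assuming a recent pandas release; only the positions are passed, since convert and is_copy were deprecated in later versions. tail is shown for comparison.

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30, 40]})

picked = df.take([0, -1])  # positions, not labels; -1 counts from the end
last_two = df.tail(2)      # last n rows
```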

to_clipboard(excel=None, sep=None, **kwargs)

Attempt to write a text representation of the object to the system clipboard. This can be pasted into Excel, for example.

excel : boolean, defaults to True
If True, use the provided separator, writing in a csv format to allow easy pasting into Excel. If False, write a string representation of the object to the clipboard
sep : optional, defaults to tab
Other keywords are passed to to_csv

Requirements for your platform
  • Linux: xclip, or xsel (with gtk or PyQt4 modules)
  • Windows: none
  • OS X: none
to_csv(path_or_buf, sep=',', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, mode='w', nanRep=None, encoding=None, quoting=None, line_terminator='\n', chunksize=None, tupleize_cols=False, date_format=None, **kwds)

Write DataFrame to a comma-separated values (csv) file

path_or_buf : string or file handle / StringIO
File path
sep : character, default ‘,’
Field delimiter for the output file.
na_rep : string, default ‘’
Missing data representation
float_format : string, default None
Format string for floating point numbers
cols : sequence, optional
Columns to write
header : boolean or list of string, default True
Write out column names. If a list of string is given it is assumed to be aliases for the column names
index : boolean, default True
Write row names (index)
index_label : string or sequence, or False, default None
Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R
nanRep : None
deprecated, use na_rep
mode : str
Python write mode, default ‘w’
encoding : string, optional
a string representing the encoding to use if the contents are non-ascii, for python versions prior to 3
line_terminator : string, default ‘\n’
The newline character or character sequence to use in the output file
quoting : optional constant from csv module
defaults to csv.QUOTE_MINIMAL
chunksize : int or None
rows to write at a time
tupleize_cols : boolean, default False
Write MultiIndex columns as a list of tuples (if True) or in the new, expanded format (if False)
date_format : string, default None
Format string for datetime objects.
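A minimal sketch writing to an in-memory buffer instead of a file, assuming a recent pandas release; na_rep and index are the parameters described above, and the data is illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': ['x', None]})

buf = io.StringIO()
# Drop the index column and write missing values as NA
df.to_csv(buf, index=False, na_rep='NA')
text = buf.getvalue()
```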
to_dense()

Return dense representation of NDFrame (as opposed to sparse)

to_dict(outtype='dict')

Convert DataFrame to dictionary.

outtype : str {‘dict’, ‘list’, ‘series’, ‘records’}
Determines the type of the values of the dictionary. The default dict is a nested dictionary {column -> {index -> value}}. list returns {column -> list(values)}. series returns {column -> Series(values)}. records returns [{column -> value}, ... , {column -> value}]. Abbreviations are allowed.

result : dict like {column -> {index -> value}}
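A short sketch of the main output layouts, assuming a recent pandas release. The first argument is named outtype in the version documented here and orient in later releases, so it is passed positionally; abbreviations are not used because later releases no longer accept them.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2]}, index=['r1', 'r2'])

nested = df.to_dict()            # {'a': {'r1': 1, 'r2': 2}}
as_lists = df.to_dict('list')    # {'a': [1, 2]}
records = df.to_dict('records')  # [{'a': 1}, {'a': 2}]
```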

to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True)

Write DataFrame to an Excel sheet

excel_writer : string or ExcelWriter object
File path or existing ExcelWriter
sheet_name : string, default ‘Sheet1’
Name of sheet which will contain DataFrame
na_rep : string, default ‘’
Missing data representation
float_format : string, default None
Format string for floating point numbers
cols : sequence, optional
Columns to write
header : boolean or list of string, default True
Write out column names. If a list of string is given it is assumed to be aliases for the column names
index : boolean, default True
Write row names (index)
index_label : string or sequence, default None
Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
startrow :
upper left cell row to dump data frame
startcol :
upper left cell column to dump data frame
engine : string, default None
write engine to use - you can also set this via the options io.excel.xlsx.writer, io.excel.xls.writer, and io.excel.xlsm.writer.
merge_cells : boolean, default True
Write MultiIndex and Hierarchical Rows as merged cells.

If passing an existing ExcelWriter object, then the sheet will be added to the existing workbook. This can be used to save different DataFrames to one workbook:

>>> from pandas import ExcelWriter
>>> writer = ExcelWriter('output.xlsx')
>>> df1.to_excel(writer, 'Sheet1')
>>> df2.to_excel(writer, 'Sheet2')
>>> writer.save()
to_gbq(destination_table, schema=None, col_order=None, if_exists='fail', **kwargs)

Write a DataFrame to a Google BigQuery table.

If the table exists, the behaviour is controlled by if_exists (see below). If it does not exist, a new table will be created, in which case the schema will have to be specified. By default, rows will be written in the order they appear in the DataFrame, though the user may specify an alternative order.

destination_table : string
name of table to be written, in the form ‘dataset.tablename’
schema : sequence (optional)
list of column types in order for data to be inserted, e.g. [‘INTEGER’, ‘TIMESTAMP’, ‘BOOLEAN’]
col_order : sequence (optional)
order which columns are to be inserted, e.g. [‘primary_key’, ‘birthday’, ‘username’]
if_exists : {‘fail’, ‘replace’, ‘append’} (optional)
  • fail: If table exists, do nothing.
  • replace: If table exists, drop it, recreate it, and insert data.
  • append: If table exists, insert data. Create it if it does not exist.

kwargs are passed to the Client constructor

SchemaMissing :
Raised if the ‘if_exists’ parameter is set to ‘replace’, but no schema is specified
TableExists :
Raised if the specified ‘destination_table’ exists but the ‘if_exists’ parameter is set to ‘fail’ (the default)
InvalidSchema :
Raised if the ‘schema’ parameter does not match the provided DataFrame
to_hdf(path_or_buf, key, **kwargs)

Write the contained data to an HDF5 file using HDFStore

path_or_buf : the path (string) or buffer to put the store
key : string
Identifier for the group in the store

mode : optional, {‘a’, ‘w’, ‘r’, ‘r+’}, default ‘a’

'r'
Read-only; no data can be modified.
'w'
Write; a new file is created (an existing file with the same name would be deleted).
'a'
Append; an existing file is opened for reading and writing, and if the file does not exist it is created.
'r+'
It is similar to 'a', but the file must already exist.
format : ‘fixed(f)|table(t)’, default is ‘fixed’
fixed(f) : Fixed format
Fast writing/reading. Not-appendable, nor searchable
table(t) : Table format
Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data
append : boolean, default False
For Table formats, append the input data to the existing table
complevel : int, 0-9, default 0
If a complib is specified compression will be applied where possible
complib : {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’, None}, default None
If complevel is > 0 apply compression to objects written in the store wherever possible
fletcher32 : bool, default False
If applying compression use the fletcher32 checksum
to_html(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, force_unicode=None, bold_rows=True, classes=None, escape=True, max_rows=None, max_cols=None, show_dimensions=False)

Render a DataFrame as an HTML table.

to_html-specific options:

bold_rows : boolean, default True
Make the row labels bold in the output
classes : str or list or tuple, default None
CSS class(es) to apply to the resulting html table
escape : boolean, default True
Convert the characters <, >, and & to HTML-safe sequences.
max_rows : int, optional
Maximum number of rows to show before truncating. If None, show all.
max_cols : int, optional
Maximum number of columns to show before truncating. If None, show all.
frame : DataFrame
object to render
buf : StringIO-like, optional
buffer to write to
columns : sequence, optional
the subset of columns to write; default None writes all columns
col_space : int, optional
the minimum width of each column
header : bool, optional
whether to print column labels, default True
index : bool, optional
whether to print index (row) labels, default True
na_rep : string, optional
string representation of NaN to use, default ‘NaN’
formatters : list or dict of one-parameter functions, optional
formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
float_format : one-parameter function, optional
formatter function to apply to columns’ elements if they are floats, default None
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
justify : {‘left’, ‘right’}, default None
Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
index_names : bool, optional
Prints the names of the indexes, default True
force_unicode : bool, default False
Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.

formatted : string (or unicode, depending on data and options)
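A minimal sketch of the escape and classes options, assuming a recent pandas release; the cell content and class name are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'tag': ['<b>']})

# escape=True is the default: '<b>' is written as '&lt;b&gt;'
escaped = df.to_html(classes='results')

# escape=False passes the markup through unchanged
raw = df.to_html(escape=False)
```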

to_json(path_or_buf=None, orient=None, date_format='epoch', double_precision=10, force_ascii=True, date_unit='ms', default_handler=None)

Convert the object to a JSON string.

Note: NaN and None will be converted to null, and datetime objects will be converted to UNIX timestamps.

path_or_buf : the path or buffer to write the result string
if this is None, return the converted string

orient : string

  • Series
    • default is ‘index’
    • allowed values are: {‘split’,’records’,’index’}
  • DataFrame
    • default is ‘columns’
    • allowed values are: {‘split’,’records’,’index’,’columns’,’values’}
  • The format of the JSON string
    • split : dict like {index -> [index], columns -> [columns], data -> [values]}
    • records : list like [{column -> value}, ... , {column -> value}]
    • index : dict like {index -> {column -> value}}
    • columns : dict like {column -> {index -> value}}
    • values : just the values array
date_format : {‘epoch’, ‘iso’}
Type of date conversion: epoch = epoch milliseconds, iso = ISO8601. Default is epoch.
double_precision : int, default 10
The number of decimal places to use when encoding floating point values
force_ascii : boolean, default True
Force the encoded string to be ASCII
date_unit : string, default ‘ms’ (milliseconds)
The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.
default_handler : callable, default None
Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serialisable object.

result : string or None
The JSON string if path_or_buf is None, otherwise None
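A sketch of three orient values, assuming a recent pandas release. Each result is parsed back with the standard json module so the comparison does not depend on key order or whitespace; note how the missing value becomes null (None after parsing).

```python
import json

import pandas as pd

df = pd.DataFrame({'a': [1.0, None]}, index=['r1', 'r2'])

by_column = json.loads(df.to_json())               # orient='columns', the DataFrame default
records = json.loads(df.to_json(orient='records'))  # one object per row
split = json.loads(df.to_json(orient='split'))      # separate index, columns and data
```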

to_latex(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=True, force_unicode=None)

Render a DataFrame to a tabular environment table. You can splice this into a LaTeX document.

to_latex-specific options:

bold_rows : boolean, default True
Make the row labels bold in the output
frame : DataFrame
object to render
buf : StringIO-like, optional
buffer to write to
columns : sequence, optional
the subset of columns to write; default None writes all columns
col_space : int, optional
the minimum width of each column
header : bool, optional
whether to print column labels, default True
index : bool, optional
whether to print index (row) labels, default True
na_rep : string, optional
string representation of NaN to use, default ‘NaN’
formatters : list or dict of one-parameter functions, optional
formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
float_format : one-parameter function, optional
formatter function to apply to columns’ elements if they are floats, default None
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
justify : {‘left’, ‘right’}, default None
Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
index_names : bool, optional
Prints the names of the indexes, default True
force_unicode : bool, default False
Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.

formatted : string (or unicode, depending on data and options)

to_msgpack(path_or_buf=None, **kwargs)

msgpack (serialize) object to input file path

THIS IS AN EXPERIMENTAL LIBRARY and the storage format may not be stable until a future release.

path_or_buf : string file path, buffer-like, or None
If None, return the generated string
append : boolean, default False
Whether to append to an existing msgpack
compress : type of compressor (zlib or blosc), default None
If None, no compression is applied
to_panel()

Transform long (stacked) format (DataFrame) into wide (3D, Panel) format.

Currently the index of the DataFrame must be a 2-level MultiIndex. This may be generalized later

panel : Panel

to_period(freq=None, axis=0, copy=True)

Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed)

freq : string, default None
Desired frequency; inferred from the index if not passed
axis : {0, 1}, default 0
The axis to convert (the index by default)
copy : boolean, default True
If False then underlying input data is not copied

ts : TimeSeries with PeriodIndex
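A minimal sketch converting a DatetimeIndex to monthly periods, assuming a recent pandas release; the dates are illustrative.

```python
import pandas as pd

idx = pd.to_datetime(['2000-01-15', '2000-02-20'])
df = pd.DataFrame({'x': [1, 2]}, index=idx)

# Each timestamp is mapped to the monthly period that contains it
monthly = df.to_period('M')
```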

to_pickle(path)

Pickle (serialize) object to input file path

path : string
File path
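A round-trip sketch using a temporary directory and pandas.read_pickle, assuming a recent pandas release; the file name is illustrative.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'frame.pkl')  # illustrative file name
    df.to_pickle(path)
    restored = pd.read_pickle(path)  # read back before the directory is removed
```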
to_records(index=True, convert_datetime64=True)

Convert DataFrame to record array. Index will be put in the ‘index’ field of the record array if requested

index : boolean, default True
Include index in resulting record array, stored in ‘index’ field
convert_datetime64 : boolean, default True
Whether to convert the index to datetime.datetime if it is a DatetimeIndex

y : recarray
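A minimal sketch showing the ‘index’ field of the record array, assuming a recent pandas release (the convert_datetime64 argument was removed in later versions and is not used):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2]}, index=['r1', 'r2'])

rec = df.to_records()                 # the unnamed index is stored in the 'index' field
no_index = df.to_records(index=False)  # only the columns are kept
```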

to_sparse(fill_value=None, kind='block')

Convert to SparseDataFrame

fill_value : float, default NaN
kind : {‘block’, ‘integer’}

y : SparseDataFrame

to_sql(name, con, flavor='sqlite', if_exists='fail', **kwargs)

Write records stored in a DataFrame to a SQL database.

name : str
Name of SQL table

con : an open SQL database connection object
flavor : {‘sqlite’, ‘mysql’, ‘oracle’}, default ‘sqlite’
if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’

  • fail: If table exists, do nothing.
  • replace: If table exists, drop it, recreate it, and insert data.
  • append: If table exists, insert data. Create it if it does not exist.
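A sketch using the standard-library sqlite3 module with an in-memory database, assuming a recent pandas release. The flavor argument was removed in later versions, where a plain sqlite3 connection is accepted directly; the table name and peptide values are illustrative only.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({'peptide': ['SYFPEITHI', 'LLFGYPVYV'], 'score': [0.9, 0.4]})

conn = sqlite3.connect(':memory:')
df.to_sql('results', conn, index=False)  # if_exists='fail' by default
rows = conn.execute('SELECT peptide, score FROM results ORDER BY score DESC').fetchall()

# if_exists='append' inserts the rows into the existing table
df.to_sql('results', conn, index=False, if_exists='append')
count = conn.execute('SELECT COUNT(*) FROM results').fetchone()[0]
conn.close()
```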
to_stata(fname, convert_dates=None, write_index=True, encoding='latin-1', byteorder=None)

Write a DataFrame to a Stata binary dta file

fname : file path or buffer
Where to save the dta file.
convert_dates : dict
Dictionary mapping column of datetime types to the stata internal format that you want to use for the dates. Options are ‘tc’, ‘td’, ‘tm’, ‘tw’, ‘th’, ‘tq’, ‘ty’. Column can be either a number or a name.
encoding : str
Default is latin-1. Note that Stata does not support unicode.
byteorder : str
Can be “>”, “<”, “little”, or “big”. The default is None which uses sys.byteorder
>>> from pandas.io.stata import StataWriter
>>> writer = StataWriter('./data_file.dta', data)
>>> writer.write_file()

Or with dates

>>> writer = StataWriter('./date_data_file.dta', data, {2 : 'tw'})
>>> writer.write_file()
to_string(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, nanRep=None, index_names=True, justify=None, force_unicode=None, line_width=None, max_rows=None, max_cols=None, show_dimensions=False)

Render a DataFrame to a console-friendly tabular output.

frame : DataFrame
object to render
buf : StringIO-like, optional
buffer to write to
columns : sequence, optional
the subset of columns to write; default None writes all columns
col_space : int, optional
the minimum width of each column
header : bool, optional
whether to print column labels, default True
index : bool, optional
whether to print index (row) labels, default True
na_rep : string, optional
string representation of NaN to use, default ‘NaN’
formatters : list or dict of one-parameter functions, optional
formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
float_format : one-parameter function, optional
formatter function to apply to columns’ elements if they are floats, default None
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
justify : {‘left’, ‘right’}, default None
Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
index_names : bool, optional
Prints the names of the indexes, default True
force_unicode : bool, default False
Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.

formatted : string (or unicode, depending on data and options)
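A minimal sketch of na_rep and float_format, assuming a recent pandas release; the values are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [1.5, np.nan]}, index=['r1', 'r2'])

# Floats are rendered with two decimals; the missing value is shown as '-'
text = df.to_string(na_rep='-', float_format='{:.2f}'.format)
```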

to_timestamp(freq=None, how='start', axis=0, copy=True)

Cast to DatetimeIndex of timestamps, at beginning of period

freq : string, default frequency of PeriodIndex
Desired frequency
how : {‘s’, ‘e’, ‘start’, ‘end’}
Convention for converting period to timestamp; start of period vs. end
axis : {0, 1} default 0
The axis to convert (the index by default)
copy : boolean, default True
If False, the underlying input data is not copied

df : DataFrame with DatetimeIndex

to_wide(*args, **kwargs)
transpose()

Transpose index and columns

truediv(other, axis='columns', level=None, fill_value=None)

Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
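The fill_value rule above can be illustrated with a minimal plain-Python sketch. It is not pandas' implementation: each operand is modelled as a dict mapping labels to values, None stands in for a missing value, and the helper name `truediv_with_fill` is hypothetical.

```python
import math


def truediv_with_fill(left, right, fill_value=None):
    """Hypothetical helper: divide two {label: value} dicts elementwise.

    None (or an absent label) counts as missing. Labels of both inputs are
    unioned, mirroring "Mismatched indices will be unioned together".
    """
    result = {}
    for label in sorted(set(left) | set(right)):
        a, b = left.get(label), right.get(label)
        if a is None and b is None:
            # Missing in both inputs: the result stays missing.
            result[label] = math.nan
            continue
        if fill_value is not None:
            # Substitute fill_value only on the side that is missing.
            a = fill_value if a is None else a
            b = fill_value if b is None else b
        if a is None or b is None:
            # No fill_value given, so a one-sided gap propagates as NaN.
            result[label] = math.nan
        else:
            # Plain Python raises ZeroDivisionError where pandas would give inf.
            result[label] = a / b
    return result


print(truediv_with_fill({"a": 6.0, "b": 4.0}, {"a": 3.0, "c": 2.0}, fill_value=1.0))
# {'a': 2.0, 'b': 4.0, 'c': 0.5}
```

Without fill_value, labels "b" and "c" in the same call would come back as NaN.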

truncate(before=None, after=None, axis=None, copy=True)

Truncates a sorted NDFrame before and/or after some particular dates.

before : date
Truncate before date
after : date
Truncate after date

axis : the truncation axis, defaults to the stat axis
copy : boolean, default True
Return a copy of the truncated section

truncated : type of caller

tshift(periods=1, freq=None, axis=0, **kwds)

Shift the time index, using the index’s frequency if available

periods : int
Number of periods to move, can be positive or negative
freq : DateOffset, timedelta, or time rule string, default None
Increment to use from datetools module or time rule (e.g. ‘EOM’)
axis : int or basestring
Corresponds to the axis that contains the Index

If freq is not specified, the freq or inferred_freq attribute of the index is used. If neither attribute exists, a ValueError is raised.

shifted : NDFrame

tz_convert(tz, axis=0, copy=True)

Convert TimeSeries to target time zone. If it is time zone naive, it will be localized to the passed time zone.

tz : string or pytz.timezone object
copy : boolean, default True
Also make a copy of the underlying data
tz_localize(tz, axis=0, copy=True, infer_dst=False)

Localize tz-naive TimeSeries to target time zone

tz : string or pytz.timezone object
copy : boolean, default True
Also make a copy of the underlying data
infer_dst : boolean, default False
Attempt to infer fall dst-transition times based on order
unstack(level=-1)

Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels. If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex)

level : int, string, or list of these, default -1 (last level)
Level(s) of index to unstack, can pass level name

DataFrame.pivot : Pivot a table based on column values.
DataFrame.stack : Pivot a level of the column labels (inverse operation from unstack).
>>> import numpy as np
>>> import pandas as pd
>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
...                                    ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one  a   1
     b   2
two  a   3
     b   4
dtype: float64
>>> s.unstack(level=-1)
     a   b
one  1  2
two  3  4
>>> s.unstack(level=0)
   one  two
a  1   3
b  2   4
>>> df = s.unstack(level=0)
>>> df.unstack()
one  a   1
     b   2
two  a   3
     b   4

unstacked : DataFrame or Series
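What unstacking the innermost level does to the data can be sketched in plain Python. This is not pandas' implementation; the helper `unstack_inner` is hypothetical, the two-level index is modelled as tuple keys, and missing combinations become None where pandas would insert NaN. With the same values as the s.unstack(level=-1) example, it yields the same table.

```python
def unstack_inner(series):
    """Hypothetical helper: pivot the inner level of a two-level
    {(outer, inner): value} mapping into {outer: {inner: value}}.

    Combinations absent from the input become None (pandas would use NaN).
    """
    outer_labels = sorted({outer for outer, _ in series})
    inner_labels = sorted({inner for _, inner in series})
    return {
        outer: {inner: series.get((outer, inner)) for inner in inner_labels}
        for outer in outer_labels
    }


s = {("one", "a"): 1.0, ("one", "b"): 2.0, ("two", "a"): 3.0, ("two", "b"): 4.0}
print(unstack_inner(s))  # {'one': {'a': 1.0, 'b': 2.0}, 'two': {'a': 3.0, 'b': 4.0}}
```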

update(other, join='left', overwrite=True, filter_func=None, raise_conflict=False)

Modify DataFrame in place using non-NA values from passed DataFrame. Aligns on indices

other : DataFrame, or object coercible into a DataFrame
join : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
overwrite : boolean, default True
If True then overwrite values for common keys in the calling frame
filter_func : callable(1d-array) -> 1d-array<boolean>, default None
Can choose to replace values other than NA. Return True for values that should be updated
raise_conflict : boolean
If True, will raise an error if the DataFrame and other both contain data in the same place.
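The interaction of overwrite and filter_func can be sketched in plain Python. This is a simplified model, not pandas' code: frames are {label: value} dicts, None marks NA, the helper `update` is hypothetical, filter_func is called per value rather than on a 1d-array, and raise_conflict is not modelled.

```python
def update(target, other, overwrite=True, filter_func=None):
    """Hypothetical helper modelling DataFrame.update on {label: value} dicts.

    Modifies target in place and returns None. Only labels already in target
    are considered (the 'left' join); None in other never overwrites anything.
    """
    for label, old in target.items():
        new = other.get(label)
        if new is None:
            continue  # only non-NA values from other are used
        if filter_func is not None:
            # In this sketch filter_func decides per value and takes precedence.
            if filter_func(old):
                target[label] = new
        elif overwrite or old is None:
            # overwrite=False only fills values that are NA in target.
            target[label] = new


t = {"a": 1, "b": None, "c": 3}
update(t, {"a": 10, "b": 20, "c": None, "d": 40})
print(t)  # {'a': 10, 'b': 20, 'c': 3}
```

Note that label "d" is not added, because the join keeps the calling frame's labels.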
values

Numpy representation of NDFrame

var(axis=None, skipna=None, level=None, ddof=1, **kwargs)

Return unbiased variance over requested axis. Normalized by N-1 by default; this can be changed using the ddof argument (the divisor is N - ddof).

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

variance : Series or DataFrame (if level specified)
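The N - ddof normalization and the skipna rule can be checked with a small plain-Python sketch for one column. The helper `var` is hypothetical and not pandas' implementation; None and float NaN both count as missing.

```python
import math


def var(values, ddof=1, skipna=True):
    """Hypothetical helper: variance of one column, normalized by N - ddof."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    if not skipna and any(is_missing(v) for v in values):
        return math.nan  # a missing value makes the whole result NA
    data = [v for v in values if not is_missing(v)]
    n = len(data)
    if n - ddof <= 0:
        return math.nan  # not enough observations for this ddof
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - ddof)


print(var([1, 2, 3, 4]))          # 5/3, about 1.667 (unbiased, N-1)
print(var([1, 2, 3, 4], ddof=0))  # 1.25 (population variance, N)
```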

where(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)

Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.

cond : boolean NDFrame or array
other : scalar or NDFrame
inplace : boolean, default False
Whether to perform the operation in place on the data

axis : alignment axis if needed, default None
level : alignment level if needed, default None
try_cast : boolean, default False
Try to cast the result back to the input type (if possible)
raise_on_error : boolean, default True
Whether to raise on invalid data types (e.g. trying to where on strings)

wh : same type as caller
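The selection rule of where (keep self where cond is True, otherwise take other) is shown below as a one-dimensional plain-Python sketch. The helper `where` is hypothetical, and None stands in for pandas' default NaN.

```python
def where(values, cond, other=None):
    """Hypothetical helper: keep values[i] where cond[i] is True, otherwise use
    other (a scalar, or a sequence of the same length)."""
    if len(values) != len(cond):
        raise ValueError("values and cond must have the same length")
    if isinstance(other, (list, tuple)):
        if len(other) != len(values):
            raise ValueError("other must have the same length as values")
        return [v if c else o for v, c, o in zip(values, cond, other)]
    return [v if c else other for v, c in zip(values, cond)]


prices = [5, -1, 3]
print(where(prices, [p > 0 for p in prices], 0))  # [5, 0, 3]
```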

xs(key, axis=0, level=None, copy=True, drop_level=True)

Returns a cross-section (row(s) or column(s)) from the Series/DataFrame. Defaults to cross-section on the rows (axis=0).

key : object
Some label contained in the index, or partially in a MultiIndex
axis : int, default 0
Axis to retrieve cross-section on
level : object, defaults to first n levels (n=1 or len(key))
In case of a key partially contained in a MultiIndex, indicate which levels are used. Levels can be referred to by label or position.
copy : boolean, default True
Whether to make a copy of the data
drop_level : boolean, default True
If False, returns object with same levels as self.
>>> df
   A  B  C
a  4  5  2
b  4  0  9
c  9  7  3
>>> df.xs('a')
A    4
B    5
C    2
Name: a
>>> df.xs('C', axis=1)
a    2
b    9
c    3
Name: C
>>> s = df.xs('a', copy=False)
>>> s['A'] = 100
>>> df
     A  B  C
a  100  5  2
b    4  0  9
c    9  7  3
>>> df
                    A  B  C  D
first second third
bar   one    1      4  1  8  9
      two    1      7  5  5  0
baz   one    1      6  6  8  0
      three  2      5  3  5  3
>>> df.xs(('baz', 'three'))
       A  B  C  D
third
2      5  3  5  3
>>> df.xs('one', level=1)
             A  B  C  D
first third
bar   1      4  1  8  9
baz   1      6  6  8  0
>>> df.xs(('baz', 2), level=[0, 'third'])
        A  B  C  D
second
three   5  3  5  3

xs : Series or DataFrame
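Selecting a cross-section on one level of a MultiIndex, and dropping that level, can be sketched in plain Python. The helper `xs` is hypothetical and handles only a single-level key; rows are stored as {index_tuple: row}. With the data from the example above, it reproduces the df.xs('one', level=1) result.

```python
def xs(rows, key, level=0, drop_level=True):
    """Hypothetical helper: cross-section of rows stored as {index_tuple: row}.

    Keeps rows whose index tuple holds `key` at position `level`; with
    drop_level=True that position is removed from the returned index tuples.
    """
    out = {}
    for idx, row in rows.items():
        if idx[level] == key:
            new_idx = idx[:level] + idx[level + 1:] if drop_level else idx
            out[new_idx] = row
    return out


rows = {
    ("bar", "one", 1): [4, 1, 8, 9],
    ("bar", "two", 1): [7, 5, 5, 0],
    ("baz", "one", 1): [6, 6, 8, 0],
    ("baz", "three", 2): [5, 3, 5, 3],
}
print(xs(rows, "one", level=1))  # {('bar', 1): [4, 1, 8, 9], ('baz', 1): [6, 6, 8, 0]}
```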

class Fred2.Core.Result.Distance2SelfResult(data=None, index=None, columns=None, dtype=None, copy=False)

Bases: Fred2.Core.Result.AResult

Distance2Self prediction result

T

Transpose index and columns

abs()

Return an object with absolute value taken. Only applicable to objects that are all numeric

abs : type of caller

add(other, axis='columns', level=None, fill_value=None)

Binary operator add with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

add_prefix(prefix)

Concatenate prefix string with panel items names.

prefix : string

with_prefix : type of caller

add_suffix(suffix)

Concatenate suffix string with panel items names

suffix : string

with_suffix : type of caller

align(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0)

Align two objects on their axes with the specified join method for each axis Index

other : DataFrame or Series
join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
axis : allowed axis of the other object, default None
Align on index (0), columns (1), or both (None)
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level
copy : boolean, default True
Always returns new objects. If copy=False and no reindexing is required then original objects are returned.
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any “compatible” value

method : str, default None
limit : int, default None
fill_axis : {0, 1}, default 0
Filling axis, method and limit
(left, right) : (type of input, type of other)
Aligned objects
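How the join argument chooses the shared labels can be sketched for one axis in plain Python. The helper `align` is hypothetical and not pandas' implementation; objects are {label: value} dicts, and labels absent from one side receive fill_value (None by default, standing in for NaN).

```python
def align(left, right, join="outer", fill_value=None):
    """Hypothetical helper: align two {label: value} dicts on a shared set of
    labels chosen by `join`; labels absent from one side get fill_value."""
    if join == "outer":
        labels = sorted(set(left) | set(right))
    elif join == "inner":
        labels = sorted(set(left) & set(right))
    elif join == "left":
        labels = list(left)
    elif join == "right":
        labels = list(right)
    else:
        raise ValueError("join must be 'outer', 'inner', 'left' or 'right'")
    return (
        {k: left.get(k, fill_value) for k in labels},
        {k: right.get(k, fill_value) for k in labels},
    )


a, b = {"x": 1, "y": 2}, {"y": 20, "z": 30}
print(align(a, b, join="inner"))  # ({'y': 2}, {'y': 20})
```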
all(axis=None, bool_only=None, skipna=True, level=None, **kwargs)

Return whether all elements are True over requested axis.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
bool_only : boolean, default None
Only include boolean data.

all : Series (or DataFrame if level specified)

any(axis=None, bool_only=None, skipna=True, level=None, **kwargs)

Return whether any element is True over requested axis.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
bool_only : boolean, default None
Only include boolean data.

any : Series (or DataFrame if level specified)

append(other, ignore_index=False, verify_integrity=False)

Append rows of other to the end of this frame, returning a new object. Columns not in this frame are added as new columns.

other : DataFrame or list of Series/dict-like objects
ignore_index : boolean, default False
If True do not use the index labels. Useful for gluing together record arrays
verify_integrity : boolean, default False
If True, raise ValueError on creating index with duplicates

If a list of dict is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged

appended : DataFrame

apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)

Applies function along input axis of DataFrame.

Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty.

func : function
Function to apply to each column/row
axis : {0, 1}
  • 0 : apply function to each column
  • 1 : apply function to each row
broadcast : boolean, default False
For aggregation functions, return object of same size with values propagated
reduce : boolean or None, default None
Try to apply reduction procedures. If the DataFrame is empty, apply will use reduce to determine whether the result should be a Series or a DataFrame. If reduce is None (the default), apply’s return value will be guessed by calling func on an empty Series (note: while guessing, exceptions raised by func will be ignored). If reduce is True a Series will always be returned, and if False a DataFrame will always be returned.
raw : boolean, default False
If False, convert each row or column into a Series. If raw=True the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance
args : tuple
Positional arguments to pass to function in addition to the array/series

Additional keyword arguments will be passed as keywords to the function

>>> import numpy
>>> df.apply(numpy.sqrt) # returns DataFrame
>>> df.apply(numpy.sum, axis=0) # equiv to df.sum(0)
>>> df.apply(numpy.sum, axis=1) # equiv to df.sum(1)

DataFrame.applymap : For elementwise operations

applied : Series or DataFrame

applymap(func)

Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame

func : function
Python function, returns a single value from a single value

applied : DataFrame

DataFrame.apply : For operations on rows/columns

as_blocks(columns=None)

Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype.

Columns are presented in sorted order unless a specific list of columns is provided.

NOTE: the dtypes of the blocks WILL BE PRESERVED HERE (unlike in as_matrix)
columns : array-like
Specific column order

values : a list of Object

as_matrix(columns=None)

Convert the frame to its Numpy-array matrix representation. Columns are presented in sorted order unless a specific list of columns is provided.

NOTE: the dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say, if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.

e.g. if the dtypes are:
  • float16, float32 -> float32
  • float16, float32, float64 -> float64
  • int32, uint8 -> int32
values : ndarray
If the caller is heterogeneous and contains booleans or objects, the result will be of dtype=object
asfreq(freq, method=None, how=None, normalize=False)

Convert all TimeSeries inside to specified frequency using DateOffset objects. Optionally provide fill method to pad/backfill missing values.

freq : DateOffset object, or string
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}
Method to use for filling holes in reindexed Series:
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill gap
how : {‘start’, ‘end’}, default end
For PeriodIndex only, see PeriodIndex.asfreq
normalize : bool, default False
Whether to reset output index to midnight

converted : type of caller

astype(dtype, copy=True, raise_on_error=True)

Cast object to input numpy.dtype. Return a copy when copy=True (be really careful with this!)

dtype : numpy.dtype or Python type
raise_on_error : raise on invalid input

casted : type of caller

at
at_time(time, asof=False)

Select values at particular time of day (e.g. 9:30AM)

time : datetime.time or string

values_at_time : type of caller

axes
between_time(start_time, end_time, include_start=True, include_end=True)

Select values between particular times of the day (e.g., 9:00-9:30 AM)

start_time : datetime.time or string
end_time : datetime.time or string
include_start : boolean, default True
include_end : boolean, default True

values_between_time : type of caller

bfill(axis=0, inplace=False, limit=None, downcast=None)

Synonym for NDFrame.fillna(method=‘bfill’)

blocks

Internal property, property synonym for as_blocks()

bool()

Return the bool of a single element PandasObject. This must be a boolean scalar value, either True or False.

Raise a ValueError if the PandasObject does not have exactly 1 element, or that element is not boolean

boxplot(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, **kwds)

Make a box plot from DataFrame column/columns optionally grouped (stratified) by one or more columns

data : DataFrame
column : column names or list of names, or vector
Can be any valid input to groupby
by : string or sequence
Column in the DataFrame to group by

ax : matplotlib axis object, default None
fontsize : int or string
rot : int, default 0
Rotation for ticks
grid : boolean, default True
Axis grid lines

ax : matplotlib.axes.AxesSubplot

clip(lower=None, upper=None, out=None)

Trim values at input threshold(s)

lower : float, default None
upper : float, default None

clipped : Series

clip_lower(threshold)

Return copy of the input with values below given value truncated

clip

clipped : same type as input

clip_upper(threshold)

Return copy of input with values above given value truncated

clip

clipped : same type as input

combine(other, func, fill_value=None, overwrite=True)

Combine two DataFrame objects column-wise, using func to merge each pair of aligned columns. If fill_value is given, missing values in either frame are replaced by it before func is called.

other : DataFrame
func : function
fill_value : scalar value
overwrite : boolean, default True
If True then overwrite values for common keys in the calling frame

result : DataFrame

combineAdd(other)

Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)

other : DataFrame

DataFrame

combineMult(other)

Multiply two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)

other : DataFrame

DataFrame

combine_first(other)

Combine two DataFrame objects and default to non-null values in the frame calling the method. The result's index and columns will be the union of the respective indexes and columns.

other : DataFrame

a’s values prioritized, use values from b to fill holes:

>>> a.combine_first(b)

combined : DataFrame
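The priority rule of combine_first (a’s values win, b fills a’s holes, labels are unioned) can be sketched in plain Python on {label: value} dicts. The helper `combine_first` is hypothetical, and None stands in for NaN.

```python
def combine_first(a, b):
    """Hypothetical helper: union of labels, preferring a's non-missing values
    and falling back to b's (None marks a missing value)."""
    labels = list(a) + [k for k in b if k not in a]
    return {k: a[k] if a.get(k) is not None else b.get(k) for k in labels}


print(combine_first({"x": 1, "y": None}, {"y": 2, "z": 3}))  # {'x': 1, 'y': 2, 'z': 3}
```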

compound(axis=None, skipna=None, level=None, **kwargs)

Return the compound percentage of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

compounded : Series or DataFrame (if level specified)

consolidate(inplace=False)

Compute NDFrame with “consolidated” internals (data of each dtype grouped together in a single ndarray). Mainly an internal API function, but available here to the savvy user

inplace : boolean, default False
If False return new object, otherwise modify existing object

consolidated : type of caller

convert_objects(convert_dates=True, convert_numeric=False, convert_timedeltas=True, copy=True)

Attempt to infer better dtype for object columns

convert_dates : if True, attempt to soft convert dates, if ‘coerce’,
force conversion (and non-convertibles get NaT)
convert_numeric : if True attempt to coerce to numbers (including
strings), non-convertibles get NaN
convert_timedeltas : if True, attempt to soft convert timedeltas, if ‘coerce’,
force conversion (and non-convertibles get NaT)

copy : Boolean, if True, return copy, default is True

converted : same as input object

copy(deep=True)

Make a copy of this object

deep : boolean, default True
Make a deep copy, i.e. also copy data

copy : type of caller

corr(method='pearson', min_periods=1)

Compute pairwise correlation of columns, excluding NA/null values

method : {‘pearson’, ‘kendall’, ‘spearman’}
  • pearson : standard correlation coefficient
  • kendall : Kendall Tau correlation coefficient
  • spearman : Spearman rank correlation
min_periods : int, optional
Minimum number of observations required per pair of columns to have a valid result. Currently only available for pearson and spearman correlation

y : DataFrame
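For one pair of columns, the ‘pearson’ method with pairwise NA exclusion and min_periods can be sketched in plain Python. The helper `pearson` is hypothetical and not pandas' code; None marks a missing value, and the sketch also requires at least two complete pairs, since a correlation is undefined with fewer.

```python
import math


def pearson(x, y, min_periods=1):
    """Hypothetical helper: Pearson correlation of two equal-length sequences,
    using only positions where both values are present (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    # Fewer than min_periods complete pairs (or fewer than two): no valid result.
    if n < max(min_periods, 2):
        return math.nan
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return math.nan  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy)


print(pearson([1, 2, 3, None], [2, 4, 6, 100]))  # 1.0 (the NA row is excluded)
```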

corrwith(other, axis=0, drop=False)

Compute pairwise correlation between rows or columns of two DataFrame objects.

other : DataFrame
axis : {0, 1}
0 to compute column-wise, 1 for row-wise
drop : boolean, default False
Drop missing indices from result, default returns union of all

correls : Series

count(axis=0, level=None, numeric_only=False)

Return Series with number of non-NA/null observations over requested axis. Works with non-floating point data as well (detects NaN and None)

axis : {0, 1}
0 for row-wise, 1 for column-wise
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
numeric_only : boolean, default False
Include only float, int, boolean data

count : Series (or DataFrame if level specified)

cov(min_periods=None)

Compute pairwise covariance of columns, excluding NA/null values

min_periods : int, optional
Minimum number of observations required per pair of columns to have a valid result.

y : DataFrame

y contains the covariance matrix of the DataFrame’s time series. The covariance is normalized by N-1 (unbiased estimator).

cummax(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative max over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

max : Series

cummin(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative min over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

min : Series

cumprod(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative prod over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

prod : Series

cumsum(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative sum over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

sum : Series
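cummax, cummin, cumprod and cumsum share one pattern: a running reduction along the axis. A single-column plain-Python sketch follows. The helper `cumulative` is hypothetical; None marks a missing value, missing positions stay missing in the output, and the treatment of skipna=False (everything after the first missing value becomes missing) is the behaviour this sketch assumes.

```python
import operator


def cumulative(values, op, skipna=True):
    """Hypothetical helper: running reduction with `op` (operator.add for
    cumsum, operator.mul for cumprod, max for cummax, min for cummin)."""
    out = []
    acc = None
    poisoned = False  # becomes True at the first missing value if skipna=False
    for v in values:
        if v is None or poisoned:
            out.append(None)
            if v is None and not skipna:
                poisoned = True
            continue
        acc = v if acc is None else op(acc, v)
        out.append(acc)
    return out


print(cumulative([1, None, 2, 3], operator.add))  # [1, None, 3, 6]
print(cumulative([3, 1, None, 5], max))           # [3, 3, None, 5]
```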

delevel(*args, **kwargs)
describe(percentile_width=50)

Generate various summary statistics of each column, excluding NaN values. These include: count, mean, std, min, max, and lower%/50%/upper% percentiles

percentile_width : float, optional
width of the desired uncertainty interval, default is 50, which corresponds to lower=25, upper=75

DataFrame of summary statistics
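The mapping from percentile_width to the lower and upper percentiles (lower = 50 - width/2, upper = 50 + width/2) is sketched below in plain Python. The helpers are hypothetical, and linear interpolation between ranks (NumPy's default) is an assumption about how the percentiles are computed.

```python
def percentile(sorted_data, q):
    """Hypothetical helper: q-th percentile (0-100) of non-empty, already-sorted
    data, interpolating linearly between the two nearest ranks."""
    pos = (len(sorted_data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_data) - 1)
    return sorted_data[lo] + (sorted_data[hi] - sorted_data[lo]) * (pos - lo)


def describe_percentiles(values, percentile_width=50):
    """Hypothetical helper: lower/50%/upper percentiles, skipping None."""
    lower = 50 - percentile_width / 2.0
    upper = 50 + percentile_width / 2.0
    data = sorted(v for v in values if v is not None)
    return {q: percentile(data, q) for q in (lower, 50.0, upper)}


print(describe_percentiles([1, 2, 3, 4, 5]))  # {25.0: 2.0, 50.0: 3.0, 75.0: 4.0}
```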

diff(periods=1)

1st discrete difference of object

periods : int, default 1
Periods to shift for forming difference

diffed : DataFrame
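The first discrete difference with a given periods value can be sketched for one column in plain Python. The helper `diff` is hypothetical; positions without a partner, or with a missing operand, become None where pandas would produce NaN, and negative periods compare with later rows.

```python
def diff(values, periods=1):
    """Hypothetical helper: values[i] - values[i - periods] for each position."""
    out = []
    for i, v in enumerate(values):
        j = i - periods
        if 0 <= j < len(values) and v is not None and values[j] is not None:
            out.append(v - values[j])
        else:
            out.append(None)
    return out


print(diff([1, 3, 6, 10]))             # [None, 2, 3, 4]
print(diff([1, 3, 6, 10], periods=2))  # [None, None, 5, 7]
```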

div(other, axis='columns', level=None, fill_value=None)

Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

divide(other, axis='columns', level=None, fill_value=None)

Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

dot(other)

Matrix multiplication with DataFrame or Series objects

other : DataFrame or Series

dot_product : DataFrame or Series

drop(labels, axis=0, level=None, inplace=False, **kwargs)

Return new object with labels in requested axis removed

labels : single label or list-like
axis : int or axis name
level : int or name, default None
For MultiIndex
inplace : bool, default False
If True, do operation inplace and return None.

dropped : type of caller

drop_duplicates(cols=None, take_last=False, inplace=False)

Return DataFrame with duplicate rows removed, optionally only considering certain columns

cols : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns
take_last : boolean, default False
If True, keep the last occurrence of each duplicated row instead of the first (the default)
inplace : boolean, default False
Whether to drop duplicates in place or to return a copy

deduplicated : DataFrame

dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Return object with labels on given axis omitted where alternately any or all of the data are missing

axis : {0, 1}, or tuple/list thereof
Pass tuple or list to drop on multiple axes
how : {‘any’, ‘all’}
  • any : if any NA values are present, drop that label
  • all : if all values are NA, drop that label
thresh : int, default None
int value : require that many non-NA values
subset : array-like
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include
inplace : boolean, default False
If True, do operation inplace and return None.

dropped : DataFrame
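The how, thresh and subset rules for dropping rows can be sketched in plain Python. The helper `dropna_rows` is hypothetical: rows are dicts of column -> value, None marks a missing value, and letting thresh override how is this sketch's own choice.

```python
def dropna_rows(rows, how="any", thresh=None, subset=None):
    """Hypothetical helper: drop rows (dicts) with missing values."""
    kept = []
    for row in rows:
        columns = subset if subset is not None else list(row)
        n_valid = sum(row.get(c) is not None for c in columns)
        if thresh is not None:
            keep = n_valid >= thresh  # in this sketch thresh overrides how
        elif how == "any":
            keep = n_valid == len(columns)  # drop if any value is missing
        elif how == "all":
            keep = n_valid > 0  # drop only if every value is missing
        else:
            raise ValueError("how must be 'any' or 'all'")
        if keep:
            kept.append(row)
    return kept


rows = [{"a": 1, "b": 2}, {"a": None, "b": 2}, {"a": None, "b": None}]
print(dropna_rows(rows, how="all"))  # [{'a': 1, 'b': 2}, {'a': None, 'b': 2}]
```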

dtypes

Return the dtypes in this object

duplicated(cols=None, take_last=False)

Return boolean Series denoting duplicate rows, optionally only considering certain columns

cols : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns
take_last : boolean, default False
If True, mark duplicates as True except for the last occurrence; by default all but the first occurrence are marked

duplicated : Series
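The take_last flag only changes which occurrence of a duplicated row is left unmarked. A plain-Python sketch follows; the helper `duplicated` is hypothetical, and rows are modelled as tuples.

```python
def duplicated(rows, take_last=False):
    """Hypothetical helper: flag repeated rows. By default the first occurrence
    is left unflagged; with take_last=True the last occurrence is."""
    seen = set()
    flags = [False] * len(rows)
    order = reversed(range(len(rows))) if take_last else range(len(rows))
    for i in order:
        key = tuple(rows[i])
        if key in seen:
            flags[i] = True
        else:
            seen.add(key)
    return flags


rows = [(1, "x"), (2, "y"), (1, "x")]
print(duplicated(rows))                  # [False, False, True]
print(duplicated(rows, take_last=True))  # [True, False, False]
# drop_duplicates amounts to keeping the unflagged rows:
print([r for r, dup in zip(rows, duplicated(rows)) if not dup])  # [(1, 'x'), (2, 'y')]
```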

empty

True if NDFrame is entirely empty [no items]

eq(other, axis='columns', level=None)

Wrapper for flexible comparison methods eq

equals(other)

Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.

eval(expr, **kwargs)

Evaluate an expression in the context of the calling DataFrame instance.

expr : string
The expression string to evaluate.
kwargs : dict
See the documentation for eval() for complete details on the keyword arguments accepted by this method.

ret : ndarray, scalar, or pandas object

pandas.DataFrame.query pandas.eval

For more details see the API documentation for eval(). For detailed examples see enhancing performance with eval.

>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.eval('a + b')
>>> df.eval('c=a + b')
ffill(axis=0, inplace=False, limit=None, downcast=None)

Synonym for NDFrame.fillna(method=‘ffill’)

fillna(value=None, method=None, axis=0, inplace=False, limit=None, downcast=None)

Fill NA/NaN values using the specified method

method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed Series:
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill gap
value : scalar, dict, or Series
Value to use to fill holes (e.g. 0), alternately a dict/Series of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series will not be filled). This value cannot be a list.
axis : {0, 1}, default 0
  • 0: fill column-by-column
  • 1: fill row-by-row
inplace : boolean, default False
If True, fill in place. Note: this will modify any other views on this object, (e.g. a no-copy slice for a column in a DataFrame).
limit : int, default None
Maximum size gap to forward or backward fill
downcast : dict, default is None
a dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible)

reindex, asfreq

filled : same type as caller
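Forward filling with limit ("maximum size gap to forward fill") can be sketched for one column in plain Python. The helper `ffill` is hypothetical; None marks a missing value, and in this sketch only the first `limit` values of a longer gap are filled.

```python
def ffill(values, limit=None):
    """Hypothetical helper: propagate the last valid value forward."""
    out = []
    last = None
    run = 0  # consecutive missing values filled since the last valid one
    for v in values:
        if v is not None:
            last, run = v, 0
            out.append(v)
        elif last is not None and (limit is None or run < limit):
            run += 1
            out.append(last)
        else:
            out.append(None)  # nothing to propagate, or the gap exceeds limit
    return out


print(ffill([1, None, None, None, 5], limit=2))  # [1, 1, 1, None, 5]
```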

filter(items=None, like=None, regex=None, axis=None)

Restrict the info axis to set of items or wildcard

items : list-like
List of info axis to restrict to (must not all be present)
like : string
Keep info axis where “arg in col == True”
regex : string (regular expression)
Keep info axis with re.search(regex, col) == True

Arguments are mutually exclusive, but this is not checked for
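The three selection modes (items, like, regex) can be sketched on a list of axis labels in plain Python. The helper `filter_labels` is hypothetical; in this sketch the first argument given wins and the result keeps the order of the existing labels.

```python
import re


def filter_labels(labels, items=None, like=None, regex=None):
    """Hypothetical helper: restrict axis labels the way filter describes."""
    if items is not None:
        return [lab for lab in labels if lab in items]  # absent items are ignored
    if like is not None:
        return [lab for lab in labels if like in str(lab)]
    if regex is not None:
        pattern = re.compile(regex)
        return [lab for lab in labels if pattern.search(str(lab))]
    raise TypeError("must pass one of items, like, or regex")


print(filter_labels(["one", "two", "three"], regex="e$"))  # ['one', 'three']
```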

filter_result(expressions)
first(offset)

Convenience method for subsetting initial periods of time series data based on a date offset

offset : string, DateOffset, dateutil.relativedelta

ts.first(‘10D’) -> First 10 days

subset : type of caller
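Selecting the first ‘10D’ of a series can be sketched with the standard datetime module. The helper `first_days` is hypothetical; the series is a date-sorted list of (date, value) pairs, and treating the end of the window as exclusive is an assumption of this sketch.

```python
from datetime import date, timedelta


def first_days(dated_values, days):
    """Hypothetical helper: keep (date, value) pairs that fall within `days`
    days of the first date. Input must be sorted by date."""
    if not dated_values:
        return []
    end = dated_values[0][0] + timedelta(days=days)
    return [(d, v) for d, v in dated_values if d < end]


# Observations every 5 days starting 2014-01-01.
ts = [(date(2014, 1, 1) + timedelta(days=5 * i), i) for i in range(4)]
print([d.isoformat() for d, _ in first_days(ts, 10)])  # ['2014-01-01', '2014-01-06']
```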

first_valid_index()

Return label for first non-NA/null value

floordiv(other, axis='columns', level=None, fill_value=None)

Binary operator floordiv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

from_csv(path, header=0, sep=', ', index_col=0, parse_dates=True, encoding=None, tupleize_cols=False, infer_datetime_format=False)

Read delimited file into DataFrame

path : string file path or file handle / StringIO
header : int, default 0
Row to use as header (skip prior rows)
sep : string, default ‘,’
Field delimiter
index_col : int or sequence, default 0
Column to use for index. If a sequence is given, a MultiIndex is used. Different default from read_table
parse_dates : boolean, default True
Parse dates. Different default from read_table
tupleize_cols : boolean, default False
write multi_index columns as a list of tuples (if True) or in the new, expanded format (if False)
infer_datetime_format : boolean, default False
If True and parse_dates is True for a column, try to infer the datetime format based on the first datetime string. If the format can be inferred, there often will be a large parsing speed-up.

Preferable to use read_table for most general purposes but from_csv makes for an easy roundtrip to and from file, especially with a DataFrame of time series data

y : DataFrame

from_dict(data, orient='columns', dtype=None)

Construct DataFrame from dict of array-like or dicts

data : dict
{field : array-like} or {field : dict}
orient : {‘columns’, ‘index’}, default ‘columns’
The “orientation” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.

DataFrame
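The difference between the two orient values is a transpose of the nested dict. A plain-Python sketch follows; the helper `index_to_columns` is hypothetical. It turns row-oriented data (the shape orient=‘index’ expects) into column-oriented data (the shape orient=‘columns’ expects), so both calls should, as far as this sketch shows, describe the same frame up to label ordering.

```python
def index_to_columns(data):
    """Hypothetical helper: {row: {column: value}} -> {column: {row: value}}."""
    out = {}
    for row, fields in data.items():
        for column, value in fields.items():
            out.setdefault(column, {})[row] = value
    return out


by_row = {"r1": {"a": 1, "b": 2}, "r2": {"a": 3}}
print(index_to_columns(by_row))  # {'a': {'r1': 1, 'r2': 3}, 'b': {'r1': 2}}
```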

from_items(items, columns=None, orient='columns')

Convert (key, value) pairs to DataFrame. The keys will be the axis index (usually the columns, but depends on the specified orientation). The values should be arrays or Series.

items : sequence of (key, value) pairs
Values should be arrays or Series.
columns : sequence of column labels, optional
Must be passed if orient=‘index’.
orient : {‘columns’, ‘index’}, default ‘columns’
The “orientation” of the data. If the keys of the input correspond to column labels, pass ‘columns’ (default). Otherwise if the keys correspond to the index, pass ‘index’.

frame : DataFrame

from_records(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)

Convert structured or record ndarray to DataFrame

data : ndarray (structured dtype), list of tuples, dict, or DataFrame
index : string, list of fields, array-like
Field of array to use as the index, alternately a specific set of input labels to use
exclude : sequence, default None
Columns or fields to exclude
columns : sequence, default None
Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns)
coerce_float : boolean, default False
Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets

df : DataFrame

ftypes

Return the ftypes (indication of sparse/dense and dtype) in this object.

ge(other, axis='columns', level=None)

Wrapper for the flexible comparison method ge (element-wise greater than or equal to)

get(key, default=None)

Get item from object for given key (DataFrame column, Panel slice, etc.). Returns default value if not found

key : object

value : type of items contained in object

get_dtype_counts()

Return the counts of dtypes in this object

get_ftype_counts()

Return the counts of ftypes in this object

get_value(index, col)

Quickly retrieve single value at passed column and index

index : row label
col : column label

value : scalar value

get_values()

Same as values (but handles sparseness conversions)

groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False)

Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns

by : mapping function / list of functions, dict, Series, or tuple / list of column names
Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups

axis : int, default 0
level : int, level name, or sequence of such, default None

If the axis is a MultiIndex (hierarchical), group by a particular level or levels
as_index : boolean, default True
For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output
sort : boolean, default True
Sort group keys. Get better performance by turning this off
group_keys : boolean, default True
When calling apply, add group keys to index to identify pieces
squeeze : boolean, default False
Reduce the dimensionality of the return type if possible, otherwise return a consistent type

DataFrame result:

>>> data.groupby(func, axis=0).mean()

Series result, indexed by the (col1, col2) pairs:

>>> data.groupby(['col1', 'col2'])['col3'].mean()

DataFrame with hierarchical index:

>>> data.groupby(['col1', 'col2']).mean()

GroupBy object
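The split-apply-combine idea behind data.groupby(['col1', 'col2'])['col3'].mean() can be sketched in plain Python. This is an illustration of the semantics only, not pandas's implementation: rows are assumed to be dicts, groupby_mean is a hypothetical helper, and NaN skipping is not modeled.

```python
from collections import defaultdict
from statistics import mean


def groupby_mean(rows, keys, value):
    """Average `value` per distinct tuple of `keys`, with sorted group keys."""
    groups = defaultdict(list)
    for row in rows:  # split: bucket each row by its key tuple
        groups[tuple(row[k] for k in keys)].append(row[value])
    # apply mean() to each bucket, then combine in sorted key order (sort=True)
    return {k: mean(v) for k, v in sorted(groups.items())}


rows = [
    {"col1": "a", "col2": "x", "col3": 1.0},
    {"col1": "a", "col2": "x", "col3": 3.0},
    {"col1": "b", "col2": "y", "col3": 5.0},
]
print(groupby_mean(rows, ["col1", "col2"], "col3"))
# {('a', 'x'): 2.0, ('b', 'y'): 5.0}
```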

gt(other, axis='columns', level=None)

Wrapper for the flexible comparison method gt (element-wise greater than)

head(n=5)

Returns first n rows

hist(data, column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, **kwds)

Draw histogram of the DataFrame’s series using matplotlib / pylab.

data : DataFrame
column : string or sequence

If passed, will be used to limit data to a subset of columns
by : object, optional
If passed, then used to form histograms for separate groups
grid : boolean, default True
Whether to show axis grid lines
xlabelsize : int, default None
If specified changes the x-axis label size
xrot : float, default None
rotation of x axis labels
ylabelsize : int, default None
If specified changes the y-axis label size
yrot : float, default None
rotation of y axis labels

ax : matplotlib axes object, default None
sharex : bool, default False
If True, the X axis will be shared amongst all subplots.
sharey : bool, default False
If True, the Y axis will be shared amongst all subplots.
figsize : tuple
The size of the figure to create in inches by default
layout : tuple, optional
A tuple (rows, columns) for the layout of the histograms
kwds : other plotting keyword arguments
To be passed to hist function
iat
icol(i)
idxmax(axis=0, skipna=True)

Return index of first occurrence of maximum over requested axis. NA/null values are excluded.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

idxmax : Series

This method is the DataFrame version of ndarray.argmax.

Series.idxmax

idxmin(axis=0, skipna=True)

Return index of first occurrence of minimum over requested axis. NA/null values are excluded.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

idxmin : Series

This method is the DataFrame version of ndarray.argmin.

Series.idxmin
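A plain-Python sketch of the idxmax/idxmin rule for one column: return the label of the first occurrence of the extreme value, skipping NA (the skipna=True case), and NA (here None) when every value is NA. The helpers are hypothetical and handle numeric values only.

```python
import math


def _is_na(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def idxmax(labels, values):
    """Label of the first maximum, skipping NA; None if every value is NA."""
    best_label, best_value = None, None
    for label, value in zip(labels, values):
        if _is_na(value):
            continue  # skipna=True: NA values are excluded
        # strict '>' means a later tie never replaces the first occurrence
        if best_value is None or value > best_value:
            best_label, best_value = label, value
    return best_label


def idxmin(labels, values):
    """Label of the first minimum, found as the idxmax of the negated values."""
    return idxmax(labels, [v if _is_na(v) else -v for v in values])
```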

iget_value(i, j)
iloc
info(verbose=True, buf=None, max_cols=None)

Concise summary of a DataFrame.

verbose : boolean, default True
If False, don’t print column count summary

buf : writable buffer, defaults to sys.stdout
max_cols : int, default None

Determines whether full summary or short summary is printed
insert(loc, column, value, allow_duplicates=False)

Insert column into DataFrame at specified location.

If allow_duplicates is False, raises Exception if column is already contained in the DataFrame.

loc : int
Must have 0 <= loc <= len(columns)

column : object
value : int, Series, or array-like

interpolate(method='linear', axis=0, limit=None, inplace=False, downcast='infer', **kwargs)

Interpolate values according to different methods.

method : {‘linear’, ‘time’, ‘values’, ‘index’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘krogh’, ‘polynomial’, ‘spline’, ‘piecewise_polynomial’, ‘pchip’}
  • ‘linear’: ignore the index and treat the values as equally spaced. Default.
  • ‘time’: interpolation works on daily and higher resolution data to interpolate given length of interval
  • ‘index’: use the actual numerical values of the index
  • ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘polynomial’ are passed to scipy.interpolate.interp1d. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=4)
  • ‘krogh’, ‘piecewise_polynomial’, ‘spline’, and ‘pchip’ are all wrappers around the scipy interpolation methods of similar names. See the scipy documentation for more on their behavior: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation and http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
axis : {0, 1}, default 0
  • 0: fill column-by-column
  • 1: fill row-by-row
limit : int, default None.
Maximum number of consecutive NaNs to fill.
inplace : bool, default False
Update the NDFrame in place if possible.
downcast : optional, ‘infer’ or None, defaults to ‘infer’
Downcast dtypes if possible.

Series or DataFrame of same shape interpolated at the NaNs

reindex, replace, fillna

Filling in NaNs:

>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0    0
1    1
2    2
3    3
dtype: float64
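The ‘linear’ method (values treated as equally spaced) and the limit parameter can be illustrated with a short plain-Python sketch. It is not pandas's implementation: interpolate_linear is a hypothetical helper that deliberately fills only interior gaps (a known value on each side), so pandas's handling of leading and trailing NaNs is not modeled.

```python
import math


def interpolate_linear(values, limit=None):
    """Fill interior NaN runs by straight-line interpolation between neighbours.

    At most `limit` consecutive NaNs per run are filled, starting from the left.
    """
    out = list(values)
    known = [i for i, v in enumerate(out) if not math.isnan(v)]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for step, i in enumerate(range(left + 1, right), start=1):
            if limit is not None and step > limit:
                break  # the rest of this run stays NaN
            out[i] = out[left] + (out[right] - out[left]) * step / gap
    return out


print(interpolate_linear([0, 1, float("nan"), 3]))  # [0, 1, 2.0, 3]
```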

irow(i, copy=False)
is_copy = None
isin(values)

Return boolean DataFrame showing whether each element in the DataFrame is contained in values.

values : iterable, Series, DataFrame or dictionary
The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dictionary, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.

DataFrame of booleans

When values is a list:

>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> df.isin([1, 3, 12, 'a'])
       A      B
0   True   True
1  False  False
2   True  False

When values is a dict:

>>> df = DataFrame({'A': [1, 2, 3], 'B': [1, 4, 7]})
>>> df.isin({'A': [1, 3], 'B': [4, 7, 12]})
       A      B
0   True  False  # Note that B didn't match the 1 here.
1  False   True
2   True   True

When values is a Series or DataFrame:

>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> other = DataFrame({'A': [1, 3, 3, 2], 'B': ['e', 'f', 'f', 'e']})
>>> df.isin(other)
       A      B
0   True  False
1  False  False  # Column A in `other` has a 3, but not at index 1.
2   True   True
isnull()

Return a boolean same-sized object indicating if the values are null

iteritems()

Iterator over (column, series) pairs

iterkv(*args, **kwargs)

iteritems alias used to get around 2to3. Deprecated

iterrows()

Iterate over rows of DataFrame as (index, Series) pairs.

  • iterrows does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). For example,

    >>> df = DataFrame([[1, 1.0]], columns=['x', 'y'])
    >>> row = next(df.iterrows())[1]
    >>> print(row['x'].dtype)
    float64
    >>> print(df['x'].dtype)
    int64
    
it : generator
A generator that iterates over the rows of the frame.
itertuples(index=True)

Iterate over rows of DataFrame as tuples, with index value as first element of the tuple

ix
join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

other : DataFrame, Series with name field set, or list of DataFrame
Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame
on : column name, tuple/list of column names, or array-like
Column(s) to use for joining, otherwise join on index. If multiple columns are given, the passed DataFrame must have a MultiIndex. Can pass an array as the join key if not already contained in the calling DataFrame. Like an Excel VLOOKUP operation
how : {‘left’, ‘right’, ‘outer’, ‘inner’}

How to handle indexes of the two objects. Default: ‘left’ for joining on index, None otherwise

  • left: use calling frame’s index
  • right: use input frame’s index
  • outer: form union of indexes
  • inner: use intersection of indexes
lsuffix : string
Suffix to use from left frame’s overlapping columns
rsuffix : string
Suffix to use from right frame’s overlapping columns
sort : boolean, default False
Order result DataFrame lexicographically by the join key. If False, preserves the index order of the calling (left) DataFrame

on, lsuffix, and rsuffix options are not supported when passing a list of DataFrame objects

joined : DataFrame

keys()

Get the ‘info axis’ (see Indexing for more)

This is index for Series, columns for DataFrame and major_axis for Panel.

kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased kurtosis over requested axis, using Fisher's definition (kurtosis of a normal distribution is 0.0). Normalized by N-1.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

kurt : Series or DataFrame (if level specified)

kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased kurtosis over requested axis, using Fisher's definition (kurtosis of a normal distribution is 0.0). Normalized by N-1.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

kurt : Series or DataFrame (if level specified)
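As a sketch, assuming kurt()/kurtosis() compute the standard bias-adjusted sample excess kurtosis (often written G2), the calculation for a single column looks like this in plain Python. The helper name is hypothetical, and NA handling and constant (zero-variance) input are not modeled.

```python
def unbiased_kurtosis(xs):
    """Bias-adjusted excess kurtosis (Fisher: normal data gives about 0).

    n(n+1)(n-1) * m4 / ((n-2)(n-3) * m2**2) - 3(n-1)**2 / ((n-2)(n-3)),
    where m2 and m4 are sums of squared and fourth-power deviations.
    """
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs)
    m4 = sum((x - mean) ** 4 for x in xs)
    adj = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return n * (n + 1) * (n - 1) * m4 / ((n - 2) * (n - 3) * m2 ** 2) - adj


print(unbiased_kurtosis([1, 2, 3, 4, 5]))  # approximately -1.2
```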

last(offset)

Convenience method for subsetting final periods of time series data based on a date offset

offset : string, DateOffset, dateutil.relativedelta

ts.last('5M') -> Last 5 months

subset : type of caller

last_valid_index()

Return label for last non-NA/null value

le(other, axis='columns', level=None)

Wrapper for the flexible comparison method le (element-wise less than or equal to)

load(path)

Deprecated. Use read_pickle instead.

loc
lookup(row_labels, col_labels)

Label-based “fancy indexing” function for DataFrame. Given equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair.

row_labels : sequence
The row labels to use for lookup
col_labels : sequence
The column labels to use for lookup

Akin to:

result = []
for row, col in zip(row_labels, col_labels):
    result.append(df.get_value(row, col))
values : ndarray
The found values
lt(other, axis='columns', level=None)

Wrapper for the flexible comparison method lt (element-wise less than)

mad(axis=None, skipna=None, level=None, **kwargs)

Return the mean absolute deviation of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

mad : Series or DataFrame (if level specified)
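The mean absolute deviation is the average distance of each value from the mean of the values. A plain-Python sketch for one column (hypothetical helper, NA handling not modeled):

```python
def mean_absolute_deviation(xs):
    """Average of |x - mean(xs)| over all values."""
    center = sum(xs) / len(xs)
    return sum(abs(x - center) for x in xs) / len(xs)


print(mean_absolute_deviation([1, 2, 3, 4, 5]))  # 1.2
```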

mask(cond)

Returns a copy whose values are replaced with NaN wherever cond is True (equivalent to where(~cond))

cond : boolean NDFrame or array

wh: same as input

max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

This method returns the maximum of the values in the object. If you want the index of the maximum, use idxmax, which is the equivalent of the numpy.ndarray method argmax.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

max : Series or DataFrame (if level specified)

mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the mean of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

mean : Series or DataFrame (if level specified)

median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the median of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

median : Series or DataFrame (if level specified)

merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True)

Merge DataFrame objects by performing a database-style join operation by columns or indexes.

If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on.

right : DataFrame
how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’

  • left: use only keys from left frame (SQL: left outer join)
  • right: use only keys from right frame (SQL: right outer join)
  • outer: use union of keys from both frames (SQL: full outer join)
  • inner: use intersection of keys from both frames (SQL: inner join)
on : label or list
Field names to join on. Must be found in both DataFrames. If on is None and not merging on indexes, then it merges on the intersection of the columns by default.
left_on : label or list, or array-like
Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns
right_on : label or list, or array-like
Field names to join on in right DataFrame or vector/list of vectors per left_on docs
left_index : boolean, default False
Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels
right_index : boolean, default False
Use the index from the right DataFrame as the join key. Same caveats as left_index
sort : boolean, default False
Sort the join keys lexicographically in the result DataFrame
suffixes : 2-length sequence (tuple, list, ...)
Suffix to apply to overlapping column names in the left and right side, respectively
copy : boolean, default True
If False, do not copy data unnecessarily
>>> A              >>> B
    lkey value         rkey value
0   foo  1         0   foo  5
1   bar  2         1   bar  6
2   baz  3         2   qux  7
3   foo  4         3   bar  8
>>> merge(A, B, left_on='lkey', right_on='rkey', how='outer')
   lkey  value_x  rkey  value_y
0  bar   2        bar   6
1  bar   2        bar   8
2  baz   3        NaN   NaN
3  foo   1        foo   5
4  foo   4        foo   5
5  NaN   NaN      qux   7

merged : DataFrame
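The how='outer' behaviour in the example above (duplicate keys multiply out, unmatched keys from either side survive with missing values, overlapping column names get suffixes) can be reproduced with a plain-Python sketch. This is an illustration only: outer_merge is a hypothetical helper working on lists of dict rows with a simple nested loop, None stands in for NaN, and rows come out in left-input order rather than sorted by key.

```python
def outer_merge(left, right, left_on, right_on, suffixes=("_x", "_y")):
    """Full outer join of two lists of dict rows on left_on == right_on."""
    overlap = set(left[0]) & set(right[0])
    lname = {c: c + suffixes[0] if c in overlap else c for c in left[0]}
    rname = {c: c + suffixes[1] if c in overlap else c for c in right[0]}
    empty_left = {name: None for name in lname.values()}
    empty_right = {name: None for name in rname.values()}

    result, matched_right = [], set()
    for lrow in left:
        lpart = {lname[c]: v for c, v in lrow.items()}
        hits = [j for j, rrow in enumerate(right) if rrow[right_on] == lrow[left_on]]
        for j in hits:  # one output row per matching right row
            matched_right.add(j)
            result.append({**lpart, **{rname[c]: v for c, v in right[j].items()}})
        if not hits:  # left key with no partner
            result.append({**lpart, **empty_right})
    for j, rrow in enumerate(right):  # right keys never matched (e.g. 'qux')
        if j not in matched_right:
            result.append({**empty_left, **{rname[c]: v for c, v in rrow.items()}})
    return result


A = [{"lkey": k, "value": v} for k, v in [("foo", 1), ("bar", 2), ("baz", 3), ("foo", 4)]]
B = [{"rkey": k, "value": v} for k, v in [("foo", 5), ("bar", 6), ("qux", 7), ("bar", 8)]]
rows = outer_merge(A, B, left_on="lkey", right_on="rkey")
```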

merge_results(others)
min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

This method returns the minimum of the values in the object. If you want the index of the minimum, use idxmin, which is the equivalent of the numpy.ndarray method argmin.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

min : Series or DataFrame (if level specified)

mod(other, axis='columns', level=None, fill_value=None)

Binary operator mod with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
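The fill_value rules shared by mod, mul, pow and the other binary operators (indices are unioned, a value missing on one side is replaced by fill_value, a location missing on both sides stays missing) can be sketched in plain Python on {label: value} mappings. aligned_binary_op is a hypothetical helper, None stands in for NaN, and only one-dimensional alignment is modeled.

```python
import operator


def aligned_binary_op(left, right, op, fill_value=None):
    """Apply `op` label by label over the union of two {label: value} dicts."""
    result = {}
    for label in sorted(set(left) | set(right)):  # mismatched indices are unioned
        a, b = left.get(label), right.get(label)
        if a is None and b is None:
            result[label] = None  # missing on both sides: stays missing
            continue
        if fill_value is not None:
            a = fill_value if a is None else a
            b = fill_value if b is None else b
        result[label] = None if a is None or b is None else op(a, b)
    return result


print(aligned_binary_op({"a": 7, "b": 9}, {"b": 4, "c": 5}, operator.mod, fill_value=1))
# {'a': 0, 'b': 1, 'c': 1}
```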

mode(axis=0, numeric_only=False)

Gets the mode(s) of each column or row along the selected axis. Empty if nothing has 2+ occurrences. Adds a row for each mode per label, filling in gaps with NaN.

axis : {0, 1, ‘index’, ‘columns’} (default 0)
  • 0/‘index’ : get mode of each column
  • 1/‘columns’ : get mode of each row
numeric_only : boolean, default False
if True, only apply to numeric columns

modes : DataFrame (sorted)
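For a single column, the rule described above (all most frequent values, sorted, and nothing at all when no value occurs at least twice) can be sketched in plain Python. The helper is hypothetical and follows this page's description; other pandas releases may treat all-unique input differently.

```python
from collections import Counter


def modes(values):
    """Sorted list of the most frequent values; empty if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top < 2:
        return []  # nothing has 2+ occurrences
    return sorted(v for v, c in counts.items() if c == top)
```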

mul(other, axis='columns', level=None, fill_value=None)

Binary operator mul with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

multiply(other, axis='columns', level=None, fill_value=None)

Binary operator mul with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

ndim

Number of axes / array dimensions

ne(other, axis='columns', level=None)

Wrapper for the flexible comparison method ne (element-wise not equal to)

notnull()

Return a boolean same-sized object indicating if the values are not null

pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwds)

Percent change over given number of periods

periods : int, default 1
Periods to shift for forming percent change
fill_method : str, default ‘pad’
How to handle NAs before computing percent changes
limit : int, default None
The number of consecutive NAs to fill before stopping
freq : DateOffset, timedelta, or offset alias string, optional
Increment to use from time series API (e.g. ‘M’ or BDay())

chg : same type as caller
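Percent change compares each value with the one periods steps earlier: value[i] / value[i - periods] - 1. A plain-Python sketch (hypothetical helper; fill_method, limit and freq are not modeled, and None stands in for NaN in the leading positions):

```python
def pct_change(values, periods=1):
    """Fractional change relative to the value `periods` positions earlier."""
    out = [None] * min(periods, len(values))  # no earlier value to compare with
    for i in range(periods, len(values)):
        out.append(values[i] / values[i - periods] - 1)
    return out


print(pct_change([100, 110, 99]))  # about [None, 0.1, -0.1]
```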

pivot(index=None, columns=None, values=None)

Reshape data (produce a “pivot” table) based on column values. Uses unique values from index / columns to form the axes of the resulting DataFrame. If a single values column is requested, the result has flat columns; if values is omitted, all remaining columns are used and the result has hierarchically indexed columns

index : string or object
Column name to use to make new frame’s index
columns : string or object
Column name to use to make new frame’s columns
values : string or object, optional
Column name to use for populating new frame’s values

For finer-tuned control, see hierarchical indexing documentation along with the related stack/unstack methods

>>> df
    foo   bar  baz
0   one   A    1.
1   one   B    2.
2   one   C    3.
3   two   A    4.
4   two   B    5.
5   two   C    6.
>>> df.pivot('foo', 'bar', 'baz')
     A   B   C
one  1   2   3
two  4   5   6
>>> df.pivot('foo', 'bar')['baz']
     A   B   C
one  1   2   3
two  4   5   6
pivoted : DataFrame
If no values column specified, will have hierarchically indexed columns
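The reshaping that df.pivot('foo', 'bar', 'baz') performs in the example above can be sketched in plain Python as turning long records into a nested {index value: {column value: value}} mapping. pivot here is a hypothetical helper, not the pandas method; like pandas, it does not aggregate, so a repeated (index, columns) pair is treated as an error.

```python
def pivot(records, index, columns, values):
    """Reshape long records into {index_value: {column_value: value}}."""
    table = {}
    for rec in records:
        row = table.setdefault(rec[index], {})
        if rec[columns] in row:
            raise ValueError(f"duplicate entry for ({rec[index]!r}, {rec[columns]!r})")
        row[rec[columns]] = rec[values]
    return table


records = [
    {"foo": f, "bar": b, "baz": z}
    for f, b, z in [("one", "A", 1.0), ("one", "B", 2.0), ("one", "C", 3.0),
                    ("two", "A", 4.0), ("two", "B", 5.0), ("two", "C", 6.0)]
]
print(pivot(records, "foo", "bar", "baz"))
# {'one': {'A': 1.0, 'B': 2.0, 'C': 3.0}, 'two': {'A': 4.0, 'B': 5.0, 'C': 6.0}}
```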
pivot_table(data, values=None, rows=None, cols=None, aggfunc='mean', fill_value=None, margins=False, dropna=True)

Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame

data : DataFrame
values : column to aggregate, optional
rows : list of column names or arrays to group on

Keys to group on the x-axis of the pivot table
cols : list of column names or arrays to group on
Keys to group on the y-axis of the pivot table
aggfunc : function, default numpy.mean, or list of functions
If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)
fill_value : scalar, default None
Value to replace missing values with
margins : boolean, default False
Add all row / columns (e.g. for subtotal / grand totals)
dropna : boolean, default True
Do not include columns whose entries are all NaN
>>> df
   A   B   C      D
0  foo one small  1
1  foo one large  2
2  foo one large  2
3  foo two small  3
4  foo two small  3
5  bar one large  4
6  bar one small  5
7  bar two small  6
8  bar two large  7
>>> table = pivot_table(df, values='D', rows=['A', 'B'],
...                     cols=['C'], aggfunc=np.sum)
>>> table
          small  large
foo  one  1      4
     two  6      NaN
bar  one  5      4
     two  6      7

table : DataFrame
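Unlike pivot, pivot_table aggregates every record that falls into the same cell. The aggfunc=np.sum example above can be reproduced with a plain-Python sketch that groups by the (rows, cols) key tuples and sums; pivot_table_sum is a hypothetical helper, and empty cells (NaN in the pandas output) are simply absent.

```python
from collections import defaultdict


def pivot_table_sum(records, values, rows, cols):
    """{row_key: {col_key: sum of `values`}} over all matching records."""
    table = defaultdict(dict)
    for rec in records:
        r = tuple(rec[k] for k in rows)
        c = tuple(rec[k] for k in cols)
        table[r][c] = table[r].get(c, 0) + rec[values]
    return dict(table)


data = [("foo", "one", "small", 1), ("foo", "one", "large", 2), ("foo", "one", "large", 2),
        ("foo", "two", "small", 3), ("foo", "two", "small", 3), ("bar", "one", "large", 4),
        ("bar", "one", "small", 5), ("bar", "two", "small", 6), ("bar", "two", "large", 7)]
records = [dict(zip("ABCD", row)) for row in data]
table = pivot_table_sum(records, values="D", rows=["A", "B"], cols=["C"])
```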

plot(frame=None, x=None, y=None, subplots=False, sharex=True, sharey=False, use_index=True, figsize=None, grid=None, legend=True, rot=None, ax=None, style=None, title=None, xlim=None, ylim=None, logx=False, logy=False, xticks=None, yticks=None, kind='line', sort_columns=False, fontsize=None, secondary_y=False, **kwds)

Make line, bar, or scatter plots of DataFrame series with the index on the x-axis using matplotlib / pylab.

frame : DataFrame
x : label or position, default None
y : label or position, default None

Allows plotting of one column versus another
subplots : boolean, default False
Make separate subplots for each time series
sharex : boolean, default True
In case subplots=True, share x axis
sharey : boolean, default False
In case subplots=True, share y axis
use_index : boolean, default True
Use index as ticks for x axis
stacked : boolean, default False
If True, create stacked bar plot. Only valid for DataFrame input
sort_columns : boolean, default False
Sort column names to determine plot ordering
title : string
Title to use for the plot
grid : boolean, default None (matlab style default)
Axis grid lines
legend : boolean, default True
Place legend on axis subplots

ax : matplotlib axis object, default None
style : list or dict

matplotlib line style per column
kind : {‘line’, ‘bar’, ‘barh’, ‘kde’, ‘density’, ‘scatter’}
  • line : line plot (default)
  • bar : vertical bar plot
  • barh : horizontal bar plot
  • kde/density : Kernel Density Estimation plot
  • scatter : scatter plot
logx : boolean, default False
For line plots, use log scaling on x axis
logy : boolean, default False
For line plots, use log scaling on y axis
xticks : sequence
Values to use for the xticks
yticks : sequence
Values to use for the yticks

xlim : 2-tuple/list
ylim : 2-tuple/list
rot : int, default None

Rotation for ticks
secondary_y : boolean or sequence, default False
Whether to plot on the secondary y-axis. If a list/tuple, which columns to plot on the secondary y-axis
mark_right : boolean, default True
When using a secondary_y axis, should the legend label the axis of the various columns automatically
colormap : str or matplotlib colormap object, default None
Colormap to select colors from. If string, load colormap with that name from matplotlib.
kwds : keywords
Options to pass to matplotlib plotting method

ax_or_axes : matplotlib.AxesSubplot or list of them

pop(item)

Return item and drop from frame. Raise KeyError if not found.

pow(other, axis='columns', level=None, fill_value=None)

Binary operator pow with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

prod(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the product of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

prod : Series or DataFrame (if level specified)

product(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the product of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

prod : Series or DataFrame (if level specified)

quantile(q=0.5, axis=0, numeric_only=True)

Return values at the given quantile over requested axis, a la scoreatpercentile in scipy.stats

q : quantile, default 0.5 (50% quantile)
0 <= q <= 1
axis : {0, 1}
0 for row-wise, 1 for column-wise

quantiles : Series
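A plain-Python sketch of a linear-interpolation quantile for one column, assuming the common convention that q maps to position q * (n - 1) in the sorted values, with fractional positions interpolated between neighbours. The helper is hypothetical and NA handling is not modeled.

```python
import math


def quantile(values, q=0.5):
    """Linear-interpolation quantile of `values` for 0 <= q <= 1."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


print(quantile([4, 1, 3, 2]))  # 2.5
```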

query(expr, **kwargs)

Query the columns of a frame with a boolean expression.

expr : string
The query string to evaluate. The result of the evaluation of this expression is first passed to loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to __getitem__().
kwargs : dict
See the documentation for eval() for complete details on the keyword arguments accepted by query().

q : DataFrame or Series

This method uses the top-level eval() function to evaluate the passed query.

The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators are given the same precedence as the boolean operators “and” and “or”. This is syntactically valid Python, however the semantics are different.

You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.

The index and columns attributes of the DataFrame instance are placed in the namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index refers to the frame's index, and you can also use the name of the index to identify it in a query.

For further details and examples see the query documentation in indexing.

pandas.eval DataFrame.eval

>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.query('a > b')
>>> df[df.a > df.b]  # same result as the previous expression
radd(other, axis='columns', level=None, fill_value=None)

Binary operator radd with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rank(axis=0, numeric_only=None, method='average', na_option='keep', ascending=True)

Compute numerical data ranks (1 through n) along axis. Equal values are assigned a rank that is the average of the ranks of those values

axis : {0, 1}, default 0
Ranks over columns (0) or rows (1)
numeric_only : boolean, default None
Include only float, int, boolean data
method : {‘average’, ‘min’, ‘max’, ‘first’}
  • average: average rank of group
  • min: lowest rank in group
  • max: highest rank in group
  • first: ranks assigned in order they appear in the array
na_option : {‘keep’, ‘top’, ‘bottom’}
  • keep: leave NA values where they are
  • top: smallest rank if ascending
  • bottom: smallest rank if descending
ascending : boolean, default True
False for ranks by high (1) to low (N)

ranks : DataFrame
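The tie-breaking rules listed under method can be checked with a plain-Python sketch for a single column. rank here is a hypothetical helper, not the pandas method, and na_option is not modeled; Python's sort is stable even with reverse=True, which is what makes ‘first’ follow the original order.

```python
def rank(values, method="average", ascending=True):
    """1-based ranks of `values` with pandas-style tie rules."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=not ascending)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            if method == "average":
                ranks[order[k]] = (i + j) / 2 + 1
            elif method == "min":
                ranks[order[k]] = i + 1
            elif method == "max":
                ranks[order[k]] = j + 1
            elif method == "first":
                ranks[order[k]] = k + 1
            else:
                raise ValueError(f"unknown method {method!r}")
        i = j + 1
    return ranks


print(rank([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```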

rdiv(other, axis='columns', level=None, fill_value=None)

Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

reindex(index=None, columns=None, **kwargs)

Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False

index, columns : array-like, optional (can be specified in order, or as keywords)
New labels / index to conform to. Preferably an Index object to avoid duplicating data
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed DataFrame pad / ffill: propagate last valid observation forward to next valid backfill / bfill: use NEXT valid observation to fill gap
copy : boolean, default True
Return a new object, even if the passed indexes are the same
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any “compatible” value
limit : int, default None
Maximum size gap to forward or backward fill
takeable : boolean, default False
Treat the passed labels as positional values
>>> df.reindex(index=[date1, date2, date3], columns=['A', 'B', 'C'])

reindexed : DataFrame
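As a rough model of how the fill methods choose a value for labels absent from the old index, the following pure-Python sketch reindexes a single column. The helper reindex_values is hypothetical and assumes a sorted, unique index, which the fill methods need in order to define "previous" and "next" observations.

```python
import bisect
import math


def reindex_values(index, values, new_index, method=None):
    """Conform (index, values) to new_index, placing NaN where no value exists.

    Assumes `index` is sorted ascending and unique.
    """
    lookup = dict(zip(index, values))
    out = []
    for label in new_index:
        if label in lookup:
            out.append(lookup[label])
        elif method in ("pad", "ffill"):
            # Last old label <= label: propagate the previous observation forward.
            pos = bisect.bisect_right(index, label) - 1
            out.append(values[pos] if pos >= 0 else math.nan)
        elif method in ("backfill", "bfill"):
            # First old label >= label: use the NEXT observation.
            pos = bisect.bisect_left(index, label)
            out.append(values[pos] if pos < len(index) else math.nan)
        else:
            out.append(math.nan)
    return out


print(reindex_values([1, 3, 5], [10, 30, 50], [0, 2, 4, 6], method="pad"))
# [nan, 10, 30, 50]
```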

reindex_axis(labels, axis=0, method=None, level=None, copy=True, limit=None, fill_value=nan)

Conform input object to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False

labels : array-like
New labels / index to conform to. Preferably an Index object to avoid duplicating data
axis : {0, 1, ‘index’, ‘columns’}
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed object:
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill gap
copy : boolean, default True
Return a new object, even if the passed indexes are the same
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level
limit : int, default None
Maximum size gap to forward or backward fill
>>> df.reindex_axis(['A', 'B', 'C'], axis=1)

reindex, reindex_like

reindexed : DataFrame

reindex_like(other, method=None, copy=True, limit=None)

Return this object reindexed to match the indices of other

other : Object
method : string or None
copy : boolean, default True
limit : int, default None
Maximum size gap to forward or backward fill
Like calling s.reindex(index=other.index, columns=other.columns, method=...)

reindexed : same as input

rename(index=None, columns=None, **kwargs)

Alter axis labels using an input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.

index, columns : dict-like or function, optional
Transformation to apply to that axis values
copy : boolean, default True
Also copy underlying data
inplace : boolean, default False
Whether to return a new DataFrame. If True then value of copy is ignored.

renamed : DataFrame (new object)
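The mapping rule (dicts leave unknown labels alone, functions apply to every label) can be sketched in a few lines of plain Python; in pandas the same intent is written as df.rename(columns={'A': 'a'}) or df.rename(columns=str.lower). The helper rename_labels is hypothetical.

```python
def rename_labels(labels, mapper):
    """Apply `mapper` (a dict or a function) to each axis label.

    With a dict, labels that are not keys are left as-is; a function is
    applied to every label.
    """
    if callable(mapper):
        return [mapper(label) for label in labels]
    return [mapper.get(label, label) for label in labels]


print(rename_labels(["A", "B", "C"], {"A": "a", "C": "c"}))  # ['a', 'B', 'c']
print(rename_labels(["A", "B"], str.lower))                  # ['a', 'b']
```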

rename_axis(mapper, axis=0, copy=True, inplace=False)

Alter index and / or columns using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.

mapper : dict-like or function, optional
axis : int or string, default 0
copy : boolean, default True
Also copy underlying data
inplace : boolean, default False

renamed : type of caller

reorder_levels(order, axis=0)

Rearrange index levels using input order. May not drop or duplicate levels

order : list of int or list of str
List representing new level order. Reference level by number (position) or by key (label).
axis : int
Where to reorder levels.

type of caller (new object)

replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None)

Replace values given in ‘to_replace’ with ‘value’.

to_replace : str, regex, list, dict, Series, numeric, or None

  • str or regex:

    • str: string exactly matching to_replace will be replaced with value
    • regex: regexs matching to_replace will be replaced with value
  • list of str, regex, or numeric:

    • First, if to_replace and value are both lists, they must be the same length.
    • Second, if regex=True then all of the strings in both lists will be interpreted as regexs otherwise they will match directly. This doesn’t matter much for value since there are only a few possible substitution regexes you can use.
    • str and regex rules apply as above.
  • dict:

    • Nested dictionaries, e.g., {‘a’: {‘b’: nan}}, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with nan. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.
    • Keys map to column names and values map to substitution values. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
  • None:

    • This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also None then this must be a nested dictionary or Series.

See the examples section for examples of each of these.

value : scalar, dict, list, str, regex, default None
Value to use to fill holes (e.g. 0), alternately a dict of values specifying which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.
inplace : boolean, default False
If True, in place. Note: this will modify any other views on this object (e.g. a column from a DataFrame). Returns the caller if this is True.
limit : int, default None
Maximum size gap to forward or backward fill
regex : bool or same types as to_replace, default False
Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Otherwise, if regex is itself a regular expression or a list, dict, or array of regular expressions, to_replace must be None, because regex will then be interpreted as the pattern(s) to match.
method : string, optional, {‘pad’, ‘ffill’, ‘bfill’}
The method to use for replacement when to_replace is a list.

NDFrame.reindex NDFrame.asfreq NDFrame.fillna

filled : NDFrame

AssertionError
  • If regex is not a bool and to_replace is not None.
TypeError
  • If to_replace is a dict and value is not a list, dict, ndarray, or Series
  • If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.
ValueError
  • If to_replace and value are lists or ndarrays, but they are not the same length.

  • Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.
  • Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.
  • This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
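The nested-dict form and the "regex only touches strings" note can be approximated in pure Python. This is a sketch under stated assumptions, not pandas' algorithm: the helper replace_nested is hypothetical, the frame is a dict of column lists, the first matching key wins, and a non-string replacement value is assumed to replace the whole matching cell.

```python
import math
import re


def replace_nested(frame, spec, regex=False):
    """Sketch of replace() with a nested dict {column: {to_replace: value}}.

    `frame` maps column name -> list of cells. With regex=True the inner
    keys are patterns: only string cells are searched, a string value is
    substituted via re.sub, and a non-string value replaces the whole cell.
    """
    out = {col: list(cells) for col, cells in frame.items()}  # leave input intact
    for col, mapping in spec.items():  # top-level keys are plain column names
        cells = out[col]
        for i, cell in enumerate(cells):
            for old, new in mapping.items():
                if regex:
                    if isinstance(cell, str) and re.search(old, cell):
                        cells[i] = re.sub(old, new, cell) if isinstance(new, str) else new
                        break
                elif cell == old:
                    cells[i] = new
                    break
    return out


frame = {"a": ["b", "x", 1.5], "c": ["b", "bb"]}
print(replace_nested(frame, {"a": {"b": math.nan}}))
# {'a': [nan, 'x', 1.5], 'c': ['b', 'bb']}
print(replace_nested({"s": ["foo1", "bar", 2.0]}, {"s": {r"\d": "N"}}, regex=True))
# {'s': ['fooN', 'bar', 2.0]}
```

Only column ‘a’ is searched for ‘b’, and the float 2.0 is never matched by the digit pattern because it is not a string.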
resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0)

Convenience method for frequency conversion and resampling of regular time-series data.

rule : string
the offset string or object representing target conversion
how : string
method for down- or re-sampling, default to ‘mean’ for downsampling

axis : int, optional, default 0
fill_method : string, default None
Fill method for upsampling
closed : {‘right’, ‘left’}
Which side of bin interval is closed
label : {‘right’, ‘left’}
Which bin edge label to label bucket with

convention : {‘start’, ‘end’, ‘s’, ‘e’}
kind : “period”/“timestamp”
loffset : timedelta
Adjust the resampled time labels
limit : int, default None
Maximum size gap when reindexing with fill_method
base : int, default 0
For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0
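The binning that rule, how and base describe can be sketched with integer minute timestamps. The helper resample_mean is hypothetical, and it assumes closed=‘left’ and label=‘left’ (as far as we know, pandas' defaults for minute-based rules); real resampling works on a DatetimeIndex.

```python
from collections import defaultdict
from statistics import mean


def resample_mean(minutes, values, freq, base=0):
    """Downsample (minute, value) pairs into `freq`-minute bins with the mean.

    Bins are closed on the left and labelled by their left edge (assumed);
    `base` moves the bin origin so edges fall on base, base + freq, ...
    """
    buckets = defaultdict(list)
    for t, v in zip(minutes, values):
        left_edge = (t - base) // freq * freq + base
        buckets[left_edge].append(v)
    return {edge: mean(vals) for edge, vals in sorted(buckets.items())}


print(resample_mean([0, 1, 4, 5, 9, 10], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], freq=5))
# {0: 2.0, 5: 4.5, 10: 6.0}
```

With base=2 the same data falls into bins starting at -3, 2 and 7 instead, which is the origin shift the base parameter describes.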
reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')

For DataFrame with multi-level index, return new DataFrame with labeling information in the columns under the index names, defaulting to ‘level_0’, ‘level_1’, etc. if any are None. For a standard index, the index name will be used (if set), otherwise a default ‘index’ or ‘level_0’ (if ‘index’ is already taken) will be used.

level : int, str, tuple, or list, default None
Only remove the given levels from the index. Removes all levels by default
drop : boolean, default False
Do not try to insert index into dataframe columns. This resets the index to the default integer index.
inplace : boolean, default False
Modify the DataFrame in place (do not create a new object)
col_level : int or str, default 0
If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.
col_fill : object, default ‘’
If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.

resetted : DataFrame

rfloordiv(other, axis='columns', level=None, fill_value=None)

Binary operator rfloordiv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rmod(other, axis='columns', level=None, fill_value=None)

Binary operator rmod with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rmul(other, axis='columns', level=None, fill_value=None)

Binary operator rmul with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rpow(other, axis='columns', level=None, fill_value=None)

Binary operator rpow with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rsub(other, axis='columns', level=None, fill_value=None)

Binary operator rsub with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rtruediv(other, axis='columns', level=None, fill_value=None)

Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

save(path)

Deprecated. Use to_pickle instead

select(crit, axis=0)

Return data corresponding to axis labels matching criteria

crit : function
To be called on each index (label). Should return True or False

axis : int

selection : type of caller

set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Set the DataFrame index (row labels) using one or more existing columns. By default yields a new object.

keys : column label or list of column labels / arrays
drop : boolean, default True
Delete columns to be used as the new index
append : boolean, default False
Whether to append columns to existing index
inplace : boolean, default False
Modify the DataFrame in place (do not create a new object)
verify_integrity : boolean, default False
Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method
>>> indexed_df = df.set_index(['A', 'B'])
>>> indexed_df2 = df.set_index(['A', [0, 1, 2, 0, 1, 2]])
>>> indexed_df3 = df.set_index([[0, 1, 2, 0, 1, 2]])

dataframe : DataFrame

set_value(index, col, value)

Put single value at passed column and index

index : row label
col : column label
value : scalar value

frame : DataFrame
If label pair is contained, will be reference to calling DataFrame, otherwise a new object
shape
shift(periods=1, freq=None, axis=0, **kwds)

Shift index by desired number of periods with an optional time freq

periods : int
Number of periods to move, can be positive or negative
freq : DateOffset, timedelta, or time rule string, optional
Increment to use from datetools module or time rule (e.g. ‘EOM’)

If freq is specified then the index values are shifted but the data is not realigned

shifted : same type as caller
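The difference between shifting the data and shifting the index (when freq is given) can be sketched with plain lists. Both helpers are hypothetical; None stands in for the NaN that pandas inserts, and the integer step stands in for a real time frequency.

```python
def shift_data(values, periods=1, fill=None):
    """Shift data by `periods` positions while the index stays put.

    Positive periods move values towards later positions; vacated slots
    get `fill` (None here stands in for NaN).
    """
    n = len(values)
    k = min(abs(periods), n)
    if periods >= 0:
        return [fill] * k + values[: n - k]
    return values[k:] + [fill] * k


def shift_index(index, periods, step):
    """With a freq, the labels move by periods * step and the data is untouched."""
    return [label + periods * step for label in index]


print(shift_data([1, 2, 3, 4], 1))    # [None, 1, 2, 3]
print(shift_data([1, 2, 3, 4], -2))   # [3, 4, None, None]
print(shift_index([0, 1, 2], 2, 10))  # [20, 21, 22]
```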

skew(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased skew over requested axis. Normalized by N-1.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

skew : Series or DataFrame (if level specified)
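For a single column, a bias-adjusted skew can be sketched as below. As an assumption, this uses the adjusted Fisher-Pearson coefficient, which is what we believe pandas' skew computes; the "Normalized by N-1" wording above may not describe it exactly, so verify against your pandas version. The helper name sample_skew is hypothetical.

```python
import math


def sample_skew(xs):
    """Adjusted Fisher-Pearson skewness (assumed estimator, see lead-in).

    m2 and m3 are the biased central moments; sqrt(n(n-1))/(n-2) is the
    small-sample adjustment applied to their ratio.
    """
    n = len(xs)
    if n < 3:
        return math.nan  # the adjustment factor needs at least 3 points
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        return 0.0  # constant data: treated as having no skew
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


print(sample_skew([1, 2, 3]))           # 0.0
print(sample_skew([1, 2, 3, 10]) > 0)   # True: long right tail
```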

sort(columns=None, column=None, axis=0, ascending=True, inplace=False)

Sort DataFrame either by labels (along either axis) or by the values in column(s)

columns : object
Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
ascending : boolean or list, default True
Sort ascending vs. descending. Specify list for multiple sort orders
axis : {0, 1}
Sort index/rows versus columns
inplace : boolean, default False
Sort the DataFrame without creating a new instance
>>> result = df.sort(['A', 'B'], ascending=[1, 0])

sorted : DataFrame

sort_index(axis=0, by=None, ascending=True, inplace=False, kind='quicksort')

Sort DataFrame either by labels (along either axis) or by the values in a column

axis : {0, 1}
Sort index/rows versus columns
by : object
Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
ascending : boolean or list, default True
Sort ascending vs. descending. Specify list for multiple sort orders
inplace : boolean, default False
Sort the DataFrame without creating a new instance
>>> result = df.sort_index(by=['A', 'B'], ascending=[True, False])

sorted : DataFrame

sortlevel(level=0, axis=0, ascending=True, inplace=False)

Sort multilevel index by chosen axis and primary level. Data will be lexicographically sorted by the chosen level followed by the other levels (in order)

level : int
axis : {0, 1}
ascending : boolean, default True
inplace : boolean, default False
Sort the DataFrame without creating a new instance

sorted : DataFrame

squeeze()

Squeeze length-1 dimensions

stack(level=-1, dropna=True)

Pivot a level of the (possibly hierarchical) column labels, returning a DataFrame (or Series in the case of an object with a single level of column labels) having a hierarchical index with a new inner-most level of row labels.

level : int, string, or list of these, default last level
Level(s) to stack, can pass level name
dropna : boolean, default True
Whether to drop rows in the resulting Frame/Series with no valid values
>>> s
     a   b
one  1.  2.
two  3.  4.
>>> s.stack()
one a    1
    b    2
two a    3
    b    4

stacked : DataFrame or Series
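The example above can be mirrored in plain Python to show what "a new inner-most level of row labels" means: each (row, column) pair becomes a two-level key. The helper stack is hypothetical, and dropping None cells approximates dropna=True for the single-level case.

```python
def stack(frame):
    """Pivot column labels into a new inner-most row level.

    `frame` maps row label -> {column label: value}; the result maps
    (row label, column label) -> value. Cells holding None are dropped,
    mimicking dropna=True.
    """
    return {
        (row, col): value
        for row, cells in frame.items()
        for col, value in cells.items()
        if value is not None
    }


s = {"one": {"a": 1.0, "b": 2.0}, "two": {"a": 3.0, "b": 4.0}}
print(stack(s))
# {('one', 'a'): 1.0, ('one', 'b'): 2.0, ('two', 'a'): 3.0, ('two', 'b'): 4.0}
```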

std(axis=None, skipna=None, level=None, ddof=1, **kwargs)

Return unbiased standard deviation over requested axis. Normalized by N-ddof (N-1 with the default ddof=1).

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

stdev : Series or DataFrame (if level specified)
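The N-ddof normalization and the skipna rule can be checked with a short pure-Python sketch for one column. The helper std is hypothetical and is not how pandas computes the statistic internally.

```python
import math


def std(xs, ddof=1, skipna=True):
    """Standard deviation normalized by N - ddof (N - 1 with the default ddof=1)."""
    if skipna:
        xs = [x for x in xs if not math.isnan(x)]  # exclude NA values
    elif any(math.isnan(x) for x in xs):
        return math.nan  # an NA value makes the result NA when skipna=False
    n = len(xs)
    if n - ddof <= 0:
        return math.nan  # not enough observations for this ddof
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - ddof))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std(data, ddof=0))  # 2.0 (population standard deviation)
print(std(data))          # about 2.138, i.e. sqrt(32 / 7)
```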

sub(other, axis='columns', level=None, fill_value=None)

Binary operator sub with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

subtract(other, axis='columns', level=None, fill_value=None)

Binary operator sub with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

sum(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the sum of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

sum : Series or DataFrame (if level specified)

swapaxes(axis1, axis2, copy=True)

Interchange axes and swap values axes appropriately

y : same as input

swaplevel(i, j, axis=0)

Swap levels i and j in a MultiIndex on a particular axis

i, j : int, string (can be mixed)
Level of index to be swapped. Can pass level name as string.

swapped : type of caller (new object)

tail(n=5)

Returns last n rows

take(indices, axis=0, convert=True, is_copy=True)

Analogous to ndarray.take

indices : list / array of ints
axis : int, default 0
convert : boolean, default True
Translate negative to positive indices
is_copy : boolean, default True
Mark the returned frame as a copy

taken : type of caller
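Positional selection with negative-index translation (the convert behavior) can be sketched on a plain list of rows. The helper take is hypothetical; like ndarray.take, it raises IndexError for out-of-range positions.

```python
def take(rows, indices):
    """Select rows by integer position, like ndarray.take.

    Negative positions are translated to positive ones (the convert
    behavior), so -1 means the last row.
    """
    n = len(rows)
    out = []
    for i in indices:
        pos = i + n if i < 0 else i
        if not 0 <= pos < n:
            raise IndexError(f"position {i} is out of bounds for {n} rows")
        out.append(rows[pos])
    return out


print(take(["r0", "r1", "r2", "r3"], [0, -1, 2]))  # ['r0', 'r3', 'r2']
```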

to_clipboard(excel=None, sep=None, **kwargs)

Attempt to write text representation of object to the system clipboard. This can be pasted into Excel, for example.

excel : boolean, defaults to True
If True, use the provided separator, writing in a csv format for allowing easy pasting into excel. If False, write a string representation of the object to the clipboard

sep : optional, defaults to tab
Other keywords are passed to to_csv

Requirements for your platform
  • Linux: xclip, or xsel (with gtk or PyQt4 modules)
  • Windows: none
  • OS X: none
to_csv(path_or_buf, sep=',', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, mode='w', nanRep=None, encoding=None, quoting=None, line_terminator='\n', chunksize=None, tupleize_cols=False, date_format=None, **kwds)

Write DataFrame to a comma-separated values (csv) file

path_or_buf : string or file handle / StringIO
File path
sep : character, default ‘,’
Field delimiter for the output file.
na_rep : string, default ‘’
Missing data representation
float_format : string, default None
Format string for floating point numbers
cols : sequence, optional
Columns to write
header : boolean or list of string, default True
Write out column names. If a list of string is given it is assumed to be aliases for the column names
index : boolean, default True
Write row names (index)
index_label : string or sequence, or False, default None
Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R
nanRep : None
deprecated, use na_rep
mode : str
Python write mode, default ‘w’
encoding : string, optional
a string representing the encoding to use if the contents are non-ascii, for python versions prior to 3
line_terminator : string, default ‘\n’
The newline character or character sequence to use in the output file
quoting : optional constant from csv module
defaults to csv.QUOTE_MINIMAL
chunksize : int or None
rows to write at a time
tupleize_cols : boolean, default False
Write MultiIndex columns as a list of tuples (if True) or in the new, expanded format (if False)
date_format : string, default None
Format string for datetime objects.
to_dense()

Return dense representation of NDFrame (as opposed to sparse)

to_dict(outtype='dict')

Convert DataFrame to dictionary.

outtype : str {‘dict’, ‘list’, ‘series’, ‘records’}
Determines the type of the values of the dictionary. The default dict is a nested dictionary {column -> {index -> value}}. list returns {column -> list(values)}. series returns {column -> Series(values)}. records returns [{columns -> value}]. Abbreviations are allowed.

result : dict like {column -> {index -> value}}
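The outtype layouts can be reproduced for a frame held as {column: [values]} with a small pure-Python sketch. The helper to_dict is hypothetical, only ‘dict’, ‘list’ and ‘records’ are modelled (‘series’ needs pandas), and matching on the first letter is an assumption about how abbreviations are resolved.

```python
def to_dict(columns, index, outtype="dict"):
    """Sketch of the to_dict layouts for a frame held as {column: [values]}.

    Only 'dict', 'list' and 'records' are modelled; outtype is matched by
    its first letter to mimic abbreviations (an assumption).
    """
    kind = outtype.lower()[:1]
    if kind == "d":  # {column -> {index -> value}}
        return {col: dict(zip(index, vals)) for col, vals in columns.items()}
    if kind == "l":  # {column -> [values]}
        return {col: list(vals) for col, vals in columns.items()}
    if kind == "r":  # [{column -> value}, ...], one dict per row
        names = list(columns)
        return [dict(zip(names, row)) for row in zip(*columns.values())]
    raise ValueError(f"unsupported outtype: {outtype!r}")


cols = {"A": [1, 2], "B": [3, 4]}
print(to_dict(cols, ["x", "y"]))             # {'A': {'x': 1, 'y': 2}, 'B': {'x': 3, 'y': 4}}
print(to_dict(cols, ["x", "y"], "records"))  # [{'A': 1, 'B': 3}, {'A': 2, 'B': 4}]
```

Note that the records layout drops the index labels entirely.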

to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True)

Write DataFrame to an Excel sheet

excel_writer : string or ExcelWriter object
File path or existing ExcelWriter
sheet_name : string, default ‘Sheet1’
Name of sheet which will contain DataFrame
na_rep : string, default ‘’
Missing data representation
float_format : string, default None
Format string for floating point numbers
cols : sequence, optional
Columns to write
header : boolean or list of string, default True
Write out column names. If a list of string is given it is assumed to be aliases for the column names
index : boolean, default True
Write row names (index)
index_label : string or sequence, default None
Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
startrow :
upper left cell row to dump data frame
startcol :
upper left cell column to dump data frame
engine : string, default None
write engine to use - you can also set this via the options io.excel.xlsx.writer, io.excel.xls.writer, and io.excel.xlsm.writer.
merge_cells : boolean, default True
Write MultiIndex and Hierarchical Rows as merged cells.

If passing an existing ExcelWriter object, then the sheet will be added to the existing workbook. This can be used to save different DataFrames to one workbook:

>>> from pandas import ExcelWriter
>>> writer = ExcelWriter('output.xlsx')
>>> df1.to_excel(writer,'Sheet1')
>>> df2.to_excel(writer,'Sheet2')
>>> writer.save()
to_gbq(destination_table, schema=None, col_order=None, if_exists='fail', **kwargs)

Write a DataFrame to a Google BigQuery table.

If the table does not exist, a new table will be created, in which case the schema will have to be specified. If it does exist, what happens is controlled by the if_exists parameter. By default, rows will be written in the order they appear in the DataFrame, though the user may specify an alternative order.

destination_table : string
name of table to be written, in the form ‘dataset.tablename’
schema : sequence (optional)
list of column types in order for data to be inserted, e.g. [‘INTEGER’, ‘TIMESTAMP’, ‘BOOLEAN’]
col_order : sequence (optional)
order which columns are to be inserted, e.g. [‘primary_key’, ‘birthday’, ‘username’]
if_exists : {‘fail’, ‘replace’, ‘append’} (optional)
  • fail: If table exists, do nothing.
  • replace: If table exists, drop it, recreate it, and insert data.
  • append: If table exists, insert data. Create it if it does not exist.

kwargs are passed to the Client constructor

SchemaMissing :
Raised if the ‘if_exists’ parameter is set to ‘replace’, but no schema is specified
TableExists :
Raised if the specified ‘destination_table’ exists but the ‘if_exists’ parameter is set to ‘fail’ (the default)
InvalidSchema :
Raised if the ‘schema’ parameter does not match the provided DataFrame
to_hdf(path_or_buf, key, **kwargs)

Write the contained data to an HDF5 file using HDFStore

path_or_buf : the path (string) or buffer to put the store
key : string
Identifier for the group in the store

mode : optional, {‘a’, ‘w’, ‘r’, ‘r+’}, default ‘a’

'r'
Read-only; no data can be modified.
'w'
Write; a new file is created (an existing file with the same name would be deleted).
'a'
Append; an existing file is opened for reading and writing, and if the file does not exist it is created.
'r+'
It is similar to 'a', but the file must already exist.
format : ‘fixed(f)|table(t)’, default is ‘fixed’
fixed(f) : Fixed format
Fast writing/reading. Not-appendable, nor searchable
table(t) : Table format
Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data
append : boolean, default False
For Table formats, append the input data to the existing table
complevel : int, 1-9, default 0
If a complib is specified compression will be applied where possible
complib : {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’, None}, default None
If complevel is > 0 apply compression to objects written in the store wherever possible
fletcher32 : bool, default False
If applying compression use the fletcher32 checksum
to_html(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, force_unicode=None, bold_rows=True, classes=None, escape=True, max_rows=None, max_cols=None, show_dimensions=False)

Render a DataFrame as an HTML table.

to_html-specific options:

bold_rows : boolean, default True
Make the row labels bold in the output
classes : str or list or tuple, default None
CSS class(es) to apply to the resulting html table
escape : boolean, default True
Convert the characters <, >, and & to HTML-safe sequences.
max_rows : int, optional
Maximum number of rows to show before truncating. If None, show all.
max_cols : int, optional
Maximum number of columns to show before truncating. If None, show all.
frame : DataFrame
object to render
buf : StringIO-like, optional
buffer to write to
columns : sequence, optional
the subset of columns to write; default None writes all columns
col_space : int, optional
the minimum width of each column
header : bool, optional
whether to print column labels, default True
index : bool, optional
whether to print index (row) labels, default True
na_rep : string, optional
string representation of NAN to use, default ‘NaN’
formatters : list or dict of one-parameter functions, optional
formatter functions to apply to columns’ elements by position or name, default None; if the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
float_format : one-parameter function, optional
formatter function to apply to columns’ elements if they are floats, default None
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
justify : {‘left’, ‘right’}, default None
Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
index_names : bool, optional
Prints the names of the indexes, default True
force_unicode : bool, default False
Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.

formatted : string (or unicode, depending on data and options)

to_json(path_or_buf=None, orient=None, date_format='epoch', double_precision=10, force_ascii=True, date_unit='ms', default_handler=None)

Convert the object to a JSON string.

Note that NaN and None will be converted to null, and datetime objects will be converted to UNIX timestamps.

path_or_buf : the path or buffer to write the result string
If this is None, return the converted string

orient : string

  • Series
    • default is ‘index’
    • allowed values are: {‘split’,’records’,’index’}
  • DataFrame
    • default is ‘columns’
    • allowed values are: {‘split’,’records’,’index’,’columns’,’values’}
  • The format of the JSON string
    • split : dict like {index -> [index], columns -> [columns], data -> [values]}
    • records : list like [{column -> value}, ... , {column -> value}]
    • index : dict like {index -> {column -> value}}
    • columns : dict like {column -> {index -> value}}
    • values : just the values array
date_format : {‘epoch’, ‘iso’}
Type of date conversion. epoch = epoch milliseconds, iso = ISO8601, default is epoch.
double_precision : The number of decimal places to use when encoding floating point values, default 10.
force_ascii : force encoded string to be ASCII, default True.
date_unit : string, default ‘ms’ (milliseconds)

The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.
default_handler : callable, default None
Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serialisable object.

result : string or None
The JSON string if path_or_buf is None; otherwise the output is written to path_or_buf and None is returned
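The orient layouts listed above can be reproduced with the standard json module for a frame held as {column: [values]}. This is a sketch, not pandas' encoder: the helper frame_to_json is hypothetical, it assumes string index and column labels (so they can be JSON object keys), and it ignores dates and double_precision.

```python
import json
import math


def _json_value(v):
    # NaN (and None) become JSON null, as described above.
    return None if isinstance(v, float) and math.isnan(v) else v


def frame_to_json(columns, index, orient="columns"):
    """Sketch of the orient layouts for a frame held as {column: [values]}."""
    names = list(columns)
    rows = [[_json_value(v) for v in row] for row in zip(*columns.values())]
    if orient == "split":
        payload = {"columns": names, "index": list(index), "data": rows}
    elif orient == "records":
        payload = [dict(zip(names, row)) for row in rows]
    elif orient == "index":
        payload = {label: dict(zip(names, row)) for label, row in zip(index, rows)}
    elif orient == "columns":
        payload = {name: {label: row[j] for label, row in zip(index, rows)}
                   for j, name in enumerate(names)}
    elif orient == "values":
        payload = rows
    else:
        raise ValueError(f"unsupported orient: {orient!r}")
    return json.dumps(payload)


print(frame_to_json({"A": [1.0, float("nan")], "B": [3, 4]}, ["x", "y"], "records"))
# [{"A": 1.0, "B": 3}, {"A": null, "B": 4}]
```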

to_latex(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=True, force_unicode=None)

Render a DataFrame to a tabular environment table. You can splice this into a LaTeX document.

to_latex-specific options:

bold_rows : boolean, default True
Make the row labels bold in the output
frame : DataFrame
object to render
buf : StringIO-like, optional
buffer to write to
columns : sequence, optional
the subset of columns to write; default None writes all columns
col_space : int, optional
the minimum width of each column
header : bool, optional
whether to print column labels, default True
index : bool, optional
whether to print index (row) labels, default True
na_rep : string, optional
string representation of NaN to use, default ‘NaN’
formatters : list or dict of one-parameter functions, optional
formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. A list must have length equal to the number of columns.
float_format : one-parameter function, optional
formatter function to apply to columns’ elements if they are floats, default None
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
justify : {‘left’, ‘right’}, default None
Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
index_names : bool, optional
Prints the names of the indexes, default True
force_unicode : bool, default False
Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.

formatted : string (or unicode, depending on data and options)

to_msgpack(path_or_buf=None, **kwargs)

Serialize the object to the input file path using msgpack

THIS IS AN EXPERIMENTAL LIBRARY and the storage format may not be stable until a future release.

path : string file path, buffer-like, or None
If None, return the generated string
append : boolean, default False
Whether to append to an existing msgpack
compress : type of compressor (zlib or blosc), default None (no compression)
to_panel()

Transform long (stacked) format (DataFrame) into wide (3D, Panel) format.

Currently the index of the DataFrame must be a 2-level MultiIndex. This may be generalized later.

panel : Panel

to_period(freq=None, axis=0, copy=True)

Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed)

freq : string, default None (inferred from the index)
axis : {0, 1}, default 0
The axis to convert (the index by default)
copy : boolean, default True
If False then underlying input data is not copied

ts : TimeSeries with PeriodIndex

to_pickle(path)

Pickle (serialize) object to input file path

path : string
File path
to_records(index=True, convert_datetime64=True)

Convert DataFrame to record array. Index will be put in the ‘index’ field of the record array if requested

index : boolean, default True
Include index in resulting record array, stored in ‘index’ field
convert_datetime64 : boolean, default True
Whether to convert the index to datetime.datetime if it is a DatetimeIndex

y : recarray

to_sparse(fill_value=None, kind='block')

Convert to SparseDataFrame

fill_value : float, default NaN
kind : {‘block’, ‘integer’}, default ‘block’

y : SparseDataFrame

to_sql(name, con, flavor='sqlite', if_exists='fail', **kwargs)

Write records stored in a DataFrame to a SQL database.

name : str
Name of SQL table

con : an open SQL database connection object
flavor : {‘sqlite’, ‘mysql’, ‘oracle’}, default ‘sqlite’
if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’

  • fail: If the table exists, do nothing.
  • replace: If the table exists, drop it, recreate it, and insert data.
  • append: If the table exists, insert data. Create it if it does not exist.
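The three if_exists policies can be illustrated with the standard-library sqlite3 module. This is a minimal sketch under stated assumptions: the helper write_rows and its all-TEXT column schema are hypothetical, and it mirrors the documented semantics (where ‘fail’ leaves an existing table untouched) rather than pandas’ actual SQL code path.

```python
import sqlite3

def write_rows(con, name, columns, rows, if_exists="fail"):
    """Hypothetical helper mirroring the documented if_exists policies."""
    if if_exists not in ("fail", "replace", "append"):
        raise ValueError("'%s' is not valid for if_exists" % if_exists)
    exists = con.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (name,)
    ).fetchone() is not None
    if exists and if_exists == "fail":
        return  # documented behaviour: the table exists, so do nothing
    if exists and if_exists == "replace":
        con.execute('DROP TABLE "%s"' % name)  # name is trusted here (sketch only)
        exists = False
    if not exists:  # first write, 'replace', or 'append' to a missing table
        cols = ", ".join('"%s" TEXT' % c for c in columns)
        con.execute('CREATE TABLE "%s" (%s)' % (name, cols))
    marks = ", ".join("?" for _ in columns)
    con.executemany('INSERT INTO "%s" VALUES (%s)' % (name, marks), rows)
    con.commit()

con = sqlite3.connect(":memory:")
write_rows(con, "t", ["x"], [("a",)])                       # table created
write_rows(con, "t", ["x"], [("b",)], if_exists="fail")     # no-op
write_rows(con, "t", ["x"], [("c",)], if_exists="append")   # rows a, c
print(con.execute("SELECT x FROM t ORDER BY rowid").fetchall())  # [('a',), ('c',)]
write_rows(con, "t", ["x"], [("d",)], if_exists="replace")  # rows d
print(con.execute("SELECT x FROM t ORDER BY rowid").fetchall())  # [('d',)]
```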
to_stata(fname, convert_dates=None, write_index=True, encoding='latin-1', byteorder=None)

Write the DataFrame to a Stata binary dta file (via the StataWriter class)

fname : file path or buffer
Where to save the dta file.
convert_dates : dict
Dictionary mapping column of datetime types to the stata internal format that you want to use for the dates. Options are ‘tc’, ‘td’, ‘tm’, ‘tw’, ‘th’, ‘tq’, ‘ty’. Column can be either a number or a name.
encoding : str
Default is latin-1. Note that Stata does not support unicode.
byteorder : str
Can be “>”, “<”, “little”, or “big”. The default is None which uses sys.byteorder
>>> from pandas.io.stata import StataWriter
>>> writer = StataWriter('./data_file.dta', data)
>>> writer.write_file()

Or with dates

>>> writer = StataWriter('./date_data_file.dta', data, {2 : 'tw'})
>>> writer.write_file()
to_string(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, nanRep=None, index_names=True, justify=None, force_unicode=None, line_width=None, max_rows=None, max_cols=None, show_dimensions=False)

Render a DataFrame to a console-friendly tabular output.

frame : DataFrame
object to render
buf : StringIO-like, optional
buffer to write to
columns : sequence, optional
the subset of columns to write; default None writes all columns
col_space : int, optional
the minimum width of each column
header : bool, optional
whether to print column labels, default True
index : bool, optional
whether to print index (row) labels, default True
na_rep : string, optional
string representation of NaN to use, default ‘NaN’
formatters : list or dict of one-parameter functions, optional
formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. A list must have length equal to the number of columns.
float_format : one-parameter function, optional
formatter function to apply to columns’ elements if they are floats, default None
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
justify : {‘left’, ‘right’}, default None
Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
index_names : bool, optional
Prints the names of the indexes, default True
force_unicode : bool, default False
Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.

formatted : string (or unicode, depending on data and options)

to_timestamp(freq=None, how='start', axis=0, copy=True)

Cast to DatetimeIndex of timestamps, at beginning of period

freq : string, default frequency of PeriodIndex
Desired frequency
how : {‘s’, ‘e’, ‘start’, ‘end’}
Convention for converting period to timestamp; start of period vs. end
axis : {0, 1}, default 0
The axis to convert (the index by default)
copy : boolean, default True
If False then underlying input data is not copied

df : DataFrame with DatetimeIndex

to_wide(*args, **kwargs)
transpose()

Transpose index and columns

truediv(other, axis='columns', level=None, fill_value=None)

Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
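The fill_value rule above (fill one missing side, leave the result missing when both sides are missing, and union mismatched labels) can be sketched in plain Python with dicts standing in for labelled data and None standing in for NaN. The function name flex_op is hypothetical; this illustrates the documented semantics, not pandas’ implementation.

```python
import operator

def flex_op(left, right, op, fill_value=None):
    """Apply op over the union of labels; None marks a missing value."""
    result = {}
    for key in sorted(set(left) | set(right)):  # mismatched labels are unioned
        a, b = left.get(key), right.get(key)
        if a is None and b is None:
            result[key] = None  # both sides missing: result stays missing
            continue
        if a is None:
            a = fill_value
        if b is None:
            b = fill_value
        # without a fill_value, one missing side still gives a missing result
        result[key] = None if a is None or b is None else op(a, b)
    return result

left = {"x": 6.0, "y": 3.0}
right = {"x": 2.0, "z": 4.0}
print(flex_op(left, right, operator.truediv))                  # {'x': 3.0, 'y': None, 'z': None}
print(flex_op(left, right, operator.truediv, fill_value=1.0))  # {'x': 3.0, 'y': 3.0, 'z': 0.25}
```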

truncate(before=None, after=None, axis=None, copy=True)

Truncates a sorted NDFrame before and/or after some particular dates.

before : date
Truncate before date
after : date
Truncate after date

axis : the truncation axis, defaults to the stat axis
copy : boolean, default True
Return a copy of the truncated section

truncated : type of caller

tshift(periods=1, freq=None, axis=0, **kwds)

Shift the time index, using the index’s frequency if available

periods : int
Number of periods to move, can be positive or negative
freq : DateOffset, timedelta, or time rule string, default None
Increment to use from datetools module or time rule (e.g. ‘EOM’)
axis : int or basestring
Corresponds to the axis that contains the Index

If freq is not specified, this tries to use the freq or inferred_freq attributes of the index. If neither of those attributes exists, a ValueError is raised.

shifted : NDFrame

tz_convert(tz, axis=0, copy=True)

Convert TimeSeries to target time zone. If it is time zone naive, it will be localized to the passed time zone.

tz : string or pytz.timezone object
copy : boolean, default True
Also make a copy of the underlying data
tz_localize(tz, axis=0, copy=True, infer_dst=False)

Localize tz-naive TimeSeries to target time zone

tz : string or pytz.timezone object
copy : boolean, default True
Also make a copy of the underlying data
infer_dst : boolean, default False
Attempt to infer fall dst-transition times based on order
unstack(level=-1)

Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels. If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex)

level : int, string, or list of these, default -1 (last level)
Level(s) of index to unstack, can pass level name

DataFrame.pivot : Pivot a table based on column values.
DataFrame.stack : Pivot a level of the column labels (inverse operation from unstack).
>>> import numpy as np
>>> import pandas as pd
>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
...                                    ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one  a   1
     b   2
two  a   3
     b   4
dtype: float64
>>> s.unstack(level=-1)
     a   b
one  1  2
two  3  4
>>> s.unstack(level=0)
   one  two
a  1   3
b  2   4
>>> df = s.unstack(level=0)
>>> df.unstack()
one  a  1.
     b  2.
two  a  3.
     b  4.

unstacked : DataFrame or Series
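The pivoting that unstack performs on a two-level index can be sketched with plain dicts: a Series is modelled as a mapping from (outer, inner) tuples to values, and the unstacked level becomes the inner key of a nested dict (row label -> {column label -> value}). This is an illustrative model with a hypothetical function name, not pandas code; it reproduces the s.unstack(level=-1) and s.unstack(level=0) results from the example above.

```python
def unstack_pairs(series, level=-1):
    """Pivot {(outer, inner): value} into {row: {column: value}}.

    level=-1 (or 1) moves the inner level to the columns;
    level=0 moves the outer level to the columns.
    """
    if level not in (-1, 0, 1):
        raise ValueError("a two-level index only has levels 0 and 1")
    col_pos = 0 if level == 0 else 1
    row_pos = 1 - col_pos
    table = {}
    for key, value in series.items():
        table.setdefault(key[row_pos], {})[key[col_pos]] = value
    return table

s = {("one", "a"): 1.0, ("one", "b"): 2.0, ("two", "a"): 3.0, ("two", "b"): 4.0}
print(unstack_pairs(s, level=-1))  # {'one': {'a': 1.0, 'b': 2.0}, 'two': {'a': 3.0, 'b': 4.0}}
print(unstack_pairs(s, level=0))   # {'a': {'one': 1.0, 'two': 3.0}, 'b': {'one': 2.0, 'two': 4.0}}
```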

update(other, join='left', overwrite=True, filter_func=None, raise_conflict=False)

Modify DataFrame in place using non-NA values from the passed DataFrame. Aligns on indices.

other : DataFrame, or object coercible into a DataFrame
join : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
overwrite : boolean, default True
If True then overwrite values for common keys in the calling frame
filter_func : callable(1d-array) -> 1d-array<boolean>, default None
Can choose to replace values other than NA. Return True for values that should be updated
raise_conflict : boolean, default False
If True, will raise an error if the DataFrame and other both contain data in the same place.
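The interplay of overwrite, filter_func and raise_conflict can be sketched for a single column in plain Python, with None standing in for NA and both sequences assumed to be already aligned on the same labels. The function name update_column is hypothetical, and unlike the documented signature, filter_func here is called once per element (it may receive None) rather than on a 1-d array.

```python
def update_column(this, that, overwrite=True, filter_func=None,
                  raise_conflict=False):
    """Return a copy of `this` updated from the aligned values in `that`."""
    result = []
    for mine, theirs in zip(this, that):
        if filter_func is not None:
            # update only where filter_func selects the value and other has data
            take = theirs is not None and filter_func(mine)
        else:
            if raise_conflict and mine is not None and theirs is not None:
                raise ValueError("Data overlaps.")
            # overwrite=True: any non-NA value in other wins;
            # overwrite=False: only fill positions that are NA in this frame
            take = theirs is not None and (overwrite or mine is None)
        result.append(theirs if take else mine)
    return result

print(update_column([1, None, 3], [10, 20, None]))                   # [10, 20, 3]
print(update_column([1, None, 3], [10, 20, None], overwrite=False))  # [1, 20, 3]
```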
values

Numpy representation of NDFrame

var(axis=None, skipna=None, level=None, ddof=1, **kwargs)

Return unbiased variance over the requested axis, normalized by N-1 (more generally, by N - ddof).

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

variance : Series or DataFrame (if level specified)
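The N-1 normalization can be checked against the standard library: statistics.variance uses the same N-1 (sample) divisor, while statistics.pvariance divides by N. The sketch below generalizes to the ddof parameter from the signature, assuming a divisor of N - ddof; None stands in for NA, and var_sketch is a hypothetical name.

```python
import statistics

def var_sketch(values, ddof=1, skipna=True):
    """Variance with divisor N - ddof; None marks a missing value (hypothetical helper)."""
    if not skipna and any(v is None for v in values):
        return None  # NA propagates when skipna=False
    data = [v for v in values if v is not None]
    n = len(data)
    if n - ddof <= 0:
        return None  # too few observations for this ddof
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - ddof)

xs = [1.0, 2.0, 3.0, 4.0, None]
print(var_sketch(xs))                             # 5/3, the unbiased estimate
print(statistics.variance([1.0, 2.0, 3.0, 4.0]))  # same N-1 result
print(var_sketch(xs, ddof=0))                     # 1.25, i.e. 5/4
```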

where(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)

Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.

cond : boolean NDFrame or array
other : scalar or NDFrame
inplace : boolean, default False
Whether to perform the operation in place on the data
axis : alignment axis if needed, default None
level : alignment level if needed, default None
try_cast : boolean, default False
Try to cast the result back to the input type (if possible)
raise_on_error : boolean, default True
Whether to raise on invalid data types (e.g. trying to where on strings)

wh : same type as caller

xs(key, axis=0, level=None, copy=True, drop_level=True)

Returns a cross-section (row(s) or column(s)) from the Series/DataFrame. Defaults to cross-section on the rows (axis=0).

key : object
Some label contained in the index, or partially in a MultiIndex
axis : int, default 0
Axis to retrieve cross-section on
level : object, defaults to first n levels (n=1 or len(key))
In case of a key partially contained in a MultiIndex, indicate which levels are used. Levels can be referred by label or position.
copy : boolean, default True
Whether to make a copy of the data
drop_level : boolean, default True
If False, returns object with same levels as self.
>>> df
   A  B  C
a  4  5  2
b  4  0  9
c  9  7  3
>>> df.xs('a')
A    4
B    5
C    2
Name: a
>>> df.xs('C', axis=1)
a    2
b    9
c    3
Name: C
>>> s = df.xs('a', copy=False)
>>> s['A'] = 100
>>> df
     A  B  C
a  100  5  2
b    4  0  9
c    9  7  3
>>> df
                    A  B  C  D
first second third
bar   one    1      4  1  8  9
      two    1      7  5  5  0
baz   one    1      6  6  8  0
      three  2      5  3  5  3
>>> df.xs(('baz', 'three'))
       A  B  C  D
third
2      5  3  5  3
>>> df.xs('one', level=1)
             A  B  C  D
first third
bar   1      4  1  8  9
baz   1      6  6  8  0
>>> df.xs(('baz', 2), level=[0, 'third'])
        A  B  C  D
second
three   5  3  5  3

xs : Series or DataFrame

class Fred2.Core.Result.EpitopePredictionResult(data=None, index=None, columns=None, dtype=None, copy=False)

Bases: Fred2.Core.Result.AResult

An EpitopePredictionResult object is a DataFrame with a MultiIndex, where the column IDs are the prediction models (i.e., HLA alleles for epitope prediction), the first row index level is the target of the prediction (i.e., the peptide), and the second row index level is the predictor (e.g., BIMAS).

EpitopePredictionResult

Peptide Obj   Method Name   Allele1 Obj   Allele2 Obj   Allele3 Obj
Peptide1      Method 1      0.324         0.56          0.013
              Method 2      20            15            23
Peptide2      Method 1      0.50          0.36          0.98
              Method 2      26            10            50
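The row layout of the table above can be modelled in plain Python as a mapping from (peptide, method) pairs to per-allele scores; selecting one predictor is then the analogue of a cross-section on the second row level, as with xs(method, level=1). The sketch reuses the numbers from the table, but the string keys are hypothetical stand-ins for the Peptide and Allele objects Fred2 stores, and select_method is not part of the Fred2 API.

```python
# (peptide, method) -> {allele: score}; strings stand in for Fred2 objects
result = {
    ("Peptide1", "Method 1"): {"Allele1": 0.324, "Allele2": 0.56, "Allele3": 0.013},
    ("Peptide1", "Method 2"): {"Allele1": 20, "Allele2": 15, "Allele3": 23},
    ("Peptide2", "Method 1"): {"Allele1": 0.50, "Allele2": 0.36, "Allele3": 0.98},
    ("Peptide2", "Method 2"): {"Allele1": 26, "Allele2": 10, "Allele3": 50},
}

def select_method(table, method):
    """Keep one predictor's rows and drop the method level (like xs(method, level=1))."""
    return {pep: scores for (pep, m), scores in table.items() if m == method}

method1 = select_method(result, "Method 1")
print(method1["Peptide2"]["Allele3"])  # 0.98
```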
T

Transpose index and columns

abs()

Return an object with absolute value taken. Only applicable to objects that are all numeric

abs: type of caller

add(other, axis='columns', level=None, fill_value=None)

Binary operator add with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

add_prefix(prefix)

Concatenate prefix string with panel items names.

prefix : string

with_prefix : type of caller

add_suffix(suffix)

Concatenate suffix string with panel items names

suffix : string

with_suffix : type of caller

align(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0)

Align two objects on their axes with the specified join method for each axis Index.

other : DataFrame or Series
join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
axis : allowed axis of the other object, default None
Align on index (0), columns (1), or both (None)
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level
copy : boolean, default True
Always returns new objects. If copy=False and no reindexing is required then original objects are returned.
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any “compatible” value

method : str, default None
limit : int, default None
fill_axis : {0, 1}, default 0
Filling axis, method and limit
(left, right) : (type of input, type of other)
Aligned objects
all(axis=None, bool_only=None, skipna=True, level=None, **kwargs)

Return whether all elements are True over requested axis.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
bool_only : boolean, default None
Only include boolean data.

all : Series (or DataFrame if level specified)

any(axis=None, bool_only=None, skipna=True, level=None, **kwargs)

Return whether any element is True over requested axis.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
bool_only : boolean, default None
Only include boolean data.

any : Series (or DataFrame if level specified)

append(other, ignore_index=False, verify_integrity=False)

Append rows of other to the end of this frame, returning a new object. Columns not in this frame are added as new columns.

other : DataFrame or list of Series/dict-like objects
ignore_index : boolean, default False
If True do not use the index labels. Useful for gluing together record arrays
verify_integrity : boolean, default False
If True, raise ValueError on creating index with duplicates

If a list of dict is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged

appended : DataFrame

apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)

Applies function along input axis of DataFrame.

Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty.

func : function
Function to apply to each column/row
axis : {0, 1}
  • 0 : apply function to each column
  • 1 : apply function to each row
broadcast : boolean, default False
For aggregation functions, return object of same size with values propagated
reduce : boolean or None, default None
Try to apply reduction procedures. If the DataFrame is empty, apply will use reduce to determine whether the result should be a Series or a DataFrame. If reduce is None (the default), apply’s return value will be guessed by calling func on an empty Series (note: while guessing, exceptions raised by func will be ignored). If reduce is True a Series will always be returned, and if False a DataFrame will always be returned.
raw : boolean, default False
If False, convert each row or column into a Series. If raw=True the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance
args : tuple
Positional arguments to pass to function in addition to the array/series

Additional keyword arguments will be passed as keywords to the function

>>> import numpy
>>> df.apply(numpy.sqrt) # returns DataFrame
>>> df.apply(numpy.sum, axis=0) # equiv to df.sum(0)
>>> df.apply(numpy.sum, axis=1) # equiv to df.sum(1)

DataFrame.applymap: For elementwise operations

applied : Series or DataFrame

applymap(func)

Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame

func : function
Python function, returns a single value from a single value

applied : DataFrame

DataFrame.apply : For operations on rows/columns

as_blocks(columns=None)

Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype.

Columns are presented in sorted order unless a specific list of columns is provided.

NOTE: the dtypes of the blocks WILL BE PRESERVED HERE (unlike in as_matrix)
columns : array-like
Specific column order

values : a list of Object

as_matrix(columns=None)

Convert the frame to its Numpy-array matrix representation. Columns are presented in sorted order unless a specific list of columns is provided.

NOTE: the dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say, if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.

e.g. if the dtypes are
  • float16, float32 -> float32
  • float16, float32, float64 -> float64
  • int32, uint8 -> int32
values : ndarray
If the caller is heterogeneous and contains booleans or objects, the result will be of dtype=object
asfreq(freq, method=None, how=None, normalize=False)

Convert all TimeSeries inside to specified frequency using DateOffset objects. Optionally provide fill method to pad/backfill missing values.

freq : DateOffset object, or string
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}
Method to use for filling holes in reindexed Series:
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill gap
how : {‘start’, ‘end’}, default end
For PeriodIndex only, see PeriodIndex.asfreq
normalize : bool, default False
Whether to reset output index to midnight

converted : type of caller
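The two fill methods named above (pad/ffill and backfill/bfill) can be sketched on a plain list, with None marking a hole. The function name fill_holes is hypothetical, and the sketch omits the limit and downcast options that fillna also accepts.

```python
def fill_holes(values, method):
    """Fill None holes: 'pad'/'ffill' carries the last valid value forward,
    'backfill'/'bfill' uses the next valid value."""
    if method in ("pad", "ffill"):
        order = range(len(values))
    elif method in ("backfill", "bfill"):
        order = range(len(values) - 1, -1, -1)  # walk backwards
    else:
        raise ValueError("unknown method: %r" % method)
    out = list(values)
    last = None
    for i in order:
        if out[i] is None:
            out[i] = last  # stays None if no valid value has been seen yet
        else:
            last = out[i]
    return out

print(fill_holes([None, 1, None, None, 4, None], "ffill"))  # [None, 1, 1, 1, 4, 4]
print(fill_holes([None, 1, None, None, 4, None], "bfill"))  # [1, 1, 4, 4, 4, None]
```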

astype(dtype, copy=True, raise_on_error=True)

Cast object to input numpy.dtype. Return a copy when copy = True (be really careful with this!)

dtype : numpy.dtype or Python type
raise_on_error : raise on invalid input

casted : type of caller

at
at_time(time, asof=False)

Select values at particular time of day (e.g. 9:30AM)

time : datetime.time or string

values_at_time : type of caller

axes
between_time(start_time, end_time, include_start=True, include_end=True)

Select values between particular times of the day (e.g., 9:00-9:30 AM)

start_time : datetime.time or string
end_time : datetime.time or string
include_start : boolean, default True
include_end : boolean, default True

values_between_time : type of caller

bfill(axis=0, inplace=False, limit=None, downcast=None)

Synonym for NDFrame.fillna(method='bfill')

blocks

Internal property, property synonym for as_blocks()

bool()

Return the bool of a single element PandasObject. This must be a boolean scalar value, either True or False.

Raise a ValueError if the PandasObject does not have exactly 1 element, or that element is not boolean

boxplot(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, **kwds)

Make a box plot from DataFrame column/columns optionally grouped (stratified) by one or more columns

data : DataFrame
column : column names or list of names, or vector
Can be any valid input to groupby
by : string or sequence
Column in the DataFrame to group by
ax : matplotlib axis object, default None
fontsize : int or string
rot : int, default 0
Rotation for ticks
grid : boolean, default True
Axis grid lines

ax : matplotlib.axes.AxesSubplot

clip(lower=None, upper=None, out=None)

Trim values at input threshold(s)

lower : float, default None
upper : float, default None

clipped : Series

clip_lower(threshold)

Return copy of the input with values below given value truncated

clip

clipped : same type as input

clip_upper(threshold)

Return copy of input with values above given value truncated

clip

clipped : same type as input

combine(other, func, fill_value=None, overwrite=True)

Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)

other : DataFrame
func : function
fill_value : scalar value
overwrite : boolean, default True

If True then overwrite values for common keys in the calling frame

result : DataFrame

combineAdd(other)

Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)

other : DataFrame

DataFrame

combineMult(other)

Multiply two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)

other : DataFrame

DataFrame

combine_first(other)

Combine two DataFrame objects and default to non-null values in frame calling the method. Result index columns will be the union of the respective indexes and columns

other : DataFrame

a’s values prioritized, use values from b to fill holes:

>>> a.combine_first(b)

combined : DataFrame
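For a single column, the rule "a’s values prioritized, use values from b to fill holes" amounts to the following plain-Python sketch, where dicts stand in for labelled data, None stands in for NaN, and the result covers the union of both label sets. combine_first_sketch is a hypothetical name; pandas additionally aligns the columns, which this sketch omits.

```python
def combine_first_sketch(a, b):
    """Prefer non-missing values from a; fill its holes (and missing labels) from b."""
    result = {}
    for key in sorted(set(a) | set(b)):  # union of both indexes
        value = a.get(key)
        result[key] = value if value is not None else b.get(key)
    return result

a = {1: 10.0, 2: None}
b = {2: 20.0, 3: 30.0}
print(combine_first_sketch(a, b))  # {1: 10.0, 2: 20.0, 3: 30.0}
```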

compound(axis=None, skipna=None, level=None, **kwargs)

Return the compound percentage of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

compounded : Series or DataFrame (if level specified)

consolidate(inplace=False)

Compute NDFrame with “consolidated” internals (data of each dtype grouped together in a single ndarray). Mainly an internal API function, but available here to the savvy user

inplace : boolean, default False
If False return new object, otherwise modify existing object

consolidated : type of caller

convert_objects(convert_dates=True, convert_numeric=False, convert_timedeltas=True, copy=True)

Attempt to infer better dtype for object columns

convert_dates : if True, attempt to soft convert dates; if ‘coerce’, force conversion (and non-convertibles get NaT)
convert_numeric : if True, attempt to coerce to numbers (including strings); non-convertibles get NaN
convert_timedeltas : if True, attempt to soft convert timedeltas; if ‘coerce’, force conversion (and non-convertibles get NaT)

copy : Boolean, if True, return copy, default is True

converted : same as input object

copy(deep=True)

Make a copy of this object

deep : boolean, default True
Make a deep copy, i.e. also copy data

copy : type of caller

corr(method='pearson', min_periods=1)

Compute pairwise correlation of columns, excluding NA/null values

method : {‘pearson’, ‘kendall’, ‘spearman’}
  • pearson : standard correlation coefficient
  • kendall : Kendall Tau correlation coefficient
  • spearman : Spearman rank correlation
min_periods : int, optional
Minimum number of observations required per pair of columns to have a valid result. Currently only available for pearson and spearman correlation

y : DataFrame
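The pearson and spearman options can be sketched in plain Python for two equal-length columns: Pearson is the covariance divided by the product of the standard deviations, and Spearman is Pearson applied to ranks (tied values get the average of their ranks). Missing values (None) are dropped pairwise, mirroring "excluding NA/null values". These helpers are hypothetical, compute a single pair rather than the full matrix, and omit Kendall.

```python
import math

def _complete_pairs(x, y):
    """Drop positions where either value is missing (None)."""
    return [(a, b) for a, b in zip(x, y) if a is not None and b is not None]

def pearson(x, y, min_periods=1):
    """Pearson r over complete pairs; None if too few pairs or zero variance."""
    pairs = _complete_pairs(x, y)
    if len(pairs) < max(min_periods, 2):
        return None
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    if sx == 0 or sy == 0:
        return None
    return cov / (sx * sy)  # the 1/N factors cancel

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1
        i = j + 1
    return ranks

def spearman(x, y, min_periods=1):
    """Spearman rho: Pearson r computed on the ranks of the complete pairs."""
    pairs = _complete_pairs(x, y)
    if len(pairs) < max(min_periods, 2):
        return None
    return pearson(average_ranks([a for a, _ in pairs]),
                   average_ranks([b for _, b in pairs]))

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]      # monotonic but not linear in x
print(pearson(x, y))   # below 1.0
print(spearman(x, y))  # 1.0 up to rounding: the ranks agree exactly
```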

corrwith(other, axis=0, drop=False)

Compute pairwise correlation between rows or columns of two DataFrame objects.

other : DataFrame
axis : {0, 1}

0 to compute column-wise, 1 for row-wise
drop : boolean, default False
Drop missing indices from result, default returns union of all

correls : Series

count(axis=0, level=None, numeric_only=False)

Return Series with number of non-NA/null observations over requested axis. Works with non-floating point data as well (detects NaN and None)

axis : {0, 1}
0 for row-wise, 1 for column-wise
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
numeric_only : boolean, default False
Include only float, int, boolean data

count : Series (or DataFrame if level specified)

cov(min_periods=None)

Compute pairwise covariance of columns, excluding NA/null values

min_periods : int, optional
Minimum number of observations required per pair of columns to have a valid result.

y : DataFrame

y contains the covariance matrix of the DataFrame’s time series. The covariance is normalized by N-1 (unbiased estimator).

cummax(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative max over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA

max : Series

cummin(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative min over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA

min : Series

cumprod(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative prod over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA

prod : Series

cumsum(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative sum over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA

sum : Series
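The skipna behaviour of the cumulative methods can be sketched for cumsum in plain Python: with skipna=True a missing entry stays missing in the output but does not interrupt the running total, while with skipna=False every position from the first missing value onward is missing. float('nan') stands in for NA, and cumsum_sketch is a hypothetical name; cummax, cummin and cumprod follow the same pattern with a different accumulator.

```python
import math

def cumsum_sketch(values, skipna=True):
    """Running total; NaN marks a missing value."""
    out, total = [], 0.0
    for v in values:
        if math.isnan(v):
            if skipna:
                out.append(math.nan)  # keep the hole, but do not reset the total
                continue
            total = math.nan  # NaN poisons every later total
        else:
            total += v
        out.append(total)
    return out

data = [1.0, float("nan"), 2.0, 3.0]
print(cumsum_sketch(data))                # [1.0, nan, 3.0, 6.0]
print(cumsum_sketch(data, skipna=False))  # [1.0, nan, nan, nan]
```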

delevel(*args, **kwargs)
describe(percentile_width=50)

Generate various summary statistics of each column, excluding NaN values. These include: count, mean, std, min, max, and lower%/50%/upper% percentiles

percentile_width : float, optional
width of the desired uncertainty interval, default is 50, which corresponds to lower=25, upper=75

DataFrame of summary statistics
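The percentile_width arithmetic is lower = 50 - width/2 and upper = 50 + width/2, so the default of 50 gives the 25% and 75% percentiles. The sketch below computes those bounds and the matching percentiles with linear interpolation between sorted values; linear interpolation is an assumption here (it is NumPy’s default), and summary_percentiles is a hypothetical name.

```python
def percentile(sorted_vals, q):
    """q-th percentile (0-100) using linear interpolation between neighbours."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def summary_percentiles(values, percentile_width=50):
    """Lower, median and upper percentiles for a given width (None values skipped)."""
    data = sorted(v for v in values if v is not None)
    if not data:
        return {}
    lower = 50 - percentile_width / 2.0
    upper = 50 + percentile_width / 2.0
    return {"%g%%" % q: percentile(data, q) for q in (lower, 50, upper)}

print(summary_percentiles([1, 2, 3, 4, 5, None]))  # {'25%': 2.0, '50%': 3.0, '75%': 4.0}
print(summary_percentiles([1, 2, 3, 4, 5], percentile_width=90))  # keys '5%', '50%', '95%'
```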

diff(periods=1)

1st discrete difference of object

periods : int, default 1
Periods to shift for forming difference

diffed : DataFrame

div(other, axis='columns', level=None, fill_value=None)

Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

divide(other, axis='columns', level=None, fill_value=None)

Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

dot(other)

Matrix multiplication with DataFrame or Series objects

other : DataFrame or Series

dot_product : DataFrame or Series

drop(labels, axis=0, level=None, inplace=False, **kwargs)

Return new object with labels in requested axis removed

labels : single label or list-like
axis : int or axis name
level : int or name, default None

For MultiIndex
inplace : bool, default False
If True, do operation inplace and return None.

dropped : type of caller

drop_duplicates(cols=None, take_last=False, inplace=False)

Return DataFrame with duplicate rows removed, optionally only considering certain columns

cols : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns
take_last : boolean, default False
If True, keep the last occurrence of each group of duplicate rows; by default the first occurrence is kept
inplace : boolean, default False
Whether to drop duplicates in place or to return a copy

deduplicated : DataFrame
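The take_last switch decides which row of each duplicate group survives. Here is a plain-Python sketch over rows stored as dicts, with cols restricting which keys define a duplicate; drop_duplicates_sketch is a hypothetical name, and surviving rows keep their original order.

```python
def drop_duplicates_sketch(rows, cols=None, take_last=False):
    """Keep the first (or, with take_last=True, the last) row of each duplicate group."""
    if isinstance(cols, str):
        cols = [cols]

    def key(row):
        names = cols if cols is not None else sorted(row)  # default: all columns
        return tuple(row[name] for name in names)

    indices = range(len(rows) - 1, -1, -1) if take_last else range(len(rows))
    seen, keep = set(), []
    for i in indices:
        k = key(rows[i])
        if k not in seen:
            seen.add(k)
            keep.append(i)
    return [rows[i] for i in sorted(keep)]  # survivors keep their original order

rows = [{"a": 1, "b": "x"}, {"a": 1, "b": "y"}, {"a": 2, "b": "z"}]
print(drop_duplicates_sketch(rows, cols="a"))                  # rows 0 and 2
print(drop_duplicates_sketch(rows, cols="a", take_last=True))  # rows 1 and 2
```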

dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Return object with labels on given axis omitted where alternately any or all of the data are missing

axis : {0, 1}, or tuple/list thereof
Pass tuple or list to drop on multiple axes
how : {‘any’, ‘all’}
  • any : if any NA values are present, drop that label
  • all : if all values are NA, drop that label
thresh : int, default None
int value : require that many non-NA values
subset : array-like
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include
inplace : boolean, default False
If True, do operation inplace and return None.

dropped : DataFrame
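The how and thresh rules can be sketched for dropping rows in plain Python, with None marking NA and subset naming the columns to inspect. dropna_rows is a hypothetical name, and in this sketch thresh, when given, takes precedence over how (assumed to match pandas’ behaviour).

```python
def dropna_rows(rows, how="any", thresh=None, subset=None):
    """Drop rows (dicts) according to how/thresh; None marks NA."""
    if how not in ("any", "all"):
        raise ValueError("invalid how option: %r" % how)
    kept = []
    for row in rows:
        cols = subset if subset is not None else list(row)
        present = sum(row[c] is not None for c in cols)
        if thresh is not None:
            keep = present >= thresh  # need at least thresh non-NA values
        elif how == "any":
            keep = present == len(cols)  # drop if any value is NA
        else:
            keep = present > 0  # drop only if all values are NA
        if keep:
            kept.append(row)
    return kept

rows = [{"a": 1, "b": 2}, {"a": None, "b": 2}, {"a": None, "b": None}]
print(dropna_rows(rows))             # only the first row
print(dropna_rows(rows, how="all"))  # first two rows
```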

dtypes

Return the dtypes in this object

duplicated(cols=None, take_last=False)

Return boolean Series denoting duplicate rows, optionally only considering certain columns

cols : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns
take_last : boolean, default False
Keep the last observed occurrence of each duplicated row instead of the first. Defaults to keeping the first occurrence

duplicated : Series
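The following plain-Python sketch illustrates how `duplicated` and `drop_duplicates` decide which occurrence counts as the duplicate. Rows are tuples and `cols` holds positions instead of labels; both helpers are hypothetical stand-ins for the DataFrame methods. The `cols`/`take_last` keywords follow the signatures documented here; later pandas releases replaced them with `subset` and `keep`.

```python
from typing import Any, List, Optional, Sequence


def duplicated(rows: List[Sequence[Any]], cols: Optional[Sequence[int]] = None,
               take_last: bool = False) -> List[bool]:
    """Flag duplicate rows; the occurrence that is kept is the first (or last) one."""
    # Key each row on the selected positions only (all positions by default).
    keys = [tuple(r[i] for i in cols) if cols is not None else tuple(r) for r in rows]
    # Scan from the end when the last occurrence should be the one kept.
    order = range(len(rows) - 1, -1, -1) if take_last else range(len(rows))
    seen = set()
    flags = [False] * len(rows)
    for i in order:
        if keys[i] in seen:
            flags[i] = True
        else:
            seen.add(keys[i])
    return flags


def drop_duplicates(rows: List[Sequence[Any]], cols: Optional[Sequence[int]] = None,
                    take_last: bool = False) -> List[Sequence[Any]]:
    """Return the rows not flagged by duplicated()."""
    flags = duplicated(rows, cols, take_last)
    return [r for r, dup in zip(rows, flags) if not dup]


rows = [("a", 1), ("b", 2), ("a", 3)]
print(duplicated(rows, cols=[0]))                       # [False, False, True]
print(duplicated(rows, cols=[0], take_last=True))       # [True, False, False]
print(drop_duplicates(rows, cols=[0], take_last=True))  # [('b', 2), ('a', 3)]
```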

empty

True if NDFrame is entirely empty [no items]

eq(other, axis='columns', level=None)

Wrapper for flexible comparison methods eq

equals(other)

Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.

eval(expr, **kwargs)

Evaluate an expression in the context of the calling DataFrame instance.

expr : string
The expression string to evaluate.
kwargs : dict
See the documentation for pandas.eval() for complete details on the keyword arguments accepted by DataFrame.eval().

ret : ndarray, scalar, or pandas object

pandas.DataFrame.query pandas.eval

For more details see the API documentation for eval(). For detailed examples see enhancing performance with eval.

>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.eval('a + b')
>>> df.eval('c=a + b')
ffill(axis=0, inplace=False, limit=None, downcast=None)

Synonym for NDFrame.fillna(method=’ffill’)

fillna(value=None, method=None, axis=0, inplace=False, limit=None, downcast=None)

Fill NA/NaN values using the specified method

method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed Series:
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill gap
value : scalar, dict, or Series
Value to use to fill holes (e.g. 0), alternately a dict/Series of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series will not be filled). This value cannot be a list.
axis : {0, 1}, default 0
  • 0: fill column-by-column
  • 1: fill row-by-row
inplace : boolean, default False
If True, fill in place. Note: this will modify any other views on this object, (e.g. a no-copy slice for a column in a DataFrame).
limit : int, default None
Maximum size gap to forward or backward fill
downcast : dict, default is None
a dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible)

reindex, asfreq

filled : same type as caller
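A plain-Python sketch of the `pad`/`ffill` and `backfill`/`bfill` rules, including `limit`, on a single list. `None` stands in for NaN and the helper name `fill` is hypothetical; the sketch does not cover `value`, `axis` or `downcast`.

```python
from typing import Any, List, Optional


def fill(values: List[Any], method: str, limit: Optional[int] = None) -> List[Any]:
    """Fill None gaps by propagating the last ('pad'/'ffill') or next
    ('backfill'/'bfill') valid observation, at most `limit` cells per gap."""
    if method in ("backfill", "bfill"):
        # Backward fill is forward fill applied to the reversed sequence.
        return fill(values[::-1], "ffill", limit)[::-1]
    if method not in ("pad", "ffill"):
        raise ValueError("unknown method: %r" % method)
    out = list(values)
    last = None  # most recent valid value seen so far
    run = 0      # number of cells of the current gap already filled
    for i, v in enumerate(out):
        if v is not None:
            last, run = v, 0
        elif last is not None and (limit is None or run < limit):
            out[i] = last
            run += 1
    return out


s = [1, None, None, 4, None]
print(fill(s, "ffill"))           # [1, 1, 1, 4, 4]
print(fill(s, "bfill"))           # [1, 4, 4, 4, None]
print(fill(s, "ffill", limit=1))  # [1, 1, None, 4, 4]
```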

filter(items=None, like=None, regex=None, axis=None)

Restrict the info axis to a set of items or a wildcard

items : list-like
List of info axis labels to restrict to (not all of them need to be present)
like : string
Keep info axis where “arg in col == True”
regex : string (regular expression)
Keep info axis with re.search(regex, col) == True

Arguments are mutually exclusive, but this is not checked for

filter_result(expressions)

Filters a result data frame based on a specified expression consisting of a list of triples (method_name, comparator, threshold). The expression is applied to each row. If any of the columns fulfills the criteria, the row is kept.

Parameters:expressions (list((str,comparator,float))) – A list of triples consisting of (method_name, comparator, threshold)
Returns:Filtered result object
Return type:EpitopePredictionResult
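A minimal sketch of the row rule described for filter_result, written against plain dictionaries rather than a real EpitopePredictionResult. The layout (one row per (peptide, method) pair with one score column per allele), the method name `method_a`, the scores and the helper `filter_peptides` are all assumptions for illustration, and combining several triples with a logical AND is likewise assumed rather than taken from Fred2's source.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Hypothetical layout: one row per (peptide, method) pair, one score per allele column.
Scores = Dict[Tuple[str, str], Dict[str, float]]
Expression = Tuple[str, Callable[[float, float], bool], float]


def filter_peptides(scores: Scores, expressions: List[Expression]) -> List[str]:
    """Keep a peptide if, for each (method, comparator, threshold) triple, at least
    one allele column of that method's row satisfies comparator(score, threshold).
    Combining the triples with AND is an assumption of this sketch."""
    peptides = sorted({peptide for peptide, _ in scores})
    return [
        peptide
        for peptide in peptides
        if all(
            any(comparator(score, threshold)
                for score in scores.get((peptide, method), {}).values())
            for method, comparator, threshold in expressions
        )
    ]


# Made-up scores for illustration only.
scores = {
    ("SYFPEITHI", "method_a"): {"A*02:01": 0.8, "B*07:02": 0.1},
    ("KLLEEVLKL", "method_a"): {"A*02:01": 0.2, "B*07:02": 0.3},
}
print(filter_peptides(scores, [("method_a", operator.ge, 0.5)]))   # ['SYFPEITHI']
print(filter_peptides(scores, [("method_a", operator.lt, 0.25)]))  # ['KLLEEVLKL', 'SYFPEITHI']
```

The second call shows that passing a comparator such as `operator.lt` turns the threshold into an upper bound.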
first(offset)

Convenience method for subsetting initial periods of time series data based on a date offset

offset : string, DateOffset, dateutil.relativedelta

ts.first(‘10D’) -> First 10 days

subset : type of caller

first_valid_index()

Return label for first non-NA/null value

floordiv(other, axis='columns', level=None, fill_value=None)

Binary operator floordiv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
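The alignment and fill_value rules shared by the flexible binary operators (floordiv, mod, mul, pow and the others) can be sketched with two dictionaries standing in for labelled data. This is an illustration under assumptions, not pandas code: a missing value is modelled as `None` or an absent label, and `floordiv_aligned` is a hypothetical name. Labels are unioned, fill_value replaces a value missing on one side only, and a location missing on both sides stays missing.

```python
import math
from typing import Dict, Hashable, Optional

Data = Dict[Hashable, Optional[float]]


def floordiv_aligned(left: Data, right: Data,
                     fill_value: Optional[float] = None) -> Dict[Hashable, float]:
    """left // right on the union of both label sets."""
    result: Dict[Hashable, float] = {}
    for key in sorted(set(left) | set(right)):
        a, b = left.get(key), right.get(key)  # None: absent label or missing value
        # fill_value only replaces a value that is missing on exactly one side.
        if fill_value is not None and (a is None) != (b is None):
            a = fill_value if a is None else a
            b = fill_value if b is None else b
        result[key] = math.nan if a is None or b is None else a // b
    return result


left = {"x": 7, "y": 9, "w": None}
right = {"x": 2, "z": 4, "w": None}
print(floordiv_aligned(left, right))                # {'w': nan, 'x': 3, 'y': nan, 'z': nan}
print(floordiv_aligned(left, right, fill_value=1))  # {'w': nan, 'x': 3, 'y': 9, 'z': 0}
```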

from_csv(path, header=0, sep=',', index_col=0, parse_dates=True, encoding=None, tupleize_cols=False, infer_datetime_format=False)

Read delimited file into DataFrame

path : string file path or file handle / StringIO
header : int, default 0
Row to use as header (skip prior rows)
sep : string, default ‘,’
Field delimiter
index_col : int or sequence, default 0
Column to use for index. If a sequence is given, a MultiIndex is used. Different default from read_table
parse_dates : boolean, default True
Parse dates. Different default from read_table
tupleize_cols : boolean, default False
Write multi_index columns as a list of tuples (if True) or in the new, expanded format (if False)
infer_datetime_format : boolean, default False
If True and parse_dates is True for a column, try to infer the datetime format based on the first datetime string. If the format can be inferred, there often will be a large parsing speed-up.

It is preferable to use read_table for most general purposes, but from_csv makes for an easy round trip to and from a file, especially with a DataFrame of time series data.

y : DataFrame

from_dict(data, orient='columns', dtype=None)

Construct DataFrame from dict of array-like or dicts

data : dict
{field : array-like} or {field : dict}
orient : {‘columns’, ‘index’}, default ‘columns’
The “orientation” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.

DataFrame

from_items(items, columns=None, orient='columns')

Convert (key, value) pairs to DataFrame. The keys will be the axis index (usually the columns, but depends on the specified orientation). The values should be arrays or Series.

items : sequence of (key, value) pairs
Values should be arrays or Series.
columns : sequence of column labels, optional
Must be passed if orient=’index’.
orient : {‘columns’, ‘index’}, default ‘columns’
The “orientation” of the data. If the keys of the input correspond to column labels, pass ‘columns’ (default). Otherwise if the keys correspond to the index, pass ‘index’.

frame : DataFrame

from_records(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)

Convert structured or record ndarray to DataFrame

data : ndarray (structured dtype), list of tuples, dict, or DataFrame
index : string, list of fields, array-like
Field of array to use as the index, alternately a specific set of input labels to use
exclude : sequence, default None
Columns or fields to exclude
columns : sequence, default None
Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns)
coerce_float : boolean, default False
Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets

df : DataFrame

ftypes

Return the ftypes (indication of sparse/dense and dtype) in this object.

ge(other, axis='columns', level=None)

Wrapper for flexible comparison methods ge

get(key, default=None)

Get item from object for given key (DataFrame column, Panel slice, etc.). Returns default value if not found

key : object

value : type of items contained in object

get_dtype_counts()

Return the counts of dtypes in this object

get_ftype_counts()

Return the counts of ftypes in this object

get_value(index, col)

Quickly retrieve single value at passed column and index

index : row label
col : column label

value : scalar value

get_values()

Same as values (but handles sparseness conversions)

groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False)

Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns

by : mapping function / list of functions, dict, Series, or tuple / list of column names
Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups

axis : int, default 0
level : int, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels
as_index : boolean, default True
For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output
sort : boolean, default True
Sort group keys. Get better performance by turning this off
group_keys : boolean, default True
When calling apply, add group keys to index to identify pieces
squeeze : boolean, default False
Reduce the dimensionality of the return type if possible, otherwise return a consistent type

# DataFrame result
>>> data.groupby(func, axis=0).mean()

# Series result (hierarchical index)
>>> data.groupby(['col1', 'col2'])['col3'].mean()

# DataFrame with hierarchical index
>>> data.groupby(['col1', 'col2']).mean()

GroupBy object
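The split-apply-combine pattern behind `data.groupby(['col1', 'col2'])['col3'].mean()` can be sketched in plain Python. Rows are dicts and `groupby_mean` is a hypothetical helper that only computes a mean over one value column; it does not model `as_index`, `group_keys` or `squeeze`.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List, Sequence, Tuple


def groupby_mean(rows: List[Dict[str, Any]], by: Sequence[str], value: str,
                 sort: bool = True) -> Dict[Tuple[Any, ...], float]:
    """Split rows on the `by` columns, average `value` in each group, combine."""
    groups: Dict[Tuple[Any, ...], List[float]] = defaultdict(list)
    for row in rows:                                 # split
        groups[tuple(row[c] for c in by)].append(row[value])
    keys = sorted(groups) if sort else list(groups)  # sort=True sorts the group keys
    return {k: mean(groups[k]) for k in keys}        # apply + combine


data = [
    {"col1": "a", "col2": 1, "col3": 2.0},
    {"col1": "a", "col2": 1, "col3": 4.0},
    {"col1": "b", "col2": 2, "col3": 5.0},
]
print(groupby_mean(data, ["col1", "col2"], "col3"))  # {('a', 1): 3.0, ('b', 2): 5.0}
```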

gt(other, axis='columns', level=None)

Wrapper for flexible comparison methods gt

head(n=5)

Returns first n rows

hist(data, column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, **kwds)

Draw a histogram of each of the DataFrame’s series using matplotlib / pylab.

data : DataFrame
column : string or sequence
If passed, will be used to limit data to a subset of columns
by : object, optional
If passed, then used to form histograms for separate groups
grid : boolean, default True
Whether to show axis grid lines
xlabelsize : int, default None
If specified changes the x-axis label size
xrot : float, default None
rotation of x axis labels
ylabelsize : int, default None
If specified changes the y-axis label size
yrot : float, default None
rotation of y axis labels

ax : matplotlib axes object, default None
sharex : bool
If True, the X axis will be shared amongst all subplots.
sharey : bool
If True, the Y axis will be shared amongst all subplots.
figsize : tuple
The size of the figure to create in inches by default
layout : (optional)
A tuple (rows, columns) for the layout of the histograms
kwds : other plotting keyword arguments
To be passed to hist function
iat
icol(i)
idxmax(axis=0, skipna=True)

Return index of first occurrence of maximum over requested axis. NA/null values are excluded.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

idxmax : Series

This method is the DataFrame version of ndarray.argmax.

Series.idxmax
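A plain-Python sketch of the idxmax rule on one column: the first label holding the maximum wins, NaNs are skipped when `skipna=True`, and `None` stands in for an NA result. The helper is hypothetical, and returning NA as soon as a NaN is seen with `skipna=False` is an assumption of this sketch. idxmin works the same way with the comparison flipped.

```python
import math
from typing import Hashable, Optional, Sequence


def idxmax(labels: Sequence[Hashable], values: Sequence[float],
           skipna: bool = True) -> Optional[Hashable]:
    """Label of the first occurrence of the maximum, or None (standing in for NA)."""
    best_label: Optional[Hashable] = None
    best_value: Optional[float] = None
    for label, value in zip(labels, values):
        if math.isnan(value):
            if skipna:
                continue  # NA/null values are excluded
            return None   # assumed: with skipna=False, any NA makes the result NA
        if best_value is None or value > best_value:  # strict '>' keeps the first tie
            best_label, best_value = label, value
    return best_label     # stays None if every value was NA


nan = math.nan
print(idxmax(["a", "b", "c", "d"], [1.0, nan, 3.0, 3.0]))                # c
print(idxmax(["a", "b", "c", "d"], [1.0, nan, 3.0, 3.0], skipna=False))  # None
print(idxmax(["a", "b"], [nan, nan]))                                    # None
```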

idxmin(axis=0, skipna=True)

Return index of first occurrence of minimum over requested axis. NA/null values are excluded.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

idxmin : Series

This method is the DataFrame version of ndarray.argmin.

Series.idxmin

iget_value(i, j)
iloc
info(verbose=True, buf=None, max_cols=None)

Concise summary of a DataFrame.

verbose : boolean, default True
If False, don’t print column count summary

buf : writable buffer, defaults to sys.stdout
max_cols : int, default None
Determines whether full summary or short summary is printed
insert(loc, column, value, allow_duplicates=False)

Insert column into DataFrame at specified location.

If allow_duplicates is False, raises Exception if column is already contained in the DataFrame.

loc : int
Must have 0 <= loc <= len(columns)

column : object
value : int, Series, or array-like

interpolate(method='linear', axis=0, limit=None, inplace=False, downcast='infer', **kwargs)

Interpolate values according to different methods.

method : {‘linear’, ‘time’, ‘values’, ‘index’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘krogh’, ‘polynomial’, ‘spline’, ‘piecewise_polynomial’, ‘pchip’}
  • ‘linear’: ignore the index and treat the values as equally spaced. default
  • ‘time’: interpolation works on daily and higher resolution data to interpolate given length of interval
  • ‘index’: use the actual numerical values of the index
  • ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘polynomial’ are passed to scipy.interpolate.interp1d. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=4)
  • ‘krogh’, ‘piecewise_polynomial’, ‘spline’, and ‘pchip’ are all wrappers around the scipy interpolation methods of similar names. See the scipy documentation for more on their behavior: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
axis : {0, 1}, default 0
  • 0: fill column-by-column
  • 1: fill row-by-row
limit : int, default None
Maximum number of consecutive NaNs to fill.
inplace : bool, default False
Update the NDFrame in place if possible.
downcast : optional, ‘infer’ or None, defaults to ‘infer’
Downcast dtypes if possible.

Series or DataFrame of same shape interpolated at the NaNs

reindex, replace, fillna

# Filling in NaNs:
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0    0
1    1
2    2
3    3
dtype: float64
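The default method='linear' treats the values as equally spaced and draws a straight line across each gap. The sketch below illustrates that for interior gaps only, with `limit` capping how many consecutive NaNs are filled per gap. It is a hypothetical helper, not pandas code, and it deliberately leaves leading and trailing NaNs untouched; pandas' handling of the edges is not modelled.

```python
import math
from typing import List, Optional


def interpolate_linear(values: List[float], limit: Optional[int] = None) -> List[float]:
    """Fill interior NaN gaps along a straight line, treating points as equally spaced."""
    out = list(values)
    valid = [i for i, v in enumerate(out) if not math.isnan(v)]
    # Walk each pair of neighbouring valid points; positions in between are NaN.
    for left, right in zip(valid, valid[1:]):
        step = (out[right] - out[left]) / (right - left)
        for k, i in enumerate(range(left + 1, right), start=1):
            if limit is not None and k > limit:
                break  # at most `limit` consecutive NaNs are filled per gap
            out[i] = out[left] + step * k
    return out


print(interpolate_linear([0.0, 1.0, math.nan, 3.0]))  # [0.0, 1.0, 2.0, 3.0]
```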

irow(i, copy=False)
is_copy = None
isin(values)

Return boolean DataFrame showing whether each element in the DataFrame is contained in values.

values : iterable, Series, DataFrame or dictionary
The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dictionary, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.

DataFrame of booleans

When values is a list:

>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> df.isin([1, 3, 12, 'a'])
       A      B
0   True   True
1  False  False
2   True  False

When values is a dict:

>>> df = DataFrame({'A': [1, 2, 3], 'B': [1, 4, 7]})
>>> df.isin({'A': [1, 3], 'B': [4, 7, 12]})
       A      B
0   True  False  # Note that B didn't match the 1 here.
1  False   True
2   True   True

When values is a Series or DataFrame:

>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> other = DataFrame({'A': [1, 3, 3, 2], 'B': ['e', 'f', 'f', 'e']})
>>> df.isin(other)
       A      B
0   True  False
1  False  False  # Column A in `other` has a 3, but not at index 1.
2   True   True
isnull()

Return a boolean same-sized object indicating if the values are null

iteritems()

Iterator over (column, series) pairs

iterkv(*args, **kwargs)

iteritems alias used to get around 2to3. Deprecated

iterrows()

Iterate over rows of DataFrame as (index, Series) pairs.

  • iterrows does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). For example,

    >>> df = DataFrame([[1, 1.0]], columns=['x', 'y'])
    >>> row = next(df.iterrows())[1]
    >>> print(row['x'].dtype)
    float64
    >>> print(df['x'].dtype)
    int64
    
it : generator
A generator that iterates over the rows of the frame.
itertuples(index=True)

Iterate over rows of DataFrame as tuples, with index value as first element of the tuple

ix
join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

Join columns with another DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

other : DataFrame, Series with name field set, or list of DataFrame
Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame
on : column name, tuple/list of column names, or array-like
Column(s) to use for joining, otherwise join on index. If multiple columns are given, the passed DataFrame must have a MultiIndex. Can pass an array as the join key if not already contained in the calling DataFrame. Like an Excel VLOOKUP operation
how : {‘left’, ‘right’, ‘outer’, ‘inner’}

How to handle indexes of the two objects. Default: ‘left’ for joining on index, None otherwise

  • left: use calling frame’s index
  • right: use input frame’s index
  • outer: form union of indexes
  • inner: use intersection of indexes
lsuffix : string
Suffix to use from left frame’s overlapping columns
rsuffix : string
Suffix to use from right frame’s overlapping columns
sort : boolean, default False
Order result DataFrame lexicographically by the join key. If False, preserves the index order of the calling (left) DataFrame

on, lsuffix, and rsuffix options are not supported when passing a list of DataFrame objects

joined : DataFrame

keys()

Get the ‘info axis’ (see Indexing for more)

This is index for Series, columns for DataFrame and major_axis for Panel.

kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased kurtosis over requested axis. Normalized by N-1

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

kurt : Series or DataFrame (if level specified)

kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased kurtosis over requested axis. Normalized by N-1

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

kurt : Series or DataFrame (if level specified)
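Assuming the unbiased kurtosis returned by kurt/kurtosis is the usual bias-corrected sample excess kurtosis (Fisher's G2, the estimator that scipy.stats.kurtosis(x, bias=False) computes), the calculation for one column looks like the sketch below. The helper name is hypothetical; it needs at least four values and does not guard against constant input (zero variance).

```python
from typing import Sequence


def kurt(values: Sequence[float]) -> float:
    """Bias-corrected sample excess kurtosis (G2) of one column."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four values are required")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    # G2 = ((n^2 - 1) * m4 / m2^2 - 3 * (n - 1)^2) / ((n - 2) * (n - 3))
    return ((n * n - 1) * m4 / (m2 * m2) - 3 * (n - 1) ** 2) / ((n - 2) * (n - 3))


print(round(kurt([1, 2, 3, 4, 5]), 10))  # -1.2
```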

last(offset)

Convenience method for subsetting final periods of time series data based on a date offset

offset : string, DateOffset, dateutil.relativedelta

ts.last(‘5M’) -> Last 5 months

subset : type of caller

last_valid_index()

Return label for last non-NA/null value

le(other, axis='columns', level=None)

Wrapper for flexible comparison methods le

load(path)

Deprecated. Use read_pickle instead.

loc
lookup(row_labels, col_labels)

Label-based “fancy indexing” function for DataFrame. Given equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair.

row_labels : sequence
The row labels to use for lookup
col_labels : sequence
The column labels to use for lookup

Akin to:

result = []
for row, col in zip(row_labels, col_labels):
    result.append(df.get_value(row, col))
values : ndarray
The found values
lt(other, axis='columns', level=None)

Wrapper for flexible comparison methods lt

mad(axis=None, skipna=None, level=None, **kwargs)

Return the mean absolute deviation of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

mad : Series or DataFrame (if level specified)
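mad is the mean absolute deviation around the mean (not the median absolute deviation). A one-column sketch in plain Python, with a hypothetical helper name and no NA handling:

```python
from typing import Sequence


def mad(values: Sequence[float]) -> float:
    """Mean of the absolute deviations from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


print(mad([1, 2, 3, 4, 10]))  # 2.4
```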

mask(cond)

Returns a copy whose values are replaced with NaN where cond is True

cond : boolean NDFrame or array

wh: same as input
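mask is the complement of where: where keeps values where the condition holds, mask blanks them out. A plain-Python sketch on lists, with hypothetical helper names and `math.nan` as the replacement value:

```python
import math
from typing import List, Sequence


def where(values: Sequence[float], cond: Sequence[bool]) -> List[float]:
    """Keep each value where cond is True; replace it with NaN elsewhere."""
    return [v if c else math.nan for v, c in zip(values, cond)]


def mask(values: Sequence[float], cond: Sequence[bool]) -> List[float]:
    """Replace each value with NaN where cond is True; keep it elsewhere."""
    return where(values, [not c for c in cond])


vals = [1.0, 2.0, 3.0]
print(mask(vals, [v > 1.5 for v in vals]))   # [1.0, nan, nan]
print(where(vals, [v > 1.5 for v in vals]))  # [nan, 2.0, 3.0]
```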

max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

This method returns the maximum of the values in the object. If you want the index of the maximum, use idxmax, which is the equivalent of the numpy.ndarray method argmax.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

max : Series or DataFrame (if level specified)

mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the mean of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

mean : Series or DataFrame (if level specified)

median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the median of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

median : Series or DataFrame (if level specified)

merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True)

Merge DataFrame objects by performing a database-style join operation by columns or indexes.

If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on.

right : DataFrame
how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’
  • left: use only keys from left frame (SQL: left outer join)
  • right: use only keys from right frame (SQL: right outer join)
  • outer: use union of keys from both frames (SQL: full outer join)
  • inner: use intersection of keys from both frames (SQL: inner join)
on : label or list
Field names to join on. Must be found in both DataFrames. If on is None and not merging on indexes, then it merges on the intersection of the columns by default.
left_on : label or list, or array-like
Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns
right_on : label or list, or array-like
Field names to join on in right DataFrame or vector/list of vectors per left_on docs
left_index : boolean, default False
Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels
right_index : boolean, default False
Use the index from the right DataFrame as the join key. Same caveats as left_index
sort : boolean, default False
Sort the join keys lexicographically in the result DataFrame
suffixes : 2-length sequence (tuple, list, ...)
Suffix to apply to overlapping column names in the left and right side, respectively
copy : boolean, default True
If False, do not copy data unnecessarily
>>> A              >>> B
    lkey value         rkey value
0   foo  1         0   foo  5
1   bar  2         1   bar  6
2   baz  3         2   qux  7
3   foo  4         3   bar  8
>>> merge(A, B, left_on='lkey', right_on='rkey', how='outer')
   lkey  value_x  rkey  value_y
0  bar   2        bar   6
1  bar   2        bar   8
2  baz   3        NaN   NaN
3  foo   1        foo   5
4  foo   4        foo   5
5  NaN   NaN      qux   7

merged : DataFrame
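One common way to perform the database-style join described for merge is a hash join: index the right table by key, then probe it with each left row. The sketch below applies that to the lkey/rkey tables from the example, using (key, value) tuples and `None` for missing cells. It illustrates the `how` semantics only; it is not a claim about how pandas implements merge, and `hash_merge` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Pair = Tuple[Any, Any]  # (key, value) row of one table
Joined = Tuple[Optional[Any], Optional[Any], Optional[Any], Optional[Any]]


def hash_merge(left: List[Pair], right: List[Pair], how: str = "inner") -> List[Joined]:
    """Join two (key, value) tables on their key column; None marks a missing side."""
    if how not in ("inner", "left", "right", "outer"):
        raise ValueError("how must be 'inner', 'left', 'right' or 'outer'")
    lookup: Dict[Any, List[Pair]] = defaultdict(list)  # build: key -> right rows
    for row in right:
        lookup[row[0]].append(row)
    out: List[Joined] = []
    matched_keys = set()
    for lkey, lval in left:                            # probe with each left row
        matches = lookup.get(lkey, [])
        for rkey, rval in matches:
            out.append((lkey, lval, rkey, rval))
        if matches:
            matched_keys.add(lkey)
        elif how in ("left", "outer"):
            out.append((lkey, lval, None, None))       # keep unmatched left row
    if how in ("right", "outer"):
        for rkey, rval in right:
            if rkey not in matched_keys:
                out.append((None, None, rkey, rval))   # keep unmatched right row
    return out


A = [("foo", 1), ("bar", 2), ("baz", 3), ("foo", 4)]
B = [("foo", 5), ("bar", 6), ("qux", 7), ("bar", 8)]
for row in hash_merge(A, B, how="outer"):
    print(row)
```

The outer join yields the same six rows as the example above, but in left-table order with the unmatched right row last, rather than sorted by key.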

merge_results(others)

Merges results of type EpitopePredictionResult and returns the merged result

Parameters:others (list(EpitopePredictionResult)/EpitopePredictionResult) – Another EpitopePredictionResult or a list of EpitopePredictionResult objects
Returns:A new merged EpitopePredictionResult object
Return type:EpitopePredictionResult
min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

This method returns the minimum of the values in the object. If you want the index of the minimum, use idxmin, which is the equivalent of the numpy.ndarray method argmin.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

min : Series or DataFrame (if level specified)

mod(other, axis='columns', level=None, fill_value=None)

Binary operator mod with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

mode(axis=0, numeric_only=False)

Gets the mode(s) of each column or row along the selected axis. Empty if nothing has 2+ occurrences. Adds a row for each mode per label, filling in gaps with NaN.

axis : {0, 1, ‘index’, ‘columns’} (default 0)
  • 0/’index’ : get mode of each column
  • 1/’columns’ : get mode of each row
numeric_only : boolean, default False
if True, only apply to numeric columns

modes : DataFrame (sorted)
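A sketch of the per-column rule as stated above: keep every value that reaches the highest count, sorted, and return nothing when no value occurs at least twice. The helper is hypothetical and covers a single column only.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def modes(values: Sequence[Hashable]) -> List[Hashable]:
    """Most frequent value(s), sorted; empty when no value occurs 2+ times."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top < 2:
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 2, 2, 3, 3]))  # [2, 3]
print(modes([1, 2, 3]))        # []
```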

mul(other, axis='columns', level=None, fill_value=None)

Binary operator mul with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

multiply(other, axis='columns', level=None, fill_value=None)

Binary operator mul with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

ndim

Number of axes / array dimensions

ne(other, axis='columns', level=None)

Wrapper for flexible comparison methods ne

notnull()

Return a boolean same-sized object indicating if the values are not null

pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwds)

Percent change over given number of periods

periods : int, default 1
Periods to shift for forming percent change
fill_method : str, default ‘pad’
How to handle NAs before computing percent changes
limit : int, default None
The number of consecutive NAs to fill before stopping
freq : DateOffset, timedelta, or offset alias string, optional
Increment to use from time series API (e.g. ‘M’ or BDay())

chg : same type as caller
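A plain-Python sketch of pct_change with the default fill_method='pad': gaps are padded forward first, then each value is divided by the one `periods` steps earlier. `None` stands in for NaN, the helper is hypothetical, and `limit`, `freq` and zero denominators are not handled.

```python
from typing import List, Optional


def pct_change(values: List[Optional[float]], periods: int = 1) -> List[Optional[float]]:
    """values[t] / values[t - periods] - 1, after padding gaps forward."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:  # fill_method='pad': carry the last valid value forward
        if v is not None:
            last = v
        filled.append(last)
    out: List[Optional[float]] = []
    for t, v in enumerate(filled):
        prev = filled[t - periods] if t >= periods else None
        # No earlier observation (or nothing to pad with) gives a missing result.
        out.append(None if v is None or prev is None else v / prev - 1)
    return out


print(pct_change([10.0, 20.0, None, 30.0]))  # [None, 1.0, 0.0, 0.5]
```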

pivot(index=None, columns=None, values=None)

Reshape data (produce a “pivot” table) based on column values. Uses unique values from index / columns to form axes and return either DataFrame or Panel, depending on whether you request a single value column (DataFrame) or all columns (Panel)

index : string or object
Column name to use to make new frame’s index
columns : string or object
Column name to use to make new frame’s columns
values : string or object, optional
Column name to use for populating new frame’s values

For finer-tuned control, see hierarchical indexing documentation along with the related stack/unstack methods

>>> df
    foo   bar  baz
0   one   A    1.
1   one   B    2.
2   one   C    3.
3   two   A    4.
4   two   B    5.
5   two   C    6.
>>> df.pivot('foo', 'bar', 'baz')
     A   B   C
one  1   2   3
two  4   5   6
>>> df.pivot('foo', 'bar')['baz']
     A   B   C
one  1   2   3
two  4   5   6
pivoted : DataFrame
If no values column specified, will have hierarchically indexed columns
pivot_table(data, values=None, rows=None, cols=None, aggfunc='mean', fill_value=None, margins=False, dropna=True)

Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame

data : DataFrame
values : column to aggregate, optional
rows : list of column names or arrays to group on
Keys to group on the x-axis of the pivot table
cols : list of column names or arrays to group on
Keys to group on the y-axis of the pivot table
aggfunc : function, default numpy.mean, or list of functions
If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)
fill_value : scalar, default None
Value to replace missing values with
margins : boolean, default False
Add all row / columns (e.g. for subtotal / grand totals)
dropna : boolean, default True
Do not include columns whose entries are all NaN
>>> df
   A   B   C      D
0  foo one small  1
1  foo one large  2
2  foo one large  2
3  foo two small  3
4  foo two small  3
5  bar one large  4
6  bar one small  5
7  bar two small  6
8  bar two large  7
>>> table = pivot_table(df, values='D', rows=['A', 'B'],
...                     cols=['C'], aggfunc=np.sum)
>>> table
          small  large
foo  one  1      4
     two  6      NaN
bar  one  5      4
     two  6      7

table : DataFrame
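The pivot_table example above can be reproduced in plain Python by grouping on the row keys plus the column key and aggregating each cell. In this sketch `row_keys`/`col_key` mirror the documented rows/cols arguments, the result is a nested dict rather than a DataFrame, and cells with no data are simply absent (they show as NaN in the pandas output). All names are hypothetical, and `aggfunc` defaults to `sum` here to match the example, whereas the documented default is the mean.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

RowKey = Tuple[Any, ...]


def pivot_table(rows: List[Dict[str, Any]], values: str, row_keys: List[str],
                col_key: str,
                aggfunc: Callable[[Iterable[Any]], Any] = sum) -> Dict[RowKey, Dict[Any, Any]]:
    """Aggregate `values` for every (row-key tuple, column key) cell that has data."""
    cells: Dict[Tuple[RowKey, Any], List[Any]] = defaultdict(list)
    for r in rows:
        cells[(tuple(r[k] for k in row_keys), r[col_key])].append(r[values])
    table: Dict[RowKey, Dict[Any, Any]] = defaultdict(dict)
    for (row_key, col), cell_values in cells.items():
        table[row_key][col] = aggfunc(cell_values)  # empty cells are never created
    return dict(table)


records = [
    ("foo", "one", "small", 1), ("foo", "one", "large", 2), ("foo", "one", "large", 2),
    ("foo", "two", "small", 3), ("foo", "two", "small", 3), ("bar", "one", "large", 4),
    ("bar", "one", "small", 5), ("bar", "two", "small", 6), ("bar", "two", "large", 7),
]
df = [dict(zip("ABCD", rec)) for rec in records]
table = pivot_table(df, values="D", row_keys=["A", "B"], col_key="C")
print(table[("foo", "one")])  # {'small': 1, 'large': 4}
print(table[("foo", "two")])  # {'small': 6}
```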

plot(frame=None, x=None, y=None, subplots=False, sharex=True, sharey=False, use_index=True, figsize=None, grid=None, legend=True, rot=None, ax=None, style=None, title=None, xlim=None, ylim=None, logx=False, logy=False, xticks=None, yticks=None, kind='line', sort_columns=False, fontsize=None, secondary_y=False, **kwds)

Make line, bar, or scatter plots of DataFrame series with the index on the x-axis using matplotlib / pylab.

frame : DataFrame
x : label or position, default None
y : label or position, default None
Allows plotting of one column versus another
subplots : boolean, default False
Make separate subplots for each time series
sharex : boolean, default True
In case subplots=True, share x axis
sharey : boolean, default False
In case subplots=True, share y axis
use_index : boolean, default True
Use index as ticks for x axis
stacked : boolean, default False
If True, create stacked bar plot. Only valid for DataFrame input
sort_columns : boolean, default False
Sort column names to determine plot ordering
title : string
Title to use for the plot
grid : boolean, default None (matlab style default)
Axis grid lines
legend : boolean, default True
Place legend on axis subplots

ax : matplotlib axis object, default None
style : list or dict
matplotlib line style per column
kind : {‘line’, ‘bar’, ‘barh’, ‘kde’, ‘density’, ‘scatter’}
  • line : line plot (default)
  • bar : vertical bar plot
  • barh : horizontal bar plot
  • kde/density : Kernel Density Estimation plot
  • scatter : scatter plot
logx : boolean, default False
For line plots, use log scaling on x axis
logy : boolean, default False
For line plots, use log scaling on y axis
xticks : sequence
Values to use for the xticks
yticks : sequence
Values to use for the yticks

xlim : 2-tuple/list
ylim : 2-tuple/list
rot : int, default None
Rotation for ticks
secondary_y : boolean or sequence, default False
Whether to plot on the secondary y-axis. If a list/tuple, which columns to plot on the secondary y-axis
mark_right : boolean, default True
When using a secondary_y axis, whether the legend should automatically mark which columns are plotted on the secondary axis
colormap : str or matplotlib colormap object, default None
Colormap to select colors from. If string, load colormap with that name from matplotlib.
kwds : keywords
Options to pass to matplotlib plotting method

Returns: ax_or_axes : matplotlib.AxesSubplot or list of them

pop(item)

Return item and drop from frame. Raise KeyError if not found.

pow(other, axis='columns', level=None, fill_value=None)

Binary operator pow with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Notes: Mismatched indices will be unioned together

Returns: result : DataFrame
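
The alignment and fill_value rules shared by pow and the other flexible binary operators can be sketched with dictionaries standing in for labelled data. This sketch illustrates the documented semantics and is not pandas' implementation. flex_binop is a hypothetical helper, and None plays the role of NaN.

```python
import operator


def flex_binop(left, right, op, fill_value=None):
    """Align two label -> value dicts on the union of their labels and apply op.

    None plays the role of NaN (a missing value) in this sketch.
    """
    result = {}
    for label in sorted(set(left) | set(right)):
        a = left.get(label)
        b = right.get(label)
        if a is None and b is None:
            # Missing on both sides: the result stays missing even with fill_value.
            result[label] = None
            continue
        # Missing on one side only: substitute fill_value (which may itself be None).
        if a is None:
            a = fill_value
        if b is None:
            b = fill_value
        result[label] = None if a is None or b is None else op(a, b)
    return result


left = {"x": 2, "y": 3}
right = {"y": 2, "z": 4}
print(flex_binop(left, right, operator.pow))                # {'x': None, 'y': 9, 'z': None}
print(flex_binop(left, right, operator.pow, fill_value=1))  # {'x': 2, 'y': 9, 'z': 1}
```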

prod(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the product of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), compute the product along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

Returns: prod : Series or DataFrame (if level specified)
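
The skipna rule above can be sketched in plain Python, with NaN marking a missing value. reduce_skipna is a hypothetical helper. Pass math.prod for a product or math.fsum for a sum (math.prod needs Python 3.8 or later). The sketch follows this docstring, where an all-NA input yields NA.

```python
import math


def reduce_skipna(values, reducer, skipna=True):
    """Reduce a list of floats with reducer, treating NaN as a missing value."""
    present = [v for v in values if not math.isnan(v)]
    if not present:
        return math.nan  # entire row/column is NA -> result is NA
    if not skipna and len(present) != len(values):
        return math.nan  # NA values are not excluded, so they propagate
    return reducer(present)


print(reduce_skipna([2.0, math.nan, 3.0], math.prod))                # 6.0
print(reduce_skipna([2.0, math.nan, 3.0], math.prod, skipna=False))  # nan
```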

product(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the product of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), compute the product along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

Returns: prod : Series or DataFrame (if level specified)

quantile(q=0.5, axis=0, numeric_only=True)

Return values at the given quantile over requested axis, a la scoreatpercentile in scipy.stats

q : quantile, default 0.5 (50% quantile)
0 <= q <= 1
axis : {0, 1}
0 for row-wise, 1 for column-wise

Returns: quantiles : Series
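
The quantile of a single column can be sketched as follows. The sketch assumes linear interpolation between the two nearest ranks, which is the default behaviour of scipy's scoreatpercentile. quantile here is a hypothetical stand-alone function operating on a plain list, not the DataFrame method.

```python
def quantile(values, q=0.5):
    """Quantile of a non-empty list using linear interpolation (0 <= q <= 1)."""
    if not 0 <= q <= 1:
        raise ValueError("q must satisfy 0 <= q <= 1")
    data = sorted(values)
    if not data:
        raise ValueError("quantile of an empty sequence")
    pos = q * (len(data) - 1)  # fractional position in the sorted data
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + (data[upper] - data[lower]) * frac


print(quantile([1, 2, 3, 4]))  # 2.5, halfway between 2 and 3
```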

query(expr, **kwargs)

Query the columns of a frame with a boolean expression.

expr : string
The query string to evaluate. The result of the evaluation of this expression is first passed to loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to __getitem__().
kwargs : dict
See the documentation for eval() for complete details on the keyword arguments accepted by query().

Returns: q : DataFrame or Series

This method uses the top-level eval() function to evaluate the passed query.

The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins, ‘and’ and ‘or’. This is syntactically valid Python, however the semantics are different.

You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.

The index and columns attributes of the DataFrame instance are placed in the query namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for the frame index, and you can also use the name of the index to identify it in a query.

For further details and examples see the query documentation in indexing.

See also: pandas.eval, DataFrame.eval

>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.query('a > b')
>>> df[df.a > df.b]  # same result as the previous expression
radd(other, axis='columns', level=None, fill_value=None)

Binary operator radd with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Notes: Mismatched indices will be unioned together

Returns: result : DataFrame

rank(axis=0, numeric_only=None, method='average', na_option='keep', ascending=True)

Compute numerical data ranks (1 through n) along axis. By default (method=‘average’), equal values are assigned a rank that is the average of the ranks of those values

axis : {0, 1}, default 0
Ranks over columns (0) or rows (1)
numeric_only : boolean, default None
Include only float, int, boolean data
method : {‘average’, ‘min’, ‘max’, ‘first’}
  • average: average rank of group
  • min: lowest rank in group
  • max: highest rank in group
  • first: ranks assigned in order they appear in the array
na_option : {‘keep’, ‘top’, ‘bottom’}
  • keep: leave NA values where they are
  • top: smallest rank if ascending
  • bottom: smallest rank if descending
ascending : boolean, default True
False for ranks by high (1) to low (N)

Returns: ranks : DataFrame
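
The tie-breaking methods listed above can be illustrated with a small pure-Python ranking function. This sketch ignores NA handling (na_option) and works on a plain list. rank is a hypothetical helper. It relies on Python's sort being stable (also with reverse=True), which gives method='first' its order-of-appearance behaviour.

```python
def rank(values, method="average", ascending=True):
    """Rank values 1..n; ties are resolved according to method."""
    if method not in ("average", "min", "max", "first"):
        raise ValueError(f"unknown method: {method!r}")
    # Positions of the values in sorted order (stable, also with reverse=True).
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=not ascending)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            if method == "average":
                ranks[order[k]] = (i + j) / 2 + 1
            elif method == "min":
                ranks[order[k]] = i + 1
            elif method == "max":
                ranks[order[k]] = j + 1
            else:  # "first": order of appearance among ties
                ranks[order[k]] = k + 1
        i = j + 1
    return ranks


print(rank([3, 1, 3, 2]))                  # [3.5, 1.0, 3.5, 2.0]
print(rank([3, 1, 3, 2], method="first"))  # [3, 1, 4, 2]
```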

rdiv(other, axis='columns', level=None, fill_value=None)

Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Notes: Mismatched indices will be unioned together

Returns: result : DataFrame

reindex(index=None, columns=None, **kwargs)

Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False

index, columns : array-like, optional (can be specified in order, or as keywords)
New labels / index to conform to. Preferably an Index object to avoid duplicating data
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed DataFrame:
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill gap
copy : boolean, default True
Return a new object, even if the passed indexes are the same
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any “compatible” value
limit : int, default None
Maximum size gap to forward or backward fill
takeable : boolean, default False
Treat the passed labels as positional values
>>> df.reindex(index=[date1, date2, date3], columns=['A', 'B', 'C'])

Returns: reindexed : DataFrame
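
The pad/ffill and backfill/bfill filling rules can be sketched with a dict that maps sortable labels to values. This is a simplified illustration: reindex here is a hypothetical function, and limit and level are not modelled. Like method-based filling in pandas, it assumes the old labels can be ordered (a monotonic index).

```python
import bisect


def reindex(series, new_labels, method=None, fill_value=None):
    """Conform a {label: value} dict to new_labels (hypothetical sketch).

    method: None, 'pad'/'ffill' or 'backfill'/'bfill'. Labels with no value
    (and nothing to fill from) get fill_value.
    """
    if method not in (None, "pad", "ffill", "backfill", "bfill"):
        raise ValueError(f"unknown method: {method!r}")
    old = sorted(series)  # filling needs ordered (monotonic) labels
    out = {}
    for label in new_labels:
        if label in series:
            out[label] = series[label]
        elif method in ("pad", "ffill"):
            pos = bisect.bisect_right(old, label) - 1  # last old label before `label`
            out[label] = series[old[pos]] if pos >= 0 else fill_value
        elif method in ("backfill", "bfill"):
            pos = bisect.bisect_left(old, label)  # first old label after `label`
            out[label] = series[old[pos]] if pos < len(old) else fill_value
        else:
            out[label] = fill_value
    return out


s = {1: "a", 3: "b", 5: "c"}
print(reindex(s, [0, 1, 2, 3, 4, 6], method="ffill"))
```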

reindex_axis(labels, axis=0, method=None, level=None, copy=True, limit=None, fill_value=nan)

Conform input object to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False

labels : array-like, optional
New labels / index to conform to. Preferably an Index object to avoid duplicating data
axis : {0, 1, ‘index’, ‘columns’}
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed object:
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill gap
copy : boolean, default True
Return a new object, even if the passed indexes are the same
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level
limit : int, default None
Maximum size gap to forward or backward fill
>>> df.reindex_axis(['A', 'B', 'C'], axis=1)

See also: reindex, reindex_like

Returns: reindexed : DataFrame

reindex_like(other, method=None, copy=True, limit=None)

Return an object with indices matching those of other

other : Object
method : string or None
copy : boolean, default True
limit : int, default None
Maximum size gap to forward or backward fill

Notes: Like calling s.reindex(index=other.index, columns=other.columns, method=...)

Returns: reindexed : same as input

rename(index=None, columns=None, **kwargs)

Alter axes using an input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.

index, columns : dict-like or function, optional
Transformation to apply to that axis values
copy : boolean, default True
Also copy underlying data
inplace : boolean, default False
Whether to return a new DataFrame. If True then value of copy is ignored.

Returns: renamed : DataFrame (new object)
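
The mapping rule can be sketched as below: either dict entries or a function are applied to each label, and labels absent from a dict are left unchanged. rename_labels is a hypothetical helper that works on a plain list of labels.

```python
def rename_labels(labels, mapper):
    """Apply a dict or a function to every label.

    Labels missing from a dict are kept as they are.
    """
    if callable(mapper):
        return [mapper(label) for label in labels]
    return [mapper.get(label, label) for label in labels]


print(rename_labels(["a", "b", "c"], {"a": "A"}))  # ['A', 'b', 'c']
print(rename_labels(["a", "b"], str.upper))        # ['A', 'B']
```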

rename_axis(mapper, axis=0, copy=True, inplace=False)

Alter index and / or columns using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.

mapper : dict-like or function, optional
axis : int or string, default 0
copy : boolean, default True
Also copy underlying data
inplace : boolean, default False

Returns: renamed : type of caller

reorder_levels(order, axis=0)

Rearrange index levels using input order. May not drop or duplicate levels

order : list of int or list of str
List representing new level order. Reference level by number (position) or by key (label).
axis : int
Where to reorder levels.

Returns: type of caller (new object)
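
If MultiIndex keys are treated as plain tuples, reordering levels amounts to permuting each tuple. The sketch below uses a hypothetical reorder_levels function with an optional list of level names. It also enforces the rule that levels may not be dropped or duplicated.

```python
def reorder_levels(keys, order, names=None):
    """Reorder the levels of tuple keys (a stand-in for MultiIndex labels).

    order holds level positions or, when names is given, level names.
    """
    if names is not None:
        order = [names.index(o) if isinstance(o, str) else o for o in order]
    nlevels = len(keys[0]) if keys else len(order)
    if sorted(order) != list(range(nlevels)):
        raise ValueError("order must use every level exactly once")
    return [tuple(key[i] for i in order) for key in keys]


keys = [("a", 1, "x"), ("b", 2, "y")]
print(reorder_levels(keys, [2, 0, 1]))  # [('x', 'a', 1), ('y', 'b', 2)]
```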

replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None)

Replace values given in ‘to_replace’ with ‘value’.

to_replace : str, regex, list, dict, Series, numeric, or None

  • str or regex:

    • str: string exactly matching to_replace will be replaced with value
    • regex: regexs matching to_replace will be replaced with value
  • list of str, regex, or numeric:

    • First, if to_replace and value are both lists, they must be the same length.
    • Second, if regex=True then all of the strings in both lists will be interpreted as regexs otherwise they will match directly. This doesn’t matter much for value since there are only a few possible substitution regexes you can use.
    • str and regex rules apply as above.
  • dict:

    • Nested dictionaries, e.g., {‘a’: {‘b’: nan}}, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with nan. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.
    • Keys map to column names and values map to substitution values. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
  • None:

    • This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also None then this must be a nested dictionary or Series.

See the examples section for examples of each of these.

value : scalar, dict, list, str, regex, default None
Value to use to fill holes (e.g. 0), alternately a dict of values specifying which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.
inplace : boolean, default False
If True, replace in place. Note: this will modify any other views on this object (e.g. a column from a DataFrame). Returns the caller if this is True.
limit : int, default None
Maximum size gap to forward or backward fill
regex : bool or same types as to_replace, default False
Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Otherwise (if regex is itself a pattern or a collection of patterns), to_replace must be None, because regex will then be interpreted as a regular expression or a list, dict, or array of regular expressions.
method : string, optional, {‘pad’, ‘ffill’, ‘bfill’}
The method to use for replacement when to_replace is a list.

See also: NDFrame.reindex, NDFrame.asfreq, NDFrame.fillna

Returns: filled : NDFrame

Raises:
AssertionError
  • If regex is not a bool and to_replace is not None.
TypeError
  • If to_replace is a dict and value is not a list, dict, ndarray, or Series
  • If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.
ValueError
  • If to_replace and value are lists or ndarrays, but they are not the same length.

Notes:
  • Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.
  • Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.
  • This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
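
The regex notes above can be illustrated with re.sub directly. Substitution applies only to string entries, so a float such as 1.5 is left untouched even when its string form would match. replace_regex is a hypothetical helper operating on a single list of values, not a reimplementation of DataFrame.replace.

```python
import re


def replace_regex(values, pattern, repl):
    """Apply re.sub(pattern, repl, ...) to string entries; leave other values alone."""
    regex = re.compile(pattern)
    return [regex.sub(repl, v) if isinstance(v, str) else v for v in values]


print(replace_regex(["bar", "foo", 1.5, "baz"], r"^ba(.)$", r"new\1"))  # ['newr', 'foo', 1.5, 'newz']
print(replace_regex([1.5, "1.5"], r"\d\.\d", "X"))                      # [1.5, 'X']
```
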
resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0)

Convenience method for frequency conversion and resampling of regular time-series data.

rule : string
the offset string or object representing target conversion
how : string
method for down- or re-sampling, defaults to ‘mean’ for downsampling
axis : int, optional, default 0
fill_method : string, default None
Filling method for upsampling
closed : {‘right’, ‘left’}
Which side of bin interval is closed
label : {‘right’, ‘left’}
Which bin edge label to label bucket with
convention : {‘start’, ‘end’, ‘s’, ‘e’}
kind : “period”/“timestamp”
loffset : timedelta
Adjust the resampled time labels
limit : int, default None
Maximum size gap when reindexing with fill_method
base : int, default 0
For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0
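
To show how base shifts the bin edges, the sketch below downsamples (minute, value) pairs into 5-minute bins and averages each bin (the default how=‘mean’). It assumes integer minutes instead of timestamps and left-closed, left-labelled bins. bin_start and downsample_mean are hypothetical helpers.

```python
from collections import defaultdict


def bin_start(minute, freq=5, base=0):
    """Left edge of the freq-minute bin holding `minute`; edges sit at base + k * freq."""
    return (minute - base) // freq * freq + base


def downsample_mean(points, freq=5, base=0):
    """Group (minute, value) pairs into left-closed bins and average each bin."""
    bins = defaultdict(list)
    for minute, value in points:
        bins[bin_start(minute, freq, base)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(bins.items())}


points = [(2, 2), (4, 4), (6, 6), (8, 8), (11, 11)]
print(downsample_mean(points))          # {0: 3.0, 5: 7.0, 10: 11.0}
print(downsample_mean(points, base=2))  # {2: 4.0, 7: 9.5}
```
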
reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')

For DataFrame with multi-level index, return new DataFrame with labeling information in the columns under the index names, defaulting to ‘level_0’, ‘level_1’, etc. if any are None. For a standard index, the index name will be used (if set), otherwise a default ‘index’ or ‘level_0’ (if ‘index’ is already taken) will be used.

level : int, str, tuple, or list, default None
Only remove the given levels from the index. Removes all levels by default
drop : boolean, default False
Do not try to insert index into dataframe columns. This resets the index to the default integer index.
inplace : boolean, default False
Modify the DataFrame in place (do not create a new object)
col_level : int or str, default 0
If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.
col_fill : object, default ‘’
If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.

Returns: resetted : DataFrame

rfloordiv(other, axis='columns', level=None, fill_value=None)

Binary operator rfloordiv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Notes: Mismatched indices will be unioned together

Returns: result : DataFrame

rmod(other, axis='columns', level=None, fill_value=None)

Binary operator rmod with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Notes: Mismatched indices will be unioned together

Returns: result : DataFrame

rmul(other, axis='columns', level=None, fill_value=None)

Binary operator rmul with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Notes: Mismatched indices will be unioned together

Returns: result : DataFrame

rpow(other, axis='columns', level=None, fill_value=None)

Binary operator rpow with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Notes: Mismatched indices will be unioned together

Returns: result : DataFrame

rsub(other, axis='columns', level=None, fill_value=None)

Binary operator rsub with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Notes: Mismatched indices will be unioned together

Returns: result : DataFrame
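
The r-prefixed operators are the reflected forms of their counterparts: df.rsub(other) computes other - df, df.rpow(other) computes other ** df, and so on. For scalars, this amounts to swapping the operands, as in this sketch. The reflected helper and the names bound to it are illustrative only.

```python
import operator


def reflected(op):
    """Return a function that applies op with its operands swapped."""
    return lambda a, b: op(b, a)


# Scalar stand-ins for df.rsub(other), df.rtruediv(other), etc.:
# the first argument plays the role of df, the second the role of other.
radd = reflected(operator.add)
rsub = reflected(operator.sub)
rmul = reflected(operator.mul)
rtruediv = reflected(operator.truediv)
rfloordiv = reflected(operator.floordiv)
rmod = reflected(operator.mod)
rpow = reflected(operator.pow)

print(rsub(3, 10))  # 10 - 3 -> 7
print(rpow(2, 3))   # 3 ** 2 -> 9
```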

rtruediv(other, axis='columns', level=None, fill_value=None)

Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Notes: Mismatched indices will be unioned together

Returns: result : DataFrame

save(path)

Deprecated. Use to_pickle instead

select(crit, axis=0)

Return data corresponding to axis labels matching criteria

crit : function
To be called on each index (label). Should return True or False

axis : int

Returns: selection : type of caller

set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Set the DataFrame index (row labels) using one or more existing columns. By default yields a new object.

keys : column label or list of column labels / arrays
drop : boolean, default True
Delete columns to be used as the new index
append : boolean, default False
Whether to append columns to existing index
inplace : boolean, default False
Modify the DataFrame in place (do not create a new object)
verify_integrity : boolean, default False
Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method
>>> indexed_df = df.set_index(['A', 'B'])
>>> indexed_df2 = df.set_index(['A', [0, 1, 2, 0, 1, 2]])
>>> indexed_df3 = df.set_index([[0, 1, 2, 0, 1, 2]])

Returns: dataframe : DataFrame

set_value(index, col, value)

Put single value at passed column and index

index : row label
col : column label
value : scalar value

Returns: frame : DataFrame
If the label pair is contained, this will be a reference to the calling DataFrame, otherwise a new object
shape
shift(periods=1, freq=None, axis=0, **kwds)

Shift index by desired number of periods with an optional time freq

periods : int
Number of periods to move, can be positive or negative
freq : DateOffset, timedelta, or time rule string, optional
Increment to use from datetools module or time rule (e.g. ‘EOM’)

Notes: If freq is specified then the index values are shifted but the data is not realigned

Returns: shifted : same type as caller
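
The difference between shifting the data and shifting the index (when freq is given) can be sketched with plain lists. shift_values and shift_index are hypothetical helpers. Integer labels stand in for dates, so freq is just a step size here.

```python
def shift_values(values, periods=1, fill=None):
    """Move the data by `periods` positions while the labels stay put.

    Positions vacated by the move are filled with `fill` (None stands in for NaN).
    """
    n = len(values)
    if periods >= 0:
        k = min(periods, n)
        return [fill] * k + values[: n - k]
    k = min(-periods, n)
    return values[k:] + [fill] * k


def shift_index(labels, periods=1, freq=1):
    """With a freq, move the labels instead; the data is not realigned."""
    return [label + periods * freq for label in labels]


print(shift_values([1, 2, 3]))          # [None, 1, 2]
print(shift_index([10, 11, 12], 2))     # [12, 13, 14]
```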

skew(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased skew over requested axis. Normalized by N-1

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), compute the skew along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

Returns: skew : Series or DataFrame (if level specified)
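
For reference, a common bias-corrected estimator of sample skewness is the adjusted Fisher-Pearson coefficient, G1 = sqrt(n(n - 1)) / (n - 2) * m3 / m2^(3/2), where m2 and m3 are the second and third central moments. The sketch below computes it for a plain list. Assuming that this matches the estimator used by the method above is a guess, so treat the sketch as illustrative. skew here is a hypothetical helper.

```python
import math


def skew(values):
    """Adjusted Fisher-Pearson skewness of a list; NaN if undefined."""
    n = len(values)
    if n < 3:
        return math.nan  # the (n - 2) correction needs at least three values
    mean = math.fsum(values) / n
    m2 = math.fsum((x - mean) ** 2 for x in values) / n
    m3 = math.fsum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        return math.nan  # constant data: skewness is undefined in this sketch
    g1 = m3 / m2 ** 1.5  # biased (population) coefficient
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


print(skew([1, 2, 3]))   # 0.0 for symmetric data
print(skew([1, 2, 10]))  # positive: long right tail
```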

sort(columns=None, column=None, axis=0, ascending=True, inplace=False)

Sort DataFrame either by labels (along either axis) or by the values in column(s)

columns : object
Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
ascending : boolean or list, default True
Sort ascending vs. descending. Specify list for multiple sort orders
axis : {0, 1}
Sort index/rows versus columns
inplace : boolean, default False
Sort the DataFrame without creating a new instance
>>> result = df.sort(['A', 'B'], ascending=[1, 0])

Returns: sorted : DataFrame
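
A sort by several columns with a per-column direction, as in ascending=[1, 0], can be sketched with repeated stable sorts, applied from the least significant key to the most significant one. sort_records is a hypothetical helper working on a list of dicts.

```python
def sort_records(records, keys, ascending):
    """Sort dicts by several keys, each with its own direction."""
    result = list(records)
    # Least significant key first: each later (stable) sort keeps the order
    # established by the previous ones among equal keys.
    for key, asc in reversed(list(zip(keys, ascending))):
        result.sort(key=lambda rec: rec[key], reverse=not asc)
    return result


rows = [
    {"A": "foo", "B": 1},
    {"A": "bar", "B": 2},
    {"A": "foo", "B": 3},
    {"A": "bar", "B": 1},
]
ordered = sort_records(rows, ["A", "B"], ascending=[True, False])
print([(r["A"], r["B"]) for r in ordered])  # [('bar', 2), ('bar', 1), ('foo', 3), ('foo', 1)]
```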

sort_index(axis=0, by=None, ascending=True, inplace=False, kind='quicksort')

Sort DataFrame either by labels (along either axis) or by the values in a column

axis : {0, 1}
Sort index/rows versus columns
by : object
Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
ascending : boolean or list, default True
Sort ascending vs. descending. Specify list for multiple sort orders
inplace : boolean, default False
Sort the DataFrame without creating a new instance
>>> result = df.sort_index(by=['A', 'B'], ascending=[True, False])

Returns: sorted : DataFrame

sortlevel(level=0, axis=0, ascending=True, inplace=False)

Sort multilevel index by chosen axis and primary level. Data will be lexicographically sorted by the chosen level followed by the other levels (in order)

level : int
axis : {0, 1}
ascending : boolean, default True
inplace : boolean, default False
Sort the DataFrame without creating a new instance

Returns: sorted : DataFrame

squeeze()

Squeeze length-1 dimensions

stack(level=-1, dropna=True)

Pivot a level of the (possibly hierarchical) column labels, returning a DataFrame (or Series in the case of an object with a single level of column labels) having a hierarchical index with a new inner-most level of row labels.

level : int, string, or list of these, default last level
Level(s) to stack, can pass level name
dropna : boolean, default True
Whether to drop rows in the resulting Frame/Series with no valid values
>>> s
     a   b
one  1.  2.
two  3.  4.
>>> s.stack()
one a    1
    b    2
two a    3
    b    4

Returns: stacked : DataFrame or Series
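
If a frame is represented as a nested dict {row -> {column -> value}}, stacking moves the column labels into a second level of the row key. The sketch below uses a hypothetical stack function. None marks a missing value, and such cells are dropped, as with dropna=True.

```python
def stack(frame):
    """Turn {row: {col: value}} into {(row, col): value}, skipping None cells."""
    return {
        (row, col): value
        for row, cols in frame.items()
        for col, value in cols.items()
        if value is not None
    }


s = {"one": {"a": 1.0, "b": 2.0}, "two": {"a": 3.0, "b": 4.0}}
print(stack(s))
```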

std(axis=None, skipna=None, level=None, ddof=1, **kwargs)

Return unbiased standard deviation over requested axis. Normalized by N-1 by default (see ddof)

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), compute the standard deviation along a particular level, collapsing into a Series
ddof : int, default 1
Delta degrees of freedom. The divisor used in calculations is N - ddof, where N is the number of elements
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

Returns: stdev : Series or DataFrame (if level specified)
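
The role of ddof is easiest to see in a direct computation. This sketch uses a hypothetical std function on a plain list and divides the sum of squared deviations by N - ddof. ddof=1 gives the sample standard deviation reported by default.

```python
import math


def std(values, ddof=1):
    """Standard deviation whose divisor is N - ddof."""
    n = len(values)
    if n - ddof <= 0:
        return math.nan  # not enough observations for this ddof
    mean = math.fsum(values) / n
    squared = math.fsum((x - mean) ** 2 for x in values)
    return math.sqrt(squared / (n - ddof))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std(data, ddof=0))  # 2.0 (population form)
print(std(data))          # sqrt(32 / 7), the default sample form
```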

sub(other, axis='columns', level=None, fill_value=None)

Binary operator sub with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Notes: Mismatched indices will be unioned together

Returns: result : DataFrame

subtract(other, axis='columns', level=None, fill_value=None)

Binary operator sub with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Notes: Mismatched indices will be unioned together

Returns: result : DataFrame

sum(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the sum of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), compute the sum along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

Returns: sum : Series or DataFrame (if level specified)

swapaxes(axis1, axis2, copy=True)

Interchange the two axes and swap the values accordingly

Returns: y : same as input

swaplevel(i, j, axis=0)

Swap levels i and j in a MultiIndex on a particular axis

i, j : int, string (can be mixed)
Level of index to be swapped. Can pass level name as string.

Returns: swapped : type of caller (new object)

tail(n=5)

Returns last n rows

take(indices, axis=0, convert=True, is_copy=True)

Analogous to ndarray.take

indices : list / array of ints
axis : int, default 0
convert : translate negative indices to positive ones (default)
is_copy : mark the returned frame as a copy

Returns: taken : type of caller

to_clipboard(excel=None, sep=None, **kwargs)

Attempt to write a text representation of the object to the system clipboard. This can be pasted into Excel, for example.

excel : boolean, defaults to True
if True, use the provided separator, writing in a csv format for allowing easy pasting into excel. if False, write a string representation of the object to the clipboard

sep : optional, defaults to tab
Other keywords are passed to to_csv

Notes: Requirements for your platform
  • Linux: xclip, or xsel (with gtk or PyQt4 modules)
  • Windows: none
  • OS X: none
to_csv(path_or_buf, sep=',', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, mode='w', nanRep=None, encoding=None, quoting=None, line_terminator='\n', chunksize=None, tupleize_cols=False, date_format=None, **kwds)

Write DataFrame to a comma-separated values (csv) file

path_or_buf : string or file handle / StringIO
File path or object
sep : character, default ‘,’
Field delimiter for the output file.
na_rep : string, default ‘’
Missing data representation
float_format : string, default None
Format string for floating point numbers
cols : sequence, optional
Columns to write
header : boolean or list of string, default True
Write out column names. If a list of string is given it is assumed to be aliases for the column names
index : boolean, default True
Write row names (index)
index_label : string or sequence, or False, default None
Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R
nanRep : None
deprecated, use na_rep
mode : str
Python write mode, default ‘w’
encoding : string, optional
a string representing the encoding to use if the contents are non-ascii, for python versions prior to 3
line_terminator : string, default ‘\n’
The newline character or character sequence to use in the output file
quoting : optional constant from csv module
defaults to csv.QUOTE_MINIMAL
chunksize : int or None
rows to write at a time
tupleize_cols : boolean, default False
Write MultiIndex columns as a list of tuples (if True) or in the new, expanded format (if False)
date_format : string, default None
Format string for datetime objects.
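
Several of the to_csv options above (sep, na_rep, float_format, quoting with csv.QUOTE_MINIMAL, a '\n' line terminator) map directly onto Python's csv module. The sketch below writes a small column-oriented table to a string. frame_to_csv is a hypothetical helper, and an empty first header cell mirrors an unnamed index.

```python
import csv
import io


def frame_to_csv(index, columns, sep=",", na_rep="", float_format=None, index_label=""):
    """Render {column -> list of values} with row labels `index` as CSV text.

    None stands in for a missing value; floats are formatted with float_format
    (a %-style format string) when one is given.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=sep, lineterminator="\n", quoting=csv.QUOTE_MINIMAL)
    writer.writerow([index_label] + list(columns))
    for i, label in enumerate(index):
        row = [label]
        for values in columns.values():
            value = values[i]
            if value is None:
                row.append(na_rep)
            elif isinstance(value, float) and float_format is not None:
                row.append(float_format % value)
            else:
                row.append(value)
        writer.writerow(row)
    return buf.getvalue()


print(frame_to_csv(["r0", "r1"], {"a": [1.5, None], "b": ["x", "y,z"]},
                   na_rep="NA", float_format="%.2f"))
```

The field "y,z" is quoted because it contains the delimiter, which is what QUOTE_MINIMAL does.
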
to_dense()

Return dense representation of NDFrame (as opposed to sparse)

to_dict(outtype='dict')

Convert DataFrame to dictionary.

outtype : str {‘dict’, ‘list’, ‘series’, ‘records’}
Determines the type of the values of the dictionary. The default dict is a nested dictionary {column -> {index -> value}}. list returns {column -> list(values)}. series returns {column -> Series(values)}. records returns [{column -> value}, ..., {column -> value}], one dict per row. Abbreviations are allowed.

Returns: result : dict like {column -> {index -> value}}
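
The ‘dict’, ‘list’ and ‘records’ layouts can be sketched for a table given as {column -> list of values} plus a list of index labels. to_dict here is a hypothetical helper. The ‘series’ layout needs pandas Series and is left out.

```python
def to_dict(columns, index, outtype="dict"):
    """columns maps each column name to a list of values aligned with index."""
    if outtype == "dict":
        # {column -> {index -> value}}
        return {col: dict(zip(index, vals)) for col, vals in columns.items()}
    if outtype == "list":
        # {column -> list(values)}
        return {col: list(vals) for col, vals in columns.items()}
    if outtype == "records":
        # [{column -> value}, ...], one dict per row
        names = list(columns)
        return [dict(zip(names, row)) for row in zip(*columns.values())]
    raise ValueError(f"unsupported outtype: {outtype!r}")


cols = {"x": [1, 2], "y": [3, 4]}
print(to_dict(cols, ["r0", "r1"], "records"))  # [{'x': 1, 'y': 3}, {'x': 2, 'y': 4}]
```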

to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True)

Write DataFrame to an Excel sheet

excel_writer : string or ExcelWriter object
File path or existing ExcelWriter
sheet_name : string, default ‘Sheet1’
Name of sheet which will contain DataFrame
na_rep : string, default ‘’
Missing data representation
float_format : string, default None
Format string for floating point numbers
cols : sequence, optional
Columns to write
header : boolean or list of string, default True
Write out column names. If a list of string is given it is assumed to be aliases for the column names
index : boolean, default True
Write row names (index)
index_label : string or sequence, default None
Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
startrow :
upper left cell row to dump data frame
startcol :
upper left cell column to dump data frame
engine : string, default None
write engine to use - you can also set this via the options io.excel.xlsx.writer, io.excel.xls.writer, and io.excel.xlsm.writer.
merge_cells : boolean, default True
Write MultiIndex and Hierarchical Rows as merged cells.

If passing an existing ExcelWriter object, then the sheet will be added to the existing workbook. This can be used to save different DataFrames to one workbook:

>>> from pandas import ExcelWriter
>>> writer = ExcelWriter('output.xlsx')
>>> df1.to_excel(writer, 'Sheet1')
>>> df2.to_excel(writer, 'Sheet2')
>>> writer.save()
to_gbq(destination_table, schema=None, col_order=None, if_exists='fail', **kwargs)

Write a DataFrame to a Google BigQuery table.

If the table does not exist, a new table is created, in which case the schema has to be specified; if it already exists, the behavior is controlled by if_exists. By default, rows are written in the order they appear in the DataFrame, though the user may specify an alternative order.

destination_table : string
name of table to be written, in the form ‘dataset.tablename’
schema : sequence (optional)
list of column types in order for data to be inserted, e.g. [‘INTEGER’, ‘TIMESTAMP’, ‘BOOLEAN’]
col_order : sequence (optional)
order in which columns are to be inserted, e.g. [‘primary_key’, ‘birthday’, ‘username’]
if_exists : {‘fail’, ‘replace’, ‘append’} (optional)
  • fail: If table exists, do nothing.
  • replace: If table exists, drop it, recreate it, and insert data.
  • append: If table exists, insert data. Create it if it does not exist.

kwargs are passed to the Client constructor

SchemaMissing :
Raised if the ‘if_exists’ parameter is set to ‘replace’, but no schema is specified
TableExists :
Raised if the specified ‘destination_table’ exists but the ‘if_exists’ parameter is set to ‘fail’ (the default)
InvalidSchema :
Raised if the ‘schema’ parameter does not match the provided DataFrame
to_hdf(path_or_buf, key, **kwargs)

Write the contained data to an HDF5 file using HDFStore.

path_or_buf : the path (string) or buffer to put the store
key : string
Identifier for the group in the store

mode : optional, {‘a’, ‘w’, ‘r’, ‘r+’}, default ‘a’

'r'
Read-only; no data can be modified.
'w'
Write; a new file is created (an existing file with the same name would be deleted).
'a'
Append; an existing file is opened for reading and writing, and if the file does not exist it is created.
'r+'
It is similar to 'a', but the file must already exist.
format : ‘fixed(f)|table(t)’, default is ‘fixed’
fixed(f) : Fixed format
Fast writing/reading. Not appendable, nor searchable
table(t) : Table format
Write as a PyTables Table structure which may perform worse but allows more flexible operations like searching / selecting subsets of the data
append : boolean, default False
For Table formats, append the input data to the existing table
complevel : int, 1-9, default 0
If a complib is specified compression will be applied where possible
complib : {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’, None}, default None
If complevel is > 0 apply compression to objects written in the store wherever possible
fletcher32 : bool, default False
If applying compression use the fletcher32 checksum
to_html(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, force_unicode=None, bold_rows=True, classes=None, escape=True, max_rows=None, max_cols=None, show_dimensions=False)

Render a DataFrame as an HTML table.

to_html-specific options:

bold_rows : boolean, default True
Make the row labels bold in the output
classes : str or list or tuple, default None
CSS class(es) to apply to the resulting html table
escape : boolean, default True
Convert the characters <, >, and & to HTML-safe sequences.
max_rows : int, optional
Maximum number of rows to show before truncating. If None, show all.
max_cols : int, optional
Maximum number of columns to show before truncating. If None, show all.
frame : DataFrame
object to render
buf : StringIO-like, optional
buffer to write to
columns : sequence, optional
the subset of columns to write; default None writes all columns
col_space : int, optional
the minimum width of each column
header : bool, optional
whether to print column labels, default True
index : bool, optional
whether to print index (row) labels, default True
na_rep : string, optional
string representation of NaN to use, default ‘NaN’
formatters : list or dict of one-parameter functions, optional
formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
float_format : one-parameter function, optional
formatter function to apply to columns’ elements if they are floats, default None
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
justify : {‘left’, ‘right’}, default None
Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
index_names : bool, optional
Prints the names of the indexes, default True
force_unicode : bool, default False
Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.

formatted : string (or unicode, depending on data and options)

to_json(path_or_buf=None, orient=None, date_format='epoch', double_precision=10, force_ascii=True, date_unit='ms', default_handler=None)

Convert the object to a JSON string.

Note: NaN and None are converted to null, and datetime objects are converted to UNIX timestamps.

path_or_buf : the path or buffer to write the result string
If this is None, return the converted string

orient : string

  • Series
    • default is ‘index’
    • allowed values are: {‘split’,’records’,’index’}
  • DataFrame
    • default is ‘columns’
    • allowed values are: {‘split’,’records’,’index’,’columns’,’values’}
  • The format of the JSON string
    • split : dict like {index -> [index], columns -> [columns], data -> [values]}
    • records : list like [{column -> value}, ... , {column -> value}]
    • index : dict like {index -> {column -> value}}
    • columns : dict like {column -> {index -> value}}
    • values : just the values array
date_format : {‘epoch’, ‘iso’}
Type of date conversion. epoch = epoch milliseconds, iso = ISO8601, default is epoch.
double_precision : int, default 10
The number of decimal places to use when encoding floating point values
force_ascii : boolean, default True
Force the encoded string to be ASCII
date_unit : string, default ‘ms’ (milliseconds)
The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.
default_handler : callable, default None
Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serialisable object.

result : string if path_or_buf is None, otherwise None
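The orient layouts listed above can be mimicked with the standard json module to see what each one looks like on the wire. This is a hypothetical stdlib-only sketch, not pandas' own encoder: it ignores date_format, double_precision and the other options, and only mirrors the documented rule that NaN and None become null.

```python
import json
import math

# Hypothetical 2x2 frame: row labels, column labels and row-major data
index = ["r1", "r2"]
columns = ["a", "b"]
data = [[1.0, float("nan")], [None, 4.0]]


def clean(v):
    # Documented rule: NaN and None are both written as JSON null
    return None if v is None or (isinstance(v, float) and math.isnan(v)) else v


rows = [[clean(v) for v in row] for row in data]

shapes = {
    "split": {"index": index, "columns": columns, "data": rows},
    "records": [dict(zip(columns, row)) for row in rows],
    "index": {i: dict(zip(columns, row)) for i, row in zip(index, rows)},
    "columns": {c: {i: row[j] for i, row in zip(index, rows)}
                for j, c in enumerate(columns)},
    "values": rows,
}

for orient, obj in shapes.items():
    print(orient, json.dumps(obj))
```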

to_latex(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=True, force_unicode=None)

Render a DataFrame to a tabular environment table. You can splice this into a LaTeX document.

to_latex-specific options:

bold_rows : boolean, default True
Make the row labels bold in the output
frame : DataFrame
object to render
buf : StringIO-like, optional
buffer to write to
columns : sequence, optional
the subset of columns to write; default None writes all columns
col_space : int, optional
the minimum width of each column
header : bool, optional
whether to print column labels, default True
index : bool, optional
whether to print index (row) labels, default True
na_rep : string, optional
string representation of NaN to use, default ‘NaN’
formatters : list or dict of one-parameter functions, optional
formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
float_format : one-parameter function, optional
formatter function to apply to columns’ elements if they are floats, default None
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
justify : {‘left’, ‘right’}, default None
Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
index_names : bool, optional
Prints the names of the indexes, default True
force_unicode : bool, default False
Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.

formatted : string (or unicode, depending on data and options)

to_msgpack(path_or_buf=None, **kwargs)

msgpack (serialize) object to input file path

THIS IS AN EXPERIMENTAL LIBRARY and the storage format may not be stable until a future release.

path_or_buf : string File path, buffer-like, or None
If None, return generated string
append : boolean, default False
Whether to append to an existing msgpack
compress : type of compressor (zlib or blosc), default None (no compression)
to_panel()

Transform long (stacked) format (DataFrame) into wide (3D, Panel) format.

Currently the index of the DataFrame must be a 2-level MultiIndex. This may be generalized later

panel : Panel

to_period(freq=None, axis=0, copy=True)

Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed)

freq : string, default None
Frequency of the PeriodIndex; inferred from the index if not passed
axis : {0, 1}, default 0
The axis to convert (the index by default)
copy : boolean, default True
If False then underlying input data is not copied

ts : TimeSeries with PeriodIndex

to_pickle(path)

Pickle (serialize) object to input file path

path : string
File path
to_records(index=True, convert_datetime64=True)

Convert DataFrame to record array. Index will be put in the ‘index’ field of the record array if requested

index : boolean, default True
Include index in resulting record array, stored in ‘index’ field
convert_datetime64 : boolean, default True
Whether to convert the index to datetime.datetime if it is a DatetimeIndex

y : recarray

to_sparse(fill_value=None, kind='block')

Convert to SparseDataFrame

fill_value : float, default NaN
kind : {‘block’, ‘integer’}, default ‘block’

y : SparseDataFrame

to_sql(name, con, flavor='sqlite', if_exists='fail', **kwargs)

Write records stored in a DataFrame to a SQL database.

name : str
Name of SQL table

con : an open SQL database connection object
flavor : {‘sqlite’, ‘mysql’, ‘oracle’}, default ‘sqlite’
if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’

  • fail: If table exists, do nothing.
  • replace: If table exists, drop it, recreate it, and insert data.
  • append: If table exists, insert data. Create it if it does not exist.
to_stata(fname, convert_dates=None, write_index=True, encoding='latin-1', byteorder=None)

Write a DataFrame to a Stata binary dta file

fname : file path or buffer
Where to save the dta file.
convert_dates : dict
Dictionary mapping columns of datetime types to the Stata internal format that you want to use for the dates. Options are ‘tc’, ‘td’, ‘tm’, ‘tw’, ‘th’, ‘tq’, ‘ty’. Column can be either a number or a name.
encoding : str
Default is latin-1. Note that Stata does not support unicode.
byteorder : str
Can be “>”, “<”, “little”, or “big”. The default is None which uses sys.byteorder
>>> from pandas.io.stata import StataWriter
>>> writer = StataWriter('./data_file.dta', data)
>>> writer.write_file()

Or with dates

>>> writer = StataWriter('./date_data_file.dta', data, {2: 'tw'})
>>> writer.write_file()
to_string(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, nanRep=None, index_names=True, justify=None, force_unicode=None, line_width=None, max_rows=None, max_cols=None, show_dimensions=False)

Render a DataFrame to a console-friendly tabular output.

frame : DataFrame
object to render
buf : StringIO-like, optional
buffer to write to
columns : sequence, optional
the subset of columns to write; default None writes all columns
col_space : int, optional
the minimum width of each column
header : bool, optional
whether to print column labels, default True
index : bool, optional
whether to print index (row) labels, default True
na_rep : string, optional
string representation of NaN to use, default ‘NaN’
formatters : list or dict of one-parameter functions, optional
formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
float_format : one-parameter function, optional
formatter function to apply to columns’ elements if they are floats, default None
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
justify : {‘left’, ‘right’}, default None
Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
index_names : bool, optional
Prints the names of the indexes, default True
force_unicode : bool, default False
Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.

formatted : string (or unicode, depending on data and options)

to_timestamp(freq=None, how='start', axis=0, copy=True)

Cast to DatetimeIndex of timestamps, at beginning of period

freq : string, default frequency of PeriodIndex
Desired frequency
how : {‘s’, ‘e’, ‘start’, ‘end’}
Convention for converting period to timestamp; start of period vs. end
axis : {0, 1} default 0
The axis to convert (the index by default)
copy : boolean, default True
If False then underlying input data is not copied

df : DataFrame with DatetimeIndex

to_wide(*args, **kwargs)
transpose()

Transpose index and columns

truediv(other, axis='columns', level=None, fill_value=None)

Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

truncate(before=None, after=None, axis=None, copy=True)

Truncates a sorted NDFrame before and/or after some particular dates.

before : date
Truncate before date
after : date
Truncate after date

axis : the truncation axis, defaults to the stat axis
copy : boolean, default True
Return a copy of the truncated section

truncated : type of caller
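Truncation keeps the labels between before and after, both bounds inclusive, on a sorted axis. A stdlib-only sketch of that rule on a list of (date, value) pairs follows; `truncate_sketch` and the data are hypothetical and this is not the pandas code path.

```python
from datetime import date
from typing import List, Optional, Tuple


def truncate_sketch(rows: List[Tuple[date, float]], before: Optional[date] = None,
                    after: Optional[date] = None) -> List[Tuple[date, float]]:
    """Keep rows of a label-sorted sequence with before <= label <= after.

    Both bounds are inclusive; a bound of None leaves that side untouched.
    """
    return [(label, value) for label, value in rows
            if (before is None or label >= before)
            and (after is None or label <= after)]


series = [(date(2014, 1, d), float(d)) for d in range(1, 6)]
print(truncate_sketch(series, before=date(2014, 1, 2), after=date(2014, 1, 4)))
```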

tshift(periods=1, freq=None, axis=0, **kwds)

Shift the time index, using the index’s frequency if available

periods : int
Number of periods to move, can be positive or negative
freq : DateOffset, timedelta, or time rule string, default None
Increment to use from datetools module or time rule (e.g. ‘EOM’)
axis : int or basestring
Corresponds to the axis that contains the Index

If freq is not specified, tries to use the freq or inferred_freq attributes of the index. If neither of those attributes exists, a ValueError is raised

shifted : NDFrame

tz_convert(tz, axis=0, copy=True)

Convert TimeSeries to target time zone. If it is time zone naive, it will be localized to the passed time zone.

tz : string or pytz.timezone object
copy : boolean, default True
Also make a copy of the underlying data
tz_localize(tz, axis=0, copy=True, infer_dst=False)

Localize tz-naive TimeSeries to target time zone

tz : string or pytz.timezone object
copy : boolean, default True
Also make a copy of the underlying data
infer_dst : boolean, default False
Attempt to infer fall dst-transition times based on order
unstack(level=-1)

Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels. If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex)

level : int, string, or list of these, default -1 (last level)
Level(s) of index to unstack, can pass level name

DataFrame.pivot : Pivot a table based on column values.
DataFrame.stack : Pivot a level of the column labels (inverse operation from unstack).
>>> import numpy as np
>>> import pandas as pd
>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
...                                    ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one  a    1.0
     b    2.0
two  a    3.0
     b    4.0
dtype: float64
>>> s.unstack(level=-1)
       a    b
one  1.0  2.0
two  3.0  4.0
>>> s.unstack(level=0)
   one  two
a  1.0  3.0
b  2.0  4.0
>>> df = s.unstack(level=0)
>>> df.unstack()
one  a    1.0
     b    2.0
two  a    3.0
     b    4.0
dtype: float64

unstacked : DataFrame or Series

update(other, join='left', overwrite=True, filter_func=None, raise_conflict=False)

Modify DataFrame in place using non-NA values from passed DataFrame. Aligns on indices

other : DataFrame, or object coercible into a DataFrame
join : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
overwrite : boolean, default True
If True then overwrite values for common keys in the calling frame
filter_func : callable(1d-array) -> 1d-array<boolean>, default None
Can choose to replace values other than NA. Return True for values that should be updated
raise_conflict : boolean
If True, will raise an error if the DataFrame and other both contain data in the same place.
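The interplay of overwrite, filter_func and raise_conflict is easiest to see cell by cell. Below is a hypothetical stdlib-only sketch on a dict-of-dicts stand-in for a DataFrame, assuming the default join='left'; it is not the pandas implementation, and unlike pandas (whose filter_func receives a whole 1-d array) it calls filter_func on one value at a time.

```python
import math
from typing import Callable, Dict, Optional

Frame = Dict[str, Dict[str, float]]  # hypothetical stand-in: column -> {row -> value}


def is_na(v) -> bool:
    return v is None or (isinstance(v, float) and math.isnan(v))


def update_sketch(frame: Frame, other: Frame, overwrite: bool = True,
                  filter_func: Optional[Callable[[float], bool]] = None,
                  raise_conflict: bool = False) -> None:
    """Modify frame in place with non-NA values from other (join='left')."""
    # Left join: only cells that already exist in frame can be updated,
    # and NA values in other never replace anything
    cells = [(c, r, v) for c, col in other.items() if c in frame
             for r, v in col.items() if r in frame[c] and not is_na(v)]
    if raise_conflict and filter_func is None:
        if any(not is_na(frame[c][r]) for c, r, _ in cells):
            raise ValueError("Data overlaps.")
    for c, r, new in cells:
        old = frame[c][r]
        if filter_func is not None:
            if not filter_func(old):
                continue  # filter_func selects which existing values may change
        elif not overwrite and not is_na(old):
            continue  # overwrite=False only fills holes in frame
        frame[c][r] = new
```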
values

Numpy representation of NDFrame

var(axis=None, skipna=None, level=None, ddof=1, **kwargs)

Return unbiased variance over requested axis. Normalized by N-1 by default; this can be changed using the ddof argument

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

variance : Series or DataFrame (if level specified)
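The role of ddof is that the sum of squared deviations is divided by N - ddof, so ddof=1 gives the unbiased estimate and ddof=0 the population variance. A stdlib-only sketch for a single column, with NA skipping as described for skipna; `var_sketch` is a hypothetical name and the data are made up.

```python
import math
import statistics


def var_sketch(values, ddof=1, skipna=True):
    """Variance normalized by N - ddof, skipping NA (None/NaN) values by default."""
    clean = [v for v in values
             if not (v is None or (isinstance(v, float) and math.isnan(v)))]
    if not skipna and len(clean) != len(values):
        return float("nan")  # an NA value poisons the result when skipna=False
    n = len(clean)
    if n - ddof <= 0:
        return float("nan")  # not enough observations for this ddof
    mean = sum(clean) / n
    return sum((v - mean) ** 2 for v in clean) / (n - ddof)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, float("nan")]
print(var_sketch(data, ddof=0))  # 4.0 (population variance)
print(math.isclose(var_sketch(data), statistics.variance(data[:-1])))  # True
```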

where(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)

Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.

cond : boolean NDFrame or array
other : scalar or NDFrame
inplace : boolean, default False
Whether to perform the operation in place on the data
axis : alignment axis if needed, default None
level : alignment level if needed, default None
try_cast : boolean, default False
Try to cast the result back to the input type (if possible)
raise_on_error : boolean, default True
Whether to raise on invalid data types (e.g. trying to where on strings)

wh : same type as caller
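Element by element, where keeps the caller's value where cond is True and takes other elsewhere. A hypothetical stdlib-only sketch on flat lists, not the pandas implementation (no label alignment or dtype casting):

```python
from typing import List, Sequence


def where_sketch(values: Sequence, cond: Sequence[bool], other=float("nan")) -> List:
    """Keep values[i] where cond[i] is True, otherwise take other (scalar or sequence)."""
    if len(values) != len(cond):
        raise ValueError("cond must have the same shape as the caller")
    if isinstance(other, (list, tuple)):
        # Element-wise replacement values
        return [v if c else o for v, c, o in zip(values, cond, other)]
    # Scalar replacement value (NaN by default, as in the signature)
    return [v if c else other for v, c in zip(values, cond)]


vals = [1, -2, 3, -4]
print(where_sketch(vals, [v > 0 for v in vals], 0))  # [1, 0, 3, 0]
```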

xs(key, axis=0, level=None, copy=True, drop_level=True)

Returns a cross-section (row(s) or column(s)) from the Series/DataFrame. Defaults to cross-section on the rows (axis=0).

key : object
Some label contained in the index, or partially in a MultiIndex
axis : int, default 0
Axis to retrieve cross-section on
level : object, defaults to first n levels (n=1 or len(key))
In case of a key partially contained in a MultiIndex, indicate which levels are used. Levels can be referred by label or position.
copy : boolean, default True
Whether to make a copy of the data
drop_level : boolean, default True
If False, returns object with same levels as self.
>>> df
   A  B  C
a  4  5  2
b  4  0  9
c  9  7  3
>>> df.xs('a')
A    4
B    5
C    2
Name: a
>>> df.xs('C', axis=1)
a    2
b    9
c    3
Name: C
>>> s = df.xs('a', copy=False)
>>> s['A'] = 100
>>> df
     A  B  C
a  100  5  2
b    4  0  9
c    9  7  3
>>> df
                    A  B  C  D
first second third
bar   one    1      4  1  8  9
      two    1      7  5  5  0
baz   one    1      6  6  8  0
      three  2      5  3  5  3
>>> df.xs(('baz', 'three'))
       A  B  C  D
third
2      5  3  5  3
>>> df.xs('one', level=1)
             A  B  C  D
first third
bar   1      4  1  8  9
baz   1      6  6  8  0
>>> df.xs(('baz', 2), level=[0, 'third'])
        A  B  C  D
second
three   5  3  5  3

xs : Series or DataFrame

class Fred2.Core.Result.TAPPredictionResult(data=None, index=None, columns=None, dtype=None, copy=False)

Bases: Fred2.Core.Result.AResult

A TAPPredictionResult object is a pandas.DataFrame with single indexing, where column IDs are the prediction names of the different prediction methods, and row IDs are the Peptide objects.

TAPPredictionResult:

Peptide Obj    Method Name
Peptide1       -15.34
Peptide2        23.34
T

Transpose index and columns

abs()

Return an object with absolute value taken. Only applicable to objects that are all numeric

abs: type of caller

add(other, axis='columns', level=None, fill_value=None)

Binary operator add with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
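The fill_value rule for the binary operators (add, truediv, and so on) is: labels are unioned, a missing value on one side is replaced by fill_value, and a location missing on both sides stays missing. A hypothetical stdlib-only sketch for addition on label -> value dicts, not the pandas implementation:

```python
import math
from typing import Dict, Optional

Series = Dict[str, float]  # hypothetical stand-in: label -> value


def add_sketch(left: Series, right: Series, fill_value: Optional[float] = None) -> Series:
    """Label-aligned addition; mismatched labels are unioned together."""
    nan = float("nan")
    result = {}
    for key in sorted(set(left) | set(right)):
        a, b = left.get(key, nan), right.get(key, nan)
        if fill_value is not None:
            # Substitute fill_value only when exactly one side is missing
            if math.isnan(a) and not math.isnan(b):
                a = fill_value
            elif math.isnan(b) and not math.isnan(a):
                b = fill_value
        result[key] = a + b  # NaN propagates if both sides are missing
    return result


print(add_sketch({"a": 1.0, "b": 2.0}, {"b": 10.0, "c": 5.0}, fill_value=0.0))
```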

add_prefix(prefix)

Concatenate prefix string with panel items names.

prefix : string

with_prefix : type of caller

add_suffix(suffix)

Concatenate suffix string with panel items names

suffix : string

with_suffix : type of caller

align(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0)

Align two objects on their axes with the specified join method for each axis Index

other : DataFrame or Series
join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
axis : allowed axis of the other object, default None
Align on index (0), columns (1), or both (None)
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level
copy : boolean, default True
Always returns new objects. If copy=False and no reindexing is required then original objects are returned.
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any “compatible” value

method : str, default None
limit : int, default None
fill_axis : {0, 1}, default 0
Filling axis, method and limit
(left, right) : (type of input, type of other)
Aligned objects
all(axis=None, bool_only=None, skipna=True, level=None, **kwargs)

Return whether all elements are True over requested axis.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
bool_only : boolean, default None
Only include boolean data.

all : Series (or DataFrame if level specified)

any(axis=None, bool_only=None, skipna=True, level=None, **kwargs)

Return whether any element is True over requested axis.

axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
bool_only : boolean, default None
Only include boolean data.

any : Series (or DataFrame if level specified)

append(other, ignore_index=False, verify_integrity=False)

Append rows of other to the end of this frame, returning a new object. Columns not in this frame are added as new columns.

other : DataFrame or list of Series/dict-like objects
ignore_index : boolean, default False
If True do not use the index labels. Useful for gluing together record arrays
verify_integrity : boolean, default False
If True, raise ValueError on creating index with duplicates

If a list of dict is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged

appended : DataFrame

apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)

Applies function along input axis of DataFrame.

Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty.

func : function
Function to apply to each column/row
axis : {0, 1}
  • 0 : apply function to each column
  • 1 : apply function to each row
broadcast : boolean, default False
For aggregation functions, return object of same size with values propagated
reduce : boolean or None, default None
Try to apply reduction procedures. If the DataFrame is empty, apply will use reduce to determine whether the result should be a Series or a DataFrame. If reduce is None (the default), apply’s return value will be guessed by calling func on an empty Series (note: while guessing, exceptions raised by func will be ignored). If reduce is True a Series will always be returned, and if False a DataFrame will always be returned.
raw : boolean, default False
If False, convert each row or column into a Series. If raw=True the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance
args : tuple
Positional arguments to pass to function in addition to the array/series

Additional keyword arguments will be passed as keywords to the function

>>> import numpy
>>> df.apply(numpy.sqrt) # returns DataFrame
>>> df.apply(numpy.sum, axis=0) # equiv to df.sum(0)
>>> df.apply(numpy.sum, axis=1) # equiv to df.sum(1)

DataFrame.applymap: For elementwise operations

applied : Series or DataFrame

applymap(func)

Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame

func : function
Python function, returns a single value from a single value

applied : DataFrame

DataFrame.apply : For operations on rows/columns

as_blocks(columns=None)

Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype.

Columns are presented in sorted order unless a specific list of columns is provided.

NOTE: the dtypes of the blocks WILL BE PRESERVED HERE (unlike in as_matrix)
columns : array-like
Specific column order

values : a list of Object

as_matrix(columns=None)

Convert the frame to its Numpy-array matrix representation. Columns are presented in sorted order unless a specific list of columns is provided.

NOTE: the dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say, if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.

e.g. if the dtypes are float16 and float32, the result dtype is float32; float16, float32 and float64 give float64; int32 and uint8 give int32.
values : ndarray
If the caller is heterogeneous and contains booleans or objects, the result will be of dtype=object
asfreq(freq, method=None, how=None, normalize=False)

Convert all TimeSeries inside to specified frequency using DateOffset objects. Optionally provide fill method to pad/backfill missing values.

freq : DateOffset object, or string
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}
Method to use for filling holes in reindexed Series:
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill
how : {‘start’, ‘end’}, default end
For PeriodIndex only, see PeriodIndex.asfreq
normalize : bool, default False
Whether to reset output index to midnight

converted : type of caller
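When converting to a higher frequency, the new slots start empty and method decides how they are filled. A hypothetical stdlib-only sketch of the pad/ffill and backfill/bfill rules on a list where None marks a hole; it illustrates the fill semantics only, not DateOffset handling.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def fill_holes(values: List[Optional[T]], method: Optional[str]) -> List[Optional[T]]:
    """Fill None holes: 'pad'/'ffill' carries the last valid value forward,
    'backfill'/'bfill' pulls the next valid value backward, None leaves holes."""
    if method is None:
        return list(values)
    if method in ("pad", "ffill"):
        order = range(len(values))
    elif method in ("backfill", "bfill"):
        order = range(len(values) - 1, -1, -1)
    else:
        raise ValueError(f"unknown fill method: {method!r}")
    out = list(values)
    last = None
    for i in order:
        if out[i] is None:
            out[i] = last  # stays None if no valid value has been seen yet
        else:
            last = out[i]
    return out


# Hypothetical daily values upsampled to 12-hour steps: new slots start empty
upsampled = [1.0, None, 2.0, None, 3.0]
print(fill_holes(upsampled, "pad"))    # [1.0, 1.0, 2.0, 2.0, 3.0]
print(fill_holes(upsampled, "bfill"))  # [1.0, 2.0, 2.0, 3.0, 3.0]
```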

astype(dtype, copy=True, raise_on_error=True)

Cast object to input numpy.dtype. Return a copy when copy = True (be really careful with this!)

dtype : numpy.dtype or Python type
raise_on_error : raise on invalid input

casted : type of caller

at
at_time(time, asof=False)

Select values at particular time of day (e.g. 9:30AM)

time : datetime.time or string

values_at_time : type of caller

axes
between_time(start_time, end_time, include_start=True, include_end=True)

Select values between particular times of the day (e.g., 9:00-9:30 AM)

start_time : datetime.time or string
end_time : datetime.time or string
include_start : boolean, default True
include_end : boolean, default True

values_between_time : type of caller
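Only the time-of-day part of each timestamp matters here, and include_start / include_end decide whether the boundaries count. A hypothetical stdlib-only sketch on a list of datetimes; treating start_time > end_time as a window that wraps past midnight is an assumption of this sketch.

```python
from datetime import datetime, time
from typing import List


def between_time_sketch(stamps: List[datetime], start_time: time, end_time: time,
                        include_start: bool = True, include_end: bool = True) -> List[datetime]:
    """Keep timestamps whose time of day lies between start_time and end_time."""
    def after_start(t: time) -> bool:
        return t >= start_time if include_start else t > start_time

    def before_end(t: time) -> bool:
        return t <= end_time if include_end else t < end_time

    kept = []
    for ts in stamps:
        t = ts.time()  # the date part is ignored
        if start_time <= end_time:
            ok = after_start(t) and before_end(t)
        else:
            # Assumed wrap-around window, e.g. 23:00-01:00
            ok = after_start(t) or before_end(t)
        if ok:
            kept.append(ts)
    return kept


day = datetime(2014, 1, 2)
stamps = [day.replace(hour=9, minute=m) for m in (0, 15, 30)] + [day.replace(hour=10)]
print(between_time_sketch(stamps, time(9, 0), time(9, 30), include_end=False))
```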

bfill(axis=0, inplace=False, limit=None, downcast=None)

Synonym for NDFrame.fillna(method='bfill')

blocks

Internal property, property synonym for as_blocks()

bool()

Return the bool of a single-element PandasObject. This must be a boolean scalar value, either True or False

Raise a ValueError if the PandasObject does not have exactly 1 element, or that element is not boolean

boxplot(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, **kwds)

Make a box plot from DataFrame column/columns optionally grouped (stratified) by one or more columns

data : DataFrame
column : column names or list of names, or vector
Can be any valid input to groupby
by : string or sequence
Column in the DataFrame to group by

ax : matplotlib axis object, default None
fontsize : int or string
rot : int, default 0
Rotation for ticks
grid : boolean, default True
Axis grid lines

ax : matplotlib.axes.AxesSubplot

clip(lower=None, upper=None, out=None)

Trim values at input threshold(s)

lower : float, default None
upper : float, default None

clipped : Series
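Clipping trims each value into the interval [lower, upper], with a missing bound meaning no limit on that side. A hypothetical stdlib-only sketch on a flat list, not the pandas implementation:

```python
from typing import List, Optional


def clip_sketch(values: List[float], lower: Optional[float] = None,
                upper: Optional[float] = None) -> List[float]:
    """Trim each value to [lower, upper]; a None bound means no limit on that side."""
    out = []
    for v in values:
        # NaN compares False against everything, so it passes through unchanged
        if lower is not None and v < lower:
            v = lower
        if upper is not None and v > upper:
            v = upper
        out.append(v)
    return out


print(clip_sketch([-5, 0, 5, 10], lower=0, upper=6))  # [0, 0, 5, 6]
```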

clip_lower(threshold)

Return copy of the input with values below given value truncated

clip

clipped : same type as input

clip_upper(threshold)

Return copy of input with values above given value truncated

clip

clipped : same type as input

combine(other, func, fill_value=None, overwrite=True)

Combine two DataFrame objects column-wise by applying func to each pair of aligned columns. If fill_value is given, missing (NaN) values in either frame are replaced by it before func is applied.

other : DataFrame
func : function
fill_value : scalar value
overwrite : boolean, default True
If True then overwrite values for common keys in the calling frame

result : DataFrame

combineAdd(other)

Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)

other : DataFrame

DataFrame

combineMult(other)

Multiply two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)

other : DataFrame

DataFrame

combine_first(other)

Combine two DataFrame objects and default to non-null values in the frame calling the method. The result's index and columns will be the union of the respective indexes and columns

other : DataFrame

a’s values prioritized, use values from b to fill holes:

>>> a.combine_first(b)

combined : DataFrame
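In a.combine_first(b), the labels of both frames are unioned, a's non-null values win, and b only fills a's holes. A hypothetical stdlib-only sketch on dict-of-dicts stand-ins, where None marks a missing cell; it is not the pandas implementation.

```python
import math
from typing import Dict, Optional

Frame = Dict[str, Dict[str, Optional[float]]]  # hypothetical: column -> {row -> value}


def is_na(v) -> bool:
    return v is None or (isinstance(v, float) and math.isnan(v))


def combine_first_sketch(a: Frame, b: Frame) -> Frame:
    """Prefer a's non-null values; fill a's holes from b. Labels are unioned."""
    columns = sorted(set(a) | set(b))
    rows = sorted({r for frame in (a, b) for col in frame.values() for r in col})
    result: Frame = {}
    for c in columns:
        result[c] = {}
        for r in rows:
            va = a.get(c, {}).get(r)
            # Fall back to b; the cell stays None if b lacks it too
            result[c][r] = b.get(c, {}).get(r) if is_na(va) else va
    return result


a = {"x": {"r1": 1.0, "r2": None}}
b = {"x": {"r1": 9.0, "r2": 2.0, "r3": 3.0}, "y": {"r1": 4.0}}
print(combine_first_sketch(a, b))
```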

compound(axis=None, skipna=None, level=None, **kwargs)

Return the compound percentage of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

compounded : Series or DataFrame (if level specified)
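Compounding periodic returns means multiplying the growth factors and subtracting one; as far as we know, this pandas release defines compound as (1 + x).prod() - 1. A hypothetical stdlib-only sketch of that formula for one column, skipping NaN as with skipna=True:

```python
import math
from typing import Iterable


def compound_sketch(returns: Iterable[float]) -> float:
    """Compound periodic returns: prod(1 + r) - 1, skipping NaN values."""
    growth = 1.0
    for r in returns:
        if isinstance(r, float) and math.isnan(r):
            continue  # skipna=True behavior
        growth *= 1.0 + r
    return growth - 1.0


print(compound_sketch([0.10, -0.05, 0.02]))  # about 0.0659
```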

consolidate(inplace=False)

Compute NDFrame with “consolidated” internals (data of each dtype grouped together in a single ndarray). Mainly an internal API function, but available here to the savvy user

inplace : boolean, default False
If False return new object, otherwise modify existing object

consolidated : type of caller

convert_objects(convert_dates=True, convert_numeric=False, convert_timedeltas=True, copy=True)

Attempt to infer better dtype for object columns

convert_dates : if True, attempt to soft convert dates, if ‘coerce’,
force conversion (and non-convertibles get NaT)
convert_numeric : if True attempt to coerce to numbers (including
strings), non-convertibles get NaN
convert_timedeltas : if True, attempt to soft convert timedeltas, if ‘coerce’,
force conversion (and non-convertibles get NaT)

copy : Boolean, if True, return copy, default is True

converted : same as input object

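convert_objects is deprecated in newer pandas releases. Below is a sketch of the usual replacements, assuming a recent pandas version and made-up data:

- pd.to_numeric(..., errors="coerce") plays the role of convert_numeric=True.
- DataFrame.infer_objects() soft-converts object columns.

```python
import pandas as pd

raw = pd.DataFrame({"score": ["1", "2.5", "n/a"], "count": [1, 2, 3]}, dtype=object)

# Like convert_numeric=True: strings become numbers, non-convertibles become NaN
scores = pd.to_numeric(raw["score"], errors="coerce")   # 1.0, 2.5, NaN

# Soft conversion: an object column holding only ints gets an integer dtype
inferred = raw.infer_objects()
```
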
copy(deep=True)

Make a copy of this object

deep : boolean, default True
Make a deep copy, i.e. also copy data

copy : type of caller

corr(method='pearson', min_periods=1)

Compute pairwise correlation of columns, excluding NA/null values

method : {‘pearson’, ‘kendall’, ‘spearman’}
  • pearson : standard correlation coefficient
  • kendall : Kendall Tau correlation coefficient
  • spearman : Spearman rank correlation
min_periods : int, optional
Minimum number of observations required per pair of columns to have a valid result. Currently only available for pearson and spearman correlation

y : DataFrame

corrwith(other, axis=0, drop=False)

Compute pairwise correlation between rows or columns of two DataFrame objects.

other : DataFrame
axis : {0, 1}

0 to compute column-wise, 1 for row-wise
drop : boolean, default False
Drop missing indices from result, default returns union of all

correls : Series

count(axis=0, level=None, numeric_only=False)

Return Series with number of non-NA/null observations over requested axis. Works with non-floating point data as well (detects NaN and None)

axis : {0, 1}
0 to count the non-NA values of each column, 1 to count them for each row
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
numeric_only : boolean, default False
Include only float, int, boolean data

count : Series (or DataFrame if level specified)
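
A quick check of which values count as missing, using made-up data and assuming a recent pandas release. Both NaN and None are excluded, and axis selects per-column or per-row counts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": ["x", None, "z"],      # None counts as missing, too
    "c": [np.nan, np.nan, 1.0],
})

per_column = df.count()        # a: 2, b: 2, c: 1
per_row = df.count(axis=1)     # 0: 2, 1: 0, 2: 3
```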

cov(min_periods=None)

Compute pairwise covariance of columns, excluding NA/null values

min_periods : int, optional
Minimum number of observations required per pair of columns to have a valid result.

y : DataFrame

y contains the covariance matrix of the DataFrame’s time series. The covariance is normalized by N-1 (unbiased estimator).
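
A small numeric check of corr and cov on made-up columns, assuming a recent pandas release. Column b is exactly 2 * a and c falls as a rises, so the correlations are +1 and -1. The variance of a uses the N-1 normalization: the squared deviations sum to 5 over 4 values, giving 5/3.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],   # b = 2 * a
    "c": [4.0, 3.0, 2.0, 1.0],   # c decreases as a increases
})

pearson = df.corr()                      # a-b: 1.0, a-c: -1.0
spearman = df.corr(method="spearman")    # rank-based; same signs here
covariance = df.cov()                    # var(a) = 5/3, cov(a, b) = 10/3
```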

cummax(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative max over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA

cummax : Series or DataFrame (same type as caller)

cummin(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative min over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA

cummin : Series or DataFrame (same type as caller)

cumprod(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative prod over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA

cumprod : Series or DataFrame (same type as caller)

cumsum(axis=None, dtype=None, out=None, skipna=True, **kwargs)

Return cumulative sum over requested axis.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA

cumsum : Series or DataFrame (same type as caller)

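A sketch of the cumulative methods on a made-up frame, assuming a recent pandas release. The result has the same shape as the input. With the default skipna=True, a NaN stays in place and accumulation continues after it. With skipna=False, the NaN propagates to every later position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, 2.0, 5.0], "y": [3, 1, 4, 1]})

running_sum = df.cumsum()    # x: 1, NaN, 3, 8   y: 3, 4, 8, 9
running_max = df.cummax()    # x: 1, NaN, 2, 5   y: 3, 3, 4, 4

# With skipna=False the NaN propagates forward
strict_sum = df["x"].cumsum(skipna=False)   # 1, NaN, NaN, NaN
```
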
delevel(*args, **kwargs)
describe(percentile_width=50)

Generate various summary statistics of each column, excluding NaN values. These include: count, mean, std, min, max, and lower%/50%/upper% percentiles

percentile_width : float, optional
width of the desired uncertainty interval, default is 50, which corresponds to lower=25, upper=75

DataFrame of summary statistics

diff(periods=1)

First discrete difference of the object (each element minus the element periods positions earlier)

periods : int, default 1
Periods to shift for forming difference

diffed : DataFrame
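
A small example of periods, assuming a recent pandas release. The first periods positions have no earlier element, so they become NaN.

```python
import pandas as pd

s = pd.Series([1, 4, 9, 16])

first = s.diff()            # NaN, 3, 5, 7    (s[i] - s[i-1])
second = s.diff(periods=2)  # NaN, NaN, 8, 12 (s[i] - s[i-2])
```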

div(other, axis='columns', level=None, fill_value=None)

Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
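
A sketch of how fill_value and axis behave for the flexible arithmetic methods, using made-up data and assuming a recent pandas release. The same pattern applies to div/divide, floordiv, mod and mul. fill_value replaces a value that is missing in only one input; if both inputs are missing, the result stays NaN.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"x": [1.0, np.nan, 3.0, np.nan]})
b = pd.DataFrame({"x": [2.0, 4.0, np.nan, np.nan]})

plain = a.div(b)                   # 0.5, NaN, NaN, NaN
filled = a.div(b, fill_value=1.0)  # 0.5, 0.25, 3.0, NaN (both inputs missing)

# With a Series and axis=0, each row is divided by the matching Series element
df = pd.DataFrame({"p": [2.0, 4.0], "q": [6.0, 8.0]})
per_row = df.div(pd.Series([2.0, 4.0]), axis=0)   # p: 1, 1   q: 3, 2
```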

divide(other, axis='columns', level=None, fill_value=None)

Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

dot(other)

Matrix multiplication with DataFrame or Series objects

other : DataFrame or Series

dot_product : DataFrame or Series
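
A minimal matrix product, assuming a recent pandas release. The columns of the calling frame are matched against the index of other by label before multiplying. If the labels do not match, pandas raises a ValueError.

```python
import pandas as pd

left = pd.DataFrame([[1, 2], [3, 4]], columns=["p", "q"])
right = pd.DataFrame([[5], [6]], index=["p", "q"], columns=["r"])

# left's columns (p, q) are aligned with right's index (p, q)
product = left.dot(right)   # r: 1*5 + 2*6 = 17, 3*5 + 4*6 = 39
```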

drop(labels, axis=0, level=None, inplace=False, **kwargs)

Return new object with labels in requested axis removed

labels : single label or list-like
axis : int or axis name
level : int or name, default None

For MultiIndex
inplace : bool, default False
If True, do operation inplace and return None.

dropped : type of caller

drop_duplicates(cols=None, take_last=False, inplace=False)

Return DataFrame with duplicate rows removed, optionally only considering certain columns

cols : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns
take_last : boolean, default False
If True, keep the last occurrence of each set of duplicate rows; by default, the first occurrence is kept
inplace : boolean, default False
Whether to drop duplicates in place or to return a copy

deduplicated : DataFrame
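
In newer pandas releases, the cols and take_last keywords of this signature were replaced by subset and keep. The sketch below uses the newer names with made-up data, assuming a recent pandas version.

```python
import pandas as pd

df = pd.DataFrame({"peptide": ["SYFPEITHI", "SYFPEITHI", "LLFGYPVYV"],
                   "score": [0.9, 0.7, 0.4]})

# Keep the first row per peptide (take_last=False in the signature above)
first_kept = df.drop_duplicates(subset=["peptide"])               # scores 0.9, 0.4
# Keep the last row per peptide (take_last=True in the signature above)
last_kept = df.drop_duplicates(subset=["peptide"], keep="last")   # scores 0.7, 0.4

# duplicated() marks the rows that drop_duplicates() would remove
mask = df.duplicated(subset=["peptide"])   # False, True, False
```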

dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Return object with labels on given axis omitted where alternately any or all of the data are missing

axis : {0, 1}, or tuple/list thereof
Pass tuple or list to drop on multiple axes
how : {‘any’, ‘all’}
  • any : if any NA values are present, drop that label
  • all : if all values are NA, drop that label
thresh : int, default None
int value : require that many non-NA values
subset : array-like
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include
inplace : boolean, default False
If True, do operation inplace and return None.

dropped : DataFrame
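
The sketch below compares how, thresh and subset on made-up data, assuming a recent pandas release. Newer versions do not allow how and thresh together, so each call uses only one of them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [3.0, np.nan, np.nan],
})

any_na = df.dropna()                 # keeps row 0 only
all_na = df.dropna(how="all")        # keeps rows 0 and 1
at_least_two = df.dropna(thresh=2)   # keeps row 0 (row 1 has one non-NA value)
by_b = df.dropna(subset=["b"])       # only column b is checked: rows 0 and 1
```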

dtypes

Return the dtypes in this object

duplicated(cols=None, take_last=False)

Return boolean Series denoting duplicate rows, optionally only considering certain columns

cols : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns
take_last : boolean, default False
If True, mark all duplicate rows except the last occurrence; by default, all except the first occurrence are marked

duplicated : Series

empty

True if NDFrame is entirely empty [no items]

eq(other, axis='columns', level=None)

Wrapper for flexible comparison methods eq

equals(other)

Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.

eval(expr, **kwargs)

Evaluate an expression in the context of the calling DataFrame instance.

expr : string
The expression string to evaluate.
kwargs : dict
See the documentation for eval() for complete details on the keyword arguments accepted by query().

ret : ndarray, scalar, or pandas object

See also: pandas.DataFrame.query, pandas.eval

For more details see the API documentation for eval(). For detailed examples see enhancing performance with eval.

>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.eval('a + b')
>>> df.eval('c=a + b')
ffill(axis=0, inplace=False, limit=None, downcast=None)

Synonym for NDFrame.fillna(method='ffill')

fillna(value=None, method=None, axis=0, inplace=False, limit=None, downcast=None)

Fill NA/NaN values using the specified method

method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed Series:
  • pad / ffill: propagate the last valid observation forward to the next valid one
  • backfill / bfill: use the next valid observation to fill the gap
value : scalar, dict, or Series
Value to use to fill holes (e.g. 0), alternately a dict/Series of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series will not be filled). This value cannot be a list.
axis : {0, 1}, default 0
  • 0: fill column-by-column
  • 1: fill row-by-row
inplace : boolean, default False
If True, fill in place. Note: this will modify any other views on this object, (e.g. a no-copy slice for a column in a DataFrame).
limit : int, default None
Maximum size gap to forward or backward fill
downcast : dict, default is None
a dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible)

See also: reindex, asfreq

filled : same type as caller
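
A sketch of a per-column value dict and of limit, using made-up data. It assumes a recent pandas release, where the method argument of fillna is deprecated in favour of ffill() and bfill(). Columns missing from the dict are left untouched, and limit caps how many consecutive NaNs are filled in each gap.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, np.nan, 4.0],
                   "b": [np.nan, 2.0, np.nan, np.nan]})

# Per-column fill values; column b is not in the dict and is left as-is
by_column = df.fillna({"a": 0.0})     # a: 1, 0, 0, 4

# Forward fill, filling at most one consecutive NaN per gap
forward = df.ffill(limit=1)           # a: 1, 1, NaN, 4   b: NaN, 2, 2, NaN
```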

filter(items=None, like=None, regex=None, axis=None)

Restrict the info axis to set of items or wildcard

items : list-like
List of info axis labels to restrict to (they need not all be present)
like : string
Keep info axis where “arg in col == True”
regex : string (regular expression)
Keep info axis with re.search(regex, col) == True

Arguments are mutually exclusive, but this is not checked for
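
The three selection modes on a made-up frame, assuming a recent pandas release. The modes are items (exact labels, missing ones ignored), like (substring) and regex (re.search).

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=["mouse", "rabbit", "tiger"])

picked = df.filter(items=["mouse", "tiger", "lion"])  # "lion" is absent and ignored
liked = df.filter(like="bbi")                         # substring match: rabbit
matched = df.filter(regex="e$")                       # labels ending in "e": mouse
```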

filter_result(expressions)

Filters a result data frame based on a specified expression consisting of a list of triples (method_name, comparator, threshold). The expression is applied to each row. If any of the columns fulfills the criteria, the row remains.

Parameters:expressions (list((str,comparator,float))) – A list of triples consisting of (method_name, comparator, threshold)
Returns:A new filtered result object
Return type:TAPPredictionResult
first(offset)

Convenience method for subsetting initial periods of time series data based on a date offset

offset : string, DateOffset, dateutil.relativedelta

ts.first('10D') -> First 10 days

subset : type of caller

first_valid_index()

Return label for first non-NA/null value

floordiv(other, axis='columns', level=None, fill_value=None)

Binary operator floordiv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}

For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

from_csv(path, header=0, sep=',', index_col=0, parse_dates=True, encoding=None, tupleize_cols=False, infer_datetime_format=False)

Read delimited file into DataFrame

path : string file path or file handle / StringIO
header : int, default 0
Row to use as header (skip prior rows)
sep : string, default ‘,’
Field delimiter
index_col : int or sequence, default 0
Column to use for index. If a sequence is given, a MultiIndex is used. Different default from read_table
parse_dates : boolean, default True
Parse dates. Different default from read_table
tupleize_cols : boolean, default False
Write MultiIndex columns as a list of tuples (if True) or in the new, expanded format (if False)
infer_datetime_format : boolean, default False
If True and parse_dates is True for a column, try to infer the datetime format based on the first datetime string. If the format can be inferred, there often will be a large parsing speed-up.

Preferable to use read_table for most general purposes but from_csv makes for an easy roundtrip to and from file, especially with a DataFrame of time series data

y : DataFrame

from_dict(data, orient='columns', dtype=None)

Construct DataFrame from dict of array-like or dicts

data : dict
{field : array-like} or {field : dict}
orient : {‘columns’, ‘index’}, default ‘columns’
The “orientation” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.

DataFrame
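
The effect of orient on a nested dict with made-up data, assuming a recent pandas release.

```python
import pandas as pd

data = {"row1": {"a": 1, "b": 2}, "row2": {"a": 3, "b": 4}}

by_columns = pd.DataFrame.from_dict(data)                # outer keys become columns
by_index = pd.DataFrame.from_dict(data, orient="index")  # outer keys become rows
```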

from_items(items, columns=None, orient='columns')

Convert (key, value) pairs to DataFrame. The keys will be the axis index (usually the columns, but depends on the specified orientation). The values should be arrays or Series.

items : sequence of (key, value) pairs
Values should be arrays or Series.
columns : sequence of column labels, optional
Must be passed if orient='index'.
orient : {‘columns’, ‘index’}, default ‘columns’
The “orientation” of the data. If the keys of the input correspond to column labels, pass ‘columns’ (default). Otherwise if the keys correspond to the index, pass ‘index’.

frame : DataFrame

from_records(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)

Convert structured or record ndarray to DataFrame

data : ndarray (structured dtype), list of tuples, dict, or DataFrame
index : string, list of fields, array-like

Field of array to use as the index, alternately a specific set of input labels to use
exclude : sequence, default None
Columns or fields to exclude
columns : sequence, default None
Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns)
coerce_float : boolean, default False
Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets

df : DataFrame
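
Building a frame from a list of tuples (made-up records), assuming a recent pandas release. columns names the fields, and index promotes one of them to the index, removing it from the columns.

```python
import pandas as pd

records = [(1, "SYFPEITHI", 0.9), (2, "LLFGYPVYV", 0.4)]

df = pd.DataFrame.from_records(records, columns=["id", "peptide", "score"], index="id")
```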

ftypes

Return the ftypes (indication of sparse/dense and dtype) in this object.

ge(other, axis='columns', level=None)

Wrapper for flexible comparison methods ge

get(key, default=None)

Get item from object for given key (DataFrame column, Panel slice, etc.). Returns default value if not found

key : object

value : type of items contained in object

get_dtype_counts()

Return the counts of dtypes in this object

get_ftype_counts()

Return the counts of ftypes in this object

get_value(index, col)

Quickly retrieve single value at passed column and index

index : row label
col : column label

value : scalar value

get_values()

Same as values (but handles sparseness conversions)

groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False)

Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns

by : mapping function / list of functions, dict, Series, or tuple / list of column names
Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups

axis : int, default 0
level : int, level name, or sequence of such, default None

If the axis is a MultiIndex (hierarchical), group by a particular level or levels
as_index : boolean, default True
For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output
sort : boolean, default True
Sort group keys. Get better performance by turning this off
group_keys : boolean, default True
When calling apply, add group keys to index to identify pieces
squeeze : boolean, default False
reduce the dimensionality of the return type if possible, otherwise return a consistent type

# DataFrame result
>>> data.groupby(func, axis=0).mean()

# Series result (a single column is selected)
>>> data.groupby(['col1', 'col2'])['col3'].mean()

# DataFrame with hierarchical index
>>> data.groupby(['col1', 'col2']).mean()

GroupBy object
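
A runnable variant of the patterns above with made-up data, assuming a recent pandas release. It also shows how as_index=False keeps the group labels as an ordinary column.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 3, 10]})

means = df.groupby("key")["val"].mean()                # Series: a -> 2.0, b -> 10.0
flat = df.groupby("key", as_index=False)["val"].sum()  # DataFrame with columns key, val
```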

gt(other, axis='columns', level=None)

Wrapper for flexible comparison methods gt

head(n=5)

Returns first n rows

hist(data, column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, **kwds)

Draw histogram of the DataFrame’s series using matplotlib / pylab.

data : DataFrame
column : string or sequence

If passed, will be used to limit data to a subset of columns
by : object, optional
If passed, then used to form histograms for separate groups
grid : boolean, default True
Whether to show axis grid lines
xlabelsize : int, default None
If specified changes the x-axis label size
xrot : float, default None
rotation of x axis labels
ylabelsize : int, default None
If specified changes the y-axis label size
yrot : float, default None
rotation of y axis labels

ax : matplotlib axes object, default None
sharex : bool, if True, the X axis will be shared amongst all subplots.
sharey : bool, if True, the Y axis will be shared amongst all subplots.
figsize : tuple
The size of the figure to create in inches by default
layout : (optional) a tuple (rows, columns) for the layout of the histograms
kwds : other plotting keyword arguments
To be passed to hist function
iat
icol(i)
idxmax(axis=0, skipna=True)

Return index of first occurrence of maximum over requested axis. NA/null values are excluded.

axis : {0, 1}
0 to find the maximum of each column, 1 to find the maximum of each row
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be first index.

idxmax : Series

This method is the DataFrame version of ndarray.argmax.

See also: Series.idxmax

idxmin(axis=0, skipna=True)

Return index of first occurrence of minimum over requested axis. NA/null values are excluded.

axis : {0, 1}
0 to find the minimum of each column, 1 to find the minimum of each row
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

idxmin : Series

This method is the DataFrame version of ndarray.argmin.

See also: Series.idxmin

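idxmax and idxmin return labels, not positions. A small example with a made-up frame, assuming a recent pandas release:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 3], "b": [9, 2, 7]}, index=["x", "y", "z"])

col_max = df.idxmax()         # index label of each column's maximum: a -> y, b -> x
col_min = df.idxmin()         # a -> x, b -> y
row_max = df.idxmax(axis=1)   # column label of each row's maximum: x -> b, y -> a, z -> b
```
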
iget_value(i, j)
iloc
info(verbose=True, buf=None, max_cols=None)

Concise summary of a DataFrame.

verbose : boolean, default True
If False, don’t print column count summary

buf : writable buffer, defaults to sys.stdout
max_cols : int, default None

Determines whether full summary or short summary is printed
insert(loc, column, value, allow_duplicates=False)

Insert column into DataFrame at specified location.

If allow_duplicates is False, raises Exception if column is already contained in the DataFrame.

loc : int
Must have 0 <= loc <= len(columns)

column : object
value : int, Series, or array-like

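A sketch of insert on a made-up frame, assuming a recent pandas release, where the duplicate-column error is a ValueError. insert modifies the frame in place and returns None.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "c": [5, 6]})

# Insert column "b" at position 1 (in place; returns None)
df.insert(1, "b", [3, 4])

# Re-inserting an existing label without allow_duplicates=True fails
duplicate_error = None
try:
    df.insert(0, "a", [0, 0])
except ValueError as err:
    duplicate_error = str(err)
```
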
interpolate(method='linear', axis=0, limit=None, inplace=False, downcast='infer', **kwargs)

Interpolate values according to different methods.

method : {‘linear’, ‘time’, ‘values’, ‘index’, ‘nearest’, ‘zero’,
‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘krogh’, ‘polynomial’, ‘spline’, ‘piecewise_polynomial’, ‘pchip’}
  • ‘linear’: ignore the index and treat the values as equally spaced. default
  • ‘time’: interpolation works on daily and higher resolution data to interpolate given length of interval
  • ‘index’: use the actual numerical values of the index
  • ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘polynomial’ are passed to scipy.interpolate.interp1d. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=4)
  • ‘krogh’, ‘piecewise_polynomial’, ‘spline’, and ‘pchip’ are all wrappers around the scipy interpolation methods of similar names. See the scipy documentation for more on their behavior: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
axis : {0, 1}, default 0
  • 0: fill column-by-column
  • 1: fill row-by-row
limit : int, default None.
Maximum number of consecutive NaNs to fill.
inplace : bool, default False
Update the NDFrame in place if possible.
downcast : optional, ‘infer’ or None, defaults to ‘infer’
Downcast dtypes if possible.

Series or DataFrame of same shape interpolated at the NaNs

See also: reindex, replace, fillna

# Filling in NaNs:
>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0    0
1    1
2    2
3    3
dtype: float64

irow(i, copy=False)
is_copy = None
isin(values)

Return boolean DataFrame showing whether each element in the DataFrame is contained in values.

values : iterable, Series, DataFrame or dictionary
The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dictionary, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.

DataFrame of booleans

When values is a list:

>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> df.isin([1, 3, 12, 'a'])
       A      B
0   True   True
1  False  False
2   True  False

When values is a dict:

>>> df = DataFrame({'A': [1, 2, 3], 'B': [1, 4, 7]})
>>> df.isin({'A': [1, 3], 'B': [4, 7, 12]})
       A      B
0   True  False  # Note that B didn't match the 1 here.
1  False   True
2   True   True

When values is a Series or DataFrame:

>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> other = DataFrame({'A': [1, 3, 3, 2], 'B': ['e', 'f', 'f', 'e']})
>>> df.isin(other)
       A      B
0   True  False
1  False  False  # Column A in `other` has a 3, but not at index 1.
2   True   True
isnull()

Return a boolean same-sized object indicating if the values are null

iteritems()

Iterator over (column, series) pairs

iterkv(*args, **kwargs)

iteritems alias used to get around 2to3. Deprecated

iterrows()

Iterate over rows of DataFrame as (index, Series) pairs.

  • iterrows does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). For example,

    >>> df = DataFrame([[1, 1.0]], columns=['x', 'y'])
    >>> row = next(df.iterrows())[1]
    >>> print(row['x'].dtype)
    float64
    >>> print(df['x'].dtype)
    int64
    
it : generator
A generator that iterates over the rows of the frame.
itertuples(index=True)

Iterate over rows of DataFrame as tuples, with index value as first element of the tuple
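
A side-by-side of iterrows and itertuples on a made-up frame, assuming a recent pandas release.

- iterrows yields each row as a Series, so mixed int/float rows are upcast to float64.
- itertuples yields namedtuples whose first field, Index, holds the row label.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [0.5, 1.5]}, index=["r1", "r2"])

# iterrows: (index, Series) pairs; the int column x is upcast in each row
label, row = next(df.iterrows())

# itertuples: namedtuples with the index first, as the Index field
pairs = [(t.Index, t.x) for t in df.itertuples()]   # [("r1", 1), ("r2", 2)]
```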

ix
join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

other : DataFrame, Series with name field set, or list of DataFrame
Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame
on : column name, tuple/list of column names, or array-like
Column(s) to use for joining, otherwise join on index. If multiple columns are given, the passed DataFrame must have a MultiIndex. Can pass an array as the join key if not already contained in the calling DataFrame. Like an Excel VLOOKUP operation
how : {‘left’, ‘right’, ‘outer’, ‘inner’}

How to handle indexes of the two objects. Default: ‘left’ for joining on index, None otherwise

  • left: use calling frame’s index
  • right: use input frame’s index
  • outer: form union of indexes
  • inner: use intersection of indexes
lsuffix : string
Suffix to use from left frame’s overlapping columns
rsuffix : string
Suffix to use from right frame’s overlapping columns
sort : boolean, default False
Order result DataFrame lexicographically by the join key. If False, preserves the index order of the calling (left) DataFrame

on, lsuffix, and rsuffix options are not supported when passing a list of DataFrame objects

joined : DataFrame
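
An index join between two made-up frames that share a column name, assuming a recent pandas release. The suffixes are required here, because pandas refuses to join overlapping column names without them.

```python
import pandas as pd

left = pd.DataFrame({"v": [1, 2]}, index=["a", "b"])
right = pd.DataFrame({"v": [3, 4]}, index=["b", "c"])

# Default how='left': keep left's index; unmatched right values become NaN
joined = left.join(right, lsuffix="_l", rsuffix="_r")              # index a, b
outer = left.join(right, how="outer", lsuffix="_l", rsuffix="_r")  # index a, b, c
```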

keys()

Get the ‘info axis’ (see Indexing for more)

This is index for Series, columns for DataFrame and major_axis for Panel.

kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased kurtosis over the requested axis, normalized by N-1

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

kurt : Series or DataFrame (if level specified)

kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased kurtosis over the requested axis, normalized by N-1

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

kurt : Series or DataFrame (if level specified)

last(offset)

Convenience method for subsetting final periods of time series data based on a date offset

offset : string, DateOffset, dateutil.relativedelta

ts.last(‘5M’) -> Last 5 months

subset : type of caller

last_valid_index()

Return label for last non-NA/null value

le(other, axis='columns', level=None)

Wrapper for flexible comparison methods le

load(path)

Deprecated. Use read_pickle instead.

loc
lookup(row_labels, col_labels)

Label-based “fancy indexing” function for DataFrame. Given equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair.

row_labels : sequence
The row labels to use for lookup
col_labels : sequence
The column labels to use for lookup

Akin to:

result = []
for row, col in zip(row_labels, col_labels):
    result.append(df.get_value(row, col))
values : ndarray
The found values
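
DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0. Below is a sketch of an equivalent using positional NumPy indexing, with made-up data. It assumes every label exists, because Index.get_indexer returns -1 for a missing label rather than raising.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])
row_labels = ["x", "z", "y"]
col_labels = ["b", "a", "b"]

# Translate labels to integer positions, then pick one element per (row, col) pair
rows = df.index.get_indexer(row_labels)
cols = df.columns.get_indexer(col_labels)
values = df.to_numpy()[rows, cols]      # array([4, 3, 5])

# The same result with the label-based scalar accessor .at
looped = np.array([df.at[r, c] for r, c in zip(row_labels, col_labels)])
```
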
lt(other, axis='columns', level=None)

Wrapper for flexible comparison methods lt

mad(axis=None, skipna=None, level=None, **kwargs)

Return the mean absolute deviation of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

mad : Series or DataFrame (if level specified)

mask(cond)

Returns a copy in which values are replaced with NaN wherever cond is True (the inverse of where)

cond : boolean NDFrame or array

wh: same as input

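The contrast between mask and where on a made-up Series, assuming a recent pandas release. mask blanks out the values where the condition holds; where keeps exactly those values.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

masked = s.mask(s > 2)    # 1, 2, NaN, NaN
kept = s.where(s > 2)     # NaN, NaN, 3, 4
```
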
max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

This method returns the maximum of the values in the object. If you want the index of the maximum, use idxmax, which is the equivalent of the numpy.ndarray method argmax.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

max : Series or DataFrame (if level specified)

mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the mean of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

mean : Series or DataFrame (if level specified)

median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the median of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

median : Series or DataFrame (if level specified)
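
The skipna and axis parameters shared by these reductions, on an all-numeric made-up frame. This assumes a recent pandas release, where non-numeric columns are no longer dropped silently.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, np.nan], "b": [1.0, 2.0, 9.0]})

means = df.mean()                  # a: 1.5 (NaN skipped), b: 4.0
strict = df.mean(skipna=False)     # a: NaN, b: 4.0
medians = df.median()              # a: 1.5, b: 2.0
row_max = df.max(axis=1)           # one value per row: 1.0, 2.0, 9.0
```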

merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True)

Merge DataFrame objects by performing a database-style join operation by columns or indexes.

If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on.

right : DataFrame
how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’

  • left: use only keys from left frame (SQL: left outer join)
  • right: use only keys from right frame (SQL: right outer join)
  • outer: use union of keys from both frames (SQL: full outer join)
  • inner: use intersection of keys from both frames (SQL: inner join)
on : label or list
Field names to join on. Must be found in both DataFrames. If on is None and not merging on indexes, then it merges on the intersection of the columns by default.
left_on : label or list, or array-like
Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns
right_on : label or list, or array-like
Field names to join on in right DataFrame or vector/list of vectors per left_on docs
left_index : boolean, default False
Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels
right_index : boolean, default False
Use the index from the right DataFrame as the join key. Same caveats as left_index
sort : boolean, default False
Sort the join keys lexicographically in the result DataFrame
suffixes : 2-length sequence (tuple, list, ...)
Suffix to apply to overlapping column names in the left and right side, respectively
copy : boolean, default True
If False, do not copy data unnecessarily
>>> A              >>> B
    lkey value         rkey value
0   foo  1         0   foo  5
1   bar  2         1   bar  6
2   baz  3         2   qux  7
3   foo  4         3   bar  8
>>> merge(A, B, left_on='lkey', right_on='rkey', how='outer')
   lkey  value_x  rkey  value_y
0  bar   2        bar   6
1  bar   2        bar   8
2  baz   3        NaN   NaN
3  foo   1        foo   5
4  foo   4        foo   5
5  NaN   NaN      qux   7

merged : DataFrame
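
A left merge on a shared key column with custom suffixes, using the method form and made-up data, assuming a recent pandas release. Every left row is kept in its original order, and right-hand values without a match become NaN.

```python
import pandas as pd

left = pd.DataFrame({"key": ["foo", "bar"], "value": [1, 2]})
right = pd.DataFrame({"key": ["foo", "baz"], "value": [5, 6]})

merged = left.merge(right, on="key", how="left", suffixes=("_left", "_right"))
```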

merge_results(others)

Merges results of type TAPPredictionResult and returns the merged result

Parameters:others (list(TAPPredictionResult) or TAPPredictionResult) – A (list of) TAPPredictionResult object(s)
Returns:A new merged TAPPredictionResult object
Return type:TAPPredictionResult
min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

This method returns the minimum of the values in the object. If you want the index of the minimum, use idxmin, which is the equivalent of the numpy.ndarray method argmin.

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

min : Series or DataFrame (if level specified)

mod(other, axis='columns', level=None, fill_value=None)

Binary operator mod with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
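The fill_value rule shared by mod and the other flexible arithmetic methods can be sketched on plain label-to-value dicts. This is a hypothetical helper (flex_binary_op, is_missing), not pandas code: labels from both inputs are unioned, a value missing on only one side is replaced by fill_value before the operation, and a label missing on both sides stays NaN.

```python
import math
import operator

NAN = float("nan")


def is_missing(x):
    """Treat None and float NaN as missing."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def flex_binary_op(left, right, op, fill_value=None):
    """Hypothetical sketch of the fill_value rule for label -> value mappings."""
    result = {}
    for label in sorted(set(left) | set(right)):  # mismatched labels are unioned
        a, b = left.get(label, NAN), right.get(label, NAN)
        if fill_value is not None and is_missing(a) != is_missing(b):
            # Exactly one side is missing: substitute fill_value for it.
            a = fill_value if is_missing(a) else a
            b = fill_value if is_missing(b) else b
        # Missing on both sides (or no fill_value given): the result stays missing.
        result[label] = NAN if is_missing(a) or is_missing(b) else op(a, b)
    return result


left = {"a": 7, "b": 9, "c": NAN}
right = {"a": 4, "c": NAN, "d": 5}
filled = flex_binary_op(left, right, operator.mod, fill_value=1)
unfilled = flex_binary_op(left, right, operator.mod)
```

Passing operator.mul, operator.pow or operator.sub gives the other methods; the r-prefixed variants (radd, rsub, rmod, ...) correspond to swapping the two operands.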

mode(axis=0, numeric_only=False)

Get the mode(s) of each element along the selected axis. The result is empty if no value occurs at least twice. Adds a row for each mode per label and fills the gaps with NaN.

axis : {0, 1, ‘index’, ‘columns’} (default 0)
  • 0/‘index’ : get mode of each column
  • 1/‘columns’ : get mode of each row
numeric_only : boolean, default False
If True, only apply to numeric columns

modes : DataFrame (sorted)
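The mode rules described above (sorted modes per column, an empty result when no value repeats, shorter columns padded with NaN) can be sketched in plain Python. The helpers column_modes and frame_mode are hypothetical and operate on a dict of column lists rather than a DataFrame.

```python
from collections import Counter

NAN = float("nan")


def column_modes(values):
    """Most frequent values of one column, sorted; empty if none occurs twice."""
    counts = Counter(v for v in values if v == v)  # v == v is False only for NaN
    top = max(counts.values(), default=0)
    if top < 2:
        return []
    return sorted(v for v, c in counts.items() if c == top)


def frame_mode(columns):
    """columns: dict of name -> list. Shorter mode lists are padded with NaN."""
    modes = {name: column_modes(vals) for name, vals in columns.items()}
    height = max((len(m) for m in modes.values()), default=0)
    return {name: m + [NAN] * (height - len(m)) for name, m in modes.items()}


result = frame_mode({"x": [1, 1, 2, 2, 3], "y": [5, 5, 6, 7, 8]})
```

Column x has two modes (1 and 2), so the result has two rows and y's single mode is followed by NaN.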

mul(other, axis='columns', level=None, fill_value=None)

Binary operator mul with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

multiply(other, axis='columns', level=None, fill_value=None)

Binary operator mul with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

ndim

Number of axes / array dimensions

ne(other, axis='columns', level=None)

Wrapper for flexible comparison methods ne

notnull()

Return a boolean same-sized object indicating if the values are not null

pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwds)

Percent change over given number of periods

periods : int, default 1
Periods to shift for forming percent change
fill_method : str, default ‘pad’
How to handle NAs before computing percent changes
limit : int, default None
The number of consecutive NAs to fill before stopping
freq : DateOffset, timedelta, or offset alias string, optional
Increment to use from time series API (e.g. ‘M’ or BDay())

chg : same type as caller

pivot(index=None, columns=None, values=None)

Reshape data (produce a “pivot” table) based on column values. Uses unique values from index / columns to form axes and return either DataFrame or Panel, depending on whether you request a single value column (DataFrame) or all columns (Panel)

index : string or object
Column name to use to make new frame’s index
columns : string or object
Column name to use to make new frame’s columns
values : string or object, optional
Column name to use for populating new frame’s values

For finer-tuned control, see hierarchical indexing documentation along with the related stack/unstack methods

>>> df
    foo   bar  baz
0   one   A    1.
1   one   B    2.
2   one   C    3.
3   two   A    4.
4   two   B    5.
5   two   C    6.
>>> df.pivot('foo', 'bar', 'baz')
     A   B   C
one  1   2   3
two  4   5   6
>>> df.pivot('foo', 'bar')['baz']
     A   B   C
one  1   2   3
two  4   5   6
pivoted : DataFrame
If no values column specified, will have hierarchically indexed columns
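The reshape performed by df.pivot('foo', 'bar', 'baz') above can be sketched with nested dicts: unique foo values become rows, unique bar values become columns, and baz fills the cells. The pivot function here is a hypothetical plain-Python stand-in. Because pivot does no aggregation, this sketch treats a repeated (index, column) pair as an error.

```python
def pivot(rows, index, columns, values):
    """Hypothetical sketch: reshape long records into {index_value: {column_value: value}}."""
    table = {}
    for row in rows:
        cells = table.setdefault(row[index], {})
        if row[columns] in cells:
            # No aggregation here; duplicated pairs need pivot_table instead.
            raise ValueError("duplicate entry for (%r, %r)" % (row[index], row[columns]))
        cells[row[columns]] = row[values]
    return table


records = [
    {"foo": f, "bar": b, "baz": z}
    for f, b, z in [("one", "A", 1.0), ("one", "B", 2.0), ("one", "C", 3.0),
                    ("two", "A", 4.0), ("two", "B", 5.0), ("two", "C", 6.0)]
]
wide = pivot(records, "foo", "bar", "baz")
```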
pivot_table(data, values=None, rows=None, cols=None, aggfunc='mean', fill_value=None, margins=False, dropna=True)

Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame

data : DataFrame
values : column to aggregate, optional
rows : list of column names or arrays to group on
Keys to group on the x-axis of the pivot table
cols : list of column names or arrays to group on
Keys to group on the y-axis of the pivot table
aggfunc : function, default numpy.mean, or list of functions
If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)
fill_value : scalar, default None
Value to replace missing values with
margins : boolean, default False
Add all row / columns (e.g. for subtotal / grand totals)
dropna : boolean, default True
Do not include columns whose entries are all NaN
>>> df
   A   B   C      D
0  foo one small  1
1  foo one large  2
2  foo one large  2
3  foo two small  3
4  foo two small  3
5  bar one large  4
6  bar one small  5
7  bar two small  6
8  bar two large  7
>>> table = pivot_table(df, values='D', rows=['A', 'B'],
...                     cols=['C'], aggfunc=np.sum)
>>> table
          small  large
foo  one  1      4
     two  6      NaN
bar  one  5      4
     two  6      7

table : DataFrame
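The aggregation in the pivot_table example can be reproduced with a plain-Python group-by. This is a hypothetical sketch: index_keys and column_key stand in for rows and cols, the default aggfunc is statistics.mean (standing in for numpy.mean), and combinations that never occur are simply absent instead of NaN.

```python
from collections import defaultdict
from statistics import mean


def pivot_table(rows, values, index_keys, column_key, aggfunc=mean):
    """Hypothetical sketch: group rows by (index keys, column key) and aggregate."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in index_keys)
        groups[key, row[column_key]].append(row[values])
    table = defaultdict(dict)
    for (key, col), vals in groups.items():
        table[key][col] = aggfunc(vals)
    return dict(table)


data = [
    ("foo", "one", "small", 1), ("foo", "one", "large", 2), ("foo", "one", "large", 2),
    ("foo", "two", "small", 3), ("foo", "two", "small", 3), ("bar", "one", "large", 4),
    ("bar", "one", "small", 5), ("bar", "two", "small", 6), ("bar", "two", "large", 7),
]
records = [dict(zip("ABCD", r)) for r in data]
table = pivot_table(records, values="D", index_keys=["A", "B"], column_key="C", aggfunc=sum)
```

The sums match the example table, including the missing large entry for (foo, two).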

plot(frame=None, x=None, y=None, subplots=False, sharex=True, sharey=False, use_index=True, figsize=None, grid=None, legend=True, rot=None, ax=None, style=None, title=None, xlim=None, ylim=None, logx=False, logy=False, xticks=None, yticks=None, kind='line', sort_columns=False, fontsize=None, secondary_y=False, **kwds)

Make line, bar, or scatter plots of DataFrame series with the index on the x-axis using matplotlib / pylab.

frame : DataFrame
x : label or position, default None
y : label or position, default None
Allows plotting of one column versus another
subplots : boolean, default False
Make separate subplots for each time series
sharex : boolean, default True
In case subplots=True, share x axis
sharey : boolean, default False
In case subplots=True, share y axis
use_index : boolean, default True
Use index as ticks for x axis
stacked : boolean, default False
If True, create stacked bar plot. Only valid for DataFrame input
sort_columns: boolean, default False
Sort column names to determine plot ordering
title : string
Title to use for the plot
grid : boolean, default None (matlab style default)
Axis grid lines
legend : boolean, default True
Place legend on axis subplots

ax : matplotlib axis object, default None
style : list or dict
matplotlib line style per column
kind : {‘line’, ‘bar’, ‘barh’, ‘kde’, ‘density’, ‘scatter’}
  • line : line plot (default)
  • bar : vertical bar plot
  • barh : horizontal bar plot
  • kde/density : Kernel Density Estimation plot
  • scatter : scatter plot
logx : boolean, default False
For line plots, use log scaling on x axis
logy : boolean, default False
For line plots, use log scaling on y axis
xticks : sequence
Values to use for the xticks
yticks : sequence
Values to use for the yticks

xlim : 2-tuple/list
ylim : 2-tuple/list
rot : int, default None
Rotation for ticks
secondary_y : boolean or sequence, default False
Whether to plot on the secondary y-axis. If a list/tuple, which columns to plot on the secondary y-axis
mark_right: boolean, default True
When using a secondary_y axis, should the legend label the axis of the various columns automatically
colormap : str or matplotlib colormap object, default None
Colormap to select colors from. If string, load colormap with that name from matplotlib.
kwds : keywords
Options to pass to matplotlib plotting method

ax_or_axes : matplotlib.AxesSubplot or list of them

pop(item)

Return item and drop from frame. Raise KeyError if not found.

pow(other, axis='columns', level=None, fill_value=None)

Binary operator pow with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

prod(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the product of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

prod : Series or DataFrame (if level specified)

product(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the product of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

prod : Series or DataFrame (if level specified)

quantile(q=0.5, axis=0, numeric_only=True)

Return values at the given quantile over the requested axis, à la scoreatpercentile in scipy.stats

q : quantile, default 0.5 (50% quantile)
0 <= q <= 1
axis : {0, 1}
0 for row-wise, 1 for column-wise

quantiles : Series
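Assuming the linear-interpolation rule used by scipy.stats.scoreatpercentile (its default 'fraction' method, which is also numpy's default percentile rule), a single quantile can be sketched as below. The quantile function is a hypothetical plain-Python stand-in that works on one list of numbers and skips NaN values.

```python
import math


def quantile(values, q=0.5):
    """Hypothetical sketch: linear-interpolation quantile of one column."""
    if not 0 <= q <= 1:
        raise ValueError("q must satisfy 0 <= q <= 1")
    data = sorted(v for v in values if not math.isnan(v))
    if not data:
        return float("nan")
    pos = q * (len(data) - 1)          # fractional position in the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)
```

For example, quantile([1, 2, 3, 4]) lands halfway between 2 and 3 and returns 2.5.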

query(expr, **kwargs)

Query the columns of a frame with a boolean expression.

expr : string
The query string to evaluate. The result of the evaluation of this expression is first passed to loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to __getitem__().
kwargs : dict
See the documentation for eval() for complete details on the keyword arguments accepted by query().

q : DataFrame or Series

This method uses the top-level eval() function to evaluate the passed query.

The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins, ‘and’ and ‘or’. This is syntactically valid Python; however, the semantics are different.

You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.

The index and columns attributes of the DataFrame instance are placed in the namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for this variable, and you can also use the name of the index to identify it in a query.

For further details and examples see the query documentation in indexing.

pandas.eval DataFrame.eval

>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.query('a > b')
>>> df[df.a > df.b]  # same result as the previous expression
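The precedence difference mentioned above can be checked without pandas. In plain Python, & binds more tightly than comparison operators, so the same text parses differently from what query() does with it by default. The variable names below are purely illustrative.

```python
a, b = 2, 1

# How query() reads 'a > 3 & b < 2' by default: & has the precedence of 'and',
# so this is (a > 3) & (b < 2).
intended = (a > 3) & (b < 2)

# The same text as plain Python: & binds tighter than comparisons, so it
# parses as the chained comparison a > (3 & b) < 2.
plain_python = a > 3 & b < 2
```

Here intended is False while plain_python is True, which is why expressions built outside query() need explicit parentheses around each comparison.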
radd(other, axis='columns', level=None, fill_value=None)

Binary operator radd with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rank(axis=0, numeric_only=None, method='average', na_option='keep', ascending=True)

Compute numerical data ranks (1 through n) along axis. Equal values are assigned a rank that is the average of the ranks of those values

axis : {0, 1}, default 0
Ranks over columns (0) or rows (1)
numeric_only : boolean, default None
Include only float, int, boolean data
method : {‘average’, ‘min’, ‘max’, ‘first’}
  • average: average rank of group
  • min: lowest rank in group
  • max: highest rank in group
  • first: ranks assigned in order they appear in the array
na_option : {‘keep’, ‘top’, ‘bottom’}
  • keep: leave NA values where they are
  • top: smallest rank if ascending
  • bottom: smallest rank if descending
ascending : boolean, default True
False for ranks by high (1) to low (N)

ranks : DataFrame
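The four tie-breaking methods can be illustrated with a small plain-Python ranker for one column. It is a hypothetical sketch that covers ascending order only and does not handle NA values (na_option is not modeled). It relies on Python's sort being stable, which is what makes 'first' assign ranks in order of appearance.

```python
def rank(values, method="average"):
    """Hypothetical sketch: 1-based ranks of one column, ascending, no NA handling."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # stable sort
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j hold tied values; their 1-based ranks are i+1..j+1.
        for k in range(i, j + 1):
            if method == "average":
                ranks[order[k]] = (i + 1 + j + 1) / 2
            elif method == "min":
                ranks[order[k]] = float(i + 1)
            elif method == "max":
                ranks[order[k]] = float(j + 1)
            elif method == "first":
                ranks[order[k]] = float(k + 1)
            else:
                raise ValueError("unknown method: %r" % method)
        i = j + 1
    return ranks
```

For [3, 1, 3, 2], the two 3s share ranks 3 and 4, so 'average' gives each of them 3.5.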

rdiv(other, axis='columns', level=None, fill_value=None)

Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

reindex(index=None, columns=None, **kwargs)

Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False

index, columns : array-like, optional (can be specified in order, or as keywords)
New labels / index to conform to. Preferably an Index object to avoid duplicating data
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed DataFrame:
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill gap
copy : boolean, default True
Return a new object, even if the passed indexes are the same
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any “compatible” value
limit : int, default None
Maximum size gap to forward or backward fill
takeable : boolean, default False
Treat the passed labels as positional values
>>> df.reindex(index=[date1, date2, date3], columns=['A', 'B', 'C'])

reindexed : DataFrame
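The pad/ffill and backfill/bfill rules can be sketched for a single sorted, unique index with the standard bisect module. This reindex function is a hypothetical plain-Python stand-in: labels already present keep their value, other labels take the nearest earlier (pad) or later (backfill) observation, and None stands in for NaN.

```python
from bisect import bisect_left, bisect_right


def reindex(old_index, old_values, new_index, method=None):
    """Hypothetical sketch; old_index must be sorted ascending and unique."""
    lookup = dict(zip(old_index, old_values))
    out = []
    for label in new_index:
        if label in lookup:
            out.append(lookup[label])
        elif method in ("pad", "ffill"):
            pos = bisect_right(old_index, label) - 1  # last old label <= label
            out.append(old_values[pos] if pos >= 0 else None)
        elif method in ("backfill", "bfill"):
            pos = bisect_left(old_index, label)  # first old label >= label
            out.append(old_values[pos] if pos < len(old_index) else None)
        else:
            out.append(None)  # no filling: the hole stays missing
    return out
```

With old labels [1, 3, 5] holding 'a', 'b', 'c', reindexing to [0, 1, 2, 4, 6] with method='pad' leaves 0 empty (nothing earlier to carry forward), while method='bfill' leaves 6 empty.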

reindex_axis(labels, axis=0, method=None, level=None, copy=True, limit=None, fill_value=nan)

Conform input object to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False

labels : array-like
New labels / index to conform to. Preferably an Index object to avoid duplicating data

axis : {0, 1, ‘index’, ‘columns’}
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed object:
  • pad / ffill: propagate last valid observation forward to next valid
  • backfill / bfill: use NEXT valid observation to fill gap
copy : boolean, default True
Return a new object, even if the passed indexes are the same
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level
limit : int, default None
Maximum size gap to forward or backward fill
>>> df.reindex_axis(['A', 'B', 'C'], axis=1)

reindex, reindex_like

reindexed : DataFrame

reindex_like(other, method=None, copy=True, limit=None)

Return an object whose indices match those of other

other : Object
method : string or None
copy : boolean, default True
limit : int, default None
Maximum size gap to forward or backward fill
Like calling s.reindex(index=other.index, columns=other.columns, method=...)

reindexed : same as input

rename(index=None, columns=None, **kwargs)

Alter axes using an input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.

index, columns : dict-like or function, optional
Transformation to apply to that axis values
copy : boolean, default True
Also copy underlying data
inplace : boolean, default False
Whether to modify the DataFrame in place instead of returning a new one. If True then the value of copy is ignored.

renamed : DataFrame (new object)
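The mapping rule described above (apply a function to every label, or look each label up in a dict and leave unmatched labels unchanged) can be sketched for one axis. The helper rename_labels is hypothetical, and its ValueError on non-unique results is this sketch's own way of enforcing the 1-to-1 requirement, not a documented pandas behavior.

```python
def rename_labels(labels, mapper):
    """Hypothetical sketch: rename one axis with a dict or a function."""
    if callable(mapper):
        new = [mapper(label) for label in labels]
    else:
        new = [mapper.get(label, label) for label in labels]  # unmatched labels kept
    if len(set(new)) != len(new):
        raise ValueError("renaming must be one-to-one")
    return new
```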

rename_axis(mapper, axis=0, copy=True, inplace=False)

Alter index and / or columns using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.

mapper : dict-like or function, optional
axis : int or string, default 0
copy : boolean, default True
Also copy underlying data
inplace : boolean, default False

renamed : type of caller

reorder_levels(order, axis=0)

Rearrange index levels using input order. May not drop or duplicate levels

order : list of int or list of str
List representing new level order. Reference level by number (position) or by key (label).
axis : int
Where to reorder levels.

type of caller (new object)

replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None)

Replace values given in ‘to_replace’ with ‘value’.

to_replace : str, regex, list, dict, Series, numeric, or None

  • str or regex:

    • str: string exactly matching to_replace will be replaced with value
    • regex: regexes matching to_replace will be replaced with value
  • list of str, regex, or numeric:

    • First, if to_replace and value are both lists, they must be the same length.
    • Second, if regex=True then all of the strings in both lists will be interpreted as regexes, otherwise they will match directly. This doesn’t matter much for value since there are only a few possible substitution regexes you can use.
    • str and regex rules apply as above.
  • dict:

    • Nested dictionaries, e.g., {‘a’: {‘b’: nan}}, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with nan. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.
    • Keys map to column names and values map to substitution values. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
  • None:

    • This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also None then this must be a nested dictionary or Series.

See the examples section for examples of each of these.

value : scalar, dict, list, str, regex, default None
Value to use to fill holes (e.g. 0), alternately a dict of values specifying which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.
inplace : boolean, default False
If True, perform the operation in place. Note: this will modify any other views on this object (e.g. a column from a DataFrame). Returns the caller if this is True.
limit : int, default None
Maximum size gap to forward or backward fill
regex : bool or same types as to_replace, default False
Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Alternatively, regex can itself be a regular expression or a list, dict, or array of regular expressions, in which case to_replace must be None.
method : string, optional, {‘pad’, ‘ffill’, ‘bfill’}
The method to use for replacement when to_replace is a list.

NDFrame.reindex NDFrame.asfreq NDFrame.fillna

filled : NDFrame

AssertionError
  • If regex is not a bool and to_replace is not None.
TypeError
  • If to_replace is a dict and value is not a list, dict, ndarray, or Series
  • If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.
ValueError
  • If to_replace and value are lists or ndarrays, but they are not the same length.
Notes:
  • Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.
  • Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.
  • This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
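The two regex notes above (substitution follows re.sub rules and only touches strings) can be illustrated on a plain list. The helper regex_replace is hypothetical and only mimics that behavior for a single column of mixed values.

```python
import re


def regex_replace(values, pattern, repl):
    """Hypothetical sketch: apply re.sub to string entries only; other objects pass through."""
    compiled = re.compile(pattern)
    return [compiled.sub(repl, v) if isinstance(v, str) else v for v in values]


converted = regex_replace(["1.5", 1.5, "a-b"], r"\.", ",")
```

The string '1.5' becomes '1,5', while the float 1.5 is left untouched even though its printed form would match the pattern.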
resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0)

Convenience method for frequency conversion and resampling of regular time-series data.

rule : string
the offset string or object representing target conversion
how : string
Method for down- or re-sampling; defaults to ‘mean’ for downsampling

axis : int, optional, default 0
fill_method : string, default None
fill_method for upsampling
closed : {‘right’, ‘left’}
Which side of bin interval is closed
label : {‘right’, ‘left’}
Which bin edge label to label bucket with

convention : {‘start’, ‘end’, ‘s’, ‘e’}
kind : “period”/“timestamp”
loffset : timedelta
Adjust the resampled time labels
limit : int, default None
Maximum size gap when reindexing with fill_method
base : int, default 0
For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0
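To show how rule, how and base interact when downsampling, here is a hypothetical plain-Python sketch on integer timestamps (think minutes). It assumes left-closed, left-labelled bins of width freq anchored at base and aggregates each bin with statistics.mean as the default how. Empty bins are simply skipped, whereas resample would emit them as NaN.

```python
from statistics import mean


def downsample(times, values, freq, base=0, how=mean):
    """Hypothetical sketch: left-closed, left-labelled bins of width freq, offset by base."""
    bins = {}
    for t, v in sorted(zip(times, values)):
        start = base + ((t - base) // freq) * freq  # left edge of the bin containing t
        bins.setdefault(start, []).append(v)
    return {start: how(vals) for start, vals in bins.items()}


five_min = downsample([0, 1, 2, 5, 6, 11], [1, 2, 3, 4, 5, 6], freq=5)
shifted = downsample([0, 1, 2, 5, 6, 11], [1, 2, 3, 4, 5, 6], freq=5, base=2)
```

With base=0 the bins start at 0, 5 and 10; with base=2 the same data falls into bins starting at -3, 2 and 7.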
reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')

For DataFrame with multi-level index, return new DataFrame with labeling information in the columns under the index names, defaulting to ‘level_0’, ‘level_1’, etc. if any are None. For a standard index, the index name will be used (if set), otherwise a default ‘index’ or ‘level_0’ (if ‘index’ is already taken) will be used.

level : int, str, tuple, or list, default None
Only remove the given levels from the index. Removes all levels by default
drop : boolean, default False
Do not try to insert index into dataframe columns. This resets the index to the default integer index.
inplace : boolean, default False
Modify the DataFrame in place (do not create a new object)
col_level : int or str, default 0
If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.
col_fill : object, default ‘’
If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.

resetted : DataFrame

rfloordiv(other, axis='columns', level=None, fill_value=None)

Binary operator rfloordiv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rmod(other, axis='columns', level=None, fill_value=None)

Binary operator rmod with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rmul(other, axis='columns', level=None, fill_value=None)

Binary operator rmul with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rpow(other, axis='columns', level=None, fill_value=None)

Binary operator rpow with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rsub(other, axis='columns', level=None, fill_value=None)

Binary operator rsub with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

rtruediv(other, axis='columns', level=None, fill_value=None)

Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

save(path)

Deprecated. Use to_pickle instead

select(crit, axis=0)

Return data corresponding to axis labels matching criteria

crit : function
To be called on each index (label). Should return True or False

axis : int

selection : type of caller

set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Set the DataFrame index (row labels) using one or more existing columns. By default yields a new object.

keys : column label or list of column labels / arrays
drop : boolean, default True
Delete columns to be used as the new index
append : boolean, default False
Whether to append columns to existing index
inplace : boolean, default False
Modify the DataFrame in place (do not create a new object)
verify_integrity : boolean, default False
Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method
>>> indexed_df = df.set_index(['A', 'B'])
>>> indexed_df2 = df.set_index(['A', [0, 1, 2, 0, 1, 2]])
>>> indexed_df3 = df.set_index([[0, 1, 2, 0, 1, 2]])

dataframe : DataFrame

set_value(index, col, value)

Put single value at passed column and index

index : row label
col : column label
value : scalar value
frame : DataFrame
If label pair is contained, will be reference to calling DataFrame, otherwise a new object
shape
shift(periods=1, freq=None, axis=0, **kwds)

Shift index by desired number of periods with an optional time freq

periods : int
Number of periods to move, can be positive or negative
freq : DateOffset, timedelta, or time rule string, optional
Increment to use from datetools module or time rule (e.g. ‘EOM’)

If freq is specified then the index values are shifted but the data is not realigned

shifted : same type as caller
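The difference between the two modes of shift can be sketched with plain lists. Both helpers are hypothetical: shift_data moves the values by periods positions while the labels stay put (vacated slots become None, standing in for NaN), and shift_index models the freq case, where the labels move by periods steps and the data is not realigned.

```python
def shift_data(values, periods=1, fill=None):
    """Hypothetical sketch: move values by `periods` positions; labels stay put."""
    n = len(values)
    if periods >= 0:
        k = min(periods, n)
        return [fill] * k + values[: n - k]
    k = min(-periods, n)
    return values[k:] + [fill] * k


def shift_index(index, periods, step):
    """Hypothetical sketch of the freq case: labels move, data is not realigned."""
    return [label + periods * step for label in index]
```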

skew(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return unbiased skew over requested axis. Normalized by N-1

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

skew : Series or DataFrame (if level specified)

sort(columns=None, column=None, axis=0, ascending=True, inplace=False)

Sort DataFrame either by labels (along either axis) or by the values in column(s)

columns : object
Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
ascending : boolean or list, default True
Sort ascending vs. descending. Specify list for multiple sort orders
axis : {0, 1}
Sort index/rows versus columns
inplace : boolean, default False
Sort the DataFrame without creating a new instance
>>> result = df.sort(['A', 'B'], ascending=[1, 0])

sorted : DataFrame
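A sort on several columns with mixed directions, as in df.sort(['A', 'B'], ascending=[1, 0]), can be sketched with Python's stable sort: sort by the last key first, then by each earlier key. This relies on list.sort staying stable even with reverse=True. The function sort_rows is a hypothetical stand-in operating on a list of dict rows.

```python
from operator import itemgetter


def sort_rows(rows, columns, ascending):
    """Hypothetical sketch: stable multi-key sort with a direction per key."""
    result = list(rows)
    for col, asc in reversed(list(zip(columns, ascending))):
        result.sort(key=itemgetter(col), reverse=not asc)
    return result


rows = [{"A": 1, "B": 1}, {"A": 2, "B": 5}, {"A": 1, "B": 3}, {"A": 2, "B": 2}]
ordered = sort_rows(rows, ["A", "B"], [True, False])
```

The result is ordered by A ascending and, within equal A, by B descending.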

sort_index(axis=0, by=None, ascending=True, inplace=False, kind='quicksort')

Sort DataFrame either by labels (along either axis) or by the values in a column

axis : {0, 1}
Sort index/rows versus columns
by : object
Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
ascending : boolean or list, default True
Sort ascending vs. descending. Specify list for multiple sort orders
inplace : boolean, default False
Sort the DataFrame without creating a new instance
>>> result = df.sort_index(by=['A', 'B'], ascending=[True, False])

sorted : DataFrame

sortlevel(level=0, axis=0, ascending=True, inplace=False)

Sort multilevel index by chosen axis and primary level. Data will be lexicographically sorted by the chosen level followed by the other levels (in order)

level : int
axis : {0, 1}
ascending : boolean, default True
inplace : boolean, default False
Sort the DataFrame without creating a new instance

sorted : DataFrame

squeeze()

Squeeze length-1 dimensions

stack(level=-1, dropna=True)

Pivot a level of the (possibly hierarchical) column labels, returning a DataFrame (or Series in the case of an object with a single level of column labels) having a hierarchical index with a new inner-most level of row labels.

level : int, string, or list of these, default last level
Level(s) to stack, can pass level name
dropna : boolean, default True
Whether to drop rows in the resulting Frame/Series with no valid values
>>> s
     a   b
one  1.  2.
two  3.  4.
>>> s.stack()
one a    1
    b    2
two a    3
    b    4

stacked : DataFrame or Series
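For a frame with a single level of column labels, stacking amounts to turning each (row, column) cell into an entry keyed by the pair, with the column label as the new innermost level. The sketch below is a hypothetical plain-Python stand-in using nested dicts; with dropna=True it skips NaN cells, which for single-level columns is the same as dropping rows with no valid values.

```python
NAN = float("nan")


def stack(frame, dropna=True):
    """Hypothetical sketch: {row: {column: value}} -> {(row, column): value}."""
    stacked = {}
    for row, cells in frame.items():
        for col, value in cells.items():
            if dropna and value != value:  # NaN is the only value not equal to itself
                continue
            stacked[row, col] = value
    return stacked


s = {"one": {"a": 1.0, "b": 2.0}, "two": {"a": 3.0, "b": 4.0}}
long_form = stack(s)
```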

std(axis=None, skipna=None, level=None, ddof=1, **kwargs)

Return unbiased standard deviation over requested axis. Normalized by N-1

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

stdev : Series or DataFrame (if level specified)
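As a worked check of the N-1 normalization, the sketch below computes the sum of squared deviations by hand and compares it with std and var. It assumes a recent pandas release; ddof is the argument from the signature, so the divisor is N - ddof.

```python
import math
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

# Squared deviations from the mean (2.5): 2.25 + 0.25 + 0.25 + 2.25 = 5.0
ss = ((df["x"] - df["x"].mean()) ** 2).sum()

# Default ddof=1 divides by N - 1 = 3 (unbiased estimator)
var_unbiased = df["x"].var()          # 5.0 / 3
std_unbiased = df["x"].std()          # sqrt(5.0 / 3)

# ddof=0 divides by N = 4 (population standard deviation)
std_population = df["x"].std(ddof=0)  # sqrt(5.0 / 4)
```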

sub(other, axis='columns', level=None, fill_value=None)

Binary operator sub with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame
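A minimal sketch of fill_value and of axis='index' with a Series argument, assuming a recent pandas release; the frames are made-up inputs. The same pattern applies to subtract and truediv.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"x": [1.0, np.nan, 3.0, np.nan]})
b = pd.DataFrame({"x": [0.5, 1.0, np.nan, np.nan]})

# Where only one side is missing, it is replaced by 0 before subtracting;
# where both sides are missing (last row), the result stays NaN.
diff = a.sub(b, fill_value=0)

# axis="index" aligns the Series with the row labels instead of the columns
frame = pd.DataFrame({"p": [10, 20], "q": [30, 40]})
offsets = pd.Series([1, 2])
per_row = frame.sub(offsets, axis="index")
```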

subtract(other, axis='columns', level=None, fill_value=None)

Binary operator sub with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

sum(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Return the sum of the values for the requested axis

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

sum : Series or DataFrame (if level specified)
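A short illustration of skipna and axis, assuming a recent pandas release. Later pandas versions return 0 rather than NA for a column that is entirely NA, so that case is deliberately left out here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

col_totals = df.sum()               # NaN skipped: a -> 4.0, b -> 15.0
strict = df.sum(skipna=False)       # NaN propagates: a -> NaN
row_totals = df.sum(axis=1)         # per-row sums: 5.0, 5.0, 9.0
```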

swapaxes(axis1, axis2, copy=True)

Interchange axes and swap values axes appropriately

y : same as input

swaplevel(i, j, axis=0)

Swap levels i and j in a MultiIndex on a particular axis

i, j : int, string (can be mixed)
Level of index to be swapped. Can pass level name as string.

swapped : type of caller (new object)

tail(n=5)

Returns last n rows

take(indices, axis=0, convert=True, is_copy=True)

Analogous to ndarray.take

indices : list / array of ints
axis : int, default 0
convert : translate negative to positive indices (default)
is_copy : mark the returned frame as a copy

taken : type of caller
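take selects by position rather than by label, and negative positions count from the end. A small sketch, assuming a recent pandas release:

```python
import pandas as pd

df = pd.DataFrame({"v": [10, 20, 30, 40]}, index=list("wxyz"))

# Positions, not labels: 0 is the first row and -1 the last one
ends = df.take([0, -1])
# axis=1 selects columns by position
first_col = df.take([0], axis=1)
```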

to_clipboard(excel=None, sep=None, **kwargs)

Attempt to write text representation of object to the system clipboard. This can be pasted into Excel, for example.

excel : boolean, defaults to True
If True, use the provided separator, writing in a csv format for allowing easy pasting into excel. If False, write a string representation of the object to the clipboard

sep : optional, defaults to tab
Other keywords are passed to to_csv

Requirements for your platform
  • Linux: xclip, or xsel (with gtk or PyQt4 modules)
  • Windows: none
  • OS X: none
to_csv(path_or_buf, sep=',', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, mode='w', nanRep=None, encoding=None, quoting=None, line_terminator='\n', chunksize=None, tupleize_cols=False, date_format=None, **kwds)

Write DataFrame to a comma-separated values (csv) file

path_or_buf : string or file handle / StringIO
File path
sep : character, default ‘,’
Field delimiter for the output file.
na_rep : string, default ‘’
Missing data representation
float_format : string, default None
Format string for floating point numbers
cols : sequence, optional
Columns to write
header : boolean or list of string, default True
Write out column names. If a list of string is given it is assumed to be aliases for the column names
index : boolean, default True
Write row names (index)
index_label : string or sequence, or False, default None
Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R
nanRep : None
deprecated, use na_rep
mode : str
Python write mode, default ‘w’
encoding : string, optional
a string representing the encoding to use if the contents are non-ascii, for python versions prior to 3
line_terminator : string, default ‘\n’
The newline character or character sequence to use in the output file
quoting : optional constant from csv module
defaults to csv.QUOTE_MINIMAL
chunksize : int or None
rows to write at a time
tupleize_cols : boolean, default False
Write MultiIndex columns as a list of tuples (if True) or in the new, expanded format (if False)
date_format : string, default None
Format string for datetime objects.
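A sketch that writes to an in-memory buffer instead of a file, assuming a recent pandas release; io.StringIO stands in for the file handle. splitlines() is used because the default line terminator can differ between pandas versions and platforms.

```python
import io
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "score": [1.5, np.nan]})

buf = io.StringIO()
# Semicolon separator, explicit marker for missing values, no index column
df.to_csv(buf, sep=";", na_rep="NA", index=False)
lines = buf.getvalue().splitlines()
print(lines)
```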
to_dense()

Return dense representation of NDFrame (as opposed to sparse)

to_dict(outtype='dict')

Convert DataFrame to dictionary.

outtype : str {‘dict’, ‘list’, ‘series’, ‘records’}
Determines the type of the values of the dictionary. The default dict is a nested dictionary {column -> {index -> value}}. list returns {column -> list(values)}. series returns {column -> Series(values)}. records returns [{column -> value}, ... , {column -> value}]. Abbreviations are allowed.

result : dict like {column -> {index -> value}}
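The output shapes can be checked with a small sketch, assuming a recent pandas release. The orientation is passed positionally because the keyword was called outtype in the version documented here and orient in later releases.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

nested = df.to_dict()                # {column -> {index -> value}}
as_lists = df.to_dict("list")        # {column -> [values]}
as_records = df.to_dict("records")   # [{column -> value}, ...]
```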

to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True)

Write DataFrame to an Excel sheet

excel_writer : string or ExcelWriter object
File path or existing ExcelWriter
sheet_name : string, default ‘Sheet1’
Name of sheet which will contain DataFrame
na_rep : string, default ‘’
Missing data representation
float_format : string, default None
Format string for floating point numbers
cols : sequence, optional
Columns to write
header : boolean or list of string, default True
Write out column names. If a list of string is given it is assumed to be aliases for the column names
index : boolean, default True
Write row names (index)
index_label : string or sequence, default None
Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
startrow :
upper left cell row to dump data frame
startcol :
upper left cell column to dump data frame
engine : string, default None
write engine to use - you can also set this via the options io.excel.xlsx.writer, io.excel.xls.writer, and io.excel.xlsm.writer.
merge_cells : boolean, default True
Write MultiIndex and Hierarchical Rows as merged cells.

If passing an existing ExcelWriter object, then the sheet will be added to the existing workbook. This can be used to save different DataFrames to one workbook:

>>> from pandas import ExcelWriter
>>> writer = ExcelWriter('output.xlsx')
>>> df1.to_excel(writer, sheet_name='Sheet1')
>>> df2.to_excel(writer, sheet_name='Sheet2')
>>> writer.save()
to_gbq(destination_table, schema=None, col_order=None, if_exists='fail', **kwargs)

Write a DataFrame to a Google BigQuery table.

If the table exists, the DataFrame will be appended. If not, a new table will be created, in which case the schema will have to be specified. By default, rows will be written in the order they appear in the DataFrame, though the user may specify an alternative order.

destination_table : string
name of table to be written, in the form ‘dataset.tablename’
schema : sequence (optional)
list of column types in order for data to be inserted, e.g. [‘INTEGER’, ‘TIMESTAMP’, ‘BOOLEAN’]
col_order : sequence (optional)
order which columns are to be inserted, e.g. [‘primary_key’, ‘birthday’, ‘username’]
if_exists : {‘fail’, ‘replace’, ‘append’} (optional)
  • fail: If table exists, do nothing.
  • replace: If table exists, drop it, recreate it, and insert data.
  • append: If table exists, insert data. Create the table if it does not exist.

kwargs are passed to the Client constructor

SchemaMissing :
Raised if the ‘if_exists’ parameter is set to ‘replace’, but no schema is specified
TableExists :
Raised if the specified ‘destination_table’ exists but the ‘if_exists’ parameter is set to ‘fail’ (the default)
InvalidSchema :
Raised if the ‘schema’ parameter does not match the provided DataFrame
to_hdf(path_or_buf, key, **kwargs)

Write the contained data to an HDF5 file using HDFStore

path_or_buf : the path (string) or buffer to put the store
key : string
Identifier for the group in the store

mode : optional, {‘a’, ‘w’, ‘r’, ‘r+’}, default ‘a’

'r'
Read-only; no data can be modified.
'w'
Write; a new file is created (an existing file with the same name would be deleted).
'a'
Append; an existing file is opened for reading and writing, and if the file does not exist it is created.
'r+'
It is similar to 'a', but the file must already exist.
format : ‘fixed(f)|table(t)’, default is ‘fixed’
fixed(f) : Fixed format
Fast writing/reading. Not appendable, nor searchable
table(t) : Table format
Write as a PyTables Table structure which may perform worse but allows more flexible operations like searching / selecting subsets of the data
append : boolean, default False
For Table formats, append the input data to the existing table
complevel : int, 1-9, default 0
If a complib is specified compression will be applied where possible
complib : {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’, None}, default None
If complevel is > 0 apply compression to objects written in the store wherever possible
fletcher32 : bool, default False
If applying compression use the fletcher32 checksum
to_html(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, force_unicode=None, bold_rows=True, classes=None, escape=True, max_rows=None, max_cols=None, show_dimensions=False)

Render a DataFrame as an HTML table.

to_html-specific options:

bold_rows : boolean, default True
Make the row labels bold in the output
classes : str or list or tuple, default None
CSS class(es) to apply to the resulting html table
escape : boolean, default True
Convert the characters <, >, and & to HTML-safe sequences.
max_rows : int, optional
Maximum number of rows to show before truncating. If None, show all.
max_cols : int, optional
Maximum number of columns to show before truncating. If None, show all.
frame : DataFrame
object to render
buf : StringIO-like, optional
buffer to write to
columns : sequence, optional
the subset of columns to write; default None writes all columns
col_space : int, optional
the minimum width of each column
header : bool, optional
whether to print column labels, default True
index : bool, optional
whether to print index (row) labels, default True
na_rep : string, optional
string representation of NaN to use, default ‘NaN’
formatters : list or dict of one-parameter functions, optional
formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
float_format : one-parameter function, optional
formatter function to apply to columns’ elements if they are floats, default None
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
justify : {‘left’, ‘right’}, default None
Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
index_names : bool, optional
Prints the names of the indexes, default True
force_unicode : bool, default False
Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.

formatted : string (or unicode, depending on data and options)
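A small sketch of the escape and classes options, assuming a recent pandas release; "my-table" is a hypothetical CSS class name.

```python
import pandas as pd

df = pd.DataFrame({"tag": ["<b>bold</b>"], "n": [1]})

# escape=True (the default) turns <, > and & into HTML entities;
# classes are added to the class attribute of the <table> element.
html = df.to_html(classes="my-table", escape=True)
print(html)
```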

to_json(path_or_buf=None, orient=None, date_format='epoch', double_precision=10, force_ascii=True, date_unit='ms', default_handler=None)

Convert the object to a JSON string.

Note: NaNs and None will be converted to null and datetime objects will be converted to UNIX timestamps.

path_or_buf : the path or buffer to write the result string
If this is None, return the converted string

orient : string

  • Series
    • default is ‘index’
    • allowed values are: {‘split’, ‘records’, ‘index’}
  • DataFrame
    • default is ‘columns’
    • allowed values are: {‘split’, ‘records’, ‘index’, ‘columns’, ‘values’}
  • The format of the JSON string
    • split : dict like {index -> [index], columns -> [columns], data -> [values]}
    • records : list like [{column -> value}, ... , {column -> value}]
    • index : dict like {index -> {column -> value}}
    • columns : dict like {column -> {index -> value}}
    • values : just the values array
date_format : {‘epoch’, ‘iso’}
Type of date conversion. epoch = epoch milliseconds, iso = ISO8601, default is epoch.
double_precision : int, default 10
The number of decimal places to use when encoding floating point values
force_ascii : boolean, default True
Force encoded string to be ASCII
date_unit : string, default ‘ms’ (milliseconds)
The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.
default_handler : callable, default None
Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serialisable object.

result : string or None
The JSON string if path_or_buf is None, otherwise None
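The orient values listed above map onto JSON shapes as shown in this sketch, which parses the output back with json.loads (assuming a recent pandas release). JSON object keys are always strings, so the integer index labels come back as "0" and "1" in the default columns orientation.

```python
import json
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

split = json.loads(df.to_json(orient="split"))
records = json.loads(df.to_json(orient="records"))
columns = json.loads(df.to_json())  # default orient for a DataFrame is "columns"
```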

to_latex(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=True, force_unicode=None)

Render a DataFrame to a tabular environment table. You can splice this into a LaTeX document.

to_latex-specific options:

bold_rows : boolean, default True
Make the row labels bold in the output
frame : DataFrame
object to render
buf : StringIO-like, optional
buffer to write to
columns : sequence, optional
the subset of columns to write; default None writes all columns
col_space : int, optional
the minimum width of each column
header : bool, optional
whether to print column labels, default True
index : bool, optional
whether to print index (row) labels, default True
na_rep : string, optional
string representation of NaN to use, default ‘NaN’
formatters : list or dict of one-parameter functions, optional
formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
float_format : one-parameter function, optional
formatter function to apply to columns’ elements if they are floats, default None
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
justify : {‘left’, ‘right’}, default None
Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
index_names : bool, optional
Prints the names of the indexes, default True
force_unicode : bool, default False
Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.

formatted : string (or unicode, depending on data and options)

to_msgpack(path_or_buf=None, **kwargs)

msgpack (serialize) object to input file path

THIS IS AN EXPERIMENTAL LIBRARY and the storage format may not be stable until a future release.

path : string File path, buffer-like, or None
If None, return generated string
append : boolean, default False
Whether to append to an existing msgpack
compress : type of compressor (zlib or blosc), default None (no compression)
to_panel()

Transform long (stacked) format (DataFrame) into wide (3D, Panel) format.

Currently the index of the DataFrame must be a 2-level MultiIndex. This may be generalized later

panel : Panel

to_period(freq=None, axis=0, copy=True)

Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed)

freq : string, default None
axis : {0, 1}, default 0
The axis to convert (the index by default)
copy : boolean, default True
If False then underlying input data is not copied

ts : TimeSeries with PeriodIndex

to_pickle(path)

Pickle (serialize) object to input file path

path : string
File path
to_records(index=True, convert_datetime64=True)

Convert DataFrame to record array. Index will be put in the ‘index’ field of the record array if requested

index : boolean, default True
Include index in resulting record array, stored in ‘index’ field
convert_datetime64 : boolean, default True
Whether to convert the index to datetime.datetime if it is a DatetimeIndex

y : recarray
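A brief sketch of the index field, assuming a recent pandas release (the convert_datetime64 argument was removed in later versions, so it is not used here).

```python
import pandas as pd

df = pd.DataFrame({"score": [0.5, 0.9]}, index=["p1", "p2"])

rec = df.to_records()                   # unnamed index goes into an "index" field
rec_no_index = df.to_records(index=False)
```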

to_sparse(fill_value=None, kind='block')

Convert to SparseDataFrame

fill_value : float, default NaN
kind : {‘block’, ‘integer’}

y : SparseDataFrame

to_sql(name, con, flavor='sqlite', if_exists='fail', **kwargs)

Write records stored in a DataFrame to a SQL database.

name : str
Name of SQL table

con : an open SQL database connection object
flavor : {‘sqlite’, ‘mysql’, ‘oracle’}, default ‘sqlite’
if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’

  • fail: If table exists, do nothing.
  • replace: If table exists, drop it, recreate it, and insert data.
  • append: If table exists, insert data. Create the table if it does not exist.
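The if_exists modes can be tried against an in-memory SQLite database. This sketch assumes a recent pandas release, where a plain sqlite3 connection is accepted as con and the flavor argument no longer exists; the table name hits is hypothetical.

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({"peptide": ["p1", "p2"], "score": [0.9, 0.4]})

con = sqlite3.connect(":memory:")
# First write creates the table
df.to_sql("hits", con, index=False)
# if_exists="append" inserts the same rows again into the existing table
df.to_sql("hits", con, if_exists="append", index=False)
# if_exists="replace" drops and recreates the table, leaving only these rows
df.to_sql("hits", con, if_exists="replace", index=False)

n_rows = con.execute("SELECT COUNT(*) FROM hits").fetchone()[0]
```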
to_stata(fname, convert_dates=None, write_index=True, encoding='latin-1', byteorder=None)

Export DataFrame object to Stata dta format

fname : file path or buffer
Where to save the dta file.
convert_dates : dict
Dictionary mapping columns of datetime types to the stata internal format that you want to use for the dates. Options are ‘tc’, ‘td’, ‘tm’, ‘tw’, ‘th’, ‘tq’, ‘ty’. Column can be either a number or a name.
encoding : str
Default is latin-1. Note that Stata does not support unicode.
byteorder : str
Can be “>”, “<”, “little”, or “big”. The default is None which uses sys.byteorder
>>> from pandas.io.stata import StataWriter
>>> writer = StataWriter('./data_file.dta', data)
>>> writer.write_file()

Or with dates

>>> writer = StataWriter('./date_data_file.dta', data, {2: 'tw'})
>>> writer.write_file()
to_string(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, nanRep=None, index_names=True, justify=None, force_unicode=None, line_width=None, max_rows=None, max_cols=None, show_dimensions=False)

Render a DataFrame to a console-friendly tabular output.

frame : DataFrame
object to render
buf : StringIO-like, optional
buffer to write to
columns : sequence, optional
the subset of columns to write; default None writes all columns
col_space : int, optional
the minimum width of each column
header : bool, optional
whether to print column labels, default True
index : bool, optional
whether to print index (row) labels, default True
na_rep : string, optional
string representation of NaN to use, default ‘NaN’
formatters : list or dict of one-parameter functions, optional
formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
float_format : one-parameter function, optional
formatter function to apply to columns’ elements if they are floats, default None
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
justify : {‘left’, ‘right’}, default None
Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
index_names : bool, optional
Prints the names of the indexes, default True
force_unicode : bool, default False
Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.

formatted : string (or unicode, depending on data and options)

to_timestamp(freq=None, how='start', axis=0, copy=True)

Cast to DatetimeIndex of timestamps, at beginning of period

freq : string, default frequency of PeriodIndex
Desired frequency
how : {‘s’, ‘e’, ‘start’, ‘end’}
Convention for converting period to timestamp; start of period vs. end
axis : {0, 1} default 0
The axis to convert (the index by default)
copy : boolean, default True
If False then underlying input data is not copied

df : DataFrame with DatetimeIndex

to_wide(*args, **kwargs)
transpose()

Transpose index and columns

truediv(other, axis='columns', level=None, fill_value=None)

Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs

other : Series, DataFrame, or constant
axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level

Mismatched indices will be unioned together

result : DataFrame

truncate(before=None, after=None, axis=None, copy=True)

Truncates a sorted NDFrame before and/or after some particular dates.

before : date
Truncate before date
after : date
Truncate after date

axis : the truncation axis, defaults to the stat axis
copy : boolean, default is True
Return a copy of the truncated section

truncated : type of caller
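A minimal sketch on a sorted DatetimeIndex, assuming a recent pandas release; both boundary labels are kept.

```python
import pandas as pd

idx = pd.date_range("2020-01-01", periods=5, freq="D")
s = pd.Series(range(5), index=idx)

# Keep the labels from 2020-01-02 through 2020-01-04 (both ends inclusive)
window = s.truncate(before="2020-01-02", after="2020-01-04")
```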

tshift(periods=1, freq=None, axis=0, **kwds)

Shift the time index, using the index’s frequency if available

periods : int
Number of periods to move, can be positive or negative
freq : DateOffset, timedelta, or time rule string, default None
Increment to use from datetools module or time rule (e.g. ‘EOM’)
axis : int or basestring
Corresponds to the axis that contains the Index

If freq is not specified then tries to use the freq or inferred_freq attributes of the index. If neither of those attributes exist, a ValueError is thrown

shifted : NDFrame

tz_convert(tz, axis=0, copy=True)

Convert TimeSeries to target time zone. If it is time zone naive, it will be localized to the passed time zone.

tz : string or pytz.timezone object
copy : boolean, default True
Also make a copy of the underlying data
tz_localize(tz, axis=0, copy=True, infer_dst=False)

Localize tz-naive TimeSeries to target time zone

tz : string or pytz.timezone object
copy : boolean, default True
Also make a copy of the underlying data
infer_dst : boolean, default False
Attempt to infer fall dst-transition times based on order
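The two methods are typically chained: tz_localize attaches a zone to naive timestamps, and tz_convert then re-expresses the same instants in another zone. A sketch assuming a recent pandas release and an available time zone database:

```python
import pandas as pd

idx = pd.to_datetime(["2020-01-01 12:00", "2020-01-01 13:00"])
df = pd.DataFrame({"v": [1, 2]}, index=idx)

utc = df.tz_localize("UTC")                   # attach a zone to naive stamps
eastern = utc.tz_convert("America/New_York")  # same instants, new wall-clock times
```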
unstack(level=-1)

Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels. If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex)

level : int, string, or list of these, default -1 (last level)
Level(s) of index to unstack, can pass level name

DataFrame.pivot : Pivot a table based on column values.
DataFrame.stack : Pivot a level of the column labels (inverse operation from unstack).
>>> import numpy as np
>>> import pandas as pd
>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
...                                    ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one  a   1
     b   2
two  a   3
     b   4
dtype: float64
>>> s.unstack(level=-1)
     a   b
one  1  2
two  3  4
>>> s.unstack(level=0)
   one  two
a  1   3
b  2   4
>>> df = s.unstack(level=0)
>>> df.unstack()
one  a  1.
     b  2.
two  a  3.
     b  4.

unstacked : DataFrame or Series

update(other, join='left', overwrite=True, filter_func=None, raise_conflict=False)

Modify DataFrame in place using non-NA values from passed DataFrame. Aligns on indices

other : DataFrame, or object coercible into a DataFrame
join : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
overwrite : boolean, default True
If True then overwrite values for common keys in the calling frame
filter_func : callable(1d-array) -> 1d-array<boolean>, default None
Can choose to replace values other than NA. Return True for values that should be updated
raise_conflict : boolean
If True, will raise an error if the DataFrame and other both contain data in the same place.
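A minimal sketch of the in-place behaviour, assuming a recent pandas release: only non-NA values of the passed frame are copied over, aligned on the index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [400.0, 500.0, 600.0]})
new = pd.DataFrame({"B": [4.0, np.nan, 6.0]})

# Modifies df in place and returns None; the NaN in row 1 of `new`
# leaves the original 500.0 untouched, and column A is not in `new`.
df.update(new)
```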
values

Numpy representation of NDFrame

var(axis=None, skipna=None, level=None, ddof=1, **kwargs)

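Return unbiased variance over requested axis. Normalized by N-1 by default; this can be changed using the ddof argument.

axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
ddof : int, default 1
Delta degrees of freedom. The divisor used in calculations is N - ddof, where N is the number of elements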
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data

variance : Series or DataFrame (if level specified)

where(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)

Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.

cond : boolean NDFrame or array
other : scalar or NDFrame
inplace : boolean, default False
Whether to perform the operation in place on the data
axis : alignment axis if needed, default None
level : alignment level if needed, default None
try_cast : boolean, default False
Try to cast the result back to the input type (if possible)
raise_on_error : boolean, default True
Whether to raise on invalid data types (e.g. trying to where on strings)

wh : same type as caller

xs(key, axis=0, level=None, copy=True, drop_level=True)

Returns a cross-section (row(s) or column(s)) from the Series/DataFrame. Defaults to cross-section on the rows (axis=0).

key : object
Some label contained in the index, or partially in a MultiIndex
axis : int, default 0
Axis to retrieve cross-section on
level : object, defaults to first n levels (n=1 or len(key))
In case of a key partially contained in a MultiIndex, indicate which levels are used. Levels can be referred by label or position.
copy : boolean, default True
Whether to make a copy of the data
drop_level : boolean, default True
If False, returns object with same levels as self.
>>> df
   A  B  C
a  4  5  2
b  4  0  9
c  9  7  3
>>> df.xs('a')
A    4
B    5
C    2
Name: a
>>> df.xs('C', axis=1)
a    2
b    9
c    3
Name: C
>>> s = df.xs('a', copy=False)
>>> s['A'] = 100
>>> df
     A  B  C
a  100  5  2
b    4  0  9
c    9  7  3
>>> df
                    A  B  C  D
first second third
bar   one    1      4  1  8  9
      two    1      7  5  5  0
baz   one    1      6  6  8  0
      three  2      5  3  5  3
>>> df.xs(('baz', 'three'))
       A  B  C  D
third
2      5  3  5  3
>>> df.xs('one', level=1)
             A  B  C  D
first third
bar   1      4  1  8  9
baz   1      6  6  8  0
>>> df.xs(('baz', 2), level=[0, 'third'])
        A  B  C  D
second
three   5  3  5  3

xs : Series or DataFrame

Core.Transcript

class Fred2.Core.Transcript.Transcript(seq, gene_id='unknown', transcript_id=None, vars=None)

Bases: Fred2.Core.Base.MetadataLogger, Bio.Seq.Seq

A Transcript is the mRNA sequence containing zero or more Fred2.Core.Variant.Variant objects.

Note

For accessing and manipulating the sequence see also Bio.Seq.Seq (from Biopython)

Parameters:
  • gene_id (str) – Gene ID
  • transcript_id (str) – Transcript RefSeqID
  • vars (dict(int, Fred2.Core.Variant.Variant)) – Dict of Fred2.Core.Variant.Variant objects for specific positions in the Transcript (key=position, value=Variant)
back_transcribe()

Returns the DNA sequence from an RNA sequence. New Seq object.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG",
...                     IUPAC.unambiguous_rna)
>>> messenger_rna
Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', IUPACUnambiguousRNA())
>>> messenger_rna.back_transcribe()
Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', IUPACUnambiguousDNA())

Trying to back-transcribe a protein or DNA sequence raises an exception:

>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.back_transcribe()
Traceback (most recent call last):
   ...
ValueError: Proteins cannot be back transcribed!
complement()

Returns the complement sequence. New Seq object.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_dna = Seq("CCCCCGATAG", IUPAC.unambiguous_dna)
>>> my_dna
Seq('CCCCCGATAG', IUPACUnambiguousDNA())
>>> my_dna.complement()
Seq('GGGGGCTATC', IUPACUnambiguousDNA())

You can of course use mixed-case sequences:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("CCCCCgatA-GD", generic_dna)
>>> my_dna
Seq('CCCCCgatA-GD', DNAAlphabet())
>>> my_dna.complement()
Seq('GGGGGctaT-CH', DNAAlphabet())

Note in the above example, ambiguous character D denotes G, A or T so its complement is H (for C, T or A).

Trying to complement a protein sequence raises an exception.

>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.complement()
Traceback (most recent call last):
   ...
ValueError: Proteins do not have complements!
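Bio.Alphabet, used in the doctests above, was removed from later Biopython releases. To show how the IUPAC ambiguity codes pair up (R with Y, D with H, and so on) without depending on Biopython, here is a plain-Python sketch on ordinary strings. complement_dna and reverse_complement_dna are hypothetical helper names, not part of Bio.Seq or Fred2, and they handle DNA codes only.

```python
# IUPAC nucleotide codes and their complements (DNA only; U is not handled)
_CODES = "ACGTRYKMSWBDHVN"
_COMPLEMENTS = "TGCAYRMKSWVHDBN"
# One translation table for upper and lower case; any other character
# (for example the gap "-") passes through unchanged.
_TABLE = str.maketrans(_CODES + _CODES.lower(),
                       _COMPLEMENTS + _COMPLEMENTS.lower())


def complement_dna(seq):
    """Return the complement of a DNA string, preserving case."""
    return seq.translate(_TABLE)


def reverse_complement_dna(seq):
    """Return the reverse complement of a DNA string."""
    return complement_dna(seq)[::-1]


print(complement_dna("CCCCCgatA-GD"))          # GGGGGctaT-CH
print(reverse_complement_dna("CCCCCGATAGNR"))  # YNCTATCGGGGG
```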
count(sub, start=0, end=9223372036854775807)

Non-overlapping count method, like that of a python string.

This behaves like the python string method of the same name, which does a non-overlapping count!

Returns an integer, the number of occurrences of substring argument sub in the (sub)sequence given by [start:end]. Optional arguments start and end are interpreted as in slice notation.

Arguments:
  • sub - a string or another Seq object to look for
  • start - optional integer, slice start
  • end - optional integer, slice end

e.g.

>>> from Bio.Seq import Seq
>>> my_seq = Seq("AAAATGA")
>>> print(my_seq.count("A"))
5
>>> print(my_seq.count("ATG"))
1
>>> print(my_seq.count(Seq("AT")))
1
>>> print(my_seq.count("AT", 2, -1))
1

HOWEVER, please note that because python strings and Seq objects (and MutableSeq objects) do a non-overlapping search, this may not give the answer you expect:

>>> "AAAA".count("AA")
2
>>> print(Seq("AAAA").count("AA"))
2

An overlapping search would give the answer as three!
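If overlapping matches are what you need, a plain-Python sketch is to restart the search one position after each hit. overlapping_count is a hypothetical helper, not a Seq method; it works on ordinary strings such as str(my_seq).

```python
def overlapping_count(seq, sub):
    """Count occurrences of sub in seq, allowing matches to overlap."""
    if not sub:
        raise ValueError("sub must be non-empty")
    count = 0
    start = seq.find(sub)
    while start != -1:
        count += 1
        # Advance by one position only, so the next match may overlap this one
        start = seq.find(sub, start + 1)
    return count


print("AAAA".count("AA"))               # 2 (non-overlapping)
print(overlapping_count("AAAA", "AA"))  # 3 (overlapping)
```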

endswith(suffix, start=0, end=9223372036854775807)

Does the Seq end with the given suffix? Returns True/False.

This behaves like the python string method of the same name.

Return True if the sequence ends with the specified suffix (a string or another Seq object), False otherwise. With optional start, test sequence beginning at that position. With optional end, stop comparing sequence at that position. suffix can also be a tuple of strings to try. e.g.

>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.endswith("UUG")
True
>>> my_rna.endswith("AUG")
False
>>> my_rna.endswith("AUG", 0, 18)
True
>>> my_rna.endswith(("UCC", "UCA", "UUG"))
True
find(sub, start=0, end=9223372036854775807)

Find method, like that of a python string.

This behaves like the python string method of the same name.

Returns an integer, the index of the first occurrence of substring argument sub in the (sub)sequence given by [start:end].

Arguments:
  • sub - a string or another Seq object to look for
  • start - optional integer, slice start
  • end - optional integer, slice end

Returns -1 if the subsequence is NOT found.

e.g. Locating the first typical start codon, AUG, in an RNA sequence:

>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.find("AUG")
3
get_metadata(label, only_first=False)

Getter for the saved metadata with the key label

Parameters:
  • label (str) – key for the metadata that is inferred
  • only_first (bool) – true if only the the first element of the matadata list is to be returned
log_metadata(label, value)

Inserts a new metadata entry

Parameters:
  • label (str) – key for the metadata that will be added
  • value (list(object)) – any kind of additional value that should be kept
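The minimal stand-in class below shows how `log_metadata` and `get_metadata` work together. It is not Fred2's `MetadataLogger` source. It rests on assumptions drawn only from the descriptions above:

  • each label maps to a list, and logged values are appended to it;
  • `only_first=True` returns just the first logged value;
  • an unknown label returns `None`.

```python
from collections import defaultdict


class MetadataLoggerSketch:
    """Hypothetical stand-in for MetadataLogger: every label maps to a list of values."""

    def __init__(self):
        self._metadata = defaultdict(list)

    def log_metadata(self, label, value):
        # Append rather than overwrite, so repeated calls accumulate values.
        self._metadata[label].append(value)

    def get_metadata(self, label, only_first=False):
        values = self._metadata.get(label, [])
        if only_first:
            # Assumption: return None when nothing was logged under this label.
            return values[0] if values else None
        return list(values)


record = MetadataLoggerSketch()
record.log_metadata("source", "UniProt")
record.log_metadata("source", "RefSeq")
print(record.get_metadata("source"))                   # ['UniProt', 'RefSeq']
print(record.get_metadata("source", only_first=True))  # UniProt
```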
lower()

Returns a lower case copy of the sequence.

This will adjust the alphabet if required. Note that the IUPAC alphabets are upper case only, and thus a generic alphabet must be substituted.

>>> from Bio.Alphabet import Gapped, generic_dna
>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> my_seq = Seq("CGGTACGCTTATGTCACGTAG*AAAAAA", Gapped(IUPAC.unambiguous_dna, "*"))
>>> my_seq
Seq('CGGTACGCTTATGTCACGTAG*AAAAAA', Gapped(IUPACUnambiguousDNA(), '*'))
>>> my_seq.lower()
Seq('cggtacgcttatgtcacgtag*aaaaaa', Gapped(DNAAlphabet(), '*'))

See also the upper method.

lstrip(chars=None)

Returns a new Seq object with leading (left) end stripped.

This behaves like the python string method of the same name.

Optional argument chars defines which characters to remove. If omitted or None (default) then as for the python string method, this defaults to removing any white space.

e.g. print(my_seq.lstrip("-"))

See also the strip and rstrip methods.

newid = <method-wrapper 'next' of itertools.count object>
reverse_complement()

Returns the reverse complement sequence. New Seq object.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_dna = Seq("CCCCCGATAGNR", IUPAC.ambiguous_dna)
>>> my_dna
Seq('CCCCCGATAGNR', IUPACAmbiguousDNA())
>>> my_dna.reverse_complement()
Seq('YNCTATCGGGGG', IUPACAmbiguousDNA())

Note in the above example, since R = G or A, its complement is Y (which denotes C or T).

You can of course use mixed case sequences:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("CCCCCgatA-G", generic_dna)
>>> my_dna
Seq('CCCCCgatA-G', DNAAlphabet())
>>> my_dna.reverse_complement()
Seq('C-TatcGGGGG', DNAAlphabet())

Trying to complement a protein sequence raises an exception:

>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.reverse_complement()
Traceback (most recent call last):
   ...
ValueError: Proteins do not have complements!
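The same complement rules can be sketched in plain Python with a translation table. The helper name `reverse_complement_str` is hypothetical. The sketch assumes DNA input written with IUPAC nucleotide codes plus `-` for gaps, and it preserves case, as in the mixed-case example above. Unlike `reverse_complement`, it does not check the alphabet, so it will not reject protein sequences.

```python
# IUPAC DNA complements, upper and lower case; the gap character "-" maps to itself.
_COMPLEMENT = str.maketrans(
    "ACGTRYKMSWBDHVNacgtrykmswbdhvn-",
    "TGCAYRMKSWVHDBNtgcayrmkswvhdbn-",
)


def reverse_complement_str(dna: str) -> str:
    """Hypothetical helper: complement each base, then reverse the result."""
    return dna.translate(_COMPLEMENT)[::-1]


print(reverse_complement_str("CCCCCGATAGNR"))  # YNCTATCGGGGG
print(reverse_complement_str("CCCCCgatA-G"))   # C-TatcGGGGG
```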
rfind(sub, start=0, end=9223372036854775807)

Find from right method, like that of a python string.

This behaves like the python string method of the same name.

Returns an integer, the index of the last (rightmost) occurrence of substring argument sub in the (sub)sequence given by [start:end].

Arguments:
  • sub - a string or another Seq object to look for
  • start - optional integer, slice start
  • end - optional integer, slice end

Returns -1 if the subsequence is NOT found.

e.g. Locating the last typical start codon, AUG, in an RNA sequence:

>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.rfind("AUG")
15
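Because `find` accepts a start position, you can call it in a loop to list every occurrence rather than only the first (`find`) or last (`rfind`) one. The helper name `find_all` in this plain-string sketch is hypothetical. Pass `str(my_rna)` for a Seq object.

```python
def find_all(seq: str, sub: str) -> list:
    """Hypothetical helper: start index of every (possibly overlapping) occurrence of sub."""
    positions = []
    pos = seq.find(sub)
    while pos != -1:
        positions.append(pos)
        # Restart one position after the last hit so overlapping matches are found too.
        pos = seq.find(sub, pos + 1)
    return positions


my_rna = "GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG"
print(find_all(my_rna, "AUG"))  # [3, 15]
```

The first and last entries agree with the `find` and `rfind` results shown above.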
rsplit(sep=None, maxsplit=-1)

Right split method, like that of a python string.

This behaves like the python string method of the same name.

Return a list of the ‘words’ in the string (as Seq objects), using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done COUNTING FROM THE RIGHT. If maxsplit is omitted, all splits are made.

Following the python string method, sep will by default be any white space (tabs, spaces, newlines) but this is unlikely to apply to biological sequences.

e.g. print(my_seq.rsplit("*", 1))

See also the split method.

rstrip(chars=None)

Returns a new Seq object with trailing (right) end stripped.

This behaves like the python string method of the same name.

Optional argument chars defines which characters to remove. If omitted or None (default) then as for the python string method, this defaults to removing any white space.

e.g. Removing a nucleotide sequence’s polyadenylation (poly-A tail):

>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> my_seq = Seq("CGGTACGCTTATGTCACGTAGAAAAAA", IUPAC.unambiguous_dna)
>>> my_seq
Seq('CGGTACGCTTATGTCACGTAGAAAAAA', IUPACUnambiguousDNA())
>>> my_seq.rstrip("A")
Seq('CGGTACGCTTATGTCACGTAG', IUPACUnambiguousDNA())

See also the strip and lstrip methods.

split(sep=None, maxsplit=-1)

Split method, like that of a python string.

This behaves like the python string method of the same name.

Return a list of the ‘words’ in the string (as Seq objects), using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done. If maxsplit is omitted, all splits are made.

Following the python string method, sep will by default be any white space (tabs, spaces, newlines) but this is unlikely to apply to biological sequences.

e.g.

>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_aa = my_rna.translate()
>>> my_aa
Seq('VMAIVMGR*KGAR*L', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> my_aa.split("*")
[Seq('VMAIVMGR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KGAR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*'))]
>>> my_aa.split("*", 1)
[Seq('VMAIVMGR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KGAR*L', HasStopCodon(ExtendedIUPACProtein(), '*'))]

See also the rsplit method:

>>> my_aa.rsplit("*", 1)
[Seq('VMAIVMGR*KGAR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*'))]
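Splitting a translation on the stop symbol is a quick way to find the longest stop-free stretch in one reading frame. The helper name `longest_stop_free` is hypothetical, and the sketch works on plain strings (use `str(my_aa)` for a Seq). It only looks between stop symbols and does not require a start codon, so it is not a full ORF finder.

```python
def longest_stop_free(protein: str, stop_symbol: str = "*") -> str:
    """Hypothetical helper: longest fragment between stop symbols (first wins on ties)."""
    return max(protein.split(stop_symbol), key=len)


print("VMAIVMGR*KGAR*L".split("*"))          # ['VMAIVMGR', 'KGAR', 'L']
print(longest_stop_free("VMAIVMGR*KGAR*L"))  # VMAIVMGR
```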
startswith(prefix, start=0, end=9223372036854775807)

Does the Seq start with the given prefix? Returns True/False.

This behaves like the python string method of the same name.

Return True if the sequence starts with the specified prefix (a string or another Seq object), False otherwise. With optional start, test sequence beginning at that position. With optional end, stop comparing sequence at that position. prefix can also be a tuple of strings to try. e.g.

>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.startswith("GUC")
True
>>> my_rna.startswith("AUG")
False
>>> my_rna.startswith("AUG", 3)
True
>>> my_rna.startswith(("UCC", "UCA", "UCG"), 1)
True
strip(chars=None)

Returns a new Seq object with leading and trailing ends stripped.

This behaves like the python string method of the same name.

Optional argument chars defines which characters to remove. If omitted or None (default) then as for the python string method, this defaults to removing any white space.

e.g. print(my_seq.strip("-"))

See also the lstrip and rstrip methods.
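The strip, lstrip and rstrip methods inherit one detail from the Python string methods: `chars` is treated as a set of characters, not as a prefix or suffix. Every leading or trailing character found in that set is removed. A short illustration on plain strings:

```python
tail = "CGGTACGCTTATGTCACGTAGAAAAAA"
print(tail.rstrip("A"))        # CGGTACGCTTATGTCACGTAG
print("--ACGT--".strip("-"))   # ACGT
print("--ACGT--".lstrip("-"))  # ACGT--
# "TA" is the set {T, A}: the trailing A, A, A and then T are all stripped.
print("ACGTAAA".rstrip("TA"))  # ACG
```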

tomutable()

Returns the full sequence as a MutableSeq object.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq("MKQHKAMIVALIVICITAVVAAL",
...              IUPAC.protein)
>>> my_seq
Seq('MKQHKAMIVALIVICITAVVAAL', IUPACProtein())
>>> my_seq.tomutable()
MutableSeq('MKQHKAMIVALIVICITAVVAAL', IUPACProtein())

Note that the alphabet is preserved.

tostring()

Returns the full sequence as a python string (DEPRECATED).

You are now encouraged to use str(my_seq) instead of my_seq.tostring().

transcribe()

Returns the RNA sequence from a DNA sequence. New Seq object.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
...                  IUPAC.unambiguous_dna)
>>> coding_dna
Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', IUPACUnambiguousDNA())
>>> coding_dna.transcribe()
Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', IUPACUnambiguousRNA())

Trying to transcribe a protein or RNA sequence raises an exception:

>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.transcribe()
Traceback (most recent call last):
   ...
ValueError: Proteins cannot be transcribed!
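For coding-strand DNA, transcription amounts to replacing thymine with uracil, which matches the example output above. The helper name `transcribe_str` in this plain-string sketch is hypothetical. It keeps lower case as lower case and does no alphabet checking, so unlike `transcribe` it will not reject protein input.

```python
def transcribe_str(coding_dna: str) -> str:
    """Hypothetical helper: coding-strand DNA to mRNA (T -> U, t -> u)."""
    return coding_dna.replace("T", "U").replace("t", "u")


print(transcribe_str("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))
# AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG
```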
translate(table='Standard', stop_symbol='*', to_stop=False, cds=False)

Turns a nucleotide sequence into a protein sequence. New Seq object.

This method will translate DNA or RNA sequences, and those with a nucleotide or generic alphabet. Trying to translate a protein sequence raises an exception.

Arguments:
  • table - Which codon table to use? This can be either a name (string), an NCBI identifier (integer), or a CodonTable object (useful for non-standard genetic codes). This defaults to the “Standard” table.
  • stop_symbol - Single character string, what to use for terminators. This defaults to the asterisk, “*”.
  • to_stop - Boolean, defaults to False meaning do a full translation continuing on past any stop codons (translated as the specified stop_symbol). If True, translation is terminated at the first in frame stop codon (and the stop_symbol is not appended to the returned protein sequence).
  • cds - Boolean, indicates this is a complete CDS. If True, this checks the sequence starts with a valid alternative start codon (which will be translated as methionine, M), that the sequence length is a multiple of three, and that there is a single in frame stop codon at the end (this will be excluded from the protein sequence, regardless of the to_stop option). If these tests fail, an exception is raised.

e.g. Using the standard table:

>>> coding_dna = Seq("GTGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
>>> coding_dna.translate()
Seq('VAIVMGR*KGAR*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> coding_dna.translate(stop_symbol="@")
Seq('VAIVMGR@KGAR@', HasStopCodon(ExtendedIUPACProtein(), '@'))
>>> coding_dna.translate(to_stop=True)
Seq('VAIVMGR', ExtendedIUPACProtein())

Now using NCBI table 2, where TGA is not a stop codon:

>>> coding_dna.translate(table=2)
Seq('VAIVMGRWKGAR*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> coding_dna.translate(table=2, to_stop=True)
Seq('VAIVMGRWKGAR', ExtendedIUPACProtein())

In fact, GTG is an alternative start codon under NCBI table 2, meaning this sequence could be a complete CDS:

>>> coding_dna.translate(table=2, cds=True)
Seq('MAIVMGRWKGAR', ExtendedIUPACProtein())

It isn’t a valid CDS under NCBI table 1, due to both the start codon and also the in frame stop codons:

>>> coding_dna.translate(table=1, cds=True)
Traceback (most recent call last):
    ...
TranslationError: First codon 'GTG' is not a start codon

If the sequence has no in-frame stop codon, then the to_stop argument has no effect:

>>> coding_dna2 = Seq("TTGGCCATTGTAATGGGCCGC")
>>> coding_dna2.translate()
Seq('LAIVMGR', ExtendedIUPACProtein())
>>> coding_dna2.translate(to_stop=True)
Seq('LAIVMGR', ExtendedIUPACProtein())

NOTE - Ambiguous codons like “TAN” or “NNN” could be an amino acid or a stop codon. These are translated as “X”. Any invalid codon (e.g. “TA?” or “T-A”) will throw a TranslationError.

NOTE - Does NOT support gapped sequences.

NOTE - This does NOT behave like the python string’s translate method. For that use str(my_seq).translate(...) instead.
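The core of translation with the standard table (NCBI table 1) can be sketched in plain Python. The codon table is built from the standard genetic code listed in TCAG order. The helper `translate_str` is hypothetical and deliberately simpler than `translate`:

  • it has no `cds` check;
  • it silently ignores a trailing partial codon;
  • it raises `ValueError` on ambiguous codons such as "NNN", instead of returning "X".

```python
from itertools import product

# Standard genetic code (NCBI table 1): amino acids for all 64 codons in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate_str(nucleotides: str, stop_symbol: str = "*", to_stop: bool = False) -> str:
    """Hypothetical helper: translate DNA or RNA codon by codon with the standard table."""
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):  # a trailing partial codon is skipped
        codon = seq[i:i + 3]
        aa = STANDARD_CODE.get(codon)
        if aa is None:
            raise ValueError(f"Codon {codon!r} is ambiguous or invalid")
        if aa == "*":
            if to_stop:
                break  # terminate without appending the stop symbol
            aa = stop_symbol
        protein.append(aa)
    return "".join(protein)


coding_dna = "GTGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(translate_str(coding_dna))                   # VAIVMGR*KGAR*
print(translate_str(coding_dna, to_stop=True))     # VAIVMGR
```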

ungap(gap=None)

Return a copy of the sequence without the gap character(s).

The gap character can be specified in two ways - either as an explicit argument, or via the sequence’s alphabet. For example:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("-ATA--TGAAAT-TTGAAAA", generic_dna)
>>> my_dna
Seq('-ATA--TGAAAT-TTGAAAA', DNAAlphabet())
>>> my_dna.ungap("-")
Seq('ATATGAAATTTGAAAA', DNAAlphabet())

If the gap character is not given as an argument, it will be taken from the sequence’s alphabet (if defined). Notice that the returned sequence’s alphabet is adjusted since it no longer requires a gapped alphabet:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC, Gapped, HasStopCodon
>>> my_pro = Seq("MVVLE=AD*", HasStopCodon(Gapped(IUPAC.protein, "=")))
>>> my_pro
Seq('MVVLE=AD*', HasStopCodon(Gapped(IUPACProtein(), '='), '*'))
>>> my_pro.ungap()
Seq('MVVLEAD*', HasStopCodon(IUPACProtein(), '*'))

Or, with a simpler gapped DNA example:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC, Gapped
>>> my_seq = Seq("CGGGTAG=AAAAAA", Gapped(IUPAC.unambiguous_dna, "="))
>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap()
Seq('CGGGTAGAAAAAA', IUPACUnambiguousDNA())

As long as it is consistent with the alphabet, although it is redundant, you can still supply the gap character as an argument to this method:

>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap("=")
Seq('CGGGTAGAAAAAA', IUPACUnambiguousDNA())

However, if the gap character given as the argument disagrees with that declared in the alphabet, an exception is raised:

>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap("-")
Traceback (most recent call last):
   ...
ValueError: Gap '-' does not match '=' from alphabet

Finally, if a gap character is not supplied, and the alphabet does not define one, an exception is raised:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("ATA--TGAAAT-TTGAAAA", generic_dna)
>>> my_dna
Seq('ATA--TGAAAT-TTGAAAA', DNAAlphabet())
>>> my_dna.ungap()
Traceback (most recent call last):
   ...
ValueError: Gap character not given and not defined in alphabet
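The rules for choosing the gap character can be summarised in a small plain-string sketch. The helper `ungap_str` is hypothetical, and its `alphabet_gap` parameter is an assumption that stands in for the gap character declared by a gapped alphabet. The error messages mirror the examples above.

```python
from typing import Optional


def ungap_str(seq: str, gap: Optional[str] = None, alphabet_gap: Optional[str] = None) -> str:
    """Hypothetical helper: remove gaps; alphabet_gap plays the role of the alphabet's gap."""
    if gap is None:
        if alphabet_gap is None:
            raise ValueError("Gap character not given and not defined in alphabet")
        gap = alphabet_gap
    elif alphabet_gap is not None and gap != alphabet_gap:
        raise ValueError(f"Gap {gap!r} does not match {alphabet_gap!r} from alphabet")
    return seq.replace(gap, "")


print(ungap_str("-ATA--TGAAAT-TTGAAAA", "-"))         # ATATGAAATTTGAAAA
print(ungap_str("CGGGTAG=AAAAAA", alphabet_gap="="))  # CGGGTAGAAAAAA
```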
upper()

Returns an upper case copy of the sequence.

>>> from Bio.Alphabet import HasStopCodon, generic_protein
>>> from Bio.Seq import Seq
>>> my_seq = Seq("VHLTPeeK*", HasStopCodon(generic_protein))
>>> my_seq
Seq('VHLTPeeK*', HasStopCodon(ProteinAlphabet(), '*'))
>>> my_seq.lower()
Seq('vhltpeek*', HasStopCodon(ProteinAlphabet(), '*'))
>>> my_seq.upper()
Seq('VHLTPEEK*', HasStopCodon(ProteinAlphabet(), '*'))

This will adjust the alphabet if required. See also the lower method.

Core.Variant

class Fred2.Core.Variant.MutationSyntax(transID, transPos, protPos, cds, aas)

This class represents the mutation syntax of a variant and stores its transcript and protein position

Parameters:
  • transID (str) – The Transcript ID
  • transPos (int) – The position of the Variant within the Transcript
  • protPos (int) – The position of the Variant within the Protein encoded by the Transcript
  • cds (str) – The complete cds_mutation_syntax string
  • aas (str) – The complete protein_mutation_syntax string
class Fred2.Core.Variant.Variant(id, type, chrom, genomePos, ref, obs, coding, isHomozygous, isSynonymous, experimentalDesign=None, metadata=None)

Bases: Fred2.Core.Base.MetadataLogger

A Variant contains information about a single genetic modification of the reference genome.

get_annotated_protein_pos(transID)

Returns the annotated protein position

Parameters:transID (str) – The Transcript ID of interest
Returns:The annotated Protein position of the given Transcript ID
Return type:int
Raises KeyError:
 If Variant is not annotated to the given Transcript ID
get_annotated_transcript_pos(transID)

Returns the annotated Transcript position

Parameters:transID (str) – The Transcript ID of interest
Returns:The annotated Transcript position of the given Transcript ID
Return type:int
Raises KeyError:
 If variant is not annotated to the given Transcript ID
get_metadata(label, only_first=False)

Getter for the saved metadata with the key label

Parameters:
  • label (str) – key of the metadata to retrieve
  • only_first (bool) – True if only the first element of the metadata list is to be returned
get_shift()

Returns the frameshift offset caused by the mutation in {0,1,2}

Returns:The frameshift caused by mutation
Return type:int
get_transcript_offset()

Returns the sequence offset caused by the mutation

Returns:The sequence offset
Return type:int
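The sketch below shows one plausible reading of these two methods. It is an assumption, not taken from the Fred2 source: the transcript offset is the length difference between the observed and reference alleles, and the frameshift is that offset modulo 3. For a positive modulus, Python's `%` always returns a value in {0, 1, 2}, even for deletions. Both helper names are hypothetical.

```python
def transcript_offset(ref: str, obs: str) -> int:
    """Hypothetical helper (assumed semantics): net length change from ref to obs."""
    return len(obs) - len(ref)


def frameshift(ref: str, obs: str) -> int:
    """Hypothetical helper (assumed semantics): reading-frame shift in {0, 1, 2}."""
    return transcript_offset(ref, obs) % 3


print(frameshift("A", "AT"))  # 1: one-base insertion
print(frameshift("AT", "A"))  # 2: one-base deletion, i.e. a shift of -1
print(frameshift("ACG", ""))  # 0: in-frame deletion of one codon
```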
log_metadata(label, value)

Inserts a new metadata entry

Parameters:
  • label (str) – key for the metadata that will be added
  • value (list(object)) – any kind of additional value that should be kept
Fred2.Core.Variant.VariationType

alias of Enum

Fred2.IO Module

IO.ADBAdapter

IO.EnsemblAdapter

IO.FileReader

Fred2.IO.FileReader.read_annovar_exonic(annovar_file, gene_filter=None, experimentalDesig=None)

Reads a gene-based ANNOVAR output file and generates Variant objects containing all annotated Transcript IDs, returning them as a list of Variant objects.

Parameters:
  • annovar_file (str) – The path to the ANNOVAR file
  • gene_filter (list(str)) – A list of gene names of interest (only variants associated with these genes are generated)
Returns:

List of fully annotated Variant objects

Return type:

list(Variant)

Fred2.IO.FileReader.read_fasta(files, type=<class 'Fred2.Core.Peptide.Peptide'>, id_position=1)

Generator function:

Reads one or more peptide, protein or RNA sequences from a FASTA file. The user needs to specify the correct type of the underlying sequences: Peptide, Protein or Transcript (for RNA).

Parameters:
  • files (list(str) or str) – A (list of) file name(s) to read in
  • type (Peptide or Transcript or Protein) – The type to read in
  • id_position (int) – The position of the ID within the FASTA header, counting fields separated by |
Returns:

a list of the specified sequence type derived from the FASTA file sequences.

Return type:

(list(type))

Raises ValueError:

if a file is not readable
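The `id_position` parameter selects one of the |-separated fields of a FASTA header as the record ID. The plain-Python parser below illustrates the idea under stated assumptions. The function name is hypothetical, and it returns (ID, sequence) tuples rather than Fred2 objects.

Field counting from 0 is also an assumption; check it against the source. Under that assumption, the default of 1 picks the accession from a UniProt-style header such as sp|P12345|NAME_HUMAN.

```python
def parse_fasta(text: str, id_position: int = 1):
    """Hypothetical parser: yield (id, sequence); id is the id_position-th '|' field (0-based)."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Only the first whitespace-separated word of the header is split on '|'.
            fields = line[1:].split()[0].split("|")
            header, chunks = fields[id_position], []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


fasta = """>sp|P12345|PEP1_HUMAN first
SYFPEITHI
>sp|Q67890|PEP2_HUMAN second
KLLPRL
PGV
"""
print(list(parse_fasta(fasta)))  # [('P12345', 'SYFPEITHI'), ('Q67890', 'KLLPRLPGV')]
```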

Fred2.IO.FileReader.read_lines(files, type=<class 'Fred2.Core.Peptide.Peptide'>)

Generator function:

Reads sequences directly from lines. The user needs to manually specify the correct type of the underlying data: Peptide, Protein, Transcript or Allele.

Parameters:
Returns:

A list of the specified objects

Return type:

(list(type))

Raises IOError:

if a file is not readable

IO.MartsAdapter

class Fred2.IO.MartsAdapter.MartsAdapter(usr=None, host=None, pwd=None, db=None, biomart=None)

Bases: Fred2.IO.ADBAdapter.ADBAdapter

get_all_variant_gene(locations, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')

Fetches the important DB IDs and names for the given chromosomal location.

Parameters:
  • chrom – integer value of the chromosome in question
  • start – integer value of the variation start position on the given chromosome
  • stop – integer value of the variation stop position on the given chromosome
Returns:The respective gene name, i.e. the first one reported

get_all_variant_ids(**kwargs)

Fetches the important DB IDs and names for the given gene or chromosomal location (the former is recommended). The result is a list of dicts with one of these three combinations:

  • ‘Ensembl Gene ID’, ‘Ensembl Transcript ID’, ‘Ensembl Protein ID’
  • ‘RefSeq Protein ID [e.g. NP_001005353]’, ‘RefSeq mRNA [e.g. NM_001195597]’, first triplet
  • ‘RefSeq Predicted Protein ID [e.g. XP_001720922]’, ‘RefSeq mRNA predicted [e.g. XM_001125684]’, first triplet
Parameters:
  • 'locations' – list of locations as triplets of integer values representing (chrom, start, stop)
  • 'genes' – list of genes as string value of the genes of variation
Returns:

The list of dicts of entries with transcript and protein ids (either NM+NP or XM+XP)

get_product_sequence(product_refseq, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')

Fetches the product sequence for the given ID.

Parameters:product_refseq – the given RefSeq ID
Returns:list of dictionaries of the requested sequence, the respective strand and the associated gene name

get_protein_sequence_from_protein_id(**kwargs)

Returns the protein sequence for a given protein ID, which can be a RefSeq, UniProt or Ensembl ID

Parameters:kwargs
Returns:
get_transcript_information(transcript_refseq, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')

It also already uses the Field-Enum for DBAdapters

Fetches the transcript sequence for the given ID.

Parameters:transcript_refseq – the given RefSeq transcript ID
Returns:list of dictionaries of the requested sequence, the respective strand and the associated gene name

get_transcript_information_from_protein_id(**kwargs)

It also already uses the Field-Enum for DBAdapters

Fetches the transcript sequence for the given ID.

Parameters:transcript_refseq – the given RefSeq transcript ID
Returns:list of dictionaries of the requested sequence, the respective strand and the associated gene name

get_transcript_position(start, stop, gene_id, transcript_id, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')

If no transcript position is available for the variant

Parameters:start, stop, gene_id, transcript_id, _db, _dataset

get_transcript_sequence(transcript_refseq, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')

Fetches the transcript sequence for the given ID.

Parameters:transcript_refseq – the given RefSeq transcript ID
Returns:list of dictionaries of the requested sequence, the respective strand and the associated gene name

get_variant_gene(chrom, start, stop, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')

Fetches the important DB IDs and names for the given chromosomal location.

Parameters:
  • chrom – integer value of the chromosome in question
  • start – integer value of the variation start position on the given chromosome
  • stop – integer value of the variation stop position on the given chromosome
Returns:The respective gene name, i.e. the first one reported

get_variant_id_from_gene_id(**kwargs)

Returns all information needed to instantiate a variation

Parameters:trans_id – A transcript ID (either Ensembl (ENS) or RefSeq (NM, XM))
Returns:list of dicts – containing all information needed for a variant initialization
get_variant_id_from_protein_id(**kwargs)

Returns all information needed to instantiate a variation

Parameters:trans_id – A transcript ID (either Ensembl (ENS) or RefSeq (NM, XM))
Returns:list of dicts – containing all information needed for a variant initialization
get_variant_ids(**kwargs)

Fetches the important DB IDs and names for the given gene or chromosomal location (the former is recommended). The result is a list of dicts with one of these three combinations:

  • ‘Ensembl Gene ID’, ‘Ensembl Transcript ID’, ‘Ensembl Protein ID’
  • ‘RefSeq Protein ID [e.g. NP_001005353]’, ‘RefSeq mRNA [e.g. NM_001195597]’, first triplet
  • ‘RefSeq Predicted Protein ID [e.g. XP_001720922]’, ‘RefSeq mRNA predicted [e.g. XM_001125684]’, first triplet
Parameters:
  • 'chrom' – integer value of the chromosome in question
  • 'start' – integer value of the variation start position on given chromosome
  • 'stop' – integer value of the variation stop position on given chromosome
  • 'gene' – string value of the gene of variation
  • 'transcript_id' – string value of the transcript ID
Returns:

The list of dicts of entries with transcript and protein ids (either NM+NP or XM+XP)

IO.RefSeqAdapter

class Fred2.IO.RefSeqAdapter.RefSeqAdapter(prot_file=None, prot_vers=None, mrna_file=None, mrna_vers=None)

Bases: Fred2.IO.ADBAdapter.ADBAdapter

get_product_sequence(product_refseq)

Fetches the product sequence for the given ID.

Parameters:product_refseq – the given RefSeq ID
Returns:list of dictionaries of the requested sequence, the respective strand and the associated gene name

get_transcript_information(transcript_refseq)
get_transcript_sequence(transcript_refseq)

Fetches the transcript sequence for the given ID.

Parameters:transcript_refseq – the given RefSeq transcript ID
Returns:list of dictionaries of the requested sequence, the respective strand and the associated gene name

load(filename)

IO.UniProtAdapter

class Fred2.IO.UniProtAdapter.UniProtDB(name='fdb')
exists(seq)

Fast check whether the given sequence exists (as a subsequence) in the UniProtDB object's collection of sequences.

Parameters:seq – the subsequence to be searched for
Returns:True, if it is found somewhere, False otherwise
read_seqs(sequence_file)

Reads sequences from UniProt files (.dat or .fasta) or from lists or dicts of Biopython SeqRecords and makes them available for fast search. Calling this function again appends further sequences.

Parameters:sequence_file – uniprot files (.dat or .fasta)
Returns:
search(seq)

Searches for the first occurrence of the given sequence(s) in the UniProtDB object's collection, returning for each the front part of the FASTA header of the first occurrence.

Parameters:seq – a string interpreted as a single sequence, or a list (of str) interpreted as a collection of sequences
Returns:a dictionary of sequences to lists (of ids, ‘null’ if n/a)
search_all(seq)

Searches for all occurrences of the given sequence(s) in the UniProtDB object's collection, returning for each the front part of the FASTA header of every occurrence.

Parameters:seq – a string interpreted as a single sequence, or a list (of str) interpreted as a collection of sequences
Returns:a dictionary of the given sequences to lists (of ids, ‘null’ if n/a)
write_seqs(name)

Writes all FASTA entries in the current object into one FASTA file

Parameters:name – the complete path with file name where the fasta is going to be written
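The class below is a hypothetical sketch of one way to make `exists` and `search_all` fast. It is not the UniProtDB source. The approach:

  • concatenate all sequences into one string, joined by a separator that cannot occur in a protein sequence;
  • run `str.find` on that string;
  • map each hit back to its record with a binary search over the start offsets.

For simplicity it takes one query at a time and returns a list of IDs, using 'null' when nothing matches, as described above.

```python
from bisect import bisect_right

SEP = "|"  # assumed never to occur inside a sequence or a query


class SubsequenceIndex:
    """Hypothetical stand-in for UniProtDB-style fast subsequence lookup."""

    def __init__(self, records):
        # records: dict mapping an ID to its sequence
        self.ids = list(records)
        self.starts = []  # offset of each sequence in the concatenated string
        offset = 0
        for seq in records.values():
            self.starts.append(offset)
            offset += len(seq) + len(SEP)
        self.collection = SEP.join(records.values()) + SEP

    def exists(self, sub):
        return sub in self.collection

    def search_all(self, sub):
        hits = []
        pos = self.collection.find(sub)
        while pos != -1:
            record_id = self.ids[bisect_right(self.starts, pos) - 1]
            if not hits or hits[-1] != record_id:  # hits in one record are consecutive
                hits.append(record_id)
            pos = self.collection.find(sub, pos + 1)
        return hits or ["null"]


db = SubsequenceIndex({"P1": "MKTAYIAK", "P2": "GGMKTAW", "P3": "AYIAKQQ"})
print(db.search_all("MKTA"))  # ['P1', 'P2']
```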

Prediction

Fred2.CleavagePrediction Module

CleavagePrediction.External

class Fred2.CleavagePrediction.External.AExternalCleavageSitePrediction

Bases: Fred2.Core.Base.ACleavageSitePrediction, Fred2.Core.Base.AExternal

Abstract base class for external cleavage site prediction methods. Implements predict functionality.

cleavagePos

Parameter specifying the position of aa (within the prediction window) after which the sequence is cleaved (starting from 1)

command

Defines the commandline call for external tool

get_external_version(path=None)

Returns the external version of the tool by executing {command} --version

This might depend on the method and therefore has to be overridden; the method is declared abstract to force subclasses to override it. The base-class implementation can be called with super().

Parameters:path (str) – Optional executable path, if it differs from self.__command
Returns:The external version of the tool or None if tool does not support versioning
Return type:str
is_in_path()

Checks whether the specified execution command can be found in PATH

Returns:Whether or not command could be found in PATH
Return type:bool
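Both helpers rely on ordinary PATH and subprocess checks. The standard-library sketch below illustrates the same idea. The function names mirror the methods above, but this is not Fred2's implementation. It assumes the tool prints its version to stdout when called with `--version`.

```python
import shutil
import subprocess
from typing import Optional


def is_in_path(command: str) -> bool:
    """Sketch: True if command resolves to an executable (via PATH or as a path)."""
    return shutil.which(command) is not None


def get_external_version(command: str) -> Optional[str]:
    """Sketch: run `command --version`; return its stripped stdout, or None on failure."""
    try:
        result = subprocess.run(
            [command, "--version"], capture_output=True, text=True, check=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        # Missing binary, non-zero exit status, or timeout.
        return None
    return result.stdout.strip() or None
```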
name

The name of the predictor

parse_external_result(file)

Parses external results and returns the result

Parameters:file (str) – The file path of the external prediction results
Returns:A dictionary containing the prediction results
Return type:dict
predict(aa_seq, command=None, options=None, **kwargs)

Overrides ACleavageSitePrediction.predict

Parameters:
  • aa_seq (list(Peptide/Protein) or Peptide/Protein) – A list of or a single Peptide or Protein object
  • command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
  • options (str) – A string of additional options passed directly to the external tool
Returns:

A CleavageSitePredictionResult object

Return type:

CleavageSitePredictionResult

prepare_input(input, file)

Prepares the input data and writes them to file in the special format used by the external tool

Parameters:
  • input (list(str)) – The input data (here peptide sequences)
  • file (File) – A file handler with which the data are written to file
supportedLength

The supported lengths of the predictor

version

Parameter specifying the version of the prediction method

class Fred2.CleavagePrediction.External.NetChop_3_1

Bases: Fred2.CleavagePrediction.External.AExternalCleavageSitePrediction, Fred2.Core.Base.AExternal

Implements NetChop Cleavage Site Prediction (v. 3.1).

Note

Nielsen, M., Lundegaard, C., Lund, O., & Kesmir, C. (2005). The role of the proteasome in generating cytotoxic T-cell epitopes: insights obtained from improved predictions of proteasomal cleavage. Immunogenetics, 57(1-2), 33-41.

cleavagePos

Parameter specifying the position of aa (within the prediction window) after which the sequence is cleaved

command

Defines the commandline call for external tool

get_external_version(path=None)

Returns the external version of the tool by executing {command} --version

This might depend on the method and therefore has to be overridden; the method is declared abstract to force subclasses to override it. The base-class implementation can be called with super().

Parameters:path (str) – Optional executable path, if it differs from self.__command
Returns:The external version of the tool or None if tool does not support versioning
Return type:str
is_in_path()

Checks whether the specified execution command can be found in PATH

Returns:Whether or not command could be found in PATH
Return type:bool
name

The name of the predictor

parse_external_result(file)

Parses external results and returns the result

Parameters:file (str) – The file path of the external prediction results
Returns:Returns a dictionary with the prediction results
Return type:dict(str,dict((str,int),float))
predict(aa_seq, command=None, options=None, **kwargs)

Overrides ACleavageSitePrediction.predict

Parameters:
  • aa_seq (list(Peptide/Protein) or Peptide/Protein) – A list of or a single Peptide or Protein object
  • command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
  • options (str) – A string of additional options passed directly to the external tool
Returns:

A CleavageSitePredictionResult object

Return type:

CleavageSitePredictionResult

prepare_input(input, file)

Prepares the input data and writes them to file in the special format used by the external tool

Parameters:
  • input (list(str)) – The input data (here peptide sequences)
  • file (File) – A file handler with which the data are written to file
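The documentation above does not spell out the input format. Assuming FASTA, which is common for sequence-based predictors (check the tool's own documentation), a hypothetical `prepare_input` could write one record per sequence to the open file handle. The generated `seq_<n>` headers are also an assumption.

```python
import io


def prepare_input(sequences, file):
    """Hypothetical sketch: write each sequence as a FASTA record with a generated header."""
    for i, seq in enumerate(sequences):
        file.write(f">seq_{i}\n{seq}\n")


buffer = io.StringIO()  # stands in for the temporary file handle
prepare_input(["SYFPEITHI", "KLLPRLPGV"], buffer)
print(buffer.getvalue(), end="")
```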
supportedLength

The supported lengths of the predictor

version

The version of the method

CleavagePrediction.PSSM

class Fred2.CleavagePrediction.PSSM.APSSMCleavageFragmentPredictor

Bases: Fred2.Core.Base.ACleavageFragmentPrediction

Abstract base class for PSSM predictions.

This implementation only supports cleavage fragment prediction, not site prediction.

Implements predict functionality

cleavagePos

Parameter specifying the position of aa (within the prediction window) after which the sequence is cleaved

name

The name of the predictor

predict(peptides, **kwargs)

Takes peptides plus their trailing C- and N-terminal residues to predict the probability that this n-mer was produced by proteasomal cleavage. It returns the score and the peptide sequence in an AResult object. Row IDs are the peptides; the column is the prediction score.

Parameters:peptides (list(Peptide) or Peptide) – A list of peptide objects or a single peptide object
Returns:Returns a Fred2.Core.Result.CleavageFragmentPredictionResult object
Return type:Fred2.Core.Result.CleavageFragmentPredictionResult
supportedLength

The supported lengths of the predictor

trailingN

The number of trailing residues at the N-terminus of the peptide used for prediction

tralingC

The number of trailing residues at the C-terminus of the peptide used for prediction

version

Parameter specifying the version of the prediction method
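This sketch assumes a purely additive PSSM model. Each position of the scored window contributes one weight for the residue found there, and the weights are summed with an optional bias. The 3-position matrix is made up; it is not PCM, ProteaSMM or Ginodi data. For fragment prediction, the scored window would be the peptide plus its `trailingN` and `tralingC` flanking residues. The helper name `pssm_score` is hypothetical.

```python
from typing import Dict, List

# A made-up 3-position matrix: one dict of residue weights per window position.
TOY_MATRIX: List[Dict[str, float]] = [
    {"A": 0.5, "L": -0.2},
    {"K": 1.0, "R": 0.8},
    {"L": 0.3, "V": 0.1},
]


def pssm_score(window: str, matrix: List[Dict[str, float]], bias: float = 0.0) -> float:
    """Hypothetical helper: sum one weight per position; unlisted residues contribute 0."""
    if len(window) != len(matrix):
        raise ValueError(f"window length {len(window)} != matrix length {len(matrix)}")
    return bias + sum(column.get(aa, 0.0) for column, aa in zip(matrix, window))


print(pssm_score("AKL", TOY_MATRIX))  # about 1.8
print(pssm_score("LRV", TOY_MATRIX))  # about 0.7
```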

class Fred2.CleavagePrediction.PSSM.APSSMCleavageSitePredictor

Bases: Fred2.Core.Base.ACleavageSitePrediction

Abstract base class for PSSM predictions. This implementation only supports cleavage site prediction, not fragment prediction. Implements predict functionality.

cleavagePos

Parameter specifying the position of aa (within the prediction window) after which the sequence is cleaved (starting from 1)

name

The name of the predictor

predict(aa_seq, length=None, **kwargs)

Returns predictions for given peptides.

Parameters:
Returns:

Returns a CleavageSitePredictionResult object

Return type:

CleavageSitePredictionResult

supportedLength

The supported lengths of the predictor

version

Parameter specifying the version of the prediction method
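For site prediction, one way to apply a PSSM is to slide it along the sequence. Each window's score is attributed to the residue after which the cut would occur, given by the cleavage position counted from 1 within the window, as for `cleavagePos` above. This sketch makes the same additive-scoring assumption and uses a made-up 2-position matrix. The helper name `predict_sites` is hypothetical.

```python
from typing import Dict, List

TOY_SITE_MATRIX: List[Dict[str, float]] = [{"K": 1.0}, {"A": 0.5}]


def predict_sites(sequence: str, matrix: List[Dict[str, float]], cleavage_pos: int) -> Dict[int, float]:
    """Hypothetical helper: key = 0-based index of the residue before the predicted cut."""
    width = len(matrix)
    scores = {}
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        site = start + cleavage_pos - 1  # cleavage_pos counts from 1 within the window
        scores[site] = sum(col.get(aa, 0.0) for col, aa in zip(matrix, window))
    return scores


print(predict_sites("KAKA", TOY_SITE_MATRIX, cleavage_pos=1))  # {0: 1.5, 1: 0.0, 2: 1.5}
```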

class Fred2.CleavagePrediction.PSSM.PCM

Bases: Fred2.CleavagePrediction.PSSM.APSSMCleavageSitePredictor

Implements the PCM cleavage prediction method.

Note

Doennes, P., and Kohlbacher, O. (2005). Integrated modeling of the major events in the MHC class I antigen processing pathway. Protein Science, 14(8), 2132-2140.

cleavagePos

Parameter specifying the position of aa (within the prediction window) after which the sequence is cleaved

name

The name of the predictor

predict(peptides, length=None, **kwargs)

Returns predictions for given peptides.

Parameters:
Returns:

Returns a CleavageSitePredictionResult object

Return type:

CleavageSitePredictionResult

supportedLength

A list of supported peptide lengths

version

The version of the predictor

class Fred2.CleavagePrediction.PSSM.PSSMGinodi

Bases: Fred2.CleavagePrediction.PSSM.APSSMCleavageFragmentPredictor

Implements the Cleavage Fragment prediction method of Ginodi et al.

Note

Ido Ginodi, Tal Vider-Shalit, Lea Tsaban, and Yoram Louzoun. Precise score for the prediction of peptides cleaved by the proteasome. Bioinformatics (2008) 24(4): 477-483.

cleavagePos

Parameter specifying the position of aa (within the prediction window) after which the sequence is cleaved

name

The name of the predictor

predict(peptides, **kwargs)

Takes peptides plus their trailing C- and N-terminal residues and predicts the probability that each n-mer was produced by proteasomal cleavage. It returns the scores and the peptide sequences in an AResult object. The row IDs are the peptides and the column is the prediction score.

Parameters:peptides (list(Peptide) or Peptide) – A list of peptide objects or a single peptide object
Returns:Returns a Fred2.Core.Result.CleavageFragmentPredictionResult object
Return type:Fred2.Core.Result.CleavageFragmentPredictionResult
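Fragment predictors such as PSSMGinodi look at the peptide together with trailingN residues before it and trailingC residues after it. The following is a minimal sketch of cutting such an extended window out of a protein string, assuming 0-based indices. The function fragment_with_flanks is hypothetical; it does not show how Fred2 itself obtains the flanking residues.

```python
def fragment_with_flanks(protein, start, length, trailing_n, trailing_c):
    """Return protein[start:start + length] extended by its flanking residues.

    trailing_n residues are taken before the peptide and trailing_c residues
    after it. Returns None when the flanks would run past either protein end.
    """
    begin = start - trailing_n
    end = start + length + trailing_c
    if begin < 0 or end > len(protein):
        return None
    return protein[begin:end]


if __name__ == "__main__":
    # Peptide "AYIA" (start 3, length 4) with 2 N-terminal and 1 C-terminal residues.
    print(fragment_with_flanks("MKTAYIAKQRQISFVK", 3, 4, 2, 1))
```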
supportedLength

A list of supported peptide lengths

trailingN

The number of trailing residues at the N-terminal of the peptide used for prediction

tralingC

The number of trailing residues at the C-terminal of the peptide used for prediction

version

The version of the predictor

class Fred2.CleavagePrediction.PSSM.ProteaSMMConsecutive

Bases: Fred2.CleavagePrediction.PSSM.APSSMCleavageSitePredictor

Implements the ProteaSMM cleavage prediction method.

Note

Tenzer, S., et al. “Modeling the MHC class I pathway by combining predictions of proteasomal cleavage, TAP transport and MHC class I binding.” Cellular and Molecular Life Sciences CMLS 62.9 (2005): 1025-1037.

This model represents the consecutive proteasome.

The matrices were generated without the preon dataset, since a recent study has shown that including it worsened the results.

Note

Calis, Jorg JA, et al. “Role of peptide processing predictions in T cell epitope identification: contribution of different prediction programs.” Immunogenetics (2014): 1-9.

cleavagePos

Parameter specifying the position of aa (within the prediction window) after which the sequence is cleaved

name

The name of the predictor

predict(peptides, length=None, **kwargs)

Returns predictions for the given peptides.

Returns:Returns a CleavageSitePredictionResult object
Return type:CleavageSitePredictionResult

supportedLength

A list of supported peptide lengths

version

The version of the predictor

class Fred2.CleavagePrediction.PSSM.ProteaSMMImmuno

Bases: Fred2.CleavagePrediction.PSSM.APSSMCleavageSitePredictor

Implements the ProteaSMM cleavage prediction method.

Note

Tenzer, S., et al. “Modeling the MHC class I pathway by combining predictions of proteasomal cleavage, TAP transport and MHC class I binding.” Cellular and Molecular Life Sciences CMLS 62.9 (2005): 1025-1037.

This model represents the immunoproteasome.

The matrices were generated without the preon dataset, since a recent study has shown that including it worsened the results.

Note

Calis, Jorg JA, et al. “Role of peptide processing predictions in T cell epitope identification: contribution of different prediction programs.” Immunogenetics (2014): 1-9.

cleavagePos

Parameter specifying the position of aa (within the prediction window) after which the sequence is cleaved

name

The name of the predictor

predict(peptides, length=None, **kwargs)

Returns predictions for the given peptides.

Returns:Returns a CleavageSitePredictionResult object
Return type:CleavageSitePredictionResult

supportedLength

A list of supported peptide lengths

version

The version of the predictor

Module contents

class Fred2.CleavagePrediction.CleavageFragmentPredictorFactory

Bases: object

static available_methods()

Returns a dictionary of available cleavage fragment predictors

Returns:dict(str, list(str)) - dictionary of cleavage fragment predictors represented as strings, with their supported versions
class Fred2.CleavagePrediction.CleavageSitePredictorFactory

Bases: object

static available_methods()

Returns a dictionary of available cleavage site predictors

Returns:dict(str, list(int)) - dictionary of cleavage site predictors represented as strings, with their supported versions
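Because available_methods() maps each predictor name to a list of versions, a caller can, for example, pick the newest version of a method. The following is a minimal sketch over a plain dict. The predictor names and versions in EXAMPLE_METHODS are invented, and the helpers version_key and latest_version are hypothetical, not part of Fred2. Versions are compared numerically (so "1.10" is newer than "1.9"), and the key also accepts ints, since one factory above documents list(int).

```python
def version_key(version):
    """Turn '1.10' (or an int such as 3) into a tuple of ints for comparison."""
    return tuple(int(part) for part in str(version).split("."))


def latest_version(methods, name):
    """Return the highest version listed for `name` in a name -> versions dict."""
    versions = methods.get(name)
    if not versions:
        raise KeyError("unknown predictor or no versions listed: %s" % name)
    return max(versions, key=version_key)


# Invented example data in the dict(str, list(str)) shape described above.
EXAMPLE_METHODS = {
    "predictor_a": ["1.0", "1.9", "1.10"],
    "predictor_b": ["2.0"],
}

if __name__ == "__main__":
    print(latest_version(EXAMPLE_METHODS, "predictor_a"))
```

Version strings with non-numeric parts (for example "1.0a") would make version_key raise ValueError; a real caller would need a more tolerant parser.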

Fred2.TAPPrediction Module

TAPPrediction.PSSM

class Fred2.TAPPrediction.PSSM.APSSMTAPPrediction

Bases: Fred2.Core.Base.ATAPPrediction

Abstract base class for PSSM predictions. Implements the predict functionality.

name

The name of the predictor

predict(peptides, **kwargs)

Returns TAP predictions for the given Peptide objects.

Parameters:peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide objects
Returns:Returns a TAPPredictionResult object with the prediction results
Return type:TAPPredictionResult
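The predict methods accept either a single Peptide or a list of them. The following is a minimal sketch of normalising such an argument into a dict keyed by sequence string, with plain strings standing in for Peptide objects. The helper normalize_peptides is hypothetical and is not Fred2's code. Only list, tuple and set are treated as collections: sequence objects are iterable over their residues, so calling list() on a single peptide would split it into characters.

```python
def normalize_peptides(peptides):
    """Return {sequence_string: peptide_object} for one object or a collection.

    Duplicate sequences collapse to a single entry (the last object wins),
    while the first-seen order of sequences is kept.
    """
    if not isinstance(peptides, (list, tuple, set)):
        peptides = [peptides]
    return {str(p): p for p in peptides}


if __name__ == "__main__":
    print(normalize_peptides("SYFPEITHI"))
    print(normalize_peptides(["GILGFVFTL", "SYFPEITHI", "GILGFVFTL"]))
```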
supportedLength

The supported lengths of the predictor

version

Parameter specifying the version of the prediction method

class Fred2.TAPPrediction.PSSM.SMMTAP

Bases: Fred2.TAPPrediction.PSSM.APSSMTAPPrediction

Implementation of SMMTAP.

Note

Peters, B., Bulik, S., Tampe, R., Van Endert, P. M., & Holzhuetter, H. G. (2003). Identifying MHC class I epitopes by predicting the TAP transport efficiency of epitope precursors. The Journal of Immunology, 171(4), 1741-1749.

name

The name of the predictor

predict(peptides, **kwargs)

Returns TAP predictions for the given Peptide objects.

Parameters:peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide objects
Returns:Returns a TAPPredictionResult object with the prediction results
Return type:TAPPredictionResult
supportedLength

A list of supported Peptide lengths

version

The version of the predictor

class Fred2.TAPPrediction.PSSM.TAPDoytchinova

Bases: Fred2.TAPPrediction.PSSM.APSSMTAPPrediction

Implements the TAP prediction model from Doytchinova.

Note

Doytchinova, I., Hemsley, S. and Flower, D. R. Transporter associated with antigen processing preselection of peptides binding to the MHC: a bioinformatic evaluation. J. Immunol, 2004, 173, 6813-6819

name

The name of the predictor

predict(peptides, **kwargs)

Returns TAP predictions for the given Peptide objects.

Parameters:peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide objects
Returns:Returns a TAPPredictionResult object with the prediction results
Return type:TAPPredictionResult
supportedLength

A list of supported Peptide lengths

version

The version of the predictor

TAPPrediction.SVM

class Fred2.TAPPrediction.SVM.ASVMTAPPrediction

Bases: Fred2.Core.Base.ATAPPrediction, Fred2.Core.Base.ASVM

encode(peptides)

Returns the feature encoding for peptides

Parameters:peptides (list(Peptide)/Peptide) – A list of Peptide objects or a single Peptide object
Returns:Feature encoding of the Peptide objects
Return type:list(Object)
name

The name of the predictor

predict(peptides, **kwargs)

Returns TAP predictions for the given Peptide objects.

Parameters:peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide objects
Returns:Returns a TAPPredictionResult object with the prediction results
Return type:TAPPredictionResult
supportedLength

The supported lengths of the predictor

version

Parameter specifying the version of the prediction method

class Fred2.TAPPrediction.SVM.SVMTAP

Bases: Fred2.TAPPrediction.SVM.ASVMTAPPrediction

Implements the SVMTAP prediction of Doennes et al.

Note

Doennes, P. and Kohlbacher, O. Integrated modeling of the major events in the MHC class I antigen processing pathway. Protein Sci, 2005

encode(peptides)

Encodes the peptides with a sparse binary encoding

Parameters:peptides (list(str)) – A list of peptides
Returns:Dictionary with the peptide as key and its feature encoding as value (see the svmlight encoding scheme http://svmlight.joachims.org/)
Return type:dict(Peptide, tuple(int, list(tuple(int, float))))
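A sparse binary (one-hot) encoding sets exactly one feature per peptide position, and svmlight stores only the non-zero features as (index, value) pairs with indices starting at 1. The following is a minimal sketch that produces the documented dict(peptide, tuple(int, list(tuple(int, float)))) shape. The feature-index layout (position * 20 + amino-acid index + 1), the alphabet order, and reading the int as a class label are assumptions; SVMTAP's actual layout may differ. The function sparse_binary_encode is hypothetical.

```python
# Assumed alphabet order for the one-hot blocks (20 standard amino acids).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def sparse_binary_encode(peptides, label=0):
    """One-hot encode peptides in svmlight-style sparse form.

    Residue `aa` at 0-based position `pos` sets feature
    pos * 20 + AA_INDEX[aa] + 1 to 1.0. Indices therefore increase along
    the peptide, as svmlight requires. Returns
    {peptide: (label, [(feature_index, 1.0), ...])}.
    """
    encoding = {}
    for pep in peptides:
        features = [(pos * len(AMINO_ACIDS) + AA_INDEX[aa] + 1, 1.0)
                    for pos, aa in enumerate(str(pep))]
        encoding[pep] = (label, features)
    return encoding


if __name__ == "__main__":
    print(sparse_binary_encode(["AC"]))
```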
name

The name of the predictor

predict(peptides, **kwargs)

Returns TAP predictions for the given Peptide objects.

Parameters:peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide objects
Returns:Returns a TAPPredictionResult object with the prediction results
Return type:TAPPredictionResult
supportedLength

A list of supported peptide lengths

version

The version of the predictor

Module contents

class Fred2.TAPPrediction.TAPPredictorFactory

Bases: object

static available_methods()

Returns a dictionary of available TAP predictors and the supported versions

Returns:dict(str, list(str)) - A dictionary of TAP predictors represented as string and supported versions

Fred2.EpitopePrediction Module

EpitopePrediction.External

class Fred2.EpitopePrediction.External.AExternalEpitopePrediction

Bases: Fred2.Core.Base.AEpitopePrediction, Fred2.Core.Base.AExternal

Abstract class representing an external prediction function. Implementations shall wrap external binaries by following the given abstraction.

command

Defines the command-line call for the external tool

convert_alleles(alleles)

Converts alleles into the internal allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The alleles for which the internal predictor representation is needed
Returns:Returns a string representation of the input alleles
Return type:list(str)
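Each external tool expects allele names in its own spelling, which is what convert_alleles provides. The following is a minimal sketch that turns standard names such as "HLA-A*02:01" into a compact "HLA-A0201" form. That target format is only an example; the format a given wrapper actually emits is not stated here, and the helpers to_compact_hla and convert_alleles (a plain function, not the Fred2 method) are hypothetical.

```python
import re

# gene (e.g. A, B, DRB1), then "*", allele group, ":", specific protein
_HLA_PATTERN = re.compile(r"^HLA-([A-Z]+\d*)\*(\d{2,3}):(\d{2,3})$")


def to_compact_hla(name):
    """Convert e.g. 'HLA-A*02:01' to 'HLA-A0201' (example target format)."""
    match = _HLA_PATTERN.match(name)
    if match is None:
        raise ValueError("unrecognised allele name: %r" % name)
    gene, group, protein = match.groups()
    return "HLA-%s%s%s" % (gene, group, protein)


def convert_alleles(names):
    """Return the converted string representation for every allele name."""
    return [to_compact_hla(name) for name in names]


if __name__ == "__main__":
    print(convert_alleles(["HLA-A*02:01", "HLA-B*07:02"]))
```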
get_external_version(path=None)

Returns the external version of the tool by executing >{command} --version

This might depend on the method and has to be overridden; the method is therefore declared abstract to force implementations to override it. The base-class implementation can be called with super().

Parameters:path (str) – Optional specification of the executable path if it deviates from self.__command
Returns:The external version of the tool or None if the tool does not support versioning
Return type:str
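Querying a version by running `{command} --version` can be done with the standard library. The following is a minimal sketch using subprocess.run that returns None whenever no version can be obtained. The helper get_external_version here is a standalone illustration rather than Fred2's method, and it assumes the tool prints its version on the first output line. Standard error is merged into standard output because some tools print their version there.

```python
import subprocess
import sys


def get_external_version(command, timeout=10):
    """Run `<command> --version` and return the first output line, or None.

    None is returned if the executable cannot be started, times out,
    exits with a non-zero status, or prints nothing.
    """
    try:
        completed = subprocess.run(
            [command, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some tools report the version on stderr
            universal_newlines=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if completed.returncode != 0:
        return None
    lines = completed.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(get_external_version(sys.executable))  # e.g. the Python version line
```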
is_in_path()

Checks whether the specified execution command can be found in PATH

Returns:Whether or not command could be found in PATH
Return type:bool
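Checking whether a command is available on PATH maps directly onto shutil.which from the standard library. The following is a minimal sketch. The helper is_in_path is a standalone illustration, not necessarily how Fred2 implements the method.

```python
import shutil
import sys


def is_in_path(command):
    """Return True if `command` resolves to an executable.

    shutil.which() searches PATH for bare names; a name that already
    contains a directory part is checked directly instead.
    """
    return shutil.which(command) is not None


if __name__ == "__main__":
    print(is_in_path(sys.executable))
```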
name

The name of the predictor

parse_external_result(file)

Parses external results and returns the result

Parameters:file (str) – The file path of the external prediction results
Returns:A dictionary containing the prediction results
Return type:dict
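Every external tool writes its own output layout, so a parser has to know which columns hold the peptide and the score. The following is a minimal sketch that reads whitespace-separated lines into a peptide-to-score dict. The column positions, the comment convention and the helper parse_result_lines are assumptions for illustration, not the format of any particular tool.

```python
def parse_result_lines(lines, peptide_col=0, score_col=1):
    """Map peptide -> float score from whitespace-separated result lines.

    Blank lines and lines starting with '#' are skipped, as are lines whose
    score column is missing or not numeric (such as a header row).
    Works on a list of strings or on an open text file.
    """
    scores = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        try:
            scores[fields[peptide_col]] = float(fields[score_col])
        except (IndexError, ValueError):
            continue
    return scores


if __name__ == "__main__":
    sample = ["# comment", "Peptide Score", "SYFPEITHI 12.5", "GILGFVFTL 3"]
    print(parse_result_lines(sample))
```

With a file path, the same function can be called as `with open(path) as handle: parse_result_lines(handle)`.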
predict(peptides, alleles=None, command=None, options=None, **kwargs)

Overrides AEpitopePrediction.predict

Parameters:
  • peptides (list(Peptide) or Peptide) – A list of or a single Peptide object
  • alleles (list(Allele)/Allele) – A list of or a single Allele object. If no alleles are provided, predictions are made for all alleles supported by the prediction method
  • command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
  • options (str) – A string of additional options passed directly to the external tool
Returns:An EpitopePredictionResult object
Return type:EpitopePredictionResult
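The command and options arguments suggest that the final call is assembled from an executable path, an option string and an input file. The following is a minimal sketch of building such an argument list with shlex.split, so that quoted option values stay together. Placing the options before the input file is an assumption; real tools differ. The helper build_command and the example flags are hypothetical and are not options of any specific predictor.

```python
import shlex


def build_command(command, input_path, options=None):
    """Assemble [executable, *options, input_path] as an argument list.

    `options` is a single string, as in predict(options=...), and is split
    with shell-like rules so that quoted values stay in one argument.
    """
    args = [command]
    if options:
        args.extend(shlex.split(options))
    args.append(input_path)
    return args


if __name__ == "__main__":
    print(build_command("/opt/tool/bin/tool", "in.txt", '--threshold 0.5 --label "two words"'))
```

Passing the resulting list to subprocess.run avoids going through a shell, so peptide files or paths with spaces need no further quoting.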

prepare_input(input, file)

Prepares the input for the external tool and writes it to file in the tool-specific format

No return value.

Parameters:
  • input (list(str)) – The peptide sequences to write into file
  • file (File) – File handler of the input file for the external tool
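prepare_input writes the sequences into an already opened file handle and returns nothing. The following is a minimal sketch assuming a one-sequence-per-line format; some tools expect FASTA or another layout instead. The helpers prepare_input (a standalone function, not the Fred2 method) and render_input are hypothetical, and io.StringIO stands in for a real temporary file.

```python
import io


def prepare_input(sequences, handle):
    """Write one sequence per line to an open text handle; returns None."""
    for seq in sequences:
        handle.write(str(seq) + "\n")


def render_input(sequences):
    """Return what prepare_input would write, using an in-memory buffer."""
    buffer = io.StringIO()
    prepare_input(sequences, buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_input(["SYFPEITHI", "GILGFVFTL"]), end="")
```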
supportedAlleles

A list of valid allele models

supportedLength

A list of supported peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.External.NetCTLpan_1_1

Bases: Fred2.EpitopePrediction.External.AExternalEpitopePrediction

Interface for NetCTLpan 1.1.

Note

NetCTLpan - Pan-specific MHC class I epitope predictions. Stranzl T., Larsen M. V., Lundegaard C., Nielsen M. Immunogenetics. 2010 Apr 9. [Epub ahead of print]

command

Defines the command-line call for the external tool

convert_alleles(alleles)

Converts alleles into the internal allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input alleles
Return type:list(str)
get_external_version(path=None)

Returns the external version of the tool by executing >{command} --version

This might depend on the method and has to be overridden; the method is therefore declared abstract to force implementations to override it. The base-class implementation can be called with super().

Parameters:path (str) – Optional specification of the executable path if it deviates from self.__command
Returns:The external version of the tool or None if the tool does not support versioning
Return type:str
is_in_path()

Checks whether the specified execution command can be found in PATH

Returns:Whether or not command could be found in PATH
Return type:bool
name

The name of the predictor

parse_external_result(file)

Parses external results and returns the result

Parameters:file (str) – The file path of the external prediction results
Returns:A dictionary containing the prediction results
Return type:dict
predict(peptides, alleles=None, command=None, options=None, **kwargs)

Overrides AEpitopePrediction.predict

Parameters:
  • peptides (list(Peptide) or Peptide) – A list of or a single Peptide object
  • alleles (list(Allele)/Allele) – A list of or a single Allele object. If no alleles are provided, predictions are made for all alleles supported by the prediction method
  • command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
  • options (str) – A string of additional options passed directly to the external tool
Returns:An EpitopePredictionResult object
Return type:EpitopePredictionResult

prepare_input(input, file)

Prepares the input for the external tool and writes it to file in the tool-specific format

No return value.

Parameters:
  • input (list(str)) – The peptide sequences to write into file
  • file (File) – File handler of the input file for the external tool
supportedAlleles

A list of supported Allele

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.External.NetMHCII_2_2

Bases: Fred2.EpitopePrediction.External.AExternalEpitopePrediction

Implements a wrapper for NetMHCII

Note

Nielsen, M., & Lund, O. (2009). NN-align. An artificial neural network-based alignment algorithm for MHC class II peptide binding prediction. BMC Bioinformatics, 10(1), 296.

Nielsen, M., Lundegaard, C., & Lund, O. (2007). Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method. BMC Bioinformatics, 8(1), 238.

command

Defines the command-line call for the external tool

convert_alleles(alleles)

Converts Allele into the internal allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input Allele
Return type:list(str)
get_external_version(path=None)

Returns the external version of the tool by executing >{command} --version

This might depend on the method and has to be overridden; the method is therefore declared abstract to force implementations to override it. The base-class implementation can be called with super().

Parameters:path (str) – Optional specification of the executable path if it deviates from self.__command
Returns:The external version of the tool or None if the tool does not support versioning
Return type:str
is_in_path()

Checks whether the specified execution command can be found in PATH

Returns:Whether or not command could be found in PATH
Return type:bool
name

The name of the predictor

parse_external_result(file)

Parses external results and returns the result

Parameters:file (str) – The file path of the external prediction results
Returns:A dictionary containing the prediction results
Return type:dict
predict(peptides, alleles=None, command=None, options=None, **kwargs)

Overrides AEpitopePrediction.predict

Parameters:
  • peptides (list(Peptide) or Peptide) – A list of or a single Peptide object
  • alleles (list(Allele)/Allele) – A list of or a single Allele object. If no alleles are provided, predictions are made for all alleles supported by the prediction method
  • command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
  • options (str) – A string of additional options passed directly to the external tool
Returns:An EpitopePredictionResult object
Return type:EpitopePredictionResult

prepare_input(input, file)

Prepares the input for the external tool and writes it to file in the tool-specific format

No return value.

Parameters:
  • input (list(str)) – The peptide sequences to write into file
  • file (File) – File handler of the input file for the external tool
supportedAlleles

A list of valid Allele models

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.External.NetMHCIIpan_3_0

Bases: Fred2.EpitopePrediction.External.AExternalEpitopePrediction

Implements a wrapper for NetMHCIIpan.

Note

Andreatta, M., Karosiene, E., Rasmussen, M., Stryhn, A., Buus, S., & Nielsen, M. (2015). Accurate pan-specific prediction of peptide-MHC class II binding affinity with improved binding core identification. Immunogenetics, 1-10.

command

Defines the command-line call for the external tool

convert_alleles(alleles)

Converts alleles into the internal allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input alleles
Return type:list(str)
get_external_version(path=None)

Returns the external version of the tool by executing >{command} --version

This might depend on the method and has to be overridden; the method is therefore declared abstract to force implementations to override it. The base-class implementation can be called with super().

Parameters:path (str) – Optional specification of the executable path if it deviates from self.__command
Returns:The external version of the tool or None if the tool does not support versioning
Return type:str
is_in_path()

Checks whether the specified execution command can be found in PATH

Returns:Whether or not command could be found in PATH
Return type:bool
name

The name of the predictor

parse_external_result(file)

Parses external results and returns the result

Parameters:file (str) – The file path of the external prediction results
Returns:A dictionary containing the prediction results
Return type:dict
predict(peptides, alleles=None, command=None, options=None, **kwargs)

Overrides AEpitopePrediction.predict

Parameters:
  • peptides (list(Peptide) or Peptide) – A list of or a single Peptide object
  • alleles (list(Allele)/Allele) – A list of or a single Allele object. If no alleles are provided, predictions are made for all alleles supported by the prediction method
  • command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
  • options (str) – A string of additional options passed directly to the external tool
Returns:An EpitopePredictionResult object
Return type:EpitopePredictionResult

prepare_input(input, file)

Prepares the input for the external tool and writes it to file in the tool-specific format

No return value.

Parameters:
  • input (list(str)) – The peptide sequences to write into file
  • file (File) – File handler of the input file for the external tool
supportedAlleles

A list of valid Allele models

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.External.NetMHCIIpan_3_1

Bases: Fred2.EpitopePrediction.External.NetMHCIIpan_3_0

Implementation of NetMHCIIpan 3.1 adapter.

Note

Andreatta, M., Karosiene, E., Rasmussen, M., Stryhn, A., Buus, S., & Nielsen, M. (2015). Accurate pan-specific prediction of peptide-MHC class II binding affinity with improved binding core identification. Immunogenetics, 1-10.

command

Defines the command-line call for the external tool

convert_alleles(alleles)

Converts alleles into the internal allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input alleles
Return type:list(str)
get_external_version(path=None)

Returns the external version of the tool by executing >{command} --version

This might depend on the method and has to be overridden; the method is therefore declared abstract to force implementations to override it. The base-class implementation can be called with super().

Parameters:path (str) – Optional specification of the executable path if it deviates from self.__command
Returns:The external version of the tool or None if the tool does not support versioning
Return type:str
is_in_path()

Checks whether the specified execution command can be found in PATH

Returns:Whether or not command could be found in PATH
Return type:bool
name

The name of the predictor

parse_external_result(file)

Parses external results and returns the result

Parameters:file (str) – The file path of the external prediction results
Returns:A dictionary containing the prediction results
Return type:dict
predict(peptides, alleles=None, command=None, options=None, **kwargs)

Overrides AEpitopePrediction.predict

Parameters:
  • peptides (list(Peptide) or Peptide) – A list of or a single Peptide object
  • alleles (list(Allele)/Allele) – A list of or a single Allele object. If no alleles are provided, predictions are made for all alleles supported by the prediction method
  • command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
  • options (str) – A string of additional options passed directly to the external tool
Returns:An EpitopePredictionResult object
Return type:EpitopePredictionResult

prepare_input(input, file)

Prepares the input for the external tool and writes it to file in the tool-specific format

No return value.

Parameters:
  • input (list(str)) – The peptide sequences to write into file
  • file (File) – File handler of the input file for the external tool
supportedAlleles

A list of valid Allele models

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.External.NetMHC_3_0

Bases: Fred2.EpitopePrediction.External.NetMHC_3_4

Implements the NetMHC binding (for netMHC 3.0).

Note

NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8-11. Lundegaard C, Lamberth K, Harndahl M, Buus S, Lund O, Nielsen M. Nucleic Acids Res. 1;36(Web Server issue):W509-12. 2008

Accurate approximation method for prediction of class I MHC affinities for peptides of length 8, 10 and 11 using prediction tools trained on 9mers. Lundegaard C, Lund O, Nielsen M. Bioinformatics, 24(11):1397-98, 2008.

command

Defines the command-line call for the external tool

convert_alleles(alleles)

Converts alleles into the internal allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input Allele
Return type:list(str)
get_external_version(path=None)

Returns the external version of the tool by executing >{command} --version

This might depend on the method and has to be overridden; the method is therefore declared abstract to force implementations to override it. The base-class implementation can be called with super().

Parameters:path (str) – Optional specification of the executable path if it deviates from self.__command
Returns:The external version of the tool or None if the tool does not support versioning
Return type:str
is_in_path()

Checks whether the specified execution command can be found in PATH

Returns:Whether or not command could be found in PATH
Return type:bool
name

The name of the predictor

parse_external_result(file)

Parses external results and returns the result

Parameters:file (str) – The file path of the external prediction results
Returns:A dictionary containing the prediction results
Return type:dict
predict(peptides, alleles=None, command=None, options=None, **kwargs)

Overrides AEpitopePrediction.predict

Parameters:
  • peptides (list(Peptide) or Peptide) – A list of or a single Peptide object
  • alleles (list(Allele)/Allele) – A list of or a single Allele object. If no alleles are provided, predictions are made for all alleles supported by the prediction method
  • command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
  • options (str) – A string of additional options passed directly to the external tool
Returns:An EpitopePredictionResult object
Return type:EpitopePredictionResult

prepare_input(input, file)

Prepares the input for the external tool and writes it to file in the tool-specific format

No return value.

Parameters:
  • input (list(str)) – The peptide sequences to write into file
  • file (File) – File handler of the input file for the external tool
supportedAlleles

A list of valid Allele models

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.External.NetMHC_3_4

Bases: Fred2.EpitopePrediction.External.AExternalEpitopePrediction

Implements the NetMHC binding (in its current form for netMHC 3.4).

Note

NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8-11. Lundegaard C, Lamberth K, Harndahl M, Buus S, Lund O, Nielsen M. Nucleic Acids Res. 1;36(Web Server issue):W509-12. 2008

Accurate approximation method for prediction of class I MHC affinities for peptides of length 8, 10 and 11 using prediction tools trained on 9mers. Lundegaard C, Lund O, Nielsen M. Bioinformatics, 24(11):1397-98, 2008.

command

Defines the command-line call for the external tool

convert_alleles(alleles)

Converts Allele into the internal Allele representation of the predictor and returns a string representation

Parameters:alleles (Allele) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input Allele
Return type:list(str)
get_external_version(path=None)

Returns the external version of the tool by executing >{command} --version

This might depend on the method and has to be overridden; the method is therefore declared abstract to force implementations to override it. The base-class implementation can be called with super().

Parameters:path (str) – Optional specification of the executable path if it deviates from self.__command
Returns:The external version of the tool or None if the tool does not support versioning
Return type:str
is_in_path()

Checks whether the specified execution command can be found in PATH

Returns:Whether or not command could be found in PATH
Return type:bool
name

The name of the predictor

parse_external_result(file)

Parses external results and returns the result

Parameters:file (str) – The file path of the external prediction results
Returns:A dictionary containing the prediction results
Return type:dict
predict(peptides, alleles=None, command=None, options=None, **kwargs)

Overrides AEpitopePrediction.predict

Parameters:
  • peptides (list(Peptide) or Peptide) – A list of or a single Peptide object
  • alleles (list(Allele)/Allele) – A list of or a single Allele object. If no alleles are provided, predictions are made for all alleles supported by the prediction method
  • command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
  • options (str) – A string of additional options passed directly to the external tool
Returns:An EpitopePredictionResult object
Return type:EpitopePredictionResult

prepare_input(input, file)

Prepares the input for the external tool and writes it to file in the tool-specific format

No return value.

Parameters:
  • input (list(str)) – The peptide sequences to write into file
  • file (File) – File handler of the input file for the external tool
supportedAlleles

A list of valid allele models

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.External.NetMHCpan_2_4

Bases: Fred2.EpitopePrediction.External.AExternalEpitopePrediction

Implements the NetMHC binding (in its current form for netMHCpan 2.4). Supported MHC alleles are currently restricted to HLA alleles.
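Because only HLA alleles are supported here, a caller may want to separate out non-HLA names (for example mouse H-2 alleles) before calling predict. The following is a minimal sketch of that partitioning, based only on the "HLA-" name prefix. The helper split_supported is hypothetical, and checking the prefix does not guarantee that the tool has a model for a particular HLA allele; supportedAlleles remains the authoritative list.

```python
def split_supported(allele_names, prefix="HLA-"):
    """Partition allele names into (names starting with prefix, all others)."""
    supported = [name for name in allele_names if name.startswith(prefix)]
    unsupported = [name for name in allele_names if not name.startswith(prefix)]
    return supported, unsupported


if __name__ == "__main__":
    print(split_supported(["HLA-A*02:01", "H-2-Kb", "HLA-B*07:02"]))
```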

Note

Nielsen, Morten, et al. “NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and-B locus protein of known sequence.” PloS one 2.8 (2007): e796.

command

Defines the command-line call for the external tool

convert_alleles(alleles)

Converts Allele into the internal allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input Allele
Return type:list(str)
get_external_version(path=None)

Returns the external version of the tool by executing >{command} --version

This might depend on the method and has to be overridden; the method is therefore declared abstract to force implementations to override it. The base-class implementation can be called with super().

Parameters:path (str) – Optional specification of the executable path if it deviates from self.__command
Returns:The external version of the tool or None if the tool does not support versioning
Return type:str
is_in_path()

Checks whether the specified execution command can be found in PATH

Returns:Whether or not command could be found in PATH
Return type:bool
name

The name of the predictor

parse_external_result(file)

Parses external results and returns the result

Parameters:file (str) – The file path of the external prediction results
Returns:A dictionary containing the prediction results
Return type:dict
predict(peptides, alleles=None, command=None, options=None, **kwargs)

Overrides AEpitopePrediction.predict

Parameters:
  • peptides (list(Peptide) or Peptide) – A list of or a single Peptide object
  • alleles (list(Allele)/Allele) – A list of or a single Allele object. If no alleles are provided, predictions are made for all alleles supported by the prediction method
  • command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
  • options (str) – A string of additional options passed directly to the external tool
Returns:An EpitopePredictionResult object
Return type:EpitopePredictionResult

prepare_input(input, file)

Prepares the input for the external tool and writes it to file in the tool-specific format

No return value.

Parameters:
  • input (list(str)) – The peptide sequences to write into file
  • file (File) – File handler of the input file for the external tool
supportedAlleles

A list of valid Allele models

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.External.NetMHCpan_2_8

Bases: Fred2.EpitopePrediction.External.AExternalEpitopePrediction

Implements the NetMHC binding (in its current form for netMHCpan 2.8). Supported MHC alleles are currently restricted to HLA alleles.

Note

Nielsen, Morten, et al. “NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and-B locus protein of known sequence.” PloS one 2.8 (2007): e796.

command

Defines the command-line call for the external tool

convert_alleles(alleles)

Converts alleles into the internal allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input Allele
Return type:list(str)
get_external_version(path=None)

Returns the external version of the tool by executing >{command} --version

This might depend on the method and has to be overridden; the method is therefore declared abstract to force implementations to override it. The base-class implementation can be called with super().

Parameters:path (str) – Optional specification of the executable path if it deviates from self.__command
Returns:The external version of the tool or None if the tool does not support versioning
Return type:str
is_in_path()

Checks whether the specified execution command can be found in PATH

Returns:Whether or not the command could be found in PATH
Return type:bool
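The PATH check and the version query described above can be approximated with the Python standard library. This is a minimal, hypothetical sketch, not Fred2's implementation: it assumes the tool understands a --version flag and prints its version on the first output line (on stdout or stderr).

```python
import shutil
import subprocess
from typing import Optional


def is_in_path(command: str) -> bool:
    """Return True if `command` resolves to an executable (PATH lookup or explicit path)."""
    return shutil.which(command) is not None


def get_external_version(command: str) -> Optional[str]:
    """Run `<command> --version` and return the first output line, or None.

    Hypothetical helper: assumes the tool supports a --version flag.
    """
    executable = shutil.which(command)
    if executable is None:
        return None  # tool not installed or not on PATH
    try:
        result = subprocess.run(
            [executable, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some tools print their version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None
```

For example, is_in_path("netMHCpan") returns False on a machine where the binary is not installed, in which case a wrapper would typically ask for an explicit command path.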
name

The name of the predictor

parse_external_result(file)

Parses the external results and returns them

Parameters:file (str) – The file path of the external prediction results
Returns:A dictionary containing the prediction results
Return type:dict
predict(peptides, alleles=None, command=None, options=None, **kwargs)

Overrides AEpitopePrediction.predict

Parameters:
  • peptides (list(Peptide) or Peptide) – A list of or a single Peptide object
  • alleles (list(Allele)/Allele) – A list of Allele objects or a single Allele object. If no Allele objects are provided, predictions are made for all alleles supported by the prediction method
  • command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
  • options (str) – A string of additional options passed directly to the external tool.
Returns:

An EpitopePredictionResult object

Return type:

EpitopePredictionResult

prepare_input(input, file)

Prepares the input for the external tool and writes it to file in the tool-specific format

No return value!

Parameters:
  • input (list(str)) – The Peptide sequences to write into file
  • file (File) – File handle of the input file for the external tool
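A minimal sketch of a prepare_input-style helper, assuming the external tool reads a plain-text file with one peptide sequence per line (the exact format each tool expects should be checked in its manual). Like the documented method, it writes to an already opened file handle and returns nothing.

```python
import io
from typing import Iterable, TextIO


def prepare_input(peptides: Iterable[str], file: TextIO) -> None:
    """Write one peptide sequence per line to an already opened text file handle.

    Hypothetical sketch: the one-sequence-per-line format is an assumption.
    """
    for sequence in peptides:
        file.write(sequence + "\n")


# Usage with an in-memory buffer; a real call would pass an open temporary file.
buffer = io.StringIO()
prepare_input(["SYFPEITHI", "FIASNGVKL"], buffer)
print(buffer.getvalue(), end="")
```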
supportedAlleles

A list of valid Allele models

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.External.PickPocket_1_1

Bases: Fred2.EpitopePrediction.External.AExternalEpitopePrediction

Implementation of the PickPocket adapter.

Note

Zhang, H., Lund, O., & Nielsen, M. (2009). The PickPocket method for predicting binding specificities for receptors based on receptor pocket similarities: application to MHC-peptide binding. Bioinformatics, 25(10), 1293-1299.

command

Defines the command-line call for the external tool

convert_alleles(alleles)

Converts Allele into the internal allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The alleles for which the internal predictor representation is needed
Returns:Returns a string representation of the input alleles
Return type:list(str)
get_external_version(path=None)

Returns the external version of the tool by executing {command} --version

This may depend on the method and therefore has to be overridden; the method is declared abstract to force subclasses to override it. The base-class implementation can still be called with super()

Parameters:path (str) – Optional path to the executable if it differs from self.__command
Returns:The external version of the tool or None if the tool does not support versioning
Return type:str
is_in_path()

Checks whether the specified execution command can be found in PATH

Returns:Whether or not the command could be found in PATH
Return type:bool
name

The name of the predictor

parse_external_result(file)

Parses the external results and returns them

Parameters:file (str) – The file path of the external prediction results
Returns:A dictionary containing the prediction results
Return type:dict
predict(peptides, alleles=None, command=None, options=None, **kwargs)

Overrides AEpitopePrediction.predict

Parameters:
  • peptides (list(Peptide) or Peptide) – A list of or a single Peptide object
  • alleles (list(Allele)/Allele) – A list of Allele objects or a single Allele object. If no Allele objects are provided, predictions are made for all alleles supported by the prediction method
  • command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
  • options (str) – A string of additional options passed directly to the external tool.
Returns:

An EpitopePredictionResult object

Return type:

EpitopePredictionResult

prepare_input(input, file)

Prepares the input for the external tool and writes it to file in the tool-specific format

No return value!

Parameters:
  • input (list(str)) – The Peptide sequences to write into file
  • file (File) – File handle of the input file for the external tool
supportedAlleles

A list of supported Allele

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

EpitopePrediction.PSSM

class Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction

Bases: Fred2.Core.Base.AEpitopePrediction

Abstract base class for PSSM predictions. Implements the predict functionality.

convert_alleles(alleles)

Converts alleles into the internal allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The alleles for which the internal predictor representation is needed
Returns:Returns a string representation of the input alleles
Return type:list(str)
name

The name of the predictor

predict(peptides, alleles=None, **kwargs)

Returns predictions for the given peptides and alleles. If no alleles are given, predictions are made for all available models.

Parameters:
  • peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide
  • alleles (list(Allele) or Allele) – A list of Allele or a single Allele
  • kwargs – optional parameter (not used yet)
Returns:

Returns an EpitopePredictionResult object with the prediction results

Return type:

EpitopePredictionResult

supportedAlleles

A list of valid allele models

supportedLength

A list of supported peptide lengths

version

The version of the predictor
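The core idea behind a PSSM predictor can be illustrated compactly: each allele model is a position-specific scoring matrix, and a peptide of matching length is scored by combining the matrix entries of the amino acids at each position. The sketch below assumes the common additive scheme and uses plain strings and dicts instead of Fred2's Peptide, Allele and EpitopePredictionResult classes; the function names and the toy matrix are hypothetical, and individual methods may scale, offset, or transform scores differently.

```python
from typing import Dict, List

# A PSSM is modelled here as one dict per position: pssm[i][aa] is the
# contribution of amino acid `aa` at position i (missing entries count as 0).
PSSM = List[Dict[str, float]]


def pssm_score(peptide: str, pssm: PSSM) -> float:
    """Additive PSSM score: sum of the per-position matrix entries."""
    if len(peptide) != len(pssm):
        raise ValueError(
            f"peptide length {len(peptide)} does not match matrix length {len(pssm)}"
        )
    return sum(pssm[i].get(aa, 0.0) for i, aa in enumerate(peptide))


def predict(peptides: List[str], models: Dict[str, PSSM]) -> Dict[str, Dict[str, float]]:
    """Score every peptide with every allele model whose length matches."""
    return {
        allele: {p: pssm_score(p, pssm) for p in peptides if len(p) == len(pssm)}
        for allele, pssm in models.items()
    }


# Toy 3-position matrix with made-up values (not a real allele model).
toy = [{"A": 1.0}, {"L": 10.0}, {"V": 8.0, "L": 5.0}]
print(pssm_score("ALV", toy))  # 19.0
print(predict(["ALV", "GGG", "ALVA"], {"toy_allele": toy}))
```

Peptides whose length does not match a model are skipped, which loosely mirrors the role of supportedLength.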

class Fred2.EpitopePrediction.PSSM.ARB

Bases: Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction

Implements IEDB's ARB method.

Note

Bui HH, Sidney J, Peters B, Sathiamurthy M, Sinichi A, Purton KA, Mothe BR, Chisari FV, Watkins DI, Sette A. 2005. Automated generation and evaluation of specific MHC binding predictive tools: ARB matrix applications. Immunogenetics 57:304-314.

convert_alleles(alleles)

Converts Allele into the internal Allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input Allele
Return type:list(str)
name

The name of the predictor

predict(peptides, alleles=None, **kwargs)

Returns predictions for the given peptides and alleles. If no alleles are given, predictions are made for all available models.

Parameters:
  • peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide
  • alleles (list(Allele) or Allele) – A list of Allele or a single Allele
  • kwargs – optional parameter (not used yet)
Returns:

Returns an EpitopePredictionResult object with the prediction results

Return type:

EpitopePredictionResult

supportedAlleles

A list of supported Allele

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.PSSM.BIMAS

Bases: Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction

Represents the BIMAS PSSM predictor.

Note

Parker, K.C., Bednarek, M.A. and Coligan, J.E. Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains. The Journal of Immunology 1994;152(1):163-175.

convert_alleles(alleles)

Converts Allele into the internal Allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input Allele
Return type:list(str)
name

The name of the predictor

predict(peptides, alleles=None, **kwargs)

Returns predictions for the given peptides and alleles. If no alleles are given, predictions are made for all available models.

Parameters:
  • peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide
  • alleles (list(Allele) or Allele) – A list of Allele or a single Allele
  • kwargs – optional parameter (not used yet)
Returns:

Returns an EpitopePredictionResult object with the prediction results

Return type:

EpitopePredictionResult

supportedAlleles

A list of supported Allele

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.PSSM.ComblibSidney2008

Bases: Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction

Implements IEDB's Comblib_Sidney2008 PSSM method.

Note

Sidney J, Assarsson E, Moore C, Ngo S, Pinilla C, Sette A, Peters B. 2008. Quantitative peptide binding motifs for 19 human and mouse MHC class I molecules derived using positional scanning combinatorial peptide libraries. Immunome Res 4:2.

convert_alleles(alleles)

Converts Allele into the internal Allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input Allele
Return type:list(str)
name

The name of the predictor

predict(peptides, alleles=None, **kwargs)

Returns predictions for the given peptides and alleles. If no alleles are given, predictions are made for all available models.

Parameters:
  • peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide
  • alleles (list(Allele) or Allele) – A list of Allele or a single Allele
  • kwargs – optional parameter (not used yet)
Returns:

Returns an EpitopePredictionResult object with the prediction results

Return type:

EpitopePredictionResult

supportedAlleles

A list of supported Allele

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.PSSM.Epidemix

Bases: Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction

Represents the Epidemix PSSM predictor.

Note

Feldhahn, M., et al. FRED-a framework for T-cell epitope detection. Bioinformatics 2009;25(20):2758-2759.

convert_alleles(alleles)

Converts Allele into the internal Allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input Allele
Return type:list(str)
name

The name of the predictor

predict(peptides, alleles=None, **kwargs)

Returns predictions for the given peptides and alleles. If no alleles are given, predictions are made for all available models.

Parameters:
  • peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide
  • alleles (list(Allele) or Allele) – A list of Allele or a single Allele
  • kwargs – optional parameter (not used yet)
Returns:

Returns an EpitopePredictionResult object with the prediction results

Return type:

EpitopePredictionResult

supportedAlleles

A list of supported Allele

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.PSSM.Hammer

Bases: Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction

Represents the virtual pockets approach by Sturniolo et al.

Note

Sturniolo, T., et al. Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices. Nature Biotechnology 1999;17(6):555-561.

convert_alleles(alleles)

Converts Allele into the internal Allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input Allele
Return type:list(str)
name

The name of the predictor

predict(peptides, alleles=None, **kwargs)

Returns predictions for the given peptides and alleles. If no alleles are given, predictions are made for all available models.

Parameters:
  • peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide
  • alleles (list(Allele) or Allele) – A list of Allele or a single Allele
  • kwargs – optional parameter (not used yet)
Returns:

Returns an EpitopePredictionResult object with the prediction results

Return type:

EpitopePredictionResult

supportedAlleles

A list of supported Allele

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.PSSM.SMM

Bases: Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction

Implements IEDB's SMM PSSM method.

Note

Peters B, Sette A. 2005. Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method. BMC Bioinformatics 6:132.

convert_alleles(alleles)

Converts Allele into the internal Allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input Allele
Return type:list(str)
name

The name of the predictor

predict(peptides, alleles=None, **kwargs)

Returns predictions for the given peptides and alleles. If no alleles are given, predictions are made for all available models.

Parameters:
  • peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide
  • alleles (list(Allele) or Allele) – A list of Allele or a single Allele
  • kwargs – optional parameter (not used yet)
Returns:

Returns an EpitopePredictionResult object with the prediction results

Return type:

EpitopePredictionResult

supportedAlleles

A list of supported Allele

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.PSSM.SMMPMBEC

Bases: Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction

Implements IEDB's SMMPMBEC PSSM method.

Note

Kim, Y., Sidney, J., Pinilla, C., Sette, A., & Peters, B. (2009). Derivation of an amino acid similarity matrix for peptide: MHC binding and its application as a Bayesian prior. BMC Bioinformatics, 10(1), 394.

convert_alleles(alleles)

Converts Allele into the internal Allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input Allele
Return type:list(str)
name

The name of the predictor

predict(peptides, alleles=None, **kwargs)

Returns predictions for the given peptides and alleles. If no alleles are given, predictions are made for all available models.

Parameters:
  • peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide
  • alleles (list(Allele) or Allele) – A list of Allele or a single Allele
  • kwargs – optional parameter (not used yet)
Returns:

Returns an EpitopePredictionResult object with the prediction results

Return type:

EpitopePredictionResult

supportedAlleles

A list of supported Allele

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.PSSM.Syfpeithi

Bases: Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction

Represents the Syfpeithi PSSM predictor.

Note

Rammensee, H. G., Bachmann, J., Emmerich, N. P. N., Bachor, O. A., & Stevanovic, S. (1999). SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics, 50(3-4), 213-219.

convert_alleles(alleles)

Converts Allele into the internal Allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input Allele
Return type:list(str)
name

The name of the predictor

predict(peptides, alleles=None, **kwargs)

Returns predictions for the given peptides and alleles. If no alleles are given, predictions are made for all available models.

Parameters:
  • peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide
  • alleles (list(Allele) or Allele) – A list of Allele or a single Allele
  • kwargs – optional parameter (not used yet)
Returns:

Returns an EpitopePredictionResult object with the prediction results

Return type:

EpitopePredictionResult

supportedAlleles

A list of supported Allele

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.PSSM.TEPITOPEpan

Bases: Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction

Implements TEPITOPEpan.

Note

Zhang L, Chen Y, Wong H-S, Zhou S, Mamitsuka H, et al. (2012) TEPITOPEpan: Extending TEPITOPE for Peptide Binding Prediction Covering over 700 HLA-DR Molecules. PLoS ONE 7(2): e30483.

convert_alleles(alleles)

Converts Allele into the internal Allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input Allele
Return type:list(str)
name

The name of the predictor

predict(peptides, alleles=None, **kwargs)

Returns predictions for the given peptides and alleles. If no alleles are given, predictions are made for all available models.

Parameters:
  • peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide
  • alleles (list(Allele) or Allele) – A list of Allele or a single Allele
  • kwargs – optional parameter (not used yet)
Returns:

Returns an EpitopePredictionResult object with the prediction results

Return type:

EpitopePredictionResult

supportedAlleles

A list of supported Allele

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

EpitopePrediction.SVM

class Fred2.EpitopePrediction.SVM.ASVMEpitopePrediction

Bases: Fred2.Core.Base.AEpitopePrediction, Fred2.Core.Base.ASVM

Implements the default prediction routine for SVM-based epitope prediction tools

convert_alleles(alleles)

Converts alleles into the internal allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The alleles for which the internal predictor representation is needed
Returns:Returns a string representation of the input alleles
Return type:list(str)
encode(peptides)

Returns the feature encoding for peptides

Parameters:peptides (list(Peptide)/Peptide) – List of or a single Peptide object
Returns:Feature encoding of the Peptide objects
Return type:list(Object)
name

The name of the predictor

predict(peptides, alleles=None, **kwargs)

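Returns predictions for the given peptides and alleles. If no alleles are given, predictions are made for all available models.

Parameters:
  • peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide
  • alleles (list(Allele) or Allele) – A list of Allele or a single Allele
Returns:

Returns an EpitopePredictionResult object with the prediction results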

Return type:

EpitopePredictionResult

supportedAlleles

A list of valid allele models

supportedLength

A list of supported peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.SVM.SVMHC

Bases: Fred2.EpitopePrediction.SVM.ASVMEpitopePrediction

Implements SVMHC epitope prediction for MHC-I alleles (SYFPEITHI models).

Note

Doennes, P. and Kohlbacher, O. SVMHC: a server for prediction of MHC-binding peptides. Nucleic Acids Res, 2006, 34, W194-W197

convert_alleles(alleles)

Converts Allele into the internal Allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input Allele
Return type:list(str)
encode(peptides)

Encodes the input peptides using a sparse binary encoding

Parameters:peptides (str) – A list of Peptide sequences
Returns:Dictionary with Peptide as key and feature encoding as value (see svmlight encoding scheme http://svmlight.joachims.org/)
Return type:dict(Peptide, tuple(int, list(tuple(int,float))))
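The sparse binary encoding and its svmlight-style return value can be illustrated as follows. This is a hypothetical sketch of the general scheme (one indicator feature per position/amino-acid pair, using svmlight's 1-based index:value convention), not SVMHC's exact feature layout; the alphabet order and the dummy label 0 are assumptions, and sequences containing letters outside the 20 standard amino acids raise a KeyError.

```python
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet order
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# (label, [(feature_index, value), ...]) as in the documented return type
Encoding = Tuple[int, List[Tuple[int, float]]]


def encode(peptides: List[str], label: int = 0) -> Dict[str, Encoding]:
    """Map each peptide to (label, [(feature_index, 1.0), ...]).

    Position i (0-based) with amino acid a sets feature i*20 + index(a) + 1,
    so indices start at 1 and increase, as svmlight requires.
    """
    encoded: Dict[str, Encoding] = {}
    for peptide in peptides:
        features = [
            (pos * len(AMINO_ACIDS) + AA_INDEX[aa] + 1, 1.0)
            for pos, aa in enumerate(peptide)
        ]
        encoded[peptide] = (label, features)
    return encoded


def to_svmlight_line(encoding: Encoding) -> str:
    """Render one encoded example as a line of an svmlight input file."""
    label, features = encoding
    return " ".join([str(label)] + [f"{idx}:{val:g}" for idx, val in features])


enc = encode(["ACD"])
print(enc["ACD"])                    # (0, [(1, 1.0), (22, 1.0), (43, 1.0)])
print(to_svmlight_line(enc["ACD"]))  # 0 1:1 22:1 43:1
```

Only the non-zero features are stored, which is what keeps the encoding sparse: a 9-mer yields 9 entries out of 180 possible features.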
name

The name of the predictor

predict(peptides, alleles=None, **kwargs)

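Returns predictions for the given peptides and alleles. If no alleles are given, predictions are made for all available models.

Parameters:
  • peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide
  • alleles (list(Allele) or Allele) – A list of Allele or a single Allele
Returns:

Returns an EpitopePredictionResult object with the prediction results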

Return type:

EpitopePredictionResult

supportedAlleles

A list of supported allele models

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

class Fred2.EpitopePrediction.SVM.UniTope

Bases: Fred2.EpitopePrediction.SVM.ASVMEpitopePrediction

Implements UniTope prediction for MHC-I.

Note

Toussaint, N. C., Feldhahn, M., Ziehm, M., Stevanovic, S., & Kohlbacher, O. (2011, August). T-cell epitope prediction based on self-tolerance. In Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine (pp. 584-588). ACM.

convert_alleles(alleles)

Converts Allele into the internal Allele representation of the predictor and returns a string representation

Parameters:alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns:Returns a string representation of the input Allele
Return type:list(str)
encode(peptides, allele)

Encodes the input peptides using a sparse binary encoding

Parameters:
  • peptides (str) – A list of Peptide sequences
  • allele (str) – The HLA Allele represented by a string
Returns:

Dictionary with Peptide as key and feature encoding as value (see svmlight encoding scheme http://svmlight.joachims.org/)

Return type:

dict(Peptide, tuple(int, list(tuple(int,float))))

name

The name of the predictor

predict(peptides, alleles=None, **kwargs)

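Returns predictions for the given peptides and alleles. If no alleles are given, predictions are made for all available models.

Parameters:
  • peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide
  • alleles (list(Allele) or Allele) – A list of Allele or a single Allele
Returns:

Returns an EpitopePredictionResult object with the prediction results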

Return type:

EpitopePredictionResult

supportedAlleles

A list of supported Allele models

supportedLength

A list of supported Peptide lengths

version

The version of the predictor

Module contents

Vaccine Design

Fred2.EpitopeSelection Module

EpitopeSelection.OptiTope

class Fred2.EpitopeSelection.OptiTope.OptiTope(results, threshold=None, k=10, solver='glpk', verbosity=0)

Bases: object

This class implements the epitope selection functionality of OptiTope published by Toussaint et al. [1].

This module builds upon Pyomo, an embedded algebraic modeling language [2].

It allows the user to (de)activate specific constraints of the underlying ILP and to solve the problem with a MIP solver of choice.

Note

[1] N. C. Toussaint and O. Kohlbacher. OptiTope–a web server for the selection of an optimal set of peptides for epitope-based vaccines. Nucleic Acids Res, 2009, 37, W617-W622.
[2] Pyomo - Optimization Modeling in Python. William E. Hart, Carl Laird, Jean-Paul Watson and David L. Woodruff. Springer, 2012.

activate_allele_coverage_const(minCoverage)

Enables the allele coverage constraint

Parameters:minCoverage (float) – Fraction of alleles that have to be covered, in [0,1]
Raises ValueError:
 If the input value is outside the valid domain of the parameter
activate_antigen_coverage_const(t_var)

Activates the variation coverage constraint

Parameters:t_var (int) – The number of epitopes that have to come from each variation
Raises ValueError:
 If the input value is outside the valid domain of the parameter
activate_epitope_conservation_const(t_c, conservation=None)

Activates the epitope conservation constraint

Parameters:
  • t_c (float) – The minimum conservation an epitope has to have, in [0.0,1.0]
  • conservation (dict) – A dict with Peptide objects as keys, specifying an individual conservation score for each Peptide
Raises ValueError:
 If the input value is outside the valid domain of the parameter
deactivate_allele_coverage_const()

Deactivates the allele coverage constraint

deactivate_antigen_coverage_const()

Deactivates the variation coverage constraint

deactivate_epitope_conservation_const()

Deactivates the epitope conservation constraint

set_k(k)

Sets the number of epitopes to select

Parameters:k (int) – The number of epitopes
Raises ValueError:
 If the input value is outside the valid domain of the parameter
solve(options=None)

Invokes the selected solver and solves the problem

Parameters:options (dict(str,str)) – A dictionary with solver-specific options as keys and their parameters as values
Returns:The optimal epitopes as a list of Peptide objects
Return type:list(Peptide)
Raises RuntimeError:
 If the solver raised a problem or the solver is not accessible via the PATH environment variable
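For intuition, the selection problem that OptiTope hands to an ILP solver can be mimicked on tiny inputs by exhaustive search. The sketch below is a hypothetical stand-in, not Fred2's Pyomo model: it assumes the objective is the allele-frequency-weighted sum of the immunogenicity scores of the selected epitopes, that an epitope covers an allele when its score exceeds a threshold, and it enforces only the allele coverage constraint. Its parameters loosely mirror k, threshold and minCoverage above. The number of subsets grows combinatorially, which is why a MIP solver is used in practice.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# scores[(epitope, allele)] -> predicted immunogenicity; missing pairs count as 0
Scores = Dict[Tuple[str, str], float]


def select_epitopes(
    epitopes: List[str],
    allele_freq: Dict[str, float],
    scores: Scores,
    k: int,
    threshold: float = 0.0,
    min_coverage: float = 0.0,
) -> Optional[List[str]]:
    """Exhaustively pick the best k epitopes; return None if no subset is feasible."""
    best: Optional[List[str]] = None
    best_value = float("-inf")
    for subset in combinations(epitopes, k):
        # Objective: allele-frequency-weighted immunogenicity of the subset.
        value = sum(
            allele_freq[a] * scores.get((e, a), 0.0)
            for e in subset
            for a in allele_freq
        )
        # Allele coverage: an allele is covered if any chosen epitope exceeds the threshold.
        covered = sum(
            1
            for a in allele_freq
            if any(scores.get((e, a), 0.0) > threshold for e in subset)
        )
        if covered / len(allele_freq) < min_coverage:
            continue  # violates the allele coverage constraint
        if value > best_value:
            best, best_value = list(subset), value
    return best


# Hypothetical toy data: two equally frequent alleles, three candidate epitopes.
freq = {"A1": 0.5, "A2": 0.5}
scores = {("P1", "A1"): 0.9, ("P2", "A1"): 0.8, ("P3", "A2"): 0.3}
print(select_epitopes(["P1", "P2", "P3"], freq, scores, k=2))  # ['P1', 'P2']
print(select_epitopes(["P1", "P2", "P3"], freq, scores, k=2, min_coverage=1.0))  # ['P1', 'P3']
```

The second call shows the effect of the coverage constraint: the highest-scoring pair only covers A1, so a lower-scoring pair that covers both alleles is chosen instead.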

Fred2.EpitopeAssembly Module

EpitopeAssembly.EpitopeAssembly

class Fred2.EpitopeAssembly.EpitopeAssembly.EpitopeAssembly(peptides, pred, solver='glpk', weight=0.0, matrix=None, verbosity=0)

Bases: object

Implements the epitope assembly approach proposed by Toussaint et al. using proteasomal cleavage site prediction and formulating the problem as a traveling salesman problem (TSP).

Note

Toussaint, N.C., et al. Universal peptide vaccines - Optimal peptide vaccine design based on viral sequence conservation. Vaccine 2011;29(47):8745-8753.

Parameters:
  • peptides (list(Peptide)) – A list of Peptide which shall be arranged
  • pred (ACleavageSitePrediction) – An ACleavageSitePrediction object
  • solver (str) – Specifies the solver to use (must be callable by Pyomo)
  • weight (float) – Specifies how strongly unwanted cleavage sites should be penalized, in [0,1], where 0 means they are ignored and 1 means the sum of all unwanted cleavage sites is subtracted from the cleavage site between two epitopes
  • verbosity (int) – Specifies how verbose the class will be; 0 means normal, >0 debug mode
approximate()

Approximates the epitope assembly problem by applying the Lin-Kernighan traveling salesman heuristic

Note

The LKH implementation must be downloaded, compiled, and globally executable. Source code can be found here: http://www.akira.ruc.dk/~keld/research/LKH/

Returns:An ordered list of the Peptide objects (based on the string-of-beads ordering)
Return type:list(Peptide)
solve(options=None)

Solves the Epitope Assembly problem and returns an ordered list of the peptides

Note

This can take quite long and should not be used for more than 30 epitopes!

Parameters:options (str) – Solver-specific options as a string (will not be checked for correctness)
Returns:An ordered list of the Peptide objects (based on the string-of-beads ordering)
Return type:list(Peptide)
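The string-of-beads ordering can be illustrated on a handful of epitopes by brute force. In this hypothetical sketch, junction[i][j] stands for a precomputed score of placing epitope j directly after epitope i (for instance, the predicted cleavage likelihood at that junction minus a weighted penalty for unwanted cleavage sites, in the spirit of the weight parameter). The sketch maximizes the summed junction scores along an open path, which is what a TSP tour through an extra dummy node encodes. It is a stand-in for the ILP and LKH approaches, not Fred2's implementation, and its running time grows factorially.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def best_order(junction: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Exhaustively find the epitope order that maximizes the summed junction scores."""
    n = len(junction)
    best_perm: List[int] = list(range(n))
    best_score = float("-inf")
    for perm in permutations(range(n)):
        # Sum the scores of consecutive pairs (i, j) along the candidate order.
        score = sum(junction[a][b] for a, b in zip(perm, perm[1:]))
        if score > best_score:
            best_perm, best_score = list(perm), score
    return best_perm, best_score


# Hypothetical junction scores for three epitopes (diagonal unused).
junction = [
    [0.0, 0.2, 0.9],
    [0.7, 0.0, 0.1],
    [0.3, 0.8, 0.0],
]
order, score = best_order(junction)
print(order, score)
```

Here the best order is epitope 0, then 2, then 1, because the two strongest junctions (0 to 2 and 2 to 1) can be chained.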
class Fred2.EpitopeAssembly.EpitopeAssembly.EpitopeAssemblyWithSpacer(peptides, cleav_pred, epi_pred, alleles, k=5, en=9, threshold=None, solver='glpk', alpha=0.99, beta=0, verbosity=0)

Bases: object

Implements the epitope assembly approach proposed by Toussaint et al. using proteasomal cleavage site prediction and formulating the problem as a traveling salesman problem (TSP).

It also extends the approach with optimal spacer design (currently only supported with PSSM cleavage site and epitope prediction).

An ILP model is implemented, so be reasonable with the number of epitopes to be arranged.

approximate(start=0, threads=1, options=None)

Approximates the Epitope Assembly problem by applying the Lin-Kernighan traveling salesman heuristic

The LKH implementation must be downloaded, compiled, and globally executable.

Source code can be found here: http://www.akira.ruc.dk/~keld/research/LKH/

Parameters:
  • start (int) – Start length for spacers (default 0).
  • threads (int) – Number of threads used for spacer design. Be careful: if options also specify solver threads, threads*solver_threads cores will be allocated!
  • options (dict(str,str)) – Solver specific options (threads for example)
Returns:

A list of ordered Peptide

Return type:

list(Peptide)

solve(start=0, threads=None, options=None)

Solves the epitope assembly problem with spacers optimally using integer linear programming.

Note

This can take quite long and should not be used for more than 30 epitopes! Also, pre-solving steps have to be disabled in order to use this model.

Parameters:
  • start (int) – Start length for spacers (default 0).
  • threads (int) – Number of threads used for spacer design. Be careful: if options also specify solver threads, threads*solver_threads cores will be allocated!
  • options (dict(str,str)) – Solver specific options as keys and parameters as values
Returns:

A list of ordered Peptide

Return type:

list(Peptide)

EpitopeAssembly.MosaicVaccine

The module offers an exact solution for small to medium-sized problems as well as a matheuristic based on Tabu Search and Branch-and-Bound for large problems.

The heuristic proceeds as follows:

    initialize solution s_best via greedy construction
    s_current = s_best
    WHILE convergence is not reached DO:
        I:   s <- Tabu Search(s_current)
        II:  s <- Intensification via local MIP(s) solution (allow only alpha arcs to change)
             if s > s_best:
                 s_best = s
        III: Diversification(s) to escape local maxima
    END

class Fred2.EpitopeAssembly.MosaicVaccine.MosaicVaccineTS(_results, threshold=None, k=10, solver='glpk', verbosity=0)
approximate(phi=0.05, options=None, _greedyLP=True, _tabu=True, _intensify=True, _jump=True, max_iter=10000, delta_change=0.0001, max_delta=101, seed=23478234)

Matheuristic using Tabu Search

solve(options=None)

Solves the model optimally

class Fred2.EpitopeAssembly.MosaicVaccine.TabuList(iterable=None, size=None)

Bases: _abcoll.MutableSet

add(key)
clear()

This is slow (creates N new iterators!) but effective.

discard(key)
isdisjoint(other)

Return True if two sets have a null intersection.

pop(last=True)
remove(value)

Remove an element. If not a member, raise a KeyError.
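TabuList is documented as a MutableSet subclass. The following minimal Python 3 sketch shows one way to build such a structure on collections.abc.MutableSet, which supplies remove, clear, isdisjoint and the other mixin methods once the five abstract methods are defined. The class name is hypothetical, and the assumption that size caps the number of entries by evicting the oldest one (the usual tabu-list behavior) is not stated in the documentation above.

```python
from collections import OrderedDict
from collections.abc import MutableSet
from typing import Hashable, Iterable, Iterator, Optional


class BoundedOrderedSet(MutableSet):
    """Hypothetical tabu-list-like set: keeps insertion order, evicts the oldest entry when full."""

    def __init__(self, iterable: Optional[Iterable[Hashable]] = None,
                 size: Optional[int] = None) -> None:
        self._items: "OrderedDict[Hashable, None]" = OrderedDict()
        self._size = size  # None means unbounded
        for item in iterable or ():
            self.add(item)

    def __contains__(self, key: object) -> bool:
        return key in self._items

    def __iter__(self) -> Iterator[Hashable]:
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)

    def add(self, key: Hashable) -> None:
        """Insert key (no-op if present); drop the oldest entry if the size cap is exceeded."""
        if key in self._items:
            return
        self._items[key] = None
        if self._size is not None and len(self._items) > self._size:
            self._items.popitem(last=False)

    def discard(self, key: Hashable) -> None:
        """Remove key if present; never raises."""
        self._items.pop(key, None)

    def pop(self, last: bool = True) -> Hashable:
        """Remove and return the newest (last=True) or oldest (last=False) entry."""
        if not self._items:
            raise KeyError("pop from an empty set")
        key, _ = self._items.popitem(last=last)
        return key


tabu = BoundedOrderedSet(size=2)
for move in ["a", "b", "c"]:
    tabu.add(move)
print(list(tabu))   # ['b', 'c']
print("a" in tabu)  # False
```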

Fred2.EpitopeAssembly.MosaicVaccine.suffixPrefixMatch(m)

Returns the length of the longest suffix of x, of length at least k, that matches a prefix of y. Returns 0 if no suffix/prefix match has length at least k.
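The suffix/prefix overlap described above can be computed as sketched below. The function name and its (x, y, k) parameters are hypothetical and follow the wording of the description; the documented function itself takes a single argument m, whose structure is not specified here.

```python
def suffix_prefix_overlap(x: str, y: str, k: int = 1) -> int:
    """Length of the longest suffix of x (length >= k) that is a prefix of y, else 0."""
    start = 0
    while True:
        # Any qualifying overlap must contain y's first k characters.
        start = x.find(y[:k], start)
        if start == -1:
            return 0
        # The leftmost position whose suffix is a prefix of y gives the longest overlap.
        if y.startswith(x[start:]):
            return len(x) - start
        start += 1


print(suffix_prefix_overlap("TTACGT", "CGTACCGT", 3))  # 3
print(suffix_prefix_overlap("TTACGT", "GTACCGT", 3))   # 0
```

In the second call the true overlap "GT" is shorter than the minimum length k=3, so 0 is returned.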

HLA Typing

Fred2.HLAtyping Module

HLAtyping.External

class Fred2.HLAtyping.External.AExternalHLATyping

Bases: Fred2.Core.Base.AHLATyping, Fred2.Core.Base.AExternal

clean_up(_output)

Cleans up the files generated during prediction

Parameters:output (str) – The path to the output file or directory
command

Defines the command-line call for the external tool

get_external_version(path=None)

Returns the external version of the tool by executing {command} --version

This may depend on the method and therefore has to be overridden; the method is declared abstract to force subclasses to override it. The base-class implementation can still be called with super()

Parameters:path (str) – Optional path to the executable if it differs from self.__command
Returns:The external version of the tool or None if the tool does not support versioning
Return type:str
is_in_path()

Checks whether the specified execution command can be found in PATH

Returns:Whether or not the command could be found in PATH
Return type:bool
name

The name of the predictor

parse_external_result(file)

Parses the external results and returns them

Parameters:file (str) – The file path of the external prediction results
Returns:A dictionary containing the prediction results
Return type:dict
predict(ngsFile, output, command=None, options=None, delete=True, **kwargs)

Implementation of the prediction

Parameters:
  • ngsFile (str) – The path to the NGS file of interest
  • output (str) – The path to the output file or directory
  • command (str) – The path to an alternative binary (if the binary is not globally executable)
  • options (str) – A string with additional options that is passed directly to the tool
  • delete (bool) – Boolean indicator whether generated files should be deleted afterwards
Returns:

A list of Allele objects representing the most likely HLA genotype

Return type:

list(Allele)

version

Parameter specifying the version of the prediction method

class Fred2.HLAtyping.External.ATHLATES_1_0

Bases: Fred2.HLAtyping.External.AExternalHLATyping

Wrapper for ATHLATES.

Note

C. Liu, X. Yang, B. Duffy, T. Mohanakumar, R.D. Mitra, M.C. Zody, J.D. Pfeifer (2012) ATHLATES: accurate typing of human leukocyte antigen through exome sequencing, Nucl. Acids Res. (2013)

clean_up(output)

Deletes files created by ATHLATES within output

Parameters:output (str) – The path to the output file or directory of the programme
command

Defines the command-line call for the external tool

get_external_version(path=None)

Returns the external version of the tool by executing {command} --version

How the version is reported may depend on the tool, so this method has to be overridden; it is declared abstract to force subclasses to do so. The base-class implementation can still be called via super().

Parameters:path (str) – Optional path to the executable, if it differs from self.__command
Returns:The external version of the tool, or None if the tool does not support versioning
Return type:str
is_in_path()

Checks whether the specified execution command can be found in PATH

Returns:Whether the command could be found in PATH
Return type:bool
name

The name of the predictor

parse_external_result(output)

Searches the given output directory for the newest subdirectory and reads the prediction file from there

Parameters:output (str) – The path to the output dir
Returns:The predicted HLA genotype
Return type:list(Allele)
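Locating "the newest directory" can be sketched as picking the subdirectory with the most recent modification time. This assumes "newest" means newest by os.path.getmtime; a wrapper could instead rely on timestamps encoded in directory names. The helper newest_subdir is hypothetical.

```python
import os
import tempfile
import time


def newest_subdir(parent):
    """Return the path of the most recently modified subdirectory of `parent`.

    Returns None if `parent` contains no subdirectories.
    """
    subdirs = [entry.path for entry in os.scandir(parent) if entry.is_dir()]
    if not subdirs:
        return None
    return max(subdirs, key=os.path.getmtime)


with tempfile.TemporaryDirectory() as out:
    for name in ("run_old", "run_new"):
        os.mkdir(os.path.join(out, name))
    past = time.time() - 3600
    os.utime(os.path.join(out, "run_old"), (past, past))  # age it by one hour
    print(os.path.basename(newest_subdir(out)))  # run_new
```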
predict(ngsFile, output, command=None, options=None, delete=True, **kwargs)

Runs ATHLATES on ngsFile and returns the predicted HLA genotype

Parameters:
  • ngsFile (str) – The path to the NGS file of interest
  • output (str) – The path to the output file or directory
  • command (str) – The path to an alternative binary (if the binary is not globally executable)
  • options (str) – A string with additional options that is passed directly to the tool
  • delete (bool) – Whether generated files should be deleted afterwards
Returns:A list of Allele objects representing the most likely HLA genotype
Return type:list(Allele)

version

The version of the predictor

class Fred2.HLAtyping.External.OptiType_1_0

Bases: Fred2.HLAtyping.External.AExternalHLATyping

Wrapper of OptiType v1.0.

Note

Szolek, A., Schubert, B., Mohr, C., Sturm, M., Feldhahn, M., & Kohlbacher, O. (2014). OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics, 30(23), 3310-3316.

clean_up(output)

Searches the given output directory for the newest subdirectory and deletes it; this should be the one OptiType created.

This can cause severe side effects if another user or process also writes to that directory! OptiType should change the way it writes its output.

Parameters:output (str) – The path to the output file or directory of the programme
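The risk described above comes from deleting whatever directory happens to be newest. A more defensive alternative, sketched here as an assumption and not as Fred2's behaviour, is to snapshot the subdirectories before starting the tool and afterwards delete only those that appeared. The helpers list_subdirs and clean_up_new_dirs are hypothetical; directories created concurrently by other processes during the run would still be removed.

```python
import os
import shutil
import tempfile


def list_subdirs(parent):
    """Names of the immediate subdirectories of `parent`."""
    return {entry.name for entry in os.scandir(parent) if entry.is_dir()}


def clean_up_new_dirs(parent, before):
    """Delete only subdirectories of `parent` that are not in `before`.

    `before` is the snapshot taken with list_subdirs() right before the
    external tool was started. Returns the sorted names that were removed.
    """
    created = sorted(list_subdirs(parent) - before)
    for name in created:
        shutil.rmtree(os.path.join(parent, name))
    return created


with tempfile.TemporaryDirectory() as out:
    os.mkdir(os.path.join(out, "someone_elses_results"))
    before = list_subdirs(out)                           # snapshot before the run
    os.mkdir(os.path.join(out, "2015_06_01_12_00_00"))   # simulated tool output
    print(clean_up_new_dirs(out, before))  # ['2015_06_01_12_00_00']
    print(sorted(list_subdirs(out)))       # ['someone_elses_results']
```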
command

Defines the command-line call for the external tool

get_external_version(path=None)

Returns the external version of the tool by executing {command} --version

How the version is reported may depend on the tool, so this method has to be overridden; it is declared abstract to force subclasses to do so. The base-class implementation can still be called via super().

Parameters:path (str) – Optional path to the executable, if it differs from self.command
Returns:The external version of the tool, or None if the tool does not support versioning
Return type:str
is_in_path()

Checks whether the specified execution command can be found in PATH

Returns:Whether the command could be found in PATH
Return type:bool
name

The name of the predictor

parse_external_result(output)

Searches the given output directory for the newest subdirectory and reads the prediction file from there

Parameters:output (str) – The path to the output dir
Returns:The predicted HLA genotype
Return type:list(Allele)
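Reading the prediction file can be sketched with the csv module. The file layout is an assumption: a tab-separated table whose header contains the columns A1, A2, B1, B2, C1, C2 (followed by Reads and Objective) and whose first data row holds the typed alleles such as A*02:01. The function parse_optitype_tsv and the sample content are made up for illustration, and plain strings stand in for Allele objects.

```python
import csv
import io

LOCUS_COLUMNS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def parse_optitype_tsv(handle):
    """Return allele names, prefixed with 'HLA-', from the first result row."""
    reader = csv.DictReader(handle, delimiter="\t")
    row = next(reader)  # assumed: one best genotype per result file
    return ["HLA-" + row[col] for col in LOCUS_COLUMNS if row.get(col)]


# Made-up example content in the assumed layout (first header cell is empty).
made_up = ("\tA1\tA2\tB1\tB2\tC1\tC2\tReads\tObjective\n"
           "0\tA*01:01\tA*02:01\tB*08:01\tB*44:02\tC*07:01\tC*05:01\t1000\t950.0\n")
print(parse_optitype_tsv(io.StringIO(made_up)))
# ['HLA-A*01:01', 'HLA-A*02:01', 'HLA-B*08:01', 'HLA-B*44:02', 'HLA-C*07:01', 'HLA-C*05:01']
```

For a real file, open it with open(path, newline="") and pass the handle to the same function.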
predict(ngsFile, output, command=None, options=None, delete=True, **kwargs)

Runs OptiType on ngsFile and returns the predicted HLA genotype

Parameters:
  • ngsFile (str) – The path to the NGS file of interest
  • output (str) – The path to the output file or directory
  • command (str) – The path to an alternative binary (if the binary is not globally executable)
  • options (str) – A string with additional options that is passed directly to the tool
  • delete (bool) – Whether generated files should be deleted afterwards
Returns:A list of Allele objects representing the most likely HLA genotype
Return type:list(Allele)

version

The version of the predictor

class Fred2.HLAtyping.External.Polysolver

Bases: Fred2.HLAtyping.External.AExternalHLATyping

Wrapper for Polysolver.

Note

Shukla, Sachet A., Rooney, Michael S., Rajasagi, Mohini, Tiao, Grace, et al. (2015). Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. Nat Biotech, advance online publication. doi: 10.1038/nbt.3344

clean_up(output)

Deletes files created by Polysolver within output

Parameters:output (str) – The path to the output file or directory of the programme
command

Defines the command-line call for the external tool

get_external_version(path=None)

Returns the external version of the tool by executing {command} --version

How the version is reported may depend on the tool, so this method has to be overridden; it is declared abstract to force subclasses to do so. The base-class implementation can still be called via super().

Parameters:path (str) – Optional path to the executable, if it differs from self.__command
Returns:The external version of the tool, or None if the tool does not support versioning
Return type:str
is_in_path()

Checks whether the specified execution command can be found in PATH

Returns:Whether the command could be found in PATH
Return type:bool
name

The name of the predictor

parse_external_result(output)

Searches the given output directory for the newest subdirectory and reads the prediction file from there

Parameters:output (str) – The path to the output dir
Returns:The predicted HLA genotype
Return type:list(Allele)
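Converting Polysolver's allele tokens into standard nomenclature can be sketched as string manipulation. The input format is an assumption: a winners file with one line per locus, e.g. "HLA-A<TAB>hla_a_02_01_01_01<TAB>hla_a_11_01_01", where each token encodes gene and allele fields separated by underscores. The helpers polysolver_to_allele and parse_winners are hypothetical, truncation to two fields is an illustrative choice, and plain strings replace Allele objects.

```python
def polysolver_to_allele(token, fields=2):
    """Convert e.g. 'hla_a_02_01_01_01' into 'HLA-A*02:01' (first `fields` fields)."""
    parts = token.split("_")  # ['hla', 'a', '02', '01', '01', '01']
    gene = parts[1].upper()
    return "HLA-%s*%s" % (gene, ":".join(parts[2:2 + fields]))


def parse_winners(lines, fields=2):
    """Collect both alleles from every 'locus allele1 allele2' line."""
    alleles = []
    for line in lines:
        cols = line.split()
        if len(cols) < 3:
            continue  # skip blank or malformed lines
        alleles.extend(polysolver_to_allele(tok, fields) for tok in cols[1:3])
    return alleles


made_up = ["HLA-A\thla_a_02_01_01_01\thla_a_11_01_01",
           "HLA-B\thla_b_07_02_01\thla_b_08_01_01",
           "HLA-C\thla_c_07_02_01_01\thla_c_07_01_01_01"]
print(parse_winners(made_up))
# ['HLA-A*02:01', 'HLA-A*11:01', 'HLA-B*07:02', 'HLA-B*08:01', 'HLA-C*07:02', 'HLA-C*07:01']
```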
predict(ngsFile, output, command=None, options=None, delete=True, **kwargs)

Runs Polysolver on ngsFile and returns the predicted HLA genotype

Parameters:
  • ngsFile (str) – The path to the NGS file of interest
  • output (str) – The path to the output file or directory
  • command (str) – The path to an alternative binary (if the binary is not globally executable)
  • options (str) – A string with additional options that is passed directly to the tool
  • delete (bool) – Whether generated files should be deleted afterwards
Returns:A list of Allele objects representing the most likely HLA genotype
Return type:list(Allele)

version

The version of the predictor

class Fred2.HLAtyping.External.Seq2HLA_2_2

Bases: Fred2.HLAtyping.External.AExternalHLATyping

Wrapper of seq2HLA v2.2.

Note

Boegel, S., Scholtalbers, J., Loewer, M., Sahin, U., & Castle, J. C. (2015). In Silico HLA Typing Using Standard RNA-Seq Sequence Reads. Molecular Typing of Blood Cell Antigens, 247.

clean_up(output)

Deletes all created files.

Parameters:output (str) – The path to the output file or directory of the programme
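"Deletes all created files" can be sketched under the assumption that every output file of a run shares a common path prefix (for seq2HLA, the run name given on its command line). The helper remove_run_files and the file names below are hypothetical; glob.escape protects against wildcard characters in the prefix, and directories are left untouched.

```python
import glob
import os
import tempfile


def remove_run_files(run_prefix):
    """Delete every regular file whose path starts with `run_prefix`.

    Returns the sorted list of removed paths.
    """
    removed = []
    for path in sorted(glob.glob(glob.escape(run_prefix) + "*")):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed


with tempfile.TemporaryDirectory() as out:
    for fname in ("sample1-ClassI.txt", "sample1-ClassII.txt", "notes.txt"):
        open(os.path.join(out, fname), "w").close()
    removed = remove_run_files(os.path.join(out, "sample1"))
    print([os.path.basename(p) for p in removed])  # ['sample1-ClassI.txt', 'sample1-ClassII.txt']
    print(os.listdir(out))                          # ['notes.txt']
```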
command

Defines the command-line call for the external tool

get_external_version(path=None)

Returns the external version of the tool by executing {command} --version

How the version is reported may depend on the tool, so this method has to be overridden; it is declared abstract to force subclasses to do so. The base-class implementation can still be called via super().

Parameters:path (str) – Optional path to the executable, if it differs from self.__command
Returns:The external version of the tool, or None if the tool does not support versioning
Return type:str
is_in_path()

Checks whether the specified execution command can be found in PATH

Returns:Whether the command could be found in PATH
Return type:bool
name

The name of the predictor

parse_external_result(output)

Searches the given output directory for the newest subdirectory and reads the prediction file from there

Parameters:output (str) – The path to the output dir
Returns:The predicted HLA genotype
Return type:list(Allele)
predict(ngsFile, output, command=None, options=None, delete=True, **kwargs)

Runs seq2HLA on ngsFile and returns the predicted HLA genotype

Parameters:
  • ngsFile (str) – The path to the NGS file of interest
  • output (str) – The path to the output file or directory
  • command (str) – The path to an alternative binary (if the binary is not globally executable)
  • options (str) – A string with additional options that is passed directly to the tool
  • delete (bool) – Whether generated files should be deleted afterwards
Returns:A list of Allele objects representing the most likely HLA genotype
Return type:list(Allele)

version

The version of the predictor
