Welcome to Fred2’s documentation!
Welcome to the class and function documentation of FRED2.
Tutorials on how to use FRED2 can be found at:
https://github.com/FRED-2/Fred2/tree/master/Fred2/tutorials.
Basic
Fred2.Core Module
Core.Allele
-
class Fred2.Core.Allele.Allele(name, prob=None)
Bases: Fred2.Core.Base.MetadataLogger
This class represents an HLA allele and stores additional information.
-
get_metadata(label, only_first=False)
Getter for the saved metadata with the key label.
Parameters:
- label (str) – key of the metadata to retrieve
- only_first (bool) – True if only the first element of the metadata list is to be returned
-
log_metadata(label, value)
Inserts new metadata.
Parameters:
- label (str) – key under which the metadata will be added
- value (list(object)) – any kind of additional value that should be kept
-
Core.Base
https://docs.python.org/3/library/abc.html
-
class Fred2.Core.Base.ACleavageFragmentPrediction
Bases: object
-
cleavagePos
Parameter specifying the position of the amino acid (within the prediction window) after which the sequence is cleaved
-
name
The name of the predictor
-
predict(aa_seq, **kwargs)
Predicts the probability that the fragment can be produced by the proteasome.
Parameters: aa_seq (Peptide) – The sequence to be cleaved
Returns: An AResult object for the specified Bio.Seq
Return type: AResult
-
supportedLength
The supported lengths of the predictor
-
version
Parameter specifying the version of the prediction method
-
-
class Fred2.Core.Base.ACleavageSitePrediction
Bases: object
-
cleavagePos
Parameter specifying the position of the amino acid (within the prediction window) after which the sequence is cleaved (starting from 1)
-
name
The name of the predictor
-
predict(aa_seq, **kwargs)
Predicts the proteasomal cleavage site of the given sequences.
Parameters: aa_seq (Peptide or Protein) – The sequence to be cleaved
Returns: An AResult object for the specified Bio.Seq
Return type: AResult
-
supportedLength
The supported lengths of the predictor
-
version
Parameter specifying the version of the prediction method
-
-
class Fred2.Core.Base.AEpitopePrediction
Bases: object
-
convert_alleles(alleles)
Converts alleles into the internal allele representation of the predictor and returns a string representation.
Parameters: alleles (list(Allele)) – The alleles for which the internal predictor representation is needed
Returns: A string representation of the input alleles
Return type: list(str)
-
name
The name of the predictor
-
predict(peptides, alleles=None, **kwargs)
Predicts the binding affinity of a given peptide or list of peptides for a given list of alleles. If alleles is not given, predictions are performed for all valid alleles of the predictor. If a list of alleles is given, predictions are performed only for the subset of those alleles that the predictor supports.
Parameters:
Returns: An AResult object for the specified Peptide and Allele
Return type:
-
supportedAlleles
A list of valid allele models
-
supportedLength
A list of supported peptide lengths
-
version
The version of the predictor
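The allele-selection rule described for predict() can be sketched in a few lines. This is only an illustration of the documented behaviour, not FRED2 code: the function name select_alleles and the use of plain strings instead of Allele objects are assumptions made for the example.

```python
def select_alleles(requested, supported):
    """Pick the alleles a predictor would run on (hypothetical helper).

    requested: list of allele names, or None if the caller gave no alleles.
    supported: set of allele names the predictor has models for.
    """
    if requested is None:
        # No alleles given: fall back to every supported allele.
        return sorted(supported)
    # Alleles given: keep only those the predictor supports, in input order.
    return [allele for allele in requested if allele in supported]


supported_alleles = {"HLA-A*02:01", "HLA-B*07:02"}
all_selected = select_alleles(None, supported_alleles)
subset_selected = select_alleles(["HLA-A*02:01", "HLA-C*07:01"], supported_alleles)
```

Unsupported alleles are dropped silently in this sketch; whether a real predictor warns about them is not stated in this documentation.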
-
-
class Fred2.Core.Base.AExternal
Bases: object
Base class for external tools
-
command
Defines the command-line call for the external tool
-
get_external_version(path=None)
Returns the external version of the tool by executing {command} --version. The call might depend on the method and therefore has to be overridden; it is declared abstract to force the user to override it. The function in the base class can be called with super().
Parameters: path (str) – Optional path to the executable if it deviates from self.__command
Returns: The external version of the tool, or None if the tool does not support versioning
Return type: str
-
is_in_path()
Checks whether the specified execution command can be found in PATH.
Returns: Whether or not the command could be found in PATH
Return type: bool
-
parse_external_result(file)
Parses external results and returns the result.
Parameters: file (str) – The file path or the external prediction results
Returns: A dictionary containing the prediction results
Return type: dict
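The PATH check and the version query described for AExternal can be approximated with the Python 3 standard library. The sketch below is not the FRED2 implementation: the two functions are free-standing stand-ins, and returning the first line of the tool's output as its version is an assumption.

```python
import shutil
import subprocess


def is_in_path(command):
    """Return True if `command` resolves to an executable on PATH."""
    return shutil.which(command) is not None


def get_external_version(command, path=None):
    """Run `<command> --version` and return its first output line, or None."""
    executable = path if path is not None else command
    try:
        completed = subprocess.run(
            [executable, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            check=False,
        )
    except OSError:
        # The executable is missing or cannot be started.
        return None
    lines = completed.stdout.strip().splitlines()
    # Assumption: the version is reported on the first line of output.
    return lines[0] if lines else None
```

Tools that print their version with a different flag would need their own override, which matches the note that the method is meant to be overridden per tool.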
-
-
class Fred2.Core.Base.AHLATyping
Bases: object
-
name
The name of the predictor
-
predict(ngsFile, output, **kwargs)
Prediction method for inferring the HLA typing.
Parameters:
- ngsFile (str) – The path to the input file containing the NGS reads
- output (str) – The path to the output file or directory
Returns: A list of HLA alleles representing the genotype predicted by the algorithm
Return type: list(Allele)
-
version
Parameter specifying the version of the prediction method
-
-
class Fred2.Core.Base.APluginRegister(name, bases, nmspc)
Bases: abc.ABCMeta
This class allows automatic registration of new plugins.
-
mro() → list
Returns a type’s method resolution order.
-
register(subclass)
Register a virtual subclass of an ABC.
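Automatic plugin registration through a metaclass derived from abc.ABCMeta can look roughly like the following. This is a generic sketch of the technique under stated assumptions, not the code of APluginRegister: the class names, the registry attribute and the choice to key plugins by name and version are all hypothetical.

```python
import abc


class PluginRegister(abc.ABCMeta):
    """Metaclass that records every concrete subclass under name and version."""

    def __init__(cls, name, bases, nmspc):
        super().__init__(name, bases, nmspc)
        if not hasattr(cls, "registry"):
            # First class built with this metaclass: it owns the registry.
            cls.registry = {}
        elif not getattr(cls, "__abstractmethods__", None):
            # Concrete subclass: file it under its declared name and version.
            cls.registry.setdefault(cls.name, {})[cls.version] = cls


class APredictor(metaclass=PluginRegister):
    """Hypothetical abstract base; subclasses must define name and version."""

    @property
    @abc.abstractmethod
    def name(self):
        raise NotImplementedError

    @property
    @abc.abstractmethod
    def version(self):
        raise NotImplementedError


class ToyPredictor(APredictor):
    # Defining the class is enough to register it; no explicit call is needed.
    name = "toy"
    version = "1.0"
```

Looking up `APredictor.registry["toy"]["1.0"]` then returns the ToyPredictor class, so a factory can instantiate predictors by name without importing each one explicitly.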
-
-
class Fred2.Core.Base.ASVM
Bases: object
Base class for SVM prediction tools
-
class Fred2.Core.Base.ATAPPrediction
Bases: object
-
name
The name of the predictor
-
predict(peptides, **kwargs)
Predicts the TAP affinity for the given sequences.
Parameters: peptides (list(Peptide)/Peptide) – Peptide for which TAP affinity should be predicted
Returns: A TAPResult object
Return type: TAPResult
-
supportedLength
The supported lengths of the predictor
-
version
Parameter specifying the version of the prediction method
-
-
class Fred2.Core.Base.MetadataLogger
Bases: object
This class provides a simple interface for assigning additional metadata to any object in our data model. Examples: storing ANNOVAR columns like depth, base count, dbSNP id, quality information for variants, additional prediction information for peptides etc. This functionality is not used by the core methods of FRED2.
The saved values are accessed via log_metadata() and get_metadata().
-
get_metadata(label, only_first=False)
Getter for the saved metadata with the key label.
Parameters:
- label (str) – key of the metadata to retrieve
- only_first (bool) – True if only the first element of the metadata list is to be returned
-
log_metadata(label, value)
Inserts new metadata.
Parameters:
- label (str) – key under which the metadata will be added
- value (list(object)) – any kind of additional value that should be kept
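The log/get interface above can be pictured with a small stand-in class. This is a sketch, not the FRED2 source: the class name MetadataStore is hypothetical, and both the per-label list storage and the behaviour for unknown labels are assumptions inferred from the parameter descriptions.

```python
class MetadataStore(object):
    """Hypothetical stand-in mirroring the documented log/get interface."""

    def __init__(self):
        self._metadata = {}

    def log_metadata(self, label, value):
        # Assumption: values logged under the same label accumulate in a list.
        self._metadata.setdefault(label, []).append(value)

    def get_metadata(self, label, only_first=False):
        # Assumption: an unknown label yields an empty list (or None).
        values = self._metadata.get(label, [])
        if only_first:
            return values[0] if values else None
        return values


variant_info = MetadataStore()
variant_info.log_metadata("depth", 42)
variant_info.log_metadata("depth", 57)
all_depths = variant_info.get_metadata("depth")
first_depth = variant_info.get_metadata("depth", only_first=True)
```

With this storage model, only_first=True is a convenience for labels that are expected to hold a single value.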
-
Core.Generator
-
Fred2.Core.Generator.generate_peptides_from_proteins(proteins, window_size, peptides=None)
Creates all Peptide objects for a given window size from a given Protein.
The result is a generator.
Parameters:
- proteins (list(Protein) or Protein) – (Iterable of) protein(s) from which a list of unique peptides should be generated
- window_size (int) – Size of peptide fragments
- peptides (list(Peptide)) – A list of peptides to update during peptide generation (use case: adding and updating Peptides of newly generated Proteins)
Returns: A generator of unique peptides
Return type: Generator(Peptide)
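The sliding-window idea behind generate_peptides_from_proteins can be sketched on plain strings. This is a simplified illustration, not the FRED2 implementation: the function name is hypothetical, and the real function works on Protein objects and also tracks where each peptide originated.

```python
def sliding_window_peptides(protein_seqs, window_size):
    """Yield each distinct window_size-long fragment once, in first-seen order."""
    seen = set()
    for seq in protein_seqs:
        # A sequence shorter than the window yields no fragments.
        for start in range(len(seq) - window_size + 1):
            fragment = seq[start:start + window_size]
            if fragment not in seen:
                seen.add(fragment)
                yield fragment


fragments = list(sliding_window_peptides(["MAIVMGR", "AIVMGRW"], 5))
```

Because the function is a generator, fragments are produced lazily; wrapping the call in list() as above materialises them all at once.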
-
Fred2.Core.Generator.generate_peptides_from_variants(vars, length, dbadapter, peptides=None, table='Standard', stop_symbol='*', to_stop=True, cds=False)
Generates Peptide objects from Variant objects and avoids the construction of all possible combinations of heterozygous variants by considering only those within the peptide sequence window. This reduces the number of combinations from 2^m, with m = number of heterozygous variants, to 2^k, with k << m and k = number of heterozygous variants within the peptide window (plus all frame-shift mutations that occurred prior to the current peptide window).
The result is a generator.
Parameters:
- vars (list(Variant)) – A list of variant objects to construct peptides from
- length (int) – The length of the peptides to construct
- dbadapter (ADBAdapter) – An ADBAdapter to extract relevant transcript information
- peptides (list(Peptide)) – A list of pre-existing peptides that should be updated
- table (str) – Which codon table to use? This can be either a name (string), an NCBI identifier (integer), or a CodonTable object (useful for non-standard genetic codes). Defaults to the ‘Standard’ table
- stop_symbol (str) – Single character string, what to use for any terminators, defaults to the asterisk, ‘*’
- to_stop (bool) – Boolean, defaults to True, meaning translation is terminated at the first in-frame stop codon (and the stop_symbol is not appended to the returned protein sequence). If False, a full translation is done, continuing on past any stop codons (translated as the specified stop_symbol)
- cds (bool) – Boolean, indicates this is a complete CDS. If True, this checks the sequence starts with a valid alternative start codon (which will be translated as methionine, M), that the sequence length is a multiple of three, and that there is a single in-frame stop codon at the end (this will be excluded from the protein sequence, regardless of the to_stop option). If these tests fail, an exception is raised
Returns: A generator of unique (polymorphic) peptides
Return type: Generator(Peptide)
Raises:
- ValueError – If an incorrect table argument is passed
- TranslationError – If the sequence is not a multiple of three, or the first codon is not a start codon, or the last codon is not a stop codon, or an extra stop codon was found in frame, or a codon is invalid
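A small worked example makes the reduction from 2^m to 2^k concrete. The sketch below only counts combinations and is not how FRED2 enumerates them: the helper names and the variant positions are hypothetical, and frame-shift mutations upstream of the window are ignored.

```python
def variants_in_window(variant_positions, window_start, window_length):
    """Return the variant positions that fall inside one peptide window."""
    window_end = window_start + window_length
    return [pos for pos in variant_positions if window_start <= pos < window_end]


def combination_count(num_heterozygous):
    """Each heterozygous variant is either applied or not: 2^n combinations."""
    return 2 ** num_heterozygous


# Hypothetical protein positions of six heterozygous variants.
positions = [2, 5, 11, 30, 31, 47]
in_window = variants_in_window(positions, window_start=3, window_length=9)

all_combinations = combination_count(len(positions))     # 2^m with m = 6
window_combinations = combination_count(len(in_window))  # 2^k with k = 2
```

Here a 9-residue window starting at position 3 contains two of the six variants, so 4 combinations are considered for that window instead of 64.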
-
Fred2.Core.Generator.generate_proteins_from_transcripts(transcripts, table='Standard', stop_symbol='*', to_stop=True, cds=False)
Enables the translation from a Transcript to a Protein instance.
The result is a generator.
Parameters:
- transcripts (list(Transcript) or Transcript) – A list of transcripts or a single transcript to translate
- table (str) – Which codon table to use? This can be either a name (string), an NCBI identifier (integer), or a CodonTable object (useful for non-standard genetic codes). Defaults to the ‘Standard’ table
- stop_symbol (str) – Single character string, what to use for any terminators, defaults to the asterisk, ‘*’
- to_stop (bool) – If False, the full sequence is translated, continuing past any stop codons (translated as the specified stop_symbol). If True (the default), translation is terminated at the first in-frame stop codon (and the stop_symbol is not appended to the returned protein sequence)
- cds (bool) – Boolean, indicates this is a complete CDS. If True, this checks the sequence starts with a valid alternative start codon (which will be translated as methionine, M), that the sequence length is a multiple of three, and that there is a single in-frame stop codon at the end (this will be excluded from the protein sequence, regardless of the to_stop option). If these tests fail, an exception is raised
Returns: The protein that corresponds to the transcript
Return type: Generator(Protein)
Raises:
- ValueError – If an incorrect table argument is passed
- TranslationError – If the sequence is not a multiple of three, or the first codon is not a start codon, or the last codon is not a stop codon, or an extra stop codon was found in frame, or a codon is invalid
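The three checks that cds=True is described as performing can be written out as a short validator. This is a sketch under simplifying assumptions rather than Biopython's or FRED2's logic: check_cds is a hypothetical name, only ATG is accepted as a start codon (the real codon tables also list alternative start codons), and a plain ValueError is raised where the documented behaviour is a TranslationError.

```python
START_CODONS = {"ATG"}               # assumption: canonical start codon only
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard table


def check_cds(dna):
    """Return the coding codons of `dna`, or raise ValueError if it is not a complete CDS."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    if not codons or codons[0] not in START_CODONS:
        raise ValueError("first codon is not a start codon")
    if codons[-1] not in STOP_CODONS:
        raise ValueError("last codon is not a stop codon")
    if any(codon in STOP_CODONS for codon in codons[1:-1]):
        raise ValueError("extra in-frame stop codon found")
    # The terminal stop codon is excluded, as described for cds=True.
    return codons[:-1]


coding_codons = check_cds("ATGGCCTAA")
```

The order of the checks determines which error a malformed sequence reports first; the order used here is a choice made for the example.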
-
Fred2.Core.Generator.generate_transcripts_from_tumor_variants(normal, tumor, dbadapter)
Generates all possible Transcript variations of the given Variant objects.
The result is a generator.
Parameters:
Returns: A generator of transcripts with all possible variations determined by the given variant list
Return type: Generator(Transcript)
-
Fred2.Core.Generator.generate_transcripts_from_variants(vars, dbadapter)
Generates all possible Transcript objects based on the given Variant objects.
The result is a generator.
Parameters:
- vars (list(Variant)) – A list of variants for which transcripts should be built
- dbadapter – A DBAdapter to fetch the transcript sequences
Returns: A generator of transcripts with all possible variations determined by the given variant list
Return type: Generator(Transcript)
Invariant: Variants are considered to be annotated from the forward strand, regardless of the transcript’s real orientation
Core.Peptide
-
class Fred2.Core.Peptide.Peptide(seq, protein_pos=None)
Bases: Fred2.Core.Base.MetadataLogger, Bio.Seq.Seq
This class encapsulates a Peptide, belonging to one or several Protein instances.
Note
For accessing and manipulating the sequence see also Bio.Seq.Seq (from Biopython)
-
back_transcribe()
Returns the DNA sequence from an RNA sequence. New Seq object.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG",
...                     IUPAC.unambiguous_rna)
>>> messenger_rna
Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', IUPACUnambiguousRNA())
>>> messenger_rna.back_transcribe()
Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', IUPACUnambiguousDNA())
Trying to back-transcribe a protein or DNA sequence raises an exception:
>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.back_transcribe()
Traceback (most recent call last):
    ...
ValueError: Proteins cannot be back transcribed!
-
complement()
Returns the complement sequence. New Seq object.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_dna = Seq("CCCCCGATAG", IUPAC.unambiguous_dna)
>>> my_dna
Seq('CCCCCGATAG', IUPACUnambiguousDNA())
>>> my_dna.complement()
Seq('GGGGGCTATC', IUPACUnambiguousDNA())
You can of course use mixed case sequences:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("CCCCCgatA-GD", generic_dna)
>>> my_dna
Seq('CCCCCgatA-GD', DNAAlphabet())
>>> my_dna.complement()
Seq('GGGGGctaT-CH', DNAAlphabet())
Note in the above example, ambiguous character D denotes G, A or T so its complement is H (for C, T or A).
Trying to complement a protein sequence raises an exception.
>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.complement()
Traceback (most recent call last):
    ...
ValueError: Proteins do not have complements!
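Complementing, including the ambiguity-code pairing noted above (D with H), amounts to a character-by-character mapping. The following plain-string sketch is not Biopython's implementation and uses Python 3's str.maketrans; it covers only the codes A, C, G, T, R, Y, D, H and N, which is an assumption made to keep the table short, and it leaves any other character (such as the gap) unchanged.

```python
# Partial complement table: upper and lower case, a few ambiguity codes only.
_COMPLEMENT = str.maketrans("ACGTRYDHNacgtrydhn", "TGCAYRHDNtgcayrhdn")


def complement(dna):
    """Complement each base; characters outside the table pass through unchanged."""
    return dna.translate(_COMPLEMENT)


def reverse_complement(dna):
    """Complement the sequence and read it in the opposite direction."""
    return complement(dna)[::-1]


mixed_case = complement("CCCCCgatA-GD")
reversed_ambiguous = reverse_complement("CCCCCGATAGNR")
```

Unlike the Seq methods, this sketch cannot reject protein sequences, because a plain string carries no alphabet information.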
-
count(sub, start=0, end=9223372036854775807)
Non-overlapping count method, like that of a python string.
This behaves like the python string method of the same name, which does a non-overlapping count!
Returns an integer, the number of occurrences of substring argument sub in the (sub)sequence given by [start:end]. Optional arguments start and end are interpreted as in slice notation.
Arguments:
- sub - a string or another Seq object to look for
- start - optional integer, slice start
- end - optional integer, slice end
e.g.
>>> from Bio.Seq import Seq
>>> my_seq = Seq("AAAATGA")
>>> print(my_seq.count("A"))
5
>>> print(my_seq.count("ATG"))
1
>>> print(my_seq.count(Seq("AT")))
1
>>> print(my_seq.count("AT", 2, -1))
1
HOWEVER, please note because python strings and Seq objects (and MutableSeq objects) do a non-overlapping search, this may not give the answer you expect:
>>> "AAAA".count("AA")
2
>>> print(Seq("AAAA").count("AA"))
2
An overlapping search would give the answer as three!
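If an overlapping count is what you need, it can be built from repeated find calls that advance by one position instead of by the length of the match. The helper below is a sketch on plain strings and its name is hypothetical; it is not part of Bio.Seq or FRED2.

```python
def count_overlapping(sequence, sub):
    """Count occurrences of `sub` in `sequence`, allowing matches to overlap."""
    count = 0
    start = sequence.find(sub)
    while start != -1:
        count += 1
        # Resume one position after the last hit so overlapping hits are seen.
        start = sequence.find(sub, start + 1)
    return count


overlapping = count_overlapping("AAAA", "AA")
non_overlapping = "AAAA".count("AA")
```

For "AAAA" and "AA" the overlapping count is 3 while the built-in count gives 2, matching the note above.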
-
endswith(suffix, start=0, end=9223372036854775807)
Does the Seq end with the given suffix? Returns True/False.
This behaves like the python string method of the same name.
Return True if the sequence ends with the specified suffix (a string or another Seq object), False otherwise. With optional start, test sequence beginning at that position. With optional end, stop comparing sequence at that position. suffix can also be a tuple of strings to try. e.g.
>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.endswith("UUG")
True
>>> my_rna.endswith("AUG")
False
>>> my_rna.endswith("AUG", 0, 18)
True
>>> my_rna.endswith(("UCC", "UCA", "UUG"))
True
-
find(sub, start=0, end=9223372036854775807)
Find method, like that of a python string.
This behaves like the python string method of the same name.
Returns an integer, the index of the first occurrence of substring argument sub in the (sub)sequence given by [start:end].
Arguments:
- sub - a string or another Seq object to look for
- start - optional integer, slice start
- end - optional integer, slice end
Returns -1 if the subsequence is NOT found.
e.g. Locating the first typical start codon, AUG, in an RNA sequence:
>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.find("AUG")
3
-
get_all_proteins()
Returns all Protein objects associated with the Peptide.
Returns: A list of Protein
Return type: list(Protein)
-
get_all_transcripts()
Returns a list of Transcript objects that are associated with the Peptide.
Returns: A list of Transcript
Return type: list(Transcript)
-
get_metadata(label, only_first=False)
Getter for the saved metadata with the key label.
Parameters:
- label (str) – key of the metadata to retrieve
- only_first (bool) – True if only the first element of the metadata list is to be returned
-
get_protein(transcript_id)
Returns a specific protein object identified by a unique transcript ID.
Parameters: transcript_id (str) – A Transcript ID
Returns: A Protein
Return type: Protein
-
get_protein_positions(transcript_id)
Returns all positions of origin for a given Protein identified by its transcript ID.
Parameters: transcript_id (str) – The unique transcript ID of the Protein in question
Returns: A list of positions within the protein from which the Peptide originated (starts at 0)
Return type: list(int)
-
get_transcript(transcript_id)
Returns a specific Transcript object identified by a unique transcript ID.
Parameters: transcript_id (str) – A Transcript ID
Returns: A Transcript
Return type: Transcript
-
get_variants_by_protein(transcript_id)
Returns all Variant objects of a Protein that have influenced the Peptide sequence.
Parameters: transcript_id (str) – Transcript ID of the specific protein in question
Returns: A list of variants that influenced the peptide sequence
Return type: list(Variant)
Raises: KeyError – If the peptide does not originate from the specified Protein
-
get_variants_by_protein_position(transcript_id, protein_pos)
Returns all Variant objects and their relative position to the peptide sequence of a given Protein and protein position.
Parameters:
- transcript_id (str) – A Transcript ID of the specific protein in question
- protein_pos (int) – The Protein position at which the peptide sequence starts in the protein
Returns: Dictionary of relative position of variants in peptide (starts at 0) and associated variants that influenced the peptide sequence
Return type: dict(int, list(Variant))
Raises:
- ValueError – If the Peptide does not start at the specified position
- KeyError – If the Peptide does not originate from the specified Protein
-
log_metadata(label, value)
Inserts new metadata.
Parameters:
- label (str) – key under which the metadata will be added
- value (list(object)) – any kind of additional value that should be kept
-
lower()
Returns a lower case copy of the sequence.
This will adjust the alphabet if required. Note that the IUPAC alphabets are upper case only, and thus a generic alphabet must be substituted.
>>> from Bio.Alphabet import Gapped, generic_dna
>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> my_seq = Seq("CGGTACGCTTATGTCACGTAG*AAAAAA", Gapped(IUPAC.unambiguous_dna, "*"))
>>> my_seq
Seq('CGGTACGCTTATGTCACGTAG*AAAAAA', Gapped(IUPACUnambiguousDNA(), '*'))
>>> my_seq.lower()
Seq('cggtacgcttatgtcacgtag*aaaaaa', Gapped(DNAAlphabet(), '*'))
See also the upper method.
-
lstrip(chars=None)
Returns a new Seq object with leading (left) end stripped.
This behaves like the python string method of the same name.
Optional argument chars defines which characters to remove. If omitted or None (default) then as for the python string method, this defaults to removing any white space.
e.g. print(my_seq.lstrip("-"))
See also the strip and rstrip methods.
-
reverse_complement()
Returns the reverse complement sequence. New Seq object.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_dna = Seq("CCCCCGATAGNR", IUPAC.ambiguous_dna)
>>> my_dna
Seq('CCCCCGATAGNR', IUPACAmbiguousDNA())
>>> my_dna.reverse_complement()
Seq('YNCTATCGGGGG', IUPACAmbiguousDNA())
Note in the above example, since R = G or A, its complement is Y (which denotes C or T).
You can of course use mixed case sequences:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("CCCCCgatA-G", generic_dna)
>>> my_dna
Seq('CCCCCgatA-G', DNAAlphabet())
>>> my_dna.reverse_complement()
Seq('C-TatcGGGGG', DNAAlphabet())
Trying to complement a protein sequence raises an exception:
>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.reverse_complement()
Traceback (most recent call last):
    ...
ValueError: Proteins do not have complements!
-
rfind(sub, start=0, end=9223372036854775807)
Find from right method, like that of a python string.
This behaves like the python string method of the same name.
Returns an integer, the index of the last (right most) occurrence of substring argument sub in the (sub)sequence given by [start:end].
Arguments:
- sub - a string or another Seq object to look for
- start - optional integer, slice start
- end - optional integer, slice end
Returns -1 if the subsequence is NOT found.
e.g. Locating the last typical start codon, AUG, in an RNA sequence:
>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.rfind("AUG")
15
-
rsplit(sep=None, maxsplit=-1)
Right split method, like that of a python string.
This behaves like the python string method of the same name.
Return a list of the ‘words’ in the string (as Seq objects), using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done COUNTING FROM THE RIGHT. If maxsplit is omitted, all splits are made.
Following the python string method, sep will by default be any white space (tabs, spaces, newlines) but this is unlikely to apply to biological sequences.
e.g. print(my_seq.rsplit("*", 1))
See also the split method.
-
rstrip(chars=None)
Returns a new Seq object with trailing (right) end stripped.
This behaves like the python string method of the same name.
Optional argument chars defines which characters to remove. If omitted or None (default) then as for the python string method, this defaults to removing any white space.
e.g. Removing a nucleotide sequence’s polyadenylation (poly-A tail):
>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> my_seq = Seq("CGGTACGCTTATGTCACGTAGAAAAAA", IUPAC.unambiguous_dna)
>>> my_seq
Seq('CGGTACGCTTATGTCACGTAGAAAAAA', IUPACUnambiguousDNA())
>>> my_seq.rstrip("A")
Seq('CGGTACGCTTATGTCACGTAG', IUPACUnambiguousDNA())
See also the strip and lstrip methods.
-
split(sep=None, maxsplit=-1)
Split method, like that of a python string.
This behaves like the python string method of the same name.
Return a list of the ‘words’ in the string (as Seq objects), using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done. If maxsplit is omitted, all splits are made.
Following the python string method, sep will by default be any white space (tabs, spaces, newlines) but this is unlikely to apply to biological sequences.
e.g.
>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_aa = my_rna.translate()
>>> my_aa
Seq('VMAIVMGR*KGAR*L', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> my_aa.split("*")
[Seq('VMAIVMGR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KGAR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*'))]
>>> my_aa.split("*", 1)
[Seq('VMAIVMGR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KGAR*L', HasStopCodon(ExtendedIUPACProtein(), '*'))]
See also the rsplit method:
>>> my_aa.rsplit("*", 1)
[Seq('VMAIVMGR*KGAR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*'))]
-
startswith(prefix, start=0, end=9223372036854775807)
Does the Seq start with the given prefix? Returns True/False.
This behaves like the python string method of the same name.
Return True if the sequence starts with the specified prefix (a string or another Seq object), False otherwise. With optional start, test sequence beginning at that position. With optional end, stop comparing sequence at that position. prefix can also be a tuple of strings to try. e.g.
>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.startswith("GUC")
True
>>> my_rna.startswith("AUG")
False
>>> my_rna.startswith("AUG", 3)
True
>>> my_rna.startswith(("UCC", "UCA", "UCG"), 1)
True
-
strip(chars=None)
Returns a new Seq object with leading and trailing ends stripped.
This behaves like the python string method of the same name.
Optional argument chars defines which characters to remove. If omitted or None (default) then as for the python string method, this defaults to removing any white space.
e.g. print(my_seq.strip("-"))
See also the lstrip and rstrip methods.
-
tomutable()
Returns the full sequence as a MutableSeq object.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq("MKQHKAMIVALIVICITAVVAAL",
...              IUPAC.protein)
>>> my_seq
Seq('MKQHKAMIVALIVICITAVVAAL', IUPACProtein())
>>> my_seq.tomutable()
MutableSeq('MKQHKAMIVALIVICITAVVAAL', IUPACProtein())
Note that the alphabet is preserved.
-
tostring()
Returns the full sequence as a python string (DEPRECATED).
You are now encouraged to use str(my_seq) instead of my_seq.tostring().
-
transcribe()
Returns the RNA sequence from a DNA sequence. New Seq object.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
...                  IUPAC.unambiguous_dna)
>>> coding_dna
Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', IUPACUnambiguousDNA())
>>> coding_dna.transcribe()
Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', IUPACUnambiguousRNA())
Trying to transcribe a protein or RNA sequence raises an exception:
>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.transcribe()
Traceback (most recent call last):
    ...
ValueError: Proteins cannot be transcribed!
-
translate(table='Standard', stop_symbol='*', to_stop=False, cds=False)
Turns a nucleotide sequence into a protein sequence. New Seq object.
This method will translate DNA or RNA sequences, and those with a nucleotide or generic alphabet. Trying to translate a protein sequence raises an exception.
Arguments:
- table - Which codon table to use? This can be either a name (string), an NCBI identifier (integer), or a CodonTable object (useful for non-standard genetic codes). This defaults to the “Standard” table.
- stop_symbol - Single character string, what to use for terminators. This defaults to the asterisk, “*”.
- to_stop - Boolean, defaults to False meaning do a full translation continuing on past any stop codons (translated as the specified stop_symbol). If True, translation is terminated at the first in frame stop codon (and the stop_symbol is not appended to the returned protein sequence).
- cds - Boolean, indicates this is a complete CDS. If True, this checks the sequence starts with a valid alternative start codon (which will be translated as methionine, M), that the sequence length is a multiple of three, and that there is a single in frame stop codon at the end (this will be excluded from the protein sequence, regardless of the to_stop option). If these tests fail, an exception is raised.
e.g. Using the standard table:
>>> coding_dna = Seq("GTGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
>>> coding_dna.translate()
Seq('VAIVMGR*KGAR*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> coding_dna.translate(stop_symbol="@")
Seq('VAIVMGR@KGAR@', HasStopCodon(ExtendedIUPACProtein(), '@'))
>>> coding_dna.translate(to_stop=True)
Seq('VAIVMGR', ExtendedIUPACProtein())
Now using NCBI table 2, where TGA is not a stop codon:
>>> coding_dna.translate(table=2)
Seq('VAIVMGRWKGAR*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> coding_dna.translate(table=2, to_stop=True)
Seq('VAIVMGRWKGAR', ExtendedIUPACProtein())
In fact, GTG is an alternative start codon under NCBI table 2, meaning this sequence could be a complete CDS:
>>> coding_dna.translate(table=2, cds=True)
Seq('MAIVMGRWKGAR', ExtendedIUPACProtein())
It isn’t a valid CDS under NCBI table 1, due to both the start codon and also the in frame stop codons:
>>> coding_dna.translate(table=1, cds=True)
Traceback (most recent call last):
    ...
TranslationError: First codon 'GTG' is not a start codon
If the sequence has no in-frame stop codon, then the to_stop argument has no effect:
>>> coding_dna2 = Seq("TTGGCCATTGTAATGGGCCGC")
>>> coding_dna2.translate()
Seq('LAIVMGR', ExtendedIUPACProtein())
>>> coding_dna2.translate(to_stop=True)
Seq('LAIVMGR', ExtendedIUPACProtein())
NOTE - Ambiguous codons like “TAN” or “NNN” could be an amino acid or a stop codon. These are translated as “X”. Any invalid codon (e.g. “TA?” or “T-A”) will throw a TranslationError.
NOTE - Does NOT support gapped sequences.
NOTE - This does NOT behave like the python string’s translate method. For that use str(my_seq).translate(...) instead.
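The effect of stop_symbol and to_stop can be reproduced with a toy translator on plain strings. This is a sketch, not Biopython's algorithm: translate_sketch is a hypothetical name, the codon table is deliberately partial (only the codons of the standard-table example above plus TAA), and dropping a trailing incomplete codon is an assumption of the sketch.

```python
# Partial standard-table mapping: just enough codons for the example sequence.
PARTIAL_CODON_TABLE = {
    "GTG": "V", "GCC": "A", "ATT": "I", "GTA": "V", "ATG": "M",
    "GGC": "G", "CGC": "R", "AAG": "K", "GGT": "G", "CGA": "R",
    "TGA": "*", "TAG": "*", "TAA": "*",
}


def translate_sketch(dna, stop_symbol="*", to_stop=False):
    """Translate codon by codon; raises KeyError for codons missing from the table."""
    protein = []
    # Assumption: a trailing incomplete codon is silently ignored.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = PARTIAL_CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            if to_stop:
                # Stop at the first in-frame stop codon, without appending it.
                break
            residue = stop_symbol
        protein.append(residue)
    return "".join(protein)


coding_dna = "GTGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
full = translate_sketch(coding_dna)
custom_stop = translate_sketch(coding_dna, stop_symbol="@")
truncated = translate_sketch(coding_dna, to_stop=True)
```

The three results correspond to the first three calls in the standard-table example: a full translation with stop symbols, the same with a custom stop symbol, and a translation cut at the first stop codon.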
-
ungap(gap=None)
Return a copy of the sequence without the gap character(s).
The gap character can be specified in two ways - either as an explicit argument, or via the sequence’s alphabet. For example:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("-ATA--TGAAAT-TTGAAAA", generic_dna)
>>> my_dna
Seq('-ATA--TGAAAT-TTGAAAA', DNAAlphabet())
>>> my_dna.ungap("-")
Seq('ATATGAAATTTGAAAA', DNAAlphabet())
If the gap character is not given as an argument, it will be taken from the sequence’s alphabet (if defined). Notice that the returned sequence’s alphabet is adjusted since it no longer requires a gapped alphabet:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC, Gapped, HasStopCodon
>>> my_pro = Seq("MVVLE=AD*", HasStopCodon(Gapped(IUPAC.protein, "=")))
>>> my_pro
Seq('MVVLE=AD*', HasStopCodon(Gapped(IUPACProtein(), '='), '*'))
>>> my_pro.ungap()
Seq('MVVLEAD*', HasStopCodon(IUPACProtein(), '*'))
Or, with a simpler gapped DNA example:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC, Gapped
>>> my_seq = Seq("CGGGTAG=AAAAAA", Gapped(IUPAC.unambiguous_dna, "="))
>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap()
Seq('CGGGTAGAAAAAA', IUPACUnambiguousDNA())
As long as it is consistent with the alphabet, although it is redundant, you can still supply the gap character as an argument to this method:
>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap("=")
Seq('CGGGTAGAAAAAA', IUPACUnambiguousDNA())
However, if the gap character given as the argument disagrees with that declared in the alphabet, an exception is raised:
>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap("-")
Traceback (most recent call last):
    ...
ValueError: Gap '-' does not match '=' from alphabet
Finally, if a gap character is not supplied, and the alphabet does not define one, an exception is raised:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("ATA--TGAAAT-TTGAAAA", generic_dna)
>>> my_dna
Seq('ATA--TGAAAT-TTGAAAA', DNAAlphabet())
>>> my_dna.ungap()
Traceback (most recent call last):
    ...
ValueError: Gap character not given and not defined in alphabet
-
upper()
Returns an upper case copy of the sequence.
>>> from Bio.Alphabet import HasStopCodon, generic_protein
>>> from Bio.Seq import Seq
>>> my_seq = Seq("VHLTPeeK*", HasStopCodon(generic_protein))
>>> my_seq
Seq('VHLTPeeK*', HasStopCodon(ProteinAlphabet(), '*'))
>>> my_seq.lower()
Seq('vhltpeek*', HasStopCodon(ProteinAlphabet(), '*'))
>>> my_seq.upper()
Seq('VHLTPEEK*', HasStopCodon(ProteinAlphabet(), '*'))
This will adjust the alphabet if required. See also the lower method.
-
Core.Protein
-
class
Fred2.Core.Protein.
Protein
(_seq, gene_id='unknown', transcript_id=None, orig_transcript=None, vars=None)¶ Bases:
Fred2.Core.Base.MetadataLogger
,Bio.Seq.Seq
Protein
corresponding to exactly one transcript.
Note
For accessing and manipulating the sequence see also
Bio.Seq.Seq
(from Biopython)
-
back_transcribe
()¶ Returns the DNA sequence from an RNA sequence. New Seq object.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG",
...                     IUPAC.unambiguous_rna)
>>> messenger_rna
Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', IUPACUnambiguousRNA())
>>> messenger_rna.back_transcribe()
Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', IUPACUnambiguousDNA())
Trying to back-transcribe a protein or DNA sequence raises an exception:
>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.back_transcribe()
Traceback (most recent call last):
...
ValueError: Proteins cannot be back transcribed!
-
complement
()¶ Returns the complement sequence. New Seq object.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_dna = Seq("CCCCCGATAG", IUPAC.unambiguous_dna)
>>> my_dna
Seq('CCCCCGATAG', IUPACUnambiguousDNA())
>>> my_dna.complement()
Seq('GGGGGCTATC', IUPACUnambiguousDNA())
You can of course use mixed case sequences:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("CCCCCgatA-GD", generic_dna)
>>> my_dna
Seq('CCCCCgatA-GD', DNAAlphabet())
>>> my_dna.complement()
Seq('GGGGGctaT-CH', DNAAlphabet())
Note that in the above example, the ambiguous character D denotes G, A or T, so its complement is H (for C, T or A).
Trying to complement a protein sequence raises an exception:
>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.complement()
Traceback (most recent call last):
...
ValueError: Proteins do not have complements!
-
count
(sub, start=0, end=9223372036854775807)¶ Non-overlapping count method, like that of a python string.
This behaves like the python string method of the same name, which does a non-overlapping count!
Returns an integer, the number of occurrences of substring argument sub in the (sub)sequence given by [start:end]. Optional arguments start and end are interpreted as in slice notation.
- Arguments:
- sub - a string or another Seq object to look for
- start - optional integer, slice start
- end - optional integer, slice end
e.g.
>>> from Bio.Seq import Seq
>>> my_seq = Seq("AAAATGA")
>>> print(my_seq.count("A"))
5
>>> print(my_seq.count("ATG"))
1
>>> print(my_seq.count(Seq("AT")))
1
>>> print(my_seq.count("AT", 2, -1))
1
Note, however, that because Python strings and Seq objects (and MutableSeq objects) do a non-overlapping search, this may not give the answer you expect:
>>> "AAAA".count("AA")
2
>>> print(Seq("AAAA").count("AA"))
2
An overlapping search would give the answer as three!
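The difference between the two counting modes can be sketched as follows. This is a minimal illustration, not part of FRED2 or Biopython: `count_overlapping` is a hypothetical helper, and a plain Python string stands in for the Seq object, on the assumption stated above that `count` behaves like the string method.

```python
def count_overlapping(seq, sub):
    """Count occurrences of sub in seq, allowing matches to overlap.

    Hypothetical helper for illustration; works on anything str() accepts.
    """
    text, pattern = str(seq), str(sub)
    if not pattern:
        raise ValueError("sub must be a non-empty string")
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by a single character so overlapping matches are seen.
        start = text.find(pattern, start + 1)
    return count


non_overlapping = "AAAA".count("AA")            # built-in, non-overlapping
overlapping = count_overlapping("AAAA", "AA")   # overlapping search
print(non_overlapping, overlapping)
```

Advancing the search by one character instead of by the length of the match is the only change needed to turn the non-overlapping scan into an overlapping one.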
-
endswith
(suffix, start=0, end=9223372036854775807)¶ Does the Seq end with the given suffix? Returns True/False.
This behaves like the python string method of the same name.
Return True if the sequence ends with the specified suffix (a string or another Seq object), False otherwise. With optional start, test sequence beginning at that position. With optional end, stop comparing sequence at that position. suffix can also be a tuple of strings to try. e.g.
>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.endswith("UUG")
True
>>> my_rna.endswith("AUG")
False
>>> my_rna.endswith("AUG", 0, 18)
True
>>> my_rna.endswith(("UCC", "UCA", "UUG"))
True
-
find
(sub, start=0, end=9223372036854775807)¶ Find method, like that of a python string.
This behaves like the python string method of the same name.
Returns an integer, the index of the first occurrence of substring argument sub in the (sub)sequence given by [start:end].
- Arguments:
- sub - a string or another Seq object to look for
- start - optional integer, slice start
- end - optional integer, slice end
Returns -1 if the subsequence is NOT found.
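Because a miss is reported as -1 rather than by raising an exception, repeated calls with a moving start offset can collect every match. The following is a minimal sketch, not part of FRED2 or Biopython: `find_all` is a hypothetical helper, and a plain Python string stands in for the Seq object, on the assumption stated above that `find` behaves like the string method.

```python
def find_all(seq, sub):
    """Return the start index of every (possibly overlapping) match of sub.

    Hypothetical helper for illustration; works on anything str() accepts.
    """
    text, pattern = str(seq), str(sub)
    positions = []
    start = text.find(pattern)
    while start != -1:          # -1 is the "not found" sentinel
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


rna = "GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG"
start_codons = find_all(rna, "AUG")
no_hits = find_all(rna, "TTT")
print(start_codons, no_hits)
```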
e.g. Locating the first typical start codon, AUG, in an RNA sequence:
>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.find("AUG")
3
-
get_metadata
(label, only_first=False)¶ Getter for the saved metadata with the key
label
Parameters: - label (str) – key of the metadata to be retrieved
- only_first (bool) – true if only the first element of the metadata list is to be returned
-
log_metadata
(label, value)¶ Inserts a new metadata entry
Parameters: - label (str) – key for the metadata that will be added
- value (list(object)) – any kind of additional value that should be kept
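How the two metadata methods are meant to interact can be sketched with a small stand-in class. Everything below is an assumption made for illustration: `MetadataStore` is hypothetical and is not the actual `Fred2.Core.Base.MetadataLogger` implementation; in particular, accumulating values in a list per label and returning `None` for a missing label are guesses based only on the parameter descriptions above.

```python
class MetadataStore(object):
    """Hypothetical stand-in, NOT the real Fred2.Core.Base.MetadataLogger."""

    def __init__(self):
        self._metadata = {}

    def log_metadata(self, label, value):
        # Assumption: values logged under the same label accumulate in a list.
        self._metadata.setdefault(label, []).append(value)

    def get_metadata(self, label, only_first=False):
        values = self._metadata.get(label, [])
        if only_first:
            # Assumption: None signals that nothing was logged for this label.
            return values[0] if values else None
        return list(values)


store = MetadataStore()
store.log_metadata("source", "ensembl")
store.log_metadata("source", "refseq")
all_sources = store.get_metadata("source")
first_source = store.get_metadata("source", only_first=True)
print(all_sources, first_source)
```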
-
lower
()¶ Returns a lower case copy of the sequence.
This will adjust the alphabet if required. Note that the IUPAC alphabets are upper case only, and thus a generic alphabet must be substituted.
>>> from Bio.Alphabet import Gapped, generic_dna
>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> my_seq = Seq("CGGTACGCTTATGTCACGTAG*AAAAAA", Gapped(IUPAC.unambiguous_dna, "*"))
>>> my_seq
Seq('CGGTACGCTTATGTCACGTAG*AAAAAA', Gapped(IUPACUnambiguousDNA(), '*'))
>>> my_seq.lower()
Seq('cggtacgcttatgtcacgtag*aaaaaa', Gapped(DNAAlphabet(), '*'))
See also the upper method.
-
lstrip
(chars=None)¶ Returns a new Seq object with leading (left) end stripped.
This behaves like the python string method of the same name.
Optional argument chars defines which characters to remove. If omitted or None (default) then as for the python string method, this defaults to removing any white space.
e.g. print(my_seq.lstrip("-"))
See also the strip and rstrip methods.
-
newid
= <method-wrapper 'next' of itertools.count object>¶
-
reverse_complement
()¶ Returns the reverse complement sequence. New Seq object.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_dna = Seq("CCCCCGATAGNR", IUPAC.ambiguous_dna)
>>> my_dna
Seq('CCCCCGATAGNR', IUPACAmbiguousDNA())
>>> my_dna.reverse_complement()
Seq('YNCTATCGGGGG', IUPACAmbiguousDNA())
Note that in the above example, since R = G or A, its complement is Y (which denotes C or T).
You can of course use mixed case sequences:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("CCCCCgatA-G", generic_dna)
>>> my_dna
Seq('CCCCCgatA-G', DNAAlphabet())
>>> my_dna.reverse_complement()
Seq('C-TatcGGGGG', DNAAlphabet())
Trying to reverse complement a protein sequence raises an exception:
>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.reverse_complement()
Traceback (most recent call last):
...
ValueError: Proteins do not have complements!
-
rfind
(sub, start=0, end=9223372036854775807)¶ Find from right method, like that of a python string.
This behaves like the python string method of the same name.
Returns an integer, the index of the last (right most) occurrence of substring argument sub in the (sub)sequence given by [start:end].
- Arguments:
- sub - a string or another Seq object to look for
- start - optional integer, slice start
- end - optional integer, slice end
Returns -1 if the subsequence is NOT found.
e.g. Locating the last typical start codon, AUG, in an RNA sequence:
>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.rfind("AUG")
15
-
rsplit
(sep=None, maxsplit=-1)¶ Right split method, like that of a python string.
This behaves like the python string method of the same name.
Return a list of the ‘words’ in the string (as Seq objects), using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done COUNTING FROM THE RIGHT. If maxsplit is omitted, all splits are made.
Following the python string method, sep will by default be any white space (tabs, spaces, newlines) but this is unlikely to apply to biological sequences.
e.g. print(my_seq.rsplit("*", 1))
See also the split method.
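The effect of counting splits from the right only shows up when maxsplit is smaller than the number of separators. The sketch below uses a plain Python string as a stand-in for a Seq object, relying on the statement above that rsplit behaves like the string method; with a real Seq the list items would be Seq objects rather than strings.

```python
protein = "VMAIVMGR*KGAR*L"

# One split counted from the left: the remainder keeps its stop symbol.
from_left = protein.split("*", 1)

# One split counted from the right: only the last stop symbol is used.
from_right = protein.rsplit("*", 1)

# Without maxsplit, split and rsplit give the same list.
all_parts = protein.rsplit("*")

print(from_left, from_right, all_parts)
```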
-
rstrip
(chars=None)¶ Returns a new Seq object with trailing (right) end stripped.
This behaves like the python string method of the same name.
Optional argument chars defines which characters to remove. If omitted or None (default) then as for the python string method, this defaults to removing any white space.
e.g. Removing a nucleotide sequence’s polyadenylation (poly-A tail):
>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> my_seq = Seq("CGGTACGCTTATGTCACGTAGAAAAAA", IUPAC.unambiguous_dna)
>>> my_seq
Seq('CGGTACGCTTATGTCACGTAGAAAAAA', IUPACUnambiguousDNA())
>>> my_seq.rstrip("A")
Seq('CGGTACGCTTATGTCACGTAG', IUPACUnambiguousDNA())
See also the strip and lstrip methods.
-
split
(sep=None, maxsplit=-1)¶ Split method, like that of a python string.
This behaves like the python string method of the same name.
Return a list of the ‘words’ in the string (as Seq objects), using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done. If maxsplit is omitted, all splits are made.
Following the python string method, sep will by default be any white space (tabs, spaces, newlines) but this is unlikely to apply to biological sequences.
e.g.
>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_aa = my_rna.translate()
>>> my_aa
Seq('VMAIVMGR*KGAR*L', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> my_aa.split("*")
[Seq('VMAIVMGR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KGAR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*'))]
>>> my_aa.split("*", 1)
[Seq('VMAIVMGR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KGAR*L', HasStopCodon(ExtendedIUPACProtein(), '*'))]
See also the rsplit method:
>>> my_aa.rsplit("*", 1)
[Seq('VMAIVMGR*KGAR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*'))]
-
startswith
(prefix, start=0, end=9223372036854775807)¶ Does the Seq start with the given prefix? Returns True/False.
This behaves like the python string method of the same name.
Return True if the sequence starts with the specified prefix (a string or another Seq object), False otherwise. With optional start, test sequence beginning at that position. With optional end, stop comparing sequence at that position. prefix can also be a tuple of strings to try. e.g.
>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.startswith("GUC")
True
>>> my_rna.startswith("AUG")
False
>>> my_rna.startswith("AUG", 3)
True
>>> my_rna.startswith(("UCC", "UCA", "UCG"), 1)
True
-
strip
(chars=None)¶ Returns a new Seq object with leading and trailing ends stripped.
This behaves like the python string method of the same name.
Optional argument chars defines which characters to remove. If omitted or None (default) then as for the python string method, this defaults to removing any white space.
e.g. print(my_seq.strip("-"))
See also the lstrip and rstrip methods.
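The three stripping methods differ only in which end they trim, and none of them touches characters inside the sequence. A minimal sketch, using a plain Python string as a stand-in for a Seq object (an assumption that rests on the statement above that these methods behave like their string counterparts); the gapped sequence is made up for illustration:

```python
aligned = "--ATA--TGAAAT-TTG---"

left_trimmed = aligned.lstrip("-")    # leading gaps removed
right_trimmed = aligned.rstrip("-")   # trailing gaps removed
both_trimmed = aligned.strip("-")     # both ends trimmed

# Gap characters in the interior are kept in every case.
print(left_trimmed)
print(right_trimmed)
print(both_trimmed)
```

To remove interior gap characters as well, the ungap method documented further down is the appropriate tool.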
-
tomutable
()¶ Returns the full sequence as a MutableSeq object.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq("MKQHKAMIVALIVICITAVVAAL",
...              IUPAC.protein)
>>> my_seq
Seq('MKQHKAMIVALIVICITAVVAAL', IUPACProtein())
>>> my_seq.tomutable()
MutableSeq('MKQHKAMIVALIVICITAVVAAL', IUPACProtein())
Note that the alphabet is preserved.
-
tostring
()¶ Returns the full sequence as a python string (DEPRECATED).
You are now encouraged to use str(my_seq) instead of my_seq.tostring().
-
transcribe
()¶ Returns the RNA sequence from a DNA sequence. New Seq object.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
...                  IUPAC.unambiguous_dna)
>>> coding_dna
Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', IUPACUnambiguousDNA())
>>> coding_dna.transcribe()
Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', IUPACUnambiguousRNA())
Trying to transcribe a protein or RNA sequence raises an exception:
>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.transcribe()
Traceback (most recent call last):
...
ValueError: Proteins cannot be transcribed!
-
translate
(table='Standard', stop_symbol='*', to_stop=False, cds=False)¶ Turns a nucleotide sequence into a protein sequence. New Seq object.
This method will translate DNA or RNA sequences, and those with a nucleotide or generic alphabet. Trying to translate a protein sequence raises an exception.
- Arguments:
- table - Which codon table to use? This can be either a name (string), an NCBI identifier (integer), or a CodonTable object (useful for non-standard genetic codes). This defaults to the “Standard” table.
- stop_symbol - Single character string, what to use for terminators. This defaults to the asterisk, “*”.
- to_stop - Boolean, defaults to False meaning do a full translation continuing on past any stop codons (translated as the specified stop_symbol). If True, translation is terminated at the first in frame stop codon (and the stop_symbol is not appended to the returned protein sequence).
- cds - Boolean, indicates this is a complete CDS. If True, this checks the sequence starts with a valid alternative start codon (which will be translated as methionine, M), that the sequence length is a multiple of three, and that there is a single in frame stop codon at the end (this will be excluded from the protein sequence, regardless of the to_stop option). If these tests fail, an exception is raised.
e.g. Using the standard table:
>>> coding_dna = Seq("GTGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
>>> coding_dna.translate()
Seq('VAIVMGR*KGAR*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> coding_dna.translate(stop_symbol="@")
Seq('VAIVMGR@KGAR@', HasStopCodon(ExtendedIUPACProtein(), '@'))
>>> coding_dna.translate(to_stop=True)
Seq('VAIVMGR', ExtendedIUPACProtein())
Now using NCBI table 2, where TGA is not a stop codon:
>>> coding_dna.translate(table=2)
Seq('VAIVMGRWKGAR*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> coding_dna.translate(table=2, to_stop=True)
Seq('VAIVMGRWKGAR', ExtendedIUPACProtein())
In fact, GTG is an alternative start codon under NCBI table 2, meaning this sequence could be a complete CDS:
>>> coding_dna.translate(table=2, cds=True)
Seq('MAIVMGRWKGAR', ExtendedIUPACProtein())
It isn’t a valid CDS under NCBI table 1, due to both the start codon and also the in frame stop codons:
>>> coding_dna.translate(table=1, cds=True)
Traceback (most recent call last):
...
TranslationError: First codon 'GTG' is not a start codon
If the sequence has no in-frame stop codon, then the to_stop argument has no effect:
>>> coding_dna2 = Seq("TTGGCCATTGTAATGGGCCGC")
>>> coding_dna2.translate()
Seq('LAIVMGR', ExtendedIUPACProtein())
>>> coding_dna2.translate(to_stop=True)
Seq('LAIVMGR', ExtendedIUPACProtein())
NOTE - Ambiguous codons like “TAN” or “NNN” could be an amino acid or a stop codon. These are translated as “X”. Any invalid codon (e.g. “TA?” or “T-A”) will throw a TranslationError.
NOTE - Does NOT support gapped sequences.
NOTE - This does NOT behave like the python string’s translate method. For that use str(my_seq).translate(...) instead.
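The last note refers to the character-mapping `translate` of Python strings, which is unrelated to codon translation. A minimal sketch of that string method, assuming Python 3 (where the mapping table is built with `str.maketrans`); the sequence is a plain string made up for illustration, and with a Seq object you would first convert it with `str(my_seq)`:

```python
dna_text = "ATGGCCATTGTAATGGGCCGC"

# str.translate maps single characters through a table; here T -> U.
dna_to_rna = str.maketrans("T", "U")
rna_text = dna_text.translate(dna_to_rna)

print(rna_text)
```

The result is still a nucleotide string of the same length, whereas the Seq method documented above reads codons and returns a protein sequence.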
-
ungap
(gap=None)¶ Return a copy of the sequence without the gap character(s).
The gap character can be specified in two ways - either as an explicit argument, or via the sequence’s alphabet. For example:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("-ATA--TGAAAT-TTGAAAA", generic_dna)
>>> my_dna
Seq('-ATA--TGAAAT-TTGAAAA', DNAAlphabet())
>>> my_dna.ungap("-")
Seq('ATATGAAATTTGAAAA', DNAAlphabet())
If the gap character is not given as an argument, it will be taken from the sequence’s alphabet (if defined). Notice that the returned sequence’s alphabet is adjusted since it no longer requires a gapped alphabet:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC, Gapped, HasStopCodon
>>> my_pro = Seq("MVVLE=AD*", HasStopCodon(Gapped(IUPAC.protein, "=")))
>>> my_pro
Seq('MVVLE=AD*', HasStopCodon(Gapped(IUPACProtein(), '='), '*'))
>>> my_pro.ungap()
Seq('MVVLEAD*', HasStopCodon(IUPACProtein(), '*'))
Or, with a simpler gapped DNA example:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC, Gapped
>>> my_seq = Seq("CGGGTAG=AAAAAA", Gapped(IUPAC.unambiguous_dna, "="))
>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap()
Seq('CGGGTAGAAAAAA', IUPACUnambiguousDNA())
As long as it is consistent with the alphabet, although it is redundant, you can still supply the gap character as an argument to this method:
>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap("=")
Seq('CGGGTAGAAAAAA', IUPACUnambiguousDNA())
However, if the gap character given as the argument disagrees with that declared in the alphabet, an exception is raised:
>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap("-")
Traceback (most recent call last):
...
ValueError: Gap '-' does not match '=' from alphabet
Finally, if a gap character is not supplied, and the alphabet does not define one, an exception is raised:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("ATA--TGAAAT-TTGAAAA", generic_dna)
>>> my_dna
Seq('ATA--TGAAAT-TTGAAAA', DNAAlphabet())
>>> my_dna.ungap()
Traceback (most recent call last):
...
ValueError: Gap character not given and not defined in alphabet
-
upper
()¶ Returns an upper case copy of the sequence.
>>> from Bio.Alphabet import HasStopCodon, generic_protein
>>> from Bio.Seq import Seq
>>> my_seq = Seq("VHLTPeeK*", HasStopCodon(generic_protein))
>>> my_seq
Seq('VHLTPeeK*', HasStopCodon(ProteinAlphabet(), '*'))
>>> my_seq.lower()
Seq('vhltpeek*', HasStopCodon(ProteinAlphabet(), '*'))
>>> my_seq.upper()
Seq('VHLTPEEK*', HasStopCodon(ProteinAlphabet(), '*'))
This will adjust the alphabet if required. See also the lower method.
-
Core.Result¶
-
class
Fred2.Core.Result.
AResult
(data=None, index=None, columns=None, dtype=None, copy=False)¶ Bases:
pandas.core.frame.DataFrame
An
AResult
object is a
pandas.DataFrame
with multi-indexing. This class is used as an interface and can be extended with custom shortcuts for the sometimes tedious calls in pandas.
-
T
¶ Transpose index and columns
-
abs
()¶ Return an object with absolute value taken. Only applicable to objects that are all numeric
abs: type of caller
-
add
(other, axis='columns', level=None, fill_value=None)¶ Binary operator add with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
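The role of fill_value is easiest to see next to the plain `+` operator. The sketch below is illustrative only: it uses a plain `pandas.DataFrame` in place of an AResult (the method is inherited), assumes a reasonably recent pandas installation, and the frames and the `score` column are made up.

```python
import pandas as pd

left = pd.DataFrame({"score": [1.0, 2.0]}, index=["a", "b"])
right = pd.DataFrame({"score": [10.0, 20.0]}, index=["b", "c"])

# Plain addition: the indices are unioned, and any label present in only
# one frame yields NaN.
plain = left + right

# With fill_value, a value missing from one frame is replaced by 0 before
# adding, so only labels missing from BOTH frames would stay NaN.
filled = left.add(right, fill_value=0)

print(plain)
print(filled)
```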
-
add_prefix
(prefix)¶ Concatenate prefix string with panel items names.
prefix : string
with_prefix : type of caller
-
add_suffix
(suffix)¶ Concatenate suffix string with panel items names
suffix : string
with_suffix : type of caller
-
align
(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0)¶ Align two objects on their axes with the specified join method for each axis Index
- other : DataFrame or Series
- join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
- axis : allowed axis of the other object, default None
- Align on index (0), columns (1), or both (None)
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
- copy : boolean, default True
- Always returns new objects. If copy=False and no reindexing is required then original objects are returned.
- fill_value : scalar, default np.NaN
- Value to use for missing values. Defaults to NaN, but can be any “compatible” value
- method : str, default None
- limit : int, default None
- fill_axis : {0, 1}, default 0
- Filling axis, method and limit
- (left, right) : (type of input, type of other)
- Aligned objects
-
all
(axis=None, bool_only=None, skipna=True, level=None, **kwargs)¶ Return whether all elements are True over requested axis.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
- bool_only : boolean, default None
- Only include boolean data.
all : Series (or DataFrame if level specified)
-
any
(axis=None, bool_only=None, skipna=True, level=None, **kwargs)¶ Return whether any element is True over requested axis.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
- bool_only : boolean, default None
- Only include boolean data.
any : Series (or DataFrame if level specified)
-
append
(other, ignore_index=False, verify_integrity=False)¶ Append rows of other to the end of this frame, returning a new object. Columns not in this frame are added as new columns.
- other : DataFrame or list of Series/dict-like objects
- ignore_index : boolean, default False
- If True do not use the index labels. Useful for gluing together record arrays
- verify_integrity : boolean, default False
- If True, raise ValueError on creating index with duplicates
If a list of dict is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged
appended : DataFrame
-
apply
(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)¶ Applies function along input axis of DataFrame.
Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty.
- func : function
- Function to apply to each column/row
- axis : {0, 1}
- 0 : apply function to each column
- 1 : apply function to each row
- broadcast : boolean, default False
- For aggregation functions, return object of same size with values propagated
- reduce : boolean or None, default None
- Try to apply reduction procedures. If the DataFrame is empty, apply will use reduce to determine whether the result should be a Series or a DataFrame. If reduce is None (the default), apply’s return value will be guessed by calling func on an empty Series (note: while guessing, exceptions raised by func will be ignored). If reduce is True a Series will always be returned, and if False a DataFrame will always be returned.
- raw : boolean, default False
- If False, convert each row or column into a Series. If raw=True the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance
- args : tuple
- Positional arguments to pass to function in addition to the array/series
Additional keyword arguments will be passed as keywords to the function
>>> df.apply(numpy.sqrt) # returns DataFrame
>>> df.apply(numpy.sum, axis=0) # equiv to df.sum(0)
>>> df.apply(numpy.sum, axis=1) # equiv to df.sum(1)
DataFrame.applymap: For elementwise operations
applied : Series or DataFrame
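The axis convention can be checked with a small self-contained example. This is an illustrative sketch only: it uses a plain `pandas.DataFrame` in place of an AResult, assumes a reasonably recent pandas installation, and the frame and its column names are made up. Only the `func` and `axis` arguments are used, since other parameters in the signature above may not exist in newer pandas releases.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5], "b": [10, 30]})


def value_range(series):
    """Aggregating function: one number per column or row."""
    return series.max() - series.min()


# axis=0: the function receives each COLUMN as a Series.
per_column = df.apply(value_range, axis=0)

# axis=1: the function receives each ROW as a Series.
per_row = df.apply(value_range, axis=1)

print(per_column)
print(per_row)
```

Because `value_range` aggregates, both calls return a Series: one indexed by the column labels, the other by the row labels.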
-
applymap
(func)¶ Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame
- func : function
- Python function, returns a single value from a single value
applied : DataFrame
DataFrame.apply : For operations on rows/columns
-
as_blocks
(columns=None)¶ Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype. Columns are presented in sorted order unless a specific list of columns is provided.
- NOTE: the dtypes of the blocks WILL BE PRESERVED HERE (unlike in as_matrix)
- columns : array-like
- Specific column order
values : a list of Object
-
as_matrix
(columns=None)¶ Convert the frame to its Numpy-array matrix representation. Columns are presented in sorted order unless a specific list of columns is provided.
- NOTE: the dtype will be a lowest-common-denominator dtype (implicit upcasting); that is to say, if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.
- e.g. if the dtypes are float16,float32 -> float32
- float16,float32,float64 -> float64
- int32,uint8 -> int32
- values : ndarray
- If the caller is heterogeneous and contains booleans or objects, the result will be of dtype=object
-
asfreq
(freq, method=None, how=None, normalize=False)¶ Convert all TimeSeries inside to specified frequency using DateOffset objects. Optionally provide fill method to pad/backfill missing values.
- freq : DateOffset object, or string
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}
- Method to use for filling holes in reindexed Series. pad / ffill: propagate last valid observation forward to next valid; backfill / bfill: use NEXT valid observation to fill
- how : {‘start’, ‘end’}, default end
- For PeriodIndex only, see PeriodIndex.asfreq
- normalize : bool, default False
- Whether to reset output index to midnight
converted : type of caller
-
astype
(dtype, copy=True, raise_on_error=True)¶ Cast object to input numpy.dtype. Return a copy when copy = True (be really careful with this!)
- dtype : numpy.dtype or Python type
- raise_on_error : raise on invalid input
casted : type of caller
-
at
¶
-
at_time
(time, asof=False)¶ Select values at particular time of day (e.g. 9:30AM)
time : datetime.time or string
values_at_time : type of caller
-
axes
¶
-
between_time
(start_time, end_time, include_start=True, include_end=True)¶ Select values between particular times of the day (e.g., 9:00-9:30 AM)
- start_time : datetime.time or string
- end_time : datetime.time or string
- include_start : boolean, default True
- include_end : boolean, default True
values_between_time : type of caller
-
bfill
(axis=0, inplace=False, limit=None, downcast=None)¶ Synonym for NDFrame.fillna(method=’bfill’)
-
blocks
¶ Internal property, property synonym for as_blocks()
-
bool
()¶ Return the bool of a single element PandasObject This must be a boolean scalar value, either True or False
Raise a ValueError if the PandasObject does not have exactly 1 element, or that element is not boolean
-
boxplot
(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, **kwds)¶ Make a box plot from DataFrame column/columns optionally grouped (stratified) by one or more columns
- data : DataFrame
- column : column names or list of names, or vector
- Can be any valid input to groupby
- by : string or sequence
- Column in the DataFrame to group by
- ax : matplotlib axis object, default None
- fontsize : int or string
- rot : int, default None
- Rotation for ticks
- grid : boolean, default None (matlab style default)
- Axis grid lines
ax : matplotlib.axes.AxesSubplot
-
clip
(lower=None, upper=None, out=None)¶ Trim values at input threshold(s)
- lower : float, default None
- upper : float, default None
clipped : Series
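A short illustration of trimming at both thresholds. The sketch uses a plain `pandas.DataFrame` in place of an AResult, assumes a reasonably recent pandas installation, and the `binding_score` column and its values are made up.

```python
import pandas as pd

scores = pd.DataFrame({"binding_score": [-0.4, 0.25, 0.9, 1.7]})

# Values below `lower` are raised to it, values above `upper` are lowered
# to it; everything in between is left unchanged.
clipped = scores.clip(lower=0.0, upper=1.0)

print(clipped)
```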
-
clip_lower
(threshold)¶ Return copy of the input with values below given value truncated
clip
clipped : same type as input
-
clip_upper
(threshold)¶ Return copy of input with values above given value truncated
clip
clipped : same type as input
-
combine
(other, func, fill_value=None, overwrite=True)¶ Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)
- other : DataFrame
- func : function
- fill_value : scalar value
- overwrite : boolean, default True
- If True then overwrite values for common keys in the calling frame
result : DataFrame
-
combineAdd
(other)¶ Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)
other : DataFrame
DataFrame
-
combineMult
(other)¶ Multiply two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)
other : DataFrame
DataFrame
-
combine_first
(other)¶ Combine two DataFrame objects and default to non-null values in frame calling the method. Result index columns will be the union of the respective indexes and columns
other : DataFrame
a’s values prioritized, use values from b to fill holes:
>>> a.combine_first(b)
combined : DataFrame
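The call shown above relies on frames `a` and `b` that are not defined in this entry. A self-contained sketch, using plain `pandas.DataFrame` objects in place of AResult, assuming a reasonably recent pandas installation, and with made-up column names and values:

```python
import pandas as pd

a = pd.DataFrame({"x": [1.0, None]})                  # second value missing
b = pd.DataFrame({"x": [5.0, 6.0], "y": [7.0, 8.0]})

# a's values take priority; b only fills the holes in a and contributes
# the column that a does not have.
combined = a.combine_first(b)

print(combined)
```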
-
compound
(axis=None, skipna=None, level=None, **kwargs)¶ Return the compound percentage of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
compounded : Series or DataFrame (if level specified)
-
consolidate
(inplace=False)¶ Compute NDFrame with “consolidated” internals (data of each dtype grouped together in a single ndarray). Mainly an internal API function, but available here to the savvy user
- inplace : boolean, default False
- If False return new object, otherwise modify existing object
consolidated : type of caller
-
convert_objects
(convert_dates=True, convert_numeric=False, convert_timedeltas=True, copy=True)¶ Attempt to infer better dtype for object columns
- convert_dates : if True, attempt to soft convert dates; if ‘coerce’, force conversion (and non-convertibles get NaT)
- convert_numeric : if True, attempt to coerce to numbers (including strings); non-convertibles get NaN
- convert_timedeltas : if True, attempt to soft convert timedeltas; if ‘coerce’, force conversion (and non-convertibles get NaT)
- copy : Boolean, if True, return copy, default is True
converted : same as input object
-
copy
(deep=True)¶ Make a copy of this object
- deep : boolean, default True
- Make a deep copy, i.e. also copy data
copy : type of caller
-
corr
(method='pearson', min_periods=1)¶ Compute pairwise correlation of columns, excluding NA/null values
- method : {‘pearson’, ‘kendall’, ‘spearman’}
- pearson : standard correlation coefficient
- kendall : Kendall Tau correlation coefficient
- spearman : Spearman rank correlation
- min_periods : int, optional
- Minimum number of observations required per pair of columns to have a valid result. Currently only available for pearson and spearman correlation
y : DataFrame
-
corrwith
(other, axis=0, drop=False)¶ Compute pairwise correlation between rows or columns of two DataFrame objects.
- other : DataFrame
- axis : {0, 1}
- 0 to compute column-wise, 1 for row-wise
- drop : boolean, default False
- Drop missing indices from result, default returns union of all
correls : Series
-
count
(axis=0, level=None, numeric_only=False)¶ Return Series with number of non-NA/null observations over requested axis. Works with non-floating point data as well (detects NaN and None)
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
- numeric_only : boolean, default False
- Include only float, int, boolean data
count : Series (or DataFrame if level specified)
-
cov
(min_periods=None)¶ Compute pairwise covariance of columns, excluding NA/null values
- min_periods : int, optional
- Minimum number of observations required per pair of columns to have a valid result.
y : DataFrame
y contains the covariance matrix of the DataFrame’s time series. The covariance is normalized by N-1 (unbiased estimator).
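The N-1 normalization can be verified by hand on a tiny frame. The sketch uses a plain `pandas.DataFrame` in place of an AResult, assumes a reasonably recent pandas installation, and the columns `x` and `y` are made up.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0]})
cov_matrix = df.cov()

# Manual check for cov(x, y): sum of products of deviations from the mean,
# divided by N - 1 (here 3 - 1 = 2), not by N.
n = len(df)
dev_x = df["x"] - df["x"].mean()
dev_y = df["y"] - df["y"].mean()
manual_cov_xy = (dev_x * dev_y).sum() / (n - 1)

print(cov_matrix)
print(manual_cov_xy)
```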
-
cummax
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative max over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
max : Series
-
cummin
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative min over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
min : Series
-
cumprod
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative prod over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
prod : Series
-
cumsum
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative sum over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
sum : Series
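The effect of `skipna` on the cumulative methods is easiest to see on a column with a gap. This is a small sketch on plain pandas with invented values; the same pattern should apply to cummax, cummin and cumprod.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "a" has a missing value in the middle.
df = pd.DataFrame({"a": [1.0, np.nan, 2.0], "b": [1.0, 2.0, 3.0]})

# Default skipna=True: the NaN stays NaN, but the running sum continues after it.
skipping = df.cumsum()

# skipna=False: once a NaN is met, every later value in that column is NaN.
propagating = df.cumsum(skipna=False)

print(skipping)
print(propagating)
```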
-
delevel
(*args, **kwargs)¶
-
describe
(percentile_width=50)¶ Generate various summary statistics of each column, excluding NaN values. These include: count, mean, std, min, max, and lower%/50%/upper% percentiles
- percentile_width : float, optional
- width of the desired uncertainty interval, default is 50, which corresponds to lower=25, upper=75
DataFrame of summary statistics
-
diff
(periods=1)¶ 1st discrete difference of object
- periods : int, default 1
- Periods to shift for forming difference
diffed : DataFrame
-
div
(other, axis='columns', level=None, fill_value=None)¶ Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
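A short sketch of how `fill_value` behaves for the flexible arithmetic methods, using div on plain pandas frames with invented values. The same reasoning should carry over to floordiv, mod and mul.

```python
import numpy as np
import pandas as pd

# Hypothetical operands: row 1 is missing on the left only, row 2 on both sides.
a = pd.DataFrame({"x": [1.0, np.nan, np.nan]})
b = pd.DataFrame({"x": [2.0, 4.0, np.nan]})

# Without fill_value, any missing operand yields NaN.
plain = a.div(b)

# With fill_value=1.0, a value missing on one side is replaced by 1.0
# before dividing; where both sides are missing the result stays NaN.
filled = a.div(b, fill_value=1.0)

print(plain)
print(filled)
```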
-
divide
(other, axis='columns', level=None, fill_value=None)¶ Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
dot
(other)¶ Matrix multiplication with DataFrame or Series objects
other : DataFrame or Series
dot_product : DataFrame or Series
-
drop
(labels, axis=0, level=None, inplace=False, **kwargs)¶ Return new object with labels in requested axis removed
- labels : single label or list-like
- axis : int or axis name
- level : int or name, default None
- For MultiIndex
- inplace : bool, default False
- If True, do operation inplace and return None.
dropped : type of caller
-
drop_duplicates
(cols=None, take_last=False, inplace=False)¶ Return DataFrame with duplicate rows removed, optionally only considering certain columns
- cols : column label or sequence of labels, optional
- Only consider certain columns for identifying duplicates, by default use all of the columns
- take_last : boolean, default False
- Keep the last observed row within each set of duplicates. Defaults to keeping the first row
- inplace : boolean, default False
- Whether to drop duplicates in place or to return a copy
deduplicated : DataFrame
-
dropna
(axis=0, how='any', thresh=None, subset=None, inplace=False)¶ Return object with labels on given axis omitted where alternately any or all of the data are missing
- axis : {0, 1}, or tuple/list thereof
- Pass tuple or list to drop on multiple axes
- how : {‘any’, ‘all’}
- any : if any NA values are present, drop that label
- all : if all values are NA, drop that label
- thresh : int, default None
- int value : require that many non-NA values
- subset : array-like
- Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include
- inplace : boolean, default False
- If True, do operation inplace and return None.
dropped : DataFrame
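The options above select rows in different ways. The sketch below applies each of them separately to a small invented frame in plain pandas; recent pandas versions do not accept `how` and `thresh` in the same call, so they are shown one at a time.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is complete, row 1 misses "a", row 2 is all NaN.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [3.0, 6.0, np.nan],
})

any_na = df.dropna(how="any")        # drop rows containing any NaN
all_na = df.dropna(how="all")        # drop rows that are entirely NaN
at_least_two = df.dropna(thresh=2)   # keep rows with >= 2 non-NA values
by_subset = df.dropna(subset=["b"])  # only look at column "b"

print(list(any_na.index), list(all_na.index))
print(list(at_least_two.index), list(by_subset.index))
```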
-
dtypes
¶ Return the dtypes in this object
-
duplicated
(cols=None, take_last=False)¶ Return boolean Series denoting duplicate rows, optionally only considering certain columns
- cols : column label or sequence of labels, optional
- Only consider certain columns for identifying duplicates, by default use all of the columns
- take_last : boolean, default False
- Keep the last observed row within each set of duplicates. Defaults to keeping the first row
duplicated : Series
-
empty
¶ True if NDFrame is entirely empty [no items]
-
eq
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods eq
-
equals
(other)¶ Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.
-
eval
(expr, **kwargs)¶ Evaluate an expression in the context of the calling DataFrame instance.
- expr : string
- The expression string to evaluate.
- kwargs : dict
- See the documentation for eval() for complete details on the keyword arguments accepted by query().
ret : ndarray, scalar, or pandas object
pandas.DataFrame.query pandas.eval
For more details see the API documentation for eval(). For detailed examples see enhancing performance with eval.
>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.eval('a + b')
>>> df.eval('c=a + b')
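For a deterministic variant of the example above, the following sketch evaluates a column expression on fixed, invented data in plain pandas. How an assignment expression such as 'c = a + b' is applied (in place or as a new frame) has differed between pandas versions, so only the pure expression form is shown here.

```python
import pandas as pd

# Hypothetical fixed data so the result is reproducible.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

# The expression is evaluated with the frame's columns as variables
# and returns a Series aligned with the frame's index.
total = df.eval("a + b")

print(total)
```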
-
ffill
(axis=0, inplace=False, limit=None, downcast=None)¶ Synonym for NDFrame.fillna(method=’ffill’)
-
fillna
(value=None, method=None, axis=0, inplace=False, limit=None, downcast=None)¶ Fill NA/NaN values using the specified method
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- Method to use for filling holes in reindexed Series. pad / ffill: propagate last valid observation forward to next valid. backfill / bfill: use NEXT valid observation to fill gap.
- value : scalar, dict, or Series
- Value to use to fill holes (e.g. 0), alternately a dict/Series of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series will not be filled). This value cannot be a list.
- axis : {0, 1}, default 0
- 0: fill column-by-column
- 1: fill row-by-row
- inplace : boolean, default False
- If True, fill in place. Note: this will modify any other views on this object, (e.g. a no-copy slice for a column in a DataFrame).
- limit : int, default None
- Maximum size gap to forward or backward fill
- downcast : dict, default is None
- a dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible)
reindex, asfreq
filled : same type as caller
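A brief sketch of the `value` and `limit` options, using plain pandas with invented data. Forward filling is done through ffill(), the synonym documented above, because newer pandas releases no longer accept the `method` argument of fillna.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in both columns.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [np.nan, 2.0, np.nan, 8.0],
})

zeros = df.fillna(0)             # scalar: fill every hole with 0
only_a = df.fillna({"a": -1.0})  # dict: columns not listed stay untouched
forward = df.ffill(limit=1)      # propagate the last valid value at most one step

print(zeros)
print(only_a)
print(forward)
```

Column "b" still starts with NaN after the forward fill, since there is no earlier observation to propagate.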
-
filter
(items=None, like=None, regex=None, axis=None)¶ Restrict the info axis to set of items or wildcard
- items : list-like
- List of info axis to restrict to (not all of them need to be present)
- like : string
- Keep info axis where “arg in col == True”
- regex : string (regular expression)
- Keep info axis with re.search(regex, col) == True
Arguments are mutually exclusive, but this is not checked for
-
filter_result
(expressions)¶ Filter result based on a list of expressions
Parameters: expressions (list((str,comparator,float))) – A list of triples consisting of (method_name, comparator, threshold)
Returns: A new filtered AResult object
Return type: AResult
-
first
(offset)¶ Convenience method for subsetting initial periods of time series data based on a date offset
offset : string, DateOffset, dateutil.relativedelta
ts.first(‘10D’) -> First 10 days
subset : type of caller
-
first_valid_index
()¶ Return label for first non-NA/null value
-
floordiv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator floordiv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
from_csv
(path, header=0, sep=', ', index_col=0, parse_dates=True, encoding=None, tupleize_cols=False, infer_datetime_format=False)¶ Read delimited file into DataFrame
- path : string file path or file handle / StringIO
- header : int, default 0
- Row to use as header (skip prior rows)
- sep : string, default ‘,’
- Field delimiter
- index_col : int or sequence, default 0
- Column to use for index. If a sequence is given, a MultiIndex is used. Different default from read_table
- parse_dates : boolean, default True
- Parse dates. Different default from read_table
- tupleize_cols : boolean, default False
- Write multi_index columns as a list of tuples (if True) or in the new, expanded format (if False)
- infer_datetime_format: boolean, default False
- If True and parse_dates is True for a column, try to infer the datetime format based on the first datetime string. If the format can be inferred, there often will be a large parsing speed-up.
Preferable to use read_table for most general purposes but from_csv makes for an easy roundtrip to and from file, especially with a DataFrame of time series data
y : DataFrame
-
from_dict
(data, orient='columns', dtype=None)¶ Construct DataFrame from dict of array-like or dicts
- data : dict
- {field : array-like} or {field : dict}
- orient : {‘columns’, ‘index’}, default ‘columns’
- The “orientation” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.
DataFrame
-
from_items
(items, columns=None, orient='columns')¶ Convert (key, value) pairs to DataFrame. The keys will be the axis index (usually the columns, but depends on the specified orientation). The values should be arrays or Series.
- items : sequence of (key, value) pairs
- Values should be arrays or Series.
- columns : sequence of column labels, optional
- Must be passed if orient=’index’.
- orient : {‘columns’, ‘index’}, default ‘columns’
- The “orientation” of the data. If the keys of the input correspond to column labels, pass ‘columns’ (default). Otherwise if the keys correspond to the index, pass ‘index’.
frame : DataFrame
-
from_records
(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)¶ Convert structured or record ndarray to DataFrame
- data : ndarray (structured dtype), list of tuples, dict, or DataFrame
- index : string, list of fields, array-like
- Field of array to use as the index, alternately a specific set of input labels to use
- exclude : sequence, default None
- Columns or fields to exclude
- columns : sequence, default None
- Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns)
- coerce_float : boolean, default False
- Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets
df : DataFrame
-
ftypes
¶ Return the ftypes (indication of sparse/dense and dtype) in this object.
-
ge
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods ge
-
get
(key, default=None)¶ Get item from object for given key (DataFrame column, Panel slice, etc.). Returns default value if not found
key : object
value : type of items contained in object
-
get_dtype_counts
()¶ Return the counts of dtypes in this object
-
get_ftype_counts
()¶ Return the counts of ftypes in this object
-
get_value
(index, col)¶ Quickly retrieve single value at passed column and index
- index : row label
- col : column label
value : scalar value
-
get_values
()¶ same as values (but handles sparseness conversions)
-
groupby
(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False)¶ Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns
- by : mapping function / list of functions, dict, Series, or tuple / list of column names
- Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups
- axis : int, default 0
- level : int, level name, or sequence of such, default None
- If the axis is a MultiIndex (hierarchical), group by a particular level or levels
- as_index : boolean, default True
- For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output
- sort : boolean, default True
- Sort group keys. Get better performance by turning this off
- group_keys : boolean, default True
- When calling apply, add group keys to index to identify pieces
- squeeze : boolean, default False
- reduce the dimensionality of the return type if possible, otherwise return a consistent type
# DataFrame result
>>> data.groupby(func, axis=0).mean()
# Series result
>>> data.groupby(['col1', 'col2'])['col3'].mean()
# DataFrame with hierarchical index
>>> data.groupby(['col1', 'col2']).mean()
GroupBy object
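The grouping patterns listed above can be tried on a small frame. The sketch below uses plain pandas and invented column values; it groups by column names only and shows what `as_index=False` changes.

```python
import pandas as pd

# Hypothetical data reusing the column names from the examples above.
data = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": ["x", "y", "x", "x"],
    "col3": [1.0, 2.0, 3.0, 5.0],
})

# Selecting one column before aggregating gives a Series indexed by group key.
per_col1 = data.groupby("col1")["col3"].mean()

# Grouping by two columns gives a DataFrame with a hierarchical index.
per_pair = data.groupby(["col1", "col2"]).mean()

# as_index=False keeps the group keys as ordinary columns ("SQL-style" output).
flat = data.groupby(["col1", "col2"], as_index=False).mean()

print(per_col1)
print(per_pair)
print(flat)
```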
-
gt
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods gt
-
head
(n=5)¶ Returns first n rows
-
hist
(data, column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, **kwds)¶ Draw histogram of the DataFrame’s series using matplotlib / pylab.
- data : DataFrame
- column : string or sequence
- If passed, will be used to limit data to a subset of columns
- by : object, optional
- If passed, then used to form histograms for separate groups
- grid : boolean, default True
- Whether to show axis grid lines
- xlabelsize : int, default None
- If specified changes the x-axis label size
- xrot : float, default None
- rotation of x axis labels
- ylabelsize : int, default None
- If specified changes the y-axis label size
- yrot : float, default None
- rotation of y axis labels
- ax : matplotlib axes object, default None
- sharex : bool, if True, the X axis will be shared amongst all subplots.
- sharey : bool, if True, the Y axis will be shared amongst all subplots.
- figsize : tuple
- The size of the figure to create in inches by default
- layout : (optional) a tuple (rows, columns) for the layout of the histograms
- kwds : other plotting keyword arguments
- To be passed to hist function
-
iat
¶
-
icol
(i)¶
-
idxmax
(axis=0, skipna=True)¶ Return index of first occurrence of maximum over requested axis. NA/null values are excluded.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be first index.
idxmax : Series
This method is the DataFrame version of ndarray.argmax.
Series.idxmax
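To illustrate the difference between the extreme value and its label, here is a small sketch on plain pandas with an invented frame. idxmax returns index labels, whereas max returns the values themselves.

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame(
    {"a": [1, 5, 3], "b": [9, 2, 4]},
    index=["r0", "r1", "r2"],
)

largest = df.max()               # maximum value of each column
where_col = df.idxmax()          # row label of the maximum in each column
where_row = df.idxmax(axis=1)    # column label of the maximum in each row

print(largest)
print(where_col)
print(where_row)
```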
-
idxmin
(axis=0, skipna=True)¶ Return index of first occurrence of minimum over requested axis. NA/null values are excluded.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
idxmin : Series
This method is the DataFrame version of ndarray.argmin.
Series.idxmin
-
iget_value
(i, j)¶
-
iloc
¶
-
info
(verbose=True, buf=None, max_cols=None)¶ Concise summary of a DataFrame.
- verbose : boolean, default True
- If False, don’t print column count summary
- buf : writable buffer, defaults to sys.stdout
- max_cols : int, default None
- Determines whether full summary or short summary is printed
-
insert
(loc, column, value, allow_duplicates=False)¶ Insert column into DataFrame at specified location.
If allow_duplicates is False, raises Exception if column is already contained in the DataFrame.
- loc : int
- Must have 0 <= loc <= len(columns)
- column : object
- value : int, Series, or array-like
-
interpolate
(method='linear', axis=0, limit=None, inplace=False, downcast='infer', **kwargs)¶ Interpolate values according to different methods.
- method : {‘linear’, ‘time’, ‘values’, ‘index’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘krogh’, ‘polynomial’, ‘spline’, ‘piecewise_polynomial’, ‘pchip’}
- ‘linear’: ignore the index and treat the values as equally spaced. default
- ‘time’: interpolation works on daily and higher resolution data to interpolate given length of interval
- ‘index’: use the actual numerical values of the index
- ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘polynomial’: passed to scipy.interpolate.interp1d with the order given. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=4)
- ‘krogh’, ‘piecewise_polynomial’, ‘spline’, and ‘pchip’ are all wrappers around the scipy interpolation methods of similar names. See the scipy documentation for more on their behavior: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
- axis : {0, 1}, default 0
- 0: fill column-by-column
- 1: fill row-by-row
- limit : int, default None.
- Maximum number of consecutive NaNs to fill.
- inplace : bool, default False
- Update the NDFrame in place if possible.
- downcast : optional, ‘infer’ or None, defaults to ‘infer’
- Downcast dtypes if possible.
Series or DataFrame of same shape interpolated at the NaNs
reindex, replace, fillna
# Filling in NaNs:
>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0    0
1    1
2    2
3    3
dtype: float64
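The difference between 'linear' and 'index', and the effect of `limit`, can be seen on two short series. This is a sketch on plain pandas with invented values; it avoids the scipy-backed methods so that no extra dependency is assumed.

```python
import numpy as np
import pandas as pd

# Two consecutive gaps between 0 and 3.
s = pd.Series([0.0, np.nan, np.nan, 3.0])
linear = s.interpolate()          # fills both gaps
limited = s.interpolate(limit=1)  # fills only the first NaN of the run

# Unevenly spaced index: the gap sits at position 1 of 0..3.
uneven = pd.Series([0.0, np.nan, 30.0], index=[0, 1, 3])
by_position = uneven.interpolate(method="linear")  # treats values as equally spaced
by_index = uneven.interpolate(method="index")      # uses the numeric index values

print(linear.tolist(), limited.tolist())
print(by_position.tolist(), by_index.tolist())
```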
-
irow
(i, copy=False)¶
-
is_copy
= None¶
-
isin
(values)¶ Return boolean DataFrame showing whether each element in the DataFrame is contained in values.
- values : iterable, Series, DataFrame or dictionary
- The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dictionary, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.
DataFrame of booleans
When values is a list:
>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> df.isin([1, 3, 12, 'a'])
       A      B
0   True   True
1  False  False
2   True  False
When values is a dict:
>>> df = DataFrame({'A': [1, 2, 3], 'B': [1, 4, 7]})
>>> df.isin({'A': [1, 3], 'B': [4, 7, 12]})
       A      B
0   True  False  # Note that B didn't match the 1 here.
1  False   True
2   True   True
When values is a Series or DataFrame:
>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> other = DataFrame({'A': [1, 3, 3, 2], 'B': ['e', 'f', 'f', 'e']})
>>> df.isin(other)
       A      B
0   True  False
1  False  False  # Column A in `other` has a 3, but not at index 1.
2   True   True
-
isnull
()¶ Return a boolean same-sized object indicating if the values are null
-
iteritems
()¶ Iterator over (column, series) pairs
-
iterkv
(*args, **kwargs)¶ iteritems alias used to get around 2to3. Deprecated
-
iterrows
()¶ Iterate over rows of DataFrame as (index, Series) pairs.
iterrows does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). For example,
>>> df = DataFrame([[1, 1.0]], columns=['x', 'y'])
>>> row = next(df.iterrows())[1]
>>> print(row['x'].dtype)
float64
>>> print(df['x'].dtype)
int64
- it : generator
- A generator that iterates over the rows of the frame.
-
itertuples
(index=True)¶ Iterate over rows of DataFrame as tuples, with index value as first element of the tuple
-
ix
¶
-
join
(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)¶ Join columns with other DataFrame either on index or on a key column. Efficiently Join multiple DataFrame objects by index at once by passing a list.
- other : DataFrame, Series with name field set, or list of DataFrame
- Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame
- on : column name, tuple/list of column names, or array-like
- Column(s) to use for joining, otherwise join on index. If multiple columns given, the passed DataFrame must have a MultiIndex. Can pass an array as the join key if not already contained in the calling DataFrame. Like an Excel VLOOKUP operation
- how : {‘left’, ‘right’, ‘outer’, ‘inner’}
How to handle indexes of the two objects. Default: ‘left’ for joining on index, None otherwise
- left: use calling frame’s index
- right: use input frame’s index
- outer: form union of indexes
- inner: use intersection of indexes
- lsuffix : string
- Suffix to use from left frame’s overlapping columns
- rsuffix : string
- Suffix to use from right frame’s overlapping columns
- sort : boolean, default False
- Order result DataFrame lexicographically by the join key. If False, preserves the index order of the calling (left) DataFrame
on, lsuffix, and rsuffix options are not supported when passing a list of DataFrame objects
joined : DataFrame
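A minimal sketch of an index-based join with overlapping column names, using plain pandas and invented frames. The suffixes are needed here because both frames carry a column called "v".

```python
import pandas as pd

# Hypothetical frames sharing the column name "v" and the index label "a".
left = pd.DataFrame({"v": [1, 2]}, index=["a", "b"])
right = pd.DataFrame({"v": [10, 30]}, index=["a", "c"])

# Default how='left': keep the calling frame's index.
left_join = left.join(right, lsuffix="_l", rsuffix="_r")

# 'outer' forms the union of both indexes, 'inner' their intersection.
outer_join = left.join(right, how="outer", lsuffix="_l", rsuffix="_r")
inner_join = left.join(right, how="inner", lsuffix="_l", rsuffix="_r")

print(left_join)
print(outer_join)
print(inner_join)
```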
-
keys
()¶ Get the ‘info axis’ (see Indexing for more)
This is index for Series, columns for DataFrame and major_axis for Panel.
-
kurt
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased kurtosis over requested axis. Normalized by N-1
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
kurt : Series or DataFrame (if level specified)
-
kurtosis
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased kurtosis over requested axis. Normalized by N-1
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
kurt : Series or DataFrame (if level specified)
-
last
(offset)¶ Convenience method for subsetting final periods of time series data based on a date offset
offset : string, DateOffset, dateutil.relativedelta
ts.last(‘5M’) -> Last 5 months
subset : type of caller
-
last_valid_index
()¶ Return label for last non-NA/null value
-
le
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods le
-
load
(path)¶ Deprecated. Use read_pickle instead.
-
loc
¶
-
lookup
(row_labels, col_labels)¶ Label-based “fancy indexing” function for DataFrame. Given equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair.
- row_labels : sequence
- The row labels to use for lookup
- col_labels : sequence
- The column labels to use for lookup
Akin to:
result = []
for row, col in zip(row_labels, col_labels):
    result.append(df.get_value(row, col))
- values : ndarray
- The found values
-
lt
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods lt
-
mad
(axis=None, skipna=None, level=None, **kwargs)¶ Return the mean absolute deviation of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
mad : Series or DataFrame (if level specified)
-
mask
(cond)¶ Returns copy whose values are replaced with nan if the inverted condition is True
cond : boolean NDFrame or array
wh: same as input
-
max
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ This method returns the maximum of the values in the object. If you want the index of the maximum, use idxmax. This is the equivalent of the numpy.ndarray method argmax.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
max : Series or DataFrame (if level specified)
-
mean
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the mean of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
mean : Series or DataFrame (if level specified)
-
median
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the median of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
median : Series or DataFrame (if level specified)
-
merge
(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True)¶ Merge DataFrame objects by performing a database-style join operation by columns or indexes.
If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on.
- right : DataFrame
- how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’
- left: use only keys from left frame (SQL: left outer join)
- right: use only keys from right frame (SQL: right outer join)
- outer: use union of keys from both frames (SQL: full outer join)
- inner: use intersection of keys from both frames (SQL: inner join)
- on : label or list
- Field names to join on. Must be found in both DataFrames. If on is None and not merging on indexes, then it merges on the intersection of the columns by default.
- left_on : label or list, or array-like
- Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns
- right_on : label or list, or array-like
- Field names to join on in right DataFrame or vector/list of vectors per left_on docs
- left_index : boolean, default False
- Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels
- right_index : boolean, default False
- Use the index from the right DataFrame as the join key. Same caveats as left_index
- sort : boolean, default False
- Sort the join keys lexicographically in the result DataFrame
- suffixes : 2-length sequence (tuple, list, ...)
- Suffix to apply to overlapping column names in the left and right side, respectively
- copy : boolean, default True
- If False, do not copy data unnecessarily
>>> A              >>> B
    lkey value         rkey value
0   foo  1         0   foo  5
1   bar  2         1   bar  6
2   baz  3         2   qux  7
3   foo  4         3   bar  8
>>> merge(A, B, left_on='lkey', right_on='rkey', how='outer')
   lkey  value_x  rkey  value_y
0  bar   2        bar   6
1  bar   2        bar   8
2  baz   3        NaN   NaN
3  foo   1        foo   5
4  foo   4        foo   5
5  NaN   NaN      qux   7
merged : DataFrame
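The example tables above can be rebuilt as a runnable sketch. It uses pd.merge from plain pandas on the same A and B frames; the row order of the outer result may differ between pandas versions, so only row counts and column names are relied on here.

```python
import pandas as pd

# The frames A and B from the example above.
A = pd.DataFrame({"lkey": ["foo", "bar", "baz", "foo"], "value": [1, 2, 3, 4]})
B = pd.DataFrame({"rkey": ["foo", "bar", "qux", "bar"], "value": [5, 6, 7, 8]})

# Outer join: union of keys, unmatched sides are filled with NaN.
outer = pd.merge(A, B, left_on="lkey", right_on="rkey", how="outer")

# Inner join: only keys present in both frames ("foo" and "bar").
inner = pd.merge(A, B, left_on="lkey", right_on="rkey", how="inner")

print(outer)
print(inner)
```

The overlapping column "value" receives the default suffixes "_x" and "_y".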
-
merge_results
(others)¶ Merges results of the same type and returns a merged result
Parameters: others (list(AResult)/AResult) – A (list of) AResult object(s) of the same class
Returns: A new merged AResult object
Return type: AResult
-
min
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ This method returns the minimum of the values in the object. If you want the index of the minimum, use idxmin. This is the equivalent of the numpy.ndarray method argmin.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
min : Series or DataFrame (if level specified)
-
mod
(other, axis='columns', level=None, fill_value=None)¶ Binary operator mod with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
mode
(axis=0, numeric_only=False)¶ Gets the mode of each element along the axis selected. Empty if nothing has 2+ occurrences. Adds a row for each mode per label, fills in gaps with nan.
- axis : {0, 1, ‘index’, ‘columns’} (default 0)
- 0/’index’ : get mode of each column
- 1/’columns’ : get mode of each row
- numeric_only : boolean, default False
- if True, only apply to numeric columns
modes : DataFrame (sorted)
-
mul
(other, axis='columns', level=None, fill_value=None)¶ Binary operator mul with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
multiply
(other, axis='columns', level=None, fill_value=None)¶ Binary operator mul with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
ndim
¶ Number of axes / array dimensions
-
ne
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods ne
-
notnull
()¶ Return a boolean same-sized object indicating if the values are not null
-
pct_change
(periods=1, fill_method='pad', limit=None, freq=None, **kwds)¶ Percent change over given number of periods
- periods : int, default 1
- Periods to shift for forming percent change
- fill_method : str, default ‘pad’
- How to handle NAs before computing percent changes
- limit : int, default None
- The number of consecutive NAs to fill before stopping
- freq : DateOffset, timedelta, or offset alias string, optional
- Increment to use from time series API (e.g. ‘M’ or BDay())
chg : same type as caller
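As a rough illustration of the periods parameter, the sketch below computes one-period and two-period changes. It assumes pandas is installed; the `prices` frame is made up and contains no missing values, so fill_method plays no role here.

```python
import pandas as pd

prices = pd.DataFrame({"price": [100.0, 110.0, 99.0]})

# Change relative to the previous row: NaN, +10%, -10%.
one_step = prices.pct_change()

# Change relative to the row two positions earlier: NaN, NaN, -1%.
two_step = prices.pct_change(periods=2)
print(one_step)
print(two_step)
```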
-
pivot
(index=None, columns=None, values=None)¶ Reshape data (produce a “pivot” table) based on column values. Uses unique values from index / columns to form axes and return either DataFrame or Panel, depending on whether you request a single value column (DataFrame) or all columns (Panel)
- index : string or object
- Column name to use to make new frame’s index
- columns : string or object
- Column name to use to make new frame’s columns
- values : string or object, optional
- Column name to use for populating new frame’s values
For finer-tuned control, see hierarchical indexing documentation along with the related stack/unstack methods
>>> df
   foo bar  baz
0  one   A   1.
1  one   B   2.
2  one   C   3.
3  two   A   4.
4  two   B   5.
5  two   C   6.
>>> df.pivot('foo', 'bar', 'baz')
     A  B  C
one  1  2  3
two  4  5  6
>>> df.pivot('foo', 'bar')['baz']
     A  B  C
one  1  2  3
two  4  5  6
- pivoted : DataFrame
- If no values column specified, will have hierarchically indexed columns
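The doctest above can be reproduced as a self-contained script. This is a sketch assuming a pandas installation; it passes index, columns and values as keywords, since some pandas releases may not accept them positionally.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["one", "one", "one", "two", "two", "two"],
    "bar": ["A", "B", "C", "A", "B", "C"],
    "baz": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Unique values of "foo" become the row labels, unique values of "bar"
# become the column labels, and "baz" fills the cells.
wide = df.pivot(index="foo", columns="bar", values="baz")
print(wide)
```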
-
pivot_table
(data, values=None, rows=None, cols=None, aggfunc='mean', fill_value=None, margins=False, dropna=True)¶ Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame
- data : DataFrame
- values : column to aggregate, optional
- rows : list of column names or arrays to group on
- Keys to group on the x-axis of the pivot table
- cols : list of column names or arrays to group on
- Keys to group on the y-axis of the pivot table
- aggfunc : function, default numpy.mean, or list of functions
- If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)
- fill_value : scalar, default None
- Value to replace missing values with
- margins : boolean, default False
- Add all row / columns (e.g. for subtotal / grand totals)
- dropna : boolean, default True
- Do not include columns whose entries are all NaN
>>> df
     A    B      C  D
0  foo  one  small  1
1  foo  one  large  2
2  foo  one  large  2
3  foo  two  small  3
4  foo  two  small  3
5  bar  one  large  4
6  bar  one  small  5
7  bar  two  small  6
8  bar  two  large  7
>>> table = pivot_table(df, values='D', rows=['A', 'B'],
...                     cols=['C'], aggfunc=np.sum)
>>> table
          small  large
foo  one  1      4
     two  6      NaN
bar  one  5      4
     two  6      7
table : DataFrame
-
plot
(frame=None, x=None, y=None, subplots=False, sharex=True, sharey=False, use_index=True, figsize=None, grid=None, legend=True, rot=None, ax=None, style=None, title=None, xlim=None, ylim=None, logx=False, logy=False, xticks=None, yticks=None, kind='line', sort_columns=False, fontsize=None, secondary_y=False, **kwds)¶ Make line, bar, or scatter plots of DataFrame series with the index on the x-axis using matplotlib / pylab.
- frame : DataFrame
- x : label or position, default None
- y : label or position, default None
- Allows plotting of one column versus another
- subplots : boolean, default False
- Make separate subplots for each time series
- sharex : boolean, default True
- In case subplots=True, share x axis
- sharey : boolean, default False
- In case subplots=True, share y axis
- use_index : boolean, default True
- Use index as ticks for x axis
- stacked : boolean, default False
- If True, create stacked bar plot. Only valid for DataFrame input
- sort_columns: boolean, default False
- Sort column names to determine plot ordering
- title : string
- Title to use for the plot
- grid : boolean, default None (matlab style default)
- Axis grid lines
- legend : boolean, default True
- Place legend on axis subplots
- ax : matplotlib axis object, default None
- style : list or dict
- matplotlib line style per column
- kind : {‘line’, ‘bar’, ‘barh’, ‘kde’, ‘density’, ‘scatter’}
- bar : vertical bar plot; barh : horizontal bar plot; kde/density : Kernel Density Estimation plot; scatter : scatter plot
- logx : boolean, default False
- For line plots, use log scaling on x axis
- logy : boolean, default False
- For line plots, use log scaling on y axis
- xticks : sequence
- Values to use for the xticks
- yticks : sequence
- Values to use for the yticks
- xlim : 2-tuple/list
- ylim : 2-tuple/list
- rot : int, default None
- Rotation for ticks
- secondary_y : boolean or sequence, default False
- Whether to plot on the secondary y-axis. If a list/tuple, which columns to plot on secondary y-axis
- mark_right: boolean, default True
- When using a secondary_y axis, should the legend label the axis of the various columns automatically
- colormap : str or matplotlib colormap object, default None
- Colormap to select colors from. If string, load colormap with that name from matplotlib.
- kwds : keywords
- Options to pass to matplotlib plotting method
ax_or_axes : matplotlib.AxesSubplot or list of them
-
pop
(item)¶ Return item and drop from frame. Raise KeyError if not found.
-
pow
(other, axis='columns', level=None, fill_value=None)¶ Binary operator pow with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
prod
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the product of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
prod : Series or DataFrame (if level specified)
-
product
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the product of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
prod : Series or DataFrame (if level specified)
-
quantile
(q=0.5, axis=0, numeric_only=True)¶ Return values at the given quantile over requested axis, a la scoreatpercentile in scipy.stats
- q : quantile, default 0.5 (50% quantile)
- 0 <= q <= 1
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
quantiles : Series
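A short sketch of the q parameter, assuming pandas is installed and that the default (linear) interpolation between neighbouring values is in effect; the frame is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# q=0.5 is the median of each column.
median = df.quantile(0.5)

# q=0.25 falls between the first and second value of each column.
lower = df.quantile(0.25)
print(median)
print(lower)
```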
-
query
(expr, **kwargs)¶ Query the columns of a frame with a boolean expression.
- expr : string
- The query string to evaluate. The result of the evaluation of this expression is first passed to loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to __getitem__().
- kwargs : dict
- See the documentation for eval() for complete details on the keyword arguments accepted by query().
q : DataFrame or Series
This method uses the top-level eval() function to evaluate the passed query.
The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins, and and or. This is syntactically valid Python, however the semantics are different.
You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.
The index and columns attributes of the DataFrame instance are placed in the namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for this variable, and you can also use the name of the index to identify it in a query.
For further details and examples see the query documentation in indexing.
pandas.eval, DataFrame.eval
>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.query('a > b')
>>> df[df.a > df.b]  # same result as the previous expression
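The operator-precedence remark can be checked with fixed data instead of random numbers. This is a sketch assuming pandas is installed and the default query parser is used; the frame is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [4, 3, 2, 1]})

# A query string and the equivalent boolean mask select the same rows.
by_query = df.query("a > b")
by_mask = df[df.a > df.b]

# Inside query(), & binds like `and`, so the comparisons need no
# parentheses. Plain Python indexing does need them.
combined = df.query("a > 1 & b > 1")
explicit = df[(df.a > 1) & (df.b > 1)]
print(combined)
```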
-
radd
(other, axis='columns', level=None, fill_value=None)¶ Binary operator radd with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rank
(axis=0, numeric_only=None, method='average', na_option='keep', ascending=True)¶ Compute numerical data ranks (1 through n) along axis. Equal values are assigned a rank that is the average of the ranks of those values
- axis : {0, 1}, default 0
- Ranks over columns (0) or rows (1)
- numeric_only : boolean, default None
- Include only float, int, boolean data
- method : {‘average’, ‘min’, ‘max’, ‘first’}
- average: average rank of group
- min: lowest rank in group
- max: highest rank in group
- first: ranks assigned in order they appear in the array
- na_option : {‘keep’, ‘top’, ‘bottom’}
- keep: leave NA values where they are
- top: smallest rank if ascending
- bottom: smallest rank if descending
- ascending : boolean, default True
- False for ranks by high (1) to low (N)
ranks : DataFrame
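How the method options treat ties can be sketched on a column with one tied pair. This assumes pandas is installed; the `score` column is made up.

```python
import pandas as pd

df = pd.DataFrame({"score": [10, 20, 20, 30]})

# The two 20s occupy ranks 2 and 3.
average = df.rank()                   # both get 2.5
lowest = df.rank(method="min")        # both get 2
in_order = df.rank(method="first")    # 2 then 3, by order of appearance
descending = df.rank(ascending=False) # highest value gets rank 1
print(average)
```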
-
rdiv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
reindex
(index=None, columns=None, **kwargs)¶ Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False
- index, columns : array-like, optional (can be specified in order, or as keywords)
- New labels / index to conform to. Preferably an Index object to avoid duplicating data
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- Method to use for filling holes in reindexed DataFrame. pad / ffill: propagate last valid observation forward to next valid. backfill / bfill: use NEXT valid observation to fill gap
- copy : boolean, default True
- Return a new object, even if the passed indexes are the same
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
- fill_value : scalar, default np.NaN
- Value to use for missing values. Defaults to NaN, but can be any “compatible” value
- limit : int, default None
- Maximum size gap to forward or backward fill
- takeable : boolean, default False
- treat the passed as positional values
>>> df.reindex(index=[date1, date2, date3], columns=['A', 'B', 'C'])
reindexed : DataFrame
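The filling options can be compared side by side. This is a sketch assuming pandas is installed; the frame and the new index are invented.

```python
import pandas as pd

df = pd.DataFrame({"v": [1.0, 2.0]}, index=[0, 2])
new_index = [0, 1, 2, 3]

# Labels missing from the old index get NaN by default.
gaps = df.reindex(index=new_index)

# fill_value substitutes a constant for those new labels.
zeros = df.reindex(index=new_index, fill_value=0.0)

# method='ffill' propagates the last valid observation forward.
padded = df.reindex(index=new_index, method="ffill")
print(padded)
```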
-
reindex_axis
(labels, axis=0, method=None, level=None, copy=True, limit=None, fill_value=nan)¶ Conform input object to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False
- index : array-like, optional
- New labels / index to conform to. Preferably an Index object to avoid duplicating data
- axis : {0, 1, ‘index’, ‘columns’}
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- Method to use for filling holes in reindexed object. pad / ffill: propagate last valid observation forward to next valid. backfill / bfill: use NEXT valid observation to fill gap
- copy : boolean, default True
- Return a new object, even if the passed indexes are the same
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
- limit : int, default None
- Maximum size gap to forward or backward fill
>>> df.reindex_axis(['A', 'B', 'C'], axis=1)
reindex, reindex_like
reindexed : DataFrame
-
reindex_like
(other, method=None, copy=True, limit=None)¶ Return an object with indices matching those of other
- other : Object
- method : string or None
- copy : boolean, default True
- limit : int, default None
- Maximum size gap to forward or backward fill
Like calling s.reindex(index=other.index, columns=other.columns, method=...)
reindexed : same as input
-
rename
(index=None, columns=None, **kwargs)¶ Alter axes using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.
- index, columns : dict-like or function, optional
- Transformation to apply to that axis values
- copy : boolean, default True
- Also copy underlying data
- inplace : boolean, default False
- Whether to return a new DataFrame. If True then value of copy is ignored.
renamed : DataFrame (new object)
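Both accepted forms, a dict and a function, can be sketched in one call. This assumes pandas is installed; the labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2]}, index=["r"])

# A dict renames only the labels it mentions ("b" is left as-is);
# a function is applied to every label on that axis.
renamed = df.rename(columns={"a": "x"}, index=str.upper)
print(renamed)

# With the default inplace=False the original frame is unchanged.
print(df)
```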
-
rename_axis
(mapper, axis=0, copy=True, inplace=False)¶ Alter index and / or columns using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.
- mapper : dict-like or function, optional
- axis : int or string, default 0
- copy : boolean, default True
- Also copy underlying data
- inplace : boolean, default False
renamed : type of caller
-
reorder_levels
(order, axis=0)¶ Rearrange index levels using input order. May not drop or duplicate levels
- order : list of int or list of str
- List representing new level order. Reference level by number (position) or by key (label).
- axis : int
- Where to reorder levels.
type of caller (new object)
-
replace
(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None)¶ Replace values given in ‘to_replace’ with ‘value’.
to_replace : str, regex, list, dict, Series, numeric, or None
str or regex:
- str: string exactly matching to_replace will be replaced with value
- regex: regexes matching to_replace will be replaced with value
list of str, regex, or numeric:
- First, if to_replace and value are both lists, they must be the same length.
- Second, if regex=True then all of the strings in both lists will be interpreted as regexes; otherwise they will match directly. This doesn’t matter much for value since there are only a few possible substitution regexes you can use.
- str and regex rules apply as above.
dict:
- Nested dictionaries, e.g., {‘a’: {‘b’: nan}}, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with nan. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.
- Keys map to column names and values map to substitution values. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
None:
- This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also None then this must be a nested dictionary or Series.
See the examples section for examples of each of these.
- value : scalar, dict, list, str, regex, default None
- Value to use to fill holes (e.g. 0), alternately a dict of values specifying which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.
- inplace : boolean, default False
- If True, in place. Note: this will modify any other views on this object (e.g. a column from a DataFrame). Returns the caller if this is True.
- limit : int, default None
- Maximum size gap to forward or backward fill
- regex : bool or same types as to_replace, default False
- Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Otherwise, to_replace must be None because this parameter will be interpreted as a regular expression or a list, dict, or array of regular expressions.
- method : string, optional, {‘pad’, ‘ffill’, ‘bfill’}
- The method to use for replacement, when to_replace is a list.
NDFrame.reindex, NDFrame.asfreq, NDFrame.fillna
filled : NDFrame
- AssertionError
- If regex is not a bool and to_replace is not None.
- TypeError
- If to_replace is a dict and value is not a list, dict, ndarray, or Series
- If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.
- ValueError
- If to_replace and value are lists or ndarrays, but they are not the same length.
- Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.
- Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.
- This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
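Three of the to_replace forms described above (scalar, nested dict, regex) can be sketched as follows. This assumes pandas and NumPy are installed; the frame and its values are invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["b", "c"], "n": [0, 1]})

# Scalar: every 0 in the frame becomes 5.
scalar = df.replace(0, 5)

# Nested dict: in column "a", replace the value "b" with NaN.
nested = df.replace({"a": {"b": np.nan}})

# Regex: strings matching the pattern are substituted; the numeric
# column "n" is not touched because regexes only apply to strings.
by_regex = df.replace(to_replace=r"^c$", value="z", regex=True)
print(by_regex)
```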
-
resample
(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0)¶ Convenience method for frequency conversion and resampling of regular time-series data.
- rule : string
- the offset string or object representing target conversion
- how : string
- method for down- or re-sampling, default to ‘mean’ for downsampling
- axis : int, optional, default 0
- fill_method : string, default None
- fill_method for upsampling
- closed : {‘right’, ‘left’}
- Which side of bin interval is closed
- label : {‘right’, ‘left’}
- Which bin edge label to label bucket with
- convention : {‘start’, ‘end’, ‘s’, ‘e’}
- kind : “period”/”timestamp”
- loffset : timedelta
- Adjust the resampled time labels
- limit : int, default None
- Maximum size gap when reindexing with fill_method
- base : int, default 0
- For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0
-
reset_index
(level=None, drop=False, inplace=False, col_level=0, col_fill='')¶ For DataFrame with multi-level index, return new DataFrame with labeling information in the columns under the index names, defaulting to ‘level_0’, ‘level_1’, etc. if any are None. For a standard index, the index name will be used (if set), otherwise a default ‘index’ or ‘level_0’ (if ‘index’ is already taken) will be used.
- level : int, str, tuple, or list, default None
- Only remove the given levels from the index. Removes all levels by default
- drop : boolean, default False
- Do not try to insert index into dataframe columns. This resets the index to the default integer index.
- inplace : boolean, default False
- Modify the DataFrame in place (do not create a new object)
- col_level : int or str, default 0
- If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.
- col_fill : object, default ‘’
- If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.
resetted : DataFrame
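The default column names and the level and drop options can be sketched on a two-level index whose second level is unnamed. This assumes pandas is installed; the index and column names are hypothetical.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)], names=["key", None])
df = pd.DataFrame({"v": [10, 20]}, index=idx)

# Every level moves into the columns; the unnamed second level is
# labelled 'level_1'.
flat = df.reset_index()

# Only the "key" level is removed; the other level stays as the index.
partial = df.reset_index(level="key")

# drop=True discards the index instead of inserting it as columns.
dropped = df.reset_index(drop=True)
print(flat)
```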
-
rfloordiv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rfloordiv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rmod
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rmod with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rmul
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rmul with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rpow
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rpow with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rsub
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rsub with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rtruediv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
save
(path)¶ Deprecated. Use to_pickle instead
-
select
(crit, axis=0)¶ Return data corresponding to axis labels matching criteria
- crit : function
- To be called on each index (label). Should return True or False
axis : int
selection : type of caller
-
set_index
(keys, drop=True, append=False, inplace=False, verify_integrity=False)¶ Set the DataFrame index (row labels) using one or more existing columns. By default yields a new object.
- keys : column label or list of column labels / arrays
- drop : boolean, default True
- Delete columns to be used as the new index
- append : boolean, default False
- Whether to append columns to existing index
- inplace : boolean, default False
- Modify the DataFrame in place (do not create a new object)
- verify_integrity : boolean, default False
- Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method
>>> indexed_df = df.set_index(['A', 'B'])
>>> indexed_df2 = df.set_index(['A', [0, 1, 2, 0, 1, 2]])
>>> indexed_df3 = df.set_index([[0, 1, 2, 0, 1, 2]])
dataframe : DataFrame
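The drop and append options can be sketched on a small frame. This assumes pandas is installed; the column names follow the doctest above, and the values are made up.

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "y"], "B": [1, 2], "C": [0.5, 1.5]})

# Two columns become a two-level index and are removed from the columns.
indexed = df.set_index(["A", "B"])

# drop=False keeps "A" as a column as well as using it for the index.
kept = df.set_index("A", drop=False)

# append=True adds "A" as an extra level next to the existing index.
appended = df.set_index("A", append=True)
print(indexed)
```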
-
set_value
(index, col, value)¶ Put single value at passed column and index
- index : row label
- col : column label
- value : scalar value
- frame : DataFrame
- If label pair is contained, will be reference to calling DataFrame, otherwise a new object
-
shape
¶
-
shift
(periods=1, freq=None, axis=0, **kwds)¶ Shift index by desired number of periods with an optional time freq
- periods : int
- Number of periods to move, can be positive or negative
- freq : DateOffset, timedelta, or time rule string, optional
- Increment to use from datetools module or time rule (e.g. ‘EOM’)
If freq is specified then the index values are shifted but the data is not realigned
shifted : same type as caller
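The difference between shifting the data and shifting the index can be sketched on a short daily series. This assumes pandas is installed; the dates and values are invented.

```python
import pandas as pd

idx = pd.date_range("2020-01-01", periods=3, freq="D")
df = pd.DataFrame({"v": [1.0, 2.0, 3.0]}, index=idx)

# Without freq the values move down one row and the index is unchanged.
moved_data = df.shift(1)

# With freq the index labels move one day forward and the values keep
# their original order.
moved_index = df.shift(1, freq="D")
print(moved_data)
print(moved_index)
```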
-
skew
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased skew over requested axis. Normalized by N-1
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
skew : Series or DataFrame (if level specified)
-
sort
(columns=None, column=None, axis=0, ascending=True, inplace=False)¶ Sort DataFrame either by labels (along either axis) or by the values in column(s)
- columns : object
- Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
- ascending : boolean or list, default True
- Sort ascending vs. descending. Specify list for multiple sort orders
- axis : {0, 1}
- Sort index/rows versus columns
- inplace : boolean, default False
- Sort the DataFrame without creating a new instance
>>> result = df.sort(['A', 'B'], ascending=[1, 0])
sorted : DataFrame
-
sort_index
(axis=0, by=None, ascending=True, inplace=False, kind='quicksort')¶ Sort DataFrame either by labels (along either axis) or by the values in a column
- axis : {0, 1}
- Sort index/rows versus columns
- by : object
- Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
- ascending : boolean or list, default True
- Sort ascending vs. descending. Specify list for multiple sort orders
- inplace : boolean, default False
- Sort the DataFrame without creating a new instance
>>> result = df.sort_index(by=['A', 'B'], ascending=[True, False])
sorted : DataFrame
-
sortlevel
(level=0, axis=0, ascending=True, inplace=False)¶ Sort multilevel index by chosen axis and primary level. Data will be lexicographically sorted by the chosen level followed by the other levels (in order)
- level : int
- axis : {0, 1}
- ascending : boolean, default True
- inplace : boolean, default False
- Sort the DataFrame without creating a new instance
sorted : DataFrame
-
squeeze
()¶ squeeze length 1 dimensions
-
stack
(level=-1, dropna=True)¶ Pivot a level of the (possibly hierarchical) column labels, returning a DataFrame (or Series in the case of an object with a single level of column labels) having a hierarchical index with a new inner-most level of row labels.
- level : int, string, or list of these, default last level
- Level(s) to stack, can pass level name
- dropna : boolean, default True
- Whether to drop rows in the resulting Frame/Series with no valid values
>>> s
     a   b
one  1.  2.
two  3.  4.
>>> s.stack()
one a    1
    b    2
two a    3
    b    4
stacked : DataFrame or Series
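The doctest above can be run as a script. This is a sketch assuming pandas is installed; with a single level of column labels, the result is a Series.

```python
import pandas as pd

s = pd.DataFrame({"a": [1.0, 3.0], "b": [2.0, 4.0]}, index=["one", "two"])

# The column labels become a new inner-most level of the row index.
stacked = s.stack()
print(stacked)

# Individual values are addressed by (row label, former column label).
print(stacked.loc[("two", "a")])
```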
-
std
(axis=None, skipna=None, level=None, ddof=1, **kwargs)¶ Return unbiased standard deviation over requested axis. Normalized by N-1
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
stdev : Series or DataFrame (if level specified)
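The N-1 normalization corresponds to the default ddof=1 in the signature. The sketch below contrasts it with ddof=0, assuming pandas is installed; the values are made up.

```python
import pandas as pd

df = pd.DataFrame({"v": [1.0, 2.0, 3.0, 4.0]})

# Squared deviations from the mean (2.5) sum to 5.0.
sample = df.std()            # divides by N-1 = 3
population = df.std(ddof=0)  # divides by N = 4
print(sample["v"], population["v"])
```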
-
sub
(other, axis='columns', level=None, fill_value=None)¶ Binary operator sub with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
subtract
(other, axis='columns', level=None, fill_value=None)¶ Binary operator sub with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
sum
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the sum of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
sum : Series or DataFrame (if level specified)
-
swapaxes
(axis1, axis2, copy=True)¶ Interchange axes and swap values axes appropriately
y : same as input
-
swaplevel
(i, j, axis=0)¶ Swap levels i and j in a MultiIndex on a particular axis
- i, j : int, string (can be mixed)
- Level of index to be swapped. Can pass level name as string.
swapped : type of caller (new object)
-
tail
(n=5)¶ Returns last n rows
-
take
(indices, axis=0, convert=True, is_copy=True)¶ Analogous to ndarray.take
- indices : list / array of ints
- axis : int, default 0
- convert : translate neg to pos indices (default)
- is_copy : mark the returned frame as a copy
taken : type of caller
-
to_clipboard
(excel=None, sep=None, **kwargs)¶ Attempt to write text representation of object to the system clipboard. This can be pasted into Excel, for example.
- excel : boolean, defaults to True
- if True, use the provided separator, writing in a csv format for allowing easy pasting into excel. if False, write a string representation of the object to the clipboard
- sep : optional, defaults to tab
Other keywords are passed to to_csv
- Requirements for your platform
- Linux: xclip, or xsel (with gtk or PyQt4 modules)
- Windows: none
- OS X: none
-
to_csv
(path_or_buf, sep=', ', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, mode='w', nanRep=None, encoding=None, quoting=None, line_terminator='\n', chunksize=None, tupleize_cols=False, date_format=None, **kwds)¶ Write DataFrame to a comma-separated values (csv) file
- path_or_buf : string or file handle / StringIO
- File path
- sep : character, default ‘,’
- Field delimiter for the output file.
- na_rep : string, default ‘’
- Missing data representation
- float_format : string, default None
- Format string for floating point numbers
- cols : sequence, optional
- Columns to write
- header : boolean or list of string, default True
- Write out column names. If a list of string is given it is assumed to be aliases for the column names
- index : boolean, default True
- Write row names (index)
- index_label : string or sequence, or False, default None
- Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R
- nanRep : None
- deprecated, use na_rep
- mode : str
- Python write mode, default ‘w’
- encoding : string, optional
- a string representing the encoding to use if the contents are non-ascii, for python versions prior to 3
- line_terminator : string, default ‘\n’
- The newline character or character sequence to use in the output file
- quoting : optional constant from csv module
- defaults to csv.QUOTE_MINIMAL
- chunksize : int or None
- rows to write at a time
- tupleize_cols : boolean, default False
- write multi_index columns as a list of tuples (if True) or in the new, expanded format (if False)
- date_format : string, default None
- Format string for datetime objects.
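The parameters above can be tried without touching the file system by writing into an in-memory buffer. This is a minimal sketch, assuming a recent pandas release; the peptide labels and the `score` column are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical result table with one missing score.
df = pd.DataFrame({"score": [1.5, np.nan]}, index=["p1", "p2"])

buf = io.StringIO()
# Tab-separated output, "NA" for missing values, and a header for the index column.
df.to_csv(buf, sep="\t", na_rep="NA", index_label="peptide")
lines = buf.getvalue().splitlines()
```

Here `lines` holds the header row followed by one row per index label, with the missing score written as "NA".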
-
to_dense
()¶ Return dense representation of NDFrame (as opposed to sparse)
-
to_dict
(outtype='dict')¶ Convert DataFrame to dictionary.
- outtype : str {‘dict’, ‘list’, ‘series’, ‘records’}
- Determines the type of the values of the dictionary. The default dict is a nested dictionary {column -> {index -> value}}. list returns {column -> list(values)}. series returns {column -> Series(values)}. records returns [{columns -> value}]. Abbreviations are allowed.
result : dict like {column -> {index -> value}}
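The output shapes listed above can be compared side by side. Note that this sketch assumes a recent pandas release, where the first argument is named `orient` rather than `outtype` and abbreviations are no longer accepted; passing the value positionally with its full name sidesteps the naming difference.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

nested = df.to_dict()             # {column -> {index -> value}}
as_lists = df.to_dict("list")     # {column -> [values]}
records = df.to_dict("records")   # [{column -> value}, ...]
```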
-
to_excel
(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True)¶ Write DataFrame to an Excel sheet
- excel_writer : string or ExcelWriter object
- File path or existing ExcelWriter
- sheet_name : string, default ‘Sheet1’
- Name of sheet which will contain DataFrame
- na_rep : string, default ‘’
- Missing data representation
- float_format : string, default None
- Format string for floating point numbers
- cols : sequence, optional
- Columns to write
- header : boolean or list of string, default True
- Write out column names. If a list of string is given it is assumed to be aliases for the column names
- index : boolean, default True
- Write row names (index)
- index_label : string or sequence, default None
- Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
- startrow :
- upper left cell row to dump data frame
- startcol :
- upper left cell column to dump data frame
- engine : string, default None
- write engine to use - you can also set this via the options io.excel.xlsx.writer, io.excel.xls.writer, and io.excel.xlsm.writer.
- merge_cells : boolean, default True
- Write MultiIndex and Hierarchical Rows as merged cells.
If passing an existing ExcelWriter object, then the sheet will be added to the existing workbook. This can be used to save different DataFrames to one workbook:
>>> writer = ExcelWriter('output.xlsx')
>>> df1.to_excel(writer,'Sheet1')
>>> df2.to_excel(writer,'Sheet2')
>>> writer.save()
-
to_gbq
(destination_table, schema=None, col_order=None, if_exists='fail', **kwargs)¶ Write a DataFrame to a Google BigQuery table.
If the table exists, the DataFrame will be appended. If not, a new table will be created, in which case the schema will have to be specified. By default, rows will be written in the order they appear in the DataFrame, though the user may specify an alternative order.
- destination_table : string
- name of table to be written, in the form ‘dataset.tablename’
- schema : sequence (optional)
- list of column types in order for data to be inserted, e.g. [‘INTEGER’, ‘TIMESTAMP’, ‘BOOLEAN’]
- col_order : sequence (optional)
- order which columns are to be inserted, e.g. [‘primary_key’, ‘birthday’, ‘username’]
- if_exists : {‘fail’, ‘replace’, ‘append’} (optional)
- fail: If table exists, do nothing.
- replace: If table exists, drop it, recreate it, and insert data.
- append: If table exists, insert data. Create if does not exist.
kwargs are passed to the Client constructor
- SchemaMissing :
- Raised if the ‘if_exists’ parameter is set to ‘replace’, but no schema is specified
- TableExists :
- Raised if the specified ‘destination_table’ exists but the ‘if_exists’ parameter is set to ‘fail’ (the default)
- InvalidSchema :
- Raised if the ‘schema’ parameter does not match the provided DataFrame
-
to_hdf
(path_or_buf, key, **kwargs)¶ activate the HDFStore
- path_or_buf : the path (string) or buffer to put the store
- key : string
- identifier for the group in the store
- mode : optional, {‘a’, ‘w’, ‘r’, ‘r+’}, default ‘a’
'r'
- Read-only; no data can be modified.
'w'
- Write; a new file is created (an existing file with the same name would be deleted).
'a'
- Append; an existing file is opened for reading and writing, and if the file does not exist it is created.
'r+'
- It is similar to 'a', but the file must already exist.
- format : ‘fixed(f)|table(t)’, default is ‘fixed’
- fixed(f) : Fixed format
- Fast writing/reading. Not-appendable, nor searchable
- table(t) : Table format
- Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data
- append : boolean, default False
- For Table formats, append the input data to the existing
- complevel : int, 1-9, default 0
- If a complib is specified compression will be applied where possible
- complib : {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’, None}, default None
- If complevel is > 0 apply compression to objects written in the store wherever possible
- fletcher32 : bool, default False
- If applying compression use the fletcher32 checksum
-
to_html
(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, force_unicode=None, bold_rows=True, classes=None, escape=True, max_rows=None, max_cols=None, show_dimensions=False)¶ Render a DataFrame as an HTML table.
to_html-specific options:
- bold_rows : boolean, default True
- Make the row labels bold in the output
- classes : str or list or tuple, default None
- CSS class(es) to apply to the resulting html table
- escape : boolean, default True
- Convert the characters <, >, and & to HTML-safe sequences.
- max_rows : int, optional
- Maximum number of rows to show before truncating. If None, show all.
- max_cols : int, optional
- Maximum number of columns to show before truncating. If None, show all.
- frame : DataFrame
- object to render
- buf : StringIO-like, optional
- buffer to write to
- columns : sequence, optional
- the subset of columns to write; default None writes all columns
- col_space : int, optional
- the minimum width of each column
- header : bool, optional
- whether to print column labels, default True
- index : bool, optional
- whether to print index (row) labels, default True
- na_rep : string, optional
- string representation of NAN to use, default ‘NaN’
- formatters : list or dict of one-parameter functions, optional
- formatter functions to apply to columns’ elements by position or name, default None, if the result is a string , it must be a unicode string. List must be of length equal to the number of columns.
- float_format : one-parameter function, optional
- formatter function to apply to columns’ elements if they are floats default None
- sparsify : bool, optional
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
- justify : {‘left’, ‘right’}, default None
- Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
- index_names : bool, optional
- Prints the names of the indexes, default True
- force_unicode : bool, default False
- Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.
formatted : string (or unicode, depending on data and options)
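A small sketch of the escape and na_rep options, assuming a recent pandas release. The frame contents are hypothetical and chosen so that one cell contains characters that need escaping.

```python
import numpy as np
import pandas as pd

# Hypothetical table: one label contains "<" and ">", one score is missing.
df = pd.DataFrame({"peptide": ["<SYF>", "KLL"], "score": [0.5, np.nan]})

# Default: special characters are converted to HTML-safe sequences.
escaped = df.to_html(index=False, na_rep="-")

# escape=False writes cell contents verbatim, which is only safe for trusted data.
raw = df.to_html(index=False, na_rep="-", escape=False)
```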
-
to_json
(path_or_buf=None, orient=None, date_format='epoch', double_precision=10, force_ascii=True, date_unit='ms', default_handler=None)¶ Convert the object to a JSON string.
Note NaN’s and None will be converted to null and datetime objects will be converted to UNIX timestamps.
- path_or_buf : the path or buffer to write the result string
- if this is None, return a StringIO of the converted string
orient : string
- Series
- default is ‘index’
- allowed values are: {‘split’,’records’,’index’}
- DataFrame
- default is ‘columns’
- allowed values are: {‘split’,’records’,’index’,’columns’,’values’}
- The format of the JSON string
- split : dict like {index -> [index], columns -> [columns], data -> [values]}
- records : list like [{column -> value}, ... , {column -> value}]
- index : dict like {index -> {column -> value}}
- columns : dict like {column -> {index -> value}}
- values : just the values array
- date_format : {‘epoch’, ‘iso’}
- Type of date conversion. epoch = epoch milliseconds, iso = ISO8601, default is epoch.
- double_precision : The number of decimal places to use when encoding floating point values, default 10.
- force_ascii : force encoded string to be ASCII, default True.
- date_unit : string, default ‘ms’ (milliseconds)
- The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.
- default_handler : callable, default None
- Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serialisable object.
same type as input object with filtered info axis
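The orient layouts listed above can be inspected by parsing the output back with the standard json module. This is a sketch under the assumption of a recent pandas release, where calling to_json without path_or_buf returns a plain string.

```python
import json

import pandas as pd

df = pd.DataFrame({"A": [1, 2]}, index=["x", "y"])

# Parse each layout back into Python objects to make the structure visible.
split = json.loads(df.to_json(orient="split"))
records = json.loads(df.to_json(orient="records"))
by_index = json.loads(df.to_json(orient="index"))
by_column = json.loads(df.to_json(orient="columns"))
```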
-
to_latex
(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=True, force_unicode=None)¶ Render a DataFrame to a tabular environment table. You can splice this into a LaTeX document.
to_latex-specific options:
- bold_rows : boolean, default True
- Make the row labels bold in the output
- frame : DataFrame
- object to render
- buf : StringIO-like, optional
- buffer to write to
- columns : sequence, optional
- the subset of columns to write; default None writes all columns
- col_space : int, optional
- the minimum width of each column
- header : bool, optional
- whether to print column labels, default True
- index : bool, optional
- whether to print index (row) labels, default True
- na_rep : string, optional
- string representation of NAN to use, default ‘NaN’
- formatters : list or dict of one-parameter functions, optional
- formatter functions to apply to columns’ elements by position or name, default None, if the result is a string , it must be a unicode string. List must be of length equal to the number of columns.
- float_format : one-parameter function, optional
- formatter function to apply to columns’ elements if they are floats default None
- sparsify : bool, optional
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
- justify : {‘left’, ‘right’}, default None
- Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
- index_names : bool, optional
- Prints the names of the indexes, default True
- force_unicode : bool, default False
- Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.
formatted : string (or unicode, depending on data and options)
-
to_msgpack
(path_or_buf=None, **kwargs)¶ msgpack (serialize) object to input file path
THIS IS AN EXPERIMENTAL LIBRARY and the storage format may not be stable until a future release.
- path : string File path, buffer-like, or None
- if None, return generated string
- append : boolean whether to append to an existing msgpack (default is False)
- compress : type of compressor (zlib or blosc), default to None (no compression)
-
to_panel
()¶ Transform long (stacked) format (DataFrame) into wide (3D, Panel) format.
Currently the index of the DataFrame must be a 2-level MultiIndex. This may be generalized later
panel : Panel
-
to_period
(freq=None, axis=0, copy=True)¶ Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed)
- freq : string, default
- axis : {0, 1}, default 0
- The axis to convert (the index by default)
- copy : boolean, default True
- If False then underlying input data is not copied
ts : TimeSeries with PeriodIndex
-
to_pickle
(path)¶ Pickle (serialize) object to input file path
- path : string
- File path
-
to_records
(index=True, convert_datetime64=True)¶ Convert DataFrame to record array. Index will be put in the ‘index’ field of the record array if requested
- index : boolean, default True
- Include index in resulting record array, stored in ‘index’ field
- convert_datetime64 : boolean, default True
- Whether to convert the index to datetime.datetime if it is a DatetimeIndex
y : recarray
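A brief sketch of the index parameter, assuming a recent pandas release (where convert_datetime64 may no longer be available, so it is not used here).

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2]}, index=["x", "y"])

with_index = df.to_records()                 # index stored in the "index" field
without_index = df.to_records(index=False)   # data columns only
```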
-
to_sparse
(fill_value=None, kind='block')¶ Convert to SparseDataFrame
- fill_value : float, default NaN
- kind : {‘block’, ‘integer’}
y : SparseDataFrame
-
to_sql
(name, con, flavor='sqlite', if_exists='fail', **kwargs)¶ Write records stored in a DataFrame to a SQL database.
- name : str
- Name of SQL table
- con : an open SQL database connection object
- flavor : {‘sqlite’, ‘mysql’, ‘oracle’}, default ‘sqlite’
- if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’
- fail: If table exists, do nothing.
- replace: If table exists, drop it, recreate it, and insert data.
- append: If table exists, insert data. Create if does not exist.
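The if_exists modes can be exercised against an in-memory SQLite database. This is a sketch assuming a recent pandas release, in which the `flavor` argument has been dropped and 'fail' raises a ValueError when the table already exists. The table name `scores` and the column names are hypothetical.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"peptide": ["p1", "p2"], "score": [1.5, 2.5]})
con = sqlite3.connect(":memory:")

df.to_sql("scores", con, index=False)                       # creates the table
df.to_sql("scores", con, index=False, if_exists="append")   # adds two more rows

try:
    df.to_sql("scores", con, index=False, if_exists="fail")
    failed = False
except ValueError:
    failed = True   # the table exists, so nothing is written

n_rows = con.execute("SELECT COUNT(*) FROM scores").fetchone()[0]
```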
-
to_stata
(fname, convert_dates=None, write_index=True, encoding='latin-1', byteorder=None)¶ A class for writing Stata binary dta files from array-like objects
- fname : file path or buffer
- Where to save the dta file.
- convert_dates : dict
- Dictionary mapping column of datetime types to the stata internal format that you want to use for the dates. Options are ‘tc’, ‘td’, ‘tm’, ‘tw’, ‘th’, ‘tq’, ‘ty’. Column can be either a number or a name.
- encoding : str
- Default is latin-1. Note that Stata does not support unicode.
- byteorder : str
- Can be “>”, “<”, “little”, or “big”. The default is None which uses sys.byteorder
>>> writer = StataWriter('./data_file.dta', data)
>>> writer.write_file()
Or with dates
>>> writer = StataWriter('./date_data_file.dta', data, {2 : 'tw'})
>>> writer.write_file()
-
to_string
(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, nanRep=None, index_names=True, justify=None, force_unicode=None, line_width=None, max_rows=None, max_cols=None, show_dimensions=False)¶ Render a DataFrame to a console-friendly tabular output.
- frame : DataFrame
- object to render
- buf : StringIO-like, optional
- buffer to write to
- columns : sequence, optional
- the subset of columns to write; default None writes all columns
- col_space : int, optional
- the minimum width of each column
- header : bool, optional
- whether to print column labels, default True
- index : bool, optional
- whether to print index (row) labels, default True
- na_rep : string, optional
- string representation of NAN to use, default ‘NaN’
- formatters : list or dict of one-parameter functions, optional
- formatter functions to apply to columns’ elements by position or name, default None, if the result is a string , it must be a unicode string. List must be of length equal to the number of columns.
- float_format : one-parameter function, optional
- formatter function to apply to columns’ elements if they are floats default None
- sparsify : bool, optional
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
- justify : {‘left’, ‘right’}, default None
- Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
- index_names : bool, optional
- Prints the names of the indexes, default True
- force_unicode : bool, default False
- Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.
formatted : string (or unicode, depending on data and options)
-
to_timestamp
(freq=None, how='start', axis=0, copy=True)¶ Cast to DatetimeIndex of timestamps, at beginning of period
- freq : string, default frequency of PeriodIndex
- Desired frequency
- how : {‘s’, ‘e’, ‘start’, ‘end’}
- Convention for converting period to timestamp; start of period vs. end
- axis : {0, 1} default 0
- The axis to convert (the index by default)
- copy : boolean, default True
- If false then underlying input data is not copied
df : DataFrame with DatetimeIndex
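to_timestamp is the counterpart of to_period, so a round trip shows what each one does to the index. A minimal sketch, assuming a recent pandas release; the dates and the column name `v` are arbitrary.

```python
import pandas as pd

idx = pd.to_datetime(["2020-01-15", "2020-02-20"])
df = pd.DataFrame({"v": [1, 2]}, index=idx)

# DatetimeIndex -> PeriodIndex with monthly frequency.
monthly = df.to_period(freq="M")

# PeriodIndex -> DatetimeIndex, using the first instant of each period.
starts = monthly.to_timestamp(how="start")
```

After the round trip the day-of-month information is gone: both timestamps fall on the first day of their month.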
-
to_wide
(*args, **kwargs)¶
-
transpose
()¶ Transpose index and columns
-
truediv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
truncate
(before=None, after=None, axis=None, copy=True)¶ Truncates a sorted NDFrame before and/or after some particular dates.
- before : date
- Truncate before date
- after : date
- Truncate after date
- axis : the truncation axis, defaults to the stat axis
- copy : boolean, default is True
- return a copy of the truncated section
truncated : type of caller
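Both bounds are inclusive, which the following sketch makes visible. It assumes a recent pandas release and uses an arbitrary four-day index.

```python
import pandas as pd

idx = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"])
df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=idx)

# Keep only rows whose index lies between the two dates (inclusive).
cut = df.truncate(before="2020-01-02", after="2020-01-03")
```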
-
tshift
(periods=1, freq=None, axis=0, **kwds)¶ Shift the time index, using the index’s frequency if available
- periods : int
- Number of periods to move, can be positive or negative
- freq : DateOffset, timedelta, or time rule string, default None
- Increment to use from datetools module or time rule (e.g. ‘EOM’)
- axis : int or basestring
- Corresponds to the axis that contains the Index
If freq is not specified then tries to use the freq or inferred_freq attributes of the index. If neither of those attributes exist, a ValueError is thrown
shifted : NDFrame
-
tz_convert
(tz, axis=0, copy=True)¶ Convert TimeSeries to target time zone. If it is time zone naive, it will be localized to the passed time zone.
- tz : string or pytz.timezone object
- copy : boolean, default True
- Also make a copy of the underlying data
-
tz_localize
(tz, axis=0, copy=True, infer_dst=False)¶ Localize tz-naive TimeSeries to target time zone
- tz : string or pytz.timezone object
- copy : boolean, default True
- Also make a copy of the underlying data
- infer_dst : boolean, default False
- Attempt to infer fall dst-transition times based on order
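tz_localize attaches a time zone to a naive index, and tz_convert then re-expresses the same instants in another zone. A minimal sketch, assuming a recent pandas release with time zone data available; the zone names are only examples.

```python
import pandas as pd

idx = pd.to_datetime(["2020-06-01 12:00"])
df = pd.DataFrame({"v": [1]}, index=idx)

# Declare the naive timestamps to be UTC, then view them in another zone.
utc = df.tz_localize("UTC")
berlin = utc.tz_convert("Europe/Berlin")
```

The wall-clock hour changes (Berlin is two hours ahead of UTC in June), while the underlying instant stays the same.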
-
unstack
(level=-1)¶ Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels. If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex)
- level : int, string, or list of these, default -1 (last level)
- Level(s) of index to unstack, can pass level name
DataFrame.pivot : Pivot a table based on column values.
DataFrame.stack : Pivot a level of the column labels (inverse operation from unstack).
>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
...                                    ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one  a   1
     b   2
two  a   3
     b   4
dtype: float64
>>> s.unstack(level=-1)
     a   b
one  1   2
two  3   4
>>> s.unstack(level=0)
   one  two
a  1    3
b  2    4
>>> df = s.unstack(level=0)
>>> df.unstack()
one  a  1.
     b  2.
two  a  3.
     b  4.
unstacked : DataFrame or Series
-
update
(other, join='left', overwrite=True, filter_func=None, raise_conflict=False)¶ Modify DataFrame in place using non-NA values from passed DataFrame. Aligns on indices
- other : DataFrame, or object coercible into a DataFrame
- join : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
- overwrite : boolean, default True
- If True then overwrite values for common keys in the calling frame
- filter_func : callable(1d-array) -> 1d-array<boolean>, default None
- Can choose to replace values other than NA. Return True for values that should be updated
- raise_conflict : boolean
- If True, will raise an error if the DataFrame and other both contain data in the same place.
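Two details are easy to miss: update works in place, and NA values in `other` never overwrite anything. The sketch below assumes a recent pandas release (where raise_conflict has been replaced by a different argument, so it is not used); all frames are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})
other = pd.DataFrame({"B": [np.nan, 99.0]})   # aligned on index labels 0 and 1

# Modifies df in place; the NaN in `other` leaves df.loc[0, "B"] untouched.
df.update(other)

# overwrite=False only fills positions that are NA in the calling frame.
gaps = pd.DataFrame({"A": [np.nan, 2.0]})
gaps.update(pd.DataFrame({"A": [5.0, 6.0]}), overwrite=False)
```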
-
values
¶ Numpy representation of NDFrame
-
var
(axis=None, skipna=None, level=None, ddof=1, **kwargs)¶ Return unbiased variance over requested axis. Normalized by N-1
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
variance : Series or DataFrame (if level specified)
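The ddof argument controls the divisor N - ddof, so the default gives the sample variance and ddof=0 the population variance. A small sketch with hand-checkable numbers, assuming a recent pandas release:

```python
import pandas as pd

# Mean is 2.5; squared deviations sum to 5.0.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

sample_var = df.var()["x"]             # 5.0 / (4 - 1)
population_var = df.var(ddof=0)["x"]   # 5.0 / 4
```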
-
where
(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)¶ Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
- cond : boolean NDFrame or array
- other : scalar or NDFrame
- inplace : boolean, default False
- Whether to perform the operation in place on the data
- axis : alignment axis if needed, default None
- level : alignment level if needed, default None
- try_cast : boolean, default False
- try to cast the result back to the input type (if possible)
- raise_on_error : boolean, default True
- Whether to raise on invalid data types (e.g. trying to where on strings)
wh : same type as caller
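A minimal sketch of the two common uses, masking and replacing, assuming a recent pandas release. The column name `s` is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"s": [-1.0, 2.0, -3.0]})

# Entries where the condition is False are replaced by NaN by default ...
masked = df.where(df > 0)

# ... or by the value given as `other`.
clipped = df.where(df > 0, other=0.0)
```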
-
xs
(key, axis=0, level=None, copy=True, drop_level=True)¶ Returns a cross-section (row(s) or column(s)) from the Series/DataFrame. Defaults to cross-section on the rows (axis=0).
- key : object
- Some label contained in the index, or partially in a MultiIndex
- axis : int, default 0
- Axis to retrieve cross-section on
- level : object, defaults to first n levels (n=1 or len(key))
- In case of a key partially contained in a MultiIndex, indicate which levels are used. Levels can be referred by label or position.
- copy : boolean, default True
- Whether to make a copy of the data
- drop_level : boolean, default True
- If False, returns object with same levels as self.
>>> df
   A  B  C
a  4  5  2
b  4  0  9
c  9  7  3
>>> df.xs('a')
A    4
B    5
C    2
Name: a
>>> df.xs('C', axis=1)
a    2
b    9
c    3
Name: C
>>> s = df.xs('a', copy=False)
>>> s['A'] = 100
>>> df
     A  B  C
a  100  5  2
b    4  0  9
c    9  7  3
>>> df
                    A  B  C  D
first second third
bar   one    1      4  1  8  9
      two    1      7  5  5  0
baz   one    1      6  6  8  0
      three  2      5  3  5  3
>>> df.xs(('baz', 'three'))
       A  B  C  D
third
2      5  3  5  3
>>> df.xs('one', level=1)
             A  B  C  D
first third
bar   1      4  1  8  9
baz   1      6  6  8  0
>>> df.xs(('baz', 2), level=[0, 'third'])
        A  B  C  D
second
three   5  3  5  3
xs : Series or DataFrame
-
-
class
Fred2.Core.Result.
CleavageFragmentPredictionResult
(data=None, index=None, columns=None, dtype=None, copy=False)¶ Bases:
Fred2.Core.Result.AResult
A CleavageFragmentPredictionResult object is a pandas.DataFrame with single-indexing, where the column IDs are the prediction scores of the different prediction methods, and the row IDs are the Peptide objects.
CleavageFragmentPredictionResult:
Peptide Obj   Method Name
Peptide1      -15.34
Peptide2      23.34
-
T
¶ Transpose index and columns
-
abs
()¶ Return an object with absolute value taken. Only applicable to objects that are all numeric
abs: type of caller
-
add
(other, axis='columns', level=None, fill_value=None)¶ Binary operator add with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
add_prefix
(prefix)¶ Concatenate prefix string with panel items names.
prefix : string
with_prefix : type of caller
-
add_suffix
(suffix)¶ Concatenate suffix string with panel items names
suffix : string
with_suffix : type of caller
-
align
(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0)¶ Align two objects on their axes with the specified join method for each axis Index
- other : DataFrame or Series
- join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
- axis : allowed axis of the other object, default None
- Align on index (0), columns (1), or both (None)
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
- copy : boolean, default True
- Always returns new objects. If copy=False and no reindexing is required then original objects are returned.
- fill_value : scalar, default np.NaN
- Value to use for missing values. Defaults to NaN, but can be any “compatible” value
- method : str, default None
- limit : int, default None
- fill_axis : {0, 1}, default 0
- Filling axis, method and limit
- (left, right) : (type of input, type of other)
- Aligned objects
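The join argument decides which labels survive in both returned objects. A sketch assuming a recent pandas release; the peptide labels `p1` to `p3` are hypothetical.

```python
import pandas as pd

a = pd.DataFrame({"v": [1, 2]}, index=["p1", "p2"])
b = pd.DataFrame({"v": [3, 4]}, index=["p2", "p3"])

# Outer join: both results share the union of the row labels, padded with NaN.
left, right = a.align(b, join="outer", axis=0)

# Inner join: only labels present in both inputs are kept.
left_in, right_in = a.align(b, join="inner", axis=0)
```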
-
all
(axis=None, bool_only=None, skipna=True, level=None, **kwargs)¶ Return whether all elements are True over requested axis.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
- bool_only : boolean, default None
- Only include boolean data.
all : Series (or DataFrame if level specified)
-
any
(axis=None, bool_only=None, skipna=True, level=None, **kwargs)¶ Return whether any element is True over requested axis.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
- bool_only : boolean, default None
- Only include boolean data.
any : Series (or DataFrame if level specified)
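The axis convention is easiest to read off a tiny example: axis=0 reduces over the rows and yields one value per column, axis=1 reduces over the columns and yields one value per row. A sketch assuming a recent pandas release:

```python
import pandas as pd

df = pd.DataFrame({"a": [True, True], "b": [True, False]})

all_per_column = df.all(axis=0)   # is every entry of the column True?
all_per_row = df.all(axis=1)      # is every entry of the row True?
any_per_row = df.any(axis=1)      # is at least one entry of the row True?
```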
-
append
(other, ignore_index=False, verify_integrity=False)¶ Append columns of other to end of this frame’s columns and index, returning a new object. Columns not in this frame are added as new columns.
- other : DataFrame or list of Series/dict-like objects
- ignore_index : boolean, default False
- If True do not use the index labels. Useful for gluing together record arrays
- verify_integrity : boolean, default False
- If True, raise ValueError on creating index with duplicates
If a list of dict is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged
appended : DataFrame
-
apply
(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)¶ Applies function along input axis of DataFrame.
Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty.
- func : function
- Function to apply to each column/row
- axis : {0, 1}
- 0 : apply function to each column
- 1 : apply function to each row
- broadcast : boolean, default False
- For aggregation functions, return object of same size with values propagated
- reduce : boolean or None, default None
- Try to apply reduction procedures. If the DataFrame is empty, apply will use reduce to determine whether the result should be a Series or a DataFrame. If reduce is None (the default), apply’s return value will be guessed by calling func on an empty Series (note: while guessing, exceptions raised by func will be ignored). If reduce is True a Series will always be returned, and if False a DataFrame will always be returned.
- raw : boolean, default False
- If False, convert each row or column into a Series. If raw=True the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance
- args : tuple
- Positional arguments to pass to function in addition to the array/series
Additional keyword arguments will be passed as keywords to the function
>>> df.apply(numpy.sqrt) # returns DataFrame
>>> df.apply(numpy.sum, axis=0) # equiv to df.sum(0)
>>> df.apply(numpy.sum, axis=1) # equiv to df.sum(1)
DataFrame.applymap: For elementwise operations
applied : Series or DataFrame
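Whether apply returns a Series or a DataFrame depends on what the function returns for each column or row. The following sketch, assuming a recent pandas release and a made-up two-column frame, shows an aggregating function on each axis and an elementwise one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 4.0], "b": [9.0, 16.0]})

# Aggregating per column (axis=0): one scalar per column -> Series.
spread = df.apply(lambda col: col.max() - col.min())

# Aggregating per row (axis=1): one scalar per row -> Series.
row_sum = df.apply(lambda row: row.sum(), axis=1)

# Shape-preserving function: same shape as the input -> DataFrame.
roots = df.apply(np.sqrt)
```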
-
applymap
(func)¶ Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame
- func : function
- Python function, returns a single value from a single value
applied : DataFrame
DataFrame.apply : For operations on rows/columns
-
as_blocks
(columns=None)¶ Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype. Columns are presented in sorted order unless a specific list of columns is provided.
- NOTE: the dtypes of the blocks WILL BE PRESERVED HERE (unlike in as_matrix)
- columns : array-like
- Specific column order
values : a list of Object
-
as_matrix
(columns=None)¶ Convert the frame to its Numpy-array matrix representation. Columns are presented in sorted order unless a specific list of columns is provided.
- NOTE: the dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say, if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.
- e.g. if the dtypes are float16,float32 -> float32
- float16,float32,float64 -> float64
- int32,uint8 -> int32
- values : ndarray
- If the caller is heterogeneous and contains booleans or objects, the result will be of dtype=object
-
asfreq
(freq, method=None, how=None, normalize=False)¶ Convert all TimeSeries inside to specified frequency using DateOffset objects. Optionally provide fill method to pad/backfill missing values.
- freq : DateOffset object, or string
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}
- Method to use for filling holes in reindexed Series. pad / ffill: propagate last valid observation forward to next valid. backfill / bfill: use NEXT valid observation to fill gap
- how : {‘start’, ‘end’}, default end
- For PeriodIndex only, see PeriodIndex.asfreq
- normalize : bool, default False
- Whether to reset output index to midnight
converted : type of caller
-
astype
(dtype, copy=True, raise_on_error=True)¶ Cast object to input numpy.dtype. Return a copy when copy=True (be really careful with this!)
- dtype : numpy.dtype or Python type
- raise_on_error : raise on invalid input
casted : type of caller
-
at
¶
-
at_time
(time, asof=False)¶ Select values at particular time of day (e.g. 9:30AM)
time : datetime.time or string
values_at_time : type of caller
-
axes
¶
-
between_time
(start_time, end_time, include_start=True, include_end=True)¶ Select values between particular times of the day (e.g., 9:00-9:30 AM)
- start_time : datetime.time or string
- end_time : datetime.time or string
- include_start : boolean, default True
- include_end : boolean, default True
values_between_time : type of caller
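A short sketch of time-of-day selection, assuming pandas is importable as pd. The half-hourly index and the value column are invented for illustration; only the two positional time arguments are used, since the include_start / include_end keywords are not available in every pandas version.

```python
import pandas as pd

# Hypothetical half-hourly series: 08:30, 09:00, 09:30, 10:00, 10:30.
idx = pd.date_range("2014-01-01 08:30", periods=5, freq="30min")
ts = pd.DataFrame({"value": list(range(5))}, index=idx)

morning = ts.between_time("9:00", "9:30")  # both end points included by default
exact = ts.at_time("9:30")                 # rows stamped exactly 09:30
```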
-
bfill
(axis=0, inplace=False, limit=None, downcast=None)¶ Synonym for NDFrame.fillna(method='bfill')
-
blocks
¶ Internal property, property synonym for as_blocks()
-
bool
()¶ Return the bool of a single-element PandasObject. This must be a boolean scalar value, either True or False.
Raise a ValueError if the PandasObject does not have exactly 1 element, or that element is not boolean
-
boxplot
(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, **kwds)¶ Make a box plot from DataFrame column/columns optionally grouped (stratified) by one or more columns
- data : DataFrame
- column : column names or list of names, or vector
- Can be any valid input to groupby
- by : string or sequence
- Column in the DataFrame to group by
- ax : matplotlib axis object, default None
- fontsize : int or string
- rot : int, default None
- Rotation for ticks
- grid : boolean, default None (matlab style default)
- Axis grid lines
ax : matplotlib.axes.AxesSubplot
-
clip
(lower=None, upper=None, out=None)¶ Trim values at input threshold(s)
- lower : float, default None
- upper : float, default None
clipped : Series
-
clip_lower
(threshold)¶ Return copy of the input with values below given value truncated
clip
clipped : same type as input
-
clip_upper
(threshold)¶ Return copy of input with values above given value truncated
clip
clipped : same type as input
-
combine
(other, func, fill_value=None, overwrite=True)¶ Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)
- other : DataFrame
- func : function
- fill_value : scalar value
- overwrite : boolean, default True
- If True then overwrite values for common keys in the calling frame
result : DataFrame
-
combineAdd
(other)¶ Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)
other : DataFrame
DataFrame
-
combineMult
(other)¶ Multiply two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)
other : DataFrame
DataFrame
-
combine_first
(other)¶ Combine two DataFrame objects and default to non-null values in the frame calling the method. The result's index and columns will be the union of the respective indexes and columns
other : DataFrame
a’s values prioritized, use values from b to fill holes:
>>> a.combine_first(b)
combined : DataFrame
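A small sketch of the hole-filling behaviour, assuming pandas and NumPy are importable as pd and np. The frames a and b and their labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# a has a hole at index 1; b overlaps a at index 1 and adds index 2 and column y.
a = pd.DataFrame({"x": [1.0, np.nan]}, index=[0, 1])
b = pd.DataFrame({"x": [10.0, 20.0], "y": [5.0, 6.0]}, index=[1, 2])

# a's non-null values win; b only fills what a is missing.
combined = a.combine_first(b)
```

The result covers the union of both indexes and both column sets, so positions that neither frame supplies (here row 0 of column y) stay missing.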
-
compound
(axis=None, skipna=None, level=None, **kwargs)¶ Return the compound percentage of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
compounded : Series or DataFrame (if level specified)
-
consolidate
(inplace=False)¶ Compute NDFrame with “consolidated” internals (data of each dtype grouped together in a single ndarray). Mainly an internal API function, but available here to the savvy user
- inplace : boolean, default False
- If False return new object, otherwise modify existing object
consolidated : type of caller
-
convert_objects
(convert_dates=True, convert_numeric=False, convert_timedeltas=True, copy=True)¶ Attempt to infer better dtype for object columns
- convert_dates : if True, attempt to soft convert dates; if ‘coerce’, force conversion (and non-convertibles get NaT)
- convert_numeric : if True, attempt to coerce to numbers (including strings); non-convertibles get NaN
- convert_timedeltas : if True, attempt to soft convert timedeltas; if ‘coerce’, force conversion (and non-convertibles get NaT)
- copy : boolean, if True, return copy, default is True
converted : same as input object
-
copy
(deep=True)¶ Make a copy of this object
- deep : boolean, default True
- Make a deep copy, i.e. also copy data
copy : type of caller
-
corr
(method='pearson', min_periods=1)¶ Compute pairwise correlation of columns, excluding NA/null values
- method : {‘pearson’, ‘kendall’, ‘spearman’}
- pearson : standard correlation coefficient
- kendall : Kendall Tau correlation coefficient
- spearman : Spearman rank correlation
- min_periods : int, optional
- Minimum number of observations required per pair of columns to have a valid result. Currently only available for pearson and spearman correlation
y : DataFrame
-
corrwith
(other, axis=0, drop=False)¶ Compute pairwise correlation between rows or columns of two DataFrame objects.
- other : DataFrame
- axis : {0, 1}
- 0 to compute column-wise, 1 for row-wise
- drop : boolean, default False
- Drop missing indices from result, default returns union of all
correls : Series
-
count
(axis=0, level=None, numeric_only=False)¶ Return Series with number of non-NA/null observations over requested axis. Works with non-floating point data as well (detects NaN and None)
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
- numeric_only : boolean, default False
- Include only float, int, boolean data
count : Series (or DataFrame if level specified)
-
cov
(min_periods=None)¶ Compute pairwise covariance of columns, excluding NA/null values
- min_periods : int, optional
- Minimum number of observations required per pair of columns to have a valid result.
y : DataFrame
y contains the covariance matrix of the DataFrame’s time series. The covariance is normalized by N-1 (unbiased estimator).
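The N-1 normalization can be verified by hand on a tiny frame. This is a sketch assuming pandas is importable as pd; the columns u and v are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"u": [1.0, 2.0, 3.0], "v": [2.0, 4.0, 6.0]})
cov = df.cov()

# Manual covariance of u and v with the unbiased (N-1) denominator.
n = len(df)
du = df["u"] - df["u"].mean()
dv = df["v"] - df["v"].mean()
manual = (du * dv).sum() / (n - 1)
```

For these values the cross products sum to 4 and N-1 is 2, so both manual and cov.loc["u", "v"] come out as 2.0.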
-
cummax
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative max over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
max : Series
-
cummin
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative min over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
min : Series
-
cumprod
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative prod over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
prod : Series
-
cumsum
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative sum over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
sum : Series
-
delevel
(*args, **kwargs)¶
-
describe
(percentile_width=50)¶ Generate various summary statistics of each column, excluding NaN values. These include: count, mean, std, min, max, and lower%/50%/upper% percentiles
- percentile_width : float, optional
- width of the desired uncertainty interval, default is 50, which corresponds to lower=25, upper=75
DataFrame of summary statistics
-
diff
(periods=1)¶ 1st discrete difference of object
- periods : int, default 1
- Periods to shift for forming difference
diffed : DataFrame
-
div
(other, axis='columns', level=None, fill_value=None)¶ Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
divide
(other, axis='columns', level=None, fill_value=None)¶ Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
dot
(other)¶ Matrix multiplication with DataFrame or Series objects
other : DataFrame or Series
dot_product : DataFrame or Series
-
drop
(labels, axis=0, level=None, inplace=False, **kwargs)¶ Return new object with labels in requested axis removed
- labels : single label or list-like
- axis : int or axis name
- level : int or name, default None
- For MultiIndex
- inplace : bool, default False
- If True, do operation inplace and return None.
dropped : type of caller
-
drop_duplicates
(cols=None, take_last=False, inplace=False)¶ Return DataFrame with duplicate rows removed, optionally only considering certain columns
- cols : column label or sequence of labels, optional
- Only consider certain columns for identifying duplicates, by default use all of the columns
- take_last : boolean, default False
- Keep the last observed row among a set of duplicates. Defaults to keeping the first row
- inplace : boolean, default False
- Whether to drop duplicates in place or to return a copy
deduplicated : DataFrame
-
dropna
(axis=0, how='any', thresh=None, subset=None, inplace=False)¶ Return object with labels on given axis omitted where alternately any or all of the data are missing
- axis : {0, 1}, or tuple/list thereof
- Pass tuple or list to drop on multiple axes
- how : {‘any’, ‘all’}
- any : if any NA values are present, drop that label
- all : if all values are NA, drop that label
- thresh : int, default None
- int value : require that many non-NA values
- subset : array-like
- Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include
- inplace : boolean, default False
- If True, do operation inplace and return None.
dropped : DataFrame
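A brief sketch contrasting how='any', how='all' and thresh, assuming pandas and NumPy are importable as pd and np. The frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Row 0 is complete, row 1 has one NA, row 2 is entirely NA.
df = pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [2.0, 3.0, np.nan]})

any_dropped = df.dropna(how="any")  # drop rows containing at least one NA
all_dropped = df.dropna(how="all")  # drop rows only if every value is NA
min_one = df.dropna(thresh=1)       # keep rows with at least 1 non-NA value
```

The how and thresh options are passed in separate calls here because some pandas versions reject the two together.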
-
dtypes
¶ Return the dtypes in this object
-
duplicated
(cols=None, take_last=False)¶ Return boolean Series denoting duplicate rows, optionally only considering certain columns
- cols : column label or sequence of labels, optional
- Only consider certain columns for identifying duplicates, by default use all of the columns
- take_last : boolean, default False
- Keep the last observed row among a set of duplicates. Defaults to keeping the first row
duplicated : Series
-
empty
¶ True if NDFrame is entirely empty [no items]
-
eq
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods eq
-
equals
(other)¶ Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.
-
eval
(expr, **kwargs)¶ Evaluate an expression in the context of the calling DataFrame instance.
- expr : string
- The expression string to evaluate.
- kwargs : dict
- See the documentation for eval() for complete details on the keyword arguments accepted by query().
ret : ndarray, scalar, or pandas object
pandas.DataFrame.query pandas.eval
For more details see the API documentation for eval(). For detailed examples see enhancing performance with eval.
>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.eval('a + b')
>>> df.eval('c=a + b')
-
ffill
(axis=0, inplace=False, limit=None, downcast=None)¶ Synonym for NDFrame.fillna(method='ffill')
-
fillna
(value=None, method=None, axis=0, inplace=False, limit=None, downcast=None)¶ Fill NA/NaN values using the specified method
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- Method to use for filling holes in reindexed Series pad / ffill: propagate last valid observation forward to next valid backfill / bfill: use NEXT valid observation to fill gap
- value : scalar, dict, or Series
- Value to use to fill holes (e.g. 0), alternately a dict/Series of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series will not be filled). This value cannot be a list.
- axis : {0, 1}, default 0
- 0: fill column-by-column
- 1: fill row-by-row
- inplace : boolean, default False
- If True, fill in place. Note: this will modify any other views on this object, (e.g. a no-copy slice for a column in a DataFrame).
- limit : int, default None
- Maximum size gap to forward or backward fill
- downcast : dict, default is None
- a dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible)
reindex, asfreq
filled : same type as caller
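A minimal sketch of filling with a per-column dict versus forward filling, assuming pandas and NumPy are importable as pd and np. The frame is invented for illustration, and ffill() is used in place of fillna(method='ffill') because the method keyword is not accepted by every pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

# Dict form: only columns named in the dict are filled; "b" is left alone.
filled = df.fillna({"a": 0.0})

# Forward fill: propagate the last valid observation down each column.
padded = df.ffill()
```

A leading NA (row 0 of column b) has no earlier observation to propagate, so forward filling leaves it missing.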
-
filter
(items=None, like=None, regex=None, axis=None)¶ Restrict the info axis to set of items or wildcard
- items : list-like
- List of info axis to restrict to (must not all be present)
- like : string
- Keep info axis where “arg in col == True”
- regex : string (regular expression)
- Keep info axis with re.search(regex, col) == True
Arguments are mutually exclusive, but this is not checked for
-
filter_result
(expressions)¶ Filters a result data frame based on a specified expression consisting of a list of triples (method_name, comparator, threshold). The expression is applied to each row. If any of the columns fulfills the criteria, the row is kept.
Parameters: expressions (list((str,comparator,float))) – A list of triples consisting of (method_name, comparator, threshold)
Returns: A new filtered result object
Return type: CleavageFragmentPredictionResult
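The row-retention rule described above can be sketched in plain Python. This is not Fred2's implementation: the scores, row labels, column labels and the method name are hypothetical, the comparator is assumed to be a two-argument function such as those in the standard operator module, and only a single triple is handled because the text does not say how several triples are combined.

```python
import operator

# Hypothetical scores for one method: row label -> {column label -> score}.
scores = {
    "pos_1": {"col_a": 0.2, "col_b": 0.9},
    "pos_2": {"col_a": 0.1, "col_b": 0.3},
}

def keep_rows(rows, comparator, threshold):
    """Keep a row if any of its column values satisfies the comparison."""
    return {
        label: cols
        for label, cols in rows.items()
        if any(comparator(value, threshold) for value in cols.values())
    }

# One (method_name, comparator, threshold) triple; the method name is made up.
method_name, comparator, threshold = ("example_method", operator.ge, 0.5)
kept = keep_rows(scores, comparator, threshold)
```

Only pos_1 survives, because one of its columns reaches the 0.5 threshold while no column of pos_2 does.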
-
first
(offset)¶ Convenience method for subsetting initial periods of time series data based on a date offset
offset : string, DateOffset, dateutil.relativedelta
ts.first(‘10D’) -> First 10 days
subset : type of caller
-
first_valid_index
()¶ Return label for first non-NA/null value
-
floordiv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator floordiv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
from_csv
(path, header=0, sep=', ', index_col=0, parse_dates=True, encoding=None, tupleize_cols=False, infer_datetime_format=False)¶ Read delimited file into DataFrame
- path : string file path or file handle / StringIO
- header : int, default 0
- Row to use as header (skip prior rows)
- sep : string, default ‘,’
- Field delimiter
- index_col : int or sequence, default 0
- Column to use for index. If a sequence is given, a MultiIndex is used. Different default from read_table
- parse_dates : boolean, default True
- Parse dates. Different default from read_table
- tupleize_cols : boolean, default False
- write multi_index columns as a list of tuples (if True) or in the new, expanded format (if False)
- infer_datetime_format: boolean, default False
- If True and parse_dates is True for a column, try to infer the datetime format based on the first datetime string. If the format can be inferred, there often will be a large parsing speed-up.
Preferable to use read_table for most general purposes but from_csv makes for an easy roundtrip to and from file, especially with a DataFrame of time series data
y : DataFrame
-
from_dict
(data, orient='columns', dtype=None)¶ Construct DataFrame from dict of array-like or dicts
- data : dict
- {field : array-like} or {field : dict}
- orient : {‘columns’, ‘index’}, default ‘columns’
- The “orientation” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.
DataFrame
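A quick sketch of the two orientations, assuming pandas is importable as pd. The keys and values are made up for illustration.

```python
import pandas as pd

data = {"row1": {"a": 1, "b": 2}, "row2": {"a": 3, "b": 4}}

by_cols = pd.DataFrame.from_dict(data)                  # outer keys become columns
by_rows = pd.DataFrame.from_dict(data, orient="index")  # outer keys become rows
```

With orient='index' the outer keys label the rows, so by_rows is the transpose of by_cols.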
-
from_items
(items, columns=None, orient='columns')¶ Convert (key, value) pairs to DataFrame. The keys will be the axis index (usually the columns, but depends on the specified orientation). The values should be arrays or Series.
- items : sequence of (key, value) pairs
- Values should be arrays or Series.
- columns : sequence of column labels, optional
- Must be passed if orient=’index’.
- orient : {‘columns’, ‘index’}, default ‘columns’
- The “orientation” of the data. If the keys of the input correspond to column labels, pass ‘columns’ (default). Otherwise if the keys correspond to the index, pass ‘index’.
frame : DataFrame
-
from_records
(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)¶ Convert structured or record ndarray to DataFrame
- data : ndarray (structured dtype), list of tuples, dict, or DataFrame
- index : string, list of fields, array-like
- Field of array to use as the index, alternately a specific set of input labels to use
- exclude : sequence, default None
- Columns or fields to exclude
- columns : sequence, default None
- Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns)
- coerce_float : boolean, default False
- Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets
df : DataFrame
-
ftypes
¶ Return the ftypes (indication of sparse/dense and dtype) in this object.
-
ge
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods ge
-
get
(key, default=None)¶ Get item from object for given key (DataFrame column, Panel slice, etc.). Returns default value if not found
key : object
value : type of items contained in object
-
get_dtype_counts
()¶ Return the counts of dtypes in this object
-
get_ftype_counts
()¶ Return the counts of ftypes in this object
-
get_value
(index, col)¶ Quickly retrieve single value at passed column and index
index : row label col : column label
value : scalar value
-
get_values
()¶ same as values (but handles sparseness conversions)
-
groupby
(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False)¶ Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns
- by : mapping function / list of functions, dict, Series, or tuple / list of column names
- Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups
- axis : int, default 0
- level : int, level name, or sequence of such, default None
- If the axis is a MultiIndex (hierarchical), group by a particular level or levels
- as_index : boolean, default True
- For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output
- sort : boolean, default True
- Sort group keys. Get better performance by turning this off
- group_keys : boolean, default True
- When calling apply, add group keys to index to identify pieces
- squeeze : boolean, default False
- reduce the dimensionality of the return type if possible, otherwise return a consistent type
# DataFrame result
>>> data.groupby(func, axis=0).mean()
# DataFrame result
>>> data.groupby(['col1', 'col2'])['col3'].mean()
# DataFrame with hierarchical index
>>> data.groupby(['col1', 'col2']).mean()
GroupBy object
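A concrete version of grouping by two columns, assuming pandas is importable as pd. The column names follow the examples above, but the values are invented for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "col1": ["x", "x", "y"],
    "col2": ["p", "p", "q"],
    "col3": [1.0, 3.0, 5.0],
})

# Mean of col3 within each (col1, col2) group; the group keys form the index.
by_pair = data.groupby(["col1", "col2"])["col3"].mean()

# Same grouping with as_index=False: keys stay as ordinary columns.
flat = data.groupby(["col1", "col2"], as_index=False).mean()
```

In current pandas by_pair is a Series with a hierarchical index, while flat is the "SQL-style" frame described under as_index.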
-
gt
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods gt
-
head
(n=5)¶ Returns first n rows
-
hist
(data, column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, **kwds)¶ Draw histogram of the DataFrame’s series using matplotlib / pylab.
- data : DataFrame
- column : string or sequence
- If passed, will be used to limit data to a subset of columns
- by : object, optional
- If passed, then used to form histograms for separate groups
- grid : boolean, default True
- Whether to show axis grid lines
- xlabelsize : int, default None
- If specified changes the x-axis label size
- xrot : float, default None
- rotation of x axis labels
- ylabelsize : int, default None
- If specified changes the y-axis label size
- yrot : float, default None
- rotation of y axis labels
- ax : matplotlib axes object, default None
- sharex : bool, if True, the X axis will be shared amongst all subplots.
- sharey : bool, if True, the Y axis will be shared amongst all subplots.
- figsize : tuple
- The size of the figure to create in inches by default
- layout : (optional) a tuple (rows, columns) for the layout of the histograms
- kwds : other plotting keyword arguments
To be passed to hist function
-
iat
¶
-
icol
(i)¶
-
idxmax
(axis=0, skipna=True)¶ Return index of first occurrence of maximum over requested axis. NA/null values are excluded.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be first index.
idxmax : Series
This method is the DataFrame version of ndarray.argmax.
Series.idxmax
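A small sketch of what idxmax returns along each axis, assuming pandas is importable as pd. The frame and its labels are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 7, 7], "b": [9, 2, 3]}, index=["r0", "r1", "r2"])

col_winners = df.idxmax()        # per column: row label of the first maximum
row_winners = df.idxmax(axis=1)  # per row: column label of the maximum
```

Column a reaches its maximum twice (r1 and r2); the first occurrence, r1, is the one reported.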
-
idxmin
(axis=0, skipna=True)¶ Return index of first occurrence of minimum over requested axis. NA/null values are excluded.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
idxmin : Series
This method is the DataFrame version of ndarray.argmin.
Series.idxmin
-
iget_value
(i, j)¶
-
iloc
¶
-
info
(verbose=True, buf=None, max_cols=None)¶ Concise summary of a DataFrame.
- verbose : boolean, default True
- If False, don’t print column count summary
- buf : writable buffer, defaults to sys.stdout
- max_cols : int, default None
Determines whether full summary or short summary is printed
-
insert
(loc, column, value, allow_duplicates=False)¶ Insert column into DataFrame at specified location.
If allow_duplicates is False, raises Exception if column is already contained in the DataFrame.
- loc : int
- Must have 0 <= loc <= len(columns)
- column : object
- value : int, Series, or array-like
-
interpolate
(method='linear', axis=0, limit=None, inplace=False, downcast='infer', **kwargs)¶ Interpolate values according to different methods.
- method : {‘linear’, ‘time’, ‘values’, ‘index’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘krogh’, ‘polynomial’, ‘spline’, ‘piecewise_polynomial’, ‘pchip’}
- ‘linear’: ignore the index and treat the values as equally spaced. default
- ‘time’: interpolation works on daily and higher resolution data to interpolate given length of interval
- ‘index’: use the actual numerical values of the index
- ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘polynomial’ are passed to scipy.interpolate.interp1d with the order given. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=4)
- ‘krogh’, ‘piecewise_polynomial’, ‘spline’, and ‘pchip’ are all wrappers around the scipy interpolation methods of similar names. See the scipy documentation for more on their behavior: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
- axis : {0, 1}, default 0
- 0: fill column-by-column
- 1: fill row-by-row
- limit : int, default None.
- Maximum number of consecutive NaNs to fill.
- inplace : bool, default False
- Update the NDFrame in place if possible.
- downcast : optional, ‘infer’ or None, defaults to ‘infer’
- Downcast dtypes if possible.
Series or DataFrame of same shape interpolated at the NaNs
reindex, replace, fillna
# Filling in NaNs:
>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0    0
1    1
2    2
3    3
dtype: float64
-
irow
(i, copy=False)¶
-
is_copy
= None¶
-
isin
(values)¶ Return boolean DataFrame showing whether each element in the DataFrame is contained in values.
- values : iterable, Series, DataFrame or dictionary
- The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dictionary, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.
DataFrame of booleans
When values is a list:
>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> df.isin([1, 3, 12, 'a'])
       A      B
0   True   True
1  False  False
2   True  False
When values is a dict:
>>> df = DataFrame({'A': [1, 2, 3], 'B': [1, 4, 7]})
>>> df.isin({'A': [1, 3], 'B': [4, 7, 12]})
       A      B
0   True  False  # Note that B didn't match the 1 here.
1  False   True
2   True   True
When values is a Series or DataFrame:
>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> other = DataFrame({'A': [1, 3, 3, 2], 'B': ['e', 'f', 'f', 'e']})
>>> df.isin(other)
       A      B
0   True  False
1  False  False  # Column A in `other` has a 3, but not at index 1.
2   True   True
-
isnull
()¶ Return a boolean same-sized object indicating if the values are null
-
iteritems
()¶ Iterator over (column, series) pairs
-
iterkv
(*args, **kwargs)¶ iteritems alias used to get around 2to3. Deprecated
-
iterrows
()¶ Iterate over rows of DataFrame as (index, Series) pairs.
iterrows does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). For example,
>>> df = DataFrame([[1, 1.0]], columns=['x', 'y'])
>>> row = next(df.iterrows())[1]
>>> print(row['x'].dtype)
float64
>>> print(df['x'].dtype)
int64
- it : generator
- A generator that iterates over the rows of the frame.
-
itertuples
(index=True)¶ Iterate over rows of DataFrame as tuples, with index value as first element of the tuple
-
ix
¶
-
join
(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)¶ Join columns with other DataFrame either on index or on a key column. Efficiently Join multiple DataFrame objects by index at once by passing a list.
- other : DataFrame, Series with name field set, or list of DataFrame
- Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame
- on : column name, tuple/list of column names, or array-like
- Column(s) to use for joining, otherwise join on index. If multiple columns are given, the passed DataFrame must have a MultiIndex. Can pass an array as the join key if not already contained in the calling DataFrame. Like an Excel VLOOKUP operation
- how : {‘left’, ‘right’, ‘outer’, ‘inner’}
How to handle indexes of the two objects. Default: ‘left’ for joining on index, None otherwise
- left: use calling frame’s index
- right: use input frame’s index
- outer: form union of indexes
- inner: use intersection of indexes
- lsuffix : string
- Suffix to use from left frame’s overlapping columns
- rsuffix : string
- Suffix to use from right frame’s overlapping columns
- sort : boolean, default False
- Order result DataFrame lexicographically by the join key. If False, preserves the index order of the calling (left) DataFrame
on, lsuffix, and rsuffix options are not supported when passing a list of DataFrame objects
joined : DataFrame
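A compact sketch of how the how argument changes the resulting index when joining on the index, assuming pandas is importable as pd. The frames left and right are invented for illustration.

```python
import pandas as pd

left = pd.DataFrame({"v": [1, 2]}, index=["a", "b"])
right = pd.DataFrame({"w": [10, 30]}, index=["a", "c"])

left_join = left.join(right)                # calling frame's index: a, b
outer_join = left.join(right, how="outer")  # union of indexes: a, b, c
inner_join = left.join(right, how="inner")  # intersection of indexes: a
```

In the left join, label b has no partner in right, so its w value is missing.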
-
keys
()¶ Get the ‘info axis’ (see Indexing for more)
This is index for Series, columns for DataFrame and major_axis for Panel.
-
kurt
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased kurtosis over requested axis. Normalized by N-1
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
kurt : Series or DataFrame (if level specified)
-
kurtosis
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased kurtosis over requested axis. Normalized by N-1
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
kurt : Series or DataFrame (if level specified)
-
last
(offset)¶ Convenience method for subsetting final periods of time series data based on a date offset
offset : string, DateOffset, dateutil.relativedelta
ts.last(‘5M’) -> Last 5 months
subset : type of caller
-
last_valid_index
()¶ Return label for last non-NA/null value
-
le
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods le
-
load
(path)¶ Deprecated. Use read_pickle instead.
-
loc
¶
-
lookup
(row_labels, col_labels)¶ Label-based “fancy indexing” function for DataFrame. Given equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair.
- row_labels : sequence
- The row labels to use for lookup
- col_labels : sequence
- The column labels to use for lookup
Akin to:
result = []
for row, col in zip(row_labels, col_labels):
    result.append(df.get_value(row, col))
- values : ndarray
- The found values
-
lt
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods lt
-
mad
(axis=None, skipna=None, level=None, **kwargs)¶ Return the mean absolute deviation of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
mad : Series or DataFrame (if level specified)
-
mask
(cond)¶ Returns copy whose values are replaced with nan if the inverted condition is True
cond : boolean NDFrame or array
wh: same as input
-
max
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ This method returns the maximum of the values in the object. If you want the index of the maximum, use idxmax. This is the equivalent of the numpy.ndarray method argmax.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
max : Series or DataFrame (if level specified)
-
mean
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the mean of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
mean : Series or DataFrame (if level specified)
-
median
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the median of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
median : Series or DataFrame (if level specified)
-
merge
(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True)¶ Merge DataFrame objects by performing a database-style join operation by columns or indexes.
If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on.
- right : DataFrame
- how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’
- left: use only keys from left frame (SQL: left outer join)
- right: use only keys from right frame (SQL: right outer join)
- outer: use union of keys from both frames (SQL: full outer join)
- inner: use intersection of keys from both frames (SQL: inner join)
- on : label or list
- Field names to join on. Must be found in both DataFrames. If on is None and not merging on indexes, then it merges on the intersection of the columns by default.
- left_on : label or list, or array-like
- Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns
- right_on : label or list, or array-like
- Field names to join on in right DataFrame or vector/list of vectors per left_on docs
- left_index : boolean, default False
- Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels
- right_index : boolean, default False
- Use the index from the right DataFrame as the join key. Same caveats as left_index
- sort : boolean, default False
- Sort the join keys lexicographically in the result DataFrame
- suffixes : 2-length sequence (tuple, list, ...)
- Suffix to apply to overlapping column names in the left and right side, respectively
- copy : boolean, default True
- If False, do not copy data unnecessarily
>>> A              >>> B
    lkey value         rkey value
0   foo  1         0   foo  5
1   bar  2         1   bar  6
2   baz  3         2   qux  7
3   foo  4         3   bar  8
>>> merge(A, B, left_on='lkey', right_on='rkey', how='outer')
   lkey  value_x  rkey  value_y
0  bar   2        bar   6
1  bar   2        bar   8
2  baz   3        NaN   NaN
3  foo   1        foo   5
4  foo   4        foo   5
5  NaN   NaN      qux   7
merged : DataFrame
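As a runnable counterpart to the example above, the following sketch rebuilds the two frames and compares an outer with an inner join. It assumes the top-level `pandas.merge` function; the variable names are illustrative, and row order in the result may differ between pandas versions.

```python
import pandas as pd

# The A and B frames from the example above.
A = pd.DataFrame({"lkey": ["foo", "bar", "baz", "foo"], "value": [1, 2, 3, 4]})
B = pd.DataFrame({"rkey": ["foo", "bar", "qux", "bar"], "value": [5, 6, 7, 8]})

# Outer join keeps keys from both sides; unmatched cells become NaN.
merged = pd.merge(A, B, left_on="lkey", right_on="rkey", how="outer")

# Inner join keeps only keys present in both frames.
inner = pd.merge(A, B, left_on="lkey", right_on="rkey", how="inner")

# The overlapping column name "value" receives the default suffixes.
print(list(merged.columns))
print(len(merged), len(inner))
```

Because both frames carry a column named `value`, the default `suffixes=('_x', '_y')` disambiguate it in the result.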
-
merge_results
(others)¶ Merges results of type CleavageFragmentPredictionResult and returns the merged result
Parameters: others (list(CleavageFragmentPredictionResult) or CleavageFragmentPredictionResult) – A (list of) CleavageFragmentPredictionResult object(s)
Returns: new merged CleavageFragmentPredictionResult object
Return type: CleavageFragmentPredictionResult
-
min
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ This method returns the minimum of the values in the object. If you want the index of the minimum, use idxmin. This is the equivalent of the numpy.ndarray method argmin.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
min : Series or DataFrame (if level specified)
-
mod
(other, axis='columns', level=None, fill_value=None)¶ Binary operator mod with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
mode
(axis=0, numeric_only=False)¶ Gets the mode of each element along the axis selected. Empty if nothing has 2+ occurrences. Adds a row for each mode per label, fills in gaps with nan.
- axis : {0, 1, ‘index’, ‘columns’} (default 0)
- 0/’index’ : get mode of each column
- 1/’columns’ : get mode of each row
- numeric_only : boolean, default False
- if True, only apply to numeric columns
modes : DataFrame (sorted)
-
mul
(other, axis='columns', level=None, fill_value=None)¶ Binary operator mul with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
multiply
(other, axis='columns', level=None, fill_value=None)¶ Binary operator mul with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
ndim
¶ Number of axes / array dimensions
-
ne
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods ne
-
notnull
()¶ Return a boolean same-sized object indicating if the values are not null
-
pct_change
(periods=1, fill_method='pad', limit=None, freq=None, **kwds)¶ Percent change over given number of periods
- periods : int, default 1
- Periods to shift for forming percent change
- fill_method : str, default ‘pad’
- How to handle NAs before computing percent changes
- limit : int, default None
- The number of consecutive NAs to fill before stopping
- freq : DateOffset, timedelta, or offset alias string, optional
- Increment to use from time series API (e.g. ‘M’ or BDay())
chg : same type as caller
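A minimal sketch of what the percent change amounts to, assuming input without missing values so that the fill options play no role. The series below is hypothetical, and `manual` spells out the same calculation with `shift`.

```python
import pandas as pd

# Hypothetical price series without missing values.
prices = pd.Series([100.0, 110.0, 99.0, 99.0])

# Change relative to the value one period earlier.
chg = prices.pct_change(periods=1)

# The same quantity written out by hand: current / previous - 1.
manual = prices / prices.shift(1) - 1

print(chg)
```

The first element has no predecessor, so it is NaN in both results.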
-
pivot
(index=None, columns=None, values=None)¶ Reshape data (produce a “pivot” table) based on column values. Uses unique values from index / columns to form axes and return either DataFrame or Panel, depending on whether you request a single value column (DataFrame) or all columns (Panel)
- index : string or object
- Column name to use to make new frame’s index
- columns : string or object
- Column name to use to make new frame’s columns
- values : string or object, optional
- Column name to use for populating new frame’s values
For finer-tuned control, see hierarchical indexing documentation along with the related stack/unstack methods
>>> df
    foo   bar  baz
0   one   A    1.
1   one   B    2.
2   one   C    3.
3   two   A    4.
4   two   B    5.
5   two   C    6.
>>> df.pivot('foo', 'bar', 'baz')
     A   B   C
one  1   2   3
two  4   5   6
>>> df.pivot('foo', 'bar')['baz']
     A   B   C
one  1   2   3
two  4   5   6
- pivoted : DataFrame
- If no values column specified, will have hierarchically indexed columns
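The reshape above can be reproduced with keyword arguments, which is a reasonable choice when the code should run across pandas versions (an assumption on our part; the positional form in the example belongs to the release documented here).

```python
import pandas as pd

# The frame from the example above.
df = pd.DataFrame({
    "foo": ["one", "one", "one", "two", "two", "two"],
    "bar": ["A", "B", "C", "A", "B", "C"],
    "baz": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Unique values of "foo" become the index, unique values of "bar"
# become the columns, and "baz" fills the cells.
pivoted = df.pivot(index="foo", columns="bar", values="baz")
print(pivoted)
```

Each (index, columns) pair must occur at most once; for duplicated pairs an aggregating method such as pivot_table is the usual alternative.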
-
pivot_table
(data, values=None, rows=None, cols=None, aggfunc='mean', fill_value=None, margins=False, dropna=True)¶ Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame
- data : DataFrame
- values : column to aggregate, optional
- rows : list of column names or arrays to group on
- Keys to group on the x-axis of the pivot table
- cols : list of column names or arrays to group on
- Keys to group on the y-axis of the pivot table
- aggfunc : function, default numpy.mean, or list of functions
- If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)
- fill_value : scalar, default None
- Value to replace missing values with
- margins : boolean, default False
- Add all row / columns (e.g. for subtotal / grand totals)
- dropna : boolean, default True
- Do not include columns whose entries are all NaN
>>> df
   A   B   C      D
0  foo one small  1
1  foo one large  2
2  foo one large  2
3  foo two small  3
4  foo two small  3
5  bar one large  4
6  bar one small  5
7  bar two small  6
8  bar two large  7
>>> table = pivot_table(df, values='D', rows=['A', 'B'],
...                     cols=['C'], aggfunc=np.sum)
>>> table
          small  large
foo  one  1      4
     two  6      NaN
bar  one  5      4
     two  6      7
table : DataFrame
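The aggregation above can be sketched as follows. Note the assumption: later pandas releases renamed the `rows` and `cols` keywords to `index` and `columns`, and this sketch uses the newer names, so it will not run unchanged on the release documented here.

```python
import pandas as pd

# The frame from the example above.
df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# Group on (A, B) along the rows and on C along the columns,
# summing D within each cell. Assumption: newer keyword names.
table = pd.pivot_table(df, values="D", index=["A", "B"],
                       columns=["C"], aggfunc="sum")
print(table)
```

Cells with no contributing rows, such as (foo, two, large), come out as NaN unless `fill_value` is given.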
-
plot
(frame=None, x=None, y=None, subplots=False, sharex=True, sharey=False, use_index=True, figsize=None, grid=None, legend=True, rot=None, ax=None, style=None, title=None, xlim=None, ylim=None, logx=False, logy=False, xticks=None, yticks=None, kind='line', sort_columns=False, fontsize=None, secondary_y=False, **kwds)¶ Make line, bar, or scatter plots of DataFrame series with the index on the x-axis using matplotlib / pylab.
- frame : DataFrame
- x : label or position, default None
- y : label or position, default None
- Allows plotting of one column versus another
- subplots : boolean, default False
- Make separate subplots for each time series
- sharex : boolean, default True
- In case subplots=True, share x axis
- sharey : boolean, default False
- In case subplots=True, share y axis
- use_index : boolean, default True
- Use index as ticks for x axis
- stacked : boolean, default False
- If True, create stacked bar plot. Only valid for DataFrame input
- sort_columns: boolean, default False
- Sort column names to determine plot ordering
- title : string
- Title to use for the plot
- grid : boolean, default None (matlab style default)
- Axis grid lines
- legend : boolean, default True
- Place legend on axis subplots
- ax : matplotlib axis object, default None
- style : list or dict
- matplotlib line style per column
- kind : {‘line’, ‘bar’, ‘barh’, ‘kde’, ‘density’, ‘scatter’}
- bar : vertical bar plot
- barh : horizontal bar plot
- kde/density : Kernel Density Estimation plot
- scatter : scatter plot
- logx : boolean, default False
- For line plots, use log scaling on x axis
- logy : boolean, default False
- For line plots, use log scaling on y axis
- xticks : sequence
- Values to use for the xticks
- yticks : sequence
- Values to use for the yticks
- xlim : 2-tuple/list
- ylim : 2-tuple/list
- rot : int, default None
- Rotation for ticks
- secondary_y : boolean or sequence, default False
- Whether to plot on the secondary y-axis. If a list/tuple, which columns to plot on secondary y-axis
- mark_right: boolean, default True
- When using a secondary_y axis, should the legend label the axis of the various columns automatically
- colormap : str or matplotlib colormap object, default None
- Colormap to select colors from. If string, load colormap with that name from matplotlib.
- kwds : keywords
- Options to pass to matplotlib plotting method
ax_or_axes : matplotlib.AxesSubplot or list of them
-
pop
(item)¶ Return item and drop from frame. Raise KeyError if not found.
-
pow
(other, axis='columns', level=None, fill_value=None)¶ Binary operator pow with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
prod
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the product of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
prod : Series or DataFrame (if level specified)
-
product
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the product of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
prod : Series or DataFrame (if level specified)
-
quantile
(q=0.5, axis=0, numeric_only=True)¶ Return values at the given quantile over requested axis, a la scoreatpercentile in scipy.stats
- q : quantile, default 0.5 (50% quantile)
- 0 <= q <= 1
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
quantiles : Series
-
query
(expr, **kwargs)¶ Query the columns of a frame with a boolean expression.
- expr : string
- The query string to evaluate. The result of the evaluation of this expression is first passed to loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to __getitem__().
- kwargs : dict
- See the documentation for eval() for complete details on the keyword arguments accepted by query().
q : DataFrame or Series
This method uses the top-level eval() function to evaluate the passed query.
The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins, and and or. This is syntactically valid Python, however the semantics are different.
You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.
The index and columns attributes of the DataFrame instance are placed in the namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for this variable, and you can also use the name of the index to identify it in a query.
For further details and examples see the query documentation in indexing.
pandas.eval DataFrame.eval
>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.query('a > b')
>>> df[df.a > df.b]  # same result as the previous expression
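The precedence rule for `&` described above can be checked with a small sketch. The frame is hypothetical, and the sketch assumes the default parser, under which `&` binds as loosely as `and`.

```python
import pandas as pd

# Hypothetical frame with deterministic values.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [9, 4, 6, 2]})

# Inside query(), & has the precedence of `and`, so no parentheses
# are needed around the two comparisons.
by_query = df.query("a > 1 & b < 5")

# In plain Python, & binds tighter than the comparisons, so the
# parentheses are required to express the same filter.
by_mask = df[(df.a > 1) & (df.b < 5)]

print(by_query)
```

Both selections keep the rows where `a` exceeds 1 and `b` is below 5.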
-
radd
(other, axis='columns', level=None, fill_value=None)¶ Binary operator radd with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rank
(axis=0, numeric_only=None, method='average', na_option='keep', ascending=True)¶ Compute numerical data ranks (1 through n) along axis. Equal values are assigned a rank that is the average of the ranks of those values
- axis : {0, 1}, default 0
- Ranks over columns (0) or rows (1)
- numeric_only : boolean, default None
- Include only float, int, boolean data
- method : {‘average’, ‘min’, ‘max’, ‘first’}
- average: average rank of group
- min: lowest rank in group
- max: highest rank in group
- first: ranks assigned in order they appear in the array
- na_option : {‘keep’, ‘top’, ‘bottom’}
- keep: leave NA values where they are
- top: smallest rank if ascending
- bottom: smallest rank if descending
- ascending : boolean, default True
- False for ranks by high (1) to low (N)
ranks : DataFrame
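The tie-handling methods listed above differ only in how equal values share ranks. A small sketch with a hypothetical single-column frame:

```python
import pandas as pd

# Hypothetical scores with one tie (the two 10s).
df = pd.DataFrame({"score": [10, 20, 10, 30]})

avg = df.rank(method="average")   # tied values share the mean rank
low = df.rank(method="min")       # tied values take the lowest rank
first = df.rank(method="first")   # ties broken by order of appearance
desc = df.rank(ascending=False)   # rank 1 goes to the highest value

print(avg["score"].tolist())
```

With `method='average'` the two tied values occupy ranks 1 and 2 and therefore both receive 1.5.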
-
rdiv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
reindex
(index=None, columns=None, **kwargs)¶ Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False
- index, columns : array-like, optional (can be specified in order, or as keywords)
- New labels / index to conform to. Preferably an Index object to avoid duplicating data
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- Method to use for filling holes in reindexed DataFrame. pad / ffill: propagate last valid observation forward to next valid. backfill / bfill: use NEXT valid observation to fill gap
- copy : boolean, default True
- Return a new object, even if the passed indexes are the same
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
- fill_value : scalar, default np.NaN
- Value to use for missing values. Defaults to NaN, but can be any “compatible” value
- limit : int, default None
- Maximum size gap to forward or backward fill
- takeable : boolean, default False
- treat the passed as positional values
>>> df.reindex(index=[date1, date2, date3], columns=['A', 'B', 'C'])
reindexed : DataFrame
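The filling options above can be compared side by side. This is a minimal sketch with a hypothetical frame whose index has gaps; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical frame with labels 1 and 3 missing from the index.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0]}, index=[0, 2, 4])
new_index = [0, 1, 2, 3, 4]

plain = df.reindex(index=new_index)                   # gaps become NaN
padded = df.reindex(index=new_index, method="ffill")  # carry last value forward
zeros = df.reindex(index=new_index, fill_value=0.0)   # gaps become 0.0

print(padded["A"].tolist())
```

`method` needs a monotonic index to decide what the "last valid observation" is, whereas `fill_value` has no such requirement.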
-
reindex_axis
(labels, axis=0, method=None, level=None, copy=True, limit=None, fill_value=nan)¶ Conform input object to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False
- labels : array-like, optional
- New labels / index to conform to. Preferably an Index object to avoid duplicating data
- axis : {0, 1, ‘index’, ‘columns’}
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- Method to use for filling holes in reindexed object. pad / ffill: propagate last valid observation forward to next valid. backfill / bfill: use NEXT valid observation to fill gap
- copy : boolean, default True
- Return a new object, even if the passed indexes are the same
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
- limit : int, default None
- Maximum size gap to forward or backward fill
>>> df.reindex_axis(['A', 'B', 'C'], axis=1)
reindex, reindex_like
reindexed : DataFrame
-
reindex_like
(other, method=None, copy=True, limit=None)¶ Return an object whose indices match those of other
- other : Object
- method : string or None
- copy : boolean, default True
- limit : int, default None
- Maximum size gap to forward or backward fill
Like calling s.reindex(index=other.index, columns=other.columns, method=...)
reindexed : same as input
-
rename
(index=None, columns=None, **kwargs)¶ Alter axes using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.
- index, columns : dict-like or function, optional
- Transformation to apply to that axis values
- copy : boolean, default True
- Also copy underlying data
- inplace : boolean, default False
- Whether to return a new DataFrame. If True then value of copy is ignored.
renamed : DataFrame (new object)
-
rename_axis
(mapper, axis=0, copy=True, inplace=False)¶ Alter index and / or columns using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.
- mapper : dict-like or function, optional
- axis : int or string, default 0
- copy : boolean, default True
- Also copy underlying data
- inplace : boolean, default False
renamed : type of caller
-
reorder_levels
(order, axis=0)¶ Rearrange index levels using input order. May not drop or duplicate levels
- order : list of int or list of str
- List representing new level order. Reference level by number (position) or by key (label).
- axis : int
- Where to reorder levels.
type of caller (new object)
-
replace
(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None)¶ Replace values given in ‘to_replace’ with ‘value’.
to_replace : str, regex, list, dict, Series, numeric, or None
str or regex:
- str: string exactly matching to_replace will be replaced with value
- regex: regexs matching to_replace will be replaced with value
list of str, regex, or numeric:
- First, if to_replace and value are both lists, they must be the same length.
- Second, if regex=True then all of the strings in both lists will be interpreted as regexs otherwise they will match directly. This doesn’t matter much for value since there are only a few possible substitution regexes you can use.
- str and regex rules apply as above.
dict:
- Nested dictionaries, e.g., {‘a’: {‘b’: nan}}, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with nan. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.
- Keys map to column names and values map to substitution values. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
None:
- This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also None then this must be a nested dictionary or Series.
See the examples section for examples of each of these.
- value : scalar, dict, list, str, regex, default None
- Value to use to fill holes (e.g. 0), alternately a dict of values specifying which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.
- inplace : boolean, default False
- If True, in place. Note: this will modify any other views on this object (e.g. a column from a DataFrame). Returns the caller if this is True.
- limit : int, default None
- Maximum size gap to forward or backward fill
- regex : bool or same types as to_replace, default False
- Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Otherwise, to_replace must be None because this parameter will be interpreted as a regular expression or a list, dict, or array of regular expressions.
- method : string, optional, {‘pad’, ‘ffill’, ‘bfill’}
- The method to use for replacement when to_replace is a list.
NDFrame.reindex NDFrame.asfreq NDFrame.fillna
filled : NDFrame
- AssertionError
- If regex is not a bool and to_replace is not None.
- TypeError
- If to_replace is a dict and value is not a list, dict, ndarray, or Series
- If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.
- ValueError
- If to_replace and value are lists or ndarrays, but they are not the same length.
- Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.
- Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.
- This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
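Three of the to_replace forms described above are sketched below: a nested dictionary, a regular expression, and a pair of equal-length lists. The frame and replacement values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one string column and one numeric column.
df = pd.DataFrame({"a": ["b", "c", "bb"], "n": [1.5, 2.5, 3.5]})

# Nested dict: in column "a", replace the exact value "b" with NaN.
nested = df.replace({"a": {"b": np.nan}})

# Regex: replace strings made up only of "b" characters with "x".
# Numeric columns are untouched, since regexes only apply to strings.
by_regex = df.replace(to_replace=r"^b+$", value="x", regex=True)

# Two lists of equal length: values are replaced pairwise.
by_list = df.replace(["b", "c"], ["B", "C"])

print(by_regex)
```

Without `regex=True`, strings must match to_replace exactly, which is why "bb" survives the nested-dict and list forms.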
-
resample
(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0)¶ Convenience method for frequency conversion and resampling of regular time-series data.
- rule : string
- the offset string or object representing target conversion
- how : string
- method for down- or re-sampling, default to ‘mean’ for downsampling
- axis : int, optional, default 0
- fill_method : string, default None
- fill_method for upsampling
- closed : {‘right’, ‘left’}
- Which side of bin interval is closed
- label : {‘right’, ‘left’}
- Which bin edge label to label bucket with
- convention : {‘start’, ‘end’, ‘s’, ‘e’}
- kind : “period”/”timestamp”
- loffset : timedelta
- Adjust the resampled time labels
- limit : int, default None
- Maximum size gap when reindexing with fill_method
- base : int, default 0
- For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0
-
reset_index
(level=None, drop=False, inplace=False, col_level=0, col_fill='')¶ For DataFrame with multi-level index, return new DataFrame with labeling information in the columns under the index names, defaulting to ‘level_0’, ‘level_1’, etc. if any are None. For a standard index, the index name will be used (if set), otherwise a default ‘index’ or ‘level_0’ (if ‘index’ is already taken) will be used.
- level : int, str, tuple, or list, default None
- Only remove the given levels from the index. Removes all levels by default
- drop : boolean, default False
- Do not try to insert index into dataframe columns. This resets the index to the default integer index.
- inplace : boolean, default False
- Modify the DataFrame in place (do not create a new object)
- col_level : int or str, default 0
- If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.
- col_fill : object, default ‘’
- If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.
resetted : DataFrame
-
rfloordiv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rfloordiv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rmod
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rmod with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rmul
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rmul with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rpow
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rpow with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rsub
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rsub with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rtruediv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
save
(path)¶ Deprecated. Use to_pickle instead
-
select
(crit, axis=0)¶ Return data corresponding to axis labels matching criteria
- crit : function
- To be called on each index (label). Should return True or False
axis : int
selection : type of caller
-
set_index
(keys, drop=True, append=False, inplace=False, verify_integrity=False)¶ Set the DataFrame index (row labels) using one or more existing columns. By default yields a new object.
keys : column label or list of column labels / arrays
- drop : boolean, default True
- Delete columns to be used as the new index
- append : boolean, default False
- Whether to append columns to existing index
- inplace : boolean, default False
- Modify the DataFrame in place (do not create a new object)
- verify_integrity : boolean, default False
- Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method
>>> indexed_df = df.set_index(['A', 'B'])
>>> indexed_df2 = df.set_index(['A', [0, 1, 2, 0, 1, 2]])
>>> indexed_df3 = df.set_index([[0, 1, 2, 0, 1, 2]])
dataframe : DataFrame
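A small sketch of how `drop` and `verify_integrity` behave, using a made-up frame with columns A, B and C (the data is hypothetical and the snippet relies only on plain pandas):

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 2, 1], "C": [0.1, 0.2, 0.3]})

# Default: returns a new object and removes A and B from the columns.
indexed = df.set_index(["A", "B"])

# drop=False keeps the column as well as using it for the index.
kept = df.set_index("A", drop=False)

# Column A contains "x" twice, so asking for an integrity check fails.
try:
    df.set_index("A", verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc
```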
-
set_value
(index, col, value)¶ Put single value at passed column and index
index : row label
- col : column label
- value : scalar value
- frame : DataFrame
- If label pair is contained, will be reference to calling DataFrame, otherwise a new object
-
shape
¶
-
shift
(periods=1, freq=None, axis=0, **kwds)¶ Shift index by desired number of periods with an optional time freq
- periods : int
- Number of periods to move, can be positive or negative
- freq : DateOffset, timedelta, or time rule string, optional
- Increment to use from datetools module or time rule (e.g. ‘EOM’)
If freq is specified then the index values are shifted but the data is not realigned
shifted : same type as caller
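The difference between shifting the data and shifting the index can be seen in a short sketch. The daily index and the column name `v` are made up for illustration.

```python
import pandas as pd

idx = pd.date_range("2014-01-01", periods=3, freq="D")
df = pd.DataFrame({"v": [1.0, 2.0, 3.0]}, index=idx)

# Without freq: the values move down one row, the index stays put,
# and the first row becomes NaN.
moved_data = df.shift(1)

# With freq: the index labels move forward one day and each value
# stays attached to its (shifted) label.
moved_index = df.shift(1, freq="D")
```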
-
skew
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased skew over requested axis. Normalized by N-1.
axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
skew : Series or DataFrame (if level specified)
-
sort
(columns=None, column=None, axis=0, ascending=True, inplace=False)¶ Sort DataFrame either by labels (along either axis) or by the values in column(s)
- columns : object
- Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
- ascending : boolean or list, default True
- Sort ascending vs. descending. Specify list for multiple sort orders
- axis : {0, 1}
- Sort index/rows versus columns
- inplace : boolean, default False
- Sort the DataFrame without creating a new instance
>>> result = df.sort(['A', 'B'], ascending=[1, 0])
sorted : DataFrame
-
sort_index
(axis=0, by=None, ascending=True, inplace=False, kind='quicksort')¶ Sort DataFrame either by labels (along either axis) or by the values in a column
- axis : {0, 1}
- Sort index/rows versus columns
- by : object
- Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
- ascending : boolean or list, default True
- Sort ascending vs. descending. Specify list for multiple sort orders
- inplace : boolean, default False
- Sort the DataFrame without creating a new instance
>>> result = df.sort_index(by=['A', 'B'], ascending=[True, False])
sorted : DataFrame
-
sortlevel
(level=0, axis=0, ascending=True, inplace=False)¶ Sort multilevel index by chosen axis and primary level. Data will be lexicographically sorted by the chosen level followed by the other levels (in order)
level : int
- axis : {0, 1}
- ascending : boolean, default True
- inplace : boolean, default False
- Sort the DataFrame without creating a new instance
sorted : DataFrame
-
squeeze
()¶ squeeze length 1 dimensions
-
stack
(level=-1, dropna=True)¶ Pivot a level of the (possibly hierarchical) column labels, returning a DataFrame (or Series in the case of an object with a single level of column labels) having a hierarchical index with a new inner-most level of row labels.
- level : int, string, or list of these, default last level
- Level(s) to stack, can pass level name
- dropna : boolean, default True
- Whether to drop rows in the resulting Frame/Series with no valid values
>>> s
     a   b
one  1.  2.
two  3.  4.
>>> s.stack()
one  a    1
     b    2
two  a    3
     b    4
stacked : DataFrame or Series
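A runnable version of the same idea, as a minimal sketch in plain pandas (the frame `wide` is hypothetical): stacking a frame with a single level of column labels yields a Series whose inner-most index level holds the former column labels.

```python
import pandas as pd

wide = pd.DataFrame({"a": [1.0, 3.0], "b": [2.0, 4.0]}, index=["one", "two"])

# The column labels a and b become the inner-most level of the row index.
long = wide.stack()
```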
-
std
(axis=None, skipna=None, level=None, ddof=1, **kwargs)¶ Return unbiased standard deviation over requested axis. Normalized by N-1.
axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
stdev : Series or DataFrame (if level specified)
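The effect of `ddof` can be checked by hand on a tiny, made-up column. For the values 1, 2, 3, 4 the squared deviations from the mean sum to 5, so the default (`ddof=1`) divides by 3 and `ddof=0` divides by 4.

```python
import pandas as pd

df = pd.DataFrame({"v": [1.0, 2.0, 3.0, 4.0]})

# Default ddof=1: normalized by N-1 (sample standard deviation).
sample_std = df.std()

# ddof=0: normalized by N (population standard deviation).
population_std = df.std(ddof=0)
```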
-
sub
(other, axis='columns', level=None, fill_value=None)¶ Binary operator sub with support to substitute a fill_value for missing data in one of the inputs
other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
subtract
(other, axis='columns', level=None, fill_value=None)¶ Binary operator sub with support to substitute a fill_value for missing data in one of the inputs
other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
sum
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the sum of the values for the requested axis
axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
sum : Series or DataFrame (if level specified)
-
swapaxes
(axis1, axis2, copy=True)¶ Interchange axes and swap values axes appropriately
y : same as input
-
swaplevel
(i, j, axis=0)¶ Swap levels i and j in a MultiIndex on a particular axis
- i, j : int, string (can be mixed)
- Level of index to be swapped. Can pass level name as string.
swapped : type of caller (new object)
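A brief sketch of swapping levels by position and by name. The level names `first` and `second` and the data are made up for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one")],
    names=["first", "second"],
)
df = pd.DataFrame({"A": [1, 2, 3]}, index=index)

# Levels can be given by position ...
swapped = df.swaplevel(0, 1)

# ... or by name; both calls return a new object with the same content.
swapped_by_name = df.swaplevel("first", "second")
```

Only the order of the levels changes; the rows themselves are not re-sorted.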
-
tail
(n=5)¶ Returns last n rows
-
take
(indices, axis=0, convert=True, is_copy=True)¶ Analogous to ndarray.take
indices : list / array of ints
- axis : int, default 0
- convert : translate neg to pos indices (default)
- is_copy : mark the returned frame as a copy
taken : type of caller
-
to_clipboard
(excel=None, sep=None, **kwargs)¶ Attempt to write text representation of object to the system clipboard. This can be pasted into Excel, for example.
- excel : boolean, defaults to True
- if True, use the provided separator, writing in a csv format for allowing easy pasting into excel. if False, write a string representation of the object to the clipboard
sep : optional, defaults to tab
other keywords are passed to to_csv
- Requirements for your platform
- Linux: xclip, or xsel (with gtk or PyQt4 modules)
- Windows: none
- OS X: none
-
to_csv
(path_or_buf, sep=',', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, mode='w', nanRep=None, encoding=None, quoting=None, line_terminator='\n', chunksize=None, tupleize_cols=False, date_format=None, **kwds)¶ Write DataFrame to a comma-separated values (csv) file
- path_or_buf : string or file handle / StringIO
- File path
- sep : character, default ”,”
- Field delimiter for the output file.
- na_rep : string, default ‘’
- Missing data representation
- float_format : string, default None
- Format string for floating point numbers
- cols : sequence, optional
- Columns to write
- header : boolean or list of string, default True
- Write out column names. If a list of string is given it is assumed to be aliases for the column names
- index : boolean, default True
- Write row names (index)
- index_label : string or sequence, or False, default None
- Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R
- nanRep : None
- deprecated, use na_rep
- mode : str
- Python write mode, default ‘w’
- encoding : string, optional
- a string representing the encoding to use if the contents are non-ascii, for python versions prior to 3
- line_terminator : string, default ‘\n’
- The newline character or character sequence to use in the output file
- quoting : optional constant from csv module
- defaults to csv.QUOTE_MINIMAL
- chunksize : int or None
- rows to write at a time
- tupleize_cols : boolean, default False
- write MultiIndex columns as a list of tuples (if True) or in the new, expanded format (if False)
- date_format : string, default None
- Format string for datetime objects.
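As an illustration of `sep`, `na_rep` and `index`, the following sketch writes a small, made-up frame to an in-memory buffer instead of a file. It avoids `cols` and `nanRep`, whose names differ between pandas releases.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "score": [1.5, np.nan]})

# Any file-like object works as path_or_buf; StringIO keeps the text in memory.
buffer = io.StringIO()
df.to_csv(buffer, sep=";", na_rep="NA", index=False)
csv_text = buffer.getvalue()
```

The missing score is written as `NA`, and no row labels appear because `index=False`.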
-
to_dense
()¶ Return dense representation of NDFrame (as opposed to sparse)
-
to_dict
(outtype='dict')¶ Convert DataFrame to dictionary.
- outtype : str {‘dict’, ‘list’, ‘series’, ‘records’}
- Determines the type of the values of the dictionary. The default dict is a nested dictionary {column -> {index -> value}}. list returns {column -> list(values)}. series returns {column -> Series(values)}. records returns [{columns -> value}]. Abbreviations are allowed.
result : dict like {column -> {index -> value}}
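A minimal sketch of the default nested form `{column -> {index -> value}}`, on a made-up frame. Only the default call is used, because the keyword documented here as `outtype` was renamed `orient` in later pandas releases.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

# Default output: one inner dictionary per column, keyed by row label.
nested = df.to_dict()
```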
-
to_excel
(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True)¶ Write DataFrame to an Excel sheet
- excel_writer : string or ExcelWriter object
- File path or existing ExcelWriter
- sheet_name : string, default ‘Sheet1’
- Name of sheet which will contain DataFrame
- na_rep : string, default ‘’
- Missing data representation
- float_format : string, default None
- Format string for floating point numbers
- cols : sequence, optional
- Columns to write
- header : boolean or list of string, default True
- Write out column names. If a list of string is given it is assumed to be aliases for the column names
- index : boolean, default True
- Write row names (index)
- index_label : string or sequence, default None
- Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
- startrow :
- upper left cell row to dump data frame
- startcol :
- upper left cell column to dump data frame
- engine : string, default None
- write engine to use - you can also set this via the options io.excel.xlsx.writer, io.excel.xls.writer, and io.excel.xlsm.writer.
- merge_cells : boolean, default True
- Write MultiIndex and Hierarchical Rows as merged cells.
If passing an existing ExcelWriter object, then the sheet will be added to the existing workbook. This can be used to save different DataFrames to one workbook:
>>> writer = ExcelWriter('output.xlsx')
>>> df1.to_excel(writer,'Sheet1')
>>> df2.to_excel(writer,'Sheet2')
>>> writer.save()
-
to_gbq
(destination_table, schema=None, col_order=None, if_exists='fail', **kwargs)¶ Write a DataFrame to a Google BigQuery table.
If the table exists, the DataFrame will be appended. If not, a new table will be created, in which case the schema will have to be specified. By default, rows will be written in the order they appear in the DataFrame, though the user may specify an alternative order.
- destination_table : string
- name of table to be written, in the form ‘dataset.tablename’
- schema : sequence (optional)
- list of column types in order for data to be inserted, e.g. [‘INTEGER’, ‘TIMESTAMP’, ‘BOOLEAN’]
- col_order : sequence (optional)
- order which columns are to be inserted, e.g. [‘primary_key’, ‘birthday’, ‘username’]
- if_exists : {‘fail’, ‘replace’, ‘append’} (optional)
- fail: If table exists, do nothing.
- replace: If table exists, drop it, recreate it, and insert data.
- append: If table exists, insert data. Create if does not exist.
kwargs are passed to the Client constructor
- SchemaMissing :
- Raised if the ‘if_exists’ parameter is set to ‘replace’, but no schema is specified
- TableExists :
- Raised if the specified ‘destination_table’ exists but the ‘if_exists’ parameter is set to ‘fail’ (the default)
- InvalidSchema :
- Raised if the ‘schema’ parameter does not match the provided DataFrame
-
to_hdf
(path_or_buf, key, **kwargs)¶ activate the HDFStore
path_or_buf : the path (string) or buffer to put the store
- key : string
- identifier for the group in the store
- mode : optional, {‘a’, ‘w’, ‘r’, ‘r+’}, default ‘a’
'r'
- Read-only; no data can be modified.
'w'
- Write; a new file is created (an existing file with the same name would be deleted).
'a'
- Append; an existing file is opened for reading and writing, and if the file does not exist it is created.
'r+'
- It is similar to 'a', but the file must already exist.
- format : ‘fixed(f)|table(t)’, default is ‘fixed’
- fixed(f) : Fixed format
- Fast writing/reading. Not-appendable, nor searchable
- table(t) : Table format
- Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data
- append : boolean, default False
- For Table formats, append the input data to the existing
- complevel : int, 1-9, default 0
- If a complib is specified compression will be applied where possible
- complib : {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’, None}, default None
- If complevel is > 0 apply compression to objects written in the store wherever possible
- fletcher32 : bool, default False
- If applying compression use the fletcher32 checksum
-
to_html
(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, force_unicode=None, bold_rows=True, classes=None, escape=True, max_rows=None, max_cols=None, show_dimensions=False)¶ Render a DataFrame as an HTML table.
to_html-specific options:
- bold_rows : boolean, default True
- Make the row labels bold in the output
- classes : str or list or tuple, default None
- CSS class(es) to apply to the resulting html table
- escape : boolean, default True
- Convert the characters <, >, and & to HTML-safe sequences.
- max_rows : int, optional
- Maximum number of rows to show before truncating. If None, show all.
- max_cols : int, optional
- Maximum number of columns to show before truncating. If None, show all.
- frame : DataFrame
- object to render
- buf : StringIO-like, optional
- buffer to write to
- columns : sequence, optional
- the subset of columns to write; default None writes all columns
- col_space : int, optional
- the minimum width of each column
- header : bool, optional
- whether to print column labels, default True
- index : bool, optional
- whether to print index (row) labels, default True
- na_rep : string, optional
- string representation of NAN to use, default ‘NaN’
- formatters : list or dict of one-parameter functions, optional
- formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
- float_format : one-parameter function, optional
- formatter function to apply to columns’ elements if they are floats default None
- sparsify : bool, optional
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
- justify : {‘left’, ‘right’}, default None
- Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
- index_names : bool, optional
- Prints the names of the indexes, default True
- force_unicode : bool, default False
- Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.
formatted : string (or unicode, depending on data and options)
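The `escape` and `classes` options can be illustrated with a short sketch. The column name `seq` and its contents are made up; with no `buf` given, the rendered table is returned as a string.

```python
import pandas as pd

df = pd.DataFrame({"seq": ["A<B", "C&D"]})

# escape=True (the default) converts <, > and & to HTML-safe sequences,
# and classes adds a CSS class to the <table> element.
html = df.to_html(classes="results", escape=True, index=False)

# escape=False leaves cell contents untouched.
raw_html = df.to_html(escape=False, index=False)
```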
-
to_json
(path_or_buf=None, orient=None, date_format='epoch', double_precision=10, force_ascii=True, date_unit='ms', default_handler=None)¶ Convert the object to a JSON string.
Note NaN’s and None will be converted to null and datetime objects will be converted to UNIX timestamps.
- path_or_buf : the path or buffer to write the result string
- if this is None, return a StringIO of the converted string
orient : string
- Series
- default is ‘index’
- allowed values are: {‘split’,’records’,’index’}
- DataFrame
- default is ‘columns’
- allowed values are: {‘split’,’records’,’index’,’columns’,’values’}
- The format of the JSON string
- split : dict like {index -> [index], columns -> [columns], data -> [values]}
- records : list like [{column -> value}, ... , {column -> value}]
- index : dict like {index -> {column -> value}}
- columns : dict like {column -> {index -> value}}
- values : just the values array
- date_format : {‘epoch’, ‘iso’}
- Type of date conversion. epoch = epoch milliseconds, iso = ISO8601, default is epoch.
- double_precision : The number of decimal places to use when encoding floating point values, default 10.
- force_ascii : force encoded string to be ASCII, default True.
- date_unit : string, default ‘ms’ (milliseconds)
- The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.
- default_handler : callable, default None
- Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serialisable object.
same type as input object with filtered info axis
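The `orient` layouts listed above can be compared on a small, made-up frame. This sketch assumes that, with no path given, the call returns the JSON text directly (as recent pandas releases do), and parses it back with the standard `json` module.

```python
import json

import pandas as pd

df = pd.DataFrame({"A": [1, 2]}, index=["x", "y"])

# DataFrame default is 'columns': {column -> {index -> value}}
as_columns = json.loads(df.to_json())

# 'records': one dictionary per row, row labels are dropped
as_records = json.loads(df.to_json(orient="records"))

# 'split': index, columns and data stored separately
as_split = json.loads(df.to_json(orient="split"))
```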
-
to_latex
(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=True, force_unicode=None)¶ Render a DataFrame to a tabular environment table. You can splice this into a LaTeX document.
to_latex-specific options:
- bold_rows : boolean, default True
- Make the row labels bold in the output
- frame : DataFrame
- object to render
- buf : StringIO-like, optional
- buffer to write to
- columns : sequence, optional
- the subset of columns to write; default None writes all columns
- col_space : int, optional
- the minimum width of each column
- header : bool, optional
- whether to print column labels, default True
- index : bool, optional
- whether to print index (row) labels, default True
- na_rep : string, optional
- string representation of NAN to use, default ‘NaN’
- formatters : list or dict of one-parameter functions, optional
- formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
- float_format : one-parameter function, optional
- formatter function to apply to columns’ elements if they are floats default None
- sparsify : bool, optional
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
- justify : {‘left’, ‘right’}, default None
- Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
- index_names : bool, optional
- Prints the names of the indexes, default True
- force_unicode : bool, default False
- Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.
formatted : string (or unicode, depending on data and options)
-
to_msgpack
(path_or_buf=None, **kwargs)¶ msgpack (serialize) object to input file path
THIS IS AN EXPERIMENTAL LIBRARY and the storage format may not be stable until a future release.
- path : string File path, buffer-like, or None
- if None, return generated string
- append : boolean whether to append to an existing msgpack (default is False)
- compress : type of compressor (zlib or blosc), default to None (no compression)
-
to_panel
()¶ Transform long (stacked) format (DataFrame) into wide (3D, Panel) format.
Currently the index of the DataFrame must be a 2-level MultiIndex. This may be generalized later
panel : Panel
-
to_period
(freq=None, axis=0, copy=True)¶ Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed)
freq : string, default
- axis : {0, 1}, default 0
- The axis to convert (the index by default)
- copy : boolean, default True
- If False then underlying input data is not copied
ts : TimeSeries with PeriodIndex
-
to_pickle
(path)¶ Pickle (serialize) object to input file path
- path : string
- File path
-
to_records
(index=True, convert_datetime64=True)¶ Convert DataFrame to record array. Index will be put in the ‘index’ field of the record array if requested
- index : boolean, default True
- Include index in resulting record array, stored in ‘index’ field
- convert_datetime64 : boolean, default True
- Whether to convert the index to datetime.datetime if it is a DatetimeIndex
y : recarray
-
to_sparse
(fill_value=None, kind='block')¶ Convert to SparseDataFrame
fill_value : float, default NaN
- kind : {‘block’, ‘integer’}
y : SparseDataFrame
-
to_sql
(name, con, flavor='sqlite', if_exists='fail', **kwargs)¶ Write records stored in a DataFrame to a SQL database.
- name : str
- Name of SQL table
con : an open SQL database connection object
- flavor : {‘sqlite’, ‘mysql’, ‘oracle’}, default ‘sqlite’
- if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’
- fail: If table exists, do nothing.
- replace: If table exists, drop it, recreate it, and insert data.
- append: If table exists, insert data. Create if does not exist.
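A hedged sketch of the `replace` and `append` modes against an in-memory SQLite database. The table name `predictions` and the columns are hypothetical, and the sketch omits `flavor`, which recent pandas releases no longer accept.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
df = pd.DataFrame({"peptide": ["SYFPEITHI", "AAAKKK"], "score": [0.9, 0.1]})

# replace: drop the table if it exists, recreate it and insert the rows.
df.to_sql("predictions", con, if_exists="replace", index=False)

# append: insert the same rows again into the existing table.
df.to_sql("predictions", con, if_exists="append", index=False)

n_rows = con.execute("SELECT COUNT(*) FROM predictions").fetchone()[0]
con.close()
```

After one `replace` and one `append`, the table holds the two rows twice.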
-
to_stata
(fname, convert_dates=None, write_index=True, encoding='latin-1', byteorder=None)¶ A class for writing Stata binary dta files from array-like objects
- fname : file path or buffer
- Where to save the dta file.
- convert_dates : dict
- Dictionary mapping column of datetime types to the stata internal format that you want to use for the dates. Options are ‘tc’, ‘td’, ‘tm’, ‘tw’, ‘th’, ‘tq’, ‘ty’. Column can be either a number or a name.
- encoding : str
- Default is latin-1. Note that Stata does not support unicode.
- byteorder : str
- Can be “>”, “<”, “little”, or “big”. The default is None which uses sys.byteorder
>>> writer = StataWriter('./data_file.dta', data)
>>> writer.write_file()
Or with dates
>>> writer = StataWriter('./date_data_file.dta', data, {2 : 'tw'})
>>> writer.write_file()
-
to_string
(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, nanRep=None, index_names=True, justify=None, force_unicode=None, line_width=None, max_rows=None, max_cols=None, show_dimensions=False)¶ Render a DataFrame to a console-friendly tabular output.
- frame : DataFrame
- object to render
- buf : StringIO-like, optional
- buffer to write to
- columns : sequence, optional
- the subset of columns to write; default None writes all columns
- col_space : int, optional
- the minimum width of each column
- header : bool, optional
- whether to print column labels, default True
- index : bool, optional
- whether to print index (row) labels, default True
- na_rep : string, optional
- string representation of NAN to use, default ‘NaN’
- formatters : list or dict of one-parameter functions, optional
- formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
- float_format : one-parameter function, optional
- formatter function to apply to columns’ elements if they are floats default None
- sparsify : bool, optional
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
- justify : {‘left’, ‘right’}, default None
- Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
- index_names : bool, optional
- Prints the names of the indexes, default True
- force_unicode : bool, default False
- Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.
formatted : string (or unicode, depending on data and options)
-
to_timestamp
(freq=None, how='start', axis=0, copy=True)¶ Cast to DatetimeIndex of timestamps, at beginning of period
- freq : string, default frequency of PeriodIndex
- Desired frequency
- how : {‘s’, ‘e’, ‘start’, ‘end’}
- Convention for converting period to timestamp; start of period vs. end
- axis : {0, 1} default 0
- The axis to convert (the index by default)
- copy : boolean, default True
- If false then underlying input data is not copied
df : DataFrame with DatetimeIndex
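The round trip between `to_period` and `to_timestamp` might look like the sketch below. The dates and the column name `v` are made up; the monthly frequency string `"M"` is passed explicitly rather than inferred.

```python
import pandas as pd

idx = pd.to_datetime(["2014-01-15", "2014-02-15", "2014-03-15"])
df = pd.DataFrame({"v": [1, 2, 3]}, index=idx)

# DatetimeIndex -> PeriodIndex with monthly periods.
monthly = df.to_period("M")

# PeriodIndex -> DatetimeIndex, using the first instant of each period.
starts = monthly.to_timestamp(how="start")
```

The round trip is lossy: the day of the month is discarded once the index is converted to periods.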
-
to_wide
(*args, **kwargs)¶
-
transpose
()¶ Transpose index and columns
-
truediv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs
other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
truncate
(before=None, after=None, axis=None, copy=True)¶ Truncates a sorted NDFrame before and/or after some particular dates.
- before : date
- Truncate before date
- after : date
- Truncate after date
axis : the truncation axis, defaults to the stat axis
- copy : boolean, default is True,
- return a copy of the truncated section
truncated : type of caller
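A short sketch of truncating a sorted, date-indexed frame (hypothetical data). Both bounds are inclusive: rows before `before` and after `after` are dropped.

```python
import pandas as pd

idx = pd.date_range("2014-01-01", periods=5, freq="D")
df = pd.DataFrame({"v": [0, 1, 2, 3, 4]}, index=idx)

# Keep only the rows from 2 January through 4 January.
window = df.truncate(
    before=pd.Timestamp("2014-01-02"),
    after=pd.Timestamp("2014-01-04"),
)
```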
-
tshift
(periods=1, freq=None, axis=0, **kwds)¶ Shift the time index, using the index’s frequency if available
- periods : int
- Number of periods to move, can be positive or negative
- freq : DateOffset, timedelta, or time rule string, default None
- Increment to use from datetools module or time rule (e.g. ‘EOM’)
- axis : int or basestring
- Corresponds to the axis that contains the Index
If freq is not specified then tries to use the freq or inferred_freq attributes of the index. If neither of those attributes exist, a ValueError is thrown
shifted : NDFrame
-
tz_convert
(tz, axis=0, copy=True)¶ Convert TimeSeries to target time zone. If it is time zone naive, it will be localized to the passed time zone.
tz : string or pytz.timezone object
- copy : boolean, default True
- Also make a copy of the underlying data
-
tz_localize
(tz, axis=0, copy=True, infer_dst=False)¶ Localize tz-naive TimeSeries to target time zone
tz : string or pytz.timezone object
- copy : boolean, default True
- Also make a copy of the underlying data
- infer_dst : boolean, default False
- Attempt to infer fall dst-transition times based on order
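The usual pattern is to localize a tz-naive index first and convert afterwards. The sketch below assumes the time zone database is available on the system and uses made-up timestamps; it localizes explicitly rather than relying on `tz_convert` to localize naive data, since that behaviour is not the same in every pandas release.

```python
import pandas as pd

idx = pd.to_datetime(["2014-01-01 12:00", "2014-01-01 13:00"])
df = pd.DataFrame({"v": [1, 2]}, index=idx)

# Attach a time zone to the naive index without changing the wall-clock time.
utc = df.tz_localize("UTC")

# Express the same instants in another zone (UTC+1 in January).
berlin = utc.tz_convert("Europe/Berlin")
```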
-
unstack
(level=-1)¶ Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels. If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex)
- level : int, string, or list of these, default -1 (last level)
- Level(s) of index to unstack, can pass level name
DataFrame.pivot : Pivot a table based on column values.
DataFrame.stack : Pivot a level of the column labels (inverse operation from unstack).
>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
...                                    ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one  a   1
     b   2
two  a   3
     b   4
dtype: float64
>>> s.unstack(level=-1)
     a   b
one  1   2
two  3   4
>>> s.unstack(level=0)
   one  two
a  1    3
b  2    4
>>> df = s.unstack(level=0)
>>> df.unstack()
one  a  1.
     b  2.
two  a  3.
     b  4.
unstacked : DataFrame or Series
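One case the examples above do not show is an index in which some label combinations are absent. In the following sketch (made-up labels, with level names `ID` and `Pos` chosen for illustration) the missing combination appears as NaN in the unstacked frame.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("p1", 0), ("p1", 1), ("p2", 0)],
    names=["ID", "Pos"],
)
s = pd.Series([0.5, 0.7, 0.2], index=index)

# Pivot the Pos level into columns; ("p2", 1) has no entry in s.
wide = s.unstack("Pos")
```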
-
update
(other, join='left', overwrite=True, filter_func=None, raise_conflict=False)¶ Modify DataFrame in place using non-NA values from passed DataFrame. Aligns on indices
other : DataFrame, or object coercible into a DataFrame
- join : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
- overwrite : boolean, default True
- If True then overwrite values for common keys in the calling frame
- filter_func : callable(1d-array) -> 1d-array<boolean>, default None
- Can choose to replace values other than NA. Return True for values that should be updated
- raise_conflict : boolean
- If True, will raise an error if the DataFrame and other both contain data in the same place.
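A minimal sketch of the in-place behaviour, on made-up frames. NaN entries in `other` never overwrite existing values, and `overwrite=False` restricts the update to positions that are NaN in the calling frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0]})
other = pd.DataFrame({"A": [np.nan, 20.0, np.nan]})

# Modifies df in place and returns None; only the non-NA value is copied.
result = df.update(other)

# With overwrite=False, existing values are kept and only gaps are filled.
gappy = pd.DataFrame({"A": [1.0, np.nan]})
gappy.update(pd.DataFrame({"A": [9.0, 9.0]}), overwrite=False)
```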
-
values
¶ Numpy representation of NDFrame
-
var
(axis=None, skipna=None, level=None, ddof=1, **kwargs)¶ Return unbiased variance over requested axis. Normalized by N-1.
axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
variance : Series or DataFrame (if level specified)
-
where
(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)¶ Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
cond : boolean NDFrame or array
- other : scalar or NDFrame
- inplace : boolean, default False
- Whether to perform the operation in place on the data
- axis : alignment axis if needed, default None
- level : alignment level if needed, default None
- try_cast : boolean, default False
- try to cast the result back to the input type (if possible)
- raise_on_error : boolean, default True
- Whether to raise on invalid data types (e.g. trying to where on strings)
wh : same type as caller
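For illustration, the sketch below keeps the positive entries of a made-up frame and replaces the rest, first with the default NaN and then with an explicit `other`.

```python
import pandas as pd

df = pd.DataFrame({"a": [-1.0, 2.0], "b": [3.0, -4.0]})

# Entries where the condition is False become NaN by default.
masked = df.where(df > 0)

# Passing other supplies the replacement value instead.
clipped = df.where(df > 0, other=0.0)
```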
-
xs
(key, axis=0, level=None, copy=True, drop_level=True)¶ Returns a cross-section (row(s) or column(s)) from the Series/DataFrame. Defaults to cross-section on the rows (axis=0).
- key : object
- Some label contained in the index, or partially in a MultiIndex
- axis : int, default 0
- Axis to retrieve cross-section on
- level : object, defaults to first n levels (n=1 or len(key))
- In case of a key partially contained in a MultiIndex, indicate which levels are used. Levels can be referred by label or position.
- copy : boolean, default True
- Whether to make a copy of the data
- drop_level : boolean, default True
- If False, returns object with same levels as self.
>>> df
   A  B  C
a  4  5  2
b  4  0  9
c  9  7  3
>>> df.xs('a')
A    4
B    5
C    2
Name: a
>>> df.xs('C', axis=1)
a    2
b    9
c    3
Name: C
>>> s = df.xs('a', copy=False)
>>> s['A'] = 100
>>> df
     A  B  C
a  100  5  2
b    4  0  9
c    9  7  3
>>> df
                    A  B  C  D
first second third
bar   one    1      4  1  8  9
      two    1      7  5  5  0
baz   one    1      6  6  8  0
      three  2      5  3  5  3
>>> df.xs(('baz', 'three'))
       A  B  C  D
third
2      5  3  5  3
>>> df.xs('one', level=1)
             A  B  C  D
first third
bar   1      4  1  8  9
baz   1      6  6  8  0
>>> df.xs(('baz', 2), level=[0, 'third'])
        A  B  C  D
second
three   5  3  5  3
xs : Series or DataFrame
-
-
class
Fred2.Core.Result.
CleavageSitePredictionResult
(data=None, index=None, columns=None, dtype=None, copy=False)¶ Bases:
Fred2.Core.Result.AResult
A CleavageSitePredictionResult object is a pandas.DataFrame with multi-indexing, where the column IDs are the prediction scores of the different prediction methods as well as the amino acid at a specific position, and the row IDs are the Protein ID and the position in the sequence (starting at 0).
CleavageSitePredictionResult:
ID          Pos  Seq  Method_name
protein_ID  0    S    0.56
            1    Y    15
            2    F    0.36
            3    P    10
-
T
¶ Transpose index and columns
-
abs
()¶ Return an object with absolute value taken. Only applicable to objects that are all numeric
abs: type of caller
-
add
(other, axis='columns', level=None, fill_value=None)¶ Binary operator add with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
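The effect of fill_value is easiest to see next to the plain + operator. The following sketch assumes ordinary pandas.DataFrame objects with made-up labels and values.

```python
import numpy as np
import pandas as pd

# Hypothetical frames with partly overlapping row labels.
left = pd.DataFrame({"x": [1.0, np.nan]}, index=["a", "b"])
right = pd.DataFrame({"x": [np.nan, np.nan, 5.0]}, index=["a", "b", "c"])

# Plain addition: any missing operand makes the result missing.
plain = left + right

# With fill_value, a value missing on one side is treated as 0.
# Where both sides are missing (row "b"), the result stays missing.
filled = left.add(right, fill_value=0)

print(plain)
print(filled)
```

Row "c" exists only in right; the mismatched indices are unioned, and the absent left value is filled before adding.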
-
add_prefix
(prefix)¶ Concatenate prefix string with panel items names.
prefix : string
with_prefix : type of caller
-
add_suffix
(suffix)¶ Concatenate suffix string with panel items names
suffix : string
with_suffix : type of caller
-
align
(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0)¶ Align two objects on their axes with the specified join method for each axis Index
- other : DataFrame or Series
- join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
- axis : allowed axis of the other object, default None
- Align on index (0), columns (1), or both (None)
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
- copy : boolean, default True
- Always returns new objects. If copy=False and no reindexing is required then original objects are returned.
- fill_value : scalar, default np.NaN
- Value to use for missing values. Defaults to NaN, but can be any “compatible” value
- method : str, default None
- limit : int, default None
- fill_axis : {0, 1}, default 0
- Filling axis, method and limit
- (left, right) : (type of input, type of other)
- Aligned objects
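As an illustration of the join parameter, the sketch below aligns two small frames on their row index. It assumes plain pandas.DataFrame inputs; the labels p, q and r are invented.

```python
import pandas as pd

# Hypothetical frames sharing only the row label "q".
a = pd.DataFrame({"x": [1, 2]}, index=["p", "q"])
b = pd.DataFrame({"x": [3, 4]}, index=["q", "r"])

# Outer join: both results get the union of the row labels.
outer_a, outer_b = a.align(b, join="outer", axis=0)

# Inner join: both results keep only the shared row labels.
inner_a, inner_b = a.align(b, join="inner", axis=0)

print(outer_a)
print(outer_b)
print(inner_a)
```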
-
all
(axis=None, bool_only=None, skipna=True, level=None, **kwargs)¶ Return whether all elements are True over requested axis.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
- bool_only : boolean, default None
- Only include boolean data.
all : Series (or DataFrame if level specified)
-
any
(axis=None, bool_only=None, skipna=True, level=None, **kwargs)¶ Return whether any element is True over requested axis.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
- bool_only : boolean, default None
- Only include boolean data.
any : Series (or DataFrame if level specified)
-
append
(other, ignore_index=False, verify_integrity=False)¶ Append rows of other to the end of this frame, returning a new object. Columns not in this frame are added as new columns.
- other : DataFrame or list of Series/dict-like objects
- ignore_index : boolean, default False
- If True do not use the index labels. Useful for gluing together record arrays
- verify_integrity : boolean, default False
- If True, raise ValueError on creating index with duplicates
If a list of dict is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged
appended : DataFrame
-
apply
(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)¶ Applies function along input axis of DataFrame.
Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty.
- func : function
- Function to apply to each column/row
- axis : {0, 1}
- 0 : apply function to each column
- 1 : apply function to each row
- broadcast : boolean, default False
- For aggregation functions, return object of same size with values propagated
- reduce : boolean or None, default None
- Try to apply reduction procedures. If the DataFrame is empty, apply will use reduce to determine whether the result should be a Series or a DataFrame. If reduce is None (the default), apply’s return value will be guessed by calling func on an empty Series (note: while guessing, exceptions raised by func will be ignored). If reduce is True a Series will always be returned, and if False a DataFrame will always be returned.
- raw : boolean, default False
- If False, convert each row or column into a Series. If raw=True the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance
- args : tuple
- Positional arguments to pass to function in addition to the array/series
Additional keyword arguments will be passed as keywords to the function
>>> df.apply(numpy.sqrt) # returns DataFrame
>>> df.apply(numpy.sum, axis=0) # equiv to df.sum(0)
>>> df.apply(numpy.sum, axis=1) # equiv to df.sum(1)
DataFrame.applymap: For elementwise operations
applied : Series or DataFrame
-
applymap
(func)¶ Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame
- func : function
- Python function, returns a single value from a single value
applied : DataFrame
DataFrame.apply : For operations on rows/columns
-
as_blocks
(columns=None)¶ Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype. Columns are presented in sorted order unless a specific list of columns is provided.
- NOTE: the dtypes of the blocks WILL BE PRESERVED HERE (unlike in as_matrix)
- columns : array-like
- Specific column order
values : a list of Object
-
as_matrix
(columns=None)¶ Convert the frame to its Numpy-array matrix representation. Columns are presented in sorted order unless a specific list of columns is provided.
- NOTE: the dtype will be a lowest-common-denominator dtype (implicit upcasting); that is to say, if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.
- e.g. if the dtypes are float16, float32 -> float32
- float16, float32, float64 -> float64
- int32, uint8 -> int32
- values : ndarray
- If the caller is heterogeneous and contains booleans or objects, the result will be of dtype=object
-
asfreq
(freq, method=None, how=None, normalize=False)¶ Convert all TimeSeries inside to specified frequency using DateOffset objects. Optionally provide fill method to pad/backfill missing values.
- freq : DateOffset object, or string
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}
- Method to use for filling holes in reindexed Series. pad / ffill: propagate last valid observation forward to next valid. backfill / bfill: use NEXT valid observation to fill gap
- how : {‘start’, ‘end’}, default end
- For PeriodIndex only, see PeriodIndex.asfreq
- normalize : bool, default False
- Whether to reset output index to midnight
converted : type of caller
-
astype
(dtype, copy=True, raise_on_error=True)¶ Cast object to input numpy.dtype. Return a copy when copy = True (be really careful with this!)
- dtype : numpy.dtype or Python type
- raise_on_error : raise on invalid input
casted : type of caller
-
at
¶
-
at_time
(time, asof=False)¶ Select values at particular time of day (e.g. 9:30AM)
time : datetime.time or string
values_at_time : type of caller
-
axes
¶
-
between_time
(start_time, end_time, include_start=True, include_end=True)¶ Select values between particular times of the day (e.g., 9:00-9:30 AM)
- start_time : datetime.time or string
- end_time : datetime.time or string
- include_start : boolean, default True
- include_end : boolean, default True
values_between_time : type of caller
-
bfill
(axis=0, inplace=False, limit=None, downcast=None)¶ Synonym for NDFrame.fillna(method=’bfill’)
-
blocks
¶ Internal property, property synonym for as_blocks()
-
bool
()¶ Return the bool of a single-element PandasObject. This must be a boolean scalar value, either True or False.
Raises a ValueError if the PandasObject does not have exactly 1 element, or if that element is not boolean
-
boxplot
(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, **kwds)¶ Make a box plot from DataFrame column/columns optionally grouped (stratified) by one or more columns
- data : DataFrame
- column : column names or list of names, or vector
- Can be any valid input to groupby
- by : string or sequence
- Column in the DataFrame to group by
- ax : matplotlib axis object, default None
- fontsize : int or string
- rot : int, default None
- Rotation for ticks
- grid : boolean, default None (matlab style default)
- Axis grid lines
ax : matplotlib.axes.AxesSubplot
-
clip
(lower=None, upper=None, out=None)¶ Trim values at input threshold(s)
- lower : float, default None
- upper : float, default None
clipped : Series
-
clip_lower
(threshold)¶ Return copy of the input with values below given value truncated
clip
clipped : same type as input
-
clip_upper
(threshold)¶ Return copy of input with values above given value truncated
clip
clipped : same type as input
-
combine
(other, func, fill_value=None, overwrite=True)¶ Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)
- other : DataFrame
- func : function
- fill_value : scalar value
- overwrite : boolean, default True
- If True then overwrite values for common keys in the calling frame
result : DataFrame
-
combineAdd
(other)¶ Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)
other : DataFrame
DataFrame
-
combineMult
(other)¶ Multiply two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)
other : DataFrame
DataFrame
-
combine_first
(other)¶ Combine two DataFrame objects and default to non-null values in frame calling the method. Result index columns will be the union of the respective indexes and columns
other : DataFrame
a’s values prioritized, use values from b to fill holes:
>>> a.combine_first(b)
combined : DataFrame
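To make "a's values prioritized, use values from b to fill holes" concrete, here is a minimal sketch with plain pandas.DataFrame objects. The frames a and b and their contents are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: `a` has a hole, `b` has an extra column "y".
a = pd.DataFrame({"x": [1.0, np.nan]}, index=[0, 1])
b = pd.DataFrame({"x": [9.0, 2.0], "y": [5.0, 6.0]}, index=[0, 1])

# Values from `a` win; holes in `a` are filled from `b`, and the
# result covers the union of both frames' rows and columns.
combined = a.combine_first(b)

print(combined)
```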
-
compound
(axis=None, skipna=None, level=None, **kwargs)¶ Return the compound percentage of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
compounded : Series or DataFrame (if level specified)
-
consolidate
(inplace=False)¶ Compute NDFrame with “consolidated” internals (data of each dtype grouped together in a single ndarray). Mainly an internal API function, but available here to the savvy user
- inplace : boolean, default False
- If False return new object, otherwise modify existing object
consolidated : type of caller
-
convert_objects
(convert_dates=True, convert_numeric=False, convert_timedeltas=True, copy=True)¶ Attempt to infer better dtype for object columns
- convert_dates : if True, attempt to soft convert dates, if ‘coerce’,
- force conversion (and non-convertibles get NaT)
- convert_numeric : if True attempt to coerce to numbers (including
- strings), non-convertibles get NaN
- convert_timedeltas : if True, attempt to soft convert timedeltas, if ‘coerce’,
- force conversion (and non-convertibles get NaT)
copy : Boolean, if True, return copy, default is True
converted : same as input object
-
copy
(deep=True)¶ Make a copy of this object
- deep : boolean, default True
- Make a deep copy, i.e. also copy data
copy : type of caller
-
corr
(method='pearson', min_periods=1)¶ Compute pairwise correlation of columns, excluding NA/null values
- method : {‘pearson’, ‘kendall’, ‘spearman’}
- pearson : standard correlation coefficient
- kendall : Kendall Tau correlation coefficient
- spearman : Spearman rank correlation
- min_periods : int, optional
- Minimum number of observations required per pair of columns to have a valid result. Currently only available for pearson and spearman correlation
y : DataFrame
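The difference between the correlation methods shows up on monotone but non-linear data. The sketch below assumes a plain pandas.DataFrame with invented columns x and y; it leaves out ‘kendall’, which in recent pandas releases relies on SciPy being installed.

```python
import pandas as pd

# Hypothetical data: y grows monotonically with x, but not linearly.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [1.0, 4.0, 9.0, 16.0]})

# Pearson measures linear association, so it is slightly below 1 here.
pearson = df.corr(method="pearson")

# Spearman works on ranks, so any strictly increasing relation scores 1.
spearman = df.corr(method="spearman")

print(pearson.loc["x", "y"])
print(spearman.loc["x", "y"])
```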
-
corrwith
(other, axis=0, drop=False)¶ Compute pairwise correlation between rows or columns of two DataFrame objects.
- other : DataFrame
- axis : {0, 1}
- 0 to compute column-wise, 1 for row-wise
- drop : boolean, default False
- Drop missing indices from result, default returns union of all
correls : Series
-
count
(axis=0, level=None, numeric_only=False)¶ Return Series with number of non-NA/null observations over requested axis. Works with non-floating point data as well (detects NaN and None)
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
- numeric_only : boolean, default False
- Include only float, int, boolean data
count : Series (or DataFrame if level specified)
-
cov
(min_periods=None)¶ Compute pairwise covariance of columns, excluding NA/null values
- min_periods : int, optional
- Minimum number of observations required per pair of columns to have a valid result.
y : DataFrame
y contains the covariance matrix of the DataFrame’s time series. The covariance is normalized by N-1 (unbiased estimator).
-
cummax
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative max over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
max : Series
-
cummin
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative min over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
min : Series
-
cumprod
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative prod over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
prod : Series
-
cumsum
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative sum over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
sum : Series
-
delevel
(*args, **kwargs)¶
-
describe
(percentile_width=50)¶ Generate various summary statistics of each column, excluding NaN values. These include: count, mean, std, min, max, and lower%/50%/upper% percentiles
- percentile_width : float, optional
- width of the desired uncertainty interval, default is 50, which corresponds to lower=25, upper=75
DataFrame of summary statistics
-
diff
(periods=1)¶ 1st discrete difference of object
- periods : int, default 1
- Periods to shift for forming difference
diffed : DataFrame
-
div
(other, axis='columns', level=None, fill_value=None)¶ Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
divide
(other, axis='columns', level=None, fill_value=None)¶ Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
dot
(other)¶ Matrix multiplication with DataFrame or Series objects
other : DataFrame or Series
dot_product : DataFrame or Series
-
drop
(labels, axis=0, level=None, inplace=False, **kwargs)¶ Return new object with labels in requested axis removed
- labels : single label or list-like
- axis : int or axis name
- level : int or name, default None
- For MultiIndex
- inplace : bool, default False
- If True, do operation inplace and return None.
dropped : type of caller
-
drop_duplicates
(cols=None, take_last=False, inplace=False)¶ Return DataFrame with duplicate rows removed, optionally only considering certain columns
- cols : column label or sequence of labels, optional
- Only consider certain columns for identifying duplicates, by default use all of the columns
- take_last : boolean, default False
- Take the last observed row in a row. Defaults to the first row
- inplace : boolean, default False
- Whether to drop duplicates in place or to return a copy
deduplicated : DataFrame
-
dropna
(axis=0, how='any', thresh=None, subset=None, inplace=False)¶ Return object with labels on given axis omitted where alternately any or all of the data are missing
- axis : {0, 1}, or tuple/list thereof
- Pass tuple or list to drop on multiple axes
- how : {‘any’, ‘all’}
- any : if any NA values are present, drop that label
- all : if all values are NA, drop that label
- thresh : int, default None
- int value : require that many non-NA values
- subset : array-like
- Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include
- inplace : boolean, default False
- If True, do operation inplace and return None.
dropped : DataFrame
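The how, thresh and subset options select rows in different ways. This sketch runs each of them on the same hypothetical frame, again assuming a plain pandas.DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is complete, row 1 has one NA, row 2 has two.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [1.0, 2.0, np.nan],
    "c": [1.0, 2.0, 3.0],
})

any_rows = df.dropna(how="any")        # drop rows containing any NA
all_rows = df.dropna(how="all")        # drop rows that are entirely NA
thresh_rows = df.dropna(thresh=2)      # keep rows with at least 2 non-NA values
subset_rows = df.dropna(subset=["b"])  # only look at column "b"

print(any_rows.index.tolist())
print(all_rows.index.tolist())
print(thresh_rows.index.tolist())
print(subset_rows.index.tolist())
```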
-
dtypes
¶ Return the dtypes in this object
-
duplicated
(cols=None, take_last=False)¶ Return boolean Series denoting duplicate rows, optionally only considering certain columns
- cols : column label or sequence of labels, optional
- Only consider certain columns for identifying duplicates, by default use all of the columns
- take_last : boolean, default False
- Take the last observed row in a row. Defaults to the first row
duplicated : Series
-
empty
¶ True if NDFrame is entirely empty [no items]
-
eq
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods eq
-
equals
(other)¶ Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.
-
eval
(expr, **kwargs)¶ Evaluate an expression in the context of the calling DataFrame instance.
- expr : string
- The expression string to evaluate.
- kwargs : dict
- See the documentation for eval() for complete details on the keyword arguments accepted by query().
ret : ndarray, scalar, or pandas object
pandas.DataFrame.query pandas.eval
For more details see the API documentation for eval(). For detailed examples see enhancing performance with eval.
>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.eval('a + b')
>>> df.eval('c=a + b')
-
ffill
(axis=0, inplace=False, limit=None, downcast=None)¶ Synonym for NDFrame.fillna(method=’ffill’)
-
fillna
(value=None, method=None, axis=0, inplace=False, limit=None, downcast=None)¶ Fill NA/NaN values using the specified method
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- Method to use for filling holes in reindexed Series pad / ffill: propagate last valid observation forward to next valid backfill / bfill: use NEXT valid observation to fill gap
- value : scalar, dict, or Series
- Value to use to fill holes (e.g. 0), alternately a dict/Series of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series will not be filled). This value cannot be a list.
- axis : {0, 1}, default 0
- 0: fill column-by-column
- 1: fill row-by-row
- inplace : boolean, default False
- If True, fill in place. Note: this will modify any other views on this object, (e.g. a no-copy slice for a column in a DataFrame).
- limit : int, default None
- Maximum size gap to forward or backward fill
- downcast : dict, default is None
- a dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible)
reindex, asfreq
filled : same type as caller
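The filling strategies can be compared side by side. The sketch below assumes a plain pandas.DataFrame with invented data and uses ffill() and bfill(), which this reference documents as synonyms for the method-based form, because newer pandas releases discourage passing method= to fillna.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a two-element gap.
gaps = pd.DataFrame({"a": [1.0, np.nan, np.nan, 4.0]})

# Replace every NA with a constant.
constant = gaps.fillna(0.0)

# pad / ffill: propagate the last valid observation forward.
forward = gaps.ffill()

# backfill / bfill with limit=1: fill at most one NA per gap, backwards.
backward_one = gaps.bfill(limit=1)

print(constant["a"].tolist())
print(forward["a"].tolist())
print(backward_one["a"].tolist())
```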
-
filter
(items=None, like=None, regex=None, axis=None)¶ Restrict the info axis to set of items or wildcard
- items : list-like
- List of info axis to restrict to (must not all be present)
- like : string
- Keep info axis where “arg in col == True”
- regex : string (regular expression)
- Keep info axis with re.search(regex, col) == True
Arguments are mutually exclusive, but this is not checked for
-
filter_result
(expressions)¶ Filters a result data frame based on a specified expression consisting of a list of triples (method_name, comparator, threshold). The expression is applied to each row. If any of the columns fulfills the criteria, the row remains.
Parameters: expressions (list((str,comparator,float))) – A list of triples consisting of (method_name, comparator, threshold)
Returns: A new filtered result object
Return type: CleavageSitePredictionResult
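The row-keeping rule described above can be sketched in plain pandas. This is not the FRED2 implementation: the helper filter_rows, the method columns method_a and method_b, their scores, and the use of functions from the operator module as comparators are all assumptions made for illustration.

```python
import operator

import pandas as pd

# Hypothetical result table shaped like the layout described for this class:
# rows indexed by (ID, Pos), one column per prediction method plus "Seq".
index = pd.MultiIndex.from_tuples(
    [("protein_ID", 0), ("protein_ID", 1), ("protein_ID", 2), ("protein_ID", 3)],
    names=["ID", "Pos"],
)
result = pd.DataFrame(
    {
        "Seq": list("SYFP"),
        "method_a": [0.56, 0.15, 0.36, 0.10],
        "method_b": [0.20, 0.95, 0.30, 0.10],
    },
    index=index,
)


def filter_rows(frame, expressions):
    """Keep a row if at least one (method_name, comparator, threshold) holds."""
    keep = pd.Series(False, index=frame.index)
    for method_name, comparator, threshold in expressions:
        # Assumed comparator form: a binary function such as operator.ge.
        keep |= comparator(frame[method_name], threshold)
    return frame[keep]


filtered = filter_rows(
    result,
    [("method_a", operator.ge, 0.5), ("method_b", operator.ge, 0.9)],
)
print(filtered)
```

Position 0 passes through method_a and position 1 through method_b; the other two rows satisfy neither triple and are dropped.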
-
first
(offset)¶ Convenience method for subsetting initial periods of time series data based on a date offset
offset : string, DateOffset, dateutil.relativedelta
ts.first('10D') -> First 10 days
subset : type of caller
-
first_valid_index
()¶ Return label for first non-NA/null value
-
floordiv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator floordiv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
from_csv
(path, header=0, sep=', ', index_col=0, parse_dates=True, encoding=None, tupleize_cols=False, infer_datetime_format=False)¶ Read delimited file into DataFrame
- path : string file path or file handle / StringIO
- header : int, default 0
- Row to use as header (skip prior rows)
- sep : string, default ‘,’
- Field delimiter
- index_col : int or sequence, default 0
- Column to use for index. If a sequence is given, a MultiIndex is used. Different default from read_table
- parse_dates : boolean, default True
- Parse dates. Different default from read_table
- tupleize_cols : boolean, default False
- Write multi_index columns as a list of tuples (if True) or in the new, expanded format (if False)
- infer_datetime_format: boolean, default False
- If True and parse_dates is True for a column, try to infer the datetime format based on the first datetime string. If the format can be inferred, there often will be a large parsing speed-up.
Preferable to use read_table for most general purposes but from_csv makes for an easy roundtrip to and from file, especially with a DataFrame of time series data
y : DataFrame
-
from_dict
(data, orient='columns', dtype=None)¶ Construct DataFrame from dict of array-like or dicts
- data : dict
- {field : array-like} or {field : dict}
- orient : {‘columns’, ‘index’}, default ‘columns’
- The “orientation” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.
DataFrame
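The orient parameter decides whether dictionary keys become columns or rows. Below is a small sketch with an invented dictionary, assuming the standard pandas.DataFrame.from_dict classmethod.

```python
import pandas as pd

# Hypothetical mapping from method name to a list of scores.
data = {"m1": [0.1, 0.2], "m2": [0.3, 0.4]}

# Default orient='columns': each key becomes a column.
by_columns = pd.DataFrame.from_dict(data)

# orient='index': each key becomes a row label.
by_index = pd.DataFrame.from_dict(data, orient="index")

print(by_columns)
print(by_index)
```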
-
from_items
(items, columns=None, orient='columns')¶ Convert (key, value) pairs to DataFrame. The keys will be the axis index (usually the columns, but depends on the specified orientation). The values should be arrays or Series.
- items : sequence of (key, value) pairs
- Values should be arrays or Series.
- columns : sequence of column labels, optional
- Must be passed if orient=’index’.
- orient : {‘columns’, ‘index’}, default ‘columns’
- The “orientation” of the data. If the keys of the input correspond to column labels, pass ‘columns’ (default). Otherwise if the keys correspond to the index, pass ‘index’.
frame : DataFrame
-
from_records
(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)¶ Convert structured or record ndarray to DataFrame
- data : ndarray (structured dtype), list of tuples, dict, or DataFrame
- index : string, list of fields, array-like
- Field of array to use as the index, alternately a specific set of input labels to use
- exclude : sequence, default None
- Columns or fields to exclude
- columns : sequence, default None
- Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns)
- coerce_float : boolean, default False
- Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets
df : DataFrame
-
ftypes
¶ Return the ftypes (indication of sparse/dense and dtype) in this object.
-
ge
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods ge
-
get
(key, default=None)¶ Get item from object for given key (DataFrame column, Panel slice, etc.). Returns default value if not found
key : object
value : type of items contained in object
-
get_dtype_counts
()¶ Return the counts of dtypes in this object
-
get_ftype_counts
()¶ Return the counts of ftypes in this object
-
get_value
(index, col)¶ Quickly retrieve single value at passed column and index
index : row label col : column label
value : scalar value
-
get_values
()¶ same as values (but handles sparseness conversions)
-
groupby
(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False)¶ Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns
- by : mapping function / list of functions, dict, Series, or tuple / list of column names
- Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups
- axis : int, default 0
- level : int, level name, or sequence of such, default None
- If the axis is a MultiIndex (hierarchical), group by a particular level or levels
- as_index : boolean, default True
- For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output
- sort : boolean, default True
- Sort group keys. Get better performance by turning this off
- group_keys : boolean, default True
- When calling apply, add group keys to index to identify pieces
- squeeze : boolean, default False
- Reduce the dimensionality of the return type if possible, otherwise return a consistent type
# DataFrame result
>>> data.groupby(func, axis=0).mean()
# DataFrame result
>>> data.groupby(['col1', 'col2'])['col3'].mean()
# DataFrame with hierarchical index
>>> data.groupby(['col1', 'col2']).mean()
GroupBy object
-
gt
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods gt
-
head
(n=5)¶ Returns first n rows
-
hist
(data, column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, **kwds)¶ Draw histogram of the DataFrame’s series using matplotlib / pylab.
- data : DataFrame
- column : string or sequence
- If passed, will be used to limit data to a subset of columns
- by : object, optional
- If passed, then used to form histograms for separate groups
- grid : boolean, default True
- Whether to show axis grid lines
- xlabelsize : int, default None
- If specified changes the x-axis label size
- xrot : float, default None
- rotation of x axis labels
- ylabelsize : int, default None
- If specified changes the y-axis label size
- yrot : float, default None
- rotation of y axis labels
- ax : matplotlib axes object, default None
- sharex : bool, if True, the X axis will be shared amongst all subplots.
- sharey : bool, if True, the Y axis will be shared amongst all subplots.
- figsize : tuple
- The size of the figure to create in inches by default
- layout : (optional) a tuple (rows, columns) for the layout of the histograms
- kwds : other plotting keyword arguments
- To be passed to hist function
-
iat
¶
-
icol
(i)¶
-
idxmax
(axis=0, skipna=True)¶ Return index of first occurrence of maximum over requested axis. NA/null values are excluded.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be first index.
idxmax : Series
This method is the DataFrame version of ndarray.argmax.
Series.idxmax
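How the axis argument changes the result of idxmax is shown in the sketch below. It assumes a plain pandas.DataFrame; the row labels p0 to p2 and the method columns are made up.

```python
import pandas as pd

# Hypothetical scores of two methods at three positions.
df = pd.DataFrame(
    {"m1": [0.1, 0.9, 0.4], "m2": [0.7, 0.2, 0.7]},
    index=["p0", "p1", "p2"],
)

# axis=0: for each column, the row label of its maximum.
# Ties resolve to the first occurrence (m2 peaks at p0 and p2).
best_row_per_column = df.idxmax(axis=0)

# axis=1: for each row, the column label of its maximum.
best_column_per_row = df.idxmax(axis=1)

print(best_row_per_column)
print(best_column_per_row)
```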
-
idxmin
(axis=0, skipna=True)¶ Return index of first occurrence of minimum over requested axis. NA/null values are excluded.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
idxmin : Series
This method is the DataFrame version of ndarray.argmin.
Series.idxmin
-
iget_value
(i, j)¶
-
iloc
¶
-
info
(verbose=True, buf=None, max_cols=None)¶ Concise summary of a DataFrame.
- verbose : boolean, default True
- If False, don’t print column count summary
- buf : writable buffer, defaults to sys.stdout
- max_cols : int, default None
- Determines whether full summary or short summary is printed
-
insert
(loc, column, value, allow_duplicates=False)¶ Insert column into DataFrame at specified location.
If allow_duplicates is False, raises Exception if column is already contained in the DataFrame.
- loc : int
- Must have 0 <= loc <= len(columns)
- column : object
- value : int, Series, or array-like
-
interpolate
(method='linear', axis=0, limit=None, inplace=False, downcast='infer', **kwargs)¶ Interpolate values according to different methods.
- method : {‘linear’, ‘time’, ‘values’, ‘index’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘krogh’, ‘polynomial’, ‘spline’, ‘piecewise_polynomial’, ‘pchip’}
- ‘linear’: ignore the index and treat the values as equally spaced. default
- ‘time’: interpolation works on daily and higher resolution data to interpolate given length of interval
- ‘index’: use the actual numerical values of the index
- ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘polynomial’: passed to scipy.interpolate.interp1d with the order given. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=4)
- ‘krogh’, ‘piecewise_polynomial’, ‘spline’, and ‘pchip’ are all wrappers around the scipy interpolation methods of similar names. See the scipy documentation for more on their behavior: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
- axis : {0, 1}, default 0
- 0: fill column-by-column
- 1: fill row-by-row
- limit : int, default None.
- Maximum number of consecutive NaNs to fill.
- inplace : bool, default False
- Update the NDFrame in place if possible.
- downcast : optional, ‘infer’ or None, defaults to ‘infer’
- Downcast dtypes if possible.
Series or DataFrame of same shape interpolated at the NaNs
reindex, replace, fillna
Filling in NaNs:
>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0    0
1    1
2    2
3    3
dtype: float64
-
irow
(i, copy=False)¶
-
is_copy
= None¶
-
isin
(values)¶ Return boolean DataFrame showing whether each element in the DataFrame is contained in values.
- values : iterable, Series, DataFrame or dictionary
- The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dictionary, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.
DataFrame of booleans
When values is a list:
>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> df.isin([1, 3, 12, 'a'])
       A      B
0   True   True
1  False  False
2   True  False
When values is a dict:
>>> df = DataFrame({'A': [1, 2, 3], 'B': [1, 4, 7]})
>>> df.isin({'A': [1, 3], 'B': [4, 7, 12]})
       A      B
0   True  False  # Note that B didn't match the 1 here.
1  False   True
2   True   True
When values is a Series or DataFrame:
>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> other = DataFrame({'A': [1, 3, 3, 2], 'B': ['e', 'f', 'f', 'e']})
>>> df.isin(other)
       A      B
0   True  False
1  False  False  # Column A in `other` has a 3, but not at index 1.
2   True   True
-
isnull
()¶ Return a boolean same-sized object indicating if the values are null
-
iteritems
()¶ Iterator over (column, series) pairs
-
iterkv
(*args, **kwargs)¶ iteritems alias used to get around 2to3. Deprecated
-
iterrows
()¶ Iterate over rows of DataFrame as (index, Series) pairs.
iterrows does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). For example,
>>> df = DataFrame([[1, 1.0]], columns=['x', 'y'])
>>> row = next(df.iterrows())[1]
>>> print(row['x'].dtype)
float64
>>> print(df['x'].dtype)
int64
- it : generator
- A generator that iterates over the rows of the frame.
-
itertuples
(index=True)¶ Iterate over rows of DataFrame as tuples, with index value as first element of the tuple
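To contrast the two row iterators, here is a brief sketch on a hypothetical two-row frame. It assumes pandas is importable as pd; the labels and values are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [1.5, 2.5]}, index=["r1", "r2"])

# iterrows yields (index, Series) pairs. Each row becomes one Series,
# so the integer column "x" is upcast to float alongside "y".
row_labels = [label for label, row in df.iterrows()]
first_row = next(df.iterrows())[1]

# itertuples yields one tuple per row with the index value first
# and leaves every column's values in their own type.
tuples = [tuple(t) for t in df.itertuples(index=True)]
```

When the per-column types matter, itertuples is usually the safer choice of the two.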
-
ix
¶
-
join
(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)¶ Join columns with other DataFrame either on index or on a key column. Efficiently Join multiple DataFrame objects by index at once by passing a list.
- other : DataFrame, Series with name field set, or list of DataFrame
- Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame
- on : column name, tuple/list of column names, or array-like
- Column(s) to use for joining, otherwise join on index. If multiple columns are given, the passed DataFrame must have a MultiIndex. Can pass an array as the join key if not already contained in the calling DataFrame. Like an Excel VLOOKUP operation
- how : {‘left’, ‘right’, ‘outer’, ‘inner’}
How to handle indexes of the two objects. Default: ‘left’ for joining on index, None otherwise
- left: use calling frame’s index
- right: use input frame’s index
- outer: form union of indexes
- inner: use intersection of indexes
- lsuffix : string
- Suffix to use from left frame’s overlapping columns
- rsuffix : string
- Suffix to use from right frame’s overlapping columns
- sort : boolean, default False
- Order result DataFrame lexicographically by the join key. If False, preserves the index order of the calling (left) DataFrame
on, lsuffix, and rsuffix options are not supported when passing a list of DataFrame objects
joined : DataFrame
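The lookup-style use of join described above can be sketched as follows. The two frames and their column names are hypothetical; the only assumption is that pandas is importable as pd.

```python
import pandas as pd

left = pd.DataFrame({"key": ["k0", "k1", "k2"], "a": [1, 2, 3]})
right = pd.DataFrame({"b": [10, 20]}, index=["k0", "k2"])

# Match the "key" column of the calling frame against the index of `right`.
# how="left" keeps every row of `left`; unmatched rows get NaN in "b".
left_joined = left.join(right, on="key", how="left")

# how="inner" keeps only the keys present on both sides.
inner_joined = left.join(right, on="key", how="inner")
```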
-
keys
()¶ Get the ‘info axis’ (see Indexing for more)
This is index for Series, columns for DataFrame and major_axis for Panel.
-
kurt
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased kurtosis over requested axis. Normalized by N-1.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
kurt : Series or DataFrame (if level specified)
-
kurtosis
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased kurtosis over requested axis. Normalized by N-1.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
kurt : Series or DataFrame (if level specified)
-
last
(offset)¶ Convenience method for subsetting final periods of time series data based on a date offset
offset : string, DateOffset, dateutil.relativedelta
ts.last('5M') -> Last 5 months
subset : type of caller
-
last_valid_index
()¶ Return label for last non-NA/null value
-
le
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods le
-
load
(path)¶ Deprecated. Use read_pickle instead.
-
loc
¶
-
lookup
(row_labels, col_labels)¶ Label-based “fancy indexing” function for DataFrame. Given equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair.
- row_labels : sequence
- The row labels to use for lookup
- col_labels : sequence
- The column labels to use for lookup
Akin to:
result = []
for row, col in zip(row_labels, col_labels):
    result.append(df.get_value(row, col))
- values : ndarray
- The found values
-
lt
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods lt
-
mad
(axis=None, skipna=None, level=None, **kwargs)¶ Return the mean absolute deviation of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
mad : Series or DataFrame (if level specified)
-
mask
(cond)¶ Returns a copy whose values are replaced with NaN wherever cond is True, i.e. the result of where applied to the inverted condition
cond : boolean NDFrame or array
wh: same as input
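A short sketch of mask next to its counterpart where may make the direction of the condition easier to remember. The frame is hypothetical, and where is used here only as the well-known pandas complement of mask.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, -2.0, 3.0], "b": [-4.0, 5.0, -6.0]})

# mask: NaN wherever the condition is True (here: negative values vanish).
masked = df.mask(df < 0)

# where: the complement, NaN wherever the condition is False.
kept = df.where(df < 0)
```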
-
max
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ This method returns the maximum of the values in the object. If you want the index of the maximum, use idxmax. This is the equivalent of the numpy.ndarray method argmax.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
max : Series or DataFrame (if level specified)
-
mean
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the mean of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
mean : Series or DataFrame (if level specified)
-
median
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the median of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
median : Series or DataFrame (if level specified)
-
merge
(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True)¶ Merge DataFrame objects by performing a database-style join operation by columns or indexes.
If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on.
- right : DataFrame
- how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’
- left: use only keys from left frame (SQL: left outer join)
- right: use only keys from right frame (SQL: right outer join)
- outer: use union of keys from both frames (SQL: full outer join)
- inner: use intersection of keys from both frames (SQL: inner join)
- on : label or list
- Field names to join on. Must be found in both DataFrames. If on is None and not merging on indexes, then it merges on the intersection of the columns by default.
- left_on : label or list, or array-like
- Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns
- right_on : label or list, or array-like
- Field names to join on in right DataFrame or vector/list of vectors per left_on docs
- left_index : boolean, default False
- Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels
- right_index : boolean, default False
- Use the index from the right DataFrame as the join key. Same caveats as left_index
- sort : boolean, default False
- Sort the join keys lexicographically in the result DataFrame
- suffixes : 2-length sequence (tuple, list, ...)
- Suffix to apply to overlapping column names in the left and right side, respectively
- copy : boolean, default True
- If False, do not copy data unnecessarily
>>> A              >>> B
    lkey value         rkey value
0   foo  1         0   foo  5
1   bar  2         1   bar  6
2   baz  3         2   qux  7
3   foo  4         3   bar  8
>>> merge(A, B, left_on='lkey', right_on='rkey', how='outer')
   lkey  value_x  rkey  value_y
0  bar   2        bar   6
1  bar   2        bar   8
2  baz   3        NaN   NaN
3  foo   1        foo   5
4  foo   4        foo   5
5  NaN   NaN      qux   7
merged : DataFrame
-
merge_results
(others)¶ Merges results of type CleavageSitePredictionResult and returns the merged result
Parameters: others (list(CleavageSitePredictionResult) or CleavageSitePredictionResult) – A (list of) CleavageSitePredictionResult object(s)
Returns: A new merged CleavageSitePredictionResult object
Return type: CleavageSitePredictionResult
-
min
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ This method returns the minimum of the values in the object. If you want the index of the minimum, use idxmin. This is the equivalent of the numpy.ndarray method argmin.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
min : Series or DataFrame (if level specified)
-
mod
(other, axis='columns', level=None, fill_value=None)¶ Binary operator mod with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
mode
(axis=0, numeric_only=False)¶ Gets the mode of each element along the axis selected. Empty if nothing has 2+ occurrences. Adds a row for each mode per label, fills in gaps with nan.
- axis : {0, 1, ‘index’, ‘columns’} (default 0)
- 0/’index’ : get mode of each column
- 1/’columns’ : get mode of each row
- numeric_only : boolean, default False
- if True, only apply to numeric columns
modes : DataFrame (sorted)
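The row-per-mode layout is easiest to see on a tiny frame. The sketch below is an illustration on invented data, assuming pandas is importable as pd.

```python
import pandas as pd

# Column "a" has two modes (1 and 2), column "b" has a single mode (3).
df = pd.DataFrame({"a": [1, 1, 2, 2], "b": [3, 3, 3, 4]})

# One row per mode; columns with fewer modes are padded with NaN.
modes = df.mode()
```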
-
mul
(other, axis='columns', level=None, fill_value=None)¶ Binary operator mul with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
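The effect of fill_value is shared by all of the flexible binary operators (add, mul, pow and so on). A minimal sketch using mul on two hypothetical frames, assuming pandas and NumPy are available:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"x": [1.0, np.nan], "y": [np.nan, 4.0]})
right = pd.DataFrame({"x": [10.0, 10.0], "y": [np.nan, 10.0]})

# Plain operator: the result is NaN wherever either operand is missing.
plain = left * right

# With fill_value, a value missing on one side only is replaced by 1.0
# before multiplying. Where both sides are missing the result stays NaN.
filled = left.mul(right, fill_value=1.0)
```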
-
multiply
(other, axis='columns', level=None, fill_value=None)¶ Binary operator mul with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
ndim
¶ Number of axes / array dimensions
-
ne
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods ne
-
notnull
()¶ Return a boolean same-sized object indicating if the values are not null
-
pct_change
(periods=1, fill_method='pad', limit=None, freq=None, **kwds)¶ Percent change over given number of periods
- periods : int, default 1
- Periods to shift for forming percent change
- fill_method : str, default ‘pad’
- How to handle NAs before computing percent changes
- limit : int, default None
- The number of consecutive NAs to fill before stopping
- freq : DateOffset, timedelta, or offset alias string, optional
- Increment to use from time series API (e.g. ‘M’ or BDay())
chg : same type as caller
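As a quick illustration of pct_change, the sketch below computes the relative change between consecutive rows of a hypothetical one-column frame. It contains no missing values, so the fill_method and limit arguments play no role here.

```python
import pandas as pd

prices = pd.DataFrame({"p": [100.0, 110.0, 99.0]})

# Each entry is (current / previous) - 1; the first row has no predecessor.
change = prices.pct_change(periods=1)
```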
-
pivot
(index=None, columns=None, values=None)¶ Reshape data (produce a “pivot” table) based on column values. Uses unique values from index / columns to form axes and return either DataFrame or Panel, depending on whether you request a single value column (DataFrame) or all columns (Panel)
- index : string or object
- Column name to use to make new frame’s index
- columns : string or object
- Column name to use to make new frame’s columns
- values : string or object, optional
- Column name to use for populating new frame’s values
For finer-tuned control, see hierarchical indexing documentation along with the related stack/unstack methods
>>> df
    foo   bar  baz
0   one   A    1.
1   one   B    2.
2   one   C    3.
3   two   A    4.
4   two   B    5.
5   two   C    6.
>>> df.pivot('foo', 'bar', 'baz')
     A   B   C
one  1   2   3
two  4   5   6
>>> df.pivot('foo', 'bar')['baz']
     A   B   C
one  1   2   3
two  4   5   6
- pivoted : DataFrame
- If no values column specified, will have hierarchically indexed columns
-
pivot_table
(data, values=None, rows=None, cols=None, aggfunc='mean', fill_value=None, margins=False, dropna=True)¶ Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame
- data : DataFrame
- values : column to aggregate, optional
- rows : list of column names or arrays to group on
- Keys to group on the x-axis of the pivot table
- cols : list of column names or arrays to group on
- Keys to group on the y-axis of the pivot table
- aggfunc : function, default numpy.mean, or list of functions
- If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)
- fill_value : scalar, default None
- Value to replace missing values with
- margins : boolean, default False
- Add all row / columns (e.g. for subtotal / grand totals)
- dropna : boolean, default True
- Do not include columns whose entries are all NaN
>>> df
   A   B   C      D
0  foo one small  1
1  foo one large  2
2  foo one large  2
3  foo two small  3
4  foo two small  3
5  bar one large  4
6  bar one small  5
7  bar two small  6
8  bar two large  7
>>> table = pivot_table(df, values='D', rows=['A', 'B'],
...                     cols=['C'], aggfunc=np.sum)
>>> table
          small  large
foo  one  1      4
     two  6      NaN
bar  one  5      4
     two  6      7
table : DataFrame
-
plot
(frame=None, x=None, y=None, subplots=False, sharex=True, sharey=False, use_index=True, figsize=None, grid=None, legend=True, rot=None, ax=None, style=None, title=None, xlim=None, ylim=None, logx=False, logy=False, xticks=None, yticks=None, kind='line', sort_columns=False, fontsize=None, secondary_y=False, **kwds)¶ Make line, bar, or scatter plots of DataFrame series with the index on the x-axis using matplotlib / pylab.
- frame : DataFrame
- x : label or position, default None
- y : label or position, default None
- Allows plotting of one column versus another
- subplots : boolean, default False
- Make separate subplots for each time series
- sharex : boolean, default True
- In case subplots=True, share x axis
- sharey : boolean, default False
- In case subplots=True, share y axis
- use_index : boolean, default True
- Use index as ticks for x axis
- stacked : boolean, default False
- If True, create stacked bar plot. Only valid for DataFrame input
- sort_columns: boolean, default False
- Sort column names to determine plot ordering
- title : string
- Title to use for the plot
- grid : boolean, default None (matlab style default)
- Axis grid lines
- legend : boolean, default True
- Place legend on axis subplots
- ax : matplotlib axis object, default None
- style : list or dict
- matplotlib line style per column
- kind : {‘line’, ‘bar’, ‘barh’, ‘kde’, ‘density’, ‘scatter’}
- bar : vertical bar plot
- barh : horizontal bar plot
- kde/density : Kernel Density Estimation plot
- scatter : scatter plot
- logx : boolean, default False
- For line plots, use log scaling on x axis
- logy : boolean, default False
- For line plots, use log scaling on y axis
- xticks : sequence
- Values to use for the xticks
- yticks : sequence
- Values to use for the yticks
- xlim : 2-tuple/list
- ylim : 2-tuple/list
- rot : int, default None
- Rotation for ticks
- secondary_y : boolean or sequence, default False
- Whether to plot on the secondary y-axis. If a list/tuple, which columns to plot on secondary y-axis
- mark_right: boolean, default True
- When using a secondary_y axis, should the legend label the axis of the various columns automatically
- colormap : str or matplotlib colormap object, default None
- Colormap to select colors from. If string, load colormap with that name from matplotlib.
- kwds : keywords
- Options to pass to matplotlib plotting method
ax_or_axes : matplotlib.AxesSubplot or list of them
-
pop
(item)¶ Return item and drop from frame. Raise KeyError if not found.
-
pow
(other, axis='columns', level=None, fill_value=None)¶ Binary operator pow with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
prod
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the product of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
prod : Series or DataFrame (if level specified)
-
product
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the product of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
prod : Series or DataFrame (if level specified)
-
quantile
(q=0.5, axis=0, numeric_only=True)¶ Return values at the given quantile over requested axis, a la scoreatpercentile in scipy.stats
- q : quantile, default 0.5 (50% quantile)
- 0 <= q <= 1
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
quantiles : Series
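A minimal sketch of quantile on a hypothetical numeric frame, assuming pandas is importable as pd. With the default axis=0 one value per column is returned, and values between data points are linearly interpolated.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]})

medians = df.quantile(0.5)   # q=0.5 is the median of each column
q25 = df.quantile(0.25)      # lower quartile of each column
```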
-
query
(expr, **kwargs)¶ Query the columns of a frame with a boolean expression.
- expr : string
- The query string to evaluate. The result of the evaluation of this expression is first passed to loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to __getitem__().
- kwargs : dict
- See the documentation for eval() for complete details on the keyword arguments accepted by query().
q : DataFrame or Series
This method uses the top-level eval() function to evaluate the passed query.
The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins, and and or. This is syntactically valid Python, however the semantics are different.
You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.
The index and columns attributes of the DataFrame instance are placed in the namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for this variable, and you can also use the name of the index to identify it in a query.
For further details and examples see the query documentation in indexing.
pandas.eval, DataFrame.eval
>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.query('a > b')
>>> df[df.a > df.b]  # same result as the previous expression
-
radd
(other, axis='columns', level=None, fill_value=None)¶ Binary operator radd with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rank
(axis=0, numeric_only=None, method='average', na_option='keep', ascending=True)¶ Compute numerical data ranks (1 through n) along axis. Equal values are assigned a rank that is the average of the ranks of those values
- axis : {0, 1}, default 0
- Ranks over columns (0) or rows (1)
- numeric_only : boolean, default None
- Include only float, int, boolean data
- method : {‘average’, ‘min’, ‘max’, ‘first’}
- average: average rank of group
- min: lowest rank in group
- max: highest rank in group
- first: ranks assigned in order they appear in the array
- na_option : {‘keep’, ‘top’, ‘bottom’}
- keep: leave NA values where they are
- top: smallest rank if ascending
- bottom: smallest rank if descending
- ascending : boolean, default True
- False for ranks by high (1) to low (N)
ranks : DataFrame
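The tie-breaking methods listed above differ only in how equal values are ranked. A short sketch on a hypothetical score column with one tie, assuming pandas is importable as pd:

```python
import pandas as pd

# The two 0.3 entries tie for the lowest value.
df = pd.DataFrame({"score": [0.3, 0.9, 0.3, 0.5]})

avg_rank = df.rank(method="average")   # ties share the mean of ranks 1 and 2
min_rank = df.rank(method="min")       # ties both get the lowest rank of the group
desc_rank = df.rank(method="min", ascending=False)  # highest value gets rank 1
```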
-
rdiv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
reindex
(index=None, columns=None, **kwargs)¶ Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False
- index, columns : array-like, optional (can be specified in order, or as keywords)
- New labels / index to conform to. Preferably an Index object to avoid duplicating data
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- Method to use for filling holes in reindexed DataFrame.
- pad / ffill: propagate last valid observation forward to next valid
- backfill / bfill: use NEXT valid observation to fill gap
- copy : boolean, default True
- Return a new object, even if the passed indexes are the same
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
- fill_value : scalar, default np.NaN
- Value to use for missing values. Defaults to NaN, but can be any “compatible” value
- limit : int, default None
- Maximum size gap to forward or backward fill
- takeable : boolean, default False
- Treat the passed labels as positional values
>>> df.reindex(index=[date1, date2, date3], columns=['A', 'B', 'C'])
reindexed : DataFrame
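The filling options of reindex can be compared side by side on a small frame with a gap in its index. The sketch below uses invented data and assumes pandas is importable as pd.

```python
import pandas as pd

# Index label 1 is missing on purpose.
df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]}, index=[0, 2])

# No filling: the new row and the new column "C" are NaN.
with_gaps = df.reindex(index=[0, 1, 2], columns=["A", "B", "C"])

# method="ffill": the gap is filled from the last valid row before it.
forward_filled = df.reindex(index=[0, 1, 2], method="ffill")

# fill_value: the gap is filled with a constant instead of NaN.
zero_filled = df.reindex(index=[0, 1, 2], fill_value=0.0)
```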
-
reindex_axis
(labels, axis=0, method=None, level=None, copy=True, limit=None, fill_value=nan)¶ Conform input object to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False
- index : array-like, optional
- New labels / index to conform to. Preferably an Index object to avoid duplicating data
- axis : {0, 1, ‘index’, ‘columns’}
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- Method to use for filling holes in reindexed object.
- pad / ffill: propagate last valid observation forward to next valid
- backfill / bfill: use NEXT valid observation to fill gap
- copy : boolean, default True
- Return a new object, even if the passed indexes are the same
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
- limit : int, default None
- Maximum size gap to forward or backward fill
>>> df.reindex_axis(['A', 'B', 'C'], axis=1)
reindex, reindex_like
reindexed : DataFrame
-
reindex_like
(other, method=None, copy=True, limit=None)¶ Return an object with indices matching those of other
- other : Object
- method : string or None
- copy : boolean, default True
- limit : int, default None
- Maximum size gap to forward or backward fill
Like calling s.reindex(index=other.index, columns=other.columns, method=...)
reindexed : same as input
-
rename
(index=None, columns=None, **kwargs)¶ Alter axes using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.
- index, columns : dict-like or function, optional
- Transformation to apply to that axis values
- copy : boolean, default True
- Also copy underlying data
- inplace : boolean, default False
- Whether to return a new DataFrame. If True then value of copy is ignored.
renamed : DataFrame (new object)
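Both mapping styles accepted by rename, a dict and a function, are shown in the sketch below. The frame and labels are hypothetical; the only assumption is that pandas is importable as pd.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])

# A dict renames only the labels it contains ("r2" is left as-is);
# a function is applied to every label of that axis.
renamed = df.rename(index={"r1": "first"}, columns=lambda name: name.upper())
```

Because inplace defaults to False, df itself keeps its original labels.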
-
rename_axis
(mapper, axis=0, copy=True, inplace=False)¶ Alter index and / or columns using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.
- mapper : dict-like or function, optional
- axis : int or string, default 0
- copy : boolean, default True
- Also copy underlying data
- inplace : boolean, default False
renamed : type of caller
-
reorder_levels
(order, axis=0)¶ Rearrange index levels using input order. May not drop or duplicate levels
- order : list of int or list of str
- List representing new level order. Reference level by number (position) or by key (label).
- axis : int
- Where to reorder levels.
type of caller (new object)
-
replace
(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None)¶ Replace values given in ‘to_replace’ with ‘value’.
to_replace : str, regex, list, dict, Series, numeric, or None
str or regex:
- str: string exactly matching to_replace will be replaced with value
- regex: regexs matching to_replace will be replaced with value
list of str, regex, or numeric:
- First, if to_replace and value are both lists, they must be the same length.
- Second, if regex=True then all of the strings in both lists will be interpreted as regexs otherwise they will match directly. This doesn’t matter much for value since there are only a few possible substitution regexes you can use.
- str and regex rules apply as above.
dict:
- Nested dictionaries, e.g., {‘a’: {‘b’: nan}}, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with nan. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.
- Keys map to column names and values map to substitution values. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
None:
- This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also None then this must be a nested dictionary or Series.
See the examples section for examples of each of these.
- value : scalar, dict, list, str, regex, default None
- Value to use to fill holes (e.g. 0), alternately a dict of values specifying which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.
- inplace : boolean, default False
- If True, in place. Note: this will modify any other views on this object (e.g. a column from a DataFrame). Returns the caller if this is True.
- limit : int, default None
- Maximum size gap to forward or backward fill
- regex : bool or same types as to_replace, default False
- Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Otherwise, to_replace must be None because this parameter will be interpreted as a regular expression or a list, dict, or array of regular expressions.
- method : string, optional, {‘pad’, ‘ffill’, ‘bfill’}
- The method to use for replacement, when to_replace is a list.
NDFrame.reindex NDFrame.asfreq NDFrame.fillna
filled : NDFrame
- AssertionError
- If regex is not a bool and to_replace is not None.
- TypeError
- If to_replace is a dict and value is not a list, dict, ndarray, or Series
- If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.
- ValueError
- If to_replace and value are lists or ndarrays, but they are not the same length.
- Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.
- Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.
- This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
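The note that regular expressions only substitute on strings can be illustrated without the library itself. The following is a minimal sketch using only `re.sub` from the Python standard library; the helper `regex_replace` is hypothetical and is not part of pandas or Fred2.

```python
import re

def regex_replace(values, pattern, repl):
    """Apply re.sub to string elements only; leave other types untouched."""
    return [re.sub(pattern, repl, v) if isinstance(v, str) else v
            for v in values]

column = ["1.5", 1.5, "abc", None]
# Only the string "1.5" matches; the float 1.5 is never passed to re.sub.
result = regex_replace(column, r"^\d+\.\d+$", "NUM")
print(result)
```

The float `1.5` survives unchanged because it is not a string, which mirrors the behaviour described for numeric columns.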
-
resample
(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0)¶ Convenience method for frequency conversion and resampling of regular time-series data.
- rule : string
- the offset string or object representing target conversion
- how : string
- method for down- or re-sampling, default to ‘mean’ for downsampling
- axis : int, optional, default 0
- fill_method : string, default None
- fill_method for upsampling
- closed : {‘right’, ‘left’}
- Which side of bin interval is closed
- label : {‘right’, ‘left’}
- Which bin edge label to label bucket with
- convention : {‘start’, ‘end’, ‘s’, ‘e’}
- kind : “period”/”timestamp”
- loffset : timedelta
- Adjust the resampled time labels
- limit : int, default None
- Maximum size gap when reindexing with fill_method
- base : int, default 0
- For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0
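One way to read the `base` parameter is as an offset applied to the bin edges. The sketch below is an interpretation of the ‘5min’ example in plain Python, not the library's implementation; `bin_start` is a hypothetical helper and minutes are plain integers.

```python
def bin_start(minute, width=5, base=0):
    """Return the left edge (in minutes) of the bin containing `minute`."""
    return (minute - base) // width * width + base

for base in (0, 2):
    # With base=2 the 5-minute bins begin at minutes 2, 7, 12, ...
    print(base, [bin_start(m, base=base) for m in range(12)])
```

With `width=5`, values of `base` from 0 through 4 give five distinct alignments, which matches the range quoted above.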
-
reset_index
(level=None, drop=False, inplace=False, col_level=0, col_fill='')¶ For DataFrame with multi-level index, return new DataFrame with labeling information in the columns under the index names, defaulting to ‘level_0’, ‘level_1’, etc. if any are None. For a standard index, the index name will be used (if set), otherwise a default ‘index’ or ‘level_0’ (if ‘index’ is already taken) will be used.
- level : int, str, tuple, or list, default None
- Only remove the given levels from the index. Removes all levels by default
- drop : boolean, default False
- Do not try to insert index into dataframe columns. This resets the index to the default integer index.
- inplace : boolean, default False
- Modify the DataFrame in place (do not create a new object)
- col_level : int or str, default 0
- If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.
- col_fill : object, default ‘’
- If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.
resetted : DataFrame
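The naming rules for the new columns can be written down as a small function. This is a sketch of the rules exactly as worded above, in plain Python; `index_column_names` is a hypothetical helper and does not exist in the library.

```python
def index_column_names(index_names, existing_columns):
    """Choose column names for index levels moved into the columns."""
    if len(index_names) > 1:
        # Multi-level index: unnamed levels fall back to 'level_<i>'.
        return [name if name is not None else "level_%d" % i
                for i, name in enumerate(index_names)]
    name = index_names[0]
    if name is not None:
        return [name]
    # Standard unnamed index: 'index', or 'level_0' if 'index' is taken.
    return ["level_0" if "index" in existing_columns else "index"]

print(index_column_names(["gene", None], []))
print(index_column_names([None], ["index", "value"]))
```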
-
rfloordiv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rfloordiv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
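The reflected operators (rfloordiv, rmod, rmul, rpow, rsub, rtruediv) compute `other OP self` rather than `self OP other`. The following plain-Python sketch illustrates the `fill_value` and label-union rules described above for the floor-division case. It uses dicts instead of DataFrames and `None` instead of NaN; `reflected_floordiv` is a hypothetical helper.

```python
def reflected_floordiv(frame, other, fill_value=None):
    """Compute other // frame per label; None stands in for a missing value."""
    result = {}
    for label in sorted(set(frame) | set(other)):    # union of labels
        left, right = other.get(label), frame.get(label)
        if left is None and right is None:
            result[label] = None                      # both missing: stays missing
            continue
        if fill_value is not None:
            left = fill_value if left is None else left
            right = fill_value if right is None else right
        if left is None or right is None:
            result[label] = None                      # nothing to fill with
        else:
            result[label] = left // right
    return result

frame = {"a": 2, "b": None, "c": 4}
other = {"a": 9, "b": None, "d": 7}
print(reflected_floordiv(frame, other, fill_value=1))
```

Label `b` is missing on both sides and therefore stays missing, while `c` and `d` are filled with 1 on the side where they are absent.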
-
rmod
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rmod with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rmul
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rmul with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rpow
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rpow with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rsub
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rsub with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rtruediv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
save
(path)¶ Deprecated. Use to_pickle instead
-
select
(crit, axis=0)¶ Return data corresponding to axis labels matching criteria
- crit : function
- To be called on each index (label). Should return True or False
axis : int
selection : type of caller
-
set_index
(keys, drop=True, append=False, inplace=False, verify_integrity=False)¶ Set the DataFrame index (row labels) using one or more existing columns. By default yields a new object.
- keys : column label or list of column labels / arrays
- drop : boolean, default True
- Delete columns to be used as the new index
- append : boolean, default False
- Whether to append columns to existing index
- inplace : boolean, default False
- Modify the DataFrame in place (do not create a new object)
- verify_integrity : boolean, default False
- Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method
>>> indexed_df = df.set_index(['A', 'B'])
>>> indexed_df2 = df.set_index(['A', [0, 1, 2, 0, 1, 2]])
>>> indexed_df3 = df.set_index([[0, 1, 2, 0, 1, 2]])
dataframe : DataFrame
-
set_value
(index, col, value)¶ Put single value at passed column and index
- index : row label
- col : column label
- value : scalar value
- frame : DataFrame
- If label pair is contained, will be reference to calling DataFrame, otherwise a new object
-
shape
¶
-
shift
(periods=1, freq=None, axis=0, **kwds)¶ Shift index by desired number of periods with an optional time freq
- periods : int
- Number of periods to move, can be positive or negative
- freq : DateOffset, timedelta, or time rule string, optional
- Increment to use from datetools module or time rule (e.g. ‘EOM’)
If freq is specified then the index values are shifted but the data is not realigned
shifted : same type as caller
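When no `freq` is given, the data moves relative to a fixed index, so positions with no source value become missing. A plain-Python sketch of that case follows; `shift_values` is a hypothetical helper, with `None` standing in for a missing value.

```python
def shift_values(values, periods=1):
    """Shift data by `periods` positions while the index stays fixed."""
    n = len(values)
    if periods == 0:
        return list(values)
    if abs(periods) >= n:
        return [None] * n
    if periods > 0:
        # Move values toward later positions; the head becomes missing.
        return [None] * periods + list(values[:n - periods])
    # Negative periods move values toward earlier positions.
    return list(values[-periods:]) + [None] * (-periods)

print(shift_values([1, 2, 3, 4], 1))
print(shift_values([1, 2, 3, 4], -2))
```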
-
skew
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased skew over requested axis. Normalized by N-1.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
skew : Series or DataFrame (if level specified)
-
sort
(columns=None, column=None, axis=0, ascending=True, inplace=False)¶ Sort DataFrame either by labels (along either axis) or by the values in column(s)
- columns : object
- Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
- ascending : boolean or list, default True
- Sort ascending vs. descending. Specify list for multiple sort orders
- axis : {0, 1}
- Sort index/rows versus columns
- inplace : boolean, default False
- Sort the DataFrame without creating a new instance
>>> result = df.sort(['A', 'B'], ascending=[1, 0])
sorted : DataFrame
-
sort_index
(axis=0, by=None, ascending=True, inplace=False, kind='quicksort')¶ Sort DataFrame either by labels (along either axis) or by the values in a column
- axis : {0, 1}
- Sort index/rows versus columns
- by : object
- Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
- ascending : boolean or list, default True
- Sort ascending vs. descending. Specify list for multiple sort orders
- inplace : boolean, default False
- Sort the DataFrame without creating a new instance
>>> result = df.sort_index(by=['A', 'B'], ascending=[True, False])
sorted : DataFrame
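A nested sort with a separate direction per key, as in the `ascending=[True, False]` example above, can be sketched in plain Python. This is an illustration of the ordering only, using a list of dicts in place of a DataFrame; `nested_sort` is a hypothetical helper.

```python
rows = [
    {"A": 1, "B": 3},
    {"A": 2, "B": 1},
    {"A": 1, "B": 5},
    {"A": 2, "B": 4},
]

def nested_sort(rows, by, ascending):
    """Sort by several keys, each with its own direction."""
    result = list(rows)
    # Python's sort is stable, so sorting by the last key first and the
    # first key last yields a nested (lexicographic) ordering.
    for key, asc in reversed(list(zip(by, ascending))):
        result.sort(key=lambda row: row[key], reverse=not asc)
    return result

result = nested_sort(rows, by=["A", "B"], ascending=[True, False])
print(result)
```

Rows are grouped by ascending `A`, and within each group `B` runs from largest to smallest.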
-
sortlevel
(level=0, axis=0, ascending=True, inplace=False)¶ Sort multilevel index by chosen axis and primary level. Data will be lexicographically sorted by the chosen level followed by the other levels (in order)
- level : int
- axis : {0, 1}
- ascending : boolean, default True
- inplace : boolean, default False
- Sort the DataFrame without creating a new instance
sorted : DataFrame
-
squeeze
()¶ squeeze length 1 dimensions
-
stack
(level=-1, dropna=True)¶ Pivot a level of the (possibly hierarchical) column labels, returning a DataFrame (or Series in the case of an object with a single level of column labels) having a hierarchical index with a new inner-most level of row labels.
- level : int, string, or list of these, default last level
- Level(s) to stack, can pass level name
- dropna : boolean, default True
- Whether to drop rows in the resulting Frame/Series with no valid values
>>> s
     a   b
one  1.  2.
two  3.  4.
>>> s.stack()
one a    1
    b    2
two a    3
    b    4
stacked : DataFrame or Series
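The reshaping performed by stack can be pictured with plain dictionaries: column labels become the innermost level of a two-part row key. The following is a sketch of the idea only, assuming a nested dict in place of a DataFrame; `stack` here is a hypothetical stand-alone function, and `None` stands in for a missing cell.

```python
def stack(table, dropna=True):
    """Turn {row: {column: value}} into {(row, column): value}."""
    stacked = {}
    for row_label, row in table.items():
        for col_label, value in row.items():
            if dropna and value is None:
                continue                 # skip entries with no valid value
            stacked[(row_label, col_label)] = value
    return stacked

s = {"one": {"a": 1.0, "b": 2.0}, "two": {"a": 3.0, "b": 4.0}}
for key, value in stack(s).items():
    print(key, value)
```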
-
std
(axis=None, skipna=None, level=None, ddof=1, **kwargs)¶ Return unbiased standard deviation over requested axis. Normalized by N-1.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
stdev : Series or DataFrame (if level specified)
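“Normalized by N-1” means the sum of squared deviations is divided by N-1 rather than N, which corresponds to `ddof=1` in the signature. A minimal sketch in plain Python, checked against `statistics.stdev` from the standard library; `sample_std` is a hypothetical helper.

```python
import math
import statistics

def sample_std(values, ddof=1):
    """Standard deviation with the divisor N - ddof."""
    mean = sum(values) / len(values)
    squared = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared / (len(values) - ddof))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_std(data))            # divisor N-1, the default described above
print(sample_std(data, ddof=0))    # divisor N
print(statistics.stdev(data))      # the standard library also uses N-1
```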
-
sub
(other, axis='columns', level=None, fill_value=None)¶ Binary operator sub with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
subtract
(other, axis='columns', level=None, fill_value=None)¶ Binary operator sub with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
sum
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the sum of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
sum : Series or DataFrame (if level specified)
-
swapaxes
(axis1, axis2, copy=True)¶ Interchange axes and swap values axes appropriately
y : same as input
-
swaplevel
(i, j, axis=0)¶ Swap levels i and j in a MultiIndex on a particular axis
- i, j : int, string (can be mixed)
- Level of index to be swapped. Can pass level name as string.
swapped : type of caller (new object)
-
tail
(n=5)¶ Returns last n rows
-
take
(indices, axis=0, convert=True, is_copy=True)¶ Analogous to ndarray.take
- indices : list / array of ints
- axis : int, default 0
- convert : translate neg to pos indices (default)
- is_copy : mark the returned frame as a copy
taken : type of caller
-
to_clipboard
(excel=None, sep=None, **kwargs)¶ Attempt to write text representation of object to the system clipboard. This can be pasted into Excel, for example.
- excel : boolean, defaults to True
- if True, use the provided separator, writing in a csv format for allowing easy pasting into excel. if False, write a string representation of the object to the clipboard
- sep : optional, defaults to tab
other keywords are passed to to_csv
- Requirements for your platform
- Linux: xclip, or xsel (with gtk or PyQt4 modules)
- Windows: none
- OS X: none
-
to_csv
(path_or_buf, sep=', ', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, mode='w', nanRep=None, encoding=None, quoting=None, line_terminator='\n', chunksize=None, tupleize_cols=False, date_format=None, **kwds)¶ Write DataFrame to a comma-separated values (csv) file
- path_or_buf : string or file handle / StringIO
- File path
- sep : character, default ”,”
- Field delimiter for the output file.
- na_rep : string, default ‘’
- Missing data representation
- float_format : string, default None
- Format string for floating point numbers
- cols : sequence, optional
- Columns to write
- header : boolean or list of string, default True
- Write out column names. If a list of string is given it is assumed to be aliases for the column names
- index : boolean, default True
- Write row names (index)
- index_label : string or sequence, or False, default None
- Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R
- nanRep : None
- deprecated, use na_rep
- mode : str
- Python write mode, default ‘w’
- encoding : string, optional
- a string representing the encoding to use if the contents are non-ascii, for python versions prior to 3
- line_terminator : string, default ‘\n’
- The newline character or character sequence to use in the output file
- quoting : optional constant from csv module
- defaults to csv.QUOTE_MINIMAL
- chunksize : int or None
- rows to write at a time
- tupleize_cols : boolean, default False
- write multi_index columns as a list of tuples (if True) or in the new, expanded format (if False)
- date_format : string, default None
- Format string for datetime objects.
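Several of these options (`sep`, `na_rep`, `line_terminator`, and the `csv.QUOTE_MINIMAL` default for `quoting`) map onto Python's standard `csv` module. The sketch below shows their effect with `csv.writer` directly. It does not reproduce to_csv itself, and `write_csv` is a hypothetical helper in which `None` marks a missing value.

```python
import csv
import io

def write_csv(rows, header, sep=",", na_rep="", line_terminator="\n"):
    """Write rows to a CSV string, replacing None with `na_rep`."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=sep, quoting=csv.QUOTE_MINIMAL,
                        lineterminator=line_terminator)
    writer.writerow(header)
    for row in rows:
        writer.writerow([na_rep if v is None else v for v in row])
    return buf.getvalue()

text = write_csv([["x", 1.5], ["a,b", None]],
                 header=["name", "score"], na_rep="NA")
print(text)
```

With `QUOTE_MINIMAL`, only the field that contains the delimiter (`a,b`) is quoted.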
-
to_dense
()¶ Return dense representation of NDFrame (as opposed to sparse)
-
to_dict
(outtype='dict')¶ Convert DataFrame to dictionary.
- outtype : str {‘dict’, ‘list’, ‘series’, ‘records’}
- Determines the type of the values of the dictionary. The default dict is a nested dictionary {column -> {index -> value}}. list returns {column -> list(values)}. series returns {column -> Series(values)}. records returns [{columns -> value}]. Abbreviations are allowed.
result : dict like {column -> {index -> value}}
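The shapes named by `outtype` can be written out by hand for a small table. This is a sketch of the resulting structures only, built from plain lists; the variable names are illustrative and the ‘series’ form is omitted because it needs the library's Series type.

```python
columns = ["a", "b"]
index = ["x", "y"]
values = [[1, 2], [3, 4]]          # one inner list per row

# 'dict': {column -> {index -> value}}
as_dict = {c: {i: values[r][k] for r, i in enumerate(index)}
           for k, c in enumerate(columns)}
# 'list': {column -> list(values)}
as_list = {c: [row[k] for row in values] for k, c in enumerate(columns)}
# 'records': [{column -> value}]
as_records = [dict(zip(columns, row)) for row in values]

print(as_dict)
print(as_list)
print(as_records)
```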
-
to_excel
(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True)¶ Write DataFrame to an Excel sheet
- excel_writer : string or ExcelWriter object
- File path or existing ExcelWriter
- sheet_name : string, default ‘Sheet1’
- Name of sheet which will contain DataFrame
- na_rep : string, default ‘’
- Missing data representation
- float_format : string, default None
- Format string for floating point numbers
- cols : sequence, optional
- Columns to write
- header : boolean or list of string, default True
- Write out column names. If a list of string is given it is assumed to be aliases for the column names
- index : boolean, default True
- Write row names (index)
- index_label : string or sequence, default None
- Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
- startrow :
- upper left cell row to dump data frame
- startcol :
- upper left cell column to dump data frame
- engine : string, default None
- write engine to use - you can also set this via the options io.excel.xlsx.writer, io.excel.xls.writer, and io.excel.xlsm.writer.
- merge_cells : boolean, default True
- Write MultiIndex and Hierarchical Rows as merged cells.
If passing an existing ExcelWriter object, then the sheet will be added to the existing workbook. This can be used to save different DataFrames to one workbook:
>>> writer = ExcelWriter('output.xlsx')
>>> df1.to_excel(writer,'Sheet1')
>>> df2.to_excel(writer,'Sheet2')
>>> writer.save()
-
to_gbq
(destination_table, schema=None, col_order=None, if_exists='fail', **kwargs)¶ Write a DataFrame to a Google BigQuery table.
If the table exists, the DataFrame will be appended. If not, a new table will be created, in which case the schema will have to be specified. By default, rows will be written in the order they appear in the DataFrame, though the user may specify an alternative order.
- destination_table : string
- name of table to be written, in the form ‘dataset.tablename’
- schema : sequence (optional)
- list of column types in order for data to be inserted, e.g. [‘INTEGER’, ‘TIMESTAMP’, ‘BOOLEAN’]
- col_order : sequence (optional)
- order which columns are to be inserted, e.g. [‘primary_key’, ‘birthday’, ‘username’]
- if_exists : {‘fail’, ‘replace’, ‘append’} (optional)
- fail: If table exists, do nothing.
- replace: If table exists, drop it, recreate it, and insert data.
- append: If table exists, insert data. Create if does not exist.
kwargs are passed to the Client constructor
- SchemaMissing :
- Raised if the ‘if_exists’ parameter is set to ‘replace’, but no schema is specified
- TableExists :
- Raised if the specified ‘destination_table’ exists but the ‘if_exists’ parameter is set to ‘fail’ (the default)
- InvalidSchema :
- Raised if the ‘schema’ parameter does not match the provided DataFrame
-
to_hdf
(path_or_buf, key, **kwargs)¶ activate the HDFStore
- path_or_buf : the path (string) or buffer to put the store
- key : string
- identifier for the group in the store
- mode : optional, {‘a’, ‘w’, ‘r’, ‘r+’}, default ‘a’
'r'
- Read-only; no data can be modified.
'w'
- Write; a new file is created (an existing file with the same name would be deleted).
'a'
- Append; an existing file is opened for reading and writing, and if the file does not exist it is created.
'r+'
- It is similar to 'a', but the file must already exist.
- format : ‘fixed(f)|table(t)’, default is ‘fixed’
- fixed(f) : Fixed format
- Fast writing/reading. Not-appendable, nor searchable
- table(t) : Table format
- Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data
- append : boolean, default False
- For Table formats, append the input data to the existing
- complevel : int, 1-9, default 0
- If a complib is specified compression will be applied where possible
- complib : {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’, None}, default None
- If complevel is > 0 apply compression to objects written in the store wherever possible
- fletcher32 : bool, default False
- If applying compression use the fletcher32 checksum
-
to_html
(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, force_unicode=None, bold_rows=True, classes=None, escape=True, max_rows=None, max_cols=None, show_dimensions=False)¶ Render a DataFrame as an HTML table.
to_html-specific options:
- bold_rows : boolean, default True
- Make the row labels bold in the output
- classes : str or list or tuple, default None
- CSS class(es) to apply to the resulting html table
- escape : boolean, default True
- Convert the characters <, >, and & to HTML-safe sequences.
- max_rows : int, optional
- Maximum number of rows to show before truncating. If None, show all.
- max_cols : int, optional
- Maximum number of columns to show before truncating. If None, show all.
- frame : DataFrame
- object to render
- buf : StringIO-like, optional
- buffer to write to
- columns : sequence, optional
- the subset of columns to write; default None writes all columns
- col_space : int, optional
- the minimum width of each column
- header : bool, optional
- whether to print column labels, default True
- index : bool, optional
- whether to print index (row) labels, default True
- na_rep : string, optional
- string representation of NAN to use, default ‘NaN’
- formatters : list or dict of one-parameter functions, optional
- formatter functions to apply to columns’ elements by position or name, default None, if the result is a string , it must be a unicode string. List must be of length equal to the number of columns.
- float_format : one-parameter function, optional
- formatter function to apply to columns’ elements if they are floats default None
- sparsify : bool, optional
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
- justify : {‘left’, ‘right’}, default None
- Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
- index_names : bool, optional
- Prints the names of the indexes, default True
- force_unicode : bool, default False
- Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.
formatted : string (or unicode, depending on data and options)
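The effect of the `escape` option on a single cell can be sketched with `html.escape` from the Python standard library. This is an illustration of the escaping rule only, not the renderer used by to_html; `render_cell` is a hypothetical helper.

```python
import html

def render_cell(value, escape=True):
    """Render one table cell, optionally escaping <, > and &."""
    text = str(value)
    if escape:
        # quote=False limits escaping to the three characters named above.
        text = html.escape(text, quote=False)
    return "<td>%s</td>" % text

print(render_cell("a < b & c"))
print(render_cell("<b>x</b>", escape=False))
```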
-
to_json
(path_or_buf=None, orient=None, date_format='epoch', double_precision=10, force_ascii=True, date_unit='ms', default_handler=None)¶ Convert the object to a JSON string.
Note NaN’s and None will be converted to null and datetime objects will be converted to UNIX timestamps.
- path_or_buf : the path or buffer to write the result string
- if this is None, return a StringIO of the converted string
orient : string
- Series
- default is ‘index’
- allowed values are: {‘split’,’records’,’index’}
- DataFrame
- default is ‘columns’
- allowed values are: {‘split’,’records’,’index’,’columns’,’values’}
- The format of the JSON string
- split : dict like {index -> [index], columns -> [columns], data -> [values]}
- records : list like [{column -> value}, ... , {column -> value}]
- index : dict like {index -> {column -> value}}
- columns : dict like {column -> {index -> value}}
- values : just the values array
- date_format : {‘epoch’, ‘iso’}
- Type of date conversion. epoch = epoch milliseconds, iso = ISO8601, default is epoch.
- double_precision : The number of decimal places to use when encoding floating point values, default 10.
- force_ascii : force encoded string to be ASCII, default True.
- date_unit : string, default ‘ms’ (milliseconds)
- The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.
- default_handler : callable, default None
- Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serialisable object.
same type as input object with filtered info axis
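The five `orient` layouts can be spelled out for a small table with the standard `json` module. This sketch builds each structure by hand from plain lists to show the resulting shape; it does not call to_json, and the variable names are illustrative.

```python
import json

columns = ["a", "b"]
index = ["x", "y"]
values = [[1, 2], [3, 4]]          # one inner list per row

orients = {
    "split": {"index": index, "columns": columns, "data": values},
    "records": [dict(zip(columns, row)) for row in values],
    "index": {i: dict(zip(columns, row)) for i, row in zip(index, values)},
    "columns": {c: {i: row[k] for i, row in zip(index, values)}
                for k, c in enumerate(columns)},
    "values": values,
}
for name, obj in orients.items():
    print(name, json.dumps(obj))

# None is encoded as JSON null, as noted above.
print(json.dumps({"missing": None}))
```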
-
to_latex
(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=True, force_unicode=None)¶ Render a DataFrame to a tabular environment table. You can splice this into a LaTeX document.
to_latex-specific options:
- bold_rows : boolean, default True
- Make the row labels bold in the output
- frame : DataFrame
- object to render
- buf : StringIO-like, optional
- buffer to write to
- columns : sequence, optional
- the subset of columns to write; default None writes all columns
- col_space : int, optional
- the minimum width of each column
- header : bool, optional
- whether to print column labels, default True
- index : bool, optional
- whether to print index (row) labels, default True
- na_rep : string, optional
- string representation of NAN to use, default ‘NaN’
- formatters : list or dict of one-parameter functions, optional
- formatter functions to apply to columns’ elements by position or name, default None, if the result is a string , it must be a unicode string. List must be of length equal to the number of columns.
- float_format : one-parameter function, optional
- formatter function to apply to columns’ elements if they are floats default None
- sparsify : bool, optional
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
- justify : {‘left’, ‘right’}, default None
- Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
- index_names : bool, optional
- Prints the names of the indexes, default True
- force_unicode : bool, default False
- Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.
formatted : string (or unicode, depending on data and options)
-
to_msgpack
(path_or_buf=None, **kwargs)¶ msgpack (serialize) object to input file path
THIS IS AN EXPERIMENTAL LIBRARY and the storage format may not be stable until a future release.
- path : string File path, buffer-like, or None
- if None, return generated string
- append : boolean
- whether to append to an existing msgpack (default is False)
- compress : type of compressor (zlib or blosc)
- default to None (no compression)
-
to_panel
()¶ Transform long (stacked) format (DataFrame) into wide (3D, Panel) format.
Currently the index of the DataFrame must be a 2-level MultiIndex. This may be generalized later
panel : Panel
-
to_period
(freq=None, axis=0, copy=True)¶ Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed)
- freq : string, default
- axis : {0, 1}, default 0
- The axis to convert (the index by default)
- copy : boolean, default True
- If False then underlying input data is not copied
ts : TimeSeries with PeriodIndex
-
to_pickle
(path)¶ Pickle (serialize) object to input file path
- path : string
- File path
-
to_records
(index=True, convert_datetime64=True)¶ Convert DataFrame to record array. Index will be put in the ‘index’ field of the record array if requested
- index : boolean, default True
- Include index in resulting record array, stored in ‘index’ field
- convert_datetime64 : boolean, default True
- Whether to convert the index to datetime.datetime if it is a DatetimeIndex
y : recarray
-
to_sparse
(fill_value=None, kind='block')¶ Convert to SparseDataFrame
- fill_value : float, default NaN
- kind : {‘block’, ‘integer’}
y : SparseDataFrame
-
to_sql
(name, con, flavor='sqlite', if_exists='fail', **kwargs)¶ Write records stored in a DataFrame to a SQL database.
- name : str
- Name of SQL table
- con : an open SQL database connection object
- flavor : {‘sqlite’, ‘mysql’, ‘oracle’}, default ‘sqlite’
- if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’
- fail: If table exists, do nothing.
- replace: If table exists, drop it, recreate it, and insert data.
- append: If table exists, insert data. Create if does not exist.
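The three `if_exists` modes can be sketched against an in-memory SQLite database with the standard `sqlite3` module. This follows the wording above literally (including ‘fail’ as “do nothing”); the library itself may signal that condition differently. `write_records` and its fixed two-column schema are hypothetical.

```python
import sqlite3

def write_records(con, name, rows, if_exists="fail"):
    """Write (id, label) rows to table `name`, honoring `if_exists`."""
    exists = con.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (name,)
    ).fetchone() is not None
    if exists and if_exists == "fail":
        return                       # documented above as "do nothing"
    if exists and if_exists == "replace":
        con.execute('DROP TABLE "%s"' % name)
        exists = False
    if not exists:
        con.execute('CREATE TABLE "%s" (id INTEGER, label TEXT)' % name)
    con.executemany('INSERT INTO "%s" VALUES (?, ?)' % name, rows)
    con.commit()

con = sqlite3.connect(":memory:")
write_records(con, "t", [(1, "a")])                      # table is created
write_records(con, "t", [(2, "b")], if_exists="append")  # row is added
write_records(con, "t", [(3, "c")], if_exists="fail")    # table exists: skipped
count_after_append = con.execute('SELECT COUNT(*) FROM "t"').fetchone()[0]
write_records(con, "t", [(4, "d")], if_exists="replace") # table is rebuilt
rows_after_replace = con.execute('SELECT * FROM "t"').fetchall()
print(count_after_append, rows_after_replace)
```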
-
to_stata
(fname, convert_dates=None, write_index=True, encoding='latin-1', byteorder=None)¶ A class for writing Stata binary dta files from array-like objects
- fname : file path or buffer
- Where to save the dta file.
- convert_dates : dict
- Dictionary mapping column of datetime types to the stata internal format that you want to use for the dates. Options are ‘tc’, ‘td’, ‘tm’, ‘tw’, ‘th’, ‘tq’, ‘ty’. Column can be either a number or a name.
- encoding : str
- Default is latin-1. Note that Stata does not support unicode.
- byteorder : str
- Can be “>”, “<”, “little”, or “big”. The default is None which uses sys.byteorder
>>> writer = StataWriter('./data_file.dta', data)
>>> writer.write_file()
Or with dates
>>> writer = StataWriter('./date_data_file.dta', data, {2 : 'tw'})
>>> writer.write_file()
-
to_string
(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, nanRep=None, index_names=True, justify=None, force_unicode=None, line_width=None, max_rows=None, max_cols=None, show_dimensions=False)¶ Render a DataFrame to a console-friendly tabular output.
- frame : DataFrame
- object to render
- buf : StringIO-like, optional
- buffer to write to
- columns : sequence, optional
- the subset of columns to write; default None writes all columns
- col_space : int, optional
- the minimum width of each column
- header : bool, optional
- whether to print column labels, default True
- index : bool, optional
- whether to print index (row) labels, default True
- na_rep : string, optional
- string representation of NAN to use, default ‘NaN’
- formatters : list or dict of one-parameter functions, optional
- formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. List must be of length equal to the number of columns.
- float_format : one-parameter function, optional
- formatter function to apply to columns’ elements if they are floats, default None
- sparsify : bool, optional
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
- justify : {‘left’, ‘right’}, default None
- Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
- index_names : bool, optional
- Prints the names of the indexes, default True
- force_unicode : bool, default False
- Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.
formatted : string (or unicode, depending on data and options)
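The rendering options above can be tried on a small frame. This is a minimal sketch that assumes a plain pandas DataFrame stands in for the result object and a recent pandas release; the column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; not taken from a real prediction run.
df = pd.DataFrame({"peptide": ["SYFPEITHI", "AAAAAAAAA"],
                   "score": [0.75, np.nan]})

# Hide the row labels and print missing values as "n/a".
text = df.to_string(index=False, na_rep="n/a")
print(text)
```

With `buf=None` the rendered table is returned as a string instead of being written to a buffer.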
-
to_timestamp
(freq=None, how='start', axis=0, copy=True)¶ Cast to DatetimeIndex of timestamps, at beginning of period
- freq : string, default frequency of PeriodIndex
- Desired frequency
- how : {‘s’, ‘e’, ‘start’, ‘end’}
- Convention for converting period to timestamp; start of period vs. end
- axis : {0, 1} default 0
- The axis to convert (the index by default)
- copy : boolean, default True
- If false then underlying input data is not copied
df : DataFrame with DatetimeIndex
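As an illustration of the conversion, the sketch below builds a frame with a monthly PeriodIndex and casts it to timestamps at the start of each period. It assumes a plain pandas DataFrame and a recent pandas release; the data are hypothetical.

```python
import pandas as pd

# Monthly periods as the index (hypothetical data).
periods = pd.period_range("2014-01", periods=3, freq="M")
df = pd.DataFrame({"n": [1, 2, 3]}, index=periods)

# how="start" maps each period to its first timestamp.
starts = df.to_timestamp(how="start")
print(starts.index)
```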
-
to_wide
(*args, **kwargs)¶
-
transpose
()¶ Transpose index and columns
-
truediv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
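The effect of fill_value is easiest to see next to the plain operator. A minimal sketch, assuming plain pandas DataFrames with made-up values and a recent pandas release:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"v": [1.0, np.nan, np.nan]})
b = pd.DataFrame({"v": [2.0, 4.0, np.nan]})

# Without fill_value a NaN on either side gives NaN.
plain = a.truediv(b)

# With fill_value=1.0 a value missing on one side only is replaced by 1.0;
# the last row is missing on both sides and therefore stays NaN.
filled = a.truediv(b, fill_value=1.0)
print(filled)
```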
-
truncate
(before=None, after=None, axis=None, copy=True)¶ Truncates a sorted NDFrame before and/or after some particular dates.
- before : date
- Truncate before date
- after : date
- Truncate after date
- axis : the truncation axis, defaults to the stat axis
- copy : boolean, default is True
- return a copy of the truncated section
truncated : type of caller
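A short sketch of truncation on a sorted date index, assuming a plain pandas DataFrame and a recent pandas release; the column name is hypothetical. Both bounds are inclusive.

```python
import pandas as pd

idx = pd.date_range("2014-01-01", periods=6, freq="D")
df = pd.DataFrame({"score": range(6)}, index=idx)

# Keep only the rows whose label lies between the two dates (inclusive).
window = df.truncate(before="2014-01-02", after="2014-01-04")
print(window)
```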
-
tshift
(periods=1, freq=None, axis=0, **kwds)¶ Shift the time index, using the index’s frequency if available
- periods : int
- Number of periods to move, can be positive or negative
- freq : DateOffset, timedelta, or time rule string, default None
- Increment to use from datetools module or time rule (e.g. ‘EOM’)
- axis : int or basestring
- Corresponds to the axis that contains the Index
If freq is not specified then tries to use the freq or inferred_freq attributes of the index. If neither of those attributes exist, a ValueError is thrown
shifted : NDFrame
-
tz_convert
(tz, axis=0, copy=True)¶ Convert TimeSeries to target time zone. If it is time zone naive, it will be localized to the passed time zone.
- tz : string or pytz.timezone object
- copy : boolean, default True
- Also make a copy of the underlying data
-
tz_localize
(tz, axis=0, copy=True, infer_dst=False)¶ Localize tz-naive TimeSeries to target time zone
- tz : string or pytz.timezone object
- copy : boolean, default True
- Also make a copy of the underlying data
- infer_dst : boolean, default False
- Attempt to infer fall dst-transition times based on order
-
unstack
(level=-1)¶ Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels. If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex)
- level : int, string, or list of these, default -1 (last level)
- Level(s) of index to unstack, can pass level name
DataFrame.pivot : Pivot a table based on column values.
DataFrame.stack : Pivot a level of the column labels (inverse operation from unstack).
>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
...                                    ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one  a    1
     b    2
two  a    3
     b    4
dtype: float64
>>> s.unstack(level=-1)
     a  b
one  1  2
two  3  4
>>> s.unstack(level=0)
   one  two
a    1    3
b    2    4
>>> df = s.unstack(level=0)
>>> df.unstack()
one  a    1.
     b    2.
two  a    3.
     b    4.
unstacked : DataFrame or Series
-
update
(other, join='left', overwrite=True, filter_func=None, raise_conflict=False)¶ Modify DataFrame in place using non-NA values from passed DataFrame. Aligns on indices
- other : DataFrame, or object coercible into a DataFrame
- join : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
- overwrite : boolean, default True
- If True then overwrite values for common keys in the calling frame
- filter_func : callable(1d-array) -> 1d-array<boolean>, default None
- Can choose to replace values other than NA. Return True for values that should be updated
- raise_conflict : boolean
- If True, will raise an error if the DataFrame and other both contain data in the same place.
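The in-place behaviour can be sketched as follows, assuming plain pandas DataFrames with made-up values and a recent pandas release. Only the non-NA entries of `other` are copied over, and the call itself returns None.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
other = pd.DataFrame({"b": [np.nan, 99.0, np.nan]})

# update() modifies df directly; NaN entries in `other` are ignored.
result = df.update(other)
print(df)
```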
-
values
¶ Numpy representation of NDFrame
-
var
(axis=None, skipna=None, level=None, ddof=1, **kwargs)¶ Return unbiased variance over requested axis. Normalized by N-1.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
variance : Series or DataFrame (if level specified)
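To make the N-1 normalization concrete, the sketch below compares the default (ddof=1) with the population variance (ddof=0). It assumes a plain pandas DataFrame with made-up values.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0]})

# Squared deviations from the mean 2.5 sum to 5.0.
unbiased = df["a"].var()          # divides by N-1 = 3
population = df["a"].var(ddof=0)  # divides by N = 4
print(unbiased, population)
```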
-
where
(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)¶ Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
- cond : boolean NDFrame or array
- other : scalar or NDFrame
- inplace : boolean, default False
- Whether to perform the operation in place on the data
- axis : alignment axis if needed, default None
- level : alignment level if needed, default None
- try_cast : boolean, default False
- try to cast the result back to the input type (if possible)
- raise_on_error : boolean, default True
- Whether to raise on invalid data types (e.g. trying to where on strings)
wh : same type as caller
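A minimal sketch of the masking behaviour, assuming a plain pandas DataFrame with made-up values: entries where cond is True are kept, all others are taken from other (NaN when other is not given).

```python
import pandas as pd

df = pd.DataFrame({"x": [1, -2, 3], "y": [-4, 5, -6]})

masked = df.where(df > 0)       # non-positive entries become NaN
replaced = df.where(df > 0, 0)  # non-positive entries become 0
print(replaced)
```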
-
xs
(key, axis=0, level=None, copy=True, drop_level=True)¶ Returns a cross-section (row(s) or column(s)) from the Series/DataFrame. Defaults to cross-section on the rows (axis=0).
- key : object
- Some label contained in the index, or partially in a MultiIndex
- axis : int, default 0
- Axis to retrieve cross-section on
- level : object, defaults to first n levels (n=1 or len(key))
- In case of a key partially contained in a MultiIndex, indicate which levels are used. Levels can be referred by label or position.
- copy : boolean, default True
- Whether to make a copy of the data
- drop_level : boolean, default True
- If False, returns object with same levels as self.
>>> df
   A  B  C
a  4  5  2
b  4  0  9
c  9  7  3
>>> df.xs('a')
A    4
B    5
C    2
Name: a
>>> df.xs('C', axis=1)
a    2
b    9
c    3
Name: C
>>> s = df.xs('a', copy=False)
>>> s['A'] = 100
>>> df
     A  B  C
a  100  5  2
b    4  0  9
c    9  7  3
>>> df
                    A  B  C  D
first second third
bar   one    1      4  1  8  9
      two    1      7  5  5  0
baz   one    1      6  6  8  0
      three  2      5  3  5  3
>>> df.xs(('baz', 'three'))
       A  B  C  D
third
2      5  3  5  3
>>> df.xs('one', level=1)
             A  B  C  D
first third
bar   1      4  1  8  9
baz   1      6  6  8  0
>>> df.xs(('baz', 2), level=[0, 'third'])
        A  B  C  D
second
three   5  3  5  3
xs : Series or DataFrame
-
-
class
Fred2.Core.Result.
Distance2SelfResult
(data=None, index=None, columns=None, dtype=None, copy=False)¶ Bases:
Fred2.Core.Result.AResult
Distance2Self prediction result
-
T
¶ Transpose index and columns
-
abs
()¶ Return an object with absolute value taken. Only applicable to objects that are all numeric
abs: type of caller
-
add
(other, axis='columns', level=None, fill_value=None)¶ Binary operator add with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
add_prefix
(prefix)¶ Concatenate prefix string with panel items names.
prefix : string
with_prefix : type of caller
-
add_suffix
(suffix)¶ Concatenate suffix string with panel items names
suffix : string
with_suffix : type of caller
-
align
(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0)¶ Align two objects on their axes with the specified join method for each axis Index
- other : DataFrame or Series
- join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
- axis : allowed axis of the other object, default None
- Align on index (0), columns (1), or both (None)
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
- copy : boolean, default True
- Always returns new objects. If copy=False and no reindexing is required then original objects are returned.
- fill_value : scalar, default np.NaN
- Value to use for missing values. Defaults to NaN, but can be any “compatible” value
- method : str, default None
- limit : int, default None
- fill_axis : {0, 1}, default 0
- Filling axis, method and limit
- (left, right) : (type of input, type of other)
- Aligned objects
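A small sketch of an outer alignment on the index, assuming plain pandas DataFrames with hypothetical labels and a recent pandas release. Both returned objects share the union of the two indexes, and labels missing on one side are filled with fill_value.

```python
import pandas as pd

a = pd.DataFrame({"v": [1.0, 2.0]}, index=["x", "y"])
b = pd.DataFrame({"v": [10.0, 20.0]}, index=["y", "z"])

# Align on the index only (axis=0) and fill the gaps with 0.0.
left, right = a.align(b, join="outer", axis=0, fill_value=0.0)
print(left)
print(right)
```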
-
all
(axis=None, bool_only=None, skipna=True, level=None, **kwargs)¶ Return whether all elements are True over requested axis.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
- bool_only : boolean, default None
- Only include boolean data.
all : Series (or DataFrame if level specified)
-
any
(axis=None, bool_only=None, skipna=True, level=None, **kwargs)¶ Return whether any element is True over requested axis.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
- bool_only : boolean, default None
- Only include boolean data.
any : Series (or DataFrame if level specified)
-
append
(other, ignore_index=False, verify_integrity=False)¶ Append columns of other to end of this frame’s columns and index, returning a new object. Columns not in this frame are added as new columns.
- other : DataFrame or list of Series/dict-like objects
- ignore_index : boolean, default False
- If True do not use the index labels. Useful for gluing together record arrays
- verify_integrity : boolean, default False
- If True, raise ValueError on creating index with duplicates
If a list of dict is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged
appended : DataFrame
-
apply
(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)¶ Applies function along input axis of DataFrame.
Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty.
- func : function
- Function to apply to each column/row
- axis : {0, 1}
- 0 : apply function to each column
- 1 : apply function to each row
- broadcast : boolean, default False
- For aggregation functions, return object of same size with values propagated
- reduce : boolean or None, default None
- Try to apply reduction procedures. If the DataFrame is empty, apply will use reduce to determine whether the result should be a Series or a DataFrame. If reduce is None (the default), apply’s return value will be guessed by calling func on an empty Series (note: while guessing, exceptions raised by func will be ignored). If reduce is True a Series will always be returned, and if False a DataFrame will always be returned.
- raw : boolean, default False
- If False, convert each row or column into a Series. If raw=True the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance
- args : tuple
- Positional arguments to pass to function in addition to the array/series
Additional keyword arguments will be passed as keywords to the function
>>> df.apply(numpy.sqrt) # returns DataFrame
>>> df.apply(numpy.sum, axis=0) # equiv to df.sum(0)
>>> df.apply(numpy.sum, axis=1) # equiv to df.sum(1)
DataFrame.applymap: For elementwise operations
applied : Series or DataFrame
-
applymap
(func)¶ Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame
- func : function
- Python function, returns a single value from a single value
applied : DataFrame
DataFrame.apply : For operations on rows/columns
-
as_blocks
(columns=None)¶ Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype. Columns are presented in sorted order unless a specific list of columns is provided.
- NOTE: the dtypes of the blocks WILL BE PRESERVED HERE (unlike in as_matrix)
- columns : array-like
- Specific column order
values : a list of Object
-
as_matrix
(columns=None)¶ Convert the frame to its Numpy-array matrix representation. Columns are presented in sorted order unless a specific list of columns is provided.
- NOTE: the dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say, if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.
- e.g. if the dtypes are:
- float16, float32 -> float32
- float16, float32, float64 -> float64
- int32, uint8 -> int32
- values : ndarray
- If the caller is heterogeneous and contains booleans or objects, the result will be of dtype=object
-
asfreq
(freq, method=None, how=None, normalize=False)¶ Convert all TimeSeries inside to specified frequency using DateOffset objects. Optionally provide fill method to pad/backfill missing values.
- freq : DateOffset object, or string
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}
- Method to use for filling holes in reindexed Series. pad / ffill: propagate last valid observation forward to next valid. backfill / bfill: use NEXT valid observation to fill.
- how : {‘start’, ‘end’}, default end
- For PeriodIndex only, see PeriodIndex.asfreq
- normalize : bool, default False
- Whether to reset output index to midnight
converted : type of caller
-
astype
(dtype, copy=True, raise_on_error=True)¶ Cast object to input numpy.dtype. Return a copy when copy = True (be really careful with this!)
- dtype : numpy.dtype or Python type
- raise_on_error : raise on invalid input
casted : type of caller
-
at
¶
-
at_time
(time, asof=False)¶ Select values at particular time of day (e.g. 9:30AM)
time : datetime.time or string
values_at_time : type of caller
-
axes
¶
-
between_time
(start_time, end_time, include_start=True, include_end=True)¶ Select values between particular times of the day (e.g., 9:00-9:30 AM)
- start_time : datetime.time or string
- end_time : datetime.time or string
- include_start : boolean, default True
- include_end : boolean, default True
values_between_time : type of caller
-
bfill
(axis=0, inplace=False, limit=None, downcast=None)¶ Synonym for NDFrame.fillna(method=’bfill’)
-
blocks
¶ Internal property, property synonym for as_blocks()
-
bool
()¶ Return the bool of a single element PandasObject. This must be a boolean scalar value, either True or False
Raise a ValueError if the PandasObject does not have exactly 1 element, or that element is not boolean
-
boxplot
(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, **kwds)¶ Make a box plot from DataFrame column/columns optionally grouped (stratified) by one or more columns
- data : DataFrame
- column : column names or list of names, or vector
- Can be any valid input to groupby
- by : string or sequence
- Column in the DataFrame to group by
- ax : matplotlib axis object, default None
- fontsize : int or string
- rot : int, default None
- Rotation for ticks
- grid : boolean, default None (matlab style default)
- Axis grid lines
ax : matplotlib.axes.AxesSubplot
-
clip
(lower=None, upper=None, out=None)¶ Trim values at input threshold(s)
- lower : float, default None
- upper : float, default None
clipped : Series
-
clip_lower
(threshold)¶ Return copy of the input with values below given value truncated
clip
clipped : same type as input
-
clip_upper
(threshold)¶ Return copy of input with values above given value truncated
clip
clipped : same type as input
-
combine
(other, func, fill_value=None, overwrite=True)¶ Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)
- other : DataFrame
- func : function
- fill_value : scalar value
- overwrite : boolean, default True
- If True then overwrite values for common keys in the calling frame
result : DataFrame
-
combineAdd
(other)¶ Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)
other : DataFrame
DataFrame
-
combineMult
(other)¶ Multiply two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)
other : DataFrame
DataFrame
-
combine_first
(other)¶ Combine two DataFrame objects and default to non-null values in frame calling the method. Result index columns will be the union of the respective indexes and columns
other : DataFrame
a’s values prioritized, use values from b to fill holes:
>>> a.combine_first(b)
combined : DataFrame
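The one-line example above can be expanded into a runnable sketch. It assumes plain pandas DataFrames with hypothetical labels and values: the values of `a` win wherever they are present, and `b` fills the holes and contributes the extra row.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"s": [1.0, np.nan]}, index=["p1", "p2"])
b = pd.DataFrame({"s": [9.0, 2.0, 3.0]}, index=["p1", "p2", "p3"])

# The result index is the union of both indexes.
combined = a.combine_first(b)
print(combined)
```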
-
compound
(axis=None, skipna=None, level=None, **kwargs)¶ Return the compound percentage of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
compounded : Series or DataFrame (if level specified)
-
consolidate
(inplace=False)¶ Compute NDFrame with “consolidated” internals (data of each dtype grouped together in a single ndarray). Mainly an internal API function, but available here to the savvy user
- inplace : boolean, default False
- If False return new object, otherwise modify existing object
consolidated : type of caller
-
convert_objects
(convert_dates=True, convert_numeric=False, convert_timedeltas=True, copy=True)¶ Attempt to infer better dtype for object columns
- convert_dates : if True, attempt to soft convert dates, if ‘coerce’,
- force conversion (and non-convertibles get NaT)
- convert_numeric : if True attempt to coerce to numbers (including
- strings), non-convertibles get NaN
- convert_timedeltas : if True, attempt to soft convert timedeltas, if ‘coerce’,
- force conversion (and non-convertibles get NaT)
copy : Boolean, if True, return copy, default is True
converted : same as input object
-
copy
(deep=True)¶ Make a copy of this object
- deep : boolean, default True
- Make a deep copy, i.e. also copy data
copy : type of caller
-
corr
(method='pearson', min_periods=1)¶ Compute pairwise correlation of columns, excluding NA/null values
- method : {‘pearson’, ‘kendall’, ‘spearman’}
- pearson : standard correlation coefficient
- kendall : Kendall Tau correlation coefficient
- spearman : Spearman rank correlation
- min_periods : int, optional
- Minimum number of observations required per pair of columns to have a valid result. Currently only available for pearson and spearman correlation
y : DataFrame
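A minimal sketch of the pairwise Pearson correlation, assuming a plain pandas DataFrame with made-up, perfectly linear columns so the expected coefficients are easy to check by eye.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],   # y = 2x, perfectly correlated with x
    "z": [4.0, 3.0, 2.0, 1.0],   # z = 5 - x, perfectly anti-correlated
})

c = df.corr(method="pearson")
print(c)
```

The result is a square DataFrame indexed by column name on both axes.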
-
corrwith
(other, axis=0, drop=False)¶ Compute pairwise correlation between rows or columns of two DataFrame objects.
- other : DataFrame
- axis : {0, 1}
- 0 to compute column-wise, 1 for row-wise
- drop : boolean, default False
- Drop missing indices from result, default returns union of all
correls : Series
-
count
(axis=0, level=None, numeric_only=False)¶ Return Series with number of non-NA/null observations over requested axis. Works with non-floating point data as well (detects NaN and None)
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
- numeric_only : boolean, default False
- Include only float, int, boolean data
count : Series (or DataFrame if level specified)
-
cov
(min_periods=None)¶ Compute pairwise covariance of columns, excluding NA/null values
- min_periods : int, optional
- Minimum number of observations required per pair of columns to have a valid result.
y : DataFrame
y contains the covariance matrix of the DataFrame’s time series. The covariance is normalized by N-1 (unbiased estimator).
-
cummax
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative max over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
max : Series
-
cummin
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative min over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
min : Series
-
cumprod
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative prod over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
prod : Series
-
cumsum
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative sum over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
sum : Series
-
delevel
(*args, **kwargs)¶
-
describe
(percentile_width=50)¶ Generate various summary statistics of each column, excluding NaN values. These include: count, mean, std, min, max, and lower%/50%/upper% percentiles
- percentile_width : float, optional
- width of the desired uncertainty interval, default is 50, which corresponds to lower=25, upper=75
DataFrame of summary statistics
-
diff
(periods=1)¶ 1st discrete difference of object
- periods : int, default 1
- Periods to shift for forming difference
diffed : DataFrame
-
div
(other, axis='columns', level=None, fill_value=None)¶ Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
divide
(other, axis='columns', level=None, fill_value=None)¶ Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
dot
(other)¶ Matrix multiplication with DataFrame or Series objects
other : DataFrame or Series
dot_product : DataFrame or Series
-
drop
(labels, axis=0, level=None, inplace=False, **kwargs)¶ Return new object with labels in requested axis removed
- labels : single label or list-like
- axis : int or axis name
- level : int or name, default None
- For MultiIndex
- inplace : bool, default False
- If True, do operation inplace and return None.
dropped : type of caller
-
drop_duplicates
(cols=None, take_last=False, inplace=False)¶ Return DataFrame with duplicate rows removed, optionally only considering certain columns
- cols : column label or sequence of labels, optional
- Only consider certain columns for identifying duplicates, by default use all of the columns
- take_last : boolean, default False
- Take the last observed row in a row. Defaults to the first row
- inplace : boolean, default False
- Whether to drop duplicates in place or to return a copy
deduplicated : DataFrame
-
dropna
(axis=0, how='any', thresh=None, subset=None, inplace=False)¶ Return object with labels on given axis omitted where alternately any or all of the data are missing
- axis : {0, 1}, or tuple/list thereof
- Pass tuple or list to drop on multiple axes
- how : {‘any’, ‘all’}
- any : if any NA values are present, drop that label
- all : if all values are NA, drop that label
- thresh : int, default None
- int value : require that many non-NA values
- subset : array-like
- Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include
- inplace : boolean, default False
- If True, do operation inplace and return None.
dropped : DataFrame
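The how, thresh and subset options can be compared side by side. A minimal sketch, assuming a plain pandas DataFrame with made-up values and a recent pandas release:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, np.nan],
                   "b": [1.0, 2.0, np.nan]})

any_dropped = df.dropna(how="any")    # drop rows with at least one NA
all_dropped = df.dropna(how="all")    # drop rows where every value is NA
two_valid = df.dropna(thresh=2)       # keep rows with >= 2 non-NA values
b_present = df.dropna(subset=["b"])   # only look at column "b"
print(any_dropped, all_dropped, sep="\n")
```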
-
dtypes
¶ Return the dtypes in this object
-
duplicated
(cols=None, take_last=False)¶ Return boolean Series denoting duplicate rows, optionally only considering certain columns
- cols : column label or sequence of labels, optional
- Only consider certain columns for identifying duplicates, by default use all of the columns
- take_last : boolean, default False
- Take the last observed row in a row. Defaults to the first row
duplicated : Series
-
empty
¶ True if NDFrame is entirely empty [no items]
-
eq
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods eq
-
equals
(other)¶ Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.
-
eval
(expr, **kwargs)¶ Evaluate an expression in the context of the calling DataFrame instance.
- expr : string
- The expression string to evaluate.
- kwargs : dict
- See the documentation for eval() for complete details on the keyword arguments accepted by query().
ret : ndarray, scalar, or pandas object
pandas.DataFrame.query pandas.eval
For more details see the API documentation for eval(). For detailed examples see enhancing performance with eval.
>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.eval('a + b')
>>> df.eval('c=a + b')
-
ffill
(axis=0, inplace=False, limit=None, downcast=None)¶ Synonym for NDFrame.fillna(method=’ffill’)
-
fillna
(value=None, method=None, axis=0, inplace=False, limit=None, downcast=None)¶ Fill NA/NaN values using the specified method
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- Method to use for filling holes in reindexed Series. pad / ffill: propagate last valid observation forward to next valid. backfill / bfill: use NEXT valid observation to fill gap.
- value : scalar, dict, or Series
- Value to use to fill holes (e.g. 0), alternately a dict/Series of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series will not be filled). This value cannot be a list.
- axis : {0, 1}, default 0
- 0: fill column-by-column
- 1: fill row-by-row
- inplace : boolean, default False
- If True, fill in place. Note: this will modify any other views on this object, (e.g. a no-copy slice for a column in a DataFrame).
- limit : int, default None
- Maximum size gap to forward or backward fill
- downcast : dict, default is None
- a dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible)
reindex, asfreq
filled : same type as caller
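Two common ways of filling are sketched below: a per-column value given as a dict, and forward propagation of the last valid observation. This assumes a plain pandas DataFrame with made-up values; `ffill()` is used here as the documented synonym for the ‘ffill’ method.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, 1.0, np.nan],
                   "b": [np.nan, np.nan, 5.0]})

# A dict picks a separate fill value for each column.
by_column = df.fillna({"a": 0.0, "b": -1.0})

# Forward fill: leading NaNs have nothing to propagate and stay NaN.
forward = df.ffill()
print(by_column)
print(forward)
```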
-
filter
(items=None, like=None, regex=None, axis=None)¶ Restrict the info axis to set of items or wildcard
- items : list-like
- List of info axis to restrict to (need not all be present)
- like : string
- Keep info axis where “arg in col == True”
- regex : string (regular expression)
- Keep info axis with re.search(regex, col) == True
Arguments are mutually exclusive, but this is not checked for
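The three selection modes can be sketched on a small frame. This assumes a plain pandas DataFrame whose column names are made up for illustration; for a DataFrame the info axis is the columns.

```python
import pandas as pd

df = pd.DataFrame({"score_A": [1], "score_B": [2], "peptide": ["X"]})

by_items = df.filter(items=["peptide"])  # exact labels
by_like = df.filter(like="score")        # substring match
by_regex = df.filter(regex="_B$")        # regular expression search
print(list(by_like.columns))
```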
-
filter_result
(expressions)¶
-
first
(offset)¶ Convenience method for subsetting initial periods of time series data based on a date offset
offset : string, DateOffset, dateutil.relativedelta
ts.first(‘10D’) -> First 10 days
subset : type of caller
-
first_valid_index
()¶ Return label for first non-NA/null value
-
floordiv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator floordiv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
from_csv
(path, header=0, sep=', ', index_col=0, parse_dates=True, encoding=None, tupleize_cols=False, infer_datetime_format=False)¶ Read delimited file into DataFrame
- path : string file path or file handle / StringIO
- header : int, default 0
- Row to use as header (skip prior rows)
- sep : string, default ‘,’
- Field delimiter
- index_col : int or sequence, default 0
- Column to use for index. If a sequence is given, a MultiIndex is used. Different default from read_table
- parse_dates : boolean, default True
- Parse dates. Different default from read_table
- tupleize_cols : boolean, default False
- write multi_index columns as a list of tuples (if True) or in the new, expanded format (if False)
- infer_datetime_format: boolean, default False
- If True and parse_dates is True for a column, try to infer the datetime format based on the first datetime string. If the format can be inferred, there often will be a large parsing speed-up.
Preferable to use read_table for most general purposes but from_csv makes for an easy roundtrip to and from file, especially with a DataFrame of time series data
y : DataFrame
-
from_dict
(data, orient='columns', dtype=None)¶ Construct DataFrame from dict of array-like or dicts
- data : dict
- {field : array-like} or {field : dict}
- orient : {‘columns’, ‘index’}, default ‘columns’
- The “orientation” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.
DataFrame
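A short sketch of the two orient values, assuming pandas is importable as pd; the dict is made-up data.

```python
import pandas as pd

data = {"a": [1, 2], "b": [3, 4]}

# orient='columns' (default): dict keys become column labels.
by_columns = pd.DataFrame.from_dict(data)

# orient='index': dict keys become row labels instead.
by_index = pd.DataFrame.from_dict(data, orient="index")
```

With orient='index' the columns are numbered positionally, so the value 4 sits at by_index.loc['b', 1] rather than by_columns.loc[1, 'b'].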
-
from_items
(items, columns=None, orient='columns')¶ Convert (key, value) pairs to DataFrame. The keys will be the axis index (usually the columns, but depends on the specified orientation). The values should be arrays or Series.
- items : sequence of (key, value) pairs
- Values should be arrays or Series.
- columns : sequence of column labels, optional
- Must be passed if orient=’index’.
- orient : {‘columns’, ‘index’}, default ‘columns’
- The “orientation” of the data. If the keys of the input correspond to column labels, pass ‘columns’ (default). Otherwise if the keys correspond to the index, pass ‘index’.
frame : DataFrame
-
from_records
(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)¶ Convert structured or record ndarray to DataFrame
- data : ndarray (structured dtype), list of tuples, dict, or DataFrame
- index : string, list of fields, array-like
- Field of array to use as the index, alternately a specific set of input labels to use
- exclude : sequence, default None
- Columns or fields to exclude
- columns : sequence, default None
- Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns)
- coerce_float : boolean, default False
- Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets
df : DataFrame
-
ftypes
¶ Return the ftypes (indication of sparse/dense and dtype) in this object.
-
ge
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods ge
-
get
(key, default=None)¶ Get item from object for given key (DataFrame column, Panel slice, etc.). Returns default value if not found
key : object
value : type of items contained in object
-
get_dtype_counts
()¶ Return the counts of dtypes in this object
-
get_ftype_counts
()¶ Return the counts of ftypes in this object
-
get_value
(index, col)¶ Quickly retrieve single value at passed column and index
- index : row label
- col : column label
value : scalar value
-
get_values
()¶ same as values (but handles sparseness conversions)
-
groupby
(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False)¶ Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns
- by : mapping function / list of functions, dict, Series, or tuple / list of column names
- Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups
- axis : int, default 0
- level : int, level name, or sequence of such, default None
- If the axis is a MultiIndex (hierarchical), group by a particular level or levels
- as_index : boolean, default True
- For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output
- sort : boolean, default True
- Sort group keys. Get better performance by turning this off
- group_keys : boolean, default True
- When calling apply, add group keys to index to identify pieces
- squeeze : boolean, default False
- Reduce the dimensionality of the return type if possible, otherwise return a consistent type
# DataFrame result
>>> data.groupby(func, axis=0).mean()
# DataFrame result
>>> data.groupby(['col1', 'col2'])['col3'].mean()
# DataFrame with hierarchical index
>>> data.groupby(['col1', 'col2']).mean()
GroupBy object
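The effect of as_index and of grouping by several columns can be sketched like this. It assumes pandas is importable as pd; the frame and its column names are hypothetical.

```python
import pandas as pd

data = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": ["x", "y", "x"],
    "col3": [1, 2, 3],
})

# One key: the group labels become the index of the result.
by_one = data.groupby("col1")["col3"].mean()

# Two keys: the result carries a hierarchical (MultiIndex) index.
by_two = data.groupby(["col1", "col2"])["col3"].mean()

# as_index=False: "SQL-style" output, group labels stay a regular column.
flat = data[["col1", "col3"]].groupby("col1", as_index=False).mean()
```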
-
gt
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods gt
-
head
(n=5)¶ Returns first n rows
-
hist
(data, column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, **kwds)¶ Draw histogram of the DataFrame’s series using matplotlib / pylab.
- data : DataFrame
- column : string or sequence
- If passed, will be used to limit data to a subset of columns
- by : object, optional
- If passed, then used to form histograms for separate groups
- grid : boolean, default True
- Whether to show axis grid lines
- xlabelsize : int, default None
- If specified changes the x-axis label size
- xrot : float, default None
- rotation of x axis labels
- ylabelsize : int, default None
- If specified changes the y-axis label size
- yrot : float, default None
- rotation of y axis labels
- ax : matplotlib axes object, default None
- sharex : bool, if True, the X axis will be shared amongst all subplots.
- sharey : bool, if True, the Y axis will be shared amongst all subplots.
- figsize : tuple
- The size of the figure to create in inches by default
- layout : (optional) a tuple (rows, columns) for the layout of the histograms
- kwds : other plotting keyword arguments
- To be passed to hist function
-
iat
¶
-
icol
(i)¶
-
idxmax
(axis=0, skipna=True)¶ Return index of first occurrence of maximum over requested axis. NA/null values are excluded.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
idxmax : Series
This method is the DataFrame version of ndarray.argmax.
See also: Series.idxmax
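The axis argument is easy to misread, so here is a small sketch of what each setting returns. It assumes pandas is importable as pd; the labels r1..r3 are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, 5.0, 3.0], "b": [9.0, np.nan, 2.0]},
    index=["r1", "r2", "r3"],
)

# axis=0: for each column, the row label holding the maximum.
col_winners = df.idxmax()

# axis=1: for each row, the column label holding the maximum.
# NaN in row r2 is skipped because skipna defaults to True.
row_winners = df.idxmax(axis=1)
```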
-
idxmin
(axis=0, skipna=True)¶ Return index of first occurrence of minimum over requested axis. NA/null values are excluded.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
idxmin : Series
This method is the DataFrame version of ndarray.argmin.
See also: Series.idxmin
-
iget_value
(i, j)¶
-
iloc
¶
-
info
(verbose=True, buf=None, max_cols=None)¶ Concise summary of a DataFrame.
- verbose : boolean, default True
- If False, don’t print column count summary
- buf : writable buffer, defaults to sys.stdout
- max_cols : int, default None
- Determines whether full summary or short summary is printed
-
insert
(loc, column, value, allow_duplicates=False)¶ Insert column into DataFrame at specified location.
If allow_duplicates is False, raises Exception if column is already contained in the DataFrame.
- loc : int
- Must have 0 <= loc <= len(columns)
- column : object
- value : int, Series, or array-like
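A minimal sketch of insert, including the duplicate-column check. It assumes pandas is importable as pd; the exact exception type may differ between pandas versions, so the example catches the generic Exception.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "c": [5, 6]})

# Insert in place at position 1, between 'a' and 'c'.
df.insert(1, "b", [3, 4])

# Inserting an existing label raises unless allow_duplicates=True.
duplicate_error = None
try:
    df.insert(0, "a", [0, 0])
except Exception as err:
    duplicate_error = err

df.insert(0, "a", [0, 0], allow_duplicates=True)
```

Note that insert modifies the frame in place and returns None.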
-
interpolate
(method='linear', axis=0, limit=None, inplace=False, downcast='infer', **kwargs)¶ Interpolate values according to different methods.
- method : {‘linear’, ‘time’, ‘values’, ‘index’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘krogh’, ‘polynomial’, ‘spline’, ‘piecewise_polynomial’, ‘pchip’}
- ‘linear’: ignore the index and treat the values as equally spaced (default)
- ‘time’: interpolation works on daily and higher resolution data to interpolate given length of interval
- ‘index’: use the actual numerical values of the index
- ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘polynomial’: passed to scipy.interpolate.interp1d with the order given. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=4)
- ‘krogh’, ‘piecewise_polynomial’, ‘spline’, and ‘pchip’ are all wrappers around the scipy interpolation methods of similar names. See the scipy documentation for more on their behavior: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
- axis : {0, 1}, default 0
- 0: fill column-by-column
- 1: fill row-by-row
- limit : int, default None.
- Maximum number of consecutive NaNs to fill.
- inplace : bool, default False
- Update the NDFrame in place if possible.
- downcast : optional, ‘infer’ or None, defaults to ‘infer’
- Downcast dtypes if possible.
Series or DataFrame of same shape interpolated at the NaNs
reindex, replace, fillna
# Filling in NaNs:
>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0    0
1    1
2    2
3    3
dtype: float64
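The difference between ‘linear’ and ‘index’, and the effect of limit, can be sketched as follows. This assumes pandas is importable as pd; only methods that do not need scipy are used, and the series are made-up data.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, np.nan, np.nan, 3.0])

# Default 'linear': values are treated as equally spaced.
full = s.interpolate()

# limit=1: fill at most one consecutive NaN, leave the rest missing.
limited = s.interpolate(limit=1)

# Unevenly spaced index: 'linear' ignores it, 'index' honours it.
uneven = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])
by_position = uneven.interpolate(method="linear")
by_index = uneven.interpolate(method="index")
```

With the uneven index, ‘linear’ places the missing point halfway between its neighbours, while ‘index’ weights it by its distance along the index.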
-
irow
(i, copy=False)¶
-
is_copy
= None¶
-
isin
(values)¶ Return boolean DataFrame showing whether each element in the DataFrame is contained in values.
- values : iterable, Series, DataFrame or dictionary
- The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dictionary, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.
DataFrame of booleans
When values is a list:
>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> df.isin([1, 3, 12, 'a'])
       A      B
0   True   True
1  False  False
2   True  False
When values is a dict:
>>> df = DataFrame({'A': [1, 2, 3], 'B': [1, 4, 7]})
>>> df.isin({'A': [1, 3], 'B': [4, 7, 12]})
       A      B
0   True  False  # Note that B didn't match the 1 here.
1  False   True
2   True   True
When values is a Series or DataFrame:
>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> other = DataFrame({'A': [1, 3, 3, 2], 'B': ['e', 'f', 'f', 'e']})
>>> df.isin(other)
       A      B
0   True  False
1  False  False  # Column A in `other` has a 3, but not at index 1.
2   True   True
-
isnull
()¶ Return a boolean same-sized object indicating if the values are null
-
iteritems
()¶ Iterator over (column, series) pairs
-
iterkv
(*args, **kwargs)¶ iteritems alias used to get around 2to3. Deprecated
-
iterrows
()¶ Iterate over rows of DataFrame as (index, Series) pairs.
iterrows does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). For example,
>>> df = DataFrame([[1, 1.0]], columns=['x', 'y'])
>>> row = next(df.iterrows())[1]
>>> print(row['x'].dtype)
float64
>>> print(df['x'].dtype)
int64
- it : generator
- A generator that iterates over the rows of the frame.
-
itertuples
(index=True)¶ Iterate over rows of DataFrame as tuples, with index value as first element of the tuple
-
ix
¶
-
join
(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)¶ Join columns with other DataFrame either on index or on a key column. Efficiently Join multiple DataFrame objects by index at once by passing a list.
- other : DataFrame, Series with name field set, or list of DataFrame
- Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame
- on : column name, tuple/list of column names, or array-like
- Column(s) to use for joining, otherwise join on index. If multiple columns are given, the passed DataFrame must have a MultiIndex. Can pass an array as the join key if not already contained in the calling DataFrame. Like an Excel VLOOKUP operation
- how : {‘left’, ‘right’, ‘outer’, ‘inner’}
How to handle indexes of the two objects. Default: ‘left’ for joining on index, None otherwise
- left: use calling frame’s index
- right: use input frame’s index
- outer: form union of indexes
- inner: use intersection of indexes
- lsuffix : string
- Suffix to use from left frame’s overlapping columns
- rsuffix : string
- Suffix to use from right frame’s overlapping columns
- sort : boolean, default False
- Order result DataFrame lexicographically by the join key. If False, preserves the index order of the calling (left) DataFrame
on, lsuffix, and rsuffix options are not supported when passing a list of DataFrame objects
joined : DataFrame
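A small sketch of joining on a key column and of the suffix options, assuming pandas is importable as pd; all frames and labels are made up.

```python
import pandas as pd

left = pd.DataFrame({"key": ["k0", "k1", "k2"], "v": [1, 2, 3]})
right = pd.DataFrame({"w": [10, 30]}, index=["k0", "k2"])

# Match left['key'] against right's index; how='left' keeps every left row.
on_key = left.join(right, on="key")

# how='inner' keeps only keys present on both sides.
inner = left.join(right, on="key", how="inner")

# Overlapping column names need suffixes when joining on the index.
a = pd.DataFrame({"v": [1, 2]})
b = pd.DataFrame({"v": [3, 4]})
suffixed = a.join(b, lsuffix="_left", rsuffix="_right")
```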
-
keys
()¶ Get the ‘info axis’ (see Indexing for more)
This is index for Series, columns for DataFrame and major_axis for Panel.
-
kurt
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased kurtosis over requested axis. Normalized by N-1.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
kurt : Series or DataFrame (if level specified)
-
kurtosis
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased kurtosis over requested axis. Normalized by N-1.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
kurt : Series or DataFrame (if level specified)
-
last
(offset)¶ Convenience method for subsetting final periods of time series data based on a date offset
offset : string, DateOffset, dateutil.relativedelta
ts.last(‘5M’) -> Last 5 months
subset : type of caller
-
last_valid_index
()¶ Return label for last non-NA/null value
-
le
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods le
-
load
(path)¶ Deprecated. Use read_pickle instead.
-
loc
¶
-
lookup
(row_labels, col_labels)¶ Label-based “fancy indexing” function for DataFrame. Given equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair.
- row_labels : sequence
- The row labels to use for lookup
- col_labels : sequence
- The column labels to use for lookup
Akin to:
result = []
for row, col in zip(row_labels, col_labels):
    result.append(df.get_value(row, col))
- values : ndarray
- The found values
-
lt
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods lt
-
mad
(axis=None, skipna=None, level=None, **kwargs)¶ Return the mean absolute deviation of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
mad : Series or DataFrame (if level specified)
-
mask
(cond)¶ Returns a copy whose values are replaced with NaN where the condition is True
cond : boolean NDFrame or array
wh: same as input
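A minimal sketch of mask and its relation to where, assuming pandas is importable as pd; the frame is made-up data.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, -2.0, 3.0]})

# mask: entries where the condition holds become NaN.
masked = df.mask(df < 0)

# where is the complement: entries where the condition fails become NaN.
kept = df.where(df >= 0)
```

Both calls leave df untouched and produce the same frame here, since the two conditions are exact opposites.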
-
max
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ This method returns the maximum of the values in the object. If you want the index of the maximum, use idxmax. This is the equivalent of the numpy.ndarray method argmax.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
max : Series or DataFrame (if level specified)
-
mean
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the mean of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
mean : Series or DataFrame (if level specified)
-
median
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the median of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
median : Series or DataFrame (if level specified)
-
merge
(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True)¶ Merge DataFrame objects by performing a database-style join operation by columns or indexes.
If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on.
- right : DataFrame
- how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’
- left: use only keys from left frame (SQL: left outer join)
- right: use only keys from right frame (SQL: right outer join)
- outer: use union of keys from both frames (SQL: full outer join)
- inner: use intersection of keys from both frames (SQL: inner join)
- on : label or list
- Field names to join on. Must be found in both DataFrames. If on is None and not merging on indexes, then it merges on the intersection of the columns by default.
- left_on : label or list, or array-like
- Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns
- right_on : label or list, or array-like
- Field names to join on in right DataFrame or vector/list of vectors per left_on docs
- left_index : boolean, default False
- Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels
- right_index : boolean, default False
- Use the index from the right DataFrame as the join key. Same caveats as left_index
- sort : boolean, default False
- Sort the join keys lexicographically in the result DataFrame
- suffixes : 2-length sequence (tuple, list, ...)
- Suffix to apply to overlapping column names in the left and right side, respectively
- copy : boolean, default True
- If False, do not copy data unnecessarily
>>> A              >>> B
    lkey value         rkey value
0   foo  1         0   foo  5
1   bar  2         1   bar  6
2   baz  3         2   qux  7
3   foo  4         3   bar  8
>>> merge(A, B, left_on='lkey', right_on='rkey', how='outer')
   lkey  value_x  rkey  value_y
0  bar   2        bar   6
1  bar   2        bar   8
2  baz   3        NaN   NaN
3  foo   1        foo   5
4  foo   4        foo   5
5  NaN   NaN      qux   7
merged : DataFrame
-
merge_results
(others)¶
-
min
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ This method returns the minimum of the values in the object. If you want the index of the minimum, use idxmin. This is the equivalent of the numpy.ndarray method argmin.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
min : Series or DataFrame (if level specified)
-
mod
(other, axis='columns', level=None, fill_value=None)¶ Binary operator mod with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
mode
(axis=0, numeric_only=False)¶ Gets the mode of each element along the axis selected. Empty if nothing has 2+ occurrences. Adds a row for each mode per label, fills in gaps with nan.
- axis : {0, 1, ‘index’, ‘columns’} (default 0)
- 0/’index’ : get mode of each column
- 1/’columns’ : get mode of each row
- numeric_only : boolean, default False
- if True, only apply to numeric columns
modes : DataFrame (sorted)
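Because columns can have different numbers of modes, the result is padded with NaN. A small sketch, assuming pandas is importable as pd and using made-up data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2], "b": [5, 5, 5, 6]})

# Column 'a' has two modes (1 and 2), column 'b' has one (5).
# The result has one row per mode; the gap in 'b' is filled with NaN.
modes = df.mode()
```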
-
mul
(other, axis='columns', level=None, fill_value=None)¶ Binary operator mul with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
multiply
(other, axis='columns', level=None, fill_value=None)¶ Binary operator mul with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
ndim
¶ Number of axes / array dimensions
-
ne
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods ne
-
notnull
()¶ Return a boolean same-sized object indicating if the values are not null
-
pct_change
(periods=1, fill_method='pad', limit=None, freq=None, **kwds)¶ Percent change over given number of periods
- periods : int, default 1
- Periods to shift for forming percent change
- fill_method : str, default ‘pad’
- How to handle NAs before computing percent changes
- limit : int, default None
- The number of consecutive NAs to fill before stopping
- freq : DateOffset, timedelta, or offset alias string, optional
- Increment to use from time series API (e.g. ‘M’ or BDay())
chg : same type as caller
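The periods argument can be illustrated with a short sketch. It assumes pandas is importable as pd; the price column is made-up data without missing values, so fill_method plays no role.

```python
import pandas as pd

df = pd.DataFrame({"price": [100.0, 110.0, 99.0]})

# Change relative to the previous row: value / previous - 1.
step = df.pct_change()

# Change relative to the row two periods back.
two_step = df.pct_change(periods=2)
```

The first periods rows of the result are NaN because they have no earlier row to compare against.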
-
pivot
(index=None, columns=None, values=None)¶ Reshape data (produce a “pivot” table) based on column values. Uses unique values from index / columns to form axes and return either DataFrame or Panel, depending on whether you request a single value column (DataFrame) or all columns (Panel)
- index : string or object
- Column name to use to make new frame’s index
- columns : string or object
- Column name to use to make new frame’s columns
- values : string or object, optional
- Column name to use for populating new frame’s values
For finer-tuned control, see hierarchical indexing documentation along with the related stack/unstack methods
>>> df
    foo   bar  baz
0   one   A    1.
1   one   B    2.
2   one   C    3.
3   two   A    4.
4   two   B    5.
5   two   C    6.
>>> df.pivot('foo', 'bar', 'baz')
     A   B   C
one  1   2   3
two  4   5   6
>>> df.pivot('foo', 'bar')['baz']
     A   B   C
one  1   2   3
two  4   5   6
- pivoted : DataFrame
- If no values column specified, will have hierarchically indexed columns
-
pivot_table
(data, values=None, rows=None, cols=None, aggfunc='mean', fill_value=None, margins=False, dropna=True)¶ Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame
- data : DataFrame
- values : column to aggregate, optional
- rows : list of column names or arrays to group on
- Keys to group on the x-axis of the pivot table
- cols : list of column names or arrays to group on
- Keys to group on the y-axis of the pivot table
- aggfunc : function, default numpy.mean, or list of functions
- If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)
- fill_value : scalar, default None
- Value to replace missing values with
- margins : boolean, default False
- Add all row / columns (e.g. for subtotal / grand totals)
- dropna : boolean, default True
- Do not include columns whose entries are all NaN
>>> df
   A   B   C      D
0  foo one small  1
1  foo one large  2
2  foo one large  2
3  foo two small  3
4  foo two small  3
5  bar one large  4
6  bar one small  5
7  bar two small  6
8  bar two large  7
>>> table = pivot_table(df, values='D', rows=['A', 'B'],
...                     cols=['C'], aggfunc=np.sum)
>>> table
          small  large
foo  one  1      4
     two  6      NaN
bar  one  5      4
     two  6      7
table : DataFrame
-
plot
(frame=None, x=None, y=None, subplots=False, sharex=True, sharey=False, use_index=True, figsize=None, grid=None, legend=True, rot=None, ax=None, style=None, title=None, xlim=None, ylim=None, logx=False, logy=False, xticks=None, yticks=None, kind='line', sort_columns=False, fontsize=None, secondary_y=False, **kwds)¶ Make line, bar, or scatter plots of DataFrame series with the index on the x-axis using matplotlib / pylab.
- frame : DataFrame
- x : label or position, default None
- y : label or position, default None
- Allows plotting of one column versus another
- subplots : boolean, default False
- Make separate subplots for each time series
- sharex : boolean, default True
- In case subplots=True, share x axis
- sharey : boolean, default False
- In case subplots=True, share y axis
- use_index : boolean, default True
- Use index as ticks for x axis
- stacked : boolean, default False
- If True, create stacked bar plot. Only valid for DataFrame input
- sort_columns: boolean, default False
- Sort column names to determine plot ordering
- title : string
- Title to use for the plot
- grid : boolean, default None (matlab style default)
- Axis grid lines
- legend : boolean, default True
- Place legend on axis subplots
- ax : matplotlib axis object, default None
- style : list or dict
- matplotlib line style per column
- kind : {‘line’, ‘bar’, ‘barh’, ‘kde’, ‘density’, ‘scatter’}
- bar : vertical bar plot
- barh : horizontal bar plot
- kde/density : Kernel Density Estimation plot
- scatter : scatter plot
- logx : boolean, default False
- For line plots, use log scaling on x axis
- logy : boolean, default False
- For line plots, use log scaling on y axis
- xticks : sequence
- Values to use for the xticks
- yticks : sequence
- Values to use for the yticks
- xlim : 2-tuple/list
- ylim : 2-tuple/list
- rot : int, default None
- Rotation for ticks
- secondary_y : boolean or sequence, default False
- Whether to plot on the secondary y-axis. If a list/tuple, which columns to plot on secondary y-axis
- mark_right: boolean, default True
- When using a secondary_y axis, should the legend label the axis of the various columns automatically
- colormap : str or matplotlib colormap object, default None
- Colormap to select colors from. If string, load colormap with that name from matplotlib.
- kwds : keywords
- Options to pass to matplotlib plotting method
ax_or_axes : matplotlib.AxesSubplot or list of them
-
pop
(item)¶ Return item and drop from frame. Raise KeyError if not found.
-
pow
(other, axis='columns', level=None, fill_value=None)¶ Binary operator pow with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
prod
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the product of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
prod : Series or DataFrame (if level specified)
-
product
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the product of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
prod : Series or DataFrame (if level specified)
-
quantile
(q=0.5, axis=0, numeric_only=True)¶ Return values at the given quantile over requested axis, a la scoreatpercentile in scipy.stats
- q : quantile, default 0.5 (50% quantile)
- 0 <= q <= 1
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
quantiles : Series
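A brief sketch of quantile with the default axis, assuming pandas is importable as pd and its default linear interpolation between neighbouring values; the frame is made-up data.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, 20.0, 30.0, 40.0],
})

# axis=0 (default): one quantile per column, returned as a Series
# indexed by the column labels.
median = df.quantile(0.5)
lower_quartile = df.quantile(0.25)
```

With four values, the 25% quantile falls three quarters of the way between the first and second sorted value, hence 1.75 for column a.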
-
query
(expr, **kwargs)¶ Query the columns of a frame with a boolean expression.
- expr : string
- The query string to evaluate. The result of the evaluation of this expression is first passed to loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to __getitem__().
- kwargs : dict
- See the documentation for eval() for complete details on the keyword arguments accepted by query().
q : DataFrame or Series
This method uses the top-level eval() function to evaluate the passed query.
The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins, and and or. This is syntactically valid Python, however the semantics are different.
You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.
The index and columns attributes of the DataFrame instance are placed in the namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for this variable, and you can also use the name of the index to identify it in a query.
For further details and examples see the query documentation in indexing.
See also: pandas.eval, DataFrame.eval
>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.query('a > b')
>>> df[df.a > df.b]  # same result as the previous expression
-
radd
(other, axis='columns', level=None, fill_value=None)¶ Binary operator radd with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rank
(axis=0, numeric_only=None, method='average', na_option='keep', ascending=True)¶ Compute numerical data ranks (1 through n) along axis. Equal values are assigned a rank that is the average of the ranks of those values
- axis : {0, 1}, default 0
- Ranks over columns (0) or rows (1)
- numeric_only : boolean, default None
- Include only float, int, boolean data
- method : {‘average’, ‘min’, ‘max’, ‘first’}
- average: average rank of group
- min: lowest rank in group
- max: highest rank in group
- first: ranks assigned in order they appear in the array
- na_option : {‘keep’, ‘top’, ‘bottom’}
- keep: leave NA values where they are
- top: smallest rank if ascending
- bottom: smallest rank if descending
- ascending : boolean, default True
- False for ranks by high (1) to low (N)
ranks : DataFrame
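A short sketch of how the method option changes the treatment of ties, assuming pandas is importable as `pd`. The column name `score` and its values are hypothetical.

```python
import pandas as pd

# Two tied values (20 and 20) make the difference between methods visible
df = pd.DataFrame({"score": [10, 20, 20, 30]})

average_rank = df.rank()                # ties share the mean of their ranks
min_rank = df.rank(method="min")        # ties take the lowest rank in the group
first_rank = df.rank(method="first")    # ties are ranked in order of appearance
descending = df.rank(ascending=False)   # the highest value receives rank 1

print(average_rank["score"].tolist())
print(min_rank["score"].tolist())
print(first_rank["score"].tolist())
print(descending["score"].tolist())
```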
-
rdiv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
reindex
(index=None, columns=None, **kwargs)¶ Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False
- index, columns : array-like, optional (can be specified in order, or as
- keywords) New labels / index to conform to. Preferably an Index object to avoid duplicating data
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- Method to use for filling holes in reindexed DataFrame
- pad / ffill: propagate last valid observation forward to next valid
- backfill / bfill: use NEXT valid observation to fill gap
- copy : boolean, default True
- Return a new object, even if the passed indexes are the same
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
- fill_value : scalar, default np.NaN
- Value to use for missing values. Defaults to NaN, but can be any “compatible” value
- limit : int, default None
- Maximum size gap to forward or backward fill
- takeable : boolean, default False
- Treat the passed index / columns as positional values
>>> df.reindex(index=[date1, date2, date3], columns=['A', 'B', 'C'])
reindexed : DataFrame
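The filling options can be compared side by side in a minimal sketch, assuming pandas is importable as `pd`. The frame and the target index are invented for the example.

```python
import pandas as pd

# Original frame only has labels 0, 2 and 4
df = pd.DataFrame({"A": [1.0, 2.0, 3.0]}, index=[0, 2, 4])
new_index = [0, 1, 2, 3, 4]

# Labels missing from the old index become NaN by default
with_nan = df.reindex(index=new_index)

# fill_value substitutes a constant for the new labels
with_zero = df.reindex(index=new_index, fill_value=0.0)

# method="ffill" propagates the last valid observation forward
forward = df.reindex(index=new_index, method="ffill")

print(with_nan["A"].tolist())
print(with_zero["A"].tolist())
print(forward["A"].tolist())
```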
-
reindex_axis
(labels, axis=0, method=None, level=None, copy=True, limit=None, fill_value=nan)¶ Conform input object to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False
- index : array-like, optional
- New labels / index to conform to. Preferably an Index object to avoid duplicating data
- axis : {0, 1, ‘index’, ‘columns’}
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- Method to use for filling holes in reindexed object
- pad / ffill: propagate last valid observation forward to next valid
- backfill / bfill: use NEXT valid observation to fill gap
- copy : boolean, default True
- Return a new object, even if the passed indexes are the same
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
- limit : int, default None
- Maximum size gap to forward or backward fill
>>> df.reindex_axis(['A', 'B', 'C'], axis=1)
reindex, reindex_like
reindexed : DataFrame
-
reindex_like
(other, method=None, copy=True, limit=None)¶ Return an object with matching indices to myself
- other : Object
- method : string or None
- copy : boolean, default True
- limit : int, default None
- Maximum size gap to forward or backward fill
Like calling s.reindex(index=other.index, columns=other.columns, method=...)
reindexed : same as input
-
rename
(index=None, columns=None, **kwargs)¶ Alter axes using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.
- index, columns : dict-like or function, optional
- Transformation to apply to that axis values
- copy : boolean, default True
- Also copy underlying data
- inplace : boolean, default False
- Whether to return a new DataFrame. If True then value of copy is ignored.
renamed : DataFrame (new object)
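As a rough illustration of the dict and function forms, the sketch below renames one column with a dict and every row label with a function. It assumes pandas is importable as `pd`; the labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# The dict renames only the listed column; "b" is left as-is.
# The function is applied to every index label.
renamed = df.rename(columns={"a": "A"}, index=str.upper)

print(list(renamed.columns))
print(list(renamed.index))
print(list(df.columns))  # the original frame is unchanged
```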
-
rename_axis
(mapper, axis=0, copy=True, inplace=False)¶ Alter index and / or columns using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.
- mapper : dict-like or function, optional
- axis : int or string, default 0
- copy : boolean, default True
- Also copy underlying data
- inplace : boolean, default False
renamed : type of caller
-
reorder_levels
(order, axis=0)¶ Rearrange index levels using input order. May not drop or duplicate levels
- order : list of int or list of str
- List representing new level order. Reference level by number (position) or by key (label).
- axis : int
- Where to reorder levels.
type of caller (new object)
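A minimal sketch of referencing levels by label or by position, assuming pandas is importable as `pd`. The level names `letter` and `number` are made up.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("a", 1), ("b", 2)], names=["letter", "number"]
)
df = pd.DataFrame({"v": [10, 20]}, index=index)

# Reference levels by key (label) ...
by_name = df.reorder_levels(["number", "letter"])
# ... or by number (position); both give the same new order
by_position = df.reorder_levels([1, 0])

print(by_name.index.tolist())
```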
-
replace
(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None)¶ Replace values given in ‘to_replace’ with ‘value’.
to_replace : str, regex, list, dict, Series, numeric, or None
str or regex:
- str: string exactly matching to_replace will be replaced with value
- regex: regexs matching to_replace will be replaced with value
list of str, regex, or numeric:
- First, if to_replace and value are both lists, they must be the same length.
- Second, if regex=True then all of the strings in both lists will be interpreted as regexs otherwise they will match directly. This doesn’t matter much for value since there are only a few possible substitution regexes you can use.
- str and regex rules apply as above.
dict:
- Nested dictionaries, e.g., {‘a’: {‘b’: nan}}, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with nan. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.
- Keys map to column names and values map to substitution values. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
None:
- This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also None then this must be a nested dictionary or Series.
See the examples section for examples of each of these.
- value : scalar, dict, list, str, regex, default None
- Value to use to fill holes (e.g. 0), alternately a dict of values specifying which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.
- inplace : boolean, default False
- If True, in place. Note: this will modify any other views on this object (e.g. a column from a DataFrame). Returns the caller if this is True.
- limit : int, default None
- Maximum size gap to forward or backward fill
- regex : bool or same types as to_replace, default False
- Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Otherwise, to_replace must be None because this parameter will be interpreted as a regular expression or a list, dict, or array of regular expressions.
- method : string, optional, {‘pad’, ‘ffill’, ‘bfill’}
- The method to use for replacement, when to_replace is a list.
NDFrame.reindex NDFrame.asfreq NDFrame.fillna
filled : NDFrame
- AssertionError
- If regex is not a bool and to_replace is not None.
- TypeError
- If to_replace is a dict and value is not a list, dict, ndarray, or Series
- If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.
- ValueError
- If to_replace and value are lists or ndarrays, but they are not the same length.
- Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.
- Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.
- This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
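Three of the to_replace forms described above (exact string, nested dictionary, and regular expression) can be tried out in a small sketch. It assumes a reasonably recent pandas imported as `pd`; the frame and the replacement string are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": ["foo", "bar", "baz"], "b": [1, 2, 3]})

# str: only cells exactly equal to "foo" are replaced
exact = df.replace("foo", "new")

# nested dict: look in column "a" for the value "bar" and replace it
nested = df.replace({"a": {"bar": "new"}})

# regex: substitution only applies to strings, so column "b" is untouched
by_regex = df.replace(to_replace=r"^ba.$", value="new", regex=True)

print(exact["a"].tolist())
print(nested["a"].tolist())
print(by_regex["a"].tolist(), by_regex["b"].tolist())
```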
-
resample
(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0)¶ Convenience method for frequency conversion and resampling of regular time-series data.
- rule : string
- the offset string or object representing target conversion
- how : string
- method for down- or re-sampling, default to ‘mean’ for downsampling
- axis : int, optional, default 0
- fill_method : string, default None
- fill_method for upsampling
- closed : {‘right’, ‘left’}
- Which side of bin interval is closed
- label : {‘right’, ‘left’}
- Which bin edge label to label bucket with
- convention : {‘start’, ‘end’, ‘s’, ‘e’}
- kind : “period”/”timestamp”
- loffset : timedelta
- Adjust the resampled time labels
- limit : int, default None
- Maximum size gap when reindexing with fill_method
- base : int, default 0
- For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0
-
reset_index
(level=None, drop=False, inplace=False, col_level=0, col_fill='')¶ For DataFrame with multi-level index, return new DataFrame with labeling information in the columns under the index names, defaulting to ‘level_0’, ‘level_1’, etc. if any are None. For a standard index, the index name will be used (if set), otherwise a default ‘index’ or ‘level_0’ (if ‘index’ is already taken) will be used.
- level : int, str, tuple, or list, default None
- Only remove the given levels from the index. Removes all levels by default
- drop : boolean, default False
- Do not try to insert index into dataframe columns. This resets the index to the default integer index.
- inplace : boolean, default False
- Modify the DataFrame in place (do not create a new object)
- col_level : int or str, default 0
- If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.
- col_fill : object, default ‘’
- If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.
resetted : DataFrame
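The naming rules for a standard (single-level) index can be sketched as follows, assuming pandas is importable as `pd`. The index name `key` is invented for the example.

```python
import pandas as pd

named = pd.DataFrame({"v": [1, 2]}, index=pd.Index(["x", "y"], name="key"))
unnamed = pd.DataFrame({"v": [1, 2]}, index=["x", "y"])

# The index name becomes the new column name when it is set
as_column = named.reset_index()

# Without a name, the default column name "index" is used
default_name = unnamed.reset_index()

# drop=True discards the old index instead of inserting it as a column
dropped = named.reset_index(drop=True)

print(list(as_column.columns), list(default_name.columns), list(dropped.columns))
```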
-
rfloordiv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rfloordiv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rmod
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rmod with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rmul
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rmul with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rpow
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rpow with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rsub
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rsub with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
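The reversed operators swap the operand order, so `left.rsub(right)` computes `right - left`. The sketch below also shows how fill_value behaves when one or both inputs are missing. It assumes pandas and NumPy are importable as `pd` and `np`; the two frames are hypothetical.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"x": [1.0, np.nan, np.nan]})
right = pd.DataFrame({"x": [10.0, 20.0, np.nan]})

# Without fill_value, any missing input yields a missing result
plain = left.rsub(right)

# With fill_value=0, a value missing on one side only is treated as 0;
# where both sides are missing, the result stays missing
filled = left.rsub(right, fill_value=0)

print(plain["x"].tolist())
print(filled["x"].tolist())
```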
-
rtruediv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
save
(path)¶ Deprecated. Use to_pickle instead
-
select
(crit, axis=0)¶ Return data corresponding to axis labels matching criteria
- crit : function
- To be called on each index (label). Should return True or False
axis : int
selection : type of caller
-
set_index
(keys, drop=True, append=False, inplace=False, verify_integrity=False)¶ Set the DataFrame index (row labels) using one or more existing columns. By default yields a new object.
- keys : column label or list of column labels / arrays
- drop : boolean, default True
- Delete columns to be used as the new index
- append : boolean, default False
- Whether to append columns to existing index
- inplace : boolean, default False
- Modify the DataFrame in place (do not create a new object)
- verify_integrity : boolean, default False
- Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method
>>> indexed_df = df.set_index(['A', 'B'])
>>> indexed_df2 = df.set_index(['A', [0, 1, 2, 0, 1, 2]])
>>> indexed_df3 = df.set_index([[0, 1, 2, 0, 1, 2]])
dataframe : DataFrame
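The effect of the drop and append options can be checked with a small sketch, assuming pandas is importable as `pd`. The frame and its column names are made up.

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 2, 1], "C": [10, 20, 30]})

# Two columns become a two-level index and are removed from the columns
indexed = df.set_index(["A", "B"])

# drop=False keeps the column in the frame as well
kept = df.set_index("A", drop=False)

# append=True adds a level to the existing index instead of replacing it
appended = df.set_index("A").set_index("B", append=True)

print(indexed.loc[("x", 2), "C"])
print(list(kept.columns))
print(list(appended.index.names))
```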
-
set_value
(index, col, value)¶ Put single value at passed column and index
- index : row label
- col : column label
- value : scalar value
- frame : DataFrame
- If label pair is contained, will be reference to calling DataFrame, otherwise a new object
-
shape
¶
-
shift
(periods=1, freq=None, axis=0, **kwds)¶ Shift index by desired number of periods with an optional time freq
- periods : int
- Number of periods to move, can be positive or negative
- freq : DateOffset, timedelta, or time rule string, optional
- Increment to use from datetools module or time rule (e.g. ‘EOM’)
If freq is specified then the index values are shifted but the data is not realigned
shifted : same type as caller
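The difference between shifting the data and shifting the index can be seen in a minimal sketch, assuming pandas is importable as `pd`. The daily index and values are hypothetical.

```python
import pandas as pd

index = pd.date_range("2020-01-01", periods=3, freq="D")
df = pd.DataFrame({"v": [1.0, 2.0, 3.0]}, index=index)

# Without freq, the data moves down one row and the index stays put
moved_data = df.shift(1)

# With freq, the index labels move by one day and the data is not realigned
moved_index = df.shift(1, freq="D")

print(moved_data)
print(moved_index)
```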
-
skew
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased skew over requested axis. Normalized by N-1
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
skew : Series or DataFrame (if level specified)
-
sort
(columns=None, column=None, axis=0, ascending=True, inplace=False)¶ Sort DataFrame either by labels (along either axis) or by the values in column(s)
- columns : object
- Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
- ascending : boolean or list, default True
- Sort ascending vs. descending. Specify list for multiple sort orders
- axis : {0, 1}
- Sort index/rows versus columns
- inplace : boolean, default False
- Sort the DataFrame without creating a new instance
>>> result = df.sort(['A', 'B'], ascending=[1, 0])
sorted : DataFrame
-
sort_index
(axis=0, by=None, ascending=True, inplace=False, kind='quicksort')¶ Sort DataFrame either by labels (along either axis) or by the values in a column
- axis : {0, 1}
- Sort index/rows versus columns
- by : object
- Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
- ascending : boolean or list, default True
- Sort ascending vs. descending. Specify list for multiple sort orders
- inplace : boolean, default False
- Sort the DataFrame without creating a new instance
>>> result = df.sort_index(by=['A', 'B'], ascending=[True, False])
sorted : DataFrame
-
sortlevel
(level=0, axis=0, ascending=True, inplace=False)¶ Sort multilevel index by chosen axis and primary level. Data will be lexicographically sorted by the chosen level followed by the other levels (in order)
- level : int
- axis : {0, 1}
- ascending : boolean, default True
- inplace : boolean, default False
- Sort the DataFrame without creating a new instance
sorted : DataFrame
-
squeeze
()¶ squeeze length 1 dimensions
-
stack
(level=-1, dropna=True)¶ Pivot a level of the (possibly hierarchical) column labels, returning a DataFrame (or Series in the case of an object with a single level of column labels) having a hierarchical index with a new inner-most level of row labels.
- level : int, string, or list of these, default last level
- Level(s) to stack, can pass level name
- dropna : boolean, default True
- Whether to drop rows in the resulting Frame/Series with no valid values
>>> s
     a   b
one  1.  2.
two  3.  4.
>>> s.stack()
one a    1
    b    2
two a    3
    b    4
stacked : DataFrame or Series
-
std
(axis=None, skipna=None, level=None, ddof=1, **kwargs)¶ Return unbiased standard deviation over requested axis. Normalized by N-1
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
stdev : Series or DataFrame (if level specified)
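The N-1 normalization corresponds to the default ddof=1. A short sketch, assuming pandas is importable as `pd`, compares it with ddof=0 on hypothetical values whose squared deviations from the mean sum to 32.

```python
import math

import pandas as pd

# Mean is 5; squared deviations sum to 32 over N = 8 values
df = pd.DataFrame({"v": [2, 4, 4, 4, 5, 5, 7, 9]})

sample_std = df.std()["v"]            # divides by N - 1 (ddof=1, the default)
population_std = df.std(ddof=0)["v"]  # divides by N

print(sample_std, math.sqrt(32 / 7))
print(population_std, math.sqrt(32 / 8))
```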
-
sub
(other, axis='columns', level=None, fill_value=None)¶ Binary operator sub with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
subtract
(other, axis='columns', level=None, fill_value=None)¶ Binary operator sub with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
sum
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the sum of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
sum : Series or DataFrame (if level specified)
-
swapaxes
(axis1, axis2, copy=True)¶ Interchange axes and swap values axes appropriately
y : same as input
-
swaplevel
(i, j, axis=0)¶ Swap levels i and j in a MultiIndex on a particular axis
- i, j : int, string (can be mixed)
- Level of index to be swapped. Can pass level name as string.
swapped : type of caller (new object)
-
tail
(n=5)¶ Returns last n rows
-
take
(indices, axis=0, convert=True, is_copy=True)¶ Analogous to ndarray.take
- indices : list / array of ints
- axis : int, default 0
- convert : translate neg to pos indices (default)
- is_copy : mark the returned frame as a copy
taken : type of caller
-
to_clipboard
(excel=None, sep=None, **kwargs)¶ Attempt to write text representation of object to the system clipboard. This can be pasted into Excel, for example.
- excel : boolean, defaults to True
- if True, use the provided separator, writing in a csv format for allowing easy pasting into excel. if False, write a string representation of the object to the clipboard
- sep : optional, defaults to tab
- other keywords are passed to to_csv
- Requirements for your platform
- Linux: xclip, or xsel (with gtk or PyQt4 modules)
- Windows: none
- OS X: none
-
to_csv
(path_or_buf, sep=',', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, mode='w', nanRep=None, encoding=None, quoting=None, line_terminator='\n', chunksize=None, tupleize_cols=False, date_format=None, **kwds)¶ Write DataFrame to a comma-separated values (csv) file
- path_or_buf : string or file handle / StringIO
- File path
- sep : character, default ”,”
- Field delimiter for the output file.
- na_rep : string, default ‘’
- Missing data representation
- float_format : string, default None
- Format string for floating point numbers
- cols : sequence, optional
- Columns to write
- header : boolean or list of string, default True
- Write out column names. If a list of string is given it is assumed to be aliases for the column names
- index : boolean, default True
- Write row names (index)
- index_label : string or sequence, or False, default None
- Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R
- nanRep : None
- deprecated, use na_rep
- mode : str
- Python write mode, default ‘w’
- encoding : string, optional
- a string representing the encoding to use if the contents are non-ascii, for python versions prior to 3
- line_terminator : string, default ‘\n’
- The newline character or character sequence to use in the output file
- quoting : optional constant from csv module
- defaults to csv.QUOTE_MINIMAL
- chunksize : int or None
- rows to write at a time
- tupleize_cols : boolean, default False
- Write multi_index columns as a list of tuples (if True) or in the new, expanded format (if False)
- date_format : string, default None
- Format string for datetime objects.
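A minimal sketch of the sep, na_rep and index options, writing to an in-memory buffer rather than a file. It assumes pandas and NumPy are importable as `pd` and `np`; the data is hypothetical.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", "y"]})

# path_or_buf accepts a file handle or StringIO as well as a path
buffer = io.StringIO()
df.to_csv(buffer, sep=";", na_rep="NA", index=False)

# splitlines() avoids depending on the platform's line terminator
lines = buffer.getvalue().splitlines()
print(lines)
```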
-
to_dense
()¶ Return dense representation of NDFrame (as opposed to sparse)
-
to_dict
(outtype='dict')¶ Convert DataFrame to dictionary.
- outtype : str {‘dict’, ‘list’, ‘series’, ‘records’}
- Determines the type of the values of the dictionary. The default dict is a nested dictionary {column -> {index -> value}}. list returns {column -> list(values)}. series returns {column -> Series(values)}. records returns [{columns -> value}]. Abbreviations are allowed.
result : dict like {column -> {index -> value}}
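The output shapes listed above can be compared in a short sketch, assuming pandas is importable as `pd`. The orientation is passed positionally and spelled out in full here, on the assumption that the keyword name and the abbreviation support may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

nested = df.to_dict()             # {column -> {index -> value}}
as_lists = df.to_dict("list")     # {column -> list(values)}
as_records = df.to_dict("records")  # [{column -> value}, ...]

print(nested)
print(as_lists)
print(as_records)
```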
-
to_excel
(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True)¶ Write DataFrame to an Excel sheet
- excel_writer : string or ExcelWriter object
- File path or existing ExcelWriter
- sheet_name : string, default ‘Sheet1’
- Name of sheet which will contain DataFrame
- na_rep : string, default ‘’
- Missing data representation
- float_format : string, default None
- Format string for floating point numbers
- cols : sequence, optional
- Columns to write
- header : boolean or list of string, default True
- Write out column names. If a list of string is given it is assumed to be aliases for the column names
- index : boolean, default True
- Write row names (index)
- index_label : string or sequence, default None
- Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
- startrow :
- upper left cell row to dump data frame
- startcol :
- upper left cell column to dump data frame
- engine : string, default None
- write engine to use - you can also set this via the options io.excel.xlsx.writer, io.excel.xls.writer, and io.excel.xlsm.writer.
- merge_cells : boolean, default True
- Write MultiIndex and Hierarchical Rows as merged cells.
If passing an existing ExcelWriter object, then the sheet will be added to the existing workbook. This can be used to save different DataFrames to one workbook:
>>> writer = ExcelWriter('output.xlsx')
>>> df1.to_excel(writer,'Sheet1')
>>> df2.to_excel(writer,'Sheet2')
>>> writer.save()
-
to_gbq
(destination_table, schema=None, col_order=None, if_exists='fail', **kwargs)¶ Write a DataFrame to a Google BigQuery table.
If the table exists, the DataFrame will be appended. If not, a new table will be created, in which case the schema will have to be specified. By default, rows will be written in the order they appear in the DataFrame, though the user may specify an alternative order.
- destination_table : string
- name of table to be written, in the form ‘dataset.tablename’
- schema : sequence (optional)
- list of column types in order for data to be inserted, e.g. [‘INTEGER’, ‘TIMESTAMP’, ‘BOOLEAN’]
- col_order : sequence (optional)
- order which columns are to be inserted, e.g. [‘primary_key’, ‘birthday’, ‘username’]
- if_exists : {‘fail’, ‘replace’, ‘append’} (optional)
- fail: If table exists, do nothing.
- replace: If table exists, drop it, recreate it, and insert data.
- append: If table exists, insert data. Create if does not exist.
kwargs are passed to the Client constructor
- SchemaMissing :
- Raised if the ‘if_exists’ parameter is set to ‘replace’, but no schema is specified
- TableExists :
- Raised if the specified ‘destination_table’ exists but the ‘if_exists’ parameter is set to ‘fail’ (the default)
- InvalidSchema :
- Raised if the ‘schema’ parameter does not match the provided DataFrame
-
to_hdf
(path_or_buf, key, **kwargs)¶ activate the HDFStore
- path_or_buf : the path (string) or buffer to put the store
- key : string
- identifier for the group in the store
- mode : optional, {‘a’, ‘w’, ‘r’, ‘r+’}, default ‘a’
'r'
- Read-only; no data can be modified.
'w'
- Write; a new file is created (an existing file with the same name would be deleted).
'a'
- Append; an existing file is opened for reading and writing, and if the file does not exist it is created.
'r+'
- It is similar to 'a', but the file must already exist.
- format : ‘fixed(f)|table(t)’, default is ‘fixed’
- fixed(f) : Fixed format
- Fast writing/reading. Not-appendable, nor searchable
- table(t) : Table format
- Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data
- append : boolean, default False
- For Table formats, append the input data to the existing
- complevel : int, 1-9, default 0
- If a complib is specified compression will be applied where possible
- complib : {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’, None}, default None
- If complevel is > 0 apply compression to objects written in the store wherever possible
- fletcher32 : bool, default False
- If applying compression use the fletcher32 checksum
-
to_html
(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, force_unicode=None, bold_rows=True, classes=None, escape=True, max_rows=None, max_cols=None, show_dimensions=False)¶ Render a DataFrame as an HTML table.
to_html-specific options:
- bold_rows : boolean, default True
- Make the row labels bold in the output
- classes : str or list or tuple, default None
- CSS class(es) to apply to the resulting html table
- escape : boolean, default True
- Convert the characters <, >, and & to HTML-safe sequences.
- max_rows : int, optional
- Maximum number of rows to show before truncating. If None, show all.
- max_cols : int, optional
- Maximum number of columns to show before truncating. If None, show all.
- frame : DataFrame
- object to render
- buf : StringIO-like, optional
- buffer to write to
- columns : sequence, optional
- the subset of columns to write; default None writes all columns
- col_space : int, optional
- the minimum width of each column
- header : bool, optional
- whether to print column labels, default True
- index : bool, optional
- whether to print index (row) labels, default True
- na_rep : string, optional
- string representation of NAN to use, default ‘NaN’
- formatters : list or dict of one-parameter functions, optional
- formatter functions to apply to columns’ elements by position or name, default None, if the result is a string , it must be a unicode string. List must be of length equal to the number of columns.
- float_format : one-parameter function, optional
- formatter function to apply to columns’ elements if they are floats default None
- sparsify : bool, optional
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
- justify : {‘left’, ‘right’}, default None
- Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
- index_names : bool, optional
- Prints the names of the indexes, default True
- force_unicode : bool, default False
- Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.
formatted : string (or unicode, depending on data and options)
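The escape option can be illustrated with a minimal sketch, assuming pandas is importable as `pd`. The cell content is a made-up string containing markup.

```python
import pandas as pd

df = pd.DataFrame({"a": ["<b>x</b>"]})

# By default <, > and & in cell values are converted to HTML-safe sequences
escaped = df.to_html()

# escape=False writes the cell content through unchanged
raw = df.to_html(escape=False)

print("&lt;b&gt;" in escaped)
print("<b>x</b>" in raw)
```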
-
to_json
(path_or_buf=None, orient=None, date_format='epoch', double_precision=10, force_ascii=True, date_unit='ms', default_handler=None)¶ Convert the object to a JSON string.
Note NaN’s and None will be converted to null and datetime objects will be converted to UNIX timestamps.
- path_or_buf : the path or buffer to write the result string
- if this is None, return a StringIO of the converted string
orient : string
- Series
- default is ‘index’
- allowed values are: {‘split’,’records’,’index’}
- DataFrame
- default is ‘columns’
- allowed values are: {‘split’,’records’,’index’,’columns’,’values’}
- The format of the JSON string
- split : dict like {index -> [index], columns -> [columns], data -> [values]}
- records : list like [{column -> value}, ... , {column -> value}]
- index : dict like {index -> {column -> value}}
- columns : dict like {column -> {index -> value}}
- values : just the values array
- date_format : {‘epoch’, ‘iso’}
- Type of date conversion. epoch = epoch milliseconds, iso = ISO8601, default is epoch.
- double_precision : the number of decimal places to use when encoding floating point values, default 10.
- force_ascii : force encoded string to be ASCII, default True.
- date_unit : string, default ‘ms’ (milliseconds)
- The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.
- default_handler : callable, default None
- Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serialisable object.
same type as input object with filtered info axis
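The orient layouts listed above are easier to compare side by side. The following is a minimal sketch, assuming pandas is installed; the frame and its labels are made up for illustration and are not part of Fred2.

```python
import json

import pandas as pd

# Hypothetical two-by-two frame; labels are illustrative only.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

# Parse each JSON string back into Python objects to inspect its layout.
by_column = json.loads(df.to_json(orient="columns"))  # {column -> {index -> value}}
by_record = json.loads(df.to_json(orient="records"))  # [{column -> value}, ...]
by_split = json.loads(df.to_json(orient="split"))     # index, columns and data kept apart

print(by_column)
print(by_record)
print(by_split)
```

The records layout drops the index labels, so it suits cases where only the row contents matter.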
-
to_latex
(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=True, force_unicode=None)¶ Render a DataFrame to a tabular environment table. You can splice this into a LaTeX document.
to_latex-specific options:
- bold_rows : boolean, default True
- Make the row labels bold in the output
- frame : DataFrame
- object to render
- buf : StringIO-like, optional
- buffer to write to
- columns : sequence, optional
- the subset of columns to write; default None writes all columns
- col_space : int, optional
- the minimum width of each column
- header : bool, optional
- whether to print column labels, default True
- index : bool, optional
- whether to print index (row) labels, default True
- na_rep : string, optional
- string representation of NAN to use, default ‘NaN’
- formatters : list or dict of one-parameter functions, optional
- formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. A list must be of length equal to the number of columns.
- float_format : one-parameter function, optional
- formatter function to apply to columns’ elements if they are floats, default None
- sparsify : bool, optional
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
- justify : {‘left’, ‘right’}, default None
- Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
- index_names : bool, optional
- Prints the names of the indexes, default True
- force_unicode : bool, default False
- Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.
formatted : string (or unicode, depending on data and options)
-
to_msgpack
(path_or_buf=None, **kwargs)¶ msgpack (serialize) object to input file path
THIS IS AN EXPERIMENTAL LIBRARY and the storage format may not be stable until a future release.
- path : string file path, buffer-like, or None
- if None, return generated string
- append : boolean
- whether to append to an existing msgpack (default is False)
- compress : type of compressor (zlib or blosc)
- default is None (no compression)
-
to_panel
()¶ Transform long (stacked) format (DataFrame) into wide (3D, Panel) format.
Currently the index of the DataFrame must be a 2-level MultiIndex. This may be generalized later
panel : Panel
-
to_period
(freq=None, axis=0, copy=True)¶ Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed)
- freq : string, default
- axis : {0, 1}, default 0
- The axis to convert (the index by default)
- copy : boolean, default True
- If False then underlying input data is not copied
ts : TimeSeries with PeriodIndex
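As a rough illustration of the conversion described above, the sketch below turns a daily DatetimeIndex into monthly periods and back again. It assumes pandas is installed; the dates and values are invented for the example.

```python
import pandas as pd

# Three consecutive days that straddle a month boundary.
idx = pd.date_range("2014-01-31", periods=3, freq="D")
df = pd.DataFrame({"value": [1, 2, 3]}, index=idx)

# DatetimeIndex -> PeriodIndex with monthly frequency.
monthly = df.to_period(freq="M")

# PeriodIndex -> DatetimeIndex, using the start of each period.
back = monthly.to_timestamp(how="start")

print(monthly.index)
print(back.index)
```

The round trip is lossy: each timestamp comes back as the first day of its month, not the original day.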
-
to_pickle
(path)¶ Pickle (serialize) object to input file path
- path : string
- File path
-
to_records
(index=True, convert_datetime64=True)¶ Convert DataFrame to record array. Index will be put in the ‘index’ field of the record array if requested
- index : boolean, default True
- Include index in resulting record array, stored in ‘index’ field
- convert_datetime64 : boolean, default True
- Whether to convert the index to datetime.datetime if it is a DatetimeIndex
y : recarray
-
to_sparse
(fill_value=None, kind='block')¶ Convert to SparseDataFrame
- fill_value : float, default NaN
- kind : {‘block’, ‘integer’}
y : SparseDataFrame
-
to_sql
(name, con, flavor='sqlite', if_exists='fail', **kwargs)¶ Write records stored in a DataFrame to a SQL database.
- name : str
- Name of SQL table
- con : an open SQL database connection object
- flavor : {‘sqlite’, ‘mysql’, ‘oracle’}, default ‘sqlite’
- if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’
- fail: If table exists, do nothing.
- replace: If table exists, drop it, recreate it, and insert data.
- append: If table exists, insert data. Create if does not exist.
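The replace and append modes can be tried against an in-memory SQLite database. This is a minimal sketch under stated assumptions: it uses the standard-library sqlite3 module and a recent pandas release, where the flavor argument is not needed (and no longer accepted) and an index keyword is available. The table name and data are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical prediction scores; names and values are illustrative only.
df = pd.DataFrame({"peptide": ["PEP1", "PEP2"], "score": [0.32, 0.56]})

con = sqlite3.connect(":memory:")

# 'replace' drops an existing table (if any) before writing the rows.
df.to_sql("scores", con, if_exists="replace", index=False)

# 'append' adds the same rows again to the existing table.
df.to_sql("scores", con, if_exists="append", index=False)

n_rows = con.execute("SELECT COUNT(*) FROM scores").fetchone()[0]
print(n_rows)
```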
-
to_stata
(fname, convert_dates=None, write_index=True, encoding='latin-1', byteorder=None)¶ A class for writing Stata binary dta files from array-like objects
- fname : file path or buffer
- Where to save the dta file.
- convert_dates : dict
- Dictionary mapping column of datetime types to the stata internal format that you want to use for the dates. Options are ‘tc’, ‘td’, ‘tm’, ‘tw’, ‘th’, ‘tq’, ‘ty’. Column can be either a number or a name.
- encoding : str
- Default is latin-1. Note that Stata does not support unicode.
- byteorder : str
- Can be “>”, “<”, “little”, or “big”. The default is None which uses sys.byteorder
>>> writer = StataWriter('./data_file.dta', data)
>>> writer.write_file()
Or with dates
>>> writer = StataWriter('./date_data_file.dta', data, {2 : 'tw'})
>>> writer.write_file()
-
to_string
(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, nanRep=None, index_names=True, justify=None, force_unicode=None, line_width=None, max_rows=None, max_cols=None, show_dimensions=False)¶ Render a DataFrame to a console-friendly tabular output.
- frame : DataFrame
- object to render
- buf : StringIO-like, optional
- buffer to write to
- columns : sequence, optional
- the subset of columns to write; default None writes all columns
- col_space : int, optional
- the minimum width of each column
- header : bool, optional
- whether to print column labels, default True
- index : bool, optional
- whether to print index (row) labels, default True
- na_rep : string, optional
- string representation of NAN to use, default ‘NaN’
- formatters : list or dict of one-parameter functions, optional
- formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. A list must be of length equal to the number of columns.
- float_format : one-parameter function, optional
- formatter function to apply to columns’ elements if they are floats, default None
- sparsify : bool, optional
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
- justify : {‘left’, ‘right’}, default None
- Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
- index_names : bool, optional
- Prints the names of the indexes, default True
- force_unicode : bool, default False
- Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.
formatted : string (or unicode, depending on data and options)
-
to_timestamp
(freq=None, how='start', axis=0, copy=True)¶ Cast to DatetimeIndex of timestamps, at beginning of period
- freq : string, default frequency of PeriodIndex
- Desired frequency
- how : {‘s’, ‘e’, ‘start’, ‘end’}
- Convention for converting period to timestamp; start of period vs. end
- axis : {0, 1} default 0
- The axis to convert (the index by default)
- copy : boolean, default True
- If false then underlying input data is not copied
df : DataFrame with DatetimeIndex
-
to_wide
(*args, **kwargs)¶
-
transpose
()¶ Transpose index and columns
-
truediv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
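The effect of fill_value is easiest to see next to the plain `/` operator. The sketch below assumes pandas and NumPy are installed and uses invented numbers.

```python
import numpy as np
import pandas as pd

num = pd.DataFrame({"A": [1.0, np.nan], "B": [4.0, np.nan]})
den = pd.DataFrame({"A": [2.0, 4.0], "B": [np.nan, np.nan]})

# Plain division: any missing operand yields NaN.
plain = num / den

# With fill_value, a value missing on one side only is replaced by 1.0
# before dividing; where both sides are missing the result stays NaN.
filled = num.truediv(den, fill_value=1.0)

print(plain)
print(filled)
```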
-
truncate
(before=None, after=None, axis=None, copy=True)¶ Truncates a sorted NDFrame before and/or after some particular dates.
- before : date
- Truncate before date
- after : date
- Truncate after date
- axis : the truncation axis, defaults to the stat axis
- copy : boolean, default is True
- return a copy of the truncated section
truncated : type of caller
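A short sketch of truncation on a sorted date index follows, assuming pandas is installed; the dates are arbitrary. Both bounds are inclusive.

```python
import pandas as pd

idx = pd.date_range("2014-01-01", periods=5, freq="D")
df = pd.DataFrame({"value": range(5)}, index=idx)

# Keep only rows from 2014-01-02 through 2014-01-04 (inclusive).
window = df.truncate(before="2014-01-02", after="2014-01-04")

print(window)
```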
-
tshift
(periods=1, freq=None, axis=0, **kwds)¶ Shift the time index, using the index’s frequency if available
- periods : int
- Number of periods to move, can be positive or negative
- freq : DateOffset, timedelta, or time rule string, default None
- Increment to use from datetools module or time rule (e.g. ‘EOM’)
- axis : int or basestring
- Corresponds to the axis that contains the Index
If freq is not specified then tries to use the freq or inferred_freq attributes of the index. If neither of those attributes exist, a ValueError is thrown
shifted : NDFrame
-
tz_convert
(tz, axis=0, copy=True)¶ Convert TimeSeries to target time zone. If it is time zone naive, it will be localized to the passed time zone.
- tz : string or pytz.timezone object
- copy : boolean, default True
- Also make a copy of the underlying data
-
tz_localize
(tz, axis=0, copy=True, infer_dst=False)¶ Localize tz-naive TimeSeries to target time zone
- tz : string or pytz.timezone object
- copy : boolean, default True
- Also make a copy of the underlying data
- infer_dst : boolean, default False
- Attempt to infer fall dst-transition times based on order
-
unstack
(level=-1)¶ Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels. If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex)
- level : int, string, or list of these, default -1 (last level)
- Level(s) of index to unstack, can pass level name
DataFrame.pivot : Pivot a table based on column values.
DataFrame.stack : Pivot a level of the column labels (inverse operation from unstack).
>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
...                                    ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one  a   1
     b   2
two  a   3
     b   4
dtype: float64
>>> s.unstack(level=-1)
     a   b
one  1   2
two  3   4
>>> s.unstack(level=0)
   one  two
a  1    3
b  2    4
>>> df = s.unstack(level=0)
>>> df.unstack()
one  a  1.
     b  2.
two  a  3.
     b  4.
unstacked : DataFrame or Series
-
update
(other, join='left', overwrite=True, filter_func=None, raise_conflict=False)¶ Modify DataFrame in place using non-NA values from passed DataFrame. Aligns on indices
- other : DataFrame, or object coercible into a DataFrame
- join : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
- overwrite : boolean, default True
- If True then overwrite values for common keys in the calling frame
- filter_func : callable(1d-array) -> 1d-array<boolean>, default None
- Can choose to replace values other than NA. Return True for values that should be updated
- raise_conflict : boolean
- If True, will raise an error if the DataFrame and other both contain data in the same place.
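The in-place behaviour and the skipping of NA values can be sketched as follows, assuming pandas and NumPy are installed. Only the default arguments are used and the data is invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})

# 'other' covers only part of the index and contains one NA value.
other = pd.DataFrame({"B": [np.nan, 99.0]}, index=[0, 1])

# update() modifies df in place and returns None.
result = df.update(other)

print(df)
```

Row 0 keeps its original value because the matching entry in other is NA; row 2 is untouched because other has no such index label.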
-
values
¶ Numpy representation of NDFrame
-
var
(axis=None, skipna=None, level=None, ddof=1, **kwargs)¶ Return unbiased variance over requested axis. Normalized by N-1.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
variance : Series or DataFrame (if level specified)
-
where
(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)¶ Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
- cond : boolean NDFrame or array
- other : scalar or NDFrame
- inplace : boolean, default False
- Whether to perform the operation in place on the data
- axis : alignment axis if needed, default None
- level : alignment level if needed, default None
- try_cast : boolean, default False
- try to cast the result back to the input type (if possible)
- raise_on_error : boolean, default True
- Whether to raise on invalid data types (e.g. trying to where on strings)
wh : same type as caller
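A minimal sketch of where with and without an explicit other, assuming pandas is installed; the values are invented.

```python
import pandas as pd

df = pd.DataFrame({"A": [-1, 2], "B": [3, -4]})

# Entries where the condition is False become NaN by default.
kept = df.where(df > 0)

# Passing 'other' supplies the replacement value instead.
replaced = df.where(df > 0, other=0)

print(kept)
print(replaced)
```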
-
xs
(key, axis=0, level=None, copy=True, drop_level=True)¶ Returns a cross-section (row(s) or column(s)) from the Series/DataFrame. Defaults to cross-section on the rows (axis=0).
- key : object
- Some label contained in the index, or partially in a MultiIndex
- axis : int, default 0
- Axis to retrieve cross-section on
- level : object, defaults to first n levels (n=1 or len(key))
- In case of a key partially contained in a MultiIndex, indicate which levels are used. Levels can be referred by label or position.
- copy : boolean, default True
- Whether to make a copy of the data
- drop_level : boolean, default True
- If False, returns object with same levels as self.
>>> df
   A  B  C
a  4  5  2
b  4  0  9
c  9  7  3
>>> df.xs('a')
A    4
B    5
C    2
Name: a
>>> df.xs('C', axis=1)
a    2
b    9
c    3
Name: C
>>> s = df.xs('a', copy=False)
>>> s['A'] = 100
>>> df
     A  B  C
a  100  5  2
b    4  0  9
c    9  7  3
>>> df
                    A  B  C  D
first second third
bar   one    1      4  1  8  9
      two    1      7  5  5  0
baz   one    1      6  6  8  0
      three  2      5  3  5  3
>>> df.xs(('baz', 'three'))
       A  B  C  D
third
2      5  3  5  3
>>> df.xs('one', level=1)
             A  B  C  D
first third
bar   1      4  1  8  9
baz   1      6  6  8  0
>>> df.xs(('baz', 2), level=[0, 'third'])
        A  B  C  D
second
three   5  3  5  3
xs : Series or DataFrame
-
-
class
Fred2.Core.Result.
EpitopePredictionResult
(data=None, index=None, columns=None, dtype=None, copy=False)¶ Bases:
Fred2.Core.Result.AResult
An EpitopePredictionResult object is a DataFrame with multi-indexing, where the column IDs are the prediction models (i.e. HLA Allele for epitope prediction), the first row ID is the target of the prediction (i.e. Peptide), and the second row ID is the predictor (e.g. BIMAS).
EpitopePredictionResult
Peptide Obj  Method Name  Allele1 Obj  Allele2 Obj  Allele3 Obj
Peptide1     Method 1     0.324        0.56         0.013
             Method 2     20           15           23
Peptide2     Method 1     0.50         0.36         0.98
             Method 2     26           10           50
-
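The layout of the table above can be reproduced with a plain pandas DataFrame. This is only a sketch of the index structure, not the Fred2 constructor: plain strings stand in for the Peptide and Allele objects, the level names are assumptions, and the numbers are copied from the table.

```python
import pandas as pd

# Two row levels: prediction target first, predictor second.
# The level names "Peptide" and "Method" are chosen for this sketch.
index = pd.MultiIndex.from_product(
    [["Peptide1", "Peptide2"], ["Method 1", "Method 2"]],
    names=["Peptide", "Method"],
)

scores = pd.DataFrame(
    [[0.324, 0.56, 0.013], [20, 15, 23], [0.50, 0.36, 0.98], [26, 10, 50]],
    index=index,
    columns=["Allele1", "Allele2", "Allele3"],  # stand-ins for Allele objects
)

# All scores of one predictor across peptides.
method1 = scores.xs("Method 1", level="Method")

# All predictors for one peptide.
peptide2 = scores.loc["Peptide2"]

print(method1)
print(peptide2)
```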
T
¶ Transpose index and columns
-
abs
()¶ Return an object with absolute value taken. Only applicable to objects that are all numeric
abs: type of caller
-
add
(other, axis='columns', level=None, fill_value=None)¶ Binary operator add with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
add_prefix
(prefix)¶ Concatenate prefix string with panel items names.
prefix : string
with_prefix : type of caller
-
add_suffix
(suffix)¶ Concatenate suffix string with panel items names
suffix : string
with_suffix : type of caller
-
align
(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0)¶ Align two objects on their axes with the specified join method for each axis Index
- other : DataFrame or Series
- join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
- axis : allowed axis of the other object, default None
- Align on index (0), columns (1), or both (None)
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
- copy : boolean, default True
- Always returns new objects. If copy=False and no reindexing is required then original objects are returned.
- fill_value : scalar, default np.NaN
- Value to use for missing values. Defaults to NaN, but can be any “compatible” value
- method : str, default None
- limit : int, default None
- fill_axis : {0, 1}, default 0
- Filling axis, method and limit
- (left, right) : (type of input, type of other)
- Aligned objects
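The difference between aligning on both axes and on the index only can be sketched as follows, assuming pandas is installed; the frames are invented.

```python
import pandas as pd

left = pd.DataFrame({"A": [1, 2]}, index=["a", "b"])
right = pd.DataFrame({"B": [3, 4]}, index=["b", "c"])

# axis=None (the default) aligns both index and columns; an outer join
# takes the union of labels and fills the gaps with NaN.
l_outer, r_outer = left.align(right, join="outer")

# axis=0 aligns the index only; an inner join keeps shared labels.
l_inner, r_inner = left.align(right, join="inner", axis=0)

print(l_outer)
print(r_outer)
print(l_inner)
print(r_inner)
```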
-
all
(axis=None, bool_only=None, skipna=True, level=None, **kwargs)¶ Return whether all elements are True over requested axis.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
- bool_only : boolean, default None
- Only include boolean data.
any : Series (or DataFrame if level specified)
-
any
(axis=None, bool_only=None, skipna=True, level=None, **kwargs)¶ Return whether any element is True over requested axis.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
- bool_only : boolean, default None
- Only include boolean data.
any : Series (or DataFrame if level specified)
-
append
(other, ignore_index=False, verify_integrity=False)¶ Append columns of other to end of this frame’s columns and index, returning a new object. Columns not in this frame are added as new columns.
- other : DataFrame or list of Series/dict-like objects
- ignore_index : boolean, default False
- If True do not use the index labels. Useful for gluing together record arrays
- verify_integrity : boolean, default False
- If True, raise ValueError on creating index with duplicates
If a list of dict is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged
appended : DataFrame
-
apply
(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)¶ Applies function along input axis of DataFrame.
Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty.
- func : function
- Function to apply to each column/row
- axis : {0, 1}
- 0 : apply function to each column
- 1 : apply function to each row
- broadcast : boolean, default False
- For aggregation functions, return object of same size with values propagated
- reduce : boolean or None, default None
- Try to apply reduction procedures. If the DataFrame is empty, apply will use reduce to determine whether the result should be a Series or a DataFrame. If reduce is None (the default), apply’s return value will be guessed by calling func on an empty Series (note: while guessing, exceptions raised by func will be ignored). If reduce is True a Series will always be returned, and if False a DataFrame will always be returned.
- raw : boolean, default False
- If False, convert each row or column into a Series. If raw=True the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance
- args : tuple
- Positional arguments to pass to function in addition to the array/series
Additional keyword arguments will be passed as keywords to the function
>>> df.apply(numpy.sqrt) # returns DataFrame
>>> df.apply(numpy.sum, axis=0) # equiv to df.sum(0)
>>> df.apply(numpy.sum, axis=1) # equiv to df.sum(1)
DataFrame.applymap: For elementwise operations
applied : Series or DataFrame
-
applymap
(func)¶ Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame
- func : function
- Python function, returns a single value from a single value
applied : DataFrame
DataFrame.apply : For operations on rows/columns
-
as_blocks
(columns=None)¶ Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype.
Columns are presented in sorted order unless a specific list of columns is provided.
- NOTE: the dtypes of the blocks WILL BE PRESERVED HERE (unlike in as_matrix)
- columns : array-like
- Specific column order
values : a list of Object
-
as_matrix
(columns=None)¶ Convert the frame to its Numpy-array matrix representation. Columns are presented in sorted order unless a specific list of columns is provided.
- NOTE: the dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say, if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.
- e.g. if the dtypes are float16, float32 -> float32; float16, float32, float64 -> float64; int32, uint8 -> int32
- values : ndarray
- If the caller is heterogeneous and contains booleans or objects, the result will be of dtype=object
-
asfreq
(freq, method=None, how=None, normalize=False)¶ Convert all TimeSeries inside to specified frequency using DateOffset objects. Optionally provide fill method to pad/backfill missing values.
- freq : DateOffset object, or string
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}
- Method to use for filling holes in reindexed Series. pad / ffill: propagate last valid observation forward to next valid. backfill / bfill: use NEXT valid observation to fill
- how : {‘start’, ‘end’}, default end
- For PeriodIndex only, see PeriodIndex.asfreq
- normalize : bool, default False
- Whether to reset output index to midnight
converted : type of caller
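A small sketch of frequency conversion with and without a fill method, assuming pandas is installed; the dates and values are invented.

```python
import pandas as pd

# Two observations with a one-day gap between them.
idx = pd.to_datetime(["2014-01-01", "2014-01-03"])
df = pd.DataFrame({"value": [1.0, 3.0]}, index=idx)

# Converting to daily frequency inserts the missing day as NaN.
daily = df.asfreq("D")

# method='pad' carries the last valid observation forward into the gap.
padded = df.asfreq("D", method="pad")

print(daily)
print(padded)
```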
-
astype
(dtype, copy=True, raise_on_error=True)¶ Cast object to input numpy.dtype. Return a copy when copy = True (be really careful with this!)
- dtype : numpy.dtype or Python type
- raise_on_error : raise on invalid input
casted : type of caller
-
at
¶
-
at_time
(time, asof=False)¶ Select values at particular time of day (e.g. 9:30AM)
time : datetime.time or string
values_at_time : type of caller
-
axes
¶
-
between_time
(start_time, end_time, include_start=True, include_end=True)¶ Select values between particular times of the day (e.g., 9:00-9:30 AM)
- start_time : datetime.time or string
- end_time : datetime.time or string
- include_start : boolean, default True
- include_end : boolean, default True
values_between_time : type of caller
-
bfill
(axis=0, inplace=False, limit=None, downcast=None)¶ Synonym for NDFrame.fillna(method=’bfill’)
-
blocks
¶ Internal property, property synonym for as_blocks()
-
bool
()¶ Return the bool of a single element PandasObject. This must be a boolean scalar value, either True or False.
Raise a ValueError if the PandasObject does not have exactly 1 element, or that element is not boolean
-
boxplot
(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, **kwds)¶ Make a box plot from DataFrame column/columns optionally grouped (stratified) by one or more columns
- data : DataFrame
- column : column names or list of names, or vector
- Can be any valid input to groupby
- by : string or sequence
- Column in the DataFrame to group by
- ax : matplotlib axis object, default None
- fontsize : int or string
- rot : int, default None
- Rotation for ticks
- grid : boolean, default None (matlab style default)
- Axis grid lines
ax : matplotlib.axes.AxesSubplot
-
clip
(lower=None, upper=None, out=None)¶ Trim values at input threshold(s)
- lower : float, default None
- upper : float, default None
clipped : Series
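A brief sketch of clipping to a closed interval, assuming pandas is installed; the thresholds and data are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"A": [-5, 0, 5], "B": [1, 10, 100]})

# Values below 0 are raised to 0; values above 10 are lowered to 10.
clipped = df.clip(lower=0, upper=10)

print(clipped)
```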
-
clip_lower
(threshold)¶ Return copy of the input with values below given value truncated
clip
clipped : same type as input
-
clip_upper
(threshold)¶ Return copy of input with values above given value truncated
clip
clipped : same type as input
-
combine
(other, func, fill_value=None, overwrite=True)¶ Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)
- other : DataFrame
- func : function
- fill_value : scalar value
- overwrite : boolean, default True
- If True then overwrite values for common keys in the calling frame
result : DataFrame
-
combineAdd
(other)¶ Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)
other : DataFrame
DataFrame
-
combineMult
(other)¶ Multiply two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)
other : DataFrame
DataFrame
-
combine_first
(other)¶ Combine two DataFrame objects and default to non-null values in frame calling the method. Result index columns will be the union of the respective indexes and columns
other : DataFrame
a’s values prioritized, use values from b to fill holes:
>>> a.combine_first(b)
combined : DataFrame
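The hole-filling behaviour can be sketched with two small frames, assuming pandas and NumPy are installed; the names a and b follow the example above and the values are invented.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"x": [1.0, np.nan]}, index=[0, 1])
b = pd.DataFrame({"x": [10.0, 20.0, 30.0], "y": [7.0, 8.0, 9.0]}, index=[0, 1, 2])

# a's non-null values win; b fills a's holes and contributes any
# rows and columns that a lacks.
combined = a.combine_first(b)

print(combined)
```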
-
compound
(axis=None, skipna=None, level=None, **kwargs)¶ Return the compound percentage of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
compounded : Series or DataFrame (if level specified)
-
consolidate
(inplace=False)¶ Compute NDFrame with “consolidated” internals (data of each dtype grouped together in a single ndarray). Mainly an internal API function, but available here to the savvy user
- inplace : boolean, default False
- If False return new object, otherwise modify existing object
consolidated : type of caller
-
convert_objects
(convert_dates=True, convert_numeric=False, convert_timedeltas=True, copy=True)¶ Attempt to infer better dtype for object columns
- convert_dates : if True, attempt to soft convert dates, if ‘coerce’,
- force conversion (and non-convertibles get NaT)
- convert_numeric : if True attempt to coerce to numbers (including
- strings), non-convertibles get NaN
- convert_timedeltas : if True, attempt to soft convert timedeltas, if ‘coerce’,
- force conversion (and non-convertibles get NaT)
copy : Boolean, if True, return copy, default is True
converted : same as input object
-
copy
(deep=True)¶ Make a copy of this object
- deep : boolean, default True
- Make a deep copy, i.e. also copy data
copy : type of caller
-
corr
(method='pearson', min_periods=1)¶ Compute pairwise correlation of columns, excluding NA/null values
- method : {‘pearson’, ‘kendall’, ‘spearman’}
- pearson : standard correlation coefficient
- kendall : Kendall Tau correlation coefficient
- spearman : Spearman rank correlation
- min_periods : int, optional
- Minimum number of observations required per pair of columns to have a valid result. Currently only available for pearson and spearman correlation
y : DataFrame
-
corrwith
(other, axis=0, drop=False)¶ Compute pairwise correlation between rows or columns of two DataFrame objects.
- other : DataFrame
- axis : {0, 1}
- 0 to compute column-wise, 1 for row-wise
- drop : boolean, default False
- Drop missing indices from result, default returns union of all
correls : Series
-
count
(axis=0, level=None, numeric_only=False)¶ Return Series with number of non-NA/null observations over requested axis. Works with non-floating point data as well (detects NaN and None)
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
- numeric_only : boolean, default False
- Include only float, int, boolean data
count : Series (or DataFrame if level specified)
-
cov
(min_periods=None)¶ Compute pairwise covariance of columns, excluding NA/null values
- min_periods : int, optional
- Minimum number of observations required per pair of columns to have a valid result.
y : DataFrame
y contains the covariance matrix of the DataFrame’s time series. The covariance is normalized by N-1 (unbiased estimator).
-
cummax
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative max over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
max : Series
-
cummin
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative min over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
min : Series
-
cumprod
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative prod over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
prod : Series
-
cumsum
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative sum over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
sum : Series
-
delevel
(*args, **kwargs)¶
-
describe
(percentile_width=50)¶ Generate various summary statistics of each column, excluding NaN values. These include: count, mean, std, min, max, and lower%/50%/upper% percentiles
- percentile_width : float, optional
- width of the desired uncertainty interval, default is 50, which corresponds to lower=25, upper=75
DataFrame of summary statistics
-
diff
(periods=1)¶ 1st discrete difference of object
- periods : int, default 1
- Periods to shift for forming difference
diffed : DataFrame
-
div
(other, axis='columns', level=None, fill_value=None)¶ Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
divide
(other, axis='columns', level=None, fill_value=None)¶ Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
dot
(other)¶ Matrix multiplication with DataFrame or Series objects
other : DataFrame or Series
dot_product : DataFrame or Series
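A minimal sketch of the alignment rule behind dot: the column labels of the calling frame have to match the index labels of other. All labels below are invented for illustration.

```python
import pandas as pd

left = pd.DataFrame([[1, 2], [3, 4]], index=["r1", "r2"], columns=["x", "y"])
right = pd.DataFrame([[10, 0], [0, 100]], index=["x", "y"], columns=["p", "q"])

# Rows of the result come from `left`, columns from `right`.
product = left.dot(right)

print(product)
```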
-
drop
(labels, axis=0, level=None, inplace=False, **kwargs)¶ Return new object with labels in requested axis removed
labels : single label or list-like axis : int or axis name level : int or name, default None
For MultiIndex
- inplace : bool, default False
- If True, do operation inplace and return None.
dropped : type of caller
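For orientation, a short sketch of dropping along either axis. The labels are hypothetical; with the default inplace=False the original frame is left untouched.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["r1", "r2", "r3"])

without_row = df.drop("r2")          # axis=0: remove a row label
without_col = df.drop("b", axis=1)   # axis=1: remove a column label

print(without_row)
print(without_col)
```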
-
drop_duplicates
(cols=None, take_last=False, inplace=False)¶ Return DataFrame with duplicate rows removed, optionally only considering certain columns
- cols : column label or sequence of labels, optional
- Only consider certain columns for identifying duplicates, by default use all of the columns
- take_last : boolean, default False
- Among duplicate rows, keep the last observed one. Defaults to keeping the first
- inplace : boolean, default False
- Whether to drop duplicates in place or to return a copy
deduplicated : DataFrame
-
dropna
(axis=0, how='any', thresh=None, subset=None, inplace=False)¶ Return object with labels on given axis omitted where alternately any or all of the data are missing
- axis : {0, 1}, or tuple/list thereof
- Pass tuple or list to drop on multiple axes
- how : {‘any’, ‘all’}
- any : if any NA values are present, drop that label
- all : if all values are NA, drop that label
- thresh : int, default None
- int value : require that many non-NA values
- subset : array-like
- Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include
- inplace : boolean, default False
- If True, do operation inplace and return None.
dropped : DataFrame
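The how, thresh and subset options can be compared side by side in a small sketch (made-up data, default axis=0, so rows are dropped):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [2.0, 5.0, np.nan]})

any_missing = df.dropna()               # drop rows with at least one NA
all_missing = df.dropna(how="all")      # drop rows where every value is NA
two_present = df.dropna(thresh=2)       # keep rows with >= 2 non-NA values
only_b = df.dropna(subset=["b"])        # look at column "b" only

print(any_missing, all_missing, two_present, only_b, sep="\n\n")
```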
-
dtypes
¶ Return the dtypes in this object
-
duplicated
(cols=None, take_last=False)¶ Return boolean Series denoting duplicate rows, optionally only considering certain columns
- cols : column label or sequence of labels, optional
- Only consider certain columns for identifying duplicates, by default use all of the columns
- take_last : boolean, default False
- Among duplicate rows, keep the last observed one. Defaults to keeping the first
duplicated : Series
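A brief sketch of how duplicated and drop_duplicates relate. It deliberately calls both without arguments, since the keyword names for the column subset and for which duplicate to keep differ between pandas releases; the column names are invented.

```python
import pandas as pd

df = pd.DataFrame({"peptide": ["AAA", "AAA", "CCC"], "score": [1, 1, 2]})

# True marks a row that repeats an earlier row in all columns.
flags = df.duplicated()

# Equivalent to keeping the rows where the flag is False.
unique_rows = df.drop_duplicates()

print(flags)
print(unique_rows)
```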
-
empty
¶ True if NDFrame is entirely empty [no items]
-
eq
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods eq
-
equals
(other)¶ Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.
-
eval
(expr, **kwargs)¶ Evaluate an expression in the context of the calling DataFrame instance.
- expr : string
- The expression string to evaluate.
- kwargs : dict
- See the documentation for
eval()
for complete details on the keyword arguments accepted by query()
.
ret : ndarray, scalar, or pandas object
pandas.DataFrame.query pandas.eval
For more details see the API documentation for
eval()
. For detailed examples see enhancing performance with eval.
>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.eval('a + b')
>>> df.eval('c=a + b')
-
ffill
(axis=0, inplace=False, limit=None, downcast=None)¶ Synonym for NDFrame.fillna(method=’ffill’)
-
fillna
(value=None, method=None, axis=0, inplace=False, limit=None, downcast=None)¶ Fill NA/NaN values using the specified method
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- Method to use for filling holes in reindexed Series. pad / ffill: propagate last valid observation forward to next valid. backfill / bfill: use NEXT valid observation to fill gap
- value : scalar, dict, or Series
- Value to use to fill holes (e.g. 0), alternately a dict/Series of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series will not be filled). This value cannot be a list.
- axis : {0, 1}, default 0
- 0: fill column-by-column
- 1: fill row-by-row
- inplace : boolean, default False
- If True, fill in place. Note: this will modify any other views on this object, (e.g. a no-copy slice for a column in a DataFrame).
- limit : int, default None
- Maximum size gap to forward or backward fill
- downcast : dict, default is None
- a dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible)
reindex, asfreq
filled : same type as caller
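The two filling styles, a fixed value per column versus propagating the last valid observation, might look as follows. This sketch uses a dict for value and the ffill synonym documented above rather than the method keyword, whose handling varies across pandas releases; the data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

# Dict form: only column "a" is filled; "b" is not in the dict and
# therefore keeps its missing values.
constant_filled = df.fillna({"a": 0.0})

# Forward fill: a leading NaN has nothing before it and stays NaN.
forward_filled = df.ffill()

print(constant_filled)
print(forward_filled)
```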
-
filter
(items=None, like=None, regex=None, axis=None)¶ Restrict the info axis to set of items or wildcard
- items : list-like
- List of info axis labels to restrict to (not all need to be present)
- like : string
- Keep info axis where “arg in col == True”
- regex : string (regular expression)
- Keep info axis with re.search(regex, col) == True
Arguments are mutually exclusive, but this is not checked for
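The three selection modes can be sketched on a frame whose column names are purely hypothetical. For a DataFrame the info axis is the columns.

```python
import pandas as pd

df = pd.DataFrame({"netmhc_score": [0.1], "syfpeithi_score": [0.2], "length": [9]})

by_items = df.filter(items=["length", "absent"])  # unknown labels are ignored
by_like = df.filter(like="score")                 # substring match
by_regex = df.filter(regex="^net")                # re.search match

print(list(by_items.columns))
print(list(by_like.columns))
print(list(by_regex.columns))
```

Since the arguments are mutually exclusive and this is not checked, pass only one of them per call.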
-
filter_result
(expressions)¶ Filters a result data frame based on a specified expression consisting of a list of triples of the form (method_name, comparator, threshold). The expression is applied to each row. If any of the columns fulfills the criteria, the row is kept.
Parameters: expressions (list((str,comparator,float))) – A list of triples consisting of (method_name, comparator, threshold) Returns: Filtered result object Return type: EpitopePredictionResult
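The "keep the row if any column passes" rule can be mimicked in plain pandas. The sketch below is not FRED2's implementation: the (Seq, Method) index layout, the allele column names, the use of functions from the operator module as comparators, and the way several triples are combined are all assumptions made for illustration.

```python
import operator

import pandas as pd

# Assumed layout: one row per (sequence, method), one column per allele.
index = pd.MultiIndex.from_tuples(
    [("PEP1", "m1"), ("PEP1", "m2"), ("PEP2", "m1"), ("PEP2", "m2")],
    names=["Seq", "Method"],
)
scores = pd.DataFrame(
    {"allele_1": [0.9, 0.1, 0.2, 0.7], "allele_2": [0.3, 0.2, 0.1, 0.1]},
    index=index,
)

# Triples of (method_name, comparator, threshold), as described above.
expressions = [("m1", operator.ge, 0.5), ("m2", operator.ge, 0.6)]

methods = scores.index.get_level_values("Method")
keep = pd.Series(False, index=scores.index)
for method_name, comparator, threshold in expressions:
    # A row passes if any of its columns fulfills the comparison.
    passes = comparator(scores, threshold).any(axis=1)
    keep = keep | (passes & (methods == method_name))

filtered = scores[keep]
print(filtered)
```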
-
first
(offset)¶ Convenience method for subsetting initial periods of time series data based on a date offset
offset : string, DateOffset, dateutil.relativedelta
ts.first(‘10D’) -> First 10 days
subset : type of caller
-
first_valid_index
()¶ Return label for first non-NA/null value
-
floordiv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator floordiv with support to substitute a fill_value for missing data in one of the inputs
other : Series, DataFrame, or constant axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
from_csv
(path, header=0, sep=', ', index_col=0, parse_dates=True, encoding=None, tupleize_cols=False, infer_datetime_format=False)¶ Read delimited file into DataFrame
path : string file path or file handle / StringIO header : int, default 0
Row to use as header (skip prior rows)
- sep : string, default ‘,’
- Field delimiter
- index_col : int or sequence, default 0
- Column to use for index. If a sequence is given, a MultiIndex is used. Different default from read_table
- parse_dates : boolean, default True
- Parse dates. Different default from read_table
- tupleize_cols : boolean, default False
- write multi_index columns as a list of tuples (if True) or in the new, expanded format (if False)
- infer_datetime_format: boolean, default False
- If True and parse_dates is True for a column, try to infer the datetime format based on the first datetime string. If the format can be inferred, there often will be a large parsing speed-up.
Preferable to use read_table for most general purposes but from_csv makes for an easy roundtrip to and from file, especially with a DataFrame of time series data
y : DataFrame
-
from_dict
(data, orient='columns', dtype=None)¶ Construct DataFrame from dict of array-like or dicts
- data : dict
- {field : array-like} or {field : dict}
- orient : {‘columns’, ‘index’}, default ‘columns’
- The “orientation” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.
DataFrame
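The difference between the two orientations is easiest to see with a dict of dicts. The keys below are invented for illustration.

```python
import pandas as pd

data = {
    "pep1": {"score": 0.9, "length": 9},
    "pep2": {"score": 0.2, "length": 10},
}

by_columns = pd.DataFrame.from_dict(data)               # outer keys -> columns
by_index = pd.DataFrame.from_dict(data, orient="index")  # outer keys -> rows

print(by_columns)
print(by_index)
```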
-
from_items
(items, columns=None, orient='columns')¶ Convert (key, value) pairs to DataFrame. The keys will be the axis index (usually the columns, but depends on the specified orientation). The values should be arrays or Series.
- items : sequence of (key, value) pairs
- Values should be arrays or Series.
- columns : sequence of column labels, optional
- Must be passed if orient=’index’.
- orient : {‘columns’, ‘index’}, default ‘columns’
- The “orientation” of the data. If the keys of the input correspond to column labels, pass ‘columns’ (default). Otherwise if the keys correspond to the index, pass ‘index’.
frame : DataFrame
-
from_records
(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)¶ Convert structured or record ndarray to DataFrame
data : ndarray (structured dtype), list of tuples, dict, or DataFrame index : string, list of fields, array-like
Field of array to use as the index, alternately a specific set of input labels to use
- exclude : sequence, default None
- Columns or fields to exclude
- columns : sequence, default None
- Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns)
- coerce_float : boolean, default False
- Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets
df : DataFrame
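A minimal sketch for the list-of-tuples case, where columns supplies the names and index picks one of those fields as the row labels. The field names are hypothetical.

```python
import pandas as pd

# Records without names of their own, e.g. rows fetched from a database.
records = [(1, "AAA", 0.9), (2, "CCC", 0.2)]

df = pd.DataFrame.from_records(
    records,
    columns=["id", "peptide", "score"],  # names for the unnamed fields
    index="id",                          # this field becomes the index
)

print(df)
```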
-
ftypes
¶ Return the ftypes (indication of sparse/dense and dtype) in this object.
-
ge
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods ge
-
get
(key, default=None)¶ Get item from object for given key (DataFrame column, Panel slice, etc.). Returns default value if not found
key : object
value : type of items contained in object
-
get_dtype_counts
()¶ Return the counts of dtypes in this object
-
get_ftype_counts
()¶ Return the counts of ftypes in this object
-
get_value
(index, col)¶ Quickly retrieve single value at passed column and index
index : row label col : column label
value : scalar value
-
get_values
()¶ same as values (but handles sparseness conversions)
-
groupby
(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False)¶ Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns
- by : mapping function / list of functions, dict, Series, or tuple / list of column names
- Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups
axis : int, default 0 level : int, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels
- as_index : boolean, default True
- For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output
- sort : boolean, default True
- Sort group keys. Get better performance by turning this off
- group_keys : boolean, default True
- When calling apply, add group keys to index to identify pieces
- squeeze : boolean, default False
- reduce the dimensionality of the return type if possible, otherwise return a consistent type
# DataFrame result
>>> data.groupby(func, axis=0).mean()
# DataFrame result
>>> data.groupby(['col1', 'col2'])['col3'].mean()
# DataFrame with hierarchical index
>>> data.groupby(['col1', 'col2']).mean()
GroupBy object
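To make the grouping patterns concrete, here is a self-contained sketch with invented column names. Grouping by one column yields a flat index; grouping by two yields a hierarchical one.

```python
import pandas as pd

df = pd.DataFrame({
    "allele": ["A", "A", "B"],
    "method": ["m1", "m2", "m1"],
    "score": [1.0, 3.0, 5.0],
})

# One key: result is indexed by the distinct values of "allele".
mean_by_allele = df.groupby("allele")["score"].mean()

# Two keys: result carries an (allele, method) hierarchical index.
mean_by_both = df.groupby(["allele", "method"])["score"].mean()

print(mean_by_allele)
print(mean_by_both)
```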
-
gt
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods gt
-
head
(n=5)¶ Returns first n rows
-
hist
(data, column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, **kwds)¶ Draw histogram of the DataFrame’s series using matplotlib / pylab.
data : DataFrame column : string or sequence
If passed, will be used to limit data to a subset of columns
- by : object, optional
- If passed, then used to form histograms for separate groups
- grid : boolean, default True
- Whether to show axis grid lines
- xlabelsize : int, default None
- If specified changes the x-axis label size
- xrot : float, default None
- rotation of x axis labels
- ylabelsize : int, default None
- If specified changes the y-axis label size
- yrot : float, default None
- rotation of y axis labels
ax : matplotlib axes object, default None sharex : bool, if True, the X axis will be shared amongst all subplots. sharey : bool, if True, the Y axis will be shared amongst all subplots. figsize : tuple
The size of the figure to create in inches by default
layout: (optional) a tuple (rows, columns) for the layout of the histograms
kwds : other plotting keyword arguments
To be passed to hist function
-
iat
¶
-
icol
(i)¶
-
idxmax
(axis=0, skipna=True)¶ Return index of first occurrence of maximum over requested axis. NA/null values are excluded.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be first index.
idxmax : Series
This method is the DataFrame version of ndarray.argmax.
Series.idxmax
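The two axis settings answer different questions, as the following sketch with made-up labels shows: axis=0 returns one row label per column, axis=1 one column label per row.

```python
import pandas as pd

df = pd.DataFrame(
    {"allele_1": [0.1, 0.9, 0.3], "allele_2": [0.8, 0.2, 0.4]},
    index=["pep1", "pep2", "pep3"],
)

best_row_per_column = df.idxmax()         # axis=0
best_column_per_row = df.idxmax(axis=1)
worst_row_per_column = df.idxmin()

print(best_row_per_column)
print(best_column_per_row)
print(worst_row_per_column)
```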
-
idxmin
(axis=0, skipna=True)¶ Return index of first occurrence of minimum over requested axis. NA/null values are excluded.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
idxmin : Series
This method is the DataFrame version of ndarray.argmin.
Series.idxmin
-
iget_value
(i, j)¶
-
iloc
¶
-
info
(verbose=True, buf=None, max_cols=None)¶ Concise summary of a DataFrame.
- verbose : boolean, default True
- If False, don’t print column count summary
buf : writable buffer, defaults to sys.stdout max_cols : int, default None
Determines whether full summary or short summary is printed
-
insert
(loc, column, value, allow_duplicates=False)¶ Insert column into DataFrame at specified location.
If allow_duplicates is False, raises Exception if column is already contained in the DataFrame.
- loc : int
- Must have 0 <= loc <= len(columns)
column : object value : int, Series, or array-like
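A short sketch of positional insertion and of the duplicate check. The exception class raised for an existing column is not the same in every pandas release, so the example catches the generic Exception named in the description above.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "c": [5, 6]})

# insert() modifies the frame in place; loc=1 places "b" between "a" and "c".
df.insert(1, "b", [3, 4])

duplicate_rejected = False
try:
    df.insert(0, "a", [0, 0])  # "a" already exists, allow_duplicates=False
except Exception:
    duplicate_rejected = True

print(list(df.columns), duplicate_rejected)
```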
-
interpolate
(method='linear', axis=0, limit=None, inplace=False, downcast='infer', **kwargs)¶ Interpolate values according to different methods.
- method : {‘linear’, ‘time’, ‘values’, ‘index’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘krogh’, ‘polynomial’, ‘spline’, ‘piecewise_polynomial’, ‘pchip’}
- ‘linear’: ignore the index and treat the values as equally spaced. default
- ‘time’: interpolation works on daily and higher resolution data to interpolate given length of interval
- ‘index’: use the actual numerical values of the index
- ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘polynomial’ are passed to scipy.interpolate.interp1d with the order given. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method=’polynomial’, order=4)
- ‘krogh’, ‘piecewise_polynomial’, ‘spline’, and ‘pchip’ are all wrappers around the scipy interpolation methods of similar names. See the scipy documentation for more on their behavior: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
- axis : {0, 1}, default 0
- 0: fill column-by-column
- 1: fill row-by-row
- limit : int, default None.
- Maximum number of consecutive NaNs to fill.
- inplace : bool, default False
- Update the NDFrame in place if possible.
- downcast : optional, ‘infer’ or None, defaults to ‘infer’
- Downcast dtypes if possible.
Series or DataFrame of same shape interpolated at the NaNs
reindex, replace, fillna
# Filling in NaNs:
>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0    0
1    1
2    2
3    3
dtype: float64
-
irow
(i, copy=False)¶
-
is_copy
= None¶
-
isin
(values)¶ Return boolean DataFrame showing whether each element in the DataFrame is contained in values.
- values : iterable, Series, DataFrame or dictionary
- The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dictionary, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.
DataFrame of booleans
When values is a list:
>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> df.isin([1, 3, 12, 'a'])
       A      B
0   True   True
1  False  False
2   True  False
When values is a dict:
>>> df = DataFrame({'A': [1, 2, 3], 'B': [1, 4, 7]})
>>> df.isin({'A': [1, 3], 'B': [4, 7, 12]})
       A      B
0   True  False  # Note that B didn't match the 1 here.
1  False   True
2   True   True
When values is a Series or DataFrame:
>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> other = DataFrame({'A': [1, 3, 3, 2], 'B': ['e', 'f', 'f', 'e']})
>>> df.isin(other)
       A      B
0   True  False
1  False  False  # Column A in `other` has a 3, but not at index 1.
2   True   True
-
isnull
()¶ Return a boolean same-sized object indicating if the values are null
-
iteritems
()¶ Iterator over (column, series) pairs
-
iterkv
(*args, **kwargs)¶ iteritems alias used to get around 2to3. Deprecated
-
iterrows
()¶ Iterate over rows of DataFrame as (index, Series) pairs.
iterrows does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). For example,
>>> df = DataFrame([[1, 1.0]], columns=['x', 'y'])
>>> row = next(df.iterrows())[1]
>>> print(row['x'].dtype)
float64
>>> print(df['x'].dtype)
int64
- it : generator
- A generator that iterates over the rows of the frame.
-
itertuples
(index=True)¶ Iterate over rows of DataFrame as tuples, with index value as first element of the tuple
-
ix
¶
-
join
(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)¶ Join columns with other DataFrame either on index or on a key column. Efficiently Join multiple DataFrame objects by index at once by passing a list.
- other : DataFrame, Series with name field set, or list of DataFrame
- Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame
- on : column name, tuple/list of column names, or array-like
- Column(s) to use for joining, otherwise join on index. If multiple columns are given, the passed DataFrame must have a MultiIndex. Can pass an array as the join key if not already contained in the calling DataFrame. Like an Excel VLOOKUP operation
- how : {‘left’, ‘right’, ‘outer’, ‘inner’}
How to handle indexes of the two objects. Default: ‘left’ for joining on index, None otherwise
- left: use calling frame’s index
- right: use input frame’s index
- outer: form union of indexes
- inner: use intersection of indexes
- lsuffix : string
- Suffix to use from left frame’s overlapping columns
- rsuffix : string
- Suffix to use from right frame’s overlapping columns
- sort : boolean, default False
- Order result DataFrame lexicographically by the join key. If False, preserves the index order of the calling (left) DataFrame
on, lsuffix, and rsuffix options are not supported when passing a list of DataFrame objects
joined : DataFrame
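How the how and suffix options interact on an index join can be sketched as follows. The frames and labels are invented; both frames share the column name score, so suffixes are needed to tell the two apart.

```python
import pandas as pd

left = pd.DataFrame({"score": [0.9, 0.2]}, index=["pep1", "pep2"])
right = pd.DataFrame({"score": [0.5, 0.7]}, index=["pep2", "pep3"])

# how="left": keep the calling frame's index; unmatched rows get NaN.
left_joined = left.join(right, how="left", lsuffix="_left", rsuffix="_right")

# how="outer": union of both indexes.
outer_joined = left.join(right, how="outer", lsuffix="_left", rsuffix="_right")

print(left_joined)
print(outer_joined)
```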
-
keys
()¶ Get the ‘info axis’ (see Indexing for more)
This is index for Series, columns for DataFrame and major_axis for Panel.
-
kurt
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased kurtosis over requested axis. Normalized by N-1
axis : {index (0), columns (1)} skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
kurt : Series or DataFrame (if level specified)
-
kurtosis
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased kurtosis over requested axis. Normalized by N-1
axis : {index (0), columns (1)} skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
kurt : Series or DataFrame (if level specified)
-
last
(offset)¶ Convenience method for subsetting final periods of time series data based on a date offset
offset : string, DateOffset, dateutil.relativedelta
ts.last(‘5M’) -> Last 5 months
subset : type of caller
-
last_valid_index
()¶ Return label for last non-NA/null value
-
le
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods le
-
load
(path)¶ Deprecated. Use read_pickle instead.
-
loc
¶
-
lookup
(row_labels, col_labels)¶ Label-based “fancy indexing” function for DataFrame. Given equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair.
- row_labels : sequence
- The row labels to use for lookup
- col_labels : sequence
- The column labels to use for lookup
Akin to:
result = []
for row, col in zip(row_labels, col_labels):
    result.append(df.get_value(row, col))
- values : ndarray
- The found values
-
lt
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods lt
-
mad
(axis=None, skipna=None, level=None, **kwargs)¶ Return the mean absolute deviation of the values for the requested axis
axis : {index (0), columns (1)} skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
mad : Series or DataFrame (if level specified)
-
mask
(cond)¶ Returns copy whose values are replaced with nan if the inverted condition is True
cond : boolean NDFrame or array
wh: same as input
-
max
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ This method returns the maximum of the values in the object. If you want the index of the maximum, use idxmax. This is the equivalent of the numpy.ndarray method argmax.
axis : {index (0), columns (1)} skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
max : Series or DataFrame (if level specified)
-
mean
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the mean of the values for the requested axis
axis : {index (0), columns (1)} skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
mean : Series or DataFrame (if level specified)
-
median
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the median of the values for the requested axis
axis : {index (0), columns (1)} skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
median : Series or DataFrame (if level specified)
-
merge
(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True)¶ Merge DataFrame objects by performing a database-style join operation by columns or indexes.
If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on.
right : DataFrame how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’
- left: use only keys from left frame (SQL: left outer join)
- right: use only keys from right frame (SQL: right outer join)
- outer: use union of keys from both frames (SQL: full outer join)
- inner: use intersection of keys from both frames (SQL: inner join)
- on : label or list
- Field names to join on. Must be found in both DataFrames. If on is None and not merging on indexes, then it merges on the intersection of the columns by default.
- left_on : label or list, or array-like
- Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns
- right_on : label or list, or array-like
- Field names to join on in right DataFrame or vector/list of vectors per left_on docs
- left_index : boolean, default False
- Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels
- right_index : boolean, default False
- Use the index from the right DataFrame as the join key. Same caveats as left_index
- sort : boolean, default False
- Sort the join keys lexicographically in the result DataFrame
- suffixes : 2-length sequence (tuple, list, ...)
- Suffix to apply to overlapping column names in the left and right side, respectively
- copy : boolean, default True
- If False, do not copy data unnecessarily
>>> A              >>> B
    lkey value         rkey value
0   foo  1         0   foo  5
1   bar  2         1   bar  6
2   baz  3         2   qux  7
3   foo  4         3   bar  8
>>> merge(A, B, left_on='lkey', right_on='rkey', how='outer')
   lkey  value_x  rkey  value_y
0  bar   2        bar   6
1  bar   2        bar   8
2  baz   3        NaN   NaN
3  foo   1        foo   5
4  foo   4        foo   5
5  NaN   NaN      qux   7
merged : DataFrame
-
merge_results
(others)¶ Merges results of type EpitopePredictionResult and returns the merged result
Parameters: others (list(EpitopePredictionResult)/EpitopePredictionResult) – Another (list of) EpitopePredictionResult(s)
Returns: A new merged EpitopePredictionResult object
Return type: EpitopePredictionResult
-
min
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ This method returns the minimum of the values in the object. If you want the index of the minimum, use idxmin. This is the equivalent of the numpy.ndarray method argmin.
axis : {index (0), columns (1)} skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
min : Series or DataFrame (if level specified)
-
mod
(other, axis='columns', level=None, fill_value=None)¶ Binary operator mod with support to substitute a fill_value for missing data in one of the inputs
other : Series, DataFrame, or constant axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
mode
(axis=0, numeric_only=False)¶ Gets the mode of each element along the axis selected. Empty if nothing has 2+ occurrences. Adds a row for each mode per label, fills in gaps with nan.
- axis : {0, 1, ‘index’, ‘columns’} (default 0)
- 0/’index’ : get mode of each column
- 1/’columns’ : get mode of each row
- numeric_only : boolean, default False
- if True, only apply to numeric columns
modes : DataFrame (sorted)
-
mul
(other, axis='columns', level=None, fill_value=None)¶ Binary operator mul with support to substitute a fill_value for missing data in one of the inputs
other : Series, DataFrame, or constant axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
multiply
(other, axis='columns', level=None, fill_value=None)¶ Binary operator mul with support to substitute a fill_value for missing data in one of the inputs
other : Series, DataFrame, or constant axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
ndim
¶ Number of axes / array dimensions
-
ne
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods ne
-
notnull
()¶ Return a boolean same-sized object indicating if the values are not null
-
pct_change
(periods=1, fill_method='pad', limit=None, freq=None, **kwds)¶ Percent change over given number of periods
- periods : int, default 1
- Periods to shift for forming percent change
- fill_method : str, default ‘pad’
- How to handle NAs before computing percent changes
- limit : int, default None
- The number of consecutive NAs to fill before stopping
- freq : DateOffset, timedelta, or offset alias string, optional
- Increment to use from time series API (e.g. ‘M’ or BDay())
chg : same type as caller
-
pivot
(index=None, columns=None, values=None)¶ Reshape data (produce a “pivot” table) based on column values. Uses unique values from index / columns to form axes and return either DataFrame or Panel, depending on whether you request a single value column (DataFrame) or all columns (Panel)
- index : string or object
- Column name to use to make new frame’s index
- columns : string or object
- Column name to use to make new frame’s columns
- values : string or object, optional
- Column name to use for populating new frame’s values
For finer-tuned control, see hierarchical indexing documentation along with the related stack/unstack methods
>>> df
    foo   bar  baz
0   one   A    1.
1   one   B    2.
2   one   C    3.
3   two   A    4.
4   two   B    5.
5   two   C    6.
>>> df.pivot('foo', 'bar', 'baz')
     A   B   C
one  1   2   3
two  4   5   6
>>> df.pivot('foo', 'bar')['baz']
     A   B   C
one  1   2   3
two  4   5   6
- pivoted : DataFrame
- If no values column specified, will have hierarchically indexed columns
-
pivot_table
(data, values=None, rows=None, cols=None, aggfunc='mean', fill_value=None, margins=False, dropna=True)¶ Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame
data : DataFrame values : column to aggregate, optional rows : list of column names or arrays to group on
Keys to group on the x-axis of the pivot table
- cols : list of column names or arrays to group on
- Keys to group on the y-axis of the pivot table
- aggfunc : function, default numpy.mean, or list of functions
- If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)
- fill_value : scalar, default None
- Value to replace missing values with
- margins : boolean, default False
- Add all row / columns (e.g. for subtotal / grand totals)
- dropna : boolean, default True
- Do not include columns whose entries are all NaN
>>> df
     A    B      C  D
0  foo  one  small  1
1  foo  one  large  2
2  foo  one  large  2
3  foo  two  small  3
4  foo  two  small  3
5  bar  one  large  4
6  bar  one  small  5
7  bar  two  small  6
8  bar  two  large  7
>>> table = pivot_table(df, values='D', rows=['A', 'B'],
...                     cols=['C'], aggfunc=np.sum)
>>> table
          small  large
foo  one  1      4
     two  6      NaN
bar  one  5      4
     two  6      7
table : DataFrame
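The aggregation above can be sketched as follows. This assumes a more recent pandas release, where the grouping arguments are spelled `index` and `columns` instead of the `rows` and `cols` documented here; the data mirrors the doctest example.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# Sum D for every (A, B) row group and every C column group.
# Assumption: newer keyword names index/columns (rows/cols in this document).
table = pd.pivot_table(df, values="D", index=["A", "B"],
                       columns=["C"], aggfunc="sum")

# fill_value replaces the cells that had no observations.
filled = pd.pivot_table(df, values="D", index=["A", "B"],
                        columns=["C"], aggfunc="sum", fill_value=0)
print(table)
```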
-
plot
(frame=None, x=None, y=None, subplots=False, sharex=True, sharey=False, use_index=True, figsize=None, grid=None, legend=True, rot=None, ax=None, style=None, title=None, xlim=None, ylim=None, logx=False, logy=False, xticks=None, yticks=None, kind='line', sort_columns=False, fontsize=None, secondary_y=False, **kwds)¶ Make line, bar, or scatter plots of DataFrame series with the index on the x-axis using matplotlib / pylab.
- frame : DataFrame
- x : label or position, default None
- y : label or position, default None
- Allows plotting of one column versus another
- subplots : boolean, default False
- Make separate subplots for each time series
- sharex : boolean, default True
- In case subplots=True, share x axis
- sharey : boolean, default False
- In case subplots=True, share y axis
- use_index : boolean, default True
- Use index as ticks for x axis
- stacked : boolean, default False
- If True, create stacked bar plot. Only valid for DataFrame input
- sort_columns: boolean, default False
- Sort column names to determine plot ordering
- title : string
- Title to use for the plot
- grid : boolean, default None (matlab style default)
- Axis grid lines
- legend : boolean, default True
- Place legend on axis subplots
- ax : matplotlib axis object, default None
- style : list or dict
- matplotlib line style per column
- kind : {‘line’, ‘bar’, ‘barh’, ‘kde’, ‘density’, ‘scatter’}
- bar : vertical bar plot; barh : horizontal bar plot; kde/density : Kernel Density Estimation plot; scatter : scatter plot
- logx : boolean, default False
- For line plots, use log scaling on x axis
- logy : boolean, default False
- For line plots, use log scaling on y axis
- xticks : sequence
- Values to use for the xticks
- yticks : sequence
- Values to use for the yticks
- xlim : 2-tuple/list
- ylim : 2-tuple/list
- rot : int, default None
- Rotation for ticks
- secondary_y : boolean or sequence, default False
- Whether to plot on the secondary y-axis. If a list/tuple, which columns to plot on secondary y-axis
- mark_right: boolean, default True
- When using a secondary_y axis, should the legend label the axis of the various columns automatically
- colormap : str or matplotlib colormap object, default None
- Colormap to select colors from. If string, load colormap with that name from matplotlib.
- kwds : keywords
- Options to pass to matplotlib plotting method
ax_or_axes : matplotlib.AxesSubplot or list of them
-
pop
(item)¶ Return item and drop from frame. Raise KeyError if not found.
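A short sketch of the behaviour described for `pop`, using a made-up two-column frame: the column is returned and removed in place, and asking for it again raises `KeyError`.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# pop returns the column and removes it from the frame in place.
col = df.pop("b")

# A second pop of the same label fails because the column is gone.
try:
    df.pop("b")
    missing_raises = False
except KeyError:
    missing_raises = True
```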
-
pow
(other, axis='columns', level=None, fill_value=None)¶ Binary operator pow with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
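The `fill_value` rule can be illustrated with a small sketch. The frames and the value 10 are arbitrary choices for illustration: a missing entry on one side is replaced before the operation, while a position missing on both sides stays missing.

```python
import numpy as np
import pandas as pd

base = pd.DataFrame({"x": [2.0, np.nan, np.nan]})
exponent = pd.DataFrame({"x": [3.0, 2.0, np.nan]})

# Row 0: 2 ** 3.
# Row 1: base is missing, so 10 is substituted and 10 ** 2 is computed.
# Row 2: both inputs are missing, so the result stays missing.
result = base.pow(exponent, fill_value=10)
print(result)
```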
-
prod
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the product of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
prod : Series or DataFrame (if level specified)
-
product
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the product of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
prod : Series or DataFrame (if level specified)
-
quantile
(q=0.5, axis=0, numeric_only=True)¶ Return values at the given quantile over requested axis, a la scoreatpercentile in scipy.stats
- q : quantile, default 0.5 (50% quantile)
- 0 <= q <= 1
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
quantiles : Series
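As a rough illustration of the return value, the sketch below computes two quantiles column by column on invented data. With the default `axis=0` the result is a Series indexed by column name.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

# One value per column; q must lie between 0 and 1.
median = df.quantile(0.5)
upper = df.quantile(0.75)
print(median)
```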
-
query
(expr, **kwargs)¶ Query the columns of a frame with a boolean expression.
- expr : string
- The query string to evaluate. The result of the evaluation of this expression is first passed to loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to __getitem__().
- kwargs : dict
- See the documentation for eval() for complete details on the keyword arguments accepted by query().
q : DataFrame or Series
This method uses the top-level eval() function to evaluate the passed query.
The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins, and and or. This is syntactically valid Python, however the semantics are different.
You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.
The index and columns attributes of the DataFrame instance are placed in the namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for this variable, and you can also use the name of the index to identify it in a query.
For further details and examples see the query documentation in indexing.
pandas.eval DataFrame.eval
>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.query('a > b')
>>> df[df.a > df.b]  # same result as the previous expression
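The precedence rule for `&` described above can be checked with a deterministic sketch. The column values are invented; with the default parser, `a > 2 & b < 5` is read as `(a > 2) and (b < 5)`, so it should match the explicitly parenthesised boolean mask.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 3, 7], "b": [2, 4, 3, 9]})

# A query string and the equivalent boolean-mask selection.
by_query = df.query("a > b")
by_mask = df[df.a > df.b]

# With the default parser, & binds like `and`, so no parentheses are needed
# inside the query string; plain Python indexing does need them.
combined = df.query("a > 2 & b < 5")
explicit = df[(df.a > 2) & (df.b < 5)]
```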
-
radd
(other, axis='columns', level=None, fill_value=None)¶ Binary operator radd with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rank
(axis=0, numeric_only=None, method='average', na_option='keep', ascending=True)¶ Compute numerical data ranks (1 through n) along axis. Equal values are assigned a rank that is the average of the ranks of those values
- axis : {0, 1}, default 0
- Ranks over columns (0) or rows (1)
- numeric_only : boolean, default None
- Include only float, int, boolean data
- method : {‘average’, ‘min’, ‘max’, ‘first’}
- average: average rank of group
- min: lowest rank in group
- max: highest rank in group
- first: ranks assigned in order they appear in the array
- na_option : {‘keep’, ‘top’, ‘bottom’}
- keep: leave NA values where they are
- top: smallest rank if ascending
- bottom: smallest rank if descending
- ascending : boolean, default True
- False for ranks by high (1) to low (N)
ranks : DataFrame
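The tie-handling options listed above can be compared side by side. This sketch uses a made-up column with one tie (the two 20s).

```python
import pandas as pd

df = pd.DataFrame({"v": [10, 20, 20, 30]})

# The two tied values would occupy ranks 2 and 3.
average = df.rank()                  # ties share the mean rank, 2.5
lowest = df.rank(method="min")       # ties share the lowest rank, 2
ordered = df.rank(method="first")    # ties ranked by order of appearance
descending = df.rank(ascending=False)
```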
-
rdiv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
reindex
(index=None, columns=None, **kwargs)¶ Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False
- index, columns : array-like, optional (can be specified in order, or as keywords)
- New labels / index to conform to. Preferably an Index object to avoid duplicating data
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- Method to use for filling holes in reindexed DataFrame. pad / ffill: propagate last valid observation forward to next valid. backfill / bfill: use NEXT valid observation to fill gap
- copy : boolean, default True
- Return a new object, even if the passed indexes are the same
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
- fill_value : scalar, default np.NaN
- Value to use for missing values. Defaults to NaN, but can be any “compatible” value
- limit : int, default None
- Maximum size gap to forward or backward fill
- takeable : boolean, default False
- Treat the passed labels as positional values
>>> df.reindex(index=[date1, date2, date3], columns=['A', 'B', 'C'])
reindexed : DataFrame
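A small sketch of the three filling behaviours described above, on an invented frame whose index skips the labels 1 and 3.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0]}, index=[0, 2, 4])
new_index = [0, 1, 2, 3, 4]

plain = df.reindex(new_index)                   # new labels get NaN
padded = df.reindex(new_index, method="ffill")  # carry last valid value forward
zeros = df.reindex(new_index, fill_value=0)     # use an explicit fill value
```

Note that `method` requires a monotonically ordered index, which the example satisfies.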
-
reindex_axis
(labels, axis=0, method=None, level=None, copy=True, limit=None, fill_value=nan)¶ Conform input object to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False
- index : array-like, optional
- New labels / index to conform to. Preferably an Index object to avoid duplicating data
- axis : {0, 1, ‘index’, ‘columns’}
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- Method to use for filling holes in reindexed object. pad / ffill: propagate last valid observation forward to next valid. backfill / bfill: use NEXT valid observation to fill gap
- copy : boolean, default True
- Return a new object, even if the passed indexes are the same
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
- limit : int, default None
- Maximum size gap to forward or backward fill
>>> df.reindex_axis(['A', 'B', 'C'], axis=1)
reindex, reindex_like
reindexed : DataFrame
-
reindex_like
(other, method=None, copy=True, limit=None)¶ Return an object with indices matching those of other
- other : Object
- method : string or None
- copy : boolean, default True
- limit : int, default None
- Maximum size gap to forward or backward fill
Like calling s.reindex(index=other.index, columns=other.columns, method=...)
reindexed : same as input
-
rename
(index=None, columns=None, **kwargs)¶ Alter axes using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.
- index, columns : dict-like or function, optional
- Transformation to apply to that axis values
- copy : boolean, default True
- Also copy underlying data
- inplace : boolean, default False
- Whether to return a new DataFrame. If True then value of copy is ignored.
renamed : DataFrame (new object)
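Both kinds of mapper mentioned above (a dict and a function) can be combined in one call. The frame below is a hypothetical example; labels missing from the dict are left untouched.

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2]}, index=["r"])

# A dict renames only the labels it contains; a function is applied to every label.
renamed = df.rename(index=str.upper, columns={"a": "x"})
```

By default a new object is returned and the original frame keeps its labels.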
-
rename_axis
(mapper, axis=0, copy=True, inplace=False)¶ Alter index and / or columns using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.
- mapper : dict-like or function, optional
- axis : int or string, default 0
- copy : boolean, default True
- Also copy underlying data
- inplace : boolean, default False
renamed : type of caller
-
reorder_levels
(order, axis=0)¶ Rearrange index levels using input order. May not drop or duplicate levels
- order : list of int or list of str
- List representing new level order. Reference level by number (position) or by key (label).
- axis : int
- Where to reorder levels.
type of caller (new object)
-
replace
(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None)¶ Replace values given in ‘to_replace’ with ‘value’.
to_replace : str, regex, list, dict, Series, numeric, or None
str or regex:
- str: string exactly matching to_replace will be replaced with value
- regex: regexs matching to_replace will be replaced with value
list of str, regex, or numeric:
- First, if to_replace and value are both lists, they must be the same length.
- Second, if regex=True then all of the strings in both lists will be interpreted as regexs otherwise they will match directly. This doesn’t matter much for value since there are only a few possible substitution regexes you can use.
- str and regex rules apply as above.
dict:
- Nested dictionaries, e.g., {‘a’: {‘b’: nan}}, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with nan. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.
- Keys map to column names and values map to substitution values. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
None:
- This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also None then this must be a nested dictionary or Series.
See the examples section for examples of each of these.
- value : scalar, dict, list, str, regex, default None
- Value to use to fill holes (e.g. 0), alternately a dict of values specifying which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.
- inplace : boolean, default False
- If True, in place. Note: this will modify any other views on this object (e.g. a column from a DataFrame). Returns the caller if this is True.
- limit : int, default None
- Maximum size gap to forward or backward fill
- regex : bool or same types as to_replace, default False
- Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Otherwise, to_replace must be None because this parameter will be interpreted as a regular expression or a list, dict, or array of regular expressions.
- method : string, optional, {‘pad’, ‘ffill’, ‘bfill’}
- The method to use for replacement, when to_replace is a list.
NDFrame.reindex NDFrame.asfreq NDFrame.fillna
filled : NDFrame
- AssertionError
- If regex is not a bool and to_replace is not None.
- TypeError
- If to_replace is a dict and value is not a list, dict, ndarray, or Series
- If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.
- ValueError
- If to_replace and value are lists or ndarrays, but they are not the same length.
- Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.
- Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.
- This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
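A few of the `to_replace` forms described above, sketched on an invented frame: a pair of equal-length lists, a regular expression, and a nested dict that restricts the search to one column. The numeric column is left alone by the regex, as the notes explain.

```python
import pandas as pd

df = pd.DataFrame({"a": ["foo", "bar", "baz"], "n": [0, 1, 2]})

# Two lists of the same length: element i of the first maps to element i of the second.
by_list = df.replace(["foo", "bar"], ["F", "B"])

# A regex only substitutes on strings, so column 'n' is untouched.
by_regex = df.replace(to_replace=r"^ba.$", value="new", regex=True)

# Nested dict: look in column 'a' for 'foo' and replace it with 'qux'.
by_dict = df.replace({"a": {"foo": "qux"}})
```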
-
resample
(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0)¶ Convenience method for frequency conversion and resampling of regular time-series data.
- rule : string
- the offset string or object representing target conversion
- how : string
- method for down- or re-sampling, default to ‘mean’ for downsampling
- axis : int, optional, default 0
- fill_method : string, default None
- fill_method for upsampling
- closed : {‘right’, ‘left’}
- Which side of bin interval is closed
- label : {‘right’, ‘left’}
- Which bin edge label to label bucket with
- convention : {‘start’, ‘end’, ‘s’, ‘e’}
- kind : “period”/”timestamp”
- loffset : timedelta
- Adjust the resampled time labels
- limit : int, default None
- Maximum size gap when reindexing with fill_method
- base : int, default 0
- For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0
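A downsampling sketch showing the effect of `closed` and `label`. It assumes a newer pandas release, where the aggregation is chosen by chaining a method such as `.mean()` onto `resample(...)` instead of passing the `how` argument documented here. The data is ten invented one-minute observations.

```python
import pandas as pd

idx = pd.date_range("2000-01-01", periods=10, freq="min")
df = pd.DataFrame({"v": list(range(10))}, index=idx)

# Default for a 5-minute rule: bins are closed and labelled on the left,
# i.e. [00:00, 00:05) and [00:05, 00:10).
# Assumption: chained .mean() replaces how='mean' in newer releases.
left = df.resample("5min").mean()

# Right-closed, right-labelled bins: (23:55, 00:00], (00:00, 00:05], (00:05, 00:10].
right = df.resample("5min", closed="right", label="right").mean()
```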
-
reset_index
(level=None, drop=False, inplace=False, col_level=0, col_fill='')¶ For DataFrame with multi-level index, return new DataFrame with labeling information in the columns under the index names, defaulting to ‘level_0’, ‘level_1’, etc. if any are None. For a standard index, the index name will be used (if set), otherwise a default ‘index’ or ‘level_0’ (if ‘index’ is already taken) will be used.
- level : int, str, tuple, or list, default None
- Only remove the given levels from the index. Removes all levels by default
- drop : boolean, default False
- Do not try to insert index into dataframe columns. This resets the index to the default integer index.
- inplace : boolean, default False
- Modify the DataFrame in place (do not create a new object)
- col_level : int or str, default 0
- If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.
- col_fill : object, default ‘’
- If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.
resetted : DataFrame
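The difference between the default behaviour and `drop=True` can be sketched on a hypothetical frame with a named index.

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 2]}, index=pd.Index(["a", "b"], name="k"))

# The named index becomes an ordinary column called 'k'.
flat = df.reset_index()

# With drop=True the old index is discarded instead of being inserted.
dropped = df.reset_index(drop=True)
```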
-
rfloordiv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rfloordiv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rmod
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rmod with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rmul
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rmul with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rpow
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rpow with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rsub
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rsub with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rtruediv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
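The reversed operators (`radd`, `rsub`, `rmul`, `rtruediv` and so on) swap the operand order, so the frame ends up on the right-hand side of the operation. A minimal sketch with invented numbers:

```python
import pandas as pd

df = pd.DataFrame({"x": [2.0, 4.0]})

forward = df.sub(10)          # df - 10
reverse = df.rsub(10)         # 10 - df
reciprocal = df.rtruediv(1)   # 1 / df
```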
-
save
(path)¶ Deprecated. Use to_pickle instead
-
select
(crit, axis=0)¶ Return data corresponding to axis labels matching criteria
- crit : function
- To be called on each index (label). Should return True or False
axis : int
selection : type of caller
-
set_index
(keys, drop=True, append=False, inplace=False, verify_integrity=False)¶ Set the DataFrame index (row labels) using one or more existing columns. By default yields a new object.
- keys : column label or list of column labels / arrays
- drop : boolean, default True
- Delete columns to be used as the new index
- append : boolean, default False
- Whether to append columns to existing index
- inplace : boolean, default False
- Modify the DataFrame in place (do not create a new object)
- verify_integrity : boolean, default False
- Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method
>>> indexed_df = df.set_index(['A', 'B'])
>>> indexed_df2 = df.set_index(['A', [0, 1, 2, 0, 1, 2]])
>>> indexed_df3 = df.set_index([[0, 1, 2, 0, 1, 2]])
dataframe : DataFrame
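A sketch of `set_index` with two key columns, plus the duplicate check performed by `verify_integrity`. The frame is invented; column `A` deliberately contains a repeated value so that the integrity check fails.

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 2, 1], "C": [10, 20, 30]})

# Columns A and B move into a two-level row index (drop=True is the default).
indexed = df.set_index(["A", "B"])
value = indexed.loc[("x", 2), "C"]

# Column A alone has duplicates, so the integrity check raises ValueError.
try:
    df.set_index("A", verify_integrity=True)
    duplicates_rejected = False
except ValueError:
    duplicates_rejected = True
```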
-
set_value
(index, col, value)¶ Put single value at passed column and index
- index : row label
- col : column label
- value : scalar value
- frame : DataFrame
- If label pair is contained, will be reference to calling DataFrame, otherwise a new object
-
shape
¶
-
shift
(periods=1, freq=None, axis=0, **kwds)¶ Shift index by desired number of periods with an optional time freq
- periods : int
- Number of periods to move, can be positive or negative
- freq : DateOffset, timedelta, or time rule string, optional
- Increment to use from datetools module or time rule (e.g. ‘EOM’)
If freq is specified then the index values are shifted but the data is not realigned
shifted : same type as caller
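The two modes of `shift` can be contrasted in a short sketch on an invented daily series: without `freq` the data moves relative to a fixed index, whereas with `freq` the index labels move and the data stays attached to them.

```python
import pandas as pd

idx = pd.date_range("2000-01-01", periods=3, freq="D")
df = pd.DataFrame({"v": [1.0, 2.0, 3.0]}, index=idx)

# Data slides down one row; the index is unchanged and the first row becomes NaN.
by_position = df.shift(1)

# Index labels move forward one day; the values are not realigned.
by_freq = df.shift(1, freq="D")
```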
-
skew
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased skew over requested axis. Normalized by N-1
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
skew : Series or DataFrame (if level specified)
-
sort
(columns=None, column=None, axis=0, ascending=True, inplace=False)¶ Sort DataFrame either by labels (along either axis) or by the values in column(s)
- columns : object
- Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
- ascending : boolean or list, default True
- Sort ascending vs. descending. Specify list for multiple sort orders
- axis : {0, 1}
- Sort index/rows versus columns
- inplace : boolean, default False
- Sort the DataFrame without creating a new instance
>>> result = df.sort(['A', 'B'], ascending=[1, 0])
sorted : DataFrame
-
sort_index
(axis=0, by=None, ascending=True, inplace=False, kind='quicksort')¶ Sort DataFrame either by labels (along either axis) or by the values in a column
- axis : {0, 1}
- Sort index/rows versus columns
- by : object
- Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
- ascending : boolean or list, default True
- Sort ascending vs. descending. Specify list for multiple sort orders
- inplace : boolean, default False
- Sort the DataFrame without creating a new instance
>>> result = df.sort_index(by=['A', 'B'], ascending=[True, False])
sorted : DataFrame
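In later pandas releases, sorting by column values is exposed as `sort_values`, and the `sort` and `sort_index(by=...)` spellings documented here are no longer available. Assuming such a release, the nested sort from the example above could be sketched like this (the data is invented):

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 1, 1], "B": [1, 1, 2]})

# Sort by A ascending, then by B descending within equal values of A.
# Assumption: sort_values is the newer equivalent of sort / sort_index(by=...).
result = df.sort_values(["A", "B"], ascending=[True, False])
```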
-
sortlevel
(level=0, axis=0, ascending=True, inplace=False)¶ Sort multilevel index by chosen axis and primary level. Data will be lexicographically sorted by the chosen level followed by the other levels (in order)
- level : int
- axis : {0, 1}
- ascending : boolean, default True
- inplace : boolean, default False
- Sort the DataFrame without creating a new instance
sorted : DataFrame
-
squeeze
()¶ squeeze length 1 dimensions
-
stack
(level=-1, dropna=True)¶ Pivot a level of the (possibly hierarchical) column labels, returning a DataFrame (or Series in the case of an object with a single level of column labels) having a hierarchical index with a new inner-most level of row labels.
- level : int, string, or list of these, default last level
- Level(s) to stack, can pass level name
- dropna : boolean, default True
- Whether to drop rows in the resulting Frame/Series with no valid values
>>> s
     a   b
one  1.  2.
two  3.  4.
>>> s.stack()
one a    1
    b    2
two a    3
    b    4
stacked : DataFrame or Series
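The example above can be reproduced as a runnable sketch. With a single level of column labels, the result is a Series whose index gains the column labels as a new innermost level; `unstack` reverses the operation.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 3.0], "b": [2.0, 4.0]}, index=["one", "two"])

# Column labels become the innermost level of the row index.
stacked = df.stack()

# unstack moves that level back into the columns.
restored = stacked.unstack()
```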
-
std
(axis=None, skipna=None, level=None, ddof=1, **kwargs)¶ Return unbiased standard deviation over requested axis. Normalized by N-1
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
stdev : Series or DataFrame (if level specified)
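The `ddof` argument in the signature controls the divisor, N - ddof. A quick sketch on four invented values, comparing the default sample estimate (N - 1) with the population form (`ddof=0`):

```python
import math
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

# Squared deviations from the mean 2.5 sum to 5.0.
sample = df.std()["x"]            # sqrt(5 / 3), divisor N - 1
population = df.std(ddof=0)["x"]  # sqrt(5 / 4), divisor N
```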
-
sub
(other, axis='columns', level=None, fill_value=None)¶ Binary operator sub with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
subtract
(other, axis='columns', level=None, fill_value=None)¶ Binary operator sub with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
sum
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the sum of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
sum : Series or DataFrame (if level specified)
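How `axis` and `skipna` interact can be seen in a brief sketch with one missing value. The frame is invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [1.0, 2.0, 3.0]})

by_column = df.sum()            # one total per column, NaN skipped
strict = df.sum(skipna=False)   # any NaN makes that column's total NaN
by_row = df.sum(axis=1)         # one total per row
```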
-
swapaxes
(axis1, axis2, copy=True)¶ Interchange axes and swap values axes appropriately
y : same as input
-
swaplevel
(i, j, axis=0)¶ Swap levels i and j in a MultiIndex on a particular axis
- i, j : int, string (can be mixed)
- Level of index to be swapped. Can pass level name as string.
swapped : type of caller (new object)
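A minimal sketch of swapping two named index levels. The level names `outer` and `inner` are made up for the example; levels could equally be referenced by position.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)], names=["outer", "inner"])
df = pd.DataFrame({"v": [10, 20]}, index=idx)

# Levels may be given by name or by position; a new object is returned.
swapped = df.swaplevel("outer", "inner")
```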
-
tail
(n=5)¶ Returns last n rows
-
take
(indices, axis=0, convert=True, is_copy=True)¶ Analogous to ndarray.take
- indices : list / array of ints
- axis : int, default 0
- convert : translate neg to pos indices (default)
- is_copy : mark the returned frame as a copy
taken : type of caller
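`take` selects by integer position, not by label, and negative positions count from the end. The sketch below uses only the `indices` and `axis` arguments on an invented frame.

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30], "b": [1, 2, 3]}, index=["x", "y", "z"])

rows = df.take([0, 2])        # first and third rows, by position
last = df.take([-1])          # negative positions count from the end
cols = df.take([1], axis=1)   # second column
```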
-
to_clipboard
(excel=None, sep=None, **kwargs)¶ Attempt to write text representation of object to the system clipboard. This can be pasted into Excel, for example.
- excel : boolean, defaults to True
- if True, use the provided separator, writing in a csv format for allowing easy pasting into excel. if False, write a string representation of the object to the clipboard
- sep : optional, defaults to tab
- other keywords are passed to to_csv
- Requirements for your platform
- Linux: xclip, or xsel (with gtk or PyQt4 modules)
- Windows: none
- OS X: none
-
to_csv
(path_or_buf, sep=',', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, mode='w', nanRep=None, encoding=None, quoting=None, line_terminator='\n', chunksize=None, tupleize_cols=False, date_format=None, **kwds)¶ Write DataFrame to a comma-separated values (csv) file
- path_or_buf : string or file handle / StringIO
- File path
- sep : character, default ”,”
- Field delimiter for the output file.
- na_rep : string, default ‘’
- Missing data representation
- float_format : string, default None
- Format string for floating point numbers
- cols : sequence, optional
- Columns to write
- header : boolean or list of string, default True
- Write out column names. If a list of string is given it is assumed to be aliases for the column names
- index : boolean, default True
- Write row names (index)
- index_label : string or sequence, or False, default None
- Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R
- nanRep : None
- deprecated, use na_rep
- mode : str
- Python write mode, default ‘w’
- encoding : string, optional
- a string representing the encoding to use if the contents are non-ascii, for python versions prior to 3
- line_terminator : string, default ‘\n’
- The newline character or character sequence to use in the output file
- quoting : optional constant from csv module
- defaults to csv.QUOTE_MINIMAL
- chunksize : int or None
- rows to write at a time
- tupleize_cols : boolean, default False
- Write MultiIndex columns as a list of tuples (if True) or in the new, expanded format (if False)
- date_format : string, default None
- Format string for datetime objects.
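A minimal sketch of the parameters above, assuming a small hypothetical scores table (the column names are illustrative and not part of Fred2). It writes to an in-memory buffer instead of a file path and uses only `sep`, `na_rep` and `index`:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration.
df = pd.DataFrame({"peptide": ["SYFPEITHI", "AAAAAAAAA"],
                   "score": [1.5, np.nan]})

buf = io.StringIO()
# Semicolon-delimited output, missing values written as "NA", no index column.
df.to_csv(buf, sep=";", na_rep="NA", index=False)
csv_text = buf.getvalue()
print(csv_text)
```

Passing a buffer keeps the example free of side effects; a file path works the same way.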
-
to_dense
()¶ Return dense representation of NDFrame (as opposed to sparse)
-
to_dict
(outtype='dict')¶ Convert DataFrame to dictionary.
- outtype : str {‘dict’, ‘list’, ‘series’, ‘records’}
- Determines the type of the values of the dictionary. The default dict is a nested dictionary {column -> {index -> value}}. list returns {column -> list(values)}. series returns {column -> Series(values)}. records returns [{columns -> value}]. Abbreviations are allowed.
result : dict like {column -> {index -> value}}
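As a rough illustration of two of the output shapes described above, the sketch below builds a small hypothetical frame and converts it. The output type is passed positionally because the keyword name may differ between pandas releases (it is `outtype` in the version documented here; the name used elsewhere is an assumption to check against your installation):

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

nested = df.to_dict()          # default: {column -> {index -> value}}
as_lists = df.to_dict("list")  # {column -> list(values)}
print(nested)
print(as_lists)
```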
-
to_excel
(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True)¶ Write DataFrame to an Excel sheet
- excel_writer : string or ExcelWriter object
- File path or existing ExcelWriter
- sheet_name : string, default ‘Sheet1’
- Name of sheet which will contain DataFrame
- na_rep : string, default ‘’
- Missing data representation
- float_format : string, default None
- Format string for floating point numbers
- cols : sequence, optional
- Columns to write
- header : boolean or list of string, default True
- Write out column names. If a list of string is given it is assumed to be aliases for the column names
- index : boolean, default True
- Write row names (index)
- index_label : string or sequence, default None
- Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
- startrow :
- upper left cell row to dump data frame
- startcol :
- upper left cell column to dump data frame
- engine : string, default None
- write engine to use - you can also set this via the options io.excel.xlsx.writer, io.excel.xls.writer, and io.excel.xlsm.writer.
- merge_cells : boolean, default True
- Write MultiIndex and Hierarchical Rows as merged cells.
If passing an existing ExcelWriter object, then the sheet will be added to the existing workbook. This can be used to save different DataFrames to one workbook:
>>> writer = ExcelWriter('output.xlsx')
>>> df1.to_excel(writer,'Sheet1')
>>> df2.to_excel(writer,'Sheet2')
>>> writer.save()
-
to_gbq
(destination_table, schema=None, col_order=None, if_exists='fail', **kwargs)¶ Write a DataFrame to a Google BigQuery table.
If the table exists, the DataFrame will be appended. If not, a new table will be created, in which case the schema will have to be specified. By default, rows will be written in the order they appear in the DataFrame, though the user may specify an alternative order.
- destination_table : string
- name of table to be written, in the form ‘dataset.tablename’
- schema : sequence (optional)
- list of column types in order for data to be inserted, e.g. [‘INTEGER’, ‘TIMESTAMP’, ‘BOOLEAN’]
- col_order : sequence (optional)
- order which columns are to be inserted, e.g. [‘primary_key’, ‘birthday’, ‘username’]
- if_exists : {‘fail’, ‘replace’, ‘append’} (optional)
- fail: If table exists, do nothing.
- replace: If table exists, drop it, recreate it, and insert data.
- append: If table exists, insert data. Create if does not exist.
kwargs are passed to the Client constructor
- SchemaMissing :
- Raised if the ‘if_exists’ parameter is set to ‘replace’, but no schema is specified
- TableExists :
- Raised if the specified ‘destination_table’ exists but the ‘if_exists’ parameter is set to ‘fail’ (the default)
- InvalidSchema :
- Raised if the ‘schema’ parameter does not match the provided DataFrame
-
to_hdf
(path_or_buf, key, **kwargs)¶ activate the HDFStore
- path_or_buf : the path (string) or buffer to put the store
- key : string
- identifier for the group in the store
- mode : optional, {‘a’, ‘w’, ‘r’, ‘r+’}, default ‘a’
'r'
- Read-only; no data can be modified.
'w'
- Write; a new file is created (an existing file with the same name would be deleted).
'a'
- Append; an existing file is opened for reading and writing, and if the file does not exist it is created.
'r+'
- It is similar to 'a', but the file must already exist.
- format : ‘fixed(f)|table(t)’, default is ‘fixed’
- fixed(f) : Fixed format
- Fast writing/reading. Not-appendable, nor searchable
- table(t) : Table format
- Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data
- append : boolean, default False
- For Table formats, append the input data to the existing
- complevel : int, 1-9, default 0
- If a complib is specified compression will be applied where possible
- complib : {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’, None}, default None
- If complevel is > 0 apply compression to objects written in the store wherever possible
- fletcher32 : bool, default False
- If applying compression use the fletcher32 checksum
-
to_html
(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, force_unicode=None, bold_rows=True, classes=None, escape=True, max_rows=None, max_cols=None, show_dimensions=False)¶ Render a DataFrame as an HTML table.
to_html-specific options:
- bold_rows : boolean, default True
- Make the row labels bold in the output
- classes : str or list or tuple, default None
- CSS class(es) to apply to the resulting html table
- escape : boolean, default True
- Convert the characters <, >, and & to HTML-safe sequences.
- max_rows : int, optional
- Maximum number of rows to show before truncating. If None, show all.
- max_cols : int, optional
- Maximum number of columns to show before truncating. If None, show all.
- frame : DataFrame
- object to render
- buf : StringIO-like, optional
- buffer to write to
- columns : sequence, optional
- the subset of columns to write; default None writes all columns
- col_space : int, optional
- the minimum width of each column
- header : bool, optional
- whether to print column labels, default True
- index : bool, optional
- whether to print index (row) labels, default True
- na_rep : string, optional
- string representation of NAN to use, default ‘NaN’
- formatters : list or dict of one-parameter functions, optional
- formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. The list must be of length equal to the number of columns.
- float_format : one-parameter function, optional
- formatter function to apply to columns’ elements if they are floats default None
- sparsify : bool, optional
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
- justify : {‘left’, ‘right’}, default None
- Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
- index_names : bool, optional
- Prints the names of the indexes, default True
- force_unicode : bool, default False
- Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.
formatted : string (or unicode, depending on data and options)
-
to_json
(path_or_buf=None, orient=None, date_format='epoch', double_precision=10, force_ascii=True, date_unit='ms', default_handler=None)¶ Convert the object to a JSON string.
Note NaN’s and None will be converted to null and datetime objects will be converted to UNIX timestamps.
- path_or_buf : the path or buffer to write the result string
- if this is None, return a StringIO of the converted string
orient : string
- Series
- default is ‘index’
- allowed values are: {‘split’,’records’,’index’}
- DataFrame
- default is ‘columns’
- allowed values are: {‘split’,’records’,’index’,’columns’,’values’}
- The format of the JSON string
- split : dict like {index -> [index], columns -> [columns], data -> [values]}
- records : list like [{column -> value}, ... , {column -> value}]
- index : dict like {index -> {column -> value}}
- columns : dict like {column -> {index -> value}}
- values : just the values array
- date_format : {‘epoch’, ‘iso’}
- Type of date conversion. epoch = epoch milliseconds, iso = ISO8601, default is epoch.
- double_precision : The number of decimal places to use when encoding floating point values, default 10.
- force_ascii : force encoded string to be ASCII, default True.
- date_unit : string, default ‘ms’ (milliseconds)
- The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.
- default_handler : callable, default None
- Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serialisable object.
same type as input object with filtered info axis
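The orient layouts listed above can be compared side by side. This is a sketch on a hypothetical two-column frame; the JSON text is parsed back with the standard `json` module only to make the structure visible:

```python
import json

import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

# records: list like [{column -> value}, ...]
records = json.loads(df.to_json(orient="records"))
# split: dict like {index -> [index], columns -> [columns], data -> [values]}
split = json.loads(df.to_json(orient="split"))
print(records)
print(split)
```

Note that the records layout drops the row labels, whereas split keeps index, columns and data as three separate lists.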
-
to_latex
(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=True, force_unicode=None)¶ Render a DataFrame to a tabular environment table. You can splice this into a LaTeX document.
to_latex-specific options:
- bold_rows : boolean, default True
- Make the row labels bold in the output
- frame : DataFrame
- object to render
- buf : StringIO-like, optional
- buffer to write to
- columns : sequence, optional
- the subset of columns to write; default None writes all columns
- col_space : int, optional
- the minimum width of each column
- header : bool, optional
- whether to print column labels, default True
- index : bool, optional
- whether to print index (row) labels, default True
- na_rep : string, optional
- string representation of NAN to use, default ‘NaN’
- formatters : list or dict of one-parameter functions, optional
- formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. The list must be of length equal to the number of columns.
- float_format : one-parameter function, optional
- formatter function to apply to columns’ elements if they are floats default None
- sparsify : bool, optional
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
- justify : {‘left’, ‘right’}, default None
- Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
- index_names : bool, optional
- Prints the names of the indexes, default True
- force_unicode : bool, default False
- Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.
formatted : string (or unicode, depending on data and options)
-
to_msgpack
(path_or_buf=None, **kwargs)¶ msgpack (serialize) object to input file path
THIS IS AN EXPERIMENTAL LIBRARY and the storage format may not be stable until a future release.
- path : string file path, buffer-like, or None
- if None, return generated string
- append : boolean, whether to append to an existing msgpack (default is False)
- compress : type of compressor (zlib or blosc), default None (no compression)
-
to_panel
()¶ Transform long (stacked) format (DataFrame) into wide (3D, Panel) format.
Currently the index of the DataFrame must be a 2-level MultiIndex. This may be generalized later
panel : Panel
-
to_period
(freq=None, axis=0, copy=True)¶ Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed)
- freq : string, default
- axis : {0, 1}, default 0
- The axis to convert (the index by default)
- copy : boolean, default True
- If False then underlying input data is not copied
ts : TimeSeries with PeriodIndex
-
to_pickle
(path)¶ Pickle (serialize) object to input file path
- path : string
- File path
-
to_records
(index=True, convert_datetime64=True)¶ Convert DataFrame to record array. Index will be put in the ‘index’ field of the record array if requested
- index : boolean, default True
- Include index in resulting record array, stored in ‘index’ field
- convert_datetime64 : boolean, default True
- Whether to convert the index to datetime.datetime if it is a DatetimeIndex
y : recarray
-
to_sparse
(fill_value=None, kind='block')¶ Convert to SparseDataFrame
- fill_value : float, default NaN
- kind : {‘block’, ‘integer’}
y : SparseDataFrame
-
to_sql
(name, con, flavor='sqlite', if_exists='fail', **kwargs)¶ Write records stored in a DataFrame to a SQL database.
- name : str
- Name of SQL table
- con : an open SQL database connection object
- flavor : {‘sqlite’, ‘mysql’, ‘oracle’}, default ‘sqlite’
- if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’
- fail: If table exists, do nothing.
- replace: If table exists, drop it, recreate it, and insert data.
- append: If table exists, insert data. Create if does not exist.
-
to_stata
(fname, convert_dates=None, write_index=True, encoding='latin-1', byteorder=None)¶ A class for writing Stata binary dta files from array-like objects
- fname : file path or buffer
- Where to save the dta file.
- convert_dates : dict
- Dictionary mapping column of datetime types to the stata internal format that you want to use for the dates. Options are ‘tc’, ‘td’, ‘tm’, ‘tw’, ‘th’, ‘tq’, ‘ty’. Column can be either a number or a name.
- encoding : str
- Default is latin-1. Note that Stata does not support unicode.
- byteorder : str
- Can be “>”, “<”, “little”, or “big”. The default is None which uses sys.byteorder
>>> writer = StataWriter('./data_file.dta', data)
>>> writer.write_file()
Or with dates
>>> writer = StataWriter('./date_data_file.dta', data, {2 : 'tw'})
>>> writer.write_file()
-
to_string
(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, nanRep=None, index_names=True, justify=None, force_unicode=None, line_width=None, max_rows=None, max_cols=None, show_dimensions=False)¶ Render a DataFrame to a console-friendly tabular output.
- frame : DataFrame
- object to render
- buf : StringIO-like, optional
- buffer to write to
- columns : sequence, optional
- the subset of columns to write; default None writes all columns
- col_space : int, optional
- the minimum width of each column
- header : bool, optional
- whether to print column labels, default True
- index : bool, optional
- whether to print index (row) labels, default True
- na_rep : string, optional
- string representation of NAN to use, default ‘NaN’
- formatters : list or dict of one-parameter functions, optional
- formatter functions to apply to columns’ elements by position or name, default None. If the result is a string, it must be a unicode string. The list must be of length equal to the number of columns.
- float_format : one-parameter function, optional
- formatter function to apply to columns’ elements if they are floats default None
- sparsify : bool, optional
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
- justify : {‘left’, ‘right’}, default None
- Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
- index_names : bool, optional
- Prints the names of the indexes, default True
- force_unicode : bool, default False
- Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.
formatted : string (or unicode, depending on data and options)
-
to_timestamp
(freq=None, how='start', axis=0, copy=True)¶ Cast to DatetimeIndex of timestamps, at beginning of period
- freq : string, default frequency of PeriodIndex
- Desired frequency
- how : {‘s’, ‘e’, ‘start’, ‘end’}
- Convention for converting period to timestamp; start of period vs. end
- axis : {0, 1} default 0
- The axis to convert (the index by default)
- copy : boolean, default True
- If false then underlying input data is not copied
df : DataFrame with DatetimeIndex
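A short round trip may make the relationship between to_period and to_timestamp clearer. The sketch assumes a hypothetical frame indexed by mid-month dates, converts it to monthly periods, and then casts back to timestamps at the start of each period:

```python
import pandas as pd

# Hypothetical time-indexed frame; the dates and values are illustrative.
df = pd.DataFrame(
    {"score": [0.1, 0.2, 0.3]},
    index=pd.to_datetime(["2014-01-15", "2014-02-15", "2014-03-15"]),
)

monthly = df.to_period("M")                 # DatetimeIndex -> PeriodIndex
starts = monthly.to_timestamp(how="start")  # PeriodIndex -> DatetimeIndex
print(monthly.index)
print(starts.index)
```

The round trip is lossy: the day of the month is discarded by the conversion to periods, so the timestamps come back as the first day of each month.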
-
to_wide
(*args, **kwargs)¶
-
transpose
()¶ Transpose index and columns
-
truediv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
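To illustrate the fill_value rule described above, here is a small sketch with two hypothetical frames. A value missing on one side only is replaced by the fill value before dividing; a location missing on both sides stays missing:

```python
import numpy as np
import pandas as pd

# Hypothetical numerator and denominator frames.
num = pd.DataFrame({"x": [1.0, np.nan, np.nan]})
den = pd.DataFrame({"x": [2.0, 4.0, np.nan]})

ratio = num.truediv(den, fill_value=1.0)
print(ratio)
```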
-
truncate
(before=None, after=None, axis=None, copy=True)¶ Truncates a sorted NDFrame before and/or after some particular dates.
- before : date
- Truncate before date
- after : date
- Truncate after date
- axis : the truncation axis, defaults to the stat axis
- copy : boolean, default is True
- return a copy of the truncated section
truncated : type of caller
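A minimal sketch of truncation on a sorted date index, using a hypothetical week of daily values. Both bounds are inclusive:

```python
import pandas as pd

# Hypothetical daily series, 2014-01-01 through 2014-01-07.
df = pd.DataFrame(
    {"score": list(range(7))},
    index=pd.date_range("2014-01-01", periods=7, freq="D"),
)

window = df.truncate(before="2014-01-03", after="2014-01-05")
print(window)
```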
-
tshift
(periods=1, freq=None, axis=0, **kwds)¶ Shift the time index, using the index’s frequency if available
- periods : int
- Number of periods to move, can be positive or negative
- freq : DateOffset, timedelta, or time rule string, default None
- Increment to use from datetools module or time rule (e.g. ‘EOM’)
- axis : int or basestring
- Corresponds to the axis that contains the Index
If freq is not specified then tries to use the freq or inferred_freq attributes of the index. If neither of those attributes exist, a ValueError is thrown
shifted : NDFrame
-
tz_convert
(tz, axis=0, copy=True)¶ Convert TimeSeries to target time zone. If it is time zone naive, it will be localized to the passed time zone.
- tz : string or pytz.timezone object
- copy : boolean, default True
- Also make a copy of the underlying data
-
tz_localize
(tz, axis=0, copy=True, infer_dst=False)¶ Localize tz-naive TimeSeries to target time zone
- tz : string or pytz.timezone object
- copy : boolean, default True
- Also make a copy of the underlying data
- infer_dst : boolean, default False
- Attempt to infer fall dst-transition times based on order
-
unstack
(level=-1)¶ Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels. If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex)
- level : int, string, or list of these, default -1 (last level)
- Level(s) of index to unstack, can pass level name
DataFrame.pivot : Pivot a table based on column values.
DataFrame.stack : Pivot a level of the column labels (inverse operation from unstack).
>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
...                                    ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one  a   1
     b   2
two  a   3
     b   4
dtype: float64
>>> s.unstack(level=-1)
     a   b
one  1  2
two  3  4
>>> s.unstack(level=0)
   one  two
a  1   3
b  2   4
>>> df = s.unstack(level=0)
>>> df.unstack()
one  a  1.
     b  2.
two  a  3.
     b  4.
unstacked : DataFrame or Series
-
update
(other, join='left', overwrite=True, filter_func=None, raise_conflict=False)¶ Modify DataFrame in place using non-NA values from passed DataFrame. Aligns on indices
- other : DataFrame, or object coercible into a DataFrame
- join : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
- overwrite : boolean, default True
- If True then overwrite values for common keys in the calling frame
- filter_func : callable(1d-array) -> 1d-array<boolean>, default None
- Can choose to replace values other than NA. Return True for values that should be updated
- raise_conflict : boolean
- If True, will raise an error if the DataFrame and other both contain data in the same place.
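The in-place behaviour is easy to overlook, so here is a small sketch with hypothetical frames. Only the non-NA entries of the patch frame overwrite the calling frame, and the call itself returns None:

```python
import numpy as np
import pandas as pd

# Hypothetical base frame and a sparse patch for column "a".
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
patch = pd.DataFrame({"a": [np.nan, 99.0, np.nan]})

result = df.update(patch)  # modifies df in place
print(result)
print(df)
```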
-
values
¶ Numpy representation of NDFrame
-
var
(axis=None, skipna=None, level=None, ddof=1, **kwargs)¶ Return unbiased variance over requested axis. Normalized by N-1.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
variance : Series or DataFrame (if level specified)
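As a quick check of the N-1 normalization, the sketch below compares the default with `ddof=0` on a hypothetical column of four values (mean 2.5, sum of squared deviations 5):

```python
import pandas as pd

# Hypothetical single-column frame.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

sample_var = df.var()            # ddof=1: 5 / (4 - 1)
population_var = df.var(ddof=0)  # ddof=0: 5 / 4
print(sample_var["x"], population_var["x"])
```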
-
where
(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)¶ Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
- cond : boolean NDFrame or array
- other : scalar or NDFrame
- inplace : boolean, default False
- Whether to perform the operation in place on the data
- axis : alignment axis if needed, default None
- level : alignment level if needed, default None
- try_cast : boolean, default False
- try to cast the result back to the input type (if possible)
- raise_on_error : boolean, default True
- Whether to raise on invalid data types (e.g. trying to where on strings)
wh : same type as caller
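A minimal sketch of where, assuming a hypothetical score table with peptide labels as rows and made-up method names as columns. Entries for which the condition is False are replaced by `other`, or by NaN when `other` is left at its default:

```python
import pandas as pd

# Hypothetical scores; "m1" and "m2" are illustrative method names.
df = pd.DataFrame({"m1": [-15.34, 23.34], "m2": [4.0, -1.0]},
                  index=["Peptide1", "Peptide2"])

masked = df.where(df > 0)        # non-positive entries become NaN
floored = df.where(df > 0, 0.0)  # non-positive entries become 0.0
print(masked)
print(floored)
```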
-
xs
(key, axis=0, level=None, copy=True, drop_level=True)¶ Returns a cross-section (row(s) or column(s)) from the Series/DataFrame. Defaults to cross-section on the rows (axis=0).
- key : object
- Some label contained in the index, or partially in a MultiIndex
- axis : int, default 0
- Axis to retrieve cross-section on
- level : object, defaults to first n levels (n=1 or len(key))
- In case of a key partially contained in a MultiIndex, indicate which levels are used. Levels can be referred by label or position.
- copy : boolean, default True
- Whether to make a copy of the data
- drop_level : boolean, default True
- If False, returns object with same levels as self.
>>> df
   A  B  C
a  4  5  2
b  4  0  9
c  9  7  3
>>> df.xs('a')
A    4
B    5
C    2
Name: a
>>> df.xs('C', axis=1)
a    2
b    9
c    3
Name: C
>>> s = df.xs('a', copy=False)
>>> s['A'] = 100
>>> df
     A  B  C
a  100  5  2
b    4  0  9
c    9  7  3
>>> df
                    A  B  C  D
first second third
bar   one    1      4  1  8  9
      two    1      7  5  5  0
baz   one    1      6  6  8  0
      three  2      5  3  5  3
>>> df.xs(('baz', 'three'))
       A  B  C  D
third
2      5  3  5  3
>>> df.xs('one', level=1)
             A  B  C  D
first third
bar   1      4  1  8  9
baz   1      6  6  8  0
>>> df.xs(('baz', 2), level=[0, 'third'])
        A  B  C  D
second
three   5  3  5  3
xs : Series or DataFrame
-
-
class
Fred2.Core.Result.
TAPPredictionResult
(data=None, index=None, columns=None, dtype=None, copy=False)¶ Bases:
Fred2.Core.Result.AResult
A TAPPredictionResult object is a pandas.DataFrame with single-indexing, where column IDs are the prediction names of the different prediction methods, and row IDs the Peptide object.
TAPPredictionResult:
Peptide Obj    Method Name
Peptide1       -15.34
Peptide2       23.34
-
T
¶ Transpose index and columns
-
abs
()¶ Return an object with absolute value taken. Only applicable to objects that are all numeric
abs: type of caller
-
add
(other, axis='columns', level=None, fill_value=None)¶ Binary operator add with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
add_prefix
(prefix)¶ Concatenate prefix string with panel items names.
prefix : string
with_prefix : type of caller
-
add_suffix
(suffix)¶ Concatenate suffix string with panel items names
suffix : string
with_suffix : type of caller
-
align
(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0)¶ Align two objects on their axes with the specified join method for each axis Index
- other : DataFrame or Series
- join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
- axis : allowed axis of the other object, default None
- Align on index (0), columns (1), or both (None)
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
- copy : boolean, default True
- Always returns new objects. If copy=False and no reindexing is required then original objects are returned.
- fill_value : scalar, default np.NaN
- Value to use for missing values. Defaults to NaN, but can be any “compatible” value
- method : str, default None
- limit : int, default None
- fill_axis : {0, 1}, default 0
- Filling axis, method and limit
- (left, right) : (type of input, type of other)
- Aligned objects
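The sketch below shows an outer alignment of two hypothetical frames whose row labels only partly overlap. Both results share the union of the labels, and positions that had no data receive the fill value:

```python
import pandas as pd

# Hypothetical frames; "p1".."p3" are illustrative row labels.
a = pd.DataFrame({"x": [1, 2]}, index=["p1", "p2"])
b = pd.DataFrame({"x": [10, 20]}, index=["p2", "p3"])

left, right = a.align(b, join="outer", fill_value=0)
print(left)
print(right)
```

After alignment the two frames have identical indexes, so element-wise operations between them no longer need any implicit reindexing.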
-
all
(axis=None, bool_only=None, skipna=True, level=None, **kwargs)¶ Return whether all elements are True over requested axis.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
- bool_only : boolean, default None
- Only include boolean data.
any : Series (or DataFrame if level specified)
-
any
(axis=None, bool_only=None, skipna=True, level=None, **kwargs)¶ Return whether any element is True over requested axis.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
- bool_only : boolean, default None
- Only include boolean data.
any : Series (or DataFrame if level specified)
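Because the axis wording for all and any is easy to misread, a small sketch on a hypothetical boolean frame may help: `axis=0` reduces down the rows and yields one value per column, while `axis=1` reduces across the columns and yields one value per row:

```python
import pandas as pd

# Hypothetical boolean frame.
df = pd.DataFrame({"a": [True, False], "b": [False, False]})

col_any = df.any(axis=0)  # one result per column
row_all = df.all(axis=1)  # one result per row
print(col_any)
print(row_all)
```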
-
append
(other, ignore_index=False, verify_integrity=False)¶ Append columns of other to end of this frame’s columns and index, returning a new object. Columns not in this frame are added as new columns.
- other : DataFrame or list of Series/dict-like objects
- ignore_index : boolean, default False
- If True do not use the index labels. Useful for gluing together record arrays
- verify_integrity : boolean, default False
- If True, raise ValueError on creating index with duplicates
If a list of dict is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged
appended : DataFrame
-
apply
(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)¶ Applies function along input axis of DataFrame.
Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty.
- func : function
- Function to apply to each column/row
- axis : {0, 1}
- 0 : apply function to each column
- 1 : apply function to each row
- broadcast : boolean, default False
- For aggregation functions, return object of same size with values propagated
- reduce : boolean or None, default None
- Try to apply reduction procedures. If the DataFrame is empty, apply will use reduce to determine whether the result should be a Series or a DataFrame. If reduce is None (the default), apply’s return value will be guessed by calling func on an empty Series (note: while guessing, exceptions raised by func will be ignored). If reduce is True a Series will always be returned, and if False a DataFrame will always be returned.
- raw : boolean, default False
- If False, convert each row or column into a Series. If raw=True the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance
- args : tuple
- Positional arguments to pass to function in addition to the array/series
Additional keyword arguments will be passed as keywords to the function
>>> df.apply(numpy.sqrt) # returns DataFrame
>>> df.apply(numpy.sum, axis=0) # equiv to df.sum(0)
>>> df.apply(numpy.sum, axis=1) # equiv to df.sum(1)
DataFrame.applymap: For elementwise operations
applied : Series or DataFrame
-
applymap
(func)¶ Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame
- func : function
- Python function, returns a single value from a single value
applied : DataFrame
DataFrame.apply : For operations on rows/columns
-
as_blocks
(columns=None)¶ Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype.
Columns are presented in sorted order unless a specific list of columns is provided.
- NOTE: the dtypes of the blocks WILL BE PRESERVED HERE (unlike in as_matrix)
- columns : array-like
- Specific column order
values : a list of Object
-
as_matrix
(columns=None)¶ Convert the frame to its Numpy-array matrix representation. Columns are presented in sorted order unless a specific list of columns is provided.
- NOTE: the dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say, if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.
- e.g. if the dtypes are float16, float32 -> float32; float16, float32, float64 -> float64; int32, uint8 -> int32
- values : ndarray
- If the caller is heterogeneous and contains booleans or objects, the result will be of dtype=object
-
asfreq
(freq, method=None, how=None, normalize=False)¶ Convert all TimeSeries inside to specified frequency using DateOffset objects. Optionally provide fill method to pad/backfill missing values.
- freq : DateOffset object, or string
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}
- Method to use for filling holes in reindexed Series. pad / ffill: propagate last valid observation forward to next valid. backfill / bfill: use NEXT valid observation to fill
- how : {‘start’, ‘end’}, default end
- For PeriodIndex only, see PeriodIndex.asfreq
- normalize : bool, default False
- Whether to reset output index to midnight
converted : type of caller
-
astype
(dtype, copy=True, raise_on_error=True)¶ Cast object to input numpy.dtype. Return a copy when copy = True (be really careful with this!)
- dtype : numpy.dtype or Python type
- raise_on_error : raise on invalid input
casted : type of caller
-
at
¶
-
at_time
(time, asof=False)¶ Select values at particular time of day (e.g. 9:30AM)
time : datetime.time or string
values_at_time : type of caller
-
axes
¶
-
between_time
(start_time, end_time, include_start=True, include_end=True)¶ Select values between particular times of the day (e.g., 9:00-9:30 AM)
- start_time : datetime.time or string
- end_time : datetime.time or string
- include_start : boolean, default True
- include_end : boolean, default True
values_between_time : type of caller
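A brief sketch of selecting by time of day, assuming a hypothetical frame with four timestamps on the same morning. With the default settings both endpoints are included:

```python
import pandas as pd

# Hypothetical intraday observations.
idx = pd.to_datetime([
    "2014-01-01 08:45",
    "2014-01-01 09:00",
    "2014-01-01 09:15",
    "2014-01-01 09:45",
])
df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=idx)

morning = df.between_time("09:00", "09:30")
print(morning)
```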
-
bfill
(axis=0, inplace=False, limit=None, downcast=None)¶ Synonym for NDFrame.fillna(method=’bfill’)
-
blocks
¶ Internal property, property synonym for as_blocks()
-
bool
()¶ Return the bool of a single element PandasObject This must be a boolean scalar value, either True or False
Raise a ValueError if the PandasObject does not have exactly 1 element, or that element is not boolean
-
boxplot
(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, **kwds)¶ Make a box plot from DataFrame column/columns optionally grouped (stratified) by one or more columns
- data : DataFrame
- column : column names or list of names, or vector
- Can be any valid input to groupby
- by : string or sequence
- Column in the DataFrame to group by
- ax : matplotlib axis object, default None
- fontsize : int or string
- rot : int, default None
- Rotation for ticks
- grid : boolean, default None (matlab style default)
- Axis grid lines
ax : matplotlib.axes.AxesSubplot
-
clip
(lower=None, upper=None, out=None)¶ Trim values at input threshold(s)
- lower : float, default None
- upper : float, default None
clipped : Series
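The trimming rule can be illustrated with a minimal plain-Python sketch. It works on a list rather than a Series, and the helper name `clip` is only for illustration; it is not the pandas implementation.

```python
def clip(values, lower=None, upper=None):
    """Trim each value to the closed interval [lower, upper]."""
    out = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # raise values below the lower threshold
        if upper is not None and v > upper:
            v = upper  # cap values above the upper threshold
        out.append(v)
    return out


print(clip([1, 5, 10], lower=2, upper=8))
```

Passing only `lower` or only `upper` gives the behaviour described for clip_lower and clip_upper respectively.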
-
clip_lower
(threshold)¶ Return copy of the input with values below given value truncated
clip
clipped : same type as input
-
clip_upper
(threshold)¶ Return copy of input with values above given value truncated
clip
clipped : same type as input
-
combine
(other, func, fill_value=None, overwrite=True)¶ Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)
- other : DataFrame
- func : function
- fill_value : scalar value
- overwrite : boolean, default True
- If True then overwrite values for common keys in the calling frame
result : DataFrame
-
combineAdd
(other)¶ Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)
other : DataFrame
DataFrame
-
combineMult
(other)¶ Multiply two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well)
other : DataFrame
DataFrame
-
combine_first
(other)¶ Combine two DataFrame objects and default to non-null values in frame calling the method. Result index columns will be the union of the respective indexes and columns
other : DataFrame
a’s values prioritized, use values from b to fill holes:
>>> a.combine_first(b)
combined : DataFrame
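A rough plain-Python sketch of the "prefer a, fill holes from b" rule for a single column. It assumes each column is a dict mapping labels to values and that `None` marks a missing entry; both are simplifications chosen for the illustration.

```python
def combine_first(a, b):
    """Prefer a's non-missing values, fall back to b; labels are the union."""
    labels = list(a) + [k for k in b if k not in a]
    return {k: a[k] if a.get(k) is not None else b.get(k) for k in labels}


a = {"x": 1.0, "y": None}
b = {"y": 2.0, "z": 3.0}
print(combine_first(a, b))
```

The hole at `y` is filled from `b`, and the label `z`, present only in `b`, is added to the result.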
-
compound
(axis=None, skipna=None, level=None, **kwargs)¶ Return the compound percentage of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
compounded : Series or DataFrame (if level specified)
-
consolidate
(inplace=False)¶ Compute NDFrame with “consolidated” internals (data of each dtype grouped together in a single ndarray). Mainly an internal API function, but available here to the savvy user
- inplace : boolean, default False
- If False return new object, otherwise modify existing object
consolidated : type of caller
-
convert_objects
(convert_dates=True, convert_numeric=False, convert_timedeltas=True, copy=True)¶ Attempt to infer better dtype for object columns
- convert_dates : if True, attempt to soft convert dates, if ‘coerce’,
- force conversion (and non-convertibles get NaT)
- convert_numeric : if True attempt to coerce to numbers (including
- strings), non-convertibles get NaN
- convert_timedeltas : if True, attempt to soft convert timedeltas, if ‘coerce’,
- force conversion (and non-convertibles get NaT)
copy : Boolean, if True, return copy, default is True
converted : same as input object
-
copy
(deep=True)¶ Make a copy of this object
- deep : boolean, default True
- Make a deep copy, i.e. also copy data
copy : type of caller
-
corr
(method='pearson', min_periods=1)¶ Compute pairwise correlation of columns, excluding NA/null values
- method : {‘pearson’, ‘kendall’, ‘spearman’}
- pearson : standard correlation coefficient
- kendall : Kendall Tau correlation coefficient
- spearman : Spearman rank correlation
- min_periods : int, optional
- Minimum number of observations required per pair of columns to have a valid result. Currently only available for pearson and spearman correlation
y : DataFrame
-
corrwith
(other, axis=0, drop=False)¶ Compute pairwise correlation between rows or columns of two DataFrame objects.
- other : DataFrame
- axis : {0, 1}
- 0 to compute column-wise, 1 for row-wise
- drop : boolean, default False
- Drop missing indices from result, default returns union of all
correls : Series
-
count
(axis=0, level=None, numeric_only=False)¶ Return Series with number of non-NA/null observations over requested axis. Works with non-floating point data as well (detects NaN and None)
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
- numeric_only : boolean, default False
- Include only float, int, boolean data
count : Series (or DataFrame if level specified)
-
cov
(min_periods=None)¶ Compute pairwise covariance of columns, excluding NA/null values
- min_periods : int, optional
- Minimum number of observations required per pair of columns to have a valid result.
y : DataFrame
y contains the covariance matrix of the DataFrame’s time series. The covariance is normalized by N-1 (unbiased estimator).
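As a sketch of what a single entry of the covariance matrix represents, the following plain-Python function computes the N-1 normalized covariance of two columns over the positions where both values are present. The function name and the handling of `min_periods` are assumptions made for the illustration.

```python
import math


def pairwise_cov(x, y, min_periods=None):
    """Sample covariance (divides by N-1) over positions where both values exist."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    # Too few complete pairs: no valid result.
    if n < 2 or (min_periods is not None and n < min_periods):
        return float("nan")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in pairs) / (n - 1)


nan = float("nan")
print(pairwise_cov([1.0, 2.0, 3.0, nan], [2.0, 4.0, 6.0, 8.0]))
```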
-
cummax
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative max over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
max : Series
-
cummin
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative min over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
min : Series
-
cumprod
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative prod over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
prod : Series
-
cumsum
(axis=None, dtype=None, out=None, skipna=True, **kwargs)¶ Return cumulative sum over requested axis.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
sum : Series
-
delevel
(*args, **kwargs)¶
-
describe
(percentile_width=50)¶ Generate various summary statistics of each column, excluding NaN values. These include: count, mean, std, min, max, and lower%/50%/upper% percentiles
- percentile_width : float, optional
- width of the desired uncertainty interval, default is 50, which corresponds to lower=25, upper=75
DataFrame of summary statistics
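The relation between percentile_width and the reported lower/upper percentiles can be sketched as below. The helper name is hypothetical, and the symmetric placement around the median is inferred from the documented default (50 gives 25 and 75).

```python
def percentile_bounds(percentile_width=50):
    """Lower and upper percentiles centred on the median (50%)."""
    lower = 50 - percentile_width / 2.0
    upper = 50 + percentile_width / 2.0
    return lower, upper


print(percentile_bounds())    # default width
print(percentile_bounds(90))  # wider interval
```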
-
diff
(periods=1)¶ 1st discrete difference of object
- periods : int, default 1
- Periods to shift for forming difference
diffed : DataFrame
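A minimal plain-Python sketch of the discrete difference for one column: each element has the element `periods` positions earlier subtracted from it, and positions without a partner become NaN. The list-based helper is illustrative only.

```python
def diff(values, periods=1):
    """Discrete difference: values[i] - values[i - periods]."""
    nan = float("nan")
    n = len(values)
    return [
        values[i] - values[i - periods] if 0 <= i - periods < n else nan
        for i in range(n)
    ]


print(diff([1, 3, 6, 10]))
print(diff([1, 3, 6, 10], periods=2))
```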
-
div
(other, axis='columns', level=None, fill_value=None)¶ Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
divide
(other, axis='columns', level=None, fill_value=None)¶ Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
dot
(other)¶ Matrix multiplication with DataFrame or Series objects
other : DataFrame or Series
dot_product : DataFrame or Series
-
drop
(labels, axis=0, level=None, inplace=False, **kwargs)¶ Return new object with labels in requested axis removed
- labels : single label or list-like
- axis : int or axis name
- level : int or name, default None
- For MultiIndex
- inplace : bool, default False
- If True, do operation inplace and return None.
dropped : type of caller
-
drop_duplicates
(cols=None, take_last=False, inplace=False)¶ Return DataFrame with duplicate rows removed, optionally only considering certain columns
- cols : column label or sequence of labels, optional
- Only consider certain columns for identifying duplicates, by default use all of the columns
- take_last : boolean, default False
- Keep the last observed row among each set of duplicates. Defaults to keeping the first row
- inplace : boolean, default False
- Whether to drop duplicates in place or to return a copy
deduplicated : DataFrame
-
dropna
(axis=0, how='any', thresh=None, subset=None, inplace=False)¶ Return object with labels on given axis omitted where alternately any or all of the data are missing
- axis : {0, 1}, or tuple/list thereof
- Pass tuple or list to drop on multiple axes
- how : {‘any’, ‘all’}
- any : if any NA values are present, drop that label
- all : if all values are NA, drop that label
- thresh : int, default None
- int value : require that many non-NA values
- subset : array-like
- Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include
- inplace : boolean, default False
- If True, do operation inplace and return None.
dropped : DataFrame
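The interplay of how, thresh and subset can be sketched in plain Python for the row-dropping case. Rows are modelled as dicts with `None` for missing values, and letting thresh take precedence over how is an assumption of this sketch.

```python
def drop_rows(rows, how="any", thresh=None, subset=None):
    """Drop rows (dicts) with missing values; None marks a missing value."""
    kept = []
    for row in rows:
        cols = subset if subset is not None else list(row)
        present = sum(row[c] is not None for c in cols)
        if thresh is not None:
            keep = present >= thresh      # require that many non-NA values
        elif how == "any":
            keep = present == len(cols)   # drop if any value is missing
        else:                             # how == "all"
            keep = present > 0            # drop only if every value is missing
        if keep:
            kept.append(row)
    return kept


rows = [{"a": 1, "b": 2}, {"a": None, "b": 3}, {"a": None, "b": None}]
print(len(drop_rows(rows, how="any")))
print(len(drop_rows(rows, how="all")))
print(len(drop_rows(rows, subset=["b"])))
```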
-
dtypes
¶ Return the dtypes in this object
-
duplicated
(cols=None, take_last=False)¶ Return boolean Series denoting duplicate rows, optionally only considering certain columns
- cols : column label or sequence of labels, optional
- Only consider certain columns for identifying duplicates, by default use all of the columns
- take_last : boolean, default False
- Keep the last observed row among each set of duplicates. Defaults to keeping the first row
duplicated : Series
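How take_last changes which rows are flagged can be sketched as follows. This is a plain-Python illustration over a list of dict rows, not the pandas code; the helper mirrors the documented parameter names.

```python
def duplicated(rows, cols=None, take_last=False):
    """Flag rows whose key was already seen (scanning from the end if take_last)."""
    order = range(len(rows) - 1, -1, -1) if take_last else range(len(rows))
    seen, flags = set(), [False] * len(rows)
    for i in order:
        # Use the chosen columns, or all columns, as the identity of the row.
        key = tuple(rows[i][c] for c in (cols or sorted(rows[i])))
        flags[i] = key in seen
        seen.add(key)
    return flags


rows = [{"k": "a", "v": 1}, {"k": "b", "v": 2}, {"k": "a", "v": 3}]
print(duplicated(rows, cols=["k"]))
print(duplicated(rows, cols=["k"], take_last=True))
```

Rows flagged True are the ones that drop_duplicates would remove under the same arguments.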
-
empty
¶ True if NDFrame is entirely empty [no items]
-
eq
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods eq
-
equals
(other)¶ Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.
-
eval
(expr, **kwargs)¶ Evaluate an expression in the context of the calling DataFrame instance.
- expr : string
- The expression string to evaluate.
- kwargs : dict
- See the documentation for eval() for complete details on the keyword arguments accepted by query().
ret : ndarray, scalar, or pandas object
pandas.DataFrame.query pandas.eval
For more details see the API documentation for eval(). For detailed examples see enhancing performance with eval.
>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.eval('a + b')
>>> df.eval('c=a + b')
-
ffill
(axis=0, inplace=False, limit=None, downcast=None)¶ Synonym for NDFrame.fillna(method=’ffill’)
-
fillna
(value=None, method=None, axis=0, inplace=False, limit=None, downcast=None)¶ Fill NA/NaN values using the specified method
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- Method to use for filling holes in reindexed Series. pad / ffill: propagate last valid observation forward to next valid. backfill / bfill: use NEXT valid observation to fill gap
- value : scalar, dict, or Series
- Value to use to fill holes (e.g. 0), alternately a dict/Series of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series will not be filled). This value cannot be a list.
- axis : {0, 1}, default 0
- 0: fill column-by-column
- 1: fill row-by-row
- inplace : boolean, default False
- If True, fill in place. Note: this will modify any other views on this object, (e.g. a no-copy slice for a column in a DataFrame).
- limit : int, default None
- Maximum size gap to forward or backward fill
- downcast : dict, default is None
- a dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible)
reindex, asfreq
filled : same type as caller
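The pad/ffill and backfill/bfill methods, together with limit, can be illustrated with a small plain-Python sketch over a list. The helper names are hypothetical and the code is a simplification, not the pandas implementation.

```python
import math


def is_na(v):
    """Treat None and float NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def fill(values, method="ffill", limit=None):
    """Propagate the last (ffill) or next (bfill) valid value into gaps."""
    # A backward fill is a forward fill over the reversed sequence.
    seq = list(values) if method == "ffill" else list(reversed(values))
    out, last, run = [], None, 0
    for v in seq:
        if is_na(v):
            run += 1
            # Fill only if a valid value exists and the limit is not exceeded.
            if last is not None and (limit is None or run <= limit):
                out.append(last)
            else:
                out.append(v)
        else:
            last, run = v, 0
            out.append(v)
    return out if method == "ffill" else list(reversed(out))


nan = float("nan")
print(fill([1.0, nan, nan, 4.0]))
print(fill([1.0, nan, nan, 4.0], method="bfill", limit=1))
```

With limit=1 only the NaN adjacent to the next valid observation is filled; the other one stays missing.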
-
filter
(items=None, like=None, regex=None, axis=None)¶ Restrict the info axis to set of items or wildcard
- items : list-like
- List of info axis to restrict to (must not all be present)
- like : string
- Keep info axis where “arg in col == True”
- regex : string (regular expression)
- Keep info axis with re.search(regex, col) == True
Arguments are mutually exclusive, but this is not checked for
-
filter_result
(expressions)¶ Filters a result data frame based on a specified expression consisting of a list of triple with (method_name, comparator, threshold). The expression is applied to each row. If any of the columns fulfill the criteria the row remains.
Parameters: expressions (list((str,comparator,float))) – A list of triples consisting of (method_name, comparator, threshold) Returns: A new filtered result object Return type: TAPPredictionResult
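The "any criterion keeps the row" rule can be sketched in plain Python. The row layout (a dict of per-method scores keyed by peptide), the method names, and the use of functions from the `operator` module as comparators are all assumptions made for this illustration; the actual result object is a DataFrame subclass.

```python
import operator


def filter_rows(rows, expressions):
    """Keep a row if any (method_name, comparator, threshold) triple holds for it."""
    kept = {}
    for key, scores in rows.items():
        if any(
            name in scores and comparator(scores[name], threshold)
            for name, comparator, threshold in expressions
        ):
            kept[key] = scores
    return kept


# Hypothetical scores; "method_a" and "method_b" are placeholder method names.
rows = {
    "SYFPEITHI": {"method_a": 0.9, "method_b": 0.1},
    "AAAAAAAAA": {"method_a": 0.2, "method_b": 0.3},
}
expressions = [("method_a", operator.ge, 0.5), ("method_b", operator.ge, 0.5)]
print(filter_rows(rows, expressions))
```

The first row is kept because one of its columns meets a criterion; the second row meets none and is dropped.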
-
first
(offset)¶ Convenience method for subsetting initial periods of time series data based on a date offset
offset : string, DateOffset, dateutil.relativedelta
ts.first(‘10D’) -> First 10 days
subset : type of caller
-
first_valid_index
()¶ Return label for first non-NA/null value
-
floordiv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator floordiv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
from_csv
(path, header=0, sep=', ', index_col=0, parse_dates=True, encoding=None, tupleize_cols=False, infer_datetime_format=False)¶ Read delimited file into DataFrame
- path : string file path or file handle / StringIO
- header : int, default 0
- Row to use as header (skip prior rows)
- sep : string, default ‘,’
- Field delimiter
- index_col : int or sequence, default 0
- Column to use for index. If a sequence is given, a MultiIndex is used. Different default from read_table
- parse_dates : boolean, default True
- Parse dates. Different default from read_table
- tupleize_cols : boolean, default False
- Write multi_index columns as a list of tuples (if True) or in the new, expanded format (if False)
- infer_datetime_format: boolean, default False
- If True and parse_dates is True for a column, try to infer the datetime format based on the first datetime string. If the format can be inferred, there often will be a large parsing speed-up.
Preferable to use read_table for most general purposes but from_csv makes for an easy roundtrip to and from file, especially with a DataFrame of time series data
y : DataFrame
-
from_dict
(data, orient='columns', dtype=None)¶ Construct DataFrame from dict of array-like or dicts
- data : dict
- {field : array-like} or {field : dict}
- orient : {‘columns’, ‘index’}, default ‘columns’
- The “orientation” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.
DataFrame
-
from_items
(items, columns=None, orient='columns')¶ Convert (key, value) pairs to DataFrame. The keys will be the axis index (usually the columns, but depends on the specified orientation). The values should be arrays or Series.
- items : sequence of (key, value) pairs
- Values should be arrays or Series.
- columns : sequence of column labels, optional
- Must be passed if orient=’index’.
- orient : {‘columns’, ‘index’}, default ‘columns’
- The “orientation” of the data. If the keys of the input correspond to column labels, pass ‘columns’ (default). Otherwise if the keys correspond to the index, pass ‘index’.
frame : DataFrame
-
from_records
(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)¶ Convert structured or record ndarray to DataFrame
- data : ndarray (structured dtype), list of tuples, dict, or DataFrame
- index : string, list of fields, array-like
- Field of array to use as the index, alternately a specific set of input labels to use
- exclude : sequence, default None
- Columns or fields to exclude
- columns : sequence, default None
- Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns)
- coerce_float : boolean, default False
- Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets
df : DataFrame
-
ftypes
¶ Return the ftypes (indication of sparse/dense and dtype) in this object.
-
ge
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods ge
-
get
(key, default=None)¶ Get item from object for given key (DataFrame column, Panel slice, etc.). Returns default value if not found
key : object
value : type of items contained in object
-
get_dtype_counts
()¶ Return the counts of dtypes in this object
-
get_ftype_counts
()¶ Return the counts of ftypes in this object
-
get_value
(index, col)¶ Quickly retrieve single value at passed column and index
- index : row label
- col : column label
value : scalar value
-
get_values
()¶ same as values (but handles sparseness conversions)
-
groupby
(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False)¶ Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns
- by : mapping function / list of functions, dict, Series, or tuple /
- list of column names. Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups
- axis : int, default 0
- level : int, level name, or sequence of such, default None
- If the axis is a MultiIndex (hierarchical), group by a particular level or levels
- as_index : boolean, default True
- For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output
- sort : boolean, default True
- Sort group keys. Get better performance by turning this off
- group_keys : boolean, default True
- When calling apply, add group keys to index to identify pieces
- squeeze : boolean, default False
- Reduce the dimensionality of the return type if possible, otherwise return a consistent type
# DataFrame result
>>> data.groupby(func, axis=0).mean()
# DataFrame result
>>> data.groupby(['col1', 'col2'])['col3'].mean()
# DataFrame with hierarchical index
>>> data.groupby(['col1', 'col2']).mean()
GroupBy object
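The split-apply-combine idea behind `by` can be sketched in plain Python for a single column. The sketch accepts either a function called on each index label or a dict whose values name the groups, as described above; the helper name and the dict-based "series" are illustrative assumptions.

```python
from collections import defaultdict


def group_mean(series, by):
    """series: {label: value}; by: function of the label, or dict label -> group."""
    groups = defaultdict(list)
    for label, value in series.items():
        key = by(label) if callable(by) else by[label]
        groups[key].append(value)
    # Apply the aggregation (here: mean) to each group.
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


series = {"a1": 1.0, "a2": 3.0, "b1": 10.0}
print(group_mean(series, lambda label: label[0]))
print(group_mean(series, {"a1": "x", "a2": "y", "b1": "y"}))
```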
-
gt
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods gt
-
head
(n=5)¶ Returns first n rows
-
hist
(data, column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, **kwds)¶ Draw histogram of the DataFrame’s series using matplotlib / pylab.
- data : DataFrame
- column : string or sequence
- If passed, will be used to limit data to a subset of columns
- by : object, optional
- If passed, then used to form histograms for separate groups
- grid : boolean, default True
- Whether to show axis grid lines
- xlabelsize : int, default None
- If specified changes the x-axis label size
- xrot : float, default None
- rotation of x axis labels
- ylabelsize : int, default None
- If specified changes the y-axis label size
- yrot : float, default None
- rotation of y axis labels
- ax : matplotlib axes object, default None
- sharex : bool, if True, the X axis will be shared amongst all subplots.
- sharey : bool, if True, the Y axis will be shared amongst all subplots.
- figsize : tuple
- The size of the figure to create in inches by default
- layout : (optional) a tuple (rows, columns) for the layout of the histograms
- kwds : other plotting keyword arguments
- To be passed to hist function
-
iat
¶
-
icol
(i)¶
-
idxmax
(axis=0, skipna=True)¶ Return index of first occurrence of maximum over requested axis. NA/null values are excluded.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be first index.
idxmax : Series
This method is the DataFrame version of ndarray.argmax.
Series.idxmax
-
idxmin
(axis=0, skipna=True)¶ Return index of first occurrence of minimum over requested axis. NA/null values are excluded.
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
idxmin : Series
This method is the DataFrame version of ndarray.argmin.
Series.idxmin
-
iget_value
(i, j)¶
-
iloc
¶
-
info
(verbose=True, buf=None, max_cols=None)¶ Concise summary of a DataFrame.
- verbose : boolean, default True
- If False, don’t print column count summary
- buf : writable buffer, defaults to sys.stdout
- max_cols : int, default None
- Determines whether full summary or short summary is printed
-
insert
(loc, column, value, allow_duplicates=False)¶ Insert column into DataFrame at specified location.
If allow_duplicates is False, raises Exception if column is already contained in the DataFrame.
- loc : int
- Must have 0 <= loc <= len(columns)
- column : object
- value : int, Series, or array-like
-
interpolate
(method='linear', axis=0, limit=None, inplace=False, downcast='infer', **kwargs)¶ Interpolate values according to different methods.
- method : {‘linear’, ‘time’, ‘values’, ‘index’, ‘nearest’, ‘zero’,
- ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘krogh’, ‘polynomial’, ‘spline’, ‘piecewise_polynomial’, ‘pchip’}
- ‘linear’: ignore the index and treat the values as equally spaced. default
- ‘time’: interpolation works on daily and higher resolution data to interpolate given length of interval
- ‘index’: use the actual numerical values of the index
- ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘polynomial’ are passed to scipy.interpolate.interp1d with the order given. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=4)
- ‘krogh’, ‘piecewise_polynomial’, ‘spline’, and ‘pchip’ are all wrappers around the scipy interpolation methods of similar names. See the scipy documentation for more on their behavior: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
- axis : {0, 1}, default 0
- 0: fill column-by-column
- 1: fill row-by-row
- limit : int, default None.
- Maximum number of consecutive NaNs to fill.
- inplace : bool, default False
- Update the NDFrame in place if possible.
- downcast : optional, ‘infer’ or None, defaults to ‘infer’
- Downcast dtypes if possible.
Series or DataFrame of same shape interpolated at the NaNs
reindex, replace, fillna
# Filling in NaNs:
>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0    0
1    1
2    2
3    3
dtype: float64
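What the ‘linear’ method computes for interior gaps can be sketched in plain Python: the index is ignored and values are treated as equally spaced. This sketch leaves leading and trailing NaNs untouched, which may differ from what pandas does at the edges, and it ignores limit.

```python
import math


def interpolate_linear(values):
    """Fill interior NaN runs assuming equally spaced values (index ignored)."""
    out = list(values)
    known = [i for i, v in enumerate(out) if not math.isnan(v)]
    # Walk over consecutive pairs of known positions and fill the gap between them.
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


print(interpolate_linear([0.0, 1.0, float("nan"), 3.0]))
```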
-
irow
(i, copy=False)¶
-
is_copy
= None¶
-
isin
(values)¶ Return boolean DataFrame showing whether each element in the DataFrame is contained in values.
- values : iterable, Series, DataFrame or dictionary
- The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dictionary, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.
DataFrame of booleans
When values is a list:
>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> df.isin([1, 3, 12, 'a'])
       A      B
0   True   True
1  False  False
2   True  False
When values is a dict:
>>> df = DataFrame({'A': [1, 2, 3], 'B': [1, 4, 7]})
>>> df.isin({'A': [1, 3], 'B': [4, 7, 12]})
       A      B
0   True  False  # Note that B didn't match the 1 here.
1  False   True
2   True   True
When values is a Series or DataFrame:
>>> df = DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
>>> other = DataFrame({'A': [1, 3, 3, 2], 'B': ['e', 'f', 'f', 'e']})
>>> df.isin(other)
       A      B
0   True  False
1  False  False  # Column A in `other` has a 3, but not at index 1.
2   True   True
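The difference between passing a list and passing a dict can be sketched in plain Python. Columns are modelled as a dict of lists; the sketch does not cover the Series/DataFrame case, where index labels must also match.

```python
def isin(columns, values):
    """columns: {name: list}; values: list (any column) or dict (per column)."""
    result = {}
    for name, cells in columns.items():
        # A dict restricts the allowed values per column; a list applies everywhere.
        allowed = values.get(name, []) if isinstance(values, dict) else values
        result[name] = [cell in allowed for cell in cells]
    return result


frame = {"A": [1, 2, 3], "B": [1, 4, 7]}
print(isin(frame, {"A": [1, 3], "B": [4, 7, 12]}))
```

The 1 in column B is not matched because the dict lists 1 only for column A.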
-
isnull
()¶ Return a boolean same-sized object indicating if the values are null
-
iteritems
()¶ Iterator over (column, series) pairs
-
iterkv
(*args, **kwargs)¶ iteritems alias used to get around 2to3. Deprecated
-
iterrows
()¶ Iterate over rows of DataFrame as (index, Series) pairs.
iterrows does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). For example,
>>> df = DataFrame([[1, 1.0]], columns=['x', 'y'])
>>> row = next(df.iterrows())[1]
>>> print(row['x'].dtype)
float64
>>> print(df['x'].dtype)
int64
- it : generator
- A generator that iterates over the rows of the frame.
-
itertuples
(index=True)¶ Iterate over rows of DataFrame as tuples, with index value as first element of the tuple
-
ix
¶
-
join
(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)¶ Join columns with other DataFrame either on index or on a key column. Efficiently Join multiple DataFrame objects by index at once by passing a list.
- other : DataFrame, Series with name field set, or list of DataFrame
- Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame
- on : column name, tuple/list of column names, or array-like
- Column(s) to use for joining, otherwise join on index. If multiples columns given, the passed DataFrame must have a MultiIndex. Can pass an array as the join key if not already contained in the calling DataFrame. Like an Excel VLOOKUP operation
- how : {‘left’, ‘right’, ‘outer’, ‘inner’}
How to handle indexes of the two objects. Default: ‘left’ for joining on index, None otherwise
- left: use calling frame’s index
- right: use input frame’s index
- outer: form union of indexes
- inner: use intersection of indexes
- lsuffix : string
- Suffix to use from left frame’s overlapping columns
- rsuffix : string
- Suffix to use from right frame’s overlapping columns
- sort : boolean, default False
- Order result DataFrame lexicographically by the join key. If False, preserves the index order of the calling (left) DataFrame
on, lsuffix, and rsuffix options are not supported when passing a list of DataFrame objects
joined : DataFrame
-
keys
()¶ Get the ‘info axis’ (see Indexing for more)
This is index for Series, columns for DataFrame and major_axis for Panel.
-
kurt
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased kurtosis over requested axis. Normalized by N-1
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
kurt : Series or DataFrame (if level specified)
-
kurtosis
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased kurtosis over requested axis. Normalized by N-1
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
kurt : Series or DataFrame (if level specified)
-
last
(offset)¶ Convenience method for subsetting final periods of time series data based on a date offset
offset : string, DateOffset, dateutil.relativedelta
ts.last(‘5M’) -> Last 5 months
subset : type of caller
-
last_valid_index
()¶ Return label for last non-NA/null value
-
le
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods le
-
load
(path)¶ Deprecated. Use read_pickle instead.
-
loc
¶
-
lookup
(row_labels, col_labels)¶ Label-based “fancy indexing” function for DataFrame. Given equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair.
- row_labels : sequence
- The row labels to use for lookup
- col_labels : sequence
- The column labels to use for lookup
Akin to:
result = []
for row, col in zip(row_labels, col_labels):
    result.append(df.get_value(row, col))
- values : ndarray
- The found values
-
lt
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods lt
-
mad
(axis=None, skipna=None, level=None, **kwargs)¶ Return the mean absolute deviation of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
mad : Series or DataFrame (if level specified)
-
mask
(cond)¶ Returns copy whose values are replaced with nan if the inverted condition is True
cond : boolean NDFrame or array
wh: same as input
-
max
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ This method returns the maximum of the values in the object. If you want the index of the maximum, use idxmax. This is the equivalent of the numpy.ndarray method argmax.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
max : Series or DataFrame (if level specified)
-
mean
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the mean of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
mean : Series or DataFrame (if level specified)
-
median
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the median of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
median : Series or DataFrame (if level specified)
-
merge
(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True)¶ Merge DataFrame objects by performing a database-style join operation by columns or indexes.
If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on.
right : DataFrame how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’
- left: use only keys from left frame (SQL: left outer join)
- right: use only keys from right frame (SQL: right outer join)
- outer: use union of keys from both frames (SQL: full outer join)
- inner: use intersection of keys from both frames (SQL: inner join)
- on : label or list
- Field names to join on. Must be found in both DataFrames. If on is None and not merging on indexes, then it merges on the intersection of the columns by default.
- left_on : label or list, or array-like
- Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns
- right_on : label or list, or array-like
- Field names to join on in right DataFrame or vector/list of vectors per left_on docs
- left_index : boolean, default False
- Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels
- right_index : boolean, default False
- Use the index from the right DataFrame as the join key. Same caveats as left_index
- sort : boolean, default False
- Sort the join keys lexicographically in the result DataFrame
- suffixes : 2-length sequence (tuple, list, ...)
- Suffix to apply to overlapping column names in the left and right side, respectively
- copy : boolean, default True
- If False, do not copy data unnecessarily
>>> A              >>> B
    lkey value         rkey value
0   foo  1         0   foo  5
1   bar  2         1   bar  6
2   baz  3         2   qux  7
3   foo  4         3   bar  8
>>> merge(A, B, left_on='lkey', right_on='rkey', how='outer')
   lkey  value_x  rkey  value_y
0  bar   2        bar   6
1  bar   2        bar   8
2  baz   3        NaN   NaN
3  foo   1        foo   5
4  foo   4        foo   5
5  NaN   NaN      qux   7
merged : DataFrame
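The outer join shown above can be reproduced with a short sketch. It assumes a reasonably recent pandas release and calls the top-level `pandas.merge` function; the frames `A` and `B` simply restate the example data, and row order in the result may differ between pandas versions.

```python
import pandas as pd

A = pd.DataFrame({"lkey": ["foo", "bar", "baz", "foo"], "value": [1, 2, 3, 4]})
B = pd.DataFrame({"rkey": ["foo", "bar", "qux", "bar"], "value": [5, 6, 7, 8]})

# Outer join: keys from both frames are kept; the unmatched side is filled with NaN.
merged = pd.merge(A, B, left_on="lkey", right_on="rkey", how="outer")

# The overlapping column name "value" receives the default suffixes '_x' and '_y'.
print(merged.columns.tolist())
print(len(merged))
```

Because "baz" only occurs in `A` and "qux" only in `B`, each contributes one row with missing values on the other side.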
-
merge_results
(others)¶ Merges results of type TAPPredictionResult and returns the merged result
Parameters: others (list(TAPPredictionResult) or TAPPredictionResult) – A (list of) TAPPredictionResult object(s)
Returns: A new merged TAPPredictionResult object
Return type: TAPPredictionResult
-
min
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ This method returns the minimum of the values in the object. If you want the index of the minimum, use idxmin. This is the equivalent of the numpy.ndarray method argmin.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
min : Series or DataFrame (if level specified)
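The difference between the minimum value and the index of the minimum can be seen in a small sketch. The frame `df` is a hypothetical example, and a recent pandas release is assumed.

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2], "b": [9, 8, 7]})

col_min = df.min()        # minimum value of each column
col_min_at = df.idxmin()  # index label at which each column minimum occurs
row_min = df.min(axis=1)  # minimum across the columns of each row

print(col_min)
print(col_min_at)
```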
-
mod
(other, axis='columns', level=None, fill_value=None)¶ Binary operator mod with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
mode
(axis=0, numeric_only=False)¶ Gets the mode of each element along the axis selected. Empty if nothing has 2+ occurrences. Adds a row for each mode per label, fills in gaps with nan.
- axis : {0, 1, ‘index’, ‘columns’} (default 0)
- 0/’index’ : get mode of each column
- 1/’columns’ : get mode of each row
- numeric_only : boolean, default False
- if True, only apply to numeric columns
modes : DataFrame (sorted)
-
mul
(other, axis='columns', level=None, fill_value=None)¶ Binary operator mul with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
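The effect of fill_value on the flexible binary operators can be sketched as follows. The frames `left` and `right` are hypothetical, and a recent pandas release is assumed; the same pattern should apply to the other operators documented with a fill_value parameter.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"x": [1.0, np.nan, np.nan]})
right = pd.DataFrame({"x": [10.0, 20.0, np.nan]})

plain = left * right                      # NaN wherever either input is missing
filled = left.mul(right, fill_value=1.0)  # a single missing input is treated as 1.0

print(plain)
print(filled)
```

The last row stays missing in both results because both inputs are NaN at that location.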
-
multiply
(other, axis='columns', level=None, fill_value=None)¶ Binary operator mul with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
ndim
¶ Number of axes / array dimensions
-
ne
(other, axis='columns', level=None)¶ Wrapper for flexible comparison methods ne
-
notnull
()¶ Return a boolean same-sized object indicating if the values are not null
-
pct_change
(periods=1, fill_method='pad', limit=None, freq=None, **kwds)¶ Percent change over given number of periods
- periods : int, default 1
- Periods to shift for forming percent change
- fill_method : str, default ‘pad’
- How to handle NAs before computing percent changes
- limit : int, default None
- The number of consecutive NAs to fill before stopping
- freq : DateOffset, timedelta, or offset alias string, optional
- Increment to use from time series API (e.g. ‘M’ or BDay())
chg : same type as caller
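A minimal sketch of the periods parameter, using a hypothetical price column without missing values (so the fill options play no role). A recent pandas release is assumed.

```python
import pandas as pd

prices = pd.DataFrame({"p": [100.0, 110.0, 99.0]})

chg1 = prices.pct_change()           # change relative to the previous row
chg2 = prices.pct_change(periods=2)  # change relative to two rows earlier

print(chg1)
print(chg2)
```

The first `periods` rows have no earlier value to compare against, so they are NaN.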
-
pivot
(index=None, columns=None, values=None)¶ Reshape data (produce a “pivot” table) based on column values. Uses unique values from index / columns to form axes and return either DataFrame or Panel, depending on whether you request a single value column (DataFrame) or all columns (Panel)
- index : string or object
- Column name to use to make new frame’s index
- columns : string or object
- Column name to use to make new frame’s columns
- values : string or object, optional
- Column name to use for populating new frame’s values
For finer-tuned control, see hierarchical indexing documentation along with the related stack/unstack methods
>>> df
    foo   bar  baz
0   one   A    1.
1   one   B    2.
2   one   C    3.
3   two   A    4.
4   two   B    5.
5   two   C    6.
>>> df.pivot('foo', 'bar', 'baz')
     A   B   C
one  1   2   3
two  4   5   6
>>> df.pivot('foo', 'bar')['baz']
     A   B   C
one  1   2   3
two  4   5   6
- pivoted : DataFrame
- If no values column specified, will have hierarchically indexed columns
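The pivot example can be written as a self-contained sketch. It passes index, columns and values by keyword, which matches the documented signature; newer pandas releases are assumed here and may no longer accept the positional form.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["one", "one", "one", "two", "two", "two"],
    "bar": ["A", "B", "C", "A", "B", "C"],
    "baz": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Unique values of "foo" become the row labels, unique values of "bar" the columns.
pivoted = df.pivot(index="foo", columns="bar", values="baz")
print(pivoted)
```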
-
pivot_table
(data, values=None, rows=None, cols=None, aggfunc='mean', fill_value=None, margins=False, dropna=True)¶ Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame
- data : DataFrame
- values : column to aggregate, optional
- rows : list of column names or arrays to group on
- Keys to group on the x-axis of the pivot table
- cols : list of column names or arrays to group on
- Keys to group on the y-axis of the pivot table
- aggfunc : function, default numpy.mean, or list of functions
- If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)
- fill_value : scalar, default None
- Value to replace missing values with
- margins : boolean, default False
- Add all row / columns (e.g. for subtotal / grand totals)
- dropna : boolean, default True
- Do not include columns whose entries are all NaN
>>> df
   A   B   C      D
0  foo one small  1
1  foo one large  2
2  foo one large  2
3  foo two small  3
4  foo two small  3
5  bar one large  4
6  bar one small  5
7  bar two small  6
8  bar two large  7
>>> table = pivot_table(df, values='D', rows=['A', 'B'],
...                     cols=['C'], aggfunc=np.sum)
>>> table
          small  large
foo  one  1      4
     two  6      NaN
bar  one  5      4
     two  6      7
table : DataFrame
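The same table can be built with the following sketch. Note the assumption: later pandas releases renamed the rows and cols keywords documented here to index and columns, so this sketch uses the newer names and will not run unchanged against the older signature.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# Group on A and B along the rows and on C along the columns, summing D.
# index/columns are the newer names for the rows/cols keywords (assumption: recent pandas).
table = pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"], aggfunc="sum")
print(table)
```

The combination (foo, two, large) never occurs in the data, so that cell is NaN unless fill_value is given.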
-
plot
(frame=None, x=None, y=None, subplots=False, sharex=True, sharey=False, use_index=True, figsize=None, grid=None, legend=True, rot=None, ax=None, style=None, title=None, xlim=None, ylim=None, logx=False, logy=False, xticks=None, yticks=None, kind='line', sort_columns=False, fontsize=None, secondary_y=False, **kwds)¶ Make line, bar, or scatter plots of DataFrame series with the index on the x-axis using matplotlib / pylab.
- frame : DataFrame
- x : label or position, default None
- y : label or position, default None
- Allows plotting of one column versus another
- subplots : boolean, default False
- Make separate subplots for each time series
- sharex : boolean, default True
- In case subplots=True, share x axis
- sharey : boolean, default False
- In case subplots=True, share y axis
- use_index : boolean, default True
- Use index as ticks for x axis
- stacked : boolean, default False
- If True, create stacked bar plot. Only valid for DataFrame input
- sort_columns: boolean, default False
- Sort column names to determine plot ordering
- title : string
- Title to use for the plot
- grid : boolean, default None (matlab style default)
- Axis grid lines
- legend : boolean, default True
- Place legend on axis subplots
- ax : matplotlib axis object, default None
- style : list or dict
- matplotlib line style per column
- kind : {‘line’, ‘bar’, ‘barh’, ‘kde’, ‘density’, ‘scatter’}
- bar : vertical bar plot
- barh : horizontal bar plot
- kde/density : Kernel Density Estimation plot
- scatter : scatter plot
- logx : boolean, default False
- For line plots, use log scaling on x axis
- logy : boolean, default False
- For line plots, use log scaling on y axis
- xticks : sequence
- Values to use for the xticks
- yticks : sequence
- Values to use for the yticks
- xlim : 2-tuple/list
- ylim : 2-tuple/list
- rot : int, default None
- Rotation for ticks
- secondary_y : boolean or sequence, default False
- Whether to plot on the secondary y-axis. If a list/tuple, which columns to plot on secondary y-axis
- mark_right: boolean, default True
- When using a secondary_y axis, should the legend label the axis of the various columns automatically
- colormap : str or matplotlib colormap object, default None
- Colormap to select colors from. If string, load colormap with that name from matplotlib.
- kwds : keywords
- Options to pass to matplotlib plotting method
ax_or_axes : matplotlib.AxesSubplot or list of them
-
pop
(item)¶ Return item and drop from frame. Raise KeyError if not found.
-
pow
(other, axis='columns', level=None, fill_value=None)¶ Binary operator pow with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
prod
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the product of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
prod : Series or DataFrame (if level specified)
-
product
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the product of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
prod : Series or DataFrame (if level specified)
-
quantile
(q=0.5, axis=0, numeric_only=True)¶ Return values at the given quantile over requested axis, a la scoreatpercentile in scipy.stats
- q : quantile, default 0.5 (50% quantile)
- 0 <= q <= 1
- axis : {0, 1}
- 0 for row-wise, 1 for column-wise
quantiles : Series
-
query
(expr, **kwargs)¶ Query the columns of a frame with a boolean expression.
- expr : string
- The query string to evaluate. The result of the evaluation of this expression is first passed to loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to __getitem__().
- kwargs : dict
- See the documentation for eval() for complete details on the keyword arguments accepted by query().
q : DataFrame or Series
This method uses the top-level eval() function to evaluate the passed query.
The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins, and and or. This is syntactically valid Python, however the semantics are different.
You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.
The index and columns attributes of the DataFrame instance are placed in the namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for this variable, and you can also use the name of the index to identify it in a query.
For further details and examples see the query documentation in indexing.
pandas.eval, DataFrame.eval
>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.query('a > b')
>>> df[df.a > df.b]  # same result as the previous expression
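The operator-precedence remark can be checked with a deterministic sketch. The frame `df` is hypothetical, and a recent pandas release with the default parser is assumed.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 3, 8], "b": [4, 2, 3, 7]})

by_query = df.query("a > b")  # boolean expression evaluated against the columns
by_mask = df[df.a > df.b]     # equivalent boolean-mask indexing

# Inside query(), & binds like `and`, so the comparisons need no parentheses.
both = df.query("a > b & b > 2")

print(by_query)
print(both)
```

In plain Python the second expression would need explicit parentheses around each comparison, because `&` binds more tightly than `>`.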
-
radd
(other, axis='columns', level=None, fill_value=None)¶ Binary operator radd with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rank
(axis=0, numeric_only=None, method='average', na_option='keep', ascending=True)¶ Compute numerical data ranks (1 through n) along axis. Equal values are assigned a rank that is the average of the ranks of those values
- axis : {0, 1}, default 0
- Ranks over columns (0) or rows (1)
- numeric_only : boolean, default None
- Include only float, int, boolean data
- method : {‘average’, ‘min’, ‘max’, ‘first’}
- average: average rank of group
- min: lowest rank in group
- max: highest rank in group
- first: ranks assigned in order they appear in the array
- na_option : {‘keep’, ‘top’, ‘bottom’}
- keep: leave NA values where they are
- top: smallest rank if ascending
- bottom: smallest rank if descending
- ascending : boolean, default True
- False for ranks by high (1) to low (N)
ranks : DataFrame
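How ties are ranked under the different method options can be sketched with a hypothetical single-column frame, assuming a recent pandas release.

```python
import pandas as pd

df = pd.DataFrame({"v": [10, 20, 20, 30]})

avg = df.rank()                  # ties share the average of their ranks
low = df.rank(method="min")      # ties share the lowest rank in the group
desc = df.rank(ascending=False)  # rank 1 goes to the largest value

print(avg["v"].tolist())
print(low["v"].tolist())
print(desc["v"].tolist())
```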
-
rdiv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
reindex
(index=None, columns=None, **kwargs)¶ Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False
- index, columns : array-like, optional (can be specified in order, or as keywords)
- New labels / index to conform to. Preferably an Index object to avoid duplicating data
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- Method to use for filling holes in reindexed DataFrame
- pad / ffill: propagate last valid observation forward to next valid
- backfill / bfill: use NEXT valid observation to fill gap
- copy : boolean, default True
- Return a new object, even if the passed indexes are the same
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
- fill_value : scalar, default np.NaN
- Value to use for missing values. Defaults to NaN, but can be any “compatible” value
- limit : int, default None
- Maximum size gap to forward or backward fill
- takeable : boolean, default False
- treat the passed as positional values
>>> df.reindex(index=[date1, date2, date3], columns=['A', 'B', 'C'])
reindexed : DataFrame
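The filling options can be compared side by side in a small sketch. The frame `df` and its integer index are hypothetical, and a recent pandas release is assumed.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0]}, index=[0, 2])

gaps = df.reindex(index=[0, 1, 2])                    # the new label 1 gets NaN
padded = df.reindex(index=[0, 1, 2], method="ffill")  # carry the last valid row forward
zeros = df.reindex(index=[0, 1, 2], fill_value=0.0)   # use a fixed value instead of NaN

print(gaps)
print(padded)
print(zeros)
```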
-
reindex_axis
(labels, axis=0, method=None, level=None, copy=True, limit=None, fill_value=nan)¶ Conform input object to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False
- index : array-like, optional
- New labels / index to conform to. Preferably an Index object to avoid duplicating data
- axis : {0, 1, ‘index’, ‘columns’}
- method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- Method to use for filling holes in reindexed object
- pad / ffill: propagate last valid observation forward to next valid
- backfill / bfill: use NEXT valid observation to fill gap
- copy : boolean, default True
- Return a new object, even if the passed indexes are the same
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
- limit : int, default None
- Maximum size gap to forward or backward fill
>>> df.reindex_axis(['A', 'B', 'C'], axis=1)
reindex, reindex_like
reindexed : DataFrame
-
reindex_like
(other, method=None, copy=True, limit=None)¶ Return an object whose indices match those of other
- other : Object
- method : string or None
- copy : boolean, default True
- limit : int, default None
- Maximum size gap to forward or backward fill
Like calling s.reindex(index=other.index, columns=other.columns, method=...)
reindexed : same as input
-
rename
(index=None, columns=None, **kwargs)¶ Alter axes using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.
- index, columns : dict-like or function, optional
- Transformation to apply to that axis values
- copy : boolean, default True
- Also copy underlying data
- inplace : boolean, default False
- Whether to return a new DataFrame. If True then value of copy is ignored.
renamed : DataFrame (new object)
-
rename_axis
(mapper, axis=0, copy=True, inplace=False)¶ Alter index and / or columns using input function or functions. Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is.
- mapper : dict-like or function, optional
- axis : int or string, default 0
- copy : boolean, default True
- Also copy underlying data
- inplace : boolean, default False
renamed : type of caller
-
reorder_levels
(order, axis=0)¶ Rearrange index levels using input order. May not drop or duplicate levels
- order : list of int or list of str
- List representing new level order. Reference level by number (position) or by key (label).
- axis : int
- Where to reorder levels.
type of caller (new object)
-
replace
(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None)¶ Replace values given in ‘to_replace’ with ‘value’.
to_replace : str, regex, list, dict, Series, numeric, or None
str or regex:
- str: string exactly matching to_replace will be replaced with value
- regex: regexes matching to_replace will be replaced with value
list of str, regex, or numeric:
- First, if to_replace and value are both lists, they must be the same length.
- Second, if regex=True then all of the strings in both lists will be interpreted as regexes; otherwise they will match directly. This doesn’t matter much for value since there are only a few possible substitution regexes you can use.
- str and regex rules apply as above.
dict:
- Nested dictionaries, e.g., {‘a’: {‘b’: nan}}, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with nan. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.
- Keys map to column names and values map to substitution values. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
None:
- This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also None then this must be a nested dictionary or Series.
See the examples section for examples of each of these.
- value : scalar, dict, list, str, regex, default None
- Value to use to fill holes (e.g. 0), alternately a dict of values specifying which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.
- inplace : boolean, default False
- If True, in place. Note: this will modify any other views on this object (e.g. a column from a DataFrame). Returns the caller if this is True.
- limit : int, default None
- Maximum size gap to forward or backward fill
- regex : bool or same types as to_replace, default False
- Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Otherwise, to_replace must be None because this parameter will be interpreted as a regular expression or a list, dict, or array of regular expressions.
- method : string, optional, {‘pad’, ‘ffill’, ‘bfill’}
- The method to use for replacement, when to_replace is a list.
NDFrame.reindex NDFrame.asfreq NDFrame.fillna
filled : NDFrame
- AssertionError
- If regex is not a bool and to_replace is not None.
- TypeError
- If to_replace is a dict and value is not a list, dict, ndarray, or Series
- If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.
- ValueError
- If to_replace and value are lists or ndarrays, but they are not the same length.
- Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.
- Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.
- This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
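Two of the to_replace forms described above, the nested dictionary and the regular expression, can be sketched as follows. The frame `df` and the replacement strings are hypothetical, and a recent pandas release is assumed.

```python
import pandas as pd

df = pd.DataFrame({"a": ["b", "bat", "c"], "n": [1, 2, 3]})

# Nested dict: in column "a", replace the exact value "b" with "z".
by_dict = df.replace({"a": {"b": "z"}})

# Regex: patterns only substitute on strings, so the numeric column "n" is untouched.
by_regex = df.replace(to_replace=r"^ba.$", value="new", regex=True)

print(by_dict)
print(by_regex)
```

Without regex=True the string "b" is matched exactly, which is why "bat" is left alone in the first call.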
-
resample
(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0)¶ Convenience method for frequency conversion and resampling of regular time-series data.
- rule : string
- the offset string or object representing target conversion
- how : string
- method for down- or re-sampling, default to ‘mean’ for downsampling
- axis : int, optional, default 0
- fill_method : string, default None
- fill_method for upsampling
- closed : {‘right’, ‘left’}
- Which side of bin interval is closed
- label : {‘right’, ‘left’}
- Which bin edge label to label bucket with
- convention : {‘start’, ‘end’, ‘s’, ‘e’}
- kind : “period”/“timestamp”
- loffset : timedelta
- Adjust the resampled time labels
- limit : int, default None
- Maximum size gap when reindexing with fill_method
- base : int, default 0
- For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0
-
reset_index
(level=None, drop=False, inplace=False, col_level=0, col_fill='')¶ For DataFrame with multi-level index, return new DataFrame with labeling information in the columns under the index names, defaulting to ‘level_0’, ‘level_1’, etc. if any are None. For a standard index, the index name will be used (if set), otherwise a default ‘index’ or ‘level_0’ (if ‘index’ is already taken) will be used.
- level : int, str, tuple, or list, default None
- Only remove the given levels from the index. Removes all levels by default
- drop : boolean, default False
- Do not try to insert index into dataframe columns. This resets the index to the default integer index.
- inplace : boolean, default False
- Modify the DataFrame in place (do not create a new object)
- col_level : int or str, default 0
- If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.
- col_fill : object, default ‘’
- If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.
resetted : DataFrame
-
rfloordiv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rfloordiv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rmod
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rmod with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rmul
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rmul with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rpow
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rpow with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rsub
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rsub with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
rtruediv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator rtruediv with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
save
(path)¶ Deprecated. Use to_pickle instead
-
select
(crit, axis=0)¶ Return data corresponding to axis labels matching criteria
- crit : function
- To be called on each index (label). Should return True or False
axis : int
selection : type of caller
-
set_index
(keys, drop=True, append=False, inplace=False, verify_integrity=False)¶ Set the DataFrame index (row labels) using one or more existing columns. By default yields a new object.
- keys : column label or list of column labels / arrays
- drop : boolean, default True
- Delete columns to be used as the new index
- append : boolean, default False
- Whether to append columns to existing index
- inplace : boolean, default False
- Modify the DataFrame in place (do not create a new object)
- verify_integrity : boolean, default False
- Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method
>>> indexed_df = df.set_index(['A', 'B'])
>>> indexed_df2 = df.set_index(['A', [0, 1, 2, 0, 1, 2]])
>>> indexed_df3 = df.set_index([[0, 1, 2, 0, 1, 2]])
dataframe : DataFrame
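A short sketch of setting a two-level index and undoing it again with reset_index. The frame `df` is hypothetical, and a recent pandas release is assumed.

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 2, 1], "C": [0.1, 0.2, 0.3]})

indexed = df.set_index(["A", "B"])  # columns A and B become a two-level row index
restored = indexed.reset_index()    # move the index levels back into columns

print(indexed)
print(restored)
```

With the default drop=True, the columns used as keys no longer appear among the data columns of `indexed`.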
-
set_value
(index, col, value)¶ Put single value at passed column and index
- index : row label
- col : column label
- value : scalar value
- frame : DataFrame
- If label pair is contained, will be reference to calling DataFrame, otherwise a new object
-
shape
¶
-
shift
(periods=1, freq=None, axis=0, **kwds)¶ Shift index by desired number of periods with an optional time freq
- periods : int
- Number of periods to move, can be positive or negative
- freq : DateOffset, timedelta, or time rule string, optional
- Increment to use from datetools module or time rule (e.g. ‘EOM’)
If freq is specified then the index values are shifted but the data is not realigned
shifted : same type as caller
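The difference between shifting the data and shifting the index can be sketched with a hypothetical daily time series, assuming a recent pandas release.

```python
import pandas as pd

idx = pd.date_range("2020-01-01", periods=3, freq="D")
ts = pd.DataFrame({"v": [1.0, 2.0, 3.0]}, index=idx)

moved_data = ts.shift(1)             # values move down one row; the index is unchanged
moved_index = ts.shift(1, freq="D")  # index labels move one day later; values stay in place

print(moved_data)
print(moved_index)
```

Without freq the first row becomes NaN; with freq no values are lost because only the labels change.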
-
skew
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return unbiased skew over requested axis. Normalized by N-1
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
skew : Series or DataFrame (if level specified)
-
sort
(columns=None, column=None, axis=0, ascending=True, inplace=False)¶ Sort DataFrame either by labels (along either axis) or by the values in column(s)
- columns : object
- Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
- ascending : boolean or list, default True
- Sort ascending vs. descending. Specify list for multiple sort orders
- axis : {0, 1}
- Sort index/rows versus columns
- inplace : boolean, default False
- Sort the DataFrame without creating a new instance
>>> result = df.sort(['A', 'B'], ascending=[1, 0])
sorted : DataFrame
-
sort_index
(axis=0, by=None, ascending=True, inplace=False, kind='quicksort')¶ Sort DataFrame either by labels (along either axis) or by the values in a column
- axis : {0, 1}
- Sort index/rows versus columns
- by : object
- Column name(s) in frame. Accepts a column name or a list or tuple for a nested sort.
- ascending : boolean or list, default True
- Sort ascending vs. descending. Specify list for multiple sort orders
- inplace : boolean, default False
- Sort the DataFrame without creating a new instance
>>> result = df.sort_index(by=['A', 'B'], ascending=[True, False])
sorted : DataFrame
-
sortlevel
(level=0, axis=0, ascending=True, inplace=False)¶ Sort multilevel index by chosen axis and primary level. Data will be lexicographically sorted by the chosen level followed by the other levels (in order)
- level : int
- axis : {0, 1}
- ascending : boolean, default True
- inplace : boolean, default False
- Sort the DataFrame without creating a new instance
sorted : DataFrame
-
squeeze
()¶ squeeze length 1 dimensions
-
stack
(level=-1, dropna=True)¶ Pivot a level of the (possibly hierarchical) column labels, returning a DataFrame (or Series in the case of an object with a single level of column labels) having a hierarchical index with a new inner-most level of row labels.
- level : int, string, or list of these, default last level
- Level(s) to stack, can pass level name
- dropna : boolean, default True
- Whether to drop rows in the resulting Frame/Series with no valid values
>>> s
     a   b
one  1.  2.
two  3.  4.
>>> s.stack()
one a    1
    b    2
two a    3
    b    4
stacked : DataFrame or Series
-
std
(axis=None, skipna=None, level=None, ddof=1, **kwargs)¶ Return unbiased standard deviation over requested axis. Normalized by N-1.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
stdev : Series or DataFrame (if level specified)
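"Normalized by N-1" refers to the divisor used for the squared deviations. The following plain-Python sketch shows the effect of the `ddof` argument from the signature; `std_with_ddof` is a hypothetical helper written for illustration, not the library's implementation.

```python
import math
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

def std_with_ddof(data, ddof=1):
    """Standard deviation with divisor N - ddof (hypothetical helper)."""
    n = len(data)
    mean = sum(data) / n
    squared = sum((x - mean) ** 2 for x in data)
    return math.sqrt(squared / (n - ddof))

sample_std = std_with_ddof(values, ddof=1)      # divisor N - 1 (the default)
population_std = std_with_ddof(values, ddof=0)  # divisor N
reference = statistics.stdev(values)            # stdlib sample standard deviation
```

With the default `ddof=1` the helper agrees with `statistics.stdev`; with `ddof=0` the result is smaller because the divisor is larger.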
-
sub
(other, axis='columns', level=None, fill_value=None)¶ Binary operator sub with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
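A minimal sketch of the `fill_value` behaviour, assuming plain pandas DataFrames (the column name `x` and the values are made up). It covers a value missing on one side, a value missing on both sides, and a row that exists in only one of the inputs:

```python
import numpy as np
import pandas as pd

# Hypothetical inputs with different lengths, so the indices get unioned.
a = pd.DataFrame({"x": [1.0, np.nan, 3.0, np.nan]})
b = pd.DataFrame({"x": [0.5, 2.0, np.nan, np.nan, 4.0]})

# Without fill_value every position with a missing operand stays missing.
plain = a.sub(b)

# With fill_value=0 a single missing operand is treated as 0;
# positions missing in both inputs remain missing.
filled = a.sub(b, fill_value=0)
```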
-
subtract
(other, axis='columns', level=None, fill_value=None)¶ Binary operator sub with support to substitute a fill_value for missing data in one of the inputs
- other : Series, DataFrame, or constant
- axis : {0, 1, ‘index’, ‘columns’}
- For Series input, axis to match Series index on
- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
sum
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the sum of the values for the requested axis
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
sum : Series or DataFrame (if level specified)
-
swapaxes
(axis1, axis2, copy=True)¶ Interchange axes and swap values axes appropriately
y : same as input
-
swaplevel
(i, j, axis=0)¶ Swap levels i and j in a MultiIndex on a particular axis
- i, j : int, string (can be mixed)
- Level of index to be swapped. Can pass level name as string.
swapped : type of caller (new object)
-
tail
(n=5)¶ Returns last n rows
-
take
(indices, axis=0, convert=True, is_copy=True)¶ Analogous to ndarray.take
- indices : list / array of ints
- axis : int, default 0
- convert : translate neg to pos indices (default)
- is_copy : mark the returned frame as a copy
taken : type of caller
-
to_clipboard
(excel=None, sep=None, **kwargs)¶ Attempt to write text representation of object to the system clipboard. This can be pasted into Excel, for example.
- excel : boolean, defaults to True
- If True, use the provided separator, writing in a csv format for allowing easy pasting into Excel. If False, write a string representation of the object to the clipboard
- sep : optional, defaults to tab
- Other keywords are passed to to_csv
- Requirements for your platform
- Linux: xclip, or xsel (with gtk or PyQt4 modules)
- Windows: none
- OS X: none
-
to_csv
(path_or_buf, sep=', ', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, mode='w', nanRep=None, encoding=None, quoting=None, line_terminator='\n', chunksize=None, tupleize_cols=False, date_format=None, **kwds)¶ Write DataFrame to a comma-separated values (csv) file
- path_or_buf : string or file handle / StringIO
- File path
- sep : character, default ”,”
- Field delimiter for the output file.
- na_rep : string, default ‘’
- Missing data representation
- float_format : string, default None
- Format string for floating point numbers
- cols : sequence, optional
- Columns to write
- header : boolean or list of string, default True
- Write out column names. If a list of string is given it is assumed to be aliases for the column names
- index : boolean, default True
- Write row names (index)
- index_label : string or sequence, or False, default None
- Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R
- nanRep : None
- deprecated, use na_rep
- mode : str
- Python write mode, default ‘w’
- encoding : string, optional
- a string representing the encoding to use if the contents are non-ascii, for python versions prior to 3
- line_terminator : string, default ‘\n’
- The newline character or character sequence to use in the output file
- quoting : optional constant from csv module
- defaults to csv.QUOTE_MINIMAL
- chunksize : int or None
- rows to write at a time
- tupleize_cols : boolean, default False
- Write MultiIndex columns as a list of tuples (if True) or in the new, expanded format (if False)
- date_format : string, default None
- Format string for datetime objects.
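A few of the options above can be combined as in the following sketch. It assumes a plain pandas DataFrame under Python 3 and writes to an in-memory buffer instead of a file path; the column names and values are illustrative only.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({"name": ["a", "b"], "score": [1.5, np.nan]})

# path_or_buf accepts a file handle or StringIO as well as a path.
buf = io.StringIO()
df.to_csv(buf, sep="\t", na_rep="NA", index=False, float_format="%.2f")

# One header line followed by one line per row.
lines = buf.getvalue().splitlines()
```

Here `sep` switches the delimiter to a tab, `na_rep` controls how the missing score is written, and `index=False` suppresses the row labels.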
-
to_dense
()¶ Return dense representation of NDFrame (as opposed to sparse)
-
to_dict
(outtype='dict')¶ Convert DataFrame to dictionary.
- outtype : str {‘dict’, ‘list’, ‘series’, ‘records’}
- Determines the type of the values of the dictionary. The default dict is a nested dictionary {column -> {index -> value}}. list returns {column -> list(values)}. series returns {column -> Series(values)}. records returns [{columns -> value}]. Abbreviations are allowed.
result : dict like {column -> {index -> value}}
-
to_excel
(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True)¶ Write DataFrame to an Excel sheet
- excel_writer : string or ExcelWriter object
- File path or existing ExcelWriter
- sheet_name : string, default ‘Sheet1’
- Name of sheet which will contain DataFrame
- na_rep : string, default ‘’
- Missing data representation
- float_format : string, default None
- Format string for floating point numbers
- cols : sequence, optional
- Columns to write
- header : boolean or list of string, default True
- Write out column names. If a list of string is given it is assumed to be aliases for the column names
- index : boolean, default True
- Write row names (index)
- index_label : string or sequence, default None
- Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
- startrow :
- upper left cell row to dump data frame
- startcol :
- upper left cell column to dump data frame
- engine : string, default None
- write engine to use - you can also set this via the options io.excel.xlsx.writer, io.excel.xls.writer, and io.excel.xlsm.writer.
- merge_cells : boolean, default True
- Write MultiIndex and Hierarchical Rows as merged cells.
If passing an existing ExcelWriter object, then the sheet will be added to the existing workbook. This can be used to save different DataFrames to one workbook:
>>> writer = ExcelWriter('output.xlsx')
>>> df1.to_excel(writer,'Sheet1')
>>> df2.to_excel(writer,'Sheet2')
>>> writer.save()
-
to_gbq
(destination_table, schema=None, col_order=None, if_exists='fail', **kwargs)¶ Write a DataFrame to a Google BigQuery table.
If the table exists, the DataFrame will be appended. If not, a new table will be created, in which case the schema will have to be specified. By default, rows will be written in the order they appear in the DataFrame, though the user may specify an alternative order.
- destination_table : string
- name of table to be written, in the form ‘dataset.tablename’
- schema : sequence (optional)
- list of column types in order for data to be inserted, e.g. [‘INTEGER’, ‘TIMESTAMP’, ‘BOOLEAN’]
- col_order : sequence (optional)
- order which columns are to be inserted, e.g. [‘primary_key’, ‘birthday’, ‘username’]
- if_exists : {‘fail’, ‘replace’, ‘append’} (optional)
- fail: If table exists, do nothing.
- replace: If table exists, drop it, recreate it, and insert data.
- append: If table exists, insert data. Create if does not exist.
kwargs are passed to the Client constructor
- SchemaMissing :
- Raised if the ‘if_exists’ parameter is set to ‘replace’, but no schema is specified
- TableExists :
- Raised if the specified ‘destination_table’ exists but the ‘if_exists’ parameter is set to ‘fail’ (the default)
- InvalidSchema :
- Raised if the ‘schema’ parameter does not match the provided DataFrame
-
to_hdf
(path_or_buf, key, **kwargs)¶ activate the HDFStore
- path_or_buf : the path (string) or buffer to put the store
- key : string
- identifier for the group in the store
- mode : optional, {‘a’, ‘w’, ‘r’, ‘r+’}, default ‘a’
- 'r' : Read-only; no data can be modified.
- 'w' : Write; a new file is created (an existing file with the same name would be deleted).
- 'a' : Append; an existing file is opened for reading and writing, and if the file does not exist it is created.
- 'r+' : It is similar to 'a', but the file must already exist.
- format : ‘fixed(f)|table(t)’, default is ‘fixed’
- fixed(f) : Fixed format
- Fast writing/reading. Not-appendable, nor searchable
- table(t) : Table format
- Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data
- append : boolean, default False
- For Table formats, append the input data to the existing
- complevel : int, 1-9, default 0
- If a complib is specified compression will be applied where possible
- complib : {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’, None}, default None
- If complevel is > 0 apply compression to objects written in the store wherever possible
- fletcher32 : bool, default False
- If applying compression use the fletcher32 checksum
-
to_html
(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, force_unicode=None, bold_rows=True, classes=None, escape=True, max_rows=None, max_cols=None, show_dimensions=False)¶ Render a DataFrame as an HTML table.
to_html-specific options:
- bold_rows : boolean, default True
- Make the row labels bold in the output
- classes : str or list or tuple, default None
- CSS class(es) to apply to the resulting html table
- escape : boolean, default True
- Convert the characters <, >, and & to HTML-safe sequences.
- max_rows : int, optional
- Maximum number of rows to show before truncating. If None, show all.
- max_cols : int, optional
- Maximum number of columns to show before truncating. If None, show all.
- frame : DataFrame
- object to render
- buf : StringIO-like, optional
- buffer to write to
- columns : sequence, optional
- the subset of columns to write; default None writes all columns
- col_space : int, optional
- the minimum width of each column
- header : bool, optional
- whether to print column labels, default True
- index : bool, optional
- whether to print index (row) labels, default True
- na_rep : string, optional
- string representation of NAN to use, default ‘NaN’
- formatters : list or dict of one-parameter functions, optional
- formatter functions to apply to columns’ elements by position or name, default None, if the result is a string , it must be a unicode string. List must be of length equal to the number of columns.
- float_format : one-parameter function, optional
- formatter function to apply to columns’ elements if they are floats default None
- sparsify : bool, optional
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
- justify : {‘left’, ‘right’}, default None
- Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
- index_names : bool, optional
- Prints the names of the indexes, default True
- force_unicode : bool, default False
- Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.
formatted : string (or unicode, depending on data and options)
-
to_json
(path_or_buf=None, orient=None, date_format='epoch', double_precision=10, force_ascii=True, date_unit='ms', default_handler=None)¶ Convert the object to a JSON string.
Note NaN’s and None will be converted to null and datetime objects will be converted to UNIX timestamps.
- path_or_buf : the path or buffer to write the result string
- if this is None, return a StringIO of the converted string
orient : string
- Series
- default is ‘index’
- allowed values are: {‘split’,’records’,’index’}
- DataFrame
- default is ‘columns’
- allowed values are: {‘split’,’records’,’index’,’columns’,’values’}
- The format of the JSON string
- split : dict like {index -> [index], columns -> [columns], data -> [values]}
- records : list like [{column -> value}, ... , {column -> value}]
- index : dict like {index -> {column -> value}}
- columns : dict like {column -> {index -> value}}
- values : just the values array
- date_format : {‘epoch’, ‘iso’}
- Type of date conversion. epoch = epoch milliseconds, iso = ISO8601, default is epoch.
- double_precision : the number of decimal places to use when encoding floating point values, default 10.
- force_ascii : force encoded string to be ASCII, default True.
- date_unit : string, default ‘ms’ (milliseconds)
- The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.
- default_handler : callable, default None
- Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serialisable object.
same type as input object with filtered info axis
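The `orient` layouts listed above can be pictured with plain Python containers. This is only a sketch of the shapes, built by hand for a made-up two-row table; it is not the library's serializer and does not cover date handling or precision.

```python
import json

# Hypothetical two-row table; the labels are illustrative only.
index = ["r1", "r2"]
columns = ["a", "b"]
values = [[1, 2], [3, 4]]

layouts = {
    # split : {index -> [index], columns -> [columns], data -> [values]}
    "split": {"index": index, "columns": columns, "data": values},
    # records : [{column -> value}, ...]
    "records": [dict(zip(columns, row)) for row in values],
    # index : {index -> {column -> value}}
    "index": {i: dict(zip(columns, row)) for i, row in zip(index, values)},
    # columns : {column -> {index -> value}}
    "columns": {c: {i: row[k] for i, row in zip(index, values)}
                for k, c in enumerate(columns)},
    # values : just the values array
    "values": values,
}

# Each layout serialized to a JSON string.
encoded = {name: json.dumps(obj) for name, obj in layouts.items()}
```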
-
to_latex
(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=True, force_unicode=None)¶ Render a DataFrame to a tabular environment table. You can splice this into a LaTeX document.
to_latex-specific options:
- bold_rows : boolean, default True
- Make the row labels bold in the output
- frame : DataFrame
- object to render
- buf : StringIO-like, optional
- buffer to write to
- columns : sequence, optional
- the subset of columns to write; default None writes all columns
- col_space : int, optional
- the minimum width of each column
- header : bool, optional
- whether to print column labels, default True
- index : bool, optional
- whether to print index (row) labels, default True
- na_rep : string, optional
- string representation of NAN to use, default ‘NaN’
- formatters : list or dict of one-parameter functions, optional
- formatter functions to apply to columns’ elements by position or name, default None, if the result is a string , it must be a unicode string. List must be of length equal to the number of columns.
- float_format : one-parameter function, optional
- formatter function to apply to columns’ elements if they are floats default None
- sparsify : bool, optional
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
- justify : {‘left’, ‘right’}, default None
- Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
- index_names : bool, optional
- Prints the names of the indexes, default True
- force_unicode : bool, default False
- Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.
formatted : string (or unicode, depending on data and options)
-
to_msgpack
(path_or_buf=None, **kwargs)¶ msgpack (serialize) object to input file path
THIS IS AN EXPERIMENTAL LIBRARY and the storage format may not be stable until a future release.
- path : string File path, buffer-like, or None
- if None, return generated string
- append : boolean whether to append to an existing msgpack
- (default is False)
- compress : type of compressor (zlib or blosc), default to None (no
- compression)
-
to_panel
()¶ Transform long (stacked) format (DataFrame) into wide (3D, Panel) format.
Currently the index of the DataFrame must be a 2-level MultiIndex. This may be generalized later
panel : Panel
-
to_period
(freq=None, axis=0, copy=True)¶ Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed)
- freq : string, default
- axis : {0, 1}, default 0
- The axis to convert (the index by default)
- copy : boolean, default True
- If False then underlying input data is not copied
ts : TimeSeries with PeriodIndex
-
to_pickle
(path)¶ Pickle (serialize) object to input file path
- path : string
- File path
-
to_records
(index=True, convert_datetime64=True)¶ Convert DataFrame to record array. Index will be put in the ‘index’ field of the record array if requested
- index : boolean, default True
- Include index in resulting record array, stored in ‘index’ field
- convert_datetime64 : boolean, default True
- Whether to convert the index to datetime.datetime if it is a DatetimeIndex
y : recarray
-
to_sparse
(fill_value=None, kind='block')¶ Convert to SparseDataFrame
- fill_value : float, default NaN
- kind : {‘block’, ‘integer’}
y : SparseDataFrame
-
to_sql
(name, con, flavor='sqlite', if_exists='fail', **kwargs)¶ Write records stored in a DataFrame to a SQL database.
- name : str
- Name of SQL table
- con : an open SQL database connection object
- flavor : {‘sqlite’, ‘mysql’, ‘oracle’}, default ‘sqlite’
- if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’
- fail: If table exists, do nothing.
- replace: If table exists, drop it, recreate it, and insert data.
- append: If table exists, insert data. Create if does not exist.
-
to_stata
(fname, convert_dates=None, write_index=True, encoding='latin-1', byteorder=None)¶ A class for writing Stata binary dta files from array-like objects
- fname : file path or buffer
- Where to save the dta file.
- convert_dates : dict
- Dictionary mapping column of datetime types to the stata internal format that you want to use for the dates. Options are ‘tc’, ‘td’, ‘tm’, ‘tw’, ‘th’, ‘tq’, ‘ty’. Column can be either a number or a name.
- encoding : str
- Default is latin-1. Note that Stata does not support unicode.
- byteorder : str
- Can be “>”, “<”, “little”, or “big”. The default is None which uses sys.byteorder
>>> writer = StataWriter('./data_file.dta', data)
>>> writer.write_file()
Or with dates
>>> writer = StataWriter('./date_data_file.dta', data, {2 : 'tw'})
>>> writer.write_file()
-
to_string
(buf=None, columns=None, col_space=None, colSpace=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, nanRep=None, index_names=True, justify=None, force_unicode=None, line_width=None, max_rows=None, max_cols=None, show_dimensions=False)¶ Render a DataFrame to a console-friendly tabular output.
- frame : DataFrame
- object to render
- buf : StringIO-like, optional
- buffer to write to
- columns : sequence, optional
- the subset of columns to write; default None writes all columns
- col_space : int, optional
- the minimum width of each column
- header : bool, optional
- whether to print column labels, default True
- index : bool, optional
- whether to print index (row) labels, default True
- na_rep : string, optional
- string representation of NAN to use, default ‘NaN’
- formatters : list or dict of one-parameter functions, optional
- formatter functions to apply to columns’ elements by position or name, default None, if the result is a string , it must be a unicode string. List must be of length equal to the number of columns.
- float_format : one-parameter function, optional
- formatter function to apply to columns’ elements if they are floats default None
- sparsify : bool, optional
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row, default True
- justify : {‘left’, ‘right’}, default None
- Left or right-justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box.
- index_names : bool, optional
- Prints the names of the indexes, default True
- force_unicode : bool, default False
- Always return a unicode result. Deprecated in v0.10.0 as string formatting is now rendered to unicode by default.
formatted : string (or unicode, depending on data and options)
-
to_timestamp
(freq=None, how='start', axis=0, copy=True)¶ Cast to DatetimeIndex of timestamps, at beginning of period
- freq : string, default frequency of PeriodIndex
- Desired frequency
- how : {‘s’, ‘e’, ‘start’, ‘end’}
- Convention for converting period to timestamp; start of period vs. end
- axis : {0, 1} default 0
- The axis to convert (the index by default)
- copy : boolean, default True
- If false then underlying input data is not copied
df : DataFrame with DatetimeIndex
-
to_wide
(*args, **kwargs)¶
-
transpose
()¶ Transpose index and columns
-
truediv
(other, axis='columns', level=None, fill_value=None)¶ Binary operator truediv with support to substitute a fill_value for missing data in one of the inputs
other : Series, DataFrame, or constant axis : {0, 1, ‘index’, ‘columns’}
For Series input, axis to match Series index on- fill_value : None or float value, default None
- Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
- level : int or name
- Broadcast across a level, matching Index values on the passed MultiIndex level
Mismatched indices will be unioned together
result : DataFrame
-
truncate
(before=None, after=None, axis=None, copy=True)¶ Truncates a sorted NDFrame before and/or after some particular dates.
- before : date
- Truncate before date
- after : date
- Truncate after date
- axis : the truncation axis, defaults to the stat axis
- copy : boolean, default is True
- return a copy of the truncated section
truncated : type of caller
-
tshift
(periods=1, freq=None, axis=0, **kwds)¶ Shift the time index, using the index’s frequency if available
- periods : int
- Number of periods to move, can be positive or negative
- freq : DateOffset, timedelta, or time rule string, default None
- Increment to use from datetools module or time rule (e.g. ‘EOM’)
- axis : int or basestring
- Corresponds to the axis that contains the Index
If freq is not specified then tries to use the freq or inferred_freq attributes of the index. If neither of those attributes exist, a ValueError is thrown
shifted : NDFrame
-
tz_convert
(tz, axis=0, copy=True)¶ Convert TimeSeries to target time zone. If it is time zone naive, it will be localized to the passed time zone.
- tz : string or pytz.timezone object
- copy : boolean, default True
- Also make a copy of the underlying data
-
tz_localize
(tz, axis=0, copy=True, infer_dst=False)¶ Localize tz-naive TimeSeries to target time zone
- tz : string or pytz.timezone object
- copy : boolean, default True
- Also make a copy of the underlying data
- infer_dst : boolean, default False
- Attempt to infer fall dst-transition times based on order
-
unstack
(level=-1)¶ Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels. If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex)
- level : int, string, or list of these, default -1 (last level)
- Level(s) of index to unstack, can pass level name
DataFrame.pivot : Pivot a table based on column values.
DataFrame.stack : Pivot a level of the column labels (inverse operation from unstack).
>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
...                                    ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one  a   1
     b   2
two  a   3
     b   4
dtype: float64
>>> s.unstack(level=-1)
     a   b
one  1  2
two  3  4
>>> s.unstack(level=0)
   one  two
a  1   3
b  2   4
>>> df = s.unstack(level=0)
>>> df.unstack()
one  a  1.
     b  2.
two  a  3.
     b  4.
unstacked : DataFrame or Series
-
update
(other, join='left', overwrite=True, filter_func=None, raise_conflict=False)¶ Modify DataFrame in place using non-NA values from passed DataFrame. Aligns on indices
- other : DataFrame, or object coercible into a DataFrame
- join : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
- overwrite : boolean, default True
- If True then overwrite values for common keys in the calling frame
- filter_func : callable(1d-array) -> 1d-array<boolean>, default None
- Can choose to replace values other than NA. Return True for values that should be updated
- raise_conflict : boolean
- If True, will raise an error if the DataFrame and other both contain data in the same place.
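The in-place behaviour and the effect of `overwrite` can be sketched as follows, assuming plain pandas DataFrames with made-up columns and values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})
other = pd.DataFrame({"B": [np.nan, 99.0, np.nan]})

# update() modifies df in place; only the non-NA values of other are copied.
df.update(other)

# overwrite=False only fills positions that are NA in the calling frame.
df2 = pd.DataFrame({"A": [np.nan, 2.0]})
df2.update(pd.DataFrame({"A": [7.0, 8.0]}), overwrite=False)
```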
-
values
¶ Numpy representation of NDFrame
-
var
(axis=None, skipna=None, level=None, ddof=1, **kwargs)¶ Return unbiased variance over requested axis. Normalized by N-1.
- axis : {index (0), columns (1)}
- skipna : boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA
- level : int, default None
- If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
- numeric_only : boolean, default None
- Include only float, int, boolean data. If None, will attempt to use everything, then use only numeric data
variance : Series or DataFrame (if level specified)
-
where
(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)¶ Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
- cond : boolean NDFrame or array
- other : scalar or NDFrame
- inplace : boolean, default False
- Whether to perform the operation in place on the data
- axis : alignment axis if needed, default None
- level : alignment level if needed, default None
- try_cast : boolean, default False
- try to cast the result back to the input type (if possible)
- raise_on_error : boolean, default True
- Whether to raise on invalid data types (e.g. trying to where on strings)
wh : same type as caller
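A minimal sketch of the selection rule, assuming a plain pandas DataFrame with illustrative values: entries are kept where `cond` is True and replaced by `other` elsewhere.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, -2.0, 3.0], "b": [-4.0, 5.0, -6.0]})

# Keep positive entries, replace everything else with 0.0.
non_negative = df.where(df > 0, 0.0)

# With the default other (NaN) the rejected entries become missing.
masked = df.where(df > 0)
```

Because `inplace` defaults to False, `df` itself is left unchanged by both calls.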
-
xs
(key, axis=0, level=None, copy=True, drop_level=True)¶ Returns a cross-section (row(s) or column(s)) from the Series/DataFrame. Defaults to cross-section on the rows (axis=0).
- key : object
- Some label contained in the index, or partially in a MultiIndex
- axis : int, default 0
- Axis to retrieve cross-section on
- level : object, defaults to first n levels (n=1 or len(key))
- In case of a key partially contained in a MultiIndex, indicate which levels are used. Levels can be referred by label or position.
- copy : boolean, default True
- Whether to make a copy of the data
- drop_level : boolean, default True
- If False, returns object with same levels as self.
>>> df
   A  B  C
a  4  5  2
b  4  0  9
c  9  7  3
>>> df.xs('a')
A    4
B    5
C    2
Name: a
>>> df.xs('C', axis=1)
a    2
b    9
c    3
Name: C
>>> s = df.xs('a', copy=False)
>>> s['A'] = 100
>>> df
     A  B  C
a  100  5  2
b    4  0  9
c    9  7  3
>>> df
                    A  B  C  D
first second third
bar   one    1      4  1  8  9
      two    1      7  5  5  0
baz   one    1      6  6  8  0
      three  2      5  3  5  3
>>> df.xs(('baz', 'three'))
       A  B  C  D
third
2      5  3  5  3
>>> df.xs('one', level=1)
             A  B  C  D
first third
bar   1      4  1  8  9
baz   1      6  6  8  0
>>> df.xs(('baz', 2), level=[0, 'third'])
        A  B  C  D
second
three   5  3  5  3
xs : Series or DataFrame
-
Core.Transcript¶
-
class
Fred2.Core.Transcript.
Transcript
(seq, gene_id='unknown', transcript_id=None, vars=None)¶ Bases: Fred2.Core.Base.MetadataLogger, Bio.Seq.Seq
A Transcript is the mRNA sequence containing zero or more Fred2.Core.Variant.Variant objects.
Note
For accessing and manipulating the sequence see also Bio.Seq.Seq (from Biopython)
Parameters: - gene_id (str) – Genome ID
- transcript_id (str) – Transcript RefSeqID
- vars (dict(int, Fred2.Core.Variant.Variant)) – Dict of Fred2.Core.Variant.Variant for specific positions in the Transcript. key=position, value=Variant
-
back_transcribe
()¶ Returns the DNA sequence from an RNA sequence. New Seq object.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG",
...                     IUPAC.unambiguous_rna)
>>> messenger_rna
Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', IUPACUnambiguousRNA())
>>> messenger_rna.back_transcribe()
Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', IUPACUnambiguousDNA())
Trying to back-transcribe a protein or DNA sequence raises an exception:
>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.back_transcribe()
Traceback (most recent call last):
   ...
ValueError: Proteins cannot be back transcribed!
-
complement
()¶ Returns the complement sequence. New Seq object.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_dna = Seq("CCCCCGATAG", IUPAC.unambiguous_dna)
>>> my_dna
Seq('CCCCCGATAG', IUPACUnambiguousDNA())
>>> my_dna.complement()
Seq('GGGGGCTATC', IUPACUnambiguousDNA())
You can of course use mixed case sequences,
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("CCCCCgatA-GD", generic_dna)
>>> my_dna
Seq('CCCCCgatA-GD', DNAAlphabet())
>>> my_dna.complement()
Seq('GGGGGctaT-CH', DNAAlphabet())
Note in the above example, ambiguous character D denotes G, A or T so its complement is H (for C, T or A).
Trying to complement a protein sequence raises an exception.
>>> my_protein = Seq("MAIVMGR", IUPAC.protein) >>> my_protein.complement() Traceback (most recent call last): ... ValueError: Proteins do not have complements!
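The ambiguity-code handling described above can be sketched with a plain translation table. This is an illustrative sketch only: the helper names are hypothetical and belong neither to Fred2 nor to Biopython, and it works on ordinary Python strings rather than Seq objects.

```python
# Hypothetical helpers for illustration; not part of Fred2 or Biopython.
# Each IUPAC DNA code maps to the code for the complementary base set,
# e.g. D (A, G or T) -> H (T, C or A). Characters not in the table
# (such as the gap "-") are left unchanged.
_DNA_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def complement_dna(seq):
    """Return the complement of a DNA string, preserving case."""
    return seq.translate(_DNA_COMPLEMENT)


def reverse_complement_dna(seq):
    """Return the reverse complement of a DNA string."""
    return complement_dna(seq)[::-1]
```

Unlike the Seq methods, this sketch does not check the alphabet, so it will not raise an exception for protein input.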
-
count
(sub, start=0, end=9223372036854775807)¶ Non-overlapping count method, like that of a python string.
This behaves like the python string method of the same name, which does a non-overlapping count!
Returns an integer, the number of occurrences of substring argument sub in the (sub)sequence given by [start:end]. Optional arguments start and end are interpreted as in slice notation.
- Arguments:
- sub - a string or another Seq object to look for
- start - optional integer, slice start
- end - optional integer, slice end
e.g.
>>> from Bio.Seq import Seq
>>> my_seq = Seq("AAAATGA")
>>> print(my_seq.count("A"))
5
>>> print(my_seq.count("ATG"))
1
>>> print(my_seq.count(Seq("AT")))
1
>>> print(my_seq.count("AT", 2, -1))
1
HOWEVER, please note because python strings and Seq objects (and MutableSeq objects) do a non-overlapping search, this may not give the answer you expect:
>>> "AAAA".count("AA")
2
>>> print(Seq("AAAA").count("AA"))
2
An overlapping search would give the answer as three!
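If an overlapping count is what you need, one possible approach is to restart the search one position after each hit. The function below is a hypothetical helper for plain strings (for a Seq, pass str(my_seq)); it is not part of Fred2 or Biopython.

```python
def count_overlapping(seq, sub):
    """Count occurrences of sub in seq, allowing overlapping matches."""
    if not sub:
        raise ValueError("sub must not be empty")
    count = 0
    start = seq.find(sub)
    while start != -1:
        count += 1
        # Resume one character after the last hit so overlaps are found.
        start = seq.find(sub, start + 1)
    return count
```

For the example above, count_overlapping("AAAA", "AA") finds the matches starting at positions 0, 1 and 2.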
-
endswith
(suffix, start=0, end=9223372036854775807)¶ Does the Seq end with the given suffix? Returns True/False.
This behaves like the python string method of the same name.
Return True if the sequence ends with the specified suffix (a string or another Seq object), False otherwise. With optional start, test sequence beginning at that position. With optional end, stop comparing sequence at that position. suffix can also be a tuple of strings to try. e.g.
>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.endswith("UUG")
True
>>> my_rna.endswith("AUG")
False
>>> my_rna.endswith("AUG", 0, 18)
True
>>> my_rna.endswith(("UCC", "UCA", "UUG"))
True
-
find
(sub, start=0, end=9223372036854775807)¶ Find method, like that of a python string.
This behaves like the python string method of the same name.
Returns an integer, the index of the first occurrence of substring argument sub in the (sub)sequence given by [start:end].
- Arguments:
- sub - a string or another Seq object to look for
- start - optional integer, slice start
- end - optional integer, slice end
Returns -1 if the subsequence is NOT found.
e.g. Locating the first typical start codon, AUG, in an RNA sequence:
>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.find("AUG")
3
-
get_metadata
(label, only_first=False)¶ Getter for the saved metadata with the key
label
Parameters: - label (str) – key of the metadata to retrieve
- only_first (bool) – true if only the first element of the metadata list is to be returned
-
log_metadata
(label, value)¶ Inserts a new metadata entry
Parameters: - label (str) – key for the metadata that will be added
- value (list(object)) – any kind of additional value that should be kept
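The interplay of log_metadata and get_metadata can be pictured with a small stand-in class. This is a sketch under assumptions, not the Fred2 implementation: it assumes that values logged under the same label accumulate in a list and that an unknown label yields an empty list (or None with only_first=True).

```python
class MetadataLoggerSketch:
    """Hypothetical stand-in for Fred2.Core.Base.MetadataLogger."""

    def __init__(self):
        # label -> list of logged values (assumed storage layout)
        self._metadata = {}

    def log_metadata(self, label, value):
        """Append value to the list stored under label."""
        self._metadata.setdefault(label, []).append(value)

    def get_metadata(self, label, only_first=False):
        """Return all values for label, or only the first one."""
        values = self._metadata.get(label, [])
        if only_first:
            return values[0] if values else None
        return values
```

With this layout, logging two values under the same label and calling get_metadata(label, only_first=True) returns the value logged first.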
-
lower
()¶ Returns a lower case copy of the sequence.
This will adjust the alphabet if required. Note that the IUPAC alphabets are upper case only, and thus a generic alphabet must be substituted.
>>> from Bio.Alphabet import Gapped, generic_dna
>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> my_seq = Seq("CGGTACGCTTATGTCACGTAG*AAAAAA", Gapped(IUPAC.unambiguous_dna, "*"))
>>> my_seq
Seq('CGGTACGCTTATGTCACGTAG*AAAAAA', Gapped(IUPACUnambiguousDNA(), '*'))
>>> my_seq.lower()
Seq('cggtacgcttatgtcacgtag*aaaaaa', Gapped(DNAAlphabet(), '*'))
See also the upper method.
-
lstrip
(chars=None)¶ Returns a new Seq object with leading (left) end stripped.
This behaves like the python string method of the same name.
Optional argument chars defines which characters to remove. If omitted or None (default) then as for the python string method, this defaults to removing any white space.
e.g. print(my_seq.lstrip("-"))
See also the strip and rstrip methods.
-
newid
= <method-wrapper 'next' of itertools.count object>¶
-
reverse_complement
()¶ Returns the reverse complement sequence. New Seq object.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_dna = Seq("CCCCCGATAGNR", IUPAC.ambiguous_dna)
>>> my_dna
Seq('CCCCCGATAGNR', IUPACAmbiguousDNA())
>>> my_dna.reverse_complement()
Seq('YNCTATCGGGGG', IUPACAmbiguousDNA())
Note in the above example, since R = G or A, its complement is Y (which denotes C or T).
You can of course use mixed case sequences:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("CCCCCgatA-G", generic_dna)
>>> my_dna
Seq('CCCCCgatA-G', DNAAlphabet())
>>> my_dna.reverse_complement()
Seq('C-TatcGGGGG', DNAAlphabet())
Trying to complement a protein sequence raises an exception:
>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.reverse_complement()
Traceback (most recent call last):
    ...
ValueError: Proteins do not have complements!
-
rfind
(sub, start=0, end=9223372036854775807)¶ Find from right method, like that of a python string.
This behaves like the python string method of the same name.
Returns an integer, the index of the last (right most) occurrence of substring argument sub in the (sub)sequence given by [start:end].
- Arguments:
- sub - a string or another Seq object to look for
- start - optional integer, slice start
- end - optional integer, slice end
Returns -1 if the subsequence is NOT found.
e.g. Locating the last typical start codon, AUG, in an RNA sequence:
>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.rfind("AUG")
15
-
rsplit
(sep=None, maxsplit=-1)¶ Right split method, like that of a python string.
This behaves like the python string method of the same name.
Return a list of the ‘words’ in the string (as Seq objects), using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done COUNTING FROM THE RIGHT. If maxsplit is omitted, all splits are made.
Following the python string method, sep will by default be any white space (tabs, spaces, newlines) but this is unlikely to apply to biological sequences.
e.g. print(my_seq.rsplit("*",1))
See also the split method.
-
rstrip
(chars=None)¶ Returns a new Seq object with trailing (right) end stripped.
This behaves like the python string method of the same name.
Optional argument chars defines which characters to remove. If omitted or None (default) then as for the python string method, this defaults to removing any white space.
e.g. Removing a nucleotide sequence’s polyadenylation (poly-A tail):
>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> my_seq = Seq("CGGTACGCTTATGTCACGTAGAAAAAA", IUPAC.unambiguous_dna)
>>> my_seq
Seq('CGGTACGCTTATGTCACGTAGAAAAAA', IUPACUnambiguousDNA())
>>> my_seq.rstrip("A")
Seq('CGGTACGCTTATGTCACGTAG', IUPACUnambiguousDNA())
See also the strip and lstrip methods.
-
split
(sep=None, maxsplit=-1)¶ Split method, like that of a python string.
This behaves like the python string method of the same name.
Return a list of the ‘words’ in the string (as Seq objects), using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done. If maxsplit is omitted, all splits are made.
Following the python string method, sep will by default be any white space (tabs, spaces, newlines) but this is unlikely to apply to biological sequences.
e.g.
>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_aa = my_rna.translate()
>>> my_aa
Seq('VMAIVMGR*KGAR*L', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> my_aa.split("*")
[Seq('VMAIVMGR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KGAR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*'))]
>>> my_aa.split("*", 1)
[Seq('VMAIVMGR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('KGAR*L', HasStopCodon(ExtendedIUPACProtein(), '*'))]
See also the rsplit method:
>>> my_aa.rsplit("*", 1)
[Seq('VMAIVMGR*KGAR', HasStopCodon(ExtendedIUPACProtein(), '*')), Seq('L', HasStopCodon(ExtendedIUPACProtein(), '*'))]
-
startswith
(prefix, start=0, end=9223372036854775807)¶ Does the Seq start with the given prefix? Returns True/False.
This behaves like the python string method of the same name.
Return True if the sequence starts with the specified prefix (a string or another Seq object), False otherwise. With optional start, test sequence beginning at that position. With optional end, stop comparing sequence at that position. prefix can also be a tuple of strings to try. e.g.
>>> from Bio.Seq import Seq
>>> my_rna = Seq("GUCAUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUG")
>>> my_rna.startswith("GUC")
True
>>> my_rna.startswith("AUG")
False
>>> my_rna.startswith("AUG", 3)
True
>>> my_rna.startswith(("UCC", "UCA", "UCG"), 1)
True
-
strip
(chars=None)¶ Returns a new Seq object with leading and trailing ends stripped.
This behaves like the python string method of the same name.
Optional argument chars defines which characters to remove. If omitted or None (default) then as for the python string method, this defaults to removing any white space.
e.g. print(my_seq.strip("-"))
See also the lstrip and rstrip methods.
-
tomutable
()¶ Returns the full sequence as a MutableSeq object.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq("MKQHKAMIVALIVICITAVVAAL",
...              IUPAC.protein)
>>> my_seq
Seq('MKQHKAMIVALIVICITAVVAAL', IUPACProtein())
>>> my_seq.tomutable()
MutableSeq('MKQHKAMIVALIVICITAVVAAL', IUPACProtein())
Note that the alphabet is preserved.
-
tostring
()¶ Returns the full sequence as a python string (DEPRECATED).
You are now encouraged to use str(my_seq) instead of my_seq.tostring().
-
transcribe
()¶ Returns the RNA sequence from a DNA sequence. New Seq object.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
...                  IUPAC.unambiguous_dna)
>>> coding_dna
Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', IUPACUnambiguousDNA())
>>> coding_dna.transcribe()
Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', IUPACUnambiguousRNA())
Trying to transcribe a protein or RNA sequence raises an exception:
>>> my_protein = Seq("MAIVMGR", IUPAC.protein)
>>> my_protein.transcribe()
Traceback (most recent call last):
    ...
ValueError: Proteins cannot be transcribed!
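At the string level, transcription of a coding strand and its inverse amount to swapping T and U. The helpers below are a hypothetical sketch on plain strings; unlike the Seq methods they perform no alphabet check and therefore never raise for protein input.

```python
def transcribe(dna):
    """Replace T with U (coding strand DNA to mRNA), preserving case."""
    return dna.replace("T", "U").replace("t", "u")


def back_transcribe(rna):
    """Replace U with T (mRNA back to coding strand DNA), preserving case."""
    return rna.replace("U", "T").replace("u", "t")
```

Applying back_transcribe to the result of transcribe returns the original DNA string.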
-
translate
(table='Standard', stop_symbol='*', to_stop=False, cds=False)¶ Turns a nucleotide sequence into a protein sequence. New Seq object.
This method will translate DNA or RNA sequences, and those with a nucleotide or generic alphabet. Trying to translate a protein sequence raises an exception.
- Arguments:
- table - Which codon table to use? This can be either a name (string), an NCBI identifier (integer), or a CodonTable object (useful for non-standard genetic codes). This defaults to the “Standard” table.
- stop_symbol - Single character string, what to use for terminators. This defaults to the asterisk, “*”.
- to_stop - Boolean, defaults to False meaning do a full translation continuing on past any stop codons (translated as the specified stop_symbol). If True, translation is terminated at the first in frame stop codon (and the stop_symbol is not appended to the returned protein sequence).
- cds - Boolean, indicates this is a complete CDS. If True, this checks the sequence starts with a valid alternative start codon (which will be translated as methionine, M), that the sequence length is a multiple of three, and that there is a single in frame stop codon at the end (this will be excluded from the protein sequence, regardless of the to_stop option). If these tests fail, an exception is raised.
e.g. Using the standard table:
>>> coding_dna = Seq("GTGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
>>> coding_dna.translate()
Seq('VAIVMGR*KGAR*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> coding_dna.translate(stop_symbol="@")
Seq('VAIVMGR@KGAR@', HasStopCodon(ExtendedIUPACProtein(), '@'))
>>> coding_dna.translate(to_stop=True)
Seq('VAIVMGR', ExtendedIUPACProtein())
Now using NCBI table 2, where TGA is not a stop codon:
>>> coding_dna.translate(table=2)
Seq('VAIVMGRWKGAR*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> coding_dna.translate(table=2, to_stop=True)
Seq('VAIVMGRWKGAR', ExtendedIUPACProtein())
In fact, GTG is an alternative start codon under NCBI table 2, meaning this sequence could be a complete CDS:
>>> coding_dna.translate(table=2, cds=True)
Seq('MAIVMGRWKGAR', ExtendedIUPACProtein())
It isn’t a valid CDS under NCBI table 1, due to both the start codon and also the in frame stop codons:
>>> coding_dna.translate(table=1, cds=True)
Traceback (most recent call last):
    ...
TranslationError: First codon 'GTG' is not a start codon
If the sequence has no in-frame stop codon, then the to_stop argument has no effect:
>>> coding_dna2 = Seq("TTGGCCATTGTAATGGGCCGC")
>>> coding_dna2.translate()
Seq('LAIVMGR', ExtendedIUPACProtein())
>>> coding_dna2.translate(to_stop=True)
Seq('LAIVMGR', ExtendedIUPACProtein())
NOTE - Ambiguous codons like “TAN” or “NNN” could be an amino acid or a stop codon. These are translated as “X”. Any invalid codon (e.g. “TA?” or “T-A”) will throw a TranslationError.
NOTE - Does NOT support gapped sequences.
NOTE - This does NOT behave like the python string’s translate method. For that use str(my_seq).translate(...) instead.
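To make the roles of stop_symbol and to_stop concrete, here is a minimal sketch of codon-by-codon translation with the standard genetic code. It is not the Biopython implementation: the function is hypothetical, works on plain strings, supports only the standard table, ignores a trailing incomplete codon, and raises KeyError for ambiguous or invalid codons instead of returning "X" or a TranslationError.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(seq, stop_symbol="*", to_stop=False):
    """Translate a DNA or RNA string with the standard table (sketch)."""
    seq = seq.upper().replace("U", "T")
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = STANDARD_TABLE[seq[i:i + 3]]
        if aa == "*":
            if to_stop:
                break  # stop before the first in-frame stop codon
            aa = stop_symbol
        protein.append(aa)
    return "".join(protein)
```

With to_stop=True the loop ends at the first in-frame stop codon and the stop symbol is not appended, which mirrors the behaviour described for the to_stop argument above.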
-
ungap
(gap=None)¶ Return a copy of the sequence without the gap character(s).
The gap character can be specified in two ways - either as an explicit argument, or via the sequence’s alphabet. For example:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("-ATA--TGAAAT-TTGAAAA", generic_dna)
>>> my_dna
Seq('-ATA--TGAAAT-TTGAAAA', DNAAlphabet())
>>> my_dna.ungap("-")
Seq('ATATGAAATTTGAAAA', DNAAlphabet())
If the gap character is not given as an argument, it will be taken from the sequence’s alphabet (if defined). Notice that the returned sequence’s alphabet is adjusted since it no longer requires a gapped alphabet:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC, Gapped, HasStopCodon
>>> my_pro = Seq("MVVLE=AD*", HasStopCodon(Gapped(IUPAC.protein, "=")))
>>> my_pro
Seq('MVVLE=AD*', HasStopCodon(Gapped(IUPACProtein(), '='), '*'))
>>> my_pro.ungap()
Seq('MVVLEAD*', HasStopCodon(IUPACProtein(), '*'))
Or, with a simpler gapped DNA example:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC, Gapped
>>> my_seq = Seq("CGGGTAG=AAAAAA", Gapped(IUPAC.unambiguous_dna, "="))
>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap()
Seq('CGGGTAGAAAAAA', IUPACUnambiguousDNA())
As long as it is consistent with the alphabet, although it is redundant, you can still supply the gap character as an argument to this method:
>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap("=")
Seq('CGGGTAGAAAAAA', IUPACUnambiguousDNA())
However, if the gap character given as the argument disagrees with that declared in the alphabet, an exception is raised:
>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap("-")
Traceback (most recent call last):
    ...
ValueError: Gap '-' does not match '=' from alphabet
Finally, if a gap character is not supplied, and the alphabet does not define one, an exception is raised:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("ATA--TGAAAT-TTGAAAA", generic_dna)
>>> my_dna
Seq('ATA--TGAAAT-TTGAAAA', DNAAlphabet())
>>> my_dna.ungap()
Traceback (most recent call last):
    ...
ValueError: Gap character not given and not defined in alphabet
-
upper
()¶ Returns an upper case copy of the sequence.
>>> from Bio.Alphabet import HasStopCodon, generic_protein
>>> from Bio.Seq import Seq
>>> my_seq = Seq("VHLTPeeK*", HasStopCodon(generic_protein))
>>> my_seq
Seq('VHLTPeeK*', HasStopCodon(ProteinAlphabet(), '*'))
>>> my_seq.lower()
Seq('vhltpeek*', HasStopCodon(ProteinAlphabet(), '*'))
>>> my_seq.upper()
Seq('VHLTPEEK*', HasStopCodon(ProteinAlphabet(), '*'))
This will adjust the alphabet if required. See also the lower method.
Core.Variant¶
-
class
Fred2.Core.Variant.
MutationSyntax
(transID, transPos, protPos, cds, aas)¶ This class represents the mutation syntax of a variant and stores its transcript and protein position
Parameters: - transID (str) – The
Transcript
id
- transPos (int) – The position of the
Variant
within the
Transcript
- protPos (int) – The
Protein
position of the
Variant
within the
Transcript
- cds (str) – The complete cds_mutation_syntax string
- aas (str) – The complete protein_mutation_syntax string
-
class
Fred2.Core.Variant.
Variant
(id, type, chrom, genomePos, ref, obs, coding, isHomozygous, isSynonymous, experimentalDesign=None, metadata=None)¶ Bases:
Fred2.Core.Base.MetadataLogger
A
Variant
contains information about a single genetic modification of the reference genome.
-
get_annotated_protein_pos
(transID)¶ Returns the annotated protein position
Parameters: transID (str) – The Transcript ID of interest
Returns: The annotated Protein position of the given Transcript ID
Return type: int
Raises KeyError: If Variant is not annotated to the given Transcript ID
-
get_annotated_transcript_pos
(transID)¶ Returns the annotated
Transcript
position
Parameters: transID (str) – The Transcript ID of interest
Returns: The annotated Transcript position of the given Transcript ID
Return type: int
Raises KeyError: If Variant is not annotated to the given Transcript ID
-
get_metadata
(label, only_first=False)¶ Getter for the saved metadata with the key
label
Parameters: - label (str) – key of the metadata to retrieve
- only_first (bool) – true if only the first element of the metadata list is to be returned
-
get_shift
()¶ Returns the frameshift offset caused by the mutation in {0,1,2}
Returns: The frameshift caused by mutation
Return type: int
-
get_transcript_offset
()¶ Returns the sequence offset caused by the mutation
Returns: The sequence offset
Return type: int
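The relation between the sequence offset and the frameshift can be sketched as follows. This rests on an assumption that the reference above does not confirm: the offset is taken to be the length difference between the observed and the reference allele, and the shift is that offset modulo 3. The function names are hypothetical.

```python
def transcript_offset(ref, obs):
    """Assumed definition: net change in sequence length caused by the variant."""
    return len(obs) - len(ref)


def frameshift(ref, obs):
    """Frameshift in {0, 1, 2}; Python's % always yields a non-negative result."""
    return transcript_offset(ref, obs) % 3
```

Under this assumption, a deletion of three bases gives an offset of -3 and a shift of 0 (the reading frame is preserved), whereas a single-base deletion gives an offset of -1 and a shift of 2.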
-
log_metadata
(label, value)¶ Inserts a new metadata entry
Parameters: - label (str) – key for the metadata that will be added
- value (list(object)) – any kind of additional value that should be kept
-
-
Fred2.Core.Variant.
VariationType
¶ alias of
Enum
Fred2.IO Module¶
IO.ADBAdapter¶
IO.EnsemblAdapter¶
IO.FileReader¶
-
Fred2.IO.FileReader.
read_annovar_exonic
(annovar_file, gene_filter=None, experimentalDesig=None)¶ Reads a gene-based ANNOVAR output file, generates
Variant
objects containing all annotated
Transcript
IDs, and returns them as a list of
Variant
objects.
Parameters: - annovar_file (str) – The path to the ANNOVAR file
- gene_filter (list(str)) – A list of gene names of interest (only variants associated with these genes are generated)
Returns: List of fully annotated Variant objects
Return type: list(
Variant
)
-
Fred2.IO.FileReader.
read_fasta
(files, type=<class 'Fred2.Core.Peptide.Peptide'>, id_position=1)¶ Generator function:
Reads one or more peptide, protein, or RNA sequences from FASTA files. The user needs to specify the correct type of the underlying sequences: Peptide, Protein, or Transcript (for RNA).
Parameters: - files (list(str) or str) – A file name or a list of file names to read in
- type (Peptide or Transcript or Protein) – The type to read in
- id_position (int) – The position of the ID within the FASTA header, counting fields separated by |
Returns: a list of the specified sequence type derived from the FASTA file sequences.
Return type: (list(type))
Raises ValueError: if a file is not readable
-
Fred2.IO.FileReader.
read_lines
(files, type=<class 'Fred2.Core.Peptide.Peptide'>)¶ Generator function:
Reads sequences directly from lines. The user needs to specify the correct type of the underlying data manually: Peptide, Protein, Transcript, or Allele.
Parameters: - files (list(str) or str) – A list of absolute file names to read
- type (Peptide or Protein or Transcript or Allele) – The type to read in; possible types are Peptide, Protein, Transcript, and Allele.
Returns: A list of the specified objects
Return type: (list(type))
Raises IOError: if a file is not readable
IO.MartsAdapter¶
-
class
Fred2.IO.MartsAdapter.
MartsAdapter
(usr=None, host=None, pwd=None, db=None, biomart=None)¶ Bases:
Fred2.IO.ADBAdapter.ADBAdapter
-
get_all_variant_gene
(locations, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')¶ Fetches the important db ids and names for given chromosomal location
Parameters: - chrom – integer value of the chromosome in question
- start – integer value of the variation start position on given chromosome
- stop – integer value of the variation stop position on given chromosome
Returns: The respective gene name, i.e. the first one reported
-
get_all_variant_ids
(**kwargs)¶ Fetches the important db ids and names for given gene _or_ chromosomal location. The former is recommended. The result is a list of dicts with one of the three combinations:
- ‘Ensembl Gene ID’, ‘Ensembl Transcript ID’, ‘Ensembl Protein ID’
- ‘RefSeq Protein ID [e.g. NP_001005353]’, ‘RefSeq mRNA [e.g. NM_001195597]’, first triplet
- ‘RefSeq Predicted Protein ID [e.g. XP_001720922]’, ‘RefSeq mRNA predicted [e.g. XM_001125684]’, first triplet
Parameters: - 'locations' – list of locations as triplets of integer values representing (chrom, start, stop)
- 'genes' – list of genes as string value of the genes of variation
Returns: The list of dicts of entries with transcript and protein ids (either NM+NP or XM+XP)
-
get_product_sequence
(product_refseq, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')¶ Fetches product sequence for the given id
Parameters: product_refseq – given refseq id
Returns: list of dictionaries of the requested sequence, the respective strand and the associated gene name
-
get_protein_sequence_from_protein_id
(**kwargs)¶ Returns the protein sequence for a given protein ID, which can be a RefSeq, UniProt, or Ensembl ID
Parameters: kwargs – Returns:
-
get_transcript_information
(transcript_refseq, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')¶ It also already uses the Field-Enum for DBAdapters
Fetches transcript sequence for the given id
Parameters: transcript_refseq –
Returns: list of dictionaries of the requested sequence, the respective strand and the associated gene name
-
get_transcript_information_from_protein_id
(**kwargs)¶ It also already uses the Field-Enum for DBAdapters
Fetches transcript sequence for the given id
Parameters: transcript_refseq –
Returns: list of dictionaries of the requested sequence, the respective strand and the associated gene name
-
get_transcript_position
(start, stop, gene_id, transcript_id, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')¶ If no transcript position is available for the variant
Parameters: - start –
- stop –
- gene_id –
- transcript_id –
- _db –
- _dataset –
Returns:
-
get_transcript_sequence
(transcript_refseq, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')¶ Fetches transcript sequence for the given id
Parameters: transcript_refseq –
Returns: list of dictionaries of the requested sequence, the respective strand and the associated gene name
-
get_variant_gene
(chrom, start, stop, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')¶ Fetches the important db ids and names for given chromosomal location
Parameters: - chrom – integer value of the chromosome in question
- start – integer value of the variation start position on given chromosome
- stop – integer value of the variation stop position on given chromosome
Returns: The respective gene name, i.e. the first one reported
-
get_variant_id_from_gene_id
(**kwargs)¶ returns all information needed to instantiate a variation
Parameters: trans_id – A transcript ID (either Ensembl (ENS) or RefSeq (NM, XM))
Returns: list of dicts – containing all information needed for a variant initialization
-
get_variant_id_from_protein_id
(**kwargs)¶ returns all information needed to instantiate a variation
Parameters: trans_id – A transcript ID (either Ensembl (ENS) or RefSeq (NM, XM))
Returns: list of dicts – containing all information needed for a variant initialization
-
get_variant_ids
(**kwargs)¶ Fetches the important db ids and names for given gene _or_ chromosomal location. The former is recommended. The result is a list of dicts with one of the three combinations:
- ‘Ensembl Gene ID’, ‘Ensembl Transcript ID’, ‘Ensembl Protein ID’
- ‘RefSeq Protein ID [e.g. NP_001005353]’, ‘RefSeq mRNA [e.g. NM_001195597]’, first triplet
- ‘RefSeq Predicted Protein ID [e.g. XP_001720922]’, ‘RefSeq mRNA predicted [e.g. XM_001125684]’, first triplet
Parameters: - 'chrom' – integer value of the chromosome in question
- 'start' – integer value of the variation start position on given chromosome
- 'stop' – integer value of the variation stop position on given chromosome
- 'gene' – string value of the gene of variation
- 'transcript_id' – string value of the gene of variation
Returns: The list of dicts of entries with transcript and protein ids (either NM+NP or XM+XP)
-
IO.RefSeqAdapter¶
-
class
Fred2.IO.RefSeqAdapter.
RefSeqAdapter
(prot_file=None, prot_vers=None, mrna_file=None, mrna_vers=None)¶ Bases:
Fred2.IO.ADBAdapter.ADBAdapter
-
get_product_sequence
(product_refseq)¶ Fetches product sequence for the given id
Parameters: product_refseq – given refseq id
Returns: list of dictionaries of the requested sequence, the respective strand and the associated gene name
-
get_transcript_information
(transcript_refseq)¶
-
get_transcript_sequence
(transcript_refseq)¶ Fetches transcript sequence for the given id
Parameters: transcript_refseq –
Returns: list of dictionaries of the requested sequence, the respective strand and the associated gene name
-
load
(filename)¶
-
IO.UniProtAdapter¶
-
class
Fred2.IO.UniProtAdapter.
UniProtDB
(name='fdb')¶ -
exists
(seq)¶ Fast check whether the given sequence exists (as a subsequence) in one of the sequences in the UniProtDB object's collection.
Parameters: seq – the subsequence to be searched for
Returns: True, if it is found somewhere, False otherwise
-
read_seqs
(sequence_file)¶ Reads sequences from UniProt files (.dat or .fasta) or from lists or dicts of BioPython SeqRecords and makes them available for fast search. This function can also be used to append sequences.
Parameters: sequence_file – uniprot files (.dat or .fasta)
Returns:
-
search
(seq)¶ Searches for the first occurrence of the given sequence(s) in the UniProtDB object's collection, returning for each the front part of the FASTA header of the first occurrence.
Parameters: seq – a string interpreted as a single sequence or a list (of str) interpreted as a collection of sequences
Returns: a dictionary of sequences to lists (of ids, ‘null’ if n/a)
-
search_all
(seq)¶ Searches for all occurrences of the given sequence(s) in the UniProtDB object's collection, returning for each the front part of the FASTA header of all occurrences.
Parameters: seq – a string interpreted as a single sequence or a list (of str) interpreted as a collection of sequences
Returns: a dictionary of the given sequences to lists (of ids, ‘null’ if n/a)
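The subsequence lookup described for exists and search_all can be pictured with a naive linear scan. This is a sketch under assumptions and says nothing about how UniProtDB achieves its fast search: the collection is assumed to be a plain dict from FASTA header to sequence, the function names are hypothetical, and a query without hits maps to an empty list here rather than to 'null'.

```python
def exists(collection, seq):
    """True if seq occurs as a substring of any stored sequence."""
    return any(seq in sequence for sequence in collection.values())


def search_all(collection, seqs):
    """Map each query to the headers of all sequences that contain it."""
    if isinstance(seqs, str):
        seqs = [seqs]  # a single string is treated as one query
    return {
        query: [header for header, sequence in collection.items()
                if query in sequence]
        for query in seqs
    }


# Made-up toy collection: header -> sequence.
toy_db = {"prot_a": "MAIVMGRKGAR", "prot_b": "MKQHKAMIVAL"}
hits = search_all(toy_db, ["MIV", "MGR", "WWW"])
```

A linear scan like this is only practical for small collections; it is meant to show the shape of the input and of the returned dictionary.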
-
write_seqs
(name)¶ writes all fasta entries in the current object into one fasta file
Parameters: name – the complete path with file name where the fasta is going to be written
-
Prediction¶
Fred2.CleavagePrediction Module¶
CleavagePrediction.External¶
-
class
Fred2.CleavagePrediction.External.
AExternalCleavageSitePrediction
¶ Bases:
Fred2.Core.Base.ACleavageSitePrediction
,Fred2.Core.Base.AExternal
Abstract base class for external cleavage site prediction methods. Implements predict functionality.
-
cleavagePos
¶ Parameter specifying the position of aa (within the prediction window) after which the sequence is cleaved (starting from 1)
-
command
¶ Defines the commandline call for external tool
-
get_external_version
(path=None)¶ Returns the external version of the tool by executing >{command} --version
This may depend on the method and has to be overridden; it is therefore declared abstract to force the user to override it. The function in the base class can be called with super()
Parameters: path (str) – Optional specification of the executable path if it deviates from self.__command
Returns: The external version of the tool or None if tool does not support versioning
Return type: str
-
is_in_path
()¶ Checks whether the specified execution command can be found in PATH
Returns: Whether or not command could be found in PATH
Return type: bool
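A PATH check of this kind can be written with the standard library. The function below is a hypothetical stand-alone sketch using shutil.which; it is not claimed to be how Fred2 implements is_in_path.

```python
import shutil


def is_in_path(command):
    """Return True if an executable named command is found on PATH."""
    # shutil.which returns the full path of the executable, or None.
    return shutil.which(command) is not None
```

For an external predictor, command would be the tool's executable name as defined by the command property.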
-
name
¶ The name of the predictor
-
parse_external_result
(file)¶ Parses external results and returns the result
Parameters: file (str) – The file path of the external prediction results
Returns: A dictionary containing the prediction results
Return type: dict
-
predict
(aa_seq, command=None, options=None, **kwargs)¶ Overwrites ACleavageSitePrediction.predict
Parameters:
Returns: A
CleavageSitePredictionResult
object
Return type: CleavageSitePredictionResult
-
prepare_input
(input, file)¶ Prepares the data in input and writes them to file in the special format used by the external tool
Parameters: - input (list(str)) – The input data (here peptide sequences)
- file (File) – A file handler with which the data are written to file
-
supportedLength
¶ The supported lengths of the predictor
-
version
¶ Parameter specifying the version of the prediction method
-
-
class
Fred2.CleavagePrediction.External.
NetChop_3_1
¶ Bases:
Fred2.CleavagePrediction.External.AExternalCleavageSitePrediction
,Fred2.Core.Base.AExternal
Implements NetChop Cleavage Site Prediction (v. 3.1).
Note
Nielsen, M., Lundegaard, C., Lund, O., & Kesmir, C. (2005). The role of the proteasome in generating cytotoxic T-cell epitopes: insights obtained from improved predictions of proteasomal cleavage. Immunogenetics, 57(1-2), 33-41.
-
cleavagePos
¶ Parameter specifying the position of aa (within the prediction window) after which the sequence is cleaved
-
command
¶ Defines the commandline call for external tool
-
get_external_version
(path=None)¶ Returns the external version of the tool by executing >{command} --version
This may depend on the method and has to be overridden; it is therefore declared abstract to force the user to override it. The function in the base class can be called with super()
Parameters: path (str) – Optional specification of the executable path if it deviates from self.__command
Returns: The external version of the tool or None if tool does not support versioning
Return type: str
-
is_in_path
()¶ Checks whether the specified execution command can be found in PATH
Returns: Whether or not command could be found in PATH
Return type: bool
-
name
¶ The name of the predictor
-
parse_external_result
(file)¶ Parses external results and returns the result
Parameters: file (str) – The file path of the external prediction results
Returns: Returns a dictionary with the prediction results
Return type: dict(str,dict((str,int),float))
-
predict
(aa_seq, command=None, options=None, **kwargs)¶ Overwrites ACleavageSitePrediction.predict
Parameters:
Returns: A
CleavageSitePredictionResult
object
Return type: CleavageSitePredictionResult
-
prepare_input
(input, file)¶ Prepares the data and writes them to _file in the special format used by the external tool
Parameters: - input (list(str)) – The input data (here peptide sequences)
- file (File) – A file handler with which the data are written to file
-
supportedLength
¶ The supported lengths of the predictor
-
version
¶ The version of the Method
-
CleavagePrediction.PSSM¶
-
class
Fred2.CleavagePrediction.PSSM.
APSSMCleavageFragmentPredictor
¶ Bases:
Fred2.Core.Base.ACleavageFragmentPrediction
Abstract base class for PSSM predictions.
This implementation only supports cleavage fragment prediction not site prediction
Implements predict functionality
-
cleavagePos
¶ Parameter specifying the position of aa (within the prediction window) after which the sequence is cleaved
-
name
¶ The name of the predictor
-
predict
(peptides, **kwargs)¶ Takes peptides plus their trailing C- and N-terminal residues and predicts the probability that each n-mer was produced by proteasomal cleavage. Returns the score and the peptide sequence in an AResult object. Row IDs are the epitopes; the column holds the prediction score.
Parameters: peptides (list(Peptide) or Peptide) – A list of peptide objects or a single peptide object
Returns: A Fred2.Core.Result.CleavageFragmentPredictionResult object
Return type: Fred2.Core.Result.CleavageFragmentPredictionResult
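The general idea of PSSM scoring can be sketched as a sum of position-specific weights over a fragment. The toy matrix, the default weight for unknown residues, and the function name are assumptions for illustration and do not reproduce the matrices shipped with Fred2.

```python
def pssm_score(fragment, matrix, default=0.0):
    """Sum the position-specific weights of every residue in `fragment`.

    `matrix` is a list with one dict per position, mapping residue -> weight.
    """
    if len(fragment) != len(matrix):
        raise ValueError("fragment length must match the matrix length")
    # Residues missing from a position's dict contribute `default`.
    return sum(matrix[i].get(aa, default) for i, aa in enumerate(fragment))


# Toy three-position matrix (hypothetical weights).
toy_matrix = [{"A": 1.0, "C": -1.0}, {"A": 0.5}, {"G": 2.0}]
```

In this picture, the trailing N- and C-terminal residues are simply extra positions of the scored fragment, so the matrix has to cover the peptide plus its flanks.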
-
supportedLength
¶ The supported lengths of the predictor
-
trailingN
¶ The number of trailing residues at the N-terminal of the peptide used for prediction
-
tralingC
¶ The number of trailing residues at the C-terminal of the peptide used for prediction
-
version
¶ Parameter specifying the version of the prediction method
-
-
class
Fred2.CleavagePrediction.PSSM.
APSSMCleavageSitePredictor
¶ Bases:
Fred2.Core.Base.ACleavageSitePrediction
Abstract base class for PSSM predictions. This implementation only supports cleavage site prediction, not fragment prediction. Implements the predict functionality.
-
cleavagePos
¶ Parameter specifying the position of aa (within the prediction window) after which the sequence is cleaved (starting from 1)
-
name
¶ The name of the predictor
-
predict
(aa_seq, length=None, **kwargs)¶ Returns predictions for the given peptides.
Returns: A CleavageSitePredictionResult object
Return type: CleavageSitePredictionResult
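Site prediction with a PSSM can be pictured as sliding a fixed-length window along the sequence and assigning each window's score to the residue after which cleavage would occur. The sketch below assumes a 1-based cleavage position inside the window, as described for cleavagePos; the toy matrix and the function name are hypothetical.

```python
def predict_sites(sequence, matrix, cleavage_pos):
    """Score every window of len(matrix) residues.

    Returns a dict mapping the 0-based index of the residue after which
    the sequence would be cleaved to the score of the window covering it.
    """
    window = len(matrix)
    scores = {}
    for start in range(len(sequence) - window + 1):
        fragment = sequence[start:start + window]
        score = sum(matrix[i].get(aa, 0.0) for i, aa in enumerate(fragment))
        # cleavage_pos counts from 1 within the window.
        scores[start + cleavage_pos - 1] = score
    return scores


toy_matrix = [{"A": 1.0}, {"L": 2.0}, {"K": 0.5}]
site_scores = predict_sites("ALKAL", toy_matrix, cleavage_pos=2)
```

Positions near the sequence ends that no full window covers receive no score in this sketch.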
-
supportedLength
¶ The supported lengths of the predictor
-
version
¶ Parameter specifying the version of the prediction method
-
-
class
Fred2.CleavagePrediction.PSSM.
PCM
¶ Bases:
Fred2.CleavagePrediction.PSSM.APSSMCleavageSitePredictor
Implements the PCM cleavage prediction method.
Note
Doennes, P., and Kohlbacher, O. (2005). Integrated modeling of the major events in the MHC class I antigen processing pathway. Protein Science, 14(8), 2132-2140.
-
cleavagePos
¶ Parameter specifying the position of aa (within the prediction window) after which the sequence is cleaved
-
name
¶ The name of the predictor
-
predict
(peptides, length=None, **kwargs)¶ Returns predictions for the given peptides.
Returns: A CleavageSitePredictionResult object
Return type: CleavageSitePredictionResult
-
supportedLength
¶ A list of supported peptide lengths
-
version
¶ The version of the predictor
-
-
class
Fred2.CleavagePrediction.PSSM.
PSSMGinodi
¶ Bases:
Fred2.CleavagePrediction.PSSM.APSSMCleavageFragmentPredictor
Implements the Cleavage Fragment prediction method of Ginodi et al.
Note
Ginodi, I., Vider-Shalit, T., Tsaban, L., and Louzoun, Y. (2008). Precise score for the prediction of peptides cleaved by the proteasome. Bioinformatics, 24(4), 477-483.
-
cleavagePos
¶ Parameter specifying the position of aa (within the prediction window) after which the sequence is cleaved
-
name
¶ The name of the predictor
-
predict
(peptides, **kwargs)¶ Takes peptides plus their trailing C- and N-terminal residues and predicts the probability that each n-mer was produced by proteasomal cleavage. Returns the score and the peptide sequence in an AResult object. Row IDs are the epitopes; the column holds the prediction score.
Parameters: peptides (list(Peptide) or Peptide) – A list of peptide objects or a single peptide object
Returns: A Fred2.Core.Result.CleavageFragmentPredictionResult object
Return type: Fred2.Core.Result.CleavageFragmentPredictionResult
-
supportedLength
¶ A list of supported peptide lengths
-
trailingN
¶ The number of trailing residues at the N-terminal of the peptide used for prediction
-
tralingC
¶ The number of trailing residues at the C-terminal of the peptide used for prediction
-
version
¶ The version of the predictor
-
-
class
Fred2.CleavagePrediction.PSSM.
ProteaSMMConsecutive
¶ Bases:
Fred2.CleavagePrediction.PSSM.APSSMCleavageSitePredictor
Implements the ProteaSMM cleavage prediction method.
Note
Tenzer, S., et al. “Modeling the MHC class I pathway by combining predictions of proteasomal cleavage, TAP transport and MHC class I binding.” Cellular and Molecular Life Sciences CMLS 62.9 (2005): 1025-1037.
This model represents the constitutive proteasome.
The matrices were generated without the preon dataset, since a recent study has shown that including it worsened the results.
Note
Calis, Jorg JA, et al. “Role of peptide processing predictions in T cell epitope identification: contribution of different prediction programs.” Immunogenetics (2014): 1-9.
-
cleavagePos
¶ Parameter specifying the position of aa (within the prediction window) after which the sequence is cleaved
-
name
¶ The name of the predictor
-
predict
(peptides, length=None, **kwargs)¶ Returns predictions for the given peptides.
Returns: A CleavageSitePredictionResult object
Return type: CleavageSitePredictionResult
-
supportedLength
¶ A list of supported peptide lengths
-
version
¶ The version of the predictor
-
-
class
Fred2.CleavagePrediction.PSSM.
ProteaSMMImmuno
¶ Bases:
Fred2.CleavagePrediction.PSSM.APSSMCleavageSitePredictor
Implements the ProteaSMM cleavage prediction method.
Note
Tenzer, S., et al. “Modeling the MHC class I pathway by combining predictions of proteasomal cleavage, TAP transport and MHC class I binding.” Cellular and Molecular Life Sciences CMLS 62.9 (2005): 1025-1037.
This model represents the immunoproteasome.
The matrices were generated without the preon dataset, since a recent study has shown that including it worsened the results.
Note
Calis, Jorg JA, et al. “Role of peptide processing predictions in T cell epitope identification: contribution of different prediction programs.” Immunogenetics (2014): 1-9.
-
cleavagePos
¶ Parameter specifying the position of aa (within the prediction window) after which the sequence is cleaved
-
name
¶ The name of the predictor
-
predict
(peptides, length=None, **kwargs)¶ Returns predictions for the given peptides.
Returns: A CleavageSitePredictionResult object
Return type: CleavageSitePredictionResult
-
supportedLength
¶ A list of supported peptide lengths
-
version
¶ The version of the predictor
-
Fred2.TAPPrediction Module¶
TAPPrediction.PSSM¶
-
class
Fred2.TAPPrediction.PSSM.
APSSMTAPPrediction
¶ Bases:
Fred2.Core.Base.ATAPPrediction
Abstract base class for PSSM predictions. Implements predict functionality
-
name
¶ The name of the predictor
-
predict
(peptides, **kwargs)¶ Returns TAP predictions for the given Peptide objects.
Parameters: peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide
Returns: A TAPPredictionResult object with the prediction results
Return type: TAPPredictionResult
-
supportedLength
¶ The supported lengths of the predictor
-
version
¶ Parameter specifying the version of the prediction method
-
-
class
Fred2.TAPPrediction.PSSM.
SMMTAP
¶ Bases:
Fred2.TAPPrediction.PSSM.APSSMTAPPrediction
Implementation of SMMTAP.
Note
Peters, B., Bulik, S., Tampe, R., Van Endert, P. M., & Holzhuetter, H. G. (2003). Identifying MHC class I epitopes by predicting the TAP transport efficiency of epitope precursors. The Journal of Immunology, 171(4), 1741-1749.
-
name
¶ The name of the predictor
-
predict
(peptides, **kwargs)¶ Returns TAP predictions for the given Peptide objects.
Parameters: peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide
Returns: A TAPPredictionResult object with the prediction results
Return type: TAPPredictionResult
-
version
¶ The version of the predictor
-
-
class
Fred2.TAPPrediction.PSSM.
TAPDoytchinova
¶ Bases:
Fred2.TAPPrediction.PSSM.APSSMTAPPrediction
Implements the TAP prediction model from Doytchinova.
Note
Doytchinova, I., Hemsley, S. and Flower, D. R. Transporter associated with antigen processing preselection of peptides binding to the MHC: a bioinformatic evaluation. J. Immunol, 2004, 173, 6813-6819
-
name
¶ The name of the predictor
-
predict
(peptides, **kwargs)¶ Returns TAP predictions for the given Peptide objects.
Parameters: peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide
Returns: A TAPPredictionResult object with the prediction results
Return type: TAPPredictionResult
-
version
¶ The version of the predictor
-
TAPPrediction.SVM¶
-
class
Fred2.TAPPrediction.SVM.
ASVMTAPPrediction
¶ Bases:
Fred2.Core.Base.ATAPPrediction
,Fred2.Core.Base.ASVM
-
encode
(peptides)¶ Returns the feature encoding for peptides
Parameters: peptides (list(Peptide) or Peptide) – A list of Peptide objects or a single Peptide object
Returns: Feature encoding of the Peptide objects
Return type: list(Object)
-
name
¶ The name of the predictor
-
predict
(peptides, **kwargs)¶ Returns TAP predictions for the given Peptide objects.
Parameters: peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide
Returns: A TAPPredictionResult object with the prediction results
Return type: TAPPredictionResult
-
supportedLength
¶ The supported lengths of the predictor
-
version
¶ Parameter specifying the version of the prediction method
-
-
class
Fred2.TAPPrediction.SVM.
SVMTAP
¶ Bases:
Fred2.TAPPrediction.SVM.ASVMTAPPrediction
Implements the SVMTAP prediction of Doennes et al.
Note
Doennes, P. and Kohlbacher, O. Integrated modeling of the major events in the MHC class I antigen processing pathway. Protein Sci, 2005
-
encode
(peptides)¶ Encodes the Peptide with a binary sparse encoding
Parameters: peptides (list(str)) – A list of Peptide
Returns: Dictionary with Peptide as key and feature encoding as value (see the svmlight encoding scheme, http://svmlight.joachims.org/)
Return type: dict(Peptide, tuple(int, list(tuple(int, float))))
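A binary sparse encoding in the svmlight style can be sketched as follows: each (position, residue) pair maps to one feature index with value 1.0, and only the active features are listed. The alphabet order, the 1-based index formula, and the placeholder label 0 are assumptions for illustration; plain strings stand in for Peptide objects.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet order


def encode(peptides):
    """Map each peptide to (label, [(feature_index, value), ...])."""
    encoding = {}
    for peptide in peptides:
        features = [
            # One active feature per position; indices start at 1, as
            # svmlight requires. str.index raises ValueError for residues
            # outside the alphabet.
            (pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa) + 1, 1.0)
            for pos, aa in enumerate(peptide)
        ]
        encoding[peptide] = (0, features)  # 0 is a placeholder label
    return encoding


encoded = encode(["AC"])
```

Because positions are processed in order, the feature indices come out in increasing order, which the svmlight format expects.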
-
name
¶ The name of the predictor
-
predict
(peptides, **kwargs)¶ Returns predictions for the given Peptide objects.
Parameters: peptides (list(Peptide) or Peptide) – A single Peptide or a list of Peptide
Returns: A TAPPredictionResult object with the prediction results
Return type: TAPPredictionResult
-
supportedLength
¶ A list of supported peptide lengths
-
version
¶ The version of the predictor
-
Fred2.EpitopePrediction Module¶
EpitopePrediction.External¶
-
class
Fred2.EpitopePrediction.External.
AExternalEpitopePrediction
¶ Bases:
Fred2.Core.Base.AEpitopePrediction
,Fred2.Core.Base.AExternal
Abstract class representing an external prediction function. Implementations shall wrap external binaries by following the given abstraction.
-
command
¶ Defines the commandline call for external tool
-
convert_alleles
(alleles)¶ Converts alleles into the internal allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The alleles for which the internal predictor representation is needed
Returns: A string representation of the input alleles
Return type: list(str)
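The conversion step can be illustrated with plain strings. The target format used here (dropping the `*` from the allele name) is a hypothetical example of a tool-specific representation, and strings stand in for Allele objects; each wrapper defines its own mapping.

```python
def convert_alleles(allele_names):
    """Convert allele names to a hypothetical tool-specific spelling."""
    # Example mapping only: "HLA-A*02:01" -> "HLA-A02:01".
    return [name.replace("*", "") for name in allele_names]


converted = convert_alleles(["HLA-A*02:01", "HLA-B*07:02"])
```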
-
get_external_version
(path=None)¶ Returns the external version of the tool by executing >{command} --version
This may depend on the method and has to be overridden; it is therefore declared abstract to force the user to override it. The function in the base class can be called with super().
Parameters: path (str) – Optional specification of the executable path if it differs from self.__command
Returns: The external version of the tool or None if tool does not support versioning Return type: str
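A version query of this kind can be sketched with `subprocess`. This is an illustration under the assumption that the tool accepts `--version` and prints its version; the free function below is not the Fred2 method, and real tools may need a different flag or output parsing.

```python
import subprocess
import sys


def get_external_version(command):
    """Run `command --version` and return its output, or None on failure."""
    try:
        completed = subprocess.run(
            [command, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some tools print the version to stderr
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Command not found, not executable, or non-zero exit status.
        return None
    return completed.stdout.strip() or None


# The running Python interpreter is used as a stand-in for an external tool.
interpreter_version = get_external_version(sys.executable)
```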
-
is_in_path
()¶ Checks whether the specified execution command can be found in PATH
Returns: Whether or not command could be found in PATH Return type: bool
-
name
¶ The name of the predictor
-
parse_external_result
(file)¶ Parses external results and returns the result
Parameters: file (str) – The file path or the external prediction results Returns: A dictionary containing the prediction results Return type: dict
-
predict
(peptides, alleles=None, command=None, options=None, **kwargs)¶ Overrides AEpitopePrediction.predict
Parameters: - peptides (list(Peptide) or Peptide) – A list of Peptide objects or a single Peptide object
- alleles (list(Allele) or Allele) – A list of Allele objects or a single Allele object. If no Allele is provided, predictions are made for every Allele supported by the prediction method
- command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
- options (str) – A string of additional options passed directly to the external tool.
Returns: An EpitopePredictionResult object
Return type: EpitopePredictionResult
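The "single object or list" convention and the alleles=None default can be sketched as a small normalization helper. Plain strings stand in for Peptide and Allele objects, and the function name and the `supported_alleles` argument are hypothetical.

```python
def normalize_inputs(peptides, alleles=None, supported_alleles=()):
    """Return (list of peptides, list of alleles) from flexible inputs."""
    # Accept a single peptide as well as a list of peptides.
    if isinstance(peptides, str):
        peptides = [peptides]
    if alleles is None:
        # No alleles given: fall back to everything the method supports.
        alleles = list(supported_alleles)
    elif isinstance(alleles, str):
        alleles = [alleles]
    return list(peptides), list(alleles)


inputs = normalize_inputs("SYFPEITHI", None, ["HLA-A*02:01", "HLA-B*07:02"])
```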
-
prepare_input
(input, file)¶ Prepares the input for the external tool and writes it to file in the tool-specific format. No return value.
Parameters: - input (list(str)) – The Peptide sequences to write into file
- file (File) – File handler of the input file for the external tool
-
supportedAlleles
¶ A list of valid allele models
-
supportedLength
¶ A list of supported peptide lengths
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.External.
NetCTLpan_1_1
¶ Bases:
Fred2.EpitopePrediction.External.AExternalEpitopePrediction
Interface for NetCTLpan 1.1.
Note
NetCTLpan - Pan-specific MHC class I epitope predictions Stranzl T., Larsen M. V., Lundegaard C., Nielsen M. Immunogenetics. 2010 Apr 9. [Epub ahead of print]
-
command
¶ Defines the commandline call for external tool
-
convert_alleles
(alleles)¶ Converts alleles into the internal allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The Allele objects for which the internal predictor representation is needed
Returns: A string representation of the input alleles
Return type: list(str)
-
get_external_version
(path=None)¶ Returns the external version of the tool by executing >{command} --version
This may depend on the method and has to be overridden; it is therefore declared abstract to force the user to override it. The function in the base class can be called with super().
Parameters: path (str) – Optional specification of the executable path if it differs from self.__command
Returns: The external version of the tool or None if tool does not support versioning Return type: str
-
is_in_path
()¶ Checks whether the specified execution command can be found in PATH
Returns: Whether or not command could be found in PATH Return type: bool
-
name
¶ The name of the predictor
-
parse_external_result
(file)¶ Parses external results and returns the result
Parameters: file (str) – The file path or the external prediction results Returns: A dictionary containing the prediction results Return type: dict
-
predict
(peptides, alleles=None, command=None, options=None, **kwargs)¶ Overrides AEpitopePrediction.predict
Parameters: - peptides (list(Peptide) or Peptide) – A list of Peptide objects or a single Peptide object
- alleles (list(Allele) or Allele) – A list of Allele objects or a single Allele object. If no Allele is provided, predictions are made for every Allele supported by the prediction method
- command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
- options (str) – A string of additional options passed directly to the external tool.
Returns: An EpitopePredictionResult object
Return type: EpitopePredictionResult
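How `command` and `options` might be combined into a call can be sketched with `shlex`. The tool name `mytool`, the option flags, and the position of the input path are placeholders invented for this example; they are not the flags of any real predictor.

```python
import shlex


def build_call(default_command, input_path, command=None, options=None):
    """Assemble an argument list for a hypothetical external tool."""
    # An explicitly given binary path takes precedence over the default.
    argv = [command if command is not None else default_command]
    if options:
        # Split the option string the way a POSIX shell would.
        argv.extend(shlex.split(options))
    argv.append(input_path)
    return argv


call = build_call("mytool", "input.txt",
                  command="/opt/bin/mytool", options="--alpha 1 --beta")
```

Passing a list to `subprocess` rather than a single shell string avoids quoting problems with paths that contain spaces.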
-
prepare_input
(input, file)¶ Prepares the input for the external tool and writes it to file in the tool-specific format. No return value.
Parameters: - input (list(str)) – The Peptide sequences to write into file
- file (File) – File handler of the input file for the external tool
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.External.
NetMHCII_2_2
¶ Bases:
Fred2.EpitopePrediction.External.AExternalEpitopePrediction
Implements a wrapper for NetMHCII
Note
Nielsen, M., & Lund, O. (2009). NN-align. An artificial neural network-based alignment algorithm for MHC class II peptide binding prediction. BMC Bioinformatics, 10(1), 296.
Nielsen, M., Lundegaard, C., & Lund, O. (2007). Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method. BMC Bioinformatics, 8(1), 238.
-
command
¶ Defines the commandline call for external tool
-
convert_alleles
(alleles)¶ Converts Allele objects into the internal allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The Allele objects for which the internal predictor representation is needed
Returns: A string representation of the input Allele objects
Return type: list(str)
-
get_external_version
(path=None)¶ Returns the external version of the tool by executing >{command} --version
This may depend on the method and has to be overridden; it is therefore declared abstract to force the user to override it. The function in the base class can be called with super().
Parameters: path (str) – Optional specification of the executable path if it differs from self.__command
Returns: The external version of the tool or None if tool does not support versioning Return type: str
-
is_in_path
()¶ Checks whether the specified execution command can be found in PATH
Returns: Whether or not command could be found in PATH Return type: bool
-
name
¶ The name of the predictor
-
parse_external_result
(file)¶ Parses external results and returns the result
Parameters: file (str) – The file path or the external prediction results Returns: A dictionary containing the prediction results Return type: dict
-
predict
(peptides, alleles=None, command=None, options=None, **kwargs)¶ Overrides AEpitopePrediction.predict
Parameters: - peptides (list(Peptide) or Peptide) – A list of Peptide objects or a single Peptide object
- alleles (list(Allele) or Allele) – A list of Allele objects or a single Allele object. If no Allele is provided, predictions are made for every Allele supported by the prediction method
- command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
- options (str) – A string of additional options passed directly to the external tool.
Returns: An EpitopePredictionResult object
Return type: EpitopePredictionResult
-
prepare_input
(input, file)¶ Prepares the input for the external tool and writes it to file in the tool-specific format. No return value.
Parameters: - input (list(str)) – The Peptide sequences to write into file
- file (File) – File handler of the input file for the external tool
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.External.
NetMHCIIpan_3_0
¶ Bases:
Fred2.EpitopePrediction.External.AExternalEpitopePrediction
Implements a wrapper for NetMHCIIpan.
Note
Andreatta, M., Karosiene, E., Rasmussen, M., Stryhn, A., Buus, S., & Nielsen, M. (2015). Accurate pan-specific prediction of peptide-MHC class II binding affinity with improved binding core identification. Immunogenetics, 1-10.
-
command
¶ Defines the commandline call for external tool
-
convert_alleles
(alleles)¶ Converts alleles into the internal allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The Allele objects for which the internal predictor representation is needed
Returns: A string representation of the input alleles
Return type: list(str)
-
get_external_version
(path=None)¶ Returns the external version of the tool by executing >{command} --version
This may depend on the method and has to be overridden; it is therefore declared abstract to force the user to override it. The function in the base class can be called with super().
Parameters: path (str) – Optional specification of the executable path if it differs from self.__command
Returns: The external version of the tool or None if tool does not support versioning Return type: str
-
is_in_path
()¶ Checks whether the specified execution command can be found in PATH
Returns: Whether or not command could be found in PATH Return type: bool
-
name
¶ The name of the predictor
-
parse_external_result
(file)¶ Parses external results and returns the result
Parameters: file (str) – The file path or the external prediction results Returns: A dictionary containing the prediction results Return type: dict
-
predict
(peptides, alleles=None, command=None, options=None, **kwargs)¶ Overrides AEpitopePrediction.predict
Parameters: - peptides (list(Peptide) or Peptide) – A list of Peptide objects or a single Peptide object
- alleles (list(Allele) or Allele) – A list of Allele objects or a single Allele object. If no Allele is provided, predictions are made for every Allele supported by the prediction method
- command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
- options (str) – A string of additional options passed directly to the external tool.
Returns: An EpitopePredictionResult object
Return type: EpitopePredictionResult
-
prepare_input
(input, file)¶ Prepares the input for the external tool and writes it to file in the tool-specific format. No return value.
Parameters: - input (list(str)) – The Peptide sequences to write into file
- file (File) – File handler of the input file for the external tool
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.External.
NetMHCIIpan_3_1
¶ Bases:
Fred2.EpitopePrediction.External.NetMHCIIpan_3_0
Implementation of NetMHCIIpan 3.1 adapter.
Note
Andreatta, M., Karosiene, E., Rasmussen, M., Stryhn, A., Buus, S., & Nielsen, M. (2015). Accurate pan-specific prediction of peptide-MHC class II binding affinity with improved binding core identification. Immunogenetics, 1-10.
-
command
¶ Defines the commandline call for external tool
-
convert_alleles
(alleles)¶ Converts alleles into the internal allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The Allele objects for which the internal predictor representation is needed
Returns: A string representation of the input alleles
Return type: list(str)
-
get_external_version
(path=None)¶ Returns the external version of the tool by executing >{command} --version
This may depend on the method and has to be overridden; it is therefore declared abstract to force the user to override it. The function in the base class can be called with super().
Parameters: path (str) – Optional specification of the executable path if it differs from self.__command
Returns: The external version of the tool or None if tool does not support versioning Return type: str
-
is_in_path
()¶ Checks whether the specified execution command can be found in PATH
Returns: Whether or not command could be found in PATH Return type: bool
-
name
¶ The name of the predictor
-
parse_external_result
(file)¶ Parses external results and returns the result
Parameters: file (str) – The file path or the external prediction results Returns: A dictionary containing the prediction results Return type: dict
-
predict
(peptides, alleles=None, command=None, options=None, **kwargs)¶ Overrides AEpitopePrediction.predict
Parameters: - peptides (list(Peptide) or Peptide) – A list of Peptide objects or a single Peptide object
- alleles (list(Allele) or Allele) – A list of Allele objects or a single Allele object. If no Allele is provided, predictions are made for every Allele supported by the prediction method
- command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
- options (str) – A string of additional options passed directly to the external tool.
Returns: An EpitopePredictionResult object
Return type: EpitopePredictionResult
-
prepare_input
(input, file)¶ Prepares the input for the external tool and writes it to file in the tool-specific format. No return value.
Parameters: - input (list(str)) – The Peptide sequences to write into file
- file (File) – File handler of the input file for the external tool
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.External.
NetMHC_3_0
¶ Bases:
Fred2.EpitopePrediction.External.NetMHC_3_4
Implements the NetMHC binding (for netMHC3.0).
Note
NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8-11. Lundegaard C, Lamberth K, Harndahl M, Buus S, Lund O, Nielsen M. Nucleic Acids Res. 1;36(Web Server issue):W509-12. 2008
Accurate approximation method for prediction of class I MHC affinities for peptides of length 8, 10 and 11 using prediction tools trained on 9mers. Lundegaard C, Lund O, Nielsen M. Bioinformatics, 24(11):1397-98, 2008.
-
command
¶ Defines the commandline call for external tool
-
convert_alleles
(alleles)¶ Converts alleles into the internal allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The Allele objects for which the internal predictor representation is needed
Returns: A string representation of the input Allele objects
Return type: list(str)
-
get_external_version
(path=None)¶ Returns the external version of the tool by executing >{command} --version
This may depend on the method and has to be overridden; it is therefore declared abstract to force the user to override it. The function in the base class can be called with super().
Parameters: path (str) – Optional specification of the executable path if it differs from self.__command
Returns: The external version of the tool or None if tool does not support versioning Return type: dict
-
is_in_path
()¶ Checks whether the specified execution command can be found in PATH
Returns: Whether or not command could be found in PATH Return type: bool
-
name
¶ The name of the predictor
-
parse_external_result
(file)¶ Parses external results and returns the result
Parameters: file (str) – The file path or the external prediction results Returns: A dictionary containing the prediction results Return type: dict
-
predict
(peptides, alleles=None, command=None, options=None, **kwargs)¶ Overrides AEpitopePrediction.predict
Parameters: - peptides (list(Peptide) or Peptide) – A list of Peptide objects or a single Peptide object
- alleles (list(Allele) or Allele) – A list of Allele objects or a single Allele object. If no Allele is provided, predictions are made for every Allele supported by the prediction method
- command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
- options (str) – A string of additional options passed directly to the external tool.
Returns: An EpitopePredictionResult object
Return type: EpitopePredictionResult
-
prepare_input
(input, file)¶ Prepares the input for the external tool and writes it to file in the tool-specific format. No return value.
Parameters: - input (list(str)) – The Peptide sequences to write into file
- file (File) – File handler of the input file for the external tool
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.External.
NetMHC_3_4
¶ Bases:
Fred2.EpitopePrediction.External.AExternalEpitopePrediction
Implements the NetMHC binding (in current form for netMHC3.4).
Note
NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8-11. Lundegaard C, Lamberth K, Harndahl M, Buus S, Lund O, Nielsen M. Nucleic Acids Res. 1;36(Web Server issue):W509-12. 2008
Accurate approximation method for prediction of class I MHC affinities for peptides of length 8, 10 and 11 using prediction tools trained on 9mers. Lundegaard C, Lund O, Nielsen M. Bioinformatics, 24(11):1397-98, 2008.
-
command
¶ Defines the commandline call for external tool
-
convert_alleles
(alleles)¶ Converts Allele objects into the internal Allele representation of the predictor and returns a string representation
Parameters: alleles (Allele) – The Allele for which the internal predictor representation is needed
Returns: A string representation of the input Allele
Return type: list(str)
-
get_external_version
(path=None)¶ Returns the external version of the tool by executing >{command} --version
This may depend on the method and has to be overridden; it is therefore declared abstract to force the user to override it. The function in the base class can be called with super().
Parameters: path (str) – Optional specification of the executable path if it differs from self.__command
Returns: The external version of the tool or None if tool does not support versioning Return type: dict
-
is_in_path
()¶ Checks whether the specified execution command can be found in PATH
Returns: Whether or not command could be found in PATH Return type: bool
-
name
¶ The name of the predictor
-
parse_external_result
(file)¶ Parses external results and returns the result
Parameters: file (str) – The file path or the external prediction results Returns: A dictionary containing the prediction results Return type: dict
-
predict
(peptides, alleles=None, command=None, options=None, **kwargs)¶ Overrides AEpitopePrediction.predict
Parameters: - peptides (list(Peptide) or Peptide) – A list of Peptide objects or a single Peptide object
- alleles (list(Allele) or Allele) – A list of Allele objects or a single Allele object. If no Allele is provided, predictions are made for every Allele supported by the prediction method
- command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
- options (str) – A string of additional options passed directly to the external tool.
Returns: An EpitopePredictionResult object
Return type: EpitopePredictionResult
-
prepare_input
(input, file)¶ Prepares the input for the external tool and writes it to file in the tool-specific format. No return value.
Parameters: - input (list(str)) – The Peptide sequences to write into file
- file (File) – File handler of the input file for the external tool
-
supportedAlleles
¶ A list of valid allele models
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.External.
NetMHCpan_2_4
¶ Bases:
Fred2.EpitopePrediction.External.AExternalEpitopePrediction
Implements the NetMHC binding (in current form for netMHCpan 2.4). Supported MHC alleles are currently restricted to HLA alleles.
Note
Nielsen, Morten, et al. “NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and-B locus protein of known sequence.” PloS one 2.8 (2007): e796.
-
command
¶ Defines the commandline call for external tool
-
convert_alleles
(alleles)¶ Converts Allele objects into the internal allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The Allele objects for which the internal predictor representation is needed
Returns: A string representation of the input Allele objects
Return type: list(str)
-
get_external_version
(path=None)¶ Returns the external version of the tool by executing >{command} --version
This may depend on the method and has to be overridden; it is therefore declared abstract to force the user to override it. The function in the base class can be called with super().
Parameters: path (str) – Optional specification of the executable path if it differs from self.__command
Returns: The external version of the tool or None if tool does not support versioning Return type: str
-
is_in_path
()¶ Checks whether the specified execution command can be found in PATH
Returns: Whether or not command could be found in PATH Return type: bool
-
name
¶ The name of the predictor
-
parse_external_result
(file)¶ Parses external results and returns the result
Parameters: file (str) – The file path or the external prediction results Returns: A dictionary containing the prediction results Return type: dict
-
predict
(peptides, alleles=None, command=None, options=None, **kwargs)¶ Overrides AEpitopePrediction.predict
Parameters: - peptides (list(Peptide) or Peptide) – A list of Peptide objects or a single Peptide object
- alleles (list(Allele) or Allele) – A list of Allele objects or a single Allele object. If no Allele is provided, predictions are made for every Allele supported by the prediction method
- command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
- options (str) – A string of additional options passed directly to the external tool.
Returns: An EpitopePredictionResult object
Return type: EpitopePredictionResult
-
prepare_input
(input, file)¶ Prepares the input for the external tool and writes it to file in the tool-specific format. No return value.
Parameters: - input (list(str)) – The Peptide sequences to write into file
- file (File) – File handler of the input file for the external tool
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.External.
NetMHCpan_2_8
¶ Bases:
Fred2.EpitopePrediction.External.AExternalEpitopePrediction
Implements the NetMHC binding (in current form for netMHCpan 2.8). Supported MHC alleles are currently restricted to HLA alleles.
Note
Nielsen, Morten, et al. “NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and-B locus protein of known sequence.” PloS one 2.8 (2007): e796.
-
command
¶ Defines the commandline call for external tool
-
convert_alleles
(alleles)¶ Converts alleles into the internal allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns: A string representation of the input Allele
Return type: list(str)
-
get_external_version
(path=None)¶ Returns the external version of the tool by executing >{command} --version
The call might depend on the method and has to be overridden; it is therefore declared abstract to force the user to override it. The function in the base class can be called with super()
Parameters: path (str) – Optional specification of executable path if deviant from self.__command
Returns: The external version of the tool or None if tool does not support versioning Return type: str
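The version query described above can be sketched as follows. This is an assumption-laden illustration rather than the FRED2 code: the helper `get_external_version(command)` is a hypothetical free function, and it assumes the tool accepts `--version` and prints the version on stdout or stderr. Tools with a different flag would need their own override, which is why the method is abstract.

```python
import subprocess
import sys


def get_external_version(command):
    """Run `<command> --version` and return its output, or None on failure."""
    try:
        proc = subprocess.run(
            [command, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some tools print the version on stderr
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # binary missing, not executable, or hanging
    if proc.returncode != 0:
        return None  # tool does not understand --version
    return proc.stdout.strip() or None


# The Python interpreter serves as a stand-in for an external tool here.
print(get_external_version(sys.executable))
```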
-
is_in_path
()¶ Checks whether the specified execution command can be found in PATH
Returns: Whether or not command could be found in PATH Return type: bool
-
name
¶ The name of the predictor
-
parse_external_result
(file)¶ Parses external results and returns the result
Parameters: file (str) – The file path or the external prediction results Returns: A dictionary containing the prediction results Return type: dict
-
predict
(peptides, alleles=None, command=None, options=None, **kwargs)¶ Overwrites AEpitopePrediction.predict
Parameters:
- peptides (list(Peptide) or Peptide) – A list of Peptide objects or a single Peptide object
- alleles (list(Allele)/Allele) – A list of Allele objects or a single Allele object. If no Allele are provided, predictions are made for all Allele supported by the prediction method
- command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
- options (str) – A string of additional options passed directly to the external tool
Returns: An EpitopePredictionResult object
Return type: EpitopePredictionResult
-
prepare_input
(input, file)¶ Prepares input for external tools and writes them to file in the specific format
No return value!
Parameters:
- input (list(str)) – The Peptide sequences to write into file
- file (File) – File handler of the input file for the external tool
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.External.
PickPocket_1_1
¶ Bases:
Fred2.EpitopePrediction.External.AExternalEpitopePrediction
Implementation of PickPocket adapter.
Note
Zhang, H., Lund, O., & Nielsen, M. (2009). The PickPocket method for predicting binding specificities for receptors based on receptor pocket similarities: application to MHC-peptide binding. Bioinformatics, 25(10), 1293-1299.
-
command
¶ Defines the commandline call for external tool
-
convert_alleles
(alleles)¶ Converts Allele into the internal allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The alleles for which the internal predictor representation is needed
Returns: A string representation of the input alleles
Return type: list(str)
-
get_external_version
(path=None)¶ Returns the external version of the tool by executing >{command} --version
The call might depend on the method and has to be overridden; it is therefore declared abstract to force the user to override it. The function in the base class can be called with super()
Parameters: path (str) – Optional specification of executable path if deviant from self.__command
Returns: The external version of the tool or None if tool does not support versioning Return type: str
-
is_in_path
()¶ Checks whether the specified execution command can be found in PATH
Returns: Whether or not command could be found in PATH Return type: bool
-
name
¶ The name of the predictor
-
parse_external_result
(file)¶ Parses external results and returns the result
Parameters: file (str) – The file path or the external prediction results Returns: A dictionary containing the prediction results Return type: dict
-
predict
(peptides, alleles=None, command=None, options=None, **kwargs)¶ Overwrites AEpitopePrediction.predict
Parameters:
- peptides (list(Peptide) or Peptide) – A list of Peptide objects or a single Peptide object
- alleles (list(Allele)/Allele) – A list of Allele objects or a single Allele object. If no Allele are provided, predictions are made for all Allele supported by the prediction method
- command (str) – The path to an alternative binary (can be used if the binary is not globally executable)
- options (str) – A string of additional options passed directly to the external tool
Returns: An EpitopePredictionResult object
Return type: EpitopePredictionResult
-
prepare_input
(input, file)¶ Prepares input for external tools and writes them to file in the specific format
No return value!
Parameters:
- input (list(str)) – The Peptide sequences to write into file
- file (File) – File handler of the input file for the external tool
-
version
¶ The version of the predictor
-
EpitopePrediction.PSSM¶
-
class
Fred2.EpitopePrediction.PSSM.
APSSMEpitopePrediction
¶ Bases:
Fred2.Core.Base.AEpitopePrediction
Abstract base class for PSSM predictions. Implements predict functionality
-
convert_alleles
(alleles)¶ Converts alleles into the internal allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The alleles for which the internal predictor representation is needed
Returns: A string representation of the input alleles
Return type: list(str)
-
name
¶ The name of the predictor
-
predict
(peptides, alleles=None, **kwargs)¶ Returns predictions for the given peptides and Allele. If no Allele are given, predictions for all available models are made.
Returns: An EpitopePredictionResult object with the prediction results
Return type: EpitopePredictionResult
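To make the idea of a PSSM prediction concrete, the sketch below scores a peptide by summing one position-specific value per residue. It is a generic illustration under stated assumptions: the function `pssm_score`, the matrix layout (position mapped to residue scores), and the toy values are all hypothetical and are not taken from FRED2 or from any published matrix.

```python
def pssm_score(peptide, matrix):
    """Sum the position-specific scores of each residue in `peptide`.

    `matrix` maps a 0-based position to a dict of residue -> score;
    residues missing from a column contribute 0.0.
    """
    if len(peptide) != len(matrix):
        raise ValueError("peptide length does not match the matrix")
    return sum(matrix[pos].get(aa, 0.0) for pos, aa in enumerate(peptide))


# Toy 3-mer matrix with made-up scores (illustration only).
toy_matrix = {
    0: {"A": 1.0, "K": -0.5},
    1: {"L": 2.0},
    2: {"V": 0.5, "A": 0.25},
}

print(pssm_score("ALV", toy_matrix))  # 1.0 + 2.0 + 0.5
print(pssm_score("KLA", toy_matrix))  # -0.5 + 2.0 + 0.25
```

The length check mirrors the role of supportedLength: a matrix only applies to peptides of the length it was built for.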
-
supportedAlleles
¶ A list of valid allele models
-
supportedLength
¶ A list of supported peptide lengths
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.PSSM.
ARB
¶ Bases:
Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction
Implements IEDB's ARB method.
Note
Bui HH, Sidney J, Peters B, Sathiamurthy M, Sinichi A, Purton KA, Mothe BR, Chisari FV, Watkins DI, Sette A. 2005. Automated generation and evaluation of specific MHC binding predictive tools: ARB matrix applications. Immunogenetics 57:304-314.
-
convert_alleles
(alleles)¶ Converts Allele into the internal Allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns: A string representation of the input Allele
Return type: list(str)
-
name
¶ The name of the predictor
-
predict
(peptides, alleles=None, **kwargs)¶ Returns predictions for the given peptides and Allele. If no Allele are given, predictions for all available models are made.
Returns: An EpitopePredictionResult object with the prediction results
Return type: EpitopePredictionResult
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.PSSM.
BIMAS
¶ Bases:
Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction
Represents the BIMAS PSSM predictor.
Note
Parker, K.C., Bednarek, M.A. and Coligan, J.E. Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains. The Journal of Immunology 1994;152(1):163-175.
-
convert_alleles
(alleles)¶ Converts Allele into the internal Allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns: A string representation of the input Allele
Return type: list(str)
-
name
¶ The name of the predictor
-
predict
(peptides, alleles=None, **kwargs)¶ Returns predictions for the given peptides and Allele. If no Allele are given, predictions for all available models are made.
Returns: An EpitopePredictionResult object with the prediction results
Return type: EpitopePredictionResult
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.PSSM.
ComblibSidney2008
¶ Bases:
Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction
Implements IEDB's Comblib_Sidney2008 PSSM method.
Note
Sidney J, Assarsson E, Moore C, Ngo S, Pinilla C, Sette A, Peters B. 2008. Quantitative peptide binding motifs for 19 human and mouse MHC class I molecules derived using positional scanning combinatorial peptide libraries. Immunome Res 4:2.
-
convert_alleles
(alleles)¶ Converts Allele into the internal Allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns: A string representation of the input Allele
Return type: list(str)
-
name
¶ The name of the predictor
-
predict
(peptides, alleles=None, **kwargs)¶ Returns predictions for the given peptides and Allele. If no Allele are given, predictions for all available models are made.
Returns: An EpitopePredictionResult object with the prediction results
Return type: EpitopePredictionResult
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.PSSM.
Epidemix
¶ Bases:
Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction
Represents the Epidemix PSSM predictor.
Note
Feldhahn, M., et al. FRED-a framework for T-cell epitope detection. Bioinformatics 2009;25(20):2758-2759.
-
convert_alleles
(alleles)¶ Converts Allele into the internal Allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns: A string representation of the input Allele
Return type: list(str)
-
name
¶ The name of the predictor
-
predict
(peptides, alleles=None, **kwargs)¶ Returns predictions for the given peptides and Allele. If no Allele are given, predictions for all available models are made.
Returns: An EpitopePredictionResult object with the prediction results
Return type: EpitopePredictionResult
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.PSSM.
Hammer
¶ Bases:
Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction
Represents the virtual pockets approach by Sturniolo et al.
Note
Sturniolo, T., et al. Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices. Nature biotechnology 1999;17(6):555-561.
-
convert_alleles
(alleles)¶ Converts Allele into the internal Allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns: A string representation of the input Allele
Return type: list(str)
-
name
¶ The name of the predictor
-
predict
(peptides, alleles=None, **kwargs)¶ Returns predictions for the given peptides and Allele. If no Allele are given, predictions for all available models are made.
Returns: An EpitopePredictionResult object with the prediction results
Return type: EpitopePredictionResult
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.PSSM.
SMM
¶ Bases:
Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction
Implements IEDB's SMM PSSM method.
Note
Peters B, Sette A. 2005. Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method. BMC Bioinformatics 6:132.
-
convert_alleles
(alleles)¶ Converts Allele into the internal Allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns: A string representation of the input Allele
Return type: list(str)
-
name
¶ The name of the predictor
-
predict
(peptides, alleles=None, **kwargs)¶ Returns predictions for the given peptides and Allele. If no Allele are given, predictions for all available models are made.
Returns: An EpitopePredictionResult object with the prediction results
Return type: EpitopePredictionResult
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.PSSM.
SMMPMBEC
¶ Bases:
Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction
Implements IEDB's SMMPMBEC PSSM method.
Note
Kim, Y., Sidney, J., Pinilla, C., Sette, A., & Peters, B. (2009). Derivation of an amino acid similarity matrix for peptide: MHC binding and its application as a Bayesian prior. BMC Bioinformatics, 10(1), 394.
-
convert_alleles
(alleles)¶ Converts Allele into the internal Allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns: A string representation of the input Allele
Return type: list(str)
-
name
¶ The name of the predictor
-
predict
(peptides, alleles=None, **kwargs)¶ Returns predictions for the given peptides and Allele. If no Allele are given, predictions for all available models are made.
Returns: An EpitopePredictionResult object with the prediction results
Return type: EpitopePredictionResult
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.PSSM.
Syfpeithi
¶ Bases:
Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction
Represents the Syfpeithi PSSM predictor.
Note
Rammensee, H. G., Bachmann, J., Emmerich, N. P. N., Bachor, O. A., & Stevanovic, S. (1999). SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics, 50(3-4), 213-219.
-
convert_alleles
(alleles)¶ Converts Allele into the internal Allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns: A string representation of the input Allele
Return type: list(str)
-
name
¶ The name of the predictor
-
predict
(peptides, alleles=None, **kwargs)¶ Returns predictions for the given peptides and Allele. If no Allele are given, predictions for all available models are made.
Returns: An EpitopePredictionResult object with the prediction results
Return type: EpitopePredictionResult
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.PSSM.
TEPITOPEpan
¶ Bases:
Fred2.EpitopePrediction.PSSM.APSSMEpitopePrediction
Implements TEPITOPEpan.
Note
Zhang L, Chen Y, Wong H-S, Zhou S, Mamitsuka H, et al. (2012) TEPITOPEpan: Extending TEPITOPE for Peptide Binding Prediction Covering over 700 HLA-DR Molecules. PLoS ONE 7(2): e30483.
-
convert_alleles
(alleles)¶ Converts Allele into the internal Allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns: A string representation of the input Allele
Return type: list(str)
-
name
¶ The name of the predictor
-
predict
(peptides, alleles=None, **kwargs)¶ Returns predictions for the given peptides and Allele. If no Allele are given, predictions for all available models are made.
Returns: An EpitopePredictionResult object with the prediction results
Return type: EpitopePredictionResult
-
version
¶ The version of the predictor
-
EpitopePrediction.SVM¶
-
class
Fred2.EpitopePrediction.SVM.
ASVMEpitopePrediction
¶ Bases:
Fred2.Core.Base.AEpitopePrediction
,Fred2.Core.Base.ASVM
Implements default prediction routine for SVM based epitope prediction tools
-
convert_alleles
(alleles)¶ Converts alleles into the internal allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The alleles for which the internal predictor representation is needed
Returns: A string representation of the input alleles
Return type: list(str)
-
encode
(peptides)¶ Returns the feature encoding for peptides
Parameters: peptides (list(Peptide)/Peptide) – A list of Peptide objects or a single Peptide object
Returns: Feature encoding of the Peptide objects
Return type: list(Object)
-
name
¶ The name of the predictor
-
predict
(peptides, alleles=None, **kwargs)¶ Returns predictions for the given peptides and alleles. If no alleles are given, predictions for all available models are made.
Returns: An EpitopePredictionResult object with the prediction results
Return type: EpitopePredictionResult
-
supportedAlleles
¶ A list of valid allele models
-
supportedLength
¶ A list of supported peptide lengths
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.SVM.
SVMHC
¶ Bases:
Fred2.EpitopePrediction.SVM.ASVMEpitopePrediction
Implements SVMHC epitope prediction for MHC-I alleles (SYFPEITHI models).
Note
Doennes, P. and Kohlbacher, O. SVMHC: a server for prediction of MHC-binding peptides. Nucleic Acids Res, 2006, 34, W194-W197
-
convert_alleles
(alleles)¶ Converts Allele into the internal Allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns: A string representation of the input Allele
Return type: list(str)
-
encode
(peptides)¶ Encodes the input with binary sparse encoding of the Peptide
Parameters: peptides (str) – A list of Peptide sequences
Returns: Dictionary with Peptide as key and feature encoding as value (see svmlight encoding scheme http://svmlight.joachims.org/)
Return type: dict(Peptide, tuple(int, list(tuple(int,float))))
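A binary sparse encoding in the svmlight style lists only the non-zero features as (index, value) pairs. The sketch below shows one plausible layout and is not the SVMHC implementation: the alphabet order, the 1-based index formula, the placeholder label 0, and the name `encode_sparse_binary` are all assumptions made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet order


def encode_sparse_binary(peptides, label=0):
    """Map each peptide to (label, [(feature_index, 1.0), ...]).

    Each position owns a block of 20 indices; the residue at that position
    switches on exactly one of them. Indices are 1-based, as in svmlight.
    """
    encoding = {}
    for pep in peptides:
        features = [
            (pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa) + 1, 1.0)
            for pos, aa in enumerate(pep)
        ]
        encoding[pep] = (label, features)
    return encoding


print(encode_sparse_binary(["ACD"]))
```

The result has the shape given in the return type above: a label followed by a list of (int, float) pairs per peptide.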
-
name
¶ The name of the predictor
-
predict
(peptides, alleles=None, **kwargs)¶ Returns predictions for the given peptides and alleles. If no alleles are given, predictions for all available models are made.
Returns: An EpitopePredictionResult object with the prediction results
Return type: EpitopePredictionResult
-
supportedAlleles
¶ A list of supported allele models
-
version
¶ The version of the predictor
-
-
class
Fred2.EpitopePrediction.SVM.
UniTope
¶ Bases:
Fred2.EpitopePrediction.SVM.ASVMEpitopePrediction
Implements UniTope prediction for MHC-I.
Note
Toussaint, N. C., Feldhahn, M., Ziehm, M., Stevanovic, S., & Kohlbacher, O. (2011, August). T-cell epitope prediction based on self-tolerance. In Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine (pp. 584-588). ACM.
-
convert_alleles
(alleles)¶ Converts Allele into the internal Allele representation of the predictor and returns a string representation
Parameters: alleles (list(Allele)) – The Allele for which the internal predictor representation is needed
Returns: A string representation of the input Allele
Return type: list(str)
-
encode
(peptides, allele)¶ Encodes the input with binary sparse encoding of the Peptide
Returns: Dictionary with Peptide as key and feature encoding as value (see svmlight encoding scheme http://svmlight.joachims.org/)
Return type: dict(Peptide, tuple(int, list(tuple(int,float))))
-
name
¶ The name of the predictor
-
predict
(peptides, alleles=None, **kwargs)¶ Returns predictions for the given peptides and alleles. If no alleles are given, predictions for all available models are made.
Returns: An EpitopePredictionResult object with the prediction results
Return type: EpitopePredictionResult
-
version
¶ The version of the predictor
-
Module contents¶
Vaccine Design¶
Fred2.EpitopeSelection Module¶
EpitopeSelection.OptiTope¶
-
class
Fred2.EpitopeSelection.OptiTope.
OptiTope
(results, threshold=None, k=10, solver='glpk', verbosity=0)¶ Bases:
object
This class implements the epitope selection functionality of OptiTope published by Toussaint et al. [1].
This module builds upon Pyomo, an embedded algebraic modeling language [2].
It allows specific constraints of the underlying ILP to be (de)selected and the resulting problem to be solved with a MIP solver of choice.
Note
[1] N. C. Toussaint and O. Kohlbacher. OptiTope–a web server for the selection of an optimal set of peptides for epitope-based vaccines. Nucleic Acids Res, 2009, 37, W617-W622
[2] Pyomo - Optimization Modeling in Python. William E. Hart, Carl Laird, Jean-Paul Watson and David L. Woodruff. Springer, 2012.
-
activate_allele_coverage_const
(minCoverage)¶ Enables the allele coverage constraint
Parameters: minCoverage (float) – Fraction of alleles which have to be covered, in [0,1]
Raises ValueError: If the input variable is not in the same domain as the parameter
-
activate_antigen_coverage_const
(t_var)¶ Activates the variation coverage constraint
Parameters: t_var (int) – The number of epitopes which have to come from each variation
Raises ValueError: If the input variable is not in the same domain as the parameter
-
activate_epitope_conservation_const
(t_c, conservation=None)¶ Activates the epitope conservation constraint
Parameters:
- t_c (float) – The fraction of conservation an epitope has to have, in [0.0,1.0]
- conservation – A dict with Peptide as key, specifying a different conservation score for each Peptide
Raises ValueError: If the input variable is not in the same domain as the parameter
-
deactivate_allele_coverage_const
()¶ Deactivates the allele coverage constraint
-
deactivate_antigen_coverage_const
()¶ Deactivates the variation coverage constraint
-
deactivate_epitope_conservation_const
()¶ Deactivates epitope conservation constraint
-
set_k
(k)¶ Sets the number of epitopes to select
Parameters: k (int) – The number of epitopes
Raises ValueError: If the input variable is not in the same domain as the parameter
-
solve
(options=None)¶ Invokes the selected solver and solves the problem
Parameters: options (dict(str,str)) – A dictionary of solver specific options as keys and their parameters as values
Returns: The optimal epitopes as a list of Peptide objects
Return type: list(Peptide)
Raises RuntimeError: If the solver raised a problem or the solver is not accessible via the PATH environment variable
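The selection problem behind this class can be illustrated on a toy instance without a MIP solver. The sketch below is a brute-force stand-in, not the OptiTope ILP: it enumerates every subset of size k, keeps those that cover the required fraction of alleles, and returns the one with the highest summed score. The function name, the data layout, and the scores are hypothetical, and weighting by allele frequency is left out.

```python
from itertools import combinations


def select_epitopes(scores, k, min_coverage=0.0, threshold=0.0):
    """Pick k epitopes maximising the summed score above `threshold`.

    scores: dict epitope -> dict allele -> predicted score.
    min_coverage: fraction of alleles that need at least one selected binder.
    """
    alleles = {a for per_allele in scores.values() for a in per_allele}
    best_set, best_value = None, float("-inf")
    for subset in combinations(sorted(scores), k):
        covered = {a for e in subset for a, s in scores[e].items() if s > threshold}
        if len(covered) < min_coverage * len(alleles):
            continue  # violates the allele coverage constraint
        value = sum(s for e in subset for s in scores[e].values() if s > threshold)
        if value > best_value:
            best_set, best_value = subset, value
    return list(best_set) if best_set is not None else []


# Made-up prediction scores for three epitopes and two alleles.
toy_scores = {
    "AAA": {"A*02:01": 0.9, "B*07:02": 0.0},
    "CCC": {"A*02:01": 0.8, "B*07:02": 0.0},
    "DDD": {"A*02:01": 0.0, "B*07:02": 0.3},
}

print(select_epitopes(toy_scores, k=2))
print(select_epitopes(toy_scores, k=2, min_coverage=1.0))
```

Turning on the coverage constraint changes the answer: the two strongest binders share one allele, so the second call has to trade one of them for a weaker epitope that covers the other allele. Enumeration grows combinatorially, which is the reason the class delegates to an ILP solver.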
-
Fred2.EpitopeAssembly Module¶
EpitopeAssembly.EpitopeAssembly¶
-
class
Fred2.EpitopeAssembly.EpitopeAssembly.
EpitopeAssembly
(peptides, pred, solver='glpk', weight=0.0, matrix=None, verbosity=0)¶ Bases:
object
Implements the epitope assembly approach proposed by Toussaint et al. using proteasomal cleavage site prediction and formulating the problem as TSP.
Note
Toussaint, N.C., et al. Universal peptide vaccines - Optimal peptide vaccine design based on viral sequence conservation. Vaccine 2011;29(47):8745-8753.
Parameters:
- peptides (list(Peptide)) – A list of Peptide which shall be arranged
- pred (ACleavageSitePrediction) – An ACleavageSitePrediction
- solver (str) – Specifies the solver to use (must be callable by pyomo)
- weight (float) – Specifies how strongly unwanted cleavage sites should be punished [0,1], where 0 means they will be ignored, and 1 means the sum of all unwanted cleavage sites is subtracted from the cleavage site between two epitopes
- verbosity (int) – Specifies how verbose the class will be; 0 means normal, >0 debug mode
-
approximate
()¶ Approximates the epitope assembly problem by applying the Lin-Kernighan traveling salesman heuristic
Note
LKH implementation must be downloaded, compiled, and globally executable. Source code can be found here: http://www.akira.ruc.dk/~keld/research/LKH/
Returns: An ordered list of the Peptide (based on the string-of-beads ordering)
Return type: list(Peptide)
-
solve
(options=None)¶ Solves the Epitope Assembly problem and returns an ordered list of the peptides
Note
This can take quite long and should not be done for more than 30 epitopes!
Parameters: options (str) – Solver specific options as string (will not be checked for correctness)
Returns: An ordered list of the Peptide (based on the string-of-beads ordering)
Return type: list(Peptide)
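The TSP view of epitope assembly can be illustrated by brute force on a handful of peptides. The sketch below is not the FRED2 model: `junction_score` stands in for whatever a cleavage site predictor would report for the junction between two neighbouring epitopes, and the toy table of scores is invented. It only shows why the search space is the set of permutations.

```python
from itertools import permutations


def best_ordering(peptides, junction_score):
    """Return the ordering that maximises the summed junction scores."""

    def total(order):
        # One junction per pair of neighbours in the string of beads.
        return sum(junction_score(a, b) for a, b in zip(order, order[1:]))

    return list(max(permutations(peptides), key=total))


# Hypothetical cleavage likelihoods for selected junctions; all others are low.
toy_junctions = {("AAA", "BBB"): 0.9, ("BBB", "CCC"): 0.8}


def toy_score(left, right):
    return toy_junctions.get((left, right), 0.1)


print(best_ordering(["CCC", "AAA", "BBB"], toy_score))
```

With n peptides there are n! orderings, which is consistent with the warning above about instance size and with the existence of the approximate() heuristic.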
-
class
Fred2.EpitopeAssembly.EpitopeAssembly.
EpitopeAssemblyWithSpacer
(peptides, cleav_pred, epi_pred, alleles, k=5, en=9, threshold=None, solver='glpk', alpha=0.99, beta=0, verbosity=0)¶ Bases:
object
Implements the epitope assembly approach proposed by Toussaint et al. using proteasomal cleavage site prediction and formulating the problem as TSP.
It also extends it by optimal spacer design. (currently only allowed with PSSM cleavage site and epitope prediction)
The ILP model is implemented, so keep the number of epitopes to be arranged reasonably small.
-
approximate
(start=0, threads=1, options=None)¶ Approximates the epitope assembly problem by applying the Lin-Kernighan traveling salesman heuristic
LKH implementation must be downloaded, compiled, and globally executable.
Source code can be found here: http://www.akira.ruc.dk/~keld/research/LKH/
Parameters: - start (int) – Start length for spacers (default 0).
- threads (int) – Number of threads used for spacer design. Be careful, if options contain solver threads it will allocate threads*solver_threads cores!
- options (dict(str,str)) – Solver specific options (threads for example)
Returns: A list of ordered Peptide
Return type: list(Peptide)
-
solve
(start=0, threads=None, options=None)¶ Solve the epitope assembly problem with spacers optimally using integer linear programming.
Note
This can take quite long and should not be done for more than 30 epitopes! Also, one has to disable pre-solving steps in order to use this model.
Parameters: - start (int) – Start length for spacers (default 0).
- threads (int) – Number of threads used for spacer design. Be careful, if options contain solver threads it will allocate threads*solver_threads cores!
- options (dict(str,str)) – Solver specific options as keys and parameters as values
Returns: A list of ordered Peptide
Return type: list(Peptide)
-
EpitopeAssembly.MosaicVaccine¶
The method offers an exact solution for small to medium-sized problems, as well as a matheuristic based on Tabu Search and Branch-and-Bound for large problems.
The heuristic proceeds as follows:
I: initialize solution s_best via greedy construction
s_current = s_best
WHILE convergence is not reached DO:
    I: s <- Tabu Search(s_current)
    II: s <- Intensification via local MIP(s) solution (allow only alpha arcs to change)
        if s > s_best:
            s_best = s
    III: Diversification(s) to escape local maxima
END
-
class
Fred2.EpitopeAssembly.MosaicVaccine.
MosaicVaccineTS
(_results, threshold=None, k=10, solver='glpk', verbosity=0)¶ -
approximate
(phi=0.05, options=None, _greedyLP=True, _tabu=True, _intensify=True, _jump=True, max_iter=10000, delta_change=0.0001, max_delta=101, seed=23478234)¶ Matheuristic using Tabu Search
-
solve
(options=None)¶ Solves the model optimally
-
-
class
Fred2.EpitopeAssembly.MosaicVaccine.
TabuList
(iterable=None, size=None)¶ Bases:
_abcoll.MutableSet
-
add
(key)¶
-
clear
()¶ This is slow (creates N new iterators!) but effective.
-
discard
(key)¶
-
isdisjoint
(other)¶ Return True if two sets have a null intersection.
-
pop
(last=True)¶
-
remove
(value)¶ Remove an element. If not a member, raise a KeyError.
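A tabu list is typically an insertion-ordered set with a bounded memory. The sketch below shows one way to build such a structure and is not the FRED2 class: the name `BoundedTabuList` is hypothetical, and the eviction rule (drop the oldest entry once `size` is exceeded) is an assumption, since the documentation above does not state how `size` is used.

```python
from collections import OrderedDict


class BoundedTabuList:
    """Insertion-ordered set that forgets its oldest entry when full."""

    def __init__(self, iterable=None, size=None):
        self._items = OrderedDict()
        self._size = size
        for key in iterable or ():
            self.add(key)

    def add(self, key):
        if key in self._items:
            return  # already tabu, keep its original position
        self._items[key] = None
        if self._size is not None and len(self._items) > self._size:
            self._items.popitem(last=False)  # evict the oldest move

    def discard(self, key):
        self._items.pop(key, None)

    def __contains__(self, key):
        return key in self._items

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


tabu = BoundedTabuList(size=2)
for move in ["a", "b", "c"]:
    tabu.add(move)
print(list(tabu))
```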
-
-
Fred2.EpitopeAssembly.MosaicVaccine.
suffixPrefixMatch
(m)¶ Return length of longest suffix of x of length at least k that matches a prefix of y. Return 0 if no suffix/prefix match has length at least k.
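The description above can be sketched as a small function. The three-argument signature `suffix_prefix_match(x, y, k)` is an assumption taken from the wording of the description; the documented function takes a single argument `m`, whose layout is not specified here.

```python
def suffix_prefix_match(x, y, k):
    """Length of the longest suffix of x (at least k long) that is a prefix of y.

    Returns 0 if no such overlap exists.
    """
    # Try the longest possible overlap first and shrink down to k.
    for length in range(min(len(x), len(y)), k - 1, -1):
        if length > 0 and x.endswith(y[:length]):
            return length
    return 0


print(suffix_prefix_match("ABCDE", "CDEFG", 2))  # overlap "CDE"
print(suffix_prefix_match("ABCDE", "CDEFG", 4))  # overlap too short for k=4
```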
HLA Typing¶
Fred2.HLAtyping Module¶
HLAtyping.External¶
-
class
Fred2.HLAtyping.External.
AExternalHLATyping
¶ Bases:
Fred2.Core.Base.AHLATyping
,Fred2.Core.Base.AExternal
-
clean_up
(_output)¶ Cleans the generated files after prediction
Parameters: output (str) – The path to the output file or directory
-
command
¶ Defines the commandline call for external tool
-
get_external_version
(path=None)¶ Returns the external version of the tool by executing >{command} --version
The call might depend on the method and has to be overridden; it is therefore declared abstract to force the user to override it. The function in the base class can be called with super()
Parameters: path (str) – Optional specification of executable path if deviant from self.__command
Returns: The external version of the tool or None if tool does not support versioning Return type: str
-
is_in_path
()¶ Checks whether the specified execution command can be found in PATH
Returns: Whether or not command could be found in PATH Return type: bool
-
name
¶ The name of the predictor
-
parse_external_result
(file)¶ Parses external results and returns the result
Parameters: file (str) – The file path or the external prediction results Returns: A dictionary containing the prediction results Return type: dict
-
predict
(ngsFile, output, command=None, options=None, delete=True, **kwargs)¶ Implementation of prediction
Parameters:
- ngsFile (str) – The path to the NGS file of interest
- output (str) – The path to the output file or directory
- command (str) – The path to an alternative binary (if the binary is not globally executable)
- options (str) – A string with additional options that is passed directly to the tool
- delete (bool) – Boolean indicator whether generated files should be deleted afterwards
Returns: A list of Allele objects representing the most likely HLA genotype
Return type: list(Allele)
-
version
¶ Parameter specifying the version of the prediction method
-
-
class
Fred2.HLAtyping.External.
ATHLATES_1_0
¶ Bases:
Fred2.HLAtyping.External.AExternalHLATyping
Wrapper for ATHLATES.
Note
C. Liu, X. Yang, B. Duffy, T. Mohanakumar, R.D. Mitra, M.C. Zody, J.D. Pfeifer (2012) ATHLATES: accurate typing of human leukocyte antigen through exome sequencing, Nucl. Acids Res. (2013)
-
clean_up
(output)¶ Deletes files created by ATHLATES within output
Parameters: output (str) – The path to the output file or directory of the programme
-
command
¶ Defines the commandline call for external tool
-
get_external_version
(path=None)¶ Returns the external version of the tool by executing >{command} --version
The call might depend on the method and has to be overridden; it is therefore declared abstract to force the user to override it. The function in the base class can be called with super()
Parameters: path (str) – Optional specification of executable path if deviant from self.__command
Returns: The external version of the tool or None if tool does not support versioning
Return type: str
-
is_in_path
()¶ Checks whether the specified execution command can be found in PATH
Returns: Whether or not command could be found in PATH Return type: bool
-
name
¶ The name of the predictor
-
parse_external_result
(output)¶ Searches within the defined output directory for the newest directory and reads the prediction file from there
Parameters: output (str) – The path to the output dir
Returns: The predicted HLA genotype
Return type: list(Allele)
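Locating the newest sub-directory of an output folder can be sketched with the standard library. This is an illustration under assumptions, not the FRED2 code: `newest_subdir` is a hypothetical helper, and "newest" is taken to mean the latest modification time. The temporary directories only exist to make the example self-contained.

```python
import os
import tempfile


def newest_subdir(output):
    """Return the most recently modified sub-directory of `output`, or None."""
    with os.scandir(output) as entries:
        subdirs = [(e.stat().st_mtime, e.path) for e in entries if e.is_dir()]
    if not subdirs:
        return None
    return max(subdirs)[1]


with tempfile.TemporaryDirectory() as root:
    for name, mtime in [("run_old", 1000), ("run_new", 2000)]:
        path = os.path.join(root, name)
        os.mkdir(path)
        os.utime(path, (mtime, mtime))  # force distinct timestamps
    newest_name = os.path.basename(newest_subdir(root))
    print(newest_name)
```

Picking a directory by timestamp is fragile when other processes write to the same location, so a dedicated output directory per run is the safer setup.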
-
predict
(ngsFile, output, command=None, options=None, delete=True, **kwargs)¶ Implementation of prediction
Parameters:
- ngsFile (str) – The path to the NGS file of interest
- output (str) – The path to the output file or directory
- command (str) – The path to an alternative binary (if the binary is not globally executable)
- options (str) – A string with additional options that is passed directly to the tool
- delete (bool) – Boolean indicator whether generated files should be deleted afterwards
Returns: A list of Allele objects representing the most likely HLA genotype
Return type: list(Allele)
-
version
¶ The version of the predictor
-
-
class
Fred2.HLAtyping.External.
OptiType_1_0
¶ Bases:
Fred2.HLAtyping.External.AExternalHLATyping
Wrapper of OptiType v1.0.
Note
Szolek, A., Schubert, B., Mohr, C., Sturm, M., Feldhahn, M., & Kohlbacher, O. (2014). OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics, 30(23), 3310-3316.
-
clean_up
(output)¶ Searches within the defined output directory for the newest directory and deletes it. This should be the one OptiType created.
This could cause serious side effects if someone or something else also writes to that directory! OptiType should change the way it writes its output.
Parameters: output (str) – The path to the output file or directory of the programme
-
command
¶ Defines the commandline call for external tool
-
get_external_version
(path=None)¶ Returns the external version of the tool by executing >{command} --version
The call might depend on the method, so this function is declared abstract to force subclasses to override it. The function in the base class can be called with super()
Parameters: path (str) – Optional path to the executable if it differs from self.command
Returns: The external version of the tool or None if tool does not support versioning Return type: str
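Querying a tool's version through its command line can be sketched as below. This is an illustration under the assumption that the tool answers a --version flag; the helper name external_version is hypothetical, and the wrapper classes may obtain the version differently.

```python
import subprocess


def external_version(command):
    """Run `<command> --version` and return its trimmed output, or None on failure."""
    try:
        result = subprocess.run(
            [command, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some tools print their version to stderr
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # The command is missing, not executable, or exited with a non-zero status
        return None
    # An empty answer is treated like "no version information"
    return result.stdout.strip() or None
```

Returning None for every failure mirrors the documented contract that tools without versioning support yield None.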
-
is_in_path
()¶ Checks whether the specified execution command can be found in PATH
Returns: Whether or not command could be found in PATH Return type: bool
-
name
¶ The name of the predictor
-
parse_external_result
(output)¶ Searches within the defined dir _file for the newest dir and reads the prediction file from there
Parameters: output (str) – The path to the output dir Returns: The predicted HLA genotype Return type: list( Allele
)
-
predict
(ngsFile, output, command=None, options=None, delete=True, **kwargs)¶ Implementation of prediction
Parameters: - ngsFile (str) – The path to the NGS file of interest
- output (str) – The path to the output file or directory
- command (str) – The path to an alternative binary (if the binary is not globally executable)
- options (str) – A string with additional options that is passed directly to the tool
- delete (bool) – Boolean indicator whether generated files should be deleted afterwards
Returns: A list of
Allele
objects representing the most likely HLA genotype Return type: list(
Allele
)
-
version
¶ The version of the predictor
-
-
class
Fred2.HLAtyping.External.
Polysolver
¶ Bases:
Fred2.HLAtyping.External.AExternalHLATyping
Wrapper for Polysolver.
Note
Shukla, Sachet A., Rooney, Michael S., Rajasagi, Mohini, Tiao, Grace, et al. (2015). Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. Nat Biotech, advance online publication. doi: 10.1038/nbt.3344
-
clean_up
(output)¶ Deletes files created by Polysolver within output
Parameters: output (str) – The path to the output file or directory of the programme
-
command
¶ Defines the commandline call for external tool
-
get_external_version
(path=None)¶ Returns the external version of the tool by executing >{command} --version
The call might depend on the method, so this function is declared abstract to force subclasses to override it. The function in the base class can be called with super()
Parameters: path (str) – Optional path to the executable if it differs from self.__command Returns: The external version of the tool or None if the tool does not support versioning Return type: str
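The pattern mentioned here, an abstract method that still carries a reusable base implementation, can be sketched in plain Python. The classes ExternalTool and MyTool are hypothetical stand-ins and do not reproduce AExternalHLATyping.

```python
import abc


class ExternalTool(abc.ABC):
    """Hypothetical base class illustrating an abstract method with a body."""

    @abc.abstractmethod
    def get_external_version(self, path=None):
        # Default behaviour that subclasses may reuse through super()
        return "unknown" if path is None else "unknown ({})".format(path)


class MyTool(ExternalTool):
    def get_external_version(self, path=None):
        # The override is mandatory, but it can still delegate to the base class
        return "MyTool " + super().get_external_version(path)


version = MyTool().get_external_version()
```

Declaring the method abstract prevents instantiating a wrapper that forgot to define it, while the base body keeps the common logic in one place.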
-
is_in_path
()¶ Checks whether the specified execution command can be found in PATH
Returns: Whether or not command could be found in PATH Return type: bool
-
name
¶ The name of the predictor
-
parse_external_result
(output)¶ Searches within the defined dir _file for the newest dir and reads the prediction file from there
Parameters: output (str) – The path to the output dir Returns: The predicted HLA genotype Return type: list( Allele
)
-
predict
(ngsFile, output, command=None, options=None, delete=True, **kwargs)¶ Implementation of prediction
Parameters: - ngsFile (str) – The path to the NGS file of interest
- output (str) – The path to the output file or directory
- command (str) – The path to an alternative binary (if the binary is not globally executable)
- options (str) – A string with additional options that is passed directly to the tool
- delete (bool) – Boolean indicator whether generated files should be deleted afterwards
Returns: A list of
Allele
objects representing the most likely HLA genotype Return type: list(
Allele
)
-
version
¶ The version of the predictor
-
-
class
Fred2.HLAtyping.External.
Seq2HLA_2_2
¶ Bases:
Fred2.HLAtyping.External.AExternalHLATyping
Wrapper of seq2HLA v2.2.
Note
Boegel, S., Scholtalbers, J., Loewer, M., Sahin, U., & Castle, J. C. (2015). In Silico HLA Typing Using Standard RNA-Seq Sequence Reads. Molecular Typing of Blood Cell Antigens, 247.
-
clean_up
(output)¶ Deletes all created files.
Parameters: output (str) – The path to the output file or directory of the programme
-
command
¶ Defines the commandline call for external tool
-
get_external_version
(path=None)¶ Returns the external version of the tool by executing >{command} --version
The call might depend on the method, so this function is declared abstract to force subclasses to override it. The function in the base class can be called with super()
Parameters: path (str) – Optional path to the executable if it differs from self.__command Returns: The external version of the tool or None if the tool does not support versioning Return type: str
-
is_in_path
()¶ Checks whether the specified execution command can be found in PATH
Returns: Whether or not command could be found in PATH Return type: bool
-
name
¶ The name of the predictor
-
parse_external_result
(output)¶ Searches within the defined dir _file for the newest dir and reads the prediction file from there
Parameters: output (str) – The path to the output dir Returns: The predicted HLA genotype Return type: list( Allele
)
-
predict
(ngsFile, output, command=None, options=None, delete=True, **kwargs)¶ Implementation of prediction
Parameters: - ngsFile (str) – The path to the NGS file of interest
- output (str) – The path to the output file or directory
- command (str) – The path to an alternative binary (if the binary is not globally executable)
- options (str) – A string with additional options that is passed directly to the tool
- delete (bool) – Boolean indicator whether generated files should be deleted afterwards
Returns: A list of
Allele
objects representing the most likely HLA genotype Return type: list(
Allele
)
-
version
¶ The version of the predictor
-