Module implementing the TFFMs.
platform:  Unix 

synopsis:  Define the class representing the Transcription Factor Flexible Models and the necessary functions to manipulate them. 
todo:  Allow the construction of TFFMs using a different de novo motif finding tool than MEME. 
Bases: ghmm.DiscreteEmissionHMM
Define the Transcription Factor Flexible Models.
Note
Instances of this class have to be created through the functions tffm_from_xml() or tffm_from_meme().
Delete the underlying C structures.
Note:  The destruction is made using the ghmm.DiscreteEmissionHMM destructor. 

Construct an instance of the TFFM class.
Parameters: 


Raises:  exceptions.TFFMKindError when the given kind is neither ‘1st-order’ nor ‘detailed’. 
Give the length of the TFFM, i.e. the number of nucleotides in the model excluding the background.
Compute the TFBS hits on a sequence given the posterior probabilities and construct the corresponding instances of HIT.
Parameters: 


Returns:  The list of TFBS hits predicted on the sequence strand. 
Return type:  list of HIT 
Predict TFBS hits in the sequence given the TFFM.
Parameters: 


Returns:  The list of TFBS hits predicted on the sequence strand. 
Return type:  list of HIT 
Get the posterior probabilities at each nucleotide position given the TFFM.
Parameters:  sequence_split (list) – The sequence split into subsequences so that non-ACGT nucleotides are not considered. 

Returns:  The posterior probabilities at each position of the sequence. 
Return type:  list of list 
Note:  One example of a sequence_split is [“ACT”, “N”, “ATC”]. 
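A minimal sketch of how a sequence could be cut into such runs. It assumes that every maximal run of characters other than A, C, G and T becomes its own chunk, and it upper-cases the input first (an assumption about soft-masked sequences). split_on_non_acgt is a hypothetical helper, not part of the TFFM module.

```python
import re


def split_on_non_acgt(sequence):
    """Split a DNA sequence into maximal runs of ACGT and non-ACGT characters.

    Hypothetical helper illustrating the shape of sequence_split.
    """
    # Upper-casing is an assumption: soft-masked bases are treated as ACGT.
    return re.findall(r"[ACGT]+|[^ACGT]+", sequence.upper())


print(split_on_non_acgt("ACTNATC"))  # ['ACT', 'N', 'ATC']
```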
Return the new trimmed HMM.
Parameters: 


Returns:  The new trimmed HMM. 
Return type:  ghmm.DiscreteEmissionHMM 
Todo:  Raise an error rather than a sys.exit() when the trimmed HMM becomes empty. 
Return the emission probabilities of the nucleotides in the background state.
Returns:  A dictionary with characters ‘A’, ‘C’, ‘G’, and ‘T’ as keys and the corresponding probabilities as values. 

Return type:  dict 
Give the list of final states in the HMM (i.e. corresponding to the last matching position in the TFFM).
Returns:  A list of final states as int. 

Return type:  list 
Get the emission probabilities of ACGT at the position given by position and update the emission probabilities in position_proba given the emission probabilities at the previous position (previous_position_proba).
Note:  This function is applied state by state. Since several states represent the same position in a detailed TFFM, the probabilities listed in position_proba have to be updated at each call. 

Parameters: 

Returns:  The emission probabilities of ACGT for the state indexed by index at the position given by position in the TFFM. 
Return type:  list 
Give the information content of the whole TFFM.
Returns:  A float corresponding to the information content of the TFFM. 

Return type:  float 
Give the position of the first matching state.
Returns:  The position of the first matching state of the TFFM. 

Return type:  float 
Warning:  The position is 0-based. 
Give the information content for every position of the motif modeled by the TFFM.
Returns:  A list of floats giving the information contents of the positions. 

Return type:  list 
Note:  The output is an ordered list following the order of the positions within the motif. 
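The sketch below illustrates per-position information content, assuming the usual definition for DNA, IC = 2 − H, where H is the base-2 Shannon entropy of the A, C, G, T probabilities at that position, and taking the information content of the whole motif as the sum over positions. Each position is given directly as a hypothetical [A, C, G, T] probability list; in a detailed TFFM these probabilities would first have to be combined over the states representing the position, which is not shown here.

```python
import math


def entropy_bits(emissions):
    # Shannon entropy in bits; zero probabilities contribute nothing.
    return -sum(p * math.log2(p) for p in emissions if p > 0)


def positions_information_content(position_emissions):
    # Maximal entropy for 4 letters is log2(4) = 2 bits, so IC = 2 - H.
    return [2.0 - entropy_bits(probs) for probs in position_emissions]


motif = [
    [1.0, 0.0, 0.0, 0.0],      # fully conserved A
    [0.25, 0.25, 0.25, 0.25],  # uniform, no information
    [0.5, 0.0, 0.5, 0.0],      # A or G
]
ics = positions_information_content(motif)
print(ics)       # [2.0, 0.0, 1.0]
print(sum(ics))  # 3.0, the assumed total for the whole motif
```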
Get the first and last significant positions of the TFFM, where insignificant positions are those on the edges with low information content.
Parameters:  threshold (float) – The minimal information content to consider a position to be significant. 

Returns:  The first and last positions to be considered significant (in this order). 
Return type:  tuple 
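A sketch of the edge test, assuming a position counts as significant when its information content is at least threshold and that only the edges are examined, so a low-information position lying between two significant ones is kept. significant_bounds is hypothetical and returns None when no position qualifies.

```python
def significant_bounds(ics, threshold):
    """Return the (first, last) 0-based indices whose IC reaches threshold.

    Hypothetical helper: interior low-IC positions are not removed, only
    the edges outside the returned bounds. Returns None if nothing qualifies.
    """
    significant = [i for i, ic in enumerate(ics) if ic >= threshold]
    if not significant:
        return None
    return significant[0], significant[-1]


ics = [0.1, 0.3, 1.8, 0.2, 1.5, 0.05]
print(significant_bounds(ics, 0.5))  # (2, 4)
```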
Return a copy of the current TFFM trimmed by removing edges with low information content.
Parameters: 


Returns:  A trimmed copy of the current TFFM. 
Return type:  TFFM 
See also: 
Apply the TFFM to the FASTA sequences and return the Pocc value (probability of occupancy) of each sequence.
Parameters: 


Returns:  Pocc values through a generator. 
Return type:  Generator of HIT 
Note:  The threshold must satisfy 0.0 <= threshold <= 1.0. 
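One common way to turn per-position binding probabilities into a probability of occupancy is 1 − ∏(1 − p_i), the probability that at least one position is bound when positions are treated as independent. The sketch below assumes that formulation; the exact computation performed by the TFFM is not stated here, and probability_of_occupancy is a hypothetical helper. It needs Python 3.8 or later for math.prod.

```python
from math import prod


def probability_of_occupancy(hit_probabilities):
    # Assumed formulation: P(at least one bound position) = 1 - prod(1 - p_i),
    # treating positions as independent. None entries (no hit) are skipped.
    return 1.0 - prod(1.0 - p for p in hit_probabilities if p is not None)


print(probability_of_occupancy([0.5, None, 0.5]))  # 0.75
```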
Print the SVG code of the corresponding dense logo (i.e. displaying the dinucleotide dependencies captured by the TFFM).
Parameters:  output (file) – Stream to which the SVG is written (default: sys.stdout). 

Note:  The output argument is not a file name but it is an already open file stream. 
Print the SVG code of the corresponding summary logo (i.e. similar to a regular sequence logo).
Parameters:  output (file) – Stream to which the SVG is written (default: sys.stdout). 

Note:  The output argument is not a file name but it is an already open file stream. 
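The logo printers expect an already open stream rather than a file name. The hypothetical print_svg_logo below mimics that convention with a minimal SVG so that the calling pattern can be shown: writing to an in-memory io.StringIO buffer, or to a file opened by the caller.

```python
import io
import sys


def print_svg_logo(svg_body, output=None):
    """Write a minimal SVG document to an already open text stream.

    Hypothetical stand-in for the logo printers: it takes a stream
    (default: sys.stdout), never a file name.
    """
    # Resolve the default at call time so a redirected sys.stdout is honoured.
    if output is None:
        output = sys.stdout
    output.write('<svg xmlns="http://www.w3.org/2000/svg">\n')
    output.write(svg_body + "\n")
    output.write("</svg>\n")


# Write into an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
print_svg_logo('<rect width="10" height="10"/>', output=buffer)
print(buffer.getvalue())

# For a real file, open it first and pass the stream:
# with open("logo.svg", "w") as handle:
#     print_svg_logo('<rect width="10" height="10"/>', output=handle)
```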
Apply the TFFM to the FASTA sequence and return the TFBS hits.
Parameters: 


Returns:  TFBS hits. 
Return type:  list of HIT 
Note:  The threshold must satisfy 0.0 <= threshold <= 1.0. 
Apply the TFFM to the FASTA sequences and return the TFBS hits.
Parameters: 


Returns:  TFBS hits through a generator. 
Return type:  Generator of HIT 
Note:  The threshold must satisfy 0.0 <= threshold <= 1.0. 
Train the TFFM on the FASTA sequences to learn the emission and transition probabilities.
Note:  The underlying HMM is trained using the Baum-Welch algorithm. 

Parameters: 

Trim the current TFFM by removing edges with low information content.
Parameters:  threshold (float) – The minimal information content for a match position on an edge of the TFFM to be kept. 

Warning:  Trims the TFFM in place. To preserve the TFFM, use the get_trimmed() method which returns a trimmed copy of the TFFM but does not alter this TFFM. 
See also:  get_trimmed() 
Give the best hit in a sequence by considering both positive and negative strands.
Parameters: 


Returns:  The best hit (None if no hit). 
Return type:  HIT 
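A sketch of choosing the best hit across strands, assuming hits are hypothetical (position, strand, score) tuples, None stands for "no hit", and the higher score wins. The reverse_complement helper shows how the negative strand would be obtained before scanning; both functions are illustrative, not the module's API.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    # Complement each base, then reverse to read the negative strand 5'->3'.
    return sequence.upper().translate(COMPLEMENT)[::-1]


def best_hit(hits_pos, hits_neg):
    """Return the highest-scoring hit over both strands, or None if no hit."""
    candidates = [h for h in list(hits_pos) + list(hits_neg) if h is not None]
    if not candidates:
        return None
    # max() keeps the first maximum, so the positive strand wins ties.
    return max(candidates, key=lambda hit: hit[2])


print(reverse_complement("AACG"))                                # CGTT
print(best_hit([(1, "+", 0.4), None], [(3, "-", 0.9), None]))    # (3, '-', 0.9)
```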
Compute the entropy given the emission probabilities of the ACGT nucleotides.
Parameters:  emissions (list of float) – Emission probabilities of the ACGT nucleotides. 

Returns:  The computed entropy. 
Return type:  float 
Warning:  The list gives the probabilities corresponding to A, C, G, and T in this order. 
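A sketch of the entropy computation, assuming a base-2 logarithm (so a uniform distribution over A, C, G and T gives 2 bits) and the usual convention that zero probabilities contribute nothing. The function mirrors the description above but is not claimed to be the module's implementation.

```python
import math


def entropy(emissions):
    """Shannon entropy in bits of the [A, C, G, T] emission probabilities."""
    if len(emissions) != 4:
        raise ValueError("expected probabilities for A, C, G and T, in this order")
    # 0 * log2(0) is taken as 0, so zero probabilities are skipped.
    return -sum(p * math.log2(p) for p in emissions if p > 0)


print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(entropy([0.5, 0.5, 0.0, 0.0]))      # 1.0
```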
Create a 0-order HMM initialized from the MEME result.
Parameters: 


Returns:  The constructed HMM 
Return type:  ghmm.DiscreteEmissionHMM 
Create a 1st-order HMM initialized from the MEME result.
Parameters: 


Returns:  The constructed HMM 
Return type:  ghmm.DiscreteEmissionHMM 
Create a detailed HMM initialized from the MEME result.
Parameters: 


Returns:  The constructed HMM 
Return type:  ghmm.DiscreteEmissionHMM 
Merge the hits from both strands.
Parameters: 


Returns:  A list containing the TFBS hits (empty if no hit). 
Return type:  list 
Note:  The two input lists are required to be ordered following the positions on the sequence. The best hit per position is given. When no hit has been found at a position, the constant None is used. 
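A sketch of the per-position merge described in the note, assuming both lists are aligned position by position (same length), hits are hypothetical (position, strand, score) tuples, and the positive-strand hit is kept on ties. Positions without a hit on either strand stay None in this sketch.

```python
def merge_hits(hits_pos, hits_neg):
    """Keep the better hit at each position of two position-aligned lists."""
    if len(hits_pos) != len(hits_neg):
        raise ValueError("both strands must cover the same positions")
    merged = []
    for pos_hit, neg_hit in zip(hits_pos, hits_neg):
        if pos_hit is None:
            merged.append(neg_hit)  # may itself be None: no hit on either strand
        elif neg_hit is None or pos_hit[2] >= neg_hit[2]:
            merged.append(pos_hit)  # positive strand wins ties
        else:
            merged.append(neg_hit)
    return merged


plus = [(0, "+", 0.9), None, None]
minus = [(0, "-", 0.5), (1, "-", 0.7), None]
print(merge_hits(plus, minus))  # [(0, '+', 0.9), (1, '-', 0.7), None]
```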
Construct a TFFM from the output of MEME on ChIP-seq data.
Parameters: 


Returns:  The TFFM initialized from MEME results. 
Return type:  TFFM 
Note:  As the PFM is used to initialize the TFFM, a pseudocount of 1 is added to all the values in the PFM 
Construct an initialized TFFM from a PFM.
Parameters: 


Returns:  The TFFM initialized from the PFM 
Return type:  TFFM 
See also:  
Note:  As the PFM is used to initialize the TFFM, a pseudocount of 1 is added to all the values in the PFM 
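A sketch of turning PFM counts into emission probabilities with the pseudocount of 1 mentioned in the note. The PFM layout (one {base: count} dict per position) and pfm_to_probabilities are assumptions for illustration only.

```python
def pfm_to_probabilities(pfm, pseudocount=1.0):
    """Convert a PFM (one {base: count} dict per position) to probabilities.

    The pseudocount is added to every count before normalising, so no
    base ends up with a probability of 0.
    """
    probabilities = []
    for counts in pfm:
        padded = {base: counts.get(base, 0) + pseudocount for base in "ACGT"}
        total = sum(padded.values())
        probabilities.append({base: c / total for base, c in padded.items()})
    return probabilities


pfm = [{"A": 6, "C": 0, "G": 0, "T": 0}, {"A": 1, "C": 1, "G": 2, "T": 2}]
print(pfm_to_probabilities(pfm)[0])  # {'A': 0.7, 'C': 0.1, 'G': 0.1, 'T': 0.1}
```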
Construct a TFFM described in an XML file.
Parameters: 


Returns:  The TFFM described in the XML file. 
Return type:  TFFM
Module author: Anthony Mathelier <amathelier@cmmt.ubc.ca>