Frustratometer classes
The frustratometer package provides a handful of classes used to encapsulate the data.
- class frustratometer.Structure(pdb_file: Path | str, chain: str | None = None, seq_selection: str = None, aligned_sequence: str = None, filtered_aligned_sequence: str = None, distance_matrix_method: str = 'CB', pdb_directory: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/frustratometer/checkouts/latest/docs'), repair_pdb: bool = True)
Bases: object
- classmethod full_pdb(pdb_file: Path | str, chain: str | None = None, aligned_sequence: str = None, filtered_aligned_sequence: str = None, distance_matrix_method: str = 'CB', pdb_directory: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/frustratometer/checkouts/latest/docs'), repair_pdb: bool = True)
- classmethod spliced_pdb(pdb_file: Path | str, chain: str | None = None, seq_selection: str = None, aligned_sequence: str = None, filtered_aligned_sequence: str = None, distance_matrix_method: str = 'CB', pdb_directory: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/frustratometer/checkouts/latest/docs'), repair_pdb: bool = True)
- class frustratometer.AWSEM(pdb_structure: object, sequence: str = None, expose_indicator_functions: bool = False, **parameters)
Bases: Frustratometer
- q = 20
- aa_map_awsem_list = [0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18]
- aa_map_awsem_x = array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4], [ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3], [ 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6], [13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13], [ 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7], [ 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8], [ 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9], [11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11], [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10], [12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12], [ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14], [ 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5], [ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15], [16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16], [19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19], [17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17], [18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18]])
- aa_map_awsem_y = array([[ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18], [ 0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18]])
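Reading the values listed above, the two 21 x 21 arrays look like row-wise and column-wise broadcasts of `aa_map_awsem_list`: every entry in row i of `aa_map_awsem_x` equals the i-th list entry, and every row of `aa_map_awsem_y` is the list itself. A minimal plain-Python sketch of that relationship follows; the helper name `index_grids` is hypothetical and not part of the package.

```python
# Values copied from the attribute listing above.
aa_map_awsem_list = [0, 0, 4, 3, 6, 13, 7, 8, 9, 11, 10,
                     12, 2, 14, 5, 1, 15, 16, 19, 17, 18]


def index_grids(index_list):
    """Return (x, y) grids with x[i][j] = index_list[i] and y[i][j] = index_list[j]."""
    n = len(index_list)
    x = [[index_list[i]] * n for i in range(n)]
    y = [list(index_list) for _ in range(n)]
    return x, y


grid_x, grid_y = index_grids(aa_map_awsem_list)
```

With NumPy, `np.meshgrid(aa_map_awsem_list, aa_map_awsem_list, indexing='ij')` yields the same pair of grids. Such index grids are presumably used to reorder both axes of an amino-acid interaction table in a single indexing step, although that usage is not stated here.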
- auc()
Computes area under the curve of the receiver-operating characteristic. A higher AUC value (maximum=1) indicates that the TPR is always high, regardless of FPR.
- couplings_energy(sequence: str = None, ignore_couplings_of_gaps: bool = False) float
Computes the couplings energy of a protein sequence based on a given Potts model and an interaction mask.
\[E = \frac{1}{2} \sum_{i,j} J_{ij} \Theta_{ij}\]
- Parameters:
sequence (str) – The amino acid sequence of the protein. If no sequence is provided as input, the original protein sequence of the protein structure object is used for the energy calculation.
ignore_couplings_of_gaps (bool) – If True, couplings involving gaps (‘-’) in the sequence are set to 0 in the energy calculation. Default is False.
- Returns:
couplings_energy – The computed couplings energy of the protein sequence.
- Return type:
float
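A minimal sketch of the couplings formula above for a toy model, written in plain Python rather than with the package itself. It assumes that J_ij stands for the coupling evaluated at the residues found at positions i and j, and that Θ_ij is a 0/1 interaction mask. The nested-list layout `J[i][j][a][b]` and the name `toy_couplings_energy` are illustrative choices, not the library's data structures, and the library's sign convention is not restated here.

```python
def toy_couplings_energy(seq_idx, J, mask):
    """E = 1/2 * sum over (i, j) of J[i][j][a_i][a_j] * mask[i][j]."""
    L = len(seq_idx)
    total = 0.0
    for i in range(L):
        for j in range(L):
            total += J[i][j][seq_idx[i]][seq_idx[j]] * mask[i][j]
    return 0.5 * total


# Toy model: 3 positions, 2-letter alphabet, all couplings zero except two pairs.
L, q = 3, 2
J = [[[[0.0] * q for _ in range(q)] for _ in range(L)] for _ in range(L)]
J[0][2][1][0] = J[2][0][0][1] = -1.5  # symmetric coupling between positions 0 and 2
J[0][1][1][1] = J[1][0][1][1] = 0.4   # symmetric coupling between positions 0 and 1
mask = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # 1 where the pair is counted
seq = [1, 1, 0]                           # sequence encoded as alphabet indices

energy = toy_couplings_energy(seq, J, mask)

# Masking out the (0, 2) pair removes its contribution.
mask_without_02 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
energy_masked = toy_couplings_energy(seq, J, mask_without_02)
```

The factor 1/2 compensates for each pair being visited twice, once as (i, j) and once as (j, i).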
- decoy_energy(kind: str = 'singleresidue', sequence: str = None) array
Computes all possible decoy energies.
- Parameters:
sequence (str) – The amino acid sequence of the protein. The sequence is assumed to be in one-letter code. Gaps are represented as ‘-’. The length of the sequence (L) should match the dimensions of the Potts model.
kind (str) – Kind of decoys generated. Options: “singleresidue,” “mutational,” “configurational,” and “contact.”
- Returns:
decoy_energy – Matrix describing all possible decoy energies.
- Return type:
np.array
- decoy_fluctuation(sequence: str = None, kind: str = 'singleresidue', mask: array = None) array
Computes a matrix for a sequence of length L that describes all possible changes in energy upon mutating a single residue or a pair of residues simultaneously (depending on the “kind” entry used).
- Parameters:
sequence (str) – The amino acid sequence of the protein. If no sequence is provided as input, the original protein sequence of the protein structure object is used for the energy calculation.
kind (str) – Kind of decoys generated. Options: “singleresidue,” “mutational,” “configurational,” and “contact.”
mask (np.array) – A 2D Boolean array that determines which residue pairs should be considered in the energy computation. The mask should have dimensions (L, L), where L is the length of the sequence.
- Returns:
fluctuation – Matrix of energy changes upon mutation (the decoy fluctuations).
- Return type:
np.array
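To illustrate what a single-residue decoy matrix contains, the sketch below computes, for a toy model, the energy change for every position i and every alternative letter a. The energy function, the alphabet size and the nested-list layout are assumptions made for the example. The library's own implementation may be organised quite differently.

```python
def toy_energy(seq_idx, h, J):
    """Toy Potts energy: sum_i h[i][a_i] + 1/2 * sum_{i != j} J[i][j][a_i][a_j]."""
    L = len(seq_idx)
    e = sum(h[i][seq_idx[i]] for i in range(L))
    e += 0.5 * sum(J[i][j][seq_idx[i]][seq_idx[j]]
                   for i in range(L) for j in range(L) if i != j)
    return e


def toy_single_residue_fluctuation(seq_idx, h, J, q):
    """Return an L x q matrix with dE[i][a] = E(position i set to a) - E(native)."""
    native = toy_energy(seq_idx, h, J)
    matrix = []
    for i in range(len(seq_idx)):
        row = []
        for a in range(q):
            mutant = list(seq_idx)
            mutant[i] = a  # substitute letter a at position i
            row.append(toy_energy(mutant, h, J) - native)
        matrix.append(row)
    return matrix


# Toy model: 2 positions, 3-letter alphabet.
L, q = 2, 3
h = [[0.0, 1.0, 2.0], [0.5, 0.0, -0.5]]
J = [[[[0.0] * q for _ in range(q)] for _ in range(L)] for _ in range(L)]
J[0][1][0][1] = J[1][0][1][0] = -1.0
seq = [0, 1]

dE = toy_single_residue_fluctuation(seq, h, J, q)
```

The entry for the native letter at each position is zero by construction. Pair decoys would extend the same idea to two positions mutated at once.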
- fields_energy(sequence: str = None, ignore_fields_of_gaps: bool = False) float
Computes the fields energy of a protein sequence.
\[E = \sum_i h_i\]
- Parameters:
sequence (str) – The amino acid sequence of the protein. If no sequence is provided as input, the original protein sequence of the protein structure object is used for the energy calculation.
ignore_fields_of_gaps (bool) – If True, fields corresponding to gaps (‘-’) in the sequence are set to 0 in the energy calculation. Default is False.
- Returns:
fields_energy – The computed fields energy of the protein sequence.
- Return type:
float
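The fields sum and the effect of `ignore_fields_of_gaps` can be sketched in a few lines of plain Python. The alphabet ordering (gap first) and the list-of-lists layout `h[i][a]` are assumptions for this example, and `toy_fields_energy` is a hypothetical name rather than the package's method.

```python
GAP = '-'
ALPHABET = '-ACDEFGHIKLMNPQRSTVWY'  # assumed ordering: gap first, then 20 amino acids


def toy_fields_energy(sequence, h, ignore_fields_of_gaps=False):
    """E = sum_i h[i][a_i], optionally skipping positions that hold a gap."""
    total = 0.0
    for i, residue in enumerate(sequence):
        if ignore_fields_of_gaps and residue == GAP:
            continue  # the field of a gap contributes 0
        total += h[i][ALPHABET.index(residue)]
    return total


# Toy fields: h[i][a] = i + a for 3 positions and 21 letters.
h = [[float(i + a) for a in range(len(ALPHABET))] for i in range(3)]

with_gaps = toy_fields_energy('A-C', h)
without_gaps = toy_fields_energy('A-C', h, ignore_fields_of_gaps=True)
```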
- frustration(sequence: str = None, kind: str = 'singleresidue', mask: array = None, aa_freq: array = None, correction: int = 0) array
Calculates frustration index values.
- Parameters:
sequence (str) – The amino acid sequence of the protein. The sequence is assumed to be in one-letter code. Gaps are represented as ‘-’. The length of the sequence (L) should match the dimensions of the Potts model.
kind (str) – Kind of decoys generated. Options: “singleresidue,” “mutational,” “configurational,” and “contact.”
mask (np.array) – A 2D Boolean array that determines which residue pairs should be considered in the energy computation. The mask should have dimensions (L, L), where L is the length of the sequence.
aa_freq (np.array) – Array of frequencies of all 21 possible amino acids within sequence
- Returns:
frustration_values – Frustration index values.
- Return type:
np.array
- generate_frustration_pair_distribution(sequence: str = None, kind: str = 'singleresidue', bins: int = 30, maximum_shell_radius: int = 20)
Calculates frustration pair distributions. This helps identify spatial proximity of similarly frustrated residues or contacts from one another.
For mutational, configurational, and contact frustration pair distributions, the distances between midpoints of Cb-Cb (or Ca in the case of glycine) atom pairs are measured. For single residue frustration, the distances of Cb (or Ca in the case of glycine) atoms are measured.
- Parameters:
sequence (str) – The amino acid sequence of the protein. The sequence is assumed to be in one-letter code. Gaps are represented as ‘-’. The length of the sequence (L) should match the dimensions of the Potts model.
aa_freq (np.array) – Array of frequencies of all 21 possible amino acids within sequence
kind (str) – Kind of decoys generated. Options: “singleresidue,” “mutational,” “configurational,” and “contact.”
bins (int) – Number of bins
maximum_shell_radius (int) – Maximum shell radius to evaluate
- Returns:
minimally_frustrated_gr (np.array) – Pair distribution function of minimally frustrated contacts
frustrated_gr (np.array) – Pair distribution function of frustrated contacts
neutral_gr (np.array) – Pair distribution function of neutral contacts
r_m (np.array) – Array of midpoints between evaluated spherical shells
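A pair distribution function of this kind can be sketched as a histogram of pairwise distances over concentric spherical shells, normalised by shell volume. The example below is a generic illustration in plain Python. The reference density used for normalisation is an assumed convention, and `toy_pair_distribution` is a hypothetical helper, so values from the package may be scaled differently.

```python
import math


def toy_pair_distribution(points, bins=30, maximum_shell_radius=20.0):
    """Histogram pairwise distances into spherical shells, normalised by shell volume."""
    width = maximum_shell_radius / bins
    counts = [0] * bins
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if d < maximum_shell_radius:
                # Each pair is counted once from each of its two members.
                counts[min(int(d / width), bins - 1)] += 2
    # Assumed reference density: all n points spread over the largest sphere.
    density = n / (4.0 / 3.0 * math.pi * maximum_shell_radius ** 3)
    g_r, r_m = [], []
    for k in range(bins):
        r_in, r_out = k * width, (k + 1) * width
        shell_volume = 4.0 / 3.0 * math.pi * (r_out ** 3 - r_in ** 3)
        g_r.append(counts[k] / (n * density * shell_volume) if n else 0.0)
        r_m.append(0.5 * (r_in + r_out))  # midpoint of the shell
    return g_r, r_m


# Three toy points standing in for residue (or contact-midpoint) coordinates.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
g_r, r_m = toy_pair_distribution(points, bins=4, maximum_shell_radius=4.0)
```

In the documented method the points would be grouped by frustration class first, giving one such curve each for minimally frustrated, frustrated and neutral residues or contacts.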
- native_energy(sequence: str = None, ignore_couplings_of_gaps: bool = False, ignore_fields_of_gaps: bool = False) float
Calculates the native energy of the protein sequence.
- Parameters:
sequence (str) – The amino acid sequence of the protein. If no sequence is provided as input, the original protein sequence of the protein structure object is used for the energy calculation.
ignore_couplings_of_gaps (bool) – If set to True, the couplings terms of any gaps in the protein sequence are ignored in energy calculations.
ignore_fields_of_gaps (bool) – If set to True, the fields terms of any gaps in the protein sequence are ignored in energy calculations.
- Returns:
energy_value – Native energy of sequence
- Return type:
float
- plot_decoy_energy(sequence: str = None, kind: str = 'singleresidue', method: str = 'clustermap')
Plot comparison of single residue decoy energies, relative to the native energy
- Parameters:
sequence (str) – The amino acid sequence of the protein. The sequence is assumed to be in one-letter code. Gaps are represented as ‘-’. The length of the sequence (L) should match the dimensions of the Potts model.
kind (str) – Kind of decoys generated. Options: “singleresidue,” “mutational,” “configurational,” and “contact.”
method (str) – Options: “clustermap”, “heatmap”
- plot_roc()
Plots the curve of the receiver-operating characteristic.
- roc()
Computes Receiver Operating Characteristic (ROC) curve of contacts predicted by DCA and true contacts, as identified from the distance matrix.
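The quantities behind `roc()` and `auc()` can be illustrated without the package. The sketch below assumes the predicted contact scores and the true contact labels have already been flattened into two equal-length lists. The function names are hypothetical, and tied scores are not given any special treatment.

```python
def toy_roc_curve(scores, labels):
    """Return (fpr, tpr) lists obtained by lowering a threshold over the scores."""
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    fpr, tpr = [0.0], [0.0]
    for k in order:
        if labels[k]:
            tp += 1  # a true contact is recovered
        else:
            fp += 1  # a non-contact is predicted as a contact
        fpr.append(fp / negatives)
        tpr.append(tp / positives)
    return fpr, tpr


def toy_auc(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
               for k in range(1, len(fpr)))


scores = [0.9, 0.8, 0.35, 0.3, 0.1]  # predicted contact scores
labels = [1, 1, 0, 1, 0]             # 1 = true contact, 0 = no contact
fpr, tpr = toy_roc_curve(scores, labels)
area = toy_auc(fpr, tpr)
```

An area of 1 means every true contact is ranked above every non-contact, while 0.5 corresponds to a random ranking.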
- scores()
Computes accuracy of DCA predicted contacts by calculating contact scores based on the Frobenius norm
- Returns:
corr_norm – Contact score matrix (N x N)
- Return type:
np.array
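A Frobenius-norm contact score reduces each q x q coupling block to a single number per residue pair. The sketch below shows that reduction for a toy model, followed by the average product correction that is commonly applied in DCA workflows. Whether `scores()` applies this exact correction is an assumption suggested only by the name `corr_norm`. Both function names are hypothetical.

```python
import math


def toy_frobenius_scores(J):
    """F[i][j] = sqrt(sum over a, b of J[i][j][a][b] ** 2), with a zero diagonal."""
    L = len(J)
    return [[0.0 if i == j else math.sqrt(sum(x * x for row in J[i][j] for x in row))
             for j in range(L)] for i in range(L)]


def toy_average_product_correction(F):
    """Subtract row_mean[i] * row_mean[j] / overall_mean (F is assumed symmetric)."""
    L = len(F)
    row_mean = [sum(F[i]) / L for i in range(L)]
    overall_mean = sum(row_mean) / L
    if overall_mean == 0:
        return [row[:] for row in F]
    return [[0.0 if i == j else F[i][j] - row_mean[i] * row_mean[j] / overall_mean
             for j in range(L)] for i in range(L)]


# Toy model: 3 positions, 2-letter alphabet, one coupled pair.
L, q = 3, 2
J = [[[[0.0] * q for _ in range(q)] for _ in range(L)] for _ in range(L)]
J[0][1] = [[3.0, 0.0], [0.0, 4.0]]
J[1][0] = [[3.0, 0.0], [0.0, 4.0]]

F = toy_frobenius_scores(J)
F_corrected = toy_average_product_correction(F)
```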
- sequences_energies(sequences: array, split_couplings_and_fields: bool = False)
Computes the energy of multiple protein sequences.
\[E = \sum_i h_i + \frac{1}{2} \sum_{i,j} J_{ij} \Theta_{ij}\]
- Parameters:
sequences (list) – List of amino acid sequences, each given as a string. The sequences are assumed to be in one-letter code. Gaps are represented as ‘-’. The length of each sequence (L) should match the dimensions of the Potts model.
split_couplings_and_fields (bool) – If True, two lists of the sequences’ couplings and fields energies are returned. Default is False.
- Returns:
output (if split_couplings_and_fields==False) (float) – The computed energies of the protein sequences
output (if split_couplings_and_fields==True) (np.array) – Array containing computed fields and couplings energies of the protein sequences.
- sliding_window(win_size: int = 5, ndecoys: int = 1000, config_decoys: bool = False) dict
Computes the total frustration, the native energy, the decoy average energy and the decoy standard deviation for fragments on a sliding window
- Parameters:
win_size (int) – Size of the sliding window
ndecoys (int) – Number of decoy sequences to use
config_decoys (bool) – If True, use the configurational decoys approximation, shuffling index positions for configurational decoys energy calculation. If False, mutational decoys.
- Returns:
results – Dictionary with the results, containing:
‘fragment_center’: center position of each window
‘win_size’: size of the sliding windows
‘native_energy’: native energy for each window
‘decoy_energy_av’: decoy energy average for each window
‘decoy_energy_std’: decoy energy standard deviation for each window
‘frustration’: total frustration index for each window
- Return type:
dict
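The windowing itself can be sketched independently of the energy calculation. The example below enumerates every full window of `win_size` consecutive positions and reports a toy per-window statistic in a dictionary. How the package treats the sequence ends and defines the window center is not stated here, so those details, the `window_mean` key and the function names are assumptions.

```python
def toy_sliding_windows(sequence_length, win_size=5):
    """Yield (center, positions) for every full window of consecutive positions."""
    for start in range(sequence_length - win_size + 1):
        positions = list(range(start, start + win_size))
        yield start + win_size // 2, positions


def toy_window_means(values, win_size=5):
    """Toy per-window statistic: the mean of `values` inside each window."""
    windows = list(toy_sliding_windows(len(values), win_size))
    return {
        'fragment_center': [center for center, _ in windows],
        'win_size': win_size,
        # Stand-in for the per-window energies and frustration values.
        'window_mean': [sum(values[p] for p in positions) / win_size
                        for _, positions in windows],
    }


result = toy_window_means([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], win_size=3)
```

In the documented method each window would instead be passed on as a fragment for which native and decoy energies are evaluated.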
- total_frustration(n_decoys: int = 1000, config_decoys: bool = False, msa_mask: int | array = 1, fragment_pos: None | array = None, fragment_in_context: bool = False, output_kind: str = 'frustration') float | array
Calculates the total frustration of a protein fragment.
- Parameters:
n_decoys (int) – Number of sequence decoys to create
config_decoys (bool) – If True, use the configurational decoys approximation, shuffling index positions for configurational decoys energy calculation. If False, mutational decoys.
msa_mask (np.array) – Extra mask to apply when the Multiple Sequence Alignment does not completely cover the reference PDB
fragment_pos (np.array) – Fragment positions. If None, use the complete model
fragment_in_context (bool) – If True, the energetics calculations take into account the interactions between the fragment and other sequence positions
output_kind (str) – If ‘frustration’, returns frustration. If not, returns native energy, decoy energy average and decoy energy standard deviation.
- Returns:
total_frustration (float) – Total frustration of the fragment or complete protein
native_energy (float) – Native energy of the given sequence
decoy_energy_average (float) – Average of the decoy energy distribution
decoy_energy_std (float) – Standard deviation of the decoy energy distribution
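The four returned quantities are related in the usual way for a frustration index: the native energy is compared with the mean of a decoy energy distribution, in units of its standard deviation. The sketch below shows that relationship with shuffled-sequence decoys and a toy energy function. The sign convention (decoy mean minus native energy), the shuffling scheme and all names are assumptions made for this example and may differ from the package.

```python
import random
import statistics


def toy_total_frustration(sequence, energy_fn, n_decoys=1000, seed=0):
    """Return (frustration, native, decoy mean, decoy std) using shuffled decoys."""
    rng = random.Random(seed)
    native = energy_fn(sequence)
    decoys = []
    for _ in range(n_decoys):
        shuffled = list(sequence)
        rng.shuffle(shuffled)  # decoy with the same composition, different order
        decoys.append(energy_fn(''.join(shuffled)))
    mean = statistics.fmean(decoys)
    std = statistics.pstdev(decoys)
    # Assumed convention: positive when the native is more stable than its decoys.
    frustration = (mean - native) / std if std > 0 else float('nan')
    return frustration, native, mean, std


reference = 'AAAABBBBCCCC'


def toy_energy(seq):
    # Toy energy: -1 for every position that matches the reference sequence.
    return -sum(a == b for a, b in zip(seq, reference))


frustration, native, decoy_mean, decoy_std = toy_total_frustration(
    reference, toy_energy, n_decoys=200)
```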
- view_pair_frustration(sequence: str = None, pair: str = 'mutational', aa_freq: array = None)
Calculates pair frustration indices and superimposes frustration patterns onto PDB structure, using Pymol for local visualization.
- Parameters:
sequence (str) – The amino acid sequence of the protein. The sequence is assumed to be in one-letter code. Gaps are represented as ‘-’. The length of the sequence (L) should match the dimensions of the Potts model.
pair (str) – Kind of pair frustration calculated. Options: “mutational,” “configurational,” and “contact.”
aa_freq (np.array) – Array of frequencies of all 21 possible amino acids within sequence
- view_single_frustration(aa_freq: array = None, only_frustrated_contacts: bool = False)
Calculates single residue frustration indices and superimposes frustration patterns onto PDB structure, using Pymol for local visualization.
- Parameters:
sequence (str) – The amino acid sequence of the protein. The sequence is assumed to be in one-letter code. Gaps are represented as ‘-’. The length of the sequence (L) should match the dimensions of the Potts model.
aa_freq (np.array) – Array of frequencies of all 21 possible amino acids within sequence
only_frustrated_contacts (bool) – If set to True, minimally frustrated contacts are also highlighted.
- vmd(sequence: str = None, single: str | array = 'singleresidue', pair: str | array = 'mutational', aa_freq: array = None, correction: int = 0, max_connections: int | None = None, movie_name=None, still_image_name=None)
Calculates frustration indices and superimposes frustration patterns onto PDB structure using the VMD software.
- Parameters:
sequence (str) – The amino acid sequence of the protein. The sequence is assumed to be in one-letter code. Gaps are represented as ‘-’. The length of the sequence (L) should match the dimensions of the Potts model.
pair (str) – Kind of pair frustration calculated. Options: “mutational,” “configurational,” and “contact.”
aa_freq (np.array) – Array of frequencies of all 21 possible amino acids within sequence
max_connections (int) – Maximum number of pair frustration values visualized in tcl file
movie_name (Path or str) – Output tcl script file with rotating structure
- class frustratometer.DCA
Bases: Frustratometer
The DCA class provides several class methods; which one to use depends on whether the user has already calculated the DCA couplings and fields parameters.
If the user already has calculated these parameters, the “from_potts_model_file” or “from_pottsmodel” methods can be used. Otherwise, the user can locally generate these parameters using the pyDCA package. In this case, the user can try using the “from_distance_matrix,” “from_pfam_alignment,” or “from_hmmer_alignment” methods.
- classmethod from_potts_model_file(pdb_structure: object, potts_model_file: Path | str = None, reformat_potts_model: bool = False, sequence_cutoff: float | None = None, distance_cutoff: float | None = None) object
Generate DCA object from previously generated potts model file.
- Parameters:
pdb_structure (object) – Structure object generated by Structure class
potts_model_file (Path or str) – File path of potts model file
reformat_potts_model (bool) – If True, the fields matrix will be transposed, while the couplings matrix will be reformatted into a (NxNx21x21) matrix. This reformatting is necessary for some potts model files generated by the mfDCA Matlab algorithm. Default is False.
sequence_cutoff (float) – Sequence separation cutoff; the couplings terms of contacts that are separated by more than this cutoff will be ignored.
distance_cutoff (float) – Distance separation cutoff; the couplings terms of contacts that are separated by more than this cutoff will be ignored.
- Return type:
DCA object
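The two cutoffs can be pictured as a Boolean pair mask applied to the couplings. The sketch below follows the wording above literally for both cutoffs, dropping a pair when it exceeds either one. This is an assumption: contact-map code often treats the sequence cutoff as a minimum separation instead, so the package implementation should be checked before relying on the direction of the comparison. `toy_cutoff_mask` is a hypothetical helper.

```python
def toy_cutoff_mask(distance_matrix, sequence_cutoff=None, distance_cutoff=None):
    """Boolean L x L mask; a pair is dropped when it exceeds either cutoff."""
    L = len(distance_matrix)
    mask = [[True] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            # None disables the corresponding cutoff.
            if sequence_cutoff is not None and abs(i - j) > sequence_cutoff:
                mask[i][j] = False
            if distance_cutoff is not None and distance_matrix[i][j] > distance_cutoff:
                mask[i][j] = False
    return mask


# Toy distance matrix: residues on a line, 3.8 distance units apart.
distances = [[3.8 * abs(i - j) for j in range(4)] for i in range(4)]

by_distance = toy_cutoff_mask(distances, distance_cutoff=8.0)
by_sequence = toy_cutoff_mask(distances, sequence_cutoff=1)
no_cutoff = toy_cutoff_mask(distances)
```

Multiplying the couplings by such a mask is what the Θ_ij factor in the energy formulas expresses.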
- classmethod from_pottsmodel(pdb_structure: object, potts_model: dict, reformat_potts_model: bool = False, sequence_cutoff: float | None = None, distance_cutoff: float | None = None) object
Generate DCA object from previously generated potts model.
- Parameters:
pdb_structure (object) – Structure object generated by Structure class
potts_model (dict) – Dictionary of potts model file, containing fields and couplings keys.
reformat_potts_model (bool) – If True, the fields matrix will be transposed, while the couplings matrix will be reformatted into a (NxNx21x21) matrix. This reformatting is necessary for some potts model files generated by the mfDCA Matlab algorithm. Default is False.
sequence_cutoff (float) – Sequence separation cutoff; the couplings terms of contacts that are separated by more than this cutoff will be ignored.
distance_cutoff (float) – Distance separation cutoff; the couplings terms of contacts that are separated by more than this cutoff will be ignored.
- Return type:
DCA object
- classmethod from_pfam_alignment(pdb_structure: object, alignment_output_file_name: Path | str, filtered_alignment_output_file_name: Path | str, PFAM_ID: str = None, DCA_format: str = 'plmDCA', sequence_cutoff: float | None = None, distance_cutoff: float | None = None) object
Generate DCA object from a locally downloaded PFAM alignment file that will be used to generate a Potts Model file.
- Parameters:
pdb_structure (object) – Structure object generated by Structure class
alignment_output_file_name (Path or str) – File name of generated alignment file. The file will be in stockholm format.
filtered_alignment_output_file_name (Path or str) – File name of generated filtered alignment file. The file will be in Fasta format.
PFAM_ID (str) – PFAM ID associated with structure object
DCA_format (str) – Options are “plmDCA” and “mfDCA”
sequence_cutoff (float) – Sequence separation cutoff; the couplings terms of contacts that are separated by more than this cutoff will be ignored.
distance_cutoff (float) – Distance separation cutoff; the couplings terms of contacts that are separated by more than this cutoff will be ignored.
- Return type:
DCA object
- classmethod from_hmmer_alignment(pdb_structure: object, alignment_output_file_name: Path | str, filtered_alignment_output_file_name: Path | str, query_sequence_database_file: Path | str, DCA_format: str = 'plmDCA', sequence_cutoff: float | None = None, distance_cutoff: float | None = None) object
Generate DCA object from a locally generated jackhmmer alignment file that will be used to generate a Potts Model file. The protein sequence of the structure object will be used as the query sequence by the Jackhmmer algorithm.
- Parameters:
pdb_structure (object) – Structure object generated by Structure class
alignment_output_file_name (Path or str) – File name of generated alignment file. The file will be in stockholm format.
filtered_alignment_output_file_name (Path or str) – File name of generated filtered alignment file. The file will be in Fasta format.
query_sequence_database_file (Path or str) – File name of sequence database.
DCA_format (str) – Options are “plmDCA” and “mfDCA”
sequence_cutoff (float) – Sequence separation cutoff; the couplings terms of contacts that are separated by more than this cutoff will be ignored.
distance_cutoff (float) – Distance separation cutoff; the couplings terms of contacts that are separated by more than this cutoff will be ignored.
- Return type:
DCA object
- property sequence
- property pdb_file
- property pdb_name
Returns PDBid from pdb name
- property chain
- property pfamID
Returns pfamID from pdb name
- property alignment_type
- property alignment_sequence_database
- property download_all_alignment_files
- property alignment_files_directory
- property alignment_output_file
- property sequence_cutoff
- auc()
Computes area under the curve of the receiver-operating characteristic. A higher AUC value (maximum=1) indicates that the TPR is always high, regardless of FPR.
- couplings_energy(sequence: str = None, ignore_couplings_of_gaps: bool = False) float
Computes the couplings energy of a protein sequence based on a given Potts model and an interaction mask.
\[E = \frac{1}{2} \sum_{i,j} J_{ij} \Theta_{ij}\]
- Parameters:
sequence (str) – The amino acid sequence of the protein. If no sequence is provided as input, the original protein sequence of the protein structure object is used for the energy calculation.
ignore_couplings_of_gaps (bool) – If True, couplings involving gaps (‘-’) in the sequence are set to 0 in the energy calculation. Default is False.
- Returns:
couplings_energy – The computed couplings energy of the protein sequence.
- Return type:
float
- decoy_energy(kind: str = 'singleresidue', sequence: str = None) array
Computes all possible decoy energies.
- Parameters:
sequence (str) – The amino acid sequence of the protein. The sequence is assumed to be in one-letter code. Gaps are represented as ‘-’. The length of the sequence (L) should match the dimensions of the Potts model.
kind (str) – Kind of decoys generated. Options: “singleresidue,” “mutational,” “configurational,” and “contact.”
- Returns:
decoy_energy – Matrix describing all possible decoy energies.
- Return type:
np.array
- decoy_fluctuation(sequence: str = None, kind: str = 'singleresidue', mask: array = None) array
Computes a matrix for a sequence of length L that describes all possible changes in energy upon mutating a single residue or a pair of residues simultaneously (depending on the “kind” entry used).
- Parameters:
sequence (str) – The amino acid sequence of the protein. If no sequence is provided as input, the original protein sequence of the protein structure object is used for the energy calculation.
kind (str) – Kind of decoys generated. Options: “singleresidue,” “mutational,” “configurational,” and “contact.”
mask (np.array) – A 2D Boolean array that determines which residue pairs should be considered in the energy computation. The mask should have dimensions (L, L), where L is the length of the sequence.
- Returns:
fluctuation – Matrix of energy changes upon mutation (the decoy fluctuations).
- Return type:
np.array
- property distance_cutoff
- fields_energy(sequence: str = None, ignore_fields_of_gaps: bool = False) float
Computes the fields energy of a protein sequence.
\[E = \sum_i h_i\]
- Parameters:
sequence (str) – The amino acid sequence of the protein. If no sequence is provided as input, the original protein sequence of the protein structure object is used for the energy calculation.
ignore_fields_of_gaps (bool) – If True, fields corresponding to gaps (‘-’) in the sequence are set to 0 in the energy calculation. Default is False.
- Returns:
fields_energy – The computed fields energy of the protein sequence.
- Return type:
float
- frustration(sequence: str = None, kind: str = 'singleresidue', mask: array = None, aa_freq: array = None, correction: int = 0) array
Calculates frustration index values.
- Parameters:
sequence (str) – The amino acid sequence of the protein. The sequence is assumed to be in one-letter code. Gaps are represented as ‘-’. The length of the sequence (L) should match the dimensions of the Potts model.
kind (str) – Kind of decoys generated. Options: “singleresidue,” “mutational,” “configurational,” and “contact.”
mask (np.array) – A 2D Boolean array that determines which residue pairs should be considered in the energy computation. The mask should have dimensions (L, L), where L is the length of the sequence.
aa_freq (np.array) – Array of frequencies of all 21 possible amino acids within sequence
- Returns:
frustration_values – Frustration index values.
- Return type:
np.array
- generate_frustration_pair_distribution(sequence: str = None, kind: str = 'singleresidue', bins: int = 30, maximum_shell_radius: int = 20)
Calculates frustration pair distributions. This helps identify spatial proximity of similarly frustrated residues or contacts from one another.
For mutational, configurational, and contact frustration pair distributions, the distances between midpoints of Cb-Cb (or Ca in the case of glycine) atom pairs are measured. For single residue frustration, the distances of Cb (or Ca in the case of glycine) atoms are measured.
- Parameters:
sequence (str) – The amino acid sequence of the protein. The sequence is assumed to be in one-letter code. Gaps are represented as ‘-’. The length of the sequence (L) should match the dimensions of the Potts model.
aa_freq (np.array) – Array of frequencies of all 21 possible amino acids within sequence
kind (str) – Kind of decoys generated. Options: “singleresidue,” “mutational,” “configurational,” and “contact.”
bins (int) – Number of bins
maximum_shell_radius (int) – Maximum shell radius to evaluate
- Returns:
minimally_frustrated_gr (np.array) – Pair distribution function of minimally frustrated contacts
frustrated_gr (np.array) – Pair distribution function of frustrated contacts
neutral_gr (np.array) – Pair distribution function of neutral contacts
r_m (np.array) – Array of midpoints between evaluated spherical shells
- native_energy(sequence: str = None, ignore_couplings_of_gaps: bool = False, ignore_fields_of_gaps: bool = False) float
Calculates the native energy of the protein sequence.
- Parameters:
sequence (str) – The amino acid sequence of the protein. If no sequence is provided as input, the original protein sequence of the protein structure object is used for the energy calculation.
ignore_couplings_of_gaps (bool) – If set to True, the couplings terms of any gaps in the protein sequence are ignored in energy calculations.
ignore_fields_of_gaps (bool) – If set to True, the fields terms of any gaps in the protein sequence are ignored in energy calculations.
- Returns:
energy_value – Native energy of sequence
- Return type:
float
- plot_decoy_energy(sequence: str = None, kind: str = 'singleresidue', method: str = 'clustermap')
Plot comparison of single residue decoy energies, relative to the native energy
- Parameters:
sequence (str) – The amino acid sequence of the protein. The sequence is assumed to be in one-letter code. Gaps are represented as ‘-’. The length of the sequence (L) should match the dimensions of the Potts model.
kind (str) – Kind of decoys generated. Options: “singleresidue,” “mutational,” “configurational,” and “contact.”
method (str) – Options: “clustermap”, “heatmap”
- plot_roc()
Plots the curve of the receiver-operating characteristic.
- roc()
Computes Receiver Operating Characteristic (ROC) curve of contacts predicted by DCA and true contacts, as identified from the distance matrix.
- scores()
Computes accuracy of DCA predicted contacts by calculating contact scores based on the Frobenius norm
- Returns:
corr_norm – Contact score matrix (N x N)
- Return type:
np.array
- sequences_energies(sequences: array, split_couplings_and_fields: bool = False)
Computes the energy of multiple protein sequences.
\[E = \sum_i h_i + \frac{1}{2} \sum_{i,j} J_{ij} \Theta_{ij}\]
- Parameters:
sequences (list) – List of amino acid sequences, each given as a string. The sequences are assumed to be in one-letter code. Gaps are represented as ‘-’. The length of each sequence (L) should match the dimensions of the Potts model.
split_couplings_and_fields (bool) – If True, two lists of the sequences’ couplings and fields energies are returned. Default is False.
- Returns:
output (if split_couplings_and_fields==False) (float) – The computed energies of the protein sequences
output (if split_couplings_and_fields==True) (np.array) – Array containing computed fields and couplings energies of the protein sequences.
- sliding_window(win_size: int = 5, ndecoys: int = 1000, config_decoys: bool = False) dict
Computes the total frustration, the native energy, the decoy average energy and the decoy standard deviation for fragments on a sliding window
- Parameters:
win_size (int) – Size of the sliding window
ndecoys (int) – Number of decoy sequences to use
config_decoys (bool) – If True, use the configurational decoys approximation, shuffling index positions for configurational decoys energy calculation. If False, mutational decoys.
- Returns:
results – Dictionary with the results, containing:
‘fragment_center’: center position of each window
‘win_size’: size of the sliding windows
‘native_energy’: native energy for each window
‘decoy_energy_av’: decoy energy average for each window
‘decoy_energy_std’: decoy energy standard deviation for each window
‘frustration’: total frustration index for each window
- Return type:
dict
- total_frustration(n_decoys: int = 1000, config_decoys: bool = False, msa_mask: int | array = 1, fragment_pos: None | array = None, fragment_in_context: bool = False, output_kind: str = 'frustration') float | array
Calculates the total frustration of a protein fragment.
- Parameters:
n_decoys (int) – Number of sequence decoys to create
config_decoys (bool) – If True, use the configurational decoys approximation, shuffling index positions for configurational decoys energy calculation. If False, mutational decoys.
msa_mask (np.array) – Extra mask to apply when the Multiple Sequence Alignment does not completely cover the reference PDB
fragment_pos (np.array) – Fragment positions. If None, use the complete model
fragment_in_context (bool) – If True, the energetics calculations take into account the interactions between the fragment and other sequence positions
output_kind (str) – If ‘frustration’, returns frustration. If not, returns native energy, decoy energy average and decoy energy standard deviation.
- Returns:
total_frustration (float) – Total frustration of the fragment or complete protein
native_energy (float) – Native energy of the given sequence
decoy_energy_average (float) – Average of the decoy energy distribution
decoy_energy_std (float) – Standard deviation of the decoy energy distribution
- view_pair_frustration(sequence: str = None, pair: str = 'mutational', aa_freq: array = None)
Calculates pair frustration indices and superimposes frustration patterns onto PDB structure, using Pymol for local visualization.
- Parameters:
sequence (str) – The amino acid sequence of the protein. The sequence is assumed to be in one-letter code. Gaps are represented as ‘-’. The length of the sequence (L) should match the dimensions of the Potts model.
pair (str) – Kind of pair frustration calculated. Options: “mutational,” “configurational,” and “contact.”
aa_freq (np.array) – Array of frequencies of all 21 possible amino acids within sequence
- view_single_frustration(aa_freq: array = None, only_frustrated_contacts: bool = False)
Calculates single residue frustration indices and superimposes frustration patterns onto PDB structure, using Pymol for local visualization.
- Parameters:
sequence (str) – The amino acid sequence of the protein. The sequence is assumed to be in one-letter code. Gaps are represented as ‘-’. The length of the sequence (L) should match the dimensions of the Potts model.
aa_freq (np.array) – Array of frequencies of all 21 possible amino acids within sequence
only_frustrated_contacts (bool) – If set to True, minimally frustrated contacts are also highlighted.
- vmd(sequence: str = None, single: str | array = 'singleresidue', pair: str | array = 'mutational', aa_freq: array = None, correction: int = 0, max_connections: int | None = None, movie_name=None, still_image_name=None)
Calculates frustration indices and superimposes frustration patterns onto PDB structure using the VMD software.
- Parameters:
sequence (str) – The amino acid sequence of the protein. The sequence is assumed to be in one-letter code. Gaps are represented as ‘-’. The length of the sequence (L) should match the dimensions of the Potts model.
pair (str) – Kind of pair frustration calculated. Options: “mutational,” “configurational,” and “contact.”
aa_freq (np.array) – Array of frequencies of all 21 possible amino acids within sequence
max_connections (int) – Maximum number of pair frustration values visualized in tcl file
movie_name (Path or str) – Output tcl script file with rotating structure
- property distance_matrix_method
- property potts_model_file
- property potts_model
- class frustratometer.Map(map_array)
Bases: object
- classmethod from_sequences(sequence_a, sequence_b, substitution_matrix='BLOSUM62', match_score=2, mismatch_score=-1, open_gap_score=-0.5, extend_gap_score=-0.1, target_end_gap_score=-0.01, query_end_gap_score=-0.01)
- property map_array