Frustratometer Functions
Overview
The frustratometer
package includes several submodules each designed to handle different aspects of the frustration analysis pipeline. Below are the main modules and their descriptions.
Protein Database Handling
PDB functions
- frustratometer.pdb.download(pdbID: str, directory: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/frustratometer/checkouts/latest/docs')) Path [source]
Downloads a single pdb file
- frustratometer.pdb.get_sequence(pdb_file: str, chain: str) str [source]
Get a protein sequence from a pdb file
- Parameters:
pdb (str,) – PDB file location.
chain (str,) – Chain ID of the selected protein.
- Returns:
sequence – Protein sequence.
- Return type:
- frustratometer.pdb.get_distance_matrix(pdb_file: Path, chain: str, method: str = 'CB') array [source]
Calculate the distance matrix of the specified atoms in a PDB file.
- Parameters:
pdb_file (str): The path to the PDB file. chain (str): The chainID or chainIDs (space separated) of the protein. method (str): The method to use for calculating the distance matrix.
Defaults to ‘CB’, which uses the CB atom for all residues except GLY, which uses the CA atom. Options:
‘CA’ for using only the CA atom, ‘minimum’ for using the minimum distance between all atoms in each residue, ‘CB_force’ computes a new coordinate for the CB atom based on the CA, C, and N atoms and uses CB distance even for glycine.
- Returns:
np.array: The distance matrix of the selected atoms.
- Raises:
IndexError: If the selection of atoms is empty. ValueError: If the method is not recognized.
Protein Family Analysis
PFAM functions
This module includes functions to download and manipulate PFAM alignments. There are two options to obtain alignments from PFAM: downloading the database locally or downloading specific alignments.
If you want to download the database locally use the download function to download or update the database and the get function to select a single alignment.
If you want to retrieve a single alignment use the alignemnt function
Examples:
$ frustratometer.pfam.database(‘/media/databases’, https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.full.uniprot.gz)
- Todo:
Create a function to get single alignments
- frustratometer.pfam.download_database(path, name='PFAM_current', url='https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.full.uniprot.gz')[source]
Downloads and creates a pfam database in the Database folder
- Parameters:
url (str) – Adress of pfam database
name (str) – Name of the new database folder
- Returns:
alignment_path – Path of the alignments
- Return type:
Path
- frustratometer.pfam.download_aligment(pfamid, output_file, alignment_type='full')[source]
‘ Retrieves a pfam family alignment from interpro
- Parameters:
pfamid (str,) – ID of PFAM family. ex: PF00001
alignment_type (str,) – alignment type to retrieve. Options: full, seed, uniprot
output_file (str) – location of the output file. Default: Temporary file
- Returns:
output – location of alignment
- Return type:
Path
Sequence Alignment
Align functions
- frustratometer.align.jackhmmer(sequence, output_file, database, log=-3, dry_run: bool = False, **kwargs)[source]
Generates alignment using jackhmmer
- Parameters:
sequence (str) – Protein sequence
output_file (str) – If True, will record alignment output messages; Arguments: True OR False (Default is False)
database (str) – Location of the sequence database used to generation MSA
log (File handle) – jackhmmer output Default:DEVNULL
dry_run (bool (default: False)) – Save the temporary fasta_file and print the command instead of running
**kwargs – Other arguments that can be passed to jackhmmer. More information can be found by executing jackhmmer -h arguments without a value such as –noali should be passed as noali=True Common kwargs:
N: number of iterations E: E-value threshold
- Returns:
output_file – Path of the alignment file
- Return type:
Path
MSA Filtering
Filter functions
- frustratometer.filter.filter_alignment(alignment_file, output_file=None, alignment_format='stockholm')[source]
Filter stockholm alignment by removing unaligned sections. Filters by saving the alignment into memory.
- Parameters:
alignment_file (Path) – location of file
output_file (Path) – location of the output_file
alignment_format (str) – Biopython alignment format (Default: stockholm)
- Returns:
output_file – location of the output_file
- Return type:
Path
- frustratometer.filter.filter_alignment_lowmem(alignment_file, output_file, alignment_format='stockholm')[source]
Filter stockholm alignment by removing unaligned sections. Filters by reading the file without saving to memory.
- Parameters:
alignment_file (Path) – location of file
output_file (Path) – location of the output_file
alignment_format (str) – Biopython alignment format (Default: stockholm)
- Returns:
output_file – location of the output_file
- Return type:
Path
Mapping Utilities
MAP functions
Direct Coupling Analysis
DCA functions
- frustratometer.dca.pydca.plmdca(filtered_alignment_file, sequence_type='protein', seqid=0.8, lambda_h=1.0, lambda_J=20.0, num_threads=10, max_iterations=500, verbose=False, msa_file_format='fasta')[source]
Frustration Analysis
Frustration functions
- frustratometer.frustration.compute_mask(distance_matrix: array, maximum_contact_distance: float | None = None, minimum_sequence_separation: int | None = None) array [source]
Computes a 2D Boolean mask from a given distance matrix based on a distance cutoff and a sequence separation cutoff.
- Parameters:
distance_matrix (np.array) – A 2D array where the element at index [i, j] represents the spatial distance between residues i and j. This matrix is assumed to be symmetric.
maximum_contact_distance (float, optional) – The maximum distance of a contact. Pairs of residues with distances less than this threshold are marked as True in the mask. If None, the spatial distance criterion is ignored and all distances are included. Default is None.
minimum_sequence_separation (int, optional) – A minimum sequence distance threshold. Pairs of residues with sequence indices differing by at least this value are marked as True in the mask. If None, the sequence distance criterion is ignored. Default is None.
- Returns:
mask – A 2D Boolean array of the same dimensions as distance_matrix. Elements of the mask are True where the residue pairs meet the specified distance_cutoff and sequence_distance_cutoff criteria.
- Return type:
np.array
Examples
>>> import numpy as np >>> dm = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]]) >>> print(compute_mask(dm, distance_cutoff=1.5, sequence_distance_cutoff=1)) [[False True False] [ True False True] [False True False]]
Todo
Add chain information for sequence separation
- frustratometer.frustration.compute_native_energy(seq: str, potts_model: dict, mask: array, ignore_gap_couplings: bool = False, ignore_gap_fields: bool = False) float [source]
Computes the native energy of a protein sequence based on a given Potts model and an interaction mask.
\[E = \sum_i h_i + \frac{1}{2} \sum_{i,j} J_{ij} \Theta_{ij}\]- Parameters:
seq (str) – The amino acid sequence of the protein. The sequence is assumed to be in one-letter code. Gaps are represented as ‘-’. The length of the sequence (L) should match the dimensions of the Potts model.
potts_model (dict) – A dictionary containing the Potts model parameters ‘h’ (fields) and ‘J’ (couplings). The fields are a 2D array of shape (L, 20), where L is the length of the sequence and 20 is the number of amino acids. The couplings are a 4D array of shape (L, L, 20, 20). The fields and couplings are assumed to be in units of energy.
mask (np.array) – A 2D Boolean array that determines which residue pairs should be considered in the energy computation. The mask should have dimensions (L, L), where L is the length of the sequence.
ignore_couplings_of_gaps (bool, optional) – If True, couplings involving gaps (‘-’) in the sequence are set to 0 in the energy calculation. Default is False.
ignore_fields_of_gaps (bool, optional) – If True, fields corresponding to gaps (‘-’) in the sequence are set to 0 in the energy calculation. Default is False.
- Returns:
energy – The computed energy of the protein sequence based on the Potts model and the interaction mask.
- Return type:
Examples
>>> seq = "ACDEFGHIKLMNPQRSTVWY" >>> potts_model = { 'h': np.random.rand(20, 20), # Random fields 'J': np.random.rand(20, 20, 20, 20) # Random couplings } >>> mask = np.ones((len(seq), len(seq)), dtype=bool) # Include all pairs >>> energy = compute_native_energy(seq, potts_model, mask) >>> print(f"Computed energy: {energy:.2f}")
Notes
The energy is computed as the sum of the fields and the half-sum of the couplings for all pairs of residues where the mask is True. The division by 2 for the couplings accounts for double-counting in symmetric matrices.
Todo
Optimize the computation.
- frustratometer.frustration.compute_fields_energy(seq: str, potts_model: dict, ignore_fields_of_gaps: bool = False) float [source]
- frustratometer.frustration.compute_couplings_energy(seq: str, potts_model: dict, mask: array, ignore_couplings_of_gaps: bool = False) float [source]
- frustratometer.frustration.compute_sequences_energy(seqs: list, potts_model: dict, mask: array, split_couplings_and_fields=False) array [source]
- frustratometer.frustration.compute_singleresidue_decoy_energy_fluctuation(seq: str, potts_model: dict, mask: array) array [source]
$ Delta H_i = Delta h_i + sum_kDelta j_{ik} $
- Parameters:
seq
potts_model
mask
- Returns:
- frustratometer.frustration.compute_mutational_decoy_energy_fluctuation(seq: str, potts_model: dict, mask: array) array [source]
$$ Delta DCA_{ij} = H_i - H_{i’} + H_{j}-H_{j’} + J_{ij} -J_{ij’} + J_{i’j’} - J_{i’j} + sum_k {J_{ik} - J_{i’k} + J_{jk} -J_{j’k}} $$ :param seq: :param potts_model: :param mask: :return:
- frustratometer.frustration.compute_configurational_decoy_energy_fluctuation(seq: str, potts_model: dict, mask: array) array [source]
$$ Delta DCA_{ij} = H_i - H_{i’} + H_{j}-H_{j’} + J_{ij} -J_{ij’} + J_{i’j’} - J_{i’j} + sum_k {J_{ik} - J_{i’k} + J_{jk} -J_{j’k}} $$ :param seq: :param potts_model: :param mask: :return:
- frustratometer.frustration.compute_contact_decoy_energy_fluctuation(seq: str, potts_model: dict, mask: array) array [source]
$$ Delta DCA_{ij} = Delta j_{ij} $$ :param seq: :param potts_model: :param mask: :return:
- frustratometer.frustration.compute_decoy_energy(seq: str, potts_model: dict, mask: array, kind='singleresidue')[source]
Calculates the decoy energy (Obsolete) :param seq: :param potts_model: :param mask: :param kind: :return:
- frustratometer.frustration.compute_single_frustration(decoy_fluctuation, aa_freq=None, correction=0)[source]
- frustratometer.frustration.compute_pair_frustration(decoy_fluctuation, contact_freq: None | array, correction=0) array [source]
- frustratometer.frustration.compute_scores(potts_model: dict) array [source]
Computes contact scores based on the Frobenius norm
CN[i,j] = F[i,j] - F[i,:] * F[:,j] / F[:,:]
- Parameters:
potts_model (dict) – Potts model containing the couplings in the “J” key
- Returns:
scores – Score matrix (N x N)
- Return type:
np.array
- frustratometer.frustration.plot_singleresidue_decoy_energy(decoy_energy, native_energy, method='clustermap')[source]
- frustratometer.frustration.write_tcl_script(pdb_file, chain, single_frustration, pair_frustration, tcl_script='frustration.tcl', max_connections=100)[source]
- frustratometer.frustration.canvas(with_attribution=True)[source]
Placeholder function to show example docstring (NumPy format).
Replace this function and doc string for your own project.
- Parameters:
with_attribution (bool, Optional, default: True) – Set whether or not to display who the quote is from.
- Returns:
quote – Compiled string including quote and optional attribution.
- Return type: