seq-to-first-iso
Compute the first two isotopologue intensities from sequences.
The program computes M0 and M1 and differentiates between labelled (with a 99.99 % C[12] enrichment) and unlabelled amino acids.
It reads a file containing one amino acid sequence per line and writes sequence, mass, formula, formula_X, M0_NC, M1_NC, M0_12C and M1_12C to a TSV file.
Formula_X is the chemical formula with carbon of unlabelled amino acids marked as X.
NC means Normal Condition, 12C means C[12] enrichment condition.
Example
Running the script after installation:
$ seq-to-first-iso sequences.txt
produces the file ‘sequences_stfi.tsv’.
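The TSV written by the command above can be read back with the standard library. The following is a minimal sketch, assuming the header row uses the column names listed earlier; the helper name `load_stfi_rows` is hypothetical and not part of the package.

```python
import csv

# Columns expected to hold numbers (assumed to match the documented output).
NUMERIC_COLUMNS = ("mass", "M0_NC", "M1_NC", "M0_12C", "M1_12C")


def load_stfi_rows(handle):
    """Read rows from an open TSV handle and convert numeric columns to float."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for column in NUMERIC_COLUMNS:
            # Leave missing or empty cells untouched.
            if row.get(column) not in (None, ""):
                row[column] = float(row[column])
        rows.append(row)
    return rows


# Usage, assuming the command above produced the file:
# with open("sequences_stfi.tsv", newline="") as handle:
#     rows = load_stfi_rows(handle)
```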
Notes
Carbon of unlabelled amino acids keeps its default isotopic abundance and is represented as X in formulas. Isotope names follow pyteomics’s conventions.
seq_to_first_iso.sequence_parser(file, sep='\t')
Return information on sequences parsed from a file.
- Parameters
file (str) – Filename. The file can contain either one sequence per line, or an annotation and a sequence per line with a separator in between.
sep (str, optional) – Separator for files with annotations (default is \t).
- Returns
- Parsed output with “key: values”:
  - “annotations”: a list of annotations, if any.
  - “raw_sequences”: a list of unmodified peptide sequences.
  - “sequences”: a list of uppercase peptide sequences.
  - “modifications”: a list of lists of PTMs.
  - “ignored_lines”: the number of ignored lines.
- Return type
dict
Warning
The function uses the first line to evaluate if the file has annotations or not, hence a file should have a consistent format.
Notes
Supports Xtandem’s Post-Translational Modification notation (0.4.0). Supports annotations (0.3.0).
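The warning about a consistent file format can be illustrated with a simplified sketch. This is not the package's implementation: the helper `parse_lines` is hypothetical, it ignores PTM notation, and it assumes the annotation comes before the sequence on each line.

```python
def parse_lines(lines, sep="\t"):
    """Simplified sketch: split lines into annotations and uppercase sequences."""
    lines = [line.strip() for line in lines if line.strip()]
    # As in the documented behaviour, only the first line decides
    # whether the input is treated as annotated.
    has_annotations = bool(lines) and sep in lines[0]
    annotations, sequences, ignored = [], [], 0
    for line in lines:
        if has_annotations:
            fields = line.split(sep)
            if len(fields) < 2:
                # A line without separator in an annotated file is skipped.
                ignored += 1
                continue
            annotations.append(fields[0])
            line = fields[1]
        sequences.append(line.upper())
    return {
        "annotations": annotations,
        "sequences": sequences,
        "ignored_lines": ignored,
    }
```

Because the decision is taken once, a file that mixes annotated and bare lines ends up with skipped or misread lines, which is why a single format per file is needed.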
seq_to_first_iso.separate_labelled(sequence, unlabelled_aa)
Get the sequence of unlabelled amino acids from a sequence.
- Parameters
sequence (str) – String of amino acids.
unlabelled_aa (container object) – Container (list, string…) of unlabelled amino acids.
- Returns
- The sequences as a tuple of strings with:
  - the sequence without the unlabelled amino acids
  - the unlabelled amino acids in the sequence
- Return type
tuple(str, str)
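The split described above can be sketched in a few lines. This is an illustrative re-implementation under the documented contract, with the hypothetical name `split_by_label`; it is not the package's own code.

```python
def split_by_label(sequence, unlabelled_aa):
    """Return (labelled part, unlabelled part) of a sequence, order preserved."""
    labelled = "".join(aa for aa in sequence if aa not in unlabelled_aa)
    unlabelled = "".join(aa for aa in sequence if aa in unlabelled_aa)
    return labelled, unlabelled


# Any container supporting "in" works: a list, a set or a plain string.
labelled, unlabelled = split_by_label("YAQEISR", ["A", "R"])
```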
seq_to_first_iso.compute_M0_nl(f, a)
Return the monoisotopic abundance M0 of a formula with mixed labels.
- Parameters
f (pyteomics.mass.Composition) – Chemical formula, as a dict of counts for each element: {element_name: count_of_element_in_sequence, …}.
a (dict) – Dictionary of abundances of isotopes, in the format: {element_name[isotope_number]: relative_abundance, …}.
- Returns
Value of M0.
- Return type
float
Notes
X represents C with default isotopic abundance.
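As a rough illustration of what M0 means, the monoisotopic abundance is the product, over all atoms, of the abundance of each element's lightest isotope. The sketch below uses plain dicts in place of pyteomics objects; the function name `m0` and the rounded abundance values are assumptions made for the example, not values taken from the package.

```python
import math

# Lightest isotope of each element, using pyteomics-style keys.
MONOISOTOPIC = {
    "C": "C[12]", "X": "X[12]", "H": "H[1]",
    "N": "N[14]", "O": "O[16]", "S": "S[32]",
}

# Rounded, approximate natural abundances (illustrative only).
# X[12] mirrors natural carbon, as described in the notes above.
NATURAL = {
    "C[12]": 0.9893, "X[12]": 0.9893, "H[1]": 0.9999,
    "N[14]": 0.9963, "O[16]": 0.9976, "S[32]": 0.9499,
}
# 12C enrichment condition: only labelled carbon changes.
ENRICHED = dict(NATURAL, **{"C[12]": 0.9999})


def m0(formula, abundances):
    """Product of lightest-isotope abundances raised to the atom counts."""
    return math.prod(
        abundances[MONOISOTOPIC[element]] ** count
        for element, count in formula.items()
        # Elements outside the table (e.g. monoisotopic P) are skipped here.
        if element in MONOISOTOPIC
    )


formula = {"C": 3, "X": 6, "H": 19, "N": 5, "O": 3}
m0_nc = m0(formula, NATURAL)
m0_12c = m0(formula, ENRICHED)
```

Enrichment raises M0 only through the labelled carbons; the X atoms contribute the same factor in both conditions.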
seq_to_first_iso.compute_M1_nl(f, a)
Compute abundance of second isotopologue M1 from its formula.
- Parameters
f (pyteomics.mass.Composition) – Chemical formula, as a dict of counts for each element: {element_name: count_of_element_in_sequence, …}.
a (dict) – Dictionary of abundances of isotopes, in the format: {element_name[isotope_number]: relative_abundance, …}.
- Returns
Value of M1.
- Return type
float
Notes
X represents C with default isotopic abundance.
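M1 corresponds to molecules in which exactly one atom carries its next-heavier isotope while all others stay at the lightest one. A minimal sketch of that sum follows, assuming plain dicts and pyteomics-style isotope keys; the name `m1` and the isotope table are illustrative and not taken from the package.

```python
# element: (lightest isotope key, isotope one mass unit heavier)
ISOTOPES = {
    "C": ("C[12]", "C[13]"),
    "X": ("X[12]", "X[13]"),
    "H": ("H[1]", "H[2]"),
    "N": ("N[14]", "N[15]"),
    "O": ("O[16]", "O[17]"),
    "S": ("S[32]", "S[33]"),
}


def m1(formula, abundances):
    """Sum, over elements, of the probability that exactly one atom is heavy."""
    total = 0.0
    for heavy_element, (light, heavy) in ISOTOPES.items():
        n_heavy = formula.get(heavy_element, 0)
        if n_heavy == 0:
            continue
        # One of the n atoms is heavy, the remaining n - 1 are light.
        term = n_heavy * abundances[heavy] * abundances[light] ** (n_heavy - 1)
        # Every atom of the other elements stays at its lightest isotope.
        for other_element, (other_light, _) in ISOTOPES.items():
            n_other = formula.get(other_element, 0)
            if other_element != heavy_element and n_other:
                term *= abundances[other_light] ** n_other
        total += term
    return total
```

The function takes the same kind of abundance dictionary as described in the parameters above, so it can be evaluated once per condition (normal and 12C-enriched).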
seq_to_first_iso.formula_to_str(composition)
Return formula from Composition as a string.
- Parameters
composition (pyteomics.mass.Composition) – Chemical formula.
- Returns
Human-readable string of the formula.
- Return type
str
Warning
If the composition has elements not in USED_ELEMS, they will not be added to the output.
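The effect of the warning can be shown with a small sketch. The element list below is an assumption standing in for USED_ELEMS, whose actual content is not given here, and `composition_to_text` is a hypothetical name; the exact output format of the real function may differ.

```python
# Assumption: stands in for USED_ELEMS, which is not documented in this section.
ASSUMED_ELEMENT_ORDER = "CHONPSX"


def composition_to_text(composition, elements=ASSUMED_ELEMENT_ORDER):
    """Concatenate element symbols and counts in a fixed element order."""
    return "".join(
        f"{element}{composition[element]}"
        for element in elements
        if composition.get(element, 0) > 0
    )


# "Na" is not in the assumed element list, so it is silently left out.
text = composition_to_text({"H": 5, "C": 2, "X": 3, "Na": 1})
```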
seq_to_first_iso.seq_to_xcomp(sequence_l, sequence_nl)
Take 2 amino acid sequences and return the composition with X.
The second sequence will have its C replaced by X.
- Parameters
sequence_l (str or pyteomics.mass.Composition) – Sequence or composition with labelled amino acids.
sequence_nl (str or pyteomics.mass.Composition) – Sequence or composition where amino acids are not labelled.
- Returns
Composition with unlabelled carbon as element X.
- Return type
pyteomics.mass.Composition
Notes
The function assumes the second sequence has no termini (H-, -OH). Supports pyteomics.mass.Composition as argument (0.5.1). If mass.Composition objects are provided, the function assumes the termini of the second composition were already removed.
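The merging step can be sketched with plain dicts instead of pyteomics.mass.Composition objects. The helper `merge_with_x` is hypothetical; it assumes, as in the notes above, that the unlabelled composition already has its termini removed.

```python
from collections import Counter


def merge_with_x(labelled, unlabelled):
    """Add two compositions, renaming carbon of the unlabelled one to X."""
    merged = Counter(labelled)
    for element, count in unlabelled.items():
        target = "X" if element == "C" else element
        merged[target] += count
    return dict(merged)


# Peptide "AR" with R unlabelled:
# alanine with termini (C3H7NO2) plus an arginine residue (C6H12N4O).
composition = merge_with_x(
    {"C": 3, "H": 7, "N": 1, "O": 2},
    {"C": 6, "H": 12, "N": 4, "O": 1},
)
```

Keeping the termini on the labelled part only avoids counting the H and OH end groups twice.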
seq_to_first_iso.get_mods_composition(modifications)
Return the composition of a list of modifications.
- Parameters
modifications (list of str) – List of modification strings (corresponding to Unimod titles).
- Returns
The total composition change.
- Return type
pyteomics.mass.Composition
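Conceptually, the total composition change is the element-wise sum of each modification's composition. The sketch below uses a small hand-written table in place of a Unimod lookup; the table `EXAMPLE_MODS` and the function `total_mod_composition` are hypothetical and only cover three common titles.

```python
# Hand-written stand-in for a Unimod lookup (illustrative subset only).
EXAMPLE_MODS = {
    "Oxidation": {"O": 1},
    "Acetyl": {"C": 2, "H": 2, "O": 1},
    "Phospho": {"H": 1, "O": 3, "P": 1},
}


def total_mod_composition(modifications, table=EXAMPLE_MODS):
    """Sum the elemental composition changes of a list of modification titles."""
    total = {}
    for title in modifications:
        for element, delta in table[title].items():
            total[element] = total.get(element, 0) + delta
    return total


change = total_mod_composition(["Oxidation", "Acetyl", "Oxidation"])
```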
seq_to_first_iso.seq_to_df(sequences, unlabelled_aa, **kwargs)
Create a dataframe from sequences and return it.
- Parameters
sequences (list of str) – List of pure peptide sequence strings.
unlabelled_aa (container object) – Container of unlabelled amino acids.
annotations (list of str, optional) – List of IDs for the sequences.
raw_sequences (list of str, optional) – List of sequences with Xtandem PTMs.
modifications (list of str, optional) – List of modifications for raw_sequences.
- Returns
- Dataframe with: annotation (optional), sequence, mass, formula, formula_X, M0_NC, M1_NC, M0_12C, M1_12C.
- Return type
pandas.DataFrame
Warning
If raw_sequences is provided, modifications must also be provided and vice versa.