seq-to-first-iso
Compute intensities of the first two isotopologues.
Use peptide sequences and charges.
The program computes M0 and M1 and differentiates between labelled (with a 99.99 % C[12] enrichment) and unlabelled amino acids.
Read a .tsv file with one amino acid sequence per line and write a .tsv file with the columns: sequence, mass, formula, formula_X, M0_NC, M1_NC, M0_12C and M1_12C.
Formula_X is the chemical formula with carbon of unlabelled amino acids marked as X.
NC means Normal Condition, 12C means C[12] enrichment condition.
Example
Running the script after installation
$ seq-to-first-iso sequences.tsv sequence_column_name charge_column_name
will provide file ‘sequences_stfi.tsv’
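To try the command above, an input file is needed. The following is a minimal sketch that builds tab-separated text with a sequence column and a charge column. The function name `build_example_tsv`, the column names `sequence` and `charge`, and the two peptide rows are arbitrary choices for illustration, not values taken from the package.

```python
import csv
import io


def build_example_tsv():
    """Return tab-separated text with a header and two example peptides."""
    # Arbitrary example rows: a peptide sequence and its charge state.
    rows = [
        {"sequence": "YAQEISR", "charge": 2},
        {"sequence": "VGFPVLSVKEHK", "charge": 3},
    ]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["sequence", "charge"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Writing the returned text to sequences.tsv and running `seq-to-first-iso sequences.tsv sequence charge` would then follow the command pattern shown above.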
Notes
Carbons of unlabelled amino acids keep their default isotopic abundance and are represented as X in formulas. Naming conventions for isotopes follow pyteomics’s conventions.
-
seq_to_first_iso.parse_input_file(filename, sep='\t')
Parse input file.
- Parameters
filename (str) – Filename. The file can either have just one sequence per line, or have annotations and sequences with a separator in between.
sep (str, optional) – Separator for files with annotations (default is \t).
- Returns
- Return type
pandas.DataFrame
- Raises
FileNotFoundError – If the input file is not found. Exception chaining is explicitly suppressed (from None).
Exception – If the input file cannot be read with pandas. Exception chaining is explicitly suppressed (from None).
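The two Raises entries describe re-raising with `from None`, which hides the original exception context from the traceback. The sketch below illustrates that pattern with the standard-library `csv` module standing in for pandas. The function name `parse_tsv_sketch`, the list-of-dicts return value and the error messages are assumptions for illustration, not the package's implementation.

```python
import csv


def parse_tsv_sketch(filename, sep="\t"):
    """Read a delimited file into a list of row dicts (stand-in for a DataFrame)."""
    try:
        with open(filename, newline="") as handle:
            return list(csv.DictReader(handle, delimiter=sep))
    except FileNotFoundError:
        # "from None" suppresses exception chaining, so the user only
        # sees this message and not the lower-level traceback.
        raise FileNotFoundError(f"File {filename} not found") from None
    except Exception as exc:
        # Any other read or parse failure is reported the same way.
        raise Exception(f"Cannot read {filename}: {exc}") from None
```

FileNotFoundError is caught first because it is a subclass of Exception; reversing the two clauses would make the first one unreachable.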
-
seq_to_first_iso.filter_input_dataframe(dataframe, sequence_col_name, charge_col_name)
Filter input file with peptide sequences and charges.
- Parameters
dataframe (pandas.DataFrame) – Raw dataframe with all input columns
sequence_col_name (str) – Name of column with peptide sequences
charge_col_name (str) – Name of column with peptide charges
- Returns
With columns:
- “sequence”: peptide sequences.
- “charge”: peptide charges.
- Return type
pandas.DataFrame
- Raises
KeyError – If the sequence or charge column is not found.
-
seq_to_first_iso.check_amino_acids(seq)
Check that all elements of a sequence are known amino acids.
- Parameters
seq (str) – Peptide sequence.
- Returns
- (sequence, “”) if the sequence is composed only of allowed amino acids.
- (“”, “Unrecognized amino acids.”) if the sequence contains amino acids that are not allowed.
- Return type
Tuple of two str
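The return contract above can be sketched as a set-membership test. This is a minimal illustration, assuming the allowed alphabet is the 20 standard one-letter codes. The names `ALLOWED_AA` and `check_amino_acids_sketch` are hypothetical, and the real module may accept a different set.

```python
# Assumption: the 20 standard amino acids in one-letter code.
ALLOWED_AA = set("ACDEFGHIKLMNPQRSTVWY")


def check_amino_acids_sketch(seq):
    """Return (seq, "") if every residue is allowed, else ("", error message)."""
    if set(seq) <= ALLOWED_AA:
        return seq, ""
    return "", "Unrecognized amino acids."
```

Returning a (value, error) pair rather than raising lets a caller apply the check to a whole column and keep the error text next to each rejected row.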
-
seq_to_first_iso.separate_labelled(sequence, unlabelled_aa)
Get the sequence of unlabelled amino acids from a sequence.
- Parameters
sequence (str) – String of amino acids.
unlabelled_aa (container object) – Container (list, string…) of unlabelled amino acids.
- Returns
The sequences as a tuple of strings with:
- the sequence without the unlabelled amino acids
- the unlabelled amino acids in the sequence
- Return type
tuple(str, str)
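A minimal sketch of the split described above, assuming residues are single characters and that order within each output string follows the input. The name `separate_labelled_sketch` is hypothetical and this is not the package's code.

```python
def separate_labelled_sketch(sequence, unlabelled_aa):
    """Split a sequence into (labelled residues, unlabelled residues)."""
    labelled = []
    unlabelled = []
    for residue in sequence:
        # Membership works for any container: list, set, tuple or string.
        if residue in unlabelled_aa:
            unlabelled.append(residue)
        else:
            labelled.append(residue)
    return "".join(labelled), "".join(unlabelled)
```

For instance, `separate_labelled_sketch("YAQEISR", ["A", "R"])` keeps Y, Q, E, I and S in the first string and moves A and R to the second.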
-
seq_to_first_iso.compute_M0_nl(formula, abundance)
Compute intensity of the first isotopologue M0.
Handle element X with specific abundance.
- Parameters
formula (pyteomics.mass.Composition) – Chemical formula, as a dict of the number of atoms for each element: {element_name: number_of_atoms, …}.
abundance (dict) – Dictionary of abundances of isotopes: {“element_name[isotope_number]”: relative_abundance, …}.
- Returns
Value of M0.
- Return type
float
Notes
X represents C with default isotopic abundance.
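One way to read this entry: M0 is the probability that every atom in the formula is its lightest isotope, so it is the product over elements of the lightest-isotope abundance raised to the atom count. The sketch below uses a plain dict in place of pyteomics.mass.Composition. The names `m0_sketch`, `NATURAL` and `LIGHTEST`, the restriction to C, X, H, N, O and S, and the approximate natural abundance values are assumptions for illustration, not the module's code.

```python
# Assumed approximate natural abundances of the lightest isotopes.
# X mirrors natural carbon, as described in the Notes.
NATURAL = {
    "C[12]": 0.9893,
    "X[12]": 0.9893,
    "H[1]": 0.999885,
    "N[14]": 0.99636,
    "O[16]": 0.99757,
    "S[32]": 0.9499,
}

# Lightest isotope key for each element handled by this sketch.
LIGHTEST = {
    "C": "C[12]",
    "X": "X[12]",
    "H": "H[1]",
    "N": "N[14]",
    "O": "O[16]",
    "S": "S[32]",
}


def m0_sketch(formula, abundance):
    """Probability that all atoms are the lightest isotope of their element."""
    m0 = 1.0
    for element, isotope in LIGHTEST.items():
        # Elements absent from the formula contribute a factor of 1.
        m0 *= abundance[isotope] ** formula.get(element, 0)
    return m0
```

Under these assumptions, a C[12] enrichment condition would be modelled by passing a copy of the table with `"C[12]"` raised to 0.9999 while `"X[12]"` keeps its natural value, so only carbons of labelled amino acids are affected.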
-
seq_to_first_iso.compute_M1_nl(formula, abundance)
Compute intensity of the second isotopologue M1.
Handle element X with specific abundance.
- Parameters
formula (pyteomics.mass.Composition) – Chemical formula, as a dict of the number of atoms for each element: {element_name: number_of_atoms, …}.
abundance (dict) – Dictionary of abundances of isotopes: {“element_name[isotope_number]”: relative_abundance, …}.
- Returns
Value of M1.
- Return type
float
Notes
X represents C with default isotopic abundance.
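M1 can be read as the probability that exactly one atom carries one extra neutron while all others are the lightest isotope. For each element with n atoms that term is n × a_light^(n-1) × a_heavy times the lightest-isotope factors of the other elements, which equals M0 × n × a_heavy / a_light. The sketch below sums that over elements. The names `m1_sketch`, `NATURAL` and `ISOTOPES`, the element set and the approximate abundance values are assumptions for illustration, not the module's code.

```python
# Assumed approximate natural abundances; X mirrors natural carbon.
NATURAL = {
    "C[12]": 0.9893, "C[13]": 0.0107,
    "X[12]": 0.9893, "X[13]": 0.0107,
    "H[1]": 0.999885, "H[2]": 0.000115,
    "N[14]": 0.99636, "N[15]": 0.00364,
    "O[16]": 0.99757, "O[17]": 0.00038,
    "S[32]": 0.9499, "S[33]": 0.0075,
}

# (lightest isotope, isotope one neutron heavier) for each element.
ISOTOPES = {
    "C": ("C[12]", "C[13]"),
    "X": ("X[12]", "X[13]"),
    "H": ("H[1]", "H[2]"),
    "N": ("N[14]", "N[15]"),
    "O": ("O[16]", "O[17]"),
    "S": ("S[32]", "S[33]"),
}


def m1_sketch(formula, abundance):
    """Probability that exactly one atom is one neutron heavier."""
    m0 = 1.0
    ratio_sum = 0.0
    for element, (light, heavy) in ISOTOPES.items():
        count = formula.get(element, 0)
        m0 *= abundance[light] ** count
        # Swapping one light atom for a heavy one scales M0 by heavy/light,
        # and there are `count` positions where the swap can happen.
        ratio_sum += count * abundance[heavy] / abundance[light]
    return m0 * ratio_sum
```

The ratio form avoids a negative exponent when an element is absent from the formula, at the cost of requiring a non-zero lightest-isotope abundance.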
-
seq_to_first_iso.formula_to_str(composition)
Return formula from Composition as a string.
- Parameters
composition (pyteomics.mass.Composition) – Chemical formula.
- Returns
Human-readable string of the formula.
- Return type
str
Warning
If the composition has elements not in USED_ELEMS, they will not be added to the output.
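The warning above follows naturally if the string is built by iterating over a fixed list of elements rather than over the composition itself. A minimal sketch, assuming `USED_ELEMS` is the string "CHONPSX" and that every count is written out, including 1; both the value and the output format are assumptions, and `formula_to_str_sketch` is a hypothetical name.

```python
# Assumption: the elements considered, in output order.
USED_ELEMS = "CHONPSX"


def formula_to_str_sketch(composition):
    """Build a formula string from the elements listed in USED_ELEMS only."""
    parts = []
    for element in USED_ELEMS:
        count = composition.get(element, 0)
        if count > 0:
            parts.append(f"{element}{count}")
    # Elements outside USED_ELEMS (e.g. "Se") are never visited,
    # so they are silently left out of the result.
    return "".join(parts)
```

Iterating over the fixed list gives a stable element order regardless of how the composition was built.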
-
seq_to_first_iso.convert_atom_C_to_X(sequence)
Replace carbon atoms with element X atoms in a composition.
- Parameters
sequence (str or pyteomics.mass.Composition) – Sequence or composition.
- Returns
Composition with carbon atoms replaced by element X atoms.
- Return type
pyteomics.mass.Composition
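The replacement can be sketched on a plain dict of atom counts. This illustration only covers the composition case, not the sequence-string input that the documented function also accepts. The name `convert_c_to_x_sketch` is hypothetical, and returning a new dict instead of modifying the input is a design choice of this sketch.

```python
def convert_c_to_x_sketch(composition):
    """Return a copy of the composition with all C atoms counted as X."""
    converted = dict(composition)  # leave the caller's dict untouched
    carbon = converted.pop("C", 0)
    if carbon:
        # Add to any X atoms already present rather than overwriting them.
        converted["X"] = converted.get("X", 0) + carbon
    return converted
```

Once carbons of unlabelled amino acids are stored under X, an abundance table can give C and X different isotope abundances without any further bookkeeping.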
-
seq_to_first_iso.get_charge_composition(charge)
Return the composition of a given charge (only H+).
- Parameters
charge (int) – Peptide charge.
- Returns
Composition of the charge (H+).
- Return type
pyteomics.mass.Composition
-
seq_to_first_iso.get_mods_composition(modifications)
Return the composition of a list of modifications.
- Parameters
modifications (list of str) – List of modification strings (corresponding to Unimod titles).
- Returns
The total composition change.
- Return type
pyteomics.mass.Composition
-
seq_to_first_iso.compute_intensities(df_peptides, unlabelled_aa=[])
Compute isotopologue intensities from peptide sequences.
- Parameters
df_peptides (pandas.DataFrame) – Dataframe with columns ‘sequence’ and ‘charge’.
unlabelled_aa (container object) – Container of unlabelled amino acids.
- Returns
- Dataframe with all computed values, compositions and formulas.
- Return type
pandas.DataFrame
Notes
Supports X!Tandem’s post-translational modification (PTM) notation (0.4.0).