seq-to-first-iso

Compute intensities of the first two isotopologues.

Use peptide sequences and charges.

The program computes M0 and M1 and differentiates labelled (with a 99.99 % C[12] enrichment) and unlabelled amino acids.

Read a .tsv file containing one amino acid sequence per line and return sequence, mass, formula, formula_X, M0_NC, M1_NC, M0_12C and M1_12C in a .tsv file.

Formula_X is the chemical formula with carbon of unlabelled amino acids marked as X.

NC stands for Normal Condition; 12C stands for the C[12] enrichment condition.

Example

Running the script after installation

$ seq-to-first-iso sequences.tsv sequence_column_name charge_column_name

will produce the file ‘sequences_stfi.tsv’.
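
As an illustration of a suitable input, the following sketch writes a small tab-separated file with a header row using only the standard library. The column names `sequence` and `charge` and the two peptides are hypothetical; any column names work as long as the same names are passed on the command line.

```python
import csv
from pathlib import Path

# Hypothetical example peptides and column names (not taken from the package).
rows = [
    {"sequence": "YAQEISR", "charge": 2},
    {"sequence": "VGFPVLSVKEHK", "charge": 3},
]


def write_input_tsv(path, rows):
    """Write peptide sequences and charges to a tab-separated file with a header."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=["sequence", "charge"],
            delimiter="\t",
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(rows)
    return Path(path)


input_path = write_input_tsv("sequences.tsv", rows)
# Then, from a shell:
#   seq-to-first-iso sequences.tsv sequence charge
```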

Notes

Carbon atoms of unlabelled amino acids keep the default isotopic abundance and are represented as X in formulas. Isotope naming follows the pyteomics conventions.

seq_to_first_iso.parse_input_file(filename, sep='\t')

Parse input file.

Parameters

filename (str) – Filename. The file can either contain only sequences, one per line, or contain annotations and sequences with a separator in between.
sep (str, optional) – Separator for files with annotations (default is \t).

Returns

Return type

pandas.DataFrame

Raises

FileNotFoundError – If the input file is not found. Exception chaining is explicitly suppressed (from None).
Exception – If the input file cannot be read with pandas. Exception chaining is explicitly suppressed (from None).
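
The "from None" wording above refers to Python's standard way of suppressing exception chaining. The sketch below shows that pattern on a hypothetical helper, `read_text_file`, which is not part of the package; it only illustrates what a caller can expect to see on the raised exception.

```python
def read_text_file(filename):
    """Read a file, re-raising a missing-file error without the original chain."""
    try:
        with open(filename) as handle:
            return handle.read()
    except FileNotFoundError:
        # "from None" hides the original exception from the printed traceback.
        raise FileNotFoundError(f"File {filename} not found") from None


try:
    read_text_file("does_not_exist.tsv")
except FileNotFoundError as error:
    caught = error  # keep a reference for inspection after the except block
```

With chaining suppressed, the traceback shows only the final error message, which keeps command-line output short for end users.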

seq_to_first_iso.filter_input_dataframe(dataframe, sequence_col_name, charge_col_name)

Filter input file with peptide sequences and charges.

Parameters

dataframe (pandas.DataFrame) – Raw dataframe with all input columns
sequence_col_name (str) – Name of column with peptide sequences
charge_col_name (str) – Name of column with peptide charges

Returns

Dataframe with columns:

- “sequence”: peptide sequences.

- “charge”: peptide charges.

Return type

pandas.DataFrame

Raises

KeyError – If the sequence or charge column is not found.

seq_to_first_iso.check_amino_acids(seq)

Check elements of a sequence are known amino acids.

Parameters

seq (str) – Peptide sequence.

Returns

(sequence, “”) if the sequence is composed only of allowed amino acids.

(“”, “Unrecognized amino acids.”) if the sequence contains amino acids that are not allowed.

Return type

Tuple of two str
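
A minimal sketch of this return contract, assuming the allowed alphabet is the 20 standard one-letter codes (the set actually used by the package may differ). The name `check_amino_acids_sketch` is hypothetical and stands in for the real function.

```python
# Assumption: the 20 standard amino acids; the package may allow a different set.
ALLOWED_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def check_amino_acids_sketch(seq):
    """Return (seq, "") if every residue is allowed, else ("", error message)."""
    if set(seq) <= ALLOWED_AMINO_ACIDS:
        return seq, ""
    return "", "Unrecognized amino acids."


valid = check_amino_acids_sketch("PEPTIDE")
invalid = check_amino_acids_sketch("PEPT*DE")
```

Returning a pair rather than raising lets a caller store the message next to the sequence and keep processing the remaining rows.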

seq_to_first_iso.separate_labelled(sequence, unlabelled_aa)

Get the sequence of unlabelled amino acids from a sequence.

Parameters

sequence (str) – String of amino acids.
unlabelled_aa (container object) – Container (list, string…) of unlabelled amino acids.

Returns

The sequences as a tuple of strings with:

- the sequence without the unlabelled amino acids

- the unlabelled amino acids in the sequence

Return type

tuple(str, str)
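
The split described above can be sketched in a few lines. `separate_labelled_sketch` is a hypothetical stand-in written for illustration, not the package's implementation; it accepts any container that supports `in`.

```python
def separate_labelled_sketch(sequence, unlabelled_aa):
    """Split a sequence into (labelled residues, unlabelled residues)."""
    labelled = "".join(aa for aa in sequence if aa not in unlabelled_aa)
    unlabelled = "".join(aa for aa in sequence if aa in unlabelled_aa)
    return labelled, unlabelled


with_list = separate_labelled_sketch("YAQEISR", ["A", "R"])
with_string = separate_labelled_sketch("YAQEISR", "AR")
nothing_unlabelled = separate_labelled_sketch("YAQEISR", [])
```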

seq_to_first_iso.compute_M0_nl(formula, abundance)

Compute intensity of the first isotopologue M0.

Handle element X with specific abundance.

Parameters

formula (pyteomics.mass.Composition) – Chemical formula, as a dict of the number of atoms for each element: {element_name: number_of_atoms, …}.
abundance (dict) – Dictionary of abundances of isotopes: {“element_name[isotope_number]”: relative_abundance, …}.

Returns

Value of M0.

Return type

float

Notes

X represents C with default isotopic abundance.
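
One common way to express M0 is as the product, over all elements, of the lightest isotope's abundance raised to the number of atoms. The sketch below follows that formula with plain dicts; it is not the package's code. The abundance values are rounded natural abundances used for illustration, and the `X[12]` key is an assumption about how element X is stored.

```python
import math

# Rounded natural abundances of the lightest isotopes (illustrative values).
NATURAL = {
    "C[12]": 0.9893, "X[12]": 0.9893, "H[1]": 0.999885,
    "N[14]": 0.99632, "O[16]": 0.99757, "S[32]": 0.9493,
}
# Same table with carbon enriched to 99.99 % C[12]; X keeps natural abundance.
ENRICHED = dict(NATURAL, **{"C[12]": 0.9999})

# Abundance key of the lightest isotope for each element.
LIGHTEST = {"C": "C[12]", "X": "X[12]", "H": "H[1]",
            "N": "N[14]", "O": "O[16]", "S": "S[32]"}


def m0_sketch(formula, abundance):
    """Probability that every atom in the formula is its lightest isotope."""
    return math.prod(abundance[LIGHTEST[el]] ** n for el, n in formula.items())


glycine = {"C": 2, "H": 5, "N": 1, "O": 2}
glycine_x = {"X": 2, "H": 5, "N": 1, "O": 2}  # carbons marked as unlabelled
m0_natural = m0_sketch(glycine, NATURAL)
m0_enriched = m0_sketch(glycine, ENRICHED)
m0_unlabelled = m0_sketch(glycine_x, ENRICHED)
```

Because X keeps the natural carbon abundance, a formula whose carbons are all marked X gives the same M0 under enrichment as the plain formula gives under normal conditions.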

seq_to_first_iso.compute_M1_nl(formula, abundance)

Compute intensity of the second isotopologue M1.

Handle element X with specific abundance.

Parameters

formula (pyteomics.mass.Composition) – Chemical formula, as a dict of the number of atoms for each element: {element_name: number_of_atoms, …}.
abundance (dict) – Dictionary of abundances of isotopes: {“element_name[isotope_number]”: relative_abundance, …}.

Returns

Value of M1.

Return type

float

Notes

X represents C with default isotopic abundance.
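
M1 can be written as M0 multiplied by the sum, over elements, of the atom count times the ratio of the +1 isotope abundance to the lightest isotope abundance. The sketch below implements that textbook expression with plain dicts; whether the package uses exactly this form is not stated here. The abundances are rounded natural values, and the `X[12]`/`X[13]` keys are assumptions.

```python
import math

# (lightest isotope key, +1 isotope key) for each element.
ISOTOPES = {
    "C": ("C[12]", "C[13]"), "X": ("X[12]", "X[13]"), "H": ("H[1]", "H[2]"),
    "N": ("N[14]", "N[15]"), "O": ("O[16]", "O[17]"), "S": ("S[32]", "S[33]"),
}
# Rounded natural abundances (illustrative values).
NATURAL = {
    "C[12]": 0.9893, "C[13]": 0.0107, "X[12]": 0.9893, "X[13]": 0.0107,
    "H[1]": 0.999885, "H[2]": 0.000115, "N[14]": 0.99632, "N[15]": 0.00368,
    "O[16]": 0.99757, "O[17]": 0.00038, "S[32]": 0.9493, "S[33]": 0.0076,
}


def m1_sketch(formula, abundance):
    """Probability that exactly one atom carries one extra neutron."""
    m0 = math.prod(
        abundance[ISOTOPES[el][0]] ** n for el, n in formula.items()
    )
    # Each atom contributes the ratio (+1 isotope) / (lightest isotope).
    ratio_sum = sum(
        n * abundance[ISOTOPES[el][1]] / abundance[ISOTOPES[el][0]]
        for el, n in formula.items()
    )
    return m0 * ratio_sum


m1_one_carbon = m1_sketch({"C": 1}, NATURAL)
m1_two_carbons = m1_sketch({"C": 2}, NATURAL)
```

For two carbon atoms this reduces to the binomial term 2 × a(C[12]) × a(C[13]), which is a quick sanity check on the formula.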

seq_to_first_iso.formula_to_str(composition)

Return formula from Composition as a string.

Parameters: composition (pyteomics.mass.Composition) – Chemical formula.
Returns: Human-readable string of the formula.
Return type: str

Warning

If the composition has elements not in USED_ELEMS, they will not be added to the output.
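
The warning above can be illustrated with a small sketch. The value of `USED_ELEMS_SKETCH`, the element order and the exact output format are all assumptions made for this example; check the package for the real `USED_ELEMS` and formatting.

```python
# Assumption: supported elements in output order; the real USED_ELEMS may differ.
USED_ELEMS_SKETCH = "CHONPSX"


def formula_to_str_sketch(composition):
    """Join element symbols and counts, keeping only supported elements."""
    return "".join(
        f"{element}{composition[element]}"
        for element in USED_ELEMS_SKETCH
        if composition.get(element, 0)
    )


glycine_str = formula_to_str_sketch({"C": 2, "H": 5, "N": 1, "O": 2})
# Selenium is not in USED_ELEMS_SKETCH, so it is silently left out.
dropped_str = formula_to_str_sketch({"C": 1, "Se": 1})
```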

seq_to_first_iso.convert_atom_C_to_X(sequence)

Replace carbon atom by element X atom in a composition.

Parameters: sequence (str or pyteomics.mass.Composition) – Sequence or composition.
Returns: Composition with carbon atoms replaced by element X atoms.
Return type: pyteomics.mass.Composition
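
The replacement can be sketched on a plain dict standing in for a pyteomics Composition. `convert_c_to_x_sketch` is a hypothetical helper for illustration; unlike the documented function, it handles only compositions, not sequence strings.

```python
def convert_c_to_x_sketch(composition):
    """Return a copy of the composition with carbon counted as element X."""
    converted = dict(composition)  # copy so the input is left untouched
    carbons = converted.pop("C", 0)
    if carbons:
        converted["X"] = converted.get("X", 0) + carbons
    return converted


original = {"C": 2, "H": 5, "N": 1, "O": 2}
converted = convert_c_to_x_sketch(original)
```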

seq_to_first_iso.get_charge_composition(charge)

Return the composition of a given charge (only H+).

Parameters: charge (int) – Peptide charge.
Returns: Composition of the charge (H+).
Return type: pyteomics.mass.Composition

seq_to_first_iso.get_mods_composition(modifications)

Return the composition of a list of modifications.

Parameters: modifications (list of str) – List of modification strings (corresponding to Unimod titles).
Returns: The total composition change.
Return type: pyteomics.mass.Composition

seq_to_first_iso.compute_intensities(df_peptides, unlabelled_aa=[])

Compute isotopologue intensities from peptide sequences.

Parameters

df_peptides (pandas.DataFrame) – Dataframe with columns ‘sequence’ and ‘charge’.
unlabelled_aa (container object) – Container of unlabelled amino acids.

Returns

Dataframe with all computed values, compositions and formulas.

Return type

pandas.DataFrame

Notes

Supports X!Tandem’s post-translational modification notation (0.4.0).