seq-to-first-iso

Compute intensities of the first two isotopologues.

Use peptide sequences and charges as input.

The program computes M0 and M1 and differentiates between labelled amino acids (with a 99.99 % C[12] enrichment) and unlabelled amino acids.

Read a .tsv file with one amino acid sequence per line and write a .tsv file with the columns sequence, mass, formula, formula_X, M0_NC, M1_NC, M0_12C and M1_12C.

The formula_X column holds the chemical formula in which the carbon atoms of unlabelled amino acids are written as X.

NC means Normal Condition; 12C means C[12] enrichment condition.
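The two conditions can be pictured as two abundance dictionaries in the "element[isotope_number]" key format used by compute_M0_nl. This is a sketch, not the package's code. make_abundance and NATURAL_C are hypothetical names. Only carbon and X are shown, and other elements would keep their natural abundances. The natural carbon values (98.93 % 12C, 1.07 % 13C) are standard IUPAC figures.

```python
# Hypothetical sketch: carbon abundances for the NC and 12C conditions.
# Keys follow the "element[isotope_number]" format; values are relative abundances.
NATURAL_C = {"C[12]": 0.9893, "C[13]": 0.0107}  # natural abundance of carbon


def make_abundance(c12_enrichment=None):
    """Return carbon and X abundances for NC (None) or a 12C enrichment level.

    X always keeps the natural carbon abundance, because it stands for the
    carbon atoms of unlabelled amino acids.
    """
    if c12_enrichment is None:
        carbon = dict(NATURAL_C)
    else:
        carbon = {"C[12]": c12_enrichment, "C[13]": 1.0 - c12_enrichment}
    x = {"X[12]": NATURAL_C["C[12]"], "X[13]": NATURAL_C["C[13]"]}
    return {**carbon, **x}


abundance_nc = make_abundance()         # Normal Condition (NC)
abundance_12c = make_abundance(0.9999)  # 99.99 % C[12] enrichment (12C)
```

Only the C entries differ between the two dictionaries. This is why labelled and unlabelled carbons must be kept apart in the formula.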

Example

Running the script after installation

$ seq-to-first-iso sequences.tsv sequence_column_name charge_column_name

will produce the file ‘sequences_stfi.tsv’.

Notes

Carbon atoms of unlabelled amino acids keep the default isotopic abundance and are represented as X in formulas. Isotope naming follows pyteomics conventions.

seq_to_first_iso.parse_input_file(filename, sep='\t')

Parse input file.

Parameters
  • filename (str) – Filename. The file can contain either one sequence per line, or annotations and sequences with a separator in between.

  • sep (str, optional) – Separator for files with annotations (default is \t).

Returns

The parsed content of the input file.

Return type

pandas.DataFrame

Raises
  • FileNotFoundError – If the input file is not found. Exception chaining is explicitly suppressed (from None).

  • Exception – If the input file cannot be read with pandas. Exception chaining is explicitly suppressed (from None).
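The documented error handling can be sketched with the standard library alone. This is a hypothetical stand-in and not the package's implementation. It assumes the first line is a header, and it returns a list of row dicts where the package returns a pandas.DataFrame. The point is the `raise ... from None` pattern, which suppresses exception chaining as described above.

```python
import csv


def parse_input_file(filename, sep="\t"):
    """Stdlib-only sketch: read a separated file into a list of row dicts.

    Assumes a header line. Mirrors the documented exceptions, with chaining
    suppressed via "from None".
    """
    try:
        with open(filename, newline="") as handle:
            rows = list(csv.DictReader(handle, delimiter=sep))
    except FileNotFoundError:
        # Must come before OSError, of which it is a subclass.
        raise FileNotFoundError(f"File {filename} not found.") from None
    except (OSError, csv.Error, UnicodeDecodeError):
        raise Exception(f"File {filename} could not be read.") from None
    return rows
```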

seq_to_first_iso.filter_input_dataframe(dataframe, sequence_col_name, charge_col_name)

Filter the input dataframe to keep only peptide sequences and charges.

Parameters
  • dataframe (pandas.DataFrame) – Raw dataframe with all input columns

  • sequence_col_name (str) – Name of column with peptide sequences

  • charge_col_name (str) – Name of column with peptide charges

Returns

Dataframe with columns:
- “sequence”: peptide sequences.
- “charge”: peptide charges.

Return type

pandas.DataFrame

Raises

KeyError – If the sequence or charge column is not found.
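A minimal sketch of the selection and renaming, assuming a list of row dicts instead of a DataFrame. The conversion of charges to int is also an assumption. The function is a hypothetical stand-in that only mirrors the documented columns and KeyError.

```python
def filter_input_dataframe(rows, sequence_col_name, charge_col_name):
    """Sketch on a list of row dicts instead of a pandas.DataFrame.

    Keeps only the sequence and charge columns, renamed to "sequence" and
    "charge". Raises KeyError if either column is missing.
    """
    if not rows:
        return []
    for col in (sequence_col_name, charge_col_name):
        if col not in rows[0]:
            raise KeyError(f"Column '{col}' not found.")
    return [
        # Assumption: charges are stored as text and converted to int.
        {"sequence": row[sequence_col_name], "charge": int(row[charge_col_name])}
        for row in rows
    ]
```

With pandas, the equivalent selection is `dataframe[[sequence_col_name, charge_col_name]].rename(columns={sequence_col_name: "sequence", charge_col_name: "charge"})`. Indexing with a missing column name already raises KeyError.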

seq_to_first_iso.check_amino_acids(seq)

Check that all elements of a sequence are known amino acids.

Parameters

seq (str) – Peptide sequence.

Returns

(sequence, “”) if the sequence is composed only of allowed amino acids;
(“”, “Unrecognized amino acids.”) if the sequence contains unallowed amino acids.

Return type

tuple(str, str)
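The return convention can be sketched as a set comparison. This assumes the allowed set is the 20 standard one-letter codes, which is a hypothetical choice. The package's actual set of recognized amino acids is not stated here.

```python
# Assumption: the allowed residues are the 20 standard one-letter codes.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def check_amino_acids(seq):
    """Return (seq, "") if every residue is allowed, else ("", message)."""
    if set(seq) <= STANDARD_AA:
        return seq, ""
    return "", "Unrecognized amino acids."
```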

seq_to_first_iso.separate_labelled(sequence, unlabelled_aa)

Split a sequence into its labelled and unlabelled amino acids.

Parameters
  • sequence (str) – String of amino acids.

  • unlabelled_aa (container object) – Container (list, string…) of unlabelled amino acids.

Returns

A tuple of two strings:
- the sequence without the unlabelled amino acids
- the unlabelled amino acids in the sequence

Return type

tuple(str, str)
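The split can be sketched as follows, assuming residue order is preserved within each part. This is a hypothetical illustration, not the package's code. Any container that supports `in` (a list or a string) works for unlabelled_aa.

```python
def separate_labelled(sequence, unlabelled_aa):
    """Return (labelled residues, unlabelled residues), keeping their order."""
    labelled = "".join(aa for aa in sequence if aa not in unlabelled_aa)
    unlabelled = "".join(aa for aa in sequence if aa in unlabelled_aa)
    return labelled, unlabelled
```

For example, separate_labelled("ARKAV", "AV") gives ("RK", "AAV").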

seq_to_first_iso.compute_M0_nl(formula, abundance)

Compute intensity of the first isotopologue M0.

Handle element X with specific abundance.

Parameters
  • formula (pyteomics.mass.Composition) – Chemical formula, as a dict of the number of atoms for each element: {element_name: number_of_atoms, …}.

  • abundance (dict) – Dictionary of abundances of isotopes: {“element_name[isotope_number]”: relative_abundance, …}.

Returns

Value of M0.

Return type

float

Notes

X represents C with default isotopic abundance.
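The M0 computation can be sketched as follows. It assumes M0 is the unnormalized probability that every atom of the formula is its lightest isotope, that is, the product of a(e, lightest)^n_e over all elements e. compute_m0 and LIGHTEST_ISOTOPE are hypothetical names. Plain dicts stand in for pyteomics.mass.Composition, and the package may normalize differently.

```python
# Assumption: lightest isotope number of each element handled here.
# X stands for carbon at natural abundance, so it uses carbon's isotope numbers.
LIGHTEST_ISOTOPE = {"H": 1, "C": 12, "X": 12, "N": 14, "O": 16, "S": 32}


def compute_m0(formula, abundance):
    """Return M0, the probability that every atom is its lightest isotope.

    formula: {element: number_of_atoms}
    abundance: {"element[isotope_number]": relative_abundance}
    """
    m0 = 1.0
    for element, n_atoms in formula.items():
        key = f"{element}[{LIGHTEST_ISOTOPE[element]}]"
        m0 *= abundance[key] ** n_atoms
    return m0
```

Because C and X are separate keys, compute_m0({"C": 2, "X": 1}, {"C[12]": 0.9999, "X[12]": 0.9893}) weights the enriched carbons and the natural-abundance X independently.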

seq_to_first_iso.compute_M1_nl(formula, abundance)

Compute intensity of the second isotopologue M1.

Handle element X with specific abundance.

Parameters
  • formula (pyteomics.mass.Composition) – Chemical formula, as a dict of the number of atoms for each element: {element_name: number_of_atoms, …}.

  • abundance (dict) – Dictionary of abundances of isotopes: {“element_name[isotope_number]”: relative_abundance, …}.

Returns

Value of M1.

Return type

float

Notes

X represents C with default isotopic abundance.
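M1 is the probability that exactly one atom carries one extra neutron while all other atoms are the lightest isotope. This gives M1 = M0 × Σ_e n_e · a(e, lightest + 1) / a(e, lightest). The sketch below implements that formula under the same hypothetical names and assumptions as a plain-dict M0 sketch. It is not the package's code.

```python
# Assumption: lightest isotope number of each element; X mirrors carbon.
LIGHTEST_ISOTOPE = {"H": 1, "C": 12, "X": 12, "N": 14, "O": 16, "S": 32}


def compute_m1(formula, abundance):
    """Return M1 = M0 * sum_e n_e * a(e, +1) / a(e, lightest)."""
    m0 = 1.0
    ratio_sum = 0.0
    for element, n_atoms in formula.items():
        light = LIGHTEST_ISOTOPE[element]
        a_light = abundance[f"{element}[{light}]"]
        # Elements without a +1 isotope contribute nothing to M1.
        a_heavy = abundance.get(f"{element}[{light + 1}]", 0.0)
        m0 *= a_light ** n_atoms
        ratio_sum += n_atoms * a_heavy / a_light
    return m0 * ratio_sum
```

As a check, with two carbons at 50 % 13C the sketch returns 0.5, which matches 2 · 0.5 · 0.5 for exactly one heavy carbon.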

seq_to_first_iso.formula_to_str(composition)

Return the formula of a Composition as a string.

Parameters

composition (pyteomics.mass.Composition) – Chemical formula.

Returns

Human-readable string of the formula.

Return type

str

Warning

Elements of the composition that are not in USED_ELEMS are omitted from the output.
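The effect of that warning can be seen in a small sketch. The contents and order of USED_ELEMS below are assumptions, and so is writing counts of 1 explicitly. The package's actual list and output format may differ.

```python
# Assumption: element order used for output; the package's USED_ELEMS may differ.
USED_ELEMS = ("C", "X", "H", "N", "O", "S")


def formula_to_str(composition):
    """Concatenate element symbols and counts in USED_ELEMS order.

    Elements absent from USED_ELEMS are silently dropped.
    Counts of 1 are written explicitly (an assumption).
    """
    return "".join(
        f"{elem}{composition[elem]}"
        for elem in USED_ELEMS
        if composition.get(elem, 0) > 0
    )
```

With this sketch, a composition containing selenium loses the Se atom in the output string.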

seq_to_first_iso.convert_atom_C_to_X(sequence)

Replace carbon atoms with element X atoms in a composition.

Parameters

sequence (str or pyteomics.mass.Composition) – Sequence or composition.

Returns

Composition with carbon atoms replaced by element X atoms.

Return type

pyteomics.mass.Composition
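The replacement can be sketched on a plain dict. Unlike the documented function, this hypothetical version does not accept a sequence string, because building a composition from a sequence requires pyteomics. It also leaves its input unchanged.

```python
def convert_atom_C_to_X(composition):
    """Sketch on a plain dict: move all carbon atoms to the element X."""
    converted = dict(composition)  # copy so the input is not modified
    n_carbon = converted.pop("C", 0)
    if n_carbon:
        converted["X"] = converted.get("X", 0) + n_carbon
    return converted
```

With pyteomics installed, a sequence's composition can first be obtained with `pyteomics.mass.Composition(sequence=...)`.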

seq_to_first_iso.get_charge_composition(charge)

Return the composition of a given charge (only H+).

Parameters

charge (int) – Peptide charge.

Returns

Composition of the charge (H+).

Return type

pyteomics.mass.Composition

seq_to_first_iso.get_mods_composition(modifications)

Return the composition of a list of modifications.

Parameters

modifications (list of str) – List of modification strings (Unimod titles).

Returns

The total composition change.

Return type

pyteomics.mass.Composition
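Summing modification deltas can be sketched with collections.Counter. The hard-coded table is a hypothetical stand-in for a Unimod lookup. Its entries are the Unimod compositions of a few common modifications, and negative counts (as in Deamidated) are added like any other.

```python
from collections import Counter

# Hypothetical stand-in for a Unimod lookup: composition deltas by title.
MOD_COMPOSITIONS = {
    "Oxidation": {"O": 1},
    "Carbamidomethyl": {"H": 3, "C": 2, "N": 1, "O": 1},
    "Acetyl": {"H": 2, "C": 2, "O": 1},
    "Phospho": {"H": 1, "O": 3, "P": 1},
    "Deamidated": {"H": -1, "N": -1, "O": 1},
}


def get_mods_composition(modifications):
    """Sum the composition changes of a list of Unimod titles."""
    total = Counter()
    for title in modifications:
        total.update(MOD_COMPOSITIONS[title])  # update() adds counts, negatives included
    return dict(total)
```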

seq_to_first_iso.compute_intensities(df_peptides, unlabelled_aa=[])

Compute isotopologue intensities from peptide sequences.

Parameters
  • df_peptides (pandas.DataFrame) – Dataframe with columns ‘sequence’ and ‘charge’.

  • unlabelled_aa (container object) – Container of unlabelled amino acids.

Returns

Dataframe with all computed values, compositions and formulas.

Return type

pandas.DataFrame

Notes

Supports Xtandem’s Post-Translational Modification notation (0.4.0).
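The following sketch shows one plausible way the steps combine for a peptide, under heavy simplifications. Only glycine and alanine residue compositions are tabulated. Plain lists and dicts replace the DataFrame. The charge composition, modifications, mass and formula strings are left out. All names are hypothetical. The abundances are standard natural values, and the 12C condition changes only C while X keeps the natural carbon abundance.

```python
from collections import Counter

# Hypothetical residue compositions (amino acid minus water).
RESIDUES = {"G": {"C": 2, "H": 3, "N": 1, "O": 1},
            "A": {"C": 3, "H": 5, "N": 1, "O": 1}}
WATER = {"H": 2, "O": 1}
LIGHTEST = {"H": 1, "C": 12, "X": 12, "N": 14, "O": 16}

# Natural abundances; X is carbon kept at natural abundance in both conditions.
NATURAL = {"H[1]": 0.999885, "H[2]": 0.000115, "N[14]": 0.99636, "N[15]": 0.00364,
           "O[16]": 0.99757, "O[17]": 0.00038, "C[12]": 0.9893, "C[13]": 0.0107,
           "X[12]": 0.9893, "X[13]": 0.0107}
ENRICHED = {**NATURAL, "C[12]": 0.9999, "C[13]": 0.0001}


def peptide_composition(sequence, unlabelled_aa):
    """Neutral peptide composition with carbons of unlabelled residues as X."""
    total = Counter(WATER)
    for aa in sequence:
        comp = dict(RESIDUES[aa])
        if aa in unlabelled_aa:
            comp["X"] = comp.pop("C")
        total.update(comp)
    return dict(total)


def m0_m1(formula, abundance):
    """Return (M0, M1) using M1 = M0 * sum_e n_e * a(e, +1) / a(e, lightest)."""
    m0, ratio = 1.0, 0.0
    for elem, n in formula.items():
        light = abundance[f"{elem}[{LIGHTEST[elem]}]"]
        heavy = abundance[f"{elem}[{LIGHTEST[elem] + 1}]"]
        m0 *= light ** n
        ratio += n * heavy / light
    return m0, m0 * ratio


def compute_intensities(peptides, unlabelled_aa=()):
    """peptides: iterable of (sequence, charge); charge is only passed through."""
    rows = []
    for sequence, charge in peptides:
        formula = peptide_composition(sequence, unlabelled_aa)
        m0_nc, m1_nc = m0_m1(formula, NATURAL)
        m0_12c, m1_12c = m0_m1(formula, ENRICHED)
        rows.append({
            "sequence": sequence,
            "charge": charge,
            "M0_NC": m0_nc, "M1_NC": m1_nc,
            "M0_12C": m0_12c, "M1_12C": m1_12c,
        })
    return rows
```

Because X keeps the natural carbon abundance in both conditions, a peptide whose residues are all unlabelled gets identical NC and 12C intensities in this sketch.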