seq-to-first-iso

Compute first two isotopologue intensities from sequences.

The program computes M0 and M1 and differentiates labelled amino acids (with a 99.99 % C[12] enrichment) from unlabelled ones.

Reads a file with one amino acid sequence per line and returns sequence, mass, formula, formula_X, M0_NC, M1_NC, M0_12C and M1_12C in a TSV file.

Formula_X is the chemical formula with carbon of unlabelled amino acids marked as X.

NC means Normal Condition, 12C means C[12] enrichment condition.

Example

Running the script after installation

$ seq-to-first-iso sequences.txt

will produce the file ‘sequences_stfi.tsv’.
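The output name in the example above can be derived from the input name. A minimal sketch, assuming the default output is simply the input stem followed by `_stfi.tsv` (inferred from the example only; `default_output_name` is a hypothetical helper, not part of the package):

```python
from pathlib import Path


def default_output_name(input_file):
    """Hypothetical helper: build '<stem>_stfi.tsv' from an input filename."""
    return f"{Path(input_file).stem}_stfi.tsv"


print(default_output_name("sequences.txt"))
```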

Notes

Carbon atoms of unlabelled amino acids keep the default isotopic abundance and are represented as X in formulas. Isotope naming follows pyteomics’s conventions.

seq_to_first_iso.sequence_parser(file, sep='\t')

Return information on sequences parsed from a file.

Parameters
  • file (str) – Filename. The file can either have one sequence per line, or have annotations and sequences with a separator in between.

  • sep (str, optional) – Separator for files with annotations (default is \t).

Returns

Parsed output as a dictionary with the following keys:
- “annotations”: a list of annotations if any.
- “raw_sequences”: a list of unmodified peptide sequences.
- “sequences”: a list of uppercase peptide sequences.
- “modifications”: a list of lists of PTMs.
- “ignored_lines”: the number of ignored lines.

Return type

dict

Warning

The function uses the first line to determine whether the file has annotations, so the file must have a consistent format throughout.

Notes

Supports X!Tandem’s Post-Translational Modification notation (0.4.0).
Supports annotations (0.3.0).
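The first-line detection described in the warning above can be illustrated as follows. This is a simplified sketch, not the package's implementation: it works on a list of lines instead of a file, does not handle PTM notation, and the rule for ignoring lines (blank lines, or annotated lines without a separator) is an assumption. `parse_lines_sketch` is a hypothetical name.

```python
def parse_lines_sketch(lines, sep="\t"):
    """Simplified illustration of annotation detection (no PTM handling)."""
    lines = [line.rstrip("\n") for line in lines]
    # Only the first line decides whether the input is annotated.
    has_annotations = bool(lines) and sep in lines[0]

    annotations, raw_sequences, sequences = [], [], []
    ignored_lines = 0
    for line in lines:
        if not line.strip():
            ignored_lines += 1  # assumption: blank lines are ignored
            continue
        if has_annotations:
            parts = line.split(sep)
            if len(parts) < 2:
                ignored_lines += 1  # assumption: inconsistent lines are ignored
                continue
            annotations.append(parts[0].strip())
            raw = parts[1].strip()
        else:
            raw = line.strip()
        raw_sequences.append(raw)
        sequences.append(raw.upper())

    return {
        "annotations": annotations,
        "raw_sequences": raw_sequences,
        "sequences": sequences,
        "ignored_lines": ignored_lines,
    }
```

Because only the first line is inspected, a file that mixes annotated and plain lines would be read under whichever format its first line suggests.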
seq_to_first_iso.separate_labelled(sequence, unlabelled_aa)

Get the sequence of unlabelled amino acids from a sequence.

Parameters
  • sequence (str) – String of amino acids.

  • unlabelled_aa (container object) – Container (list, string…) of unlabelled amino acids.

Returns

The sequences as a tuple of strings with:
- the sequence without the unlabelled amino acids
- the unlabelled amino acids in the sequence

Return type

tuple(str, str)
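The split described above can be sketched in a few lines. This is an illustrative re-implementation under the assumption that residue order is preserved in both outputs; `separate_labelled_sketch` is a hypothetical name, not the package function.

```python
def separate_labelled_sketch(sequence, unlabelled_aa):
    """Split a sequence into (labelled part, unlabelled part)."""
    labelled = "".join(aa for aa in sequence if aa not in unlabelled_aa)
    unlabelled = "".join(aa for aa in sequence if aa in unlabelled_aa)
    return labelled, unlabelled


# Alanine (A) and arginine (R) are treated as unlabelled here.
print(separate_labelled_sketch("YAQEISR", "AR"))
```

Any container that supports the `in` operator (string, list, set) works for `unlabelled_aa`.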

seq_to_first_iso.compute_M0_nl(f, a)

Return the monoisotopic abundance M0 of a formula with mixed labels.

Parameters
  • f (pyteomics.mass.Composition) – Chemical formula, as a dict of counts for each element: {element_name: count_of_element_in_sequence, …}.

  • a (dict) – Dictionary of abundances of isotopes, in the format: {element_name[isotope_number]: relative_abundance, …}.

Returns

Value of M0.

Return type

float

Notes

X represents C with default isotopic abundance.
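For illustration, M0 is the probability that every atom is its lightest isotope, i.e. the product over elements of (lightest-isotope abundance) raised to the atom count. The sketch below uses plain dicts instead of `pyteomics.mass.Composition`, approximate natural abundances chosen for illustration, and a hypothetical function name `m0_sketch`; it is not the package's code.

```python
# Approximate natural abundances (illustrative values, pyteomics-style keys).
NATURAL = {
    "H[1]": 0.999885, "C[12]": 0.9893, "X[12]": 0.9893,
    "N[14]": 0.99636, "O[16]": 0.99757, "S[32]": 0.9499,
}
# 12C condition: labelled carbon (C) is enriched, X keeps natural abundance.
ENRICHED = {**NATURAL, "C[12]": 0.9999}

LIGHTEST = {"H": "H[1]", "C": "C[12]", "X": "X[12]",
            "N": "N[14]", "O": "O[16]", "S": "S[32]"}


def m0_sketch(formula, abundances):
    """Product of lightest-isotope abundances, one factor per atom."""
    m0 = 1.0
    for element, count in formula.items():
        m0 *= abundances[LIGHTEST[element]] ** count
    return m0


formula = {"C": 2, "X": 3, "H": 10, "N": 2, "O": 3}
print(m0_sketch(formula, NATURAL), m0_sketch(formula, ENRICHED))
```

Under enrichment only the `C` atoms contribute a larger factor; the `X` atoms contribute the same factor in both conditions.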

seq_to_first_iso.compute_M1_nl(f, a)

Return the abundance of the second isotopologue M1 of a formula with mixed labels.

Parameters
  • f (pyteomics.mass.Composition) – Chemical formula, as a dict of counts for each element: {element_name: count_of_element_in_sequence, …}.

  • a (dict) – Dictionary of abundances of isotopes, in the format: {element_name[isotope_number]: relative_abundance, …}.

Returns

Value of M1.

Return type

float

Notes

X represents C with default isotopic abundance.
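For illustration, M1 is the probability that exactly one atom carries one extra neutron while all others are their lightest isotope. This can be written as M0 multiplied by the sum, over elements, of count × (abundance of the +1 isotope / abundance of the lightest isotope). The sketch below assumes plain dicts, approximate natural abundances, and a hypothetical name `m1_sketch`; it is not taken from the package.

```python
# Approximate natural abundances (illustrative values, pyteomics-style keys).
NATURAL = {
    "H[1]": 0.999885, "H[2]": 0.000115,
    "C[12]": 0.9893, "C[13]": 0.0107,
    "X[12]": 0.9893, "X[13]": 0.0107,
    "N[14]": 0.99636, "N[15]": 0.00364,
    "O[16]": 0.99757, "O[17]": 0.00038,
    "S[32]": 0.9499, "S[33]": 0.0075,
}

# (lightest isotope, +1 isotope) for each element symbol.
ISOTOPES = {
    "H": ("H[1]", "H[2]"), "C": ("C[12]", "C[13]"), "X": ("X[12]", "X[13]"),
    "N": ("N[14]", "N[15]"), "O": ("O[16]", "O[17]"), "S": ("S[32]", "S[33]"),
}


def m1_sketch(formula, abundances):
    """M1 = M0 * sum(count * heavy / light) over all elements."""
    m0 = 1.0
    ratio_sum = 0.0
    for element, count in formula.items():
        light, heavy = ISOTOPES[element]
        m0 *= abundances[light] ** count
        ratio_sum += count * abundances[heavy] / abundances[light]
    return m0 * ratio_sum


print(m1_sketch({"C": 2, "X": 3, "H": 10, "N": 2, "O": 3}, NATURAL))
```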

seq_to_first_iso.formula_to_str(composition)

Return formula from Composition as a string.

Parameters

composition (pyteomics.mass.Composition) – Chemical formula.

Returns

Human-readable string of the formula.

Return type

str

Warning

If the composition has elements not in USED_ELEMS, they will not be added to the output.
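The behaviour described in the warning can be sketched with a plain dict. The element order `"CHONPSX"` used below is an assumption standing in for USED_ELEMS, and `formula_to_str_sketch` is a hypothetical name.

```python
def formula_to_str_sketch(composition, used_elems="CHONPSX"):
    """Join element symbols and counts, keeping only elements in used_elems."""
    return "".join(
        f"{element}{composition[element]}"
        for element in used_elems
        if composition.get(element, 0)
    )


# "Na" is not in the assumed element list, so it is silently dropped.
print(formula_to_str_sketch({"C": 5, "H": 9, "N": 1, "O": 4, "Na": 1}))
```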

seq_to_first_iso.seq_to_xcomp(sequence_l, sequence_nl)

Take 2 amino acid sequences and return the composition with X.

The second sequence will have its C replaced by X.

Parameters
  • sequence_l (str or pyteomics.mass.Composition) – Sequence or composition with labelled amino acids.

  • sequence_nl (str or pyteomics.mass.Composition) – Sequence or composition where amino acids are not labelled.

Returns

Composition with unlabelled carbon as element X.

Return type

pyteomics.mass.Composition

Notes

The function assumes the second sequence has no termini (H-, -OH).
Supports pyteomics.mass.Composition as argument (0.5.1).
If mass.Composition objects are provided, the function assumes the termini of the second composition were already removed.
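The combination of the two sequences can be illustrated without pyteomics. The sketch below assumes a small hand-written table of residue compositions (glycine, alanine, serine), adds the termini (H2O) to the labelled part only, and renames carbon to X in the unlabelled part. `seq_to_xcomp_sketch` and the `RESIDUES` table are hypothetical names for illustration.

```python
from collections import Counter

# Residue compositions (amino acid minus H2O) for three amino acids.
RESIDUES = {
    "G": Counter({"C": 2, "H": 3, "N": 1, "O": 1}),
    "A": Counter({"C": 3, "H": 5, "N": 1, "O": 1}),
    "S": Counter({"C": 3, "H": 5, "N": 1, "O": 2}),
}
WATER = Counter({"H": 2, "O": 1})  # termini: H- and -OH


def residues_composition(sequence):
    """Sum the residue compositions of a sequence, without termini."""
    total = Counter()
    for aa in sequence:
        total += RESIDUES[aa]
    return total


def seq_to_xcomp_sketch(sequence_l, sequence_nl):
    """Labelled part keeps C and gets the termini; unlabelled C becomes X."""
    labelled = residues_composition(sequence_l) + WATER
    unlabelled = residues_composition(sequence_nl)
    unlabelled["X"] = unlabelled.pop("C", 0)
    return labelled + unlabelled


# Gly-Ala with alanine unlabelled: C5H10N2O3 split into C2 and X3.
print(seq_to_xcomp_sketch("G", "A"))
```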
seq_to_first_iso.get_mods_composition(modifications)

Return the composition of a list of modifications.

Parameters

modifications (list of str) – List of modification strings (corresponding to Unimod titles).

Returns

The total composition change.

Return type

pyteomics.mass.Composition
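Summing composition changes can be sketched as follows. The example assumes a small hard-coded table of four Unimod titles with their composition changes instead of a Unimod lookup; `mods_composition_sketch` and `MODS` are hypothetical names.

```python
from collections import Counter

# Composition changes for a few Unimod titles (small illustrative table).
MODS = {
    "Oxidation": {"O": 1},
    "Acetyl": {"C": 2, "H": 2, "O": 1},
    "Phospho": {"H": 1, "O": 3, "P": 1},
    "Carbamidomethyl": {"C": 2, "H": 3, "N": 1, "O": 1},
}


def mods_composition_sketch(modifications):
    """Add up the composition change of every modification title."""
    total = Counter()
    for title in modifications:
        total.update(MODS[title])  # update() adds counts element by element
    return total


print(mods_composition_sketch(["Oxidation", "Acetyl"]))
```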

seq_to_first_iso.seq_to_df(sequences, unlabelled_aa, **kwargs)

Create a dataframe from sequences and return it.

Parameters
  • sequences (list of str) – List of pure peptide sequence strings.

  • unlabelled_aa (container object) – Container of unlabelled amino acids.

  • annotations (list of str, optional) – List of IDs for the sequences.

  • raw_sequences (list of str, optional) – List of sequences with X!Tandem PTMs.

  • modifications (list of str, optional) – List of modifications for raw_sequences.

Returns

Dataframe with the columns:
annotation (optional), sequence, mass, formula, formula_X, M0_NC, M1_NC, M0_12C, M1_12C.

Return type

pandas.DataFrame

Warning

If raw_sequences is provided, modifications must also be provided, and vice versa.