API of seq-to-first-iso

seq-to-first-iso computes the intensities of the first two isotopologues (M0 and M1) of peptide sequences, both with natural carbon and with carbon enriched to 99.99% 12C.

The program can treat selected amino acids as unlabelled, which simulates auxotrophies for those amino acids.

seq-to-first-iso is available as a Python module.

[1]:

from pathlib import Path
from pprint import pprint

from pkg_resources import get_distribution  # Comes with setuptools.
import pandas as pd
from pyteomics import mass

import seq_to_first_iso as stfi

[2]:

try:
    print(f"pyteomics version: {get_distribution('pyteomics').version}")
except Exception:  # Avoid a bare except, which would also swallow KeyboardInterrupt.
    print("pyteomics version not found")

print(f"pandas version: {pd.__version__}\n"
      f"seq-to-first-iso version: {stfi.__version__}"
     )

pyteomics version: 4.1.2
pandas version: 0.25.1
seq-to-first-iso version: 0.5.1

Abundances defined in seq-to-first-iso

[3]:

pprint(stfi.NATURAL_ABUNDANCE)

{'C[12]': 0.9893,
 'C[13]': 0.0107,
 'H[1]': 0.999885,
 'H[2]': 0.000115,
 'N[14]': 0.99632,
 'N[15]': 0.00368,
 'O[16]': 0.99757,
 'O[17]': 0.00038,
 'O[18]': 0.00205,
 'S[32]': 0.9493,
 'S[33]': 0.0076,
 'S[34]': 0.0429,
 'X[12]': 0.9893,
 'X[13]': 0.0107}

[4]:

pprint(stfi.C12_ABUNDANCE)

{'C[12]': 0.9999,
 'C[13]': 9.999999999998899e-05,
 'H[1]': 0.999885,
 'H[2]': 0.000115,
 'N[14]': 0.99632,
 'N[15]': 0.00368,
 'O[16]': 0.99757,
 'O[17]': 0.00038,
 'O[18]': 0.00205,
 'S[32]': 0.9493,
 'S[33]': 0.0076,
 'S[34]': 0.0429,
 'X[12]': 0.9893,
 'X[13]': 0.0107}

NATURAL_ABUNDANCE and C12_ABUNDANCE are dictionaries that hold the abundances of the common isotopes of the elements found in organic molecules.
In C12_ABUNDANCE the 12C abundance is 99.99 %, hence the 13C abundance is 0.01 %.
Element X is a virtual element that replaces the carbon of unlabelled amino acids. It has the same isotopic abundances as natural carbon in both dictionaries.
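As a minimal sketch of the relationship between the two tables, the snippet below derives an enriched table from the carbon entries of the natural one. `enrich_c12` is a hypothetical helper written for this illustration, not a function of seq-to-first-iso. It also suggests why the 13C value printed above is 9.999999999998899e-05 rather than exactly 0.0001: that is what a floating-point subtraction such as `1 - 0.9999` produces.

```python
# Carbon entries copied from the NATURAL_ABUNDANCE output above.
natural = {"C[12]": 0.9893, "C[13]": 0.0107, "X[12]": 0.9893, "X[13]": 0.0107}


def enrich_c12(abundance, c12_fraction):
    """Return a copy of `abundance` where carbon C holds `c12_fraction` of 12C.

    Hypothetical helper: element X is left untouched, so unlabelled
    amino acids keep the natural carbon abundances.
    """
    enriched = dict(abundance)
    enriched["C[12]"] = c12_fraction
    enriched["C[13]"] = 1 - c12_fraction  # Float arithmetic, not exactly 1e-4.
    return enriched


c12 = enrich_c12(natural, 0.9999)
print(c12["C[13]"])  # Very close to 0.0001.
print(c12["X[13]"])  # 0.0107, unchanged.
```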

Separate sequences according to unlabelled amino acids

[5]:

help(stfi.separate_labelled)

Help on function separate_labelled in module seq_to_first_iso.seq_to_first_iso:

separate_labelled(sequence, unlabelled_aa)
    Get the sequence of unlabelled amino acids from a sequence.

    Parameters
    ----------
    sequence : str
        String of amino acids.
    unlabelled_aa : container object
        Container (list, string...) of unlabelled amino acids.

    Returns
    -------
    tuple(str, str)
        | The sequences as a tuple of string with:
        |    - the sequence without the unlabelled amino acids
        |    - the unlabelled amino acids in the sequence

[6]:

# Separate sequence "YAQEISRAR" with amino acids A and R unlabelled.
peptide_seq = "YAQEISRAR"
unlabelled_amino_acids = ["A", "R"]

labelled_sequence, unlabelled_sequence = stfi.separate_labelled(peptide_seq, unlabelled_aa=unlabelled_amino_acids)

print(
    f"Original sequence: {peptide_seq}\n"
    f"Unlabelled amino acids: {unlabelled_amino_acids}\n"
    f"Sequence with labelled carbon: {labelled_sequence}\n"
    f"Sequence with unlabelled carbon: {unlabelled_sequence}")

Original sequence: YAQEISRAR
Unlabelled amino acids: ['A', 'R']
Sequence with labelled carbon: YQEIS
Sequence with unlabelled carbon: ARAR
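The separation amounts to a simple filter on the residues. The sketch below reproduces the result above in plain Python; `split_by_label` is a hypothetical name and this is not necessarily how `separate_labelled` is implemented.

```python
def split_by_label(sequence, unlabelled_aa):
    """Split a sequence into its labelled and unlabelled residues.

    Residue order is preserved within each part.
    """
    unlabelled = set(unlabelled_aa)
    labelled_part = "".join(aa for aa in sequence if aa not in unlabelled)
    unlabelled_part = "".join(aa for aa in sequence if aa in unlabelled)
    return labelled_part, unlabelled_part


print(split_by_label("YAQEISRAR", ["A", "R"]))  # ('YQEIS', 'ARAR')
```

Because `unlabelled_aa` is only used for membership tests, a string such as `"AR"` works as well as a list.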

Obtain a composition with element X

[7]:

# Get the chemical formula with unlabelled carbon as element X.
labelled_formula = mass.Composition(labelled_sequence)
unlabelled_formula = stfi.convert_atom_C_to_X(mass.Composition(parsed_sequence=unlabelled_sequence))
peptide_formula = unlabelled_formula + labelled_formula
print(f"Composition of labelled amino acids: {labelled_formula}")
print(f"Composition of unlabelled amino acids (X is C): {unlabelled_formula}")
print(f"Composition of {peptide_seq} with {unlabelled_amino_acids} unlabelled:\n{peptide_formula}")

Composition of labelled amino acids: Composition({'H': 42, 'C': 28, 'O': 11, 'N': 6})
Composition of unlabelled amino acids (X is C): Composition({'H': 34, 'O': 4, 'N': 10, 'X': 18})
Composition of YAQEISRAR with ['A', 'R'] unlabelled:
Composition({'H': 76, 'O': 15, 'N': 16, 'X': 18, 'C': 28})
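The same bookkeeping can be sketched with `collections.Counter` standing in for `mass.Composition`. `carbon_to_x` is a hypothetical stand-in for `convert_atom_C_to_X`, and the atom counts are copied from the output above (the unlabelled part ARAR holds 18 carbon atoms before conversion).

```python
from collections import Counter


def carbon_to_x(composition):
    """Return a copy of the composition where every C is renamed to X."""
    converted = Counter(composition)
    if "C" in converted:
        converted["X"] += converted.pop("C")
    return converted


labelled = Counter({"H": 42, "C": 28, "O": 11, "N": 6})    # YQEIS
unlabelled = Counter({"H": 34, "C": 18, "O": 4, "N": 10})  # ARAR, before conversion

# Adding two Counters sums the atom counts element by element.
peptide = carbon_to_x(unlabelled) + labelled
print(dict(peptide))
```

Keeping labelled carbon as C and unlabelled carbon as X is what lets a single formula carry two different carbon abundances.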

Compute isotopologue intensity

[8]:

help(stfi.compute_M0_nl)
print("-" * 79)
help(stfi.compute_M1_nl)

Help on function compute_M0_nl in module seq_to_first_iso.seq_to_first_iso:

compute_M0_nl(formula, abundance)
    Compute intensity of the first isotopologue M0.

    Handle element X with specific abundance.

    Parameters
    ----------
    formula : pyteomics.mass.Composition
        Chemical formula, as a dict of the number of atoms for each element:
        {element_name: number_of_atoms, ...}.
    abundance : dict
        Dictionary of abundances of isotopes:
        {"element_name[isotope_number]": relative abundance, ..}.

    Returns
    -------
    float
        Value of M0.

    Notes
    -----
    X represents C with default isotopic abundance.

-------------------------------------------------------------------------------
Help on function compute_M1_nl in module seq_to_first_iso.seq_to_first_iso:

compute_M1_nl(formula, abundance)
    Compute intensity of the second isotopologue M1.

    Handle element X with specific abundance.

    Parameters
    ----------
    formula : pyteomics.mass.Composition
        Chemical formula, as a dict of the number of atoms for each element:
        {element_name: number_of_atoms, ...}.
    abundance : dict
        Dictionary of abundances of isotopes:
        {"element_name[isotope_number]": relative abundance, ..}.

    Returns
    -------
    float
        Value of M1.

    Notes
    -----
    X represents C with default isotopic abundance.

[9]:

# Compute M0 with natural carbon, then with 12C-enriched carbon.
first_isotopologue = stfi.compute_M0_nl(peptide_formula, stfi.NATURAL_ABUNDANCE)
print(f"M0 in normal (98.93% 12C) condition: {first_isotopologue}")

first_isotopologue = stfi.compute_M0_nl(peptide_formula, stfi.C12_ABUNDANCE)
print(f"M0 in    12C (99.99% 12C) condition: {first_isotopologue}")

M0 in normal (98.93% 12C) condition: 0.5493191520383802
M0 in    12C (99.99% 12C) condition: 0.7403283857401063

[10]:

# Compute M1 with natural carbon, then with 12C-enriched carbon.
second_isotopologue = stfi.compute_M1_nl(peptide_formula, stfi.NATURAL_ABUNDANCE)
print(f"M1 in normal (98.93% 12C) condition: {second_isotopologue}")

second_isotopologue = stfi.compute_M1_nl(peptide_formula, stfi.C12_ABUNDANCE)
print(f"M1 in    12C (99.99% 12C) condition: {second_isotopologue}")

M1 in normal (98.93% 12C) condition: 0.313702912736476
M1 in    12C (99.99% 12C) condition: 0.200655465179031
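For intuition, M0 is the probability that every atom is its lightest isotope, and M1 is the probability that exactly one atom carries the isotope that is one mass unit heavier. A minimal pure-Python sketch of these textbook expressions follows. The function names and the `ISOTOPES` table are assumptions made for this illustration, not the library's code, although the sketch does reproduce the values printed above.

```python
# Abundances copied from the outputs above (only the elements used here).
NATURAL = {"H[1]": 0.999885, "H[2]": 0.000115, "C[12]": 0.9893, "C[13]": 0.0107,
           "N[14]": 0.99632, "N[15]": 0.00368, "O[16]": 0.99757, "O[17]": 0.00038,
           "X[12]": 0.9893, "X[13]": 0.0107}
C12 = dict(NATURAL, **{"C[12]": 0.9999, "C[13]": 1 - 0.9999})

# For each element: (lightest isotope, isotope one mass unit heavier).
ISOTOPES = {"H": ("H[1]", "H[2]"), "C": ("C[12]", "C[13]"), "N": ("N[14]", "N[15]"),
            "O": ("O[16]", "O[17]"), "X": ("X[12]", "X[13]")}


def m0(formula, abundance):
    """Probability that all atoms are the lightest isotope."""
    result = 1.0
    for element, count in formula.items():
        light, _ = ISOTOPES[element]
        result *= abundance[light] ** count
    return result


def m1(formula, abundance):
    """Probability that exactly one atom is one mass unit heavier."""
    ratio_sum = 0.0
    for element, count in formula.items():
        light, heavy = ISOTOPES[element]
        ratio_sum += count * abundance[heavy] / abundance[light]
    return m0(formula, abundance) * ratio_sum


peptide = {"H": 76, "O": 15, "N": 16, "X": 18, "C": 28}
print(m0(peptide, NATURAL), m1(peptide, NATURAL))  # About 0.5493 and 0.3137.
print(m0(peptide, C12), m1(peptide, C12))          # About 0.7403 and 0.2007.
```

With 12C enrichment, M0 rises and M1 falls because only the 18 unlabelled carbon atoms (element X) still contribute a significant 13C probability.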

Get the composition of a list of post-translational modifications (PTMs)

[11]:

help(stfi.get_mods_composition)

Help on function get_mods_composition in module seq_to_first_iso.seq_to_first_iso:

get_mods_composition(modifications)
    Return the composition of a list of modifications.

    Parameters
    ----------
    modifications : list of str
        List of modifications string (corresponding to Unimod titles).

    Returns
    -------
    pyteomics.mass.Composition
        The total composition change.

[12]:

# Modifications must exactly match Unimod entry titles (case-sensitive).
modification_list = ["Acetyl", "Phospho", "phospho"]  # "phospho" is not a Unimod title, so it is ignored with a warning.
total_composition = stfi.get_mods_composition(modification_list)
print(f"Total composition for {modification_list} is {total_composition}")

[2019-12-05, 13:55:32] WARNING : Unimod entry not found for : phospho

Total composition for ['Acetyl', 'Phospho', 'phospho'] is Composition({'H': 3, 'C': 2, 'O': 4, 'P': 1})
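The behaviour shown above (exact, case-sensitive title lookup, with unknown titles skipped after a warning) can be sketched as follows. `UNIMOD_SUBSET` is a hypothetical two-entry table hard-coded for this illustration; the real function looks titles up in Unimod. `sum_mod_compositions` is likewise an assumed name.

```python
import logging
from collections import Counter

logger = logging.getLogger(__name__)

# Hypothetical hard-coded subset of Unimod composition changes.
UNIMOD_SUBSET = {
    "Acetyl": {"H": 2, "C": 2, "O": 1},
    "Phospho": {"H": 1, "O": 3, "P": 1},
}


def sum_mod_compositions(modifications):
    """Sum the composition changes of the known modification titles."""
    total = Counter()
    for title in modifications:
        if title not in UNIMOD_SUBSET:  # Exact match, so "phospho" fails.
            logger.warning("Unimod entry not found for : %s", title)
            continue
        total.update(UNIMOD_SUBSET[title])
    return total


print(dict(sum_mod_compositions(["Acetyl", "Phospho", "phospho"])))
```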

Get human-readable chemical formula

[13]:

help(stfi.formula_to_str)

Help on function formula_to_str in module seq_to_first_iso.seq_to_first_iso:

formula_to_str(composition)
    Return formula from Composition as a string.

    Parameters
    ----------
    composition : pyteomics.mass.Composition
        Chemical formula.

    Returns
    -------
    str
        Human-readable string of the formula.

    Warnings
    --------
    If the composition has elements not in USED_ELEMS, they will not
    be added to the output.

[14]:

# This is the function used to get the formulas in the output.
formula_str = stfi.formula_to_str(total_composition)
print(f"{total_composition} becomes {formula_str}")

Composition({'H': 3, 'C': 2, 'O': 4, 'P': 1}) becomes C2H3O4P1

[15]:

# Warning: elements of the Composition that are not in "CHONPSX" are left out of the final string.
# "U" is the one-letter code for selenocysteine, which contains selenium (Se).
bad_composition = mass.Composition("U")
formula_str = stfi.formula_to_str(bad_composition)
print(f"Composition with unsupported element {bad_composition} becomes {formula_str}")

Composition with unsupported element Composition({'H': 7, 'C': 3, 'O': 2, 'N': 1, 'Se': 1}) becomes C3H7O2N1

Here, the “non-CHONPSX” element Se (selenium) is silently dropped from the formula string.
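A minimal sketch of this formatting rule, assuming that the supported elements are written in the fixed order C, H, O, N, P, S, X seen in the outputs above. `to_formula_string` is a hypothetical name, not the library function.

```python
USED_ELEMS = "CHONPSX"  # Supported elements, in output order.


def to_formula_string(composition):
    """Format a composition, skipping elements outside USED_ELEMS."""
    return "".join(
        f"{element}{composition[element]}"
        for element in USED_ELEMS
        if composition.get(element)
    )


print(to_formula_string({"H": 3, "C": 2, "O": 4, "P": 1}))           # C2H3O4P1
print(to_formula_string({"H": 7, "C": 3, "O": 2, "N": 1, "Se": 1}))  # C3H7O2N1
```

Since unsupported elements vanish without an error, a check such as `set(composition) - set(USED_ELEMS)` before formatting can flag them.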

Parse a file with peptide sequences and charges

seq-to-first-iso reads TSV files that contain at least a sequence column and a charge column.

The parser ignores lines whose sequence contains invalid characters (anything not in ACDEFGHIKLMNPQRSTVWY), unless those characters belong to X!Tandem's PTM notation.
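To make the validation rule concrete, here is a sketch that strips parenthesised modification names and the leading `.` placeholder, as they appear in the example sequences of this notebook, and then checks the remaining residues. The regular expression and both function names are assumptions for illustration; the actual parser may differ.

```python
import re

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MOD_PATTERN = re.compile(r"\(([^()]*)\)")  # Text between parentheses.


def split_modifications(sequence):
    """Return (bare sequence, list of modification names)."""
    mods = MOD_PATTERN.findall(sequence)
    bare = MOD_PATTERN.sub("", sequence).replace(".", "")
    return bare, mods


def is_valid(sequence):
    """True if only standard residues remain once modifications are removed."""
    bare, _ = split_modifications(sequence)
    return bool(bare) and set(bare) <= AMINO_ACIDS


print(split_modifications(".(Acetyl)SDTPLR(Oxidation)D"))
print(is_valid("YAQEISR"), is_valid("YAQBISR"))  # True False
```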

[16]:

df_raw = stfi.parse_input_file("peptides.tsv")
df_filtered = stfi.filter_input_dataframe(df_raw, "pep_sequence", "pep_charge")
print(df_filtered)

[2019-12-05, 13:55:32] INFO    : Read peptides.tsv
[2019-12-05, 13:55:32] INFO    : Found 11 lines and 3 columns

                                             sequence  charge
0                                             YAQEISR       2
1          VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK       3
2                           QRTTFFVLGINTVNYPDIYEHILER       2
3                               AELFL(Glutathione)LNR       1
4   .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D...       4
5                       YKTMNTFDPD(Heme)EKFEWFQVWQAVK       2
6          HKSASSPAV(Pro-&gt;Val)NADTDIQDSSTPSTSPSGRR       2
7                                                FHNK       1
8                                .(Glutathione)MDLEIK       3
9                                        LANEKPEDVFER       2
10  .(Acetyl)SDTPLR(Oxidation)D(Acetyl)EDG(Acetyl)...       3

[17]:

df_final = stfi.compute_intensities(df_filtered, unlabelled_aa=["A", "R"])
df_final

[2019-12-05, 13:55:33] INFO    : Reading sequences.
[2019-12-05, 13:55:33] INFO    : Computing composition and formula.
[2019-12-05, 13:55:33] WARNING : Fe in (Heme) is not supported in the computation of M0 and M1
[2019-12-05, 13:55:33] INFO    : Computing neutral mass
[2019-12-05, 13:55:33] INFO    : Computing M0 and M1

[17]:

	stfi_sequence	stfi_charge	stfi_sequence_clean	stfi_modification	stfi_sequence_without_mod	stfi_sequence_to_process	stfi_sequence_labelled	stfi_sequence_unlabelled	stfi_composition_mod	...	stfi_composition_peptide_neutral	stfi_composition_peptide_with_charge	stfi_composition_peptide_with_charge_X	stfi_formula	stfi_formula_X	stfi_neutral_mass	stfi_M0_NC	stfi_M1_NC	stfi_M0_12C	stfi_M1_12C
0	YAQEISR	2	YAQEISR	[]	YAQEISR	YAQEISR	YQEIS	AR	{}	...	{'H': 59, 'C': 37, 'O': 13, 'N': 11}	{'H': 61, 'C': 37, 'O': 13, 'N': 11}	{'H': 61, 'C': 28, 'O': 13, 'N': 11, 'X': 9}	C37H61O13N11	C28H61O13N11X9	865.429381	0.620499	0.280949	0.836258	0.127729
1	VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK	3	VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK	[Phospho]	VLLIDLRIPQRSAINHIVAPNLVNVDPNLLWDK	VLLIDLRIPQRSAINHIVAPNLVNVDPNLLWDK	VLLIDLIPQSINHIVPNLVNVDPNLLWDK	RRAA	{'H': 1, 'O': 3, 'P': 1}	...	{'H': 285, 'C': 172, 'O': 49, 'N': 48, 'P': 1}	{'H': 288, 'C': 172, 'O': 49, 'N': 48, 'P': 1}	{'H': 288, 'C': 154, 'O': 49, 'N': 48, 'X': 18...	C172H288O49N48P1	C154H288O49N48P1X18	3838.102264	0.113085	0.236277	0.583716	0.256348
2	QRTTFFVLGINTVNYPDIYEHILER	2	QRTTFFVLGINTVNYPDIYEHILER	[]	QRTTFFVLGINTVNYPDIYEHILER	QRTTFFVLGINTVNYPDIYEHILER	QTTFFVLGINTVNYPDIYEHILE	RR	{}	...	{'H': 212, 'C': 140, 'O': 40, 'N': 36}	{'H': 214, 'C': 140, 'O': 40, 'N': 36}	{'H': 214, 'C': 128, 'O': 40, 'N': 36, 'X': 12}	C140H214O40N36	C128H214O40N36X12	3037.566156	0.171920	0.290033	0.672639	0.212157
3	AELFL(Glutathione)LNR	1	AELFL(Glutathione)LNR	[Glutathione]	AELFLLNR	AELFLLNR	ELFLLN	AR	{'H': 15, 'C': 10, 'N': 3, 'O': 6, 'S': 1}	...	{'H': 89, 'C': 55, 'O': 18, 'N': 15, 'S': 1}	{'H': 90, 'C': 55, 'O': 18, 'N': 15, 'S': 1}	{'H': 90, 'C': 46, 'O': 18, 'N': 15, 'X': 9, '...	C55H90O18N15S1	C46H90O18N15S1X9	1279.623072	0.470882	0.318073	0.768822	0.140356
4	.(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D...	4	.(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D...	[Acetyl, Oxidation]	VGEVFINYIQRQNELFQGKLAYLIIDTCLSIVRPNDSKPLDNR	VGEVFINYIQRQNELFQGKLAYLIIDTCLSIVRPNDSKPLDNR	VGEVFINYIQQNELFQGKLYLIIDTCLSIVPNDSKPLDN	RARR	{'H': 2, 'C': 2, 'O': 2}	...	{'H': 361, 'C': 226, 'O': 68, 'N': 61, 'S': 1}	{'H': 365, 'C': 226, 'O': 68, 'N': 61, 'S': 1}	{'H': 365, 'C': 205, 'O': 68, 'N': 61, 'S': 1,...	C226H365O68N61S1	C205H365O68N61S1X21	5049.638616	0.054173	0.148735	0.481545	0.264287
5	YKTMNTFDPD(Heme)EKFEWFQVWQAVK	2	YKTMNTFDPD(Heme)EKFEWFQVWQAVK	[Heme]	YKTMNTFDPDEKFEWFQVWQAVK	YKTMNTFDPDEKFEWFQVWQAVK	YKTMNTFDPDEKFEWFQVWQVK	A	{'H': 32, 'C': 34, 'N': 4, 'O': 4, 'Fe': 1}	...	{'H': 225, 'C': 173, 'O': 42, 'N': 35, 'S': 1,...	{'H': 227, 'C': 173, 'O': 42, 'N': 35, 'S': 1,...	{'H': 227, 'C': 170, 'O': 42, 'N': 35, 'S': 1,...	C173H227O42N35S1	C170H227O42N35S1X3	3552.561645	0.114128	0.234021	0.698631	0.159873
6	HKSASSPAV(Pro->Val)NADTDIQDSSTPSTSPSGRR	2	HKSASSPAV(Pro->Val)NADTDIQDSSTPSTSPSGRR	[Pro->Val]	HKSASSPAVNADTDIQDSSTPSTSPSGRR	HKSASSPAVNADTDIQDSSTPSTSPSGRR	HKSSSPVNDTDIQDSSTPSTSPSG	AAARR	{'H': 2}	...	{'H': 196, 'C': 118, 'N': 40, 'O': 49}	{'H': 198, 'C': 118, 'N': 40, 'O': 49}	{'H': 198, 'C': 97, 'N': 40, 'O': 49, 'X': 21}	C118H198O49N40	C97H198O49N40X21	2957.407483	0.210376	0.308292	0.591515	0.251993
7	FHNK	1	FHNK	[]	FHNK	FHNK	FHNK		{}	...	{'H': 36, 'C': 25, 'O': 6, 'N': 8}	{'H': 37, 'C': 25, 'O': 6, 'N': 8}	{'H': 37, 'C': 25, 'O': 6, 'N': 8}	C25H37O6N8	C25H37O6N8	544.275781	0.728121	0.223157	0.950424	0.036677
8	.(Glutathione)MDLEIK	3	.(Glutathione)MDLEIK	[Glutathione]	MDLEIK	MDLEIK	MDLEIK		{'H': 15, 'C': 10, 'N': 3, 'O': 6, 'S': 1}	...	{'H': 72, 'C': 42, 'S': 2, 'O': 17, 'N': 10}	{'H': 75, 'C': 42, 'S': 2, 'O': 17, 'N': 10}	{'H': 75, 'C': 42, 'S': 2, 'O': 17, 'N': 10}	C42H75O17N10S2	C42H75O17N10S2	1052.451833	0.525852	0.274658	0.822740	0.059443
9	LANEKPEDVFER	2	LANEKPEDVFER	[]	LANEKPEDVFER	LANEKPEDVFER	LNEKPEDVFE	AR	{}	...	{'H': 99, 'C': 63, 'O': 22, 'N': 17}	{'H': 101, 'C': 63, 'O': 22, 'N': 17}	{'H': 101, 'C': 54, 'O': 22, 'N': 17, 'X': 9}	C63H101O22N17	C54H101O22N17X9	1445.715058	0.446843	0.341468	0.794506	0.147405
10	.(Acetyl)SDTPLR(Oxidation)D(Acetyl)EDG(Acetyl)...	3	.(Acetyl)SDTPLR(Oxidation)D(Acetyl)EDG(Acetyl)...	[Acetyl, Oxidation, Acetyl, Acetyl]	SDTPLRDEDGLDFWETLRSLATTNPNPPVEK	SDTPLRDEDGLDFWETLRSLATTNPNPPVEK	SDTPLDEDGLDFWETLSLTTNPNPPVEK	RRA	{'H': 6, 'C': 6, 'O': 4}	...	{'H': 243, 'C': 159, 'O': 58, 'N': 41}	{'H': 246, 'C': 159, 'O': 58, 'N': 41}	{'H': 246, 'C': 144, 'O': 58, 'N': 41, 'X': 15}	C159H246O58N41	C144H246O58N41X15	3654.732565	0.131200	0.252105	0.608763	0.230393

11 rows × 22 columns
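Two of the derived columns can be checked by hand for the first row (YAQEISR, charge 2). The sketch below assumes that the charged composition adds one hydrogen per charge, as the table suggests (H 59 becomes H 61), and that the neutral mass is the monoisotopic mass of the neutral composition. The mass table and function names are hard-coded assumptions for this illustration.

```python
# Monoisotopic masses (in Da) of the lightest isotopes, hard-coded here.
MONOISOTOPIC_MASS = {"H": 1.00782503207, "C": 12.0,
                     "N": 14.0030740048, "O": 15.99491461956}


def neutral_mass(composition):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in composition.items())


def add_protons(composition, charge):
    """Return a copy of the composition with one extra H per charge."""
    charged = dict(composition)
    charged["H"] = charged.get("H", 0) + charge
    return charged


neutral = {"H": 59, "C": 37, "O": 13, "N": 11}  # stfi_composition_peptide_neutral
print(round(neutral_mass(neutral), 6))          # 865.429381, as in stfi_neutral_mass
print(add_protons(neutral, 2))                  # H becomes 61
```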

[18]:

# The most interesting columns are the following.
df_final[["stfi_sequence", "stfi_charge", "stfi_M0_NC", "stfi_M1_NC", "stfi_M0_12C", "stfi_M1_12C"]]

[18]:

	stfi_sequence	stfi_charge	stfi_M0_NC	stfi_M1_NC	stfi_M0_12C	stfi_M1_12C
0	YAQEISR	2	0.620499	0.280949	0.836258	0.127729
1	VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK	3	0.113085	0.236277	0.583716	0.256348
2	QRTTFFVLGINTVNYPDIYEHILER	2	0.171920	0.290033	0.672639	0.212157
3	AELFL(Glutathione)LNR	1	0.470882	0.318073	0.768822	0.140356
4	.(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D...	4	0.054173	0.148735	0.481545	0.264287
5	YKTMNTFDPD(Heme)EKFEWFQVWQAVK	2	0.114128	0.234021	0.698631	0.159873
6	HKSASSPAV(Pro->Val)NADTDIQDSSTPSTSPSGRR	2	0.210376	0.308292	0.591515	0.251993
7	FHNK	1	0.728121	0.223157	0.950424	0.036677
8	.(Glutathione)MDLEIK	3	0.525852	0.274658	0.822740	0.059443
9	LANEKPEDVFER	2	0.446843	0.341468	0.794506	0.147405
10	.(Acetyl)SDTPLR(Oxidation)D(Acetyl)EDG(Acetyl)...	3	0.131200	0.252105	0.608763	0.230393

Concatenation of results with input data

[19]:

input_file_name = "peptides.tsv"
output_file_name = Path(input_file_name).stem + "_stfi.tsv"

column_of_interest = ["stfi_neutral_mass",
                      "stfi_formula", "stfi_formula_X",
                      "stfi_M0_NC", "stfi_M1_NC",
                      "stfi_M0_12C", "stfi_M1_12C"]

# Read original file and append STFI data.
df_old = pd.read_csv(input_file_name, sep="\t")
df_new = pd.concat([df_old, df_final[column_of_interest]], axis=1)
df_new.to_csv(output_file_name, sep="\t", index=False)

[20]:

!head peptides_stfi.tsv

pep_name        pep_sequence    pep_charge      stfi_neutral_mass       stfi_formula    stfi_formula_X  stfi_M0_NC      stfi_M1_NC      stfi_M0_12C     stfi_M1_12C
seq1    YAQEISR 2       865.42938099921 C37H61O13N11    C28H61O13N11X9  0.6204986747402674      0.28094895790268576     0.8362584492452608      0.1277294394585608
seq2    VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK      3       3838.1022643587894      C172H288O49N48P1        C154H288O49N48P1X18     0.11308454311128492     0.23627735941497488     0.5837157078086469      0.256348239423703
seq3    QRTTFFVLGINTVNYPDIYEHILER       2       3037.56615575404        C140H214O40N36  C128H214O40N36X12       0.17192000472677066     0.29003268314604863     0.6726389393255647      0.2121565119028707
seq4    AELFL(Glutathione)LNR   1       1279.6230720783099      C55H90O18N15S1  C46H90O18N15S1X9        0.47088227298965996     0.31807282610880205     0.7688224723128251      0.1403559631032404
seq5    .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)DTCLSIVRPNDSKPLDNR 4       5049.63861600015        C226H365O68N61S1        C205H365O68N61S1X21     0.05417296058666768     0.14873470210020426     0.48154538801515706     0.26428662893114313
seq6    YKTMNTFDPD(Heme)EKFEWFQVWQAVK   2       3552.56164490527        C173H227O42N35S1        C170H227O42N35S1X3      0.11412815567709074     0.23402086836029898     0.6986310451922292      0.15987291091234185
seq7    HKSASSPAV(Pro-&gt;Val)NADTDIQDSSTPSTSPSGRR  2       2957.40748283616        C118H198O49N40  C97H198O49N40X21        0.21037550761092094     0.30829218128938995     0.5915145465128161      0.2519928490706656
seq8    FHNK    1       544.27578091028 C25H37O6N8      C25H37O6N8      0.7281205110566825      0.2231565512772339      0.950423678912205       0.036676880813002036
seq9    .(Glutathione)MDLEIK    3       1052.4518328895601      C42H75O17N10S2  C42H75O17N10S2  0.5258517009900313      0.27465762228958784     0.8227403058336873      0.05944288050042882
