API of seq-to-first-iso

seq-to-first-iso computes the intensities of the first two isotopologues (M0 and M1) of peptide sequences, both with natural carbon and with carbon enriched to 99.99% 12C.

The program can treat selected amino acids as unlabelled, which simulates auxotrophy for those amino acids.

seq-to-first-iso is available as a Python module.

[1]:
from pathlib import Path
from pprint import pprint

from pkg_resources import get_distribution  # Comes with setuptools.
import pandas as pd
from pyteomics import mass

import seq_to_first_iso as stfi
[2]:
try:
    print(f"pyteomics version: {get_distribution('pyteomics').version}")
except Exception:
    print("pyteomics version not found")

print(f"pandas version: {pd.__version__}\n"
      f"seq-to-first-iso version: {stfi.__version__}"
     )
pyteomics version: 4.1.2
pandas version: 0.25.1
seq-to-first-iso version: 0.5.1

Abundances defined in seq-to-first-iso

[3]:
pprint(stfi.NATURAL_ABUNDANCE)
{'C[12]': 0.9893,
 'C[13]': 0.0107,
 'H[1]': 0.999885,
 'H[2]': 0.000115,
 'N[14]': 0.99632,
 'N[15]': 0.00368,
 'O[16]': 0.99757,
 'O[17]': 0.00038,
 'O[18]': 0.00205,
 'S[32]': 0.9493,
 'S[33]': 0.0076,
 'S[34]': 0.0429,
 'X[12]': 0.9893,
 'X[13]': 0.0107}
[4]:
pprint(stfi.C12_ABUNDANCE)
{'C[12]': 0.9999,
 'C[13]': 9.999999999998899e-05,
 'H[1]': 0.999885,
 'H[2]': 0.000115,
 'N[14]': 0.99632,
 'N[15]': 0.00368,
 'O[16]': 0.99757,
 'O[17]': 0.00038,
 'O[18]': 0.00205,
 'S[32]': 0.9493,
 'S[33]': 0.0076,
 'S[34]': 0.0429,
 'X[12]': 0.9893,
 'X[13]': 0.0107}
NATURAL_ABUNDANCE and C12_ABUNDANCE are dictionaries with abundances of common isotopes of organic elements.
C12_ABUNDANCE has a 12C abundance of 99.99 %, hence 13C abundance is 0.01 %.
Element X is a virtual element that replaces the carbon of unlabelled amino acids. It keeps the isotopic abundances of natural carbon in both dictionaries.
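
To make the relation between the two dictionaries concrete, the sketch below derives a 12C-enriched table from a natural one by overriding only the carbon entries. The helper `enrich_c12` and the shortened table are hypothetical and serve only as an illustration; this is not necessarily how seq-to-first-iso builds `C12_ABUNDANCE`.

```python
# Shortened copy of the natural abundances printed above (carbon and X only).
NATURAL = {"C[12]": 0.9893, "C[13]": 0.0107, "X[12]": 0.9893, "X[13]": 0.0107}


def enrich_c12(abundance, c12_fraction=0.9999):
    """Hypothetical helper: return a copy of the table with carbon enriched in 12C."""
    enriched = dict(abundance)  # Copy, so the natural table stays untouched.
    enriched["C[12]"] = c12_fraction
    enriched["C[13]"] = 1 - c12_fraction  # The two carbon isotopes sum to 1.
    return enriched


C12 = enrich_c12(NATURAL)
print(f"13C abundance after enrichment: {C12['C[13]']}")
print(f"X[12] is unchanged: {C12['X[12]']}")
```

Only the `C[...]` keys change, so element X keeps the natural carbon abundances, as in the dictionaries printed above.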

Separate sequences according to unlabelled amino acids

[5]:
help(stfi.separate_labelled)
Help on function separate_labelled in module seq_to_first_iso.seq_to_first_iso:

separate_labelled(sequence, unlabelled_aa)
    Get the sequence of unlabelled amino acids from a sequence.

    Parameters
    ----------
    sequence : str
        String of amino acids.
    unlabelled_aa : container object
        Container (list, string...) of unlabelled amino acids.

    Returns
    -------
    tuple(str, str)
        | The sequences as a tuple of string with:
        |    - the sequence without the unlabelled amino acids
        |    - the unlabelled amino acids in the sequence

[6]:
# Separate sequence "YAQEISRAR" with amino acids A and R unlabelled.
peptide_seq = "YAQEISRAR"
unlabelled_amino_acids = ["A", "R"]

labelled_sequence, unlabelled_sequence = stfi.separate_labelled(peptide_seq, unlabelled_aa=unlabelled_amino_acids)

print(
    f"Original sequence: {peptide_seq}\n"
    f"Unlabelled amino acids: {unlabelled_amino_acids}\n"
    f"Sequence with labelled carbon: {labelled_sequence}\n"
    f"Sequence with unlabelled carbon: {unlabelled_sequence}")
Original sequence: YAQEISRAR
Unlabelled amino acids: ['A', 'R']
Sequence with labelled carbon: YQEIS
Sequence with unlabelled carbon: ARAR
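
The splitting shown above can be sketched in plain Python. The function `split_by_label` is a hypothetical stand-in written for illustration, not the code of `stfi.separate_labelled`; it assumes one character per residue.

```python
def split_by_label(sequence, unlabelled_aa):
    """Hypothetical sketch: split residues into (labelled, unlabelled) strings."""
    unlabelled = set(unlabelled_aa)
    # Residue order is preserved within each of the two output strings.
    labelled_part = "".join(aa for aa in sequence if aa not in unlabelled)
    unlabelled_part = "".join(aa for aa in sequence if aa in unlabelled)
    return labelled_part, unlabelled_part


labelled, unlabelled = split_by_label("YAQEISRAR", ["A", "R"])
print(f"Labelled: {labelled}, unlabelled: {unlabelled}")
```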

Obtain a composition with element X

[7]:
# Get the chemical formula with unlabelled carbon as element X.
labelled_formula = mass.Composition(labelled_sequence)
unlabelled_formula = stfi.convert_atom_C_to_X(mass.Composition(parsed_sequence=unlabelled_sequence))
peptide_formula = unlabelled_formula + labelled_formula
print(f"Composition of labelled amino acids: {labelled_formula}")
print(f"Composition of unlabelled amino acids (X is C): {unlabelled_formula}")
print(f"Composition of {peptide_seq} with {unlabelled_amino_acids} unlabelled:\n{peptide_formula}")
Composition of labelled amino acids: Composition({'H': 42, 'C': 28, 'O': 11, 'N': 6})
Composition of unlabelled amino acids (X is C): Composition({'H': 34, 'O': 4, 'N': 10, 'X': 18})
Composition of YAQEISRAR with ['A', 'R'] unlabelled:
Composition({'H': 76, 'O': 15, 'N': 16, 'X': 18, 'C': 28})
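
The same bookkeeping can be followed with plain dictionaries. This is only a sketch: `carbon_to_x` and `add_compositions` are hypothetical helpers that mimic, under that assumption, what `stfi.convert_atom_C_to_X` and the `+` operator of `mass.Composition` do in the cell above. The two input compositions are copied from the printed output.

```python
from collections import Counter


def carbon_to_x(composition):
    """Hypothetical helper: rename every carbon atom to the virtual element X."""
    converted = dict(composition)
    carbons = converted.pop("C", 0)
    if carbons:
        converted["X"] = converted.get("X", 0) + carbons
    return converted


def add_compositions(first, second):
    """Hypothetical helper: sum two compositions element by element."""
    total = Counter(first)
    total.update(second)
    return dict(total)


labelled = {"H": 42, "C": 28, "O": 11, "N": 6}    # YQEIS, as printed above.
unlabelled = {"H": 34, "C": 18, "O": 4, "N": 10}  # ARAR before the C -> X swap.
peptide = add_compositions(carbon_to_x(unlabelled), labelled)
print(peptide)
```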

Compute isotopologue intensity

[8]:
help(stfi.compute_M0_nl)
print("-" * 79)
help(stfi.compute_M1_nl)
Help on function compute_M0_nl in module seq_to_first_iso.seq_to_first_iso:

compute_M0_nl(formula, abundance)
    Compute intensity of the first isotopologue M0.

    Handle element X with specific abundance.

    Parameters
    ----------
    formula : pyteomics.mass.Composition
        Chemical formula, as a dict of the number of atoms for each element:
        {element_name: number_of_atoms, ...}.
    abundance : dict
        Dictionary of abundances of isotopes:
        {"element_name[isotope_number]": relative abundance, ..}.

    Returns
    -------
    float
        Value of M0.

    Notes
    -----
    X represents C with default isotopic abundance.

-------------------------------------------------------------------------------
Help on function compute_M1_nl in module seq_to_first_iso.seq_to_first_iso:

compute_M1_nl(formula, abundance)
    Compute intensity of the second isotopologue M1.

    Handle element X with specific abundance.

    Parameters
    ----------
    formula : pyteomics.mass.Composition
        Chemical formula, as a dict of the number of atoms for each element:
        {element_name: number_of_atoms, ...}.
    abundance : dict
        Dictionary of abundances of isotopes:
        {"element_name[isotope_number]": relative abundance, ..}.

    Returns
    -------
    float
        Value of M1.

    Notes
    -----
    X represents C with default isotopic abundance.

[9]:
# Compute M0 with natural carbon, then with 99.99% 12C carbon.
first_isotopologue = stfi.compute_M0_nl(peptide_formula, stfi.NATURAL_ABUNDANCE)
print(f"M0 in normal (98.93% 12C) condition: {first_isotopologue}")

first_isotopologue = stfi.compute_M0_nl(peptide_formula, stfi.C12_ABUNDANCE)
print(f"M0 in    12C (99.99% 12C) condition: {first_isotopologue}")
M0 in normal (98.93% 12C) condition: 0.5493191520383802
M0 in    12C (99.99% 12C) condition: 0.7403283857401063
[10]:
# Compute M1 with natural carbon, then with 99.99% 12C carbon.
second_isotopologue = stfi.compute_M1_nl(peptide_formula, stfi.NATURAL_ABUNDANCE)
print(f"M1 in normal (98.93% 12C) condition: {second_isotopologue}")

second_isotopologue = stfi.compute_M1_nl(peptide_formula, stfi.C12_ABUNDANCE)
print(f"M1 in    12C (99.99% 12C) condition: {second_isotopologue}")
M1 in normal (98.93% 12C) condition: 0.313702912736476
M1 in    12C (99.99% 12C) condition: 0.200655465179031
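
For readers who want to see where such numbers can come from, here is a minimal sketch based on the usual isotope-pattern formulas: M0 is the product over elements of (abundance of the lightest isotope) raised to the number of atoms, and M1 is M0 multiplied by the sum over elements of n × (abundance of the +1 isotope) / (abundance of the lightest isotope). These formulas are an assumption of this example and are not taken from the package source. `sketch_m0`, `sketch_m1` and `LIGHTEST` are hypothetical names, and only C, H, N, O, S and X are handled.

```python
import math

NATURAL_ABUNDANCE = {
    "C[12]": 0.9893, "C[13]": 0.0107,
    "H[1]": 0.999885, "H[2]": 0.000115,
    "N[14]": 0.99632, "N[15]": 0.00368,
    "O[16]": 0.99757, "O[17]": 0.00038, "O[18]": 0.00205,
    "S[32]": 0.9493, "S[33]": 0.0076, "S[34]": 0.0429,
    "X[12]": 0.9893, "X[13]": 0.0107,
}
# Same table with carbon enriched to 99.99% 12C; X stays natural.
C12_ABUNDANCE = {**NATURAL_ABUNDANCE, "C[12]": 0.9999, "C[13]": 1 - 0.9999}

# Mass number of the lightest isotope of each element handled by this sketch.
LIGHTEST = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32, "X": 12}


def sketch_m0(formula, abundance):
    """Probability that every atom is its lightest isotope."""
    return math.prod(
        abundance[f"{element}[{LIGHTEST[element]}]"] ** count
        for element, count in formula.items()
    )


def sketch_m1(formula, abundance):
    """Probability that exactly one atom carries one extra neutron."""
    ratio_sum = sum(
        count
        * abundance[f"{element}[{LIGHTEST[element] + 1}]"]
        / abundance[f"{element}[{LIGHTEST[element]}]"]
        for element, count in formula.items()
    )
    return sketch_m0(formula, abundance) * ratio_sum


formula = {"H": 76, "O": 15, "N": 16, "X": 18, "C": 28}  # YAQEISRAR, A and R unlabelled.
for label, table in (("natural", NATURAL_ABUNDANCE), ("12C", C12_ABUNDANCE)):
    print(f"{label}: M0={sketch_m0(formula, table):.4f}  M1={sketch_m1(formula, table):.4f}")
```

With the composition of YAQEISRAR used above, the sketch should land close to the printed values (about 0.549 and 0.314 with natural carbon). An element missing from `LIGHTEST`, such as P or Fe, raises a `KeyError` here; how the package itself treats such elements is not covered by this example.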

Get the composition of a list of post-translational modifications (PTMs)

[11]:
help(stfi.get_mods_composition)
Help on function get_mods_composition in module seq_to_first_iso.seq_to_first_iso:

get_mods_composition(modifications)
    Return the composition of a list of modifications.

    Parameters
    ----------
    modifications : list of str
        List of modifications string (corresponding to Unimod titles).

    Returns
    -------
    pyteomics.mass.Composition
        The total composition change.

[12]:
# Modification names must match Unimod entry titles exactly (the lookup is case-sensitive).
modification_list = ["Acetyl", "Phospho", "phospho"]  # "phospho" is not a Unimod title, so it is skipped with a warning.
total_composition = stfi.get_mods_composition(modification_list)
print(f"Total composition for {modification_list} is {total_composition}")
[2019-12-05, 13:55:32] WARNING : Unimod entry not found for : phospho
Total composition for ['Acetyl', 'Phospho', 'phospho'] is Composition({'H': 3, 'C': 2, 'O': 4, 'P': 1})
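
The behaviour above (exact title match, unknown titles skipped with a warning) can be sketched with a small lookup table. `MOD_COMPOSITIONS` is a hypothetical two-entry table whose values are consistent with the compositions printed in this notebook, and `sum_mod_compositions` is a hypothetical helper; the real function queries Unimod.

```python
from collections import Counter

# Hypothetical two-entry table; the package looks titles up in Unimod instead.
MOD_COMPOSITIONS = {
    "Acetyl": {"H": 2, "C": 2, "O": 1},
    "Phospho": {"H": 1, "O": 3, "P": 1},
}


def sum_mod_compositions(modifications, table=MOD_COMPOSITIONS):
    """Hypothetical sketch: add up the compositions of the known modifications."""
    total = Counter()
    for title in modifications:
        if title not in table:  # Exact, case-sensitive match on the title.
            print(f"WARNING : entry not found for : {title}")
            continue
        total.update(table[title])
    return dict(total)


print(sum_mod_compositions(["Acetyl", "Phospho", "phospho"]))
```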

Get human-readable chemical formula

[13]:
help(stfi.formula_to_str)
Help on function formula_to_str in module seq_to_first_iso.seq_to_first_iso:

formula_to_str(composition)
    Return formula from Composition as a string.

    Parameters
    ----------
    composition : pyteomics.mass.Composition
        Chemical formula.

    Returns
    -------
    str
        Human-readable string of the formula.

    Warnings
    --------
    If the composition has elements not in USED_ELEMS, they will not
    be added to the output.

[14]:
# This is the function used to get the formulas in the output.
formula_str = stfi.formula_to_str(total_composition)
print(f"{total_composition} becomes {formula_str}")
Composition({'H': 3, 'C': 2, 'O': 4, 'P': 1}) becomes C2H3O4P1
[15]:
# Warning: elements of the Composition that are not in "CHONPSX" are left out of the final string.
bad_composition = mass.Composition("U")  # "U" is selenocysteine, which contains selenium (Se).
formula_str = stfi.formula_to_str(bad_composition)
print(f"Composition with unsupported element {bad_composition} becomes {formula_str}")
Composition with unsupported element Composition({'H': 7, 'C': 3, 'O': 2, 'N': 1, 'Se': 1}) becomes C3H7O2N1

Here selenium (Se), which is not one of the CHONPSX elements, is dropped from the formula string.
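
A minimal sketch of this formatting rule follows. It assumes that the output order is C, H, O, N, P, S, X, as the formulas printed in this notebook suggest; `ELEMENT_ORDER` and `composition_to_str` are hypothetical names, not the package's own.

```python
ELEMENT_ORDER = "CHONPSX"  # Assumed output order, inferred from the printed formulas.


def composition_to_str(composition):
    """Hypothetical sketch: build a formula string, skipping unsupported elements."""
    return "".join(
        f"{element}{composition[element]}"
        for element in ELEMENT_ORDER
        if composition.get(element)  # Skip elements that are absent or zero.
    )


print(composition_to_str({"H": 3, "C": 2, "O": 4, "P": 1}))
print(composition_to_str({"H": 7, "C": 3, "O": 2, "N": 1, "Se": 1}))
```

Because the loop iterates over the supported elements rather than over the composition, anything else (Se in the second call) never reaches the output.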

Parse a file with peptide sequences and charges

seq-to-first-iso reads TSV files that contain at least a sequence column and a charge column.

The parser ignores lines whose sequence contains characters outside ACDEFGHIKLMNPQRSTVWY, unless those characters belong to a PTM written in X!Tandem notation.
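
To illustrate the rule, the sketch below strips PTM annotations and then checks the remaining residues. It assumes the notation seen in the example file of this notebook: a modification name in parentheses after the modified residue, with a leading `.` for an N-terminal modification. `strip_modifications` and `is_valid` are hypothetical helpers, not the package's parser.

```python
import re

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MOD_PATTERN = re.compile(r"\(([^()]+)\)")  # Captures the text inside one pair of parentheses.


def strip_modifications(sequence):
    """Hypothetical sketch: return (modification names, bare sequence)."""
    modifications = MOD_PATTERN.findall(sequence)
    # Remove the annotations, then the "." that marks an N-terminal modification.
    bare = MOD_PATTERN.sub("", sequence).lstrip(".")
    return modifications, bare


def is_valid(sequence):
    """True if the bare sequence is non-empty and made of the 20 standard residues."""
    _, bare = strip_modifications(sequence)
    return bool(bare) and set(bare) <= AMINO_ACIDS


for peptide in ("AELFL(Glutathione)LNR", ".(Glutathione)MDLEIK", "YAQE1SR"):
    print(peptide, strip_modifications(peptide), is_valid(peptide))
```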

[16]:
df_raw = stfi.parse_input_file("peptides.tsv")
df_filtered = stfi.filter_input_dataframe(df_raw, "pep_sequence", "pep_charge")
print(df_filtered)
[2019-12-05, 13:55:32] INFO    : Read peptides.tsv
[2019-12-05, 13:55:32] INFO    : Found 11 lines and 3 columns
                                             sequence  charge
0                                             YAQEISR       2
1          VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK       3
2                           QRTTFFVLGINTVNYPDIYEHILER       2
3                               AELFL(Glutathione)LNR       1
4   .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D...       4
5                       YKTMNTFDPD(Heme)EKFEWFQVWQAVK       2
6          HKSASSPAV(Pro->Val)NADTDIQDSSTPSTSPSGRR       2
7                                                FHNK       1
8                                .(Glutathione)MDLEIK       3
9                                        LANEKPEDVFER       2
10  .(Acetyl)SDTPLR(Oxidation)D(Acetyl)EDG(Acetyl)...       3
[17]:
df_final = stfi.compute_intensities(df_filtered, unlabelled_aa=["A", "R"])
df_final
[2019-12-05, 13:55:33] INFO    : Reading sequences.
[2019-12-05, 13:55:33] INFO    : Computing composition and formula.
[2019-12-05, 13:55:33] WARNING : Fe in (Heme) is not supported in the computation of M0 and M1
[2019-12-05, 13:55:33] INFO    : Computing neutral mass
[2019-12-05, 13:55:33] INFO    : Computing M0 and M1
[17]:
stfi_sequence stfi_charge stfi_sequence_clean stfi_modification stfi_sequence_without_mod stfi_sequence_to_process stfi_log stfi_sequence_labelled stfi_sequence_unlabelled stfi_composition_mod ... stfi_composition_peptide_neutral stfi_composition_peptide_with_charge stfi_composition_peptide_with_charge_X stfi_formula stfi_formula_X stfi_neutral_mass stfi_M0_NC stfi_M1_NC stfi_M0_12C stfi_M1_12C
0 YAQEISR 2 YAQEISR [] YAQEISR YAQEISR YQEIS AR {} ... {'H': 59, 'C': 37, 'O': 13, 'N': 11} {'H': 61, 'C': 37, 'O': 13, 'N': 11} {'H': 61, 'C': 28, 'O': 13, 'N': 11, 'X': 9} C37H61O13N11 C28H61O13N11X9 865.429381 0.620499 0.280949 0.836258 0.127729
1 VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK 3 VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK [Phospho] VLLIDLRIPQRSAINHIVAPNLVNVDPNLLWDK VLLIDLRIPQRSAINHIVAPNLVNVDPNLLWDK VLLIDLIPQSINHIVPNLVNVDPNLLWDK RRAA {'H': 1, 'O': 3, 'P': 1} ... {'H': 285, 'C': 172, 'O': 49, 'N': 48, 'P': 1} {'H': 288, 'C': 172, 'O': 49, 'N': 48, 'P': 1} {'H': 288, 'C': 154, 'O': 49, 'N': 48, 'X': 18... C172H288O49N48P1 C154H288O49N48P1X18 3838.102264 0.113085 0.236277 0.583716 0.256348
2 QRTTFFVLGINTVNYPDIYEHILER 2 QRTTFFVLGINTVNYPDIYEHILER [] QRTTFFVLGINTVNYPDIYEHILER QRTTFFVLGINTVNYPDIYEHILER QTTFFVLGINTVNYPDIYEHILE RR {} ... {'H': 212, 'C': 140, 'O': 40, 'N': 36} {'H': 214, 'C': 140, 'O': 40, 'N': 36} {'H': 214, 'C': 128, 'O': 40, 'N': 36, 'X': 12} C140H214O40N36 C128H214O40N36X12 3037.566156 0.171920 0.290033 0.672639 0.212157
3 AELFL(Glutathione)LNR 1 AELFL(Glutathione)LNR [Glutathione] AELFLLNR AELFLLNR ELFLLN AR {'H': 15, 'C': 10, 'N': 3, 'O': 6, 'S': 1} ... {'H': 89, 'C': 55, 'O': 18, 'N': 15, 'S': 1} {'H': 90, 'C': 55, 'O': 18, 'N': 15, 'S': 1} {'H': 90, 'C': 46, 'O': 18, 'N': 15, 'X': 9, '... C55H90O18N15S1 C46H90O18N15S1X9 1279.623072 0.470882 0.318073 0.768822 0.140356
4 .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D... 4 .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D... [Acetyl, Oxidation] VGEVFINYIQRQNELFQGKLAYLIIDTCLSIVRPNDSKPLDNR VGEVFINYIQRQNELFQGKLAYLIIDTCLSIVRPNDSKPLDNR VGEVFINYIQQNELFQGKLYLIIDTCLSIVPNDSKPLDN RARR {'H': 2, 'C': 2, 'O': 2} ... {'H': 361, 'C': 226, 'O': 68, 'N': 61, 'S': 1} {'H': 365, 'C': 226, 'O': 68, 'N': 61, 'S': 1} {'H': 365, 'C': 205, 'O': 68, 'N': 61, 'S': 1,... C226H365O68N61S1 C205H365O68N61S1X21 5049.638616 0.054173 0.148735 0.481545 0.264287
5 YKTMNTFDPD(Heme)EKFEWFQVWQAVK 2 YKTMNTFDPD(Heme)EKFEWFQVWQAVK [Heme] YKTMNTFDPDEKFEWFQVWQAVK YKTMNTFDPDEKFEWFQVWQAVK YKTMNTFDPDEKFEWFQVWQVK A {'H': 32, 'C': 34, 'N': 4, 'O': 4, 'Fe': 1} ... {'H': 225, 'C': 173, 'O': 42, 'N': 35, 'S': 1,... {'H': 227, 'C': 173, 'O': 42, 'N': 35, 'S': 1,... {'H': 227, 'C': 170, 'O': 42, 'N': 35, 'S': 1,... C173H227O42N35S1 C170H227O42N35S1X3 3552.561645 0.114128 0.234021 0.698631 0.159873
6 HKSASSPAV(Pro->Val)NADTDIQDSSTPSTSPSGRR 2 HKSASSPAV(Pro->Val)NADTDIQDSSTPSTSPSGRR [Pro->Val] HKSASSPAVNADTDIQDSSTPSTSPSGRR HKSASSPAVNADTDIQDSSTPSTSPSGRR HKSSSPVNDTDIQDSSTPSTSPSG AAARR {'H': 2} ... {'H': 196, 'C': 118, 'N': 40, 'O': 49} {'H': 198, 'C': 118, 'N': 40, 'O': 49} {'H': 198, 'C': 97, 'N': 40, 'O': 49, 'X': 21} C118H198O49N40 C97H198O49N40X21 2957.407483 0.210376 0.308292 0.591515 0.251993
7 FHNK 1 FHNK [] FHNK FHNK FHNK {} ... {'H': 36, 'C': 25, 'O': 6, 'N': 8} {'H': 37, 'C': 25, 'O': 6, 'N': 8} {'H': 37, 'C': 25, 'O': 6, 'N': 8} C25H37O6N8 C25H37O6N8 544.275781 0.728121 0.223157 0.950424 0.036677
8 .(Glutathione)MDLEIK 3 .(Glutathione)MDLEIK [Glutathione] MDLEIK MDLEIK MDLEIK {'H': 15, 'C': 10, 'N': 3, 'O': 6, 'S': 1} ... {'H': 72, 'C': 42, 'S': 2, 'O': 17, 'N': 10} {'H': 75, 'C': 42, 'S': 2, 'O': 17, 'N': 10} {'H': 75, 'C': 42, 'S': 2, 'O': 17, 'N': 10} C42H75O17N10S2 C42H75O17N10S2 1052.451833 0.525852 0.274658 0.822740 0.059443
9 LANEKPEDVFER 2 LANEKPEDVFER [] LANEKPEDVFER LANEKPEDVFER LNEKPEDVFE AR {} ... {'H': 99, 'C': 63, 'O': 22, 'N': 17} {'H': 101, 'C': 63, 'O': 22, 'N': 17} {'H': 101, 'C': 54, 'O': 22, 'N': 17, 'X': 9} C63H101O22N17 C54H101O22N17X9 1445.715058 0.446843 0.341468 0.794506 0.147405
10 .(Acetyl)SDTPLR(Oxidation)D(Acetyl)EDG(Acetyl)... 3 .(Acetyl)SDTPLR(Oxidation)D(Acetyl)EDG(Acetyl)... [Acetyl, Oxidation, Acetyl, Acetyl] SDTPLRDEDGLDFWETLRSLATTNPNPPVEK SDTPLRDEDGLDFWETLRSLATTNPNPPVEK SDTPLDEDGLDFWETLSLTTNPNPPVEK RRA {'H': 6, 'C': 6, 'O': 4} ... {'H': 243, 'C': 159, 'O': 58, 'N': 41} {'H': 246, 'C': 159, 'O': 58, 'N': 41} {'H': 246, 'C': 144, 'O': 58, 'N': 41, 'X': 15} C159H246O58N41 C144H246O58N41X15 3654.732565 0.131200 0.252105 0.608763 0.230393

11 rows × 22 columns

[18]:
# The most informative columns are the following.
df_final[["stfi_sequence", "stfi_charge", "stfi_M0_NC", "stfi_M1_NC", "stfi_M0_12C", "stfi_M1_12C"]]
[18]:
    stfi_sequence                                      stfi_charge  stfi_M0_NC  stfi_M1_NC  stfi_M0_12C  stfi_M1_12C
0   YAQEISR                                                      2    0.620499    0.280949     0.836258     0.127729
1   VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK                   3    0.113085    0.236277     0.583716     0.256348
2   QRTTFFVLGINTVNYPDIYEHILER                                    2    0.171920    0.290033     0.672639     0.212157
3   AELFL(Glutathione)LNR                                        1    0.470882    0.318073     0.768822     0.140356
4   .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D...            4    0.054173    0.148735     0.481545     0.264287
5   YKTMNTFDPD(Heme)EKFEWFQVWQAVK                                2    0.114128    0.234021     0.698631     0.159873
6   HKSASSPAV(Pro->Val)NADTDIQDSSTPSTSPSGRR                      2    0.210376    0.308292     0.591515     0.251993
7   FHNK                                                         1    0.728121    0.223157     0.950424     0.036677
8   .(Glutathione)MDLEIK                                         3    0.525852    0.274658     0.822740     0.059443
9   LANEKPEDVFER                                                 2    0.446843    0.341468     0.794506     0.147405
10  .(Acetyl)SDTPLR(Oxidation)D(Acetyl)EDG(Acetyl)...            3    0.131200    0.252105     0.608763     0.230393

Concatenation of results with input data

[19]:
input_file_name = "peptides.tsv"
output_file_name = Path(input_file_name).stem + "_stfi.tsv"

column_of_interest = ["stfi_neutral_mass",
                      "stfi_formula", "stfi_formula_X",
                      "stfi_M0_NC", "stfi_M1_NC",
                      "stfi_M0_12C", "stfi_M1_12C"]

# Read original file and append STFI data.
df_old = pd.read_csv(input_file_name, sep="\t")
df_new = pd.concat([df_old, df_final[column_of_interest]], axis=1)
df_new.to_csv(output_file_name, sep="\t", index=False)
[20]:
!head peptides_stfi.tsv
pep_name        pep_sequence    pep_charge      stfi_neutral_mass       stfi_formula    stfi_formula_X  stfi_M0_NC      stfi_M1_NC      stfi_M0_12C     stfi_M1_12C
seq1    YAQEISR 2       865.42938099921 C37H61O13N11    C28H61O13N11X9  0.6204986747402674      0.28094895790268576     0.8362584492452608      0.1277294394585608
seq2    VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK      3       3838.1022643587894      C172H288O49N48P1        C154H288O49N48P1X18     0.11308454311128492     0.23627735941497488     0.5837157078086469      0.256348239423703
seq3    QRTTFFVLGINTVNYPDIYEHILER       2       3037.56615575404        C140H214O40N36  C128H214O40N36X12       0.17192000472677066     0.29003268314604863     0.6726389393255647      0.2121565119028707
seq4    AELFL(Glutathione)LNR   1       1279.6230720783099      C55H90O18N15S1  C46H90O18N15S1X9        0.47088227298965996     0.31807282610880205     0.7688224723128251      0.1403559631032404
seq5    .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)DTCLSIVRPNDSKPLDNR 4       5049.63861600015        C226H365O68N61S1        C205H365O68N61S1X21     0.05417296058666768     0.14873470210020426     0.48154538801515706     0.26428662893114313
seq6    YKTMNTFDPD(Heme)EKFEWFQVWQAVK   2       3552.56164490527        C173H227O42N35S1        C170H227O42N35S1X3      0.11412815567709074     0.23402086836029898     0.6986310451922292      0.15987291091234185
seq7    HKSASSPAV(Pro->Val)NADTDIQDSSTPSTSPSGRR  2       2957.40748283616        C118H198O49N40  C97H198O49N40X21        0.21037550761092094     0.30829218128938995     0.5915145465128161      0.2519928490706656
seq8    FHNK    1       544.27578091028 C25H37O6N8      C25H37O6N8      0.7281205110566825      0.2231565512772339      0.950423678912205       0.036676880813002036
seq9    .(Glutathione)MDLEIK    3       1052.4518328895601      C42H75O17N10S2  C42H75O17N10S2  0.5258517009900313      0.27465762228958784     0.8227403058336873      0.05944288050042882