API of seq-to-first-iso
seq-to-first-iso computes the intensities of the first two isotopologues (M0 and M1) of peptide sequences, both with natural carbon and with carbon enriched to 99.99% 12C.
The program can treat selected amino acids as unlabelled, which simulates auxotrophies for those amino acids.
seq-to-first-iso is available as a Python module.
[1]:
from pathlib import Path
from pprint import pprint
from pkg_resources import get_distribution # Comes with setuptools.
import pandas as pd
from pyteomics import mass
import seq_to_first_iso as stfi
[2]:
try:
    print(f"pyteomics version: {get_distribution('pyteomics').version}")
except Exception:
    # pyteomics does not always expose its version this way.
    print("pyteomics version not found")
print(f"pandas version: {pd.__version__}\n"
      f"seq-to-first-iso version: {stfi.__version__}"
      )
pyteomics version: 4.1.2
pandas version: 0.25.1
seq-to-first-iso version: 0.5.1
Abundances defined in seq-to-first-iso
[3]:
pprint(stfi.NATURAL_ABUNDANCE)
{'C[12]': 0.9893,
'C[13]': 0.0107,
'H[1]': 0.999885,
'H[2]': 0.000115,
'N[14]': 0.99632,
'N[15]': 0.00368,
'O[16]': 0.99757,
'O[17]': 0.00038,
'O[18]': 0.00205,
'S[32]': 0.9493,
'S[33]': 0.0076,
'S[34]': 0.0429,
'X[12]': 0.9893,
'X[13]': 0.0107}
[4]:
pprint(stfi.C12_ABUNDANCE)
{'C[12]': 0.9999,
'C[13]': 9.999999999998899e-05,
'H[1]': 0.999885,
'H[2]': 0.000115,
'N[14]': 0.99632,
'N[15]': 0.00368,
'O[16]': 0.99757,
'O[17]': 0.00038,
'O[18]': 0.00205,
'S[32]': 0.9493,
'S[33]': 0.0076,
'S[34]': 0.0429,
'X[12]': 0.9893,
'X[13]': 0.0107}
NATURAL_ABUNDANCE and C12_ABUNDANCE are dictionaries holding the abundances of the common isotopes of the elements found in organic molecules.
C12_ABUNDANCE has a 12C abundance of 99.99%, hence a 13C abundance of 0.01%.
Element X is a virtual element that replaces the carbon of unlabelled amino acids. It keeps the isotopic abundances of natural carbon in both dictionaries.
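To make the relation between the two tables concrete, the sketch below rebuilds a 12C-enriched table from the natural one with plain dictionaries. The values are copied from the output above; `enrich_carbon` is a hypothetical helper written for this illustration and is not part of seq-to-first-iso.

```python
import math

# Values copied from the NATURAL_ABUNDANCE output above.
NATURAL = {
    "C[12]": 0.9893, "C[13]": 0.0107,
    "H[1]": 0.999885, "H[2]": 0.000115,
    "N[14]": 0.99632, "N[15]": 0.00368,
    "O[16]": 0.99757, "O[17]": 0.00038, "O[18]": 0.00205,
    "S[32]": 0.9493, "S[33]": 0.0076, "S[34]": 0.0429,
    "X[12]": 0.9893, "X[13]": 0.0107,
}


def enrich_carbon(abundance, c12_fraction):
    """Return a copy where only real carbon (C) is enriched.

    Hypothetical helper: the virtual element X is left untouched,
    so unlabelled amino acids keep natural carbon abundances.
    """
    enriched = dict(abundance)
    enriched["C[12]"] = c12_fraction
    enriched["C[13]"] = 1 - c12_fraction
    return enriched


c12 = enrich_carbon(NATURAL, 0.9999)
print(c12["C[13]"])  # close to 1e-4, up to floating-point rounding
print(c12["X[12]"], c12["X[13]"])  # same as natural carbon
```

Computing the 13C abundance as `1 - 0.9999` in floating point is a plausible reason why the table above shows a value very slightly below 0.0001.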
Separate sequences according to unlabelled amino acids
[5]:
help(stfi.separate_labelled)
Help on function separate_labelled in module seq_to_first_iso.seq_to_first_iso:
separate_labelled(sequence, unlabelled_aa)
Get the sequence of unlabelled amino acids from a sequence.
Parameters
----------
sequence : str
String of amino acids.
unlabelled_aa : container object
Container (list, string...) of unlabelled amino acids.
Returns
-------
tuple(str, str)
| The sequences as a tuple of string with:
| - the sequence without the unlabelled amino acids
| - the unlabelled amino acids in the sequence
[6]:
# Separate sequence "YAQEISRAR" with amino acids A and R unlabelled.
peptide_seq = "YAQEISRAR"
unlabelled_amino_acids = ["A", "R"]
labelled_sequence, unlabelled_sequence = stfi.separate_labelled(peptide_seq, unlabelled_aa=unlabelled_amino_acids)
print(
f"Original sequence: {peptide_seq}\n"
f"Unlabelled amino acids: {unlabelled_amino_acids}\n"
f"Sequence with labelled carbon: {labelled_sequence}\n"
f"Sequence with unlabelled carbon: {unlabelled_sequence}")
Original sequence: YAQEISRAR
Unlabelled amino acids: ['A', 'R']
Sequence with labelled carbon: YQEIS
Sequence with unlabelled carbon: ARAR
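The split shown above can be reproduced in a few lines of plain Python. This is only a minimal sketch of the behaviour visible in the output, with a hypothetical function name; it is not the library's implementation.

```python
def split_by_label(sequence, unlabelled_aa):
    """Split a sequence into (labelled part, unlabelled part).

    Hypothetical stand-in for stfi.separate_labelled: residues keep
    their original order inside each part.
    """
    unlabelled = set(unlabelled_aa)
    labelled_part = "".join(aa for aa in sequence if aa not in unlabelled)
    unlabelled_part = "".join(aa for aa in sequence if aa in unlabelled)
    return labelled_part, unlabelled_part


labelled_part, unlabelled_part = split_by_label("YAQEISRAR", ["A", "R"])
print(labelled_part)    # YQEIS
print(unlabelled_part)  # ARAR
```

Because `unlabelled_aa` is only used for membership tests, a list, a set or a string such as `"AR"` all work in this sketch.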
Obtain a composition with element X
[7]:
# Get the chemical formula with unlabelled carbon as element X.
labelled_formula = mass.Composition(labelled_sequence)
unlabelled_formula = stfi.convert_atom_C_to_X(mass.Composition(parsed_sequence=unlabelled_sequence))
peptide_formula = unlabelled_formula + labelled_formula
print(f"Composition of labelled amino acids: {labelled_formula}")
print(f"Composition of unlabelled amino acids (X is C): {unlabelled_formula}")
print(f"Composition of {peptide_seq} with {unlabelled_amino_acids} unlabelled:\n{peptide_formula}")
Composition of labelled amino acids: Composition({'H': 42, 'C': 28, 'O': 11, 'N': 6})
Composition of unlabelled amino acids (X is C): Composition({'H': 34, 'O': 4, 'N': 10, 'X': 18})
Composition of YAQEISRAR with ['A', 'R'] unlabelled:
Composition({'H': 76, 'O': 15, 'N': 16, 'X': 18, 'C': 28})
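The carbon-to-X substitution can be illustrated without pyteomics by using `collections.Counter` as a stand-in for `mass.Composition`. The function name below is hypothetical, and the carbon count of 18 for the unlabelled residues is inferred from the `'X': 18` entry in the output above.

```python
from collections import Counter


def carbon_to_x(composition):
    """Return a copy of the composition with every C counted as X.

    Hypothetical sketch of what stfi.convert_atom_C_to_X appears to do.
    """
    converted = Counter(composition)
    carbon_count = converted.pop("C", 0)
    if carbon_count:
        converted["X"] += carbon_count
    return converted


# Labelled part (YQEIS, with terminal H2O), as printed above.
labelled = Counter({"H": 42, "C": 28, "O": 11, "N": 6})
# Unlabelled residues (ARAR) before the substitution.
unlabelled = carbon_to_x({"H": 34, "C": 18, "O": 4, "N": 10})

peptide = labelled + unlabelled
print(dict(unlabelled))
print(dict(peptide))
```

Keeping labelled carbon as C and unlabelled carbon as X is what lets a single formula carry two different carbon abundances into the M0 and M1 computations.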
Compute isotopologue intensity
[8]:
help(stfi.compute_M0_nl)
print("-" * 79)
help(stfi.compute_M1_nl)
Help on function compute_M0_nl in module seq_to_first_iso.seq_to_first_iso:
compute_M0_nl(formula, abundance)
Compute intensity of the first isotopologue M0.
Handle element X with specific abundance.
Parameters
----------
formula : pyteomics.mass.Composition
Chemical formula, as a dict of the number of atoms for each element:
{element_name: number_of_atoms, ...}.
abundance : dict
Dictionary of abundances of isotopes:
{"element_name[isotope_number]": relative abundance, ..}.
Returns
-------
float
Value of M0.
Notes
-----
X represents C with default isotopic abundance.
-------------------------------------------------------------------------------
Help on function compute_M1_nl in module seq_to_first_iso.seq_to_first_iso:
compute_M1_nl(formula, abundance)
Compute intensity of the second isotopologue M1.
Handle element X with specific abundance.
Parameters
----------
formula : pyteomics.mass.Composition
Chemical formula, as a dict of the number of atoms for each element:
{element_name: number_of_atoms, ...}.
abundance : dict
Dictionary of abundances of isotopes:
{"element_name[isotope_number]": relative abundance, ..}.
Returns
-------
float
Value of M1.
Notes
-----
X represents C with default isotopic abundance.
[9]:
# Compute M0 with natural carbon, then with 12C-enriched carbon.
first_isotopologue = stfi.compute_M0_nl(peptide_formula, stfi.NATURAL_ABUNDANCE)
print(f"M0 in normal (98.93% 12C) condition: {first_isotopologue}")
first_isotopologue = stfi.compute_M0_nl(peptide_formula, stfi.C12_ABUNDANCE)
print(f"M0 in 12C (99.99% 12C) condition: {first_isotopologue}")
M0 in normal (98.93% 12C) condition: 0.5493191520383802
M0 in 12C (99.99% 12C) condition: 0.7403283857401063
[10]:
# Compute M1 with natural carbon, then with 12C-enriched carbon.
second_isotopologue = stfi.compute_M1_nl(peptide_formula, stfi.NATURAL_ABUNDANCE)
print(f"M1 in normal (98.93% 12C) condition: {second_isotopologue}")
second_isotopologue = stfi.compute_M1_nl(peptide_formula, stfi.C12_ABUNDANCE)
print(f"M1 in 12C (99.99% 12C) condition: {second_isotopologue}")
M1 in normal (98.93% 12C) condition: 0.313702912736476
M1 in 12C (99.99% 12C) condition: 0.200655465179031
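The values above are consistent with a simple probabilistic reading: M0 is the probability that every atom is its lightest isotope, and M1 is the probability that exactly one atom carries the isotope one mass unit heavier. The sketch below implements that reading with plain dictionaries. It is an illustration under this assumption, not the library's code; the function names and the `LIGHT` / `PLUS_ONE` lookup tables are hypothetical, and elements absent from the abundance table (such as P) are not handled.

```python
import math

# Values copied from the NATURAL_ABUNDANCE output earlier on this page.
NATURAL = {
    "C[12]": 0.9893, "C[13]": 0.0107,
    "H[1]": 0.999885, "H[2]": 0.000115,
    "N[14]": 0.99632, "N[15]": 0.00368,
    "O[16]": 0.99757, "O[17]": 0.00038, "O[18]": 0.00205,
    "S[32]": 0.9493, "S[33]": 0.0076, "S[34]": 0.0429,
    "X[12]": 0.9893, "X[13]": 0.0107,
}
# Same table with real carbon enriched to 99.99% 12C; X stays natural.
ENRICHED = {**NATURAL, "C[12]": 0.9999, "C[13]": 1 - 0.9999}

# Lightest isotope, and the isotope one mass unit heavier, per element.
LIGHT = {"C": "C[12]", "H": "H[1]", "N": "N[14]",
         "O": "O[16]", "S": "S[32]", "X": "X[12]"}
PLUS_ONE = {"C": "C[13]", "H": "H[2]", "N": "N[15]",
            "O": "O[17]", "S": "S[33]", "X": "X[13]"}


def sketch_m0(formula, abundance):
    """Probability that all atoms are the lightest isotope."""
    return math.prod(abundance[LIGHT[el]] ** n for el, n in formula.items())


def sketch_m1(formula, abundance):
    """Probability that exactly one atom is one mass unit heavier."""
    ratio_sum = sum(
        n * abundance[PLUS_ONE[el]] / abundance[LIGHT[el]]
        for el, n in formula.items()
    )
    return sketch_m0(formula, abundance) * ratio_sum


# Composition of YAQEISRAR with A and R unlabelled, as printed above.
peptide = {"H": 76, "O": 15, "N": 16, "X": 18, "C": 28}
for label, table in (("natural", NATURAL), ("12C", ENRICHED)):
    print(label, sketch_m0(peptide, table), sketch_m1(peptide, table))
```

The printed numbers can be compared with the M0 and M1 outputs above. Enrichment raises M0 and lowers M1 because fewer carbon atoms are likely to be 13C, while the X atoms still contribute at natural abundance.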
Get the composition of a list of post-translational modifications (PTMs)
[11]:
help(stfi.get_mods_composition)
Help on function get_mods_composition in module seq_to_first_iso.seq_to_first_iso:
get_mods_composition(modifications)
Return the composition of a list of modifications.
Parameters
----------
modifications : list of str
List of modifications string (corresponding to Unimod titles).
Returns
-------
pyteomics.mass.Composition
The total composition change.
[12]:
# Modification names must exactly match Unimod entry titles (case-sensitive).
modification_list = ["Acetyl", "Phospho", "phospho"] # "phospho" is not a Unimod title, so it is ignored with a warning.
total_composition = stfi.get_mods_composition(modification_list)
print(f"Total composition for {modification_list} is {total_composition}")
[2019-12-05, 13:55:32] WARNING : Unimod entry not found for : phospho
Total composition for ['Acetyl', 'Phospho', 'phospho'] is Composition({'H': 3, 'C': 2, 'O': 4, 'P': 1})
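The behaviour above (sum the known modifications, warn about and skip the unknown ones) can be sketched with a small hand-written lookup table. The table below is only a stand-in for Unimod containing the two entries used in this example, and the function and logger names are hypothetical.

```python
import logging
from collections import Counter

logging.basicConfig(format="%(levelname)s : %(message)s")
logger = logging.getLogger("mods_sketch")

# Hand-written stand-in for a Unimod lookup (composition change per title).
MOD_COMPOSITIONS = {
    "Acetyl": {"H": 2, "C": 2, "O": 1},
    "Phospho": {"H": 1, "O": 3, "P": 1},
}


def sum_modifications(modifications):
    """Add up the composition changes of the known modification titles."""
    total = Counter()
    for title in modifications:
        delta = MOD_COMPOSITIONS.get(title)
        if delta is None:
            # Titles are case-sensitive: "phospho" is not "Phospho".
            logger.warning("entry not found for : %s", title)
            continue
        # update() adds counts and also accepts negative deltas.
        total.update(delta)
    return dict(total)


total = sum_modifications(["Acetyl", "Phospho", "phospho"])
print(total)
```

`Counter.update` is used rather than `+` so that a modification that removes atoms (a negative count) would still be carried through in this sketch.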
Get human-readable chemical formula
[13]:
help(stfi.formula_to_str)
Help on function formula_to_str in module seq_to_first_iso.seq_to_first_iso:
formula_to_str(composition)
Return formula from Composition as a string.
Parameters
----------
composition : pyteomics.mass.Composition
Chemical formula.
Returns
-------
str
Human-readable string of the formula.
Warnings
--------
If the composition has elements not in USED_ELEMS, they will not
be added to the output.
[14]:
# This is the function used to get the formulas in the output.
formula_str = stfi.formula_to_str(total_composition)
print(f"{total_composition} becomes {formula_str}")
Composition({'H': 3, 'C': 2, 'O': 4, 'P': 1}) becomes C2H3O4P1
[15]:
# !!! Warning: if the Composition has elements not in "CHONPSX", they will not be in the final string.
bad_composition = mass.Composition("U")
formula_str = stfi.formula_to_str(bad_composition)
print(f"Composition with unsupported element {bad_composition} becomes {formula_str}")
Composition with unsupported element Composition({'H': 7, 'C': 3, 'O': 2, 'N': 1, 'Se': 1}) becomes C3H7O2N1
Here the selenium (Se) of selenocysteine (U) is not one of the CHONPSX elements, so it is silently dropped from the formula.
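A formula string of this kind can be produced by walking a fixed element order and skipping anything else. The sketch below assumes the order C, H, O, N, P, S, X, which is inferred from the formulas shown on this page; the function names are hypothetical and this is not the library's implementation.

```python
# Element order inferred from the formulas printed on this page.
ELEMENT_ORDER = "CHONPSX"


def composition_to_formula(composition):
    """Build a formula string, keeping only elements in ELEMENT_ORDER."""
    return "".join(
        f"{element}{composition[element]}"
        for element in ELEMENT_ORDER
        if composition.get(element)
    )


def unsupported_elements(composition):
    """List the elements that composition_to_formula would drop."""
    return sorted(set(composition) - set(ELEMENT_ORDER))


mods = {"H": 3, "C": 2, "O": 4, "P": 1}
selenocysteine = {"H": 7, "C": 3, "O": 2, "N": 1, "Se": 1}

print(composition_to_formula(mods))            # C2H3O4P1
print(composition_to_formula(selenocysteine))  # C3H7O2N1
print(unsupported_elements(selenocysteine))    # ['Se']
```

Checking `unsupported_elements` before formatting is one way to notice a dropped element instead of discovering it later in the output file.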
Parse a file with peptide sequences and charges
seq-to-first-iso reads TSV files that contain at least a sequence column and a charge column.
The parser ignores lines whose sequence contains characters outside ACDEFGHIKLMNPQRSTVWY, unless those characters belong to XTandem’s PTM notation.
[16]:
df_raw = stfi.parse_input_file("peptides.tsv")
df_filtered = stfi.filter_input_dataframe(df_raw, "pep_sequence", "pep_charge")
print(df_filtered)
[2019-12-05, 13:55:32] INFO : Read peptides.tsv
[2019-12-05, 13:55:32] INFO : Found 11 lines and 3 columns
sequence charge
0 YAQEISR 2
1 VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK 3
2 QRTTFFVLGINTVNYPDIYEHILER 2
3 AELFL(Glutathione)LNR 1
4 .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D... 4
5 YKTMNTFDPD(Heme)EKFEWFQVWQAVK 2
6 HKSASSPAV(Pro->Val)NADTDIQDSSTPSTSPSGRR 2
7 FHNK 1
8 .(Glutathione)MDLEIK 3
9 LANEKPEDVFER 2
10 .(Acetyl)SDTPLR(Oxidation)D(Acetyl)EDG(Acetyl)... 3
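Judging from the rows above, a modification is written in parentheses after the residue it applies to, and an N-terminal modification is attached to a leading dot. The sketch below separates such annotations from the bare sequence and checks the remaining letters. The notation rules are inferred from these examples only, the names are hypothetical, and the real parser may accept or reject other cases.

```python
import re

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
# A modification is any text between a pair of parentheses.
MOD_PATTERN = re.compile(r"\(([^()]*)\)")


def split_modifications(sequence):
    """Return (bare sequence, list of modification names)."""
    modifications = MOD_PATTERN.findall(sequence)
    # Remove the annotations, then the dot used for N-terminal ones.
    bare = MOD_PATTERN.sub("", sequence).lstrip(".")
    return bare, modifications


def is_valid(sequence):
    """True if only the 20 standard residues remain after stripping."""
    bare, _ = split_modifications(sequence)
    return bool(bare) and set(bare) <= AMINO_ACIDS


print(split_modifications("AELFL(Glutathione)LNR"))
print(split_modifications(".(Glutathione)MDLEIK"))
print(is_valid("YAQEISR"), is_valid("YAQEIBR"))
```

In this sketch a sequence containing `B`, which is not one of the 20 standard residues, is reported as invalid.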
[17]:
df_final = stfi.compute_intensities(df_filtered, unlabelled_aa=["A", "R"])
df_final
[2019-12-05, 13:55:33] INFO : Reading sequences.
[2019-12-05, 13:55:33] INFO : Computing composition and formula.
[2019-12-05, 13:55:33] WARNING : Fe in (Heme) is not supported in the computation of M0 and M1
[2019-12-05, 13:55:33] INFO : Computing neutral mass
[2019-12-05, 13:55:33] INFO : Computing M0 and M1
[17]:
| | stfi_sequence | stfi_charge | stfi_sequence_clean | stfi_modification | stfi_sequence_without_mod | stfi_sequence_to_process | stfi_log | stfi_sequence_labelled | stfi_sequence_unlabelled | stfi_composition_mod | ... | stfi_composition_peptide_neutral | stfi_composition_peptide_with_charge | stfi_composition_peptide_with_charge_X | stfi_formula | stfi_formula_X | stfi_neutral_mass | stfi_M0_NC | stfi_M1_NC | stfi_M0_12C | stfi_M1_12C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | YAQEISR | 2 | YAQEISR | [] | YAQEISR | YAQEISR | | YQEIS | AR | {} | ... | {'H': 59, 'C': 37, 'O': 13, 'N': 11} | {'H': 61, 'C': 37, 'O': 13, 'N': 11} | {'H': 61, 'C': 28, 'O': 13, 'N': 11, 'X': 9} | C37H61O13N11 | C28H61O13N11X9 | 865.429381 | 0.620499 | 0.280949 | 0.836258 | 0.127729 |
| 1 | VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK | 3 | VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK | [Phospho] | VLLIDLRIPQRSAINHIVAPNLVNVDPNLLWDK | VLLIDLRIPQRSAINHIVAPNLVNVDPNLLWDK | | VLLIDLIPQSINHIVPNLVNVDPNLLWDK | RRAA | {'H': 1, 'O': 3, 'P': 1} | ... | {'H': 285, 'C': 172, 'O': 49, 'N': 48, 'P': 1} | {'H': 288, 'C': 172, 'O': 49, 'N': 48, 'P': 1} | {'H': 288, 'C': 154, 'O': 49, 'N': 48, 'X': 18... | C172H288O49N48P1 | C154H288O49N48P1X18 | 3838.102264 | 0.113085 | 0.236277 | 0.583716 | 0.256348 |
| 2 | QRTTFFVLGINTVNYPDIYEHILER | 2 | QRTTFFVLGINTVNYPDIYEHILER | [] | QRTTFFVLGINTVNYPDIYEHILER | QRTTFFVLGINTVNYPDIYEHILER | | QTTFFVLGINTVNYPDIYEHILE | RR | {} | ... | {'H': 212, 'C': 140, 'O': 40, 'N': 36} | {'H': 214, 'C': 140, 'O': 40, 'N': 36} | {'H': 214, 'C': 128, 'O': 40, 'N': 36, 'X': 12} | C140H214O40N36 | C128H214O40N36X12 | 3037.566156 | 0.171920 | 0.290033 | 0.672639 | 0.212157 |
| 3 | AELFL(Glutathione)LNR | 1 | AELFL(Glutathione)LNR | [Glutathione] | AELFLLNR | AELFLLNR | | ELFLLN | AR | {'H': 15, 'C': 10, 'N': 3, 'O': 6, 'S': 1} | ... | {'H': 89, 'C': 55, 'O': 18, 'N': 15, 'S': 1} | {'H': 90, 'C': 55, 'O': 18, 'N': 15, 'S': 1} | {'H': 90, 'C': 46, 'O': 18, 'N': 15, 'X': 9, '... | C55H90O18N15S1 | C46H90O18N15S1X9 | 1279.623072 | 0.470882 | 0.318073 | 0.768822 | 0.140356 |
| 4 | .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D... | 4 | .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D... | [Acetyl, Oxidation] | VGEVFINYIQRQNELFQGKLAYLIIDTCLSIVRPNDSKPLDNR | VGEVFINYIQRQNELFQGKLAYLIIDTCLSIVRPNDSKPLDNR | | VGEVFINYIQQNELFQGKLYLIIDTCLSIVPNDSKPLDN | RARR | {'H': 2, 'C': 2, 'O': 2} | ... | {'H': 361, 'C': 226, 'O': 68, 'N': 61, 'S': 1} | {'H': 365, 'C': 226, 'O': 68, 'N': 61, 'S': 1} | {'H': 365, 'C': 205, 'O': 68, 'N': 61, 'S': 1,... | C226H365O68N61S1 | C205H365O68N61S1X21 | 5049.638616 | 0.054173 | 0.148735 | 0.481545 | 0.264287 |
| 5 | YKTMNTFDPD(Heme)EKFEWFQVWQAVK | 2 | YKTMNTFDPD(Heme)EKFEWFQVWQAVK | [Heme] | YKTMNTFDPDEKFEWFQVWQAVK | YKTMNTFDPDEKFEWFQVWQAVK | | YKTMNTFDPDEKFEWFQVWQVK | A | {'H': 32, 'C': 34, 'N': 4, 'O': 4, 'Fe': 1} | ... | {'H': 225, 'C': 173, 'O': 42, 'N': 35, 'S': 1,... | {'H': 227, 'C': 173, 'O': 42, 'N': 35, 'S': 1,... | {'H': 227, 'C': 170, 'O': 42, 'N': 35, 'S': 1,... | C173H227O42N35S1 | C170H227O42N35S1X3 | 3552.561645 | 0.114128 | 0.234021 | 0.698631 | 0.159873 |
| 6 | HKSASSPAV(Pro->Val)NADTDIQDSSTPSTSPSGRR | 2 | HKSASSPAV(Pro->Val)NADTDIQDSSTPSTSPSGRR | [Pro->Val] | HKSASSPAVNADTDIQDSSTPSTSPSGRR | HKSASSPAVNADTDIQDSSTPSTSPSGRR | | HKSSSPVNDTDIQDSSTPSTSPSG | AAARR | {'H': 2} | ... | {'H': 196, 'C': 118, 'N': 40, 'O': 49} | {'H': 198, 'C': 118, 'N': 40, 'O': 49} | {'H': 198, 'C': 97, 'N': 40, 'O': 49, 'X': 21} | C118H198O49N40 | C97H198O49N40X21 | 2957.407483 | 0.210376 | 0.308292 | 0.591515 | 0.251993 |
| 7 | FHNK | 1 | FHNK | [] | FHNK | FHNK | | FHNK | | {} | ... | {'H': 36, 'C': 25, 'O': 6, 'N': 8} | {'H': 37, 'C': 25, 'O': 6, 'N': 8} | {'H': 37, 'C': 25, 'O': 6, 'N': 8} | C25H37O6N8 | C25H37O6N8 | 544.275781 | 0.728121 | 0.223157 | 0.950424 | 0.036677 |
| 8 | .(Glutathione)MDLEIK | 3 | .(Glutathione)MDLEIK | [Glutathione] | MDLEIK | MDLEIK | | MDLEIK | | {'H': 15, 'C': 10, 'N': 3, 'O': 6, 'S': 1} | ... | {'H': 72, 'C': 42, 'S': 2, 'O': 17, 'N': 10} | {'H': 75, 'C': 42, 'S': 2, 'O': 17, 'N': 10} | {'H': 75, 'C': 42, 'S': 2, 'O': 17, 'N': 10} | C42H75O17N10S2 | C42H75O17N10S2 | 1052.451833 | 0.525852 | 0.274658 | 0.822740 | 0.059443 |
| 9 | LANEKPEDVFER | 2 | LANEKPEDVFER | [] | LANEKPEDVFER | LANEKPEDVFER | | LNEKPEDVFE | AR | {} | ... | {'H': 99, 'C': 63, 'O': 22, 'N': 17} | {'H': 101, 'C': 63, 'O': 22, 'N': 17} | {'H': 101, 'C': 54, 'O': 22, 'N': 17, 'X': 9} | C63H101O22N17 | C54H101O22N17X9 | 1445.715058 | 0.446843 | 0.341468 | 0.794506 | 0.147405 |
| 10 | .(Acetyl)SDTPLR(Oxidation)D(Acetyl)EDG(Acetyl)... | 3 | .(Acetyl)SDTPLR(Oxidation)D(Acetyl)EDG(Acetyl)... | [Acetyl, Oxidation, Acetyl, Acetyl] | SDTPLRDEDGLDFWETLRSLATTNPNPPVEK | SDTPLRDEDGLDFWETLRSLATTNPNPPVEK | | SDTPLDEDGLDFWETLSLTTNPNPPVEK | RRA | {'H': 6, 'C': 6, 'O': 4} | ... | {'H': 243, 'C': 159, 'O': 58, 'N': 41} | {'H': 246, 'C': 159, 'O': 58, 'N': 41} | {'H': 246, 'C': 144, 'O': 58, 'N': 41, 'X': 15} | C159H246O58N41 | C144H246O58N41X15 | 3654.732565 | 0.131200 | 0.252105 | 0.608763 | 0.230393 |
11 rows × 22 columns
[18]:
# The most useful columns are the following.
df_final[["stfi_sequence", "stfi_charge", "stfi_M0_NC", "stfi_M1_NC", "stfi_M0_12C", "stfi_M1_12C"]]
[18]:
| | stfi_sequence | stfi_charge | stfi_M0_NC | stfi_M1_NC | stfi_M0_12C | stfi_M1_12C |
|---|---|---|---|---|---|---|
| 0 | YAQEISR | 2 | 0.620499 | 0.280949 | 0.836258 | 0.127729 |
| 1 | VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK | 3 | 0.113085 | 0.236277 | 0.583716 | 0.256348 |
| 2 | QRTTFFVLGINTVNYPDIYEHILER | 2 | 0.171920 | 0.290033 | 0.672639 | 0.212157 |
| 3 | AELFL(Glutathione)LNR | 1 | 0.470882 | 0.318073 | 0.768822 | 0.140356 |
| 4 | .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D... | 4 | 0.054173 | 0.148735 | 0.481545 | 0.264287 |
| 5 | YKTMNTFDPD(Heme)EKFEWFQVWQAVK | 2 | 0.114128 | 0.234021 | 0.698631 | 0.159873 |
| 6 | HKSASSPAV(Pro->Val)NADTDIQDSSTPSTSPSGRR | 2 | 0.210376 | 0.308292 | 0.591515 | 0.251993 |
| 7 | FHNK | 1 | 0.728121 | 0.223157 | 0.950424 | 0.036677 |
| 8 | .(Glutathione)MDLEIK | 3 | 0.525852 | 0.274658 | 0.822740 | 0.059443 |
| 9 | LANEKPEDVFER | 2 | 0.446843 | 0.341468 | 0.794506 | 0.147405 |
| 10 | .(Acetyl)SDTPLR(Oxidation)D(Acetyl)EDG(Acetyl)... | 3 | 0.131200 | 0.252105 | 0.608763 | 0.230393 |
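Two of the computed columns can be checked by hand for the first row (YAQEISR, charge 2). The neutral mass matches the monoisotopic mass of the neutral composition, and the charged composition appears to be the neutral one plus one hydrogen per charge. The sketch below illustrates both with hypothetical helper names; the monoisotopic masses are standard reference values typed in by hand, not read from the library.

```python
import math

# Standard monoisotopic masses (Da), entered by hand for this sketch.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def neutral_mass(composition):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in composition.items())


def add_charge(composition, charge):
    """Add one hydrogen per charge (pattern inferred from the table)."""
    charged = dict(composition)
    charged["H"] = charged.get("H", 0) + charge
    return charged


# Neutral composition of YAQEISR, taken from the first row above.
yaqeisr = {"H": 59, "C": 37, "O": 13, "N": 11}

print(round(neutral_mass(yaqeisr), 6))
print(add_charge(yaqeisr, 2))
```

The mass printed here can be compared with the `stfi_neutral_mass` value of the first row, and the charged composition with its `stfi_composition_peptide_with_charge` entry.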
Concatenation of results with input data
[19]:
input_file_name = "peptides.tsv"
output_file_name = Path(input_file_name).stem + "_stfi.tsv"
column_of_interest = ["stfi_neutral_mass",
"stfi_formula", "stfi_formula_X",
"stfi_M0_NC", "stfi_M1_NC",
"stfi_M0_12C", "stfi_M1_12C"]
# Read original file and append STFI data.
df_old = pd.read_csv(input_file_name, sep="\t")
df_new = pd.concat([df_old, df_final[column_of_interest]], axis=1)
df_new.to_csv(output_file_name, sep="\t", index=False)
[20]:
!head peptides_stfi.tsv
pep_name pep_sequence pep_charge stfi_neutral_mass stfi_formula stfi_formula_X stfi_M0_NC stfi_M1_NC stfi_M0_12C stfi_M1_12C
seq1 YAQEISR 2 865.42938099921 C37H61O13N11 C28H61O13N11X9 0.6204986747402674 0.28094895790268576 0.8362584492452608 0.1277294394585608
seq2 VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK 3 3838.1022643587894 C172H288O49N48P1 C154H288O49N48P1X18 0.11308454311128492 0.23627735941497488 0.5837157078086469 0.256348239423703
seq3 QRTTFFVLGINTVNYPDIYEHILER 2 3037.56615575404 C140H214O40N36 C128H214O40N36X12 0.17192000472677066 0.29003268314604863 0.6726389393255647 0.2121565119028707
seq4 AELFL(Glutathione)LNR 1 1279.6230720783099 C55H90O18N15S1 C46H90O18N15S1X9 0.47088227298965996 0.31807282610880205 0.7688224723128251 0.1403559631032404
seq5 .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)DTCLSIVRPNDSKPLDNR 4 5049.63861600015 C226H365O68N61S1 C205H365O68N61S1X21 0.05417296058666768 0.14873470210020426 0.48154538801515706 0.26428662893114313
seq6 YKTMNTFDPD(Heme)EKFEWFQVWQAVK 2 3552.56164490527 C173H227O42N35S1 C170H227O42N35S1X3 0.11412815567709074 0.23402086836029898 0.6986310451922292 0.15987291091234185
seq7 HKSASSPAV(Pro->Val)NADTDIQDSSTPSTSPSGRR 2 2957.40748283616 C118H198O49N40 C97H198O49N40X21 0.21037550761092094 0.30829218128938995 0.5915145465128161 0.2519928490706656
seq8 FHNK 1 544.27578091028 C25H37O6N8 C25H37O6N8 0.7281205110566825 0.2231565512772339 0.950423678912205 0.036676880813002036
seq9 .(Glutathione)MDLEIK 3 1052.4518328895601 C42H75O17N10S2 C42H75O17N10S2 0.5258517009900313 0.27465762228958784 0.8227403058336873 0.05944288050042882
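One detail of the file naming used above: `Path(...).stem` keeps only the file name without its extension, so an input located in another directory would produce an output name in the current working directory. The sketch below contrasts that with a variant that keeps the output next to the input. Both helper names are hypothetical, and `PurePosixPath` is used only so that the printed separators are the same on every platform.

```python
from pathlib import PurePosixPath


def output_name(input_path):
    """Same naming rule as above: stem of the input plus '_stfi.tsv'."""
    return PurePosixPath(input_path).stem + "_stfi.tsv"


def output_path_next_to_input(input_path):
    """Variant that keeps the directory of the input file."""
    path = PurePosixPath(input_path)
    return path.with_name(path.stem + "_stfi.tsv")


print(output_name("peptides.tsv"))                     # peptides_stfi.tsv
print(output_name("data/peptides.tsv"))                # peptides_stfi.tsv
print(output_path_next_to_input("data/peptides.tsv"))  # data/peptides_stfi.tsv
```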