Command line interface of seq-to-first-iso

seq-to-first-iso computes the first two isotopologue intensities (M0 and M1) of peptide sequences, both with natural carbon and with carbon enriched to 99.99% 12C.

The program can treat selected amino acids as unlabelled, keeping their default isotopic abundance, to simulate amino acid auxotrophies.

seq-to-first-iso is also available as a Python module.

[1]:
import pandas as pd  # For output visualisation.

Note: in a Jupyter notebook, the exclamation mark ``!`` prefix runs a shell command from a code cell. In a real Linux terminal, omit it.

[2]:
!seq-to-first-iso -v
seq-to-first-iso 1.0.0
[3]:
!seq-to-first-iso -h
usage: seq-to-first-iso [-h] [-o OUTPUT] [-u amino_a] [-v]
                        input_file_name sequence_col_name charge_col_name

Read a tsv file with sequences and charges and compute intensity of first
isotopologues

positional arguments:
  input_file_name       file to parse in .tsv format
  sequence_col_name     column name with sequences
  charge_col_name       column name with charges

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        name of output file
  -u amino_a, --unlabelled-aa amino_a
                        amino acids with default abundance
  -v, --version         show program's version number and exit
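Outside a notebook, the same command line can be assembled and launched from a Python script. The sketch below is one possible way to do it with the standard library; ``build_command`` is a hypothetical helper written for this example and is not part of seq-to-first-iso. It only uses the positional arguments and the ``-o`` and ``-u`` options listed in the help message above.

```python
import os
import shutil
import subprocess


def build_command(input_file, sequence_col, charge_col, output=None, unlabelled_aa=None):
    """Assemble the argument list for seq-to-first-iso (hypothetical helper)."""
    cmd = ["seq-to-first-iso", input_file, sequence_col, charge_col]
    if output is not None:
        cmd += ["-o", output]
    if unlabelled_aa:
        # Amino acids are passed as a comma-separated list, e.g. "V,W".
        cmd += ["-u", ",".join(unlabelled_aa)]
    return cmd


cmd = build_command("peptides.tsv", "pep_sequence", "pep_charge", unlabelled_aa=["V", "W"])
print(cmd)

# Launch the program only if it is installed and the input file is present.
if shutil.which(cmd[0]) and os.path.exists("peptides.tsv"):
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with column names that contain spaces.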
[4]:
# File used.
!cat peptides.tsv
pep_name        pep_sequence    pep_charge
seq1    YAQEISR 2
seq2    VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK      3
seq3    QRTTFFVLGINTVNYPDIYEHILER       2
seq4    AELFL(Glutathione)LNR   1
seq5    .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)DTCLSIVRPNDSKPLDNR 4
seq6    YKTMNTFDPD(Heme)EKFEWFQVWQAVK   2
seq7    HKSASSPAV(Pro->Val)NADTDIQDSSTPSTSPSGRR  2
seq8    FHNK    1
seq9    .(Glutathione)MDLEIK    3
seq10   LANEKPEDVFER    2
seq11   .(Acetyl)SDTPLR(Oxidation)D(Acetyl)EDG(Acetyl)LDFWETLRSLATTNPNPPVEK     3
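Several sequences in this file carry modifications written in parentheses, and some start with a ``.`` that appears to mark a modification at the start of the peptide. As a reading aid, the sketch below separates the bare residues from the annotations with a regular expression. It is an illustration only and an assumption about the notation, not necessarily how seq-to-first-iso parses its input; ``split_modifications`` is a hypothetical name.

```python
import re

# Matches one parenthesised annotation such as "(Phospho)" or "(Pro->Val)".
MOD_PATTERN = re.compile(r"\(([^()]*)\)")


def split_modifications(annotated):
    """Return (bare_sequence, modifications) for an annotated peptide string."""
    modifications = MOD_PATTERN.findall(annotated)
    # Remove the annotations, then the leading "." placeholder if present.
    bare = MOD_PATTERN.sub("", annotated).lstrip(".")
    return bare, modifications


examples = [
    "YAQEISR",
    "AELFL(Glutathione)LNR",
    ".(Glutathione)MDLEIK",
    "HKSASSPAV(Pro->Val)NADTDIQDSSTPSTSPSGRR",
]
for peptide in examples:
    print(split_modifications(peptide))
```

Note that this sketch only strips the annotations: a substitution such as ``(Pro->Val)`` is reported but not applied to the residues.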

Minimal command

[5]:
!seq-to-first-iso peptides.tsv pep_sequence pep_charge
Namespace(charge_col_name='pep_charge', input_file_name=PosixPath('peptides.tsv'), output=None, sequence_col_name='pep_sequence', unlabelled_aa=[])
[2019-12-05, 17:22:32] INFO    : Parsing file
[2019-12-05, 17:22:32] INFO    : Read peptides.tsv
[2019-12-05, 17:22:32] INFO    : Found 11 lines and 3 columns
[2019-12-05, 17:22:32] INFO    : Reading sequences.
[2019-12-05, 17:22:32] INFO    : Computing composition and formula.
[2019-12-05, 17:22:32] WARNING : Fe in (Heme) is not supported in the computation of M0 and M1
[2019-12-05, 17:22:32] INFO    : Computing neutral mass
[2019-12-05, 17:22:32] INFO    : Computing M0 and M1

The command above writes a tab-separated values file named peptides_stfi.tsv.

[6]:
# Read basic output file.
df = pd.read_csv("peptides_stfi.tsv", sep="\t")
df.head()
[6]:
pep_name pep_sequence pep_charge stfi_neutral_mass stfi_formula stfi_formula_X stfi_M0_NC stfi_M1_NC stfi_M0_12C stfi_M1_12C
0 seq1 YAQEISR 2 865.429381 C37H61O13N11 C37H61O13N11 0.620499 0.280949 0.920444 0.051819
1 seq2 VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK 3 3838.102264 C172H288O49N48P1 C172H288O49N48P1 0.113085 0.236277 0.707156 0.174161
2 seq3 QRTTFFVLGINTVNYPDIYEHILER 2 3037.566156 C140H214O40N36 C140H214O40N36 0.171920 0.290033 0.764407 0.142807
3 seq4 AELFL(Glutathione)LNR 1 1279.623072 C55H90O18N15S1 C55H90O18N15S1 0.470882 0.318073 0.846220 0.072875
4 seq5 .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D... 4 5049.638616 C226H365O68N61S1 C226H365O68N61S1 0.054173 0.148735 0.602333 0.195036

Changing output name

You can change the name of the output file with the ``-o`` option. The ``.tsv`` extension is appended to the name you give.

[7]:
!seq-to-first-iso peptides.tsv pep_sequence pep_charge -o seq_stfi
Namespace(charge_col_name='pep_charge', input_file_name=PosixPath('peptides.tsv'), output='seq_stfi', sequence_col_name='pep_sequence', unlabelled_aa=[])
[2019-12-05, 17:22:34] INFO    : Parsing file
[2019-12-05, 17:22:34] INFO    : Read peptides.tsv
[2019-12-05, 17:22:34] INFO    : Found 11 lines and 3 columns
[2019-12-05, 17:22:34] INFO    : Reading sequences.
[2019-12-05, 17:22:34] INFO    : Computing composition and formula.
[2019-12-05, 17:22:34] WARNING : Fe in (Heme) is not supported in the computation of M0 and M1
[2019-12-05, 17:22:34] INFO    : Computing neutral mass
[2019-12-05, 17:22:34] INFO    : Computing M0 and M1
[8]:
# Read output file with different name.
df = pd.read_csv("seq_stfi.tsv", sep="\t")
df.head()
[8]:
pep_name pep_sequence pep_charge stfi_neutral_mass stfi_formula stfi_formula_X stfi_M0_NC stfi_M1_NC stfi_M0_12C stfi_M1_12C
0 seq1 YAQEISR 2 865.429381 C37H61O13N11 C37H61O13N11 0.620499 0.280949 0.920444 0.051819
1 seq2 VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK 3 3838.102264 C172H288O49N48P1 C172H288O49N48P1 0.113085 0.236277 0.707156 0.174161
2 seq3 QRTTFFVLGINTVNYPDIYEHILER 2 3037.566156 C140H214O40N36 C140H214O40N36 0.171920 0.290033 0.764407 0.142807
3 seq4 AELFL(Glutathione)LNR 1 1279.623072 C55H90O18N15S1 C55H90O18N15S1 0.470882 0.318073 0.846220 0.072875
4 seq5 .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D... 4 5049.638616 C226H365O68N61S1 C226H365O68N61S1 0.054173 0.148735 0.602333 0.195036
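When scripting several runs, it can help to predict where the results will be written. The sketch below encodes the naming behaviour observed in the two runs above (``peptides.tsv`` gives ``peptides_stfi.tsv``; ``-o seq_stfi`` gives ``seq_stfi.tsv``). The rule is inferred from these two cases only, and ``expected_output_path`` is a hypothetical helper; behaviour for inputs located in other directories is not covered here.

```python
from pathlib import Path


def expected_output_path(input_file, output=None):
    """Guess the output file name from the two cases shown in this notebook."""
    if output is not None:
        base = output  # name given with -o
    else:
        base = Path(input_file).stem + "_stfi"  # default: input name + "_stfi"
    return Path(base + ".tsv")


print(expected_output_path("peptides.tsv"))
print(expected_output_path("peptides.tsv", output="seq_stfi"))
```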

Specifying unlabelled amino acids

[9]:
!seq-to-first-iso peptides.tsv pep_sequence pep_charge -u V,W
Namespace(charge_col_name='pep_charge', input_file_name=PosixPath('peptides.tsv'), output=None, sequence_col_name='pep_sequence', unlabelled_aa=['V', 'W'])
[2019-12-05, 17:22:36] INFO    : Amino acid with default abundance: ['V', 'W']
[2019-12-05, 17:22:36] INFO    : Parsing file
[2019-12-05, 17:22:36] INFO    : Read peptides.tsv
[2019-12-05, 17:22:36] INFO    : Found 11 lines and 3 columns
[2019-12-05, 17:22:36] INFO    : Reading sequences.
[2019-12-05, 17:22:36] INFO    : Computing composition and formula.
[2019-12-05, 17:22:36] WARNING : Fe in (Heme) is not supported in the computation of M0 and M1
[2019-12-05, 17:22:36] INFO    : Computing neutral mass
[2019-12-05, 17:22:36] INFO    : Computing M0 and M1
[10]:
# Read output file computed with unlabelled amino acids (default output name).
df = pd.read_csv("peptides_stfi.tsv", sep="\t")
df.head()
[10]:
pep_name pep_sequence pep_charge stfi_neutral_mass stfi_formula stfi_formula_X stfi_M0_NC stfi_M1_NC stfi_M0_12C stfi_M1_12C
0 seq1 YAQEISR 2 865.429381 C37H61O13N11 C37H61O13N11 0.620499 0.280949 0.920444 0.051819
1 seq2 VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK 3 3838.102264 C172H288O49N48P1 C141H288O49N48P1X31 0.113085 0.236277 0.508195 0.293976
2 seq3 QRTTFFVLGINTVNYPDIYEHILER 2 3037.566156 C140H214O40N36 C130H214O40N36X10 0.171920 0.290033 0.687130 0.202001
3 seq4 AELFL(Glutathione)LNR 1 1279.623072 C55H90O18N15S1 C55H90O18N15S1 0.470882 0.318073 0.846220 0.072875
4 seq5 .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D... 4 5049.638616 C226H365O68N61S1 C211H365O68N61S1X15 0.054173 0.148735 0.513344 0.248734

The carbon of unlabelled amino acids is shown as X in column stfi_formula_X.

Peptide YAQEISR contains neither V nor W, so stfi_formula and stfi_formula_X are identical and its M0 and M1 intensities are not affected by the V and W auxotrophy.
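To see why the X carbons change the 12C columns, recall that M0 is the probability that every atom of the molecule is its lightest isotope. The sketch below computes M0 from a formula string under that definition. The isotopic abundances are approximate natural values chosen for this example and are an assumption; seq-to-first-iso may use slightly different numbers, so results should be close to the table but not identical. ``parse_formula`` and ``m0`` are hypothetical names.

```python
import re

# Assumed abundance of the lightest isotope of each element (approximate
# natural values); the program itself may use slightly different numbers.
NATURAL = {"C": 0.9893, "H": 0.999885, "O": 0.99757, "N": 0.99636, "S": 0.9499, "P": 1.0}
C12_ENRICHED = 0.9999  # 99.99% 12C, as stated in the introduction


def parse_formula(formula):
    """Turn 'C130H214O40N36X10' into {'C': 130, 'H': 214, ...}."""
    return {el: int(n or 1) for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula)}


def m0(formula, enriched=False):
    """Probability that all atoms are the lightest isotope."""
    abundance = dict(NATURAL)
    abundance["X"] = NATURAL["C"]  # X: carbon of unlabelled amino acids
    if enriched:
        abundance["C"] = C12_ENRICHED  # labelled carbon only
    result = 1.0
    for element, count in parse_formula(formula).items():
        result *= abundance[element] ** count
    return result


# seq3 from the table above: all carbon labelled vs. 10 carbons left as X.
print(m0("C140H214O40N36", enriched=True))
print(m0("C130H214O40N36X10", enriched=True))
```

With 10 carbons kept at natural abundance, the second value is lower than the first, which matches the drop in stfi_M0_12C seen for seq3 when V and W are unlabelled.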
