Command-line interface of seq-to-first-iso

seq-to-first-iso computes the intensities of the first two isotopologues (M0 and M1) of peptides from their sequences, both with natural carbon and with 99.99% 12C-enriched carbon.
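To make M0 and M1 concrete, the sketch below computes them from an elemental formula with a standard independent-atom model: M0 is the probability that every atom is its lightest isotope, and M1 the probability that exactly one atom is its M+1 isotope (13C, 2H, 15N, 17O, 33S) while all others are the lightest. The abundance values are approximate IUPAC figures chosen for this sketch, not necessarily the tables seq-to-first-iso uses, and `first_isotopologues` is a hypothetical helper. For YAQEISR the results should land close to the stfi_M0_NC, stfi_M1_NC, stfi_M0_12C and stfi_M1_12C values shown further down, without exact agreement.

```python
import math

# Approximate abundances of the lightest (M+0) and M+1 isotope of each element.
# Assumption: representative IUPAC values; seq-to-first-iso may use other tables.
NATURAL = {
    "C": (0.9893, 0.0107),
    "H": (0.999885, 0.000115),
    "N": (0.99636, 0.00364),
    "O": (0.99757, 0.00038),
    "S": (0.9499, 0.0075),
    "P": (1.0, 0.0),
}

# Assumption: 99.99% 12C enrichment, the remaining 0.01% being 13C.
ENRICHED_12C = dict(NATURAL, C=(0.9999, 0.0001))


def first_isotopologues(formula, abundances):
    """Return (M0, M1) for a formula given as {element: atom count}."""
    # M0: every atom is its lightest isotope.
    m0 = math.prod(abundances[el][0] ** n for el, n in formula.items())
    # M1: exactly one atom is an M+1 isotope, all the others are the lightest one.
    ratio_sum = sum(n * abundances[el][1] / abundances[el][0] for el, n in formula.items())
    return m0, m0 * ratio_sum


yaqeisr = {"C": 37, "H": 61, "O": 13, "N": 11}  # formula of YAQEISR
print(first_isotopologues(yaqeisr, NATURAL))
print(first_isotopologues(yaqeisr, ENRICHED_12C))
```

Enriching carbon in 12C removes most of the 13C contribution, which is why M0 rises and M1 drops.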

The program can also treat some amino acids as unlabelled (natural carbon abundance) to simulate auxotrophies for these amino acids.
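An auxotrophic organism has to take up the amino acid from its medium, so if that amino acid is not labelled, its carbon atoms keep natural abundance while the other carbons are 12C-enriched. The sketch below, with the hypothetical helper `split_carbons`, counts both kinds of carbon for a plain one-letter sequence; modifications are ignored here, an assumption made for brevity.

```python
# Carbon atoms per amino acid residue (peptide bonds and the terminal water add no carbon).
CARBONS = {
    "A": 3, "R": 6, "N": 4, "D": 4, "C": 3, "E": 5, "Q": 5, "G": 2, "H": 6, "I": 6,
    "L": 6, "K": 6, "M": 5, "F": 9, "P": 5, "S": 3, "T": 4, "W": 11, "Y": 9, "V": 5,
}


def split_carbons(sequence, unlabelled):
    """Return (labelled carbons, unlabelled carbons) for an unmodified sequence."""
    labelled = sum(CARBONS[aa] for aa in sequence if aa not in unlabelled)
    natural = sum(CARBONS[aa] for aa in sequence if aa in unlabelled)
    return labelled, natural


print(split_carbons("QRTTFFVLGINTVNYPDIYEHILER", {"V", "W"}))  # prints: (130, 10)
```

These counts correspond to the C130 and X10 of this peptide's stfi_formula_X in the V and W example at the end of this page.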

seq-to-first-iso is also available as a Python module; this page covers its command-line interface.

[1]:

import pandas as pd  # For output visualisation.

Note: in a Jupyter notebook, the exclamation mark (!) runs a shell command. In a regular terminal, type the command without it.

[2]:

!seq-to-first-iso -v

seq-to-first-iso 1.0.0

[3]:

!seq-to-first-iso -h

usage: seq-to-first-iso [-h] [-o OUTPUT] [-u amino_a] [-v]
                        input_file_name sequence_col_name charge_col_name

Read a tsv file with sequences and charges and compute intensity of first
isotopologues

positional arguments:
  input_file_name       file to parse in .tsv format
  sequence_col_name     column name with sequences
  charge_col_name       column name with charges

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        name of output file
  -u amino_a, --unlabelled-aa amino_a
                        amino acids with default abundance
  -v, --version         show program's version number and exit
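Outside a notebook, the same command can be launched from a Python script. The sketch below assembles the arguments listed in the help message and runs them with the standard subprocess module; `build_command` is a hypothetical helper, not part of seq-to-first-iso, and the comma-separated form of -u follows the `-u V,W` example used later on this page.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(input_file, sequence_col, charge_col, output=None, unlabelled_aa=None):
    """Assemble a seq-to-first-iso command from the arguments listed by -h."""
    cmd = ["seq-to-first-iso", str(input_file), sequence_col, charge_col]
    if output is not None:
        cmd += ["-o", output]
    if unlabelled_aa:
        # Several unlabelled amino acids go in one comma-separated value, e.g. V,W.
        cmd += ["-u", ",".join(unlabelled_aa)]
    return cmd


cmd = build_command("peptides.tsv", "pep_sequence", "pep_charge", unlabelled_aa=["V", "W"])
print(" ".join(cmd))  # prints: seq-to-first-iso peptides.tsv pep_sequence pep_charge -u V,W

# Run the tool only when it is installed and the input file is present.
if shutil.which(cmd[0]) and Path("peptides.tsv").is_file():
    subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting issues with file or column names.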

[4]:

# File used.
!cat peptides.tsv

pep_name        pep_sequence    pep_charge
seq1    YAQEISR 2
seq2    VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK      3
seq3    QRTTFFVLGINTVNYPDIYEHILER       2
seq4    AELFL(Glutathione)LNR   1
seq5    .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)DTCLSIVRPNDSKPLDNR 4
seq6    YKTMNTFDPD(Heme)EKFEWFQVWQAVK   2
seq7    HKSASSPAV(Pro->Val)NADTDIQDSSTPSTSPSGRR  2
seq8    FHNK    1
seq9    .(Glutathione)MDLEIK    3
seq10   LANEKPEDVFER    2
seq11   .(Acetyl)SDTPLR(Oxidation)D(Acetyl)EDG(Acetyl)LDFWETLRSLATTNPNPPVEK     3
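The input is a plain tab-separated file with a header line; the sequence and charge columns can have any name because both names are given on the command line, and extra columns such as pep_name are carried over to the output. A minimal sketch, assuming pandas is installed, that writes such a file from two peptides of the example (the file name my_peptides.tsv is only an example):

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two peptides taken from peptides.tsv above.
peptides = pd.DataFrame({
    "pep_name": ["seq1", "seq8"],
    "pep_sequence": ["YAQEISR", "FHNK"],
    "pep_charge": [2, 1],
})

out_dir = Path(tempfile.mkdtemp())
input_file = out_dir / "my_peptides.tsv"
# Tab-separated, header on the first line, no index column.
peptides.to_csv(input_file, sep="\t", index=False)
print(input_file.read_text())
```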

Minimal command

[5]:

!seq-to-first-iso peptides.tsv pep_sequence pep_charge

Namespace(charge_col_name='pep_charge', input_file_name=PosixPath('peptides.tsv'), output=None, sequence_col_name='pep_sequence', unlabelled_aa=[])
[2019-12-05, 17:22:32] INFO    : Parsing file
[2019-12-05, 17:22:32] INFO    : Read peptides.tsv
[2019-12-05, 17:22:32] INFO    : Found 11 lines and 3 columns
[2019-12-05, 17:22:32] INFO    : Reading sequences.
[2019-12-05, 17:22:32] INFO    : Computing composition and formula.
[2019-12-05, 17:22:32] WARNING : Fe in (Heme) is not supported in the computation of M0 and M1
[2019-12-05, 17:22:32] INFO    : Computing neutral mass
[2019-12-05, 17:22:32] INFO    : Computing M0 and M1

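The command above writes its results to a tab-separated values file, peptides_stfi.tsv: by default, the output file is named after the input file with the _stfi suffix. The WARNING line reports that the Fe atom of the Heme modification (peptide seq6) is not supported in the computation of M0 and M1.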

[6]:

# Read basic output file.
df = pd.read_csv("peptides_stfi.tsv", sep="\t")
df.head()

[6]:

	pep_name	pep_sequence	pep_charge	stfi_neutral_mass	stfi_formula	stfi_formula_X	stfi_M0_NC	stfi_M1_NC	stfi_M0_12C	stfi_M1_12C
0	seq1	YAQEISR	2	865.429381	C37H61O13N11	C37H61O13N11	0.620499	0.280949	0.920444	0.051819
1	seq2	VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK	3	3838.102264	C172H288O49N48P1	C172H288O49N48P1	0.113085	0.236277	0.707156	0.174161
2	seq3	QRTTFFVLGINTVNYPDIYEHILER	2	3037.566156	C140H214O40N36	C140H214O40N36	0.171920	0.290033	0.764407	0.142807
3	seq4	AELFL(Glutathione)LNR	1	1279.623072	C55H90O18N15S1	C55H90O18N15S1	0.470882	0.318073	0.846220	0.072875
4	seq5	.(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D...	4	5049.638616	C226H365O68N61S1	C226H365O68N61S1	0.054173	0.148735	0.602333	0.195036
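The "..." at the end of the seq5 sequence is a pandas display truncation, not missing data: by default, pandas shortens long cells. The sketch below lifts the column-width limit only within a `with` block, using pandas' `display.max_colwidth` option; it rebuilds one row from peptides.tsv so that it runs on its own.

```python
import pandas as pd

# One row rebuilt from peptides.tsv; with real results use
# df = pd.read_csv("peptides_stfi.tsv", sep="\t") instead.
df = pd.DataFrame({
    "pep_name": ["seq5"],
    "pep_sequence": [".(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)DTCLSIVRPNDSKPLDNR"],
})

# No width limit inside this block; the previous setting is restored afterwards.
with pd.option_context("display.max_colwidth", None):
    print(df)
```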

Changing the output name

You can also choose the name of the output file with the -o option; the .tsv extension is appended to the given name (here, seq_stfi.tsv).

[7]:

!seq-to-first-iso peptides.tsv pep_sequence pep_charge -o seq_stfi

Namespace(charge_col_name='pep_charge', input_file_name=PosixPath('peptides.tsv'), output='seq_stfi', sequence_col_name='pep_sequence', unlabelled_aa=[])
[2019-12-05, 17:22:34] INFO    : Parsing file
[2019-12-05, 17:22:34] INFO    : Read peptides.tsv
[2019-12-05, 17:22:34] INFO    : Found 11 lines and 3 columns
[2019-12-05, 17:22:34] INFO    : Reading sequences.
[2019-12-05, 17:22:34] INFO    : Computing composition and formula.
[2019-12-05, 17:22:34] WARNING : Fe in (Heme) is not supported in the computation of M0 and M1
[2019-12-05, 17:22:34] INFO    : Computing neutral mass
[2019-12-05, 17:22:34] INFO    : Computing M0 and M1

[8]:

# Read the output file written under the new name.
df = pd.read_csv("seq_stfi.tsv", sep="\t")
df.head()

[8]:

	pep_name	pep_sequence	pep_charge	stfi_neutral_mass	stfi_formula	stfi_formula_X	stfi_M0_NC	stfi_M1_NC	stfi_M0_12C	stfi_M1_12C
0	seq1	YAQEISR	2	865.429381	C37H61O13N11	C37H61O13N11	0.620499	0.280949	0.920444	0.051819
1	seq2	VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK	3	3838.102264	C172H288O49N48P1	C172H288O49N48P1	0.113085	0.236277	0.707156	0.174161
2	seq3	QRTTFFVLGINTVNYPDIYEHILER	2	3037.566156	C140H214O40N36	C140H214O40N36	0.171920	0.290033	0.764407	0.142807
3	seq4	AELFL(Glutathione)LNR	1	1279.623072	C55H90O18N15S1	C55H90O18N15S1	0.470882	0.318073	0.846220	0.072875
4	seq5	.(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D...	4	5049.638616	C226H365O68N61S1	C226H365O68N61S1	0.054173	0.148735	0.602333	0.195036
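The two naming cases seen so far can be summarised in a small function. The rule is inferred from these examples rather than taken from the seq-to-first-iso source, so `expected_output` is a hypothetical helper; the examples do not show which directory the file is written to, nor what happens when the -o value already ends in .tsv.

```python
from pathlib import Path


def expected_output(input_file, output=None):
    """Predict the output file name from the examples on this page (assumption)."""
    if output is not None:
        # -o seq_stfi produced seq_stfi.tsv: the extension is appended to the given name.
        return Path(f"{output}.tsv")
    # Default: input file stem followed by the _stfi suffix, e.g. peptides_stfi.tsv.
    input_path = Path(input_file)
    return input_path.with_name(f"{input_path.stem}_stfi.tsv")


print(expected_output("peptides.tsv"))                     # peptides_stfi.tsv
print(expected_output("peptides.tsv", output="seq_stfi"))  # seq_stfi.tsv
```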

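Specifying unlabelled amino acids

The -u option takes a comma-separated list of one-letter amino acid codes. In the 12C computation, the carbon atoms of these amino acids keep the default (natural) isotopic abundance, while all other carbon atoms are 12C-enriched.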

[9]:

!seq-to-first-iso peptides.tsv pep_sequence pep_charge -u V,W

Namespace(charge_col_name='pep_charge', input_file_name=PosixPath('peptides.tsv'), output=None, sequence_col_name='pep_sequence', unlabelled_aa=['V', 'W'])
[2019-12-05, 17:22:36] INFO    : Amino acid with default abundance: ['V', 'W']
[2019-12-05, 17:22:36] INFO    : Parsing file
[2019-12-05, 17:22:36] INFO    : Read peptides.tsv
[2019-12-05, 17:22:36] INFO    : Found 11 lines and 3 columns
[2019-12-05, 17:22:36] INFO    : Reading sequences.
[2019-12-05, 17:22:36] INFO    : Computing composition and formula.
[2019-12-05, 17:22:36] WARNING : Fe in (Heme) is not supported in the computation of M0 and M1
[2019-12-05, 17:22:36] INFO    : Computing neutral mass
[2019-12-05, 17:22:36] INFO    : Computing M0 and M1

[10]:

# Read the output file computed with V and W as unlabelled amino acids (default name, previous file overwritten).
df = pd.read_csv("peptides_stfi.tsv", sep="\t")
df.head()

[10]:

	pep_name	pep_sequence	pep_charge	stfi_neutral_mass	stfi_formula	stfi_formula_X	stfi_M0_NC	stfi_M1_NC	stfi_M0_12C	stfi_M1_12C
0	seq1	YAQEISR	2	865.429381	C37H61O13N11	C37H61O13N11	0.620499	0.280949	0.920444	0.051819
1	seq2	VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK	3	3838.102264	C172H288O49N48P1	C141H288O49N48P1X31	0.113085	0.236277	0.508195	0.293976
2	seq3	QRTTFFVLGINTVNYPDIYEHILER	2	3037.566156	C140H214O40N36	C130H214O40N36X10	0.171920	0.290033	0.687130	0.202001
3	seq4	AELFL(Glutathione)LNR	1	1279.623072	C55H90O18N15S1	C55H90O18N15S1	0.470882	0.318073	0.846220	0.072875
4	seq5	.(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D...	4	5049.638616	C226H365O68N61S1	C211H365O68N61S1X15	0.054173	0.148735	0.513344	0.248734

In column stfi_formula_X, carbon atoms belonging to unlabelled amino acids are written as X instead of C.
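To work with these formula strings programmatically, they can be parsed into element counts. The sketch below uses a hypothetical `parse_formula` helper built on a regular expression, assuming every element is followed by its count (as in P1 and S1 above), and checks on seq2 that labelled (C) plus unlabelled (X) carbons add up to the total carbon count.

```python
import re


def parse_formula(formula):
    """Turn a string such as 'C141H288O49N48P1X31' into {'C': 141, ..., 'X': 31}."""
    # One capital letter, an optional lowercase letter, then the atom count.
    return {element: int(count) for element, count in re.findall(r"([A-Z][a-z]?)(\d+)", formula)}


# stfi_formula and stfi_formula_X of seq2 from the table above.
full = parse_formula("C172H288O49N48P1")
split = parse_formula("C141H288O49N48P1X31")

# C plus X gives the total carbon count; other elements are unchanged.
print(split["C"] + split.get("X", 0) == full["C"])  # prints: True
```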

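Peptide YAQEISR contains no unlabelled amino acid (neither V nor W), so its stfi_formula and stfi_formula_X are identical and its M0 and M1 intensities are not affected by the V and W auxotrophy; the same holds for AELFL(Glutathione)LNR. For all peptides, the natural-carbon intensities (stfi_M0_NC and stfi_M1_NC) are unchanged; only the 12C-enriched intensities (stfi_M0_12C and stfi_M1_12C) depend on the unlabelled amino acids.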
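To list the peptides whose 12C intensities are affected by the auxotrophy, one can compare the two formula columns. A small pandas sketch on rows copied from the table above; with the real output, read peptides_stfi.tsv with `pd.read_csv(..., sep="\t")` instead.

```python
import pandas as pd

# Three rows copied from the output computed with -u V,W.
df = pd.DataFrame({
    "pep_sequence": ["YAQEISR", "QRTTFFVLGINTVNYPDIYEHILER", "AELFL(Glutathione)LNR"],
    "stfi_formula": ["C37H61O13N11", "C140H214O40N36", "C55H90O18N15S1"],
    "stfi_formula_X": ["C37H61O13N11", "C130H214O40N36X10", "C55H90O18N15S1"],
    "stfi_M0_12C": [0.920444, 0.687130, 0.846220],
})

# Peptides with at least one unlabelled amino acid have an X in stfi_formula_X.
affected = df[df["stfi_formula"] != df["stfi_formula_X"]]
print(affected["pep_sequence"].tolist())  # prints: ['QRTTFFVLGINTVNYPDIYEHILER']
```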
