Command-line interface of seq-to-first-iso

seq-to-first-iso computes the first two isotopologue intensities (M0 and M1) of peptide sequences, both with natural carbon and with 99.99 % 12C enriched carbon.

The program can treat selected amino acids as unlabelled, which simulates auxotrophy for those amino acids.

seq-to-first-iso is available as a command line tool.

[1]:

import pandas as pd  # For output visualisation.

Note: the exclamation mark ``!`` tells Jupyter to run the rest of the line as a shell command. In a real Linux terminal, you don’t need it.

[2]:

!seq-to-first-iso -v

seq-to-first-iso 0.4.3

[3]:

!seq-to-first-iso -h

usage: seq-to-first-iso [-h] [-o OUTPUT] [-n amino_a] [-v] input

Read a file of sequences and creates a tsv file

positional arguments:
  input                 file to parse

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        name of output file
  -n amino_a, --non-labelled-aa amino_a
                        amino acids with default abundance
  -v, --version         show program's version number and exit
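
When the tool has to be called from a Python script rather than from a notebook cell, the options listed in the help message can be assembled into an argument list. The following is a minimal sketch, not part of seq-to-first-iso itself: the function names build_command and run are hypothetical, and only the -o and -n flags from the help message are used. The comma-separated form of the -n value follows the command used later on this page.

```python
import subprocess


def build_command(input_file, output=None, unlabelled=None):
    """Assemble the argument list for seq-to-first-iso.

    Hypothetical helper: only flags shown in the help message are used.
    """
    command = ["seq-to-first-iso", input_file]
    if output is not None:
        command += ["-o", output]
    if unlabelled:
        # The amino acids are passed as one comma-separated value, e.g. "V,W".
        command += ["-n", ",".join(unlabelled)]
    return command


def run(input_file, output=None, unlabelled=None):
    """Run the tool; seq-to-first-iso must be installed and on the PATH."""
    return subprocess.run(
        build_command(input_file, output, unlabelled), check=True
    )
```

For instance, build_command("peptides.txt", output="sequence", unlabelled=["V", "W"]) returns ["seq-to-first-iso", "peptides.txt", "-o", "sequence", "-n", "V,W"].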

The output file has the following columns:

sequence: original sequence
mass: sequence mass
formula: chemical formula
formula_X: chemical formula with X
M0_NC: M0 in NC
M1_NC: M1 in NC
M0_12C: M0 in 12C
M1_12C: M1 in 12C

X: Virtual element created to represent unlabelled carbon
NC: Normal Condition (Natural Carbon)
12C: 12C condition (12C enriched carbon)
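
To make the role of X concrete, the sketch below splits a formula string into element counts and checks that the carbon of "formula" is shared between C and X in "formula_X". The parse_formula helper is hypothetical and is not part of seq-to-first-iso. The two formulas are those of VGFPVLSVKEHK in the last table of this page, obtained with -n V,W.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Return element counts for a formula such as 'C48H102O16N16X15'."""
    counts = Counter()
    # A symbol is one capital letter optionally followed by a lowercase
    # letter; a missing number is read as a count of 1.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


# Values copied from the VGFPVLSVKEHK row obtained with "-n V,W".
formula = parse_formula("C63H102O16N16")
formula_x = parse_formula("C48H102O16N16X15")

# Labelled carbon (C) plus unlabelled carbon (X) gives the total carbon.
print(formula_x["C"], formula_x["X"], formula["C"])  # prints: 48 15 63
```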

[4]:

# File used.
!cat peptides.txt

YAQEISR
VGFPVLSVKEHK
LAMVIIKEFVDDLK

Minimal command

[5]:

!seq-to-first-iso peptides.txt

[2019-06-28, 14:49:36] INFO    : Parsing file
[2019-06-28, 14:49:36] INFO    : Computing formula
[2019-06-28, 14:49:36] INFO    : Computing composition of modifications
[2019-06-28, 14:49:36] INFO    : Computing mass
[2019-06-28, 14:49:36] INFO    : Computing M0 and M1

Running the command above creates a file of tab-separated values: peptides_stfi.tsv

[6]:

# Read basic output file.
df = pd.read_csv("peptides_stfi.tsv", sep="\t")
df.head()

[6]:

	sequence	mass	formula	formula_X	M0_NC	M1_NC	M0_12C	M1_12C
0	YAQEISR	865.429381	C37H59O13N11	C37H59O13N11	0.620641	0.280871	0.920656	0.051619
1	VGFPVLSVKEHK	1338.765971	C63H102O16N16	C63H102O16N16	0.455036	0.345060	0.890522	0.074113
2	LAMVIIKEFVDDLK	1632.916066	C76H128O21N16S1	C76H128O21N16S1	0.369940	0.337319	0.831576	0.081017

Changing output name

You can also change the name of the output file with the -o option. The .tsv extension is added automatically.

[7]:

!seq-to-first-iso peptides.txt -o sequence

[2019-06-28, 14:49:38] INFO    : Parsing file
[2019-06-28, 14:49:38] INFO    : Computing formula
[2019-06-28, 14:49:38] INFO    : Computing composition of modifications
[2019-06-28, 14:49:38] INFO    : Computing mass
[2019-06-28, 14:49:38] INFO    : Computing M0 and M1

[8]:

# Read output file with different name.
df = pd.read_csv("sequence.tsv", sep="\t")
df.head()

[8]:

	sequence	mass	formula	formula_X	M0_NC	M1_NC	M0_12C	M1_12C
0	YAQEISR	865.429381	C37H59O13N11	C37H59O13N11	0.620641	0.280871	0.920656	0.051619
1	VGFPVLSVKEHK	1338.765971	C63H102O16N16	C63H102O16N16	0.455036	0.345060	0.890522	0.074113
2	LAMVIIKEFVDDLK	1632.916066	C76H128O21N16S1	C76H128O21N16S1	0.369940	0.337319	0.831576	0.081017
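
The two runs above suggest a simple naming rule for the output file. The sketch below encodes that rule so a script can locate the file after a run. It is an assumption inferred only from these two examples, not from the source code of the tool, and the name expected_output_name is hypothetical.

```python
from pathlib import Path


def expected_output_name(input_file, output=None):
    """Guess the name of the TSV file written by seq-to-first-iso.

    Assumption: rule inferred from the two runs shown on this page.
    """
    if output is not None:
        # "-o sequence" produced "sequence.tsv".
        return f"{output}.tsv"
    # "peptides.txt" produced "peptides_stfi.tsv".
    return f"{Path(input_file).stem}_stfi.tsv"
```

With this rule, expected_output_name("peptides.txt") gives "peptides_stfi.tsv" and expected_output_name("peptides.txt", output="sequence") gives "sequence.tsv".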

Choosing unlabelled amino acids

[9]:

!seq-to-first-iso peptides.txt -n V,W -o sequence

[2019-06-28, 14:49:41] INFO    : Amino acid with default abundance: ['V', 'W']
[2019-06-28, 14:49:41] INFO    : Parsing file
[2019-06-28, 14:49:41] INFO    : Computing formula
[2019-06-28, 14:49:41] INFO    : Computing composition of modifications
[2019-06-28, 14:49:41] INFO    : Computing mass
[2019-06-28, 14:49:41] INFO    : Computing M0 and M1

[10]:

# Read output file with different name and unlabelled amino acids.
df = pd.read_csv("sequence.tsv", sep="\t")
df.head()

[10]:

	sequence	mass	formula	formula_X	M0_NC	M1_NC	M0_12C	M1_12C
0	YAQEISR	865.429381	C37H59O13N11	C37H59O13N11	0.620641	0.280871	0.920656	0.051619
1	VGFPVLSVKEHK	1338.765971	C63H102O16N16	C48H102O16N16X15	0.455036	0.345060	0.758956	0.185155
2	LAMVIIKEFVDDLK	1632.916066	C76H128O21N16S1	C66H128O21N16S1X10	0.369940	0.337319	0.747509	0.152927

The carbon of unlabelled amino acids is shown as X in column “formula_X”.
Sequence “YAQEISR” contains no unlabelled amino acid, so its M0 and M1 are the same as in the previous sequence.tsv, in both conditions.
In contrast, for sequence “VGFPVLSVKEHK”, which contains three valines, M0 in the 12C condition goes down from 0.8905224988642593 to 0.7589558393662944 and M1 goes up from 0.07411308335404865 to 0.18515489894512063. Its values in the NC condition do not change.
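
The X counts in “formula_X” can be checked by hand from the sequences: valine has 5 carbon atoms and tryptophan has 11. The sketch below does this count for the three peptides. It is an independent illustration and not the code used by seq-to-first-iso. The CARBON_COUNT table and the unlabelled_carbons function are hypothetical names, and the table only covers the two amino acids passed to -n.

```python
# Carbon atoms per amino acid, limited to the two amino acids given to "-n".
CARBON_COUNT = {"V": 5, "W": 11}


def unlabelled_carbons(sequence, unlabelled=("V", "W")):
    """Count the carbon atoms contributed by the unlabelled amino acids."""
    return sum(CARBON_COUNT[aa] for aa in sequence if aa in unlabelled)


for peptide in ("YAQEISR", "VGFPVLSVKEHK", "LAMVIIKEFVDDLK"):
    print(peptide, unlabelled_carbons(peptide))
# YAQEISR 0
# VGFPVLSVKEHK 15
# LAMVIIKEFVDDLK 10
```

These counts match X15 and X10 in the table above, and the absence of X for “YAQEISR”.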