Command-line interface of seq-to-first-iso

seq-to-first-iso computes the first two isotopologue intensities (M0 and M1) of peptide sequences under two conditions: natural carbon, and carbon enriched to 99.99 % 12C.

The program can treat selected amino acids as unlabelled, which simulates auxotrophy for those amino acids.

seq-to-first-iso is available as a command line tool.

[1]:
import pandas as pd  # For output visualisation.

Note: the exclamation mark ``!`` tells Jupyter to run the rest of the line as a shell command. In a real Linux terminal, you don’t need it.

[2]:
!seq-to-first-iso -v
seq-to-first-iso 0.4.3
[3]:
!seq-to-first-iso -h
usage: seq-to-first-iso [-h] [-o OUTPUT] [-n amino_a] [-v] input

Read a file of sequences and creates a tsv file

positional arguments:
  input                 file to parse

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        name of output file
  -n amino_a, --non-labelled-aa amino_a
                        amino acids with default abundance
  -v, --version         show program's version number and exit

The output file has the following columns:

sequence: original sequence
mass: sequence mass
formula: chemical formula
formula_X: chemical formula with X
M0_NC: M0 in NC
M1_NC: M1 in NC
M0_12C: M0 in 12C
M1_12C: M1 in 12C

X: Virtual element created to represent unlabelled carbon
NC: Normal Condition (Natural Carbon)
12C: 12C condition (12C enriched carbon)
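
As a rough illustration of what the M0 columns represent: M0 is the probability that every atom of the peptide is the lightest isotope of its element, so it can be approximated as a product of isotopic abundances raised to the atom counts. The sketch below is not taken from seq-to-first-iso. The abundance values are assumed representative natural abundances, the helper names parse_formula and m0 are hypothetical, and the result may differ from the program's output in the later decimals.

```python
import re

# Assumed abundances of the lightest isotope of each element (representative
# natural values; seq-to-first-iso may use slightly different numbers).
NATURAL = {"C": 0.9893, "H": 0.999885, "O": 0.99757, "N": 0.99636, "S": 0.9499}

# 12C condition: carbon enriched to 99.99 % 12C, other elements unchanged.
ENRICHED = {**NATURAL, "C": 0.9999}


def parse_formula(formula):
    """Turn a formula such as 'C37H59O13N11' into {'C': 37, 'H': 59, ...}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def m0(formula, abundances):
    """Probability that all atoms are the lightest isotope of their element."""
    result = 1.0
    for element, count in parse_formula(formula).items():
        result *= abundances[element] ** count
    return result


# Formula of the peptide YAQEISR, as reported in the "formula" column.
m0_nc = m0("C37H59O13N11", NATURAL)
m0_12c = m0("C37H59O13N11", ENRICHED)
print(f"M0_NC ~ {m0_nc:.3f}, M0_12C ~ {m0_12c:.3f}")
```

For YAQEISR, both estimates land close to the M0_NC and M0_12C values that seq-to-first-iso reports in the output tables of this page.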
[4]:
# File used.
!cat peptides.txt
YAQEISR
VGFPVLSVKEHK
LAMVIIKEFVDDLK

Minimal command

[5]:
!seq-to-first-iso peptides.txt
[2019-06-28, 14:49:36] INFO    : Parsing file
[2019-06-28, 14:49:36] INFO    : Computing formula
[2019-06-28, 14:49:36] INFO    : Computing composition of modifications
[2019-06-28, 14:49:36] INFO    : Computing mass
[2019-06-28, 14:49:36] INFO    : Computing M0 and M1

Running the command above creates a file of tab-separated values named peptides_stfi.tsv.

[6]:
# Read basic output file.
df = pd.read_csv("peptides_stfi.tsv", sep="\t")
df.head()
[6]:
sequence mass formula formula_X M0_NC M1_NC M0_12C M1_12C
0 YAQEISR 865.429381 C37H59O13N11 C37H59O13N11 0.620641 0.280871 0.920656 0.051619
1 VGFPVLSVKEHK 1338.765971 C63H102O16N16 C63H102O16N16 0.455036 0.345060 0.890522 0.074113
2 LAMVIIKEFVDDLK 1632.916066 C76H128O21N16S1 C76H128O21N16S1 0.369940 0.337319 0.831576 0.081017

Changing output name

You can also change the name of the output file with the -o option. The .tsv extension is added to the name you give:

[7]:
!seq-to-first-iso peptides.txt -o sequence
[2019-06-28, 14:49:38] INFO    : Parsing file
[2019-06-28, 14:49:38] INFO    : Computing formula
[2019-06-28, 14:49:38] INFO    : Computing composition of modifications
[2019-06-28, 14:49:38] INFO    : Computing mass
[2019-06-28, 14:49:38] INFO    : Computing M0 and M1
[8]:
# Read output file with different name.
df = pd.read_csv("sequence.tsv", sep="\t")
df.head()
[8]:
sequence mass formula formula_X M0_NC M1_NC M0_12C M1_12C
0 YAQEISR 865.429381 C37H59O13N11 C37H59O13N11 0.620641 0.280871 0.920656 0.051619
1 VGFPVLSVKEHK 1338.765971 C63H102O16N16 C63H102O16N16 0.455036 0.345060 0.890522 0.074113
2 LAMVIIKEFVDDLK 1632.916066 C76H128O21N16S1 C76H128O21N16S1 0.369940 0.337319 0.831576 0.081017
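
To know which file to read back, the naming seen in the two runs above can be mimicked in a few lines. This is only a sketch inferred from those two examples (peptides.txt gives peptides_stfi.tsv, and -o sequence gives sequence.tsv), not from the program's source; guess_output_name is a hypothetical helper and other cases may behave differently.

```python
from pathlib import Path


def guess_output_name(input_path, output=None):
    """Guess the output file name, based only on the two runs shown above."""
    if output is None:
        # Default observed above: input name without extension + "_stfi.tsv".
        return f"{Path(input_path).stem}_stfi.tsv"
    # With -o, the given name is observed to receive a ".tsv" extension.
    return f"{output}.tsv"


print(guess_output_name("peptides.txt"))
print(guess_output_name("peptides.txt", output="sequence"))
```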

Choosing unlabelled amino acids

[9]:
!seq-to-first-iso peptides.txt -n V,W -o sequence
[2019-06-28, 14:49:41] INFO    : Amino acid with default abundance: ['V', 'W']
[2019-06-28, 14:49:41] INFO    : Parsing file
[2019-06-28, 14:49:41] INFO    : Computing formula
[2019-06-28, 14:49:41] INFO    : Computing composition of modifications
[2019-06-28, 14:49:41] INFO    : Computing mass
[2019-06-28, 14:49:41] INFO    : Computing M0 and M1
[10]:
# Read output file with different name and unlabelled amino acids.
df = pd.read_csv("sequence.tsv", sep="\t")
df.head()
[10]:
sequence mass formula formula_X M0_NC M1_NC M0_12C M1_12C
0 YAQEISR 865.429381 C37H59O13N11 C37H59O13N11 0.620641 0.280871 0.920656 0.051619
1 VGFPVLSVKEHK 1338.765971 C63H102O16N16 C48H102O16N16X15 0.455036 0.345060 0.758956 0.185155
2 LAMVIIKEFVDDLK 1632.916066 C76H128O21N16S1 C66H128O21N16S1X10 0.369940 0.337319 0.747509 0.152927
The carbon atoms of unlabelled amino acids are shown as X in the “formula_X” column.
For the sequence “YAQEISR”, which contains no unlabelled amino acids, M0 and M1 are the same as in the previous sequence.tsv, in both conditions.
In contrast, for the sequence “VGFPVLSVKEHK” in the 12C condition, M0 drops from 0.8905224988642593 to 0.7589558393662944 and M1 rises from 0.07411308335404865 to 0.18515489894512063.
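
The X counts in “formula_X” can be checked by hand: each valine carries 5 carbon atoms and each tryptophan 11, so the number of X atoms should equal the carbon atoms of the V and W residues in the sequence. The sketch below runs that check on the three rows shown above; count_element and unlabelled_carbons are hypothetical helper names, and the rows are copied from the table rather than read from the file.

```python
import re

# Carbon atoms per residue for the amino acids passed to -n in the example.
CARBONS = {"V": 5, "W": 11}


def count_element(formula, element):
    """Return the number of atoms of `element` in a formula like 'C48H102X15'."""
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol == element:
            return int(number or 1)
    return 0


def unlabelled_carbons(sequence, carbons=CARBONS):
    """Sum the carbon atoms contributed by unlabelled residues."""
    return sum(carbons.get(residue, 0) for residue in sequence)


# (sequence, formula, formula_X) copied from the output table above.
rows = [
    ("YAQEISR", "C37H59O13N11", "C37H59O13N11"),
    ("VGFPVLSVKEHK", "C63H102O16N16", "C48H102O16N16X15"),
    ("LAMVIIKEFVDDLK", "C76H128O21N16S1", "C66H128O21N16S1X10"),
]

for sequence, formula, formula_x in rows:
    x_atoms = count_element(formula_x, "X")
    # X atoms match the carbons of the V and W residues...
    assert x_atoms == unlabelled_carbons(sequence)
    # ...and C + X in formula_X adds up to the total carbon in formula.
    assert count_element(formula_x, "C") + x_atoms == count_element(formula, "C")
    print(sequence, "X =", x_atoms)
```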