Command-line interface of seq-to-first-iso
seq-to-first-iso computes the intensities of the first two isotopologues (M0 and M1) of peptide sequences, both with natural carbon and with carbon enriched to 99.99 % 12C.
The program can treat selected amino acids as unlabelled in order to simulate amino acid auxotrophies.
seq-to-first-iso is available as a command line tool.
[1]:
import pandas as pd # For output visualisation.
Note: the exclamation mark ``!`` tells Jupyter to run a shell command from within a notebook. In a real Linux terminal, you don’t need it.
[2]:
!seq-to-first-iso -v
seq-to-first-iso 0.4.3
[3]:
!seq-to-first-iso -h
usage: seq-to-first-iso [-h] [-o OUTPUT] [-n amino_a] [-v] input
Read a file of sequences and creates a tsv file
positional arguments:
input file to parse
optional arguments:
-h, --help show this help message and exit
-o OUTPUT, --output OUTPUT
name of output file
-n amino_a, --non-labelled-aa amino_a
amino acids with default abundance
-v, --version show program's version number and exit
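The options listed in the help message can also be assembled from Python, for instance when the tool has to be called from a script. The sketch below only builds the argument list. The helper name `build_command` is hypothetical, and the comma-separated form of the amino acid list is taken from the `-n V,W` call used later in this notebook.

```python
import shlex


def build_command(input_file, output=None, unlabelled=None):
    """Build the argument list for a seq-to-first-iso call (hypothetical helper)."""
    command = ["seq-to-first-iso", input_file]
    if unlabelled:
        # Amino acids are joined with commas, as in "-n V,W".
        command += ["-n", ",".join(unlabelled)]
    if output is not None:
        command += ["-o", output]
    return command


command = build_command("peptides.txt", output="sequence", unlabelled=["V", "W"])
print(shlex.join(command))

# To actually run it (requires seq-to-first-iso to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` rather than a single string avoids shell quoting issues with file names.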
The output file will have the following columns, where NC stands for natural carbon and 12C for carbon enriched to 99.99 % 12C:
sequence | mass | formula | formula_X | M0_NC | M1_NC | M0_12C | M1_12C |
---|---|---|---|---|---|---|---|
original sequence | sequence mass | chemical formula | chemical formula with X | M0 in NC | M1 in NC | M0 in 12C | M1 in 12C |
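When the output is consumed by other scripts, it can help to check that a file has the columns listed above before using it. The following is a minimal sketch using only the standard library. The helper `check_header` is hypothetical, and the in-memory sample stands in for a real output file, with its data row copied from the results shown in this notebook.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "sequence", "mass", "formula", "formula_X",
    "M0_NC", "M1_NC", "M0_12C", "M1_12C",
]


def check_header(handle):
    """Return the expected columns that are missing from a tab-separated file."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])
    return [name for name in EXPECTED_COLUMNS if name not in header]


# Stand-in for an opened output file.
sample = io.StringIO(
    "\t".join(EXPECTED_COLUMNS) + "\n"
    "YAQEISR\t865.429381\tC37H59O13N11\tC37H59O13N11\t"
    "0.620641\t0.280871\t0.920656\t0.051619\n"
)
missing = check_header(sample)
print(missing)
```

With a real file, `check_header(open("peptides_stfi.tsv", newline=""))` would be the equivalent call.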
[4]:
# File used.
!cat peptides.txt
YAQEISR
VGFPVLSVKEHK
LAMVIIKEFVDDLK
Minimal command
[5]:
!seq-to-first-iso peptides.txt
[2019-06-28, 14:49:36] INFO : Parsing file
[2019-06-28, 14:49:36] INFO : Computing formula
[2019-06-28, 14:49:36] INFO : Computing composition of modifications
[2019-06-28, 14:49:36] INFO : Computing mass
[2019-06-28, 14:49:36] INFO : Computing M0 and M1
Running the command above creates a file of tab-separated values: peptides_stfi.tsv
[6]:
# Read basic output file.
df = pd.read_csv("peptides_stfi.tsv", sep="\t")
df.head()
[6]:
sequence | mass | formula | formula_X | M0_NC | M1_NC | M0_12C | M1_12C | |
---|---|---|---|---|---|---|---|---|
0 | YAQEISR | 865.429381 | C37H59O13N11 | C37H59O13N11 | 0.620641 | 0.280871 | 0.920656 | 0.051619 |
1 | VGFPVLSVKEHK | 1338.765971 | C63H102O16N16 | C63H102O16N16 | 0.455036 | 0.345060 | 0.890522 | 0.074113 |
2 | LAMVIIKEFVDDLK | 1632.916066 | C76H128O21N16S1 | C76H128O21N16S1 | 0.369940 | 0.337319 | 0.831576 | 0.081017 |
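The M0 values in this table can be approximated by hand. M0 is the probability that every atom in the peptide is its lightest isotope, so it is the product of the lightest-isotope abundances raised to the atom counts. The sketch below is an illustration, not the code used by seq-to-first-iso. The abundance values are assumed from commonly tabulated figures, and `parse_formula` and `m0` are hypothetical helper names.

```python
import re

# Assumed abundances of the lightest isotope of each element; these are
# commonly tabulated values, not read from seq-to-first-iso itself.
NATURAL_ABUNDANCE = {
    "C": 0.9893, "H": 0.999885, "O": 0.99757, "N": 0.99632, "S": 0.9493,
}
ENRICHED_12C = 0.9999  # 99.99 % 12C, as stated in the introduction


def parse_formula(formula):
    """Turn 'C37H59O13N11' into {'C': 37, 'H': 59, 'O': 13, 'N': 11}."""
    pairs = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    return {element: int(count or 1) for element, count in pairs}


def m0(formula, carbon_abundance=NATURAL_ABUNDANCE["C"]):
    """Probability that all atoms of the formula are the lightest isotope."""
    abundance = dict(NATURAL_ABUNDANCE, C=carbon_abundance)
    result = 1.0
    for element, count in parse_formula(formula).items():
        result *= abundance[element] ** count
    return result


formula = "C37H59O13N11"  # YAQEISR, first row of the table
print(round(m0(formula), 4))                # natural carbon
print(round(m0(formula, ENRICHED_12C), 4))  # 99.99 % 12C
```

Under these assumptions the two results should land close to the M0_NC and M0_12C values reported for YAQEISR. Small differences are expected if the program uses slightly different abundance tables.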
Changing output name
You can change the name of the output file with the -o option. The .tsv extension is added to the name you provide.
[7]:
!seq-to-first-iso peptides.txt -o sequence
[2019-06-28, 14:49:38] INFO : Parsing file
[2019-06-28, 14:49:38] INFO : Computing formula
[2019-06-28, 14:49:38] INFO : Computing composition of modifications
[2019-06-28, 14:49:38] INFO : Computing mass
[2019-06-28, 14:49:38] INFO : Computing M0 and M1
[8]:
# Read output file with different name.
df = pd.read_csv("sequence.tsv", sep="\t")
df.head()
[8]:
sequence | mass | formula | formula_X | M0_NC | M1_NC | M0_12C | M1_12C | |
---|---|---|---|---|---|---|---|---|
0 | YAQEISR | 865.429381 | C37H59O13N11 | C37H59O13N11 | 0.620641 | 0.280871 | 0.920656 | 0.051619 |
1 | VGFPVLSVKEHK | 1338.765971 | C63H102O16N16 | C63H102O16N16 | 0.455036 | 0.345060 | 0.890522 | 0.074113 |
2 | LAMVIIKEFVDDLK | 1632.916066 | C76H128O21N16S1 | C76H128O21N16S1 | 0.369940 | 0.337319 | 0.831576 | 0.081017 |
Choosing unlabelled amino acids
[9]:
!seq-to-first-iso peptides.txt -n V,W -o sequence
[2019-06-28, 14:49:41] INFO : Amino acid with default abundance: ['V', 'W']
[2019-06-28, 14:49:41] INFO : Parsing file
[2019-06-28, 14:49:41] INFO : Computing formula
[2019-06-28, 14:49:41] INFO : Computing composition of modifications
[2019-06-28, 14:49:41] INFO : Computing mass
[2019-06-28, 14:49:41] INFO : Computing M0 and M1
[10]:
# Read output file with different name and unlabelled amino acids.
df = pd.read_csv("sequence.tsv", sep="\t")
df.head()
[10]:
sequence | mass | formula | formula_X | M0_NC | M1_NC | M0_12C | M1_12C | |
---|---|---|---|---|---|---|---|---|
0 | YAQEISR | 865.429381 | C37H59O13N11 | C37H59O13N11 | 0.620641 | 0.280871 | 0.920656 | 0.051619 |
1 | VGFPVLSVKEHK | 1338.765971 | C63H102O16N16 | C48H102O16N16X15 | 0.455036 | 0.345060 | 0.758956 | 0.185155 |
2 | LAMVIIKEFVDDLK | 1632.916066 | C76H128O21N16S1 | C66H128O21N16S1X10 | 0.369940 | 0.337319 | 0.747509 | 0.152927 |
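In the formula_X column above, the carbon count drops and an X term appears for the peptides that contain valine (for example C63 becomes C48 plus X15). This is consistent with X standing for the carbon atoms of the unlabelled amino acids, which is the assumption made in the sketch below. The helper `unlabelled_carbons` is hypothetical. The carbon counts come from the molecular formulas of valine (C5H11NO2) and tryptophan (C11H12N2O2).

```python
# Carbon atoms per residue for the two amino acids passed to -n.
CARBONS = {"V": 5, "W": 11}


def unlabelled_carbons(sequence, unlabelled=("V", "W")):
    """Count the carbon atoms contributed by unlabelled residues."""
    return sum(CARBONS[aa] for aa in sequence if aa in unlabelled)


for peptide in ["YAQEISR", "VGFPVLSVKEHK", "LAMVIIKEFVDDLK"]:
    print(peptide, unlabelled_carbons(peptide))
# YAQEISR 0
# VGFPVLSVKEHK 15
# LAMVIIKEFVDDLK 10
```

The counts match the X0 (absent), X15 and X10 terms in the table. YAQEISR contains neither V nor W, so its values are unchanged from the previous runs.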