Command line interface of seq-to-first-iso
seq-to-first-iso computes the first two isotopologue intensities (M0 and M1) from peptide sequences, both with natural carbon and with 99.99 % 12C-enriched carbon.
The program can take unlabelled amino acids into account to simulate amino acid auxotrophies.
seq-to-first-iso is available as a Python module.
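To make the meaning of M0 and M1 concrete, here is a minimal sketch of how the first two isotopologue intensities can be derived from a chemical formula. It does not use the seq-to-first-iso package: the function names and the abundance table are illustrative assumptions, and the abundance values are commonly tabulated ones that may differ slightly from the constants used by the program. M0 is the probability that every atom is its lightest isotope, and M1 is the probability that exactly one atom carries one extra neutron.

```python
import re

# Assumed abundances: (lightest isotope, isotope one mass unit heavier).
# Commonly tabulated natural values, NOT necessarily those used by seq-to-first-iso.
NATURAL = {
    "C": (0.9893, 0.0107),
    "H": (0.999885, 0.000115),
    "O": (0.99757, 0.00038),
    "N": (0.99636, 0.00364),
    "S": (0.9499, 0.0075),
    "P": (1.0, 0.0),
}
# Same table, with carbon replaced by 99.99 % 12C.
ENRICHED_12C = dict(NATURAL, C=(0.9999, 0.0001))


def parse_formula(formula):
    """Turn 'C37H61O13N11' into {'C': 37, 'H': 61, ...}.

    Assumes every element symbol is followed by an explicit count.
    """
    return {el: int(n) for el, n in re.findall(r"([A-Z][a-z]?)(\d+)", formula)}


def first_two_isotopologues(formula, abundances):
    """Return (M0, M1) for a formula under the given abundance table."""
    m0 = 1.0
    m1_factor = 0.0
    for element, count in parse_formula(formula).items():
        a0, a1 = abundances[element]
        m0 *= a0 ** count               # all atoms in their lightest form
        m1_factor += count * a1 / a0    # swap exactly one atom for its +1 isotope
    return m0, m0 * m1_factor


# Formula of peptide YAQEISR at charge 2, taken from the output shown on this page.
print(first_two_isotopologues("C37H61O13N11", NATURAL))
print(first_two_isotopologues("C37H61O13N11", ENRICHED_12C))
```

With these assumed abundances, the printed values should be close to, but not necessarily identical to, the stfi_M0_NC, stfi_M1_NC, stfi_M0_12C and stfi_M1_12C values reported by the program for the same peptide.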
[1]:
import pandas as pd # For output visualisation.
Note: the exclamation mark ``!`` tells Jupyter to run the rest of the line as a shell command. In a real Linux terminal, you don’t need it.
[2]:
!seq-to-first-iso -v
seq-to-first-iso 1.0.0
[3]:
!seq-to-first-iso -h
usage: seq-to-first-iso [-h] [-o OUTPUT] [-u amino_a] [-v]
                        input_file_name sequence_col_name charge_col_name

Read a tsv file with sequences and charges and compute intensity of first
isotopologues

positional arguments:
  input_file_name       file to parse in .tsv format
  sequence_col_name     column name with sequences
  charge_col_name       column name with charges

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        name of output file
  -u amino_a, --unlabelled-aa amino_a
                        amino acids with default abundance
  -v, --version         show program's version number and exit
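Outside a notebook, the same command line can be assembled and launched from a plain Python script. The sketch below is one possible way to do it with the standard library. The helper `build_command` is a hypothetical name introduced for illustration; it only uses the positional arguments and the `-o` and `-u` options listed in the help message above, with the comma-separated form of `-u` used elsewhere on this page.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(input_file, sequence_col, charge_col,
                  output=None, unlabelled_aa=None):
    """Assemble the seq-to-first-iso argument list (hypothetical helper)."""
    cmd = ["seq-to-first-iso", str(input_file), sequence_col, charge_col]
    if output is not None:
        cmd += ["-o", output]                    # name of the output file
    if unlabelled_aa:
        cmd += ["-u", ",".join(unlabelled_aa)]   # e.g. "V,W"
    return cmd


cmd = build_command("peptides.tsv", "pep_sequence", "pep_charge",
                    output="seq_stfi", unlabelled_aa=["V", "W"])
print(" ".join(cmd))

# Run only if the program is installed and the input file is present.
if shutil.which(cmd[0]) and Path("peptides.tsv").exists():
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` raises an exception if the program exits with a non-zero status.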
[4]:
# File used.
!cat peptides.tsv
pep_name pep_sequence pep_charge
seq1 YAQEISR 2
seq2 VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK 3
seq3 QRTTFFVLGINTVNYPDIYEHILER 2
seq4 AELFL(Glutathione)LNR 1
seq5 .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)DTCLSIVRPNDSKPLDNR 4
seq6 YKTMNTFDPD(Heme)EKFEWFQVWQAVK 2
seq7 HKSASSPAV(Pro->Val)NADTDIQDSSTPSTSPSGRR 2
seq8 FHNK 1
seq9 .(Glutathione)MDLEIK 3
seq10 LANEKPEDVFER 2
seq11 .(Acetyl)SDTPLR(Oxidation)D(Acetyl)EDG(Acetyl)LDFWETLRSLATTNPNPPVEK 3
Minimal command
[5]:
!seq-to-first-iso peptides.tsv pep_sequence pep_charge
Namespace(charge_col_name='pep_charge', input_file_name=PosixPath('peptides.tsv'), output=None, sequence_col_name='pep_sequence', unlabelled_aa=[])
[2019-12-05, 17:22:32] INFO : Parsing file
[2019-12-05, 17:22:32] INFO : Read peptides.tsv
[2019-12-05, 17:22:32] INFO : Found 11 lines and 3 columns
[2019-12-05, 17:22:32] INFO : Reading sequences.
[2019-12-05, 17:22:32] INFO : Computing composition and formula.
[2019-12-05, 17:22:32] WARNING : Fe in (Heme) is not supported in the computation of M0 and M1
[2019-12-05, 17:22:32] INFO : Computing neutral mass
[2019-12-05, 17:22:32] INFO : Computing M0 and M1
Running the command above writes a tab-separated values file (peptides_stfi.tsv).
[6]:
# Read basic output file.
df = pd.read_csv("peptides_stfi.tsv", sep="\t")
df.head()
[6]:
 | pep_name | pep_sequence | pep_charge | stfi_neutral_mass | stfi_formula | stfi_formula_X | stfi_M0_NC | stfi_M1_NC | stfi_M0_12C | stfi_M1_12C |
---|---|---|---|---|---|---|---|---|---|---|
0 | seq1 | YAQEISR | 2 | 865.429381 | C37H61O13N11 | C37H61O13N11 | 0.620499 | 0.280949 | 0.920444 | 0.051819 |
1 | seq2 | VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK | 3 | 3838.102264 | C172H288O49N48P1 | C172H288O49N48P1 | 0.113085 | 0.236277 | 0.707156 | 0.174161 |
2 | seq3 | QRTTFFVLGINTVNYPDIYEHILER | 2 | 3037.566156 | C140H214O40N36 | C140H214O40N36 | 0.171920 | 0.290033 | 0.764407 | 0.142807 |
3 | seq4 | AELFL(Glutathione)LNR | 1 | 1279.623072 | C55H90O18N15S1 | C55H90O18N15S1 | 0.470882 | 0.318073 | 0.846220 | 0.072875 |
4 | seq5 | .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D... | 4 | 5049.638616 | C226H365O68N61S1 | C226H365O68N61S1 | 0.054173 | 0.148735 | 0.602333 | 0.195036 |
Changing output name
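You can also change the name of the output file with the -o option. The .tsv extension is added to the name you provide: the command below writes seq_stfi.tsv.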
[7]:
!seq-to-first-iso peptides.tsv pep_sequence pep_charge -o seq_stfi
Namespace(charge_col_name='pep_charge', input_file_name=PosixPath('peptides.tsv'), output='seq_stfi', sequence_col_name='pep_sequence', unlabelled_aa=[])
[2019-12-05, 17:22:34] INFO : Parsing file
[2019-12-05, 17:22:34] INFO : Read peptides.tsv
[2019-12-05, 17:22:34] INFO : Found 11 lines and 3 columns
[2019-12-05, 17:22:34] INFO : Reading sequences.
[2019-12-05, 17:22:34] INFO : Computing composition and formula.
[2019-12-05, 17:22:34] WARNING : Fe in (Heme) is not supported in the computation of M0 and M1
[2019-12-05, 17:22:34] INFO : Computing neutral mass
[2019-12-05, 17:22:34] INFO : Computing M0 and M1
[8]:
# Read output file with different name.
df = pd.read_csv("seq_stfi.tsv", sep="\t")
df.head()
[8]:
 | pep_name | pep_sequence | pep_charge | stfi_neutral_mass | stfi_formula | stfi_formula_X | stfi_M0_NC | stfi_M1_NC | stfi_M0_12C | stfi_M1_12C |
---|---|---|---|---|---|---|---|---|---|---|
0 | seq1 | YAQEISR | 2 | 865.429381 | C37H61O13N11 | C37H61O13N11 | 0.620499 | 0.280949 | 0.920444 | 0.051819 |
1 | seq2 | VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK | 3 | 3838.102264 | C172H288O49N48P1 | C172H288O49N48P1 | 0.113085 | 0.236277 | 0.707156 | 0.174161 |
2 | seq3 | QRTTFFVLGINTVNYPDIYEHILER | 2 | 3037.566156 | C140H214O40N36 | C140H214O40N36 | 0.171920 | 0.290033 | 0.764407 | 0.142807 |
3 | seq4 | AELFL(Glutathione)LNR | 1 | 1279.623072 | C55H90O18N15S1 | C55H90O18N15S1 | 0.470882 | 0.318073 | 0.846220 | 0.072875 |
4 | seq5 | .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D... | 4 | 5049.638616 | C226H365O68N61S1 | C226H365O68N61S1 | 0.054173 | 0.148735 | 0.602333 | 0.195036 |
Specifying unlabelled amino acids
[9]:
!seq-to-first-iso peptides.tsv pep_sequence pep_charge -u V,W
Namespace(charge_col_name='pep_charge', input_file_name=PosixPath('peptides.tsv'), output=None, sequence_col_name='pep_sequence', unlabelled_aa=['V', 'W'])
[2019-12-05, 17:22:36] INFO : Amino acid with default abundance: ['V', 'W']
[2019-12-05, 17:22:36] INFO : Parsing file
[2019-12-05, 17:22:36] INFO : Read peptides.tsv
[2019-12-05, 17:22:36] INFO : Found 11 lines and 3 columns
[2019-12-05, 17:22:36] INFO : Reading sequences.
[2019-12-05, 17:22:36] INFO : Computing composition and formula.
[2019-12-05, 17:22:36] WARNING : Fe in (Heme) is not supported in the computation of M0 and M1
[2019-12-05, 17:22:36] INFO : Computing neutral mass
[2019-12-05, 17:22:36] INFO : Computing M0 and M1
[10]:
# Read output file computed with unlabelled amino acids (default output name).
df = pd.read_csv("peptides_stfi.tsv", sep="\t")
df.head()
[10]:
 | pep_name | pep_sequence | pep_charge | stfi_neutral_mass | stfi_formula | stfi_formula_X | stfi_M0_NC | stfi_M1_NC | stfi_M0_12C | stfi_M1_12C |
---|---|---|---|---|---|---|---|---|---|---|
0 | seq1 | YAQEISR | 2 | 865.429381 | C37H61O13N11 | C37H61O13N11 | 0.620499 | 0.280949 | 0.920444 | 0.051819 |
1 | seq2 | VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK | 3 | 3838.102264 | C172H288O49N48P1 | C141H288O49N48P1X31 | 0.113085 | 0.236277 | 0.508195 | 0.293976 |
2 | seq3 | QRTTFFVLGINTVNYPDIYEHILER | 2 | 3037.566156 | C140H214O40N36 | C130H214O40N36X10 | 0.171920 | 0.290033 | 0.687130 | 0.202001 |
3 | seq4 | AELFL(Glutathione)LNR | 1 | 1279.623072 | C55H90O18N15S1 | C55H90O18N15S1 | 0.470882 | 0.318073 | 0.846220 | 0.072875 |
4 | seq5 | .(Acetyl)VGEVFINYIQRQNELFQGKLAYLII(Oxidation)D... | 4 | 5049.638616 | C226H365O68N61S1 | C211H365O68N61S1X15 | 0.054173 | 0.148735 | 0.513344 | 0.248734 |
The carbon of unlabelled amino acids is shown as X in column stfi_formula_X.
For peptide YAQEISR, which contains no unlabelled amino acid, stfi_formula and stfi_formula_X are identical, and the M0 and M1 intensities are not affected by the V and W auxotrophy.
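The X count in stfi_formula_X can be cross-checked by hand. The sketch below counts the carbon atoms contributed by unlabelled residues in a sequence and compares the result with the X count of the corresponding formula. It is an illustration under stated assumptions, not the program's own code: the function names are hypothetical, modifications written in parentheses are simply stripped (so any carbon they carry is ignored), and only valine (5 carbons) and tryptophan (11 carbons) are covered.

```python
import re

# Carbon atoms per residue for the two amino acids used on this page.
CARBONS_PER_RESIDUE = {"V": 5, "W": 11}


def strip_modifications(sequence):
    """Remove '(Modification)' annotations and the leading '.' marker."""
    return re.sub(r"\([^)]*\)", "", sequence).replace(".", "")


def count_unlabelled_carbons(sequence, unlabelled_aa):
    """Count carbons belonging to unlabelled residues (hypothetical helper)."""
    bare = strip_modifications(sequence)
    return sum(CARBONS_PER_RESIDUE[aa] * bare.count(aa) for aa in unlabelled_aa)


def x_count(formula_x):
    """Read the number of X atoms from a formula such as 'C130H214O40N36X10'."""
    match = re.search(r"X(\d+)", formula_x)
    return int(match.group(1)) if match else 0


# Sequences and formulas taken from the table above.
rows = [
    ("YAQEISR", "C37H61O13N11"),
    ("VLLIDLRIPQR(Phospho)SAINHIVAPNLVNVDPNLLWDK", "C141H288O49N48P1X31"),
    ("QRTTFFVLGINTVNYPDIYEHILER", "C130H214O40N36X10"),
]
for sequence, formula_x in rows:
    expected = count_unlabelled_carbons(sequence, ["V", "W"])
    print(sequence, expected, x_count(formula_x))
```

Stripping the annotations before counting matters for modifications such as (Pro->Val), whose text contains a capital V that is not a residue.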