KNIME configuration

You can install and use seq-to-first-iso with the KNIME Analytics Platform to process SLIM-labeling data.

Requirements:

  • KNIME

  • conda

  • a little bit of Python knowledge

(Note: to use pandas > 0.23, you need KNIME 3.7.2 or later.)

Set up Python with KNIME

You need to install and configure a Python extension.
This guide is adapted from the 3.7 Python installation guide from KNIME.

Set up the conda environment

On Windows, the default command-line interface (CMD) needs some configuration before it recognizes conda. Otherwise, use the Anaconda Prompt bundled with conda, or any other interface that already recognizes conda.

If conda was not added to the PATH environment variable during the installation, you have to configure your shell to use the conda commands:

  • You can add the folder containing conda to PATH. In CMD, the command should look like:

 set PATH=%PATH%;<PATH_WHERE_YOU_INSTALLED_CONDA>\Scripts

By default, <PATH_WHERE_YOU_INSTALLED_CONDA> is C:\Users\<Username>\<CONDA_INSTALLATION>, where <Username> is the Windows username and <CONDA_INSTALLATION> is the folder created by the conda installer (e.g. “Miniconda3”, “Anaconda3”).
This command only affects the current command prompt; you will have to type it again to use conda in a new command prompt.

  • If you want CMD to always recognize conda, you have to add conda to the PATH environment variable permanently. One way to do it is from the command prompt (the change applies to command prompts opened afterwards):

 setx PATH "%PATH%;<PATH_WHERE_YOU_INSTALLED_CONDA>\Scripts"
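To confirm that conda is now reachable, a quick check from Python is possible. This is a minimal sketch using only the standard library; find_conda is a hypothetical helper name, and the check only looks at the PATH of the process that runs it.

```python
import shutil


def find_conda():
    """Return the full path of the conda executable, or None if it is not on PATH."""
    # shutil.which searches the folders listed in PATH, like the shell does.
    return shutil.which("conda")


conda_path = find_conda()
if conda_path is None:
    print("conda was not found on PATH; check the folder you added.")
else:
    print(f"conda found at: {conda_path}")
```

If the message says conda was not found, open a new command prompt (for the permanent variant) or repeat the set command (for the temporary one) before trying again.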

Create a conda environment

In the GitHub repository, go to knime/environment-knime.yml and download the raw version of the file (Right click → Save as …).

_images/knime_conda_yaml.png

Then, in the directory where environment-knime.yml was downloaded, open a shell or Anaconda Prompt and run:

conda env create -f environment-knime.yml

Note: use the cd command to move between folders in a shell.
Now, if you run

conda env list

you should see seq-to-first-iso-knime in the list of environments.
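The same check can be scripted. The sketch below assumes conda is on PATH and relies on conda env list --json, which prints the environment folders under an "envs" key; env_names, has_env and conda_env_list_json are hypothetical helper names, and the environment name is taken to be the last component of each folder path.

```python
import json
import re
import shutil
import subprocess


def env_names(conda_json):
    """Extract environment names from the JSON printed by `conda env list --json`."""
    data = json.loads(conda_json)
    names = []
    for path in data.get("envs", []):
        # Keep the last path component; handles both / and \ separators.
        names.append(re.split(r"[\\/]", path.rstrip("\\/"))[-1])
    return names


def has_env(name, conda_json):
    """Return True if an environment folder called `name` is listed."""
    return name in env_names(conda_json)


def conda_env_list_json():
    """Run `conda env list --json` and return its raw output."""
    conda = shutil.which("conda")
    if conda is None:
        raise FileNotFoundError("conda was not found on PATH")
    result = subprocess.run(
        [conda, "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling has_env("seq-to-first-iso-knime", conda_env_list_json()) should return True once the environment has been created.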

Create a start script (KNIME <4.0)

For versions of KNIME lower than 4.0, a launch script is needed.
Create a small script that starts the conda environment, using the templates defined by KNIME.
In our case, <ENVIRONMENT_NAME> is seq-to-first-iso-knime, while <PATH_WHERE_YOU_INSTALLED_ANACONDA> depends on the user’s conda configuration and operating system.

To test if your script works, try to execute it:

  • For Windows, double-click on it

  • For Linux/Mac, make your file executable

    chmod gou+x <script_name>
    

    then execute it with

    ./<script_name>
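The start script can also be tested from Python. The sketch below assumes that the script forwards its arguments to the python executable of the activated environment (check this against the template you used); interpreter_behind is a hypothetical helper name.

```python
import subprocess


def interpreter_behind(launcher):
    """Run `launcher -c <code>` and return the interpreter path it reports."""
    code = "import sys; print(sys.executable)"
    result = subprocess.run(
        [launcher, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Called with the absolute path to your start script, the function should return a path located inside the seq-to-first-iso-knime environment. A CalledProcessError means the script exited with an error.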
    

Configure the Python extension

In the KNIME interface, go to File → Install KNIME Extensions then search for Python Integration to find the KNIME Python Integration.

_images/knime_python_integration.png

KNIME Python extension in the Install window

Now you just need to configure the Python executable KNIME will use.
Go to File → Preferences → KNIME → Python, then, in the Python 3 subsection, paste the absolute path to your start script.
For example, a Windows path would be C:\Documents\<script_name> and a Unix path /home/<user>/<script_name>.

For KNIME 4.X, you only need to write the path to the conda installation directory (your <PATH_WHERE_YOU_INSTALLED_CONDA>) and select seq-to-first-iso-knime in the dropdown menu for Python 3.

Next, select Python 3 as the default, then click Apply and Close.

If everything went well, you should now be able to use Python script nodes with KNIME.
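To check which interpreter and packages a script actually sees, a short report can be printed. This is a minimal sketch using only the standard library; environment_report is a hypothetical helper name, and the list of modules to look for is an assumption you can adjust. It can be run with the environment's interpreter outside KNIME, or pasted at the top of a Python script node.

```python
import importlib.util
import sys


def environment_report(modules=("pandas", "seq_to_first_iso")):
    """Describe the running interpreter and whether each module can be imported."""
    report = {
        "executable": sys.executable,
        "python": sys.version.split()[0],
    }
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

The executable should point inside the seq-to-first-iso-knime environment, and both modules should be reported as True.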

Use seq-to-first-iso with KNIME

Warning: this tutorial assumes you use seq-to-first-iso 0.5.0; functions in other versions might differ. If you want to use another version, make sure the changes between versions will not break your workflow.

The following steps are mostly examples; you can adapt them to your needs.

If the file is not parsed

You can use the parsing provided by seq-to-first-iso if you want to read from a formatted file (one sequence per line).

Create a Python scripting node: in the Node Repository, go to Scripting → Python → Python Source. Then configure the node with the following script.

import seq_to_first_iso as stfi

# Parse the file, change <PATH_TO_YOUR_FILE> to your file path.
parsed_output = stfi.sequence_parser("<PATH_TO_YOUR_FILE>")
# Get rid of ignored_lines for seq_to_df.
parsed_output.pop("ignored_lines")

# List of unlabelled amino acids, change if you need.
unlabelled_aa = ["A", "C"]

# The name output_table will inform KNIME that
# the variable is an output table.
output_table = stfi.seq_to_df(unlabelled_aa=unlabelled_aa,
                              **parsed_output)

_images/knime_minimal_python_source.png

Minimal implementation of a Python node without prior input in KNIME
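The reason for removing ignored_lines before unpacking can be illustrated without the library. In the sketch below, toy_seq_to_df is a hypothetical stand-in that accepts only a few keywords, and the dictionary keys are assumptions for illustration; the real parser output and the real seq_to_df() signature may differ.

```python
def toy_seq_to_df(sequences, unlabelled_aa, annotations=None):
    """Hypothetical stand-in: accepts only the keywords listed here."""
    annotations = annotations or [""] * len(sequences)
    return [(a, s, unlabelled_aa) for a, s in zip(annotations, sequences)]


# Hypothetical parser output; the real keys may differ.
parsed_output = {
    "sequences": ["ACDEFGHIK", "PEPTIDE"],
    "annotations": ["pep1", "pep2"],
    "ignored_lines": ["# a comment line"],
}

# Unpacking a dictionary passes every key as a keyword argument,
# so a key the function does not accept raises a TypeError.
try:
    toy_seq_to_df(unlabelled_aa=["A", "C"], **parsed_output)
except TypeError as error:
    print(f"Unpacking with the extra key fails: {error}")

# After removing the extra key, the call goes through.
ignored = parsed_output.pop("ignored_lines")
rows = toy_seq_to_df(unlabelled_aa=["A", "C"], **parsed_output)
print(rows)
```

The value returned by pop() can be kept, as done here with ignored, if you want to inspect which lines of the file were skipped.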

If peptides are already in a table

If you already have a table with peptide sequences, you can feed it to the function seq_to_df() to get the output dataframe.

Create a Python scripting node: in the Node Repository, go to Scripting → Python → Python Script (1⇒1). The node will receive a table as an input.

from seq_to_first_iso import seq_to_df

# Copy input table.
output_table = input_table.copy()

# Set the unlabelled amino acids.
unlabelled_aa = ["A", "C"]

# If the input table has at least 2 columns,
# annotations are the first column and
# sequences are the second one.
# Otherwise, the only column is assumed to contain sequences.
if output_table.shape[1] >= 2:
    sequences = output_table.iloc[:, 1]
    # Cast pd.Series to list.
    annotations = list(output_table.iloc[:, 0])
else:
    sequences = output_table.iloc[:, 0]
    annotations = []

output_table = seq_to_df(sequences, unlabelled_aa, annotations=annotations)

In the example above, you may also choose which columns to pass as arguments to seq_to_df().
For example, if the fourth column of your table holds lists of post-translational modifications, you can capture the column in a variable with ptms = list(output_table.iloc[:,3]) and then pass it with output_table = seq_to_df(sequences, unlabelled_aa, modifications=ptms).

_images/knime_minimal_table_input.png

Minimal implementation of a Python node with table input
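The positional column convention used in the node can be tried outside KNIME on small tables. This is a minimal sketch that assumes pandas is installed; split_columns is a hypothetical helper name and the column names are made up for the example.

```python
import pandas as pd


def split_columns(table):
    """Return (sequences, annotations) following the positional convention above."""
    if table.shape[1] >= 2:
        # First column: annotations, second column: sequences.
        return list(table.iloc[:, 1]), list(table.iloc[:, 0])
    # Single column: sequences only, no annotations.
    return list(table.iloc[:, 0]), []


two_columns = pd.DataFrame({"id": ["pep1", "pep2"],
                            "seq": ["ACDEFGHIK", "PEPTIDE"]})
one_column = pd.DataFrame({"seq": ["ACDEFGHIK", "PEPTIDE"]})

print(split_columns(two_columns))
print(split_columns(one_column))
```

Because the selection is positional, column names do not matter, but the column order of the input table does.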