Metaboseek version: 0.9.9.0

1 Metaboseek Documentation

Metaboseek offers a graphical user interface to set up data analysis with the xcms package to detect and align molecular features from LC/MS data across multiple samples. You can then load xcms results into the app as a “Feature Table” (using xcms and MSnbase packages, mzR-based) and run statistical analyses to identify molecular features of interest.

Filter the xcms results, view and export chromatograms and mass spectra for molecular features of interest.
Generate and view molecular networks based on tandem-MS spectrum similarity between molecular features (using MassTools and igraph packages).
Annotate fragments in tandem-MS spectra with SIRIUS.

This document describes all UI elements in Metaboseek and is meant to be a comprehensive user manual.

2 Install Metaboseek

2.1 System Requirements

Recommended minimal system requirements:

Quad-core processor (or dual core with 4 threads)
8 GB of RAM (16 GB or more are preferred)
Recent versions of Firefox or Chrome web browsers (Metaboseek should work on any browser, but testing is done for these two)

We recommend computers with a monitor with at least full HD (1920 x 1080 pixels) resolution. You can use the zoom function of your web browser to scale the interface to your liking.

All files are loaded into memory, so that browsing will be very quick: It is easy to look at extracted ion chromatograms (EICs) for many MS features of interest across dozens of files within a fraction of a second. However, the initial loading of the data will take some time, and you may experience issues if you load many files at a time. We strongly recommend using centroided data files, as they will have a smaller memory footprint. Loading 50 data files from 20-minute high resolution LC/MS data acquisition should not be a problem on a computer with 16 GB of RAM.

2.1.1 Java

If installed from an R session, Metaboseek requires Java for full functionality, in particular for molecular structure plotting in the SIRIUS module. Java is also a requirement for installing SIRIUS itself. Make sure to install 64-bit Java if you are running 64-bit R (which is most likely), or 32-bit Java if you are running 32-bit R. If you go to java.com and follow the download buttons there, you will by default be offered the version that corresponds to your browser (32- or 64-bit), which may or may not be the version you need. Get the appropriate Java version from this page: https://www.java.com/en/download/manual.jsp.
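To find out which Java build matches your R installation, you can query the running R session. This is a minimal base-R sketch; the variable name r_bits is only illustrative.

```r
# Pointer size in bytes times 8 gives the bitness of the running R session
r_bits <- .Machine$sizeof.pointer * 8
cat("This R session is", r_bits, "bit; install the matching Java build.\n")
```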

2.2 Install on Windows

2.2.1 Using the Installer

Download the installer of the most recent release version here
Follow the installation steps.
Metaboseek 0.9 should now be installed and can be launched like any other Windows program. When Metaboseek launches, a command line window will appear, and the user interface will open up in your default web browser. To close the program, close the Metaboseek command line window.

The installer version of Metaboseek has one limitation: it does not plot molecular structures for predicted structures in the SIRIUS module. This is a compromise made so that this installation of Metaboseek does not require Java to be installed on your system.

2.2.2 Using a .zip File

Download the .zip file of the most recent release version here
Unzip the file on your computer (this may take a while!)
Run Metaboseek by executing runMetaboseek.exe in the unzipped folder.

The .zip version of Metaboseek has one limitation: it does not plot molecular structures for predicted structures in the SIRIUS module. This is a compromise made so that this installation of Metaboseek does not require Java to be installed on your system.

2.3 Install on Mac / Linux

Consider getting the Metaboseek Docker image, or follow these steps to install Metaboseek:

Download and install R.
Mac users: Get Xcode by entering this line into your Terminal window:

xcode-select --install

Follow the instructions. This will install parts of Xcode that are required to install Metaboseek.

Run R, enter this line:

source("http://metaboseek.com/files/install_Metaboseek.R")

If there is an error during installation, try to additionally install Xcode from the AppStore.

To run Metaboseek, enter this line into your R console:

Metaboseek::runMseek()

If Metaboseek starts up but an unresponsive grey screen appears, there is most likely an issue related to the rcdk package and Java. Restart R and enter this line:

remove.packages('rcdk')

Then try to run Metaboseek again.

2.4 Get the Docker Image

As the Docker website puts it, “Docker provides a way to run applications securely isolated in a container, packaged with all its dependencies and libraries.” This is also a convenient way to reproduce analysis results that were generated with a particular version of Metaboseek. Once you have set up Docker on your computer, this is the easiest and most reproducible way to get fully functional Metaboseek, including SIRIUS integration.

Install and set up Docker. Please note that there are limitations for Windows users (Windows 10 Pro is required and using Docker prevents running Virtual Machines with VM VirtualBox).
You can now get the Metaboseek Docker image using this terminal command:

docker pull mjhelf/metaboseek

The metaboseek Docker image is based on the bioconductor/release_metabolomics2 image.
Running this command will execute the latest version of the Metaboseek container (and download it if not already available on your computer):

docker run -d -v HOSTFOLDER:/home/shiny/data -p 3840:80 -e PASSWORD=YOURPASSWORD mjhelf/metaboseek

Let's take a look at some key settings here:

HOSTFOLDER should be the path of a folder on your computer that contains all data that you want to analyze with Metaboseek, for example if used like this:

docker run -d -v /home/user123:/home/shiny/data -p 3840:80 -e PASSWORD=YOURPASSWORD mjhelf/metaboseek

NOTE: The apps hosted inside the container will be accessible from the internet (for anyone connecting to your computer’s IP address and the correct port number). By default, they will be protected by HTTP basic authentication, but that is not 100% secure. Once authenticated, the apps allow seeing the data structure of the specified HOSTFOLDER, and it is possible to download arbitrary .csv files and MS data from that folder. We are not liable for any data exposure to unauthorized parties or other damages.

All contents of the /home/user123 folder will be accessible in Metaboseek.
-p 3840:80 means that port 80 from the container will be accessible as port 3840 on the host computer.

-e PASSWORD=YOURPASSWORD: This password has to be set. It can be used if you want to access RStudio inside the container, and is necessary to access the Metaboseek apps.
You can disable authentication for the apps by adding -e PROTECTED=false to this command, for instance to provide convenient public access to your data. WARNING: This makes the apps accessible from the internet without a password (see note above).

Check if the Metaboseek container is running:

docker ps

Open your web browser and go to localhost:3840, where the port number after the colon may differ based on your -p setting (see above). By default, you will have to log in with the username metaboseek and the password you specified (YOURPASSWORD in our example). This will open a website hosted inside the metaboseek container. Select the app you want to run and analyze your data!

2.5 Experienced R users (Windows, Mac or Linux):

If you have installed R (and the devtools package) already, you can install Metaboseek like this:

devtools::install_github("mjhelf/MassTools")
devtools::install_github("mjhelf/Metaboseek")

If you want to make sure you get all the required packages, run the install script with this line:

source("http://metaboseek.com/files/install_Metaboseek.R")

2.6 Use the web version

If you have trouble installing Metaboseek and want to just try it out with an example dataset, use the web version.

3 Data Analysis with Metaboseek

With Metaboseek, you can quickly visualize data from batches of high-resolution LC/MS data files and find differences between groups of samples. It is not necessary to do any analysis before looking at your data, but a typical workflow starts with a data analysis step:

Detection of MS features (defined by m/z and retention time) with the xcms analysis module
- alternatively, you can use an MS feature list that you generated elsewhere.
Molecular networking of MS2 spectra
Structure prediction using SIRIUS

Then, you can use Metaboseek to browse the data, find molecular features of interest, predict the molecular formula and make structure predictions based on MS2 data.

3.1 Overview

Metaboseek is structured into two major sections, the Data Explorer section for visualization and statistical analysis tasks, and the XCMS analysis to identify LC/MS features in MS data files. You can switch between these sections with the navigation menu on the left of the screen.

3.2 Navigation Bar Items

The buttons in the navigation bar either help with the user interface or allow quick access to important functionality.

The navigation bar, always shown at the top of the Metaboseek interface

3.2.1 Interface Buttons

The leftmost Menu button can be used to hide the navigation bar on the left side, giving you more screen space. Likewise, you can use the Fullscreen button to maximize the size of the browser window.

3.2.2 Functional Buttons

3.2.2.1 Load MS Data, Feature Tables or Sessions

The Load button allows you to load MS data, Feature Tables and entire Metaboseek projects, as detailed here.

3.2.2.2 Save Session

Use the Save button to save the current Metaboseek session. This will save all Feature Tables, Molecular Networks and MS data files that you have loaded into the current session. You can choose to include the MS data in the session file (e.g. for simple sharing of an analysis with colleagues). However, this will increase file size significantly and may slow down the saving process. If MS data is not included in the saved session, Metaboseek will expect the MS data files to be in the same location when you load the session.

3.2.2.3 Global Options

Settings available:

Enabled cores: Number of CPU threads to use for parallel processing jobs. Changes to this setting require a restart of Metaboseek (close and restart the process).
Per page: Set how many Features should be shown per page for the Main Feature Table. Very large numbers here will slow down browsing the table.
Database Folder: Currently not used
Sirius Folder: NOTE: This setting has been moved to the Sirius options Tab of the Options Box in Metaboseek 0.9.6.3.
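When choosing a value for Enabled cores, it can help to check how many threads your machine reports. The following is a minimal sketch using the parallel package that ships with R; leaving one thread free is a rule of thumb assumed here, not a Metaboseek default, and the variable names are illustrative.

```r
# Number of logical CPU threads reported by the operating system (may be NA on some systems)
available_threads <- parallel::detectCores(logical = TRUE)

# Assumption: keep one thread free for the browser and the operating system
suggested_cores <- max(1L, available_threads - 1L, na.rm = TRUE)
cat("Threads available:", available_threads, "- suggested setting:", suggested_cores, "\n")
```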

3.3 Start Page / Loading Data

The Metaboseek start page with data loading options and update news

The Start page provides you with information about the newest version of Metaboseek, and also allows you to load data into Metaboseek. You can also click on the Load icon on the left side of the navigation bar at the top of the page to get the same set of options for loading data:

3.3.1 Load Feature Tables

You can load any .csv or .mskFT file into Metaboseek. You can then go to the “Regroup Table” tab to specify or change the columns that contain intensity values. Feature Tables contain the results from feature detection with xcms, along with results from statistical analysis. If you load an .mskFT file, important metadata, such as processing history and sample grouping are loaded along with the result table. If you have loaded a project folder into the current session, there is a convenient option to select all compatible table files from the project folder as well.

Loading MS Data Files Directly.

All files with supported file extensions can be imported, either by selecting files individually (selecting multiple files at a time is possible), or by importing an entire folder that contains MS data (this will also import all compatible files from all of its subfolders). To save time, it makes sense to pre-sort your files in a reasonable folder structure (e.g. separate positive mode data from negative mode so you don’t get both kinds when selecting a folder to load into Metaboseek). Loading MS data files after you have already loaded a project folder allows you to visually inspect files that you had excluded from the xcms analysis, such as blanks.

3.3.2 Load a Metaboseek Project Folder.

When you run xcms through Metaboseek, the program generates a project folder that contains the results from that xcms analysis run, and all settings that were used in it. In addition, all output feature tables you requested will be saved in the project folder during the xcms run. You can load this result folder into Metaboseek, making it easier to keep all analysis results related to this xcms run in one place.

You can either select a project folder anywhere on your computer, or select a project folder from the recent project selection window that lists the most recently used project folders (load the selected folder with the Load Recent button). If you choose to load a project folder, all MS data files from the xcms run will be loaded and sample grouping information from the xcms analysis will be applied. Metaboseek will ask you which feature table you want to load from the project folder. If you select an .mskFT file (recommended) instead of the corresponding .csv file, you will benefit from the additional information embedded in these files. .csv files are primarily there for export and viewing in other tools (including Microsoft Excel), while .mskFT files are designed to be loaded back into Metaboseek. The advantage of .mskFT files is that they contain the complete processing history (including settings used for the xcms run, CAMERA analysis and post-processing). .mskFT files are technically .RDS files containing an MseekFT object and can be loaded into any R session with the readRDS() function.
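Because .mskFT files are .RDS files, reading one outside of Metaboseek takes a single readRDS() call. The sketch below is self-contained and therefore writes a stand-in object first; the list with a df element is purely hypothetical and does not reflect the real internal layout of an MseekFT object.

```r
# Stand-in object (hypothetical layout); a real .mskFT file holds an MseekFT object
demo_table <- list(df = data.frame(mz = c(301.1410, 445.2802), rt = c(312.4, 598.1)))
demo_path <- file.path(tempdir(), "demo_table.mskFT")
saveRDS(demo_table, demo_path)

# readRDS() does not care about the file extension
restored <- readRDS(demo_path)
print(class(restored))        # "list" for this stand-in object
str(restored, max.level = 1)  # quick look at the top-level structure
```

For a file written by Metaboseek, replace demo_path with the path to your own .mskFT file and inspect the result with str() before relying on any particular element name.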

3.3.2.1 Load Example Data

You can select “example_projectfolder” from the “Recent projects” selection box and click on “Load recent”. Metaboseek will ask you which table you would like to load into the session along with the MS data that is associated with the example project folder.

3.3.3 Load a Metaboseek Session

You can load a Metaboseek session that you saved previously in an .msks file. This will restore all feature tables and MS data files you had loaded into that session along with many of the layout settings. Note: This will currently only work if the MS data file locations have not changed from the paths used in the old session. Some aspects of the session will not be restored (notably, molecular networks are not saved in the session file).

3.3.4 Supported File Types

Metaboseek uses the MSnbase and xcms packages to load MS data files of the following formats:

.cdf
.nc
.mzData
.mzML
.mzXML

Note: Data needs to be centroided.

Feature Tables can be loaded in these formats:

.csv : comma-separated values. Metaboseek will expect a column mz with m/z values and a column rt with retention time values in seconds.
.mskFT: Metaboseek’s format to keep feature tables with metadata, including processing history
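If you generated a feature list elsewhere, you may want to confirm that it has the expected mz and rt columns before loading it. This is a minimal sketch in base R; check_feature_csv is a hypothetical helper written for this example, not a Metaboseek function, and the demo table is made up.

```r
# Hypothetical helper: read a .csv and verify the columns Metaboseek expects
check_feature_csv <- function(path) {
  tab <- read.csv(path, stringsAsFactors = FALSE)
  missing_cols <- setdiff(c("mz", "rt"), colnames(tab))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(tab$mz) || !is.numeric(tab$rt)) {
    stop("Columns mz and rt must be numeric")
  }
  tab
}

# Made-up demo table with rt given in seconds
demo_csv <- file.path(tempdir(), "demo_features.csv")
write.csv(data.frame(mz = c(301.1410, 445.2802), rt = c(312.4, 598.1),
                     sampleA = c(1.2e6, 3.4e5)),
          demo_csv, row.names = FALSE)
features <- check_feature_csv(demo_csv)
```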

3.4 Data Explorer

At the heart of Metaboseek is the interaction between data visualization in the “Data viewer” box and a table of LC/MS data features in the Feature Table box.

3.4.1 Options Box

This box provides a number of optional functions, including setting up SIRIUS, calculating molecular formulas and controlling the appearance of extracted ion chromatograms (EICs) in the Data Viewer box.

3.4.1.1 Sirius Options

Sirius Options

The settings here are passed on to the SIRIUS executable. Please have a look at the SIRIUS documentation to learn more about them.

Database: Restrict SIRIUS molecular structures and FingerID searches to a database. Search PubChem for most results (many of which may not be relevant in a biological context).
Ion: Select the ion type. Make sure to select the correct charge (positive or negative), and consider specifying the adduct type only if you are certain of it.
Get FingerID: If this box is checked, SIRIUS will also run a FingerID query in the selected database, returning a list of molecules matching your spectrum. If you unselect this setting, only the SIRIUS fragmentation tree is generated.
Use MS1 spectrum: Option to use MS1 level information for the search. Will use the MS1 scan closest to the retention time of the current MS2 spectrum from the MS2 browser. If multiple MS2 spectra are selected, only one MS1 scan is used (this should be the one corresponding to the first scan in the MS2 scan list when the list is not sorted).
Instrument: Select the type of instrument that was used for data acquisition
Allow elements: Enter Element symbols you want to include in the search without spaces. To limit the maximum number of an element per molecule, add a number in brackets after the Element symbol (e.g. CHNOP[5]S[5])
Sirius Folder: Path to a SIRIUS executable
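To make the Allow elements syntax concrete, the sketch below splits a string such as CHNOP[5]S[5] into element symbols and optional maximum counts. It only illustrates the notation described above; parse_elements is a hypothetical helper, and this is not how SIRIUS or Metaboseek parse the setting internally.

```r
# Hypothetical helper: split an element string into symbols and optional limits
parse_elements <- function(x) {
  # One token per element: capital letter, optional lowercase letter, optional [number]
  tokens <- regmatches(x, gregexpr("[A-Z][a-z]?(\\[[0-9]+\\])?", x))[[1]]
  symbols <- sub("\\[.*$", "", tokens)
  # Keep the bracketed number where present, NA (no limit given) otherwise
  limit_text <- ifelse(grepl("\\[", tokens),
                       gsub("^.*\\[|\\]$", "", tokens),
                       NA_character_)
  data.frame(element = symbols, max_count = as.integer(limit_text),
             stringsAsFactors = FALSE)
}

parsed <- parse_elements("CHNOP[5]S[5]")
print(parsed)
```

In this example, C, H, N and O have no limit, while P and S are each limited to five atoms per molecule.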

3.4.1.2 Molecular Formula Prediction

Molecular formula prediction

In this tab, you can calculate molecular formulas that match the currently selected feature’s m/z value. All settings are passed to the calcMF function from the MassTools package. Molecular formulas are generated with the Rdisop package and can then be filtered using the rules proposed in “Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry” (Kind and Fiehn 2007), as well as some additional filters. For detailed documentation, click here.

Elements: Elements that are allowed to be in the proposed molecules
Charge: Charge of the ion, can be a positive or negative integer, or zero.
ppm: Maximum relative difference between input and theoretical m/z value
min./max. DBE: Minimum and Maximum number of double bond equivalents (DBE).
Minimum/Maximum Elements: a molecular formula defining the minimum or maximum atom counts for a set of elements.
Parity: Must be odd (o), even (e) or either. Correct selection depends on your ionization method.
Element filter: Check each predicted molecular formula for the maximum number of elements expected in a natural product of its size (“Golden Rule #1”)
Valence filters: for each predicted molecular formula, calculate the sum of valences minus twice (the number of atoms minus 1) (to check for Senior’s third theorem, “Golden Rule #2”)
H/C ratio: molecular formulas must have a Hydrogen/ Carbon count ratio of >0.2 and <3.1 (“Golden Rule #4”). Will be ignored if no Hydrogen or Carbon present in a molecule.
NOPS/C ratio: molecular formulas must have these atom count ratios: Nitrogen/Carbon <1.3, Oxygen/Carbon <1.2, Phosphorus/Carbon <0.3 and Sulfur/Carbon <0.8 (“Golden Rule #5”). Will be ignored if no Carbon present in a molecule.
Element filter 2: an additional element ratio heuristic is applied (various element ratios need to be in the “common range” for natural products, “Golden Rule #6”)
source: source of the m/z value that you want to analyze. By default (feature table), the m/z value is taken from the current selection in the Feature Table. If spectrum is selected, you can select a peak of interest from the MS1 spectrum in the Data viewer -> Grouped EICs tab, from the MS2 spectrum in Data viewer -> MS2 Browser -> Compare MS2 spectra, or from the MS2 spectrum shown next to the scan table in Data viewer -> MS2 Browser. Select peaks from spectra by pressing Shift + click. Selected peaks will be marked in orange. Selecting custom as source will use the m/z value that you specify in the custom m/z field.
calculate automatically: Generate molecular formulas automatically as soon as your source m/z value changes (e.g. every time you change the selection in the Feature Table). This may slow down Metaboseek when looking for high mass molecules.
calculate/calculate all: Generates molecular formulas with the current settings for all m/z values in the Feature Table and adds a column called predicted_MFs to the Feature Table.
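The ppm tolerance and the two ratio filters listed above are simple arithmetic checks. The following base-R sketch applies them to a vector of atom counts; it illustrates the thresholds quoted in this section and is not the calcMF implementation from MassTools. All function names are hypothetical.

```r
# Relative mass error in parts per million
ppm_error <- function(measured_mz, theoretical_mz) {
  (measured_mz - theoretical_mz) / theoretical_mz * 1e6
}

# H/C ratio filter: > 0.2 and < 3.1, ignored without C or H
passes_hc_ratio <- function(counts) {
  if (counts[["C"]] == 0 || counts[["H"]] == 0) return(TRUE)
  ratio <- counts[["H"]] / counts[["C"]]
  ratio > 0.2 && ratio < 3.1
}

# NOPS/C ratio filter with the limits quoted above, ignored without C
passes_nops_ratio <- function(counts) {
  if (counts[["C"]] == 0) return(TRUE)
  limits <- c(N = 1.3, O = 1.2, P = 0.3, S = 0.8)
  all(counts[names(limits)] / counts[["C"]] < limits)
}

# Atom counts for C6H12O6 as an example candidate formula
counts <- c(C = 6, H = 12, N = 0, O = 6, P = 0, S = 0)
passes_hc_ratio(counts)      # H/C = 2, inside the allowed range
passes_nops_ratio(counts)    # O/C = 1, below the limit of 1.2
ppm_error(500.001, 500.000)  # about 2 ppm
```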

3.4.1.3 RT Correction

RT correction

If you load a Project Folder from a finished xcms job that included retention time correction, you can review the effect of retention time correction across your files here. Retention time is plotted on the x-axis for each file, and deviation from the uncorrected retention time is shown on the y-axis. Very large RT deviations or very different behavior between groups of samples can point to problems with your chromatography setup or retention time correction settings.

3.4.1.4 Mass Shifts

Mass shifts

You can define mass shifts that will be shown in the Data viewer -> Grouped EICs window as additional EIC traces (in dashed lines). Mol_formula and charge columns are currently ignored. Click Update mass shifts to update the EIC view and to save your edits to the mass shift table (will be restored in your next session).

3.4.1.5 EIC Options

EIC options

This allows control of various formatting options for the EICs in Data viewer -> Grouped EICs as well as Data viewer -> MS2 Browser -> Feature Report, and in part also for Data viewer -> MS Browser.

Mass tol (ppm): Relative mass tolerance in parts per million (ppm) around the m/z value selected in the Feature Table for generation of extracted ion chromatograms (EICs).
Plots per row: Maximum number of EIC plots per row, will add additional rows of plots if necessary.
TIC: Plot Total Ion Chromatogram (TIC) instead of EIC. Will ignore the m/z value selected in the Feature Table but will use its retention time information.
Relative intensities: Change y-axis labels to percentages instead of raw intensity numbers.
RT window (sec): width of the retention time range in the EIC plots, +/- from the retention time defined in Feature Table rt column.
Full RT range: Show chromatogram for entire retention time range instead of using RT window.
Group by: Use this Group to define which files get plotted in the same EIC plot; Edit grouping information in Data viewer -> Regroup MS data.
Y-axis zoom: Zoom in on the Y-axis (intensity) by this factor. Use this to inspect small peaks that would otherwise not be visible because they are plotted in the same plot as large peaks.
Line width: Thickness of the EIC lines
Mark feature RT: Show a dashed line at the retention time as defined in the Feature Table rt column
Raise EICs: Raise up the y-axis view so you can see sections of the EIC that are close to zero and may be hidden behind the x-axis
Font size: Control the font size for axis labels in EIC plots.
Color palette: Change the color scheme for EIC traces.
Color by: Use this Group to define which files get plotted in the same color; Edit grouping information in Data viewer -> Regroup MS data. Coloring by Mass shift not implemented yet.
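To illustrate how Mass tol (ppm), RT window (sec) and Relative intensities interact, the sketch below extracts an EIC from a toy peak table. It is an assumption-laden illustration in base R, not Metaboseek's own EIC code: extract_eic is a hypothetical function, the data are made up, and summing intensities per scan is an assumption.

```r
# Toy peak table: retention time in seconds, m/z and intensity of each detected peak
peaks <- data.frame(
  rt        = c(100, 100, 110, 110, 120, 120, 400),
  mz        = c(301.1410, 150.0500, 301.1412, 150.0501, 301.1409, 150.0499, 301.1410),
  intensity = c(1000, 50, 5000, 60, 2000, 40, 900)
)

extract_eic <- function(peaks, mz, rt, ppm = 5, rt_window = 30, relative = FALSE) {
  mz_tol <- mz * ppm * 1e-6                   # absolute tolerance from the relative ppm value
  keep <- abs(peaks$mz - mz) <= mz_tol &
          abs(peaks$rt - rt) <= rt_window     # +/- window around the feature retention time
  eic <- aggregate(intensity ~ rt, data = peaks[keep, ], FUN = sum)
  if (relative) {
    eic$intensity <- 100 * eic$intensity / max(eic$intensity)  # percent of the largest point
  }
  eic
}

eic <- extract_eic(peaks, mz = 301.1410, rt = 110, ppm = 5, rt_window = 30, relative = TRUE)
print(eic)
```

At m/z 301.1410, a 5 ppm tolerance corresponds to roughly 0.0015 m/z units, so the peaks near m/z 150.05 and the late peak at 400 seconds are excluded.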

3.4.2 Data Viewer

This box provides plots of the data selected in the Feature Table box. Different kinds of data plots and browsing options are available, mostly for extracted ion chromatograms (EICs) and spectrum plots, but also bar plots, Venn diagrams and plots from principal component analysis.

3.4.2.1 MS2 Browser

The MS2 browser

If you have loaded MS data files which contain MS2 (tandem-MS) data, you can go to the MS2 browser for a variety of data analysis options specifically for MS2 data. The MS2 Browser box is the most complex tab in Metaboseek, and its user interface is divided into three parts: The two sub-tabs Feature report and Compare MS2, and a bottom part that is always visible, independent of which sub-tab is selected. This General MS2 Browser part includes the SIRIUS module and the list of MS2 scans associated with the selected row in the Feature Table.

3.4.2.2 Sub-Tabs

You can switch between two views in the MS2 browser:

Feature Report

The Feature Report sub-tab is designed to show all information about a molecular feature at a glance and make it exportable as a single-page .pdf document. This includes Grouped EICs at the top (see the “Grouped EICs” description for the controls). If MS2 data is available for a molecular feature selected in the Feature Table, MS1 (left) and MS2 (right) spectra are also shown below the EICs.

You can generate single page reports for a feature, including EICs, MS1 and MS2 spectra and SIRIUS results.

Compare MS2

In this sub-tab, you can compare MS2 spectra with each other. On the left side, you see space for the molecular network viewer. MS/MS spectra are shown on the right side.

Molecular Network Module

Make a new MS2 network: Here, you can analyze MS2 data and find molecular features with similar MS2 spectra, which can indicate structural similarity. Click on the “New MS2 Network” button to get started. First, you need to associate MS2 spectra with molecular features in the Feature Table. This step is identical to the “Find MS2 scans” dialog described in the Advanced Analysis section, and you can skip it if you already ran this analysis step on the current Feature Table and want to use the same MS2 association parameters (m/z and RT tolerance, etc.) as before.

MS2 scans are averaged for each molecular feature, and then the averaged spectra are compared with each other. In the next step, you can select the parameters used for spectrum similarity calculations. Only peaks that match within the m/z and ppm tolerance between two spectra will be used to calculate the similarity score, and peaks at an intensity below a set percentage of the maximum peak intensity in a scan (Noise level in %) will be excluded. You can also ignore small fragments, an experimental feature that excludes peaks with m/z < 100 from the spectra; this can be used, for instance, to exclude phosphate peaks, which can be very dominant in negative mode data. The similarity score will be considered 0 if fewer peaks than the min. peaks setting match between two spectra.

settings for Network generation

The intensities of the matching peaks for each spectrum are extracted, and the similarity score is calculated as the cosine between these intensity vectors, i.e. similar compounds are expected to show a similar relative intensity distribution across the matching peaks. If you select Use parent masses, neutral losses are also used for peak matching, which will increase the number of matching peaks for compounds that have different parent masses (e.g. because of a methyl group or adduct difference). This step can take minutes or even hours, depending on the number of molecular features with MS2 scans that are compared to each other. For more details, look at the documentation for the makeEdges() and network1() functions from the MassTools package.
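The scoring described above can be sketched in a few lines of base R. This is a simplified illustration, not the makeEdges() implementation from MassTools: cosine_score is a hypothetical function, peaks are matched to their nearest neighbor using a ppm tolerance only, neutral losses are not considered, and the example spectra are made up.

```r
cosine_score <- function(spec_a, spec_b, ppm = 5, noise_pct = 2, min_peaks = 3) {
  # Drop peaks below the noise level, given as a percentage of the most intense peak
  denoise <- function(s) s[s$intensity >= max(s$intensity) * noise_pct / 100, ]
  a <- denoise(spec_a)
  b <- denoise(spec_b)

  # For each peak in a, take the closest peak in b and accept it within the ppm tolerance
  idx <- vapply(a$mz, function(m) which.min(abs(b$mz - m)), integer(1))
  matched <- abs(b$mz[idx] - a$mz) <= a$mz * ppm * 1e-6
  if (sum(matched) < min_peaks) return(0)  # too few matching peaks

  # Cosine between the intensity vectors of the matching peaks
  x <- a$intensity[matched]
  y <- b$intensity[idx[matched]]
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

spec_a <- data.frame(mz = c(85.0284, 127.0390, 145.0495, 163.0601),
                     intensity = c(20, 100, 60, 30))
spec_b <- data.frame(mz = spec_a$mz, intensity = 2 * spec_a$intensity)
spec_c <- data.frame(mz = c(85.0284, 127.0390, 200.1000),
                     intensity = c(10, 50, 100))

score_ab <- cosine_score(spec_a, spec_b)  # same relative intensities
score_ac <- cosine_score(spec_a, spec_c)  # only two matching peaks
```

Because spec_b has the same relative intensity distribution as spec_a, their score is 1 (up to floating-point rounding), while spec_a and spec_c share only two peaks and therefore score 0 with min_peaks = 3.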

After calculation of the similarity scores, you can give your new network a name and select a threshold for which comparisons to keep (above a given Cosine threshold). Stricter (higher) values generate less data and less complicated networks, generally with fewer network clusters. A less strict (lower) Cosine threshold will keep more of the comparison information, which you can remove later with the Simplify network button:

You can remove edges (connections) from the network to only see the most significant connections
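The effect of the Cosine threshold can be pictured as a filter on a table of pairwise comparisons. The sketch below uses a made-up edge table in base R; the column names from, to and cosine are hypothetical and do not necessarily match the network objects that Metaboseek creates.

```r
# Made-up pairwise similarity scores between four features
edges <- data.frame(
  from   = c("F1", "F1", "F2", "F3"),
  to     = c("F2", "F3", "F3", "F4"),
  cosine = c(0.92, 0.55, 0.81, 0.40),
  stringsAsFactors = FALSE
)

# Keep only comparisons above the chosen threshold
cosine_threshold <- 0.8
kept_edges <- edges[edges$cosine > cosine_threshold, ]

# Features that still have at least one connection
connected_nodes <- sort(unique(c(kept_edges$from, kept_edges$to)))
print(kept_edges)
```

With a threshold of 0.8, two of the four edges remain and feature F4 is no longer connected to any other feature.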

You can save networks in either the .graphML or .mskg format.

Use the molecular network viewer: If you have loaded or generated a molecular network, you can also display spectra for each network node, as long as the files used to generate the network are loaded into Metaboseek. Navigate from the network overview to a connected network cluster (or “subgraph”) by clicking on it while holding the SHIFT key. The view will now zoom in on the subgraph. You can use the control menu above the network to show node and edge labels of your choice (e.g. Parent m/z and m/z difference between nodes (“deltaMZ”)), and apply a coloring scheme (e.g. color by default groups). To select a node and display all MS/MS spectra associated with it, click on a node while holding the SHIFT key.

Using the network viewer

You can move nodes by dragging them with your mouse while holding the CTRL key (this helps make all labels visible in a dense network). Return to the network overview by double-clicking on the graph. If double-clicking does not work, you can also zoom out by clicking while holding the Z key.

The processing history for the current network can be viewed with the History button.

Mapping to reference: “Match Feature Table” is an experimental beta feature. You can map the current Feature Table on the currently active MS2 network, re-using the network layout. This is still in development and will change over time.

Compare Spectra

In the “MS2 spectra” box on the right, you can choose to keep a spectrum view - it will then not be refreshed when you select a new Feature table entry or network node. Instead, a new spectrum plot will show up below. You can show up to 5 spectrum views at the same time. By default, all peaks that occur in more than one of the shown spectra are highlighted in blue. You can disable this comparison with the Compare checkbox. You can also download the shown spectrum views in .pdf format by clicking Download spectra, or in .tsv format (Save as table).

3.4.2.3 General MS2 Browser

Below the sub-tab selection, you can see these elements:

SIRIUS Module

SIRIUS (Dührkop et al. (2019)) is a stand-alone software developed in the Boecker lab at the University of Jena that can use MS/MS data to predict the molecular formulas of fragment and parent ion peaks. It also offers an interface to CSI:FingerID to match fragmentation patterns with structure databases.

When you first go to the MS2 Browser, this is what it looks like

Information about completed SIRIUS analyses will show up here if available for the active molecular feature from the Feature Table.

3.4.2.3.1 Get Structure Predictions with SIRIUS

MS2 data can be analyzed with SIRIUS from inside the Metaboseek app. All settings for SIRIUS can be found in the Options box. In the Sirius options, you first need to tell Metaboseek where the SIRIUS executable is located (“SIRIUS folder”). Metaboseek will generate a new folder there to store results from Sirius runs. NOTE: Make sure you have write access to the SIRIUS location.

To run Sirius, use the “Run SIRIUS” Button above the MS2 scan table. Make sure to select appropriate options in the Sirius options section at the top of the app Options box. The results can be accessed through Metaboseek as soon as a Sirius analysis run finishes by clicking “Show SIRIUS” in the Spectra list. Select items in the tables that show up to view fragmentation trees and proposed structures. Two buttons for SIRIUS are in the Spectra list section below: The Run SIRIUS button will use the currently selected spectra with the current Sirius options to run a SIRIUS analysis. This will typically take a few seconds. The Show SIRIUS button will show SIRIUS results for the selected MS2 spectra when available. The color of the button indicates if SIRIUS results are available (green), not available (red), or available with settings that differ from the current settings in Sirius options (yellow).

You can select molecular formulas from the SIRIUS result table on the left to display the corresponding fragmentation tree. The annotated fragments will also be highlighted in the Feature report subtab MS2 spectrum view. If you selected Get FingerID in the Sirius options, a list of candidate molecules will show up on the right side. Select one to view the molecular structure. NOTE: Viewing molecular structures requires installation of the rcdk package, which is not included in the Metaboseek Windows installer, and not automatically installed when installing Metaboseek from R.

Click on the Browse SIRIUS searches section to show a list of SIRIUS jobs. Select a job here to look at SIRIUS results independent from the current Feature Table selection.

Spectra List

Spectra list

When you select one (or multiple) entries in the Feature Table, Metaboseek will find any MS/MS scans that have a parent mass matching the selected Feature Table entry (e.g. within 5 ppm and 200 seconds, customizable). All MS/MS scans matching a selection (from a network or from the Feature Table) are shown in a table in the MS2 browser tab.

You can define the parent ion m/z tolerance (in ppm) and retention time window at the top, so that only MS2 scans within these tolerances of your selection in the Feature Table are shown. You can also sort this table with the controls at the bottom of the table. An average spectrum of all scans shown in this scan table is displayed on the left. You can select single or multiple scans in the scan table to show the spectrum of only the selected scan(s). The MS2 scans selected here are also used and displayed by both the Feature Report and Compare MS2 sub-tabs.
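The scan lookup described above can be sketched in a few lines of Python. This is an illustration only, not Metaboseek's R implementation: the `MS2Scan` record, the function names, the example values and the reading of the retention time window as a symmetric plus/minus tolerance are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class MS2Scan:
    """Hypothetical record for one MS2 scan (not a Metaboseek data structure)."""
    file: str
    scan_number: int
    precursor_mz: float
    rt: float  # retention time in seconds


def ppm_error(observed_mz, reference_mz):
    """Relative m/z difference in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def matching_scans(scans, feature_mz, feature_rt, ppm_tol=5.0, rt_window=200.0):
    """Keep scans whose parent m/z and RT fall within both tolerances.

    Assumption: rt_window is applied as +/- rt_window seconds around the feature.
    """
    return [
        s for s in scans
        if abs(ppm_error(s.precursor_mz, feature_mz)) <= ppm_tol
        and abs(s.rt - feature_rt) <= rt_window
    ]


# Made-up example scans
scans = [
    MS2Scan("sampleA.mzXML", 101, 301.1410, 412.0),  # within both tolerances
    MS2Scan("sampleA.mzXML", 250, 301.1500, 415.0),  # about 30 ppm off
    MS2Scan("sampleB.mzXML", 98, 301.1412, 900.0),   # outside the RT window
]
hits = matching_scans(scans, feature_mz=301.1411, feature_rt=410.0)
print([(s.file, s.scan_number) for s in hits])
```

Widening `ppm_tol` or `rt_window` in this sketch corresponds to relaxing the tolerances at the top of the Spectra list.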

3.4.2.4 PCA Viewer

Shows interactive plots with results from the principal component analysis (PCA) from the Feature Table Actions Analysis Options if available.

3.4.2.5 Venn Diagrams

This module allows you to split the Feature Table into up to three groups and shows the number of features shared between the groups. You define the groups by applying up to three different filters to the current Feature Table. The filters work as in the Filter Table Tab.
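As a rough illustration of what such a Venn diagram summarizes, the sketch below applies three filters to a small table and counts the features shared by every combination of groups. The feature values, filter names and thresholds are made up for the example, and the code is not taken from Metaboseek.

```python
from itertools import combinations

# Made-up feature rows; column names follow the Feature Table conventions
features = [
    {"id": 1, "mz": 150.05, "maxint": 2e5, "best_minFold": 12.0},
    {"id": 2, "mz": 420.20, "maxint": 5e4, "best_minFold": 8.0},
    {"id": 3, "mz": 250.10, "maxint": 3e5, "best_minFold": 1.5},
    {"id": 4, "mz": 610.30, "maxint": 8e5, "best_minFold": 20.0},
]

# Up to three filters, each defining one group (hypothetical criteria)
filters = {
    "intense": lambda f: f["maxint"] >= 1e5,
    "enriched": lambda f: f["best_minFold"] >= 5,
    "low_mz": lambda f: f["mz"] < 300,
}

# Set of feature ids passing each filter
groups = {
    name: {f["id"] for f in features if keep(f)}
    for name, keep in filters.items()
}


def overlap_counts(groups):
    """Number of features shared by every combination of groups."""
    counts = {}
    names = sorted(groups)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            shared = set.intersection(*(groups[n] for n in combo))
            counts[combo] = len(shared)
    return counts


counts = overlap_counts(groups)
for combo, n in counts.items():
    print(" & ".join(combo), n)
```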

3.4.2.6 Quickplots

In this tab, you can view the data in summary plots. The left side uses the intensity values from the feature table as input, while the right side allows you to plot arbitrary Feature Table columns against each other.

3.4.2.7 MS Browser

Here, you can select individual files to show their EICs for the selected feature or a custom m/z value. You can use SHIFT + click to select a data point to display the corresponding MS1 spectrum below. See the “Navigating plots” section for more information on how to interact with the spectrum and EIC plots.

Add EIC: Add another EIC view

You can display multiple independent EIC views at the same time. Each of them has these settings:

Remove: Remove this EIC view
Download EIC: Generate a .pdf file from this current EIC view
RT correction: Apply retention time (RT) correction to the EICs, if available. RT correction information is loaded automatically when opening project folders, but can also be added to the session from the RT correction Tab in the Options box.
TIC: Show Total Ion Current (TIC) instead of EIC.
Hotlink mz and rt ranges: If selected, uses the m/z and rt range for this EIC view from the current selection in the Feature Table.
m/z: Displays the m/z used for this EIC view, can be edited if the hotlink is not active
select files: select files for which to show EICs in this EIC view.

Other settings for the EIC plots, such as mass tolerance and color palette, can be changed in the EIC options in the Options box and will apply to all EIC plots in Metaboseek.

3.4.2.8 Grouped EICs

Similar to the MS Browser (see above), but enabling different layouts of grouped EICs. Some plotting parameters can be changed in the EIC options in the Options box, and some can be changed here directly:

RT correction: Apply retention time (RT) correction to the EICs, if available. RT correction information is loaded automatically when opening project folders, but can also be added to the session from the RT correction Tab in the Options box.
Show spectrum: Shows a spectrum plot underneath the EICs, for the scan with the highest intensity visible in any of the Grouped EICs plots. You can interact with the plot as described in the “Navigating Plots” section.
Small EICs: Decreases the height of the EIC plots to save vertical space
Rescale: If selected, will rescale all plots so that they are relative to the highest intensity across all current plot groups, instead of relative to the highest intensity within each group.
Subtitle context: Select columns from the Feature Table here. Their content will be shown as a subtitle in the plots.
Save Plot: This will generate a .pdf file with grouped EICs for all features in the Feature Table (after applying the active filters) with grouped EICs for one feature per page. IMPORTANT: The limit for number of features that can be exported to pdf this way is 1000. Make sure you filter your results accordingly before exporting, otherwise only the EIC plots for the first 1000 features get exported.
Active MS grouping: You can switch between different grouping schemes that you defined in the Regroup MS data Tab.

3.4.2.9 Regroup MS Data

You can group the MS data independently from the grouping in the Feature Table. This grouping can be used to define color schemes or which files should be plotted together in Grouped EICs. It is possible to assign each file to two different groups to allow switching plot layouts using EIC options in the Options box. You can define multiple grouping schemes here with the ‘new Grouping’ and ‘Update Grouping’ buttons and switch between these schemes from the Grouped EICs Tab.

3.4.3 Feature Table

The Feature Table

This box contains the most important element in the app: the Feature Table. Most plots in the Data Viewer will use this table as input to show you information that is related to the molecular feature that is defined in the selected row.

sort: switch sorting the table on or off
decreasing: sort in decreasing or increasing order
Sort by column: which column to sort by
page: select which page to show. You can change the number of items per page in the Global Options (Navigation bar at the top of the app)
Save Table: You can save the Feature Table in multiple formats, and either download it through your browser, or save it in an automatically generated subfolder of your project folder (if you are working with xcms results from a Metaboseek project folder). The recommended format is .mskFT, because it retains the processing history information for this Feature Table when you load it back into Metaboseek. For export in other software, use the .csv format. You can also generate inclusion or exclusion lists for Thermo instruments.
History: Shows you the processing history of the currently active Feature Table, including the processing history from the xcms run (if all steps were done in Metaboseek, and the Feature Table was loaded in .mskFT format).
Active Table: Select which Feature Table to display in the Feature Table box. You can load multiple Feature tables and switch between them here (e.g. tables filtered for different criteria)
Rename: Rename the currently active Feature Table (names of Feature Tables in the current session are displayed in the ‘Active Table’ selection box)

3.4.4 Feature Table Actions

In this box you can run analyses on the currently selected Feature Table and filter it.

3.4.4.1 Special Columns in the Metaboseek Feature Table

Some column names and name schemes are generated by the actions you can take in the Analyze Table Tab. You can use these columns to filter your Feature Table in the Filter Table Tab.

Column	Description	calculated by
{Group}__foldOverCtrl	Fold change of the mean intensity of G1 over mean intensity of the control group	Basic analysis
{Group}__foldOverRest	Fold change of the mean intensity of G1 over mean intensities of all other samples outside of G1	Basic analysis
{Group}__meanInt	sample group mean intensity	Basic analysis
{Group}__minFold	Fold change of the lowest intensity sample in G1 over the highest intensity sample of all other samples outside of G1	Basic analysis
{Group}__minFoldMean	Fold change of the mean intensity of G1 over the highest intensity sample of all other samples outside of G1	Basic analysis
{Group}__pval	p value between this group and all samples in all other groups, as calculated by stats::t.test()	t-test
{Group}__pval_adj	p values, adjusted by the “bonferroni” method using stats::p.adjust()	t-test
{Group}__sdev	coefficient of variation (relative standard deviation = sd/mean) within the group	t-test
{IntensityColumn}__norm	Normalized intensity values for an intensity column, typically named {Sample}__norm or {Sample}__XIC__norm; by default : (1) replaces zeros by lowest value in entire intensity dataset (assuming it represents the detection limit), (2) intensity values of each column are adjusted so that each column has the mean intensity equal to the mean intensity of the entire dataset.	Normalize data
{Sample}	Columns with intensities as reported by xcms	xcms script
{Sample}__XIC	Columns with intensities as calculated by Metaboseek; can have a suffix other than "__XIC" if generated with the “Get intensities” button.	Get intensities, xcms script
ANOVA__pvalue	per-row one-way ANOVA between grouped columns of the feature table	anova
best_minFold	The highest minFold value reported across all groups	Basic analysis
best_minFoldCtrl	The highest minFoldCtrl value reported across all groups	Basic analysis
best_minFoldMean	The highest minFoldMean value reported across all groups	Basic analysis
cluster__clara	listing the cluster into which each feature falls after using cluster::clara() with the selected number of clusters	clara cluster
massdefppm	Mass defect in ppm, calculation: ((mz-floor(mz))/mz)*1e6	Basic analysis
maxfold	Highest fold change between the mean intensities of any 2 sample groups	Basic analysis
maxfoldover2	Fold change between the mean intensities of topgroup over group with second highest mean intensity	Basic analysis
maxint	Maximum intensity across all samples	Basic analysis
MS2scans	Lists the MS2 scans found for each feature in the MS2 data loaded when the “Find MS2 scans” analysis was run	Find MS2 scans
mzMatch_{variable}	All other columns from the compound list are added as well with the prefix mzMatch_.	mzMatch
mzMatchError	ppm error, based on difference between mz in the feature table and the mz in the compound list.	mzMatch
mzMatches	Identity of the search hits in the selected compound lists, taken from the “id” column of the compound lists. If there are multiple hits across the selected compound lists, they will be separated by “|” within the mzMatches, mzMatchError and all mzMatch_{variable} columns.	mzMatch
PCA__#	Principal component coordinates for each feature (components are numbered)	PCA features
topgroup	Sample group with highest mean intensity	Basic analysis
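To make some of the column definitions in this table concrete, here is a minimal Python sketch of the fold-change columns, the mass defect and the Bonferroni adjustment. It follows the descriptions in the table, but the function names are hypothetical and the code is not Metaboseek's own (Metaboseek computes these columns in R). It assumes non-zero intensities in the denominators.

```python
import math
from statistics import mean


def mass_defect_ppm(mz):
    """massdefppm as given in the table: ((mz - floor(mz)) / mz) * 1e6."""
    return (mz - math.floor(mz)) / mz * 1e6


def fold_columns(group, rest):
    """Fold-change columns for one sample group.

    group: intensities of the samples in the group
    rest:  intensities of all samples outside the group
    Assumption: no zero denominators (e.g. zeros were replaced beforehand).
    """
    return {
        "foldOverRest": mean(group) / mean(rest),
        "minFold": min(group) / max(rest),
        "minFoldMean": mean(group) / max(rest),
    }


def bonferroni(p_values):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


# Made-up intensities for one feature
cols = fold_columns(group=[800.0, 1000.0, 1200.0], rest=[100.0, 200.0, 300.0, 400.0])
print(cols)
print(mass_defect_ppm(301.1411))
print(bonferroni([0.01, 0.04, 0.5]))
```

Because minFold compares the weakest sample inside the group with the strongest sample outside it, it is the most conservative of the three fold changes.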

3.4.4.2 Filter Table

Filter Table

You can filter the Feature Table here by specifying a column and filter criteria. Columns containing text can be filtered for text patterns, and numeric columns for values within a range. You can define an arbitrary number of filters and it is possible to activate or deactivate individual filter steps. IMPORTANT: when you save a Feature Table, the currently active filters will be applied before saving.
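The filter logic can be pictured as a list of independent filter steps, each of which can be switched on or off. The following Python sketch illustrates that idea under stated assumptions: the `ColumnFilter` class, its fields and the use of regular expressions for text patterns are hypothetical and do not describe Metaboseek's internals.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnFilter:
    """One filter step (hypothetical structure, for illustration only)."""
    column: str
    pattern: Optional[str] = None      # text pattern for text columns
    minimum: float = float("-inf")     # lower bound for numeric columns
    maximum: float = float("inf")      # upper bound for numeric columns
    active: bool = True

    def keeps(self, row):
        if not self.active:
            return True  # deactivated filters let every row pass
        value = row[self.column]
        if self.pattern is not None:
            return re.search(self.pattern, str(value)) is not None
        return self.minimum <= value <= self.maximum


def apply_filters(rows, filters):
    """Keep rows that pass every active filter."""
    return [r for r in rows if all(f.keeps(r) for f in filters)]


# Made-up rows
rows = [
    {"mz": 301.1411, "rt": 410.0, "comments": "candidate, checked"},
    {"mz": 150.05, "rt": 95.0, "comments": "candidate"},
    {"mz": 455.29, "rt": 620.0, "comments": "candidate"},
    {"mz": 350.20, "rt": 200.0, "comments": "blank"},
]
filters = [
    ColumnFilter("mz", minimum=200, maximum=500),
    ColumnFilter("comments", pattern="candidate"),
    ColumnFilter("rt", maximum=500, active=False),  # defined but switched off
]
kept = apply_filters(rows, filters)
print([r["mz"] for r in kept])
```

Setting `filters[2].active = True` in this sketch would additionally drop the row at 620 seconds, which mirrors activating an individual filter step.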

3.4.4.3 Analyze Table

Filter Table

The Analyze Table tab is the central hub for data analysis on your Feature Table. Most analysis steps will generate new columns in the Feature Table, which you can then use to filter your table to get to your features of interest. See “Special Columns in the Metaboseek Feature Table” for a guide to the columns generated by the analysis steps.

3.4.4.3.1 Analysis Options

For more in-depth information on the underlying functions in R, see the Metaboseek::analyzeFT documentation.

Normalize data: Select this option ONLY if the current table has not been filtered and is the result of an unbiased xcms analysis. “Normalization” will make a copy of the current intensity columns with the suffix "__norm". In these new columns, all zero intensity values will be set to the lowest non-zero value across all sample intensity columns (assuming it represents the detection limit). Then, a normalization factor is applied so that the average intensity of each individual column is the same and equal to the average intensity across all columns prior to normalization. If you do this on a table that has been pre-filtered, for instance one containing only features that are upregulated in one sample group, this will fatally distort the data!
Use normalized data: Use the normalized data when running analyses that use intensity values: Basic Analysis, anova, t-test, PCA, clara cluster
Select control group: Select a sample group that is the control (in Basic Analysis, all sample groups are compared to this group).
Apply log10: When checked, will apply log10 to the "__norm" columns (see above) after normalization of intensity values.
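The normalization described under “Normalize data” can be sketched as follows. This Python example is a simplified illustration, not the Metaboseek::analyzeFT code; in particular, it assumes that the overall average is taken after zero replacement, which the description above does not specify.

```python
import math
from statistics import mean


def normalize(columns, apply_log10=False):
    """Return "__norm" copies of the intensity columns.

    columns: dict mapping column name -> list of intensities
    Steps: (1) replace zeros by the lowest non-zero value in the whole
    dataset, (2) scale each column so its mean equals the overall mean.
    Assumption: the overall mean is computed after step (1).
    """
    lowest = min(v for col in columns.values() for v in col if v > 0)
    filled = {
        name: [v if v > 0 else lowest for v in col]
        for name, col in columns.items()
    }
    overall_mean = mean(v for col in filled.values() for v in col)

    normalized = {}
    for name, col in filled.items():
        factor = overall_mean / mean(col)
        scaled = [v * factor for v in col]
        if apply_log10:
            scaled = [math.log10(v) for v in scaled]
        normalized[name + "__norm"] = scaled
    return normalized


# Made-up intensities for two samples and three features
table = {"sampleA": [0.0, 100.0, 300.0], "sampleB": [50.0, 400.0, 1050.0]}
norm = normalize(table)
print(norm)
```

The sketch also shows why pre-filtering is dangerous: the scaling factor depends on the column means, so removing features that are high in only some samples shifts those means and thereby the factors.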

3.4.4.3.2 Basic Analysis

Basic analysis: Selecting this option will calculate a set of fold changes between sample groups and some summary information columns such as maxint. See “Special Columns in the Metaboseek Feature Table” for a description of all columns generated by this analysis step.
clara cluster: cluster the feature table with cluster::clara()
anova: Calculate per-row one-way ANOVA between grouped columns of the feature table. NOTE: Equal variance is not assumed (uses stats::oneway.test), returns NaN in cases where one group has all equal values (no variance, e.g. if all values are 0).
t-test: calculate t-test between samples. Works only if there are two groups in grouping with multiple members.
PCA features: Perform Principal Component Analysis (PCA) of features (does not require grouping information). Will add columns to the Feature Table.
PCA samples: Perform Principal Component Analysis to cluster samples based on the intensity columns (does not require grouping information). Sample PCA information is not stored in the viewable Feature Table, but is saved as part of an .mskFT file.
mzMatch: Match the m/z values of your Feature Table to a list of known compounds. Note that these matches based on MS1 data alone are ambiguous. Will generate multiple columns in the Feature Table, as described in “Special Columns in the Metaboseek Feature Table”.
Peak shapes: Tries to match the EIC for each feature in each sample to a curve and calculates a fit score between 0 (no fit) and 1 (best fit).
Fast peak shapes: Recommended way to score peak shapes. Tries to match the EIC for each feature in the sample with the highest intensity for each feature to a curve and calculates a fit score between 0 (no fit) and 1 (best fit). Much faster than “Peak shapes”, with equivalent or better results.
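As an illustration of the mzMatch step, the sketch below matches one feature m/z against a small compound list and builds the mzMatches and mzMatchError values. It is a simplified Python example, not Metaboseek's implementation: the compound ids and masses are invented, the "mz" key of the compound list, the 5 ppm default and the sign convention of the error (feature minus list value) are assumptions.

```python
def ppm_error(feature_mz, list_mz):
    """ppm difference between the feature m/z and the compound list m/z."""
    return (feature_mz - list_mz) / list_mz * 1e6


def mz_match(feature_mz, compounds, ppm_tol=5.0):
    """Collect all compound list entries within ppm_tol of the feature m/z.

    Multiple hits are joined with "|", as described for the mzMatches and
    mzMatchError columns.
    """
    hits = [
        (c["id"], ppm_error(feature_mz, c["mz"]))
        for c in compounds
        if abs(ppm_error(feature_mz, c["mz"])) <= ppm_tol
    ]
    return {
        "mzMatches": "|".join(cid for cid, _ in hits),
        "mzMatchError": "|".join(f"{err:.2f}" for _, err in hits),
    }


# Invented compound list entries
compounds = [
    {"id": "cmpd_A", "mz": 180.0634},
    {"id": "cmpd_B", "mz": 180.0640},
    {"id": "cmpd_C", "mz": 181.0707},
]
result = mz_match(180.0634, compounds)
print(result)
```

Two hits for a single feature, as in this example, show why MS1-based matches should be treated as ambiguous.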

3.4.4.3.3 Advanced Analysis

Get intensities: For each molecular feature, an EIC is generated across all MS files currently loaded in the MS data layout. The retention time boundaries of the EIC can be set to a number of seconds around the feature's retention time (rt) or around its peak boundaries as reported in the rtmin and rtmax columns of the Feature Table. If retention time correction information is used, the EIC retention time window is moved accordingly for each file. Intensities within this EIC range are averaged and reported for each file. Alternatively, peak areas can be calculated instead of the average intensities, leading to results that are more easily comparable to xcms-based intensities.
Find MS2 scans: Find MS2 scans corresponding to each feature in the Feature Table. Allows setting of m/z and RT tolerances, will add a column to the Feature Table with information about the scans. This column is used by the MS2 browser Module to identify feature-specific spectra for MS2 networking.
MS2 patterns: Allows you to search for combinations of MS2 fragment peaks in all loaded MS data files.
Labelfinder: Find stable isotope labeled compounds in datasets containing labeled and unlabeled samples.

For the Labelfinder, follow these steps:

Run two xcms analyses independently for the labeled and the unlabeled samples.
Load the results from both analyses into the Metaboseek session (potentially use the renaming functionality in the Feature Table box to keep track of which results come from the labeled and unlabeled samples).
Make sure to also load all MS files into the session, for both labeled and unlabeled samples.
Select the unlabeled Feature Table as active table in the Feature Table box
Open the Labelfinder dialog and select the labeled sample feature table.
Read the tooltips on the settings for explanations on the individual settings. You can deselect samples from both the labeled and unlabeled feature tables if necessary
Press Go to start the Labelfinder analysis. This will generate a new Feature Table with likely labeled compounds using the selected name (by default, it has “Labelfinder_” as a prefix). The unlabeled features will be reported in the resulting table.
To browse the results, you can add the label m/z of interest to the Options -> Mass shifts. This will allow you to see overlays of EICs for the labeled and unlabeled compounds. Note that you may have to manually load additional raw files (e.g. those for the labeled samples) to display all relevant information.

Details on the Labelfinder algorithm

The findLabels() function compares two Feature Tables with each other, assuming that one of them contains an enrichment of labeled compounds.

In a first step, featlistCompare() is used to identify entries in the reference (unlabeled) Feature Table which have a corresponding, labeled feature in the comparison (labeled) Feature Table (m/z in comparison Feature Table should be within tolerance of reference m/z + expected label and also within retention time tolerance).

Each entry from the reference Feature Table (dubbed I1S1, for Isotopologue 1, Sample Group 1) can have multiple matches in each of these categories:

1. m/z + label match in reference table (I2S1)
2. m/z match in comparison table (I1S2)
3. m/z + label match in comparison table (I2S2)

In each category, only the match closest in retention time to I1S1 is kept for further processing. Intensities are re-extracted for all matched peaks (I1S1, I2S1, I1S2, I2S2), using the m/z values identified for I1S1 (for I1S1 and I1S2) and I2S2 (for I2S1 and I2S2), and the rt values for I1S1 (for I1S1 and I2S1) and I2S2 (for I1S2 and I2S2). The extracted intensities are used to calculate the mean intensity across the unlabeled (S1) and labeled (S2) samples for both isotopologues.

Key filter criteria that are user-controlled are the minimum ratio of I1S1/I2S1 (because a high ratio is expected in the unlabeled sample S1 where the unlabeled compound I1 is expected to be more abundant than the labeled compound) and the maximum ratio of I1S2/I2S2 (where a low value is indicative of the label being enriched). The Features from the reference Feature Table which meet the filter criteria are then exported to a new Feature Table that contains intensity information for I1S1, I2S1, I1S2 and I2S2. The reported m/z and rt values are directly carried over from the original reference Feature Table.
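The two central ideas of this description, picking the match closest in retention time and filtering on the two intensity ratios, can be sketched in Python as below. This is a simplified illustration under stated assumptions and not the findLabels() or featlistCompare() code: the function names, parameter names, tolerances and thresholds are hypothetical, and the label shift in the example is the 13C-12C mass difference.

```python
from statistics import mean


def closest_label_match(ref, candidates, label_shift, ppm_tol, rt_tol):
    """Find the candidate at ref m/z + label shift that is closest in RT.

    ref and candidates are dicts with "mz" and "rt" keys (hypothetical layout).
    Returns None if no candidate is within both tolerances.
    """
    target = ref["mz"] + label_shift
    matches = [
        c for c in candidates
        if abs(c["mz"] - target) / target * 1e6 <= ppm_tol
        and abs(c["rt"] - ref["rt"]) <= rt_tol
    ]
    return min(matches, key=lambda c: abs(c["rt"] - ref["rt"]), default=None)


def ratio(numerator, denominator):
    """Ratio of mean intensities; infinite if the denominator mean is zero."""
    d = mean(denominator)
    return mean(numerator) / d if d > 0 else float("inf")


def passes_label_filter(i1s1, i2s1, i1s2, i2s2, min_ratio_s1, max_ratio_s2):
    """High I1/I2 ratio in unlabeled samples, low I1/I2 ratio in labeled ones."""
    return ratio(i1s1, i2s1) >= min_ratio_s1 and ratio(i1s2, i2s2) <= max_ratio_s2


# Made-up example values
ref = {"mz": 200.1000, "rt": 300.0}
candidates = [
    {"mz": 201.10335, "rt": 310.0},
    {"mz": 201.10340, "rt": 302.0},
    {"mz": 201.20000, "rt": 300.0},  # m/z outside tolerance
]
match = closest_label_match(ref, candidates, label_shift=1.00335, ppm_tol=5.0, rt_tol=15.0)

keep = passes_label_filter(
    i1s1=[1000.0, 1200.0], i2s1=[50.0, 60.0],    # unlabeled samples: ratio 20
    i1s2=[300.0, 500.0], i2s2=[400.0, 400.0],    # labeled samples: ratio 1
    min_ratio_s1=10.0, max_ratio_s2=2.0,
)
print(match, keep)
```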

Find peaks: For each m/z value in the feature table, an EIC for the full retention time range is generated, and a simple peak detection algorithm is applied to identify maxima that stand out from background noise.

Details on the peak detection algorithm

The peakDetect() function uses a modified version of an algorithm presented by Ma et al. [24], as follows.

For the global noise level, let $N$ be the number of EIC data points and $S_{i}$ the intensity value of the $i^{th}$ data point. $K$ is a user-definable variable.

$GlobalNoiseThreshold = (GlobalMaximum + GlobalAverage)/100 + K * Deviation$

where:

$GlobalAverage = \displaystyle \frac{\sum_{i=1}^N|S_{i}|}{N}$; $Deviation = \displaystyle \frac{\sum_{i=1}^N|S_{i} - GlobalAverage|}{N}$

In addition to the global noise threshold, a local noise threshold is calculated for each data point $S_{i}$ in the EIC, using a similar equation limited to a small retention time window around $S_{i}$. Let $n$ be the number of scans to consider for local noise level calculation in each direction, and $noise_{i}$ the local noise level for a data point in the EIC.

$noise_{i} = (LocalMaximum + LocalAverage)/2 + K * LocalDeviation$

$LocalAverage = \displaystyle \frac{\sum_{j=i-n}^{i+n}|S_{j}|}{2n + 1}$; $LocalDeviation = \displaystyle \frac{\sum_{j=i-n}^{i+n}|S_{j} - LocalAverage|}{2n + 1}$

In a first step, all local maxima and their adjacent minima in an EIC are detected, and peak boundaries are defined by the two minima surrounding a maximum. Peaks are selected if their maximum is above both the local noise level at its position in the EIC and the global noise level. If two peaks are adjacent, and the local minimum that separates them is at least 1/3 the intensity of either peak maximum, these two peaks are merged. Additional filters include selection for peaks spanning at least a given number of scans, and a factor by which a peak maximum has to be above the average intensity inside the peak boundaries.

Peaks are merged between files by first matching peaks with maxima within a specified retention time window. The peak boundary and maximum position are then calculated from the weighted average boundary and maximum positions of all peaks that are matched, weighted by the maximum intensity of each peak.
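The noise thresholds defined above can be translated into a short Python sketch that flags local maxima exceeding both the global and the local noise level. This is not the peakDetect() source code; it only implements the two threshold formulas as written, omits peak merging and the additional filters, and assumes that the local window is simply truncated at the ends of the EIC.

```python
def global_noise_threshold(eic, k):
    """(GlobalMaximum + GlobalAverage)/100 + K * Deviation, as defined above."""
    n = len(eic)
    average = sum(abs(s) for s in eic) / n
    deviation = sum(abs(s - average) for s in eic) / n
    return (max(eic) + average) / 100 + k * deviation


def local_noise(eic, i, n, k):
    """(LocalMaximum + LocalAverage)/2 + K * LocalDeviation around point i.

    Assumption: near the EIC ends the window is truncated and the sums are
    divided by the actual window length instead of 2n + 1.
    """
    window = eic[max(0, i - n): i + n + 1]
    average = sum(abs(s) for s in window) / len(window)
    deviation = sum(abs(s - average) for s in window) / len(window)
    return (max(window) + average) / 2 + k * deviation


def detect_maxima(eic, n, k):
    """Indices of local maxima above both the global and the local noise level."""
    threshold = global_noise_threshold(eic, k)
    peaks = []
    for i in range(1, len(eic) - 1):
        is_local_max = eic[i] > eic[i - 1] and eic[i] >= eic[i + 1]
        if is_local_max and eic[i] > threshold and eic[i] > local_noise(eic, i, n, k):
            peaks.append(i)
    return peaks


# Made-up EIC: a small bump at index 1 and a clear peak at index 6
eic = [10, 14, 10, 10, 10, 100, 500, 100, 10, 10, 10, 10, 10]
peaks = detect_maxima(eic, n=3, k=1.0)
print(peaks)
```

In this example the bump at index 1 is a local maximum but stays below the global noise threshold, so only the peak at index 6 is reported.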

Calculate m/z: Allows you to calculate m/z values from molecular formulas that are written out in a column of the Feature Table (without charge).

3.4.4.4 Regroup Table

Regroup Table

This tab allows you to redefine the columns containing intensity values and how they are grouped.

3.4.5 Navigating Plots

Many plots in Metaboseek are interactive and allow you to get more information by selecting the elements they display. Mass spectra, some EIC plots and the network module plot are interactive.

To zoom in, drag your mouse while holding the left mouse button. A selection square will appear, and you can double-click to zoom in. To zoom out, double-click on the plot without selecting anything. NOTE: Double-clicks currently do not work on some computers, so you can alternatively click while holding the CTRL key to zoom in or out. In the Molecular Network view, hold the Z key while clicking to zoom out instead.

To highlight a peak in a spectrum, or to select a time point in an EIC or a subnetwork or node in a network, hold the left SHIFT key and click on your data point of choice. Some plots allow export of the current view in .pdf or text format. In spectra, the selected peak is highlighted, and when you mouse over other peaks, you can see the mass difference to the highlighted peak. You can also link the peak selection to the Molecular formula prediction Tab in the Options box to get a list of possible molecular formulas for it.

3.5 XCMS Analysis

This section will help you to set up an xcms analysis in Metaboseek in order to identify LC/MS features that are differential between sets of data files. This can, for instance, be useful to assess the impact of a mutation on the metabolome of an organism or to identify compounds associated with the activity of an enzyme.

Running an xcms analysis - a description of the highlighted steps is below.

Select a folder with MS data files. All files with supported file extensions in the selected folders and all its subfolders will be listed, so it makes sense to pre-sort your files in a reasonable folder structure:

All files should be acquired under comparable conditions, especially with the same polarity. Differences in LC gradient or general composition (e.g. through widely different extraction methods, or comparing samples and blanks) can also make it difficult to apply retention time correction and find differential features.

There are 7 tables with xcms settings you can change here. Navigate through them with the drop down menu highlighted as (2.). A short description for each parameter is given when you hover over the table entries. You can use the default settings and proceed to step 3 without changing any of them. The default is for highly similar LC/MS runs acquired at high resolution and high accuracy (< 5 ppm), and will find relatively small peaks (even if they only occur in a single replicate). While these settings allow for detection of small peaks, the processing time is relatively long and many false-positives (non-peaks) will also end up in the feature table.

Details about the xcms settings

- Peak Detection: set parameters for the xcms::findChromPeaks() function
- Peak Filling: These settings specify how to look for intensities for molecular features in all files, even in files where no peak was detected for that feature in the initial Peak Detection step. The xcms peak filling parameters will be used if you select the “Fill peaks with xcms…” output option below. Technically, you are setting parameters for the xcms::fillChromPeaks() function. You can also set parameters for the Metaboseek peak intensity functions here, which will extract intensities for all molecular features in all files.
- Feature Grouping: set the parameters for how xcms will group peaks from different files together (also known as correspondence analysis) so that intensities can be compared across files. These parameters are used for a call to xcms::groupChromPeaks with xcms::PeakDensityParam.
- Output files: select which output files you want to get. The values in this table can more conveniently be set in the user interface below the tables (“Output selection” section).
- CAMERA settings: Settings for isotope peak and adduct annotation with the CAMERA package. Metaboseek sequentially runs the CAMERA package functions xsAnnotate, groupFWHM, groupCorr, findIsotopes and findAdducts which are described in the CAMERA documentation.
- RT correction: Settings for retention time correction with xcms::adjustRtime, using either the Obiwarp or the peakGroups method. If Obiwarp is selected and fails, the xcms runner script will attempt to run peakGroups with the given parameters.

Start the analysis with a click on the “Start analysis!” button.

Once the analysis is running, Metaboseek will generate a Project Folder for you, containing settings and results from your xcms run. You can load the Project Folder back into Metaboseek to keep all your analysis results in one place. See Project Folders for more information.

You can save settings as a .zip file (on Windows computers, 7-zip or other software that provides the zip command on the command line must be installed), or load a .zip file with settings from a previous run.

Note that loading settings will override your selection of MS data files. If you want to apply the settings to a new set of data files, load the settings first and then select a folder (step 1).

References

Dührkop, Kai, Markus Fleischauer, Marcus Ludwig, Alexander A. Aksenov, Alexey V. Melnik, Marvin Meusel, Pieter C. Dorrestein, Juho Rousu, and Sebastian Böcker. 2019. “SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.” Nature Methods 16 (4): 299–302. https://doi.org/10.1038/s41592-019-0344-8.

Getting started with Metaboseek

Maximilian Helf

April 2020
