## [1] "Metaboseek version: 0.9.9.0"
Metaboseek offers a graphical user interface to set up data analysis with the xcms package to detect and align molecular features from LC/MS data across multiple samples. You can then load xcms results into the app as a “Feature Table” (using xcms and MSnbase packages, mzR-based) and run statistical analyses to identify molecular features of interest.
Filter the xcms results, view and export chromatograms and mass spectra for molecular features of interest.
Generate and view molecular networks based on tandem-MS spectrum similarity between molecular features (using MassTools and igraph packages).
Annotate fragments in tandem-MS spectra with SIRIUS.
This document describes all UI elements in Metaboseek and is meant to be a comprehensive user manual.
Recommended minimal system requirements:
We recommend computers with a monitor with at least full HD (1920 x 1080 pixels) resolution. You can use the zoom function of your web browser to scale the interface to your liking.
All files are loaded into memory, so that browsing will be very quick: It is easy to look at extracted ion chromatograms (EICs) for many MS features of interest across dozens of files within a fraction of a second. However, the initial loading of the data will take some time, and you may experience issues if you load many files at a time. We strongly recommend using centroided data files, as they will have a smaller memory footprint. Loading 50 data files from 20-minute high resolution LC/MS data acquisition should not be a problem on a computer with 16 GB of RAM.
If installed from an R session, Metaboseek requires Java for full functionality, in particular for molecular structure plotting in the SIRIUS module. Java is also a requirement for installing SIRIUS itself. Make sure to install 64-bit Java if you are running 64-bit R (which is most likely), or 32-bit Java if you are running 32-bit R. If you go to java.com and follow the download buttons there, you will by default be sent to the version that corresponds to your browser (32- or 64-bit), which may or may not be the version you need. Get the appropriate Java version from this page: https://www.java.com/en/download/manual.jsp.
The installer version of Metaboseek has one limitation: it does not plot molecular structures for predicted structures in the SIRIUS module. This is a compromise made so that this installation of Metaboseek does not require Java to be installed on your system.
The .zip version of Metaboseek has one limitation: it does not plot molecular structures for predicted structures in the SIRIUS module. This is a compromise made so that this installation of Metaboseek does not require Java to be installed on your system.
Consider getting the Metaboseek Docker image, or follow these steps to install Metaboseek:
Terminal window: xcode-select --install
source("http://metaboseek.com/files/install_Metaboseek.R")
Metaboseek::runMseek()
remove.packages('rcdk')
Then try to run Metaboseek again.
As they put it on their website, “Docker provides a way to run applications securely isolated in a container, packaged with all its dependencies and libraries.” This is also a convenient way to reproduce analysis results that were generated with a particular version of Metaboseek. Once you have set up Docker on your computer, this is the easiest and most reproducible way to get a fully functional Metaboseek, including SIRIUS integration.
docker pull mjhelf/metaboseek
The metaboseek Docker image is based on the bioconductor/release_metabolomics2 image.
Running this command will execute the latest version of the Metaboseek container (and download it if not already available on your computer):
docker run -d -v HOSTFOLDER:/home/shiny/data -p 3840:80 -e PASSWORD=YOURPASSWORD mjhelf/metaboseek
Let's take a look at some key settings here:
* HOSTFOLDER should be the path of a folder on your computer that contains all data that you want to analyze with Metaboseek, for example if used like this:
docker run -d -v /home/user123:/home/shiny/data -p 3840:80 -e PASSWORD=YOURPASSWORD mjhelf/metaboseek
All contents of the /home/user123 folder will be accessible in Metaboseek.
NOTE: The apps hosted inside the container will be accessible from the internet (for anyone connecting to your computer’s IP address and the correct port number). By default, they will be protected by HTTP basic authentication, but that is not 100% secure. Once authenticated, the apps allow seeing the data structure of the specified HOSTFOLDER, and it is possible to download arbitrary .csv files and MS data from that folder. We are not liable for any data exposure to unauthorized parties or other damages.
* -p 3840:80 means that port 80 from the container will be accessible as port 3840 on the host computer.
* -e PASSWORD=YOURPASSWORD: this password has to be set. It can be used if you want to access rstudio inside the container, and is necessary to access the Metaboseek apps.
* You can disable authentication for the apps by adding -e PROTECTED=false to this command, for instance to provide convenient public access to your data. WARNING: This makes the apps accessible from the internet (see note above).
docker ps
In your web browser, go to localhost:3840, where the port number after the colon may differ based on your -p setting (see above). By default, you will have to log in, with the username metaboseek and the password you specified (YOURPASSWORD in our example). This will open a website, hosted inside the metaboseek container. Select the app you want to run and analyze your data!
If you have installed R (and the devtools package) already, you can install Metaboseek like this:
devtools::install_github("mjhelf/MassTools")
devtools::install_github("mjhelf/Metaboseek")
If you want to make sure you get all the required packages, run the install script with this line:
source("http://metaboseek.com/files/install_Metaboseek.R")
If you have trouble installing Metaboseek and want to just try it out with an example dataset, use the web version.
With Metaboseek, you can quickly visualize data from batches of high-resolution LC/MS data files and find differences between groups of samples. It is not necessary to do any analysis before looking at your data, but a typical workflow starts with a data analysis step:
Then, you can use Metaboseek to browse the data, find molecular features of interest, predict the molecular formula and make structure predictions based on MS2 data.
Metaboseek is structured into two major sections, the Data Explorer section for visualization and statistical analysis tasks, and the XCMS analysis section to identify LC/MS features in MS data files. You can switch between these sections with the navigation menu on the left of the screen.
The Start page provides you with information about the newest version of Metaboseek, and also allows you to load data into Metaboseek. You can also click on the Load icon on the left side of the navigation bar at the top of the page to get the same set of options for loading data:
You can load any .csv or .mskFT file into Metaboseek. You can then go to the “Regroup Table” tab to specify or change the columns that contain intensity values. Feature Tables contain the results from feature detection with xcms, along with results from statistical analysis. If you load an .mskFT file, important metadata, such as processing history and sample grouping, are loaded along with the result table. If you have loaded a project folder into the current session, there is a convenient option to select all compatible table files from the project folder as well.
All files with supported file extensions in the selected folders and all their subfolders can be imported, either by selecting files individually (selecting multiple files at a time is possible), or by importing an entire folder that contains MS data (will import all compatible files from all subfolders, too). To save time, it makes sense to pre-sort your files in a reasonable folder structure (e.g. separate positive mode data from negative mode so you don’t get both kinds when selecting a folder to load into Metaboseek). Loading MS data files after you have already loaded a project folder allows you to visually inspect files that you had excluded from the xcms analysis, such as blanks.
When you run xcms through Metaboseek, the program generates a project folder that contains the results from that xcms analysis run, and all settings that were used in it. In addition, all output feature tables you requested will be saved in the project folder during the xcms run. You can load this result folder into Metaboseek, making it easier to keep all analysis results related to this xcms run in one place.
You can either select a project folder anywhere on your computer, or select a project folder from the recent project selection window that lists the most recently used project folders (load the selected folder with the Load Recent button). If you choose to load a project folder, all MS data files from the xcms run will be loaded and sample grouping information from the xcms analysis will be applied. Metaboseek will ask you which feature table you want to load from the project folder. If you select an .mskFT file (recommended) instead of the corresponding .csv file, you will benefit from the additional information embedded in these files. .csv files are primarily there for export and viewing in other tools (and even Microsoft Excel), while .mskFT files are designed to be loaded back into Metaboseek. The advantage of .mskFT files is that they contain the complete processing history (including settings used for the xcms run, CAMERA analysis and post-processing). .mskFT files are technically .RDS files containing an MseekFT object and can be loaded into any R session with the readRDS() function.
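Because .mskFT files are ordinary .RDS files, the save and load round trip can be illustrated with base R alone. The sketch below uses a plain list as a stand-in for an MseekFT object (its real structure is not described here) and a temporary file; the object contents are hypothetical.

```r
# Stand-in for an MseekFT object: a plain list (hypothetical structure,
# used only to illustrate the .RDS round trip).
ft <- list(
  df = data.frame(mz = c(181.0707, 203.0526), rt = c(312.4, 318.9)),
  history = c("xcms run", "basic analysis")
)

# An .mskFT file is technically an .RDS file, so saveRDS()/readRDS() apply.
path <- tempfile(fileext = ".mskFT")
saveRDS(ft, path)

restored <- readRDS(path)
str(restored, max.level = 1)
```

For a file written by Metaboseek, the same readRDS() call on its path should return the stored MseekFT object.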
You can select “example_projectfolder” from the “Recent projects” selection box and click on “Load recent”. Metaboseek will ask you which table you would like to load into the session along with the MS data that is associated with the example project folder.
You can load a Metaboseek session that you saved previously in an .msks file. This will restore all feature tables and MS data files you had loaded into that session along with many of the layout settings. Note: This will currently only work if the MS data file locations have not changed from the paths used in the old session. Some aspects of the session will not be restored (notably, molecular networks are not saved in the session file).
Metaboseek uses the MSnbase and xcms packages to load MS data files of the following formats:
Note: Data needs to be centroided.
Feature Tables can be loaded in these formats:
A column mz with m/z values and a column rt with retention time values in seconds.
At the heart of Metaboseek is the interaction between data visualization in the “Data viewer” box, and a table of LC/MS data features in the Feature table box.
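As an illustration of the mz and rt columns mentioned above for Feature Tables, the following sketch writes a minimal .csv table with base R. Only the mz and rt column names come from this manual; the sample column names, values and file name are made up for the example.

```r
# Minimal feature table: one row per molecular feature.
# 'mz' holds m/z values and 'rt' holds retention times in seconds;
# the two intensity columns and their names are hypothetical.
feature_table <- data.frame(
  mz = c(181.0707, 203.0526, 365.1054),
  rt = c(5.21, 5.31, 7.80) * 60,  # minutes converted to seconds
  sample_A = c(1.2e6, 3.4e5, 0),
  sample_B = c(9.8e5, 2.9e5, 4.1e4)
)

csv_path <- file.path(tempdir(), "example_feature_table.csv")
write.csv(feature_table, csv_path, row.names = FALSE)

# Read it back to confirm the required columns are present and numeric.
check <- read.csv(csv_path)
stopifnot(all(c("mz", "rt") %in% names(check)),
          is.numeric(check$mz), is.numeric(check$rt))
```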
This box provides a number of optional functions, including setting up SIRIUS, calculating molecular formulas and controlling the appearance of extracted ion chromatograms (EICs) in the Data Viewer box.
The settings here are passed on to the SIRIUS executable. Please have a look at the SIRIUS documentation to learn more about them.
Database: Restrict SIRIUS molecular structures and FingerID searches to a database. Search PubChem for most results (many of which may not be relevant in a biological context).
Ion: Select the ion type. Make sure to select the correct charge (positive or negative), and consider specifying the adduct type only if you are certain of it.
Get FingerID: If this box is checked, SIRIUS will also run a FingerID query in the selected database, returning a list of molecules matching your spectrum. If you unselect this setting, only the SIRIUS fragmentation tree is generated.
Use MS1 spectrum: Option to use MS1 level information for the search. Will use the MS1 scan closest to the retention time of the current MS2 spectrum from the MS2 browser. If multiple MS2 spectra are selected, only one MS1 scan gets used (this should be the one corresponding to the first scan in the MS2 scan list when sorting is not used).
Instrument: Select the type of instrument that was used for data acquisition.
Allow elements: Enter the element symbols you want to include in the search, without spaces. To limit the maximum number of an element per molecule, add a number in brackets after the element symbol (e.g. CHNOP[5]S[5]).
Sirius Folder: Path to a SIRIUS executable
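To make the “Allow elements” syntax concrete, here is a small sketch that splits such a string into element symbols and optional per-element limits. It only illustrates how the notation reads; parse_allowed_elements() is a hypothetical helper and is not how Metaboseek or SIRIUS parse the field.

```r
# Hypothetical helper: split an "Allow elements" string such as
# "CHNOP[5]S[5]" into element symbols and optional maximum counts.
parse_allowed_elements <- function(x) {
  # One token = element symbol, optionally followed by "[number]".
  tokens <- regmatches(x, gregexpr("[A-Z][a-z]?(\\[[0-9]+\\])?", x))[[1]]
  has_limit <- grepl("[", tokens, fixed = TRUE)
  data.frame(
    element = sub("\\[.*", "", tokens),
    # NA means no upper limit was given for that element.
    max_count = ifelse(has_limit,
                       as.integer(gsub("[^0-9]", "", tokens)),
                       NA_integer_),
    stringsAsFactors = FALSE
  )
}

allowed <- parse_allowed_elements("CHNOP[5]S[5]")
print(allowed)
```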
In this Tab, you can calculate molecular formulas that match the currently selected feature’s m/z value. All settings are passed to the calcMF function from the MassTools package. Molecular formulas are generated with the Rdisop package and can then be filtered using the rules proposed by the “Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry” ((???)), as well as some additional filters. For detailed documentation, click here.
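The idea of matching molecular formulas to an m/z value can be sketched with a ppm error calculation. This is not the calcMF or Rdisop implementation: the candidate formulas are written out by hand, only [M+H]+ ions are considered, and the measured m/z and the 5 ppm tolerance are example values.

```r
# Monoisotopic masses (Da) of a few elements and of the proton.
mono <- c(C = 12.000000, H = 1.00782503, N = 14.0030740, O = 15.9949146)
proton <- 1.00727646

# Theoretical m/z of a singly charged [M+H]+ ion for a named count vector.
mz_protonated <- function(counts) {
  sum(mono[names(counts)] * counts) + proton
}

# Relative mass error in parts per million.
ppm_error <- function(measured, theoretical) {
  (measured - theoretical) / theoretical * 1e6
}

# Hand-picked candidate formulas (hypothetical example).
candidates <- list(
  C6H12O6 = c(C = 6, H = 12, O = 6),
  C7H16O5 = c(C = 7, H = 16, O = 5)
)

measured_mz <- 181.0712
theo <- vapply(candidates, mz_protonated, numeric(1))
err <- ppm_error(measured_mz, theo)
within_tol <- abs(err) <= 5  # keep candidates within 5 ppm

print(data.frame(theoretical = theo, ppm = err, match = within_tol))
```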
feature table), the m/z value is taken from the current selection in the Feature Table. If spectrum is selected, you can select a peak of interest from the MS1 Spectrum in the Data viewer -> Grouped EICs tab or the MS2 Spectrum from the Data viewer -> MS2 Browser -> Compare MS2 spectra. To feed peaks selected in the MS2 spectrum shown next to the scan table in Data viewer -> MS2 Browser. Select peaks from spectra by pressing Shift + click. Selected peaks will be marked in orange. Selecting custom as source will use the m/z value that you specify in the custom m/z field.
Feature Table). This may slow down Metaboseek when looking for high mass molecules.
predicted_MFs to the Feature Table.
If you load a Project Folder from a finished xcms job that included retention time correction, you can review the effect of retention time correction across your files here. Retention time is plotted on the x-axis for each file, and deviation from the uncorrected retention time is shown on the y-axis. Very large RT deviations or very different behavior between groups of samples can point to problems with your chromatography setup or retention time correction settings.
You can define mass shifts that will be shown in the Data viewer -> Grouped EICs window as additional EIC traces (in dashed lines). Mol_formula and charge columns are currently ignored. Click Update mass shifts to update the EIC view and to save your edits to the mass shift table (will be restored in your next session).
This allows control of various formatting options for the EICs in Data viewer -> Grouped EICs as well as Data viewer -> MS2 Browser -> Feature Report, and in part also for Data viewer -> MS Browser
Feature Table for generation of extracted ion chromatograms (EICs).
Feature Table but will use its retention time information.
Feature Table rt column.
Data viewer -> Regroup MS data.
Feature Table rt column
Data viewer -> Regroup MS data. Coloring by Mass shift not implemented yet.
This box provides plots of data that is selected in the Feature Table box. Different kinds of data plots and browsing options are available, mostly for extracted ion chromatograms (EICs) and spectrum plots, but also bar plots, Venn diagrams and plots from principal component analysis.
If you have loaded MS data files which contain MS2 (tandem-MS) data, you can go to the MS2 browser for a variety of data analysis options specifically for MS2 data. The MS2 Browser box is the most complex tab in Metaboseek, and its user interface is divided into three parts: The two sub-tabs Feature report and Compare MS2, and a bottom part that is always visible, independent of which sub-tab is selected. This General MS2 Browser part includes the SIRIUS module and the list of MS2 scans associated with the selected row in the Feature Table.
You can switch between two views in the MS2 browser:
The Feature Report sub-tab is designed to show all information about a molecular feature at one glance and make it exportable as a single page .pdf document. This includes Grouped EICs at the top (see the “Grouped EICs” section for a description of the controls). If MS2 data is available for a molecular feature selected in the Feature Table, MS1 (left) and MS2 (right) spectra are also shown below the EICs.
In this sub-tab, you can compare MS2 spectra with each other. On the left side, you see space for the molecular network viewer. MS/MS spectra are shown on the right side.
Feature Table. This step is identical to the “Find MS2 scans” dialog described in the Advanced Analysis section, and you can skip it if you already ran this analysis step on the current Feature Table and want to use the same MS2 association parameters (m/z and RT tolerance, etc.) as before.
MS2 scans are averaged for each molecular feature, and then the averaged spectra are compared with each other. In the next step, you can select the parameters used for spectrum similarity calculations. Only peaks that match within the m/z and ppm tolerance between two spectra will be used to calculate the similarity score, and peaks at an intensity below a set percentage of the maximum peak intensity in a scan (Noise level in %) will be excluded. You can also ignore small fragments, an experimental feature that will exclude peaks with m/z < 100 from the spectra, which can be used for instance to exclude phosphate peaks which can be very dominant in negative mode data. The similarity score will be considered 0 if fewer than min. peaks match between two spectra.
The intensities of the matching peaks for each spectrum are extracted, and the similarity score is calculated as the cosine between these intensity vectors, i.e. similar compounds are expected to show a similar relative intensity distribution across the matching peaks. If you select Use parent masses, neutral losses are also used for peak matching, which will increase the number of matching peaks for compounds that have different parent masses (e.g. because of a methyl group or adduct difference). This step can take minutes or even hours, depending on the number of molecular features with MS2 scans that are compared to each other. For more details, look at the documentation for the makeEdges() and network1() functions from the MassTools package.
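The scoring described above (noise filter, peak matching within a ppm tolerance, minimum number of matching peaks, cosine of the matched intensities) can be sketched in base R as follows. This is a simplified illustration and not the MassTools implementation: ms2_cosine() and its parameter names are hypothetical, neutral losses are not considered, and each peak is simply paired with its nearest neighbor in the other spectrum.

```r
# Spectra as two-column data frames: mz and intensity (toy values).
spec_a <- data.frame(mz = c(85.0284, 127.0390, 145.0495, 163.0601),
                     intensity = c(20, 100, 60, 1))
spec_b <- data.frame(mz = c(85.0285, 127.0391, 145.0494, 201.0400),
                     intensity = c(25, 90, 70, 40))

ms2_cosine <- function(a, b, ppm = 10, noise_pct = 2, min_peaks = 3) {
  # Drop peaks below a percentage of the base peak (noise level in %).
  a <- a[a$intensity >= max(a$intensity) * noise_pct / 100, ]
  b <- b[b$intensity >= max(b$intensity) * noise_pct / 100, ]

  # For each peak in a, take the closest peak in b and keep the pair
  # only if it lies within the ppm tolerance.
  nearest <- vapply(a$mz, function(m) which.min(abs(b$mz - m)), integer(1))
  ok <- abs(b$mz[nearest] - a$mz) / a$mz * 1e6 <= ppm

  # Fewer matches than min_peaks: the score is defined as 0.
  if (sum(ok) < min_peaks) return(0)

  # Cosine between the intensity vectors of the matched peaks.
  x <- a$intensity[ok]
  y <- b$intensity[nearest[ok]]
  sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

score <- ms2_cosine(spec_a, spec_b)
print(score)
```

In this toy example three peaks match, so the score is computed from three intensity pairs; requiring four matching peaks (min_peaks = 4) would return 0.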
After calculation of the similarity scores, you can give your new network a name and select a threshold for which comparisons to keep (above a given Cosine threshold). Stricter (higher) values generate less data and less complicated networks, generally with fewer network clusters. A less strict (lower) Cosine threshold will keep more of the comparison information, which you can remove later with the Simplify network button:
You can save networks in either the .graphML or .mskg format.
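The effect of the Cosine threshold on a network can be illustrated on a small edge list. The data frame layout, the feature names and the keep_edges() helper are hypothetical and only show the principle of keeping comparisons above a threshold.

```r
# Toy edge list: one row per pairwise spectrum comparison.
edges <- data.frame(
  from = c("F1", "F1", "F2", "F3"),
  to = c("F2", "F3", "F3", "F4"),
  cosine = c(0.95, 0.62, 0.81, 0.40),
  stringsAsFactors = FALSE
)

# Keep only comparisons above the given cosine threshold.
keep_edges <- function(edges, threshold) {
  edges[edges$cosine > threshold, , drop = FALSE]
}

strict <- keep_edges(edges, 0.8)   # fewer edges, simpler network
lenient <- keep_edges(edges, 0.5)  # more edges retained

print(nrow(strict))
print(nrow(lenient))
```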
SHIFT key. The view will now zoom in on the subgraph. You can use the control menu above the network to show node and edge labels of your choice (e.g. Parent m/z and m/z difference between nodes (“deltaMZ”)), and apply a coloring scheme (e.g. color by default groups). To select a node and display all MS/MS spectra associated with it, click on a node while holding the SHIFT key.
You can move nodes by dragging them with your mouse while holding the CTRL key (this helps make all labels visible in a dense network). Return to the network overview by double-clicking on the graph. If double-clicking does not work, you can also zoom out by clicking while holding the Z key.
The processing history for the current network can be viewed with the History button.
“Match Feature Table” is an experimental beta feature: You can map the current Feature Table on the currently active MS2 network, re-using the network layout. This is still in development and will change over time.
In the “MS2 spectra” box on the right, you can choose to keep a spectrum view - it will then not be refreshed when you select a new Feature table entry or network node. Instead, a new spectrum plot will show up below. You can show up to 5 spectrum views at the same time. By default, all peaks that occur in more than one of the shown spectra are highlighted in blue. You can disable this comparison with the Compare checkbox. You can also download the shown spectrum views in .pdf format by clicking Download spectra, or in .tsv format (Save as table).
Below the sub-tab selection, you can see these elements:
SIRIUS (Dührkop et al. (2019)) is a stand-alone software developed in the Boecker lab at the University of Jena that can use MS/MS data to predict the molecular formulas of fragment and parent ion peaks. It also offers an interface to CSI:FingerID to match fragmentation patterns with structure databases.
Information about completed SIRIUS analyses will show up here if available for the active molecular feature from the Feature Table.
MS2 data can be analyzed with SIRIUS from inside the Metaboseek app. All settings for SIRIUS can be found in the Options box. In the Sirius options, you first need to tell Metaboseek where the SIRIUS executable is located (“SIRIUS folder”). Metaboseek will generate a new folder there to store results from Sirius runs. NOTE: Make sure you have write access to the SIRIUS location.
To run Sirius, use the “Run SIRIUS” Button above the MS2 scan table. Make sure to select appropriate options in the Sirius options section at the top of the app Options box. The results can be accessed through Metaboseek as soon as a Sirius analysis run finishes by clicking “Show SIRIUS” in the Spectra list. Select items in the tables that show up to view fragmentation trees and proposed structures. Two buttons for SIRIUS are in the Spectra list section below: The Run SIRIUS button will use the currently selected spectra with the current Sirius options to run a SIRIUS analysis. This will typically take a few seconds. The Show SIRIUS button will show SIRIUS results for the selected MS2 spectra when available. The color of the button indicates if SIRIUS results are available (green), not available (red), or available with settings that differ from the current settings in Sirius options (yellow).
You can select molecular formulas from the SIRIUS result table on the left to display the corresponding fragmentation tree. The annotated fragments will also be highlighted in the Feature report subtab MS2 spectrum view. If you selected Get FingerID in the Sirius options, a list of candidate molecules will show up on the right side. Select one to view the molecular structure. NOTE: Viewing molecular structures requires installation of the rcdk package, which is not included in the Metaboseek Windows installer, and not automatically installed when installing Metaboseek from R.
Click on the Browse SIRIUS searches section to show a list of SIRIUS jobs. Select a job here to look at SIRIUS results independent from the current Feature Table selection.
When you select one (or multiple) entries in the Feature Table, Metaboseek will find any MS/MS scans that have a parent mass matching the selected Feature Table entry (e.g. within 5 ppm and 200 seconds, customizable). All MS/MS scans matching a selection (from a network or from the Feature Table) are shown in a table in the MS2 browser tab.
You can define the parent ion m/z tolerance (in ppm) and retention time window at the top, allowing you to only show MS2 scans that are within these tolerances from your selection in the Feature Table. You can also sort this table with the controls at the bottom of the table. An average spectrum of all scans shown in this scan table is displayed on the left. You can select single or multiple scans in the scan table to show the spectrum of only the selected scan(s). The MS2 scans selected here are also used and displayed by both the Feature Report and Compare MS2 sub-tabs.
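The association of MS2 scans with a Feature Table entry by parent m/z tolerance and retention time window can be sketched like this. The scan table layout and the find_ms2_scans() helper are hypothetical, and the retention time window is assumed here to apply in both directions around the feature's retention time.

```r
# Toy MS2 scan table: precursor m/z and retention time (seconds) per scan.
scans <- data.frame(
  scan = 1:5,
  precursor_mz = c(181.0707, 181.0709, 181.0790, 181.0706, 203.0526),
  rt = c(310, 330, 315, 620, 318)
)

# Hypothetical helper: keep scans whose precursor m/z is within 'ppm'
# of the feature m/z and whose rt is within 'rt_window' seconds of it.
find_ms2_scans <- function(scans, mz, rt, ppm = 5, rt_window = 200) {
  mz_ok <- abs(scans$precursor_mz - mz) / mz * 1e6 <= ppm
  rt_ok <- abs(scans$rt - rt) <= rt_window
  scans[mz_ok & rt_ok, , drop = FALSE]
}

hits <- find_ms2_scans(scans, mz = 181.0707, rt = 312)
print(hits)
```

Scan 3 is excluded by the m/z tolerance and scan 4 by the retention time window, which mirrors how tightening either tolerance shortens the scan table.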
Shows interactive plots with results from the principal component analysis (PCA) from the Feature Table Actions Analysis Options if available.
This module allows you to filter the Feature Table into up to three different groups and show the number of overlaps between the groups. You can define the grouping by applying up to three different filters to the current Feature Table. The filters work like in the Filter Table Tab.
In this tab, you can view the data in summary plots. The left side uses the intensity values from the feature table as input, while the right side allows you to plot arbitrary Feature Table columns against each other.
Here, you can select individual files to show their EICs for the selected feature or a custom m/z value. You can use SHIFT + click to select a data point to display the corresponding MS1 spectrum below. See the “Navigating plots” section for more information on how to interact with the spectrum and EIC plots.
You can display multiple independent EIC views at the same time. Each of them has these settings:
RT correction Tab in the Options box
Feature Table.
Other settings for the EIC plots, such as mass tolerance and color palette, can be changed in the EIC options in the Options box and will apply to all EIC plots in Metaboseek.
Similar to the MS Browser (see above), but enabling different layouts of grouped EICs. Some plotting parameters can be changed in the EIC options in the Options box, and some can be changed here directly:
RT correction Tab in the Options box
Grouped EICs plots. You can interact with the plot as described here.
Feature Table. Their content will be shown in the plots as subtitle.
Feature Table (after applying the active filters) with grouped EICs for one feature per page. IMPORTANT: The limit for number of features that can be exported to pdf this way is 1000. Make sure you filter your results accordingly before exporting, otherwise only the EIC plots for the first 1000 features get exported.
Regroup MS data Tab.
You can group the MS data independently from the grouping in the Feature Table. This grouping can be used to define color schemes or which files should be plotted together in Grouped EICs. It is possible to assign each file to two different groups to allow switching plot layouts using EIC options in the Options box. You can define multiple grouping schemes here with the ‘new Grouping’ and ‘Update Grouping’ buttons and switch between these schemes from the Grouped EICs Tab.
This box contains the most important element in the app: the Feature Table. Most plots in the Data Viewer will use this table as input to show you information that is related to the molecular feature that is defined in the selected row.
sort: switch sorting the table on or off
decreasing: sort in decreasing or increasing order
Sort by column: which column to sort by
page: select which page to show. You can change the number of items per page in the Global Options (Navigation bar at the top of the app)
Save Table: You can save the Feature Table in multiple formats, and either download it through your browser, or save it in an automatically generated subfolder of your project folder (if you are working with xcms results from a Metaboseek project folder). The recommended format is .mskFT, because it retains the processing history information for this Feature Table when you load it back into Metaboseek. For export to other software, use the .csv format. You can also generate inclusion or exclusion lists for Thermo instruments.
History: Shows you the processing history of the currently active Feature Table, including the processing history from the xcms run (if all steps were done in Metaboseek, and the Feature Table was loaded in .mskFT format):
Active Table: Select which Feature Table to display in the Feature Table box. You can load multiple Feature tables and switch between them here (e.g. tables filtered for different criteria)
Rename: Rename the currently active Feature Table (names of Feature Tables in the current session are displayed in the ‘Active Table’ selection box)
In this box you can run analyses on the currently selected Feature Table and filter it.
Some column names and name schemes are generated by the actions you can take in the Analyze Table Tab. You can use these columns to filter your Feature Table in the Filter Table Tab.
Column | Description | calculated by |
---|---|---|
{Group}__foldOverCtrl | Fold change of the mean intensity of G1 over mean intensity of the control group | Basic analysis |
{Group}__foldOverRest | Fold change of the mean intensity of G1 over mean intensities of all other samples outside of G1 | Basic analysis |
{Group}__meanInt | sample group mean intensity | Basic analysis |
{Group}__minFold | Fold change of the lowest intensity sample in G1 over the highest intensity sample of all other samples outside of G1 | Basic analysis |
{Group}__minFoldMean | Fold change of the mean intensity of G1 over the highest intensity sample of all other samples outside of G1 | Basic analysis |
{Group}__pval | p value between this group and all samples in all other groups, as calculated by stats::t.test() | t-test |
{Group}__pval_adj | p values, adjusted by the “bonferroni” method using stats::p.adjust() | t-test |
{Group}__sdev | coefficient of variation (relative standard deviation = sd/mean) within the group | t-test |
{IntensityColumn}__norm | Normalized intensity values for an intensity column, typically named {Sample}__norm or {Sample}__XIC__norm; by default : (1) replaces zeros by lowest value in entire intensity dataset (assuming it represents the detection limit), (2) intensity values of each column are adjusted so that each column has the mean intensity equal to the mean intensity of the entire dataset. | Normalize data |
{Sample} | Columns with intensities as reported by xcms | xcms script |
{Sample}__XIC | Columns with intensities as calculated by Metaboseek, can have a suffix other than "_XIC" if generated with the “Get intensities” button. | Get Intensities, xcms script |
ANOVA__pvalue | per-row one-way ANOVA between grouped columns of the feature table | anova |
best_minFold | The highest minFold value reported across all groups | Basic analysis |
best_minFoldCtrl | The highest minFoldCtrl value reported across all groups | Basic analysis |
best_minFoldMean | The highest minFoldMean value reported across all groups | Basic analysis |
cluster__clara | listing the cluster into which each feature falls after using cluster::clara() with the selected number of clusters | clara cluster |
massdefppm | Mass defect in ppm, calculation: ((mz-floor(mz))/mz)*1e6 | Basic analysis |
maxfold | Highest fold change between the mean intensities of any 2 sample groups | Basic analysis |
maxfoldover2 | Fold change between the mean intensities of topgroup over group with second highest mean intensity | Basic analysis |
maxint | Maximum intensity across all samples | Basic analysis |
MS2scans | Lists the MS2 scans found for each feature in the MS2 data loaded when the “Find MS2 scans” analysis was run | Find MS2 scans |
mzMatch_{variable} | All other columns from the compound list are added as well with the prefix mzMatch_. | mzMatch |
mzMatchError | ppm error, based on difference between mz in the feature table and the mz in the compound list. | mzMatch |
mzMatches | identity of the search hits in the selected compound lists, taken from the “id” column of the compound lists. If there are multiple hits across the selected compound lists, they will be separated by “|” within the mzMatches, mzMatchError and all mzMatch_{variable} columns. | mzMatch |
PCA__# | Principal component coordinates for each feature (components are numbered) | PCA features |
topgroup | Sample group with highest mean intensity | Basic analysis |
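Several of the column definitions in the table above can be written out directly. The sketch below recomputes a few of them for a toy table with two groups, following the wording of the descriptions; it is not the Metaboseek::analyzeFT code, and the sample names, group assignment and the use of a Welch t-test via stats::t.test() defaults are assumptions of this example.

```r
# Toy feature table: two groups (G1, G2) with two samples each.
ft <- data.frame(
  mz = c(181.0707, 365.1054),
  G1_a = c(1000, 50), G1_b = c(3000, 150),
  G2_a = c(100, 400), G2_b = c(300, 600)
)
groups <- list(G1 = c("G1_a", "G1_b"), G2 = c("G2_a", "G2_b"))

g1 <- as.matrix(ft[groups$G1])
rest <- as.matrix(ft[groups$G2])  # all samples outside of G1

# Group mean and fold changes, as described in the table.
ft$G1__meanInt <- rowMeans(g1)
ft$G1__foldOverRest <- rowMeans(g1) / rowMeans(rest)
ft$G1__minFold <- apply(g1, 1, min) / apply(rest, 1, max)
ft$G1__minFoldMean <- rowMeans(g1) / apply(rest, 1, max)

# Mass defect in ppm, using the formula given in the table.
ft$massdefppm <- ((ft$mz - floor(ft$mz)) / ft$mz) * 1e6

# Per-row t-test of G1 against all other samples, Bonferroni-adjusted.
ft$G1__pval <- vapply(seq_len(nrow(ft)), function(i) {
  t.test(g1[i, ], rest[i, ])$p.value
}, numeric(1))
ft$G1__pval_adj <- p.adjust(ft$G1__pval, method = "bonferroni")

print(ft[, c("G1__meanInt", "G1__foldOverRest", "G1__minFold", "massdefppm")])
```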
You can filter the Feature Table here by specifying a column and filter criteria. Columns containing text can be filtered for text patterns, and numeric columns for values within a range. You can define an arbitrary number of filters and it is possible to activate or deactivate individual filter steps. IMPORTANT: when you save a Feature Table, the currently active filters will be applied before saving.
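The filter logic described here (range filters on numeric columns, pattern filters on text columns, each filter individually switchable) can be sketched in base R. The filter list structure and the apply_filters() helper are hypothetical and only mimic the behavior described, not the app's internals.

```r
ft <- data.frame(
  mz = c(181.0707, 203.0526, 365.1054, 527.1583),
  rt = c(312, 318, 468, 502),
  topgroup = c("mutant", "wildtype", "mutant", "blank"),
  maxfold = c(12.5, 1.3, 48.0, 2.2),
  stringsAsFactors = FALSE
)

# Each filter has an on/off switch and a function returning TRUE for
# the rows to keep (hypothetical structure).
filters <- list(
  rt_range = list(active = TRUE,
                  keep = function(d) d$rt >= 300 & d$rt <= 480),
  fold_min = list(active = TRUE,
                  keep = function(d) d$maxfold >= 10),
  text_match = list(active = FALSE,
                    keep = function(d) grepl("wild", d$topgroup))
)

# Apply all active filters; inactive ones are skipped.
apply_filters <- function(d, filters) {
  keep <- rep(TRUE, nrow(d))
  for (f in filters) {
    if (f$active) keep <- keep & f$keep(d)
  }
  d[keep, , drop = FALSE]
}

filtered <- apply_filters(ft, filters)
print(filtered)
```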
The Analyze Table tab is the central hub for data analysis on your Feature Table. Most analysis steps will generate new columns in the Feature Table, which you can then use to filter your table to get to your features of interest. See below for a guide to the columns generated by the analysis steps.
For more in-depth information on the underlying functions in R, see the Metaboseek::analyzeFT documentation.
Normalize data: Select this option ONLY if the current table has not been filtered and is the result of an unbiased xcms analysis. “Normalization” will make a copy of the current intensity columns with the suffix "__norm". In these new columns, all zero intensity values will be set to the lowest non-zero value across all sample intensity columns (assuming it represents the detection limit). Then, a normalization factor is applied so that the average intensity of each individual column is the same and equal to the average intensity across all columns prior to normalization. If you do this on a table that has been pre-filtered, for instance one containing only features that are upregulated in one sample group, this will severely distort the data!
Use normalized data: Use the normalized data when running analyses that use intensity values: Basic analysis, anova, t-test, PCA, clara cluster.
Select control group: Select a sample group that is the control (in Basic Analysis, all sample groups are compared to this group).
Apply log10: When checked, log10 will be applied to the "__norm" columns (see above) after normalization of intensity values.
Basic analysis: Selecting this option will calculate a set of fold changes between sample groups and some summary information columns such as maxint. See below for a description of all columns generated by this analysis step.
clara cluster: Cluster the feature table with cluster::clara().
anova: Calculate per-row one-way ANOVA between grouped columns of the feature table. NOTE: Equal variance is not assumed (uses stats::oneway.test). Returns NaN in cases where one group has all equal values (no variance, e.g. if all values are 0).
t-test: Calculate a t-test between sample groups. Works only if the grouping contains exactly two groups, each with multiple members.
PCA features: Perform Principal Component Analysis (PCA) of features (does not require grouping information). Will add columns to the feature table.
PCA samples: Perform Principal Component Analysis to cluster samples based on the intensity columns (does not require grouping information). Sample PCA information is not stored in the viewable feature table, but is saved as part of an .mskFT file.
mzMatch: Match the m/z values of your Feature Table to a list of known compounds. Note that these matches are based on MS1 data alone and are therefore ambiguous. Will generate multiple columns in the Feature Table, as described below.
Peak shapes: Tries to match the EIC for each feature in each sample to a curve and calculates a fit score between 0 (no fit) and 1 (best fit).
Fast peak shapes: Recommended way to score peak shapes. Tries to match the EIC for each feature in the sample with the highest intensity for each feature to a curve and calculates a fit score between 0 (no fit) and 1 (best fit). Much faster than “Peak shapes”, with equivalent or better results.
Get intensities: For each molecular feature, an EIC is generated across all MS files currently loaded in the MS data layout. The retention time boundaries of the EIC can be set either to a user-defined number of seconds around the feature's retention time (rt) or to its peak boundaries as reported in the rtmin and rtmax columns of the feature table. If retention time correction information is used, the EIC retention time window is moved accordingly for each file. Intensities within this EIC range are averaged and reported for each file. Alternatively, peak areas can be calculated instead of the average intensities, leading to results that are more easily comparable to xcms-based intensities.
Find MS2 scans: Find MS2 scans corresponding to each feature in the Feature Table. Allows setting of m/z and RT tolerances, will add a column to the Feature Table with information about the scans. This column is used by the MS2 browser Module to identify feature-specific spectra for MS2 networking.
MS2 patterns: Allows searching for combinations of MS2 fragment peaks in all loaded MS data files.
Labelfinder: Find stable isotope labeled compounds in datasets containing labeled and unlabeled samples.
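The “Normalize data” and “Apply log10” options described above can be sketched as follows. This Python snippet is a minimal illustration of the described procedure, not Metaboseek's R code. The function name and the input layout (a dict of sample columns) are hypothetical, and it assumes that the overall average is taken after zero values have been replaced, which the description does not state explicitly.

```python
import math


def normalize_intensities(columns, apply_log10=False):
    """Return normalized copies of the intensity columns with the suffix "__norm".

    columns is a hypothetical dict mapping sample column name -> list of intensities.
    """
    # Lowest non-zero value across all sample columns (treated as the detection limit).
    floor_value = min(v for col in columns.values() for v in col if v > 0)

    # Replace zero intensities with that value.
    filled = {
        name: [v if v > 0 else floor_value for v in col]
        for name, col in columns.items()
    }

    # Average intensity across all columns (assumption: computed after zero replacement).
    n_values = sum(len(col) for col in filled.values())
    global_mean = sum(sum(col) for col in filled.values()) / n_values

    normalized = {}
    for name, col in filled.items():
        # Scale each column so that its mean equals the overall mean.
        factor = global_mean / (sum(col) / len(col))
        scaled = [v * factor for v in col]
        if apply_log10:
            scaled = [math.log10(v) for v in scaled]
        normalized[name + "__norm"] = scaled
    return normalized


result = normalize_intensities({"A": [0.0, 100.0, 300.0], "B": [50.0, 50.0, 200.0]})
```

After normalization, every "__norm" column has the same mean intensity. The sketch also shows why a pre-filtered table is a problem: the scaling factor depends on each column's mean, so removing features selectively shifts those means and therefore the factors.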
For the Labelfinder, follow these steps:
Run two xcms analyses independently for the labeled and the unlabeled samples.
Load the results from both analyses into the Metaboseek session (consider using the renaming functionality in the Feature Table box to keep track of which results come from the labeled and unlabeled samples).
Make sure to also load all MS files into the session, for both labeled and unlabeled samples.
Select the unlabeled Feature Table as the active table in the Feature Table box.
Open the Labelfinder dialog and select the labeled sample feature table.
Read the tooltips on the settings for explanations of the individual settings. You can deselect samples from both the labeled and unlabeled feature tables if necessary.
Press Go to start the Labelfinder analysis. This will generate a new Feature Table with likely labeled compounds using the selected name (by default, it has “Labelfinder_” as a prefix). The unlabeled features will be reported in the resulting table.
To browse the results, you can add the label m/z of interest under Options -> Mass shifts. This will allow you to see overlays of EICs for the labeled and unlabeled compounds. Note that you may have to manually load additional raw files (e.g. those for the labeled samples) to display all relevant information.
Details on the Labelfinder algorithm:
The findLabels() function compares two Feature Tables with each other, assuming that one of them contains an enrichment of labeled compounds.
In a first step, featlistCompare() is used to identify entries in the reference (unlabeled) Feature Table which have a corresponding, labeled feature in the comparison (labeled) Feature Table (the m/z in the comparison Feature Table should be within tolerance of the reference m/z + expected label and also within the retention time tolerance).
Each entry from the reference Feature Table (dubbed I1S1, for Isotopologue 1, Sample Group 1) can have multiple matches in each of these categories:
1. m/z + label match in reference table (I2S1)
2. m/z match in comparison table (I1S2)
3. m/z + label match in comparison table (I2S2)
For each category, only the match closest in retention time to I1S1 is kept for further processing. Intensities are re-extracted for all matched peaks (I1S1, I2S1, I1S2, I2S2), using the m/z values identified for I1S1 (for I1S1 and I1S2) and I2S2 (for I2S1 and I2S2), and the rt values for I1S1 (for I1S1 and I2S1) and I2S2 (for I1S2 and I2S2). The extracted intensities are used to calculate the mean intensity across the unlabeled (S1) and labeled (S2) samples for both isotopologues.
Key filter criteria that are user-controlled are the minimum ratio of I1S1/I2S1 (because a high ratio is expected in the unlabeled sample S1, where the unlabeled compound I1 is expected to be more abundant than the labeled compound) and the maximum ratio of I1S2/I2S2 (where a low value is indicative of the label being enriched). The features from the reference Feature Table which meet the filter criteria are then exported to a new Feature Table that contains intensity information for I1S1, I2S1, I1S2 and I2S2. The reported m/z and rt values are directly carried over from the original reference Feature Table.
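The matching and filtering logic described above can be sketched in a few lines. This Python snippet is a simplified illustration and not the code of findLabels() or featlistCompare(). All function and parameter names are hypothetical, features are represented as plain dicts with "mz" and "rt" keys, and treating a zero denominator as an infinite ratio is an assumption.

```python
def find_label_match(reference, candidates, label_shift, ppm_tol, rt_tol):
    """Return the candidate closest in rt whose m/z matches reference m/z + label_shift.

    Returns None if no candidate lies within both the ppm and the rt tolerance.
    """
    target_mz = reference["mz"] + label_shift
    hits = [
        c for c in candidates
        if abs(c["mz"] - target_mz) / target_mz * 1e6 <= ppm_tol
        and abs(c["rt"] - reference["rt"]) <= rt_tol
    ]
    # Keep only the match closest in retention time to the reference feature.
    return min(hits, key=lambda c: abs(c["rt"] - reference["rt"]), default=None)


def safe_ratio(numerator, denominator):
    """Intensity ratio; a zero denominator is treated as an infinite ratio (assumption)."""
    return numerator / denominator if denominator > 0 else float("inf")


def passes_label_filter(i1s1, i2s1, i1s2, i2s2, min_unlabeled_ratio, max_labeled_ratio):
    """Check the two ratio criteria on mean intensities.

    I1S1/I2S1 must be high in the unlabeled samples and
    I1S2/I2S2 must be low in the labeled samples.
    """
    return (
        safe_ratio(i1s1, i2s1) >= min_unlabeled_ratio
        and safe_ratio(i1s2, i2s2) <= max_labeled_ratio
    )


reference = {"mz": 200.0, "rt": 300.0}
labeled_features = [
    {"mz": 206.0201, "rt": 310.0},
    {"mz": 206.0200, "rt": 302.0},
    {"mz": 207.0000, "rt": 300.0},
]
match = find_label_match(reference, labeled_features, label_shift=6.0201, ppm_tol=5, rt_tol=15)
```

Here, two candidates lie within the m/z and rt tolerances, and the one at rt 302 is kept because it is closest in retention time to the reference feature.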
This tab allows you to redefine the columns containing intensity values and how they are grouped.
This section will help you to set up an xcms analysis in Metaboseek in order to identify LC/MS features that are differential between sets of data files. This can, for instance, be useful to assess the impact of a mutation on the metabolome of an organism or to identify compounds associated with the activity of an enzyme.
Details about the xcms settings:
Settings for the xcms::findChromPeaks() function (peak detection).
Settings for the xcms::fillChromPeaks() function (peak filling). You can also set parameters for the Metaboseek peak intensity functions here, which will extract intensities for all molecular features in all files.
Settings for feature grouping by xcms::groupChromPeaks with xcms::PeakDensityParam.
Settings for the CAMERA package. Metaboseek sequentially runs the CAMERA package functions xsAnnotate, groupFWHM, groupCorr, findIsotopes and findAdducts, which are described in the CAMERA documentation.
Settings for retention time correction with xcms::adjustRtime, using either the Obiwarp or the peakGroups method. If Obiwarp is selected and fails, the xcms runner script will attempt to run peakGroups with the given parameters.
Dührkop, Kai, Markus Fleischauer, Marcus Ludwig, Alexander A. Aksenov, Alexey V. Melnik, Marvin Meusel, Pieter C. Dorrestein, Juho Rousu, and Sebastian Böcker. 2019. “SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.” Nature Methods 16 (4): 299–302. https://doi.org/10.1038/s41592-019-0344-8.