LipidFinder is an open-source Python workflow designed to facilitate further targeted lipidomics analysis. LipidFinder categorises and removes noise and artifacts from liquid chromatography/mass spectrometry (LC/MS) datasets, searches the outcome in different databases to obtain putative identification of lipids, and assigns them to a class based on the LIPID MAPS classification system. The software quickly distinguishes and quantifies lipid-like features from contaminants, adducts and noise in LC/MS datasets that have been pre-processed using XCMS. Although we advise users to use XCMS, LipidFinder accepts input files from other pre-processing software tools (e.g. SIEVE™ from ThermoFisher). LipidFinder 1.0: O'Connor et al., 2017. LipidFinder on LIPIDMAPS: Fahy et al., 2018.
This Jupyter notebook explains how to get your computer ready to use LipidFinder and guides you through LipidFinder's workflow to process your data. Before continuing, note that your data needs to be high-resolution MS (e.g. at least 60000ppm) with long chromatography to separate isobaric lipids. The approach is not suitable for shotgun lipidomics, MS/MS or low-resolution datasets. The demo data provided in the tests folder was generated with an Orbitrap Elite Mass Spectrometer, but the software is MS-platform independent. It consists of 12 LC/MS runs of macrophage-like cells: the first 6 samples are from RAW cells (264.7) and the last 6 samples are from mouse wildtype (C57BL/6). The data was then pre-processed with XCMS Online using the Orbitrap II parameter set.
LipidFinder can be downloaded from GitHub or accessed online via LIPID MAPS: http://www.lipidmaps.org/resources/tools/lipidfinder. For questions or suggestions regarding LipidFinder please contact us at lipidfinder@cardiff.ac.uk.
LipidFinder is distributed under the MIT license (see LICENSE file for more information).
LipidFinder has been designed for extensive clean-up of LC/MS datasets where a high degree of artifact removal is desired (e.g. discovery lipidomics). ESI-high resolution MS experiments contain many spurious signals that can arise from diverse sources, including common contaminants, adducts, in-source fragments, etc. LipidFinder is devised to work primarily as an add-on to XCMS, focusing on the clean-up of MS data files which have already been pre-processed for peak alignment and integration. Removal of these artifacts results in significantly cleaner datasets that perform better in downstream statistical analysis pipelines.
LipidFinder has been tested with Python 2.7.9 and Python 3.6.3. This doesn't mean it won't work in earlier versions, but you might get errors or significant differences in the results. Some operating systems come bundled with Python, but it can also be downloaded and installed from the Python Software Foundation. The first step is to download LipidFinder's package file (Wheel format) from GitHub.
LipidFinder's package includes the instructions to install all the required dependencies. The easiest way to install LipidFinder is to open a command prompt/terminal, go to the folder where the downloaded Wheel file is located, and run one of the following commands:
pip install LipidFinder-2.0-py2-none-any.whl # for Python 2.7
pip install LipidFinder-2.0-py3-none-any.whl # for Python 3.6
Many users prefer to use Anaconda, an open-source distribution aimed at Python data science and machine learning on Windows, Linux, and macOS. To install LipidFinder, open an Anaconda prompt/terminal and run one of the following commands:
pip install LipidFinder-2.0-py2-none-any.whl # for Python 2.7
pip install LipidFinder-2.0-py3-none-any.whl # for Python 3.6
Note: We suggest downloading and installing Anaconda for Python 3.6. All the scripts include the .py extension, which needs to be removed on Windows systems.
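As a quick check after installation, the sketch below prints the running Python version and reports whether the LipidFinder package can be located. It uses only the standard library and needs Python 3.4 or later. The helper name is_installed is ours, chosen for illustration; the package name LipidFinder is taken from the import statement used in this notebook.

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if the named top-level package can be located."""
    return importlib.util.find_spec(package_name) is not None


# LipidFinder has been tested with Python 2.7.9 and Python 3.6.3.
print("Python version:", ".".join(str(n) for n in sys.version_info[:3]))
print("LipidFinder found:", is_installed("LipidFinder"))
```

If the second line reports False, the pip command was most likely run in a different Python environment from the one running this check.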
In [ ]:
config_params.py -m peakfilter
Additionally, if you already have a parameters JSON file, you can load its values instead of LipidFinder's defaults (see example below). Once launched, the process will guide you through a question-answering system to configure each parameter. At the end, the program will ask for the path and file name in which you want to save the new set of parameters:
In [ ]:
config_params.py -m peakfilter -p my_parameters.json
The second option is through a Jupyter notebook (like this one). The Configuration module includes a graphical user interface (GUI) class to set up each parameter of the selected module interactively based on Jupyter's widgets. The following code shows an example of how to launch the GUI to set Amalgamator's parameters based on default values:
In [ ]:
from LipidFinder.Configuration.LFParametersGUI import LFParametersGUI
LFParametersGUI(module='amalgamator');
To use an existing parameters JSON file instead of the default values, add the argument src=x, where x is the path to the JSON file, to the LFParametersGUI() call.
Hint: once you have configured PeakFilter's parameters, you can use that JSON file as a template for the other modules so you do not need to type in again the values of the parameters they all share (e.g. m/z column name). Warning: the parameter firstSampleIndex needs to be changed when using PeakFilter's summary output file as input.
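The hint above can be illustrated with a small sketch that copies shared entries from one parameter set into another while leaving firstSampleIndex alone. It assumes only that a parameters file is a JSON object keyed by parameter name. The function name and every parameter name except firstSampleIndex are hypothetical placeholders, not LipidFinder's real names.

```python
import json


def reuse_shared_parameters(template, target, skip=("firstSampleIndex",)):
    """Copy into `target` every entry of `template` whose name exists in both.

    Names listed in `skip` are left untouched so they can be set by hand.
    Returns the sorted list of parameter names that were copied.
    """
    copied = []
    for name in target:
        if name in template and name not in skip:
            target[name] = template[name]
            copied.append(name)
    return sorted(copied)


# Hypothetical miniature parameter sets; real files hold many more entries
# and, apart from firstSampleIndex, use different names.
peakfilter = {"mzColumnName": "mz", "firstSampleIndex": 10, "optionA": True}
amalgamator = {"mzColumnName": "", "firstSampleIndex": 1, "optionB": 0.001}

print(reuse_shared_parameters(peakfilter, amalgamator))
print(json.dumps(amalgamator, indent=2, sort_keys=True))
```

With real files, the two dictionaries would be read with json.load and the result written back with json.dump before opening it in config_params.py or the GUI for a final review.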
We have included a help option to display the description, usage and other information of each Python script included in LipidFinder. For instance, for the previous script, the command to run would be the following:
config_params.py -h
Users of LipidFinder 1.0 might be interested in repeating their experiments with the new version or running new ones under a similar parameter configuration. We have therefore developed a script to convert the old parameters CSV file for PeakFilter and Amalgamator into the new parameters JSON files for the same modules. To run it you will also need the old adducts CSV file to update the lists of adduct pairs. We have included an example of these two files in the tests folder (available on GitHub) to illustrate how to use the script:
update_params.py -p tests/LipidFinder-1.0/old_parameters.csv -a tests/LipidFinder-1.0/old_adducts.csv -o results
The script will generate two files: peakfilter.json and amalgamator.json. Be aware that these new parameters JSON files are incomplete (some new parameters have been introduced in LipidFinder 2.0) and will raise an error when used for their corresponding module. They should first be processed with config_params.py (-p argument) to fill in the missing parameters and generate a complete version.
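Before running config_params.py, it can help to see which parameters a converted file lacks. The following is a minimal sketch, assuming a parameters file is a JSON object keyed by parameter name and that a complete file for the same module is available for comparison (for example, one of the demo parameter files). The function name and the parameter names other than firstSampleIndex are hypothetical.

```python
def missing_parameters(converted, complete):
    """Return the names present in `complete` but absent from `converted`."""
    return sorted(set(complete) - set(converted))


# Hypothetical names standing in for parameters introduced in version 2.0.
complete = {"firstSampleIndex": 1, "newOptionA": True, "newOptionB": 0.5}
converted = {"firstSampleIndex": 1}

print(missing_parameters(converted, complete))  # ['newOptionA', 'newOptionB']
```

This only lists what is missing; filling in the values is still best done through config_params.py, which knows the valid range of each parameter.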
LipidFinder's complete workflow is composed of three modules: PeakFilter, Amalgamator and MSSearch. We have developed one script for each to ease their usage. Each module creates a log file (named after the module) that records any information that might be useful for the user, e.g. which frames have been removed by which stages during PeakFilter. A new run appends its information to the end of the log file if it already exists, so no information is lost.
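The append-on-rerun behaviour described above corresponds to opening a file in append mode. The sketch below shows the general idea only; it is not LipidFinder's own logging code, and the file name peakfilter.log and the messages are assumptions made for the example.

```python
import os
import tempfile


def append_to_log(log_path, message):
    """Append one line to the log, creating the file if it does not exist."""
    with open(log_path, "a") as log_file:
        log_file.write(message + "\n")


# Two simulated runs writing to the same (temporary) log file.
log_path = os.path.join(tempfile.mkdtemp(), "peakfilter.log")
append_to_log(log_path, "run 1: example entry")
append_to_log(log_path, "run 2: example entry")

with open(log_path) as log_file:
    log_lines = log_file.read().splitlines()
print(log_lines)
```

Because nothing is ever truncated, a log can grow across many runs; deleting or renaming it between experiments keeps each experiment's record separate.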
A standard LipidFinder workflow would first process the pre-aligned data with PeakFilter (once for negative and once for positive ion polarity), afterwards Amalgamator would merge both files' information based on matching m/z values, and finally, MSSearch would identify and classify lipid-like features with the selected LIPID MAPS database. Alternatively, LipidFinder can also process a single file with PeakFilter and run MSSearch afterwards.
The following examples are all based on the demo data pre-processed with XCMS, but we also provide an alternative to show LipidFinder's flexibility with SIEVE™ pre-processed files (just replace XCMS by SIEVE in each command).
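The standard workflow can be summarised as four commands. The sketch below only builds and prints them, using the script names, flags and demo paths that appear in this notebook. The function name build_workflow is ours, and the assumption that the SIEVE demo folder uses the same file names follows from the note above about replacing XCMS by SIEVE.

```python
def build_workflow(source="XCMS", out_dir="results"):
    """Return the four LipidFinder commands as lists of arguments."""
    data_dir = "tests/" + source
    commands = []
    # PeakFilter runs once per ion polarity.
    for polarity in ("negative", "positive"):
        commands.append([
            "run_peakfilter.py",
            "-i", "{0}/{1}.csv".format(data_dir, polarity),
            "-o", out_dir,
            "-p", "{0}/params_peakfilter_{1}.json".format(data_dir, polarity),
        ])
    # Amalgamator merges the two PeakFilter summary files.
    commands.append([
        "run_amalgamator.py",
        "-neg", out_dir + "/peakfilter_negative_summary.csv",
        "-pos", out_dir + "/peakfilter_positive_summary.csv",
        "-p", data_dir + "/params_amalgamator.json",
        "-o", out_dir,
    ])
    # MSSearch identifies and classifies the merged features.
    commands.append([
        "run_mssearch.py",
        "-i", out_dir + "/amalgamated.csv",
        "-o", out_dir,
        "-p", data_dir + "/params_mssearch.json",
    ])
    return commands


for command in build_workflow():
    print(" ".join(command))
```

Once LipidFinder is installed, each list could be passed to subprocess.check_call to run the steps in order. On Windows, remember to drop the .py extension from the script names.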
PeakFilter is intended to clean up the data by removing contaminants, adducts, stacks and other artifacts such as in-source ion fragments and salt clusters. Among its parameters, PeakFilter has several "switches" for specific filtering functions that should be configured according to the experimental set-up that generated the input dataset.
In most scenarios, an experiment involving LC/MS will generate two sets of data with different ion polarity: one negative and one positive. After they have been pre-processed separately with XCMS, we need to process each file individually with PeakFilter. Using our demo data available on GitHub under the tests folder, we show first how to process the negative polarity CSV file:
In [ ]:
run_peakfilter.py -i tests/XCMS/negative.csv -o results -p tests/XCMS/params_peakfilter_negative.json
And then the positive one:
In [ ]:
run_peakfilter.py -i tests/XCMS/positive.csv -o results -p tests/XCMS/params_peakfilter_positive.json
By default, PeakFilter will generate the complete filtered file and a summary output CSV file with the relevant information of each remaining frame.
The output file names will always contain the ion polarity, so running PeakFilter once for each polarity will not be a problem when choosing the same output folder (e.g. results in the previous examples). However, if we change the parameters and run PeakFilter again with the same output folder, we will overwrite any previous output file for the same polarity.
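One way to avoid overwriting earlier results when experimenting with parameters is to give each run its own output folder. The sketch below is a generic helper, not part of LipidFinder; the function name and the results_N naming scheme are assumptions made for the example.

```python
import os
import tempfile


def fresh_output_dir(base_dir, prefix="results"):
    """Create and return the first unused folder named prefix_N in base_dir."""
    run = 1
    while True:
        candidate = os.path.join(base_dir, "{0}_{1}".format(prefix, run))
        if not os.path.exists(candidate):
            os.makedirs(candidate)
            return candidate
        run += 1


base_dir = tempfile.mkdtemp()  # stand-in for your project folder
first = fresh_output_dir(base_dir)
second = fresh_output_dir(base_dir)
print(os.path.basename(first), os.path.basename(second))  # results_1 results_2
```

The returned path can then be given to run_peakfilter.py through its -o argument, using the same folder for both polarities of one parameter set.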
Amalgamator merges the output files for both negative and positive ion polarities generated with PeakFilter. By default, it keeps every frame that exists in only one of the input files. For frames with a match in both files, Amalgamator retains the information of the one with the highest mean intensity across all samples and tags the selected source in the output file's Polarity column.
In [ ]:
run_amalgamator.py -neg results/peakfilter_negative_summary.csv \
-pos results/peakfilter_positive_summary.csv \
-p tests/XCMS/params_amalgamator.json -o results
Duplicates are identified by comparing the negative file with the positive file within a small retention time tolerance and a corrected m/z tolerance (negative m/z + 2H+, followed by negative m/z + H+ + CH3+ for phosphatidylcholines and sphingomyelins with a phosphocholine head group). Any hit is classed as a match.
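The matching rule just described can be sketched in a few lines. This is an illustration under stated assumptions, not Amalgamator's actual code: the function names, the tolerance values and their units (Da and minutes), the tie-breaking rule and the example m/z values are all ours. The ion masses are the standard values for H+ and CH3+.

```python
PROTON = 1.007276          # mass of H+ in Da
METHYL_CATION = 15.022927  # mass of CH3+ in Da


def expected_positive_mz(negative_mz):
    """Return the two corrected m/z values checked for a negative-mode frame."""
    return (negative_mz + 2 * PROTON,
            negative_mz + PROTON + METHYL_CATION)


def is_duplicate(neg, pos, mz_tol=0.005, rt_tol=0.1):
    """Return True if frames `neg` and `pos`, given as (mz, rt), match."""
    neg_mz, neg_rt = neg
    pos_mz, pos_rt = pos
    if abs(neg_rt - pos_rt) > rt_tol:
        return False
    return any(abs(pos_mz - target) <= mz_tol
               for target in expected_positive_mz(neg_mz))


def polarity_to_keep(neg_intensities, pos_intensities):
    """Pick the source with the highest mean intensity (ties go to negative)."""
    mean_neg = sum(neg_intensities) / float(len(neg_intensities))
    mean_pos = sum(pos_intensities) / float(len(pos_intensities))
    return "negative" if mean_neg >= mean_pos else "positive"


# Illustrative values only, not taken from the demo data.
print(is_duplicate((766.5392, 12.30), (768.5538, 12.32)))  # + 2H+
print(is_duplicate((744.5549, 12.30), (760.5851, 12.30)))  # + H+ + CH3+
print(is_duplicate((766.5392, 12.30), (768.5538, 13.00)))  # RT too far apart
print(polarity_to_keep([100.0, 120.0], [300.0, 280.0]))
```

The tolerances actually applied are those set in Amalgamator's parameters JSON file, so the defaults used here should not be read as recommendations.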
Alternatively, you can use the complete output files generated by PeakFilter as input files if you want to keep every column of your source data file.
MSSearch has been designed to identify and classify lipid-like features from either a PeakFilter or an Amalgamator output file, using the knowledge available in LIPID MAPS. The output file will include all the matches for each m/z value in the input file (within the tolerance indicated in the parameters JSON file). The output file will also include every frame not found in the selected database, classified as unknown. Finally, MSSearch will create a lipid-category scatter plot of the results by m/z and retention time, saved as a PDF file by default.
In [ ]:
run_mssearch.py -i results/amalgamated.csv -o results \
-p tests/XCMS/params_mssearch.json
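The tolerance-based lookup that MSSearch performs can be pictured with the toy example below. It is a local stand-in written for illustration: MSSearch queries the selected LIPID MAPS database, whereas here the database is a short hand-made list. The function name, the ppm tolerance and all entries are hypothetical; only the category labels follow the LIPID MAPS classification system.

```python
def search_mz(mz, database, tolerance_ppm=5.0):
    """Return (name, category) for every entry within tolerance_ppm of mz.

    A frame without any match is reported once, classified as "unknown".
    """
    window = mz * tolerance_ppm / 1e6
    matches = [(name, category) for name, category, ref_mz in database
               if abs(ref_mz - mz) <= window]
    return matches or [("", "unknown")]


# Made-up entries: (name, LIPID MAPS category, reference m/z).
toy_database = [
    ("lipid A", "Glycerophospholipids [GP]", 768.5538),
    ("lipid B", "Glycerophospholipids [GP]", 768.5560),
    ("lipid C", "Fatty Acyls [FA]", 303.2330),
]

for frame_mz in (768.5540, 500.0000):
    print(frame_mz, search_mz(frame_mz, toy_database))
```

The first frame returns two candidates because both fall inside the tolerance window, which mirrors how the MSSearch output lists every match for an m/z value rather than a single best hit.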