SIMS2015 Session DP-ThP: Data Processing and Interpretation Poster Session

Thursday, September 17, 2015 5:20 PM in Grand Ballroom III

Thursday Afternoon


DP-ThP-2 Integration of SIMS Data Into Cross-Platform, Multimodality, Multivariate Analysis Software
Alan Race, Bonnie J. Tyler, Josephine Bunch (National Physical Laboratory, UK); Ian S. Gilmore (National Physical Laboratory, UK)

The ability to combine SIMS data with other imaging modalities is increasingly sought, both to increase specificity and to improve confidence in the localisation of analytes. Here we present software that handles SIMS data stored in a generic binary format, MALDI, DESI and LESA data stored in the open mass spectrometry imaging format imzML, and Raman imaging datasets. Multimodal data acquired from the same sample can therefore be processed within a single software package, which helps make the datasets directly comparable.

Datasets can be processed and investigated rapidly through automated peak detection and data reduction, followed by multivariate analysis such as principal component analysis (PCA), non-negative matrix factorisation (NMF) or probabilistic latent semantic analysis (PLSA). The effectiveness of these techniques will be demonstrated on both SIMS and MALDI MSI datasets of biologically relevant samples.
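As an illustration of one of the named techniques, the sketch below factorises a non-negative pixels × mass-channels matrix with NMF, using the standard Lee–Seung multiplicative updates for the Frobenius-norm loss. It assumes NumPy and an already peak-picked, non-negative data matrix; the function name `nmf` and its parameters are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def nmf(X, k, n_iter=500, eps=1e-10, seed=0):
    """Hypothetical helper: factorise non-negative X (pixels x channels)
    as X ~= W @ H with Lee-Seung multiplicative updates (Frobenius loss).

    W (pixels x k) holds the component abundance maps,
    H (k x channels) holds the component spectra.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive random start; multiplicative updates never change sign.
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # eps in the denominators avoids division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Each row of `H` can be read as a component spectrum and each column of `W`, reshaped to the image dimensions, as the corresponding distribution map.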

The software is written in a combination of MATLAB (for ease of extensibility), C/C++ (enabling the use of CUDA for rapid processing on GPUs) and Java (for multithreaded CPU processing of memory-intensive routines). Because it is open source, it provides an extensible platform: additional preprocessing and multivariate analysis routines can be incorporated easily, without redeveloping GUIs, data visualisation or data format parsers.

DP-ThP-3 SurfaceSpectra Identity: A Free Web Service for Molecular Formula Identification
Alex Henderson (University of Manchester, UK)

In 2006, Kind and Fiehn released a database of all ‘correct’ molecular formulae with mass below 500 u composed of C, H, O, N, S and P [1,2]. The list contains 1.6 million molecular formulae. Because this collection was calculated in silico, the exact isotopic mass and natural isotopic abundance of each molecule (M) are known to many decimal places. From this database we selected the subset of molecular formulae containing at least one hydrogen atom and, by removing one hydrogen atom from each, generated a list of M-H fragments, roughly doubling the number of entries. Note that removing a hydrogen atom in effect generates a list of possible fragments in which a single bond has been broken.
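A minimal sketch of the M-H construction described above, assuming each formula is stored as an element-count dictionary and using monoisotopic masses of the most abundant CHONSP isotopes from standard tables (electron mass ignored). The names `mono_mass` and `with_m_minus_h` are hypothetical; this is not the Identity database code.

```python
# Monoisotopic masses (u) of 12C, 1H, 14N, 16O, 31P and 32S.
MONO = {"C": 12.0, "H": 1.00782503223, "N": 14.00307400443,
        "O": 15.99491461957, "P": 30.97376199842, "S": 31.9720711744}


def mono_mass(formula):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items())


def with_m_minus_h(formulae):
    """Return (kind, formula, mass) for every input formula, plus an
    'M-H' entry for each formula that contains at least one hydrogen."""
    out = []
    for f in formulae:
        out.append(("M", f, mono_mass(f)))
        if f.get("H", 0) >= 1:
            fh = dict(f)
            fh["H"] -= 1  # remove one hydrogen atom
            out.append(("M-H", fh, mono_mass(fh)))
    return out
```

For example, methane (CH4, 16.0313 u) yields an extra CH3 entry at 15.0235 u, whereas CO2 contains no hydrogen and contributes only its own entry.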

Consider the case where a molecule is fragmented and a C–C bond is broken. Each fragment produced can be considered an M’-H species, where M’ corresponds to the fragment as it would be if it carried a hydrogen atom in place of the broken bond rather than being part of a larger structure. The list of M-H species can therefore be regarded as covering all possible fragments made from the same set of atoms (CHONSP). Adducts of M (M+H, M+Na and M+K) are straightforward to handle, and their masses are calculated on demand.
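One way to calculate adducts on demand, sketched below under stated assumptions, is never to store them at all: the observed mass is shifted back by each adduct mass and the sorted list of neutral masses M is searched with a bisection. The ppm tolerance, the subtraction of one electron mass (treating adducts as singly charged cations) and the function name `search` are assumptions for illustration, not details of SurfaceSpectra Identity.

```python
import bisect

# Exact masses (u) of 1H, 23Na and 39K, and of the electron.
# Assumption: adducts are singly charged cations, so one electron
# mass is subtracted from each adduct shift.
ELECTRON = 0.000548579909
ADDUCT_SHIFT = {
    "M+H": 1.00782503223 - ELECTRON,
    "M+Na": 22.9897692820 - ELECTRON,
    "M+K": 38.9637064864 - ELECTRON,
}


def search(sorted_masses, names, target, tol_ppm=10.0):
    """Hypothetical helper returning (kind, name) pairs that match `target`.

    `sorted_masses` is an ascending list of neutral masses M and `names`
    the parallel list of formula labels. Adducts are derived on demand by
    shifting the target back to M and bisecting the sorted list.
    """
    tol = target * tol_ppm * 1e-6
    hits = []
    for kind, shift in [("M", 0.0)] + list(ADDUCT_SHIFT.items()):
        m = target - shift
        lo = bisect.bisect_left(sorted_masses, m - tol)
        hi = bisect.bisect_right(sorted_masses, m + tol)
        hits.extend((kind, names[i]) for i in range(lo, hi))
    return hits
```

With glucose (C6H12O6, 180.0634 u) stored, a feature at 203.0526 u is returned as its M+Na adduct, and storage does not grow with the number of adduct types.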

In 2013, a free desktop software package, SurfaceSpectra Identity, was released [3,4]. It allows the user to search the 3.2 million molecular formulae and adducts for a spectral feature in the range 1–500 u within the CHONSP domain, and to view the isotope pattern of the possible species. The software will also display the isotope pattern of any other molecular formula, although this feature does not check whether the molecule is fully valent.
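A nominal-mass isotope pattern can be sketched by repeatedly convolving each atom's isotope abundance distribution, as below. The abundances are approximate representative natural values, fine structure (e.g. 13C versus 2H at M+1) is not resolved, and the function name `isotope_pattern` is hypothetical rather than the Identity implementation.

```python
import numpy as np

# (lightest isotope nominal mass, approximate abundances indexed by the
# nominal-mass offset from that isotope).
ISO = {
    "C": (12, [0.9893, 0.0107]),
    "H": (1, [0.999885, 0.000115]),
    "N": (14, [0.99636, 0.00364]),
    "O": (16, [0.99757, 0.00038, 0.00205]),
    "P": (31, [1.0]),
    "S": (32, [0.9499, 0.0075, 0.0425, 0.0, 0.0001]),
}


def isotope_pattern(formula, threshold=1e-4):
    """Nominal-mass isotope pattern of {element: count} as a list of
    (nominal mass, intensity relative to the tallest peak)."""
    dist = np.array([1.0])
    base = 0
    for el, n in formula.items():
        lightest, abund = ISO[el]
        base += lightest * n
        for _ in range(n):
            # Adding one atom convolves its isotope distribution in.
            dist = np.convolve(dist, abund)
    dist /= dist.max()
    return [(base + i, float(p)) for i, p in enumerate(dist) if p >= threshold]
```

For glucose, the M+1 peak at nominal mass 181 comes out at roughly 6.9 % of the monoisotopic peak at 180.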

Now that more mobile and web-oriented software solutions are available, it seems the right time to convert this database into a web service, allowing anyone to access the information without needing the SurfaceSpectra software.

In this contribution we will outline the design of the database, the mechanisms of accessing the web service and its functionality.

[1] T. Kind, O. Fiehn, BMC Bioinformatics 2006, 7, 234. DOI:10.1186/1471-2105-7-234

[2] http://fiehnlab.ucdavis.edu/projects/identification/

[3] SurfaceSpectra Identity. http://surfacespectra.com/identity/

[4] A. Henderson, J.D. Moore, J.C. Vickerman, Surf. Interface Anal. 2013, 45, 471–474. DOI:10.1002/sia.5065

DP-ThP-4 Combination of Wavelet Transforms and Principal Component Analysis for Treatment of High Mass Resolution Three-Dimensional SIMS Data
Nunzio Tuccitto, Gabriella Zappalà, Stefania Vitale, Alberto Torrisi, Antonino Licciardello (Università degli Studi di Catania, Italy)

Cluster primary ion beams now allow molecular depth profiling of organic and polymeric materials by secondary ion mass spectrometry. Combined with the lateral imaging capability of ToF-SIMS, this enables 3D molecular imaging experiments. Because a typical raw 3D dataset may contain thousands of peaks, the amount of information grows rapidly, and data reduction techniques become indispensable for extracting the most significant information from the dataset. The most basic approach to analysing such very large mass spectrometry datasets is to compress them by mass binning, which sacrifices the high mass resolution of ToF instrumentation. The alternative is precise peak detection and integration. Although the latter is achievable, the challenge is to extract reliable information from the data easily, without extensive involvement of a specialist.
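The loss of mass resolution caused by binning can be seen in a minimal sketch of unit-mass binning. It assumes NumPy and a peak list of (m/z, intensity) pairs; `bin_spectrum` is a hypothetical helper, and the example masses are approximate values for CO, N2 and C2H4, which all fall at nominal mass 28.

```python
import numpy as np


def bin_spectrum(mz, intensity, width=1.0):
    """Hypothetical helper: sum intensities into fixed-width bins centred
    on integer multiples of `width` (width=1.0 gives nominal-mass bins)."""
    idx = np.rint(np.asarray(mz, dtype=float) / width).astype(int)
    bins, inverse = np.unique(idx, return_inverse=True)
    summed = np.bincount(inverse, weights=np.asarray(intensity, dtype=float))
    return bins * width, summed


# Three resolved peaks near m/z 28 collapse into a single binned channel.
centres, totals = bin_spectrum([27.9949, 28.0061, 28.0313, 29.0027],
                               [100.0, 50.0, 80.0, 10.0])
```

The three species become indistinguishable after binning, which is why precise peak detection is the alternative when high mass resolution must be preserved.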

Principal component analysis (PCA) is commonly used in the ToF-SIMS community for feature extraction, especially from complex biological materials. PCA projects the data from a high-dimensional space onto a lower-dimensional one spanned by the directions of greatest variance in the data (the principal components). However, a limitation of PCA is that it does not remove noise efficiently, which makes direct use of very large raw 3D datasets difficult. One potential solution is to apply the wavelet transform for data compression and noise removal. When a wavelet transform decomposes a signal, most of the wavelet coefficients are small or zero; this is the property that enables wavelets to compress signals. In addition, because the noise is usually white, it is concentrated mainly in the “detail” coefficients, whereas the most meaningful information is concentrated in the “approximation” coefficients. Combining wavelet analysis with PCA (Wavelet-PCA) can therefore improve the feature extraction process. The approach presented in this contribution is based on “smart” data compression by discrete wavelet transform, followed by PCA of the approximation coefficients and a subsequent inverse wavelet transform of the data. Several applications of this methodology to three-dimensional ToF-SIMS data treatment will be illustrated.
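The Wavelet-PCA sequence described above (discrete wavelet transform, PCA of the approximation coefficients, inverse transform) can be sketched with NumPy alone. The Haar wavelet, the number of levels, the number of retained components and all function names are assumptions made for brevity; the abstract does not state which wavelet family or decomposition level the authors use.

```python
import numpy as np


def haar_approx(X, levels):
    """Multi-level Haar DWT along the mass axis, keeping only the
    approximation coefficients (detail coefficients are discarded)."""
    A = np.asarray(X, dtype=float)
    assert A.shape[-1] % (2 ** levels) == 0, "channels must divide by 2**levels"
    for _ in range(levels):
        A = (A[..., 0::2] + A[..., 1::2]) / np.sqrt(2.0)
    return A


def haar_inverse_approx(A, levels):
    """Inverse multi-level Haar DWT, assuming all detail coefficients are zero."""
    for _ in range(levels):
        out = np.empty(A.shape[:-1] + (2 * A.shape[-1],))
        out[..., 0::2] = A / np.sqrt(2.0)
        out[..., 1::2] = A / np.sqrt(2.0)
        A = out
    return A


def wavelet_pca(X, levels=2, k=2):
    """Hypothetical helper. X: (pixels x channels). Returns PCA scores,
    loadings (in approximation-coefficient space) and the denoised data."""
    A = haar_approx(X, levels)
    mean = A.mean(axis=0)
    # PCA via SVD of the mean-centred approximation coefficients.
    U, s, Vt = np.linalg.svd(A - mean, full_matrices=False)
    scores = U[:, :k] * s[:k]
    loadings = Vt[:k]
    # Rank-k reconstruction, then back to the original mass channels.
    X_hat = haar_inverse_approx(scores @ loadings + mean, levels)
    return scores, loadings, X_hat
```

Discarding the detail coefficients shrinks each spectrum by a factor of 2**levels before the SVD, which is where the compression gain comes from; the trade-off is that features narrower than 2**levels channels are smoothed out.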

DP-ThP-5 The Benefits of Using All of the Measured Mass Channels During MVSA of ToF-SIMS Data Sets
Vincent Smentkowski (General Electric Global Research Center); Mike Keenan (Independent Scientist); Henrik Arlinghaus (ION-TOF GmbH, Germany)

Time-of-flight secondary ion mass spectrometry (ToF-SIMS) data sets are very large and contain a wealth of information about the material being analyzed. A typical image data set may comprise 256 × 256 pixels, with a 0 to 900 amu (or greater) mass spectrum collected at high mass resolution at every pixel. Data sets often comprise more than 1 × 10^15 spectral channels. The challenge for a ToF-SIMS analyst is to scrutinize all of the measured information without bias in order to gain the most robust understanding of the material being analyzed; this is especially important in an industrial setting, where unknown samples are analyzed. Multivariate statistical analysis (MVSA) algorithms have assisted in ToF-SIMS data workup [1,2]; however, commercially available software cannot handle data sets this large, so analysts often select mass intervals and/or degrade the mass resolution prior to MVSA. In this poster, we report first results obtained using MVSA software that can handle massive ToF-SIMS data sets. We demonstrate two important benefits of unbiased analysis of these massive data sets: (1) finding unexpected elements in real-world samples (one reason why the authors never use peak lists for MVSA) and (2) the ability to obtain high mass resolution results from data sets collected at nominal mass resolution (e.g., in the beam alignment pulsing mode on ION-TOF instruments). The importance of these two benefits will be highlighted.

References:

[1] Surface and Interface Analysis, Special issue on Multivariate Analysis, Volume 41, Issue 2, Feb 2009.

[2] Surface and Interface Analysis, Special issue on Multivariate Analysis II, Volume 41, Issue 8, Aug 2009.
