Commit e0d5e6d

Small fixes and updates
1 parent 86c2b6e commit e0d5e6d

1 file changed

+50
-49
lines changed

vignettes/xcms-preprocessing.Rmd

Lines changed: 50 additions & 49 deletions
Original file line number | Diff line number | Diff line change
@@ -893,7 +893,7 @@ EIC for serine and run a *centWave*-based peak detection on that data using
893893
```{r centWave-default}
894894
#' Get the EIC for serine in all files
895895
serine_chr <- chromatogram(mse, rt = c(164, 200),
896-
mz = serine_mz + c(-0.01, 0.01),
896+
mz = serine_mz + c(-0.005, 0.005),
897897
aggregationFun = "max")
898898
899899
#' Get default centWave parameters
@@ -906,7 +906,7 @@ chromPeaks(res)
906906

907907
The peak matrix returned by `chromPeaks` is empty, thus, with the default
908908
settings *centWave* failed to identify any chromatographic peak in the EIC for
909-
serine. These default values are shown below:
909+
serine. The default values for the parameters are shown below:
910910

911911
```{r centWave-default-parameters}
912912
#' Default centWave parameters
@@ -920,6 +920,7 @@ however see that these values are way too large for our UHPLC-based data set
920920
(see below).
921921

922922
```{r, fig.cap = "Extracted ion chromatogram for serine."}
923+
#' Plot the EIC
923924
plot(serine_chr)
924925
```
925926

@@ -1273,21 +1274,21 @@ repeatedly measured QC samples (e.g. sample pools) and adjust the full
12731274
experiment based on these. See the alignment section in the *xcms*
12741275
[vignette](https://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html)
12751276
for more information on this subset-based alignment. Note that such a
1276-
subset-based alignment requires the samples to be loaded in the order in which
1277-
they were measured. Also, recently, functionality was added to *xcms* to perform
1278-
the alignment on pre-selected signals (e.g. retention times of internal
1277+
subset-based alignment requires the samples to be organized in the order in
1278+
which they were measured. Also, recently, functionality was added to *xcms* to
1279+
perform the alignment on pre-selected signals (e.g. retention times of internal
12791280
standards) or to align a data set against an external reference.
12801281

12811282
For our example we use the *peakGroups* method that, as mentioned above, aligns
12821283
samples based on the retention times of *anchor peaks*. To define these, we need
1283-
to first run an initial correspondence analysis to group chromatographic peaks
1284+
to first run an initial correspondence analysis and group chromatographic peaks
12841285
across samples. Below we use the *peakDensity* method for correspondence
12851286
(details about this method and explanations on the choices of its parameters are
12861287
provided in the next section). In brief, parameter `sampleGroups` defines to
12871288
which sample group of the experiment individual samples belong, and parameter
12881289
`minFraction` specifies the proportion of samples (of one of the sample groups
12891290
defined in `sampleGroups`) in which a chromatographic peak needs to be detected
1290-
to group them into an LC-MS feature. Chromatographic peaks will be grouped into
1291+
to group them into an LC-MS feature. Chromatographic peaks will be grouped to
12911292
features if their difference in *m/z* and retention times is below the defined
12921293
thresholds and if in at least `minFraction * 100` percent of samples of at least
12931294
one sample group a chromatographic peak was detected. For our example we use the
@@ -1303,7 +1304,7 @@ the samples, its settings does not need to be fully optimized.
13031304
#' Define the settings for the initial peak grouping - details for
13041305
#' choices in the next section.
13051306
pdp <- PeakDensityParam(sampleGroups = sampleData(mse)$group, bw = 1.8,
1306-
minFraction = 1, binSize = 0.02, ppm = 10)
1307+
minFraction = 1, binSize = 0.01, ppm = 10)
13071308
mse <- groupChromPeaks(mse, pdp)
13081309
```
13091310

@@ -1330,9 +1331,9 @@ pgm <- adjustRtimePeakGroups(mse, PeakGroupsParam(minFraction = 1))
13301331
head(pgm)
13311332
```
13321333

1333-
Ideally, if possible, the anchor peaks should span a large range of the
1334-
retention time range to allow alignment of the full LC runs. Below evaluate the
1335-
distribution of retention times of the anchor peaks in the first sample.
1334+
Ideally, if possible, the anchor peaks should span most of the retention time
1335+
range to allow alignment of the full LC runs. Below we evaluate the distribution of
1336+
retention times of the anchor peaks in the first sample.
13361337

13371338
```{r}
13381339
#' Evaluate distribution of anchor peaks' rt in the first sample
@@ -1346,9 +1347,9 @@ on the `minFraction` parameter) the algorithm minimizes the observed
13461347
between-sample retention time differences for these. Parameter `span` defines
13471348
the degree of smoothing of the loess function that is used to allow different
13481349
regions along the retention time axis to be adjusted by a different factor. A
1349-
value of 0 will most likely cause overfitting, while 1 would cause all retention
1350-
times of a sample to be shifted by a constant value. Values between 0.4 and 0.6
1351-
seem to be reasonable for most experiments.
1350+
value close to 0 will most likely cause overfitting, while a value of 1 would
1351+
cause all retention times of a sample to be shifted by a constant value. Values
1352+
between 0.4 and 0.6 seem to be reasonable for most experiments.
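
The effect of `span` on the smoothness of a loess fit can be illustrated outside of *xcms*. The sketch below uses the base R `loess` function on simulated values; the variable names and the simulated retention time deviations are assumptions made for illustration, and the default settings of base R `loess` are not necessarily those used internally by the *peakGroups* method.

```r
#' Illustration only: simulated retention time deviations (in seconds) of
#' anchor peaks along a run; this is not the xcms implementation.
set.seed(123)
rt <- seq(10, 240, length.out = 40)
rt_dev <- 0.8 * sin(rt / 35) + rnorm(length(rt), sd = 0.1)

#' Fit loess curves with a small and a large span
fit_flexible <- loess(rt_dev ~ rt, span = 0.3)
fit_smooth <- loess(rt_dev ~ rt, span = 1)

#' Residual sum of squares: a smaller span follows the points more closely
rss <- c(span_0.3 = sum(residuals(fit_flexible)^2),
         span_1 = sum(residuals(fit_smooth)^2))
rss
```

A smaller residual sum of squares for the smaller `span` reflects the more flexible fit, which is also what makes very small values prone to overfitting.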
13521353

13531354
```{r alignment-correspondence}
13541355
#' Define settings for the alignment
@@ -1474,10 +1475,10 @@ assignment defined in `sampleData`.
14741475

14751476
```{r}
14761477
#' Extract a chromatogram for a m/z range containing serine
1477-
chr_1 <- chromatogram(data, mz = serine_mz + c(-0.005, 0.005))
1478+
chr_1 <- chromatogram(mse, mz = serine_mz + c(-0.005, 0.005))
14781479
14791480
#' Default parameters for peak density; bw = 30
1480-
pdp <- PeakDensityParam(sampleGroups = sampleData(data)$group, bw = 30)
1481+
pdp <- PeakDensityParam(sampleGroups = sampleData(mse)$group, bw = 30)
14811482
14821483
#' Test these settings on the extracted slice
14831484
plotChromPeakDensity(chr_1, param = pdp)
@@ -1497,22 +1498,22 @@ of this curve (which is created with the base R `density` function) is
14971498
configured with the parameter `bw`. The *peakDensity* algorithm assigns all
14981499
chromatographic peaks within the same *peak* of this density estimation curve to
14991500
the same feature. Chromatographic peaks assigned to the same feature are
1500-
indicated with a grey rectangle in the plot. In the present example, because
1501-
retention times of the two chromatographic peaks are very similar, this
1502-
rectangle is very narrow and looks thus more like a vertical line. Based on this
1503-
result, the default settings (`bw = 30`) seemed to correctly define features. It
1504-
is however advisable to evaluate settings on multiple slices, ideally with
1505-
signal from more than one compound being present. Such slices could be
1506-
identified in e.g. a plot created with the `plotChromPeaks` function (see
1507-
example in the chromatographic peak detection section).
1501+
indicated with a grey rectangle in the lower panel of the plot. In the present
1502+
example, because retention times of the two chromatographic peaks are very
1503+
similar, this rectangle is very narrow and looks thus more like a vertical
1504+
line. Based on this result, the default settings (`bw = 30`) seemed to correctly
1505+
define features. It is however advisable to evaluate settings on multiple
1506+
slices, ideally with signal from more than one compound being present. Such
1507+
slices could be identified in e.g. a plot created with the `plotChromPeaks`
1508+
function (see example in the chromatographic peak detection section).
15081509

15091510
In our example we extract a chromatogram for an *m/z* slice containing signal
15101511
for known isomers betaine and valine ([M+H]+ *m/z* 118.08625).
15111512

15121513
```{r correspondence-bw, fig.cap = "Correspondence analysis with default settings on an *m/z* slice containing signal from multiple ions."}
15131514
#' Plot the chromatogram for an m/z slice containing betaine and valine
1514-
mzr <- 118.08625 + c(-0.01, 0.01)
1515-
chr_2 <- chromatogram(data, mz = mzr, aggregationFun = "max")
1515+
mzr <- 118.08625 + c(-0.005, 0.005)
1516+
chr_2 <- chromatogram(mse, mz = mzr, aggregationFun = "max")
15161517
15171518
#' Correspondence in that slice using default settings
15181519
plotChromPeakDensity(chr_2, param = pdp)
@@ -1527,14 +1528,14 @@ reduced value for parameter `bw`.
15271528

15281529
```{r correspondence-bw-fix, fig.cap = "Correspondence analysis with reduced bw setting on a *m/z* slice containing signal from multiple ions."}
15291530
#' Reducing the bandwidth
1530-
pdp <- PeakDensityParam(sampleGroups = sampleData(data)$group, bw = 1.8)
1531+
pdp <- PeakDensityParam(sampleGroups = sampleData(mse)$group, bw = 1.8)
15311532
plotChromPeakDensity(chr_2, param = pdp)
15321533
```
15331534

15341535
Setting `bw = 1.8` strongly reduced the smoothness of the density curve
15351536
resulting in a higher number of density *peaks* and hence a nice grouping of
15361537
(aligned) chromatographic peaks into separate features. Note that the height of
1537-
the peaks of the density curve are not considered for the grouping.
1538+
the peaks of the density curve is not relevant for the grouping.
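
How the bandwidth changes the number of density *peaks* can be sketched with the base R `density` function alone. The retention times below are made up for illustration and the helper `count_density_peaks` is hypothetical (not part of *xcms*); it simply counts the local maxima of the density curve.

```r
#' Illustration only: simulated retention times (in seconds) of
#' chromatographic peaks from two closely eluting compounds.
rts <- c(149.5, 150.0, 150.2, 150.6, 151.0,
         169.4, 170.0, 170.3, 170.8, 171.0)

#' Hypothetical helper: count the local maxima ("peaks") of a kernel
#' density estimate calculated with bandwidth bw
count_density_peaks <- function(x, bw) {
    d <- density(x, bw = bw)
    sum(diff(sign(diff(d$y))) == -2)
}

n_wide <- count_density_peaks(rts, bw = 30)
n_narrow <- count_density_peaks(rts, bw = 1.8)
c(bw_30 = n_wide, bw_1.8 = n_narrow)
```

With the larger bandwidth the two groups of retention times are smoothed into a single density peak, while the smaller bandwidth should separate them.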
15381539

15391540
By having defined a `bw` appropriate for our data set, we proceed and perform
15401541
the correspondence analysis on the full data set. Other parameters of
@@ -1557,17 +1558,17 @@ allows to generate *m/z*-dependent bin sizes: the width of the *m/z* slices
15571558
increases by `ppm` of the bin's *m/z* along the *m/z* axis.
15581559

15591560
For our correspondence analysis we set the maximal acceptable difference of
1560-
chrom peaks' *m/z* values with `binSize = 0.02` and `ppm = 10`, hence grouping
1561+
chrom peaks' *m/z* values with `binSize = 0.01` and `ppm = 10`, hence grouping
15611562
chromatographic peaks with similar retention time and with a difference of their
1562-
*m/z* values that is smaller than 0.02 + 10 ppm of their *m/z* values. By
1563+
*m/z* values that is smaller than 0.01 + 10 ppm of their *m/z* values. By
15631564
setting `minFraction = 0.4` we in addition require for a feature that a
15641565
chromatographic peak was detected in `>=` 40% of samples of at least one sample
15651566
group.
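
Taking the description above at face value, the maximal accepted *m/z* difference can be calculated for any *m/z* value. The sketch below does this for the *m/z* of the betaine/valine ion mentioned earlier and for a second, hypothetical *m/z* of 500 that was chosen only for comparison.

```r
#' Settings from the text
binSize <- 0.01
ppm <- 10

#' m/z of the betaine/valine [M+H]+ ion from the text and a second,
#' hypothetical m/z value chosen only for comparison
mz <- c(betaine_valine = 118.08625, hypothetical = 500)

#' Maximal accepted m/z difference as described above: binSize + ppm of m/z
max_mz_diff <- binSize + mz * ppm / 1e6
max_mz_diff
```

The `ppm` term contributes little at low *m/z* and more at higher *m/z*, which is the *m/z*-dependent widening of the bins described above.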
15661567

15671568
```{r correspondence-analysis}
15681569
#' Set in addition parameter ppm to a value of 10
15691570
pdp <- PeakDensityParam(sampleGroups = sampleData(mse)$group, bw = 1.8,
1570-
minFraction = 0.4, binSize = 0.02, ppm = 10)
1571+
minFraction = 0.4, binSize = 0.01, ppm = 10)
15711572
15721573
#' Perform the correspondence analysis on the full data
15731574
mse <- groupChromPeaks(mse, param = pdp)
@@ -1821,9 +1822,9 @@ l <- lm(log2(avg_filled) ~ log2(avg_detect))
18211822
summary(l)
18221823
```
18231824

1824-
With a value of 0.994, the slope of the line is thus very close to the slope of
1825+
With a value of 1.007, the slope of the line is thus very close to the slope of
18251826
the identity line and the two sets of values are also highly correlated (R
1826-
squared of 0.79).
1827+
squared of 0.81).
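
The slope and the R squared can also be extracted directly from the fitted model instead of being read from the `summary` output. The sketch below uses simulated abundances, not the tutorial data; only the variable names `avg_detect`, `avg_filled` and `l` are taken from the code above.

```r
#' Illustration only: simulated abundances, not the tutorial data
set.seed(42)
avg_detect <- 2^runif(60, min = 8, max = 16)
avg_filled <- avg_detect * 2^rnorm(60, sd = 0.5)

l <- lm(log2(avg_filled) ~ log2(avg_detect))

#' Slope of the fitted line and R squared
slope <- unname(coef(l)[2])
r_squared <- summary(l)$r.squared
c(slope = slope, r_squared = r_squared)
```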
18271828

18281829

18291830

@@ -1977,8 +1978,8 @@ available in the infrastructure provided through the *xcms*, *Spectra*,
19771978
*MsCoreUtils*, *MetaboCoreUtils* and other related Bioconductor packages. It
19781979
would for example be easily possible to extract specific information for
19791980
selected chromatographic peaks or LC-MS features from an *xcms* result object
1980-
and perform some additional visualizations or analyses on them. Below we first
1981-
identify chromatographic peaks that would match the *m/z* of serine.
1981+
and perform some additional visualizations or analyses on them. As an example we
1982+
below first identify chromatographic peaks that would match the *m/z* of serine.
19821983

19831984
```{r}
19841985
#' Extract chromatographic peaks matching the m/z of the [M+H]+ of serine
@@ -2004,10 +2005,10 @@ serine_ms1_2 <- chromPeakSpectra(mse, msLevel = 1, method = "closest_rt",
20042005
peaks = rownames(serine_pks)[2])
20052006
```
20062007

2007-
For LC-MS/MS data, this function would allow to select all MS2 spectra from the
2008-
data set with their precursor m/z (and retention time) within the
2009-
chromatographic peak's *m/z* and retention time width using parameters `msLevel
2010-
= 2` and `method = "all"`.
2008+
For LC-MS/MS data, this function would also allow extracting all MS2 spectra
2009+
from the data set with their precursor m/z (and retention time) within the
2010+
chromatographic peak's *m/z* and retention time width by using parameters
2011+
`msLevel = 2` and `method = "all"`.
20112012

20122013
Below we plot the EIC and the MS1 scan for the selected chromatographic peak.
20132014

@@ -2033,9 +2034,9 @@ and retention time ranges of the chromatographic peak in that sample,
20332034
`featureChromatograms` will instead integrate the signal from the *m/z* and
20342035
retention time area of the **feature**, i.e. will use a single area and
20352036
integrate the signal from that same area in each sample. This *m/z* - retention
2036-
time area might eventually be larger than the respective ranges for a single
2037+
time area might however be larger than the respective ranges for a single
20372038
chromatographic peak in one sample. This *m/z* - retention time area for
2038-
features can be extracted using the `featureArea` function:
2039+
features can also be extracted (and evaluated) using the `featureArea` function:
20392040

20402041
```{r}
20412042
#' Extract the m/z - retention time area for features
@@ -2075,10 +2076,10 @@ cols[iso_idx[[1]]] <- "#ff0000ff"
20752076
plotSpectra(serine_ms1_2, col = cols, lwd = 2)
20762077
```
20772078

2078-
While in the example above were specifically looking for potential isotopes of a
2079-
single, selected, mass peak (by setting `seedMz` to the *m/z* value of that
2080-
peak), we could also use `isotopologues` to identify all potential isotope
2081-
groups in a spectrum.
2079+
While in the example above we were specifically looking for potential isotopes
2080+
of a single, selected, mass peak (by setting `seedMz` to the *m/z* value of that
2081+
peak), we could also use `isotopologues` without specifying `seedMz` to identify
2082+
all potential isotope groups in a spectrum.
20822083

20832084
```{r}
20842085
#' Identify all potential isotope peaks in the MS1 spectrum
@@ -2151,7 +2152,7 @@ space from an LC-MS experiment.
21512152

21522153
We below subset the data to the first sample and visualize the identified
21532154
chromatographic peaks in the *m/z* - retention time plane using the
2154-
`plotChromPeaks` function already used before.
2155+
`plotChromPeaks` function that we already used before.
21552156

21562157
```{r, fig.cap = "Position of identified chromatographic peaks in the first sample."}
21572158
#' Plot identified chromatographic peaks in the first sample
@@ -2267,18 +2268,18 @@ particular how to adapt peak detection setting on a rather noisy
22672268
*chromatographic* data. Below we load the example data from a text file.
22682269

22692270
```{r peaks-load}
2270-
data <- read.table(
2271+
cdata <- read.table(
22712272
system.file("txt", "chromatogram.txt", package = "xcmsTutorials"),
22722273
sep = "\t", header = TRUE)
2273-
head(data)
2274+
head(cdata)
22742275
```
22752276

22762277
Our data has two columns, one with *retention times* and one with
22772278
*intensities*. We can now create a `Chromatogram` object from that and plot the
22782279
data.
22792280

22802281
```{r peaks-plot, fig.width = 12, fig.height = 2.15}
2281-
chr <- Chromatogram(rtime = data$rt, intensity = data$intensity)
2282+
chr <- Chromatogram(rtime = cdata$rt, intensity = cdata$intensity)
22822283
par(mar = c(2, 2, 0, 0))
22832284
plot(chr)
22842285
```
