I modified the convertArray method in combineArrays.R to support EPICv2.
While doing so, I discovered 2 bugs and attempted to fix them.
I have submitted a pull request.
Would you mind reviewing it?
- Code Modification for EPICv2 Compatibility
The convertArray method has been modified to support the EPICv2 chip.
The annotation settings for rgSet are now consistent with the current devel version of minfi in utils.R and read.meth.R. IDAT files can be read with read.metharray.exp exactly as before, with no special treatment needed. Specifically, the annotations are set as follows:
annotation(rgSet)["array"] = "IlluminaHumanMethylationEPICv2"
annotation(rgSet)["annotation"] = "20a1.hg38"
The conversion process remains the same and does not require special treatment:
rgSet_EPICv1 <- convertArray(rgSet, outType = "IlluminaHumanMethylationEPIC")
Note: Only the manifest file from mwsill/IlluminaHumanMethylationEPICv2manifest (https://github.com/mwsill/IlluminaHumanMethylationEPICv2manifest) has been tested. As detailed in the repository's README, the EPICv2 probes in this manifest are already de-duplicated. Therefore, my modifications do not include code for handling duplicated probes.
- Bug: Logical Issue in Code
Problem:
While testing, I discovered a logical issue in the code that causes some probe signal values to be incorrect. The original code replaces EPICv2 addresses in rownames(rgSet) with the corresponding EPICv1 addresses one probe type at a time (I, II, SnpI, SnpII, Control). Because each pass operates on row names that earlier passes have already modified, this produces duplicate addresses and incorrect replacements, which in turn cause errors in the Beta value calculations for some probes.
For example:
(1)
First, the replacement for Type I probes is executed. Among the Type I probes, there is an EPICv2 address of 13719409, which corresponds to an EPICv1 address of 13773183. After the replacement, there will be two entries of 13773183 in rownames(rgSet): one is the converted result from the EPICv2 address 13719409, and the other is the original EPICv2 address 13773183 for a Type II probe.
(2)
Next, the replacement for Type II probes is executed. At this point, the EPICv2 address 13773183 in rownames(rgSet) needs to be replaced with its corresponding EPICv1 address 2608371. However, as shown in step (1), there are two entries of 13773183, and both of them will be replaced with the address 2608371.
(3)
As a result, the EPICv2 Type I probe address 13719409 has been incorrectly replaced. The corresponding EPICv1 address should be 13773183, but it has been changed to 2608371. The EPICv2 Type II probe address 13773183, by contrast, has been correctly replaced with the EPICv1 address 2608371.
(4)
There are now duplicate values in rownames(rgSet). After the "# Update rgSet" step, only unique rownames(rgSet) remain. I'm not certain what rule determines which entries are kept, but in this example, the Green and Red signal values associated with the EPICv2 Type II probe address 13773183 are actually the values for the EPICv2 Type I probe address 13719409.
Ultimately, for some probes this leads to incorrect Beta values from getBeta, preprocessRaw, and the other preprocess methods that call preprocessRaw.
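The failure in steps (1) and (2) can be reproduced on a toy vector in base R. This is only a minimal sketch, not the actual convertArray code: the names addr, map_typeI and map_typeII are hypothetical stand-ins for rownames(rgSet) and the per-type EPICv2-to-EPICv1 lookups, and only the two addresses from the example are used.

```r
# Toy stand-in for rownames(rgSet): the two EPICv2 addresses from the example.
addr <- c("13719409", "13773183")

# Hypothetical per-type lookups, EPICv2 address -> EPICv1 address.
map_typeI  <- c("13719409" = "13773183")
map_typeII <- c("13773183" = "2608371")

# Sequential replacement: each pass looks up addresses in the vector
# that the previous pass has already modified.
sequential <- addr
for (m in list(map_typeI, map_typeII)) {
  hit <- sequential %in% names(m)
  sequential[hit] <- unname(m[sequential[hit]])
}

print(sequential)
# Both entries end up as "2608371": the Type I probe was converted twice,
# and the row names now contain a duplicate.
print(anyDuplicated(sequential) > 0)
```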
Fix:
Instead of replacing rownames(rgSet) sequentially for each probe type, all replacements should be applied in a single pass, after the conversion relationships for all probes have been determined.
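The idea behind the fix can be sketched in base R as follows. This is an illustration under assumptions rather than the code in the pull request: addr, map_typeI, map_typeII and full_map are hypothetical names, and the toy data contains only the two addresses from the example above.

```r
# Toy stand-in for rownames(rgSet): the two EPICv2 addresses from the example.
addr <- c("13719409", "13773183")

# Hypothetical per-type lookups, EPICv2 address -> EPICv1 address.
map_typeI  <- c("13719409" = "13773183")
map_typeII <- c("13773183" = "2608371")

# Combine every per-type lookup into one table before touching any address.
full_map <- c(map_typeI, map_typeII)
stopifnot(anyDuplicated(names(full_map)) == 0)

# Single pass: always look up the ORIGINAL address, never a converted one.
converted <- addr
hit <- addr %in% names(full_map)
converted[hit] <- unname(full_map[addr[hit]])

print(converted)
# The Type I probe keeps 13773183 and the Type II probe gets 2608371,
# so no address is converted twice and no duplicate appears.
```

Because every lookup uses the original vector, the order in which the probe types are combined no longer affects the result.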