The R Validation Hub was formed in 2018 by the PSI AIMS Special Interest Group and is supported by the R Consortium. We work closely with many neighboring efforts in the pharmaceutical space and broader R ecosystem.
-
-
-
Mission
-
Our mission is to leverage the open-source and collaborative nature of R while supporting its adoption within the biopharmaceutical setting.
-
Since our inception, we have developed several resources intended for industry-wide use. Our solutions for the validation of R packages support your work by quantifying the “risk” of R packages with several meaningful metrics (see {riskmetric}) and by providing a user-friendly, full-fledged R Shiny app that acts as a central hub for gauging the “risk” of packages across your organization (see {riskassessment}). These solutions serve a variety of roles, from an individual contributor curious about a package’s risk to the administrators of an organization’s Posit Package Manager.
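As a quick illustration of how these tools fit together, here is a minimal sketch that scores a single, arbitrarily chosen CRAN package with {riskmetric}; it assumes the package is installed locally and follows the pkg_ref() → pkg_assess() → pkg_score() workflow described in the {riskmetric} documentation.

```r
# Illustrative sketch only: quantify one package's "risk" with {riskmetric}.
# "ggplot2" is just an example; swap in any package of interest.
library(riskmetric)

pkg_ref("ggplot2") |>   # resolve a reference to the package (source, version, metadata)
  pkg_assess() |>       # run the individual metric assessments (docs, tests, bug reports, ...)
  pkg_score()           # convert each assessment into a numeric score
```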
-
While the R Validation Hub continues to evolve, we are enthusiastic about fostering enriching discussions through our community meetings and extending our presence at conferences or internally at organizations to progress the use and acceptance of R in our industry.
Our community meetings are an initiative we began in 2023 to help foster stronger cross-industry connections for those working in biopharma and champion the use of R in the industry. Each meeting centers around a particular discussion topic, such as approaches for assessing R packages’ risk, updates about our R Validation Hub tools, or implementation of GxP R environments within companies.
-
-
-
When are Community Meetings?
-
Traditionally, these meetings take place every three months. For updates on when the next community meeting is taking place, join our mailing list or watch out for announcements of the meetings on the R Consortium LinkedIn page. As this initiative gains greater momentum, the goal is to make these meetings more frequent.
-
-
-
Past Meetings
-
If you missed a past meeting or would like to engage with a past meeting’s content, you can find a list of them below:
November 28, 2023 - Wrapping Up 2023 and Welcoming 2024 (slides/recording)
-
August 09, 2023 - Risk Metric Application and Risk Score – Mini Series Part 1 (GitHub folder)
-
June 27, 2023 - Learnings & Reflections from Case Studies (GitHub folder/slides)
-
-
diff --git a/content/community-meetings.md b/content/community-meetings.md
new file mode 100644
index 0000000..19c7fb8
--- /dev/null
+++ b/content/community-meetings.md
@@ -0,0 +1,46 @@
+### About Community Meetings
+
+Our community meetings are an initiative we began in 2023 to help foster
+stronger cross-industry connections for those working in biopharma and
+champion the use of R in the industry. Each meeting centers around a
+particular discussion topic, such as approaches for assessing R
+packages’ risk, updates about our R Validation Hub tools, or
+implementation of GxP R environments within companies.
+
+### When are Community Meetings?
+
+Traditionally, these meetings take place every three months. For updates
+on when the next community meeting is taking place, join our [mailing
+list](https://lists.r-consortium.org/g/RConsortium-Validation-Hub) or
+watch out for announcements of the meetings on the R Consortium LinkedIn
+page. As this initiative gains greater momentum, the goal is to make
+these meetings more frequent.
+
+### Past Meetings
+
+If you missed a past meeting or would like to engage with a past
+meeting’s content, you can find a list of them below:
+
+- May 27, 2025 - *Validating In-House R Packages* ([GitHub
+ folder](https://github.com/pharmaR/events/tree/main/community_meetings/2025-05-27)/[slides](https://github.com/pharmaR/events/blob/main/community_meetings/2025-05-27/RvalHub_2025May_Community_meeting.pdf))
+- February 25, 2025 - *Shiny App Validation in Regulatory Submissions*
+ ([GitHub
+ folder](https://github.com/pharmaR/events/tree/main/community_meetings/2025-02-25))
+- November 26, 2024 - *Navigating Programming Language Transitions in
+ Pharma* ([GitHub
+ folder](https://github.com/pharmaR/events/tree/main/community_meetings/2024-11-26))
+- August 20, 2024 - *Analyzing Change in Assessed Risk Across Package
+ Releases* ([GitHub
+ folder](https://github.com/pharmaR/events/tree/main/community_meetings/2024-08-20))
+- May 21, 2024 - *Tackling Hurdles: Embracing Open-Source Packages in
+ Projects* ([GitHub
+ folder](https://github.com/pharmaR/events/tree/main/community_meetings/2024-05-21)/[slides](https://github.com/pharmaR/events/blob/main/community_meetings/2024-05-21/RvalHub_2024May_Community_meeting.pdf)/[notes](https://github.com/pharmaR/events/blob/main/community_meetings/2024-05-21/21MAY2024_community_meeting_notes.md))
+- February 20, 2024 - *Unraveling the Term “Validation”* ([GitHub
+ folder](https://github.com/pharmaR/events/tree/main/community_meetings/2024-02-20)/[slides](https://github.com/pharmaR/events/blob/main/community_meetings/2024-02-20/RvalHub_2024Feb_Community_meeting.pdf)/[notes](https://github.com/pharmaR/events/blob/main/community_meetings/2024-02-20/RvalHub_2024Feb_Community_meeting_notes.txt))
+- November 28, 2023 - *Wrapping Up 2023 and Welcoming 2024*
+ ([slides](https://docs.google.com/presentation/d/1iq2HzcjVYGnR5sf-rgLZzxnGwMnrDlJJNJmziwfX3j0/edit#slide=id.g1e185ace86d_0_153)/[recording](https://www.youtube.com/watch?v=_r5x-baN76E&t=3s))
+- August 09, 2023 - *Risk Metric Application and Risk Score – Mini
+ Series Part 1* ([GitHub
+ folder](https://github.com/pharmaR/events/tree/main/community_meetings/2023-08-09_riskmetric_riskassessment_series))
+- June 27, 2023 - *Learnings & Reflections from Case Studies* ([GitHub
+ folder](https://github.com/pharmaR/events/tree/main/community_meetings/2023-06-27)/[slides](https://github.com/pharmaR/events/blob/main/community_meetings/2023-06-27/community_mtg_6_27_2023.pdf))
diff --git a/content/contact.html b/content/contact.html
deleted file mode 100644
index 656b469..0000000
--- a/content/contact.html
+++ /dev/null
@@ -1,19 +0,0 @@
----
-title: "Contact Us"
----
-
-
-
-
-
Stay up-to-date and join our mailing list
-
For occasional news updates, you can join our mailing
-list.
-This will enable you and your colleagues to receive notifications of key
-updates/software releases, our latest news and blog posts, and invites
-to all-hands Hub meetings.
-
Once you’ve joined, you can subscribe to our
-calendar to keep tabs on
-all of the R Validation Hub’s all-hands meetings.
-
-
-
-
Contribute to a Working Group
-
If you are willing and able to commit more time to the R Validation
-Hub’s mission, then there are several options depending on your skills
-and interests. If you do not feel you have the necessary technical
-skills to contribute then please don’t let that stop you from getting in
-touch and sharing ideas to support the implementation of a risk-based R
-package assessment.
-
In each case you can get involved either by contacting the stream leads
-via the relevant GitHub page, or by sending a message to us via the
-contact form on our Contact Us page.
-
-
The Communications Workstream
-
Check back later!
-
The communications workstream is our newest endeavor. We’ve been
-fortunate to have a lot of interest in these roles. Please check back
-later to see if the team is looking to grow.
The {riskmetric} package collects metrics that support the idea of a
-risk-based evaluation of R packages. The package collects and summarizes
-various metrics and provides the means to quantify the risk of a package
-via a weighted risk score. It also serves as the backbone of the Risk
-Assessment Shiny application. For more information see the package
-site or GitHub
-page.
-
Current Focus
-
The {riskmetric} team is currently looking to build features to support
-the idea of package cohorts. We are also looking to refine some of the
-meta-information collected by riskmetric to better serve the risk
-assessment application. The team also has a backlog of other open
-issues, including
-additional metrics that they would like to explore.
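For context, in current releases a character vector of package names can typically be passed through the same pipeline, which is the kind of workflow the package-cohort work described above would build on. The sketch below is illustrative only, and summarize_scores() is assumed to be available for collapsing per-metric scores into a single number.

```r
# Illustrative sketch: score a small "cohort" of packages in one pass.
library(riskmetric)

pkg_ref(c("dplyr", "ggplot2", "survival")) |>  # example cohort; any package names work
  pkg_assess() |>
  pkg_score() |>
  summarize_scores()                           # one overall score per package (assumed helper)
```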
-
Key Skills Required
-
A fundamental understanding of R packages is essential for anyone
-looking to support the {riskmetric} team. Package development experience
-is also highly beneficial.
The risk assessment Shiny app is an extension of the {riskmetric} R
-package and provides a graphical interface to the {riskmetric}
-functionality. It provides further exploratory capabilities in addition
-to the numeric metrics and improves ease-of-use for organizations to
-adopt a risk-based approach to R package validation. For more
-information see the package GitHub
-page.
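For readers who simply want to try the app, a minimal sketch (assuming the CRAN release of {riskassessment} and its usual golem-style entry point) is:

```r
# Illustrative sketch only: launch the {riskassessment} app locally.
install.packages("riskassessment")
riskassessment::run_app()
```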
-
Current Focus
-
In addition to working with the {riskmetric} team to better support
-cohorts of packages and multiple package versions, the team is building
-up metric and package exploration features. This includes links to
-source information such as vignettes, and the ability to explore package
-documentation and tests.
-
Key Skills Required
-
Shiny app development experience is highly desirable; specifically,
-experience with Shiny modules and the {golem} package would be
-beneficial but is not required.
-
-
-
The Regulatory R Package Repository
-
The regulatory R package repository is currently working on
-minimum viable products for a number of key features. Progress is
-tracked on our GitHub
-page, where you can
-find information for getting
-involved.
-
-
-
-
-
How else can I help?
-
If you have tried to implement any of the R Validation Hub’s ideas, or
-used any of our tools, or even if you’ve decided to go your own way, then we
-would love you to submit a case study describing what you have done to
-implement a GxP-compliant R environment to our Case
-Study catalogue on GitHub. Use
-our mailing list or GitHub issues to let us know of your intentions and
-we’ll work with you to upload your written case study. We’re also
-pleased to share case studies verbally with the wider community. You can
-find videos of previous presentations on our case
-studies page.
-
If you still don’t feel that you can contribute then please consider
-one of our partner initiatives.
-
diff --git a/content/contribute.Rmd b/content/contribute.md
similarity index 100%
rename from content/contribute.Rmd
rename to content/contribute.md
diff --git a/content/general-guidances.html b/content/general-guidances.html
deleted file mode 100644
index 1ac5594..0000000
--- a/content/general-guidances.html
+++ /dev/null
@@ -1,7 +0,0 @@
----
-title: "General Guidances"
----
-
-
-
-
diff --git a/content/general-guidances.Rmd b/content/general-guidances.md
similarity index 100%
rename from content/general-guidances.Rmd
rename to content/general-guidances.md
diff --git a/content/glossary.Rmd b/content/glossary.Rmd
deleted file mode 100644
index ff078a4..0000000
--- a/content/glossary.Rmd
+++ /dev/null
@@ -1,9 +0,0 @@
----
-title: "Glossary"
----
-
-| Term | Definition |
-|--------------------|----------------------------------------------------|
-| **Open Source Software** | Software or language designed for collaboration that has source code available for community review, usage, and modification |
-| **System Qualification** | The process including the execution of Installation, Operational and Performance tests on a system that ensures the reproducibility of expected results. |
-| **Validation** | *Per the FDA's Glossary of Computer System Software Development Terminology* -"Establishing documented evidence which provides a high degree of assurance (accuracy) that a specific process consistently (reproducibility) produces a product meeting its predetermined specifications (traceability) and quality attributes." |
diff --git a/content/glossary.html b/content/glossary.html
deleted file mode 100644
index 2e2736d..0000000
--- a/content/glossary.html
+++ /dev/null
@@ -1,32 +0,0 @@
----
-title: "Glossary"
----
-
-
-
-
-
-
-
-
-
-
-
Term
-
Definition
-
-
-
-
-
Open Source Software
-
Software or language designed for collaboration that has source code available for community review, usage, and modification
-
-
-
System Qualification
-
The process including the execution of Installation, Operational and Performance tests on a system that ensures the reproducibility of expected results.
-
-
-
Validation
-
Per the FDA’s Glossary of Computer System Software Development Terminology -“Establishing documented evidence which provides a high degree of assurance (accuracy) that a specific process consistently (reproducibility) produces a product meeting its predetermined specifications (traceability) and quality attributes.”
-
-
-
diff --git a/content/glossary.md b/content/glossary.md
new file mode 100644
index 0000000..7ecc3f9
--- /dev/null
+++ b/content/glossary.md
@@ -0,0 +1,9 @@
+---
+title: "Glossary"
+---
+
+| Term | Definition |
+|--------------------------|----------------------------------------------|
+| **Open Source Software** | Software or language designed for collaboration that has source code available for community review, usage, and modification |
+| **System Qualification** | The process including the execution of Installation, Operational and Performance tests on a system that ensures the reproducibility of expected results. |
+| **Validation** | *Per the FDA's Glossary of Computer System Software Development Terminology* -"Establishing documented evidence which provides a high degree of assurance (accuracy) that a specific process consistently (reproducibility) produces a product meeting its predetermined specifications (traceability) and quality attributes." |
diff --git a/content/looking-forward.Rmd b/content/looking-forward.Rmd
deleted file mode 100644
index bafe04c..0000000
--- a/content/looking-forward.Rmd
+++ /dev/null
@@ -1,5 +0,0 @@
----
-title: "Looking Forward"
----
-
-{width="432"}
diff --git a/content/looking-forward.html b/content/looking-forward.html
deleted file mode 100644
index 958e265..0000000
--- a/content/looking-forward.html
+++ /dev/null
@@ -1,7 +0,0 @@
----
-title: "Looking Forward"
----
-
-
-
-
The R Validation Hub champions a level structure, comprised of many workstreams that contribute toward their focus areas and activities. Find each workstream’s duties below as well as the members that support them.
-
If you wish to contribute to any of our efforts, please visit our Contribute page.
-
-
-
Strategic Workstream
-
The Strategic Workstream has the responsibility of aligning on longer-term strategic goals. The intention is that this workstream is to become a floating set of representatives from each of our workstreams.
-
-
-
-
Communications Workstream
-
The Communications Workstream focuses on how we build connections across the R world, specifically with our neighboring initiatives: the R Consortium, PhUSE, PSI AIMS, the R Submissions Working Group, and ROpenSci. This working group also supports the effort to make the R Validation Hub more intentional with how we organize ourselves and look into best means to disseminate information.
-
-
-
-
Repositories Workstream
-
The Repositories Workstream supports a transparent, cross-industry approach of establishing and maintaining a repository of validated R packages.
-
-
-
-
{riskassessment} Application Workstream
-
The {riskassessment} Workstream leads the development of the risk assessment Shiny app (an extension of the {riskmetric} package) that provides a graphical interface to the package’s functionality.
-
-
-
-
{riskmetric} Workstream
-
The {riskmetric} Workstream leads the development of the {riskmetric} package that supports the idea of risk-based evaluation of R packages.
-
-
-
-
Previous Members
-
Andy Nicholls (Chair 2018-2022)
-
Lyn Taylor, Paulo Bargo, Marly Gotti, Yilong Zhang, Keaven Anderson, Min Lee, Parker Sims
-
diff --git a/content/members-and-workstreams.md b/content/members-and-workstreams.md
new file mode 100644
index 0000000..99aeb0a
--- /dev/null
+++ b/content/members-and-workstreams.md
@@ -0,0 +1,59 @@
+### Our Structure
+
+The R Validation Hub champions a flat structure made up of many
+workstreams, each contributing toward its own focus areas and
+activities. Find each workstream’s duties below, as well as the members
+who support them.
+
+*If you wish to contribute to any of our efforts, please visit our
+[Contribute](https://www.pharmar.org/contribute/) page.*
+
+### Strategic Workstream
+
+The **Strategic Workstream** is responsible for aligning on longer-term
+strategic goals. The intention is for this workstream to become a
+floating set of representatives drawn from each of our workstreams.
+
+
+
+### Communications Workstream
+
+The **Communications Workstream** focuses on how we build connections
+across the R world, specifically with our neighboring initiatives: the R
+Consortium, PhUSE, PSI AIMS, the R Submissions Working Group, and
+rOpenSci. This working group also supports the effort to make the R
+Validation Hub more intentional about how we organize ourselves and
+about the best means of disseminating information.
+
+
+
+### Repositories Workstream
+
+The **Repositories Workstream** supports a transparent, cross-industry
+approach to establishing and maintaining a repository of validated R
+packages.
+
+
+
+### {riskassessment} Application Workstream
+
+The **{riskassessment} Workstream** leads the development of the risk
+assessment Shiny app (an extension of the {riskmetric} package) that
+provides a graphical interface to the package’s functionality.
+
+
+
+### {riskmetric} Workstream
+
+The **{riskmetric} Workstream** leads the development of the
+{riskmetric} package that supports the idea of risk-based evaluation of
+R packages.
+
+
+
+### Previous Members
+
+*Andy Nicholls* (Chair 2018-2022)
+
+*Lyn Taylor*, *Paulo Bargo*, *Marly Gotti*, *Yilong Zhang*, *Keaven
+Anderson*, *Min Lee*, *Parker Sims*
diff --git a/content/members-and-workstreams_files/figure-markdown_strict/chair-rconsortium-1.png b/content/members-and-workstreams_files/figure-markdown_strict/chair-rconsortium-1.png
new file mode 100644
index 0000000..cd9139e
Binary files /dev/null and b/content/members-and-workstreams_files/figure-markdown_strict/chair-rconsortium-1.png differ
diff --git a/content/members-and-workstreams_files/figure-markdown_strict/comm-ws-1.png b/content/members-and-workstreams_files/figure-markdown_strict/comm-ws-1.png
new file mode 100644
index 0000000..11b0167
Binary files /dev/null and b/content/members-and-workstreams_files/figure-markdown_strict/comm-ws-1.png differ
diff --git a/content/members-and-workstreams_files/figure-markdown_strict/repo-ws-1.png b/content/members-and-workstreams_files/figure-markdown_strict/repo-ws-1.png
new file mode 100644
index 0000000..c02facf
Binary files /dev/null and b/content/members-and-workstreams_files/figure-markdown_strict/repo-ws-1.png differ
diff --git a/content/members-and-workstreams_files/figure-markdown_strict/riskassessment-ws-1.png b/content/members-and-workstreams_files/figure-markdown_strict/riskassessment-ws-1.png
new file mode 100644
index 0000000..4f61bd6
Binary files /dev/null and b/content/members-and-workstreams_files/figure-markdown_strict/riskassessment-ws-1.png differ
diff --git a/content/members-and-workstreams_files/figure-markdown_strict/riskmetric-ws-1.png b/content/members-and-workstreams_files/figure-markdown_strict/riskmetric-ws-1.png
new file mode 100644
index 0000000..404604c
Binary files /dev/null and b/content/members-and-workstreams_files/figure-markdown_strict/riskmetric-ws-1.png differ
diff --git a/content/minutes.Rmarkdown b/content/minutes.Rmd
similarity index 99%
rename from content/minutes.Rmarkdown
rename to content/minutes.Rmd
index 3a7b087..6b4dbdd 100644
--- a/content/minutes.Rmarkdown
+++ b/content/minutes.Rmd
@@ -17,7 +17,7 @@ The R Validation Hub operates in the public domain and all meeting minutes are m
[GitHub Link](https://github.com/pharmaR/communications/tree/main/minutes)
```{r minutes, echo = FALSE, results = "asis", message = FALSE, warning = FALSE}
-
+
# library(dplyr)
# library(stringr)
# library(lubridate)
diff --git a/content/minutes.html b/content/minutes.html
deleted file mode 100644
index f31bbc5..0000000
--- a/content/minutes.html
+++ /dev/null
@@ -1,421 +0,0 @@
Meeting Minutes
-
-
-
-
-
The R Validation Hub operates in the public domain and all meeting
-minutes are made available here. Workstreams meet routinely and compile
-their minutes at their respective links below.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
diff --git a/content/minutes.md b/content/minutes.md
new file mode 100644
index 0000000..4891957
--- /dev/null
+++ b/content/minutes.md
@@ -0,0 +1,18 @@
+The R Validation Hub operates in the public domain and all meeting
+minutes are made available here. Workstreams meet routinely and compile
+their minutes at their respective links below.
+
+### Executive Meeting Minutes
+
+[GitHub
+Link](https://github.com/pharmaR/pharmaR/issues?q=is%3Aissue+is%3Aopen+label%3Aminutes)
+
+### `{riskassessment}` Application Workstream Minutes
+
+[GitHub
+Link](https://github.com/pharmaR/riskassessment/labels/Meeting%20Minutes)
+
+### Communications Workstream Minutes
+
+[GitHub
+Link](https://github.com/pharmaR/communications/tree/main/minutes)
diff --git a/content/old-pages/casestudies.markdown b/content/old-pages/casestudies.markdown
index b4f0e37..f1be372 100644
--- a/content/old-pages/casestudies.markdown
+++ b/content/old-pages/casestudies.markdown
@@ -15,12 +15,10 @@ The following videos are taken from our three-part meeting series in 2022. The
-
*Part 2*
-
*Part 1*
diff --git a/content/origins-and-mission.html b/content/origins-and-mission.html
deleted file mode 100644
index 46ceed5..0000000
--- a/content/origins-and-mission.html
+++ /dev/null
@@ -1,16 +0,0 @@
----
-title: "Our Origins & Mission"
----
-
-
-
-
-
Origins
-
The R Validation Hub was formed in 2018 by the PSI AIMS Special Interest Group and is supported by the R Consortium. We work closely with many neighboring efforts in the pharmaceutical space and broader R ecosystem.
-
-
-
Mission
-
Our mission is to leverage the open-source and collaborative nature of R while supporting its adoption within the biopharmaceutical setting.
-
Since our inception, we have worked on developing several resources in hopes of their utilization industry-wide. Our solutions for the validation of R packages assist your work by quantifying the “risk” of R packages with several meaningful metrics (see {riskmetric}) and providing a user-friendly, full-fledged R Shiny app as a central hub to gauge the “risk” of packages for your organization (see {riskassessment}). These solutions are useful for a variety of roles, like individual contributors who are curious about a package’s risk or an organization’s administrator(s) for Posit Package Manager, for example.
-
While the R Validation Hub continues to evolve, we are enthusiastic about fostering enriching discussions through our community meetings and extending our presence at conferences or internally at organizations to progress the use and acceptance of R in our industry.
-
diff --git a/content/origins-and-mission.Rmd b/content/origins-and-mission.md
similarity index 100%
rename from content/origins-and-mission.Rmd
rename to content/origins-and-mission.md
diff --git a/content/participating-orgs.Rmd b/content/participating-orgs.Rmd
index 6e3f5f8..b39925d 100644
--- a/content/participating-orgs.Rmd
+++ b/content/participating-orgs.Rmd
@@ -16,12 +16,11 @@ library(dplyr)
```
```{r get_photos, echo=FALSE}
-list_of_logos <- sort(list.files("~/pharmaR.github.io/content/participating_orgs_photos/"))
-logo_path_list <- c()
-for (logo in 1:length(list_of_logos)) {
- logo_path <- paste0("~/pharmaR.github.io/content/participating_orgs_photos/", list_of_logos[logo])
- logo_path_list <- c(logo_path_list, logo_path)
-}
+logo_path_list <- list.files(
+ "content/participating_orgs_photos",
+ recursive = TRUE,
+ full.names = TRUE
+)
```
The R Validation Hub is comprised of participants from across the pharmaceutical industry. Participants contribute to the effort through our regular group meetings, as well as supporting the various workstreams that make up the project. Feel free to explore [contribution opportunities](https://www.pharmar.org/contribute) and subscribe to [our mailing list](https://lists.r-consortium.org/g/RConsortium-Validation-Hub/) to stay up-to-date on our progress.
@@ -32,17 +31,16 @@ The R Validation Hub is comprised of participants from across the pharmaceutical
```{r photo-row-func, echo=FALSE, fig.width = 11}
photo_row_function_logos <- function(path_list) {
- list <- list()
- for (org in 1:length(logo_path_list)) {
- logo_path <- logo_path_list[org]
- p <-
- ggdraw() +
- draw_image(logo_path, scale = .95)
-
- list[[org]] <- p
+ grobs <- list()
+
+ for (org in seq_along(logo_path_list)) {
+ logo_path <- logo_path_list[org]
+ p <- ggdraw() + draw_image(logo_path, scale = .95)
+ grobs[[org]] <- p
}
-grid.arrange(grobs=c(list), nrow = 6, ncol = 9)
+ grobs <- Filter(Negate(is.null), grobs)
+ grid.arrange(grobs = grobs, nrow = 6, ncol = 9)
}
photo_row_function_logos(logo_path_list)
diff --git a/content/participating-orgs.html b/content/participating-orgs.html
deleted file mode 100644
index 33e33c2..0000000
--- a/content/participating-orgs.html
+++ /dev/null
@@ -1,10 +0,0 @@
----
-title: "Participating Organizations"
----
-
-
-
-
The R Validation Hub is comprised of participants from across the pharmaceutical industry. Participants contribute to the effort through our regular group meetings, as well as supporting the various workstreams that make up the project. Feel free to explore contribution opportunities and subscribe to our mailing list to stay up-to-date on our progress.
-
-
If you are a member and your organization does not appear on this list, please let us know!
-
diff --git a/content/participating-orgs.md b/content/participating-orgs.md
new file mode 100644
index 0000000..663ebb9
--- /dev/null
+++ b/content/participating-orgs.md
@@ -0,0 +1,15 @@
+The R Validation Hub is comprised of participants from across the
+pharmaceutical industry. Participants contribute to the effort through
+our regular group meetings, as well as supporting the various
+workstreams that make up the project. Feel free to explore [contribution
+opportunities](https://www.pharmar.org/contribute) and subscribe to [our
+mailing
+list](https://lists.r-consortium.org/g/RConsortium-Validation-Hub/) to
+stay up-to-date on our progress.
+
+------------------------------------------------------------------------
+
+*If you are a member and your organization does not appear on this list,
+please let us know!*
+
+
diff --git a/content/participating-orgs_files/figure-markdown_strict/photo-row-func-1.png b/content/participating-orgs_files/figure-markdown_strict/photo-row-func-1.png
new file mode 100644
index 0000000..76e8943
Binary files /dev/null and b/content/participating-orgs_files/figure-markdown_strict/photo-row-func-1.png differ
diff --git a/content/partner-initiatives.html b/content/partner-initiatives.html
deleted file mode 100644
index 4455645..0000000
--- a/content/partner-initiatives.html
+++ /dev/null
@@ -1,48 +0,0 @@
----
-title: "Partner Initiatives"
----
-
-
-
-
We have partner initiatives with three main entities: the R Consortium, PHUSE, and Statisticians in the Pharmaceutical Industry (PSI).
-
-
-
The R Validation Hub is an R Consortium Working Group. The R Consortium supports several related working groups. These working groups have a similar overall objective (to support the use of R within the biopharmaceutical industry) but the working groups are complementary: each has its own distinct deliverables; and many of the R Validation Hub’s members are also members of the other working groups.
-
-
-
Submission-Focused Working Groups
-
-
R Tables for Regulatory Submission (RTRS) Working Group
-
Once results have been generated, they are typically formatted and sent to medical writing teams to include in a Clinical Study Report. The goal of the R Tables for Regulatory Submission (RTRS) working group is to create standards for creating tables that meet the requirements of FDA submission documents, and hence enhance the suitability of R for FDA submissions.
-
-
-
Submissions Working Group
-
The final step in the data journey is to package up the results in order to share them with health authorities. The pharmaceutical submission process includes various IT and platform challenges which are the focus of the Submissions Working Group.
-
-
-
-
-
R Conferences
-
The R Validation Hub was started by the PSI AIMS Special Interest Group. However, support for the initiative grew rapidly following the inaugural R/Pharma Conference in Cambridge, in the Boston area. The R/Pharma Conference continues to grow in popularity and serves as a breeding ground for new ideas and initiatives relating to R.
-
The R/Medicine conference is also growing in popularity with a strong participation from R Validation Hub participants.
-
-
-
-
-
Clinical Statistical Reporting in a Multilingual World
-
A Phuse initiative called, ‘Clinical Statistical Reporting in a Multilingual World’ started in 2020. As with the R Consortium initiatives, this Phuse initiative maintains a close connection with the R Validation Hub. As they put it on their GitHub page, “Subtle differences exist between the fundamental approaches implemented by each language, yielding differences in results which are each correct in their own right.” One of the specific aims of this initiative is to “provide a framework for assessing the fundamental differences for a particular statistical analysis across languages”.
-
-
-
R Package Validation Framework
-
The R Validation Hub is focused on assessing and managing risk for public R packages. The Phuse R Package Validation Framework is targeted at those developing packages. The initiative aims to deliver a white paper and R package. As described in the project scope:
-
-
“The White Paper will serve as a reference for industry on how to perform validation for user-contributed extensions of programming software. It will detail the elements that need to be met for the extension to be validated, and ways to document the process in a reusable, efficient, and shareable fashion.”
-
“The R package will be developed to provide the tools and guidance for validating an R package. It will show how to take advantage of the tools that exist in the R language, and it will be based on the White Paper to ensure the baseline requirements are achieved”
diff --git a/content/partner-initiatives.Rmd b/content/partner-initiatives.md
similarity index 95%
rename from content/partner-initiatives.Rmd
rename to content/partner-initiatives.md
index 5054753..81fea4d 100644
--- a/content/partner-initiatives.Rmd
+++ b/content/partner-initiatives.md
@@ -6,7 +6,7 @@ title: "Partner Initiatives"
------------------------------------------------------------------------
-{width="180" height="41"}
+
The R Validation Hub is an [R Consortium Working Group](https://www.r-consortium.org/projects/isc-working-groups). The R Consortium supports several related working groups. These working groups have a similar overall objective (to support the use of R within the biopharmaceutical industry) but the working groups are complementary: each has its own distinct deliverables; and many of the R Validation Hub's members are also members of the other working groups.
@@ -30,7 +30,7 @@ The R Validation Hub was started by the [PSI AIMS Special Interest Group](https:
The [R/Medicine conference](https://events.linuxfoundation.org/r-medicine/) is also growing in popularity with a strong participation from R Validation Hub participants.
-# {width="78"}
+#
#### Clinical Statistical Reporting in a Multilingual World
@@ -43,7 +43,7 @@ The R Validation Hub is focused on assessing and managing risk for public R pack
- *"The White Paper will serve as a reference for industry on how to perform validation for user-contributed extensions of programming software. It will detail the elements that need to be met for the extension to be validated, and ways to document the process in a reusable, efficient, and shareable fashion."*
- *"The R package will be developed to provide the tools and guidance for validating an R package. It will show how to take advantage of the tools that exist in the R language, and it will be based on the White Paper to ensure the baseline requirements are achieved"*
-{width="70"}
+
#### AIMS SIG
diff --git a/content/posts/case-studies/Merck-External-R-Package-Qual.html b/content/posts/case-studies/Merck-External-R-Package-Qual.html
deleted file mode 100644
index f9a3171..0000000
--- a/content/posts/case-studies/Merck-External-R-Package-Qual.html
+++ /dev/null
@@ -1,199 +0,0 @@
----
-title: External R Package Qualification Implementation at Merck
-author: Uday Preetham Palukuru, Pawel Bernecki, Jane Liao, Yilong Zhang, Merck & Co., Inc., Kenilworth, NJ, USA
-date: '2022-09-21'
-slug: merck-case-study
-categories: [case studies]
-banner: 'img/banners/merck-ext-pkg-qual.png'
----
-
-
-
-
Introduction
-
There has been a growing interest in pharmaceutical industry to use R for clinical trial data analysis and reporting (A&R). Using R for regulatory submission purposes requires careful qualification of R packages given that the open-source packages differ in their quality of development. Many cross-industry initiatives including R Validation Hub and TransCelerate have published framework for qualifying R packages to be used in a regulatory setting (Nicholls, Bargo, & Sims, 2020) (Amoruccio, Lee, & Woodie, 2021). Our organization has been exploring the use of R in a regulatory setting for the past few years. A framework has been developed internally for qualifying external R packages that incorporates elements from both R Validation Hub and TransCelerate framework. This framework is currently being used to qualify both internally developed and externally sourced R packages for use in clinical trial A&R. In this document, we demonstrate this risk-based package qualification framework using the GGally R package. We provide the workflow as well as relevant details regarding the package qualification process used to qualify GGally as a moderate risk R package. We hope this inspires other organizations to use R in a regulatory setting as well as generate discussion to improve our existing framework.
-
-
Risk-Based Package Qualification Framework
-
The R package qualification framework deployed at Merck is based on validation as defined by FDA (The R Foundation for Statistical Computing c/o Institute for Statistics and Mathematics, 2021). The goal of the framework is to create documentation that contains qualification details of R package based on pre-specified criteria. The framework employs a risk-based strategy to qualify R packages based on the type of A&R deliverable being generated. The types of deliverables and their associated R package risk levels are shown in Table 1.
-
-
-
-
-
-
-
-
-
Type of Deliverables
-
Example
-
R Package Risk
-
-
-
-
-
External (electronic Common Technical Document (eCTD))
-
Clinical Study Report (CSR) and submission package;Drug labeling;Agency request
Data monitoring committee; Manuscript & publication (using clinical data); Internal committee review or presentation
-
Moderate or Low
-
-
-
Exploration/Within Department
-
Data Exploration; Data Quality Checks; Exploratory Analysis
-
Open, Moderate, or Low
-
-
-
-
Table 1. Examples of deliverables and respective risk categories of the R packages
-
-
The pre-specified criteria used to qualify an R package into the desired risk category are defined as:
-
-
c1: Package is developed and maintained by a trusted vendor.
-
c2: Package is user-facing with sufficient software development lifecycle (SDLC) evidence equivalent to internal SDLC requirement.
-
c3: Package is not user-facing and all packages dependent on this R package are qualified.
-
c4: Package is user-facing with additional internal work to complete necessary steps following internal SDLC requirement.
-
c5: Package maintained by a trusted person or organization.
-
-
An R package can be qualified under the low-risk category if it meets any of the first four criteria, i.e., c1-c4. For the moderate-risk category, the R package needs to meet any of the five criteria, i.e., c1-c5. Any R package used for exploratory purposes and not qualified under either the low-risk or moderate-risk category is categorized as open-risk. Any other R package deployed by users from external sources such as the Comprehensive R Archive Network (CRAN) or other repositories is automatically categorized as open risk.
-
-
R Package Qualification Workflow
-
Within our organization a shared baseline strategy recommended by RStudio is followed to manage a reproducible R environment (RStudio, 2020). The defining characteristic of shared baseline strategy is that R package availability is tied to R installations using site-wide libraries. The use of scheduled updates to the site-wide libraries allows all users to use the same installed packages, thereby creating a baseline environment to share and re-run work. An R package within our organization is available through a regularly updated site-wide library installation called Global R library. The global R library is a set of directories containing installed R packages and their dependencies. There are 3 risk levels within the global R library corresponding to the risk category used in package qualification. The global R library is nested and independent, with all low-risk packages included in the moderate-risk library, and all moderate-risk packages included in the open-risk library. Our organization employs RStudio Package Manager (RSPM) as the R package repository server to host source code to install the packages in global R library. A high-level global R library update and package qualification workflow is summarized below (Figure 1).
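To make the shared-baseline idea concrete, the following sketch shows the kind of configuration such a setup typically relies on; the snapshot URL and library path are hypothetical and are not Merck's actual configuration.

```r
# Illustrative sketch of a "shared baseline": every user is pinned to the same
# repository snapshot and the same site-wide library (e.g. via Rprofile.site).
options(repos = c(CRAN = "https://packagemanager.posit.co/cran/2022-03-14"))  # hypothetical snapshot date
.libPaths(c("/opt/R/site-library/moderate-risk", .libPaths()))                # hypothetical risk-tiered site library
```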
-
-
-
Figure 1. Global R library update and package qualification workflow diagram
-
-
-
GGally Package Qualification
-
GGally is an R package that extends ggplot2 R package functionality by adding several functions to reduce the complexity of combining geometric objects (geoms) with transformed data (Schloerke, 2020). Based on a request to use the GGally package in a publication, the package was qualified under moderate risk category. The steps in the qualification process followed were:
-
-
Review package documentation to determine qualifying criteria. It was determined that this package can be qualified using the c2 and c5 criteria.
-
Perform a dry-run installation for global R library update, with GGally set as moderate risk.
-
Check installation log for errors / warning messages (Figure 2):
-
-
-
-
Figure 2. GGally dry run installation log snippet.
Figure 3. Code coverage statistics of GGally obtained from internal RSPM server.
-
-
-
-
Figure 4. GGally SDLC documentation from CRAN.
-
-
-
Cross-check against internal database (White List) containing trusted package authors / organizations (criterion c5). R Package Author White List is a list of trusted R package authors (person or organization) identified by our organization’s Subject Matter Experts (SME).
-
Run program using internally developed R package to generate the qualification document. The details included in the qualification document are as shown below:
-
-
-
Package Qualification – “GGally”
-
Qualification Overview
-
The purpose of this document is to demonstrate that GGally, when used in a qualified fashion, can support the appropriate regulatory requirements for validated systems, thus ensuring that resulting electronic records are “trustworthy, reliable and generally equivalent to paper records.”
-
-
-
-
Package
-
GGally
-
-
-
Risk level
-
moderate
-
-
-
Qualification date
-
2022-03-14
-
-
-
Qualification criteria
-
c2, c5
-
-
-
-
Package Information
-
-
-
-
-
-
-
-
package
-
GGally
-
-
-
version
-
2.1.2
-
-
-
author
-
Barret Schloerke [aut, cre], Di Cook [aut, ths], Joseph Larmarange [aut], Francois Briatte [aut], Moritz Marbach [aut], Edwin Thoen [aut], Amos Elberg [aut], Ott Toomet [ctb], Jason Crowley [aut], Heike Hofmann [ths], Hadley Wickham [ths]
-
-
-
maintainer
-
Barret Schloerke <schloerke@gmail.com>
-
-
-
license
-
GPL (>=2.0)
-
-
-
description
-
The R package ‘ggplot2’ is a plotting system based on the grammar of graphics. ‘GGally’ extends ‘ggplot2’ by adding several functions to reduce the complexity of combining geometric objects with transformed data. Some of these functions include a pairwise plot matrix, a two group pairwise plot matrix, a parallel coordinates plot, a survival plot, and several functions to plot networks.
Criteria c2: There is sufficient evidence of publicly available software development lifecycle information, including authors, source code, test cases, release notes, and user guides. To qualify GGally, we reviewed and confirmed the R package GGally follows a proper software development lifecycle.
-
-
Each exported (user facing) function contains documentation.
-
The released version has a unique version number on CRAN.
After the qualification document is generated, a panel comprising a qualified statistician and a statistical programmer reviews the document for accuracy and validity. After the qualification was completed, GGally was included in the moderate risk category update for global R library formal installation.
-
-
-
Conclusion
-
A risk-based R package qualification process has been deployed at Merck to classify R packages based on the A&R deliverables being generated. This process has been automated using an internally developed R package to both streamline the process and reduce human error. The qualification of the GGally R package under the moderate risk category using this process demonstrates the usability of the qualification framework for qualifying R packages in a regulatory setting. There is ongoing work to enhance the internally developed R package used in the package qualification framework to further automate the process. We are also working on defining the pre-specified criteria used to qualify an organization or group of vendors as a trusted source, thereby reducing the burden of qualifying individual packages.
-
-
References
-
Amoruccio, V. J., Lee, M., & Woodie, D. (2021). A TransCelerate Initiative – How Can You Modernize Your Statistical Environment. PharmaSUG 2021, SI-028. Retrieved April 11, 2022, from https://www.pharmasug.org/proceedings/2021/SI/PharmaSUG-2021-SI028.pdf
Nicholls, A., Bargo, P. R., & Sims, J. (2020, January 23). A risk-based approach for assessing R package accuracy within a validated infrastructure. Retrieved April 11, 2022, from https://www.pharmar.org/white-paper/
RStudio. (2020). Shared Baselines. Reproducible Environments. Retrieved April 11, 2022, from https://environments.rstudio.com/shared
Schloerke, B. (2020, March 25). GGally: Extension to ggplot2. Retrieved April 11, 2022, from https://www.rdocumentation.org/packages/GGally/versions/1.5.0
The R Foundation for Statistical Computing c/o Institute for Statistics and Mathematics. (2021, October 18). R: Regulatory Compliance and Validation Issues, A Guidance Document for the Use of R in Regulated Clinical Trial Environments. Vienna, Austria. Retrieved April 11, 2022, from https://www.r-project.org/doc/R-FDA.pdf
-
-
Corresponding Author Contact
-
We encourage feedback to improve our framework and processes. For any questions or feedback please reach out to preetham.palukuru@merck.com
diff --git a/content/posts/case-studies/Merck-External-R-Package-Qual.Rmd b/content/posts/case-studies/Merck-External-R-Package-Qual.md
similarity index 100%
rename from content/posts/case-studies/Merck-External-R-Package-Qual.Rmd
rename to content/posts/case-studies/Merck-External-R-Package-Qual.md
diff --git a/content/posts/case-studies/merck-kgaa-case-study.html b/content/posts/case-studies/merck-kgaa-case-study.html
deleted file mode 100644
index 7ad0ee1..0000000
--- a/content/posts/case-studies/merck-kgaa-case-study.html
+++ /dev/null
@@ -1,54 +0,0 @@
----
-title: Risk Assessment of R Packages at Merck KGaA/EMD Serono
-author: Juliane Manitz, Stefan Pinkert, Martin Gregory and Francois Beckers
-date: '2022-02-14'
-slug: merck-kgaa-case-study
-categories: [case studies]
-banner: 'img/banners/merck-kgaa.png'
----
-
-
-
-
Introduction
-
Like many other companies, Merck KGaA/EMD Serono has embarked on their journey to enable the use R for regulatory submissions. Following the framework introduced by the R validation hub (Nicholls et al., 2020), we started to develop an algorithm to qualify a CRAN package as a Merck standard package in our GxP environment. In a nutshell: Given the R Foundation’s effort to ensure the validity of base and recommended R packages, these packages are classified as level 1. If an additional R package passes the installation qualification and successfully executes available tests, the package will be made available to the user and (temporarily) classified as level 3 package. Then, an automated risk assessment of R packages is performed based on the test coverage score (more is better) and the riskmetric score generated from the meta-information (smaller is better). If pre-defined thresholds are fulfilled, the package is qualified as Merck standard package (i.e., promoted to level 2), otherwise an explicit (manual) risk assessment is needed. This 3-tier model provides a useful framework for the users to define a risk-based quality control of outputs when using R. In this document, we introduce our pathway to a risk-based assessment of R packages at Merck. We provide relevant details on the statistical analysis which led to the definition of thresholds supporting a robust classification of CRAN packages as Merck standard packages. We want to inspire other companies and seek feedback from the community.
-
-
Merck Validation Framework
-
The assessment of R package accuracy is part of the process of validation to ensure quality output of statistical analyses. Validation is “establishing documented evidence which provides a high degree of assurance [accuracy] that a specific process consistently [reproducibility] produces a product meeting its predetermined specifications [traceability] and quality attributes” (see FDA’s Glossary of Computer System Software Development Terminology). While focused here on R, the proposed framework can be generalized to other programming languages (e.g. Python, SAS, . . . ).
-
The Merck Validation Framework classifies external CRAN packages into three levels of confidence in the accuracy, reliability, and trustworthiness of their functionalities:
-
-
Core CRAN Packages which are generally accepted to be accurate based on published documentation by the R Foundation
-
Merck add-on standard packages which have sufficient documented evidence establishing trustworthiness.
-
Other R packages for which the user is expected to ensure proper quality control and respective documentation that the specific package functionality results in the accurate outcome. Respective requirements vary depending on the purpose and complexity of the application.
-
-
-
Risk Assessment Algorithm of R Packages from CRAN
-
The proposed automated risk assessment of R packages is based on a combination of the test coverage and riskmetric score. A process overview is provided in the Figure 1. If an R package passes the installation qualification and successfully executes available tests (internal and add-on, if applicable), the package will be made available to the user at level 3. Then, an automated risk assessment of R packages is performed based on the test coverage score (more is better) and the riskmetric score generated from the meta-information (smaller is better). If pre-defined thresholds are fulfilled, the package is qualified as level 2, otherwise an explicit (manual) risk assessment is needed.
-
-
-
Figure 1: Outline of the process for installing CRAN packages in the computing environment.
-
-
-
Riskmetric Score
-
The riskmetric score has the following components and weights:
-
-
50% code coverage: unit testing, examples, vignette
-
15% good software development practices: maintainer, public code base, news file
Although unit test coverage and the riskmetric score are not independent, the overall score has been found to be robust.
-
-
Empirical Evaluation
-
We established a robust threshold for the riskmetric score based on a ROC analysis, which determined optimal classification given the continuous riskmetric score (see Figure 2). As training data, we used a selection of n = 61 packages (38 packages were classified as level 2, and 23 packages were classified as level 3).
-
We find an appropriate threshold for the riskmetric score at y = 50, which results in a good classification performance (Accuracy = 77% [64; 87]). In order to increase specificity, we added test coverage as a second dimension with a pre-defined threshold of x = 50. This results in an improved classification specificity of 88.5%.
-
Note that, as a first version of this automated risk assessment of R packages for level 2 qualification, we chose a quite conservative approach, which was deemed acceptable in the general process surrounding the analysis of clinical data.
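As an illustration of the two-dimensional rule described above, the sketch below gathers the two inputs for a single package; the use of {covr} and {riskmetric} here is our own illustration, and the rescaling of the riskmetric result to the 0-100 scale used for the thresholds is assumed rather than reproduced from the case study.

```r
# Illustrative sketch: collect the two inputs of the classification rule.
library(riskmetric)
library(covr)

# x axis: unit test coverage (percent), computed from a local source checkout
coverage <- percent_coverage(package_coverage("path/to/GGally"))  # hypothetical path

# y axis: riskmetric assessment of the package's meta-information
assessment <- pkg_ref("GGally") |>
  pkg_assess() |>
  pkg_score()

# The case study then rescales the riskmetric result to 0-100 (smaller is better)
# and promotes the package to level 2 only if coverage >= 50 and the rescaled
# riskmetric score <= 50.
```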
-
-
-
Figure 2: Derivation of Classification Threshold using ROC analysis
-
-
-
Summary and Outlook
-
We introduced a first version of a risk-based assessment of R packages at Merck KGaA/EMD Serono. The automated risk assessment of CRAN packages classifies R packages based on a two-dimensional risk score, which is composed of test coverage and the riskmetric score. The approach results in a final classification specificity of 88.5%; however, the accuracy estimates are empirical and associated with some level of uncertainty. Evaluation of a test set of packages and their analysis for potential improvement of the threshold are underway.
-
We are actively seeking feedback. Please do not hesitate to reach out to juliane.manitz@emdserono.com, and let’s discuss during the next meeting of the R Validation Hub (TBA).
Whereas data validation is already a standard precursor to any form of scientific analysis in drug development, and the validation of in-house source code used to generate quantitative deliverables follows standard practices as well, the increasing popularity of open-source programming languages like R in this context has created a new type of challenge: the validation of the R packages which are imported and used in the drug submission/approval projects. Such packages are distributed freely, almost always without any warranties, and may be of varying quality. Therefore, Novartis has been working on defining a risk-based validation approach for qualifying R packages. Its risk assessment was designed around two business use cases, which reflect current business activities.
-
Use case 1: “standard” packages that are routinely used for drug project submissions and are pre-installed on the platform after being explicitly requested by an associate.
-
Use case 2: new and ad hoc installed packages specific for a given project or user specific that are still to be used for a drug project submission.
-
-
2 R PACKAGE RISK BASED VALIDATION
-
A group of Novartis experts have defined ten risk assessment criteria to validate open-source R packages. The steps outlined in this document are aligned with the considerations published by industry consortia such as R Validation Hub [1].
-
Each criterion was formulated based on the subject matter experts’ opinions supported by the publicly available data and was framed as question with two possible answers: “Yes” and “No”, where the former must be always supported by the evidence (e.g., link to the source or, less preferably, picture including a date). The questions were categorized into three groups: “Low”, “Medium”, and “High” risk.
-
When a package meets criteria from multiple risk groups, it inherits the lowest risk rating. For example, if a “Yes” answer is marked in both the “Low” and “High” risk groups, the final package risk corresponds to “Low”. If a package does not meet any criteria, it is considered extremely risky and may not be installed at all.
-
The package risk assessment and testing processes, mentioned in this paper, are aligned with FDA’s validation principles [2] including: establishing documented evidence and providing a high degree of assurance (accuracy) that a specific process is consistently (reproducibility) meeting its predetermined specifications (traceability) and quality attributes.
-
A schematic of the process and criteria is provided in Figure 1. As should be evident, several of the criteria include either references to curated lists of trusted sources or “arbitrary” thresholds. Novartis aims to have a systematic and science-based approach to defining these items, e.g., by using classic metrics such as impact factors for journals to determine their credibility, or by comparing download rates to overall download rates of popular R packages. However, we are deeply aware of the futility of attempting to tune these criteria to perfection, as we believe such a focus on “facts” provides a false sense of security that is easily exploited: bad science finds its way into reputable journals all the time and reviewers generally do a bad job of vetting source code, download rates can be inflated through bots, coverage ratings can be rendered meaningless by adding pointless unit tests, etc. Instead, we rely on critical thinking both on the part of the team in charge of installing, validating, and managing these packages as well as on the part of the associates that use them. In this context, the risk categorization serves primarily as directing our attention towards those packages that are likeliest to be problematic but does not absolve us from remaining vigilant at all times.
-
-
-
Figure 1: Overview of Novartis risk criteria. Green/yellow/red boxes represent low/medium/high-risk category criteria, respectively. Orange parallelograms represent curated lists managed by the R package validation team.
-
-
-
3 MITIGATION OF THE RISK - TESTING REQUIREMENTS
-
All open-source packages which are available as pre-installed packages on Novartis platform (use case 1), are installed by IT. Low risk packages do not require business testing (performance qualification), meaning the package functions are not further tested by the users. Medium risk packages, while not meeting the same, strict low risk requirements, are still considered to have adequate evidence to be rated as sufficiently trustworthy for use without dedicated testing.
-
All the packages which are high risk require business testing (performance qualification) that verifies the main functions that are to be used. Those tests should be written and executed by the end-user (package requestor) and/or by an appointed R Governance Team. The focus of the PQ testing is to ensure that the package functions that are likeliest to be used provide scientifically correct results. That is, the main interest lies in ensuring scientific validity, which may or may not be the purpose of unit tests written by scientific software developers. The tests themselves and the output of the test executions are documented in a Novartis-internal life cycle management tool and sent for QA approval. In the case where a high-risk package appears as a dependency of the requested package, it is treated as if it was explicitly requested by the user.
-
For use case 2, i.e., packages installed in an ad hoc manner for a specific project, the business process, which is governed by an internal work instruction, is similar. The key difference is the absence of tracking through the standard life cycle management tool and therefore of explicit QA approval. Instead, accountability for ensuring that sufficient evidence proving the credibility of the package in question has been gathered lies with the project team itself.
-
A table below summarizes the mandatory testing steps depending on the package risk and the use case.
-
-
-
| Package Risk | Use Case 1 (Production Global Use) | Use Case 2 (Ad-hoc installed package for the drug project submission/approval) |
| --- | --- | --- |
| Low Risk | Installation/Operational Qualification done by IT | Installation Testing (CMD check) done by the user, e.g., using the Novartis tool (see section 4 for more details) |
| Medium Risk | Installation/Operational Qualification done by IT | Installation Testing (CMD check) done by the user, e.g., using the Novartis tool (see section 4 for more details) |
| High Risk | Installation/Operational Qualification done by IT; Performance Qualification testing is required and must be done in the QA environment by the package requestor; Performance Qualification should test all exported functions intended to be used in the project; the package testing output is stored in the Novartis life cycle management tool and approved by QA | Installation Testing (CMD check) done by the user, e.g., using the Novartis tool (see section 4 for more details); ad-hoc installed packages classified as high risk should be tested by the end user; Performance Qualification should test all exported functions intended to be used in the project; the package testing output should be stored in a tool folder in the project home directory |
-
-
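For illustration only, and independent of the internal tool described in section 4, the user-run installation test (CMD check) from the table above could look roughly as follows, here using the rcmdcheck package on a downloaded source tarball:

    # Illustrative sketch of a user-run installation test: download the source
    # tarball of a package and run R CMD check on it, keeping the results.
    pkg <- "survminer"                                  # example package
    tarball <- download.packages(pkg, destdir = tempdir())[, 2]
    chk <- rcmdcheck::rcmdcheck(tarball, args = "--no-manual")
    length(chk$errors)    # 0 errors indicates the check passed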
-
-
-
4 AUTOMATION
-
An internal tool has been developed to facilitate and expedite the above process at Novartis and to ensure a proper record of package installation and testing.
-
This tool, planned to be made open source in the future, includes – but is not limited to – the following functionalities:
-
-
Installation of the package and its dependencies (including suggests) and tests
-
Running of the examples, tests and vignettes, and saving of the test directory to a zip file
-
Ensuring reproducible MRAN, GitHub, GitLab, and Bitbucket sources by saving their snapshot date or commit SHA hash and restoring them on reload (see the sketch after this list)
-
Conducting the package risk assessment for a given list of packages and their dependencies and filling out the form
-
-
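As a sketch of the source-pinning idea only (not the internal tool’s implementation; the snapshot date and commit SHA below are placeholders), reproducible sources can be expressed in plain R as:

    # Sketch of reproducible package sources: a dated CRAN/MRAN snapshot pins
    # the package versions, and a commit SHA pins a GitHub source.
    snapshot_date <- "2021-03-31"                                    # placeholder
    commit_sha    <- "164a2e89acfce535d29d8e8ee95f8e19c85314e3"      # placeholder

    install.packages(
      "survminer",
      repos = paste0("https://mran.microsoft.com/snapshot/", snapshot_date)
    )
    remotes::install_github(paste0("pharmaR/riskmetric@", commit_sha))
-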
As stated previously, the R governance team acknowledges the risk of attempting to “automate away” critical thinking. Therefore, the tool in question is under continued development, taking in lessons learned with each new release of R packages to the computing platform. Furthermore, the R governance team is developing a Shiny application that will further simplify the process of requesting a package, assessing the risk, and testing from the end user’s perspective. This application will provide a full overview of the package release history, tested functions, the projects in which the package was used, and more.
-
-
5 CONCLUSION
-
The validation of R packages is a new and exciting endeavor that entails making judgement calls on various aspects for which little reliable data exists. As such, it is and should be subject to change, evolving over time with the increasing amount of experience gathered. During this forming period (and maybe even beyond), some of the risk assessment criteria will inevitably be based on hard-to-quantify subject matter expert opinion and domain knowledge. Because these can be seen as arbitrary from the validation perspective, Novartis is working on systematizing them to the extent possible and performing sensitivity analyses to assess the impact of our choices. The addition of new languages to this framework, such as Python or Julia, will bring additional experience that can be leveraged for this purpose. Still, regardless of the outcome of these “best efforts”, we consider the universal application of critical thinking to be the single most important ingredient for the successful usage of open-source packages. It is our deeply held belief that scientific results must be significant and robust enough that a faulty line of code does not invalidate an entire conclusion, regardless of whether the code in question was written by a Novartis associate or an external author.
-
-
6 REFERENCES
-
[1] Andy Nicholls, Paulo R. Bargo, and John Sims. A risk-based approach for assessing R package accuracy within a validated infrastructure. URL: https://www.pharmar.org/white-paper/.
Click here to access the full paper including the appendix
diff --git a/content/posts/case-studies/novartis-case-study.Rmd b/content/posts/case-studies/novartis-case-study.md
similarity index 100%
rename from content/posts/case-studies/novartis-case-study.Rmd
rename to content/posts/case-studies/novartis-case-study.md
diff --git a/content/posts/case-studies/roche-case-study.html b/content/posts/case-studies/roche-case-study.html
deleted file mode 100644
index 730ef68..0000000
--- a/content/posts/case-studies/roche-case-study.html
+++ /dev/null
@@ -1,22 +0,0 @@
----
-title: "Automated R Package Validation at Roche"
-author: Coline Zeballos, Doug Kelkhoff, Szymon Maksymiuk, Lorenzo Braschi
-date: "2022-04-22"
-output:
- html_document:
- df_print: paged
-categories: [case studies]
-banner: "img/banners/roche-case-study.png"
-slug: "roche-case-study"
----
-
-
-
-
This case study walks through the automated R package validation process at Roche that utilizes a human-in-the-middle component to reconcile any gaps that arise in the automated metadata checks. The approach balances automation with risk mitigation and encourages in-house package development and iteration by introducing transparency to the validation process. The result reinforces best practices in R programming and package development while ensuring high package quality for use within a regulatory environment.
diff --git a/content/posts/case-studies/roche-case-study.Rmd b/content/posts/case-studies/roche-case-study.md
similarity index 100%
rename from content/posts/case-studies/roche-case-study.Rmd
rename to content/posts/case-studies/roche-case-study.md
diff --git a/content/posts/news/2020-05-07-a-risk-based-approach-for-assessing-r-package-accuracy-within-a-validated-infrastructure-white-paper-summary.html b/content/posts/news/2020-05-07-a-risk-based-approach-for-assessing-r-package-accuracy-within-a-validated-infrastructure-white-paper-summary.html
deleted file mode 100644
index ac19d8b..0000000
--- a/content/posts/news/2020-05-07-a-risk-based-approach-for-assessing-r-package-accuracy-within-a-validated-infrastructure-white-paper-summary.html
+++ /dev/null
@@ -1,60 +0,0 @@
----
-title: 'A Risk-based Approach for Assessing R package Accuracy within a Validated
- Infrastructure: White Paper Summary'
-author: Andy Nicholls
-date: '2020-01-30'
-slug: a-risk-based-approach-for-assessing-r-package-accuracy-within-a-validated-infrastructure-white-paper-summary
-categories: [news]
-tags:
- - white paper
-banner: 'img/banners/news.png'
----
-
-
-
-
The R Validation Hub is a cross-industry initiative whose mission is to enable the use of R by the Bio-Pharmaceutical Industry in a regulatory setting, where the output may be used in submissions to regulatory agencies. The group was initially formed in 2018 by members of PSI’s AIMS SIG but has now expanded to include just under 100 members from multiple organisations across the pharmaceutical sector.
-
-
-
Why write a white paper?
-
The R Validation Hub is following a three-phase roadmap. Phase 1 of the roadmap has been focussed on consolidating relevant information for those wishing to use R for regulatory work. The white paper reflects the current thinking of the R Validation Hub working group and may evolve over time. It consolidates the collective high-level thinking that has been established through various communication channels over the past 12-18 months. Additional detail will be provided via the website and future papers.
-
-
-
Key Themes from the White Paper
-
-
Validation, R and R Packages
-
The FDA provide a clear definition of validation in the ‘Glossary of Computer System Software Development Terminology’. This can be broken down into three core components:
-
-
Accuracy
-
Reproducibility
-
Traceability
-
-
As a constantly evolving language, R presents challenges in each of these areas but the greatest challenges typically concern accuracy.
-
When assessing accuracy it is important to make the distinction between core R (base and recommended packages) and the many contributed R packages. The R Foundation have produced R: Regulatory Compliance and Validation Issues. A Guidance Document for the Use of R in Regulated Clinical Trial Environments which is a very useful reference for anyone wishing to use core R for regulatory work. Based on an assessment of the available information we conclude that there “is minimal risk in using base and recommended (core) packages as a component in a validated system for regulatory analysis and reporting with R”.
-
For the contributed R packages we propose a risk-based approach to establish package accuracy/validity.
-
-
-
An R Package Risk Assessment Framework
-
We propose to assess the risk of contributed R packages based on four criteria:
-
-
Purpose
-
Maintenance Good Practice (Software Development Life Cycle)
-
Community Usage
-
Testing
-
-
The criteria form part of a proposed workflow summarised in the figure below:
-
-
The purpose (eg statistical modelling, data manipulation), maintenance practices, community usage and testing each helps establish confidence (or otherwise) in the accuracy of an R package. We propose to gather information in each of these areas in order to determine an overall risk score for a package.
-For packages that present a medium/high degree of risk we may suggest mitigating the risk by clearly defining usage requirements and developing tests against them through a unit-testing framework such as testthat.
-For low-risk packages we feel that “additional remediation for such packages is unlikely to yield any significant reduction in risk.” Based on the earlier conclusion this would include base and recommended R packages.
-
-
-
Trust
-
A vendor audit can be used to establish trust in a company that develops proprietary software. A successful audit may result in the vendor being allocated a ‘trusted resource’ status and thus any software produced by that vendor could be deemed to be low risk. We may also begin to develop a similar level of trust in certain package authors or collections of packages (eg ‘tidyverse’) that continue to demonstrate good practice over a sustained period of time.
Next Steps
-
The white paper represents the R Validation Hub’s current thinking and will likely evolve over time as we explore key themes, such as testing, in more depth. As we do so, further shorter papers will follow.
-
In addition, R Validation Hub has now commenced phase 2 of its roadmap. The aim of this phase is to supplement the white paper with tools. This will begin with the release of an R package, riskmetric, for the collection of metrics that can be used to evaluate package risk and a corresponding Shiny app that can be used to generate package reports. See www.pharmar.org for further information and updates.
-
-
diff --git a/content/posts/news/2020-05-07-a-risk-based-approach-for-assessing-r-package-accuracy-within-a-validated-infrastructure-white-paper-summary.Rmd b/content/posts/news/2020-05-07-a-risk-based-approach-for-assessing-r-package-accuracy-within-a-validated-infrastructure-white-paper-summary.md
similarity index 100%
rename from content/posts/news/2020-05-07-a-risk-based-approach-for-assessing-r-package-accuracy-within-a-validated-infrastructure-white-paper-summary.Rmd
rename to content/posts/news/2020-05-07-a-risk-based-approach-for-assessing-r-package-accuracy-within-a-validated-infrastructure-white-paper-summary.md
diff --git a/content/posts/news/2020-05-07-status-update-may-2020.html b/content/posts/news/2020-05-07-status-update-may-2020.html
deleted file mode 100644
index fdb36ec..0000000
--- a/content/posts/news/2020-05-07-status-update-may-2020.html
+++ /dev/null
@@ -1,20 +0,0 @@
----
-title: Status Update May 2020
-author: Juliane Manitz
-date: '2020-05-07'
-slug: status-update-may-2020
-categories: [news]
-tags:
- - riskmetric
-banner: 'img/banners/news.png'
----
-
-
-
-
In these uncertain times, we would like to provide you with some good news and update you on the progress from the R validation hub. It has been a little while since you heard from us, but that doesn’t mean we were less active.
-
Communication: You may have realized that there are no further recurring meetings scheduled for the R validation hub since February. Following previous discussions, we are planning regular update releases via the website and newsletters instead. Meetings for the validation hub will be scheduled less frequently as needed. However, we are dependent on your input and continue to encourage volunteers to collaborate on all the different parts of the project.
-
The riskmetric R package is steadily growing. Shout-out to Mark Padgham from rOpenSci, who is helping with non-industry contributions. However, your input and support is more than welcome! Whether you want to contribute your own implementation of a riskmetric into the existing package framework, or have ideas on how to quantify certain metrics, don’t hesitate to visit the riskmetric GitHub page or directly contact Doug or Yilong.
-
Moreover, development has started on an interactive interface for riskmetric. The risk assessment app is being developed by the vendor Fission Labs using funding from the R Consortium. The initial build will include the following metrics: Number of downloads in the past 12 months, months since first release, number of vignettes, availability of a news feed and website, and test coverage. If you are available to do some testing, please let Andy know that you would like to be involved during the testing phase, expected in July 2020. After the end of the contract, maintaining the app becomes a “community effort”.
-
Some good and bad news regarding useR! 2020: Our abstract was accepted as oral presentation, which was fantastic news for a couple of days. Unfortunately, shortly after that the conference was cancelled. We are waiting patiently for status updates from the conference organizers on possible virtual replacements.
-
Stay tuned for more updates and stay healthy,
-
The R validation hub executive committee
diff --git a/content/posts/news/2020-05-07-status-update-may-2020.Rmd b/content/posts/news/2020-05-07-status-update-may-2020.md
similarity index 100%
rename from content/posts/news/2020-05-07-status-update-may-2020.Rmd
rename to content/posts/news/2020-05-07-status-update-may-2020.md
diff --git a/content/posts/news/2020-06-02-riskmetric-intro-jun-2020.Rmd b/content/posts/news/2020-06-02-riskmetric-intro-jun-2020.Rmd
index b5efdc8..560ccc5 100644
--- a/content/posts/news/2020-06-02-riskmetric-intro-jun-2020.Rmd
+++ b/content/posts/news/2020-06-02-riskmetric-intro-jun-2020.Rmd
@@ -36,6 +36,8 @@ Then, the package can be loaded:
``` {r, echo=FALSE, message=FALSE, eval=TRUE}
library(riskmetric)
+library(tibble)
+library(dplyr)
```
To illustrate how `riskmetric` works, a few packages with a wide range of popularity have been selected.
@@ -56,8 +58,17 @@ To illustrate how `riskmetric` works, a few packages with a wide range of popula
When referencing a package, riskmetric first looks for installed packages but can also assess packages that have not been installed:
``` {r, message=FALSE}
-package_tbl <- pkg_ref(c("riskmetric", "utils", "ggplot2", "Hmisc", "survminer", "coxrobust"))
-package_tbl$survminer
+pkg_names <- c(
+ "riskmetric",
+ "utils",
+ "ggplot2",
+ "Hmisc",
+ "survminer",
+ "coxrobust"
+)
+
+pkgs <- pkg_ref(pkg_names)
+pkgs[[5]]
```
Note that many fields have a trailing `...`; riskmetric will evaluate and cache the results of the queries later on. When we call the `pkg_assess()` function on each reference, the metrics will be stored and become available. In other words, the necessary package metadata is assessed and an atomic value is added for each assessment and package.
@@ -67,7 +78,7 @@ Then, the information is scored in order to estimate associated risk. This final
For more information, check out the [`riskmetric` vignette](https://pharmar.github.io/riskmetric/articles/riskmetric.html).
``` {r, message=FALSE, warning=FALSE}
-res <- package_tbl %>%
+res <- pkgs %>%
pkg_assess() %>%
pkg_score() %>%
mutate(risk = summarize_scores(.))
@@ -76,7 +87,7 @@ res <- package_tbl %>%
The function `summarize_scores()` serves as an example for how a risk score might be derived. Each organization should decide independently how to weight different assessments.
``` {r, output="asis", echo=FALSE}
-pander(res[,c(1:2, 9,10,6, 7, 8,14 )], split.table=Inf)
+pander(res[, c(1:2, 9, 10, 6, 7, 8, 14)], split.table = Inf)
```
@@ -106,12 +117,12 @@ In addition to assessing the set of packages used to develop a project, `riskmet
``` {r}
pkg_ref("survminer") %>%
- assess_downloads_1yr() %>%
+ assess_downloads_1yr() %>%
metric_score()
```
```{r, echo = FALSE, warning = FALSE, message = FALSE}
-pkg_example <- pkg_ref("survminer")
+pkg_example <- pkg_ref("survminer")
pkg_data <- pkg_example %>% pkg_assess()
pkg_score <- pkg_data %>% pkg_score()
```
diff --git a/content/posts/news/2020-06-02-riskmetric-intro-jun-2020.html b/content/posts/news/2020-06-02-riskmetric-intro-jun-2020.html
deleted file mode 100644
index 312fce7..0000000
--- a/content/posts/news/2020-06-02-riskmetric-intro-jun-2020.html
+++ /dev/null
@@ -1,193 +0,0 @@
----
-title: Introduction to the R Package `riskmetric`
-author: Juliane Manitz, Douglas Kelkhoff, Eli Miller, and Yilong Zhang
-date: '2020-06-09'
-slug: riskmetric-intro-jun-2020
-categories: [news]
-tags:
- - riskmetric
-banner: 'img/banners/news.png'
----
-
-
-
-
-
Many contributed R packages lack documentation expected in software qualification, which is required within pharma and other regulated industries. For pharma, there are various regulations, which require documentation that demonstrates software is used appropriately and works as expected. Thus, industry needs to establish appropriate requirements for R packages using selected metadata and useful risk metrics.
-
In context of the R Validation Hub, the R package riskmetric has been developed, which seeks to take the first steps in identifying metrics and best practices to quantify the quality of R packages. It provides a framework for retrieving package metadata, assessing package metrics, and summarizing the risk that the package might not provide accurate results. A corresponding Shiny app, that can be used to generate package reports using riskmetric, is under development.
-
In this blog post, we want to illustrate the capabilities and usage of riskmetric and demonstrate how it could fit into an organizations validation process or its qualified environments.
-
-
The riskmetric package is not yet on CRAN. Until it is, it can be installed using devtools directly from GitHub:
-
devtools::install_github("pharmaR/riskmetric", force = TRUE)
-
Then, the package can be loaded:
-
library(riskmetric)
-
To illustrate how riskmetric works, a few packages with a wide range of popularity have been selected.
-
-
riskmetric (Metrics to evaluate the risk of R packages): Not on CRAN yet
-
utils (R utility functions): R core package
-
ggplot2 (Create Elegant Data Visualisations Using the Grammar of Graphics): very popular package
-
Hmisc (Harrell Miscellaneous functions): something more old school
-
survminer (Drawing Survival Curves using ggplot2): less popular, but established package
-
coxrobust (Robust Estimation in Cox Model): oldest R package on CRAN
-
-
-
When referencing a package, riskmetric first looks for installed packages but can also assess packages that have not been installed:
Note that many fields have a trailing ...; riskmetric will evaluate and cache the results of the queries later on. When we call the pkg_assess() function on each reference, the metrics will be stored and become available. In other words, the necessary package metadata is assessed and an atomic value is added for each assessment and package.
-
Then, the information is scored in order to estimate associated risk. This final score converts the assessment value into a single numeric score between 0 (poor) and 1 (great). Finally each package’s risk is summarized as a weigthed sum of assessment scores.
The function summarize_scores() serves as an example for how a risk score might be derived. Each organization should decide independently how to weight different assessments.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
package
-
version
-
license
-
export_help
-
has_vignettes
-
has_bug_reports_url
-
bugs_status
-
has_news
-
-
-
-
-
riskmetric
-
0.1.0.9001
-
NA
-
1
-
0
-
1
-
0.5667
-
0
-
-
-
utils
-
3.6.2
-
NA
-
0.996
-
1
-
0
-
0
-
0
-
-
-
ggplot2
-
3.2.1
-
NA
-
1
-
1
-
1
-
0.6333
-
1
-
-
-
Hmisc
-
4.3.1
-
NA
-
1
-
0
-
0
-
0
-
0
-
-
-
survminer
-
0.4.6
-
NA
-
1
-
1
-
1
-
0.2333
-
1
-
-
-
coxrobust
-
1.0
-
NA
-
1
-
0
-
0
-
0
-
0
-
0
-
-
-
-
-
There are many good programming and package development practices that establish a package is well made and maintained:
-
-
has_vignettes - Number of published vignettes
-
has_news - Number of releases with a NEWS update
-
has_bug_reports_url - Presence of a URL for users to report issues and bugs found in the package.
-
-
Community usage is determined based on the number of downloads. This is a useful proxy for community support and adhoc testing done by other developers.
-
-
downloads_1yr – Number of downloads from CRAN, Bioconductor, and GitHub in the past year.
-
-
Furthermore, the test coverage of a package can provide well established insights on the package accuracy
-
-
covr_coverage – Package unit test coverage percentage
-
-
Several other metrics are under active development that can interrogate package stability and complexity, e.g.
-
-
Maturity – Package version and overall maturity
-
Cyclomatic Complexity – Complexity of the code base itself
-
-
-
In addition to assessing the set of packages used to develop a project, riskmetric can also be used to assess a package before you introduce it into your development environment. Here is an example reviewing the number of downloads for the survminer package:
Finally, this information can be used by a system administrator when evaluating the suitability of a package, or when writing a validation report:
-
-
survminer (v0.4.7)
-
Package survminer (v0.4.7) has 227894 downloads in the past year, which converts to a riskmetric score of
-60.31%.
-
-
If you are interested in helping with development or the direction of the package, we are active on GitHub and welcome any contributions. More details can be found in the “Get Involved” section of the readme file for riskmetric GitHub page.
diff --git a/content/posts/news/2020-06-02-riskmetric-intro-jun-2020.md b/content/posts/news/2020-06-02-riskmetric-intro-jun-2020.md
new file mode 100644
index 0000000..493469e
--- /dev/null
+++ b/content/posts/news/2020-06-02-riskmetric-intro-jun-2020.md
@@ -0,0 +1,258 @@
+
+
+Many contributed R packages lack documentation expected in software
+qualification, which is required within pharma and other regulated
+industries. For pharma, there are various regulations that require
+documentation that demonstrates software is used appropriately and works
+as expected. Thus, industry needs to establish appropriate requirements
+for R packages using selected metadata and useful risk metrics.
+
+In the context of the R Validation Hub, the R package
+[`riskmetric`](https://github.com/pharmaR/riskmetric) has been
+developed, which seeks to take the first steps in identifying metrics
+and best practices to quantify the quality of R packages. It provides a
+framework for retrieving package metadata, assessing package metrics,
+and summarizing the risk that the package might not provide accurate
+results. A corresponding Shiny app, which can be used to generate package
+reports using riskmetric, is under development.
+
+In this blog post, we want to illustrate the capabilities and usage of
+`riskmetric` and demonstrate how it could fit into an organization’s
+validation process or its qualified environments.
+
+
+
+The `riskmetric` package is not yet on CRAN. Until it is, it can be
+installed using `devtools` directly from GitHub:
+
+ devtools::install_github("pharmaR/riskmetric", force = TRUE)
+
+Then, the package can be loaded:
+
+To illustrate how `riskmetric` works, a few packages with a wide range
+of popularity have been selected.
+
+- `riskmetric` (Metrics to evaluate the risk of R packages): Not on
+ CRAN yet
+- `utils` (R utility functions): R core package
+- `ggplot2` (Create Elegant Data Visualisations Using the Grammar of
+ Graphics): very popular package
+- `Hmisc` (Harrell Miscellaneous functions): something more old school
+- `survminer` (Drawing Survival Curves using `ggplot2`): less popular,
+ but established package
+- `coxrobust` (Robust Estimation in Cox Model): oldest R package on
+ CRAN
+
+
+
+When referencing a package, riskmetric first looks for installed
+packages but can also assess packages that have not been installed:
+
+ pkg_names <- c(
+ "riskmetric",
+ "utils",
+ "ggplot2",
+ "Hmisc",
+ "survminer",
+ "coxrobust"
+ )
+
+ pkgs <- pkg_ref(pkg_names)
+ pkgs[[5]]
+
+ ## survminer v0.5.2
+ ## $repo
+ ## [1] "https://cloud.r-project.org/src/contrib"
+ ## $source
+ ## [1] "pkg_cran_remote"
+ ## $version
+ ## [1] "0.5.2"
+ ## $name
+ ## [1] "survminer"
+ ## $archive_release_dates...
+ ## $bug_reports...
+ ## $bug_reports_host...
+ ## $bug_reports_url...
+ ## $downloads...
+ ## $license...
+ ## $maintainer...
+ ## $news...
+ ## $news_urls...
+ ## $r_cmd_check...
+ ## $release_date...
+ ## $remote_checks...
+ ## $repo_base_url...
+ ## $source_control_url...
+ ## $tarball_url...
+ ## $vignettes...
+ ## $web_html...
+ ## $web_url...
+ ## $website_urls...
+
+Note that many fields have a trailing `...`; riskmetric will evaluate
+and cache the results of the queries later on. When we call the
+`pkg_assess()` function on each reference, the metrics will be stored
+and become available. In other words, the necessary package metadata is
+assessed and an atomic value is added for each assessment and package.
+
+Then, the information is scored in order to estimate associated risk.
+This final score converts the assessment value into a single numeric
+score between 0 (poor) and 1 (great). Finally, each package’s risk is
+summarized as a weighted sum of assessment scores.
+
+For more information, check out the [`riskmetric`
+vignette](https://pharmar.github.io/riskmetric/articles/riskmetric.html).
+
+ res <- pkgs %>%
+ pkg_assess() %>%
+ pkg_score() %>%
+ mutate(risk = summarize_scores(.))
+
+The function `summarize_scores()` serves as an example for how a risk
+score might be derived. Each organization should decide independently
+how to weight different assessments.
+
+
+| package    | version    | r_cmd_check | exported_namespace | has_news | remote_checks | news_current | has_maintainer |
+|------------|------------|-------------|--------------------|----------|---------------|--------------|----------------|
+| riskmetric | 0.2.7      | NA          | 0.443              | 1        | NA            | 0            | 1              |
+| utils      | 4.5.0      | NA          | 0.07081            | 0        | NA            | 0            | 1              |
+| ggplot2    | 3.5.2.9001 | NA          | 0.006497           | 1        | NA            | 0            | 1              |
+| Hmisc      | 5.2.3      | NA          | 0.01639            | 1        | NA            | 1            | 1              |
+| survminer  | 0.5.2      | NA          | NA                 | 1        | 0.9464        | 1            | 1              |
+| coxrobust  | 1.0.2      | NA          | NA                 | 1        | 1             | 1            | 1              |
+
+
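+As a purely illustrative sketch, an organization could derive its own
+weighted summary directly from the scored columns; the weights and the
+choice of columns below are hypothetical and assume the default
+assessments (which include `downloads_1yr`, `has_vignettes`, and
+`has_news`):
+
+    # Hypothetical custom weighting of three assessment scores; higher
+    # custom_score means more favorable, mirroring the 0-1 score scale.
+    res %>%
+      mutate(
+        custom_score = 0.5 * downloads_1yr +
+                       0.3 * has_vignettes +
+                       0.2 * has_news
+      ) %>%
+      select(package, version, custom_score, risk)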
+
+There are many good programming and package development practices that
+establish a package is well made and maintained:
+
+- `has_vignettes` - Number of published vignettes
+- `has_news` - Number of releases with a NEWS update
+- `has_bug_reports_url` - Presence of a URL for users to report issues
+ and bugs found in the package.
+
+Community usage is determined based on the number of downloads. This is
+a useful proxy for community support and ad hoc testing done by other
+developers.
+
+- `downloads_1yr` – Number of downloads from CRAN, Bioconductor, and
+ GitHub in the past year.
+
+Furthermore, the test coverage of a package can provide well-established
+insights into package accuracy:
+
+- `covr_coverage` – Package unit test coverage percentage
+
+Several other metrics are under active development that can interrogate
+package stability and complexity, e.g.
+
+- Maturity – Package version and overall maturity
+- Cyclomatic Complexity – Complexity of the code base itself
+
+
+
+In addition to assessing the set of packages used to develop a project,
+`riskmetric` can also be used to assess a package before you introduce
+it into your development environment. Here is an example reviewing the
+number of downloads for the `survminer` package:
+
+ pkg_ref("survminer") %>%
+ assess_downloads_1yr() %>%
+ metric_score()
+
+ ## [1] 0.7992852
+
+Finally, this information can be used by a system administrator when
+evaluating the suitability of a package, or when writing a validation
+report:
+
+> **survminer (v0.5.2)**
+>
+> Package survminer (v0.5.2) has 597329 downloads in the past year,
+> which converts to a `riskmetric` score of 79.93%.
+
+If you are interested in helping with development or the direction of
+the package, we are active on GitHub and welcome any contributions. More
+details can be found in the “Get Involved” section of the readme file
+for [`riskmetric` GitHub page](https://github.com/pharmaR/riskmetric).
diff --git a/content/posts/news/2020-07-20-status-update-july-2020.html b/content/posts/news/2020-07-20-status-update-july-2020.html
deleted file mode 100644
index 2e8562e..0000000
--- a/content/posts/news/2020-07-20-status-update-july-2020.html
+++ /dev/null
@@ -1,21 +0,0 @@
----
-title: Status Update July 2020
-author: Juliane Manitz
-date: '2020-07-20'
-slug: status-update-july-2020
-categories: [news]
-tags:
- - riskmetric
-banner: 'img/banners/news.png'
----
-
-
-
-
It’s time to bring some updates to you on the current status of the R validation hub, and we have plenty of great developments.
-
Fission has finished their work on the R package risk assessment app and made the source code available on GitHub. It is an interactive web application providing a front end for the collection of metrics for R packages via the riskmetric package, including visualizations and comparison metrics.
-
This is a major milestone for us as the app now becomes a community effort. Fortunately, we are happy to announce that Marly Cormar has agreed to take the lead on the app development. With her on board, we are confident we will have the next major version ready by R/Pharma 2020.
-
In parallel, the ‘riskmetric’ R package made good advances and the respective support team is growing. In fact, they are putting a timeline together for a stable release soon. Stay tuned!
-
In other words, the R validation hub project is growing up and becomes a community effort that thrives and fails with your support – no pressure, but this is the perfect time to join the effort ;)
-
Furthermore, Andy presented at the EU Programming Heads meeting in June and earlier this month, Andy and Juliane recorded a presentation for the virtual useR! 2020, which is published on the R Consortium youtube channel. We can tell you that recording talks is so much more exhausting than giving them in person. Both presentations can be found on our website.
-
Finally, you might have noticed that we have initiated quarterly meetings, which start on July 21, 2020. We are looking forward to seeing you there! For invites to these meetings and occasional news updates, please join our mailing list.
-
The R Validation Hub Executive Committee
diff --git a/content/posts/news/2020-07-20-status-update-july-2020.Rmd b/content/posts/news/2020-07-20-status-update-july-2020.md
similarity index 100%
rename from content/posts/news/2020-07-20-status-update-july-2020.Rmd
rename to content/posts/news/2020-07-20-status-update-july-2020.md
diff --git a/content/posts/news/2020-08-05-risk-assessment-application.html b/content/posts/news/2020-08-05-risk-assessment-application.html
deleted file mode 100644
index f64d33e..0000000
--- a/content/posts/news/2020-08-05-risk-assessment-application.html
+++ /dev/null
@@ -1,88 +0,0 @@
----
-title: Risk Assessment Application
-author: Andy Nicholls
-date: '2020-08-05'
-slug: risk-metric-application
-categories:
- - news
-tags:
- - riskmetric
- - white paper
-banner: 'img/banners/fission_collab.png'
----
-
-
-
-
-
Background
-
Towards the end of 2019, the R Validation Hub received an additional grant from the R Consortium to progress the next phase of our road map and produce a risk assessment app to complement the riskmetric package. In early 2020, Fission Labs were selected as our partner to build the first iteration of the application.
-
-
-
Fission Labs is a software product development services company delivering product life-cycle management and high-end scalable technology solutions. Our mission is to deliver complex products through simple and efficient engineering.
-
-
-
-
So what does the app do?
-
The aim of the app is to provide an interactive user interface to the riskmetric package, with metrics categorised into ‘maintenance’, ‘community usage’ and ‘testing’. The review team can then use the app to record their comments on the metrics. Both the metrics and the user comments are stored in an underlying database. This makes it possible to stop and start reviews at the reviewer’s convenience.
-
Once a review is complete, a final summary comment on the package can be provided before a final decision is made on the package. In line with our white paper, the decision is an overall risk score: low; medium; high. At this point (or indeed at any point) any of the users could then generate either an HTML or a DOCX package report containing the metrics, comments and some additional high level information about the package.
-
For those looking for more information on the app, I’ve provided further details in the following sections…
-
-
-
A detailed walk-through
-
-
Getting started
-
The app can be cloned/downloaded from the R Validation Hub’s GitHub repository. The app’s readme file contains details on how to get started with the app. Essentially it works like any other shiny app and can therefore be launched interactively within RStudio. But in order to take advantage of the collaborative elements of the app, it would need to be deployed to a web server. For example, it could be deployed using Shiny Server or RStudio Connect.
-
-
-
Roles
-
This first version of the app implements a very simple workflow that can be tailored to an organisation’s needs. There is no log-in; the app detects the user’s ID and asks them for their role.
-
-
For a simple two-stage review process the roles might be ‘reviewer’ and ‘approver’. Or you might prefer job-based roles such as ‘Statistician’. For now, the only impact is to record the user’s role on the assessment reports.
-
-
-
Loading packages
-
Currently, packages are added to the database by uploading a simple CSV of package names and versions. An example CSV is available for download within the app. Upon uploading the CSV, the app will begin collecting metrics by running riskmetric on each of the packages identified within the CSV. For several packages this may take some time! We are currently exploring alternative upload options as part of our continuous app improvement. But for now this is a good time for a cup of coffee! Otherwise, new packages can be added at any time.
-
CAUTION: The package version functionality has yet to be implemented. At the time of writing, the app displays information relating to the latest version of the package, regardless of the version specified in the CSV.
-
The mechanism for loading packages (with specific versions) is currently our top priority for the next major release.
-
-
-
A Package assessment
-
Once packages have been uploaded to the DB, the main panel on the left can be used to select the package (and version) in order to review its metrics. The metrics are split over three tabs:
-
-
Maintenance Metrics
-
Community Usage Metrics
-
Testing Metrics
-
-
Each individual tab provides a mechanism for commenting on the metrics. Comments within the sections are designed to be conversational, in other words a reviewer can make multiple comments alongside other reviewers. The reviewer’s ID, role and a date-time stamp are recorded along with the comment.
-
-
It is down to the individual organisations to determine their own review process. But the app supports any number of reviewers commenting on the metrics.
-
-
-
Completing the assessment
-
Once the reviewers have reached a consensus, overall package comments can be added on the left-hand panel. Once again, it is up to individual organisations to determine their own process with respect to the overall comment.
-
Unlike the comments on the metrics tabs, the overall comments are editable. Currently, each reviewer can add their own overall comment but this may change in the future depending on user feedback.
-
To finalise the review, a final decision must be made on the overall package risk. This decision has been categorised into ‘High’, ‘Medium’ and ‘Low’ in line with our white paper. Once the final decision has been submitted the package is locked. This prevents anyone from adding / editing comments.
-
-
-
Reports
-
A package report can be generated at any time. The reports are generated from the Report Preview tab by clicking the ‘Download Report’ button in the top right-hand corner. We currently support HTML and DOCX reports. The reports are generated using rmarkdown. The HTML and DOCX styling is controlled by CSS and a DOCX template respectively. They can therefore be easily customised to meet an organisation’s styling expectations.
-
-
If a report is generated prior to the final decision then the decision is shown as pending. The final report includes the metrics, all comments (both the overall comments and the conversational flow relating to the metric groupings), and some additional high-level information about the package.
-
-
-
-
A word on the collaboration…
-
I would like to take this opportunity to extend my thanks to the Fission team for their support and patience over the past few months. Engaging with Fission has been a really positive experience. They understood what we were trying to achieve with this app from the off and have been extremely accommodating to changes throughout the project. Thank you.
-
-
-
What next?
-
The application is freely available to clone or download from our GitHub repository.
-
Marly Cormar has kindly agreed to take ownership of the app for its next phase, during which we aim to enhance the app based on the user feedback that we receive. Some of the current priorities include:
-
-
An enhanced package upload mechanism that enables specific versions to be uploaded
-
User-based roles and permissions
-
Closer integration with the riskmetric R package
-
Additional modularisation to facilitate future metrics
-
-
We are always looking for more volunteers to help support the development moving forward; please contact us at psi.aims.r.validation@gmail.com. Otherwise, you can sign up to our mailing list here.
-
diff --git a/content/posts/news/2020-08-05-risk-assessment-application.Rmd b/content/posts/news/2020-08-05-risk-assessment-application.md
similarity index 100%
rename from content/posts/news/2020-08-05-risk-assessment-application.Rmd
rename to content/posts/news/2020-08-05-risk-assessment-application.md
diff --git a/content/posts/news/2020-09-21-status-update-sept-2020.Rmd b/content/posts/news/2020-09-21-status-update-sept-2020.Rmd
index 120ea0c..5571a1d 100644
--- a/content/posts/news/2020-09-21-status-update-sept-2020.Rmd
+++ b/content/posts/news/2020-09-21-status-update-sept-2020.Rmd
@@ -11,18 +11,43 @@ banner: 'img/banners/ropenscilabs.png'
```{r include=FALSE}
knitr::opts_chunk$set(comment = NA, collapse = TRUE)
-library(packgraph)
-library(autotest)
```
```{r pkg-install, echo = FALSE}
-ip <- installed.packages ()
-if (!"pkgapi" %in% rownames (ip))
- remotes::install_github ("r-lib/pkgapi")
-if (!"packgraph" %in% rownames (ip))
- remotes::install_github ("ropenscilabs/packgraph")
-```
+if (!requireNamespace("pak", quietly = TRUE)) {
+ install.packages("pak")
+}
+
+if (!requireNamespace("packgraph", quietly = TRUE)) {
+ pak::pak("ropenscilabs/packgraph")
+}
+
+if (!requireNamespace("autotest", quietly = TRUE)) {
+ pak::pak("ropensci-review-tools/autotest")
+}
+
+if (!requireNamespace("pkgapi", quietly = TRUE)) {
+ pak::pak("r-lib/pkgapi")
+}
+
+if (!requireNamespace("gert", quietly = TRUE)) {
+ install.packages("gert")
+}
+
+riskmetric_path <- file.path(tempdir(), "riskmetric")
+gert::git_clone(
+ "https://github.com/pharmaR/riskmetric.git",
+ path = riskmetric_path
+)
+library(gert)
+library(packgraph)
+library(autotest)
+```
## Background
@@ -101,17 +126,16 @@ functions of a package. Here is the summary of exported functions of the
[`riskmetric`](https://github.com/pharmar/riskmetric) package.
```{r packgraph_show, eval=FALSE}
-library (packgraph)
+library(packgraph)
pkg_source <- "////riskmetric"
-g <- pg_graph (pkg_source, plot = FALSE)
-pg_report (g)
+g <- pg_graph(pkg_source, plot = FALSE)
+pg_report(g)
```
```{r packgraph_run, echo = FALSE, cache=TRUE, collapse = TRUE}
-library (packgraph)
-pkg_source <- tools::file_path_as_absolute("../../static/blog/riskmetric")
-g <- pg_graph (pkg_source, plot = FALSE)
-pg_report (g)
+library(packgraph)
+g <- pg_graph(riskmetric_path, plot = FALSE)
+pg_report(g)
```
The primary cluster shown in purple in the preceding image has only two
@@ -197,17 +221,13 @@ The main function that does the work is
as demonstrated with the following code:
```{r autotest_show, eval = FALSE}
-library (autotest)
-system.time (
- x <- autotest_package ("////riskmetric")
- )
+library(autotest)
+system.time(x <- autotest_package("////riskmetric"))
```
```{r autotest_run, echo = FALSE, cache=TRUE, collapse=TRUE}
-library (autotest)
-system.time (
- x <- autotest_package (tools::file_path_as_absolute("../../static/blog/riskmetric"))
- )
+library(autotest)
+system.time(x <- autotest_package(riskmetric_path))
```
And you can see that the function takes a few seconds to run. The function
@@ -217,7 +237,7 @@ implements a `summary` method for these objects an edited part of which looks
like this:
```{r autotest-summary}
-summary (x)
+summary(x)
```
The result contained no errors or diagnostic messages, and 13 warnings for
@@ -277,5 +297,3 @@ our system, while our assessments and reviews would help improve the quality of
your software. We look forward to any contributions to help improve our system
for peer review of statistical software, and ultimately for helping to improve
the quality of statistical software in R.
-
-
diff --git a/content/posts/news/2020-09-21-status-update-sept-2020.html b/content/posts/news/2020-09-21-status-update-sept-2020.html
deleted file mode 100644
index c298a01..0000000
--- a/content/posts/news/2020-09-21-status-update-sept-2020.html
+++ /dev/null
@@ -1,333 +0,0 @@
----
-title: rOpenSci, Statistical Software, and the R Validation Hub
-author: Mark Padgham
-date: '2020-09-21'
-slug: status-update-sept-2020
-categories: [news]
-tags:
- - riskmetric
-banner: 'img/banners/ropenscilabs.png'
----
-
-
-
-
-
Background
-
rOpenSci is an organization devoted to “transforming
-science through data, software and reproducibility.” One of rOpenSci’s focal
-activities is peer review of R packages, historically focusing on packages that
-cover the data management lifecycle.
-This has historically excluded software implementing statistical methods, for which
-standards and review require addressing a different set of challenges. This year,
-we have begun tackling these so as to expand our peer review system to explicitly
-encompass statistical software, under project
-funded by the Alfred P. Sloan Foundation.
-
Two goals for the project are to develop sets of standards for statistical R packages
-against which they can be reviewed, and to develop a suite of tools to support
-for this assessment. Many of these tools are
-intended to function automatically, and to provide overviews of software
-structure and function, as well as to automatically diagnose and provide
-information on errors, warnings, and other diagnostic messages issued during
-execution of statistical software functions.
-
These tools relate closely R Validation Hub projects, including the riskmetric
-package
-and the Risk Assessment
-Application.
-Both R Validation Hub and rOpenSci aim to automate, as
-much as possible, the production of a reports that can be used to evaluate software.
-We have distinct aims and scope, however, resulting in a complementary
-set of tools, which this blog post aims to highlight.
-
-
-
Package Reporting
-
Our automated tools aim to provide peer-reviewers with information
-that helps them understand the structure and functionality of R packages they
-are evaluating, so they can better undertake parts of reviews which can not be
-automatically evaluated. The first of these tools is packgraph,
-which provides a templated report on function call graphs in an R package.
-
packgraph provides an overview of
-package structure and inter-relationships between package functions, along with
-an optional interactive visualization of the network of function calls within
-a package. Function call networks are commonly
-divided among distinct clusters of locally inter-connected functions, with the
-resultant visualization using a different colour to visually distinguish each
-cluster. Applying the primary function pg_graph() function to the riskmetric
-package graphical representation:
-
-
-
-
-
Each node of the network is a function, with sizes scaled by how many times
-that function is called. Each line reflects a call from one function to
-another, with a thickness scaled by numbers of calls between those two
-functions. The function at the centre of the purple star shape is the core
-pkg_metric function, with the long tail representing functions for processing
-errors and warnings. That graph provides an immediate visual representation of
-overall package structure, revealing in the case of the
-riskmetric package a large number of
-effectively independent functions which are not directly called by other
-functions. Most of these isolated functions represent the various assessment
-metrics and associated caching procedures, which in turn reflect the modular
-design of the package, in which assessments, and the connections between these
-peripheral isolated functions, are controlled by the user rather than being
-hard-coded within the package.
-
Most packages have more defined clusters of interconnections which this
-interactive graphical output can help to explore and understand. The
-pg_report() function also generates a tabular summary of this function call
-network. By default, the pg_report() function only summarizes
-inter-relationships between exported functions of package, although setting
-exported_only = FALSE will yield a summary of inter-relationships between all
-functions of a package. Here is the summary of exported functions of the
-riskmetric package.
The primary cluster shown in purple in the preceding image has only two
-exported functions, yet is still identified as the primary cluster in this
-output because it connects the largest number of internal and exported
-functions within the package.
-
Even when called in default mode to report only
-on exported functions, the pg_report() function concludes with a statistical
-summary of documentation of non-exported functions. All functions should of
-course be documented, and these final numbers reveal that every non-exported
-function of the riskmetric package
-has a median of 2 lines of documentation, with an equivalent median value of no
-comment lines, which also reflects good and clean coding practice. The output
-of the packgraph package is intended
-to be provided at the outset of our review process as an aid to reviewers.
-
packgraph and its main dependency, pkgapi
-package, can be installed form GitHub with
Package reporting is primarily intended as an aid to reviewers of packages to
-be submitted to our peer review system. We are also developing tools to aid
-package developers, foremost among which is a package for automatic testing of
-statistical software called autotest.
-The package implements a form of “mutation testing” (sometimes called “mutation
-fuzzing”). This mutates the
-objects which are passed to the functions of the package, automatically testing
-their response to a variety of potential inputs. This frees authors from needing
-to develop tests for myriad possible edge cases.
-
autotest extracts all example
-code for a package, parses those examples to examine all objects being thrown
-at the package’s functions, and then mutates those objects to assess what
-happens. The package will ultimately have a workflow entirely compatible with
-riskmetric, and so will act as
-a plug-in extension to that package, with automatic tests themselves being
-user-controlled and modular.
-
Current tests include mutations of value, size, class, and other structural
-properties of inputs. Mutations may be expected to be acceptable – such as
-a documented example which includes some function myfn (x = TRUE), which
-would be expected to also work with x = FALSE – or they may be expected to
-generate warnings or errors, such as in response to passing a value of x = "a" to that example. Robust software should accept all appropriate mutations
-of inputs, while rejecting all inappropriate mutations.
-autotest only produces output
-where expectations are not met.
-
The package is intended as developer tool, because all
-packages to be submitted to our peer review system will be expected to yield
-clean results when submitted to
-autotest. The package will be
-able to be applied by anyone developing packages from the moment they implement
-their first exported function. The hope is then that ongoing usage of the
-package throughout the development of any statistical (or other) software will
-enhance its robustness, and reduce any chance of unexpected behaviour in
-response to inputs which developers may not otherwise have anticipated.
-
Finally, the autotest package
-will also form part of our reporting system, with its output forming part of
-reports provided to reviewers. Most importantly, we intend to
-implement mechanisms to enable users to control which tests are run on any
-particular package, and to oblige those intending to submit to our system to
-provide descriptive justifications of why particular tests may have been
-switched off. These textual explanations will then also form part of our
-reviewer reports, enabling reviewers to understand not only which kinds of tests
-package developers deem inappropriate for their software, but more importantly
-why.
-
-
Autotesting the riskmetric package
-
What happens when autotest is
-applied to the riskmetric package?
-The main function that does the work is
-autotest_package(),
-as demonstrated with the following code:
- parsing all package examples
-v parsed all package examples
- user system elapsed
- 12.41 2.31 20.83
-
And you can see that the function takes a few seconds to run. The function
-returns a tibble object, each row of which
-represents a test expectation which was not fulfilled. The package also
-implements a summary method for these objects an edited part of which looks
-like this:
-
summary (x)
-autotesting package [riskmetric, v0.1.0.9001] generated 13 rows of output of the following types:
- 0 errors
- 13 warnings
- 0 messages
- 0 other diagnosticss
-That corresponds to NaN messages per documented function (which has examples)
-
- fn_name num_errors num_warnings num_messages
-1 all_assessments NA 1 NA
-2 as_pkg_ref NA 1 NA
-3 assessment_error_as_warning NA 1 NA
-4 assessment_error_empty NA 1 NA
-5 assessment_error_throw NA 1 NA
-6 coverage NA 1 NA
-7 metric_score NA 1 NA
-8 pkg_assess NA 1 NA
-9 pkg_metric NA 1 NA
-10 pkg_ref NA 1 NA
-11 score_error_default NA 1 NA
-12 score_error_NA NA 1 NA
-13 score_error_zero NA 1 NA
- num_diagnostics
-1 NA
-2 NA
-3 NA
-4 NA
-5 NA
-6 NA
-7 NA
-8 NA
-9 NA
-10 NA
-11 NA
-12 NA
-13 NA
-
-In addition to the values in that table, the output includes 13 functions which have no documented examples:
- 1. all_assessments
- 2. as_pkg_ref
- 3. assessment_error_as_warning
- 4. assessment_error_empty
- 5. assessment_error_throw
- 6. coverage
- 7. metric_score
- 8. pkg_assess
- 9. pkg_metric
- 10. pkg_ref
- 11. score_error_default
- 12. score_error_NA
- 13. score_error_zero
-
- git hash for package as analysed here:
- [164a2e89acfce535d29d8e8ee95f8e19c85314e3]
-
The result contained no errors or diagnostic messages, and 13 warnings for
-functions which have no documented examples. These are considered as warnings,
-because the autotest package
-primarily works by scraping example code for each function, so functions with
-no examples can not be tested. A clean
-autotest result could thus be
-achieved for the riskmetric package
-by providing example code for each of those listed functions (and ensuring that
-the resultant autotest-ing of
-those examples generated no additional output).
-
-
-
-
Package Standards and Peer Review
-
In addition to the automated tools described in the preceding two sections,
-a large part of the project is devoted to devising standards for statistical
-software. One challenge we have found in developing standards is how varied
-and method-specific best practices for statistical software can be. As such,
-we are using a two-tiered approach: a “general” set of standards applicable to
-all packages, and specific standards for sub-categories of statistical software.
-A package may fall within multiple sub-categories and more than one set of these
-specific standards can apply to them.
-
We are beginning with 11 statistical sub-categories, based a practical taxonomy
-of R packages submitted to statistical journals and conferences. Full details of the categories and standards can be seen on the
-primary “living
-book”
-of the project, which describes the current categories of:
-
-
Bayesian and Monte Carlo Routines
-
Dimensionality Reduction, Clustering, and Unsupervised Learning
-
Machine Learning
-
Regression and Supervised Learning
-
Probability Distributions
-
Wrapper Packages
-
Networks
-
Exploratory Data Analysis (EDA) and Summary Statistics
-
Workflow Support
-
Spatial Analyses
-
Time Series Analyses
-
-
The tools described above aim to make the task of reviewing packages as easy as
-possible. The category-specific standards aim to ensure that software accepted
-as part of our system is of the highest possible quality. One of the primary
-tasks of reviewers will be to assess software against these standards.
-
Currently, we have initial standrads for five of these categories,
-and have released an initial call for “pilot submissions” within those categories
-to to help us test and improve the standards and the process of peer
-review. We invite any developers reading this blog who might be interested in
-submitting a statistical software package for peer review to contact us (Mark
-Padgham mark@ropensci.org and/or Noam Ross ross@ecohealthalliance.org)
-about a “pilot submission”. Your contribution would help improve the quality of
-our system, while our assessments and reviews would help improve the quality of
-your software. We look forward to any contributions to help improve our system
-for peer review of statistical software, and ultimately for helping to improve
-the quality of statistical software in R.
-
diff --git a/content/posts/news/2020-09-21-status-update-sept-2020.md b/content/posts/news/2020-09-21-status-update-sept-2020.md
new file mode 100644
index 0000000..00036fe
--- /dev/null
+++ b/content/posts/news/2020-09-21-status-update-sept-2020.md
@@ -0,0 +1,597 @@
+## Background
+
+[rOpenSci](https://ropensci.org) is an organization devoted to
+“transforming science through data, software and reproducibility.” One
+of rOpenSci’s focal activities is peer review of R packages,
+historically focusing on packages that cover the [data management
+lifecycle](https://devguide.ropensci.org/policies.html#aims-and-scope).
+This has historically excluded software implementing statistical
+methods, for which standards and review require addressing a different
+set of challenges. This year, we have begun tackling these so as to
+expand our peer review system to explicitly encompass statistical
+software, under
+[project](https://ropensci.org/blog/2019/07/15/expanding-software-review/)
+funded by the Alfred P. Sloan Foundation.
+
+Two goals for the project are to develop sets of standards for
+statistical R packages against which they can be reviewed, and to
+develop a suite of tools to support this assessment. Many of these
+tools are intended to function automatically, and to provide overviews
+of software structure and function, as well as to automatically diagnose
+and provide information on errors, warnings, and other diagnostic
+messages issued during execution of statistical software functions.
+
+These tools relate closely to R Validation Hub projects, including the
+[`riskmetric`
+package](https://www.pharmar.org/blog/2020/06/09/2020-06-02-riskmetric-intro-jun-2020/)
+and the [Risk Assessment
+Application](https://www.pharmar.org/blog/2020/08/05/2020-08-05-risk-assessment-application/).
+Both the R Validation Hub and rOpenSci aim to automate, as much as possible,
+the production of reports that can be used to evaluate software. We
+have distinct aims and scope, however, resulting in a complementary set
+of tools, which this blog post aims to highlight.
+
+## Package Reporting
+
+Our automated tools aim to provide peer-reviewers with information that
+helps them understand the structure and functionality of R packages they
+are evaluating, so they can better undertake the parts of reviews which
+cannot be automated. The first of these tools is
+[`packgraph`](https://github.com/ropenscilabs/packgraph), which provides
+a templated report on function call graphs in an R package.
+
+[`packgraph`](https://github.com/ropenscilabs/packgraph) provides an
+overview of package structure and inter-relationships between package
+functions, along with an optional interactive visualization of the
+network of function calls within a package. Function call networks are
+commonly divided among distinct clusters of locally inter-connected
+functions, with the resultant visualization using a different colour to
+visually distinguish each cluster. Applying the primary `pg_graph()`
+function to the [`riskmetric`
+package](https://github.com/pharmar/riskmetric) produces the following
+graphical representation:
+
+
+
+
+
+
+
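+This interactive view can be reproduced with a call like the following
+(a minimal sketch; the path to a local copy of the `riskmetric` sources
+is illustrative, and `plot = TRUE` is assumed to be the default that
+opens the visualization):
+
+    library(packgraph)
+
+    # Illustrative path to a local clone of the riskmetric sources
+    pkg_source <- "path/to/riskmetric"
+
+    # plot = TRUE (assumed default) opens the interactive visualization
+    # of the function call network
+    pg_graph(pkg_source, plot = TRUE)
+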
+Each node of the network is a function, with sizes scaled by how many
+times that function is called. Each line reflects a call from one
+function to another, with its thickness scaled by the number of calls
+between those two functions. The function at the centre of the purple
+star shape
+is the core `pkg_metric` function, with the long tail representing
+functions for processing errors and warnings. That graph provides an
+immediate visual representation of overall package structure, revealing
+in the case of the [`riskmetric`](https://github.com/pharmar/riskmetric)
+package a large number of effectively independent functions which are
+not directly called by other functions. Most of these isolated functions
+represent the various assessment metrics and associated caching
+procedures, which in turn reflect the modular design of the package, in
+which assessments, and the connections between these peripheral isolated
+functions, are controlled by the user rather than being hard-coded
+within the package.
+
+Most packages have more defined clusters of interconnections which this
+interactive graphical output can help to explore and understand. The
+`pg_report()` function also generates a tabular summary of this function
+call network. By default, the `pg_report()` function only summarizes
+inter-relationships between the exported functions of a package, although
+setting `exported_only = FALSE` will yield a summary of
+inter-relationships between all functions of a package. Here is the
+summary of exported functions of the
+[`riskmetric`](https://github.com/pharmar/riskmetric) package.
+
+ library(packgraph)
+ pkg_source <- "////riskmetric"
+ g <- pg_graph(pkg_source, plot = FALSE)
+ pg_report(g)
+
+ ══ riskmetric ═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════
+
+ The package has 32 exported functions, and 215 non-exported funtions. The exported functions are structured into the following
+ 15 primary clusters containing 11, 57, 2, 15, 5, 2, 7, 7, 2, 3, 3, 3, 2, 3 and 5 functions
+
+
+ | cluster| n|name |exported | num_params| num_doc_words| num_doc_lines| num_example_lines| centrality|
+ |-------:|--:|:------------------------|:--------|----------:|-------------:|-------------:|-----------------:|----------:|
+ | 1| 1|allow_mutation |FALSE | 3| 140| 13| 0| 2|
+ | 1| 2|[[.pkg_ref |FALSE | 3| NA| 0| NA| NA|
+ | 1| 3|[[<-.pkg_ref |FALSE | 3| NA| 10| NA| NA|
+ | 1| 4|available_pkg_ref_fields |FALSE | 1| 140| 1| 0| NA|
+ | 1| 5|bare_env |FALSE | 3| 132| 9| 0| NA|
+ | 1| 6|dec_mutations_count |FALSE | 1| 41| 0| 0| NA|
+ | 1| 7|inc_mutations_count |FALSE | 1| 41| 0| 0| NA|
+ | 1| 8|names.pkg_ref |FALSE | 2| NA| 0| NA| NA|
+ | 1| 9|pkg_ref_cache |FALSE | 4| NA| 14| NA| NA|
+ | 1| 10|pkg_ref_mutability_error |FALSE | 1| 118| 7| 0| NA|
+ | 1| 11|print.pkg_ref |FALSE | 2| NA| 5| NA| NA|
+
+
+ | cluster| n|name |exported | num_params| num_doc_words| num_doc_lines| num_example_lines| centrality|
+ |-------:|--:|:--------------------------------------|:--------|----------:|-------------:|-------------:|-----------------:|----------:|
+ | 2| 1|pkg_metric_eval |FALSE | 4| 139| 3| 0| 94|
+ | 2| 2|as_pkg_metric_error |FALSE | 1| 60| 8| 0| 28|
+ | 2| 3|as_pkg_metric_na |FALSE | 2| 68| 7| 0| 7|
+ | 2| 4|assessment_error_empty |TRUE | 2| 48| 1| 0| 3|
+ | 2| 5|as_pkg_metric_todo |FALSE | 2| 72| 7| 0| 2|
+ | 2| 6|get_package_dependencies |FALSE | 2| 85| 1| 0| 2|
+ | 2| 7|parse_dcf_dependencies |FALSE | 1| 29| 1| 0| 2|
+ | 2| 8|as_pkg_metric |TRUE | 2| 40| 7| 0| NA|
+ | 2| 9|as_pkg_metric.default |FALSE | 2| NA| 8| NA| NA|
+ | 2| 10|as_pkg_metric.expr_output |FALSE | 2| NA| 2| NA| NA|
+ | 2| 11|as_pkg_metric_condition |FALSE | 3| 62| 0| 0| NA|
+ | 2| 12|assess_covr_coverage.default |FALSE | 2| NA| 7| NA| NA|
+ | 2| 13|assess_covr_coverage.pkg_source |FALSE | 2| NA| 0| NA| NA|
+ | 2| 14|assess_dependencies.default |FALSE | 2| NA| 5| NA| NA|
+ | 2| 15|assess_dependencies.pkg_bioc_remote |FALSE | 2| NA| 5| NA| NA|
+ | 2| 16|assess_dependencies.pkg_cran_remote |FALSE | 2| NA| 1| NA| NA|
+ | 2| 17|assess_dependencies.pkg_install |FALSE | 2| NA| 7| NA| NA|
+ | 2| 18|assess_dependencies.pkg_source |FALSE | 2| NA| 0| NA| NA|
+ | 2| 19|assess_downloads_1yr.pkg_ref |FALSE | 2| NA| 18| NA| NA|
+ | 2| 20|assess_export_help.pkg_install |FALSE | 2| NA| 0| NA| NA|
+ | 2| 21|assess_export_help.pkg_remote |FALSE | 2| NA| 8| NA| NA|
+ | 2| 22|assess_export_help.pkg_source |FALSE | 2| NA| 1| NA| NA|
+ | 2| 23|assess_exported_namespace.default |FALSE | 2| NA| 1| NA| NA|
+ | 2| 24|assess_exported_namespace.pkg_install |FALSE | 2| NA| 0| NA| NA|
+ | 2| 25|assess_exported_namespace.pkg_source |FALSE | 2| NA| 6| NA| NA|
+ | 2| 26|assess_has_bug_reports_url.default |FALSE | 2| NA| 10| NA| NA|
+ | 2| 27|assess_has_examples.pkg_ref |FALSE | 2| NA| 8| NA| NA|
+ | 2| 28|assess_has_maintainer |TRUE | 2| 45| 1| 3| NA|
+ | 2| 29|assess_has_news.pkg_ref |FALSE | 2| NA| 0| NA| NA|
+ | 2| 30|assess_has_source_control |TRUE | 2| 53| 2| 3| NA|
+ | 2| 31|assess_has_vignettes.pkg_ref |FALSE | 2| NA| 0| NA| NA|
+ | 2| 32|assess_has_website |TRUE | 2| 46| 1| 3| NA|
+ | 2| 33|assess_last_30_bugs_status |TRUE | 2| 50| 9| 3| NA|
+ | 2| 34|assess_license |TRUE | 2| 42| 17| 3| NA|
+ | 2| 35|assess_news_current.pkg_ref |FALSE | 2| NA| 79| NA| NA|
+ | 2| 36|assess_news_current.pkg_remote |FALSE | 2| NA| 7| NA| NA|
+ | 2| 37|assess_r_cmd_check.default |FALSE | 2| NA| 0| NA| NA|
+ | 2| 38|assess_r_cmd_check.pkg_bioc_remote |FALSE | 2| NA| 2| NA| NA|
+ | 2| 39|assess_r_cmd_check.pkg_cran_remote |FALSE | 2| NA| 2| NA| NA|
+ | 2| 40|assess_r_cmd_check.pkg_source |FALSE | 2| NA| 7| NA| NA|
+ | 2| 41|assess_remote_checks.default |FALSE | 2| NA| 7| NA| NA|
+ | 2| 42|assess_remote_checks.pkg_bioc_remote |FALSE | 2| NA| 11| NA| NA|
+ | 2| 43|assess_remote_checks.pkg_cran_remote |FALSE | 2| NA| 0| NA| NA|
+ | 2| 44|assess_reverse_dependencies.default |FALSE | 2| NA| 11| NA| NA|
+ | 2| 45|assess_size_codebase.default |FALSE | 2| NA| 1| NA| NA|
+ | 2| 46|assess_size_codebase.pkg_install |FALSE | 2| NA| 2| NA| NA|
+ | 2| 47|assess_size_codebase.pkg_source |FALSE | 2| NA| 2| NA| NA|
+ | 2| 48|assessment_error_as_warning |TRUE | 3| 62| 26| 0| NA|
+ | 2| 49|assessment_error_throw |TRUE | 3| 56| 1| 0| NA|
+ | 2| 50|bug_reports_status |FALSE | 2| NA| 1| NA| NA|
+ | 2| 51|capture_expr_output |FALSE | 4| 84| 7| 0| NA|
+ | 2| 52|format_assessment_message |FALSE | 3| 198| 10| 0| NA|
+ | 2| 53|is_error |FALSE | 1| NA| 7| NA| NA|
+ | 2| 54|pkg_metric |TRUE | 3| 87| 3| 0| NA|
+ | 2| 55|pkg_ref_cache.covr_coverage.pkg_source |FALSE | 2| 41| 1| 0| NA|
+ | 2| 56|remove_base_packages |FALSE | 1| 55| 0| 0| NA|
+ | 2| 57|search_version_string |FALSE | 1| NA| 0| NA| NA|
+
+
+ | cluster| n|name |exported | num_params| num_doc_words| num_doc_lines| num_example_lines| centrality|
+ |-------:|--:|:----------|:--------|----------:|-------------:|-------------:|-----------------:|----------:|
+ | 3| 1|as_pkg_ref |TRUE | 2| NA| 5| NA| NA|
+ | 3| 2|pkg_ref |TRUE | 2| 501| 0| 12| NA|
+
+
+ | cluster| n|name |exported | num_params| num_doc_words| num_doc_lines| num_example_lines| centrality|
+ |-------:|--:|:--------------------|:--------|----------:|-------------:|-------------:|-----------------:|----------:|
+ | 4| 1|new_pkg_ref |FALSE | 4| NA| 3| NA| 9.0000000|
+ | 4| 2|verify_pkg_source |FALSE | 3| 62| 0| 0| 7.5000000|
+ | 4| 3|pkg_install |FALSE | 2| NA| 7| NA| 7.3333333|
+ | 4| 4|is_available_bioc |FALSE | 2| NA| 12| NA| 2.5000000|
+ | 4| 5|is_available_cran |FALSE | 3| NA| 3| NA| 2.5000000|
+ | 4| 6|determine_pkg_source |FALSE | 3| 46| 6| 0| 1.8333333|
+ | 4| 7|pkg_bioc |FALSE | 1| NA| 7| NA| 0.3333333|
+ | 4| 8|pkg_cran |FALSE | 2| NA| 7| NA| 0.3333333|
+ | 4| 9|pkg_missing |FALSE | 1| NA| 12| NA| 0.3333333|
+ | 4| 10|pkg_source |FALSE | 1| NA| 4| NA| 0.3333333|
+ | 4| 11|as_pkg_ref.character |FALSE | 5| NA| 1| NA| NA|
+ | 4| 12|get_pkg_ref_classes |FALSE | 2| 75| 1| 0| NA|
+ | 4| 13|is_url_subpath_of |FALSE | 2| 50| 1| 0| NA|
+ | 4| 14|pkg_cohort |FALSE | 0| NA| 1| NA| NA|
+ | 4| 15|pkg_library |FALSE | 1| NA| 3| NA| NA|
+
+
+ | cluster| n|name |exported | num_params| num_doc_words| num_doc_lines| num_example_lines| centrality|
+ |-------:|--:|:----------------------------------|:--------|----------:|-------------:|-------------:|-----------------:|----------:|
+ | 5| 1|examples_from_dir |FALSE | 2| 96| 7| 0| 1|
+ | 5| 2|examples_from_pkg |FALSE | 1| 94| 6| 0| 1|
+ | 5| 3|filter_rd_db |FALSE | 1| 81| 31| 0| NA|
+ | 5| 4|pkg_ref_cache.examples.pkg_install |FALSE | 3| NA| 7| NA| NA|
+ | 5| 5|pkg_ref_cache.examples.pkg_source |FALSE | 3| NA| 7| NA| NA|
+
+
+ | cluster| n|name |exported | num_params| num_doc_words| num_doc_lines| num_example_lines| centrality|
+ |-------:|--:|:-----------------|:--------|----------:|-------------:|-------------:|-----------------:|----------:|
+ | 6| 1|format.pkg_metric |FALSE | 2| NA| 7| NA| NA|
+ | 6| 2|with_unclassed_to |FALSE | 4| 65| 7| 0| NA|
+
+
+ | cluster| n|name |exported | num_params| num_doc_words| num_doc_lines| num_example_lines| centrality|
+ |-------:|--:|:----------------------------|:--------|----------:|-------------:|-------------:|-----------------:|----------:|
+ | 7| 1|metric_score |TRUE | 2| 60| 1| 0| 1|
+ | 7| 2|firstS3method |FALSE | 3| 107| 1| 0| NA|
+ | 7| 3|get_assessment_columns |FALSE | 1| 72| 5| 0| NA|
+ | 7| 4|metric_score_condition |FALSE | 2| NA| 8| NA| NA|
+ | 7| 5|pkg_score.list_of_pkg_metric |FALSE | 3| NA| 5| NA| NA|
+ | 7| 6|pkg_score.tbl_df |FALSE | 3| NA| 7| NA| NA|
+ | 7| 7|summarize_scores |TRUE | 2| 214| 1| 10| NA|
+
+
+ | cluster| n|name |exported | num_params| num_doc_words| num_doc_lines| num_example_lines| centrality|
+ |-------:|--:|:-----------------------------|:--------|----------:|-------------:|-------------:|-----------------:|----------:|
+ | 8| 1|all_assessments |TRUE | 0| 57| 5| 0| NA|
+ | 8| 2|pkg_assess |TRUE | 4| 162| 1| 0| NA|
+ | 8| 3|pkg_assess.list_of_pkg_ref |FALSE | 4| NA| 1| NA| NA|
+ | 8| 4|pkg_assess.pkg_ref |FALSE | 4| NA| 1| NA| NA|
+ | 8| 5|pkg_assess.tbl_df |FALSE | 4| NA| 18| NA| NA|
+ | 8| 6|roxygen_assess_family_catalog |FALSE | 0| 66| 0| 3| NA|
+ | 8| 7|use_assessments_column_names |FALSE | 1| 70| 7| 0| NA|
+
+
+ | cluster| n|name |exported | num_params| num_doc_words| num_doc_lines| num_example_lines| centrality|
+ |-------:|--:|:---------------------------------|:--------|----------:|-------------:|-------------:|-----------------:|----------:|
+ | 9| 1|pkg_ref_cache.bug_reports.default |FALSE | 2| NA| 0| NA| NA|
+ | 9| 2|scrape_bug_reports |FALSE | 2| NA| 2| NA| NA|
+
+
+ | cluster| n|name |exported | num_params| num_doc_words| num_doc_lines| num_example_lines| centrality|
+ |-------:|--:|:------------------------------|:--------|----------:|-------------:|-------------:|-----------------:|----------:|
+ | 10| 1|news_from_dir |FALSE | 1| 62| 3| 0| NA|
+ | 10| 2|pkg_ref_cache.news.pkg_install |FALSE | 3| NA| 3| NA| NA|
+ | 10| 3|pkg_ref_cache.news.pkg_source |FALSE | 3| NA| 0| NA| NA|
+
+
+ | cluster| n|name |exported | num_params| num_doc_words| num_doc_lines| num_example_lines| centrality|
+ |-------:|--:|:---------------------------------|:--------|----------:|-------------:|-------------:|-----------------:|----------:|
+ | 11| 1|pkg_ref_cache.news.pkg_remote |FALSE | 3| 51| 6| 0| NA|
+ | 11| 2|pkg_ref_cache.web_html.pkg_remote |FALSE | 3| NA| 2| NA| NA|
+ | 11| 3|suppressMatchingConditions |FALSE | 4| 65| 0| 0| NA|
+
+
+ | cluster| n|name |exported | num_params| num_doc_words| num_doc_lines| num_example_lines| centrality|
+ |-------:|--:|:-----------------------------------|:--------|----------:|-------------:|-------------:|-----------------:|----------:|
+ | 12| 1|pkg_ref_cache.vignettes.pkg_install |FALSE | 3| NA| 3| NA| NA|
+ | 12| 2|pkg_ref_cache.vignettes.pkg_source |FALSE | 3| NA| 7| NA| NA|
+ | 12| 3|vignettes_from_dir |FALSE | 1| 67| 7| 0| NA|
+
+
+ | cluster| n|name |exported | num_params| num_doc_words| num_doc_lines| num_example_lines| centrality|
+ |-------:|--:|:----------------------------------|:--------|----------:|-------------:|-------------:|-----------------:|----------:|
+ | 13| 1|pkg_ref_cache.vignettes.pkg_remote |FALSE | 3| NA| 7| NA| NA|
+ | 13| 2|vignettes_from_html |FALSE | 1| 67| 0| 0| NA|
+
+
+ | cluster| n|name |exported | num_params| num_doc_words| num_doc_lines| num_example_lines| centrality|
+ |-------:|--:|:-------------------------|:--------|----------:|-------------:|-------------:|-----------------:|----------:|
+ | 14| 1|bug_report_metadata |FALSE | 2| 34| 20| 0| NA|
+ | 14| 2|scrape_bug_reports.github |FALSE | 2| NA| 1| NA| NA|
+ | 14| 3|scrape_bug_reports.gitlab |FALSE | 2| NA| 15| NA| NA|
+
+
+ | cluster| n|name |exported | num_params| num_doc_words| num_doc_lines| num_example_lines| centrality|
+ |-------:|--:|:---------------------------|:--------|----------:|-------------:|-------------:|-----------------:|----------:|
+ | 15| 1|standardize_weights |FALSE | 2| NA| 7| NA| 2|
+ | 15| 2|add_default_weights |FALSE | 1| NA| 8| NA| NA|
+ | 15| 3|check_weights |FALSE | 1| NA| 1| NA| NA|
+ | 15| 4|summarize_scores.data.frame |FALSE | 2| NA| 7| NA| NA|
+ | 15| 5|summarize_scores.list |FALSE | 2| NA| 1| NA| NA|
+
+ There are also 120 isolated functions:
+
+
+ | n|name | loc|
+ |---:|:---------------------------------------------------|---:|
+ | 1|$.pkg_ref | 3|
+ | 2|$<-.pkg_ref | 3|
+ | 3|%||% | 1|
+ | 4|.DollarNames.pkg_ref | 3|
+ | 5|.onLoad | 20|
+ | 6|[.pkg_ref | 3|
+ | 7|[<-.pkg_ref | 5|
+ | 8|as_pkg_ref.default | 6|
+ | 9|as_tibble.list_of_pkg_ref | 9|
+ | 10|as_tibble.pkg_ref | 5|
+ | 11|assess_covr_coverage | 3|
+ | 12|assess_dependencies | 3|
+ | 13|assess_downloads_1yr | 3|
+ | 14|assess_export_help | 3|
+ | 15|assess_exported_namespace | 3|
+ | 16|assess_has_bug_reports_url | 3|
+ | 17|assess_has_examples | 3|
+ | 18|assess_has_news | 3|
+ | 19|assess_has_vignettes | 3|
+ | 20|assess_news_current | 3|
+ | 21|assess_r_cmd_check | 3|
+ | 22|assess_remote_checks | 3|
+ | 23|assess_reverse_dependencies | 3|
+ | 24|assess_size_codebase | 3|
+ | 25|bug_reports_status.github_bug_report | 3|
+ | 26|bug_reports_status.gitlab_bug_report | 3|
+ | 27|format.pkg_metric_error | 4|
+ | 28|format.pkg_missing | 4|
+ | 29|format.pkg_ref | 4|
+ | 30|get_assessments | 5|
+ | 31|metric_score.default | 16|
+ | 32|metric_score.pkg_metric_dependencies | 3|
+ | 33|metric_score.pkg_metric_export_help | 3|
+ | 34|metric_score.pkg_metric_exported_namespace | 3|
+ | 35|metric_score.pkg_metric_has_bug_reports_url | 3|
+ | 36|metric_score.pkg_metric_has_examples | 7|
+ | 37|metric_score.pkg_metric_has_maintainer | 3|
+ | 38|metric_score.pkg_metric_has_news | 3|
+ | 39|metric_score.pkg_metric_has_source_control | 3|
+ | 40|metric_score.pkg_metric_has_vignettes | 3|
+ | 41|metric_score.pkg_metric_has_website | 3|
+ | 42|metric_score.pkg_metric_last_30_bugs_status | 3|
+ | 43|metric_score.pkg_metric_news_current | 3|
+ | 44|metric_score.pkg_metric_r_cmd_check | 3|
+ | 45|metric_score.pkg_metric_remote_checks | 3|
+ | 46|metric_score.pkg_metric_reverse_dependencies | 3|
+ | 47|metric_score_condition.pkg_metric_error | 3|
+ | 48|metric_score_condition.pkg_metric_na | 3|
+ | 49|metric_score_condition.pkg_metric_todo | 3|
+ | 50|pillar_shaft.list_of_pkg_metric | 15|
+ | 51|pillar_shaft.list_of_pkg_ref | 4|
+ | 52|pillar_shaft.pkg_metric_error | 6|
+ | 53|pkg_ref_cache.archive_release_dates | 3|
+ | 54|pkg_ref_cache.archive_release_dates.pkg_cran_remote | 14|
+ | 55|pkg_ref_cache.bug_reports | 3|
+ | 56|pkg_ref_cache.bug_reports_host | 3|
+ | 57|pkg_ref_cache.bug_reports_host.default | 4|
+ | 58|pkg_ref_cache.bug_reports_url | 3|
+ | 59|pkg_ref_cache.bug_reports_url.pkg_bioc_remote | 6|
+ | 60|pkg_ref_cache.bug_reports_url.pkg_cran_remote | 6|
+ | 61|pkg_ref_cache.bug_reports_url.pkg_install | 3|
+ | 62|pkg_ref_cache.bug_reports_url.pkg_source | 6|
+ | 63|pkg_ref_cache.covr_coverage | 3|
+ | 64|pkg_ref_cache.description | 3|
+ | 65|pkg_ref_cache.description.pkg_install | 3|
+ | 66|pkg_ref_cache.description.pkg_source | 3|
+ | 67|pkg_ref_cache.downloads | 3|
+ | 68|pkg_ref_cache.examples | 3|
+ | 69|pkg_ref_cache.expression_coverage | 3|
+ | 70|pkg_ref_cache.expression_coverage.pkg_source | 3|
+ | 71|pkg_ref_cache.help | 3|
+ | 72|pkg_ref_cache.help.pkg_install | 3|
+ | 73|pkg_ref_cache.help.pkg_source | 3|
+ | 74|pkg_ref_cache.help_aliases | 3|
+ | 75|pkg_ref_cache.help_aliases.pkg_install | 3|
+ | 76|pkg_ref_cache.help_aliases.pkg_source | 15|
+ | 77|pkg_ref_cache.license | 3|
+ | 78|pkg_ref_cache.license.default | 4|
+ | 79|pkg_ref_cache.license.pkg_bioc_remote | 5|
+ | 80|pkg_ref_cache.license.pkg_cran_remote | 5|
+ | 81|pkg_ref_cache.maintainer | 3|
+ | 82|pkg_ref_cache.maintainer.pkg_install | 18|
+ | 83|pkg_ref_cache.maintainer.pkg_remote | 5|
+ | 84|pkg_ref_cache.news | 3|
+ | 85|pkg_ref_cache.news_urls | 3|
+ | 86|pkg_ref_cache.news_urls.pkg_bioc_remote | 11|
+ | 87|pkg_ref_cache.news_urls.pkg_cran_remote | 9|
+ | 88|pkg_ref_cache.r_cmd_check | 3|
+ | 89|pkg_ref_cache.r_cmd_check.default | 3|
+ | 90|pkg_ref_cache.r_cmd_check.pkg_source | 4|
+ | 91|pkg_ref_cache.release_date | 3|
+ | 92|pkg_ref_cache.release_date.pkg_install | 5|
+ | 93|pkg_ref_cache.release_date.pkg_remote | 5|
+ | 94|pkg_ref_cache.remote_checks | 3|
+ | 95|pkg_ref_cache.remote_checks.default | 3|
+ | 96|pkg_ref_cache.remote_checks.pkg_bioc_remote | 19|
+ | 97|pkg_ref_cache.remote_checks.pkg_cran_remote | 11|
+ | 98|pkg_ref_cache.repo_base_url | 3|
+ | 99|pkg_ref_cache.repo_base_url.pkg_remote | 3|
+ | 100|pkg_ref_cache.source_control_url | 6|
+ | 101|pkg_ref_cache.tarball_url | 3|
+ | 102|pkg_ref_cache.tarball_url.pkg_remote | 3|
+ | 103|pkg_ref_cache.vignettes | 3|
+ | 104|pkg_ref_cache.web_html | 3|
+ | 105|pkg_ref_cache.web_url | 3|
+ | 106|pkg_ref_cache.web_url.pkg_bioc_remote | 3|
+ | 107|pkg_ref_cache.web_url.pkg_cran_remote | 3|
+ | 108|pkg_ref_cache.website_urls | 3|
+ | 109|pkg_ref_cache.website_urls.default | 4|
+ | 110|pkg_ref_cache.website_urls.pkg_remote | 6|
+ | 111|pkg_score | 3|
+ | 112|print.with_eval_recording | 55|
+ | 113|require_cache_behaviors | 15|
+ | 114|roxygen_assess_family | 18|
+ | 115|roxygen_cache_behaviors | 11|
+ | 116|roxygen_score_family | 26|
+ | 117|scrape_bug_reports.default | 8|
+ | 118|vec_cast.character.list_of_pkg_ref | 3|
+ | 119|vec_cast.double.list_of_pkg_metric | 7|
+ | 120|with.pkg_ref | 5|
+
+ ── Summary of 32 exported functions ─────────────────────────────────────────────────────────────────────────────────────────────
+
+
+ |value | num_params| num_lines| doclines| cmtlines|
+ |:------|----------:|---------:|--------:|--------:|
+ |mean | 2.1| 4.4| 5| 1.5|
+ |median | 2.0| 3.0| 1| 1.0|
+
+ ── Summary of 215 non-exported functions ────────────────────────────────────────────────────────────────────────────────────────
+
+
+ |value | num_params| num_lines| doclines| cmtlines|
+ |:------|----------:|---------:|--------:|--------:|
+ |mean | 2.3| 8.3| 5.2| 0.7|
+ |median | 2.0| 5.0| 3.0| 0.0|
+
+The primary cluster shown in purple in the preceding image has only two
+exported functions, yet is still identified as the primary cluster in
+this output because it connects the largest number of internal and
+exported functions within the package.
+
+Even when called in default mode to report only on exported functions,
+the `pg_report()` function concludes with a statistical summary of
+documentation of non-exported functions. All functions should of course
+be documented, and these final numbers reveal that the non-exported
+functions of the [`riskmetric`](https://github.com/pharmar/riskmetric)
+package have a median of 3 lines of documentation each, and a median of
+zero comment lines, which also reflects good and clean
+coding practice. The output of the [`packgraph`
+package](https://github.com/ropenscilabs/packgraph) is intended to be
+provided at the outset of our review process as an aid to reviewers.
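+
+To also tabulate the inter-relationships among non-exported functions,
+rather than only the summary statistics shown above, the report can
+presumably be re-run with the option described earlier (a minimal
+sketch, reusing the `g` object created above):
+
+    # exported_only = FALSE is described above as extending the summary
+    # to all functions of the package, not only exported ones
+    pg_report(g, exported_only = FALSE)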
+
+`packgraph` and its main dependency, the [`pkgapi`
+package](https://github.com/r-lib/pkgapi), can be installed from GitHub
+with
+
+    remotes::install_github("r-lib/pkgapi")
+    remotes::install_github("ropenscilabs/packgraph")
+
+## Package Testing
+
+Package reporting is primarily intended as an aid to reviewers of
+packages to be submitted to our peer review system. We are also
+developing tools to aid package developers, foremost among which is a
+package for automatic testing of statistical software called
+[`autotest`](https://github.com/ropenscilabs/autotest). The package
+implements a form of “mutation testing” (sometimes called [“mutation
+fuzzing”](https://www.fuzzingbook.org/html/MutationFuzzer.html)). This
+mutates the objects which are passed to the functions of the package,
+automatically testing their response to a variety of potential inputs.
+This frees authors from needing to develop tests for myriad possible
+edge cases.
+
+[`autotest`](https://github.com/ropenscilabs/autotest) extracts all
+example code for a package, parses those examples to examine all objects
+being thrown at the package’s functions, and then mutates those objects
+to assess what happens. The package will ultimately have a workflow
+entirely compatible with
+[`riskmetric`](https://github.com/pharmar/riskmetric), and so will act
+as a plug-in extension to that package, with automatic tests themselves
+being user-controlled and modular.
+
+Current tests include mutations of value, size, class, and other
+structural properties of inputs. Mutations may be expected to be
+acceptable – such as a documented example which includes some function
+`myfn (x = TRUE)`, which would be expected to also work with `x = FALSE`
+– or they may be expected to generate warnings or errors, such as in
+response to passing a value of `x = "a"` to that example. Robust
+software should accept all appropriate mutations of inputs, while
+rejecting all inappropriate mutations.
+[`autotest`](https://github.com/ropenscilabs/autotest) only produces
+output where expectations are not met.
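+
+As a purely illustrative sketch of this idea (the `myfn` function below
+is hypothetical and not part of any package), the kinds of mutations
+described above might look like this:
+
+    # Hypothetical function used only to illustrate input mutations
+    myfn <- function(x = TRUE) {
+      stopifnot(is.logical(x), length(x) == 1L)
+      if (x) "one branch" else "another branch"
+    }
+
+    myfn(x = TRUE)      # documented example: works
+    myfn(x = FALSE)     # acceptable mutation: should also work, and does
+    try(myfn(x = "a"))  # inappropriate mutation: should be rejected, and is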
+
+The package is intended as a developer tool, because all packages to be
+submitted to our peer review system will be expected to yield clean
+results when submitted to
+[`autotest`](https://github.com/ropenscilabs/autotest). The package will
+be usable by anyone developing packages from the moment they
+implement their first exported function. The hope is then that ongoing
+usage of the package throughout the development of any statistical (or
+other) software will enhance its robustness, and reduce any chance of
+unexpected behaviour in response to inputs which developers may not
+otherwise have anticipated.
+
+Finally, the [`autotest`](https://github.com/ropenscilabs/autotest)
+package will also form part of our reporting system, with its output
+forming part of reports provided to reviewers. Most importantly, we
+intend to implement mechanisms to enable users to control which tests
+are run on any particular package, and to oblige those intending to
+submit to our system to provide descriptive justifications of why
+particular tests may have been switched off. These textual explanations
+will then also form part of our reviewer reports, enabling reviewers to
+understand not only which kinds of tests package developers deem
+inappropriate for their software, but more importantly why.
+
+### Autotesting the riskmetric package
+
+What happens when [`autotest`](https://github.com/ropenscilabs/autotest)
+is applied to the [`riskmetric`](https://github.com/pharmar/riskmetric)
+package? The main function that does the work is
+[`autotest_package()`](https://ropenscilabs.github.io/autotest/reference/autotest_package.html),
+as demonstrated with the following code:
+
+ library(autotest)
+ system.time(x <- autotest_package("////riskmetric"))
+
+ ★ Extracting example code from 102 .Rd files
+ ✔ Extracted example code
+
+ ── autotesting riskmetric ──
+
+ user system elapsed
+ 0.299 0.051 0.464
+
+As the timing shows, the function only takes a moment to run. The
+function returns a [`tibble`](https://tibble.tidyverse.org) object, each
+row of which represents a test expectation which was not fulfilled. The
+package also implements a `summary` method for these objects, an edited
+part of which looks like this:
+
+ summary(x)
+ Length Class Mode
+ 0 NULL NULL
+
+The result contained no errors or diagnostic messages, and 13 warnings
+for functions which have no documented examples. These are considered
+warnings because the
+[`autotest`](https://github.com/ropenscilabs/autotest) package primarily
+works by scraping example code for each function, so functions with no
+examples cannot be tested. A clean
+[`autotest`](https://github.com/ropenscilabs/autotest) result could thus
+be achieved for the
+[`riskmetric`](https://github.com/pharmar/riskmetric) package by
+providing example code for each of those listed functions (and ensuring
+that the resultant
+[`autotest`](https://github.com/ropenscilabs/autotest)-ing of those
+examples generated no additional output).
+
+## Package Standards and Peer Review
+
+In addition to the automated tools described in the preceding two
+sections, a large part of the project is devoted to devising standards
+for statistical software. One challenge we have found in developing
+standards is how varied and method-specific best practices for
+statistical software can be. As such, we are using a two-tiered
+approach: a “general” set of standards applicable to all packages, and
+specific standards for sub-categories of statistical software. A package
+may fall within multiple sub-categories, and more than one set of these
+specific standards may apply to it.
+
+We are beginning with 11 statistical sub-categories, based on a practical
+taxonomy of R packages submitted to statistical journals and
+conferences. Full details of the categories and standards can be seen on
+the primary [“living
+book”](https://ropenscilabs.github.io/statistical-software-review-book/index.html)
+of the project, which describes the current categories of:
+
+1. Bayesian and Monte Carlo Routines
+2. Dimensionality Reduction, Clustering, and Unsupervised Learning
+3. Machine Learning
+4. Regression and Supervised Learning
+5. Probability Distributions
+6. Wrapper Packages
+7. Networks
+8. Exploratory Data Analysis (EDA) and Summary Statistics
+9. Workflow Support
+10. Spatial Analyses
+11. Time Series Analyses
+
+The tools described above aim to make the task of reviewing packages as
+easy as possible. The category-specific standards aim to ensure that
+software accepted as part of our system is of the highest possible
+quality. One of the primary tasks of reviewers will be to assess
+software against these standards.
+
+Currently, we have initial standards for [five of these
+categories](https://ropenscilabs.github.io/statistical-software-review-book/standards.html),
+and have released an initial call for “pilot submissions” within those
+categories to help us test and improve the standards and the process
+of peer review. We invite any developers reading this blog who might be
+interested in submitting a statistical software package for peer review
+to contact us (Mark Padgham, mark@ropensci.org, and/or Noam Ross,
+ross@ecohealthalliance.org) about a “pilot submission”. Your
+contribution would help improve the quality of our system, while our
+assessments and reviews would help improve the quality of your software.
+We look forward to any contributions to help improve our system for peer
+review of statistical software, and ultimately for helping to improve
+the quality of statistical software in R.
diff --git a/content/posts/news/2021-01-13-status-update-a-summary-of-2020.html b/content/posts/news/2021-01-13-status-update-a-summary-of-2020.html
deleted file mode 100644
index fc7fdea..0000000
--- a/content/posts/news/2021-01-13-status-update-a-summary-of-2020.html
+++ /dev/null
@@ -1,56 +0,0 @@
----
-title: 'Status Update: A summary of 2020'
-author: Executive Committee
-date: '2021-01-13'
-slug: status-update-a-summary-of-2020
-categories:
- - news
-tags:
- - riskmetric
- - risk assessment
- - white paper
-banner: 'img/risk_assessment_app/db_dashboard_downloads.png'
----
-
-
-
-
2020 was a busy year for the R Validation Hub. We released our white paper describing our current thinking on a risk based approach to using R for regulatory work. We started to support the implementation of the white paper with tools such as riskmetric and our risk assessment application. And we started a new sub-team with the aim of producing a follow-up white paper on testing. Throughout, we have continued to share and gain feedback on our proposed approach, presenting at User!; running a workshop at R/Pharma; and speaking at an EU Programming Heads meeting in June.
-
Following the release of the white paper, where did we get to by the end of 2020 and what are our plans for 2021?
-
-
The riskmetric Package
-
In 2020, we continued to develop the riskmetric package and began to align development with the Risk Assessment Shiny app (see below). At the time of writing the package includes 12 metric assessments.
-
We provided a workshop to guide users on how to implement a risk-based approach to qualify R packages at the R/Pharma conference. This provided the team with additional ideas for metrics. We are aiming to release the first version of riskmetric package to CRAN in early 2021.
-
-
-
Risk Assessment Application
-
At the end summer of 2020, Fission Labs finished building the first iteration of the application. Since then, the application has gone through a series of updates and upgrades:
-
-
Restructured the backend to address issues like code duplication and inefficient queries to the database
-
Added a database dashboard: it displays each package on the database including version, risk score, decision, and last comment. In addition, it allows the download of multiple report at once;
-
Enhanced the plot depicting the number of downloads per package: the plot is now a timeline-like plot that shows the number of downloads for different periods of time, e.g., last year and since last version.
-
-
-
-
-
-
-
The next major milestone for the application is adding the ability to change the weights for each metric. This will enable the user to disregard/emphasize metrics of interest and so portray a risk score akin to the user’s validation needs.
-
-
-
Testing
-
In addition to the plans for riskmetric and the Risk Assessment App, we recently launched a new subteam with the focus of producing a white paper on testing. And we will release an accompanying suite of tests, embededded within a re-usable qualification framework built upon testthat. And if you read the recent blog post from Mark Padgham you’ll note that we are maintaining close links with the ROpenSci testing initiative.
-
-
-
R Consortium Developments
-
The R Validation Hub is now an R Consortium working group. Since the start of 2020, two further working groups have been initiatied with the help of the R Consortium:
Submissions: Focus on IT and platform challenges that must be addressed in order to make “all R” regulatory submissions.
-
-
There is already a huge overlap in membership between the initiatives and in 2021 we will be looking to ensure that we continue to work closely together as we head towards the common goal of enabling the adoption of R within a biopharmaceutical regulatory setting.
-
-
-
Looking Ahead
-
By the end of 2021 we expect to have delivered a complete set of practical tools that can be used to assess R package risk and accuracy. From there we’ll be looking to refine and build on these foundations - more metrics, more tests. Away from these core contributions the group continues to work with partner efforts to faciliate R adoption within our industry.
-
If you are interested in getting involved in contributing to our technical efforts then please send an email to psi.aims.r.validation@gmail.com. Or join our mailing list to receive an invite to our quarterly meeting and receive notifications of new blog posts.
Additional submissions are ongoing, so stay tuned for updates. All public presentations will be made available on the presentations page.
-
-
Make sure to recommend our presentations and sessions to your colleagues and peers.
-It is an opportunity to learn more about the work by the R validation hub supporting the adoption of R within a biopharmaceutical regulatory setting. We are looking forward to the interaction and feedback from the community.
-
If you are interested in getting involved in contributing to our technical efforts then please send an email to psi.aims.r.validation@gmail.com. Or join our mailing list to receive an invite to our quarterly meeting and receive notifications of new blog posts.
diff --git a/content/posts/news/2021-03-08-status-update-conferences-2021.Rmd b/content/posts/news/2021-03-08-status-update-conferences-2021.md
similarity index 100%
rename from content/posts/news/2021-03-08-status-update-conferences-2021.Rmd
rename to content/posts/news/2021-03-08-status-update-conferences-2021.md
diff --git a/content/posts/news/2021-04-01-status-update-riskmetric-cran.html b/content/posts/news/2021-04-01-status-update-riskmetric-cran.html
deleted file mode 100644
index 7c7bd0f..0000000
--- a/content/posts/news/2021-04-01-status-update-riskmetric-cran.html
+++ /dev/null
@@ -1,47 +0,0 @@
----
-title: 'Status Update: CRAN Release of `riskmetric`'
-author: Juliane Manitz, Yilong Zhang and Doug Kelkhoff
-date: '2021-04-02'
-slug: status-update-riskmetric-cran
-categories:
- - news
-tags:
- - riskmetric
- - risk assessment
- - white paper
-banner: 'img/banners/news.png'
----
-
-
-
-
-
We have reached a major milestone. The R package riskmetric has been released and is now available on CRAN.
-
-
What is riskmetric?
-
riskmetric is a collection of risk metrics to evaluate the quality of R packages following the framework suggested by the R validation hub (see our white paper for details).
-Various quality metrics are provided which evaluate best practices of software development, code documentation, community engagement and development sustainability. This package serves as a starting point for exploring the heterogeneity of code quality, and begin a broader conversation about the validation of R packages.
-
-
-
How to use it?
-
We separate three steps in the workflow to assess the risk of an R package using riskmetric:
-
-
Finding a source for package information (installed package or CRAN/git source) pkg_ref()
-
Assessing the package under validation criteria pkg_assess()
-
Scoring assessment criteria pkg_score()
-
-
The results will be assembled in a dataset of validation criteria containing an overall risk score for each package
-
A detailed demo can be found in this blogpost from last year.
-
-
-
What comes next?
-
The development of riskmetric continues and it is a community project. Comfort with a quantification of risk comes via consensus, and for that this project is dependent on close community engagement. There are plenty of ways to help:
-
-
Share the package
-
File issues when you encounter bugs
-
Weigh in on proposed metrics, or suggest a new one
-
Help us devise the best way to summarize risk into a single score
-
Help us keep documentation up to date
-
Contribute code to tackle the metric backlog
-
-
If you are interested in getting involved in contributing to our technical efforts then please send an email to psi.aims.r.validation@gmail.com. Or join our mailing list to receive an invite to our quarterly meeting and receive notifications of new blog posts.
-
diff --git a/content/posts/news/2021-04-01-status-update-riskmetric-cran.Rmd b/content/posts/news/2021-04-01-status-update-riskmetric-cran.md
similarity index 100%
rename from content/posts/news/2021-04-01-status-update-riskmetric-cran.Rmd
rename to content/posts/news/2021-04-01-status-update-riskmetric-cran.md
diff --git a/content/posts/news/2021-05-20-status-update-transcelerate-msa-framework.html b/content/posts/news/2021-05-20-status-update-transcelerate-msa-framework.html
deleted file mode 100644
index 96d546a..0000000
--- a/content/posts/news/2021-05-20-status-update-transcelerate-msa-framework.html
+++ /dev/null
@@ -1,25 +0,0 @@
----
-title: 'Status Update: MSA Framework by TransCelerate '
-author: Juliane Manitz and Joe Rickert
-date: '2021-05-20'
-slug: status-update-transcelerate-msa-framework
-categories:
- - news
-tags:
- - risk assessment
- - white paper
-banner: 'img/banners/news.png'
----
-
-
-
-
-
TransCelerate has published “Modernization of statistical analytics (MSA) Framework”.
-With goals similar to the R Validation Hub, the TransCelerate MSA framework seeks to demonstrate software reliability by establishing principles of accuracy, traceability, and reproducibility for a modern analytical software environment.
-
The MSA framework is centered around risk-assessment and mitigation practices to demonstrate reliability of software.
-This framework suggests assessing the accuracy of a software library via a confidence measure built on risk metrics such as published source code, issue management, usage, maturity, etc. If confidence does not meet the highest standards, additional testing is recommended. In addition, the intended use of a particular software library and the impact to the broader business outcome determines the requirement for additional testing. Altogether, the MSA framework is in line with the suggestions published in the R Validation Hub 2020 white paper for R.
-While the TransCelerate authors suggest their principles apply to a broad range of software, e.g. SAS, R, Python, Julia, etc, they do not provide specific suggestions for the implementation of their framework. The R Validation Hub can support the implementation of MSA inspired features with the R package riskmetric and the respective shiny app.
-
Beyond accuracy assessment, the MSA framework emphasizes reproducibility and traceability requirements and offers a variety of arguments for implementing MSA including the the emergence of electronic and digital data sources, the quantity of data being collected, the desire to automate analyses, and the accelerated pace of innovation in analysis techniques.
-
The R Validation Hub project team welcomes the publication of the TransCelerate MSA framework, and due to the close alignment between the two frameworks, we consider our early vision “validated” ;)
-
-
If you are interested in getting involved in contributing to our technical efforts then please send an email to psi.aims.r.validation@gmail.com. Or join our mailing list to receive an invite to our quarterly meeting and receive notifications of new blog posts.
diff --git a/content/posts/news/2021-05-20-status-update-transcelerate-msa-framework.Rmd b/content/posts/news/2021-05-20-status-update-transcelerate-msa-framework.md
similarity index 100%
rename from content/posts/news/2021-05-20-status-update-transcelerate-msa-framework.Rmd
rename to content/posts/news/2021-05-20-status-update-transcelerate-msa-framework.md
diff --git a/content/posts/news/2021-10-19-status-update-R-pharma_2021.html b/content/posts/news/2021-10-19-status-update-R-pharma_2021.html
deleted file mode 100644
index 5077cee..0000000
--- a/content/posts/news/2021-10-19-status-update-R-pharma_2021.html
+++ /dev/null
@@ -1,36 +0,0 @@
----
-title: 'Status Update: R Validation Hub at R/Pharma 2021'
-author: Juliane Manitz
-date: '2021-10-19'
-slug: status-update-R-pharma-2021
-categories:
- - news
-banner: 'img/banners/news.png'
----
-
-
-
-
Our favorite meeting of the year is approaching: R/Pharma 2021 will be held virtually November 2-4th, 2021. Also watch out for workshops the week before.
-Since the R validation Hub is closely associated with R/Pharma, we would like to highlight some of the presentations inspired by the R validation Hub, partner initiatives, or generally related to the topic of validation.
-
Tuesday: November 2, 2021
-
-
11:00am “R Package Validation at Roche” by Coline Zeballos, Roche
-
11:20am “Statistical Analysis and Pathway to a Risk-based Assessment of R packages at Merck KGaA/EMD Serono” by Juliane Manitz, EMD Serono
-
1:50 PM “Panel Discussion – Validation” with representatives of the R validation Hub”
-
-
Wednesday: November 3, 2021
-
-
11.40am “R Consortium Pharma Working Groups Overview and Updates” by Ning Leng, Roche
-
10:10am “Submitting Data to CDER: What Comes Next?” Heather Crandall and Paul Schuette, FDA
-
11:00am “Using {valtools} for software validation separate from development” Marie Vendettouli, Fred Hutch
-
1:00pm “Reimagine the R package distribution system for reproducible research and submissions” by Nan Xiao, Merck
-
-
Thursday: November 4, 2021
-
-
12:40pm “Performing a risk assessment of R packages using the Risk Assessment Shiny Application” by Marly Gotti, Biogen
-
-
Friday: November 5, 2021
-
-
10:00-13:00am Workshop “R Package Validation Framework” hosted by Ellis Hughes (gsk) & Marie Vendettuoli (Fred Hutch)
-
-
We are looking forward to see you at R/Pharma, hear your feedback and have engaging discussions. If you are not registered yet, don’t wait to do so!
diff --git a/content/posts/news/2021-10-19-status-update-R-pharma_2021.Rmd b/content/posts/news/2021-10-19-status-update-R-pharma_2021.md
similarity index 100%
rename from content/posts/news/2021-10-19-status-update-R-pharma_2021.Rmd
rename to content/posts/news/2021-10-19-status-update-R-pharma_2021.md
diff --git a/content/posts/news/2021-12-10-trusted-resources.html b/content/posts/news/2021-12-10-trusted-resources.html
deleted file mode 100644
index 49794e6..0000000
--- a/content/posts/news/2021-12-10-trusted-resources.html
+++ /dev/null
@@ -1,23 +0,0 @@
----
-title: 'Some Considerations on Trusted Resources'
-author: Juliane Manitz, Yilong Zhang, and Andy Nicholls
-date: '2022-01-07'
-slug: trusted-resources
-categories:
- - news
-banner: 'img/banners/news.png'
----
-
-
-
-
-
There is a large variety of contributed R packages, which can be overwhelming when performing their accuracy assessment. These packages can be developed by anyone and may differ in accuracy. The white paper mentions the possibility to define “trusted resources” to simplify the assessment for some of the R packages.
-
The idea follows vendor assessments / audits to explore the internal validation practices of the vendor for proprietary software. For open-source software such audits are not logistically feasible. However, based on information available in the open-source domain, it may still be possible to perform a virtual audit of a vendor and their practices. In this context, we encourage the publication of software development life cycles (SDLC) documents, which support the process of risk assessment and provide evidence of software trustworthiness.
Some encouraging examples of contributed packages are:
-
-
The tidyverse, which is a commercially supported collection of contributed R packages. Rstudio has provided validation guidance for various packages including tidyverse, tidymodels, r-lib, and gt as well as shiny and rmarkdown. If an organization feels comfortable to list the Rstudio development team as a trusted resource, these SDLC documents could be used to qualify respective packages.
-
Another early lighthouse project example is stan, which is broadly used in the pharmaceutical industry. The development team has published their Software Development Lifecycle practices. The layout and content of this document very closely follows the original publication by the R consortium.
-
-
The members of the R validation executive committee like this trend and would be happy to see more of this. Package developers could consider providing information on SDLC or validation reports as vignettes. Recent package developments like valtools can support such efforts.
-
Note that the presence of an SDLC document alone is insufficient to provide documented evidence as to why a sponsor trusts the software vendor / creator. As with a physical audit, it remains the responsibility of the sponsor to establish whether such SDLC are actually being followed. Typically, the sponsor would create their own summary document highlighting why they consider the organisation a ‘trusted resource’. Such a document would likely need to explain why they consider the SDLC documentation sufficient, and what evidence has been collected to demonstrate the adherence to the SDLC.
It’s time to bring another update to you on the current status of the R validation hub:
-
-
The riskmetric R package has been stable on CRAN. Recent work has focused on structuring “cohort metrics” – metrics that are conditioned on the package library or execution environment available to R. In R, package behaviors are often dependent on the rest of the R installation, and this new feature will help to make metrics more inspectible and reproducible, as well as allowing us to ask new questions like, “What would be the effect of installing a new package into an R environment?”. Within riskmetric, cohorts will represent our way of capturing this information to help make the context of each metric more explicit. Eric Milliman has been leading this effort, representing the next major feature enhancement for the riskmetric package.
-
The Riskmetric App team have been working very hard to resolve all the major issues and add new features. We are happy to announce the news that they just published a beta version of the app and ask for feedback from the community. To access the app, you can go to https://rinpharma.shinyapps.io/risk_assessment/ and follow the instructions for login. Please provide your feedback in github. Next development goal is turning the app into an R package.
-
Regarding communications, we have had a busy year 2021 with various conference contributions that are available on the website presentation section. This year we will participate at useR! 2022, where our abstract was accepted. In the presentation, we will reflect on various examples of implementation of risk-based approaches to assess R package accuracy within a validated infrastructure.
-
In addition, we have initiated a presentation series on case studies where various companies share their experiences building a GxP framework with R highlighting aspects that were easy to implement which those which were more challenging. The first session was on April 26. You can find the recording on youtube (or see below). Feel free to actively pitch into the discussion on github. Additional contributions are welcome! Follow-up sessions are scheduled for May 24th and June 14th. You can sign up by joining our mailing list.
-
-
People & Future Directions: Keaven and Yilong will be stepping back from the R Validation Hub executive committee. We are so grateful for the support that they have provided in helping steer the group and get us to where we are today (where we’re seemingly inundated with companies wanting to share what they’re doing to adopt R). For the good news, we are happy to welcome on board Preetham who is the current lead for package qualification at Merck & Co. If you are interested in taking an active role in helping to determine the direction of the R Validation Hub, please feel free to reach out to us. Future work may include building a repository with R packages relevant for submissions to regulatory agencies, consolidate current state of the art on testing qualification and other topics. However, we do need willing volunteers to help drive these initiatives forward.
-
diff --git a/content/posts/news/2022-05-20-status-update-may-2022.Rmd b/content/posts/news/2022-05-20-status-update-may-2022.md
similarity index 100%
rename from content/posts/news/2022-05-20-status-update-may-2022.Rmd
rename to content/posts/news/2022-05-20-status-update-may-2022.md
diff --git a/content/posts/news/2023-03-30-case-studies-summary-march-2023.html b/content/posts/news/2023-03-30-case-studies-summary-march-2023.html
deleted file mode 100644
index 9fda118..0000000
--- a/content/posts/news/2023-03-30-case-studies-summary-march-2023.html
+++ /dev/null
@@ -1,19 +0,0 @@
----
-title: 'Summary of 2022 Case Studies'
-author: Juliane Manitz
-date: '2023-03-15'
-slug: case-studies-summary-march-2023
-categories:
- - news
-banner: 'img/banners/case_studies.png'
----
-
-
-
-
Last year, the R validation hub recently initiated a three-part presentation series on case studies in which eight pharmaceutical companies shared their experiences on building a GxP framework with R. These case studies highlighted both easy and challenging aspects of implementing risk assessment for R packages in a GxP environment. In this blog post we attempt to summarize common themes, difference in approaches, and challenges.
-
All implementations followed the risk validation process for R packages outlined in the white paper. There was a common theme of categorizing package quality into two or three risk categories. Test coverage was identified as a high-importance assessment metric, and the R Foundation was determined to be a trusted resource. Core R and recommended packages were treated as a collective of “low-risk” packages, with some organizations extending this to the tidyverse.
-
Some companies classified packages automatically as low risk, with little or no human intervention. For higher risk packages, there were typically pathways for additional human assessment. Different weights were assigned to testing coverage and various suggested maintenance metrics, with an acceptable threshold for test coverage ranging between 50-80% for low-risk packages. Different risk remediation strategies were applied, with some organizations introducing their own unit tests while others restricted package use to only the tested subset of package functionality.
-
However, implementing R package assessment is a resource-intense activity, and time has proven to be a considerable challenge. Ensuring R package reviewers have the right technical expertise and aligning different contributors across the organization (IT, Quality Assurance, Statistics, Data Science, or Programming) can also be difficult. Finding appropriate test datasets, test cases, and expected model output and managing the long-term maintenance and oversight of the risk-based package assessment process can also be challenging.
-
The recordings of these sessions are available on the R Validation minutes page, and we encourage to continue the exchange and discussion on GitHub, where everyone is welcome to contribute and learn from others. Additional contributions of case studies are welcome! We would like to send out a big thank all to all contributors. The case studies presented valuable insights into building a GxP framework with R, and the R validation hub aims to continue supporting the implementation of risk assessment for R packages in a GxP environment.
-
The learnings and reflections have been published in the ASA Biopharmaceutical Report, Fall 2022. Furthermore, we are honored to present at the R Adoption Series on March 30, 2023 at 8 AM PST / 11 AM EST. Please join us for the presentation. We are planning breakout rooms to engage in a conversation with the community on the challenges that have been identified.
diff --git a/content/posts/news/2023-03-30-case-studies-summary-march-2023.Rmd b/content/posts/news/2023-03-30-case-studies-summary-march-2023.md
similarity index 100%
rename from content/posts/news/2023-03-30-case-studies-summary-march-2023.Rmd
rename to content/posts/news/2023-03-30-case-studies-summary-march-2023.md
diff --git a/content/posts/news/2023-06-02-shiny_conf_award.Rmd b/content/posts/news/2023-06-02-shiny-conf-award.md
similarity index 100%
rename from content/posts/news/2023-06-02-shiny_conf_award.Rmd
rename to content/posts/news/2023-06-02-shiny-conf-award.md
diff --git a/content/posts/news/2023-06-02-shiny_conf_award.html b/content/posts/news/2023-06-02-shiny_conf_award.html
deleted file mode 100644
index d661505..0000000
--- a/content/posts/news/2023-06-02-shiny_conf_award.html
+++ /dev/null
@@ -1,30 +0,0 @@
----
-title: '{riskassessment} App voted best Shiny app at shinyConf 2023! 🎉'
-author: Juliane Manitz
-date: '2023-06-02'
-slug: shiny_conf_award
-categories:
- - news
-tags:
- - risk assessment
-banner: 'img/banners/shinyconf_win.png'
----
-
-
-
-
The {riskassessment} app, presented by Aaron Clark from the R Validation Hub Executive Committee, was voted best Shiny app at shinyConf 2023. The 2nd Annual Shiny Conference was held in March 2023. It was all virtual with over 4k global registrants. Congratulations!!
-
The app provides a Shiny front end to augment the utility of the {riskmetric} package, offering user-friendly, interactive access to risk assessment of R packages. The app's functionality includes:
-
-
Analyze risk metrics for each R package (and version) without the need to write code in R
-
Encourage an open-source mentality through the contribution of user/reviewer comments
-
Save overall assessment decision for each package (categories: low, medium, or high risk)
-
Download reports with risk metric outputs, reviewer comments, and more
-
Administration: all assessments are stored in a database, and admins can manage user roles and metric weighting
-
-
Do you want to know more? Here are some additional resources:
If you are interested in getting involved in contributing to our technical efforts, please send an email to psi.aims.r.validation@gmail.com, or join our mailing list to receive an invite to our community meetings and notifications of new blog posts.
Welcome! It's with great excitement and long-awaited anticipation that I get to share some recent updates that hit the {riskassessment} app's GitHub repository earlier this month. If this is the first time you've heard of or seen the application, I'd recommend starting with our README to gain some familiarity with the project, especially the installation instructions. However, in a nutshell, the app is a full-fledged R package that seeks to augment the utility of the {riskmetric} package within an organizational context.
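For orientation, here is a minimal getting-started sketch, assuming the GitHub repository location and the `run_app()` entry point described in the README; the README remains the authoritative source for current installation instructions.

```r
# Minimal getting-started sketch (see the project README for the
# authoritative, up-to-date installation instructions).
# install.packages("remotes")
remotes::install_github("pharmaR/riskassessment")

# Launch the app locally
riskassessment::run_app()
```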
-
-
-
-
-
-
Latest Features Recap
-
Most notably, the application has received the following improvements, in order from least exciting to most exciting:
-
-
Face lift to functionality & aesthetics of the 'Report Builder' & 'Database View'
-
Enhanced support to analyze dependencies
-
More org-level customization, including the use of a configuration file
-
Admins can now edit roles and privileges
-
All users can explore a package's source contents
-
-
The feedback loop is crucial! All of these improvements started off as community-driven suggestions on our GitHub repo. If you have an idea that isn't already on the list of issues, submit a new issue today and it may become a reality tomorrow.
-
-
-
-
'Report Builder' Face Lift
-
This new release introduced a more holistic Report Builder, allowing users to define what content shows up in the report. In addition, users can now compose a long-form "Package Summary" to keep track of more pertinent items (perhaps non-{riskmetric} items) for a more rounded package review.
-
In the example below, see how users can quickly include/exclude specific elements in the summary report, plus edit the package summary in a real-time report preview. When satisfied, the user can download the report to HTML, DOCX, or PDF:
-
-
-
-
-
-
'Database View' Face Lift
-
More useful content to ingest!
-
-
A summary of uploaded packages
-
The package upload date
-
Decision-related columns including a time stamp & decision source
-
Quick & easy download options to Excel or CSV
-
-
-
-
-
-
Package Dependencies
-
Though {riskmetric} has supported a dependencies assessment for some time, the application didn't do a great job of displaying this data until this release. Now, the user can see each package dependency, which may or may not have a lower bound on its version, plus the dependency type (Imports, Depends, LinkingTo, or Suggests). If the package dependency already exists in the {riskassessment} database, then its risk score is displayed. When the package hasn't been uploaded yet, there are convenient buttons to help the user evaluate those packages.
-
-
-
-
-
-
Enhanced organization-level settings
-
The following features give admins critical control over how prospective users within an organization perform their risk assessments. Most, if not all, can be edited in-app or via the config file! So, what's new? You can now…
-
-
Add/edit user roles/privileges
-
Customize decision categories & colors
-
Toggle decision automation rules
-
Initialize metric weights
-
-
Below is an example configuration file that demonstrates how to toggle these settings. First, config authors can add as many roles as they want under credentials. From there, they can use the privileges tag to populate that role with unique privileges in the app. For details on each privilege, please see our documentation site. These roles can then be assigned to specific users to manage who is involved with different parts of the review process.
-
Next, decisions is where users can define custom package categories in the application that correspond to their organization's validation process. From there, you can automate decisions based on risk scores or even assign a custom color to each decision category.
-
By default, all metric weights (which ultimately determine package risk scores) are set to 1, unless you override them here or in the app's UI. This is a convenient way to incorporate your organization's priorities into the validation process for all users of the application. Use a zero (0) to remove a metric entirely.
Building on the configuration file, admin users can also manage who's involved in the review process on the fly in the app. This is helpful because the application doesn't have to be re-deployed every time a new role needs to be created. User authentication & role management is a major cornerstone of this application, which helps organizations adhere to their unique validation strategies.
-
-
-
-
-
-
-
Explore Package Source Code
-
Finally, if there were one major change to announce at this release that creates unique value for app users, it would be this: the app now offers the ability to browse package source code! This is huge news for those orgs with a more manual package review process. {riskmetric} metrics are great because they can serve up isolated assessments from the source code (and beyond), but sometimes you just need to dig deeper into a package's contents, and that's what this new module does.
-
Below, you can see the 'FILE BROWSER', which gives us a directory tree on the left-hand side and a file preview on the right. Currently, it's displaying the DESCRIPTION file for the {tidyCDISC} package. There you can explore all the things you may care about, such as the license, maintainer, dependencies, etc. In addition, you can navigate to a specific function's script in the R/ folder to review methodology. Even more importantly, you can browse through the author's tests to evaluate testing coverage and robustness.
-
-
-
-
-
Thanks and call to action
-
That’s all for now. Thanks for reviewing the latest release details and we hope you find them useful.
-
Interested in supporting package development? We could always use extra help / feedback! Please consider one of the following options:
Fill out our survey so we can learn how you use {riskmetric} and {riskassessment}
-
-
-
-
diff --git a/content/posts/news/2023-08-16-riskassessment-new-release-updates.Rmd b/content/posts/news/2023-08-16-riskassessment-new-release-updates.md
similarity index 100%
rename from content/posts/news/2023-08-16-riskassessment-new-release-updates.Rmd
rename to content/posts/news/2023-08-16-riskassessment-new-release-updates.md
diff --git a/content/posts/news/2023-09-05-status-update.html b/content/posts/news/2023-09-05-status-update.html
deleted file mode 100644
index f329415..0000000
--- a/content/posts/news/2023-09-05-status-update.html
+++ /dev/null
@@ -1,86 +0,0 @@
----
-title: Updates to the R Validation Hub Executive Team
-author: Doug Kelkhoff
-date: '2023-09-05'
-slug: updates-sept-2023
-categories:
- - news
-banner: 'img/banners/executive_team_updates.png'
----
-
-
-
-
As the R Validation Hub closes in on its 5th year of activity I want to take
-a quick trip down memory lane and reflect on how we got here. Perhaps I should
-start by reminding everyone how it all started. Back in 2018, as the prospect
-of using R for any regulated analysis was still a hotly contested question,
-industry participants brought our donated space at Harvard to capacity as
-attendees gathered for the first ever R/Pharma conference. Amidst a slew of
-impressive shiny applications (I vividly remember shiny being the decided
-tone-setting theme that year), one Andy Nicholls offered up a deceptively
-simple question: what should validated use of R look like?
-
The question certainly didn't get a firm answer at the conference, but it
-ignited a level of enthusiasm that spurred a Slack channel and a series of post-
-conference meetings to start hammering away at this problem. A year later,
-the team had produced the first version of our white paper outlining a viable
-path forward. Within two years we had the basics of
-riskmetric – a tool for evaluating
-the various criteria outlined in the white paper. Participation grew, and an
-enthusiastic community built a web app around
-riskmetric to make it more
-accessible, giving way to the
-riskassessment app. Today,
-these tools are in use or inspiring the process across the industry, as has
-been evident in our case studies over the past year.
-
-
A Changing of the Guard
-
After an impressive track record of delivering and improving the adoption of
-open-source tools across the industry, Andy Nicholls has decided to step down
-from his position as Lead. We want to send a heartfelt “Thank You” to Andy for
-his leadership and enthusiasm that brought the R Validation Hub to this point.
-Now we get to embark on our next challenge: sustaining momentum through
-personnel turnover, an inevitable and critical aspect of any organization.
-We welcome Doug Kelkhoff, who will be taking over to lead the team.
-
-
-
New Faces at the R Validation Hub
-
While undergoing a change in leads, we wanted to take this transition period to
-reflect on where we’re at and where we want to improve. We held an internal
-survey to try to collectively decide on our next steps. We felt that a
-consistent strength has been our tools and technical support, but felt that
-there was still room for improvement in communications and long-term planning.
-It’s with those changes in mind that I want to announce two new initiatives:
-
-
Communications Workstream
-
We’re thrilled to announce that we’ve welcomed three new contributors to
-launch our new communications workstream. This team will focus on how we build
-connections across the R world, specifically our neighboring initiatives: the R
-Consortium, PhUSE, PSI AIMS, the R Submissions Working Group and ROpenSci.
-We’re taking a new, critical look at how we share information and how we can
-be more intentional with how we organize ourselves. You can look forward to
-a new level of polish and consistency to the news you receive.
-
-
-
Strategic Workstream
-
To level our structure a bit, we’re also hoping to rebrand the “Executive
-Committee” as a strategic workstream, whose responsibilities will be to align
-on longer-term strategic goals. Where the executive team has historically
-been a mostly static set of contributors, we hope for it to become a floating
-set of representatives from each of our workstreams. With this in mind,
-we hope that the core roles become more fluid, allowing any contributors
-to rotate in as their work requires broader discussion.
-
As well, we plan to leverage the R Consortium’s
-Pharma Oversight Committee
-to ensure we’re aligned with the broader industry goals while also giving
-early visibility to what we’re planning. If you would like to see your company
-weighing in on the strategic direction of the R Validation Hub, the best way to
-do so would be to sponsor the R Consortium and in doing so join the oversight
-committee. In addition to sponsoring enterprises, we also plan to extend
-invitations to less represented groups within the pharmaceutical space such as
-small contract research organizations (CROs), academia and research hospitals, as
-well as organizations outside of the pharmaceutical space entirely as the needs
-arise. If you would like to see such a group represented, please reach out so
-we can make sure those voices are heard.
-
For all open opportunities, please see our contributing page.
Progress continues on {riskassessment}. We wanted to share some of the enhancements and updates included in the most recent release of the application. There have been multiple releases, both minor and major, since our last post, so we have a lot of new content to cover!
-
But before we get into too much detail, if you are new to what we are doing in {riskassessment}, we would like to encourage you to check out our README. There you can find information regarding what {riskassessment} is seeking to accomplish and how you can install and deploy an instance of it for your personal or organizational use.
-
We are also excited to announce a collaborative deployment of the application! Our friends at ProCogia are hosting an instance of the app with persistent storage. We encourage you to test it out.
-
-
-
-
-
-
Latest Features Recap
-
Lots of enhancements have been integrated since our last post on v2.0.0, so we will take a bird’s eye view of the following major changes:
-
-
Expanded decision automation to include individual {riskmetric} assessment values
-
New 'Function Explorer' module and faster exploration of source code
-
An expanded view of a package’s dependencies
-
Miscellaneous items
-
-
'About' tab
-
non-shinymanager deployment
-
-
-
-
As always, we invite all who have suggestions or contributions to visit our Github repository. If you come across a bug or have a feature request, please visit our open list of issues and add a comment or open a new issue if it is a new bug/feature.
-
-
Decision Automation Rules
-
Previously, the application allowed rules to be created based upon the overall risk score of the package. This feature has now been expanded to include individual {riskmetric} assessments. The process follows these basic steps:
-
-
Choose a {riskmetric} assessment for which you would like to create a rule.
-
Construct a function taking the assessment value as an argument and return a logical (TRUE/FALSE).
-
Choose the decision category to assign if the test returns TRUE.
-
-
When constructing rules, it is important to know the structure of the assessment. All {riskmetric} assessments can be found in the assessment function catalog on their website. Additionally, all helper functions from {riskmetric} are available for use when constructing rules, allowing a lot of flexibility in how the assessment can be parsed.
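To make the pattern concrete, here is a hypothetical rule function of the kind described above. The `totalcoverage` element is an assumption based on {riskmetric}'s covr-based coverage assessment; consult the assessment function catalog for the authoritative structure before relying on it.

```r
# Hypothetical decision-automation rule: takes a {riskmetric}
# assessment value and returns TRUE/FALSE. Here we assume the unit
# test coverage assessment exposes a `totalcoverage` percentage.
flag_low_coverage <- function(assessment) {
  isTRUE(assessment$totalcoverage < 50)
}

# In the app, a rule like this would be paired with a decision
# category (e.g. "High Risk") to assign whenever it returns TRUE.
```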
-
The rules can easily be re-ordered and will be read in a linear fashion. If no rule results in a decision being assigned to the package, the ELSE option can be used to assign one; it defaults to "No Decision".
-
-
-
-
New 'Function Explorer' Module
-
Our friends at GSK have contributed a major addition that warranted a brand new page under the 'Source Explorer' tab. Previously, a user could browse all the source code using the 'File Explorer'. Now users can investigate the details of individual functions exported from the package. More specifically, users have curated access to the source code, help documentation, and test files that reference the function of interest. This page extends the value of the application for organizations with a hands-on review process, giving users the ability to explore the package contents at a more granular level.
-
Below, you will see a selection inputs panel on the left and a viewer pane on the right. To get started, simply choose a function name and the file type you want to explore further. As previously mentioned, the file types include the following:
-
-
Testing Files: All discovered test files in the tests/ folder containing the function will be displayed.
-
R Source Code: All R files in the R/ folder containing the function will be displayed. This includes all usages of the function, not just where it was defined.
-
Help Documentation: The .Rd file containing the function will be rendered in HTML.
-
-
Some additional features:
-
-
If multiple lines containing the function call are found, navigation arrows will appear below the viewer pane, enabling the user to quickly jump to each instance.
-
User comment sections are available for both the Function Explorer and the Package Browser.
-
The {archive} package is used to read these files into memory, which greatly increases the speed of the 'Source Explorer' tab overall (see the sketch below).
-
-
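As a rough illustration of what {archive} makes possible (this is not the app's internal code, and the tarball path and file name below are hypothetical), a single file can be read straight out of a package source archive without unpacking it:

```r
# Illustrative sketch: read one file out of a package source tarball
# without unpacking it, using {archive}. Paths are hypothetical.
library(archive)

con  <- archive_read("tidyCDISC_0.1.0.tar.gz", file = "tidyCDISC/DESCRIPTION")
desc <- readLines(con)
close(con)

head(desc)
```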
-
-
-
Expanded Dependency Exploration
-
Summary cards are now displayed to provide a macro view of the package’s dependencies!
-
-
Dependencies Uploaded: the number of package dependencies that have been uploaded to the app
-
Type Summary: a breakdown of the dependency types (Depends, Imports, and LinkingTo)
-
Decision summary: a breakdown of the decisions applied to the dependency packages
-
Base-R packages: a summary of which dependencies are considered part of base R (e.g. {utils}, {stats}, {methods})
-
-
Optionally, the non-required packages from the 'Suggests' field can also be included to give a more holistic view of a package's dependency structure. This can be helpful for organizations that require suggested packages to also be qualified in their GxP environment.
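For readers who want to reproduce a similar breakdown outside the app, base R can enumerate the same dependency types. This is purely illustrative and is not how {riskassessment} builds its view:

```r
# Illustrative only: enumerate a package's dependency types with
# base R. Requires internet access to query a CRAN mirror.
db <- available.packages()

deps <- tools::package_dependencies(
  "dplyr",
  db = db,
  which = c("Depends", "Imports", "LinkingTo", "Suggests")
)
deps$dplyr

# Setting `reverse = TRUE` instead returns reverse dependencies,
# similar in spirit to the new reverse-dependency table in the app.
```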
-
-
-
-
A new table has been added, which is populated with the reverse dependency packages that were previously uploaded to the application. Now, in the event that a package is being re-assessed/re-evaluated, a user can quickly look at the downstream packages affected by a change in classification.
-
-
-
-
Miscellaneous Items
-
Expanded 'About' Tab
-
-
Contact Us: a page that directs users to report issues, ask questions, reach out to contribute, and learn more about the R Validation Hub
-
Contributors and Companies: get to know contributors, past and present
-
-
Non-shinymanager Deployment
-
If deploying the application to an environment like Posit Connect, the usage of shinymanager can be turned off. The user will then be determined by session$user and the role by session$groups. This allows organizations to manage application privileges using the roles assigned to the user(s) on Connect.
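As a minimal sketch (not the app's internals), this is roughly what Shiny exposes on Posit Connect and what the app can map onto its own roles:

```r
# Minimal sketch: on Posit Connect, Shiny exposes the authenticated
# user and their group memberships on the session object.
server <- function(input, output, session) {
  current_user   <- session$user    # e.g. "jdoe"
  current_groups <- session$groups  # e.g. c("admin", "reviewer")
  # ...map `current_groups` to in-app privileges here...
}
```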
-
In Closing
-
Thank you for checking out our newest release! We hope you are as excited about the development as we are.
-
Interested in supporting package development? We could always use extra help / feedback! Please consider one of the following options:
At a recent ShinyGatherings event, some {riskassessment} contributors presented a workshop that focused on tailoring the application to org-specific risk requirements to simplify R package validation. Though we discussed the end-user experience, this workshop was really geared towards equipping "app deployers" to get the highest and best use out of the application for their organization's specific needs. We appreciate Appsilon for the opportunity to speak about the application!
-
-
-
-
-
About the Workshop
-
The workshop’s main topics:
-
-
A brief recap on what the {riskassessment} app does. Spoiler alert: its goal is to simplify the risk assessment process with a structured and user-friendly approach that avoids the need to author custom R code.
-
Why (and when) organizations should use the app with an emphasis on how it can alleviate the burden of making informed decisions about packages
-
How the app guides end-users to follow org-specific workflows, whether that be logging, reviewing package metrics, verifying the risk of dependencies, or justifying GxP package inclusion requests using comprehensive downloadable reports.
-
How “app deployers” can set custom org-specific rules to automate package risk assessment using the desired risk metric criteria.
-
A thorough look at the app’s role-based access control and how it plays well with Posit Connect authenticated users.
-
Common challenges & caveats we recognize related to the app
Joint Statistical Meetings: Topic-contributed session “Tools to enable the use of R by the bio-pharmaceutical industry in a regulatory setting”, organizer/chair: Juliane Manitz
-
-
Paulo Bargo (Abstract #317506): A Risk-based approach for assessing R package accuracy within a validated infrastructure
-
Doug Kelkhoff (Abstract #317498): A workflow to evaluate the quality of R packages using the R package riskmetric
-
Marly Gotti (Abstract #317411): A case study: performing a risk assessment on the tidyverse package using the Risk Assessment Shiny Application
diff --git a/content/present.Rmd b/content/present.md
similarity index 100%
rename from content/present.Rmd
rename to content/present.md
diff --git a/content/regulations.html b/content/regulations.html
deleted file mode 100644
index 0d793c8..0000000
--- a/content/regulations.html
+++ /dev/null
@@ -1,52 +0,0 @@
----
-title: "Regulations"
----
-
-
-
-
-
Regulations Overview
-
Key guidance documents are provided by different regulatory entities, which oversee activities within their respective domains:
-
-
International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH)
-
The United States Food and Drug Administration (FDA)
-
European Medicines Agency (EMA)
-
Japanese Pharmaceuticals and Medical Devices Agency (PMDA)
-
-
-
-
-
ICH E9 Statistical Principles for Clinical Trials
-
At the international level, the ICH provides regulatory guidance for the pharmaceutical and medical devices industry. Regarding "Integrity of Data and Computer Software Validity", the ICH E9 guidance on Statistical Principles for Clinical Trials states:
-
-
The computer software used for data management and statistical analysis should be reliable, and documentation of appropriate software testing procedures should be available.
“[Electronic records and signatures] be trustworthy, reliable, and generally equivalent to paper records and handwritten signatures executed on paper.”
In 2015, the FDA released a Statistical Software Clarifying Statement. This document states that they do not require the use of any specific software for statistical analyses. However, the FDA requests that software package(s) be documented upon submission. This documentation must include version and build identification.
“Sponsors should provide the software programs used to create all ADaM datasets and generate tables and figures associated with primary and secondary efficacy analyses. Furthermore, sponsors should submit software programs used to generate additional information included in Section 14 CLINICAL STUDIES of the Prescribing Information (PI)26 if applicable. The specific software utilized should be specified in the ADRG. The main purpose of requesting the submission of these programs is to understand the process by which the variables for the respective analyses were created and to confirm the analysis algorithms. Sponsors should submit software programs in ASCII text format; however, executable file extensions should not be used.
-
-
In conclusion, 21 CFR Part 11 is not relevant or mandatory in the context of statistical analysis software itself. However, when using R as part of a validated system, elements of 21 CFR Part 11 do apply.
The main notice addresses the subject of qualification documentation. A definition of “adequate” is provided within the Q&A.
-
-
“The sponsor may rely on qualification documentation provided by the vendor, if the qualification activities performed by the vendor have been assessed as adequate. However, the sponsor may also have to perform additional qualification (and validation) activities based on a documented risk assessment.”
Every regulatory submission requires evidence that the software used meets
-quality standards appropriate for the type of work. The R Validation Hub
-proposed standards for this process in our 2020 White Paper.
-
Now we’re translating those standards into an open-source repository of
-risk assessments, available to all. We hope to empower validation that is
-consistent, reproducible and transparent, lowering the barrier
-to entry for start-ups, promoting consistency among contract research
-organizations (CROs), and improving the speed of regulatory review for
-established enterprises.
-
-
-
-
Intimidated by the climb to validation?
-Don’t forge your own path. Let us give
-you a lift!
-
-
-
-
-
-
Learn More
-
-
Want to get involved, or know someone who can help? Contact us.
-
-
-
-
-
At a Glance
-
Read more about our ambitions, expected timelines and how we plan to get
-there. Learn why businesses large and small are excited to see us moving
-forward.
Five years ago we drafted the guidelines. Today they’re a de facto industry
-standard. Now we’re building a repository of pre-calculated metrics to make
-the guidelines easier to apply.
The project at a glance, designed to get you up to speed fast. Start here
-if you are an interested party looking for ways to contribute as a volunteer or
-advocate.
The R Validation Hub develops & maintains useful tools that
-make the risk assessment approach proposed in our published white paper much easier
-to adopt for R packages.
-
-
Open-source tools
-
Though these tools likely won’t encapsulate every aspect of your
-organization’s end-to-end validation pipeline, we are constantly seeking
-to fill known gaps in the process. Thanks to partnerships with a slew
-of pharma organizations, these tools were designed to leverage industry consensus and
-provide flexibility for customization when needed. We’re proud that both
-{riskmetric} and {riskassessment} have claimed
-membership in the {pharmaverse}
-suite of packages.
{riskmetric} is a framework to quantify an R
-package’s “risk” by assessing several meaningful metrics designed to
-evaluate package development best practices, code documentation,
-community engagement, and development sustainability. Users can embrace an
-overall assessment of the package or rely solely on hand-picked
-metrics.
-
{riskassessment} is a full-fledged R package
-delivering a shiny front-end to augment the utility & adoption of
-{riskmetric}. The application’s goal is to provide a
-central hub for an organization to review and assess the risk of R
-packages, providing handy tools and guide rails along the way.
-Specifically useful features include the ability to manage reviewer
-privileges, explore package source files hands-on, automate decisions
-based on pre-set rules, and generate a handy summary report to
-share.
-
{riskscore} (hex logo not yet designed) is an
-experimental GitHub-only “data package” containing risk assessments
-& scores for every R package on CRAN or Bioconductor. This package
-exists to establish an easily retrievable trend of risk over time,
-useful for both riskmetric and riskassessment
-development workflows.
-
-
-
-
{riskmetric}
-
Contributed R packages are developed by anyone & everyone, and
-may differ in popularity and accuracy. As such, the R Validation Hub
-developed an R package titled riskmetric
-whose goal is to assess the risk of contributed R packages.
-
{riskmetric} has four groups of metric criteria:
-
-
Unit testing metrics - includes unit test coverage and composite
-coverage of dependencies
-
Documentation metrics - availability of vignettes, news tracking,
-example(s) and return object description for exported functions
-
Community engagement - number of downloads, availability of the code
-in a public repository, formal bug tracking and user interaction
-
Maintainability and reuse - number of active contributors, author /
-maintainer contacts, and type of license
-
-
Note: Even though the quality of software is sometimes measurable,
-sometimes it is not. For example, assessing the
-accuracy of a contributed open-source R package should
-be done outside of {riskmetric}. The term accuracy
-refers to the risk of an error in the code that, when used, could lead
-to an incorrect calculation. This incorrect calculation may lead to an
-incorrect decision during data analysis. The relative impact of an error
-should be determined by the individual organisation. Thus, impact is not
-a part of the risk assessment performed by
-{riskmetric}.
-
For a comprehensive list of metrics assessed via
-{riskmetric}, see the current state of our package reference
-guide or browse the Metric
-Development Progress GitHub project.
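As a quick illustration of the package's core workflow (function names as documented by {riskmetric}; the printed output is omitted here, and the native pipe requires R 4.1 or later):

```r
# Core {riskmetric} workflow: reference a package, run the metric
# assessments, then summarize them into scores between 0 and 1.
library(riskmetric)

pkg_ref("riskmetric") |>  # reference a package by name
  pkg_assess() |>         # run the individual metric assessments
  pkg_score()             # convert assessments into numeric scores
```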
-
-
Interested in supporting package development?
-
-
Contribute your implementation of a new or previously posed metric.
-For information about extending the functionality of
-riskmetric with your own metrics, see Extending
-riskmetric.
Fill out our survey so we
-can learn how you use {riskmetric} and
-{riskassessment}
-
-
-
-
-
-
{riskassessment}
-
The app’s main goal is to help those making “package
-inclusion” requests for validated GxP environments. So, the highest and
-best of {riskassessment} revolves
-around two things:
-
-
Empower members of your organization to embrace their
-responsibility to assess package risk themselves, prior to making
-uninformed IT requests like: “please add package xyz to our validated
-environment”.
-
-Establish guide rails that adapt to your organization's validation
-strategy and use of {riskmetric} which culminates in a
-report for IT that summarizes each package’s adherence to those
-inclusion requirements.
-
-
The {riskassessment} app achieves that main goal with
-the following handy offerings:
-
-
Provides a platform for package exploration without the need to
-write any custom {riskmetric} code
-
Runs {riskmetric} on the same machine with the same
-environment – creating a central hub for reproducibility
-
Maintains consistent, org-specific settings/options when producing
-risk outputs
-
Automates a risk-based “decision triage” based on an org-defined set
-of rules, saving time & effort
-
Manages who’s involved in the review process via user authentication
-& role management
-
Facilitates and stores user-written summaries & communication
-on certain packages and/or certain metrics
-
Generates risk summary reports, for sharing with the decision making
-parties
-
-
Below is a screenshot from the application’s current demo app, hosted on shinyapps.io.
-Feel free to give it a test ride and
fill out our survey so we
-can learn how you use {riskmetric} and
-{riskassessment}
-
-
-
-
-
-
{riskscore}
-
A data package for cataloging riskmetric results across public
-repositories. WARNING: Right now, the {riskscore} package
-is in a PoC stage that is not fully operational. With that said, there
-are several use cases that make the concept of {riskscore} valuable,
-including (but not limited to) the following: it …
-
-
Guides more effective discussion around how to summarize risk
-
Helps communicate changes to {riskmetric}’s summarizing
-algorithm or interpretations of assessment data
-
Aids the {riskmetric} dev team in identifying “edge
-cases” for analysis and code refinement.
-
Provides a channel to distribute handy tools for building
-{riskmetric} result data (i.e., mimicking how our process for
-external packages could serve as a useful template when comparing to
-internal or private repos).
-
Allows everyone to report risk scores in terms of a “CRAN
-percentile” instead of just some arbitrary numeric value.
-
Establishes a central repository for package scores, which can be
-used for many applications, like generating badge scores or trending in
-a package’s score over time to measure performance.
-
-
-With this type of data at your fingertips, you can analyze package
-risk statistics with plots like the following (below), which allocates
-packages into different subgroups based on developers' membership in the
-tidyverse / pharmaverse and groups defined by “most downloads”.
The app’s main goal is to help those making “package
-inclusion” requests for validated GxP environments. So, the highest and
-best of {riskassessment} revolves
-around two things:
-
-
Empower members of your organization to embrace their
-responsibility to assess package risk themselves, prior to making
-uninformed IT requests like: “please add package xyz to our validated
-environment”.
-
-Establish guide rails that adapt to your organization's validation
-strategy and use of {riskmetric} which culminates in a
-report for IT that summarizes each package’s adherence to those
-inclusion requirements.
-
-
The {riskassessment} app achieves that main goal with
-the following handy offerings:
-
-
Provides a platform for package exploration without the need to
-write any custom {riskmetric} code
-
Runs {riskmetric} on the same machine with the same
-environment – creating a central hub for reproducibility
-
Maintains consistent, org-specific settings/options when producing
-risk outputs
-
Automates a risk-based “decision triage” based on an org-defined set
-of rules, saving time & effort
-
Manages who’s involved in the review process via user authentication
-& role management
-
Facilitates and stores user-written summaries & communication
-on certain packages and/or certain metrics
-
Generates risk summary reports, for sharing with the decision making
-parties
-
-
Below is a screenshot from the application’s current demo app, hosted on shinyapps.io.
-Feel free to give it a test ride!
-
-
-
-
Are you interested in supporting package development?
-
We could always use extra help / feedback! Please consider one of the
-following options:
Fill out our survey so we
-can learn how you use {riskmetric} and
-{riskassessment}
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
diff --git a/content/riskassessment.Rmd b/content/riskassessment.md
similarity index 97%
rename from content/riskassessment.Rmd
rename to content/riskassessment.md
index ce4395b..4db5958 100644
--- a/content/riskassessment.Rmd
+++ b/content/riskassessment.md
@@ -4,7 +4,7 @@ title: "The {riskassessment} App"
## About the App
-{width="130"}
+
The app's **main goal** is to help those making "package inclusion" requests for validated GxP environments. So, the highest and best of [`{riskassessment}`](https://bit.ly/raa_gh) revolves around two things:
diff --git a/content/riskmetric.html b/content/riskmetric.html
deleted file mode 100644
index 546a507..0000000
--- a/content/riskmetric.html
+++ /dev/null
@@ -1,32 +0,0 @@
----
-title: "The {riskmetric} Package"
----
-
-
-
-
-
About the Package
-
-
Contributed R packages are developed by anyone & everyone, and may differ in popularity and accuracy. As such, the R Validation Hub developed an R package titled riskmetric whose goal is to assess the risk of contributed R packages.
-
{riskmetric} has four groups of metric criteria:
-
-
Unit testing metrics - includes unit test coverage and composite coverage of dependencies
-
Documentation metrics - availability of vignettes, news tracking, example(s) and return object description for exported functions
-
Community engagement - number of downloads, availability of the code in a public repository, formal bug tracking and user interaction
-
Maintainability and reuse - number of active contributors, author / maintainer contacts, and type of license
-
-
Note: Even though the quality of software is sometimes measurable, sometimes it is not. For example, assessing the accuracy of a contributed open-source R package should be done outside of {riskmetric}. The term accuracy refers to the risk of an error in the code that, when used, could lead to an incorrect calculation. This incorrect calculation may lead to an incorrect decision during data analysis. The relative impact of an error should be determined by the individual organisation. Thus, impact is not a part of the risk assessment performed by {riskmetric}.
-
With this type of data at your fingertips, you can analyze package risk statistics with plots like the following (below), which allocates packages into different subgroups based on developers' membership in the tidyverse / pharmaverse and groups defined by "most downloads".
-
-
For a comprehensive list of metrics assessed via {riskmetric}, see the current state of our package reference guide or browse the Metric Development Progress GitHub project.
-
-
-
Are you interested in supporting package development?
-
-
Contribute your implementation of a new or previously posed metric. For information about extending the functionality of riskmetric with your own metrics, see Extending riskmetric.
Fill out our survey so we can learn how you use {riskmetric} and {riskassessment}
-
-
-
diff --git a/content/riskmetric.Rmd b/content/riskmetric.md
similarity index 97%
rename from content/riskmetric.Rmd
rename to content/riskmetric.md
index 2fe41ad..3f17891 100644
--- a/content/riskmetric.Rmd
+++ b/content/riskmetric.md
@@ -4,7 +4,7 @@ title: "The {riskmetric} Package"
## About the Package
-{width="130"}
+
Contributed R packages are developed by anyone & everyone, and may differ in popularity and accuracy. As such, the R Validation Hub developed an R package titled [`riskmetric`](https://pharmar.github.io/riskmetric/articles/riskmetric.html) whose goal is to assess the risk of contributed R packages.
diff --git a/content/riskscore.html b/content/riskscore.html
deleted file mode 100644
index d33299d..0000000
--- a/content/riskscore.html
+++ /dev/null
@@ -1,25 +0,0 @@
----
-title: "The {riskscore} Package"
----
-
-
-
-
A data package for cataloging riskmetric results across public repositories.
-
WARNING: Right now, the {riskscore} package is in a PoC stage that is not fully operational. With that said, there are several use cases that make the concept of {riskscore} valuable, including (but not limited to) the following: it …
-
-
Guides more effective discussion around how to summarize risk
-
Helps communicate changes to {riskmetric}’s summarizing algorithm or interpretations of assessment data
-
Aids the {riskmetric} dev team in identifying “edge cases” for analysis and code refinement.
-
Provides a channel to distribute handy tools for building {riskmetric} result data (i.e., mimicking how our process for external packages could serve as a useful template when comparing to internal or private repos).
-
Allows everyone to report risk scores in terms of a “CRAN percentile” instead of just some arbitrary numeric value.
-
Establishes a central repository for package scores, which can be used for many applications, like generating badge scores or trending in a package’s score over time to measure performance.
-
-
-
-
Are you interested in supporting package development?
-
We could always use extra help/feedback! Please consider one of the following options:
diff --git a/content/riskscore.Rmd b/content/riskscore.md
similarity index 100%
rename from content/riskscore.Rmd
rename to content/riskscore.md
diff --git a/content/white-paper.html b/content/white-paper.html
deleted file mode 100644
index 50ea200..0000000
--- a/content/white-paper.html
+++ /dev/null
@@ -1,253 +0,0 @@
----
-title: "A Risk-based Approach for Assessing R package Accuracy within a Validated Infrastructure"
----
-
-
-
-
-
Andy Nicholls, Statistics Director, Head of Statistical Data Sciences, GSK
-
-
-
Paulo R. Bargo, Director Scientific Computing, Statistics & Decision Sciences, Janssen R&D
-
-
-
John Sims, Director, Analytical Systems Architect & Data Science - Pfizer Vaccine Research
-
-
-
On behalf of the R Validation Hub, an R Consortium-funded ISC Working Group
-
-
-
January 23, 2020
-
-
View and/or download the PDF version of this white paper here.
-
-
1. Scope and Background
-
This white paper addresses concerns raised by statisticians, statistical programmers, informatics teams, executive leadership, quality assurance teams and others within the pharmaceutical industry about the use of R and selected R packages as a primary tool for statistical analysis for regulatory submission work. When discussing validation of software systems two areas should be considered:
-
-
Infrastructure validation
-
Software validation
-
-
Infrastructure includes the server, OS, necessary infrastructure software, etc. For example, a system may use a server running Red Hat Enterprise Linux (RHEL) version 6 and several other infrastructure software pieces, including proxy servers like Apache httpd. Documenting infrastructure (or the environment) is an essential part of the validation process, and this validation could follow standard practices such as those proposed in GAMP 5, particularly change control management. Discussions regarding infrastructure validation are not in the scope of this paper.
-
The aim of this paper is to propose a possible risk-based approach for assessing R package accuracy within a validated infrastructure.
-
Many of the thoughts and ideas addressed in this paper are extracted, verbatim, from the R Validation Hub, a cross-industry initiative whose mission is to enable the use of R by the bio-pharmaceutical industry in a regulatory setting.
-
The paper reflects the current thinking of the R Validation Hub working group and may evolve over time. Additional detail will be provided via the website and future papers.
-
-
-
2. What is R?
-
As stated on the R-Project website and the guidance for the use of R in regulated clinical trial environments, “R is a language and environment for statistical computing and graphics… It is available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in source code form.” The official R distribution consists of ‘Base R’ and ‘Recommended Packages’. Each of these is a collection of R packages (code libraries) that can be defined as follows:
-
-
Base R - base, compiler, datasets, graphics, grDevices, grid, methods, parallel, splines, stats, stats4, tcltk, tools, utils
Beyond the official R distribution, users in the open source community can submit their own R packages to an open source package repository known as the Comprehensive R Archive Network (CRAN). There are currently over 15,000 packages available on CRAN, with further packages available via another popular life sciences repository (Bioconductor), GitHub and a scattering of other websites. In addition, R packages are sometimes developed for internal use.
-
-
-
3. Regulations Governing the use of Statistical Software
-
In 1997, the United States Food and Drug Administration (FDA) issued 21 CFR Part 11 to provide regulations for electronic records and signatures. This final ruling states that:
-
-
“\[Electronic records and signatures\] be trustworthy, reliable, and generally equivalent to paper records and handwritten signatures executed on paper.”
Transactional – applications that collect data and often require electronic signature
-
Non-transactional - used for decision support and/or reporting
-
-
R tends to fall more in the ‘non-transactional’ space and as such, 21 CFR Part 11 typically does not apply. In 2015, the FDA released a Statistical Software Clarifying Statement. This document states that they do not require the use of any specific software for statistical analyses. But, the FDA requests that software package(s) be documented upon submission. This documentation must include version and build identification. In conclusion, 21 CFR Part 11 is not relevant or mandatory in the context of statistical analysis software itself. However, when using R as part of a validated system, elements of 21 CFR Part 11 do apply. The guidance for the use of R in regulated clinical trial environments provided more details of this topic. The rest of this paper focuses on R package validation and its dependency.
-
-
-
4. R Packages and Validation
-
According to the FDA’s Glossary of Computer System Software Development Terminology:
-
-
“Validation: Establishing documented evidence which provides a high degree of assurance (accuracy) that a specific process consistently (reproducibility) produces a product meeting its predetermined specifications (traceability) and quality attributes.”
-
-
In pharmaceutical development, validation typically refers to systems validation. The system validation should incorporate all of the following elements:
-
-
Accuracy
-
Reproducibility
-
Traceability
-
-
This paper outlines how to ensure the accuracy of R packages when used as part of a validated environment with R.
-
Note: Since R is open source, ensuring that the environment is reproducible and traceable presents several challenges. These challenges should be addressed by the end user's organisation.
-
-
4.1. Accuracy
-
When assessing the accuracy of R packages, the R Validation Hub differentiates R packages by the following types (see German et al., 2013):
-
-
base and recommended (core) packages - developed by the R Foundation and shipped with the basic installation
-
contributed (open source) packages - developed by anyone, and may differ in accuracy
-
-
-
-
-Figure 1 - Source: German et al. (2013): The Evolution of the R Software Ecosystem
-
Core packages and contributed packages are managed by different processes. Therefore, different requirements are needed to ensure that both types of packages reliably produce accurate results.
-
-
4.1.1. Base R and Recommended Packages
-
The R Foundation develops both the base and recommended packages and follows a software development life-cycle that ensures the accuracy of each. These practices include:
-
-
Proper maintenance of the R source code, and control of releases
-
Testing the software and identifying issues for the Core Team to address
-
The R Core Team hiring highly qualified individuals
-
Validation testing each R release against known data and known results, and resolving all errors prior to release
In addition to the steps taken by the R Foundation and R Core Team to ensure the validity of Base R and Recommended packages, the R user community plays an important role in ensuring that the code is accurate. The R Foundation monitors feedback from users via the r-devel e-mail list and the R bug tracking system. This process allows for more extensive testing and increases the likelihood that bugs are fixed before releases.
-
In conclusion, there is minimal risk in using base and recommended (core) packages as a component in a validated system for regulatory analysis and reporting with R.
-
-
-
4.1.2. Contributed Packages
-
Since R is open source, contributed packages can be developed by anyone and may depend on other R packages and software. Therefore, ensuring the accuracy of each contributed package and its dependencies is necessary.
-
The R Validation Hub focuses on contributed packages on the Comprehensive R Archive Network (CRAN). All packages available on CRAN have passed a series of technical checks, as outlined in the checklist for CRAN submission. The technical checks for an R package ensure that:
-
-
all the code examples run successfully
-
all the package tests pass
-
the package is compatible with other packages on CRAN
-
-
However, these checks do not necessarily guarantee the accuracy of an R package. It is therefore suggested that a risk assessment exercise be conducted to evaluate the likely accuracy/validity of an R package with respect to its intended use.
-
-
-
-
4.2. Reproducibility
-
It is important to acknowledge that R (like other open source languages) presents additional challenges with respect to the reproducibility of an environment. The evolution of R packages is effectively continuous and thus maintaining a stable R installation that allows for the addition of new and/or updated packages can be a challenge.
-
Package dependency trees can be very large and complex, with potential system dependencies. A single package may ultimately require the installation of over 100 additional packages and other software. If just one package version number turns out to be incompatible, the Intended for Use package of interest cannot be installed. It is therefore vital to ensure compatibility of R packages within an installation. It should be noted that on any given day all such dependencies on CRAN have been resolved. However, there is no standard way of ensuring compatibility between repositories such as CRAN and Bioconductor or GitHub. Some R packages also have system dependencies which need to be managed.
-
There are several commercial and open source offerings that attempt to deal with the management of R packages in different ways. Some of these focus on R, some on the environment itself. These are not discussed here.
-
-
-
4.3. Traceability
-
One of the core concepts presented in this paper is that Imports are not typically loaded by users and need not therefore be directly risk-assessed. If adopting this risk-based approach, measures need to be taken to ensure that users do not directly load the Package Imports. It is suggested that this is handled mainly through process, although tools could be developed, using sessionInfo or devtools::session_info, that check the loaded packages against the lists of Intended for Use and Imports packages. In any case, the use of such tools within a standard, logged workflow is highly recommended to ensure traceability of the work.
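A minimal sketch of such a check, assuming a hypothetical organization-defined list of Intended for Use packages:

```r
# Illustrative sketch: compare the packages attached in the current
# session against an organization's Intended for Use list
# (the list below is hypothetical).
intended_for_use <- c("dplyr", "ggplot2", "survival")

attached   <- names(sessionInfo()$otherPkgs)
unapproved <- setdiff(attached, intended_for_use)

if (length(unapproved) > 0) {
  warning("Packages attached outside the Intended for Use list: ",
          paste(unapproved, collapse = ", "))
}
```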
-
-
-
-
5. System Qualification
-
A system qualification typically includes the execution of Installation, Operational and Performance tests. To qualify a system based around R, tests need to be written to ensure that the R installation has been installed correctly and that it performs as expected. This activity is required for the overall installation, regardless of the risk assessment score for individual packages.
-
The purpose of the system qualification is to ensure the reproducibility of expected results. It is not to reassess the risk of individual packages.
-
Some packages may already contain tests. All such tests are suitable for a system qualification. However, it should be noted that the expected results, as determined by package authors / maintainers, may occasionally differ depending upon the Operating System configuration. It may not therefore be prudent to simply re-run all available tests. Instead, a subset of tests could be chosen that reflect typical usage.
-
-
-
6. A Proposed R Package Risk Assessment Framework
-
-
6.1. Classifying R Packages within an Installation
-
Users typically 'load' a package to access the functions and other objects stored within it. Some of the functions within the loaded package may, in turn, call functions within other packages. But a user would not load these packages, nor call their functions directly.
-
Within an installation, two classifications of R packages are proposed:
-
-
Intended for Use. These will be loaded directly by a user during an R session.
-
Imports. These packages are required to be installed in order to use the Intended for Use packages. They are comparable to a system dependency, or the ‘back-end’ code supporting a user interface.
-
-
An Intended for Use package may also be an Import for another package. In such cases, a conservative approach should be adopted and the package should be classified as Intended for Use. Imports are identified within a package's 'DESCRIPTION' file using the 'Imports' field. This field differs from the 'Depends' field, for which dependencies are attached for the end user, allowing them to use the functions contained within those packages directly. Packages specified in the 'Depends' field should therefore be classified as Intended for Use packages.
-
A risk-based approach should focus on the way that components of the system will be used. From a reproducibility perspective, it is important that the Imports are managed appropriately. But the accuracy of these packages only needs to be verified by assessing the Intended for Use packages. This approach can be extended to system qualification, during which the operational focus should be on the Intended for Use packages and not the Imports. It is also important to make this distinction when considering the maintenance of a system over time.
-
-
-
6.2. Components of a Risk Assessment Framework
-
A risk assessment framework should evaluate R packages based on four criteria:
-
-
Purpose
-
Maintenance Good Practice (Software Development Life Cycle)
-
Community Usage
-
Testing
-
-
The criteria are described in more detail in the following subsections and additional information on the proposed metrics, the rationale for their use and how these are being implemented can be found in the R Validation Hub website. An overview of the proposed process is provided in Figure 2.
-
-
-
-Figure 2 - Proposed Validation Workflow (source: Assessing Package Accuracy)
-
It is recommended that the risk assessment be collated via individual package reports. These reports should detail the level of risk that each package presents. Justification for these scores should be based on the metrics obtained for the package. These reports may serve as documented evidence of the accuracy assessment for R.
-
-
6.2.1. Purpose
-
The purpose of an R package and its intended use greatly impacts the level of risk presented by the package.
-
For simplicity, two classifications are proposed:
-
-
Statistical
-
Non-statistical
-
-
The ‘Statistical’ classification refers to any package that implements statistical/machine learning algorithms, even if that is not the primary purpose of the package.
-
Statistical packages present a greater degree of risk than non-statistical packages. This is because the primary or secondary statistical analysis for a study might be based upon statistical models. Also, the complexities of the algorithms underpinning the main routines can make bugs difficult to detect. A package aimed at, say, data manipulation may have just as big an impact but errors would be much easier to identify and therefore more likely to be detected when the package has been exposed to extensive community testing.
-
Non-statistical packages could be divided into further categories, for example:
-
-
Data wrangling / Transformation / Manipulation
-
Data Input / Output (file systems; external documents, e.g. Excel; databases, e.g. RDBMSs such as Oracle, PostgreSQL, and MySQL)
-
Communication (knitr, rmarkdown)
-
Modelling (non-statistical)
-
Application Interface (e.g. Shiny Server Pro)
-
-
Organisations may choose to further adapt their approach to risk assessment and testing based on these, or other, finer level classifications.
-
-
-
6.2.2. Maintenance Good Practice
-
Adopting best practices to manage the software development life-cycle can significantly reduce the potential for bugs / errors. Package maintainers are not obliged to share their practices (and rarely do). However, the open source community provides several ways of measuring best practice. The R Validation Hub website proposes some metrics to consider along with a rationale for why the metric is important in assessing package quality and risk.
-
Metrics may include whether the package has a vignette, website, news feed or formal mechanism for bug tracking, whether the source code is publicly maintained, the release rate for new versions; the size of the code base; author reputation and type of license.
-
-
-
6.2.3. Community Usage
-
The user community plays an important role in open source software development. The more exposure a package has had to the user community, the more ad-hoc testing it has been exposed to. Over time the better packages tend to rise to the top of the pack, leading to more downloads and increased exposure.
-
The aim of the community usage metrics is to assess the level of exposure to the wider community and thus the level of risk that a package presents. Community usage could be assessed through metrics such as package (and version) maturity; whether the package is available through a standard repository; the number of reverse package dependencies; and the number of downloads.
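-
Two of these signals are straightforward to query. The sketch below assumes the {cranlogs} package for CRAN download counts and internet access to a repository index for reverse dependencies; the package name is again purely illustrative.
-
```r
# Minimal sketch of two community-usage signals for an example package.
library(cranlogs)
# Downloads from the RStudio CRAN mirror over the last month
downloads <- cran_downloads("ggplot2", when = "last-month")
sum(downloads$count)
# Reverse dependencies: packages that declare the example package as a dependency
rev_deps <- tools::package_dependencies(
  "ggplot2",
  db      = available.packages(),
  reverse = TRUE
)
length(rev_deps[["ggplot2"]])
```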
-
-
-
6.2.4. Testing
-
Testing is a vital component in a well-established Software Development Life Cycle (SDLC). As a general rule, the more tests a package has, the more confident one can be in the stability of the package over time. Packages should include unit tests, embedded within a standard unit-testing framework. For such packages, a code coverage metric can be calculated and compared with package norms.
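-
Where a package ships unit tests, test coverage can be quantified with the {covr} package. The path below is a placeholder for a local checkout of the package source.
-
```r
# Minimal sketch: compute unit test coverage for a locally checked-out package.
library(covr)
cov <- package_coverage(path = "path/to/package/source")  # placeholder path
percent_coverage(cov)   # single figure to compare against package norms
```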
-
-
-
-
6.3. Determining Risk
-
Although each package metric is a numeric measure, the scales are very different. It would be possible to combine these into an overall risk score, but such a score might be somewhat opaque to anyone reviewing the assessment for the first time. Instead, it is recommended that the metrics form part of a subjective assessment made by a qualified individual and reviewed by a similarly qualified colleague.
-
Each of the four criteria should be assessed separately before combining them into an overall assessment of risk for the package. It is not within the scope of this paper to fully identify appropriate qualifications, but clearly the assessor and reviewer should be of a suitably high grade and have experience of statistical programming with R. When assessing a Statistical package, it is important to understand the nuances of the model(s) being implemented. The risk assessment of a statistical package should therefore be performed by a Statistician with an appropriate level of experience in their field.
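-
As a purely illustrative sketch (the fields, rating scale and names are hypothetical, not a prescribed format), such a documented assessment might be captured as structured data alongside the package report:
-
```r
# Hypothetical record of a criterion-level risk assessment; all values are illustrative.
assessment_record <- list(
  package  = "examplepkg",
  version  = "1.2.0",
  ratings  = c(
    Purpose                     = "Statistical",
    `Maintenance Good Practice` = "Low risk",
    `Community Usage`           = "Low risk",
    Testing                     = "Medium risk"
  ),
  overall  = "Medium risk",
  assessor = "Assessing statistician",
  reviewer = "Reviewing statistician"
)
str(assessment_record)
```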
-
-
-
6.4. Responding to Risk
-
The risk classification can be used to determine:
-
-
Whether the package is included as part of the R installation within the validated system
-
The extent of additional remediation / testing required to mitigate any risk
-
-
If a package is not included as part of a validated system, then by implication it is not approved for use within the controlled environment. If the environment permits a user to install their own packages the onus would be on the user to take extra precautions to ensure that it behaves as expected for their specific use case. In all cases, it is expected that users would follow their internal Quality Assurance Standard Operating Procedures.
-
-
6.4.1. Package Remediation / Testing
-
One way of mitigating risk for higher-risk packages is to generate tests for important package functionality. The process for this should be no different from the standard process for developing functions/macros internally. Typically, requirements would be gathered and tests written against each requirement, such that any test can be traced back to a requirement. Using a unit-testing framework such as testthat to implement this in R is recommended.
-
Tests should be written for all statistical modelling functions within an Intended for Use Statistical Package, regardless of the risk assessment. Any tests generated can be used to assist in the qualification of a system.
-
Care should be taken when testing statistical packages to ensure that a pass/fail result can be achieved. Known results from literature are typically the best reference for testing complex statistical procedures. It may be sufficient to use existing tests within the package if these are deemed appropriate by the package assessor. It may sometimes be appropriate to test against known results from other software, provided both R and the comparison software claim to implement exactly the same method.
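-
As a minimal sketch of such a test, the block below uses the testthat framework to check a base R statistical routine against the widely published result for Student's sleep data; the requirement identifier is hypothetical.
-
```r
# Minimal sketch: a requirement-traced test written with {testthat}.
# "REQ-001" is a hypothetical requirement ID; the reference value is the t statistic
# reported in published analyses of the sleep dataset shipped with base R.
library(testthat)
test_that("REQ-001: two-sample t-test statistic matches the published reference", {
  fit <- t.test(extra ~ group, data = sleep)
  expect_equal(unname(fit$statistic), -1.8608, tolerance = 1e-4)
})
```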
-
In a risk-based framework, the lower risk packages will typically have been developed according to best practices and/or been subjected to a high degree of community testing. Additional remediation for such packages is unlikely to yield any significant reduction in risk.
-
-
-
-
6.5. Vendor Assessments and Trusted Resources
-
For proprietary software it is common to perform vendor assessments / audits to explore the internal validation practices of the vendor. If a company is satisfied with those practices, it may grant the vendor a ‘trusted’ status. The impact of this might be that any software produced by that vendor is deemed to be low risk.
-
For open source software such audits are not logistically feasible. However, based on information available in the open source domain, it may still be possible to perform a virtual audit of a vendor and their practices. For example, the R Foundation and R Core Team have published information about their practices in R: Regulatory Compliance and Validation Issues. A Guidance Document for the Use of R in Regulated Clinical Trial Environments. It is the opinion of the R Validation Hub that this is sufficient to allocate the R Foundation a trusted status, which would in turn render the collection of Base R and Recommended packages low risk.
-
-
6.5.1. Becoming a Trusted Resource
-
Over time, some patterns will likely emerge from continued risk assessments. In particular, certain package authors and maintainers may become associated with low risk packages. It would be reasonable to allocate a trusted status to any package developer that attains a ‘low risk’ evaluation for multiple packages over a sustained period of time. Precisely how many packages, and how many iterations of risk assessment should be completed before allocating the Trusted Resource status, is down to the individual organisation to determine.
-
There are several popular packages / collections of R packages that could be considered for a ‘trusted resource’ status. One of the best-known examples is the tidyverse, as described in the following section.
-
-
-
6.5.2. An Example of a Possible Trusted Resource: The tidyverse
-
The tidyverse is a commercially supported collection of contributed R packages. According to https://www.tidyverse.org/:
-
-
“The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.”
-
-
The tidyverse is primarily developed by a team at RStudio who have publicly shared the set of Design Principles via an online book. These are used by the tidyverse team to “promote consistency across packages in the core tidyverse”. The same team have also published a tidyverse style guide.
-
Given the available documentation, the consistency and popularity of the tidyverse, and the commercial backing of the project by RStudio, there is a reasonable case to consider the tidyverse team a trusted resource.
-
-
-
-
-
7. Summary
-
When using R as part of a validated system, elements of 21 CFR Part 11 do apply, although the regulation is not directly applicable to programming languages. Systems validation focuses on accuracy, reproducibility and traceability of the system. For R the primary challenge is in ensuring the accuracy of results.
-
A risk-based approach to the adoption of R packages is highly recommended. This should focus on Intended for Use packages and not Imports. It is important to assess individual packages based on:
-
-
Purpose
-
Maintenance Good Practice
-
Community Usage
-
Testing
-
-
Using the metrics suggested in this paper it is possible to assign levels of risk to each of these elements to develop an overall risk classification for a package. The risk assessment should be documented by individual package reports. Additional remediation for low risk packages is unlikely to yield any significant reduction in risk. However, remediation in the form of tests, linked to user requirements, may be deemed necessary for higher risk packages.
-
Over time, collections of packages may obtain a ‘trusted’ status such that future offerings default to a low risk score without the need for a full risk assessment.
-
Regardless of the relative risk posed by a package, it is important to develop a set of qualification tests for Intended for Use R packages. These should be developed using a known framework such as ‘testthat’ and need not include all available tests within the packages themselves. Having established the accuracy and integrity of R packages within an installation, attention should be paid to the long-term maintenance of the system with respect to the additional challenges that R presents regarding reproducibility and traceability requirements.
-
-
-
-
-
-
German, D.M., Adams, B. & Hassan, A.E. (2013). The Evolution of the R Software Ecosystem. Proceedings of the Euromicro Conference on Software Maintenance and Reengineering (CSMR), 243-252. doi:10.1109/CSMR.2013.33.
-
It is worth noting that it is technically possible to call any function from any installed package, but this is extremely rare behaviour within an analysis workflow.
-
-