Merged
14 changes: 11 additions & 3 deletions app/components/Layout/components/Content/content.styles.ts
@@ -1,8 +1,8 @@
import { css } from "@emotion/react";
import styled from "@emotion/styled";
import { FONT } from "@databiosphere/findable-ui/lib/styles/common/constants/font";
import { ThemeProps } from "@databiosphere/findable-ui/lib/theme/types";
import { typographyToCSS } from "@databiosphere/findable-ui/lib/styles/common/mixins/typography";
import { ThemeProps } from "@databiosphere/findable-ui/lib/theme/types";
import { css } from "@emotion/react";
import styled from "@emotion/styled";

const muiAlert = ({ theme }: ThemeProps) => css`
.MuiAlert-root {
@@ -89,6 +89,10 @@ export const Content = styled.div`
}
}

hr {
margin: 20px 0;
}

p {
font: ${FONT.BODY_LARGE_400_2_LINES};
margin: 0 0 16px;
@@ -98,6 +102,10 @@ export const Content = styled.div`
}
}

pre + p {
margin-top: 16px;
}

ul {
font: ${FONT.BODY_LARGE_400_2_LINES};
margin: 16px 0;
4 changes: 2 additions & 2 deletions app/components/common/Figure/figure.styles.ts
@@ -1,6 +1,6 @@
import styled from "@emotion/styled";
import { PALETTE } from "@databiosphere/findable-ui/lib/styles/common/constants/palette";
import { FONT } from "@databiosphere/findable-ui/lib/styles/common/constants/font";
import { PALETTE } from "@databiosphere/findable-ui/lib/styles/common/constants/palette";
import styled from "@emotion/styled";

export const Figure = styled.figure`
margin: 32px 0;
28 changes: 28 additions & 0 deletions app/content/anvil-cmg/guides/data-download-options.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
<Breadcrumbs
breadcrumbs={[
{ path: "/datasets", text: "AnVIL Data Explorer" },
{ path: "/guides", text: "Guides" },
{ path: "", text: "Data Download Options" },
]}
/>

# Data Download Options

There are several ways to download files for use on local, institutional, or other computational services.

With support from Amazon's Open Data Sponsorship Program, the AnVIL open-access datasets are available with no-cost egress and for use within the AWS environment.

Managed-access datasets can be downloaded from Google Cloud Platform on a requester-pays basis. For more information, refer to the ["Requesting Data Access"]({portalURL}/learn/find-data/requesting-data-access#requester-pays) document.

The following options are available through the AnVIL Data Explorer:

- **TSV File Manifest Downloads**
  - Available for all datasets, both open-access and managed-access.
  - A manifest can include one or more datasets, based on the search criteria used in the AnVIL Data Explorer.

- **Data Download via curl**
  - Available only for open-access datasets.
  - `curl` command downloads can cover full datasets or only selected file types from one or more open-access datasets.

- **Individual File Download**
  - Available only for files in open-access datasets.
103 changes: 103 additions & 0 deletions app/content/anvil-cmg/guides/data-download-via-curl.mdx
@@ -0,0 +1,103 @@
<Breadcrumbs
breadcrumbs={[
{ path: "/datasets", text: "AnVIL Data Explorer" },
{ path: "/guides", text: "Guides" },
{ path: "", text: "Data Download via curl" },
]}
/>

# Data Download via curl

The **Download Open-Access Data (curl Command)** option lets users select the organism type and file formats they wish to transfer to a local or institutional system. Complete datasets can be downloaded by selecting all available file types.

**NOTE:** At this time, this option is available only for open-access datasets.

## Prerequisites

curl must be installed on the destination system where the command will be run. Most macOS, Linux, and Windows 10 and 11 systems include curl by default. Users of older Windows versions can download it from the curl website or use Windows Subsystem for Linux (WSL).

## Example

### Downloading The Full Dataset

1. Visit the dataset of interest by clicking on the dataset name in the Data Explorer.

<Figure
alt="Visit the dataset of interest"
src="/guides/curl-command-download/single-dataset-download-01.webp"
width="100%"
/>

2. On the dataset description page, click on the "Export" button in the upper right-hand corner of that page.

<Figure
alt="Click the Export button"
src="/guides/curl-command-download/single-dataset-download-02.webp"
width="100%"
/>

3. Then click on "Download Open-Access Data Files (No Data Transfer Fees)" in the "Download" section near the bottom of the page.

<Figure
alt="Click Download Open-Access Data Files"
src="/guides/curl-command-download/single-dataset-download-03.webp"
width="100%"
/>

4. This will display a screen that allows some refinement of the data to download.

<Figure
alt="Refine the data to download"
src="/guides/curl-command-download/single-dataset-download-04.webp"
width="100%"
/>

5. Select all of the organism type(s) at the top of the page.

6. Check the box next to the Name heading. This will select all of the file types.
   - To download only specific file types, select only those file types and leave the others unchecked.

7. Select Bash<sup>[1](#footnote-1)</sup> if you are on Mac, Linux, or Windows Subsystem for Linux; select cmd.exe if you are on Windows Command Prompt.

8. Click on the Request curl Command button.

<Figure
alt="Click the Request curl Command button"
src="/guides/curl-command-download/single-dataset-download-08.webp"
width="100%"
/>

This will generate a curl manifest and the command needed to transfer the files. The resulting command will be similar to this:

```
curl --location --fail https://service.explore.anvilproject.org/manifest/files/ksQylKdhbnZpbDEzpGN1cmzEEKxolyZNG12_p9nHuKrRpbDEEH2f6ZDL2lSzofvXZ80pfgXEIJHlLajfJ07ut9ZEMwSwDDAdmSZQam5pZbCxG3WZeFBl | curl --retry 15 --retry-delay 10 --config -
```
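
The command above is a two-stage pipeline. The first `curl` fetches the generated manifest (`--location` follows redirects, `--fail` makes curl exit with an error on an HTTP failure), and the second `curl` reads that manifest from standard input (`--config -`) and downloads the listed files, retrying on transient errors. The following TypeScript sketch only illustrates how a command of this shape is composed. The helper name `buildCurlCommand`, the `RetryOptions` type, and the placeholder URL are assumptions made for illustration and are not part of the Data Explorer.

```typescript
// Hypothetical helper: composes a command with the same shape as the one
// generated by the Data Explorer. Not the Data Explorer's own code.
interface RetryOptions {
  retries: number; // value passed to curl's --retry
  delaySeconds: number; // value passed to curl's --retry-delay
}

function buildCurlCommand(
  manifestUrl: string,
  retry: RetryOptions = { retries: 15, delaySeconds: 10 }
): string {
  // Stage 1: download the curl manifest. --location follows redirects and
  // --fail turns HTTP errors into a non-zero exit status.
  const fetchManifest = `curl --location --fail ${manifestUrl}`;
  // Stage 2: "--config -" reads the manifest from standard input, so the
  // second curl downloads every file the manifest lists.
  const downloadFiles = `curl --retry ${retry.retries} --retry-delay ${retry.delaySeconds} --config -`;
  return `${fetchManifest} | ${downloadFiles}`;
}

// Placeholder URL; a real manifest URL comes from the Request curl Command step.
console.log(buildCurlCommand("https://example.org/manifest/files/PLACEHOLDER"));
```

Because the second stage takes its input from the first, a failed manifest request leaves the second `curl` with nothing to download.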

On the destination system, issue the specified curl command. Clicking the text box containing the curl command copies it to your clipboard so you can paste it into a terminal window.

<Figure
alt="Copy the curl command to clipboard"
src="/guides/curl-command-download/single-dataset-download-08b.webp"
width="100%"
/>

For single-dataset downloads, a series of subdirectories will be created containing the selected files from that dataset.

### Downloading Files From Multiple Datasets

Downloading files from multiple datasets works the same way as downloading from a single dataset, except for how you select the datasets.

In this case, on the Data Explorer's main page, use the faceted search feature in the right-hand column to select the datasets of interest and then click on the "Export" button on the top right of the page.

<Figure
alt="Select datasets and click Export"
src="/guides/curl-command-download/multiple-datasets-download-01.webp"
width="100%"
/>

From this point on, the interface is the same as the single dataset download above. Continue with Step 3 above.

---

<sup id="footnote-1">1</sup> The Bash option will work with most common
Unix/Linux command-line shells.
47 changes: 47 additions & 0 deletions app/content/anvil-cmg/guides/individual-file-download.mdx
@@ -0,0 +1,47 @@
<Breadcrumbs
breadcrumbs={[
{ path: "/datasets", text: "AnVIL Data Explorer" },
{ path: "/guides", text: "Guides" },
{ path: "", text: "Individual File Download" },
]}
/>

# Individual File Download

Individual file downloads from the AnVIL Data Explorer are available for open-access files. Files can also be downloaded using the information in the dataset manifests.

## Example

Downloading individual files.

1. Use the faceted search in the left-hand column to limit the scope of the files listed.

<Figure
alt="Use the faceted search to limit the scope of files"
src="/guides/individual-download/individual-files-01.webp"
width="100%"
/>

2. Select the "Files" tab at the top of the list of datasets. This will change the display to list the available files.

<Figure
alt="Select the Files tab"
src="/guides/individual-download/individual-files-02.webp"
width="100%"
/>

3. To download the file, click the download button next to the file of interest. This will start the download to the download folder specified in the browser configuration.

<Figure
alt="Click the download button next to the file"
src="/guides/individual-download/individual-files-03.webp"
width="100%"
/>

Note that if the list contains files that are not available for download (e.g., files from a managed-access dataset), their download icons will be grayed out.

<Figure
alt="Disabled download icon for files that are not available for download"
src="/guides/individual-download/individual-files-04.webp"
width="100%"
/>
86 changes: 86 additions & 0 deletions app/content/anvil-cmg/guides/tsv-file-manifest-download.mdx
@@ -0,0 +1,86 @@
<Breadcrumbs
breadcrumbs={[
{ path: "/datasets", text: "AnVIL Data Explorer" },
{ path: "/guides", text: "Guides" },
{ path: "", text: "TSV File Manifest Download" },
]}
/>

# TSV File Manifest Download

Manifest downloads are available for all of the datasets listed in the AnVIL Data Explorer, including both open-access and managed-access datasets. A tab-separated-value file (.tsv) is generated based on the data selected.

The downloaded manifest contains a number of columns. Depending on how the data will be accessed and used, some of the key columns are:

- **dataset.title**, which contains the name of the dataset that the file belongs to.
  - A manifest can contain files from multiple datasets, depending on how the file is generated.
- **datasets.consent_group** and **datasets.data_use_permission**, which contain the dataset's consent and use codes.
- **files.file_size**, which contains the file size in bytes.
- **files.name**, which contains the file name.
- **files.drs_url**, which contains the DRS URL for use within the Terra environment.
- **files.azul_url**, which is a URL that allows HTTP access to the individual file.
  - Files in open-access datasets are available via this link.
  - At this time, AnVIL requires requester-pays for managed-access datasets, so the files are not accessible through this URL.
- **files.azul_mirror_url**, which contains the URI to the Amazon Web Services S3 bucket for that file.
  - Note that the file name in the bucket is a hash, which reduces storage requirements when files are duplicated.
  - This field will be blank if the file is not available through the AWS Open Data Sponsorship Program.

## Example

### Downloading The Manifest For A Single Dataset

1. Visit the dataset of interest by clicking on the dataset name in the Data Explorer.

<Figure
alt="Visit the dataset of interest"
src="/guides/dataset-manifest-download/single-dataset-download-01.webp"
width="100%"
/>

2. On the dataset description page, click on the "Export" button in the upper right-hand corner of that page.

<Figure
alt="Click the Export button"
src="/guides/dataset-manifest-download/single-dataset-download-02.webp"
width="100%"
/>

3. Then click on "Download TSV Manifest" in the "Download" section near the bottom of the page.

<Figure
alt="Click Download TSV Manifest"
src="/guides/dataset-manifest-download/single-dataset-download-03.webp"
width="100%"
/>

4. This will display a screen to request the generation of the manifest. Click on the "Request Link" button.

<Figure
alt="Click the Request Link button"
src="/guides/dataset-manifest-download/single-dataset-download-04.webp"
width="100%"
/>

5. Once the manifest is generated, you can either download it directly by clicking the download icon or copy its URL by clicking the copy icon.

<Figure
alt="Download or copy the manifest link"
src="/guides/dataset-manifest-download/single-dataset-download-05.webp"
width="100%"
/>

The manifest can be viewed with any utility that can import tab-separated-value files. It can also be processed with scripts, depending on the need.
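
As one possible illustration of script-based processing, the TypeScript sketch below parses manifest text, sums the `files.file_size` column, and lists the files that have a non-blank `files.azul_mirror_url`. It assumes a plain TSV layout with a header row and no quoted fields that contain tabs or line breaks. The function names and the sample rows, including the `s3://` placeholder, are made up for the example and do not come from a real manifest.

```typescript
type ManifestRow = Record<string, string>;

// Parse manifest text: the first non-empty line is the header and each
// following line describes one file. Assumes no quoted or multi-line fields.
function parseManifest(tsv: string): ManifestRow[] {
  const [headerLine, ...dataLines] = tsv
    .split(/\r?\n/)
    .filter((line) => line.length > 0);
  if (headerLine === undefined) return [];
  const header = headerLine.split("\t");
  return dataLines.map((line) => {
    const cells = line.split("\t");
    const row: ManifestRow = {};
    header.forEach((name, i) => {
      row[name] = cells[i] ?? "";
    });
    return row;
  });
}

// Sum files.file_size (bytes); missing or non-numeric values count as 0.
function totalBytes(rows: ManifestRow[]): number {
  return rows.reduce(
    (sum, row) => sum + (Number(row["files.file_size"]) || 0),
    0
  );
}

// Keep only the rows whose files.azul_mirror_url column is not blank.
function mirroredFiles(rows: ManifestRow[]): ManifestRow[] {
  return rows.filter((row) => (row["files.azul_mirror_url"] ?? "") !== "");
}

// Made-up sample rows for illustration only.
const sample = [
  "files.name\tfiles.file_size\tfiles.azul_mirror_url",
  "example_a.vcf.gz\t1024\ts3://example-bucket/aaaa",
  "example_b.bam\t2048\t",
].join("\n");

const rows = parseManifest(sample);
console.log(totalBytes(rows)); // 3072
console.log(mirroredFiles(rows).map((row) => row["files.name"])); // [ "example_a.vcf.gz" ]
```

Summing `files.file_size` before starting a transfer gives a rough estimate of the disk space the selected files will need.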

### Downloading A Manifest For Multiple Datasets

Downloading a manifest for multiple datasets works the same way as downloading one for a single dataset, except for how you select the datasets.

In this case, on the Data Explorer's main page, use the faceted search feature in the right-hand column to select the datasets of interest and then click on the "Export" button on the top right of the page.

<Figure
alt="Select datasets and click Export"
src="/guides/dataset-manifest-download/multiple-datasets-download-01.webp"
width="100%"
/>

From this point on, the interface is the same as the single dataset download above. Continue with Step 3 above.
2 changes: 2 additions & 0 deletions app/content/common/constants.ts
@@ -1,4 +1,5 @@
import { ANCHOR_TARGET } from "@databiosphere/findable-ui/lib/components/Links/common/entities";
import { Divider } from "@mui/material";
import * as C from "../../components";
import { Figure } from "../../components/common/Figure/figure";
import { Link } from "../../components/Layout/components/Content/components/Link/link";
@@ -11,6 +12,7 @@ export const MDX_COMPONENTS = {
Breadcrumbs: C.Breadcrumbs,
Figure,
a: Link,
hr: Divider,
};

export const MDX_SCOPE = { ANCHOR_TARGET };
8 changes: 1 addition & 7 deletions app/content/common/contentPages.ts
@@ -23,13 +23,7 @@ export async function getContentStaticProps(
!contentPathname ||
!isContentPathnameExists(contentPathname, slug)
) {
return {
props: {
mdxSource: null,
pageTitle,
slug: null,
},
};
return { notFound: true };
}
const markdownPathname = getMarkdownPathname(contentPathname, slug);
const markdownWithMeta = fs.readFileSync(markdownPathname, "utf-8");
52 changes: 52 additions & 0 deletions pages/guides/data-download-options.tsx
@@ -0,0 +1,52 @@
import { Main } from "@databiosphere/findable-ui/lib/components/Layout/components/ContentLayout/components/Main/main";
import { Nav } from "@databiosphere/findable-ui/lib/components/Layout/components/Nav/nav";
import { ContentView } from "@databiosphere/findable-ui/lib/views/ContentView/contentView";
import { GetStaticProps, InferGetStaticPropsType } from "next";
import { MDXRemote } from "next-mdx-remote";
import { JSX } from "react";
import { Content } from "../../app/components/Layout/components/Content/content";
import { MDX_COMPONENTS } from "../../app/content/common/constants";
import { getContentStaticProps } from "../../app/content/common/contentPages";
import {
ABOUT_ANVIL_EXPLORER,
DATA_DOWNLOAD_OPTIONS,
DATA_DOWNLOAD_VIA_CURL,
INDIVIDUAL_FILE_DOWNLOAD,
TSV_FILE_MANIFEST_DOWNLOAD,
} from "../../site-config/anvil-cmg/dev/layout/navigationItem";
const slug = ["guides", "data-download-options"];

export const getStaticProps: GetStaticProps = async () => {
return getContentStaticProps({ params: { slug } }, "Data Download Options");
};

const Page = ({
layoutStyle,
mdxSource,
}: InferGetStaticPropsType<typeof getStaticProps>): JSX.Element => {
return (
<ContentView
content={
<Content>
<MDXRemote {...mdxSource} components={MDX_COMPONENTS} />
</Content>
}
navigation={
<Nav
navigation={[
ABOUT_ANVIL_EXPLORER,
{ active: true, ...DATA_DOWNLOAD_OPTIONS },
TSV_FILE_MANIFEST_DOWNLOAD,
DATA_DOWNLOAD_VIA_CURL,
INDIVIDUAL_FILE_DOWNLOAD,
]}
/>
}
layoutStyle={layoutStyle ?? undefined}
/>
);
};

Page.Main = Main;

export default Page;