diff --git a/cli.md b/cli.md
index 54e0026..6046ce4 100644
--- a/cli.md
+++ b/cli.md
@@ -8,23 +8,23 @@ nav_order: 5

## QuickStart

-Before using Gleaner, not that Gleaner does have one prerequisit in the form of an accessible S3 compliant
+Before using Gleaner, note that it has one prerequisite: an accessible S3-compliant
object store. This can be AWS S3, Google Cloud Storage or others. Also, there
is the open source and free Minio object store which is used in many of the examples
in GleanerIO.

-Once you have the object store running and ready you are ready to run Gleaner.
-Pull down the release that matches your ssysem from [version 3.0.4](https://github.com/gleanerio/gleaner/releases/tag/v3.0.4-dev).
-Below is an example of pulling this down for a Linux system on ADM64 architecture.
+Once you have the object store up and running, you are ready to run Gleaner.
+Pull down the release that matches your system from [version 3.0.4](https://github.com/gleanerio/gleaner/releases/tag/v3.0.4-dev).
+Below is an example of pulling this down for a Linux system on AMD64 architecture.

```
wget https://github.com/gleanerio/gleaner/releases/download/v3.0.4-dev/gleaner-v3.0.4-dev-linux-amd64.tar.gz
```

-You will need a configuration file and an example such file can be found in the resources directory. See also
+You will need a configuration file, and an example can be found in the resources directory. See also
the config file in the Gleaner Config page.

You can set the values in this configuration file. However, you can leave the Minio value empty and pass
-then via environment variables. This sort of approach can work better in some orchestration environments or just
+them via environment variables. This sort of approach can work better in some orchestration environments or just
be a safer approach to managing these keys.

```
@@ -36,7 +36,7 @@ export MINIO_SECRET_KEY=SECRETVALUE
export MINIO_BUCKET=mybucket
```

-With those set and your configuration file in palce you can run Gleaner with
+With those set and your configuration file in place, you can run Gleaner with

```
diff --git a/config.md b/config.md
index 592ae5e..5520567 100644
--- a/config.md
+++ b/config.md
@@ -9,8 +9,8 @@ nav_order: 3

## Source

-Sources can be defined as two type. A sitemap, which is a traditional sitemap that
-points to resources or a sitemap index that points to a set of sitemaps.
+Sources can be defined as one of two types. The first is a sitemap: either a traditional
+sitemap that points to resources, or a sitemap index that points to a set of sitemaps.

The other is a sitegraph, which is a pre-computed graph for a site.
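+
+To make this concrete, below is a hedged sketch of a sitemap source entry. The
+field names are assumptions drawn from Gleaner example configurations rather
+than a definitive schema, so verify them against the Gleaner Config page.
+
+```bash
+# Append a hypothetical sitemap source to a Gleaner config file.
+# Illustrative only: the file name and field names are assumptions;
+# confirm them against the example config in the resources directory.
+cat >> gleaner.yaml <<'EOF'
+sources:
+  - name: example
+    sourcetype: sitemap
+    url: https://example.org/sitemap.xml
+EOF
+```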
diff --git a/dockercli.md b/dockercli.md
index cba5f58..bdc7aa3 100644
--- a/dockercli.md
+++ b/dockercli.md
@@ -50,13 +50,12 @@ total 1356
-rwxr-xr-x 1 fils fils 1852 Aug 15 14:06 gleanerDocker.sh

-Let's see if we can setup our support infrastructure for Gleaner. The
-file gleaner-IS.yml is a docker compose file that will set up the object store,
+Let's see if we can set up our support infrastructure for Gleaner. The
+file gleaner-IS.yml is a docker compose file that will set up the object store
and a triplestore.

-To do this we need to set up a few environment variables. To do this we can
-leverage the setenvIS.sh script. This script will set up the environment we need.
-Note you can also use a .env file or other approaches. You can references
+To do this, we need to set a few environment variables; the setenvIS.sh script will set up the environment we need.
+Note that you can also use a .env file or other approaches. You can reference
the [Environment variables in Compose](https://docs.docker.com/compose/environment-variables/) documentation.

```bash
@@ -86,7 +85,7 @@ working config file was downloaded.
> Note: This config file will change... it's pointing to an OIH partner
> and I will not do that for the release. I have a demo site I will use.

-Next we need to setup our object for Gleaner. Gleaner itself can do this
+Next we need to set up our object store for Gleaner. Gleaner itself can do this
task so we will use

```bash
@@ -140,7 +139,7 @@ millers.go:81: Miller run time: 0.024649

## Working with results

If all has gone well, at this point you have downloaded the JSON-LD documents into Minio or
-some other object store.Next we will install a client that we can use to work with these objects.
+some other object store. Next we will install a client that we can use to work with these objects.

Note, there is a web interface exposed on the port mapped in the Docker compose file.
-In the case of these demo that is 9000. You can access it at
+In the case of this demo that is 9000. You can access it at
@@ -164,7 +163,7 @@ There is also a [Minio Client Docker image](https://hub.docker.com/r/minio/minio
that you can use as well but it will be more difficult to use with the following scripts
due to container isolation.

-To man an entry in the mc config use:
+To make an entry in the mc config, use:

```
mc alias set oih http://localhost:9000 worldsbestaccesskey worldsbestsecretkey
@@ -190,7 +189,7 @@ You can explore mc and see how to copy and work with the object store.

As part of our Docker compose file we also spun up a triplestore. Let's use that now.

-Now Download the minio2blaze.sh script.
+Now download the minio2blaze.sh script.

```bash
curl -O https://raw.githubusercontent.com/earthcubearchitecture-project418/gleaner/master/scripts/minio2blaze.sh
@@ -239,7 +238,7 @@ where
LIMIT 10
```

-A very simple SPARQL to give us the first 10 results from the triplestore. If all has gone well,
+A very simple SPARQL query to give us the first 10 results from the triplestore. If all has gone well,
we should see something like:

-![Blazegrah](./assets/images/simplequery.png)
+![Blazegraph](./assets/images/simplequery.png)
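+
+If you prefer the command line to the web interface, you can send the same
+query over HTTP. This is a hedged sketch: it assumes Blazegraph's default port
+(9999) and default namespace (kb), so adjust the URL if your compose file maps
+things differently.
+
+```bash
+# Query the Blazegraph SPARQL endpoint directly (assumes the default port
+# and namespace) and ask for JSON results.
+curl -G "http://localhost:9999/blazegraph/namespace/kb/sparql" \
+  --data-urlencode "query=SELECT * WHERE { ?s ?p ?o } LIMIT 10" \
+  -H "Accept: application/sparql-results+json"
+```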
diff --git a/faircontext.md b/faircontext.md
index 3273f85..2446fea 100644
--- a/faircontext.md
+++ b/faircontext.md
@@ -22,7 +22,7 @@ To provide better context we can define three personas to better express the rol

### Persona: Publisher

-The Publisher is engaged authoring the JSON-LD documents and publishing them
+The Publisher is engaged in authoring the JSON-LD documents and publishing them
to the web. This persona is focused on describing and presenting structured
-data on the web to aid in the discovery and use the resources they manage.
+data on the web to aid in the discovery and use of the resources they manage.
Details on this persona can be found in the [Publisher](../publishing/publishing.md) section.
@@ -52,8 +52,8 @@ user experiences are described in the [User](../users/referenceclient.md) sectio

## FAIR Implementation Network

-We can think of the above personnas and how they might be represented in a FAIR
-implementation network. The diagram that follow represents some of these relations.
+We can think of the above personas and how they might be represented in a FAIR
+implementation network. The diagram that follows represents some of these relations.

![relations](assets/images/relations.png)

@@ -128,7 +128,7 @@ the Go-FAIR [FAIR Principles](https://www.go-fair.org/fair-principles/) page.

| Principles          | Project                                                                   |
| ------------------- | ------------------------------------------------------------------------- |
| License             | schema:license or related (again, here we can leverage SHACL validation)  |
-| Community standards | Ocean InfoHub, POLDER, CCADI, GeoCODEs, Internet of Water                 |
+| Community standards | Ocean InfoHub, POLDER, CCADI, GeoCODES, Internet of Water                 |
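+
+As a concrete spot check of the License row above (a sketch only: the mc alias
+comes from the Docker walkthrough, and the bucket and object path are
+hypothetical placeholders for whatever your Gleaner run produced), you can pull
+a harvested JSON-LD document from the object store and look for a license:
+
+```bash
+# Hypothetical check for a license entry in one harvested JSON-LD object;
+# substitute a real bucket and object path from your own run.
+mc cat oih/gleaner/summoned/example/object.jsonld | jq '.license'
+```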

## Users

@@ -149,7 +149,7 @@ GeoCODES is an NSF Earthcube program effort to better enable cross-domain discov

[https://oceaninfohub.org/](https://oceaninfohub.org/)

-The Ocean InfoHub (OIH) Project aims to improve access to global oceans information, data and knowledge products for management and sustainable development.The OIH will link and anchor a network of regional and thematic nodes that will improve online access to and synthesis of existing global, regional and national data, information and knowledge resources, including existing clearinghouse mechanisms. The project will not be establishing a new database, but will be supporting discovery and interoperability of existing information systems.The OIH Project is a three-year project funded by the Government of Flanders, Kingdom of Belgium, and implemented by the IODE Project Office of the IOC/UNESCO.
+The Ocean InfoHub (OIH) Project aims to improve access to global oceans information, data and knowledge products for management and sustainable development. The OIH will link and anchor a network of regional and thematic nodes that will improve online access to and synthesis of existing global, regional and national data, information and knowledge resources, including existing clearinghouse mechanisms. The project will not be establishing a new database, but will be supporting discovery and interoperability of existing information systems. The OIH Project is a three-year project funded by the Government of Flanders, Kingdom of Belgium, and implemented by the IODE Project Office of the IOC/UNESCO.

* [OIH Book](https://book.oceaninfohub.org)
* [Example Validation](https://github.com/gleanerio/notebooks/blob/master/notebooks/validation/output/report_07-18-2022-15-11-18.pdf)
diff --git a/index.md b/index.md
index 32eb3df..c9dea88 100644
--- a/index.md
+++ b/index.md
@@ -17,15 +17,15 @@ Gleaner is a tool for extracting JSON-LD from web pages. You provide Gleaner a l

## Open Foundation

-Communities of practice can leverage open schema (schema.org) along with web architecture approaches to build domain search portals. Enhance and extend with community vocabularies to address specific domain needs. This foundation is also leveraged by Google Data Set Search and is complementary to that service. Web architecture as foundation allows a community to provide a more detailed community experiences, while still leveraging the global reach of commercial search indexes.
+Communities of practice can leverage open schema (schema.org) along with web architecture approaches to build domain search portals. These can be enhanced and extended with community vocabularies to address specific domain needs. This foundation is also leveraged by Google Dataset Search and is complementary to that service. Web architecture as a foundation allows a community to provide more detailed community experiences, while still leveraging the global reach of commercial search indexes.

## Big Picture

- Gleaner is part of the larger GleanerIO approach. GleanerIO includes approaches for leveraging spatial, semantic, full text or other index approaches.
-Additionally there is guidance on running Gleaner as part of a routinely updated index of resources and a reference interface for searching the resulting graph. GleanerIO provides a full stack approach to go from indexing to a basic user interface searching a generated Knowledge Graph, an example index. The whole GleanerIO stack can be run on a laptop (it uses Docker Compose files) or deployed to the cloud. Cloud environments used include AWS, Google Cloud, and OpenStack.
+Gleaner is part of the larger GleanerIO approach. GleanerIO includes approaches for leveraging spatial, semantic, full-text, or other indexes. Additionally, there is guidance on running Gleaner as part of a routinely updated index of resources and a reference interface for searching the resulting graph. GleanerIO provides a full stack approach that goes from indexing to a basic user interface for searching a generated Knowledge Graph (an example index). The whole GleanerIO stack can be run on a laptop (it uses Docker Compose files) or deployed to the cloud. Cloud environments used include AWS, Google Cloud, and OpenStack.

GleanerIO is also designed to play well with others. As long as packages work well in a web architecture framework, they likely can be integrated into the GleanerIO approach. The GleanerIO approach is modular and even Gleaner itself could be swapped out for other implementations.

-Indeed, GleanerIO advocates _principles over project_. GleanerIO is really just a set of principles for which reference implementations (projects) have been developed or external projects have been used. These have evolved and been implemented to address communities like Ocean InfoHub, Internet of Water, GeoCODES and more. The results and approaches of these communities are openly maintained at the GleanerIO GitHub Organization pages. They provide guidance on how yet other communities could leverage this approach to address their functional needs.
+Indeed, GleanerIO advocates _principles over project_. GleanerIO is really just a set of principles for which reference implementations (projects) have been developed or external projects have been used. These have evolved and been implemented to address communities like Ocean InfoHub, Internet of Water, GeoCODES and more. The results and approaches of these communities are openly maintained at the GleanerIO GitHub Organization pages. They provide guidance on how other communities could leverage this approach to address their functional needs.

## History

-Communities of practice can leverage open schema (schema.org) along with web architecture approaches to build domain search portals. Enhance and extend with community vocabularies to address specific domain needs. This foundation is also leveraged by Google Data Set Search and is complementary to that service. Web architecture as foundation allows a community to provide a more detailed community experiences, while still leveraging the global reach of commercial search indexes.
\ No newline at end of file
+Communities of practice can leverage open schema (schema.org) along with web architecture approaches to build domain search portals. These can be enhanced and extended with community vocabularies to address specific domain needs. This foundation is also leveraged by Google Dataset Search and is complementary to that service. Web architecture as a foundation allows a community to provide more detailed community experiences, while still leveraging the global reach of commercial search indexes.