materialsintelligence · cs464osu · Jan 4, 2019 · Apr 8, 2019
diff --git a/README.md b/README.md
@@ -1,25 +1,163 @@
 <img src="docs/matscholar_logo.png" alt="matscholar logo" width="300px">
 
-`matscholar` (Materials Scholar) is a Python library for materials-focused natural language processing (NLP). It is maintained by a team of researchers at UC Berkeley and Lawrence Berkeley National Laboratory as part of a project funded by the Toyota Research Institute.
+`matscholar` (Materials Scholar) is a Python library for materials-focused natural language 
+processing (NLP). It is maintained by a team of researchers at UC Berkeley and Lawrence Berkeley 
+National Laboratory as part of a project funded by the Toyota Research Institute.
 
-This library provides a Python interface for interacting with the Materials Scholar API, performing basic NLP tasks on scientific text, and example notebooks on using these tools for materials discovery and design.
+This library provides a Python interface for interacting with the Materials Scholar API, performing
+basic NLP tasks on scientific text, and example notebooks on using these tools for materials 
+discovery and design.
 
 
 ## Setup
 
-We *highly* recommend using a [conda environment](https://conda.io/docs/user-guide/tasks/manage-environments.html) when working with materials scholar tools.
+We *highly* recommend using a [conda environment](https://conda.io/docs/user-guide/tasks/manage-environments.html) 
+when working with materials scholar tools.
 
 1. Clone or download this repo
 2. Navigate to the root directory (matscholar)
 3. `pip install -r requirements.txt`
-4. `pip install .` [or](https://stackoverflow.com/questions/15724093/difference-between-python-setup-py-install-and-pip-install) `python setup.py install`
+4. `pip install .` [or](https://stackoverflow.com/questions/15724093/difference-between-python-setup-py-install-and-pip-install) 
+`python setup.py install`
 
 
 ## Configuring Your API Key
 The Materials Scholar API can only be accessed by providing an API key in `x-api-key` request header field. 
 To receive an API key to access the Materials Scholar API, please contact John Dagdelen at jdagdelen@lbl.gov.
 
-Once you have an API key, you can add it as an environment variable `MATSCHOLAR_API_KEY` for ease of use. 
+## API Usage
+
+For convenience, the Materials Scholar API can be accessed via a Python wrapper.
+
+### Instantiating the Rester
+
+If an API key has already been obtained, the `Rester` can be instantiated as follows:
+
+```python
+from matscholar.rest import Rester
+
+rester = Rester(api_key="your-api-key", endpoint="api.matscholar.com")
+```
+
+To avoid passing the API key and endpoint as arguments, set the following environment variables 
+for ease of use: `MATSCHOLAR_API_KEY`, `MATERIALS_SCHOLAR_ENDPOINT`.
+
+### Resources
+
+The methods of the Rester class can be used to access resources of the Materials Scholar API.
+
+**Searching documents**
+
+Our corpus of materials science abstracts can be searched by text matching 
+(Elasticsearch) or by filtering on the named entities extracted from each document. 
+Entity-based searches support the following entity types: material, property, application, 
+descriptor, characterization, synthesis, phase.
+
+To get the raw text of abstracts matching a given query:
+
+```python
+# text match for "solid oxide fuel cells"
+example_text = "solid oxide fuel cells"
+
+# entity filters: include documents mentioning BaZrO3 and nanoparticles; 
+# exclude documents mentioning thin films
+example_entities = {"material": ["BaZrO3"], "descriptor": ["nanoparticle", "-thin film"]}
+
+docs = rester.search_documents(text=example_text, filters=example_entities)
+```
+
+This will return a list of dictionaries containing the raw text of each abstract along with 
+its associated metadata.
+
+**Searching entities**
+
+We have extracted materials science named entities from nearly 3.5 million materials science
+abstracts. Details on how this was performed can be found in Ref. [1].
+
+The extracted named entities for each document matching a query are returned by the 
+`search_entities` method. This method takes as input a dictionary with entity types as keys and
+lists of entities as values. For example, to find all of the entities that co-occur with the
+material "GaN":
+
+```python
+docs = rester.search_entities(query={"material": ["GaN"]})
+```
+
+This will return a list of dictionaries representing documents matching the query; each dict
+contains the DOI as well as each unique entity found in the corresponding abstract.
+
+A summary of the entities associated with a query can be generated using the `search_entities_summary`
+method. To get statistics for entities co-occurring with GaN:
+
+```python
+summary = rester.search_entities_summary(query={"material": ["GaN"]})
+```
+This will return a dictionary with entity types as keys; each value is a list of the top entities
+that occur in documents matching the query, with each item in the form [entity, document count, fraction].
+
+To perform a fast literature review, use the `search_materials_by_entities` method. For a chosen
+application, it returns a list of all materials that co-occur with that application in our corpus.
+For example, to see which materials co-occur with the word "thermoelectric" in a document:
+
+```python
+mat_list = rester.search_materials_by_entities(["thermoelectric"], elements=["-Pb"], cutoff=None)
+```
+
+The above search will find all materials co-occurring with "thermoelectric" that do not contain lead.
+The result will be a list in which each element is a list of [material, co-occurrence count, co-occurrence DOIs].
+
+**Word embeddings**
+
+Materials science word embeddings were trained using word2vec; details on how the embeddings were
+trained, and their application in materials science discovery, can be found in Ref. [2].
+
+To get the word embedding for a given word:
+```python
+embedding = rester.get_embedding("photovoltaics")
+```
+
+This will return a dict containing the embedding. The word embedding will be a 200-dimensional array.
+
+The rester also has a `get_close_words` method (based on the cosine similarity of embeddings), which
+can be used to explore the semantic similarity of materials science terms; this approach can also be
+used to discover materials for a new application (as outlined in Ref. [2]).
+
+To find words with an embedding similar to that of "photovoltaics":
+
+```python
+close_words = rester.get_close_words("photovoltaics", top_k=1000)
+```
+
+This will return the 1000 closest words to photovoltaics. The result will be a dictionary containing 
+the close words and their cosine similarity to the input word. 
+
+**Named Entity Recognition**
+
+In addition to the pre-processed entities present in our corpus, users can perform Named Entity 
+Recognition on any raw materials science text. The details of the model can be found in Ref. [1].
+
+The input should be a list of documents with the text represented as a string:
+
+```python
+doc_1 = "The band gap of TiO2 is 3.2 eV. This was measured via photoluminescence"
+doc_2 = "We deposit GaN thin films using MOCVD"
+docs = [doc_1, doc_2] 
+tagged_docs = rester.perform_ner(docs, return_type="concatenated")
+```
+
+The argument `return_type` may be set to `"iob"`, `"concatenated"`, or `"normalized"`; `"normalized"`
+replaces entities with their most frequently occurring synonym. A list of tagged documents will be
+returned. Each doc is a list of sentences; each sentence is a list of (word, tag) pairs.
+
+## Citation
+
+If you use any of the API functionality in your research, please consider citing the following papers
+where relevant:
+
+[1] Weston et al., coming soon
+
+[2] Tshitoyan et al., Nature (accepted)
+
 
 ## Contributors
 @jdagdelen, @vtshitoyan, @lweston
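In the README above, `search_materials_by_entities(["thermoelectric"], elements=["-Pb"])` filters materials by element on the server. As a rough local illustration of the include/exclude semantics it describes (plain symbols must be present, `-`-prefixed symbols must be absent), here is a minimal Python sketch. The helper names, the regex-based formula parsing, and the sample rows are all hypothetical; the rows only mimic the documented [material, co-occurrence count, co-occurrence DOIs] shape, and this is not the API's implementation.

```python
import re

# Element symbols in a simple formula: one uppercase letter, optionally
# followed by one lowercase letter ("Pb", "Te", "O"). Digits are skipped.
ELEMENT_RE = re.compile(r"[A-Z][a-z]?")


def elements_in(formula):
    """Return the set of element symbols appearing in a formula like 'Bi2Te3'."""
    return set(ELEMENT_RE.findall(formula))


def passes_element_filter(formula, elements):
    """Hypothetical local check: plain symbols must be present,
    symbols prefixed with '-' must be absent."""
    present = elements_in(formula)
    for spec in elements:
        if spec.startswith("-"):
            if spec[1:] in present:
                return False
        elif spec not in present:
            return False
    return True


def filter_materials(rows, elements):
    """Keep rows whose formula passes the filter, highest count first.

    Each row is assumed to look like [material, co-occurrence count, dois].
    """
    kept = [row for row in rows if passes_element_filter(row[0], elements)]
    return sorted(kept, key=lambda row: row[1], reverse=True)


# Hypothetical rows: counts are made up and DOI lists are left empty.
rows = [
    ["PbTe", 120, []],
    ["Bi2Te3", 300, []],
    ["SnSe", 80, []],
]
print([row[0] for row in filter_materials(rows, ["-Pb"])])  # ['Bi2Te3', 'SnSe']
```

The regex does not validate symbols or handle hydrates and nested groups specially, so a dedicated composition parser would be more robust for real data.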
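The README describes `get_close_words` as ranking terms by the cosine similarity of their word2vec embeddings. The API does this server-side; the sketch below only shows the ranking idea in pure Python, assuming embeddings are available locally as a dict mapping words to vectors. The function names and the toy 3-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def close_words(word, embeddings, top_k=10):
    """Return up to top_k other words ranked by cosine similarity to `word`."""
    target = embeddings[word]
    scores = {
        other: cosine_similarity(target, vector)
        for other, vector in embeddings.items()
        if other != word
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:top_k])


# Toy 3-dimensional vectors (hypothetical); the README says the real
# embeddings are 200-dimensional.
toy_embeddings = {
    "photovoltaics": [1.0, 0.9, 0.0],
    "solar_cell": [0.9, 1.0, 0.1],
    "battery": [0.0, 0.2, 1.0],
}
print(close_words("photovoltaics", toy_embeddings, top_k=1))
```

For a vocabulary of realistic size, normalizing every vector once and ranking with a single matrix-vector product would be far faster than this loop.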
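According to the README, `perform_ner` returns each document as a list of sentences, and each sentence as a list of (word, tag) pairs. If `return_type="iob"` yields standard IOB tags, consecutive tokens can be collapsed into entity spans as sketched below. The tag format (`B-TYPE`, `I-TYPE`, `O`) and the abbreviations `MAT`, `PRO`, and `DSC` in the sample document are assumptions for illustration, not confirmed API output.

```python
def iob_to_entities(tagged_doc):
    """Collapse IOB-tagged tokens into (entity_text, entity_type) spans.

    Assumes tags of the form 'B-TYPE', 'I-TYPE' and 'O'.
    """
    entities = []
    for sentence in tagged_doc:
        tokens, label = [], None
        for word, tag in sentence:
            if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
                # A new span starts; flush any span in progress first.
                if tokens:
                    entities.append((" ".join(tokens), label))
                tokens, label = [word], tag[2:]
            elif tag.startswith("I-"):
                # Continuation of the current span.
                tokens.append(word)
            else:
                # 'O' (or any unrecognized tag) closes the current span.
                if tokens:
                    entities.append((" ".join(tokens), label))
                tokens, label = [], None
        if tokens:
            entities.append((" ".join(tokens), label))
    return entities


# Hypothetical IOB output for two short documents' worth of sentences.
doc = [
    [("The", "O"), ("band", "B-PRO"), ("gap", "I-PRO"), ("of", "O"),
     ("TiO2", "B-MAT"), ("is", "O"), ("3.2", "O"), ("eV", "O"), (".", "O")],
    [("We", "O"), ("deposit", "O"), ("GaN", "B-MAT"),
     ("thin", "B-DSC"), ("films", "I-DSC")],
]
print(iob_to_entities(doc))
```

An `I-` tag that does not continue a span of the same type is treated as the start of a new span, which is a common lenient convention for IOB decoding.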
diff --git a/matscholar/rest.py b/matscholar/rest.py
@@ -12,7 +12,7 @@
 """
 
 __author__ = "John Dagdelen"
-__credits__ = "Shyue Ping Ong, Shreyas Cholia, Anubhav Jain"
+__credits__ = "Leigh Weston, Amalie Trewartha, Vahe Tshitoyan"
 __copyright__ = "Copyright 2018, Materials Intelligence"
 __version__ = "0.1"
 __maintainer__ = "John Dagdelen"
@@ -66,6 +66,7 @@ def __exit__(self, exc_type, exc_val, exc_tb):
     def _make_request(self, sub_url, payload=None, method="GET"):
         response = None
         url = self.preamble + sub_url
+        print(url)
         try:
             if method == "POST":
                 response = self.session.post(url, json=payload, verify=True)
@@ -88,7 +89,7 @@ def _make_request(self, sub_url, payload=None, method="GET"):
                 if hasattr(response, "content") else str(ex)
             raise MatScholarRestError(msg)
 
-    def materials_search(self, positive, negative=None, ignore_missing=True, top_k=10):
+    def search_materials(self, positive, negative=None, ignore_missing=True, top_k=10):
         """
         Given input strings or lists of positive and negative words / phrases, returns a ranked list of materials with
         corresponding scores and numbers of mentions
@@ -111,7 +112,7 @@ def materials_search(self, positive, negative=None, ignore_missing=True, top_k=1
 
         return self._make_request(sub_url, payload=payload, method=method)
 
-    def close_words(self, positive, negative=None, ignore_missing=True, top_k=10):
+    def get_close_words(self, positive, negative=None, ignore_missing=True, top_k=10):
         """
         Given input strings or lists of positive and negative words / phrases, returns a list of most similar words /
         phrases according to cosine similarity
@@ -187,46 +188,124 @@ def materials_map(self, highlight, limit=None, ignore_missing=True, number_to_su
 
         return self._make_request(sub_url, payload=payload, method=method)
 
-    def search_ents(self, query):
-        '''
+    def search_entities(self, query):
+        """
         Get the entities in each document associated with a given query
 
         :param query: dict; e.g., {'material': ['GaN', '-InN']), 'application': ['LED']}
         :return: list of dicts; each dict represents a document and contains the extracted entities
-        '''
-        method = 'POST'
-        sub_url = '/ent_search'
+        """
+
+        method = "POST"
+        sub_url = "/ent_search"
         payload = query
 
         return self._make_request(sub_url, payload=payload, method=method)
 
-    def get_summary(self, query):
+    def get_close_journals(self, query):
         '''
+        Suggests journals whose content is most similar to the given text.
+        :param query: string; a paragraph of text, e.g., an abstract
+        :return: list; [['journal name', cosine similarity], ...]
+        '''
+
+        method = 'POST'
+        sub_url = '/journal_suggestion'
+        payload = {'abstract': query}
+
+        return self._make_request(sub_url, payload=payload, method=method)
+
+
+    def search_entities_summary(self, query):
+        """
         Get a summary of the entities associated with a given query
 
         :param query: dict; e.g., {'material': ['GaN', '-InN']), 'application': ['LED']}
         :return: dict; a summary dict with keys for each entity type
-        '''
-        method = 'POST'
-        sub_url = '/ent_search/summary'
+        """
+
+        method = "POST"
+        sub_url = "/ent_search/summary"
         payload = query
 
         return self._make_request(sub_url, payload=payload, method=method)
 
+    def get_close_materials(self, material):
+        """
+        Finds the most similar compositions in the corpus.
+
+        :param material: string; a chemical composition
+        :return: list; the most similar compositions
+        """
+        method = "GET"
+        sub_url = '/materials/similar/{}'.format(material)
+        return self._make_request(sub_url, method=method)
+
+    def perform_ner(self, docs, return_type="concatenated"):
+        """
+        Performs Named Entity Recognition.
+
+        :param docs: list; a list of documents; each document is represented as a single string
+        :param return_type: string; output format, can be "iob", "concatenated", or "normalized"
+        :return: list; tagged documents
+        """
+
+        method = "POST"
+        sub_url = "/ner"
+        payload = {
+            "docs": docs,
+            "return_type": return_type
+        }
+        return self._make_request(sub_url, payload=payload, method=method)
+
+    def search_materials_by_entities(self, entities, elements, cutoff=None):
+        """
+        Finds materials that co-occur with specified entities. The returned materials can be screened
+        by specifying elements that must be included/excluded from the stoichiometry.
+
+        :param entities: list of strings; each string is a property or application
+        :param elements: list of strings; each string is a chemical element. Materials
+        will only be returned if they contain these elements; conversely, materials
+        containing an element can be excluded by placing a minus sign in front of the
+        element, e.g., "-Ti"
+        :param cutoff: int or None; if int, specifies the number of materials to
+        return; if None, returns all materials
+        :return: list; a list of chemical compositions
+        """
+
+        method = "POST"
+        sub_url = "/search/material_search"
+        payload = {
+            "entities": entities,
+            "elements": elements,
+            "cutoff": cutoff
+        }
+        return self._make_request(sub_url, payload=payload, method=method)
+
+    def search_documents(self, text, filters, cutoff=None):
+        """
+        Search abstracts by text, with filters on the extracted entities.
+        :param text: string; text to search
+        :param filters: dict; e.g., {'material': ['GaN', '-InN'], 'application': ['LED']}
+        :param cutoff: int or None; if int, specifies the number of matches to
+        return; if None, returns all matches
+        :return: list; a list of dicts, each with the raw text and metadata of a matching abstract
+        """
+
+        method = "POST"
+        sub_url = "/search"
+        query = dict(filters, text=text)  # copy so the caller's filters dict is not mutated
+        payload = {
+            "query": query,
+            "limit": cutoff
+        }
+
+        return self._make_request(sub_url, payload=payload, method=method)
+
 
 class MatScholarRestError(Exception):
     """
     Exception class for MatstractRester.
     Raised when the query has problems, e.g., bad query format.
     """
     pass
-
-
-if __name__ == '__main__':
-    query = {
-        'material' : ['GaN', '-InN'],
-        'application' : ['LED']
-    }
-    query = json.dumps(query)
-    rest = Rester()
-    print(rest.get_summary(query))
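For reference, a request shaped like the ones `_make_request` sends can be sketched with the standard library alone. This is not the `Rester` implementation: the `https://` scheme, the way the endpoint is joined to `sub_url`, and the helper names `build_request` and `search_documents_payload` are assumptions. Only the `x-api-key` header, the `MATSCHOLAR_API_KEY` and `MATERIALS_SCHOLAR_ENDPOINT` variable names, and the `/search` payload keys come from the diff above. The payload helper builds the query from a copy of `filters`, so the caller's dict is left unchanged.

```python
import json
import os
import urllib.request


def build_request(sub_url, payload=None, method="GET", api_key=None, endpoint=None):
    """Build (but do not send) a request shaped like a Rester call.

    Assumptions: the key travels in the x-api-key header, the key and
    endpoint fall back to environment variables, and the URL is
    'https://' + endpoint + sub_url.
    """
    api_key = api_key or os.environ.get("MATSCHOLAR_API_KEY")
    endpoint = endpoint or os.environ.get("MATERIALS_SCHOLAR_ENDPOINT")
    if not api_key or not endpoint:
        raise ValueError("an API key and an endpoint are required")
    url = "https://" + endpoint + sub_url  # assumed URL layout
    headers = {"x-api-key": api_key}
    data = None
    if method == "POST":
        # JSON body, analogous to passing json=payload to a requests session.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def search_documents_payload(text, filters, cutoff=None):
    """Hypothetical /search payload builder that copies `filters`."""
    return {"query": dict(filters, text=text), "limit": cutoff}


filters = {"material": ["BaZrO3"], "descriptor": ["nanoparticle", "-thin film"]}
req = build_request(
    "/search",
    payload=search_documents_payload("solid oxide fuel cells", filters),
    method="POST",
    api_key="your-api-key",
    endpoint="api.matscholar.com",
)
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`, which is left out here so the sketch runs without network access or a real key.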