flaxsearch · epugh · Sep 10, 2019
diff --git a/README.md b/README.md
@@ -16,20 +16,20 @@ Taxonomies and document classification
 --------------------------------------
 
 Clade taxonomies have a tree structure, with a single top-level category (e.g.
-in the example data, "Social Psychology"). There is no distinction between 
+in the example data, "Social Psychology"). There is no distinction between
 parent and child nodes (except that the former has children) and the hierarchical
 structure of the taxonomy is completely orthogonal from the node data. The
 structure may be freely edited.
 
 Each node represents a category, which is represented by a set of "keywords"
-(words or phrases) which should be present in a document belonging to that 
-category. Not all the keywords have to be present - they are joined with 
-Boolean OR rather than AND. A document may belong to multiple categories, 
+(words or phrases) which should be present in a document belonging to that
+category. Not all the keywords have to be present - they are joined with
+Boolean OR rather than AND. A document may belong to multiple categories,
 which are ranked according to standard Solr (TF-IDF) scoring. It is
 also possible to exclude certain keywords from categories.
 
 Clade will also suggest keywords to add to a category, based on the content of
-the documents already in the category. This feature is currently slow as it 
+the documents already in the category. This feature is currently slow as it
 uses the standard Solr MoreLikeThis component to analyse a large number of
 documents. We plan to improve this for a future release by writing a custom
 Solr plugin.
@@ -43,20 +43,20 @@ export the document categorisation to an XML or CSV file.
 Installation prerequisites
 --------------------------
 
-- Java 6 or 7
+- Java 6 or later
 
-- Python 2.6 or 2.7
+- Python 3
 
 - Solr 4.6 (also tested with 4.8.1)
     http://www.apache.org/dyn/closer.cgi/lucene/solr/4.6.2
     Other versions may also work, but we have not tested them with Clade.
 
 - Python modules
-    sunburnt:   http://opensource.timetric.com/sunburnt/installation.html
+    scorched:   https://github.com/lugensa/scorched
     lxml:       http://pypi.python.org/pypi/lxml/
-    httplib2:   http://code.google.com/p/httplib2/downloads/list
-	
-On Windows you can download binaries of httplib2 from http://www.lfd.uci.edu/~gohlke/pythonlibs/
+    pyjq:       https://github.com/doloopwhile/pyjq
+    httplib2
+
 
 Download the latest versions from these sites and follow the installation
 instructions. The system has been developed on Linux and the following instructions
@@ -69,32 +69,32 @@ as appropriate):
 
     $ export CFLAGS=-Qunused-arguments
     $ export CPPFLAGS=-Qunused-arguments
-    $ sudo -E pip install lxml	
+    $ sudo -E pip install lxml
 
 -------------------
 Windows quick start
 -------------------
 
 On Windows, you can use the demo.bat file to run all the various parts of Clade with the
-supplied example data. You will need to edit paths.bat to match your configuration first. 
+supplied example data. You will need to edit paths.bat to match your configuration first.
 
-The demo_noclassify.bat file also runs Clade, but doesn't repeat the initial classification 
+The demo_noclassify.bat file also runs Clade, but doesn't repeat the initial classification
 to avoid duplicates. If you need it, cleanup.bat clears all the indexes.
 
 ------------
 Example data
 ------------
 
-The Clade distribution includes an example taxonomy and documents derived 
-from Wikipedia and on the topic of Social Psychology. The taxonomy is 
+The Clade distribution includes an example taxonomy and documents derived
+from Wikipedia and on the topic of Social Psychology. The taxonomy is
 provided as a CSV file to illustrate how to import existing taxonomies.
 
 To import the taxonomy, run on the command line:
 
     $ python classify.py import data/socpsy.csv
-	
+
 	on Windows:
-	
+
     C:\> python classify.py import data\socpsy.csv
 
 This will create a Python data structure, and pickle it as data/tax.
@@ -103,61 +103,61 @@ To import the documents, first copy the Clade Solr configuration files into your
 Solr home, e.g.:
 
     $ cp -f clade/solr-conf/* apache-solr-4.6.2/example/solr/collection1/conf
-	
+
 	on Windows:
-	
+
     C:\> copy clade\solr-conf\*.* apache-solr-4.6.2\example\solr\collection1\conf
 
 Then [re]start Solr, e.g.:
 
-    $ cd apache-solr-4.6.2/example
+      $ cd apache-solr-4.6.2/example
     $ java -jar start.jar &
-	
+
 	on Windows:
 
     C:\> cd apache-solr-4.6.2\example
     C:\> java -jar start.jar &
 
 Now start the Stanford Named Entity Recognition server, which is used to pull
 names, places etc. out of the source data:
- 
-    $ cd stanford-ner-2011-09-14
+
+    $ cd stanford-ner-2018-10-16
     $ ./server.sh &
-	
+
 	or on Windows:
-	
-    C:\ cd stanford-ner-2011-09-14
+
+    C:\ cd stanford-ner-2018-10-16
     C:\ server.bat
 
 Finally, add the example documents (which are provided as plaintext files):
 
     $ python classify.py textdir data/socpsy-pages
-	
+
 	on Windows:
-	
+
     C:\> python classify.py textdir data\socpsy-pages
-	
-This will output the name of each file as it is processed, and will take a 
-few minutes to complete. 
+
+This will output the name of each file as it is processed, and will take a
+few minutes to complete.
 
 
 --------------
 Running the UI
 --------------
 
 The Clade UI is implemented as a web application. To start it, run:
- 
+
     $ python server.py
-	
+
 	on Windows:
-	
+
     C:\> python server.py
 
-Then point a browser at 
+Then point a browser at
 
     http://localhost:8080/
-    
-    
+
+
 ----------------------
 Taxonomy import/export
 ----------------------
@@ -169,9 +169,9 @@ optionally the document categorisations) to CSV or XML.
 To import from CSV:
 
     $ python classify.py import taxonomy.csv
-	
+
 	on Windows:
-	
+
     C:\> python classify.py import taxonomy.csv
 
 To import from XML, use the .xml file extension. The extension must be either .csv
@@ -180,22 +180,22 @@ or .xml and is used to determine the type of file read.
 To export to CSV:
 
     $ python classify.py export taxonomy.csv
-	
+
 	on Windows:
-	
+
     C:\> python classify.py export taxonomy.csv
-    
+
 To export to XML, use the .xml file extension. Again, the extension determines the
 type of file written. When exporting to XML, document classifications may
 be written to the file by specifying the number of document ids that should be
 exported per category, e.g.:
 
     $ python classify.py export taxonomy.xml 100
-	
+
 	on Windows:
-	
+
     C:\> python classify.py export taxonomy.xml 100
-    
+
 
 -----------
 UI Controls
@@ -206,14 +206,14 @@ The Clade UI has two modes: Taxonomy and Document. It starts up in the former.
 Taxonomy mode
 -------------
 
-The page is divided into two halves: the taxonomy tree on the left, and 
+The page is divided into two halves: the taxonomy tree on the left, and
 information about the current selected category on the right. Clade can support
 multiple taxonomies, which can be selected from the drop-down in the upper left.
 Selecting a taxonomy will load it into the tree view, where nodes can be
 expanded or collapsed, selected and manipulated.
 
-The taxonomy tree display has three small icons in the upper right. The + 
-icon causes a new category to be added as a child of the currently selected 
+The taxonomy tree display has three small icons in the upper right. The +
+icon causes a new category to be added as a child of the currently selected
 node. The "pen" icon allows the current node name to be edited, and the X
 icon deletes the current node (with no warning!)
 
@@ -240,7 +240,7 @@ category, showing the document ID, the title, the current rank and the previous
 rank (before changing the active keywords). To view a document, click the title,
 which will switch the UI into document mode.
 
-To add a new taxonomy, make sure the taxonomy drop-down has no taxonomy 
+To add a new taxonomy, make sure the taxonomy drop-down has no taxonomy
 selected, then click the Create button. Enter a name for the new taxonomy
 and click OK. You can then create the taxonomy by adding nodes to the root
 node. All changes will be immediately saved to the data/tax file.
@@ -262,12 +262,11 @@ Document mode
 -------------
 
 Again, in document mode the page has a left section and a right section. On the
-left is a list of taxonomy categories which match the document, ranked in 
-decreasing score order. The document ID and full text are displayed on the 
+left is a list of taxonomy categories which match the document, ranked in
+decreasing score order. The document ID and full text are displayed on the
 right, together with a list of keywords from the selected category. The document
 text is highlighted for matching keywords.
 
 To go back to the taxonomy display, click the Back button in the upper right.
-Don't use your browser back button - it won't work, as the UI is implemented 
+Don't use your browser back button - it won't work, as the UI is implemented
 in Javascript/AJAX.
-