Commit 4781d91

Remove v1 API support and transition to v2 exclusively (#521)
* refactor: remove v1 API deprecation middleware and v1 configuration settings
* refactor: restrict API version routing to v2 only in urls.py
* refactor: remove V1 support and hardcode V2 in OrganizationViewSet and related views
* refactor: remove rorapi v1 models, serializers, and index template
* refactor: remove v1 support and consolidate matching and queries to v2
* refactor: remove version parameter from check_ror_id and its calls
* test: delete v1 tests and refactor v2 unit tests to remove versioning
* docs: update README to default to v2 schema indexing and endpoints
* refactor: improve file handling in BulkUpdate and clean up unused vars
1 parent fd61fd5 commit 4781d91

30 files changed

Lines changed: 231 additions & 3695 deletions

README.md

Lines changed: 7 additions & 9 deletions
@@ -35,11 +35,11 @@ ROR staff should replace values in [] with valid credential values. External use

 3. Index the latest ROR dataset from https://github.com/ror-community/ror-data

-docker-compose exec web python manage.py setup v1.0-2022-03-17-ror-data -s 1
+docker-compose exec web python manage.py setup v1.0-2022-03-17-ror-data -s 2

 *Note: You must specify a dataset that exists in [ror-data](https://github.com/ror-community/ror-data)*

-4. <http://localhost:9292/organizations>.
+4. <http://localhost:9292/v2/organizations>.

 5. Optionally, start other services, such as [ror-app](https://github.com/ror-community/ror-app) (the search UI) or [generate-id](https://github.com/ror-community/generate-id) (middleware microservice)

@@ -64,9 +64,9 @@ Used in the data deployment process managed in [ror-records](https://github.com/

 docker-compose up -d

-3. Index the latest v1 ROR dataset from https://github.com/ror-community/ror-data . To index a v2 dataset, see [Indexing v2 data below](#indexing-v2-data)
+3. Index the latest ROR dataset from https://github.com/ror-community/ror-data (see [Indexing v2 data](#indexing-v2-data) below):

-docker-compose exec web python manage.py setup v1.0-2022-03-17-ror-data -s 1
+docker-compose exec web python manage.py setup v1.0-2022-03-17-ror-data -s 2

 *Note: You must specify a dataset that exists in [ror-data](https://github.com/ror-community/ror-data)*

@@ -92,19 +92,17 @@ To delete the existing index, create a new index and index a data dump:

 **LOCALHOST:** Run

-docker-compose exec web python manage.py setup v1.0-2022-03-17-ror-data -s 1
+docker-compose exec web python manage.py setup v1.0-2022-03-17-ror-data -s 2

 **DEV/STAGING/PROD:** Access the running ror-api container and run:

-python manage.py setup v1.0-2022-03-17-ror-data -s 1
+python manage.py setup v1.0-2022-03-17-ror-data -s 2

 *Note: You must specify a dataset that exists in [ror-data](https://github.com/ror-community/ror-data)*

 #### Indexing v2 data

-The `-s` argument specifies which schema version to index. To index a v2 data dump, use `-s 2`. To index both v1 and v2 at the same time, omit the `-s` option.
-
-Note that a v2 formatted JSON file must exist in the zip file for the specified data dump version. Currently, v2 files only exist in [ror-community/ror-data-test](https://github.com/ror-community/ror-data-test). To index a data dump from ror-data-test rather than ror-data, add the `-t` option to the setup command, ex:
+The API uses the v2 schema only. Use `-s 2` when indexing a data dump. A v2 formatted JSON file must exist in the zip file for the specified data dump version. Currently, v2 files only exist in [ror-community/ror-data-test](https://github.com/ror-community/ror-data-test). To index a data dump from ror-data-test rather than ror-data, add the `-t` option to the setup command, ex:

 python manage.py setup v1.32-2023-09-14-ror-data -s 2 -t
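
The setup invocations in the README diff share one shape: a data dump name, `-s 2`, and optionally `-t` to read from ror-data-test. As a minimal sketch, assuming a hypothetical helper named `build_setup_command` (it is not part of ror-api), the argument list could be assembled as follows. Only the flags shown in the README are used.

```python
import shlex


def build_setup_command(dump_name, use_test_repo=False):
    """Build the argv for the `setup` management command (hypothetical helper).

    `-s 2` selects the v2 schema; `-t` reads the dump from ror-data-test
    instead of ror-data. Both flags are taken from the README text.
    """
    if not dump_name:
        raise ValueError("a data dump name is required")
    argv = ["python", "manage.py", "setup", dump_name, "-s", "2"]
    if use_test_repo:
        argv.append("-t")
    return argv


if __name__ == "__main__":
    # Print a copy-pastable command line; nothing is executed here.
    print(shlex.join(build_setup_command("v1.32-2023-09-14-ror-data", use_test_repo=True)))
```

The helper only builds the command. Running it inside the container, for example through `docker-compose exec web`, is left to the caller.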

rorapi/common/create_update.py

Lines changed: 1 addition & 1 deletion
@@ -76,7 +76,7 @@ def new_record_from_json(json_input, version):
     if not error:
         new_record['locations'] = updated_locations
         new_record = add_created_last_mod(new_record)
-        new_ror_id = check_ror_id(version)
+        new_ror_id = check_ror_id()
         print("new ror id: " + new_ror_id)
         new_record['id'] = new_ror_id
         error, valid_data = validate_record(sort_list_fields(new_record), V2_SCHEMA)

rorapi/common/es_utils.py

Lines changed: 2 additions & 5 deletions
@@ -6,11 +6,8 @@
 class ESQueryBuilder:
     """Elasticsearch query builder class"""

-    def __init__(self, version):
-        if version == "v2":
-            self.search = Search(using=ES7, index=ES_VARS["INDEX_V2"])
-        else:
-            self.search = Search(using=ES7, index=ES_VARS["INDEX_V1"])
+    def __init__(self):
+        self.search = Search(using=ES7, index=ES_VARS["INDEX_V2"])
         self.search = self.search.extra(track_total_hits=True)
         self.search = self.search.params(search_type="dfs_query_then_fetch")
1613

rorapi/common/matching.py

Lines changed: 29 additions & 49 deletions
@@ -6,7 +6,6 @@

 from rorapi.common.models import Errors
 from rorapi.common.es_utils import ESQueryBuilder
-from rorapi.v1.models import MatchingResult as MatchingResultV1
 from rorapi.v2.models import MatchingResult as MatchingResultV2

 from collections import namedtuple
@@ -200,25 +199,16 @@ def get_similarity(aff_sub, cand_name):
     return comparfun(aff_sub, cand_name) / 100


-def get_score(candidate, aff_sub, countries, version):
+def get_score(candidate, aff_sub, countries):
     """Calculate the similarity between the affiliation substring
     and the candidate, using all name versions."""
-    if version == "v2":
-        country_code = candidate.locations[0].geonames_details.country_code
-        all_names = [
-            name["value"] for name in candidate.names if "acronym" not in name["types"]
-        ]
-        acronyms = [
-            name["value"] for name in candidate.names if "acronym" in name["types"]
-        ]
-    else:
-        country_code = candidate.country.country_code
-        all_names = (
-            [candidate.name]
-            + [l.label for l in candidate.labels]
-            + list(candidate.aliases)
-        )
-        acronyms = candidate.acronyms
+    country_code = candidate.locations[0].geonames_details.country_code
+    all_names = [
+        name["value"] for name in candidate.names if "acronym" not in name["types"]
+    ]
+    acronyms = [
+        name["value"] for name in candidate.names if "acronym" in name["types"]
+    ]

     if countries and to_region(country_code) not in countries:
         return 0
@@ -239,11 +229,11 @@ def get_score(candidate, aff_sub, countries, version):
 MatchedOrganization.__new__.__defaults__ = (False, None, None, 0, None)


-def match_by_query(text, matching_type, query, countries, version):
+def match_by_query(text, matching_type, query, countries):
     """Match affiliation text using specific ES query."""
     candidates = query.execute()
     scores = [
-        (candidate, get_score(candidate, text, countries, version))
+        (candidate, get_score(candidate, text, countries))
         for candidate in candidates
     ]
     if not candidates:
@@ -262,11 +252,10 @@ def match_by_query(text, matching_type, query, countries, version):
     return chosen, all_matched


-def match_by_type(text, matching_type, countries, version):
+def match_by_type(text, matching_type, countries):
     """Match affiliation text using specific matching mode/type."""

-    fields_v1 = ["name.norm", "aliases.norm", "labels.label.norm"]
-    fields_v2 = ["names.value.norm"]
+    fields = ["names.value.norm"]
     substrings = []
     if matching_type == MATCHING_TYPE_HEURISTICS:
         h1 = re.search(r"University of ([^\s]+)", text)
@@ -289,12 +278,7 @@ def match_by_type(text, matching_type, countries, version):
     else:
         substrings.append(text)

-    queries = [ESQueryBuilder(version) for _ in substrings]
-
-    if version == "v2":
-        fields = fields_v2
-    else:
-        fields = fields_v1
+    queries = [ESQueryBuilder() for _ in substrings]

     for s, q in zip(substrings, queries):
         if matching_type == MATCHING_TYPE_PHRASE:
@@ -309,7 +293,7 @@ def match_by_type(text, matching_type, countries, version):
             q.add_common_query(fields, normalize(text))
     queries = [q.get_query() for q in queries]
     matched = [
-        match_by_query(t, matching_type, q, countries, version)
+        match_by_query(t, matching_type, q, countries)
         for t, q in zip(substrings, queries)
     ]
     if not matched:
@@ -327,16 +311,15 @@ class MatchingNode:
     """Matching node class. Represents a substring of the original affiliation
     that potentially could be matched to an organization."""

-    def __init__(self, text, version):
+    def __init__(self, text):
         self.text = text
-        self.version = version
         self.matched = None
         self.all_matched = []

     def match(self, countries, min_score):
         for matching_type in NODE_MATCHING_TYPES:
             chosen, all_matched = match_by_type(
-                self.text, matching_type, countries, self.version
+                self.text, matching_type, countries
             )
             self.all_matched.extend(all_matched)
             if self.matched is None:
@@ -388,20 +371,19 @@ class MatchingGraph:
     This prevents matching an organization to a substring and another
     organization to the substring's substring."""

-    def __init__(self, affiliation, version):
+    def __init__(self, affiliation):
         self.nodes = []
-        self.version = version
         self.affiliation = affiliation
         affiliation = re.sub("&amp;", "&", affiliation)
         affiliation_cleaned = clean_search_string(affiliation)
-        n = MatchingNode(affiliation_cleaned, self.version)
+        n = MatchingNode(affiliation_cleaned)
         self.nodes.append(n)
         for part in [s.strip() for s in re.split("[,;:]", affiliation)]:
             part_cleaned = clean_search_string(part)
             do_not_match = check_do_not_match(part_cleaned)
             # do not perform search if substring exactly matches a country name or ISO code
             if do_not_match == False:
-                n = MatchingNode(part_cleaned, self.version)
+                n = MatchingNode(part_cleaned)
                 self.nodes.append(n)

     def remove_low_scores(self, min_score):
@@ -422,7 +404,7 @@ def match(self, countries, min_score):
             ]:
                 chosen.append(node.matched)
         acr_chosen, acr_all_matched = match_by_type(
-            self.affiliation, MATCHING_TYPE_ACRONYM, countries, self.version
+            self.affiliation, MATCHING_TYPE_ACRONYM, countries
         )
         all_matched.extend(acr_all_matched)
         return chosen, all_matched
@@ -492,33 +474,31 @@ def get_output(chosen, all_matched, active_only):
     return sorted(output, key=lambda x: x.score, reverse=True)[:100]


-def check_exact_match(affiliation, countries, version):
-    qb = ESQueryBuilder(version)
+def check_exact_match(affiliation, countries):
+    qb = ESQueryBuilder()
     qb.add_string_query('"' + affiliation + '"')
     return match_by_query(
-        affiliation, MATCHING_TYPE_EXACT, qb.get_query(), countries, version
+        affiliation, MATCHING_TYPE_EXACT, qb.get_query(), countries
     )


-def match_affiliation(affiliation, active_only, version):
+def match_affiliation(affiliation, active_only):
     countries = get_countries(affiliation)
-    exact_chosen, exact_all_matched = check_exact_match(affiliation, countries, version)
+    exact_chosen, exact_all_matched = check_exact_match(affiliation, countries)
     if exact_chosen.score == 1.0:
         return get_output(exact_chosen, exact_all_matched, active_only)
     else:
-        graph = MatchingGraph(affiliation, version)
+        graph = MatchingGraph(affiliation)
         chosen, all_matched = graph.match(countries, MIN_CHOSEN_SCORE)
         return get_output(chosen, all_matched, active_only)


-def match_organizations(params, version):
+def match_organizations(params):
     if "affiliation" in params:
         active_only = True
         if "all_status" in params:
             if params["all_status"] == "" or params["all_status"].lower() == "true":
                 active_only = False
-        matched = match_affiliation(params.get("affiliation"), active_only, version)
-        if version == "v2":
-            return None, MatchingResultV2(matched)
-        return None, MatchingResultV1(matched)
+        matched = match_affiliation(params.get("affiliation"), active_only)
+        return None, MatchingResultV2(matched)
     return Errors('"affiliation" parameter missing'), None
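
With the v1 branch gone, `get_score` always reads names and the country code from the v2 record shape. The sketch below isolates that extraction so it can be run without Elasticsearch. The helper names `split_names` and `country_code_of` and the sample record are hypothetical and exist only for illustration. The two list comprehensions and the `locations[0].geonames_details.country_code` lookup are copied from the diff.

```python
from types import SimpleNamespace


def split_names(candidate):
    """Return (all_names, acronyms) for a v2-style candidate (hypothetical helper)."""
    all_names = [
        name["value"] for name in candidate.names if "acronym" not in name["types"]
    ]
    acronyms = [
        name["value"] for name in candidate.names if "acronym" in name["types"]
    ]
    return all_names, acronyms


def country_code_of(candidate):
    """Read the country code from the first location, as get_score does."""
    return candidate.locations[0].geonames_details.country_code


# Hypothetical record, shaped only as far as get_score needs it.
candidate = SimpleNamespace(
    names=[
        {"value": "Example University", "types": ["ror_display", "label"]},
        {"value": "Universidad de Ejemplo", "types": ["label"]},
        {"value": "EU", "types": ["acronym"]},
    ],
    locations=[
        SimpleNamespace(geonames_details=SimpleNamespace(country_code="US"))
    ],
)

all_names, acronyms = split_names(candidate)
print(all_names)
print(acronyms)
print(country_code_of(candidate))
```

A name whose `types` list contains `"acronym"` lands only in `acronyms`, regardless of its other types, so the two lists never overlap.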

rorapi/common/matching_single_search.py

Lines changed: 7 additions & 11 deletions
@@ -7,7 +7,6 @@
 from rorapi.common.models import Errors
 from rorapi.settings import ES7
 from rorapi.common.es_utils import ESQueryBuilder
-from rorapi.v1.models import MatchingResult as MatchingResultV1
 from rorapi.v2.models import MatchingResult as MatchingResultV2

 from collections import namedtuple
@@ -296,23 +295,20 @@ def get_output(chosen, all_matched):
     return all_matched


-def get_candidates(aff, countries, version):
-    qb = ESQueryBuilder(version)
+def get_candidates(aff, countries):
+    qb = ESQueryBuilder()
     qb.add_affiliation_query(aff, 200)
     return match_by_query(aff, qb.get_query(), countries)


-def match_affiliation(affiliation, version):
+def match_affiliation(affiliation):
     countries = get_countries(affiliation)
-    chosen, all_matched = get_candidates(affiliation, countries, version)
+    chosen, all_matched = get_candidates(affiliation, countries)
     return get_output(chosen, all_matched)


-def match_organizations(params, version):
+def match_organizations(params):
     if "affiliation" in params:
-        matched = match_affiliation(params.get("affiliation"), version)
-
-        if version == "v2":
-            return None, MatchingResultV2(matched)
-        return None, MatchingResultV1(matched)
+        matched = match_affiliation(params.get("affiliation"))
+        return None, MatchingResultV2(matched)
     return Errors(["'affiliation' parameter missing"]), None
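
Both `match_organizations` functions keep the same `(errors, result)` return convention after the change: exactly one side of the pair is `None`. The following is a rough sketch of how a caller might consume that pair. The `Errors` class here is a stand-in, not the real `rorapi.common.models.Errors`, the matcher is injected as a fake so the example runs without Elasticsearch, and the `MatchingResultV2` wrapper is omitted.

```python
class Errors:
    """Stand-in for rorapi.common.models.Errors (assumption: wraps a list of messages)."""

    def __init__(self, errors):
        self.errors = errors


def match_organizations(params, match_affiliation):
    """Mirror the (errors, result) convention; the matcher is injected for the demo."""
    if "affiliation" in params:
        matched = match_affiliation(params.get("affiliation"))
        return None, matched
    return Errors(["'affiliation' parameter missing"]), None


def fake_matcher(affiliation):
    """Hypothetical matcher returning a single canned hit."""
    return [{"substring": affiliation, "score": 1.0}]


# Successful call: errors is None and the result carries the matches.
errors, result = match_organizations({"affiliation": "Example University"}, fake_matcher)
print(errors, result)

# Missing parameter: result is None and errors explains why.
errors_missing, result_missing = match_organizations({}, fake_matcher)
print(errors_missing.errors, result_missing)
```

Callers no longer pass a version argument, so any remaining call site that still supplies one would raise a `TypeError` against the new signatures.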
