Commit b5f6c04

Document new methods and deprecations
1 parent 3eab990 commit b5f6c04

3 files changed

Lines changed: 112 additions & 118 deletions

documentation/index.md

Lines changed: 18 additions & 49 deletions
@@ -54,7 +54,10 @@ The library has detailed API documentation which can be found in the menu at the
 
 
 ## Breaking Changes
-From 6., remove unused `generate_qc_resource_from_rows` method.
+From 6.5.2, remove unused `generate_qc_resource_from_rows` method.
+`generate_resource_from_rows`, `generate_resource_from_iterable` and
+`download_and_generate_resource` are deprecated. They are replaced by
+`generate_resource` and `download_generate_resource`.
 
 From 6.5.0, files will not be uploaded to the HDX filestore if the hash and size have
 not changed, but if there are any resource metadata changes, except for last_modified,
@@ -823,16 +826,14 @@ dictionary. HEADERS is either a row number (rows start counting at 1), or the
 actual headers defined as a list of strings. If not set, all rows will be
 treated as containing values:
 
-dataset.generate_resource_from_rows("FOLDER", "FILENAME", ROWS,
-RESOURCE DATA, HEADERS, "ENCODING")
+dataset.generate_resource("FOLDER", "FILENAME", ROWS, RESOURCE DATA, HEADERS,
+COLUMNS, "FORMAT", "ENCODING", DATECOL or YEARCOL or
+DATE_FUNCTION)
 
-Building on these basic resource generation methods, there are more powerful
-ones `generate_resource_from_iterator` and `download_and_generate_resource`.
-
-A resource can be generated from a given list or tuple: HEADERS and an ITERATOR
-which can return rows in list, tuple or dictionary form. A mapping from headers
-to HXL hashtags, HXLTAGS, must be provided along with the FOLDER and FILENAME
-where the file will be generated for upload to the filestore. The dataset
+The first 4 parameters are mandatory, the rest are optional. A resource can be generated
+from a given list or tuple or other iterable. The method returns a tuple with a bool
+True if the resource was added and a dictionary of information. FOLDER and FILENAME
+specify where the file will be generated for upload to the filestore. The dataset
 time period can optionally be set by supplying DATECOL for looking up
 dates or YEARCOL for looking up years. DATECOL and YEARCOL can be a column name
 or the index of a column. Note that any timezone information is ignored and UTC
@@ -846,40 +847,9 @@ datetime. The lowest start date and highest end date are used to set the
 time period and are returned in the results dictionary in keys startdate
 and enddate.
 
-dataset.generate_resource_from_iterator(HEADERS, ITERATOR, HXLTAGS,
-"FOLDER", "FILENAME", RESOURCE_DATA, DATECOL or YEARCOL or DATE_FUNCTION,
-QUICKCHARTS, "ENCODING")
-
-If desired, `generate_resource_from_iterator` can generate a separate
-QuickCharts resource designed to be used in a time series QuickCharts bite
-provided that the input has #indicator+code, #date and #indicator+value+num.
-This is achieved by supplying the parameter QUICKCHARTS which activates various
-QuickCharts related actions depending upon the keys given in the dictionary.
-The returned dictionary will contain the QuickCharts resource in the key
-qc_resource. If the keys: hashtag - the HXL hashtag to examine - and values -
-the 3 values to look for in that column - are supplied, then a list of booleans
-indicating which QuickCharts bites should be enabled will be returned in the
-key bites_disabled in the returned dictionary. For the 3 values, if the key:
-numeric_hashtag is supplied then if that column for a given value contains no
-numbers, then the corresponding bite will be disabled. If the key: cutdown is
-given, if it is 1, then a separate cut down list is created containing only
-columns with HXL hashtags and rows with desired values (if hashtag and values
-are supplied) for the purpose of driving QuickCharts. It is returned in the key
-qcrows in the returned dictionary with the matching headers in qcheaders. If
-cutdown is 2, then a resource is created using the cut down list. If the key
-cutdownhashtags is supplied, then only the provided hashtags are used for
-cutting down otherwise the full list of HXL tags is used.
-
-The QuickCharts resource will be of form similar to below:
-
-GHO (CODE),ENDYEAR,Numeric
-#indicator+code,#date+year+end,#indicator+value+num
-VIOLENCE_HOMICIDERATE,1994,123.4
-MDG_0000000001,2015,123.4
-
-`download_and_generate_resource` builds on `generate_resource_from_iterator`.
-It uses an DOWNLOADER, an object of class `Download`, `Retrieve` or other class
-that implements `BaseDownload` to download from URL. Additional arguments in
+`download_generate_resource` builds on `generate_resource`.
+It uses a DOWNLOADER, an object of class `Download`, `Retrieve` or other class
+that implements `BaseDownload` to download from a URL. Additional arguments in
 **KWARGS are passed to the `get_tabular_rows` method of the DOWNLOADER.
 
 Optionally, headers can be inserted at specific positions. This is achieved
@@ -889,12 +859,11 @@ row. If supplied, it takes as arguments: headers (prior to any insertions) and
 row (which will be in dict or list form depending upon the dict_rows argument)
 and outputs a modified row.
 
-The rest of the arguments are the same as for
-`generate_resource_from_iterator`.
+The rest of the arguments are the same as for `generate_resource`.
 
-dataset.download_and_generate_resource(DOWNLOADER, "URL", HXLTAGS,
-"FOLDER", "FILENAME", RESOURCE_DATA, HEADER_INSERTIONS, ROW_FUNCTION,
-DATECOL or YEARCOL or DATE_FUNCTION, QUICKCHARTS, **KWARGS)
+dataset.download_generate_resource(DOWNLOADER, "URL", "FOLDER", "FILENAME",
+RESOURCE_DATA, HEADER_INSERTIONS, ROW_FUNCTION,
+DATECOL or YEARCOL or DATE_FUNCTION, **KWARGS)
 
 ### QuickCharts Generation

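The documentation in this diff says the dataset time period comes from the lowest start date and highest end date found through DATECOL or YEARCOL. The following is a minimal, self-contained sketch of that idea for a year column only. `time_period_from_years` is a hypothetical helper written for illustration and is not part of the library. It assumes that each year expands to the full calendar year in UTC, which is consistent with the `startdate` and `enddate` values asserted in the tests of this commit, and that empty cells are skipped.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional, Tuple


def time_period_from_years(
    rows: Iterable[Dict[str, Optional[str]]], yearcol: str
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Return (startdate, enddate) covering every year found in yearcol.

    Hypothetical helper: it only illustrates the "lowest start date,
    highest end date" rule and does not call the library.
    """
    startdate: Optional[datetime] = None
    enddate: Optional[datetime] = None
    for row in rows:
        value = row.get(yearcol)
        if not value:
            continue  # assumption: rows without a year are ignored
        year = int(value)
        year_start = datetime(year, 1, 1, tzinfo=timezone.utc)
        year_end = datetime(year, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
        if startdate is None or year_start < startdate:
            startdate = year_start
        if enddate is None or year_end > enddate:
            enddate = year_end
    return startdate, enddate


example_rows = [{"YEAR": "2001"}, {"YEAR": "2002"}, {"YEAR": None}]
startdate, enddate = time_period_from_years(example_rows, "YEAR")
print(startdate, enddate)
```

If no row carries a year, both values stay `None`, so a caller can tell "no time period found" apart from a real range.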
src/hdx/data/dataset.py

Lines changed: 5 additions & 1 deletion
@@ -2739,7 +2739,11 @@ def process_row(row: ListTupleDict) -> Optional[ListTupleDict]:
         resource.set_file_to_upload(filepath)
         self.add_update_resource(resource)
         retdict["resource"] = resource
-        retdict["headers"] = headers
+        if columns is not None:
+            retdict["headers"] = columns
+            retdict["original_headers"] = headers
+        else:
+            retdict["headers"] = headers
         retdict["rows"] = rows
         return True, retdict

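The change to `src/hdx/data/dataset.py` means that when `columns` is supplied, the returned dictionary reports those columns under `headers` and keeps the full header list under `original_headers`. A standalone sketch of that branching is shown here. `build_results` is a hypothetical function, not the library's code. Filtering each row down to the requested columns is an assumption based on the row asserted in the tests of this commit, not on the implementation itself.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_results(
    headers: Sequence[str],
    rows: List[Dict[str, Any]],
    columns: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Mimic the headers / original_headers logic from the diff.

    Hypothetical helper for illustration only.
    """
    results: Dict[str, Any] = {}
    if columns is not None:
        results["headers"] = list(columns)
        results["original_headers"] = list(headers)
        # Assumption: output rows keep only the requested columns.
        rows = [{column: row.get(column) for column in columns} for row in rows]
    else:
        results["headers"] = list(headers)
    results["rows"] = rows
    return results


all_headers = ["GWNO", "YEAR", "NOTES"]
all_rows = [{"GWNO": "615", "YEAR": "2001", "NOTES": "example"}]
full = build_results(all_headers, all_rows)
subset = build_results(all_headers, all_rows, columns=["GWNO", "YEAR"])
print(subset)
```

Callers that never pass `columns` see the same `headers` key as before, and `original_headers` is absent in that case.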
tests/hdx/data/test_dataset_resource_generation.py

Lines changed: 89 additions & 68 deletions
@@ -43,7 +43,7 @@ def test_download_generate_resource(self, configuration):
4343
filename = "conflict_data_alg.csv"
4444
resourcedata = {
4545
"name": "Conflict Data for Algeria",
46-
"description": "Conflict data with HXL tags",
46+
"description": "Conflict data",
4747
}
4848
admin1s = set()
4949

@@ -66,45 +66,47 @@ def process_row(headers, row):
6666
row_function=process_row,
6767
yearcol="YEAR",
6868
)
69+
expected_headers = [
70+
"lala",
71+
"GWNO",
72+
"EVENT_ID_CNTY",
73+
"EVENT_ID_NO_CNTY",
74+
"EVENT_DATE",
75+
"YEAR",
76+
"TIME_PRECISION",
77+
"EVENT_TYPE",
78+
"ACTOR1",
79+
"ALLY_ACTOR_1",
80+
"INTER1",
81+
"ACTOR2",
82+
"ALLY_ACTOR_2",
83+
"INTER2",
84+
"INTERACTION",
85+
"COUNTRY",
86+
"ADMIN1",
87+
"ADMIN2",
88+
"ADMIN3",
89+
"LOCATION",
90+
"LATITUDE",
91+
"LONGITUDE",
92+
"GEO_PRECISION",
93+
"SOURCE",
94+
"NOTES",
95+
"FATALITIES",
96+
]
6997
assert success is True
7098
assert results == {
7199
"startdate": datetime(2001, 1, 1, 0, 0, tzinfo=timezone.utc),
72100
"enddate": datetime(2002, 12, 31, 23, 59, 59, tzinfo=timezone.utc),
73101
"resource": {
74-
"description": "Conflict data with HXL tags",
102+
"description": "Conflict data",
75103
"format": "csv",
76104
"name": "Conflict Data for Algeria",
77105
},
78-
"headers": [
79-
"lala",
80-
"GWNO",
81-
"EVENT_ID_CNTY",
82-
"EVENT_ID_NO_CNTY",
83-
"EVENT_DATE",
84-
"YEAR",
85-
"TIME_PRECISION",
86-
"EVENT_TYPE",
87-
"ACTOR1",
88-
"ALLY_ACTOR_1",
89-
"INTER1",
90-
"ACTOR2",
91-
"ALLY_ACTOR_2",
92-
"INTER2",
93-
"INTERACTION",
94-
"COUNTRY",
95-
"ADMIN1",
96-
"ADMIN2",
97-
"ADMIN3",
98-
"LOCATION",
99-
"LATITUDE",
100-
"LONGITUDE",
101-
"GEO_PRECISION",
102-
"SOURCE",
103-
"NOTES",
104-
"FATALITIES",
105-
],
106+
"headers": expected_headers,
106107
"rows": [
107108
{
109+
"lala": "lala",
108110
"GWNO": "615",
109111
"EVENT_ID_CNTY": "1416RTA",
110112
"EVENT_ID_NO_CNTY": None,
@@ -130,9 +132,9 @@ def process_row(headers, row):
130132
"SOURCE": "Associated Press Online",
131133
"NOTES": "A Berber student was shot while in police custody at a police station in Beni Douala. He later died on Apr.21.",
132134
"FATALITIES": "1",
133-
"lala": "lala",
134135
},
135136
{
137+
"lala": "lala",
136138
"GWNO": "615",
137139
"EVENT_ID_CNTY": "2229RTA",
138140
"EVENT_ID_NO_CNTY": None,
@@ -158,9 +160,9 @@ def process_row(headers, row):
158160
"SOURCE": "Kabylie report",
159161
"NOTES": "Riots were reported in numerous villages in Kabylie, resulting in dozens wounded in clashes between protesters and police and significant material damage.",
160162
"FATALITIES": "0",
161-
"lala": "lala",
162163
},
163164
{
165+
"lala": "lala",
164166
"GWNO": "615",
165167
"EVENT_ID_CNTY": "2230RTA",
166168
"EVENT_ID_NO_CNTY": None,
@@ -186,9 +188,9 @@ def process_row(headers, row):
186188
"SOURCE": "Crisis Group",
187189
"NOTES": "Students protested in the Amizour area. At least 3 were later arrested for allegedly insulting gendarmes.",
188190
"FATALITIES": None,
189-
"lala": "lala",
190191
},
191192
{
193+
"lala": "lala",
192194
"GWNO": "615",
193195
"EVENT_ID_CNTY": "2231RTA",
194196
"EVENT_ID_NO_CNTY": None,
@@ -214,7 +216,6 @@ def process_row(headers, row):
214216
"SOURCE": "Kabylie report",
215217
"NOTES": "Rioters threw molotov cocktails, rocks and burning tires at gendarmerie stations in Beni Douala, El-Kseur and Amizour.",
216218
"FATALITIES": "0",
217-
"lala": "lala",
218219
},
219220
],
220221
}
@@ -227,7 +228,7 @@ def process_row(headers, row):
227228
assert resources == [
228229
{
229230
"name": "Conflict Data for Algeria",
230-
"description": "Conflict data with HXL tags",
231+
"description": "Conflict data",
231232
"format": "csv",
232233
},
233234
]
@@ -236,12 +237,35 @@ def process_row(headers, row):
236237
join(folder, filename),
237238
)
238239

240+
columns_to_include = [
241+
"lala",
242+
"GWNO",
243+
"EVENT_ID_CNTY",
244+
"EVENT_ID_NO_CNTY",
245+
"EVENT_DATE",
246+
"YEAR",
247+
"TIME_PRECISION",
248+
"EVENT_TYPE",
249+
"ACTOR1",
250+
"ALLY_ACTOR_1",
251+
"INTER1",
252+
"ACTOR2",
253+
"ALLY_ACTOR_2",
254+
"INTER2",
255+
"INTERACTION",
256+
"COUNTRY",
257+
"ADMIN1",
258+
"ADMIN2",
259+
"ADMIN3",
260+
"FATALITIES",
261+
]
239262
success, results = dataset.download_generate_resource(
240263
downloader,
241264
TestDatasetResourceGeneration.url,
242265
folder,
243266
filename,
244267
resourcedata,
268+
columns=columns_to_include,
245269
header_insertions=[(0, "lala")],
246270
row_function=process_row,
247271
datecol="EVENT_DATE",
@@ -251,6 +275,30 @@ def process_row(headers, row):
251275
dataset["dataset_date"]
252276
== "[2001-04-18T00:00:00 TO 2001-04-21T23:59:59]"
253277
)
278+
assert results["headers"] == columns_to_include
279+
assert results["original_headers"] == expected_headers
280+
assert results["rows"][0] == {
281+
"lala": "lala",
282+
"GWNO": "615",
283+
"EVENT_ID_CNTY": "1416RTA",
284+
"EVENT_ID_NO_CNTY": None,
285+
"EVENT_DATE": "18/04/2001",
286+
"YEAR": "2001",
287+
"TIME_PRECISION": "1",
288+
"EVENT_TYPE": "Violence against civilians",
289+
"ACTOR1": "Police Forces of Algeria (1999-)",
290+
"ALLY_ACTOR_1": None,
291+
"INTER1": "1",
292+
"ACTOR2": "Civilians (Algeria)",
293+
"ALLY_ACTOR_2": "Berber Ethnic Group (Algeria)",
294+
"INTER2": "7",
295+
"INTERACTION": "17",
296+
"COUNTRY": "Algeria",
297+
"ADMIN1": "Tizi Ouzou",
298+
"ADMIN2": "Beni-Douala",
299+
"ADMIN3": None,
300+
"FATALITIES": "1",
301+
}
254302

255303
success, results = dataset.download_generate_resource(
256304
downloader,
@@ -267,40 +315,14 @@ def process_row(headers, row):
267315
"startdate": datetime(2001, 1, 1, 0, 0, tzinfo=timezone.utc),
268316
"enddate": datetime(2002, 12, 31, 23, 59, 59, tzinfo=timezone.utc),
269317
"resource": {
270-
"description": "Conflict data with HXL tags",
318+
"description": "Conflict data",
271319
"format": "csv",
272320
"name": "Conflict Data for Algeria",
273321
},
274-
"headers": [
275-
"lala",
276-
"GWNO",
277-
"EVENT_ID_CNTY",
278-
"EVENT_ID_NO_CNTY",
279-
"EVENT_DATE",
280-
"YEAR",
281-
"TIME_PRECISION",
282-
"EVENT_TYPE",
283-
"ACTOR1",
284-
"ALLY_ACTOR_1",
285-
"INTER1",
286-
"ACTOR2",
287-
"ALLY_ACTOR_2",
288-
"INTER2",
289-
"INTERACTION",
290-
"COUNTRY",
291-
"ADMIN1",
292-
"ADMIN2",
293-
"ADMIN3",
294-
"LOCATION",
295-
"LATITUDE",
296-
"LONGITUDE",
297-
"GEO_PRECISION",
298-
"SOURCE",
299-
"NOTES",
300-
"FATALITIES",
301-
],
322+
"headers": expected_headers,
302323
"rows": [
303324
{
325+
"lala": "lala",
304326
"GWNO": "615",
305327
"EVENT_ID_CNTY": "1416RTA",
306328
"EVENT_ID_NO_CNTY": None,
@@ -326,9 +348,9 @@ def process_row(headers, row):
326348
"SOURCE": "Associated Press Online",
327349
"NOTES": "A Berber student was shot while in police custody at a police station in Beni Douala. He later died on Apr.21.",
328350
"FATALITIES": "1",
329-
"lala": "lala",
330351
},
331352
{
353+
"lala": "lala",
332354
"GWNO": "615",
333355
"EVENT_ID_CNTY": "2229RTA",
334356
"EVENT_ID_NO_CNTY": None,
@@ -354,9 +376,9 @@ def process_row(headers, row):
354376
"SOURCE": "Kabylie report",
355377
"NOTES": "Riots were reported in numerous villages in Kabylie, resulting in dozens wounded in clashes between protesters and police and significant material damage.",
356378
"FATALITIES": "0",
357-
"lala": "lala",
358379
},
359380
{
381+
"lala": "lala",
360382
"GWNO": "615",
361383
"EVENT_ID_CNTY": "2230RTA",
362384
"EVENT_ID_NO_CNTY": None,
@@ -382,9 +404,9 @@ def process_row(headers, row):
382404
"SOURCE": "Crisis Group",
383405
"NOTES": "Students protested in the Amizour area. At least 3 were later arrested for allegedly insulting gendarmes.",
384406
"FATALITIES": None,
385-
"lala": "lala",
386407
},
387408
{
409+
"lala": "lala",
388410
"GWNO": "615",
389411
"EVENT_ID_CNTY": "2231RTA",
390412
"EVENT_ID_NO_CNTY": None,
@@ -410,7 +432,6 @@ def process_row(headers, row):
410432
"SOURCE": "Kabylie report",
411433
"NOTES": "Rioters threw molotov cocktails, rocks and burning tires at gendarmerie stations in Beni Douala, El-Kseur and Amizour.",
412434
"FATALITIES": "0",
413-
"lala": "lala",
414435
},
415436
],
416437
}
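
The tests above pass `header_insertions=[(0, "lala")]` together with a `row_function`, and the expected rows now list the inserted `lala` key first. A rough sketch of how header insertions and a row function could combine to give that ordering follows. `insert_headers` and `generate_rows` are hypothetical names, the body of `process_row` here is invented for illustration, and skipping rows for which the function returns `None` is an assumption. None of this is the library's implementation.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def insert_headers(headers: List[str], insertions: List[Tuple[int, str]]) -> List[str]:
    """Return a copy of headers with each (position, name) pair inserted."""
    result = list(headers)
    for position, name in insertions:
        result.insert(position, name)
    return result


def generate_rows(
    headers: List[str],
    rows: Iterable[Row],
    insertions: List[Tuple[int, str]],
    row_function: Callable[[List[str], Row], Optional[Row]],
) -> Iterator[Row]:
    """Apply row_function to each row and emit keys in output header order."""
    output_headers = insert_headers(headers, insertions)
    for row in rows:
        # The row function receives the headers prior to any insertions.
        modified = row_function(headers, dict(row))
        if modified is None:
            continue  # assumption: a None result drops the row
        yield {header: modified.get(header) for header in output_headers}


def process_row(headers: List[str], row: Row) -> Row:
    # Illustrative body: fill the inserted column with a constant value.
    row["lala"] = "lala"
    return row


source_headers = ["GWNO", "YEAR"]
source_rows = [{"GWNO": "615", "YEAR": "2001"}]
output_rows = list(generate_rows(source_headers, source_rows, [(0, "lala")], process_row))
print(output_rows)
```

Building each output row from the final header list keeps the key order stable regardless of when the row function adds its value.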
