
Commit 915750e

feat: Added food database exercise (#1)
* feat: Added database template
* docs: Added assignment instructions
* docs: Added data to release
1 parent 533ae46 commit 915750e

2 files changed

Lines changed: 238 additions & 2 deletions


README.md

Lines changed: 217 additions & 2 deletions
@@ -1,2 +1,217 @@
1-
# coding-assignment
2-
Coding assignment for the Research team
1+
# Coding assignment - Food database
2+
3+
In the context of computer vision, detection tasks with open class domains require more flexibility than the flat class distinctions used in simple vision tasks (e.g. dog/cat classification).
4+
5+
6+
7+
## Context
8+
9+
At Foodvisor, expanding activities into new regions often means that extra object classes have to be recognized. Below are some reasons why the end target class structure may not be defined at a given moment $t$:
10+
11+
- **Data availability**: not enough existing images of the corresponding item at $t$
12+
- **Task evaluation changes**: user expectations about detection granularity change
13+
14+
We'll define two concepts:
15+
16+
- **coverage**: ratio of observed classes that are correctly categorized in ground truths, parent category included
17+
- **granularity**: ratio of observed items that have distinct labels in the class structure
18+
19+
Consider the following examples for an image classification problem:
20+
21+
1. **Coverage change**: given a certain labeling budget, your team first decides to define the class structure as follows: fruits, meat, fish, which satisfies users' coverage expectations at $t$. Now at $t' > t$, users expect vegetables to be recognized.
22+
2. **Granularity change**: given a certain labeling budget, your team first decides to define the class structure as follows: vegetables, fruits, meat, fish, which satisfies users' granularity expectations at $t$. Now at $t' > t$, users expect beef and pork to be distinguished.
23+
24+
Hence the need for a flexible class structure that continuously adapts to changes in user expectations, while maintaining a reasonable labeling budget.
25+
26+
27+
28+
## Assignment
29+
30+
In this assignment, you will define a data structure that handles both coverage and granularity changes using a directed graph structure. Say $O$ is the set of all existing objects, and that, as a first approach, we define $A,B,C$ as child nodes of $O$ to be labeled.
31+
32+
- Consider the case of coverage extension: add $D$ as child node of $O$
33+
- Consider the case of granularity extension: add $E, F$ as child nodes of $A$
34+
35+
Taking a snapshot of your class structure at $t$, you'll want to maximize both coverage and granularity of your existing labeled data.
36+
37+
- In the first case above, if classes are one-hot-encoded, the encoding vector length needs to be extended.
38+
- In the second case above, data labeled as $A$ will have to be staged for the next labeling task, to be relabeled as either $E$ or $F$. The information that $E$ and $F$ are child nodes of $A$ therefore needs to persist in your data structure.
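The two cases above can be illustrated with a minimal sketch. It assumes that classes are kept in an ordered list and that labels are one-hot encoded over that list. The names `one_hot`, `parent` and `staged_for_relabeling` are hypothetical and not part of the assignment.

```python
def one_hot(label, classes):
    """Encode `label` as a one-hot list over the ordered `classes`."""
    return [1 if c == label else 0 for c in classes]


# Coverage extension: adding D under O lengthens every encoding vector.
classes = ["A", "B", "C"]
before = one_hot("A", classes)  # length 3
classes.append("D")
after = one_hot("A", classes)   # length 4, same active index

# Granularity extension: E and F become children of A. The child -> parent
# link must be kept to know that samples labeled A need relabeling.
parent = {"A": "O", "B": "O", "C": "O", "D": "O", "E": "A", "F": "A"}
labels = {"img001": "A", "img002": "B"}
has_children = set(parent.values())
staged_for_relabeling = sorted(
    img for img, label in labels.items() if label in has_children
)
```

Here only `img001` ends up staged, because `A` is the only label that now has child nodes.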
39+
40+
### Submission
41+
42+
Your submission is expected to be a GitHub (or similar) repository, implemented in Python 3.
43+
44+
Your repository should include:
45+
46+
- instructions to install requirements to run your code
47+
- credits to the different repositories or resources that you used for your implementation
48+
49+
50+
51+
**Task**
52+
53+
Define a `Database` class (in the database.py file) able to continuously maximize coverage and granularity of existing labeled data. A reference abstract category (named `core` in the following examples) must be created in your constructor (cf. the set $O$ mentioned in the previous section).
54+
55+
**Design requirements**
56+
57+
The `Database` class must implement the following methods:
58+
59+
- `add_nodes`: takes a list of tuples as input and edits the graph
60+
- `add_extract`: takes a dict as input and stores the information appropriately
61+
- `get_extract_status`: returns the status of each image considering graph modifications that occurred after the extract was added.
62+
63+
Feel free to create any additional classes or data structures you deem necessary.
64+
65+
**Inputs**
66+
67+
The `add_nodes` method takes a list of tuples as input. Each tuple has two elements: the first being the ID of a new node, and the second the ID of the parent node. If the parent node is `None`, this operation should be conducted first and will define the root node. Each non-empty graph will start with a tuple whose second member is `None` (it is the expected name of your core abstract category that needs to be created when the graph is instantiated).
68+
69+
70+
71+
Once the graph is created, nodes can only be added to it. Depending on the parent node, an addition will have a different effect on the class structure:
72+
73+
- coverage extension
74+
75+
```python
76+
[("A", "core"), ("B", "core"), ("C", "core")]
77+
```
78+
79+
- granularity extension
80+
81+
```python
82+
[("A1", "A"), ("A2", "A"), ("C1", "C")]
83+
```
84+
85+
- mixed operation
86+
87+
```python
88+
[("D", "core"), ("D1", "D"), ("D2", "D")]
89+
```
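One possible way to store such tuples is a child-to-parent mapping. The sketch below is an assumption about the design, not the expected solution: the `Graph` class name, the `children` helper and the choice to raise `ValueError` on unknown parents or duplicate nodes are all illustrative.

```python
class Graph:
    """Hypothetical append-only tree stored as a child -> parent mapping."""

    def __init__(self, root):
        self.parent = {root: None}

    def add_nodes(self, nodes):
        """Insert (node_id, parent_id) tuples in the order given."""
        for node_id, parent_id in nodes:
            if parent_id not in self.parent:
                raise ValueError(f"unknown parent: {parent_id!r}")
            if node_id in self.parent:
                raise ValueError(f"duplicate node: {node_id!r}")
            self.parent[node_id] = parent_id

    def children(self, node_id):
        """Return the direct children of `node_id` in insertion order."""
        return [n for n, p in self.parent.items() if p == node_id]


graph = Graph("core")
graph.add_nodes([("A", "core"), ("B", "core"), ("C", "core")])  # coverage
graph.add_nodes([("A1", "A"), ("A2", "A"), ("C1", "C")])        # granularity
```

Processing tuples in order lets a mixed operation such as `[("D", "core"), ("D1", "D")]` work, since `D` exists by the time `D1` is inserted.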
90+
91+
92+
93+
The `add_extract` method takes a dict as input where the keys are image names, and the values are lists of class/node IDs (strings).
94+
95+
```python
96+
{"img001": ["A"], "img002": ["C1"]}
97+
```
98+
99+
100+
101+
Lastly, the `get_extract_status` method will return the status of each data sample from the extract, which can either be:
102+
103+
- `invalid`: some labels could not be matched against database
104+
- `valid`: label is matched and no operation is required
105+
- `granularity_staged`: label is matched but some labels have new child nodes in the database
106+
- `coverage_staged`: label is matched but its direct parent node has a new child node since the last update (takes priority over `granularity_staged`)
107+
108+
```python
109+
{"img001": "granularity_staged", "img002": "valid"}
110+
```
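The priority between statuses can be sketched as follows. This is one reading of the rules above, not a reference solution. It assumes that `invalid` outranks `coverage_staged`, which outranks `granularity_staged`, and that the extract is compared against a snapshot of the graph taken when it was added. The function `extract_status` and its arguments are hypothetical.

```python
def extract_status(extract, snapshot, current):
    """Return a status per image.

    `snapshot` and `current` are child -> parent mappings taken when the
    extract was added and at query time (an assumed representation).
    """
    # Parents of the nodes added after the extract was stored.
    new_parents = {p for n, p in current.items() if n not in snapshot}

    status = {}
    for image, labels in extract.items():
        if any(label not in current for label in labels):
            status[image] = "invalid"
        elif any(current[label] in new_parents for label in labels):
            # A sibling appeared under the label's direct parent.
            status[image] = "coverage_staged"
        elif any(label in new_parents for label in labels):
            # The label itself gained child nodes.
            status[image] = "granularity_staged"
        else:
            status[image] = "valid"
    return status


snapshot = {"core": None, "A": "core", "B": "core", "C": "core", "C1": "C"}
current = dict(snapshot, A1="A", A2="A")
result = extract_status({"img001": ["A"], "img002": ["C1"]}, snapshot, current)
```

Checking the conditions from most to least severe means an image with several labels gets the highest-priority status among them.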
111+
112+
### Evaluation
113+
114+
Here is an input example:
115+
116+
```python
117+
from database import Database
118+
119+
# Initial graph
120+
build = [("core", None), ("A", "core"), ("B", "core"), ("C", "core"), ("C1", "C")]
121+
# Extract
122+
extract = {"img001": ["A"], "img002": ["C1"]}
123+
# Graph edits
124+
edits = [("A1", "A"), ("A2", "A")]
125+
126+
# Get status (this is only an example, test your code as you please as long as it works)
127+
status = {}
128+
if len(build) > 0:
129+
# Build graph
130+
db = Database(build[0][0])
131+
if len(build) > 1:
132+
db.add_nodes(build[1:])
133+
# Add extract
134+
db.add_extract(extract)
135+
# Graph edits
136+
db.add_nodes(edits)
137+
# Update status
138+
status = db.get_extract_status()
139+
print(status)
140+
```
141+
142+
should return
143+
144+
```python
145+
{"img001": "granularity_staged", "img002": "valid"}
146+
```
147+
148+
**explanation**
149+
150+
"A" has gained new child nodes since the labeled data was provided, hence "granularity_staged" for img001.
151+
152+
153+
154+
Here is another example:
155+
156+
```python
157+
from database import Database
158+
159+
# Initial graph
160+
build = [("core", None), ("A", "core"), ("B", "core"), ("C", "core"), ("C1", "C")]
161+
# Extract
162+
extract = {"img001": ["A", "B"], "img002": ["A", "C1"], "img003": ["B", "E"]}
163+
# Graph edits
164+
edits = [("A1", "A"), ("A2", "A"), ("C2", "C")]
165+
166+
# Get status (this is only an example, test your code as you please as long as it works)
167+
status = {}
168+
if len(build) > 0:
169+
# Build graph
170+
db = Database(build[0][0])
171+
if len(build) > 1:
172+
db.add_nodes(build[1:])
173+
# Add extract
174+
db.add_extract(extract)
175+
# Graph edits
176+
db.add_nodes(edits)
177+
# Update status
178+
status = db.get_extract_status()
179+
print(status)
180+
```
181+
182+
In this case, the `get_extract_status` method
183+
184+
should return:
185+
186+
```python
187+
{"img001": "granularity_staged", "img002": "coverage_staged", "img003": "invalid"}
188+
```
189+
190+
**explanation**
191+
192+
img001 has "A" as a label, whose granularity was extended.
193+
194+
img002 has "C1" as a label, and its parent node "C" has gained a new child node "C2" since the extract was added. It also has the "A" label, but coverage staging takes priority over granularity staging.
195+
196+
img003 has an unmatched label "E".
197+
198+
199+
200+
Your implementation should yield the same `get_extract_status` output as expected.
201+
202+
Not meeting this success condition does not necessarily mean that your application won't be considered further, but we expect you to produce code of your own. <u>Plagiarism will not be taken lightly.</u>
203+
204+
205+
206+
### Data
207+
208+
Please check the attachments of this [release](https://github.com/Foodvisor/coding-assignment/releases/tag/v0.1.0) for the data you can use to test your code:
209+
210+
- graph_build.json: graph nodes initial build
211+
- img_extract.json: image template
212+
- graph_edits.json: node additions to perform
213+
- expected_status.json: the expected output of the `get_extract_status` method
214+
215+
216+
217+
Best of luck!

database.py

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
#!/usr/bin/python
2+
# -*- coding: utf-8 -*-
3+
4+
5+
class Database(object):
6+
7+
def __init__(self):
8+
9+
raise NotImplementedError
10+
11+
def add_nodes(self):
12+
13+
raise NotImplementedError
14+
15+
def add_extract(self):
16+
17+
raise NotImplementedError
18+
19+
def get_extract_status(self):
20+
21+
raise NotImplementedError
