1 | | -# coding-assignment |
2 | | -Coding assignment for the Research team |
| 1 | +# Coding assignment - Food database |
| 2 | + |
| 3 | +In the context of computer vision, detection tasks with open class domains require more flexibility than the flat class distinctions used in simple vision tasks (e.g. dog/cat classification).
| 4 | + |
| 5 | + |
| 6 | + |
| 7 | +## Context |
| 8 | + |
| 9 | +At Foodvisor, expanding activities into new regions often means that extra object classes have to be recognized. Below are some reasons why the end target class structure may not be defined at a given moment $t$:
| 10 | + |
| 11 | +- **Data availability**: not enough existing images of the corresponding item at $t$ |
| 12 | +- **Task evaluation changes**: user expectations about detection granularity change
| 13 | + |
| 14 | +We'll define two concepts: |
| 15 | + |
| 16 | +- **coverage**: ratio of observed classes that are correctly categorized in ground truths, parent category included |
| 17 | +- **granularity**: ratio of observed items that have distinct labels in the class structure |
| 18 | + |
| 19 | +Consider the following examples for an image classification problem: |
| 20 | + |
| 21 | +1. **Coverage change**: given a certain labeling budget, your team first decides to define the class structure as follows: fruits, meat, fish, which satisfies users' coverage expectations at $t$. Now at $t' > t$, users expect vegetables to be recognized.
| 22 | +2. **Granularity change**: given a certain labeling budget, your team first decides to define the class structure as follows: vegetables, fruits, meat, fish, which satisfies users' granularity expectations at $t$. Now at $t' > t$, users expect beef and pork to be distinguished.
| 23 | + |
| 24 | +Hence the need for a flexible class structure that continuously adapts to changes in user expectations while maintaining a reasonable labeling budget.
| 25 | + |
| 26 | + |
| 27 | + |
| 28 | +## Assignment |
| 29 | + |
| 30 | +In this assignment, you will define a data structure that handles both coverage and granularity changes using a directed graph structure. Say $O$ is the ensemble of all existing objects and that, as a first approach, we define $A, B, C$ as child nodes of $O$ to be labeled.
| 31 | + |
| 32 | +- Consider the case of coverage extension: add $D$ as child node of $O$ |
| 33 | +- Consider the case of granularity extension: add $E, F$ as child nodes of $A$ |
| 34 | + |
| 35 | +Taking a snapshot of your class structure at $t$, you'll want to maximize both coverage and granularity of your existing labeled data. |
| 36 | + |
| 37 | +- In the first case above, if classes are one-hot-encoded, the encoding vector length needs to be extended.
| 38 | +- In the second case above, data labeled as $A$ will have to be staged for the next labeling task to become either $E$ or $F$, meaning that the information that $E$ and $F$ are child nodes of $A$ needs to persist in your data structure.
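
To make the first case concrete, here is a minimal sketch of extending a one-hot encoding when a class is appended. The helpers `one_hot` and `extend_encoding` are hypothetical names that are not part of the assignment, and the sketch assumes new classes are only ever appended, so existing indices stay valid.

```python
def one_hot(label, classes):
    """Return a one-hot list for `label` given the ordered `classes`."""
    return [1 if c == label else 0 for c in classes]


def extend_encoding(vector, old_classes, new_classes):
    """Pad an existing one-hot vector after classes were appended.

    Assumption: existing classes keep their positions in `new_classes`.
    """
    if new_classes[:len(old_classes)] != old_classes:
        raise ValueError("existing classes must keep their positions")
    return vector + [0] * (len(new_classes) - len(old_classes))


classes_t = ["A", "B", "C"]
classes_t2 = classes_t + ["D"]  # coverage extension: D becomes a child of O

vec = one_hot("B", classes_t)
extended = extend_encoding(vec, classes_t, classes_t2)
```

Under that append-only assumption, a vector padded this way is identical to one encoded directly against the extended class list, so previously labeled data does not need to be re-encoded from scratch.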
| 39 | + |
| 40 | +### Submission |
| 41 | + |
| 42 | +Your submission is expected to be a GitHub (or similar) repository, implemented in Python3. |
| 43 | + |
| 44 | +Your repository should include: |
| 45 | + |
| 46 | +- instructions to install requirements to run your code |
| 47 | +- credits to the different repositories or resources that you used for your implementation |
| 48 | + |
| 49 | + |
| 50 | + |
| 51 | +**Task** |
| 52 | + |
| 53 | +Define a `Database` class (in the `database.py` file) able to continuously maximize coverage and granularity of existing labeled data. A reference abstract category (named `core` in the following examples) will need to be created in your constructor (cf. ensemble $O$ mentioned in the previous section).
| 54 | + |
| 55 | +**Design requirements** |
| 56 | + |
| 57 | +The `Database` class must implement the following methods:
| 58 | + |
| 59 | +- `add_nodes`: takes a list of tuples as input and edits the graph
| 60 | +- `add_extract`: takes a dict as input and stores the information appropriately
| 61 | +- `get_extract_status`: returns the status of each image considering graph modifications that occurred after the extract was added. |
| 62 | + |
| 63 | +Feel free to create any additional classes or data structures you deem necessary. |
| 64 | + |
| 65 | +**Inputs** |
| 66 | + |
| 67 | +The `add_nodes` method takes a list of tuples as input. Each tuple has two elements: the first is the ID of a new node, and the second is the ID of its parent node. If the parent node is `None`, this operation should be conducted first and will define the root node. Each non-empty graph will start with a tuple whose second member is `None`; its first member is the expected name of your core abstract category, which needs to be created when the graph is instantiated.
| 68 | + |
| 69 | + |
| 70 | + |
| 71 | +Once the graph is created, nodes can only be added to it, never removed. Depending on the parent node, each addition has a different effect on the class structure:
| 72 | + |
| 73 | +- coverage extension |
| 74 | + |
| 75 | +```python |
| 76 | +[("A", "core"), ("B", "core"), ("C", "core")] |
| 77 | +``` |
| 78 | + |
| 79 | +- granularity extension |
| 80 | + |
| 81 | +```python |
| 82 | +[("A1", "A"), ("A2", "A"), ("C1", "C")] |
| 83 | +``` |
| 84 | + |
| 85 | +- mixed operation |
| 86 | + |
| 87 | +```python |
| 88 | +[("D", "core"), ("D1", "D"), ("D2", "D")] |
| 89 | +``` |
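
As an illustration of how such tuples could be stored, the sketch below folds them into a parent-to-children mapping. The function `build_children_map` is a hypothetical helper and not the expected `Database` API. Rejecting unknown parents and duplicate node IDs is an assumption, since the assignment does not say how to handle them.

```python
from collections import defaultdict


def build_children_map(pairs, root="core"):
    """Turn (node_id, parent_id) tuples into a parent -> children mapping."""
    children = defaultdict(list)
    known = {root}  # the root is assumed to exist before any addition
    for node_id, parent_id in pairs:
        if parent_id not in known:
            raise KeyError(f"unknown parent {parent_id!r} for node {node_id!r}")
        if node_id in known:
            raise ValueError(f"node {node_id!r} already exists")
        children[parent_id].append(node_id)
        known.add(node_id)
    return dict(children)


# The three example operations above, applied in order
pairs = [("A", "core"), ("B", "core"), ("C", "core"),
         ("A1", "A"), ("A2", "A"), ("C1", "C"),
         ("D", "core"), ("D1", "D"), ("D2", "D")]
tree = build_children_map(pairs)
```

Processing the tuples in list order is what lets the mixed operation work: `D` is registered before `D1` and `D2` refer to it as their parent.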
| 90 | + |
| 91 | + |
| 92 | + |
| 93 | +The `add_extract` method takes a dict as input, where the keys are image names and the values are lists of class/node IDs (strings).
| 94 | + |
| 95 | +```python |
| 96 | +{"img001": ["A"], "img002": ["C1"]} |
| 97 | +``` |
| 98 | + |
| 99 | + |
| 100 | + |
| 101 | +Lastly, the `get_extract_status` method will return the status of each data sample from the extract, which can either be: |
| 102 | + |
| 103 | +- `invalid`: some labels could not be matched against database |
| 104 | +- `valid`: label is matched and no operation is required |
| 105 | +- `granularity_staged`: label is matched but some labels have new child nodes in the database |
| 106 | +- `coverage_staged`: label is matched but its direct parent node has a new child node since the last update (takes priority over `granularity_staged`)
| 107 | + |
| 108 | +```python |
| 109 | +{"img001": "granularity_staged", "img002": "valid"} |
| 110 | +``` |
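
Since an image can carry several labels, one way to read the priority rule is to compute a status per label and keep the highest-priority one. The sketch below does only that combination step. `PRIORITY` and `combine_statuses` are hypothetical names, and ranking `invalid` above everything else is an assumption: the text only states that `coverage_staged` takes priority over `granularity_staged`.

```python
# Highest priority first. Only "coverage_staged over granularity_staged" is
# stated in the assignment; ranking "invalid" first is an assumption.
PRIORITY = ["invalid", "coverage_staged", "granularity_staged", "valid"]


def combine_statuses(label_statuses):
    """Return the highest-priority status among per-label statuses.

    Expects at least one status; an image without labels is not covered here.
    """
    return min(label_statuses, key=PRIORITY.index)


mixed = combine_statuses(["granularity_staged", "coverage_staged"])
unmatched = combine_statuses(["valid", "invalid"])
```

Keeping the ranking in a single list makes the rule easy to audit and to change if the intended precedence turns out to be different.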
| 111 | + |
| 112 | +### Evaluation |
| 113 | + |
| 114 | +Here is an input example:
| 115 | + |
| 116 | +```python |
| 117 | +from database import Database |
| 118 | + |
| 119 | +# Initial graph |
| 120 | +build = [("core", None), ("A", "core"), ("B", "core"), ("C", "core"), ("C1", "C")] |
| 121 | +# Extract |
| 122 | +extract = {"img001": ["A"], "img002": ["C1"]} |
| 123 | +# Graph edits |
| 124 | +edits = [("A1", "A"), ("A2", "A")] |
| 125 | + |
| 126 | +# Get status (this is only an example, test your code as you please as long as it works) |
| 127 | +status = {} |
| 128 | +if len(build) > 0:
| 129 | +    # Build graph
| 130 | +    db = Database(build[0][0])
| 131 | +    if len(build) > 1:
| 132 | +        db.add_nodes(build[1:])
| 133 | +    # Add extract
| 134 | +    db.add_extract(extract)
| 135 | +    # Graph edits
| 136 | +    db.add_nodes(edits)
| 137 | +    # Update status
| 138 | +    status = db.get_extract_status()
| 139 | +print(status) |
| 140 | +``` |
| 141 | + |
| 142 | +should return |
| 143 | + |
| 144 | +```python |
| 145 | +{"img001": "granularity_staged", "img002": "valid"} |
| 146 | +``` |
| 147 | + |
| 148 | +**explanation** |
| 149 | + |
| 150 | +"A" has gained new child nodes since the labeled data was provided, hence "granularity_staged" for img001.
| 151 | + |
| 152 | + |
| 153 | + |
| 154 | +Here is another example:
| 155 | + |
| 156 | +```python |
| 157 | +from database import Database |
| 158 | + |
| 159 | +# Initial graph |
| 160 | +build = [("core", None), ("A", "core"), ("B", "core"), ("C", "core"), ("C1", "C")]
| 161 | +# Extract
| 162 | +extract = {"img001": ["A", "B"], "img002": ["A", "C1"], "img003": ["B", "E"]}
| 163 | +# Graph edits
| 164 | +edits = [("A1", "A"), ("A2", "A"), ("C2", "C")]
| 165 | +
| 166 | +# Get status (this is only an example, test your code as you please as long as it works)
| 167 | +status = {}
| 168 | +if len(build) > 0:
| 169 | +    # Build graph
| 170 | +    db = Database(build[0][0])
| 171 | +    if len(build) > 1:
| 172 | +        db.add_nodes(build[1:])
| 173 | +    # Add extract
| 174 | +    db.add_extract(extract)
| 175 | +    # Graph edits
| 176 | +    db.add_nodes(edits)
| 177 | +    # Update status
| 178 | +    status = db.get_extract_status()
| 179 | +print(status) |
| 180 | +``` |
| 181 | + |
| 184 | +should return:
| 185 | + |
| 186 | +```python |
| 187 | +{"img001": "granularity_staged", "img002": "coverage_staged", "img003": "invalid"} |
| 188 | +``` |
| 189 | + |
| 190 | +**explanation** |
| 191 | + |
| 192 | +img001 has "A" as a label, whose granularity was extended.
| 193 | +
| 194 | +img002 has "C1" as a label, and its parent node "C" has gained a new child node "C2" since the extract was added. It also has the "A" label, but coverage staging has priority over granularity staging.
| 195 | + |
| 196 | +img003 has an unmatched label "E" |
| 197 | + |
| 198 | + |
| 199 | + |
| 200 | +Your implementation should yield the same `get_extract_status` output as expected. |
| 201 | + |
| 202 | +Not meeting this success condition does not necessarily mean that your application won't be considered further, but we expect you to produce code of your own. <u>Plagiarism will not be taken lightly.</u>
| 203 | + |
| 204 | + |
| 205 | + |
| 206 | +### Data |
| 207 | + |
| 208 | +Please check the attachments of this [release](https://github.com/Foodvisor/coding-assignment/releases/tag/v0.1.0) for the data you can use to test your code:
| 209 | + |
| 210 | +- graph_build.json: graph nodes initial build |
| 211 | +- img_extract.json: image template |
| 212 | +- graph_edits.json: node additions to perform |
| 213 | +- expected_status.json: the expected output of the `get_extract_status` method |
| 214 | + |
| 215 | + |
| 216 | + |
| 217 | +Best of luck! |