Commit c7e9111

Merge pull request #147 from vmarkovtsev/master
Add the missing information about the duplicates dataset
2 parents: b200f76 + 416a059

1 file changed

Lines changed: 13 additions & 0 deletions

Duplicates/README.md

@@ -42,3 +42,16 @@ print(len(ds.assignments))
 print(len(ds.pairs))
 ```
 
+### Origin
+
+The choice of the files was designed in the included [notebooks](notebooks).
+
+### Limitations
+
+There were ~4 active human reviewers who did the labeling, they were from
+the same company, and talked to each other. Hence there can be bias in the labels.
+Code duplication is subjective, anyway.
+
+### License
+
+Code: MIT. Labels: Open Data Commons Open Database License (ODbL). Actual file contents © their authors.
