Introduce Document as canonical bibliographic record, replacing FileMetadata#48
cgoudet left a comment

I didn't have time to finish, but I already have a few comments.
  def __str__(self):
-     return f"Metadata for {self.source_file_id}"
+     return self.title[:self._TITLE_DISPLAY_LENGTH]
You added logic for the case where the chunk is too long to display, but not here. It would be good to add it.
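One possible reading of this comment is sketched below: a shared helper that appends an ellipsis only when the text is actually cut, so that `__str__` and the chunk display behave the same way. The helper name `truncate_for_display`, the 50-character limit, and the plain-Python `Document` stand-in are assumptions for illustration, not code from this PR.

```python
def truncate_for_display(text: str, max_length: int) -> str:
    """Return text unchanged if it fits, else cut it and append an ellipsis."""
    if len(text) <= max_length:
        return text
    # Reserve one character for the ellipsis so the result never exceeds max_length.
    return text[: max_length - 1].rstrip() + "…"


class Document:
    """Plain-Python stand-in for the Django model (assumption, not the real class)."""

    _TITLE_DISPLAY_LENGTH = 50  # hypothetical value

    def __init__(self, title: str) -> None:
        self.title = title

    def __str__(self) -> str:
        return truncate_for_display(self.title, self._TITLE_DISPLAY_LENGTH)
```

Keeping the truncation in one function would let the chunk model reuse it with its own length limit.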
    assert not s3_fn.exists()

    # --- Document: title constraint ---
You can group related tests in a pytest class instead of using a comment.
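A minimal sketch of that grouping, assuming a class name like `TestDocumentTitleConstraint`. To keep the example runnable without a database, `validate_title` is a hypothetical stand-in for the real constraint; in the actual test module the methods would call `Document.objects.create(...)` instead.

```python
def validate_title(title):
    """Hypothetical stand-in for the DB-level title constraint."""
    if not title:
        raise ValueError("title is required")
    return title


# In the real test module the class would carry @pytest.mark.django_db,
# which applies the mark to every test method it contains.
class TestDocumentTitleConstraint:
    def test_requires_title(self):
        try:
            validate_title(None)
        except ValueError:
            return
        raise AssertionError("expected ValueError for a missing title")

    def test_accepts_non_empty_title(self):
        assert validate_title("Report A") == "Report A"
```

pytest collects classes whose names start with `Test`, so the section comment becomes the class name and shows up in the test report.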
    @pytest.mark.django_db
    def test_document_requires_title():
It would be great to show in the flow graph where we want to create this document, since the title may not be something we have right away in the process.
    @pytest.mark.django_db
    def test_multiple_documents_without_doi_allowed():
        """Multiple Documents with empty DOI are allowed (partial unique constraint)."""
        Document.objects.create(title="Report A", doi="")
So "" and None are the same thing at creation time?
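As a hedged illustration of the question, the sketch below uses SQLite directly rather than the Django model. At the SQL level `''` and `NULL` are different values, but both stay outside a partial unique index declared with `WHERE doi <> ''`. The table and index names are made up, and the `doi` column is nullable here only to show the difference. Whether the real field accepts `None` depends on its `null=` setting, which this excerpt does not show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT NOT NULL, doi TEXT)"
)
# Partial unique index: only rows with a non-empty DOI are indexed.
conn.execute(
    "CREATE UNIQUE INDEX uniq_nonempty_doi ON document (doi) WHERE doi <> ''"
)


def insert(title, doi):
    conn.execute("INSERT INTO document (title, doi) VALUES (?, ?)", (title, doi))


def is_rejected(title, doi):
    """Return True if the insert violates the unique index."""
    try:
        insert(title, doi)
    except sqlite3.IntegrityError:
        return True
    return False


insert("Report A", "")
insert("Report B", "")    # allowed: '' is excluded by the WHERE clause
insert("Report C", None)
insert("Report D", None)  # allowed: NULL <> '' is not true, so the row is not indexed
insert("Paper E", "10.9999/meta")

duplicate_rejected = is_rejected("Paper F", "10.9999/meta")
empty_count = conn.execute(
    "SELECT COUNT(*) FROM document WHERE doi = ''"
).fetchone()[0]
null_count = conn.execute(
    "SELECT COUNT(*) FROM document WHERE doi IS NULL"
).fetchone()[0]
```

Under these assumptions the two values are stored differently (`doi = ''` does not match `NULL` rows), so a query filtering on an empty DOI would miss rows created with `None`.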
    def test_document_created_without_source_file():
        """A Document can exist without a linked SourceFile."""
        doc = Document.objects.create(title="Metadata-only paper", doi="10.9999/meta")
        assert doc.pk is not None
Should this also test that the source file is actually null?
cgoudet left a comment

Overall this is great. Just a few small style comments.
        model = Document

    source_file = factory.SubFactory(SourceFileFactory)
    tags_pubmed = factory.LazyFunction(list)
Could you create tickets to gradually add the article metadata to Document?

And I forgot to state the obvious: the pre-commit checks are failing.
Summary

- Replace `FileMetadata` (a file-centric metadata bag) with `Document`, a canonical bibliographic record (title, DOI, `external_ids` JSON) that exists independently of how many times the paper was fetched
- `SourceFile` now carries a nullable FK to `Document`: it is set after parsing and allows multiple fetches of the same paper to converge on one record
- `DocumentChunk` points to `Document` directly; the redundant `SourceFile` FK is removed (path is `chunk -> document -> source_file`)
- A partial unique constraint on `Document.doi` (non-empty only) prevents duplicate records while allowing DOI-less imports

Test plan

- `test_document_requires_title`: DB rejects `NULL` title
- `test_duplicate_nonempty_doi_rejected`: unique constraint fires on duplicate non-empty DOI
- `test_multiple_documents_without_doi_allowed`: empty DOI rows are not constrained
- `test_document_created_without_source_file`: `Document` can exist before any file is fetched
- `test_document_chunk_requires_document`: chunk FK is `NOT NULL`
- `test_document_chunk_linked_to_document`: `document.chunks` reverse relation works
- Run `manage.py migrate` on a clean DB and verify no errors
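The convergence described in the Summary can be sketched in plain Python as follows. This is an in-memory illustration under stated assumptions, not the Django models from this PR: `DocumentRegistry`, `attach_after_parsing`, the `url` field, and the example URLs are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(eq=False)
class Document:
    title: str
    doi: str = ""


@dataclass(eq=False)
class SourceFile:
    url: str
    document: Optional[Document] = None  # nullable link, set after parsing


@dataclass(eq=False)
class DocumentChunk:
    document: Document  # required: a chunk always belongs to a Document
    text: str


class DocumentRegistry:
    """Hypothetical in-memory stand-in for the Document table."""

    def __init__(self) -> None:
        self._by_doi: Dict[str, Document] = {}
        self.documents: List[Document] = []

    def get_or_create(self, title: str, doi: str = "") -> Document:
        # Only non-empty DOIs are deduplicated, mirroring the partial constraint.
        if doi and doi in self._by_doi:
            return self._by_doi[doi]
        doc = Document(title=title, doi=doi)
        self.documents.append(doc)
        if doi:
            self._by_doi[doi] = doc
        return doc


def attach_after_parsing(
    source_file: SourceFile, registry: DocumentRegistry, title: str, doi: str = ""
) -> Document:
    """Link a fetched file to its Document once the metadata has been parsed."""
    source_file.document = registry.get_or_create(title, doi)
    return source_file.document
```

Two fetches that parse to the same non-empty DOI end up linked to the same `Document`, while DOI-less imports each get their own record.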