-
Notifications
You must be signed in to change notification settings - Fork 23
Support XML Schema Definition Language (XSD) import #153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…triction, and extension
|
Thank you @multimeric! I think there is a lot of interest in an XML -> LinkML importer. |
|
Thank you!! Re lack of isomorphism. See also https://stackoverflow.com/questions/191536/converting-xml-to-json-using-python which mentions a "standard" (for instance-level). If there is nothing more up to date, I suggest being consistent with this, such that xml --[xmltodict]--> json validates via [xml schema]-->linkml Or at least an option to do this - for now just marking this in the docs is sufficient |
|
Hmm that's an interesting suggestion. However the proposed solution is to prepend <p id="1">text</p>Becomes {
"p": {
"@id": 1,
"$": "text"
}
}I think this is a bit ugly, and also perhaps easily confused with JSON-LD, but I will implement it that way if you prefer. I consider this scenario a bit different from standard XML to JSON conversion because here we have a schema that can describe which slots are attributes directly. So I'm wondering fi there's a good field in the LinkML SlotDefinition that would capture this (current I'm using |
|
And now I reflect this you are right, this would make things quite complicated. Having the So scratch that, let's keep it in mind for future extensions. Formally the way to do this in linkml would be to use conforms_to or instantiates slots:
id:
conforms_to: xsd:attribute
description:
conforms_to: xsd:attributeor slots:
id:
instantiates: [xsd:attribute]
description:
instantiates: [xsd:attribute]The former is stronger and could be used for validation if we later implement metaclasses for xml schema (see https://linkml.io/linkml/schemas/annotations.html#validation-of-annotations). (a to be defined xml schema metamodel): class:
XmlAttribute:
is_a: slot
class_uri: xsd:attribute
description: If a slot instantiates this then it should be mapped to an attribute when serializing to XMLHowever, if you want to keep your method flexible such that you can pass in a profile on the command line (e.g. using keywords) and satisfy your use case first that's valid! |
|
Okay, I've gone with your second suggestion, as I like the idea that |
|
these test failures seem unrelated to the main content of your PR, but my guess is that updating the lock file introduced some upstream library change that modifies behavior |
|
Ah no that's my fault, I used some type syntax that isn't compatible with Python 3.9. It should hopefully be fixed now. |
|
@multimeric I am starting a review of this PR. Do you have time to merge from linkml/schema-automator main and resolve the poetry.lock conflict? I was able to do the merge on my local machine, so let me know if you'd prefer me to push that or to assist w/ poetry.lock. So far the tests seem to pass (there are a lot of errors in the logs that appear unrelated to your PR). Thanks for adding this feature. |
tfliss
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a partial review. Most of the comments are suggestions, but the range capitalization and merge will be needed changes, so I'm submitting this review while putting together an example XSD to try to get a little more coverage.
One other thing that should be included is a brief documentation file that covers what functionality is included / intended and what isn't. This can be at a relatively high level, but will help call out to the user that it is only a partial implementation that is under development.
| from schema_automator.importers import XsdImportEngine | ||
| import tempfile | ||
|
|
||
| def parse_string(xsd: str) -> SchemaView: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider making these XSD file strings fixture files.
Alternatively add a method to XsdImportEngine that can read an XSD string directly (or io.StringIO) if they're inline in the test on purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to add a convert_string method to the XsdImportEngine. I think I avoided making these XML documents separate files because it's more messy and it ends up being harder to understand the test.
tfliss
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@multimeric Thank you for the quick poetry.lock update and also spending time on the f-strings.
I apologize for the extra round-trip, but please see my comment change/request on TYPE_MAP. I'm pretty sure the ranges need to be lower case. If you make just that change I'll approve and we can get this merged. There is a lot of interest in having this functionality and this is a great base to work from!
| from schema_automator.importers.import_engine import ImportEngine | ||
|
|
||
| # Mapping between https://www.w3.org/TR/2012/REC-xmlschema11-2-20120405/datatypes.html#built-in-primitive-datatypes and https://linkml.io/linkml-model/latest/docs/#types | ||
| TYPE_MAP = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all the values here need to be lower case. The linkml.io has them uppercase for some reason, but when I run the linter, and comparing to the tutorials, etc. the ranges should be lowercase. I changed this bit as follows locally and the lint errors for the ranges were resolved. More testing would be great, but if this change is made I'll accept the PR. I must have forgot to click 'add; on this comment yesterday, sorry about the delay.
uv run --no-project linkml lint some_schema.linkml.yaml
TYPE_MAP = {
"anyURI": "uri",
"boolean": "boolean",
"date": "bate",
"dateTime": "datetime",
"decimal": "decimal",
"double": "double",
"float": "float",
"string": "string",
"time": "time",
"int": "integer",
"integer": "integer",
# NMTOKEN is basically just a string inside an attribute
"NMTOKEN": "string",
}
tfliss
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amc-corey-cox @sierra-moxon After the developer days side chat with Cory Levinson, we'd like to merge this PR as-is. I will open a separate issue for the remaining minor change I requested and Cory will start a document to plan additional unit test fixtures and roadmap. Chris M previously approved this PR but I think I unintentionally reset that.
cc @multimeric many thanks for this new feature.
Adds an
XsdImportEngine, with tests.XSD doesn't map cleanly to LinkML, because:
RootElementclass where necessary