For example command, add option for YAML result file#120

Closed
tanmay-9 wants to merge 5 commits intoqlever-dev:mainfrom
tanmay-9:generate-yml-file-example-queries

Conversation

@tanmay-9
Collaborator

  • Takes a --queries-file argument with a YAML filename as input to read the queries (format of the YAML file: {queries: [{query: str, sparql: str}]}).
  • Takes a --generate-output-file argument to specify whether to generate a YAML file that can be used by the performance-comparison web app.
  • Takes --output-basename and --backend-name arguments, so that the output file can be named {args.output_basename}.{args.backend_name}.results.yaml.
  • Added support for the qlever-results+json header to example-queries.
  • When the output file needs to be generated, automatically uses the TSV format to get results from non-QLever backends, and the qlever-results+json format from QLever backends in order to also get the runtime execution tree information.
  • Added ruamel.yaml as a dependency in pyproject.toml to work with YAML files.
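
For illustration, a queries file in the format described above might look like this (the query names and SPARQL bodies are hypothetical examples, not from the PR):

```yaml
queries:
  - query: "Count all triples"
    sparql: "SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }"
  - query: "All predicates, sorted"
    sparql: |
      SELECT DISTINCT ?p WHERE {
        ?s ?p ?o
      }
      ORDER BY ?p
```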

@tanmay-9 tanmay-9 force-pushed the generate-yml-file-example-queries branch 2 times, most recently from 4737cb5 to a6751ba Compare January 25, 2025 01:25
@tanmay-9 tanmay-9 force-pushed the generate-yml-file-example-queries branch from a6751ba to 4b0aa23 Compare February 12, 2025 00:50
@tanmay-9 tanmay-9 force-pushed the generate-yml-file-example-queries branch from 4b0aa23 to 20ba9ba Compare February 20, 2025 00:11
@hannahbast hannahbast changed the title Add an option to generate yaml file with results for performance comparison in example queries For example command, add option for YAML result file Mar 19, 2025
Collaborator

@hannahbast hannahbast left a comment

@tanmay-9 Thank you for this. I have now had the chance to look at this and play around with it and leave a first round of comments.

from pathlib import Path
from typing import Any

from rdflib import Graph
Collaborator

Please explain what you need Graph for. In any case, as long as it's in there, you have to include rdflib in the dependencies in pyproject.toml.

Collaborator Author

Since I store the headers and results of a SPARQL query in the output YAML file, I use rdflib.Graph to convert results with the text/turtle accept header (CONSTRUCT and DESCRIBE queries) into ?s ?p ?o form.
I had added rdflib to the toml in the web-app pull request but forgot about it here. I will make that change.

Comment on lines +13 to +14
from ruamel.yaml import YAML
from ruamel.yaml.scalarstring import LiteralScalarString
Collaborator

Why do you import ruamel.yaml and not the seemingly more standard yaml?

Collaborator Author

ruamel.yaml supports YAML 1.2, preserves formatting, and seems to be more actively maintained. I have worked with it before and continued using it here. But if it's not needed, I can switch to PyYAML as well.

Comment on lines +163 to +178
subparser.add_argument(
"--generate-output-file",
action="store_true",
default=False,
help="Generate output file in the 'output' directory",
)
subparser.add_argument(
"--backend-name",
default=None,
help="Name for the backend that would be used in performance comparison",
)
subparser.add_argument(
"--output-basename",
default=None,
help="Name for the dataset that would be used in performance comparison",
)
Collaborator

This should be one option, where the argument is the basename of the output file, for example --output-yml dblp.qlever

As far as I can see, the only other place where you need one of these arguments separately is when you determine whether the endpoint is a QLever endpoint, in order to choose the media type for the output. But there is already an option --accept for that. Note that producing application/qlever-results+json is more expensive (because it contains more data), so a user should be aware of this, especially when comparing to other engines; otherwise the comparison is unfair.
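
The suggested single option could look roughly like this (a sketch, assuming the spelling --output-yml from the comment above):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--output-yml",
    metavar="BASENAME",
    default=None,
    help="Basename of the output file, e.g. 'dblp.qlever'",
)

# The command would then derive the full output filename from the basename.
args = parser.parse_args(["--output-yml", "dblp.qlever"])
output_file = f"{args.output_yml}.results.yaml"
```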

Collaborator Author

I added the --backend-name and --output-basename arguments so that I can name the output file f"{args.output_basename}.{args.backend_name}.results.yaml". The evaluation web app can then read these files directly from the output folder, extract the names of the dataset and engine, and display the comparison results.

Collaborator Author

Currently, when the output file needs to be generated, I automatically use the TSV format to get results for non-QLever backends, the qlever-results+json format for QLever backends (so that the runtime execution tree information can be displayed in the web app), and the text/turtle format for CONSTRUCT and DESCRIBE queries. I do this so that I can read the SPARQL query results easily and put them in the output YAML file. But yes, I think I will have to change this logic if a standardized comparison needs to be done between various engines.
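
The selection logic described above could be sketched as a small helper (the function name is hypothetical, not the PR's actual code):

```python
def choose_accept_header(sparql: str, is_qlever: bool) -> str:
    """Pick the media type for fetching results, per the logic above."""
    # CONSTRUCT and DESCRIBE queries return RDF, so ask for Turtle.
    if sparql.lstrip().upper().startswith(("CONSTRUCT", "DESCRIBE")):
        return "text/turtle"
    # QLever backends can additionally report the runtime execution tree.
    if is_qlever:
        return "application/qlever-results+json"
    # Plain TSV is easy to parse for all other engines.
    return "text/tab-separated-values"
```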

record["runtime_info"] = json.loads(runtime_info_str)
record["runtime_info"]["client_time"] = client_time
record["headers"] = headers
record["results"] = results
Collaborator

Please remind me what the point of having the results in the YAML output is?

I see the MAX_RESULT_SIZE in the code. This should be an option. And depending on your answer to the first question, it might be better to have no results (or very few results) in the output by default.

Collaborator Author

It is just something that comes from the evaluation project: the output YAML files there had the query results in them. So, for my project, it was discussed that it would be nice to be able to display the results of the query in the web app as well. This way we can also verify whether the query results are correct (which could also be done with just the COUNT). I agree that it adds a fair bit of complexity to the example-queries command and, for simple benchmarking purposes, doesn't add much.

@hannahbast
Collaborator

@tanmay-9 And another question: I tried to use the YAML files produced with the new --generate-output-file option for https://github.com/tanmay-9/qlever-evaluation (I put them in a folder output there as usual), but that didn't work (nothing showed in the web app). What am I missing?

@tanmay-9
Collaborator Author

I have changed the structure of the YAML file generated by the example-queries command, and it is no longer compatible with the evaluation project.

@tanmay-9 tanmay-9 force-pushed the generate-yml-file-example-queries branch 2 times, most recently from 697dc3f to f15b215 Compare March 24, 2025 11:58
@tanmay-9 tanmay-9 force-pushed the generate-yml-file-example-queries branch from f15b215 to 06d3eaa Compare April 5, 2025 14:41
@hannahbast hannahbast requested review from Qup42 and Copilot and removed request for Qup42 and Copilot April 30, 2025 21:21

Copilot AI left a comment

Pull Request Overview

This PR adds support for YAML input files to read queries and enables output file generation for performance comparisons using YAML. The changes include new command-line arguments for specifying query and output file options, new helper functions for parsing YAML and constructing output records, and an update to project dependencies.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File | Description
src/qlever/commands/example_queries.py | Added new arguments; introduced functions to handle YAML queries and output.
pyproject.toml | Updated dependencies to include ruamel.yaml and rdflib.
Comments suppressed due to low confidence (2)

src/qlever/commands/example_queries.py:331

  • The variable 'is_qlever' is used before it is initialized. Please initialize it (e.g., set is_qlever to False) before using it in the conditional assignment.
is_qlever = is_qlever or "qlever" in args.backend_name.lower()

src/qlever/commands/example_queries.py:212

  • The return type in the 'parse_queries_file' function is annotated as dict[str, list[str, str]], but the function actually returns a dictionary with the 'queries' key mapping to a list of dictionaries. Consider updating the type annotation to dict[str, list[dict[str, str]]].
def parse_queries_file(queries_file: str) -> dict[str, list[str, str]]:

@tanmay-9 tanmay-9 closed this Feb 27, 2026