For example command, add option for YAML result file #120

tanmay-9 wants to merge 5 commits into qlever-dev:main

Conversation
example command, add option for YAML result file
hannahbast
left a comment
@tanmay-9 Thank you for this. I have now had the chance to look at this and play around with it and leave a first round of comments.
| from pathlib import Path |
| from typing import Any |
| from rdflib import Graph |
Please explain what you need `Graph` for. Anyway, as long as it's in there, you have to include rdflib in the dependencies in the pyproject.toml.
As I am storing the headers and results of a SPARQL query in the output YAML file, for the text/turtle accept header (CONSTRUCT and DESCRIBE queries) I am using rdflib.Graph to get the results in `?s ?p ?o` form.
I had added rdflib to the pyproject.toml in the web-app pull request but forgot about it here. I will make that change.
| from ruamel.yaml import YAML | ||
| from ruamel.yaml.scalarstring import LiteralScalarString |
Why do you import ruamel.yaml and not the seemingly more standard yaml?
ruamel.yaml supports YAML 1.2, preserves formatting, and seems to be more actively maintained. I have worked with it before and continued using it here. But if it's not needed, I can switch to PyYAML as well.
| subparser.add_argument( | ||
| "--generate-output-file", | ||
| action="store_true", | ||
| default=False, | ||
| help="Generate output file in the 'output' directory", | ||
| ) | ||
| subparser.add_argument( | ||
| "--backend-name", | ||
| default=None, | ||
| help="Name for the backend that would be used in performance comparison", | ||
| ) | ||
| subparser.add_argument( | ||
| "--output-basename", | ||
| default=None, | ||
| help="Name for the dataset that would be used in performance comparison", | ||
| ) |
This should be one option, whose argument is the basename of the output file, for example --output-yml dblp.qlever
As far as I can see, the only other place where you need one of these arguments separately is when you determine whether the endpoint is a QLever endpoint, in order to determine the media type for the output. But there is an option --accept for that already. Note that producing application/qlever-results+json is more expensive (because it contains more data), so a user should be aware of this, especially when comparing to other engines; otherwise the comparison is unfair.
I put in the --backend-name and --output-basename arguments because then I can generate the output file as f"{args.output_basename}.{args.backend_name}.results.yaml". The evaluation web app can then read these files directly from the output folder, extract the names of the dataset and engine, and display the comparison results.
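A minimal sketch of how the single-option suggestion above could still satisfy this need: with one argument like `dblp.qlever`, the dataset and engine names can be recovered by splitting the basename, so the two separate options would not be required. The function name is hypothetical, not from the actual code.

```python
def split_output_basename(basename: str) -> tuple[str, str]:
    """Split an --output-yml argument like 'dblp.qlever' into
    (dataset, engine). Hypothetical helper for illustration."""
    dataset, _, engine = basename.partition(".")
    return dataset, engine

dataset, engine = split_output_basename("dblp.qlever")
# Same filename pattern as the existing two-option approach.
output_file = f"{dataset}.{engine}.results.yaml"
```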
Currently, when an output file needs to be generated, I automatically use the TSV format to get results for non-QLever backends, the qlever-results+json format for QLever backends (so that runtime execution-tree information can be displayed in the web app), and the text/turtle format for CONSTRUCT and DESCRIBE queries. I do this so that I can read the SPARQL query results easily and put them in the output YAML file. But yes, I think I will have to change this logic if a standardized comparison is to be done between various engines.
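For illustration, the selection logic described above as a standalone sketch. The function name is hypothetical and the query-kind detection is simplified (it ignores leading PREFIX declarations); it is not the actual implementation.

```python
def pick_accept_header(query: str, is_qlever: bool) -> str:
    """Choose the Accept header as described in the comment above.
    Simplified: assumes the query starts with its keyword."""
    kind = query.strip().split(None, 1)[0].upper()
    if kind in ("CONSTRUCT", "DESCRIBE"):
        return "text/turtle"
    if is_qlever:
        # Richer output (execution tree), but more expensive to produce.
        return "application/qlever-results+json"
    return "text/tab-separated-values"
```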
| record["runtime_info"] = json.loads(runtime_info_str) | ||
| record["runtime_info"]["client_time"] = client_time | ||
| record["headers"] = headers | ||
| record["results"] = results |
Please remind me what the point of having the results in the YAML output is?
I see the MAX_RESULT_SIZE in the code. This should be an option. And depending on your answer to the first question, it might be better to have no results (or very few results) in the output by default.
It is just something that comes from the evaluation project: the output YAML files in it had the query results in them. So for my project, it was discussed that it would be nice to be able to display the results of the query in the web app as well. This way we can also verify whether the query results are correct (which can also be done with just the COUNT). I agree that this adds a fair bit of complexity to the example-queries command and, for simple benchmarking purposes, doesn't add much.
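A minimal sketch of the suggestion to turn `MAX_RESULT_SIZE` into an option: expose it as a command-line argument and slice the result rows before they go into the YAML record. The option name, default, and variable names are illustrative assumptions, not the actual code.

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical option replacing the hard-coded MAX_RESULT_SIZE constant;
# a default of 0 would mean "store no results", per the suggestion above.
parser.add_argument(
    "--max-result-size", type=int, default=0,
    help="Maximum number of result rows stored in the YAML output "
         "(0 = store no results)",
)
args = parser.parse_args(["--max-result-size", "2"])

results = [["a"], ["b"], ["c"]]
stored = results[: args.max_result_size]
```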
@tanmay-9 And another question: I tried to use the YAML files produced with the new …

I have changed the structure of the generated YAML file with …
Pull Request Overview
This PR adds support for YAML input files to read queries and enables output file generation for performance comparisons using YAML. The changes include new command-line arguments for specifying query and output file options, new helper functions for parsing YAML and constructing output records, and an update to project dependencies.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/qlever/commands/example_queries.py | Added new arguments; introduced functions to handle YAML queries and output. |
| pyproject.toml | Updated dependencies to include ruamel.yaml and rdflib. |
Comments suppressed due to low confidence (2)
src/qlever/commands/example_queries.py:331
- The variable 'is_qlever' is used before it is initialized. Please initialize it (e.g., set is_qlever to False) before using it in the conditional assignment.
is_qlever = is_qlever or "qlever" in args.backend_name.lower()
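A minimal sketch of the fix Copilot suggests: initialize `is_qlever` before the conditional update. The stand-in value for `args.backend_name` is illustrative.

```python
# Initialize before use, as suggested; previously the name could be
# read before assignment.
is_qlever = False

backend_name = "QLever-backend"  # illustrative stand-in for args.backend_name
if backend_name is not None:
    is_qlever = is_qlever or "qlever" in backend_name.lower()
```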
src/qlever/commands/example_queries.py:212
- The return type in the 'parse_queries_file' function is annotated as dict[str, list[str, str]], but the function actually returns a dictionary with the 'queries' key mapping to a list of dictionaries. Consider updating the type annotation to dict[str, list[dict[str, str]]].
def parse_queries_file(queries_file: str) -> dict[str, list[str, str]]:
{queries: [{query: str, sparql: str}]}
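For illustration, the corrected annotation Copilot proposes, with a return value in the documented `{queries: [{query: str, sparql: str}]}` shape. The function body here is a hypothetical stand-in, not the real `parse_queries_file`.

```python
def parse_queries_file_example() -> dict[str, list[dict[str, str]]]:
    """Illustrative stand-in returning the documented shape:
    the 'queries' key maps to a list of dicts, so the annotation
    dict[str, list[dict[str, str]]] matches it."""
    return {"queries": [{"query": "All papers", "sparql": "SELECT ..."}]}

parsed = parse_queries_file_example()
```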