- Repository Virtual Environment Setup
- CantusDB Setup
- Middleware Setup
- Text to SQL Data Collection Process
- Evaluation Gold and Predicted Outputs
- Links
- Ensure python3 is installed by typing the following into a terminal:
    python3
  Note the Python version used for this project: Python 3.11.3 (v3.11.3:f3909b8bc8, Apr 4 2023, 20:12:10) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
- Create a virtual environment in the project directory:
    python3 -m venv .venv
- Activate the virtual environment:
    source .venv/bin/activate
- Verify the Python executable being used:
    which python
- Upgrade pip to the latest version:
    python3 -m pip install --upgrade pip
- Confirm the pip version:
    python3 -m pip --version
- Install project dependencies from the requirements.txt file:
    pip3 install -r requirements.txt
- Deactivate the virtual environment when done:
    deactivate
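As a complement to `which python`, the active interpreter can also be inspected from inside Python. The snippet below is a minimal sketch, not part of the repository; the function names `in_virtualenv` and `describe_interpreter` are hypothetical. It relies on the fact that `sys.prefix` differs from `sys.base_prefix` when a `venv` environment is active.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    return sys.prefix != sys.base_prefix


def describe_interpreter():
    """Collect the facts checked manually in the steps above."""
    return {
        "executable": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "in_venv": in_virtualenv(),
    }


if __name__ == "__main__":
    print(describe_interpreter())
```

If `in_venv` is reported as false after activation, the shell is most likely still resolving `python3` to a system interpreter.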
- Clone the CantusDB repository and ensure it is up to date by regularly pulling the latest changes.
- Follow the instructions on the Deploying CantusDB Locally for Development page to set up the website for local development.
- Obtain the dev_env file from the CantusDB Resources section. This file must be provided by a CantusDB developer.
- During the Populating the Database step, request the cantus_dump.sql file from a CantusDB developer and use it to populate the database.
- Verify that the setup is complete by confirming that the Chants, Sources, and Feasts sections are accessible via localhost.
- Confirm that the CantusDB Setup is complete and the website is functional.
- Place the middleware.py file into the */CantusDB/django/cantusdb_project/cantusdb/ directory.
- Navigate to the Chants, Sources, or Feasts page on the CantusDB website, input search criteria, and click Apply.
- The middleware.py script will automatically generate and populate the nlq_sql.json and sql_queries.log files located in the */CantusDB/django/cantusdb_project/ directory.
- Each new search performed on the CantusDB website will update the data in nlq_sql.json and sql_queries.log.
- Note that not all SQL queries in these files are relevant. Focus on the query that matches the search results displayed on the website.
- Refer to the SQL Output Extraction section for additional instructions.
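For readers unfamiliar with how a Django middleware can capture SQL, the following is a rough sketch of the general idea only. It is not the project's `middleware.py`: the class name `SqlCaptureMiddleware`, the `out_dir` and `query_source` parameters, and the JSON layout are all assumptions made for illustration. It uses Django's `connection.queries`, which is only populated when `DEBUG` is enabled.

```python
import json
from pathlib import Path


def _django_queries():
    # Imported lazily so the sketch can be read and tested without Django.
    from django.db import connection
    # connection.queries holds dicts with "sql" and "time" keys (DEBUG only).
    return [entry["sql"] for entry in connection.queries]


class SqlCaptureMiddleware:
    """Hypothetical stand-in; the real middleware.py may work differently."""

    def __init__(self, get_response, out_dir=".", query_source=_django_queries):
        self.get_response = get_response
        self.out_dir = Path(out_dir)
        self.query_source = query_source

    def __call__(self, request):
        response = self.get_response(request)
        queries = self.query_source()
        # Assumption: the log is appended to and the JSON file is overwritten.
        with open(self.out_dir / "sql_queries.log", "a", encoding="utf-8") as log:
            for sql in queries:
                log.write(sql + "\n")
        payload = json.dumps({"sql_queries": queries}, indent=2)
        (self.out_dir / "nlq_sql.json").write_text(payload, encoding="utf-8")
        return response
```

Because every query issued during a request is recorded, the output contains unrelated queries alongside the one that produced the search results, which is why the relevant query has to be picked out by hand.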
- On the localhost version of the website, navigate to the Chants, Sources, or Feasts pages and enter the desired search information. For example:
  - Segment: CANTUS Database
  - General search: Montreal
  - Country: Canada
  - Century: 16th century
  - Complete Source/Fragment: Complete source

  Observe that only one result is returned for this search.
- Open the */CantusDB/django/cantusdb_project/nlq_sql.json file and locate the query that matches the inputted search criteria. Look for:
  - A query starting with SELECT DISTINCT
  - Conditions such as:
      UPPER(main_app_institution.country::text) LIKE UPPER('%Canada%') AND UPPER(main_app_century.name::text) LIKE UPPER('%16th century%') AND UPPER(main_app_institution.name::text) LIKE UPPER('%Montreal%')
  - A query ending with:
      ORDER BY main_app_institution.siglum ASC, main_app_source.shelfmark ASC LIMIT 1;
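The manual lookup described above can be narrowed down with a small filter. This is a sketch under assumptions: the layout of nlq_sql.json is not documented here, so the hypothetical function `find_candidate_queries` simply takes a list of SQL strings that you have already extracted from the file.

```python
def find_candidate_queries(queries, search_values):
    """Keep SELECT DISTINCT queries that mention every search value.

    queries: iterable of SQL strings (how they are read from nlq_sql.json
             depends on that file's structure, which is assumed unknown).
    search_values: the values typed into the search form, e.g. "Canada".
    """
    matches = []
    for sql in queries:
        text = sql.strip()
        upper = text.upper()
        if not upper.startswith("SELECT DISTINCT"):
            continue
        # The logged conditions wrap each value as LIKE UPPER('%value%').
        if all(f"%{value}%".upper() in upper for value in search_values):
            matches.append(text)
    return matches
```

For the example search, calling it with `["Canada", "16th century", "Montreal"]` should leave only the queries whose conditions mention all three values; the final choice still needs a visual check against the results shown on the website.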
- Before executing the SQL query on the Docker Postgres container, remove the trailing LIMIT 1 from the query. This precaution should be taken for all queries to ensure complete results are retrieved.
- Ensure the database container for the website is running, then run the following Docker command in a terminal to extract information into a file:
    docker exec cantusdb-postgres-1 psql -U cantusdb -d cantusdb \
      -c "\pset format csv" -c "GOLD_SQL_QUERY" | sed '1d' > */Path-to-the-Repository/nl-to-sql/NL2SQL/gold_outputs/object/object_output_filex.csv
- When running the Docker command, replace GOLD_SQL_QUERY with the SQL query obtained from the search, update the path information appropriately, and ensure that the object is one of chants, sources, or feasts.
- Ensure that the object_output_filex.csv follows the correct naming convention:
  - object_output_file should be c, s, or f for chants, sources, or feasts, respectively.
  - The x value represents the index number of the file within that directory.
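The LIMIT removal and the export command above can also be scripted. The following is a sketch only, assuming the container name, user, and database shown in the Docker command; the helper names `strip_trailing_limit`, `build_psql_command`, and `export_query` are hypothetical and not part of the repository.

```python
import re
import subprocess

# Matches a trailing "LIMIT <number>" with an optional semicolon.
_LIMIT_RE = re.compile(r"\s+LIMIT\s+\d+\s*;?\s*$", re.IGNORECASE)


def strip_trailing_limit(sql):
    """Remove a trailing LIMIT clause and return the query ending in ';'."""
    stripped = _LIMIT_RE.sub("", sql.strip())
    return stripped.rstrip(";").rstrip() + ";"


def build_psql_command(sql, container="cantusdb-postgres-1"):
    """Build the argument list equivalent to the docker exec command above."""
    return [
        "docker", "exec", container,
        "psql", "-U", "cantusdb", "-d", "cantusdb",
        "-c", r"\pset format csv",
        "-c", sql,
    ]


def export_query(sql, out_path):
    """Run the query in the container and write the CSV output to out_path."""
    command = build_psql_command(strip_trailing_limit(sql))
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    lines = result.stdout.splitlines(keepends=True)
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.writelines(lines[1:])  # drop the first line, as sed '1d' does
```

Passing the query as a list element avoids shell quoting problems that can arise when a query containing double quotes is pasted into the command line.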
- If everything is formatted correctly, the object_output_filex.csv file will be generated in the specified directory.
- Copy and paste the gold SQL query and the gold output path into the appropriate chants.json, sources.json, or feasts.json file. Add them to the sql_query and gold_output_path fields.
- Write the natural language query corresponding to the gold SQL query. Ensure that:
  - It starts with: Given this database schema, generate a SQL query that shows me all the.
  - It ends with: Format your response without any formatting or newlines.
  - Values are enclosed in single quotes (e.g., 'Canada').
  - Attribute names are capitalized (e.g., Country).
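The four rules for the natural language query lend themselves to a quick automated check. The sketch below is illustrative; `check_nl_query` is a hypothetical helper, and it assumes the required opening does not include the trailing period.

```python
PREFIX = "Given this database schema, generate a SQL query that shows me all the"
SUFFIX = "Format your response without any formatting or newlines."


def check_nl_query(text, values=(), attributes=()):
    """Return a list of rule violations; an empty list means the query passes."""
    problems = []
    stripped = text.strip()
    if not stripped.startswith(PREFIX):
        problems.append("missing required opening")
    if not stripped.endswith(SUFFIX):
        problems.append("missing required closing")
    for value in values:
        if f"'{value}'" not in stripped:
            problems.append(f"value not in single quotes: {value}")
    for name in attributes:
        capitalized = name[:1].upper() + name[1:]
        if capitalized not in stripped:
            problems.append(f"attribute not capitalized: {name}")
    return problems
```

For instance, a query such as "Given this database schema, generate a SQL query that shows me all the sources where the Country is 'Canada'. Format your response without any formatting or newlines." passes with `values=["Canada"]` and `attributes=["country"]`; the wording between the opening and closing sentences is an invented example, not a prescribed template.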
- Prompt the chosen LLM with the natural_language_inputs and either database-schema-with-options or database-schema-without-options. Ensure that the input does not exceed the character limit of the selected LLM.
- If the LLM generates an SQL query with incorrect formatting, clean it up by:
  - Removing newlines.
  - Adding spaces where necessary.
  - Ensuring the query is formatted as a single line.
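The clean-up steps above amount to collapsing all whitespace into single spaces. A minimal sketch, with `to_single_line` as a hypothetical helper name:

```python
def to_single_line(sql):
    """Collapse newlines, tabs, and repeated spaces into single spaces."""
    # str.split() with no argument splits on any run of whitespace, so
    # rejoining with one space both removes newlines and restores the
    # spaces that would otherwise be lost between lines.
    return " ".join(sql.split())
```

Note that this also collapses repeated whitespace inside string literals, so a predicted query containing a value with consecutive spaces should be checked by hand.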
- Ensure the database container for the website is running, then execute the following Docker command in a terminal to extract the information into a file:
    docker exec cantusdb-postgres-1 psql -U cantusdb -d cantusdb \
      -c "\pset format csv" -c "PREDICTED_SQL_QUERY" | sed '1d' > */Path-to-the-Repository/nl-to-sql/NL2SQL/predicted_outputs_without_options/object/llm/object_output_filex.csv
- In the Docker command:
  - Replace PREDICTED_SQL_QUERY with the cleaned-up query generated by the LLM.
  - Ensure the correct directory is chosen based on the schema used: predicted_outputs_without_options or predicted_outputs_with_options.
  - Update the llm directory name to match the specific LLM being used.
  - Rename the file object_output_filex.csv to follow the appropriate naming convention, where object corresponds to c for chants, s for sources, or f for feasts, and x is the index number of the file.
- If everything is formatted correctly, the object_output_filex.csv file will be generated in the specified directory.
- Copy and paste the predicted SQL query and the predicted output path into the appropriate chants.json, sources.json, or feasts.json file. Add them under the predicted_sql_query_with_options or predicted_sql_query_without_options fields within the corresponding llm field.
- Navigate to the project repository in the terminal and ensure that you are in the nl-to-sql/NL2SQL directory:
    cd /Path-to-the-Repository/nl-to-sql/NL2SQL
- Run the eval.py script using the Python interpreter from the previously set up virtual environment:
    python3 eval.py
- The script will output results resembling the following structure:
{
  "without_options": {
    "unordered": {
      "gpt": 23,
      "claude": 23,
      "grok": 16
    },
    "ordered": {
      "gpt": 16,
      "claude": 15,
      "grok": 11
    },
    "precision": {
      "gpt": 0.7,
      "claude": 0.75,
      "grok": 0.56
    },
    "recall": {
      "gpt": 0.69,
      "claude": 0.74,
      "grok": 0.47
    },
    "f1": {
      "gpt": 0.68,
      "claude": 0.7,
      "grok": 0.48
    }
  },
  "with_options": {
    "unordered": {
      "gpt": 32,
      "claude": 28,
      "grok": 23
    },
    "ordered": {
      "gpt": 23,
      "claude": 16,
      "grok": 10
    },
    "precision": {
      "gpt": 0.92,
      "claude": 0.85,
      "grok": 0.69
    },
    "recall": {
      "gpt": 0.88,
      "claude": 0.78,
      "grok": 0.6
    },
    "f1": {
      "gpt": 0.88,
      "claude": 0.78,
      "grok": 0.61
    }
  }
}
- CantusDB: https://github.com/DDMAL/CantusDB
- CantusDB Wiki: https://github.com/DDMAL/CantusDB/wiki/
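As background for the evaluation output, the sketch below shows one plausible way to compare a gold CSV with a predicted CSV. It is not taken from eval.py, whose actual logic may differ: the function names and the choice of row-level set overlap for precision, recall, and F1 are assumptions made for illustration.

```python
import csv


def read_rows(path):
    """Load a CSV file as a list of row tuples."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [tuple(row) for row in csv.reader(handle)]


def compare(gold, predicted):
    """Compare two lists of row tuples (assumed metric definitions)."""
    gold_set, pred_set = set(gold), set(predicted)
    overlap = len(gold_set & pred_set)
    precision = overlap / len(pred_set) if pred_set else 0.0
    recall = overlap / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {
        "ordered_match": gold == predicted,                  # same rows, same order
        "unordered_match": sorted(gold) == sorted(predicted),  # same rows, any order
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Under these assumptions, a prediction that returns every gold row plus one extra row has recall 1.0 but precision below 1.0, and counts as neither an ordered nor an unordered match.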
