Subgraph Documentation Generation
We adopt the methodology of Wretblad et al. [1] to define both the difficulty of documenting a table's column and the quality of a column's description. The difficulty levels are:

| Difficulty Level | Description |
|---|---|
| Very Hard | Given the database name, the table name, the column name, example data from the database, and other columns in the table, it is impossible to accurately determine what the column description should be. |
| Hard | Given the database name, the table name, the column name, example data from the database, and other columns in the table, I am unsure what the column description should be. |
| Medium | Given the database name, the table name, the column name, example data from the database, and other columns in the table, I can accurately determine what the column description should be. |
| Easy | Given only the table name, the column name, and the other columns in the table, I can accurately determine what the column description should be. |
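
The difficulty scale can be read as a decision procedure over how much context an annotator needs. The sketch below encodes it that way. The `Difficulty` enum, the `rate_difficulty` function, and its three boolean inputs are hypothetical names for an annotator's judgments, not part of the methodology in [1].

```python
from enum import IntEnum


class Difficulty(IntEnum):
    """Column documentation difficulty, ordered from easiest to hardest."""
    EASY = 1
    MEDIUM = 2
    HARD = 3
    VERY_HARD = 4


def rate_difficulty(
    clear_from_schema: bool,
    clear_with_full_context: bool,
    possible_with_full_context: bool,
) -> Difficulty:
    """Map hypothetical annotator judgments to a difficulty level.

    clear_from_schema: the description is clear from the table name,
        the column name, and the other columns alone.
    clear_with_full_context: the description is clear once the database
        name and example data are also shown.
    possible_with_full_context: a description can be determined at all
        from the full context, even if the annotator is unsure of it.
    """
    # Check the least demanding context first so easier levels win.
    if clear_from_schema:
        return Difficulty.EASY
    if clear_with_full_context:
        return Difficulty.MEDIUM
    if possible_with_full_context:
        return Difficulty.HARD
    return Difficulty.VERY_HARD


# A column whose meaning is obvious from its name and neighbors.
print(rate_difficulty(True, True, True).name)  # EASY
# An opaque column that no available context explains.
print(rate_difficulty(False, False, False).name)  # VERY_HARD
```

Using `IntEnum` keeps the levels ordered, so a set of ratings can be sorted or compared directly (for example, `Difficulty.HARD > Difficulty.MEDIUM`).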

Each column description is classified into one of the following categories:

| Classification | Description |
|---|---|
| Perfect | A perfect column description contains enough information that the interpretation of the column is completely free of ambiguity. It does not need to describe the specific values inside the column to be considered perfect. The description should contain information about which table the column is referencing. For example, in a transaction database with columns such as NAME, AMOUNT, and DATE, "The name of the client that made the transaction" is preferred over "The name," because it resolves the ambiguity of what the name refers to. Additionally, the column description should be a full, valid English sentence with proper grammar, capitalization, and punctuation. For instance, instead of "nationality of drivers" when each row refers to only one driver, it should be "The nationality of a driver." |
| Poor but Correct | The column description is correct, but there is room for improvement. |
| Incorrect | The column description is incorrect: it contains inaccurate or misleading information. It may still contain some correct information, but any incorrect information automatically leads to an incorrect rating. |
| No Description | The column description is missing. |
| I Can’t Tell | It is impossible to tell the class of the description with the given information. |
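
The classification rubric implies a precedence: a missing description is classified before its content is judged, and any incorrect information overrides an otherwise good description. The sketch below encodes one such ordering. The `classify` function and its boolean inputs are hypothetical stand-ins for an annotator's judgments, and checking "I Can't Tell" before "Incorrect" is an assumption of this sketch.

```python
from enum import Enum
from typing import Optional


class Classification(Enum):
    PERFECT = "Perfect"
    POOR_BUT_CORRECT = "Poor but Correct"
    INCORRECT = "Incorrect"
    NO_DESCRIPTION = "No Description"
    CANT_TELL = "I Can't Tell"


def classify(
    description: Optional[str],
    can_judge: bool,
    has_incorrect_info: bool,
    unambiguous: bool,
    is_full_sentence: bool,
) -> Classification:
    """Apply the rubric in an assumed precedence order (hypothetical inputs)."""
    # A missing or blank description has no content to judge.
    if description is None or not description.strip():
        return Classification.NO_DESCRIPTION
    # The annotator lacks the information needed to judge it (assumed first).
    if not can_judge:
        return Classification.CANT_TELL
    # Any incorrect information overrides everything else.
    if has_incorrect_info:
        return Classification.INCORRECT
    # Perfect requires an unambiguous description written as a full sentence.
    if unambiguous and is_full_sentence:
        return Classification.PERFECT
    return Classification.POOR_BUT_CORRECT


print(classify("nationality of drivers", True, False, True, False).value)  # Poor but Correct
print(classify("The nationality of a driver.", True, False, True, True).value)  # Perfect
```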
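
When a gold (reference) description is available, a description is rated against it on the following scale:

| Quality Level | Description |
|---|---|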
| Perfect | Matches the gold description without extra, redundant information. Redundant information is any content that adds no useful information. For example, appending "is a primary/foreign key" is not redundant, whereas appending "is useful for retrieving data" is redundant. |
| Almost Perfect | Matches the gold description but is verbose, containing redundant information; it contains no incorrect or misleading information. |
| Poor but Correct | The column description is correct but has room for improvement because it is missing information. For example, "The Time column records the specific time at which a transaction occurred, formatted in a 24-hour HH:MM pattern" lacks enough information to support a valid prediction beyond the column's primary purpose. |
| Incorrect | The column description is incorrect and contains inaccurate or misleading information. Any incorrect information automatically leads to an incorrect rating, even if some correct information is present. |
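
The quality scale can likewise be applied as an ordered set of checks against the gold description. In the minimal sketch below, the hypothetical `rate_quality` function assumes an annotator has already decided whether the description contains incorrect information, whether it matches the gold description, and whether any extra text is redundant (so a note such as "is a primary/foreign key" would not count as redundant).

```python
from enum import Enum


class Quality(Enum):
    PERFECT = "Perfect"
    ALMOST_PERFECT = "Almost Perfect"
    POOR_BUT_CORRECT = "Poor but Correct"
    INCORRECT = "Incorrect"


def rate_quality(
    has_incorrect_info: bool,
    matches_gold: bool,
    has_redundant_info: bool,
) -> Quality:
    """Rate a description against a gold description (hypothetical inputs)."""
    # Any inaccurate or misleading content forces an Incorrect rating.
    if has_incorrect_info:
        return Quality.INCORRECT
    # Correct but not matching the gold description: information is missing.
    if not matches_gold:
        return Quality.POOR_BUT_CORRECT
    # A match padded with redundant text is only Almost Perfect.
    if has_redundant_info:
        return Quality.ALMOST_PERFECT
    return Quality.PERFECT


print(rate_quality(False, True, True).value)  # Almost Perfect
print(rate_quality(True, True, False).value)  # Incorrect
```

Checking for incorrect information first mirrors the rule that any incorrect content leads to an incorrect rating, even when the rest of the description matches the gold description.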

The subgraphs considered are listed below.

| Name | Explorer | GitHub | Creator | Type |
|---|---|---|---|---|
| arbitrum-one-bridge | link | link | messari | messari: schema-bridge |
| gmx-forks | link | link | messari | messari: schema-derivatives-perpfutures |
| uniswap-v3-forks | link | link | messari | messari: schema-dex-amm-extended |
| bancor-v3 | link | link | messari | messari: schema-dex-amm-extended |
| aave-forks | link | link | messari | messari: schema-lending |
| opensea | link | link | messari | messari: schema-nft-marketplace |
| arrakis-finance | link | link | messari | messari: schema-yield |
| eigenlayer | link | link | messari | messari: schema-non-standard |
| livepeer | link | link | livepeer | livepeer: main |
| ens-subgraph | link | link | ens | ens: main |
| graph-network-arbitrum | link | link | e&n | graph: network arbitrum |
| known-origin | link | link | known-origin | known-origin |

- [1] Wretblad, Niklas, et al. "Synthetic SQL Column Descriptions and Their Impact on Text-to-SQL Performance." arXiv preprint arXiv:2408.04691, 2024.