Data Description

Data Preparation

These JSON files represent sessions of queries performed by either real users or simulated runs. They share a similar hierarchical structure.

1. Common Structure

Both user and simulated files have this general form:

{
  "id": "...",
  "sid": "...",
  "rank": "...", //optional parameter
  "interactions": [ ... ]
}

Optional parameter: If the simulator's query prediction candidates are not ranked, you don't need to set the `rank` field. Be aware that in this case all candidates for one query are treated as rank 1, and separate values are calculated for each of them.

| Field | Type | Description |
|---|---|---|
| `id` | `string` | Unique identifier of the session/run. For user files, e.g., `"Session_2"`; for simulated runs, e.g., `"Run_1_core-bm25-1-query-advanced_question-200td.log"` |
| `sid` | `string` | Session ID, critical for matching simulated runs with real user sessions |
| `rank` | `string` | (optional) Position of this candidate query when the candidates are ordered by their expected likelihood of reproducing the original query (ascending order) |
| `interactions` | `List[dict]` | Chronologically ordered queries for this session/run |
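The top-level structure can be checked before any comparison is run. The following is a minimal sketch, not part of the original tooling: `parse_record` and `REQUIRED_FIELDS` are hypothetical names, and defaulting a missing `rank` to `"1"` is an assumption based on the note about unranked candidates.

```python
import json

# Fields every session/run file is expected to carry; "rank" is optional.
REQUIRED_FIELDS = ("id", "sid", "interactions")


def parse_record(text):
    """Parse one session/run JSON document and check its top-level fields."""
    record = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"missing required field(s): {missing}")
    if not isinstance(record["interactions"], list):
        raise ValueError("'interactions' must be a list")
    # Assumption: an absent rank is treated as rank "1".
    record.setdefault("rank", "1")
    return record


example = '{"id": "Session_2", "sid": "2", "interactions": []}'
record = parse_record(example)
print(record["sid"], record["rank"])
```

Raising on missing fields early makes it easier to spot files that were not converted into the expected format.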

2. Interaction Object

Each element of interactions has the following structure:

{
  "q": "...",
  "serp": [ ... ],
  "clicks": [ ... ]
}

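| Field | Type | Description |
|---|---|---|
| `q` | `string` | Search query text |
| `serp` | `List[dict]` | Results returned in the Search Engine Results Page (SERP) for this query, ordered by rank; in the examples, each entry is an object with a `docid` and a `score` |
| `clicks` | `List[int]` | Document IDs that were clicked |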
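In the example files, each `serp` entry is an object with a `docid` and a `score`, and some entries carry an empty `docid`. The sketch below extracts the ranked document IDs from one interaction. The helper name `serp_docids` is hypothetical, and treating empty `docid` values as padding to be skipped is an assumption rather than documented behavior.

```python
def serp_docids(interaction):
    """Return the non-empty document IDs of one interaction, in rank order."""
    docids = []
    for entry in interaction.get("serp", []):
        # Entries in the example files are objects; bare IDs are accepted too.
        docid = entry.get("docid") if isinstance(entry, dict) else entry
        # Assumption: empty or missing IDs are placeholders and are skipped.
        if docid not in ("", None):
            docids.append(str(docid))
    return docids


interaction = {
    "q": "passivation",
    "serp": [
        {"docid": "5184714", "score": None},
        {"docid": "717105", "score": None},
        {"docid": "", "score": None},
    ],
    "clicks": [],
}
print(serp_docids(interaction))  # ['5184714', '717105']
```

Converting every ID to a string keeps the result comparable regardless of whether a file stores IDs as strings or integers.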

3. Example: User Session

{
  "id": "Session_2",
  "sid": "2",
  "interactions": [
    {
      "q": "passivation",
      "serp": [
        {
          "docid": "5184714",
          "score": null
        },
        {
          "docid": "717105",
          "score": null
        },
        {
          "docid": "4712986",
          "score": null
        },
        {
          "docid": "5096442",
          "score": null
        },
        {
          "docid": "2249793",
          "score": null
        },
        {
          "docid": "",
          "score": null
        },
        {
          "docid": "",
          "score": null
        },
        {
          "docid": "",
          "score": null
        },
        {
          "docid": "",
          "score": null
        },
        {
          "docid": "",
          "score": null
        }
      ],
      "clicks": []
    },
    {
      "q": "acid passivation",
      "serp": [
        {
          "docid": "40843926",
          "score": null
        },
        {
          "docid": "4712986",
          "score": null
        },
        {
          "docid": "4488527",
          "score": null
        },
        {
          "docid": "42872005",
          "score": null
        },
        {
          "docid": "51414810",
          "score": null
        },
        {
          "docid": "132317361",
          "score": null
        },
        {
          "docid": "",
          "score": null
        },
        {
          "docid": "",
          "score": null
        },
        {
          "docid": "",
          "score": null
        },
        {
          "docid": "",
          "score": null
        }
      ],
      "clicks": [24919459]
    }
  ]
...
}

`interactions` contains all interactions of the session in chronological order.
Each `serp` array contains the ranked result list with the corresponding document IDs (and scores, if available).
Each `clicks` array lists the documents clicked by the user.

4. Example: Simulated Run

{
  "id": "Run_1_core-bm25-2-query-advanced_question-200td.log",
  "rank": "1",
  "sid": "2",
  "interactions": [
    {
      "q": "passivation",
      "serp": [
        {
          "docid": "5184714",
          "score": null
        },
        {
          "docid": "717105",
          "score": null
        },
        {
          "docid": "4712986",
          "score": null
        },
        {
          "docid": "5096442",
          "score": null
        },
        {
          "docid": "2249793",
          "score": null
        },
        {
          "docid": "",
          "score": null
        },
        {
          "docid": "",
          "score": null
        },
        {
          "docid": "",
          "score": null
        },
        {
          "docid": "",
          "score": null
        },
        {
          "docid": "",
          "score": null
        }
      ],
      "clicks": []
    },
    {
      "q": "acid passivation",
      "serp": [
        {
          "docid": "40843926",
          "score": null
        },
        {
          "docid": "4712986",
          "score": null
        },
        {
          "docid": "4488527",
          "score": null
        },
        {
          "docid": "42872005",
          "score": null
        },
        {
          "docid": "51414810",
          "score": null
        },
        {
          "docid": "132317361",
          "score": null
        },
        {
          "docid": "",
          "score": null
        },
        {
          "docid": "",
          "score": null
        },
        {
          "docid": "",
          "score": null
        },
        {
          "docid": "",
          "score": null
        }
      ],
      "clicks": [24919459]
    }
  ]
...
}

If multiple candidate queries are predicted for the original query, `rank` orders them by their expected likelihood of reproducing the original query.
`sid` matches the corresponding user session for evaluation.

The examples are taken from the example files located in the data folder. For a better understanding of how the files need to be structured, have a look at those files. Every .log file corresponds to one simulation run or one original log file that was brought into this format.

5. Key Notes for Comparison

Matching by `sid`:

Only simulated runs with the same `sid` as a user session should be compared to that session
`id` is descriptive and used mainly for logging
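The matching rule can be sketched as a simple grouping step. This is an illustrative example only: `match_runs_to_sessions` is a hypothetical helper, and the second run ID is invented to show a run that belongs to a different session.

```python
from collections import defaultdict


def match_runs_to_sessions(sessions, runs):
    """Pair each user session with the simulated runs that share its sid."""
    runs_by_sid = defaultdict(list)
    for run in runs:
        runs_by_sid[run["sid"]].append(run)
    # Sessions without any simulated run map to an empty list.
    return {s["sid"]: (s, runs_by_sid.get(s["sid"], [])) for s in sessions}


sessions = [{"id": "Session_2", "sid": "2", "interactions": []}]
runs = [
    {
        "id": "Run_1_core-bm25-2-query-advanced_question-200td.log",
        "sid": "2",
        "rank": "1",
        "interactions": [],
    },
    # Hypothetical run for another session; it must not be matched to sid "2".
    {"id": "hypothetical_run_for_other_session", "sid": "3", "interactions": []},
]

matched = match_runs_to_sessions(sessions, runs)
session, candidates = matched["2"]
print(session["id"], [run["id"] for run in candidates])
```

Only `sid` is used as the key here; `id` is carried along for logging.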

Chronology preserved:

Queries are listed in chronological order

SERP alignment:

`serp` contains the documents in ranked order

Multiple runs per session:

`rank` distinguishes multiple simulated candidates for the same user session
Each run can be evaluated independently
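When several runs share one `sid`, they can be ordered by `rank` before each is evaluated on its own. The sketch below assumes that `rank` is a numeric string such as `"1"` and that a missing `rank` counts as 1; the function name and run IDs are hypothetical.

```python
def runs_in_rank_order(runs):
    """Sort the runs of one session by rank; unranked runs count as rank 1."""
    # Assumption: rank is a numeric string; a missing rank defaults to "1".
    return sorted(runs, key=lambda run: int(run.get("rank", "1")))


runs = [
    {"id": "hypothetical_run_b", "sid": "2", "rank": "2", "interactions": []},
    {"id": "hypothetical_run_a", "sid": "2", "rank": "1", "interactions": []},
    {"id": "hypothetical_run_c", "sid": "2", "interactions": []},
]

for run in runs_in_rank_order(runs):
    # Each run would be evaluated on its own against the session with this sid.
    print(run["id"], run.get("rank", "1"))
```

Python's `sorted` is stable, so runs with equal rank keep their input order.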
