[21] LLM Dataset and generator by Andriamanampisoa · Pull Request #129 · BhuvanArn/TalkUp.AI

Andriamanampisoa · 2026-04-11T07:51:54Z

What type of PR is this? (check all applicable)

Description

This pull request introduces a dataset generation system for simulating IT recruiter-candidate conversations, aimed at fine-tuning the Lucie-7B-Instruct-v1.1 model. It adds a configurable, multi-role, multi-level, and quality-based synthetic data generator, along with supporting configuration files for roles, levels, qualities, and domain-specific Q&A.

A sample dataset has also been provided.
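To make the "multi-role, multi-level, and quality-based" idea concrete, here is a minimal sketch of how one conversation profile might be sampled. The list values and the `sample_profiles` helper are hypothetical; the actual contents of `roles.json`, `levels.json`, and `qualities.json` are not shown in this PR page, and the real generator may sample differently.

```python
import random

# Hypothetical values standing in for roles.json, levels.json and
# qualities.json; the real files in the PR are not reproduced here.
ROLES = ["backend developer", "data engineer"]
LEVELS = ["junior", "senior"]
QUALITIES = ["poor", "good"]


def sample_profiles(n, seed=0):
    """Draw n (role, level, quality) triples using a seeded RNG."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return [
        (rng.choice(ROLES), rng.choice(LEVELS), rng.choice(QUALITIES))
        for _ in range(n)
    ]


profiles = sample_profiles(5)
```

Seeding the generator is a design choice assumed here so that a dataset can be regenerated identically; nothing in the PR states whether the script does this.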

Linked GitHub Ticket

Closes EpitechPromo2027/G-EIP-600-NAN-6-1-eip-tugdual.de-reviers#21

Workspace

…et generator. This dataset is a very basic one to fine-tune the LLM

vercel · 2026-04-11T07:51:59Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
talk-up-ai-dev	Ready	Preview, Comment	Apr 12, 2026 7:26am

railway-app · 2026-04-11T07:53:51Z

🚅 Deployed to the TalkUp.AI-pr-129 environment in talk-up-ai

Service	Status	Web	Updated (UTC)
Backend	✅ Success (View Logs)		Apr 12, 2026 at 7:28 am

Copilot

Pull request overview

Adds a synthetic multi-turn conversation dataset generator (IT recruiter ↔ candidate) intended for fine-tuning Lucie-7B-Instruct-v1.1, backed by JSON configuration for roles, levels, qualities, and role-specific Q&A.

Changes:

Introduce a Python script that generates multi-turn recruiter/candidate conversations and writes them to a dataset file.
Add JSON configuration files for role selection, experience levels, answer “quality”, and role-specific question/answer banks.
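As an illustration of the first change, the sketch below builds one multi-turn recruiter/candidate record and writes it as a JSON Lines row. The record schema (`role`, `level`, `quality`, `turns`, `speaker`, `text`) and the helper names are assumptions made for this example; the `.jsonl` extension of the sample dataset suggests one JSON object per line, but the actual field layout used by `dataset-generator.py` is not shown here.

```python
import io
import json


def build_record(role, level, quality, qa_pairs):
    """Interleave recruiter questions and candidate answers (assumed schema)."""
    turns = []
    for question, answer in qa_pairs:
        turns.append({"speaker": "recruiter", "text": question})
        turns.append({"speaker": "candidate", "text": answer})
    return {"role": role, "level": level, "quality": quality, "turns": turns}


def write_jsonl(records, handle):
    """Write one JSON object per line to an open text handle."""
    for record in records:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")


# Hypothetical Q&A content, for illustration only.
qa_pairs = [
    ("How do you design a REST API?", "I start from the resources."),
    ("How do you test it?", "With unit and integration tests."),
]
buffer = io.StringIO()  # stands in for the dataset file
write_jsonl([build_record("backend developer", "junior", "good", qa_pairs)], buffer)
```

Passing an open handle rather than a path keeps the writer easy to test in memory; in a real run the handle would come from `open(path, "w", encoding="utf-8")`.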

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
ai/llm-core/datasets/generator/dataset-generator.py	Implements the synthetic conversation generator and dataset writer.
ai/llm-core/datasets/generator/config/roles.json	Defines the set of roles to sample when generating conversations.
ai/llm-core/datasets/generator/config/levels.json	Defines experience levels used in prompts/responses.
ai/llm-core/datasets/generator/config/qualities.json	Defines response quality tiers that control answer content.
ai/llm-core/datasets/generator/config/role_data.json	Provides role-specific Q&A templates (with a default fallback).
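The "default fallback" mentioned for `role_data.json` could work along these lines. This is a sketch under assumptions: the `"default"` key name, the nested `question`/`answers` layout, and the `qa_for_role` helper are all hypothetical, since the file's real structure is not visible in this page.

```python
import json

# Hypothetical inline stand-in for role_data.json.
ROLE_DATA_JSON = """
{
  "default": [
    {"question": "Tell me about a recent project.",
     "answers": {"good": "I led a migration.", "poor": "I do not remember."}}
  ],
  "backend developer": [
    {"question": "How do you handle database migrations?",
     "answers": {"good": "With versioned scripts.", "poor": "Manually."}}
  ]
}
"""


def qa_for_role(role_data, role):
    """Return the Q&A bank for a role, or the default bank if the role is absent."""
    return role_data.get(role, role_data["default"])


role_data = json.loads(ROLE_DATA_JSON)
known = qa_for_role(role_data, "backend developer")
unknown = qa_for_role(role_data, "quantum gardener")
```

With this layout, adding a role to `roles.json` without a matching entry in `role_data.json` would still yield a usable, if generic, conversation.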


BhuvanArn

👍

Andriamanampisoa added 3 commits April 11, 2026 16:46

chore: add datasets 'dataset_multi_turn.jsonl' generated by the dataset generator. This dataset is a very basic one to fine-tune the LLM

4642d34

feat: add dataset generator

221f506

build: add configuration files for the generator

0760a99

Andriamanampisoa requested review from BhuvanArn, Tugduoff, badarouzia, Copilot and eregine April 11, 2026 07:51

Andriamanampisoa self-assigned this Apr 11, 2026

Andriamanampisoa removed the request for review from Tugduoff April 11, 2026 07:52

Copilot started reviewing on behalf of Andriamanampisoa April 11, 2026 07:52 View session

railway-app Bot temporarily deployed to talk-up-ai / TalkUp.AI-pr-129 April 11, 2026 07:53 Destroyed

Copilot AI reviewed Apr 11, 2026


Comment thread ai/llm-core/datasets/generator/dataset-generator.py Outdated

Comment thread ai/llm-core/datasets/generator/dataset-generator.py Outdated

Comment thread ai/llm-core/datasets/generator/dataset-generator.py Outdated

Comment thread ai/llm-core/datasets/generator/config/roles.json

refactor: changes requested for the pull request

0b24dbe

railway-app Bot temporarily deployed to talk-up-ai / TalkUp.AI-pr-129 April 12, 2026 07:26 Destroyed

vercel Bot deployed to Preview – talk-up-ai-dev April 12, 2026 07:26 View deployment

BhuvanArn approved these changes Apr 12, 2026


Andriamanampisoa merged commit ab020e1 into staging Apr 12, 2026
8 checks passed

Andriamanampisoa deleted the 21-llm-datasets branch April 12, 2026 08:22

Andriamanampisoa merged 4 commits into staging from 21-llm-datasets

3 participants