Fix data processing in chapter3 by CaptainArshia · Pull Request #1139 · huggingface/course

CaptainArshia · 2025-11-30T09:24:38Z

This commit fixes a data processing bug in the tokenization examples across all language translations of Chapter 3, Section 2.

The Problem:
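The code passed dataset columns directly to the tokenizer, which expects a single string or a list (or tuple) of strings. Passing the column object returned by indexing the dataset caused compatibility issues.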

The Fix:
Converted the dataset columns to lists before tokenization by wrapping them in list():

Changed: raw_datasets["train"]["sentence1"]
To: list(raw_datasets["train"]["sentence1"])
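To show why wrapping a column in `list()` helps, here is a minimal, self-contained sketch that does not use `datasets` or `transformers`. `LazyColumn` is a hypothetical stand-in for a column object that can be iterated but is not a `list`. `is_valid_text_input` is a simplified, hypothetical version of the kind of type check a tokenizer applies to its input, accepting a string or a list/tuple of strings. Neither one is the real library code.

```python
from collections.abc import Iterator


class LazyColumn:
    """Hypothetical stand-in for a dataset column: iterable, but not a list."""

    def __init__(self, rows: list[dict], name: str) -> None:
        self._rows = rows
        self._name = name

    def __iter__(self) -> Iterator[str]:
        # Yield values one at a time instead of materializing a list
        for row in self._rows:
            yield row[self._name]

    def __len__(self) -> int:
        return len(self._rows)


def is_valid_text_input(text) -> bool:
    """Simplified, hypothetical tokenizer input check:
    accept a single string or a list/tuple of strings."""
    if isinstance(text, str):
        return True
    if isinstance(text, (list, tuple)):
        return all(isinstance(t, str) for t in text)
    return False


# Two toy sentence pairs in the same shape as MRPC rows
rows = [
    {"sentence1": "The cat sat.", "sentence2": "A cat was sitting."},
    {"sentence1": "It rained.", "sentence2": "The weather was dry."},
]
column = LazyColumn(rows, "sentence1")

print(is_valid_text_input(column))        # False: iterable, but not a list
print(is_valid_text_input(list(column)))  # True: list() materializes the strings
```

In the course code, the same wrapping applies to both columns of a sentence pair, for example `tokenizer(list(raw_datasets["train"]["sentence1"]), list(raw_datasets["train"]["sentence2"]))`. Note that `list()` loads the whole column into memory. That is fine for the MRPC training split, while `Dataset.map(..., batched=True)` remains the memory-friendly route for larger datasets.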

Impact:
This change was applied consistently across all language versions to ensure the code examples work correctly when tokenizing sentence pairs from the MRPC dataset.


HuggingFaceDocBuilderDev · 2025-11-30T09:40:46Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Fix data processing by converting tokenized sentences to lists for compatibility with the tokenizer.
6de7e10

Fix data processing by converting sentence arrays to lists for tokenizer compatibility
a746246

CaptainArshia wants to merge 2 commits into huggingface:main from CaptainArshia:fix-processing-the-data-bug