zhewenshen/BAMBINO-LM


BAMBINO-LM: (Bilingual-)Human-Inspired Continual Pretraining of BabyLM

Children from bilingual backgrounds benefit from interactions with parents and teachers to re-acquire their heritage language. In this paper, we investigate how this insight from behavioral study can be incorporated into the learning of small-scale language models. We introduce BAMBINO-LM, a continual pretraining strategy for BabyLM that uses a novel combination of alternation and PPO-based perplexity reward induced from a parent Italian model. Upon evaluation on zero-shot classification tasks for English and Italian, BAMBINO-LM improves the Italian language capability of a BabyLM baseline. Our ablation analysis demonstrates that employing both the alternation strategy and PPO-based modeling is key to this effectiveness gain. We also show that, as a side effect, the proposed method leads to similar degradation in L1 effectiveness as human children would have had in an equivalent learning scenario.
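
To make the idea of a perplexity-based reward concrete, here is a minimal sketch. It is not taken from this repository's code. It assumes the parent model supplies per-token natural-log probabilities for a generated text, and that the reward is simply the negative perplexity; both the function names and that reward mapping are hypothetical choices for illustration.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity computed from per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood over the tokens of the text.
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def perplexity_reward(token_logprobs: Sequence[float]) -> float:
    """Hypothetical reward (assumption): lower perplexity gives a higher reward."""
    return -perplexity(token_logprobs)


# Made-up log-probabilities a parent model might assign to two generated texts.
fluent = [math.log(0.5)] * 4
disfluent = [math.log(0.1)] * 4

print(perplexity_reward(fluent) > perplexity_reward(disfluent))
```

In a PPO setup, a scalar like this would be passed to the policy update as the reward for each generated sequence; the actual scaling and normalization used in the paper may differ.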

[Figure: architecture overview]

Training

Prerequisites

Ensure you have python3 and pip installed on your machine. Then, install the necessary dependencies via:

pip install -r requirements.txt

Configuration

Parameters and configurations for training are located in config.json. Adjust the settings as needed to customize your training process.
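
Before editing, it can help to list what the configuration file currently contains. The helper below is a small sketch that assumes only that config.json holds a JSON object; it does not assume any particular key names, and `load_config` and `describe` are hypothetical helpers, not part of this repository.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_config(path: str = "config.json") -> Dict[str, Any]:
    """Read a JSON config file and check that its top level is an object."""
    with Path(path).open(encoding="utf-8") as f:
        config = json.load(f)
    if not isinstance(config, dict):
        raise ValueError(f"{path} must contain a JSON object at the top level")
    return config


def describe(config: Dict[str, Any], prefix: str = "") -> List[str]:
    """Flatten nested settings into 'dotted.key = value' lines for quick review."""
    lines: List[str] = []
    for key, value in config.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted prefix.
            lines.extend(describe(value, prefix=f"{name}."))
        else:
            lines.append(f"{name} = {value!r}")
    return lines
```

Calling `print("\n".join(describe(load_config("config.json"))))` from the repository root prints one line per setting, which makes it easier to see what can be adjusted.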

Training

To start the training process, use the following command:

python3 train.py --config config.json

Evaluation

Evaluations for our experiments were conducted using EleutherAI's Language Model Evaluation Harness on the UINAUIL dataset. Please refer to the original repositories for usage.

License

Distributed under the MIT License. See LICENSE.txt for more information.
