1. README and Code Structure Mismatch
- The README refers to training/, but the actual directory is ds_training/.
- The README mentions generate_all_train_datasets.sh, which does not exist. Instead, the repository contains generate_all_train_datasets_v1.sh and generate_all_train_datasets_v2.sh.
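A check like the following could catch this kind of README drift, for example in CI. It is a minimal sketch. The path list comes from this issue. Treating the paths as relative to the repository root is an assumption, and README_PATHS and find_missing_paths are hypothetical names, not part of the repository.

```python
from __future__ import annotations

from pathlib import Path

# Paths the README refers to, as reported in this issue (hypothetical check list).
# Assumption: they are relative to the repository root.
README_PATHS = [
    "training/",
    "generate_all_train_datasets.sh",
]


def find_missing_paths(repo_root: str, paths: list[str]) -> list[str]:
    """Return the entries of `paths` that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [p for p in paths if not (root / p).exists()]


if __name__ == "__main__":
    for p in find_missing_paths(".", README_PATHS):
        print(f"README references a missing path: {p}")
```

Against the current layout, both entries would be reported, because only ds_training/ and the _v1/_v2 scripts exist.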
2. Missing Dependency (sentencepiece)
- LlamaTokenizer requires sentencepiece, but it is not listed in requirements.txt.
- Running the code without sentencepiece installed fails with an ImportError.
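The direct fix is to add sentencepiece to requirements.txt. A preflight check could also report the problem with a clearer message before the tokenizer is loaded. The sketch below is hypothetical: missing_packages is not part of the repository. It relies on the fact that sentencepiece uses the same name for import and for pip install.

```python
from __future__ import annotations

import importlib.util


def missing_packages(packages: list[str]) -> list[str]:
    """Return the top-level packages that cannot be found in this environment.

    Only top-level names are passed in, so find_spec returns None for a
    missing package instead of raising.
    """
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # sentencepiece is needed by LlamaTokenizer but absent from requirements.txt.
    missing = missing_packages(["sentencepiece"])
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
        print("Install with: pip install " + " ".join(missing))
```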
3. Default /output Directory Causes Permission denied Errors
- The scripts write to /output by default, and creating or writing to a directory at the filesystem root requires root access.
- Users without root permissions therefore get Permission denied errors.
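One way to avoid hard-coding /output is to make the location configurable and fall back to a user-writable directory. The sketch below assumes an environment variable named OUTPUT_DIR and a fallback of ./output under the current working directory. Both are hypothetical, because the issue does not say how the scripts choose their output path. resolve_output_dir is also an illustrative name.

```python
import os
from pathlib import Path


def resolve_output_dir(default: str = "/output", env_var: str = "OUTPUT_DIR") -> Path:
    """Return the first usable output directory.

    Candidates, in order: the path in `env_var` (if set), `default`,
    then ./output under the current working directory.
    """
    candidates = []
    env_value = os.environ.get(env_var)
    if env_value:
        candidates.append(Path(env_value))
    candidates += [Path(default), Path.cwd() / "output"]

    for candidate in candidates:
        try:
            candidate.mkdir(parents=True, exist_ok=True)
        except OSError:
            # e.g. PermissionError when creating /output without root access
            continue
        if os.access(candidate, os.W_OK):
            return candidate
    raise PermissionError(
        "No writable output directory among: " + ", ".join(map(str, candidates))
    )
```

With this approach, a non-root user could set OUTPUT_DIR to a directory under their home directory, and the scripts would no longer need to be edited.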
These issues prevent users from running the code without modifications.