diff --git a/README.md b/README.md
index 90811f18c9..ad0daa3b16 100644
--- a/README.md
+++ b/README.md
@@ -1,161 +1,26 @@
 # FastSpeech 2 - PyTorch Implementation
 
-This is a PyTorch implementation of Microsoft's text-to-speech system [**FastSpeech 2: Fast and High-Quality End-to-End Text to Speech**](https://arxiv.org/abs/2006.04558v1). 
-This project is based on [xcmyz's implementation](https://github.com/xcmyz/FastSpeech) of FastSpeech. Feel free to use/modify the code.
+This repository is an extended PyTorch implementation of Microsoft's [**FastSpeech 2: Fast and High-Quality End-to-End Text to Speech**](https://arxiv.org/abs/2006.04558v1). The code structure is derived from [ming024's FastSpeech2 implementation](https://github.com/ming024/FastSpeech2), which in turn builds on [xcmyz's FastSpeech](https://github.com/xcmyz/FastSpeech).
+We modify the pipeline so that the model is trained on, and synthesizes from, **phonological features** instead of phoneme IDs. This makes the input representation more linguistically informed, improves generalization across languages, and supports cross-lingual and low-resource speech synthesis. Using this version, we trained a **German baseline TTS model** and then applied **transfer learning** with a small amount of English data to obtain an English model.
 
-There are several versions of FastSpeech 2.
-This implementation is more similar to [version 1](https://arxiv.org/abs/2006.04558v1), which uses F0 values as the pitch features.
-On the other hand, pitch spectrograms extracted by continuous wavelet transform are used as the pitch features in the [later versions](https://arxiv.org/abs/2006.04558).
+Our approach is inspired by the cross-lingual use of phonological features described in:
+> _"Cross-lingual Transfer of Phonological Features for Low-resource Speech Synthesis"_
+> [SSW11 Paper PDF](https://www.pure.ed.ac.uk/ws/portalfiles/portal/215873748/pf_tts_ssw11.pdf)
 
-![](./img/model.png)
+We also refer to the [PHOIBLE database](https://phoible.org) for phonological feature definitions and mappings.
 
-# Updates
-- 2021/7/8: Release the checkpoint and audio samples of a multi-speaker English TTS model trained on LibriTTS
-- 2021/2/26: Support English and Mandarin TTS
-- 2021/2/26: Support multi-speaker TTS (AISHELL-3 and LibriTTS)
-- 2021/2/26: Support MelGAN and HiFi-GAN vocoder
+The overall training and synthesis pipeline follows [ming024's original FastSpeech2 repository](https://github.com/ming024/FastSpeech2). The key modifications for phonological feature-based modeling are:
 
-# Audio Samples
-Audio samples generated by this implementation can be found [here](https://ming024.github.io/FastSpeech2/).
+- **`text/`**: modified text-processing modules that map phoneme sequences to phonological feature vectors during data preparation.
+- **`transformer/Models.py`**: the encoder takes 41-dimensional phonological feature vectors (projected by a linear layer) as input instead of phoneme ID embeddings.
+- **`synthesize.py`**: inference accepts phonological features as input and adds a German text front end.
 
-# Quickstart
-
-## Dependencies
-You can install the Python dependencies with
-```
-pip3 install -r requirements.txt
-```
-
-## Inference
-
-You have to download the [pretrained models](https://drive.google.com/drive/folders/1DOhZGlTLMbbAAFZmZGDdc77kz1PloS7F?usp=sharing) and put them in ``output/ckpt/LJSpeech/``, ``output/ckpt/AISHELL3``, or ``output/ckpt/LibriTTS/``.
- -For English single-speaker TTS, run -``` -python3 synthesize.py --text "YOUR_DESIRED_TEXT" --restore_step 900000 --mode single -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml -``` - -For Mandarin multi-speaker TTS, try -``` -python3 synthesize.py --text "大家好" --speaker_id SPEAKER_ID --restore_step 600000 --mode single -p config/AISHELL3/preprocess.yaml -m config/AISHELL3/model.yaml -t config/AISHELL3/train.yaml -``` - -For English multi-speaker TTS, run -``` -python3 synthesize.py --text "YOUR_DESIRED_TEXT" --speaker_id SPEAKER_ID --restore_step 800000 --mode single -p config/LibriTTS/preprocess.yaml -m config/LibriTTS/model.yaml -t config/LibriTTS/train.yaml -``` - -The generated utterances will be put in ``output/result/``. - -Here is an example of synthesized mel-spectrogram of the sentence "Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition", with the English single-speaker TTS model. -![](./img/synthesized_melspectrogram.png) - -## Batch Inference -Batch inference is also supported, try - -``` -python3 synthesize.py --source preprocessed_data/LJSpeech/val.txt --restore_step 900000 --mode batch -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml -``` -to synthesize all utterances in ``preprocessed_data/LJSpeech/val.txt`` - -## Controllability -The pitch/volume/speaking rate of the synthesized utterances can be controlled by specifying the desired pitch/energy/duration ratios. -For example, one can increase the speaking rate by 20 % and decrease the volume by 20 % by - -``` -python3 synthesize.py --text "YOUR_DESIRED_TEXT" --restore_step 900000 --mode single -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml --duration_control 0.8 --energy_control 0.8 -``` - -# Training - -## Datasets - -The supported datasets are - -- [LJSpeech](https://keithito.com/LJ-Speech-Dataset/): a single-speaker English dataset consists of 13100 short audio clips of a female speaker reading passages from 7 non-fiction books, approximately 24 hours in total. -- [AISHELL-3](http://www.aishelltech.com/aishell_3): a Mandarin TTS dataset with 218 male and female speakers, roughly 85 hours in total. -- [LibriTTS](https://research.google/tools/datasets/libri-tts/): a multi-speaker English dataset containing 585 hours of speech by 2456 speakers. - -We take LJSpeech as an example hereafter. - -## Preprocessing - -First, run -``` -python3 prepare_align.py config/LJSpeech/preprocess.yaml -``` -for some preparations. - -As described in the paper, [Montreal Forced Aligner](https://montreal-forced-aligner.readthedocs.io/en/latest/) (MFA) is used to obtain the alignments between the utterances and the phoneme sequences. -Alignments of the supported datasets are provided [here](https://drive.google.com/drive/folders/1DBRkALpPd6FL9gjHMmMEdHODmkgNIIK4?usp=sharing). -You have to unzip the files in ``preprocessed_data/LJSpeech/TextGrid/``. - -After that, run the preprocessing script by -``` -python3 preprocess.py config/LJSpeech/preprocess.yaml -``` - -Alternately, you can align the corpus by yourself. 
-Download the official MFA package and run -``` -./montreal-forced-aligner/bin/mfa_align raw_data/LJSpeech/ lexicon/librispeech-lexicon.txt english preprocessed_data/LJSpeech -``` -or -``` -./montreal-forced-aligner/bin/mfa_train_and_align raw_data/LJSpeech/ lexicon/librispeech-lexicon.txt preprocessed_data/LJSpeech -``` - -to align the corpus and then run the preprocessing script. -``` -python3 preprocess.py config/LJSpeech/preprocess.yaml -``` - -## Training - -Train your model with -``` -python3 train.py -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml -``` - -The model takes less than 10k steps (less than 1 hour on my GTX1080Ti GPU) of training to generate audio samples with acceptable quality, which is much more efficient than the autoregressive models such as Tacotron2. - -# TensorBoard - -Use -``` -tensorboard --logdir output/log/LJSpeech -``` - -to serve TensorBoard on your localhost. -The loss curves, synthesized mel-spectrograms, and audios are shown. - -![](./img/tensorboard_loss.png) -![](./img/tensorboard_spec.png) -![](./img/tensorboard_audio.png) - -# Implementation Issues - -- Following [xcmyz's implementation](https://github.com/xcmyz/FastSpeech), I use an additional Tacotron-2-styled Post-Net after the decoder, which is not used in the original FastSpeech 2. -- Gradient clipping is used in the training. -- In my experience, using phoneme-level pitch and energy prediction instead of frame-level prediction results in much better prosody, and normalizing the pitch and energy features also helps. Please refer to ``config/README.md`` for more details. - -Please inform me if you find any mistakes in this repo, or any useful tips to train the FastSpeech 2 model. +--- # References - [FastSpeech 2: Fast and High-Quality End-to-End Text to Speech](https://arxiv.org/abs/2006.04558), Y. Ren, *et al*. 
- [xcmyz's FastSpeech implementation](https://github.com/xcmyz/FastSpeech) - [TensorSpeech's FastSpeech 2 implementation](https://github.com/TensorSpeech/TensorflowTTS) - [rishikksh20's FastSpeech 2 implementation](https://github.com/rishikksh20/FastSpeech2) - -# Citation -``` -@INPROCEEDINGS{chien2021investigating, - author={Chien, Chung-Ming and Lin, Jheng-Hao and Huang, Chien-yu and Hsu, Po-chun and Lee, Hung-yi}, - booktitle={ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, - title={Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech}, - year={2021}, - volume={}, - number={}, - pages={8588-8592}, - doi={10.1109/ICASSP39728.2021.9413880}} -``` +- [PHOIBLE: Phonological Segment Inventory Database](https://phoible.org) +- [Cross-lingual Transfer of Phonological Features for Low-resource Speech Synthesis (SSW11)](https://www.pure.ed.ac.uk/ws/portalfiles/portal/215873748/pf_tts_ssw11.pdf) diff --git a/dataset.py b/dataset.py index 011ba34d89..173e978bf6 100644 --- a/dataset.py +++ b/dataset.py @@ -34,7 +34,7 @@ def __getitem__(self, idx): speaker = self.speaker[idx] speaker_id = self.speaker_map[speaker] raw_text = self.raw_text[idx] - phone = np.array(text_to_sequence(self.text[idx], self.cleaners)) + phone = text_to_sequence(self.text[idx], self.cleaners) # delete np.array mel_path = os.path.join( self.preprocessed_path, "mel", @@ -59,11 +59,14 @@ def __getitem__(self, idx): "{}-duration-{}.npy".format(speaker, basename), ) duration = np.load(duration_path) - + # add for debugging + assert len(pitch) == len(phone), \ + f"Pitch length {len(pitch)} != Text length {len(phone)} for {basename}" + sample = { "id": basename, "speaker": speaker_id, - "text": phone, + "text": phone, #is feature vectors "raw_text": raw_text, "mel": mel, "pitch": pitch, @@ -92,7 +95,7 @@ def process_meta(self, filename): def reprocess(self, data, idxs): ids = [data[idx]["id"] for idx in idxs] speakers = [data[idx]["speaker"] for idx in idxs] - texts = [data[idx]["text"] for idx in idxs] + texts = [data[idx]["text"].float() for idx in idxs] raw_texts = [data[idx]["raw_text"] for idx in idxs] mels = [data[idx]["mel"] for idx in idxs] pitches = [data[idx]["pitch"] for idx in idxs] @@ -103,7 +106,8 @@ def reprocess(self, data, idxs): mel_lens = np.array([mel.shape[0] for mel in mels]) speakers = np.array(speakers) - texts = pad_1D(texts) + #texts = pad_1D(texts) + texts = pad_2D(texts) mels = pad_2D(mels) pitches = pad_1D(pitches) energies = pad_1D(energies) @@ -168,7 +172,7 @@ def __getitem__(self, idx): speaker = self.speaker[idx] speaker_id = self.speaker_map[speaker] raw_text = self.raw_text[idx] - phone = np.array(text_to_sequence(self.text[idx], self.cleaners)) + phone = text_to_sequence(self.text[idx], self.cleaners) # delete np.array return (basename, speaker_id, phone, raw_text) @@ -189,12 +193,12 @@ def process_meta(self, filename): def collate_fn(self, data): ids = [d[0] for d in data] speakers = np.array([d[1] for d in data]) - texts = [d[2] for d in data] + texts = [d[2].astype(np.float32) for d in data] raw_texts = [d[3] for d in data] text_lens = np.array([text.shape[0] for text in texts]) - texts = pad_1D(texts) - + #texts = pad_1D(texts) + texts = pad_2D(texts) return ids, raw_texts, speakers, texts, text_lens, max(text_lens) diff --git a/synthesize.py b/synthesize.py index 59a682aa7d..783df2e727 100644 --- a/synthesize.py +++ b/synthesize.py @@ -13,6 +13,7 @@ from utils.tools 
import to_device, synth_samples from dataset import TextDataset from text import text_to_sequence +from text.german_numbers import german_normalize_numbers device = torch.device("cuda" if torch.cuda.is_available() else "cpu") @@ -83,6 +84,41 @@ def preprocess_mandarin(text, preprocess_config): return np.array(sequence) +def preprocess_de(text, preprocess_config): + text = text.rstrip(punctuation).replace("ß","ss") + lexicon = read_lexicon(preprocess_config["path"]["lexicon_path"]) + def split_alphanum(match): + letters = match.group(1) + numbers = match.group(2) + # split if character+number + return letters + " " + " ".join(list(numbers)) + + text = re.sub(r'([a-zA-Z]+)(\d+)', split_alphanum, text) + + text = german_normalize_numbers(text) + + phones = [] + words = re.split(r"([,;.\-\?\!\s+])", text) + + for w in words: + if w.lower() in lexicon: + phones += lexicon[w.lower()] + elif re.match(r"[,;.\-\?\!]", w): + phones.append("sil") + phones = "{" + "}{".join(phones) + "}" + phones = phones.replace("}{", " ") + + # text_to_sequence get features + features = text_to_sequence( + phones, + preprocess_config["preprocessing"]["text"]["text_cleaners"], + ) + + print("Raw Text Sequence: {}".format(text)) + print("Phoneme Sequence: {}".format(phones)) + #print("features:{}".format(features)) + return np.array(features) + def synthesize(model, step, configs, vocoder, batchs, control_values): preprocess_config, model_config, train_config = configs @@ -109,7 +145,6 @@ def synthesize(model, step, configs, vocoder, batchs, control_values): if __name__ == "__main__": - parser = argparse.ArgumentParser() parser.add_argument("--restore_step", type=int, required=True) parser.add_argument( @@ -206,6 +241,8 @@ def synthesize(model, step, configs, vocoder, batchs, control_values): texts = np.array([preprocess_english(args.text, preprocess_config)]) elif preprocess_config["preprocessing"]["text"]["language"] == "zh": texts = np.array([preprocess_mandarin(args.text, preprocess_config)]) + elif preprocess_config["preprocessing"]["text"]["language"] == "de": + texts = np.array([preprocess_de(args.text, preprocess_config)]) text_lens = np.array([len(texts[0])]) batchs = [(ids, raw_texts, speakers, texts, text_lens, max(text_lens))] diff --git a/text/IPA_to_phonefeats_mapping.py b/text/IPA_to_phonefeats_mapping.py new file mode 100644 index 0000000000..1db7941507 --- /dev/null +++ b/text/IPA_to_phonefeats_mapping.py @@ -0,0 +1,298 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- + +# Open-source resource provided by Papercup Technologies Limited +# Author: Marlene Staib +# IPA-TO-PHONOLOGICAL FEATURES MAPPING DICTIONARY + +# Stress markings need to be parsed from the specific dictionary resource used + +ipa_to_phonemefeats = { + '': {'symbol_type': ''}, + '': {'symbol_type': ''}, + '': {'symbol_type': ''}, + '': {'symbol_type': ''}, + '': {'symbol_type': ''}, + 'a': {'symbol_type': 'phoneme', + 'vowel_consonant': 'vowel', + 'VUV': 'voiced', + 'vowel_frontness': 'front', + 'vowel_openness': 'open', + 'vowel_roundedness': 'unrounded', + 'stress': 'unstressed'}, + 'b': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'voiced', + 'consonant_place': 'bilabial', + 'consonant_manner': 'stop'}, + 'd': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'voiced', + 'consonant_place': 'alveolar', + 'consonant_manner': 'stop'}, + 'e': {'symbol_type': 'phoneme', + 'vowel_consonant': 'vowel', + 'VUV': 'voiced', + 'vowel_frontness': 'front', + 'vowel_openness': 'close-mid', + 
'vowel_roundedness': 'unrounded', + 'stress': 'unstressed'}, + 'f': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'unvoiced', + 'consonant_place': 'labiodental', + 'consonant_manner': 'fricative'}, + 'h': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'unvoiced', + 'consonant_place': 'glottal', + 'consonant_manner': 'fricative'}, + 'i': {'symbol_type': 'phoneme', + 'vowel_consonant': 'vowel', + 'VUV': 'voiced', + 'vowel_frontness': 'front', + 'vowel_openness': 'close', + 'vowel_roundedness': 'unrounded', + 'stress': 'unstressed'}, + 'j': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'voiced', + 'consonant_place': 'palatal', + 'consonant_manner': 'approximant'}, + 'k': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'unvoiced', + 'consonant_place': 'velar', + 'consonant_manner': 'stop'}, + 'kʰ': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'unvoiced', + 'consonant_place': 'velar', + 'consonant_manner': 'stop', + 'diacritic': 'epiglottal'}, + 'l': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'voiced', + 'consonant_place': 'alveolar', + 'consonant_manner': 'lateral-approximant'}, + 'l̩': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'voiced', + 'consonant_place': 'alveolar', + 'consonant_manner': 'lateral-approximant', + 'diacritic': 'syllabic'}, + 'm': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'voiced', + 'consonant_place': 'bilabial', + 'consonant_manner': 'nasal'}, + 'm̩': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'voiced', + 'consonant_place': 'bilabial', + 'consonant_manner': 'nasal', + 'diacritic': 'syllabic'}, + 'n': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'voiced', + 'consonant_place': 'alveolar', + 'consonant_manner': 'nasal'}, + 'n̩': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'voiced', + 'consonant_place': 'alveolar', + 'consonant_manner': 'nasal', + 'diacritic': 'syllabic'}, + 'o': {'symbol_type': 'phoneme', + 'vowel_consonant': 'vowel', + 'VUV': 'voiced', + 'vowel_frontness': 'back', + 'vowel_openness': 'close-mid', + 'vowel_roundedness': 'rounded', + 'stress': 'unstressed'}, + 'p': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'unvoiced', + 'consonant_place': 'bilabial', + 'consonant_manner': 'stop'}, + 'pʰ': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'unvoiced', + 'consonant_place': 'bilabial', + 'consonant_manner': 'stop', + 'diacritic': 'epiglottal'}, + 'pf': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'unvoiced', + 'consonant_place': 'bilabial', + 'consonant_manner': 'affricate'}, + 's': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'unvoiced', + 'consonant_place': 'alveolar', + 'consonant_manner': 'fricative'}, + 't': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'unvoiced', + 'consonant_place': 'alveolar', + 'consonant_manner': 'stop'}, + 'tʰ': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'unvoiced', + 'consonant_place': 'alveolar', + 'consonant_manner': 'stop', + 'diacritic': 'epiglottal'}, + 'ts': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'unvoiced', + 'consonant_place': 'alveolar', + 'consonant_manner': 'affricate'}, + 't͡ʃ': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'unvoiced', + 
'consonant_place': 'postalveolar', + 'consonant_manner': 'affricate'}, + 'u': {'symbol_type': 'phoneme', + 'vowel_consonant': 'vowel', + 'VUV': 'voiced', + 'vowel_frontness': 'back', + 'vowel_openness': 'close', + 'vowel_roundedness': 'rounded', + 'stress': 'unstressed'}, + 'v': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'voiced', + 'consonant_place': 'labiodental', + 'consonant_manner': 'fricative'}, + 'x': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'unvoiced', + 'consonant_place': 'velar', + 'consonant_manner': 'fricative'}, + 'y': {'symbol_type': 'phoneme', + 'vowel_consonant': 'vowel', + 'VUV': 'voiced', + 'vowel_frontness': 'front', + 'vowel_openness': 'close', + 'vowel_roundedness': 'rounded', + 'stress': 'unstressed'}, + 'z': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'voiced', + 'consonant_place': 'alveolar', + 'consonant_manner': 'fricative'}, + 'ç': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'unvoiced', + 'consonant_place': 'palatal', + 'consonant_manner': 'fricative'}, + 'ø': {'symbol_type': 'phoneme', + 'vowel_consonant': 'vowel', + 'VUV': 'voiced', + 'vowel_frontness': 'front', + 'vowel_openness': 'close-mid', + 'vowel_roundedness': 'rounded', + 'stress': 'unstressed'}, + 'ŋ': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'voiced', + 'consonant_place': 'velar', + 'consonant_manner': 'nasal'}, + 'œ': {'symbol_type': 'phoneme', + 'vowel_consonant': 'vowel', + 'VUV': 'voiced', + 'vowel_frontness': 'front', + 'vowel_openness': 'open-mid', + 'vowel_roundedness': 'rounded', + 'stress': 'unstressed'}, + 'ɐ': {'symbol_type': 'phoneme', + 'vowel_consonant': 'vowel', + 'VUV': 'voiced', + 'vowel_frontness': 'central', + 'vowel_openness': 'open', + 'vowel_roundedness': 'unrounded', + 'stress': 'unstressed'}, + 'ɔ': {'symbol_type': 'phoneme', + 'vowel_consonant': 'vowel', + 'VUV': 'voiced', + 'vowel_frontness': 'back', + 'vowel_openness': 'open-mid', + 'vowel_roundedness': 'rounded', + 'stress': 'unstressed'}, + 'ə': {'symbol_type': 'phoneme', + 'vowel_consonant': 'vowel', + 'VUV': 'voiced', + 'vowel_frontness': 'central', + 'vowel_openness': 'mid', + 'vowel_roundedness': 'unrounded', + 'stress': 'unstressed'}, + 'ɛ': {'symbol_type': 'phoneme', + 'vowel_consonant': 'vowel', + 'VUV': 'voiced', + 'vowel_frontness': 'front', + 'vowel_openness': 'open-mid', + 'vowel_roundedness': 'unrounded', + 'stress': 'unstressed'}, + 'ɪ': {'symbol_type': 'phoneme', + 'vowel_consonant': 'vowel', + 'VUV': 'voiced', + 'vowel_frontness': 'front_central', + 'vowel_openness': 'close_close-mid', + 'vowel_roundedness': 'unrounded', + 'stress': 'unstressed'}, + 'ʃ': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'unvoiced', + 'consonant_place': 'postalveolar', + 'consonant_manner': 'fricative'}, + 'ʊ': {'symbol_type': 'phoneme', + 'vowel_consonant': 'vowel', + 'VUV': 'voiced', + 'vowel_frontness': 'central_back', + 'vowel_openness': 'close_close-mid', + 'vowel_roundedness': 'unrounded', + 'stress': 'unstressed'}, + 'ʏ': {'symbol_type': 'phoneme', + 'vowel_consonant': 'vowel', + 'VUV': 'voiced', + 'vowel_frontness': 'front_central', + 'vowel_openness': 'close_close-mid', + 'vowel_roundedness': 'rounded', + 'stress': 'unstressed'}, + 'ɡ': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'voiced', + 'consonant_place': 'velar', + 'consonant_manner': 'stop'}, + 'ɟ': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'voiced', + 
'consonant_place': 'palatal', + 'consonant_manner': 'stop'}, + 'ɲ': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'voiced', + 'consonant_place': 'palatal', + 'consonant_manner': 'nasal'}, + 'ʁ': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'voiced', + 'consonant_place': 'uvular', + 'consonant_manner': 'fricative'}, + 'c': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'unvoiced', + 'consonant_place': 'palatal', + 'consonant_manner': 'stop'}, + 'cʰ': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'unvoiced', + 'consonant_place': 'palatal', + 'consonant_manner': 'stop', + 'diacritic': 'epiglottal'}, + 'w': {'symbol_type': 'phoneme', + 'vowel_consonant': 'consonant', + 'VUV': 'voiced', + 'consonant_place': 'labial-velar', + 'consonant_manner': 'approximant'}, +} diff --git a/text/__init__.py b/text/__init__.py index 6f036b0d64..dec6d30a1f 100644 --- a/text/__init__.py +++ b/text/__init__.py @@ -1,44 +1,44 @@ """ from https://github.com/keithito/tacotron """ import re from text import cleaners +from string import punctuation from text.symbols import symbols - +from .en_phonological_features_new import phonological_features,default_feature +import numpy as np +import torch +import torch.nn as nn # Mappings from symbol to numeric ID and vice versa: -_symbol_to_id = {s: i for i, s in enumerate(symbols)} -_id_to_symbol = {i: s for i, s in enumerate(symbols)} +#_symbol_to_id = {s: i for i, s in enumerate(symbols)} +#_id_to_symbol = {i: s for i, s in enumerate(symbols)} # Regular expression matching text enclosed in curly braces: _curly_re = re.compile(r"(.*?)\{(.+?)\}(.*)") - +#for german def text_to_sequence(text, cleaner_names): - """Converts a string of text to a sequence of IDs corresponding to the symbols in the text. - - The text can optionally have ARPAbet sequences enclosed in curly braces embedded - in it. For example, "Turn left on {HH AW1 S S T AH0 N} Street." 
+ processed_phonemes = [] + feature_list = [] - Args: - text: string to convert to a sequence - cleaner_names: names of the cleaner functions to run the text through + m = _curly_re.match(text) + phones = m.group(2).strip().split() - Returns: - List of integers corresponding to the symbols in the text - """ - sequence = [] + for p in phones: + #if p in diphthongs_map: + #for single_phoneme in diphthongs_map[p]: + #processed_phonemes.append(single_phoneme) + #feature = phonological_features.get(single_phoneme, default_feature) + #feature_list.append(feature) + #else: + processed_phonemes.append(p) + feature = phonological_features.get(p, default_feature) + #print(f"Phoneme: {p}, Feature shape: {np.array(feature).shape}, Feature type: {type(feature)}") + feature_list.append(feature) - # Check for curly braces and treat their contents as ARPAbet: - while len(text): - m = _curly_re.match(text) + features = np.array(feature_list, dtype=np.float32) + features = torch.from_numpy(features).float() + return features - if not m: - sequence += _symbols_to_sequence(_clean_text(text, cleaner_names)) - break - sequence += _symbols_to_sequence(_clean_text(m.group(1), cleaner_names)) - sequence += _arpabet_to_sequence(m.group(2)) - text = m.group(3) - - return sequence def sequence_to_text(sequence): @@ -73,3 +73,5 @@ def _arpabet_to_sequence(text): def _should_keep_symbol(s): return s in _symbol_to_id and s != "_" and s != "~" + + diff --git a/text/cleaners.py b/text/cleaners.py index 7bd4d8dbb7..4c4c9f1a88 100644 --- a/text/cleaners.py +++ b/text/cleaners.py @@ -17,6 +17,7 @@ import re from unidecode import unidecode from .numbers import normalize_numbers +from .german_numbers import german_normalize_numbers _whitespace_re = re.compile(r'\s+') # List of (regular expression, replacement) pairs for abbreviations: @@ -87,3 +88,10 @@ def english_cleaners(text): text = expand_abbreviations(text) text = collapse_whitespace(text) return text + +def german_cleaners(text): + '''Basic pipeline that lowercases and collapses whitespace without transliteration.''' + text = lowercase(text) + text = german_normalize_numbers(text) + text = collapse_whitespace(text) + return text \ No newline at end of file diff --git a/text/en_phonological_features_new.py b/text/en_phonological_features_new.py new file mode 100644 index 0000000000..ea86a3d681 --- /dev/null +++ b/text/en_phonological_features_new.py @@ -0,0 +1,116 @@ +phonological_features = { + "spn":[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + "sil":[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + "a": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "aj": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], # 41维 + "aw": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], # 41维 + "aː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "b": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "bʲ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 
+ "c": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "cʰ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0], # 41维 + "cʷ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "d": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "dʒ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], + "dʲ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "d̪": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "e": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ej": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], # 41维 + "eː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "f": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "fʲ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "fʷ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "h": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "i": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "iː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "j": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], # 41维 + "k": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "kp": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "kʰ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0], # 41维 + "kʷ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "l": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], # 41维 + "l̩": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1], # 41维 + "m": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], # 41维 + "mʲ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], # 41维 + "m̩": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1], # 41维 + "n": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], # 41维 + "n̩": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1], # 41维 + "o": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ow": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], # 41维 + "oː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "p": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "pʰ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0], # 41维 + "pf": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], # 41维 + "pʲ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "pʷ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "s": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "t": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "tʃ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], # 41维 + "tʰ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0], # 41维 + "tʲ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "tʷ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "t̪": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "ts": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], # 41维 + "u": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "uː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "v": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "vʲ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "vʷ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "w": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], # 41维 + "x": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "yː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 
0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "z": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "æ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 copied + "ç": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "ð": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 copied + "øː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ŋ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0], # 41维 + "œ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ɐ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ɑ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 copied + "ɑː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 copied + "ɒ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 copied + "ɒː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 copied + "ɔ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ɔj": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], # 41维 + "ɖ": [0,0,0,0,0,1, 1,0, 0,1, 0,0,0,0,0, 0,0,0,0,0,0, 0,0, 0, 0,0,0,0,0,1,0,0,0, 0,0,0,0,0,1, 0,0], + "ʏ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ə": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "əw": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], # 41维 + "ɚ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 copied + "ɛ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ɛː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ɜ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 copied + "ɜː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 copied + "ɝ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 copied + "ɟ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "ɟʷ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "ɡ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "ɡb": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "ɡʷ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "ɪ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ɫ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], # 41维 copied + "ɫ̩": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], # 41维 copied + "ɱ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], # 41维 copied + "ɲ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], # 41维 + "ʁ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "ɹ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 copied + "ɾ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], # 41维 copied + "ɾʲ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], # 41维 copied + "ɾ̃": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], # 41维 copied + "ʃ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "ʈ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0], # 41维 copied + "ʈʲ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0], # 41维 copied + "ʈʷ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0], # 41维 copied + "ʉ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 copied + "ʉː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 copied + "ʊ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ʋ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 copied + "ʎ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], # 41维 copied + "ʒ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 copied + "ʔ": [0,0,0,0,0,1, 1,0, 1,0, 0,0,0,0,0, 0,0,0,0,0,0, 0,0, 0, 0,0,1,0,0,0,0,0,0, 0,0,0,0,0,1, 
0,0], + "θ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 copied + "ɔʏ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 +} + +default_feature = [0.0] * 41 \ No newline at end of file diff --git a/text/english_mfa.py b/text/english_mfa.py new file mode 100644 index 0000000000..909cbaeb2c --- /dev/null +++ b/text/english_mfa.py @@ -0,0 +1,102 @@ +valid_symbols = [ + "a", + "aj", + "aw", + "aː", + "b", + "bʲ", + "c", + "cʰ", + "cʷ", + "d", + "dʒ", + "dʲ", + "d̪", + "e", + "ej", + "eː", + "f", + "fʲ", + "fʷ", + "h", + "i", + "iː", + "j", + "k", + "kp", + "kʰ", + "kʷ", + "l", + "m", + "mʲ", + "m̩", + "n", + "n̩", + "o", + "ow", + "oː", + "p", + "pʰ", + "pʲ", + "pʷ", + "s", + "t", + "tʃ", + "tʰ", + "tʲ", + "tʷ", + "t̪", + "u", + "uː", + "v", + "vʲ", + "vʷ", + "w", + "z", + "æ", + "ç", + "ð", + "ŋ", + "ɐ", + "ɑ", + "ɑː", + "ɒ", + "ɒː", + "ɔ", + "ɔj", + "ɖ", + "ə", + "əw", + "ɚ", + "ɛ", + "ɛː", + "ɜ", + "ɜː", + "ɝ", + "ɟ", + "ɟʷ", + "ɡ", + "ɡb", + "ɡʷ", + "ɪ", + "ɫ", + "ɫ̩", + "ɱ", + "ɲ", + "ɹ", + "ɾ", + "ɾʲ", + "ɾ̃", + "ʃ", + "ʈ", + "ʈʲ", + "ʈʷ", + "ʉ", + "ʉː", + "ʊ", + "ʋ", + "ʎ", + "ʒ", + "ʔ", + "θ", +] \ No newline at end of file diff --git a/text/german_mfa.py b/text/german_mfa.py new file mode 100644 index 0000000000..7dd75a3fac --- /dev/null +++ b/text/german_mfa.py @@ -0,0 +1,53 @@ +valid_symbols = [ + "a", + "aj", + "aw", + "aː", + "b", + "c", + "cʰ", + "d", + "eː", + "f", + "h", + "iː", + "j", + "k", + "kʰ", + "l", + "l̩", + "m", + "m̩", + "n", + "n̩", + "oː", + "p", + "pf", + "pʰ", + "s", + "t", + "ts", + "tʃ", + "tʰ", + "uː", + "v", + "x", + "yː", + "z", + "ç", + "øː", + "ŋ", + "œ", + "ɐ", + "ɔ", + "ɔʏ", + "ə", + "ɛ", + "ɟ", + "ɡ", + "ɪ", + "ɲ", + "ʁ", + "ʃ", + "ʊ", + "ʏ"] \ No newline at end of file diff --git a/text/german_mfa_phonofeats_mapping.py b/text/german_mfa_phonofeats_mapping.py new file mode 100644 index 0000000000..f0be553864 --- /dev/null +++ b/text/german_mfa_phonofeats_mapping.py @@ -0,0 +1,169 @@ +import numpy as np +import os +from text.german_mfa import valid_symbols +from text.IPA_to_phonefeats_mapping import ipa_to_phonemefeats + +def extract_feature_categories(ipa_to_phonemefeats): + """ + get feature_categories from ipa_to_phonemefeats + """ + feature_categories = {} + + for features in ipa_to_phonemefeats.values(): + for key, value in features.items(): + if key not in feature_categories: + feature_categories[key] = set() + feature_categories[key].add(value) + + for key in feature_categories: + feature_categories[key] = sorted(feature_categories[key]) + + return feature_categories + +def build_feature_to_index(feature_categories): + """ + map index according to feature_categories + """ + feature_to_index = {} + index = 0 + for category, values in feature_categories.items(): + for value in values: + feature_to_index[(category, value)] = index + index += 1 + return feature_to_index + +def find_feature(symbol): + """ + map feature given mfa symbols + """ + if symbol in ipa_to_phonemefeats.keys(): + feature = ipa_to_phonemefeats[symbol] + #print (f"Feature of {symbol} is {feature}") + return feature + elif 'ː' in symbol: + feature = ipa_to_phonemefeats[symbol.replace("ː","")] + #print (f"Feature of {symbol} is {feature}") + return feature + elif symbol == "tʃ": + feature = ipa_to_phonemefeats['t͡ʃ'] + #print (f"Feature of {symbol} is {feature}") + return feature + elif len(symbol) == 2: + feature0 = 
ipa_to_phonemefeats.get(symbol[0])
+        feature1 = ipa_to_phonemefeats.get(symbol[1])
+        #print (f"Feature of {symbol} is split to \n {symbol[0]} {feature0} \n {symbol[1]} {feature1}")
+        if feature0 is None or feature1 is None:
+            print(f"Unknown symbols in combination: {symbol}")
+            return None
+        return feature0, feature1
+    else:
+        print(f"Unknown symbol: {symbol}")
+        return None
+
+def phoneme_features_to_onehot(features, vector_length, feature_to_index):
+    """
+    Map a phoneme's feature dict to a one-hot vector.
+    """
+    one_hot_vector = np.zeros(vector_length, dtype=int)
+    for key, value in features.items():
+        if (key, value) in feature_to_index:
+            one_hot_vector[feature_to_index[(key, value)]] = 1
+    return one_hot_vector
+
+def process_symbol(symbol, feature_to_index, vector_length):
+    """
+    Build the one-hot vector(s) for a single phoneme or a diphthong.
+    """
+    feature = find_feature(symbol)
+    if isinstance(feature, tuple):
+        vector0 = phoneme_features_to_onehot(feature[0], vector_length, feature_to_index)
+        vector1 = phoneme_features_to_onehot(feature[1], vector_length, feature_to_index)
+        return (vector0, vector1)
+    elif isinstance(feature, dict):
+        return phoneme_features_to_onehot(feature, vector_length, feature_to_index)
+    else:
+        print(f"Cannot convert unknown symbol to one-hot vectors: {symbol}")
+        return None
+
+def main():
+    # get categories
+    feature_categories = extract_feature_categories(ipa_to_phonemefeats)
+
+    # index
+    feature_to_index = build_feature_to_index(feature_categories)
+    vector_length = len(feature_to_index)
+    print(f"Vector length: {vector_length}")
+
+    # get vectors
+    for symbol in valid_symbols:
+        result = process_symbol(symbol, feature_to_index, vector_length)
+        if result is not None:
+            if isinstance(result, tuple):
+                print(f"One-hot vectors for {symbol}:")
+                print(f"First part:\n{result[0]}\nSecond part:\n{result[1]}")
+            else:
+                print(f"One-hot vector for {symbol}:\n{result}")
+
+# write to a file
+def write_features_to_file(filename, valid_symbols, feature_to_index, vector_length):
+    # build the output path in the current working directory
+    output_path = os.path.join(os.getcwd(), filename)
+    processed_symbols = set()  # track symbols that have already been written
+
+    with open(output_path, 'w') as f:
+        f.write('phonological_features = {\n')
+        for symbol in valid_symbols:
+            if symbol in processed_symbols:  # skip symbols that were already written
+                continue
+
+            result = process_symbol(symbol, feature_to_index, vector_length)
+            if result is not None:
+                if isinstance(result, tuple):  # Handling diphthongs or combined symbols
+                    for idx, char in enumerate(symbol):
+                        if char not in processed_symbols:  # only write parts of the combination that are still missing
+                            vector_str = ', '.join(map(str, result[idx].tolist()))
+                            # Write each part of the combined symbol on a new line
+                            f.write(f'    "{char}": [{vector_str}], # {len(result[idx])}-dim\n')
+                            processed_symbols.add(char)  # mark as written
+                else:  # Handling single phonemes
+                    vector_str = ', '.join(map(str, result.tolist()))
+                    f.write(f'    "{symbol}": [{vector_str}], # {len(result)}-dim\n')
+                    processed_symbols.add(symbol)  # mark as written
+        f.write('}\n')
+
+
+
+if __name__ == "__main__":
+
+    feature_categories = extract_feature_categories(ipa_to_phonemefeats)
+
+    # index
+    feature_to_index = build_feature_to_index(feature_categories)
+    vector_length = len(feature_to_index)
+    print(f"Vector length: {vector_length}")
+
+    # Call main process
+    main()
+
+    # Write features to file in the current directory
+    write_features_to_file('phonological_features_2.txt', valid_symbols, feature_to_index, vector_length)
+
+    print("Phonological features have been written to phonological_features_2.txt")
+
+
+'''
+# print feature_categories outcome as below: 
+feature_categories = extract_feature_categories(ipa_to_phonemefeats) +feature_categories = { + 'symbol_type': ['', '', '', '', '', 'phoneme'], + 'vowel_consonant': ['consonant', 'vowel'], + 'VUV': ['unvoiced', 'voiced'], + 'vowel_frontness': ['back', 'central', 'central_back', 'front', 'front_central'], + 'vowel_openness': ['close', 'close-mid', 'close_close-mid', 'mid', 'open', 'open-mid'], + 'vowel_roundedness': ['rounded', 'unrounded'], + 'stress': ['unstressed'], + 'consonant_place': ['alveolar', 'bilabial', 'glottal', 'labial-velar', 'labiodental', 'palatal', 'postalveolar', 'uvular', 'velar'], + 'consonant_manner': ['affricate', 'approximant', 'fricative', 'lateral-approximant', 'nasal', 'stop'], + 'diacritic': ['epiglottal', 'syllabic'], + } +''' \ No newline at end of file diff --git a/text/phonological_features.py b/text/phonological_features.py new file mode 100644 index 0000000000..a1d3a4f8e1 --- /dev/null +++ b/text/phonological_features.py @@ -0,0 +1,64 @@ +phonological_features = { + "sil":[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + "a": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "j": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], # 41维 + "w": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], # 41维 + "aː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "b": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "c": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "cʰ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0], # 41维 + "d": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "eː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "f": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "h": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "iː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "k": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "kʰ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0], # 41维 + "l": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], # 41维 + "l̩": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1], # 41维 + "m": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], # 41维 + "m̩": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1], # 41维 + "n": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], # 41维 + "n̩": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1], # 41维 + "oː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "p": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "pf": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], # 41维 + "pʰ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0], # 41维 + "s": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "t": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "ts": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], # 41维 + "tʃ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], # 41维 + "tʰ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0], # 41维 + "uː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "v": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "x": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "yː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "z": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "ç": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "øː": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ŋ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0], # 41维 + "œ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ɐ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ɔ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ʏ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ə": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ɛ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ɟ": [0, 
0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "ɡ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0], # 41维 + "ɪ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "ɲ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], # 41维 + "ʁ": [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "ʃ": [0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], # 41维 + "ʊ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 + "aj": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], # 41维 + "aw": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], # 41维 + "ɔʏ": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], # 41维 +} + +diphthongs_map = { + "aj": ["a", "j"], + "aw": ["a", "w"], + "ɔʏ": ["ɔ", "ʏ"], +} + +default_feature = [0.0] * 41 \ No newline at end of file diff --git a/text/symbols.py b/text/symbols.py index ae99253b3e..006754842b 100644 --- a/text/symbols.py +++ b/text/symbols.py @@ -5,25 +5,29 @@ The default is a set of ASCII characters that works well for English or text that has been run through Unidecode. For other data, you can modify _characters. See TRAINING_DATA.md for details. """ -from text import cmudict, pinyin +from text import cmudict, pinyin, german_mfa,english_mfa _pad = "_" _punctuation = "!'(),.:;? 
" _special = "-" +#_letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzäöü" _letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" -_silences = ["@sp", "@spn", "@sil"] -# Prepend "@" to ARPAbet symbols to ensure uniqueness (some are the same as uppercase letters): -_arpabet = ["@" + s for s in cmudict.valid_symbols] -_pinyin = ["@" + s for s in pinyin.valid_symbols] +_silences = ["@spn", "@sil"] +# Prepend "@" to ARPAbet symbols to ensure uniqueness (some are the same as uppercase letters): +#_arpabet = ["@" + s for s in cmudict.valid_symbols] +#_pinyin = ["@" + s for s in pinyin.valid_symbols] +#_ipa_german = ["@" + s for s in german_mfa.valid_symbols] +_ipa_english = ["@" + s for s in english_mfa.valid_symbols] # Export all symbols: symbols = ( [_pad] + list(_special) + list(_punctuation) + list(_letters) - + _arpabet - + _pinyin + #+ _arpabet + #+ _pinyin + + _ipa_english + _silences ) diff --git a/transformer/Models.py b/transformer/Models.py index effcec285e..46dd134ceb 100644 --- a/transformer/Models.py +++ b/transformer/Models.py @@ -37,7 +37,7 @@ def __init__(self, config): super(Encoder, self).__init__() n_position = config["max_seq_len"] + 1 - n_src_vocab = len(symbols) + 1 + #n_src_vocab = len(symbols) + 1 d_word_vec = config["transformer"]["encoder_hidden"] n_layers = config["transformer"]["encoder_layer"] n_head = config["transformer"]["encoder_head"] @@ -53,9 +53,11 @@ def __init__(self, config): self.max_seq_len = config["max_seq_len"] self.d_model = d_model - self.src_word_emb = nn.Embedding( - n_src_vocab, d_word_vec, padding_idx=Constants.PAD - ) + #self.src_word_emb = nn.Embedding( + #n_src_vocab, d_word_vec, padding_idx=Constants.PAD + #) + self.feature_proj = nn.Linear(41, d_word_vec) + self.position_enc = nn.Parameter( get_sinusoid_encoding_table(n_position, d_word_vec).unsqueeze(0), requires_grad=False, @@ -70,25 +72,29 @@ def __init__(self, config): ] ) - def forward(self, src_seq, mask, return_attns=False): - + def forward(self, src_features, mask, return_attns=False): enc_slf_attn_list = [] - batch_size, max_len = src_seq.shape[0], src_seq.shape[1] + batch_size, max_len = src_features.shape[0], src_features.shape[1] - # -- Prepare masks + # -- prepare mask slf_attn_mask = mask.unsqueeze(1).expand(-1, max_len, -1) - - # -- Forward - if not self.training and src_seq.shape[1] > self.max_seq_len: - enc_output = self.src_word_emb(src_seq) + get_sinusoid_encoding_table( - src_seq.shape[1], self.d_model - )[: src_seq.shape[1], :].unsqueeze(0).expand(batch_size, -1, -1).to( - src_seq.device - ) + + # add for feature vector type + enc_output = self.feature_proj(src_features.float()) # [batch, seq_len, d_word_vec] + + # #for debugging + #print(f"Input features shape: {src_features.shape}") + #print(f"First few values of input features:\n{src_features[:2, :5]}") + #print(f"Initial enc_output shape: {enc_output.shape}") + #print(f"First few values of initial enc_output:\n{enc_output[:2, :5, :5]}") + + # --forward + if not self.training and max_len > self.max_seq_len: + enc_output += get_sinusoid_encoding_table( + max_len, self.d_model + )[:max_len, :].unsqueeze(0).expand(batch_size, -1, -1).to(src_features.device) else: - enc_output = self.src_word_emb(src_seq) + self.position_enc[ - :, :max_len, : - ].expand(batch_size, -1, -1) + enc_output += self.position_enc[:, :max_len, :].expand(batch_size, -1, -1) for enc_layer in self.layer_stack: enc_output, enc_slf_attn = enc_layer( @@ -96,10 +102,11 @@ def forward(self, src_seq, mask, return_attns=False): ) 
                 enc_output, mask=mask, slf_attn_mask=slf_attn_mask
             )
             if return_attns:
                 enc_slf_attn_list += [enc_slf_attn]
-
+        #print(f"After Encoder, enc_output shape: {enc_output.shape}") #for debugging
+        #print(f"First few values of final enc_output:\n{enc_output[:2, :5, :5]}") #for debugging
+
         return enc_output
 
-
 class Decoder(nn.Module):
     """ Decoder """
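
---

The patch above replaces symbol-ID lookups with phonological feature vectors end to end. As a reading aid (not part of the diff), here is a minimal sketch of what the modified `text_to_sequence` in `text/__init__.py` is expected to return: phones arrive wrapped in curly braces (as produced by `preprocess_de` / `preprocess_english` in `synthesize.py`), each phone is looked up in a phonological-feature table, and the output is a float tensor of shape `[num_phones, 41]` instead of a list of symbol IDs. The three-entry table below is purely illustrative; the real tables live in `text/phonological_features.py` and `text/en_phonological_features_new.py`.

```
import numpy as np
import torch

# Illustrative stand-in for the real feature tables: every phone maps to a
# 41-dimensional phonological feature vector (values here are arbitrary).
toy_features = {
    "h":   [1.0] + [0.0] * 40,
    "aː":  [0.0, 1.0] + [0.0] * 39,
    "sil": [0.0] * 41,
}
default_feature = [0.0] * 41  # fallback, as in en_phonological_features_new.py

def toy_text_to_sequence(text):
    """Mimics the patched text_to_sequence: "{h aː sil}" -> float tensor [T, 41]."""
    phones = text.strip("{}").split()
    feats = [toy_features.get(p, default_feature) for p in phones]
    return torch.from_numpy(np.array(feats, dtype=np.float32))

print(toy_text_to_sequence("{h aː sil}").shape)  # torch.Size([3, 41])
```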
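
Downstream of that change, `dataset.py` pads the per-utterance `[T_i, 41]` matrices with `pad_2D` instead of `pad_1D`, and the encoder in `transformer/Models.py` projects them with `feature_proj = nn.Linear(41, d_word_vec)` in place of the old `nn.Embedding` lookup. A rough sketch of the resulting shapes, assuming an encoder hidden size of 256 (the actual value comes from `model.yaml`):

```
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim = 41      # width of the phonological feature vectors
d_word_vec = 256   # assumed encoder_hidden; configured in model.yaml

# Two utterances of different lengths, as returned by the patched text_to_sequence.
a, b = torch.rand(5, feat_dim), torch.rand(8, feat_dim)

# Equivalent of pad_2D: pad along the time axis up to the longest sequence.
max_len = max(a.size(0), b.size(0))
batch = torch.stack([F.pad(x, (0, 0, 0, max_len - x.size(0))) for x in (a, b)])  # [2, 8, 41]

# Equivalent of Encoder.feature_proj: replaces the nn.Embedding symbol lookup.
feature_proj = nn.Linear(feat_dim, d_word_vec)
enc_output = feature_proj(batch.float())  # [2, 8, 256]; positional encodings are added on top
print(enc_output.shape)
```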
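
Finally, the German front end added to `synthesize.py` separates digits that are glued to a word before number normalization; the normalization itself comes from `german_normalize_numbers` in `text/german_numbers.py`, which is not shown in this diff. The splitting step in isolation:

```
import re

def split_alphanum(match):
    letters, numbers = match.group(1), match.group(2)
    # "Bus123" -> "Bus 1 2 3": detach trailing digits from the word
    return letters + " " + " ".join(list(numbers))

print(re.sub(r"([a-zA-Z]+)(\d+)", split_alphanum, "Bus123 fährt um 8"))
# -> "Bus 1 2 3 fährt um 8"
```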