EEG Imagined Speech - Transduction

About

Assessing the feasibility of applying SOTA sEMG silent speech transduction methods to EEG imagined speech synthesis.

FEIS Dataset

Dataset recorded with an Emotiv EPOC+ 14-channel wireless EEG headset. It combines EEG and audio captured during imagined and vocalised phonemes, and contains English and Chinese data.

Experimental runs are logged in a public Neptune.ai project.

Progress

The milestones below are for participant 01.

  • Synthesize stimuli, vocal and imagined speech across multiple phonemes
    • A 2-head, 8-layer TransformerEncoder works very well (stimuli and vocal speech synthesis work extremely well and, surprisingly, imagined speech synthesis shows promise; this may work better on this dataset because of its emphasis on temporal alignment during the experimental condition). A sketch of this architecture is shown below.
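
The exact model code is not included in this README; the following is a minimal PyTorch sketch of a transducer in the configuration above (2 attention heads, 8 TransformerEncoder layers). The input/output sizes (14 EEG channels in, 80 mel bins out) and the model width are illustrative assumptions, not the repository's exact settings.

```python
import torch
import torch.nn as nn

class EEGTransducer(nn.Module):
    def __init__(self, n_eeg_feats=14, n_mels=80, d_model=256):
        super().__init__()
        # Project per-frame EEG features to the model dimension.
        self.in_proj = nn.Linear(n_eeg_feats, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=2,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=8)
        # Predict one mel-spectrogram frame per EEG frame.
        self.out_proj = nn.Linear(d_model, n_mels)

    def forward(self, eeg):            # eeg: (batch, time, n_eeg_feats)
        # Positional encoding omitted for brevity; a real model needs one.
        x = self.encoder(self.in_proj(eeg))
        return self.out_proj(x)        # (batch, time, n_mels)

model = EEGTransducer()
eeg = torch.randn(4, 200, 14)          # 4 epochs, 200 frames, 14 channels
pred = model(eeg)                      # frame-aligned audio features
loss = nn.functional.mse_loss(pred, torch.randn_like(pred))
```

Because a transducer of this shape predicts one audio-feature frame per EEG frame, it depends on the EEG and audio streams being temporally aligned, which is consistent with the observation above that well-aligned conditions transduce far more easily than imagined speech.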

Kara One

The dataset combines three modalities (EEG, face tracking, audio) recorded during imagined and vocalised phonemic and single-word prompts.

Paper

Dataset

Progress

The milestones below are for participant MM05:

  • Overfit on a single example (EEG imagined speech)
    • A 1-layer, 128-dim Bi-LSTM network does not work well (most likely due to misalignment between the imagined EEG signals and the audio targets; this is a major issue for a transduction network).
  • Overfit on a single example (EEG vocalised speech)
    • A 1-layer, 128-dim Bi-LSTM network works well (the temporal alignment between the vocal EEG signals and the audio recordings seems to make it easy to synthesize audio features from parallel vocal EEG signals).
  • Overfit on all /tiy/ examples (EEG vocalised speech)
    • A 1-layer, 64-dim Bi-LSTM network works well (reducing the hidden dim compared to the single-sample case prevents gradient explosion earlier in training; it is unclear why this happens as task complexity increases).
  • Overfit on all /m/ and /n/ examples (EEG vocalised speech)
    • A 1-layer, 64-dim Bi-LSTM network works somewhat well (the temporal alignment of the signals is roughly correct; however, the pitches of the utterances are much flatter than in the original signal. This could be improved with better low-level feature detectors, such as the ResNet blocks used in An Improved Model for Voicing Silent Speech).
  • Generalise on /n/ examples (EEG imagined speech)
    • A 1-layer, 64-dim Bi-LSTM network needs improvement (the amplitude and mel spectrogram are predicted correctly, but the temporal alignment, duration and pitch waveform are incorrect).
  • Generalise on /m/ examples (EEG vocal speech)
    • A 2-layer, 128-dim Bi-LSTM network works well (the temporal alignment, duration and pitch waveform issues from before are resolved by using an LSTM with a larger hidden dim; see the sketch after this list. Using multiple LSTM layers may also let the network process the EEG signals hierarchically, and there is evidence in the literature that EEG signals are organised hierarchically, so there is a rationale for stacking LSTM layers (paper 1, paper 2)).
  • Transformers on /m/ examples (EEG vocal speech)
    • A 2-head, 8-layer TransformerEncoder works very well (maps vocal EEG to audio features very early in training and with very high temporal and spatial accuracy; the last step after this would be using ResNet blocks to learn EEG features end-to-end).
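
For comparison, here is a minimal PyTorch sketch of the Bi-LSTM transducer used in the milestones above, shown in the 2-layer, 128-dim configuration; the input/output sizes are again illustrative assumptions rather than the repository's exact settings.

```python
import torch
import torch.nn as nn

class BiLSTMTransducer(nn.Module):
    def __init__(self, n_eeg_feats=14, n_mels=80, hidden=128, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_eeg_feats, hidden, num_layers=layers,
                            bidirectional=True, batch_first=True)
        # Both directions are concatenated, hence 2 * hidden.
        self.out_proj = nn.Linear(2 * hidden, n_mels)

    def forward(self, eeg):            # eeg: (batch, time, n_eeg_feats)
        x, _ = self.lstm(eeg)          # (batch, time, 2 * hidden)
        return self.out_proj(x)        # frame-aligned audio features

model = BiLSTMTransducer()             # 2-layer, 128-dim configuration
pred = model(torch.randn(4, 200, 14))
```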

Dataset Details

Epochs

EEG epoching is a procedure where specific time windows are extracted from a continuous EEG signal. These windows are called "epochs" and are usually time-locked to an event, e.g. a visual stimulus or, in the case of this dataset, imagined speech.
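
As a hypothetical illustration (not this repository's actual preprocessing code), the MNE library can epoch a continuous recording given per-trial event markers; the sampling rate, channel count, event positions and window length below are made up.

```python
import numpy as np
import mne

sfreq = 1000.0                                      # assumed sampling rate (Hz)
info = mne.create_info(ch_names=14, sfreq=sfreq, ch_types="eeg")
raw = mne.io.RawArray(np.random.randn(14, 60 * int(sfreq)), info)

# One row per trial marker: [sample index, 0, event code].
events = np.array([[5000, 0, 1], [15000, 0, 1], [25000, 0, 1]])

# Each epoch is time-locked to its marker: 0.5 s before to 4.5 s after,
# a window that would roughly cover one imagined-speech trial.
epochs = mne.Epochs(raw, events, event_id={"imagined": 1},
                    tmin=-0.5, tmax=4.5, baseline=None, preload=True)
data = epochs.get_data()               # (n_epochs, n_channels, n_times)
```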

Citation (BibTeX)

@INPROCEEDINGS{7178118,
  author={Zhao, Shunan and Rudzicz, Frank},
  booktitle={2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
  title={Classifying phonological categories in imagined and articulated speech}, 
  year={2015},
  volume={},
  number={},
  pages={992-996},
  doi={10.1109/ICASSP.2015.7178118}}
