Abstract:

Communication between the hearing community and the Deaf and Hard-of-Hearing (DHH) community is often ineffective due to the latter's limited hearing and speaking abilities. American Sign Language (ASL) is one way the two communities can communicate, but few hearing individuals are fluent in it. This study addresses this communication gap by translating English text into ASL performed by a virtual human avatar. Three translation layers were used: English to ASL Gloss, ASL Gloss to the Hamburg Notation System (HamNoSys), and HamNoSys to the Signing Gesture Markup Language (SiGML). The first layer used a Transformer model trained on the ASLG-PC12 dataset; for the second layer, a dictionary of 500 ASL Gloss words with their corresponding HamNoSys notation was developed; and the third layer used the HamNoSys2SiGML module. The trained models were evaluated using BLEU, ROUGE-L, and METEOR. In addition, certified ASL interpreters and members of the DHH community served as human evaluators to assess the model's output, verify the correctness of the developed corpus, and validate the signing animations produced by the avatar.
The study found that a 4-layer Transformer trained on a cleaned version of the ASLG-PC12 dataset scores higher across all evaluation metrics than models trained on the original dataset. The developed corpus obtained a 77.8% correctness rating. However, native ASL signers found the output signing animations difficult to understand in terms of sentence comprehension and clarity, owing to the system's limited ASL vocabulary.

SignSpeak

  • Category: Undergraduate Thesis
  • Title: Translating English Text to American Sign Language using Transformers and Signing Avatars
  • Language: Python
  • Completion date: April 16, 2024

Project Description

This study aims to develop a web-based application that converts English text into ASL performed by a virtual human avatar. The ASLG-PC12 (American Sign Language Gloss Parallel Corpus 2012), an English-to-ASL Gloss parallel corpus, was used for model training. Data preparation was first performed on the sentence pairs before building a Neural Machine Translation (NMT) model based on the Transformer architecture. The Transformer model was validated automatically using the BLEU, METEOR, and ROUGE-L metrics. A Binary Search Tree lookup function, the HamNoSys2SiGML module, and the CoffeeScript WebGL Signing Avatars (CWASA) platform are the modules used to turn English text into animated ASL. The sketches below illustrate the first two translation layers and the automatic evaluation.
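As a rough illustration of the first translation layer, the sketch below shows how a 4-layer encoder-decoder Transformer for English-to-Gloss translation could be instantiated in PyTorch. The class name, hyperparameters, and vocabulary sizes are assumptions for illustration, not the project's actual configuration, and positional encoding is omitted for brevity.

```python
import math
import torch
import torch.nn as nn

class GlossTranslator(nn.Module):
    """Hypothetical English-to-ASL-Gloss translation model (names assumed)."""

    def __init__(self, src_vocab, tgt_vocab, d_model=512, nhead=8, layers=4):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        # 4 encoder and 4 decoder layers, matching the "4-layer" setup
        # reported in the abstract; the other settings are placeholders.
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=layers, num_decoder_layers=layers,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, tgt_vocab)
        self.scale = math.sqrt(d_model)

    def forward(self, src_ids, tgt_ids):
        src = self.src_embed(src_ids) * self.scale
        tgt = self.tgt_embed(tgt_ids) * self.scale
        # Causal mask keeps the decoder from attending to future gloss tokens.
        mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=mask)
        return self.out(hidden)  # (batch, tgt_len, tgt_vocab) logits

model = GlossTranslator(src_vocab=10_000, tgt_vocab=8_000)
logits = model(torch.zeros(2, 7, dtype=torch.long),
               torch.zeros(2, 5, dtype=torch.long))
print(logits.shape)  # torch.Size([2, 5, 8000])
```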
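The second layer maps each gloss token to its HamNoSys entry. A minimal sketch of the Binary Search Tree lookup over the gloss-to-HamNoSys dictionary follows; the node structure and sample entries are invented for illustration, and the actual HamNoSys symbols and the HamNoSys2SiGML conversion step are not reproduced here.

```python
class Node:
    """BST node keyed on an ASL Gloss token (structure assumed)."""
    def __init__(self, gloss, hamnosys):
        self.gloss, self.hamnosys = gloss, hamnosys
        self.left = self.right = None

def insert(root, gloss, hamnosys):
    """Insert a dictionary entry, keeping the tree ordered by gloss."""
    if root is None:
        return Node(gloss, hamnosys)
    if gloss < root.gloss:
        root.left = insert(root.left, gloss, hamnosys)
    elif gloss > root.gloss:
        root.right = insert(root.right, gloss, hamnosys)
    return root

def lookup(root, gloss):
    """Return the HamNoSys string for a gloss, or None if out of vocabulary."""
    while root is not None:
        if gloss == root.gloss:
            return root.hamnosys
        root = root.left if gloss < root.gloss else root.right
    return None

# Placeholder entries; the real 500-word corpus stores actual HamNoSys symbols.
root = None
for gloss, hns in [("HELLO", "<hamnosys-for-hello>"),
                   ("THANK", "<hamnosys-for-thank>")]:
    root = insert(root, gloss, hns)

print(lookup(root, "HELLO"))  # '<hamnosys-for-hello>'
print(lookup(root, "WORLD"))  # None -> gloss missing from the dictionary
```

An out-of-vocabulary result like the `None` above is exactly the failure mode the human evaluation surfaced: glosses absent from the 500-word dictionary cannot be animated.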
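The automatic evaluation can be sketched with off-the-shelf scorers, assuming NLTK for BLEU and METEOR and the rouge-score package for ROUGE-L; the sample sentence pair below is illustrative only and does not come from the dataset.

```python
import nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

nltk.download(["wordnet", "omw-1.4"], quiet=True)  # METEOR relies on WordNet

reference = "X-HE LIKE BOOK"        # gold ASL Gloss (illustrative)
hypothesis = "X-HE LIKE DESC-BOOK"  # model output (illustrative)

ref_tok, hyp_tok = reference.split(), hypothesis.split()

# Sentence-level BLEU with smoothing to avoid zero scores on short glosses.
bleu = sentence_bleu([ref_tok], hyp_tok,
                     smoothing_function=SmoothingFunction().method1)
meteor = meteor_score([ref_tok], hyp_tok)
rouge_l = rouge_scorer.RougeScorer(["rougeL"]).score(
    reference, hypothesis)["rougeL"].fmeasure

print(f"BLEU={bleu:.3f}  METEOR={meteor:.3f}  ROUGE-L={rouge_l:.3f}")
```

Corpus-level scores would aggregate these per-sentence values over a held-out test split.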