Convolutional Sequence to Sequence Model with Non-sequential Greedy Decoding for Grapheme to Phoneme Conversion

ICASSP (2018)


The greedy decoding method used in conventional sequence-to-sequence models is prone to compounding errors, mainly because it makes inferences in a fixed order, regardless of whether the model’s previous guesses are correct. We propose a non-sequential greedy decoding method that generalizes the greedy decoding schemes proposed in the past. The proposed method determines not only which token to consider, but also which position in the output sequence to infer at each inference step. Specifically, it allows the model to consider easy parts first, which gives it more information when it later infers the hard parts. We study a grapheme-to-phoneme conversion task with a fully convolutional encoder-decoder model that embeds the proposed decoding method. Experimental results show that our model outperforms the state-of-the-art model in terms of both phoneme error rate and word error rate.
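
The idea of choosing both a position and a token at each step can be illustrated with a minimal sketch. This is not the paper's implementation: the scorer interface, the fixed and known output length, and the toy probabilities are all assumptions made for illustration, and the names `non_sequential_greedy_decode` and `toy_score` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical scorer interface (an assumption, not from the paper): given the
# partially filled output (None = undecided), return one probability
# distribution over the vocabulary for every output position.
Scorer = Callable[[List[Optional[int]]], Sequence[Sequence[float]]]


def non_sequential_greedy_decode(
    score: Scorer, length: int
) -> Tuple[List[Optional[int]], List[int]]:
    """Fill `length` output slots, most confident (position, token) first.

    Assumes the output length is known in advance, which is a simplification.
    Returns the decoded tokens and the order in which positions were filled.
    """
    output: List[Optional[int]] = [None] * length
    order: List[int] = []
    for _ in range(length):
        probs = score(output)
        best: Optional[Tuple[float, int, int]] = None  # (prob, position, token)
        for pos in range(length):
            if output[pos] is not None:
                continue  # this position has already been decided
            for tok, p in enumerate(probs[pos]):
                if best is None or p > best[0]:
                    best = (p, pos, tok)
        assert best is not None  # at least one slot is still open
        _, pos, tok = best
        output[pos] = tok
        order.append(pos)
    return output, order


def toy_score(partial: List[Optional[int]]) -> List[List[float]]:
    """Toy scorer with made-up numbers over a two-token vocabulary.

    Position 2 is "easy"; position 1 is "hard" until position 2 is decided.
    """
    hard = [0.55, 0.45] if partial[2] is None else [0.1, 0.9]
    return [[0.6, 0.4], hard, [0.05, 0.95]]


tokens, fill_order = non_sequential_greedy_decode(toy_score, 3)
print(tokens, fill_order)
```

In this toy setup the decoder fills position 2 first, then position 1, then position 0. A fixed left-to-right greedy pass over the same scorer would commit to token 0 at position 1 before position 2 is known, which is the kind of early mistake the abstract attributes to fixed-order decoding.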


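Moon-jung Chae (Seoul National University), Kyubyong Park (Kakao Brain), Jinhyun Bang (Seoul National University), Soobin Suh (Seoul National University/Naver), Jonghyuk Park (Seoul National University), Namju Kim (Kakao Brain), Jonghun Park (Seoul National University)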


NLP speech g2p

Publication date