LUO Xiao, LIU Yue. An end-to-end text-to-speech system for vehicle-mounted devices[J]. Electric drive for locomotives, 2023(6): 122-128. DOI: 10.13890/j.issn.1000-128X.2023.06.015.
An end-to-end text-to-speech system for vehicle-mounted devices
High-naturalness text-to-speech is one of the basic requirements for advanced intelligence in vehicle-mounted human-machine interaction. In the rail transit field, however, traditional low-naturalness text-to-speech algorithms remain in widespread use, out of step with rapidly developing intelligent human-machine interaction technology. In contrast, end-to-end deep learning-based text-to-speech algorithms, with their nearly human-like naturalness, have become dominant across text-to-speech applications. This paper presents an end-to-end deep learning-based text-to-speech algorithm suitable for offline railway vehicle environments. The algorithm achieves a mean opinion score (MOS) of 4.18, and its real-time factor on the vehicle-mounted embedded hardware platform NVIDIA Xavier reaches 0.52. Experiments show that the algorithm not only outperforms traditional text-to-speech algorithms in subjective measures such as naturalness, but also offers engineering practicality in the offline vehicle environment of railway transportation.
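For readers unfamiliar with the two reported metrics: the mean opinion score (MOS) is the average of listener naturalness ratings on the standard 1-to-5 scale, and the real-time factor (RTF) is wall-clock synthesis time divided by the duration of the generated audio, so an RTF of 0.52 means speech is synthesized roughly twice as fast as it plays back. The sketch below illustrates both computations; the function names and sample values are illustrative only and are not taken from the paper.

```python
from statistics import mean

def mean_opinion_score(ratings: list[int]) -> float:
    """MOS: average of listener ratings on the standard 1-5 scale."""
    assert all(1 <= r <= 5 for r in ratings), "ratings must lie in [1, 5]"
    return mean(ratings)

def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF: wall-clock synthesis time divided by generated audio duration.
    RTF < 1.0 means synthesis runs faster than real time."""
    return synthesis_seconds / audio_seconds

# Illustrative values only (not measurements from the paper):
print(mean_opinion_score([4, 5, 4, 4, 4]))  # -> 4.2
print(real_time_factor(5.2, 10.0))          # -> 0.52, i.e. ~2x real time
```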