Volume 2, Issue 6, December 2014, Page: 89-93
Embedded System for Speech Recognition and Image Processing
Zhengxi Wei, School of Computer Science, Sichuan University of Science & Engineering, Zigong Sichuan 643000, PR China
Jinming Liang, School of Computer Science, Sichuan University of Science & Engineering, Zigong Sichuan 643000, PR China
Received: Dec. 16, 2014; Accepted: Dec. 23, 2014; Published: Feb. 6, 2015
DOI: 10.11648/j.jeee.20140206.12
In recent years, voice terminals and image retrieval products have become increasingly intelligent, but mature commercial products remain rare. This paper presents an embedded design for an intelligent voice terminal based on pattern recognition. The design uses a Samsung S3C2410 ARM board as the target, a Philips UDA1341TS as the audio codec, and embedded Linux as the software platform; speech recognition is implemented through small-vocabulary voice training. To improve recognition performance, image retrieval serves as an auxiliary tool that helps the speech recognition module create, or more accurately locate, a personal voice-training library. Experimental results show that, with the aid of image recognition, speech recognition performance improves by 10 percent on average.
Speech Recognition, Embedded Development, Image Retrieval, DTW Algorithm, ARM Development
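The abstract and keywords name DTW and a personal voice-training library but give no implementation details. The following is only a minimal sketch of how small-vocabulary template matching with dynamic time warping might look, assuming each utterance has already been converted to a sequence of feature vectors and that the image step has already selected which user's library to search. The names `dtw_distance`, `recognize`, and `libraries`, and the toy one-dimensional features, are hypothetical and do not come from the paper.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(features, library):
    """Return the word whose trained template is closest to `features` under DTW."""
    best_label, best_dist = None, math.inf
    for label, templates in library.items():
        for template in templates:
            d = dtw_distance(features, template)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label


# Hypothetical per-user training libraries (toy 1-D features, not real audio).
# In the described design, image recognition would pick the user key.
libraries = {
    "user_a": {
        "open": [[(0.0,), (1.0,), (2.0,)]],
        "close": [[(2.0,), (1.0,), (0.0,)]],
    },
}

query = [(0.0,), (0.1,), (1.0,), (2.1,)]
result = recognize(query, libraries["user_a"])
```

Restricting the search to one user's templates is the point of the sketch: fewer, better-matched templates mean fewer candidates to confuse with the spoken word, which is one plausible reading of why the image step helps.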
To cite this article
Zhengxi Wei, Jinming Liang, Embedded System for Speech Recognition and Image Processing, Journal of Electrical and Electronic Engineering. Vol. 2, No. 6, 2014, pp. 89-93. doi: 10.11648/j.jeee.20140206.12