Proposed Undergraduate Projects for 2003/04
Project 1: An Interactive Voice Response System Based on VoiceXML and Microsoft .NET Speech SDK
Pre-requisite: Strong programming skills
What will you learn: XML, VoiceXML, ASP.NET, .NET, speech recognition
Large-vocabulary continuous speech recognition (LVCSR) has a wide range of applications. With recent advances in speech and computer technologies, it has become possible to build an LVCSR system on personal computers. In this project, you will develop a continuous speech recognition system based on Cambridge University's Hidden Markov Model Toolkit (HTK) (http://htk.eng.cam.ac.uk). The software will run on Linux and should be able to transcribe continuous speech into text. It should also be able to adapt the speech models to accommodate variations in speaker characteristics.
Pre-requisite: C/C++, Unix and signal processing
What will you learn: Speech recognition, speaker adaptation, language modelling, hidden Markov models (HMMs)
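HTK's recognizer (HVite) transcribes an utterance by searching for the most likely path through a network of HMMs, a Viterbi search implemented with token passing. To illustrate the underlying idea, the following is a minimal sketch of log-domain Viterbi decoding for a toy HMM with discrete output symbols. It is not HTK code (LVCSR systems built with HTK normally use continuous Gaussian-mixture output densities), and the function name and toy parameters are hypothetical.

```python
import math


def viterbi(obs, start_p, trans_p, emit_p):
    """Return (best_log_prob, best_state_path) for a discrete HMM.

    obs     -- non-empty list of observation symbol indices
    start_p -- start_p[i]: probability of starting in state i
    trans_p -- trans_p[i][j]: probability of moving from state i to state j
    emit_p  -- emit_p[i][k]: probability that state i emits symbol k
    Scores are accumulated in the log domain to avoid numerical underflow.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(start_p)
    # delta[i]: best log score of any partial path ending in state i
    delta = [log(start_p[i]) + log(emit_p[i][obs[0]]) for i in range(n_states)]
    backptr = []  # backptr[t][j]: best predecessor of state j at step t + 1
    for o in obs[1:]:
        new_delta, ptrs = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log(trans_p[i][j]))
            new_delta.append(delta[best_i] + log(trans_p[best_i][j])
                             + log(emit_p[j][o]))
            ptrs.append(best_i)
        delta = new_delta
        backptr.append(ptrs)
    # Trace back from the best-scoring final state
    last = max(range(n_states), key=lambda i: delta[i])
    path = [last]
    for ptrs in reversed(backptr):
        path.append(ptrs[path[-1]])
    path.reverse()
    return delta[last], path
```

For example, with a two-state left-to-right model (start probabilities [1.0, 0.0], transitions [[0.6, 0.4], [0.0, 1.0]], emissions [[0.9, 0.1], [0.2, 0.8]]) and observations [0, 0, 1, 1], `viterbi` returns the state path [0, 0, 1, 1] with log-probability log(0.124416).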
An on-line ticket ordering system (http://18.104.22.168:8080/cinema/home) has been developed for the subject EIE420 “Software Engineering for Web Applications”. The system is based on Java Servlet technology and allows users to order movie tickets through Web browsers running on client PCs. This project will extend the system so that PDAs and mobile phones with Internet connections can also access it. To this end, MIDlets will be developed and downloaded to the PDAs or mobile phones, and the existing servlets will be modified to accommodate the limited computing power of handheld devices.
Pre-requisite: Java, servlets, Unix, Web programming
What will you learn: J2ME, servlets, Web programming
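One possible way to accommodate handheld clients is for the servlet to return a compact plain-text payload instead of a full HTML page, so that the MIDlet can parse it with little memory and code. The sketch below, written in Python for brevity rather than Java, shows a hypothetical line-oriented format (one showing per line, fields separated by `|`). The `Showing` fields and the format itself are assumptions, not part of the existing cinema system.

```python
from dataclasses import dataclass

SEP = "|"  # hypothetical field separator


@dataclass
class Showing:
    # Hypothetical fields; the real system's data model may differ.
    show_id: int
    title: str
    start_time: str  # e.g. "19:30"
    seats_left: int


def encode_showings(showings):
    """Server side: serialize showings as one SEP-separated line each."""
    lines = []
    for s in showings:
        if SEP in s.title or "\n" in s.title:
            raise ValueError("title contains a reserved character")
        lines.append(SEP.join([str(s.show_id), s.title, s.start_time,
                               str(s.seats_left)]))
    return "\n".join(lines)


def decode_showings(payload):
    """Client side: parse the compact payload back into Showing objects."""
    result = []
    for line in payload.splitlines():
        if not line:
            continue
        show_id, title, start_time, seats = line.split(SEP)
        result.append(Showing(int(show_id), title, start_time, int(seats)))
    return result
```

Because the CLDC `String` class provides no `split` method, a MIDlet would parse each line with `indexOf` and `substring` instead.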
Project 4: An Interactive Software Package for Learning Speech Processing
Multimedia teaching tools have been widely used in many universities to help students understand abstract concepts and theories. This project aims to add new features to a Windows-based multimedia learning tool (http://www.eie.polyu.edu.hk/~mwmak/Download.htm) that helps students learn the concepts of audio and speech processing. The new features include the display of formant tracks and pitch envelopes. Visual C++ and the Microsoft Foundation Class (MFC) library will be used in this project. The current version of the software can be found at http://www.en.polyu.edu.hk/~mwmak/Download.htm.
Pre-requisite: C/C++, Visual C++, and DSP
What will you learn: Speech processing techniques, Microsoft MFC, multithreaded programming, and audio programming.
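A pitch envelope can be obtained by estimating the fundamental frequency (F0) frame by frame. The following is a minimal sketch using the short-time autocorrelation method, which is only one of several possible pitch estimators; the existing tool may use a different one. The function name, the 60 to 400 Hz search range, and the voicing threshold are illustrative assumptions.

```python
import math


def estimate_pitch(frame, fs, f_min=60.0, f_max=400.0, voicing_threshold=0.3):
    """Estimate F0 (Hz) of one frame by autocorrelation; return 0.0 if unvoiced.

    frame -- list of samples
    fs    -- sampling rate in Hz
    Lags from fs/f_max to fs/f_min are searched for the autocorrelation peak.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove any DC offset
    r0 = sum(v * v for v in x)     # zero-lag autocorrelation (frame energy)
    if r0 == 0.0:
        return 0.0                 # silent frame
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), n - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    # Treat the frame as unvoiced if the normalized peak is too weak
    if best_lag == 0 or best_r / r0 < voicing_threshold:
        return 0.0
    return fs / best_lag
```

For a 200 Hz sinusoid sampled at 8 kHz in a 400-sample frame, the function returns 200.0. Applying it to successive overlapping frames yields a pitch contour that the tool could plot; the frame length and hop size are design choices.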
Project 5: A Multimedia Software Tool for Speech Coder Design
Multimedia teaching tools have been widely used in many universities to help students understand abstract concepts and theories. This project aims to add new features to a Windows-based multimedia learning tool to help students learn the concepts of speech coding. The software tool should allow users to change the parameters of speech coders through user interface controls. Visual C++ and the Microsoft Foundation Class (MFC) library will be used in this project. The current version of the software can be found at http://www.en.polyu.edu.hk/~mwmak/Download.htm.
Pre-requisite: C/C++, Visual C++, and DSP
Knowledge and skills to be learnt: Speech coding techniques, Microsoft MFC, multithreaded programming, and audio programming.
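Linear predictive coding (LPC) underlies many of the speech coders such a tool could demonstrate, and the predictor order is a natural parameter to expose through a user interface control. The sketch below computes LPC coefficients from a frame's autocorrelation using the Levinson-Durbin recursion. The function names and the choice of autocorrelation-method LPC are assumptions for illustration.

```python
def autocorrelation(frame, max_lag):
    """Return r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Compute LPC predictor coefficients from autocorrelation values r[0..order].

    Convention: x_hat[n] = sum_{k=1}^{order} a_k * x[n - k].
    Returns ([a_1, ..., a_order], final prediction-error energy).
    """
    if r[0] <= 0:
        raise ValueError("zero-energy frame")
    a = [0.0] * (order + 1)  # a[1..order]; a[0] is unused
    error = float(r[0])
    for i in range(1, order + 1):
        # Reflection (PARCOR) coefficient for stage i
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        error *= 1.0 - k * k  # error energy never increases when |k| <= 1
    return a[1:], error
```

For the autocorrelation sequence [1.0, 0.5, 0.25] of a first-order autoregressive process and order 2, `levinson_durbin` returns coefficients [0.5, 0.0] with error energy 0.75: the second coefficient adds nothing for such a signal. Letting users vary `order` shows how the prediction-error energy falls as the order increases.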
Group Projects 6 and 7: Client/Server Architecture for Distributed Speaker Verification
The European Telecommunications Standards Institute (ETSI) has recently published a front-end processing standard for distributed speech recognition. The standard allows speech features to be extracted on handheld devices and transmitted to remote servers for recognition. This project is divided into two parts. In the first part, you will apply the ETSI standard to implement the front-end of a distributed speaker verification system through which users' identities can be authenticated over IP and wireless networks. In the second part, you will develop an on-line speaker verification system based on Gaussian mixture speaker models and support vector machines.
Pre-requisite: C/C++, DSP concepts, Unix
Knowledge and skills to be learnt: Distributed systems, distributed speaker recognition, audio programming, Gaussian mixture models, support vector machines.
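In GMM-based speaker verification, a claimed identity is commonly accepted when the average log-likelihood ratio between the claimant's GMM and a background (impostor) model exceeds a decision threshold. The sketch below scores feature vectors against diagonal-covariance GMMs. The function names, the (weights, means, variances) model layout, and the default threshold are hypothetical; in this project, the feature vectors would come from the ETSI front-end, and the SVM part is not shown.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) for a diagonal-covariance GMM (weights must be positive).

    Component log-densities are combined with log-sum-exp for stability.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xd, md, vd in zip(x, mu, var):
            ll -= 0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
        log_terms.append(ll)
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def verify(frames, speaker_gmm, background_gmm, threshold=0.0):
    """Accept the claim if the mean per-frame log-likelihood ratio exceeds threshold.

    Each model is a (weights, means, variances) tuple.
    """
    llr = sum(gmm_log_likelihood(x, *speaker_gmm)
              - gmm_log_likelihood(x, *background_gmm)
              for x in frames) / len(frames)
    return llr > threshold, llr
```

In a one-dimensional toy case with a claimant Gaussian at mean 1.0 and a background Gaussian at mean -1.0 (both with unit variance), the frames [1.0], [1.2], [0.8] give an average log-likelihood ratio of 2.0 and the claim is accepted. In practice, the threshold would be tuned on development data to trade false acceptances against false rejections.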
Project 8: Fixed-point Implementation of the G.723.1 Speech Coder
G.723.1 is a speech compression algorithm standardized by the International Telecommunication Union (ITU) for multimedia, visual telephony, wireless telephony, and videoconferencing products. Operating at 5.3 and 6.3 kbit/s, the coder has the lowest bit rate of the current ITU speech coding standards while maintaining good speech quality. In this project, you will port the ITU's reference source code to the Texas Instruments TMS320C5416 DSP chip using Code Composer Studio and a DSP Starter Kit (DSK). The resulting software should be able to encode and decode speech using G.723.1 in real time.
Pre-requisite: C/C++, DSP concepts
What will you learn: Speech processing techniques, speech coding, DSP programming, fixed-point implementation techniques, real-time programming
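Porting to a fixed-point DSP such as the TMS320C5416 means representing signals as 16-bit integers, usually in Q15 format (value = integer / 2^15), and saturating rather than wrapping on overflow. ITU-T fixed-point reference code expresses such arithmetic through a library of basic operators. The sketch below emulates two operations of this kind in Python, a saturating 16-bit add and a Q15 fractional multiply, so their behaviour can be checked on a PC. The function names are hypothetical, the truncating (arithmetic-shift) rounding is an assumption, and these are not the ITU operators themselves.

```python
Q15_MAX = 32767   # largest signed 16-bit value (about +0.99997 in Q15)
Q15_MIN = -32768  # smallest signed 16-bit value (-1.0 in Q15)


def saturate16(x):
    """Clamp an integer to the signed 16-bit range instead of letting it wrap."""
    return max(Q15_MIN, min(Q15_MAX, x))


def add_sat(a, b):
    """Saturating 16-bit addition."""
    return saturate16(a + b)


def mul_q15(a, b):
    """Multiply two Q15 numbers and shift the 30-bit fraction back to Q15.

    Python's >> is an arithmetic shift (rounds toward minus infinity), as on
    a DSP; the single overflow case, -1.0 * -1.0, saturates to 32767.
    """
    return saturate16((a * b) >> 15)


def to_q15(x):
    """Convert a float in [-1, 1) to Q15 with rounding and saturation."""
    return saturate16(int(round(x * 32768)))


def from_q15(q):
    """Convert a Q15 integer back to a float."""
    return q / 32768.0
```

For example, `mul_q15(to_q15(0.5), to_q15(0.5))` gives 8192, which is 0.25 in Q15, and `add_sat(30000, 10000)` clips to 32767 instead of wrapping to a large negative value. Saturation matters in a speech coder because a wrapped overflow flips the sign of a large sample and produces an audible click, whereas saturation merely clips it.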
M.W. Mak's homepage