Research Projects of Dr. M.W. Mak
Summary 2017

Deep Variational Learning for Speaker Verification in Adverse Environments

To incorporate voice biometrics into remote services, it is important to ensure that recognition accuracy can be maintained even when users speak in an adverse environment that the system has not encountered during training. Adverse conditions can be caused by background noise, reverberation, low-quality microphones, and poor communication channels. The prevalent utterance representation (i-vectors) and scoring method (PLDA) rely on linear models to summarize the spectral characteristics of speech and to marginalize out any nuisance variability not related to speaker recognition. Our previous studies have suggested that when there is a severe mismatch between the training and test conditions, or when the nuisance variability is too strong, a linear model may not be sufficient. We argue that the relationship between the latent variables and the observed variables may not be linear. To relax this linearity assumption, we propose incorporating speaker labels into the learning algorithm of variational autoencoders (VAEs).
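As a sketch of how speaker labels can enter a VAE's training objective, consider the conditional-VAE lower bound (a standard formulation; the project's actual label-aware model may differ):

```latex
\mathcal{L}(\theta,\phi;\mathbf{x},y) =
\mathbb{E}_{q_\phi(\mathbf{z}\mid\mathbf{x},y)}\bigl[\log p_\theta(\mathbf{x}\mid\mathbf{z},y)\bigr]
- \mathrm{KL}\bigl(q_\phi(\mathbf{z}\mid\mathbf{x},y)\,\|\,p(\mathbf{z})\bigr)
```

Here x is the observed utterance representation, y the speaker label, and z the latent variable. Conditioning both the encoder q and the decoder p on y lets the nonlinear networks absorb speaker information that a linear model would have to capture in its loading matrices.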

Investigators: M.W. Mak and Jen-Tzung CHIEN

Funding Source: RGC Competitive Bids, 2017

Semi-supervised Deep Learning for Domain Adaptation in Speaker Verification

State-of-the-art speaker recognition systems typically use machine learning techniques to train statistical models on large amounts of data. While these systems work very well in the environment, or source domain, for which they were trained, their performance suffers when they are deployed in a new target domain -- a phenomenon known as domain mismatch. This project aims to develop a new deep neural network (DNN) for domain adaptation so that only a small amount of labelled data from the target domain is sufficient to train a system for the new domain. A joint network comprising a regression DNN and a classification DNN will be co-trained by a semi-supervised deep learning algorithm that exploits labelled data from the source domain and unlabelled data from the target domain.

Investigators: M.W. Mak and Jen-Tzung CHIEN

Funding Source: RGC Competitive Bids, 2016


Deep Neural Networks for Health Monitoring

Investigators: M.W. Mak and Lawrence C.C. Cheung

Funding Source: Innovation and Technology Fund - Teaching Company Scheme  (ITF-TCS, 2016)

Deep Learning for Emotion Recognition of Pets

Investigators: M.W. Mak and Lawrence C.C. Cheung

Funding Source: Innovation and Technology Fund - Teaching Company Scheme  (ITF-TCS, 2016)

Speaker-Adaptive Denoising Deep Networks for SNR-Aware PLDA Speaker Verification

This project aims to develop a special form of deep neural network (DNN) called a denoising deep autoencoder for robust text-independent speaker verification. Speaker-specific features, referred to as bottleneck features, are extracted from the bottleneck layer of the autoencoder. Unlike conventional autoencoders, our denoising deep autoencoders can reconstruct the clean speech signals even if the input signals are contaminated by noise and reverberation, resulting in robust bottleneck features. Another advantage of our autoencoders is that they are adapted to the characteristics of individual speakers, which enriches the speaker-specific information in the bottleneck features. To make conventional probabilistic linear discriminant analysis (PLDA) more resilient to varying acoustic environments, two new PLDA models are proposed. These models either make explicit use of SNR information during training and scoring or incorporate the SNR variability into the generative model. The proposed bottleneck features can be readily applied to these new PLDA models.
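The data flow can be sketched as follows. This is a structural illustration only: the weights below are random placeholders and the layer sizes are invented for the example; in the actual system the weights would be learned by minimizing the clean-reconstruction error on pairs of noisy and clean utterance features, and the networks would be speaker-adapted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy denoising autoencoder: 60-d acoustic input, 20-d bottleneck,
# reconstructing the 60-d *clean* frame from a noisy input.
# Random weights stand in for weights learned on (noisy, clean) pairs.
W_enc = rng.standard_normal((60, 40)) * 0.1
W_bot = rng.standard_normal((40, 20)) * 0.1
W_dec = rng.standard_normal((20, 60)) * 0.1

def bottleneck_features(noisy_frame):
    """Encode a noisy frame down to the 20-d bottleneck representation."""
    h = np.tanh(noisy_frame @ W_enc)
    return np.tanh(h @ W_bot)

def reconstruct(noisy_frame):
    """Decode the bottleneck back to an estimate of the clean frame."""
    return bottleneck_features(noisy_frame) @ W_dec

clean = rng.standard_normal(60)
noisy = clean + 0.3 * rng.standard_normal(60)   # simulated additive noise
print(bottleneck_features(noisy).shape)  # (20,)
print(reconstruct(noisy).shape)          # (60,)
```

The bottleneck output, not the reconstruction, is what feeds the downstream PLDA models.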

Investigators: M.W. Mak and Brian Mak

Funding Source: RGC Competitive Bids, 2015

Utterance-Length Dependent PLDA and Discriminative Models for I-Vector Speaker Verification

This project aims to develop machine learning algorithms for text-independent speaker verification. In particular, a mixture of probabilistic linear discriminant analysis (PLDA) models will be applied to overcome utterance-length variability. Unlike conventional PLDA for i-vector based speaker verification, where information about utterance length is disregarded, our proposed mixture of PLDA can explicitly model utterances of variable length. This capability is important for verifying speakers during a live call because in such situations it is impractical to control the length of the conversation. To further improve the performance of state-of-the-art speaker verification systems, empirical kernel maps and support vector machines will be applied to explicitly incorporate information about background speakers into the verification process. The proposed method is novel in that, instead of simply computing a likelihood ratio as in conventional PLDA, it constructs a discriminative model from the likelihood-ratio scores, which has the advantage of maximizing the discrimination between client speakers and impostors.
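The empirical kernel map idea can be illustrated with a toy numpy sketch. The cosine similarity below is a hypothetical stand-in for the PLDA likelihood-ratio scores the project would actually use, and the dimensions are invented for the example:

```python
import numpy as np

def empirical_kernel_map(x, background, k):
    """Map x to its vector of similarities against a set of background speakers.

    An SVM trained on such vectors sees the background population explicitly,
    rather than through a single likelihood ratio.
    """
    return np.array([k(x, b) for b in background])

# Hypothetical similarity: cosine scoring between i-vectors.
cosine = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
background = rng.standard_normal((20, 10))   # 20 background-speaker i-vectors
x = rng.standard_normal(10)                  # test i-vector

phi_x = empirical_kernel_map(x, background, cosine)
print(phi_x.shape)  # (20,) -- one coordinate per background speaker
```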

Investigators: M.W. Mak and Jen-Tzung CHIEN

Funding Source: RGC Competitive Bids, 2014

Speaker Verification with Very Short Utterances

Text-independent speaker verification on long and clean utterances (say, one minute) can achieve superb performance, even better than humans. However, recognizing speakers from very short utterances (say, three seconds) remains a challenge in speaker recognition research. The main reason is that short utterances lack the phonetic context and acoustic information needed for reliable recognition of speakers, especially when the sentences spoken by the target speaker and the claimant are different. This explains why even state-of-the-art i-vector/PLDA systems can easily fail in such situations. A possible way of overcoming this difficulty is to model the i-vector variability in the PLDA model (i-vectors are compressed representations of utterances). While significant gains can be obtained, the problem with this approach is its high computational complexity during verification. This project successfully developed a fast scoring algorithm that reduces the verification time by a factor of 33.

Investigators: M.W. Mak

Funding Source: HKPolyU, 2015



Bayesian Probabilistic Linear Discriminant Analysis for Speaker Verification

This project aims to develop machine learning algorithms for text-independent speaker verification. In particular, variational Bayesian methods will be applied to overcome the problem of model uncertainty when the amount of development data is limited. Because users of remote services may use the services under a wide variety of acoustic environments, it is imperative for speaker verification systems to compensate for channel and session variability. This project proposes to apply a Bayesian mixture of linear discriminant analysis models to account for this wide spectrum of channel and session variability.
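For reference, the simplified PLDA model that these methods build on (the standard formulation, not this project's Bayesian extension) generates the j-th i-vector of speaker i as

```latex
\mathbf{x}_{ij} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \boldsymbol{\epsilon}_{ij},
\qquad \mathbf{h}_i \sim \mathcal{N}(\mathbf{0},\mathbf{I}),\qquad
\boldsymbol{\epsilon}_{ij} \sim \mathcal{N}(\mathbf{0},\boldsymbol{\Sigma})
```

where m is the global mean, V the speaker loading matrix, and h_i the latent speaker factor. Verification scores the likelihood ratio of two i-vectors sharing the same h against their being generated independently. A Bayesian treatment additionally places priors over the model parameters (e.g., V), which is what mitigates model uncertainty when development data are scarce.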

Investigators: M.W. Mak and Jen-Tzung CHIEN

Funding Source: HKPolyU

 

Energy-Efficient Supervised Learning Machines for Personal Security and Monitoring on Smartphones

With the high processing power of today's smartphones, it becomes possible to turn a smartphone into a personal sound surveillance and monitoring system. Ideally, such a system should be able to detect and classify a variety of sound events 24 hours a day and trigger an emergency phone call or SMS once a specified sound event (e.g., screaming) occurs. To prolong battery life, it is important to minimize the power consumption of the sound-event classifier. This proposal advocates the notion of intrinsic complexity, through which the decision function of polynomial support vector machines (SVMs) is written in a matrix-vector-multiplication form so that the majority of the computation during the recognition phase can be shifted to the training phase. Further computational savings can be achieved by exploiting the symmetry of the multinomial coefficients of the polynomial expansion. More importantly, the complexity of this new form of decision function is independent of the amount of training data, thus enabling SVM-based systems to scale to data-intensive applications without increasing computational complexity and power consumption. This characteristic has important implications for the deployment of classification systems on mobile devices: by exploiting the intrinsic complexity, the power consumption of the decision function of polynomial SVMs can be reduced by more than tenfold in practice. This project also investigates the power consumption of the different processing stages of sound-event classification systems, including segmentation, feature extraction, and SVM scoring. The performance and power consumption of various acoustic features and SVM kernels will be compared. The project will also investigate efficient feature-transformation techniques to overcome the acoustic mismatch among feature vectors caused by variations in the background noise.
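A minimal numpy sketch of this idea for a degree-2 polynomial kernel (illustrative only; the project's derivation and the multinomial-symmetry optimization are not reproduced here):

```python
import numpy as np
from itertools import combinations

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel (x.z + 1)^2."""
    cross = [np.sqrt(2) * x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return np.concatenate([[1.0], np.sqrt(2) * x, x ** 2, cross])

rng = np.random.default_rng(0)
S = rng.standard_normal((50, 4))      # 50 "support vectors", 4-dim features
coef = rng.standard_normal(50)        # alpha_i * y_i from SVM training
b = 0.3
x = rng.standard_normal(4)            # frame to be classified

# Conventional scoring: one kernel evaluation per support vector,
# so cost grows with the amount of training data.
f_kernel = coef @ (S @ x + 1.0) ** 2 + b

# "Intrinsic complexity" form: fold all support vectors into one weight
# vector w = sum_i coef_i * phi(s_i) at training time; scoring is then a
# single dot product whose cost is independent of the number of support vectors.
w = sum(c * phi(s) for c, s in zip(coef, S))
f_direct = w @ phi(x) + b

print(np.isclose(f_kernel, f_direct))  # True
```

For 4-dimensional inputs the folded weight vector has only 15 components, regardless of whether the SVM was trained on 50 or 50,000 examples.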

Investigators: M.W. Mak and S.Y. Kung

Funding Source: PolyU

 

A Unified Machine Learning Framework for Classifier Design with Applications to Cancer Diagnosis

A recent report published by the Department of Health suggests that cancer is the leading cause of death in Hong Kong. Cancers are diseases caused by abnormal alterations of the genetic material in cells. Historically, the diagnosis of cancers has involved histological examination of biopsy specimens. However, this traditional approach gives only limited information about the tumors. With the availability of DNA microarray technologies, a new approach to diagnosing cancers has emerged. Microarrays can measure the expression profiles of thousands of genes simultaneously, thus revolutionizing biomarker discovery and cancer diagnosis. However, the classification of gene expression profiles is a challenging task because the number of genes (features) is typically several thousand, whereas the number of samples available for training a classifier is only a few dozen. This project aims to develop a unified machine learning framework that provides maximum flexibility for constructing kernel-based classifiers for cancer diagnosis using microarray data. The unified framework incorporates the inevitable perturbation in the training data into the learning algorithm, resulting in robust classifiers that are resilient to the measurement errors commonly found in microarray data. The framework is very general in that some state-of-the-art classifiers are merely its special cases. The proposed work will provide insight into some fundamental machine learning models. The framework is also valuable to a variety of problem domains (e.g., multimedia signal processing and protein function prediction) in which kernel-based classifiers such as support vector machines play an important role.

Investigators: M.W. Mak and S.Y. Kung

Funding Source: PolyU

 

Fusion of Functional Site Detection and Kernel Discriminant Analysis for Biological Sequence Classification

In recent years, because of the aging population in China (including Hong Kong), the pharmaceutical industry in this region has experienced strong growth. Drug design demands the elucidation of protein subcellular localization, folding, and structure to maximize the efficacy and minimize the side effects of drugs. Computational biology has shown promise in meeting this demand. However, with the ever-increasing number of un-annotated amino acid sequences in protein databases, it has become clear that, to speed up the drug design process, it is imperative to enhance the computation speed of current protein prediction algorithms. As computational efficiency is critical for genomic and proteomic data mining, this project aims to alleviate the computational burden on two fronts: (1) selecting the most relevant features from amino acid sequences and (2) developing more efficient sequence classifiers. For the former, we advocate a cascade fusion of sequence segmentation and profile alignment in which the most relevant segments of amino acid sequences are detected to speed up the alignment process. For the latter, we propose a new kernel Fisher discriminator that incorporates the notion of maximum margins into kernel Fisher discriminant analysis, leading to a classifier that is ten times faster than current state-of-the-art kernel-based sequence classifiers.

The ultimate goal of the project is to have a systematic selection of relevant features that can improve prediction accuracy and computation efficiency of genomic and proteomic data mining. The proposed work will provide insight into some fundamental machine learning models. The proposed algorithm is also valuable to a variety of problem domains (e.g., speech and language processing) in which sequence classification plays an important role.

Investigators: M.W. Mak and S.Y. Kung

Funding Source: PolyU

 

 

Utterance Partitioning for GMM-SVM Speaker Verification

Recent research has demonstrated the superiority of support-vector-machine (SVM) scoring over likelihood-ratio scoring in text-independent speaker verification. However, one unaddressed issue in SVM scoring is the imbalance between the numbers of speaker-class utterances and impostor-class utterances available for training a speaker-dependent SVM. This proposal aims to develop a resampling technique – namely Utterance Partitioning with Acoustic Vector Resampling (UP-AVR) – to mitigate the data imbalance problem. Briefly, the sequence order of acoustic vectors in an enrollment utterance is first randomized, which is followed by partitioning the randomized sequence into a number of segments. Each of these segments is then used to produce a GMM supervector via MAP adaptation and mean vector concatenation. The randomization and partitioning processes are repeated several times to produce a sufficient number of speaker-class supervectors for training an SVM. Experimental evaluations will be done on the most recent NIST SRE datasets such as SRE’2008 and SRE’2010.
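The partitioning step can be sketched as follows; the MAP adaptation and supervector construction are omitted, and the partition and round counts are arbitrary example values:

```python
import numpy as np

def up_avr(acoustic_vectors, n_partitions=4, n_rounds=3, seed=0):
    """Utterance Partitioning with Acoustic Vector Resampling (sketch).

    Repeatedly shuffle the frame order of an utterance and split it into
    equal-length segments; each segment would then be turned into one
    GMM supervector via MAP adaptation and mean concatenation (not shown).
    """
    rng = np.random.default_rng(seed)
    segments = []
    for _ in range(n_rounds):
        shuffled = rng.permutation(acoustic_vectors)   # randomize frame order
        segments.extend(np.array_split(shuffled, n_partitions))
    segments.append(np.asarray(acoustic_vectors))      # keep the full utterance too
    return segments

frames = np.random.default_rng(1).standard_normal((300, 39))  # 300 39-d frames
segs = up_avr(frames, n_partitions=4, n_rounds=3)
print(len(segs))  # 13 speaker-class training segments from one utterance
```

Shuffling before partitioning is what makes the segments acoustically representative of the whole utterance, so a single enrollment utterance yields many speaker-class supervectors to balance the impostor class.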

Investigator: M.W. Mak

Funding Sources: PolyU

 

Discriminative Models for Biological Sequence Labeling and Segmentation

Because of the aging population, the pharmaceutical industry in China (including Hong Kong) has experienced strong growth in recent years. Labeling and segmenting amino acids in protein sequences, such as determining signal-peptide cleavage sites, is an important process in drug design. Because performing such tasks by experimental means is too costly and time-consuming, machine learning techniques have become increasingly important for the pharmaceutical industry. Neural networks and hidden Markov models have been the prevailing machine learning approaches to determining the cleavage sites of signal peptides. These approaches, however, have limitations in that their performance is highly dependent on the feature encoding schemes and that long-range dependencies between labels and amino acids cannot be properly modeled. This project aims to alleviate these limitations by using discriminative models such as conditional random fields. To maximize the information extracted from protein sequences, the project proposes using the properties of short amino acid segments to define real-valued feature functions for constructing conditional random fields. The use of real-valued features instead of Boolean ones allows us to exploit a wide range of amino acid properties relevant to the task, thus facilitating the incorporation of biological knowledge into the predictors. The ultimate goal of the project is a systematic selection of relevant features that can improve prediction accuracy. The proposed work will provide insight into some fundamental machine learning models. The proposed algorithm is also valuable to a variety of problem domains (e.g., speech and language processing) in which discriminative models play an important role.

Investigators: M.W. Mak and S.Y. Kung

Funding Sources: RGC Competitive Bids, 2009.

 

Self-Supervised Feature Selection for Sequence Classification in Bioinformatics

In recent years, we have witnessed strong growth in the pharmaceutical industry in China (including Hong Kong), primarily because of the aging population in this region. Classification of proteins based on their amino acid sequences is an important process in drug design. Because performing this process by experimental means is too time-consuming, machine learning techniques have become increasingly important for the pharmaceutical industry. One prevailing approach to protein classification is to perform pairwise comparisons between amino acid sequences. However, such methods can easily lead to the curse of dimensionality and demand considerable computational resources. This project aims to alleviate these limitations by using feature selection techniques, i.e., selecting the features that are relevant to the classification task and removing those that are redundant. The proposal introduces two new types of learning, namely self-supervised and symmetric-doubly supervised learning, for feature selection. These learning scenarios provide theoretical justifications for why a particular set of features should be selected. To facilitate the fusion of different selection criteria and strategies, a pairwise scoring technique is proposed to convert the self-supervised scenario to the symmetric-doubly supervised one. The ultimate goal is a systematic selection of relevant features that improves prediction accuracy and computational efficiency. The proposed work will provide insight into some fundamental machine learning models. The proposed algorithm is also valuable to a variety of problem domains (e.g., biometrics) in which pairwise scoring plays an important role.

Investigators: M.W. Mak and S.Y. Kung

Funding Source: RGC Competitive Bids, 2008.

 

Homology-Based Kernel Methods for Sequence Classification in Bioinformatics

The aging population in Hong Kong and mainland China has led to significant growth in the pharmaceutical industry in recent years. Prediction of protein functions and subcellular locations is an important process in drug design. Because determining this information by experimental means is time-consuming, machine learning has become an indispensable tool for the pharmaceutical industry to enhance the effectiveness and efficacy of drugs. Currently, the subcellular locations of proteins are typically determined by examining their amino acid sequences. Although the performance of sequence-based methods has been improving over the years, most of them lack a sound theoretical justification to guarantee similar performance on new data. This project aims to develop a kernel-based classification method for protein subcellular localization and to provide a theoretical justification that ensures predictable performance on new sequences. We will investigate the trade-off between the diagonal dominance of kernel matrices and Mercer’s condition, which will lead to an effective design guideline for constructing kernel-based predictors. The ultimate goal is a systematic selection of kernels that improves prediction accuracy and computational efficiency. Our proposed algorithm is also valuable to a variety of problem domains (e.g., biometrics) in which kernel methods play an important role.

Investigators: M.W. Mak and S.Y. Kung

Funding Sources: RGC Competitive Bids, 2007.

 

Articulatory Feature-Based Pronunciation Modeling for Robust Speaker Verification

Conventional voice biometric systems typically model the vocal-tract characteristics of speakers by extracting low-level spectral information from speech signals. These features, however, are known to be sensitive to channel mismatch and background noise. It is commonly believed that, apart from spectral content, humans also recognize speakers based on their speaking style, prosody, intonation, accent, pronunciation characteristics, and so on. These high-level features carry the personality traits of individuals and are expected to be less susceptible to channel effects and background noise. This project aims to (1) capture the pronunciation characteristics of speakers by modeling how they articulate speech and (2) combine these pronunciation characteristics with spectral features for speaker verification. The project will provide new solutions to some of the practical problems encountered by speaker verification researchers today. These solutions will potentially help telecommunication service providers and financial service providers to open up new markets.

Investigators: M.W. Mak and S.Y. Kung

Funding Source: RGC Direct Allocation

 

Coherence Models for Microarray Data Analysis

With the recent advances in DNA microarray technology, it has become possible to measure the expression level of thousands of genes across hundreds of experimental conditions. The ability to discover hidden patterns in gene expression data has significant impact on drug design and the development of new treatments with maximum efficacy and minimum side effects.

Machine learning techniques offer a viable approach to cluster discovery from microarray data, which involves identifying and classifying biologically relevant groups of genes and conditions. It has been recognized that genes (whether or not they belong to the same gene group) may be co-expressed via a variety of pathways. Therefore, they can be adequately described by a diversity of coherence models. In fact, it is known that a gene may participate in multiple pathways that may or may not be co-active under all conditions. It is therefore biologically meaningful to simultaneously divide genes into functional groups and conditions into co-active categories – leading to so-called biclustering analysis. For this, we have proposed a comprehensive set of coherence models to cope with various plausible regulation processes. Furthermore, a multi-modality biclustering analysis based on the fusion of different coherence models appears promising because the expression levels of genes from the same group may follow more than one coherence model. This proposal aims to (1) extend our biclustering algorithms to more difficult genomic datasets (e.g., lymphoma) and (2) conduct a performance analysis to confirm that the proposed multi-modality approach enjoys the advantage of high prediction performance.

Investigators: M.W. Mak and S.Y. Kung

Funding Source: PolyU Internal Competitive Research Grant, 2006.

 

Mobile Phone-Based Speaker Verification via Blind Stochastic Feature Transformation

While today’s speaker verification systems perform reasonably well under controlled conditions, their performance is often compromised under real-world environments. In particular, variations in handset characteristics are known to be the major cause of performance degradation. Research has found that the effect of handset variations can be greatly reduced if handset characteristics are known a priori. However, this requirement limits the scale of the systems because maintaining a handset database for storing the information of all possible handset models is a great challenge. Our proposal overcomes this problem by means of a blind feature transformation approach in which the transformation parameters are determined online without any a priori knowledge of handset characteristics, which makes the method more appropriate for large-scale deployment. The project will provide new solutions to some of the practical problems encountered by speaker verification researchers today. These solutions will potentially help telecommunication service providers and financial service providers to open up new markets.

Investigators: M.W. Mak and S.Y. Kung

Funding Source: RGC Competitive Bids, 2005.

 

Multi-Sample Decision Fusion for Biometric Verification

Over 85% of the population in Hong Kong uses mobile phones, and most of them are willing to carry out financial transactions over wireless networks. However, there is now a growing concern about the security of these transactions. In particular, prevailing remote access systems, which determine the eligibility of users by personal identity numbers, pose a high security risk. This project aims to improve the security of these systems by using biometric technologies. These technologies allow a system to verify its users on the basis of their physiological characteristics, such as voices, fingerprints and face patterns, or some aspect of behaviour, such as handwriting or keystroke patterns. Since the means for biometric systems to identify a person is not based on what he or she knows (a code), or possesses (a card), but on what he or she has (a characteristic), the possibility of forgery can be greatly reduced.

Most biometric authentication systems take one sample (e.g., an utterance or a video shot) from their users in a verification session. To improve the reliability of the verification decisions, some systems require their users to provide more than one sample during verification; the average score of these samples is then used to make the verification decision. However, averaging the scores may not produce optimal decisions because this approach treats the patterns in the samples as equally reliable, which is often not the case in practice. In this project, we propose a novel approach that determines the reliability of individual frame-based feature vectors in order to combine the scores of the independent samples gathered from users during verification. The proposed fusion approach is very general and is potentially applicable to multi-sample, multi-modal biometric authentication. The project will provide new solutions to some of the practical problems encountered by biometrics researchers today. These solutions will potentially help telecommunication service providers and financial service providers to open up new markets.
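The contrast with plain averaging can be sketched as follows. The reliability values here are hypothetical inputs, since estimating them from the samples is the actual research problem:

```python
import numpy as np

def fuse_scores(scores, reliabilities):
    """Reliability-weighted fusion of per-sample verification scores.

    Plain averaging treats every sample as equally trustworthy; weighting by
    a normalized reliability estimate lets clean samples dominate noisy ones.
    """
    w = np.asarray(reliabilities, dtype=float)
    w = w / w.sum()
    return float(w @ np.asarray(scores, dtype=float))

scores = [2.1, 1.8, -0.4]   # e.g. per-sample verification scores
reliab = [0.9, 0.8, 0.1]    # low weight for the noisy third sample

print(round(fuse_scores(scores, reliab), 3))  # 1.828
print(round(float(np.mean(scores)), 3))       # 1.167 (plain average)
```

With reliability weighting, the noisy third sample barely pulls the fused score down, whereas plain averaging lets it do so with full force.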

Investigators: M.W. Mak and S.Y. Kung

Funding Source: RGC Competitive Bids, 2004.

 

Probabilistic Decision Fusion for Multimodal Person Verification

Financial transactions over wireless networks have become increasingly popular in recent years. However, there is now a growing concern about the security of these transactions. In particular, prevailing remote access systems, which determine the eligibility of users by personal identity numbers, pose a high security risk. This project aims to improve the security of these systems by combining two biometric technologies: speaker verification and face recognition. These technologies allow a system to verify its users by recognizing the unique characteristics contained in the users’ voice and face. Current biometric authentication systems typically consider only one biometric feature (e.g., face, voice, or fingerprint). While these systems perform reasonably well under controlled conditions, their performance is often compromised in real-world environments. We propose to improve the robustness of these systems by fusing information gathered from both the audio and visual modalities. A novel approach to determining the reliability of the audio and visual sources is proposed. This reliability information will be used to combine the decisions made by the classifiers in the two modalities. The project will provide new solutions to some of the practical problems encountered by biometrics researchers today. These solutions will potentially help security product manufacturers and financial service providers to open up new markets.

 

Investigators: M.W. Mak and S.Y. Kung

Funding Source: Central Research Grant

 

Towards Multi-modal Human-computer Dialog Interactions with Minimally Intrusive Biometric Security Functions

This is a group research project involving researchers from three universities in Hong Kong: CUHK, HKPolyU and HKUST. The project aims to develop human-centric interface technologies to support secure computing by a diversity of users in a variety of usage contexts. The work to be done at HKPolyU includes the following:

  • Cross validation of speech data integrity via lip-tracking for biometric applications

  • Reducing transducer distortions for speaker authentication

Investigators:
Name            Institution
CHING, P.C.     Dept. of Electronic Eng., CUHK
MAK, Brian      Dept. of Computer Science, HKUST
MAK, Man Wai    Dept. of Electronic and Information Eng., HKPolyU
MENG, Helen     Dept. of Systems Eng. & Eng. Management, CUHK
MOON, Y.S.      Dept. of Computer Science & Eng., CUHK
SIU, Man Hung   Dept. of Electrical and Electronic Eng., HKUST
LEE, Tan        Dept. of Electronic Eng., CUHK
TANG, Xiao Ou   Dept. of Information Eng., CUHK

Funding Source: RGC Central Allocation Vote

 

Environment Adaptation for Distributed Speaker Verification

This project aims to develop environment adaptation techniques (including feature transformation and model adaptation) for speaker verification over wireless networks and the Internet. Another purpose of this project is to combine these environment adaptation techniques with a client-side front-end processing approach recently standardized by the European Telecommunications Standards Institute (ETSI) for distributed speaker verification.

Investigator: M.W. Mak

Funding Source: RGC Direct Allocation

 

Non-linear Stochastic Matching for Robust Speaker Verification

While today’s speaker verification systems perform reasonably well under controlled conditions, their performance is often compromised under real-world environments. In particular, variations in handset characteristics are known to be the major cause of performance degradation. Our proposal is to minimize the effects resulting from transducer variation. The proposed approaches overcome the limitations of conventional channel compensation methods by looking at the non-linear characteristics of telephone handsets. A novel non-linear probabilistic transformation method will be derived and evaluated. The project will provide new solutions to some of the practical problems encountered by speaker verification researchers today. These solutions will potentially help security product manufacturers and telephone-based transaction service providers open up new markets.

 

Investigators: M.W. Mak and S.Y. Kung

Funding Source: RGC Competitive Bids (PolyU 5131/02E)

 

Handset Mismatch Compensation for Robust Speaker Verification

This project aims at (1) developing handset mismatch compensation algorithms for speaker verification systems and (2) constructing a Cantonese telephone speech corpus for speaker verification research. Most channel compensation techniques assume that the telephone channel can be approximated by a linear filter. However, telephone handsets typically exhibit non-linear characteristics, suggesting that linear filtering addresses only part of the problem. For this project, we propose a non-linear feature mapper and a probabilistic channel equalizer that integrate the non-linear handset characteristics into the channel compensation process.
 

Funding Source: RGC Competitive Bids (PolyU 5129/01E)

Investigators: M.W. Mak and S.Y. Kung

 

Stochastic Model Adaptation for Robust Speech/Speaker Recognition

The performance of current speech/speaker recognition systems is often affected by the acoustic environment in which the systems operate. For example, in telephone-based speaker verification, speakers tend to use different telephone handsets in different environments (e.g., office and home). Variation in handset characteristics can introduce severe speech variability even when the speech is uttered by the same speaker. Therefore, it is very important for a speaker model to be able to accommodate new acoustic environments. Furthermore, a practical speaker verification system also needs to adapt itself to accommodate changes in speaker characteristics over time. This is because speakers often sound different from time to time, a phenomenon known as intra-speaker variability. In this project, we propose to address these issues by developing a temporally adaptive probabilistic neural network. Training algorithms and adaptation mechanisms, based on our previous work on neural network learning algorithms, will be derived. The network's performance will be evaluated using real-world data.
 

Investigators: M.W. Mak and W.C. Siu

Funding Source: ASD Project

 

Acoustic and Voice Processing

In recent years, speech recognition systems, Internet telephony, and video conferencing systems have been deployed in a variety of real environments. However, in many practical situations, ambient noise, reverberation, and poor-quality microphones can drastically degrade the performance of these systems. Therefore, it is necessary to develop enhancement algorithms to improve the performance of these systems in adverse acoustic environments. This project investigates microphone characteristics and the human auditory system in order to enhance channel-distorted, noisy speech for robust speech recognition and teleconferencing.
 

Investigators: M.W. Mak and W.C. Siu

Funding Source: ASD Project

 

Stochastic Matching Techniques for Robust Speaker Recognition

Today’s speaker recognition systems have reached a very high level of performance in laboratory environments. However, several technical issues (such as channel robustness) need to be resolved before these systems can be commercialized. This project aims to resolve these issues. In particular, it aims to develop a set of model-based and feature-based transformation techniques for robust speaker recognition. Parameter estimation algorithms based on the maximum likelihood (ML) and maximum a posteriori (MAP) principles will be derived.

 

Investigator: M.W. Mak

Funding Source: Central Research Grant

 


 


M.W. Mak's Homepage

http://www.eie.polyu.edu.hk/~mwmak/mypage.htm