Research Projects of
Dr. M.W. Mak
A Unified Machine Learning Framework for Classifier Design with Applications to Cancer Diagnosis A recent report published by the Department of Health suggests that cancer is the leading cause of death in Hong Kong. Cancers are diseases caused by abnormal alterations of the genetic material in cells. Historically, the diagnosis of cancers has involved a histological examination of biopsy specimens. However, this traditional approach gives only limited information about the tumors. With the availability of DNA microarray technologies, a new approach to diagnosing cancers has emerged. Microarrays are capable of measuring the expression profiles of thousands of genes simultaneously, and have thus revolutionized biomarker discovery and cancer diagnosis. However, the classification of gene expression profiles is a challenging task because the number of genes (features) is typically several thousand, whereas the number of samples available for training a classifier is only a few dozen. This project aims to develop a unified machine learning framework that provides maximum flexibility for constructing kernel-based classifiers for cancer diagnosis using microarray data. The unified framework incorporates the inevitable perturbation in the training data into the learning algorithm, resulting in robust classifiers that are resilient to the measurement errors commonly found in microarray data. The framework is very general in that some state-of-the-art classifiers are merely special cases of it. The proposed work will provide insight into some fundamental machine learning models. The proposed unified framework is also valuable to a variety of problem domains (e.g., multimedia signal processing and protein function prediction) in which kernel-based classifiers such as support vector machines play an important role. Investigator: M.W. Mak Funding Source: PolyU
Fusion of Functional Site Detection and Kernel Discriminant Analysis for Biological Sequence Classification In recent years, because of the aging population in China (including Hong Kong), the pharmaceutical industry in this region has experienced strong growth. Drug design demands the elucidation of protein subcellular localization, folding, and structure to maximize the efficacy and minimize the side effects of drugs. Computational biology has shown promise in meeting this demand. However, with the ever-increasing number of unannotated amino acid sequences in protein databases, it has become clear that speeding up the drug design process requires faster protein prediction algorithms. As computational efficiency is critical for genomic and proteomic data mining, this project aims to alleviate the computational burden on two fronts: (1) selecting the most relevant features from amino acid sequences and (2) developing more efficient sequence classifiers. For the former, we advocate a cascade fusion of sequence segmentation and profile alignment in which the most relevant segments of amino acid sequences are detected to speed up the alignment process. For the latter, we propose a new kernel Fisher discriminator that incorporates the notion of maximum margins into kernel Fisher discriminant analysis, leading to a classifier that is ten times faster than current state-of-the-art kernel-based sequence classifiers. The ultimate goal of the project is a systematic selection of relevant features that improves the prediction accuracy and computational efficiency of genomic and proteomic data mining. The proposed work will provide insight into some fundamental machine learning models. The proposed algorithm is also valuable to a variety of problem domains (e.g., speech and language processing) in which sequence classification plays an important role. Investigator: M.W. Mak Funding Source: PolyU
Utterance Partitioning for GMM-SVM Speaker Verification Recent research has demonstrated the superiority of support-vector-machine (SVM) scoring over likelihood-ratio scoring in text-independent speaker verification. However, one unaddressed issue in SVM scoring is the imbalance between the numbers of speaker-class utterances and impostor-class utterances available for training a speaker-dependent SVM. This proposal aims to develop a resampling technique – namely Utterance Partitioning with Acoustic Vector Resampling (UP-AVR) – to mitigate the data imbalance problem. Briefly, the sequence order of acoustic vectors in an enrollment utterance is first randomized, which is followed by partitioning the randomized sequence into a number of segments. Each of these segments is then used to produce a GMM supervector via MAP adaptation and mean vector concatenation. The randomization and partitioning processes are repeated several times to produce a sufficient number of speaker-class supervectors for training an SVM. Experimental evaluations will be done on the most recent NIST SRE datasets such as SRE’2008 and SRE’2010. Investigator: M.W. Mak Funding Source: PolyU
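The randomize, partition, and adapt loop described for UP-AVR can be sketched roughly as follows. This is only an illustrative sketch, not the project's implementation: it assumes a diagonal-covariance GMM as the background model and the common relevance-factor form of MAP mean adaptation, and the function names, the relevance value of 16, and the synthetic data are all hypothetical.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log density of every frame under every diagonal-covariance Gaussian."""
    # X: (T, D); means, variances: (C, D); result: (T, C)
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))

def map_adapt_supervector(X, weights, means, variances, relevance=16.0):
    """Adapt the background means to X (assumed relevance-MAP) and concatenate them."""
    log_p = np.log(weights) + log_gauss_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)      # numerical stability
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)        # component posteriors per frame
    n = post.sum(axis=0)                           # soft counts, shape (C,)
    first = post.T @ X                             # first-order statistics, (C, D)
    ex = first / np.maximum(n, 1e-10)[:, None]     # posterior-weighted data means
    alpha = (n / (n + relevance))[:, None]         # adaptation coefficients
    adapted = alpha * ex + (1.0 - alpha) * means
    return adapted.reshape(-1)                     # mean vector concatenation

def up_avr(X, weights, means, variances, n_partitions=4, n_resamplings=3, seed=0):
    """Return one supervector per segment per resampling round."""
    rng = np.random.default_rng(seed)
    supervectors = []
    for _ in range(n_resamplings):
        shuffled = X[rng.permutation(len(X))]      # randomize the frame order
        for segment in np.array_split(shuffled, n_partitions):
            supervectors.append(
                map_adapt_supervector(segment, weights, means, variances))
    return np.stack(supervectors)

# Synthetic stand-ins for a background model and one enrollment utterance.
demo_rng = np.random.default_rng(1)
C, D, T = 4, 3, 200
weights = np.full(C, 1.0 / C)
means = demo_rng.normal(size=(C, D))
variances = np.ones((C, D))
utterance = demo_rng.normal(size=(T, D))
speaker_supervectors = up_avr(utterance, weights, means, variances)
```

With these settings a single utterance yields `n_partitions * n_resamplings` speaker-class supervectors, each of length `C * D`, which could then be used as positive examples when training the speaker-dependent SVM.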
Discriminative Models for Biological Sequence Labeling and Segmentation Because of the aging population, the pharmaceutical industry in China (including Hong Kong) has experienced strong growth in recent years. Labeling and segmenting amino acids in protein sequences, such as the determination of signal-peptide cleavage sites, is an important process in drug design. Because performing such a task by experimental means is too costly and time consuming, machine learning techniques have become increasingly important for the pharmaceutical industry. Neural networks and hidden Markov models have been the prevailing machine learning approaches to determining the cleavage sites of signal peptides. These approaches, however, have two limitations: their performance is highly dependent on the feature encoding scheme, and long-range dependencies between labels and amino acids cannot be properly modeled. This project aims to alleviate these limitations by using discriminative models such as conditional random fields. To maximize the information extracted from protein sequences, the project proposes using the properties of short amino acid segments to determine real-valued feature functions for constructing conditional random fields. The use of real-valued features instead of Boolean ones allows us to use a wide range of amino acid properties that are relevant to the task, thus facilitating the incorporation of biological knowledge into the predictors. The ultimate goal of the project is a systematic selection of relevant features that improves prediction accuracy. The proposed work will provide insight into some fundamental machine learning models. The proposed algorithm is also valuable to a variety of problem domains (e.g., speech and language processing) in which discriminative models play an important role. Investigator: M.W. Mak and S.Y. Kung Funding Source: RGC Competitive Bids, 2009.
Self-Supervised Feature Selection for Sequence Classification in Bioinformatics In recent years, we have witnessed strong growth in the pharmaceutical industry in China (including Hong Kong), primarily because of the aging population in this region. Classification of proteins based on their amino acid sequences is an important process in drug design. Because performing such a process by experimental means is too time consuming, machine learning techniques have become increasingly important for the pharmaceutical industry. One prevailing approach to protein classification is to perform pairwise comparisons between amino acid sequences. However, such a method can easily lead to the curse of dimensionality and demands considerable computational resources. This project aims to alleviate these limitations by using feature selection techniques, i.e., selecting the features that are relevant to the classification task and removing those that are redundant. The proposal introduces two new types of learning, namely self-supervised and symmetric-doubly supervised learning, for feature selection. These learning scenarios provide theoretical justification for why a particular set of features should be selected. To facilitate the fusion of different selection criteria and strategies, a pairwise scoring technique is proposed to convert the self-supervised scenario to the symmetric-doubly supervised one. The ultimate goal is a systematic selection of relevant features that improves prediction accuracy and computational efficiency. The proposed work will provide insight into some fundamental machine learning models. The proposed algorithm is also valuable to a variety of problem domains (e.g., biometrics) in which pairwise scoring plays an important role. Investigator: M.W. Mak and S.Y. Kung Funding Source: RGC Competitive Bids, 2008.
Homology-Based Kernel Methods for Sequence Classification in Bioinformatics The aging population in Hong Kong and mainland China has led to significant growth in the pharmaceutical industry in recent years. Prediction of protein functions and subcellular locations is an important process in drug design. Because determining this information by experimental means is time consuming, machine learning has become an indispensable tool for the pharmaceutical industry to enhance the effectiveness and efficacy of drugs. Currently, the subcellular locations of proteins are typically determined by looking at their corresponding amino acid sequences. Although the performance of sequence-based methods has been improving over the years, most of them lack a sound theoretical justification to guarantee similar performance on new data. This project aims to develop a kernel-based classification method for protein subcellular localization and to provide a theoretical justification that ensures predictable performance on new sequences. We will investigate the trade-off between the diagonal dominance of kernel matrices and Mercer’s condition, which will lead to an effective design guideline for constructing kernel-based predictors. The ultimate goal is a systematic selection of kernels that improves prediction accuracy and computational efficiency. Our proposed algorithm is also valuable to a variety of problem domains (e.g., biometrics) in which kernel methods play an important role. Investigator: M.W. Mak and S.Y. Kung Funding Source: RGC Competitive Bids, 2007.
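The two kernel-matrix properties named in this project can each be checked numerically. The sketch below is illustrative only and is not the project's method: it treats Mercer's condition on a finite sample as positive semidefiniteness of the kernel matrix, measures diagonal dominance with a simple per-row ratio, and uses an RBF kernel on toy points. The function names, the ratio, and the data are assumptions.

```python
import numpy as np

def is_psd(K, tol=1e-10):
    """True if K is symmetric with no eigenvalue below -tol."""
    K = np.asarray(K, dtype=float)
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)

def diagonal_dominance_ratio(K):
    """Smallest ratio of |diagonal entry| to the sum of |off-diagonal entries| in its row."""
    K = np.asarray(K, dtype=float)
    diag = np.abs(np.diag(K))
    off = np.abs(K).sum(axis=1) - diag
    return float(np.min(diag / np.maximum(off, 1e-12)))

def rbf_kernel(x, gamma):
    """RBF kernel matrix for 1-D points x (used here only as a toy example)."""
    x = np.asarray(x, dtype=float)
    return np.exp(-gamma * (x[:, None] - x[None, :]) ** 2)

points = [0.0, 1.0, 2.0]
K_narrow = rbf_kernel(points, gamma=10.0)   # close to the identity matrix
K_wide = rbf_kernel(points, gamma=0.01)     # all entries close to one

narrow_ratio = diagonal_dominance_ratio(K_narrow)
wide_ratio = diagonal_dominance_ratio(K_wide)
not_mercer = np.array([[1.0, 2.0], [2.0, 1.0]])  # symmetric, but has eigenvalue -1
```

Both RBF matrices are positive semidefinite, yet the narrow kernel is strongly diagonally dominant (every sequence looks similar only to itself), whereas the wide kernel is not. The last matrix shows a symmetric similarity matrix that fails the positive semidefiniteness check.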
Articulatory Feature-Based Pronunciation Modeling for Robust Speaker Verification Conventional voice biometric systems typically model the vocal-tract characteristics of speakers by extracting the low-level spectral information from speech signals. These features, however, are known to be sensitive to channel mismatch and background noise. It is commonly believed that apart from using spectral contents, humans also recognize speakers based on their speaking style, prosody, intonation, accent, pronunciation characteristics, and so on. These high-level features carry the personality traits of individuals and are expected to be less susceptible to channel effects and background noise. This project aims to (1) capture the pronunciation characteristics of speakers by modeling how they articulate speech and (2) combine the pronunciation characteristics with spectral features for speaker verification. The project will provide new solutions to some of the practical problems encountered by speaker verification researchers today. These solutions will potentially help telecommunication service providers and financial service providers to open up new markets. Investigator: M.W. Mak and S.Y. Kung Funding Source: RGC Direct Allocation
Coherence Models for Microarray Data Analysis With the recent advances in DNA microarray technology, it has become possible to measure the expression levels of thousands of genes across hundreds of experimental conditions. The ability to discover hidden patterns in gene expression data has a significant impact on drug design and the development of new treatments with maximum efficacy and minimum side effects. Machine learning techniques offer a viable approach to cluster discovery from microarray data, which involves identifying and classifying biologically relevant groups of genes and conditions. It has been recognized that genes (whether or not they belong to the same gene group) may be co-expressed via a variety of pathways. Therefore, they can be adequately described by a diversity of coherence models. In fact, it is known that a gene may participate in multiple pathways that may or may not be co-active under all conditions. It is therefore biologically meaningful to simultaneously divide genes into functional groups and conditions into co-active categories, which leads to the so-called biclustering analysis. For this, we have proposed a comprehensive set of coherence models to cope with various plausible regulation processes. Furthermore, a multi-modality biclustering analysis based on the fusion of different coherence models appears to be promising because the expression levels of genes from the same group may follow more than one coherence model. This proposal aims to (1) extend our biclustering algorithms to more difficult genomic datasets (e.g., lymphoma) and (2) conduct performance analysis to confirm that the proposed multi-modality approach achieves high prediction performance. Investigator: M.W. Mak and S.Y. Kung Funding Source: PolyU Internal Competitive Research Grant, 2006.
Mobile Phone-Based Speaker Verification via Blind Stochastic Feature Transformation While today’s speaker verification systems perform reasonably well under controlled conditions, their performance is often compromised under real-world environments. In particular, variations in handset characteristics are known to be the major cause of performance degradation. Research has found that the effect of handset variations can be greatly reduced if handset characteristics are known a priori. However, this requirement limits the scale of the systems because maintaining a handset database for storing the information of all possible handset models is a great challenge. Our proposal overcomes this problem by means of a blind feature transformation approach in which the transformation parameters are determined online without any a priori knowledge of handset characteristics, which makes the method more appropriate for large-scale deployment. The project will provide new solutions to some of the practical problems encountered by speaker verification researchers today. These solutions will potentially help telecommunication service providers and financial service providers to open up new markets. Investigator: M.W. Mak and S.Y. Kung Funding Source: RGC Competitive Bids, 2005.
Multi-Sample Decision Fusion for Biometric Verification Over 85% of the population in Hong Kong uses mobile phones, and most of them are willing to carry out financial transactions over wireless networks. However, there is now a growing concern about the security of these transactions. In particular, prevailing remote access systems, which determine the eligibility of users by personal identity numbers, pose a high security risk. This project aims to improve the security of these systems by using biometric technologies. These technologies allow a system to verify its users on the basis of their physiological characteristics, such as voices, fingerprints and face patterns, or some aspect of behaviour, such as handwriting or keystroke patterns. Since the means for biometric systems to identify a person is not based on what he or she knows (a code), or possesses (a card), but on what he or she has (a characteristic), the possibility of forgery can be greatly reduced. Most biometric authentication systems take one sample (e.g. an utterance or a video shot) from their users in a verification session. To improve the reliability of the verification decisions, some systems require their users to provide more than one sample during verification; the average scores of these samples are then used for making decisions on verification. However, averaging the scores may not produce optimal decisions because this approach considers the patterns in the samples as being equally reliable, which is often not the case in practice. In this project, we propose a novel approach to determining the reliability of individual frame-based feature vectors to combine the scores of the independent samples gathered from users during verification. The proposed fusion approach is very general and is potentially applicable to multi-sample, multi-modal biometric authentication. The project will provide new solutions to some of the practical problems encountered by biometrics researchers today. 
These solutions will potentially help telecommunication service providers and financial service providers to open up new markets. Investigator: M.W. Mak and S.Y. Kung Funding Source: RGC Competitive Bids, 2004.
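The difference between plain score averaging and reliability-weighted fusion can be illustrated with a small sketch. It is a generic weighted average under stated assumptions, not the project's fusion rule: the source does not say how reliability is estimated or how it enters the decision, so the function name `fuse_scores` and the example weights are hypothetical.

```python
def fuse_scores(scores, reliabilities=None):
    """Combine verification scores by a reliability-weighted average.

    With no reliabilities given, every score counts equally, which is the
    plain averaging that the project description argues can be suboptimal.
    """
    if reliabilities is None:
        reliabilities = [1.0] * len(scores)
    if len(scores) != len(reliabilities):
        raise ValueError("scores and reliabilities must have the same length")
    total = sum(reliabilities)
    if total <= 0:
        raise ValueError("at least one reliability must be positive")
    return sum(w * s for w, s in zip(reliabilities, scores)) / total

# Three hypothetical sample scores; the second is assumed to come from a noisy sample.
sample_scores = [2.0, -1.0, 3.0]
equal_weight = fuse_scores(sample_scores)
weighted = fuse_scores(sample_scores, reliabilities=[1.0, 0.0, 1.0])
```

Giving the unreliable sample zero weight moves the fused score from about 1.33 to 2.5, so one poor sample no longer drags down the decision for a genuine user.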
Probabilistic Decision Fusion for Multimodal Person Verification Financial transactions over wireless networks have become increasingly popular in recent years. However, there is now a growing concern about the security of these transactions. In particular, prevailing remote access systems, which determine the eligibility of users by personal identity numbers, pose a high security risk. This project aims to improve the security of these systems by combining two biometric technologies: speaker verification and face recognition. These technologies allow a system to verify its users by recognizing the unique characteristics contained in the users’ voices and faces. Current biometric authentication systems typically consider one biometric feature only (e.g., face, voice, or fingerprint). While these systems perform reasonably well under controlled conditions, their performance is often compromised in real-world environments. We propose to improve the robustness of these systems by fusing the information gathered from both the audio and visual modalities. A novel approach to determining the reliability of the audio and visual sources is proposed. This reliability information will be used to combine the decisions made by the classifiers in the two modalities. The project will provide new solutions to some of the practical problems encountered by biometrics researchers today. These solutions will potentially help security product manufacturers and financial service providers to open up new markets.
Investigator: M.W. Mak and S.Y. Kung Funding Source: Central Research Grant
Towards Multi-modal Human-computer Dialog Interactions with Minimally Intrusive Biometric Security Functions This is a group research project involving researchers from three universities in Hong Kong: CUHK, HKPolyU and HKUST. The project aims to develop human-centric interface technologies to support secure computing by a diversity of users in a variety of usage contexts. The work to be done at HKPolyU includes the following:
Investigators:
Funding Source: RGC Central Allocation Vote
Environment Adaptation for Distributed Speaker Verification This project aims to develop environment adaptation techniques (including feature transformation and model adaptation) for speaker verification over wireless networks and the Internet. Another purpose of this project is to combine the environment adaptation techniques with a client-side front-end processing approach recently standardized by the European Telecommunications Standards Institute (ETSI) for distributed speaker verification. Investigator: M.W. Mak Funding Source: RGC Direct Allocation
Non-linear Stochastic Matching for Robust Speaker Verification While today’s speaker verification systems perform reasonably well under controlled conditions, their performance is often compromised under real-world environments. In particular, variations in handset characteristics are known to be the major cause of performance degradation. Our proposal is to minimize the effects resulting from transducer variation. The proposed approaches overcome the limitations of conventional channel compensation methods by looking at the non-linear characteristics of telephone handsets. A novel non-linear probabilistic transformation method will be derived and evaluated. The project will provide new solutions to some of the practical problems encountered by speaker verification researchers today. These solutions will potentially help security product manufacturers and telephone-based transaction service providers open up new markets.
Investigators: M.W. Mak and S.Y. Kung Funding Source: RGC Competitive Bids (PolyU 5131/02E)
Handset Mismatch Compensation for Robust Speaker Verification This project aims at (1) developing handset mismatch
compensation algorithms for speaker verification systems and (2) constructing
a Cantonese telephone speech corpus for speaker verification research.
Most channel compensation techniques assume that the telephone channel
can be approximated by a linear filter. However, telephone handsets typically
exhibit non-linear characteristics, suggesting that linear filtering addresses
only part of the problem. For this project, we propose a non-linear feature
mapper and a probabilistic channel equalizer that integrate the non-linear
handset characteristics into the channel compensation process.
Funding Source: RGC Competitive Bids (PolyU 5129/01E) Investigators: M.W. Mak and S.Y. Kung
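The linear-filter assumption mentioned above is what makes simple compensation schemes work: a fixed linear channel adds approximately a constant offset to every cepstral vector, so subtracting the per-utterance mean removes it. The sketch below shows this standard cepstral mean subtraction on synthetic data. It illustrates the conventional linear approach that the project sets out to go beyond, not the proposed non-linear feature mapper, and the array shapes and names are assumptions.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-utterance mean from each cepstral coefficient."""
    cepstra = np.asarray(cepstra, dtype=float)   # shape: (frames, coefficients)
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Synthetic stand-ins: 100 frames of 12 cepstral coefficients.
rng = np.random.default_rng(0)
clean = rng.normal(size=(100, 12))
channel_offset = rng.normal(size=12)     # additive effect of a linear channel
observed = clean + channel_offset

compensated = cepstral_mean_subtraction(observed)
reference = cepstral_mean_subtraction(clean)
```

After subtraction, `compensated` matches `reference`, so the constant channel offset is gone. A distortion that is not a constant additive offset in the cepstral domain would not cancel this way, which is the gap that non-linear handset compensation targets.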
Stochastic Model Adaptation for Robust Speech/Speaker Recognition The performance of current speech/speaker recognition
systems is often affected by the acoustic environment in which the systems
are operated. For example, in telephone-based speaker verification, speakers
tend to use different telephone handsets in different environments (e.g.
office and home). Variation in handset characteristics can introduce severe
speech variability even though the speech is uttered by the same speaker.
Therefore, it is very important for a speaker model to be able to accommodate
new acoustic environments. Furthermore, a practical speaker verification
system also needs to adapt itself in order to accommodate the change in
speaker characteristics over time. This is because speakers often sound
different from time to time, a phenomenon known as intra-speaker variability.
In this project, we propose to address the above issues by developing a
temporally adaptive probabilistic neural network. Training algorithms and
adaptation mechanisms, which will be based on our previous work on neural
network learning algorithms, will be derived. The network performance will
be evaluated using real-world data.
Investigators: M.W. Mak and W.C. Siu Funding Source: ASD Project
Acoustic and Voice Processing In recent years, speech recognition systems, internet telephony, and
video conferencing systems have been employed in a variety of real environments.
However, in many practical situations, ambient noise, reverberation, and
poor quality of microphones can degrade the performance of these systems
drastically. Therefore, it is necessary to develop enhancement algorithms
to improve the performance of these systems in adverse acoustic environments.
This project is to investigate microphone characteristics and the human
auditory system in order to enhance channel-distorted, noisy speech for
robust speech recognition and teleconferencing.
Investigators: M.W. Mak and W.C. Siu Funding Source: ASD Project
Stochastic Matching Techniques for Robust Speaker Recognition Today’s speaker recognition systems have reached a very high level of performance in laboratory environments. However, several technical issues (such as channel robustness) need to be resolved before these systems can be commercialized. This project aims to resolve these issues. In particular, it aims to develop a set of model-based and feature-based transformation techniques for robust speaker recognition. Parameter estimation algorithms based on the maximum likelihood (ML) and maximum a posteriori (MAP) principles will be derived.
Investigators: M.W. Mak Funding Source: Central Research Grant
M.W. Mak's Homepage: http://www.eie.polyu.edu.hk/~mwmak/mypage.htm