Dr Soon Lay Ki
School of IT
+603 5514 6026
Dr. Soon graduated with a Ph.D. (Engineering) in Web Engineering from Soongsil University, Korea, in 2009. Her main research interest is data mining, particularly text mining. Previously, she was a Senior Lecturer at the Faculty of Computing and Informatics (FCI), Multimedia University. Dr. Soon was also the Deputy Dean (Research and Innovation) of FCI from March 2016 to August 2018. To date, she has graduated six Ph.D. and two Master's students. Dr. Soon was a Research Fellow at Telekom Malaysia R&D on a database optimisation project. Besides teaching at university, she has also conducted courses on relational and NoSQL databases for corporate employees.
- Doctor of Philosophy in Engineering (Web Engineering), Soongsil University, Korea, 2009
- Master in Science (Database), Universiti Putra Malaysia, 2002
- Degree in Computer Science, Universiti Putra Malaysia, 1999
Member of International Professional Bodies
- IEEE Computer Society, Senior Member
My main research interest is data mining, particularly Web data mining (web page classification, social network analysis) and text mining (sentiment analysis). My Ph.D. research addressed focused Web crawling, for which a URL signature was proposed to avoid processing redundant Web pages during crawling. I was also involved in research on the impact of the WWW on the community, particularly from the perspectives of social networking and sentiment analysis on social media. Together with research collaborators from MMU, University of Malaya and abroad (Soongsil University, Korea and Institute of Business Administration, Pakistan), I am currently conducting research on opinion mining. Since 2013, several research grants on opinion mining have been secured from TM R&D and MOHE-FRGS. The current active project concerns text mining for emotion detection, whose outcome will be investigated for application in cyberbully detection.
Title: Emotion Mining Approach for Cyberbully Detection (MOHE FRGS; ongoing)
The proliferation of Internet technologies has provided a platform for many to participate in, contribute to and respond to content in social media applications such as Facebook, Instagram, and YouTube. This community, particularly young adults and teenagers, spends a large amount of time interacting with each other online. While these interactions are beneficial in most cases, online users are also exposed to offensive and vulgar language, known as cyberbullying. Cyberbullying occurs when users post insulting messages, often with profanity, that are directed at specific users. Identifying cyberbullying at the earliest stage is crucial to prevent negative impact on the victims. Predominantly, supervised text mining that works on a labelled corpus has been used for cyberbully detection. Labelled corpora are normally represented using words or word features, where cyberbully-bound words are given extra weight. Nevertheless, these methods suffer from sparse data (infrequent appearance of profanity), inaccurate word contexts (e.g. informal language) and dirty data (e.g. misspelled words). Moreover, none of the existing techniques takes into account the influence of culture on cyberbullying. In light of this, this project attempts to adapt emotion mining techniques for detecting cyberbullying. Emotions are social expressions of feelings that are influenced by culture. Emotion mining aims to identify the emotion expressed in texts by using affective word resources such as WordNetAffect, Affective Norms for English Words and EmotionNet. Cyberbullying texts carry words that express and invite negative emotions such as anger, sadness and disgust. Understanding that cyberbullying can be culturally bound, this project aims to verify the hypothesis that emotion mining techniques can be adapted to detect cyberbullying effectively.
Upon completion of this project, a cyberbully detection model for the Malaysian context will be proposed by adapting techniques used for emotion mining.
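The lexicon-based idea behind emotion mining can be illustrated with a minimal sketch. It is not the project's model: the tiny lexicon, the function names and the scoring rule are all hypothetical, and real resources such as WordNetAffect are far larger and are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only (word -> emotion label).
TOY_EMOTION_LEXICON = {
    "hate": "anger",
    "stupid": "anger",
    "ugly": "disgust",
    "gross": "disgust",
    "cry": "sadness",
    "alone": "sadness",
    "love": "joy",
    "great": "joy",
}

# Emotions the text above associates with cyberbullying posts.
NEGATIVE_EMOTIONS = {"anger", "sadness", "disgust"}


def emotion_counts(text):
    """Count the emotion labels of lexicon words that occur in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(
        TOY_EMOTION_LEXICON[token] for token in tokens if token in TOY_EMOTION_LEXICON
    )


def negative_emotion_ratio(text):
    """Return the share of matched emotion words that are negative (0.0 if none match)."""
    counts = emotion_counts(text)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    negative = sum(n for emotion, n in counts.items() if emotion in NEGATIVE_EMOTIONS)
    return negative / total
```

A post with a high ratio could then be passed to a classifier or flagged for review. Any threshold, and any adaptation of the lexicon to local language use, would have to be determined empirically.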
Title: An Opinion Summarization Holistic Model Using Extractive and Abstractive Text Summarization Techniques (MOHE FRGS; completed)
Reading online reviews about products and services before making a purchase has become common practice among consumers. However, the enormous number of online product reviews makes going through every review and reaching a decision a laborious task. A solution to this problem is opinion summarisation, in which one of the most important tasks is aspect identification. Aspects are attributes of the target being discussed in a review. There are two types of aspects, namely explicit and implicit aspects. To date, aspect identification remains challenging, and most proposed solutions focus only on explicit aspects. However, most sentences in reviews do not state their aspect explicitly. In this research, a model that is able to identify both explicit and implicit aspects has been proposed. The proposed model combines topic modelling, Natural Language Processing (NLP) techniques and dictionary-based techniques to identify both explicit and implicit aspects without requiring any training or annotated data. The impact of using part-of-speech (POS)-tagged documents for aspect identification via a topic model has also been investigated: instead of using raw reviews for topic modelling, the proposed approach uses POS-tagged reviews. Among the various configurations, the topic dictionary model - direct combine (TDM-DC) achieves an F1-score of 58.70%, while the topic dictionary model - topic extended with dictionary (TDM-TED) achieves an F1-score of 58.34%. The two models outperform the baseline topic model and dictionary approaches as well as existing approaches. One of the main contributions of this research is that the proposed model is domain-independent and does not require any annotated training data. Besides, the performance of the proposed unsupervised NLP techniques in aspect identification is comparable to both supervised techniques and machine learning approaches.
More importantly, the proposed models are able to identify implicit aspects.
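To make the distinction between explicit and implicit aspects concrete, the following is a minimal sketch of the dictionary-based part only. It does not implement TDM-DC or TDM-TED, and it omits topic modelling and POS tagging. The two small dictionaries and the function names are hypothetical. The F1-score is computed as the usual harmonic mean of precision and recall over sets of aspects.

```python
import re

# Hypothetical dictionaries for illustration; not the resources used in the project.
# Explicit aspects are named directly in the sentence.
EXPLICIT_ASPECT_TERMS = {"battery": "battery", "screen": "screen", "price": "price"}
# Implicit aspects are inferred from cue words that imply an aspect.
IMPLICIT_CUE_WORDS = {
    "expensive": "price",
    "cheap": "price",
    "heavy": "weight",
    "blurry": "screen",
}


def identify_aspects(sentence):
    """Return (explicit, implicit) aspect sets found in a review sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    explicit = {EXPLICIT_ASPECT_TERMS[t] for t in tokens if t in EXPLICIT_ASPECT_TERMS}
    # An aspect that is already stated explicitly is not reported as implicit.
    implicit = {IMPLICIT_CUE_WORDS[t] for t in tokens if t in IMPLICIT_CUE_WORDS} - explicit
    return explicit, implicit


def f1_score(predicted, gold):
    """F1-score of a predicted aspect set against a gold aspect set."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, in "The battery lasts long but it is too expensive", "battery" is an explicit aspect, while "price" is implicit because the sentence never names it.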
FIT3161 - Computer Science Project
- Emotion Mining Approach for Cyberbully, Soon Lay Ki, 5 August 2018 – 14 February 2020, MRSA-FRGS, RM60,800
- AuD2T: Automated Data to Text Generator, Soon Lay Ki, 15 August 2017 – Ongoing, MoHE FRGS, RM46,000
- Modelling Social Network based Civic Engagement Solutions for Dengue prevention in Malaysia, Soon Lay Ki, 1 August 2016 – 31 January 2019, MoHE FRGS, RM46,226.42
- Pruning and Federation Techniques to Expedite Distributed XML Query Processing, Soon Lay Ki, 1 January 2015 – 30 June 2018, MoHE FRGS, RM35,000
- An Opinion Summarization Holistic Model Using Extractive and Abstractive Text Summarization Techniques, Soon Lay Ki, 1 April 2015 – 31 January 2017, MoHE FRGS, RM74,000
Effective Distributed Query Processing
2015 - Present
Sini Govinda Pillai
Web Entity Representation using Structured Domain Knowledge and Social Context
2018 - Present
Khong Wai Howe
Aspect Identification using Combined Topic Model and Dictionary-based Approach
Suhaila Bt. Saee
Morphological System for Under Resourced-Languages using Hybrid Approach
Saravadee Sae Tan
Information Extraction using Semantic Relation Learning and Greedy Mapping
Chua Chong Chai
Structural Semantic Correspondence for Example-based Machine Translation
Dual Indexing and Mutual Summation Based Keyword Search Method for XML Databases
Goh Hui Ngo
Named Entity Recognition Based on Verbs Associated with Human Activities (VAHA)
Low Cost Multilingual Lexicon Construction for Under-Resourced Language
- Best Paper Award – International Conference on Intelligent and Interactive Computing, 2018
Selected Journal Publications:
- Chua, C. C., Lim, T. Y., Soon, L. K., Tang, E. K., & Ranaivo-Malançon, B. (2017). Meaning preservation in Example-based Machine Translation with structural semantics. Expert Systems with Applications, 78, 242-258.
- Tan, S. S., Lim, T. Y., Soon, L. K., & Tang, E. K. (2016). Learning to extract domain-specific relations from complex sentences. Expert Systems with Applications, 60, 107-117.
- Khong, W. H., Soon, L. K., & Goh, H. N. (2015). A Comparative Study of Statistical and Natural Language Processing Techniques for Sentiment Analysis. Jurnal Teknologi, 77(18).
- Goh, H. N., Soon, L. K., & Haw, S. C. (2015). Automatic discovery of person-related named-entity in news articles based on verb analysis. Multimedia Tools and Applications, 74(8), 2587-2610.
- Lim, L. T., Soon, L. K., Lim, T. Y., Tang, E. K., & Ranaivo-Malançon, B. (2014). Lexicon+TX: rapid construction of a multilingual lexicon with under-resourced languages. Language Resources and Evaluation, 48(3), 479-492.
- Selvaganesan, S., Haw, S. C., & Soon, L. K. (2014). Effective XML Keyword Search Using Dual Indexing Technique. Information Technology Journal, 13(4), 643-651.
- Selvaganesan, S., Haw, S. C., & Soon, L. K. (2014). XDMA: A Dual Indexing and Mutual Summation Based Keyword Search Algorithm for XML Databases. International Journal of Software Engineering and Knowledge Engineering, 24(04), 591-615.
- Goh, H. N., Soon, L. K., & Haw, S. C. (2013). Automatic dominant character identification in fables based on verb analysis–Empirical study on the impact of anaphora resolution. Knowledge-Based Systems, 54, 147-162.
Selected Conference Publications (2014 onward):
- Govinda Pillai, S., Soon, L. K., & Haw, S. C. (2018). Comparing DBPedia, Wikidata and YAGO for Web Information Retrieval. To appear in Springer Lecture Notes in Networks and Systems Series.
- Bakar, A. A., Soon, L. K., & Goh, H. N. (2017, November). An Exploratory Study on Latent-Dirichlet Allocation Models for Aspect Identification on Short Sentences. In International Conference on Computational Science and Technology (pp. 314-323). Springer, Singapore.
- Rakib, T. B. A., & Soon, L. K. (2018, March). Using the Reddit Corpus for Cyberbully Detection. In Asian Conference on Intelligent Information and Database Systems (pp. 180-189). Springer, Cham.
- Tee, J. Y. J., Soon, L. K., & Ting, C. Y. (2015, August). Structural Analyses of Malaysian Web and Host Graphs. In Future Internet of Things and Cloud (FiCloud), 2015 3rd International Conference on (pp. 443-450). IEEE.
- Subramaniam, S., Haw, S. C., & Soon, L. K. (2014, November). ReLab: A subtree based labeling scheme for efficient XML query processing. In Telecommunication Technologies (ISTT), 2014 IEEE 2nd International Symposium on (pp. 121-125). IEEE.
- Tan, S. S., Lim, T. Y., Soon, L. K., & Tang, E. K. (2014, November). Learning to Match Heterogeneous Structures using Partially Labeled Data. In Proceedings of the 5th International Workshop on Web-scale Knowledge Representation Retrieval & Reasoning (pp. 45-48). ACM.
- Tan, S. S., Soon, L. K., Lim, T. Y., Tang, E. K., & Loo, C. K. (2014, November). Learning the Mapping Rules for Sentiment Analysis. In Proceedings of the 5th International Workshop on Web-scale Knowledge Representation Retrieval & Reasoning (pp. 19-22). ACM.
- Saee, S., Soon, L. K., Lim, T. Y., Ranaivo-Malançon, B., Juk, J., & Tang, E. K. (2014, October). Automatic acquisition of morphological resources for Melanau language. In Asian Language Processing (IALP), 2014 International Conference on (pp. 203-206). IEEE.