Dr Soon Lay Ki

Senior Lecturer
School of IT

soon.layki@monash.edu
+603 5514 6026
Room 2-4-27

Personal statement

Dr. Soon graduated with a Ph.D. (Engineering) in Web Engineering from Soongsil University, Korea, in 2009. Her main research interest is data mining, particularly text mining. Previously, she was a Senior Lecturer at the Faculty of Computing and Informatics (FCI), Multimedia University, where she also served as Deputy Dean (Research and Innovation) from March 2016 to August 2018. To date, she has graduated six Ph.D. and two Master's students. Dr. Soon was also a Research Fellow at Telekom Malaysia R&D, working on a database optimisation project. Besides teaching at university, she has conducted courses on relational and NoSQL databases for corporate employees.

Academic degrees

  • Doctor of Philosophy in Engineering (Web Engineering), Soongsil University, Korea, 2009
  • Master in Science (Database), Universiti Putra Malaysia, 2002
  • Degree in Computer Science, Universiti Putra Malaysia, 1999

Professional affiliations

Member of International Professional Bodies

  • IEEE Computer Society, Senior Member

Research Interests

My main research interest is data mining, particularly Web data mining (web page classification, social network analysis) and text mining (sentiment analysis). My Ph.D. research was on focused Web crawling, where a URL signature was proposed to avoid processing redundant Web pages during crawling. I have also been involved in research on the impact of the WWW on the community, particularly from the perspectives of social networking and sentiment analysis on social media. Together with research collaborators from MMU, University of Malaya and abroad (Soongsil University, Korea and Institute of Business Administration, Pakistan), I am currently conducting research on opinion mining. Since 2013, several research grants on opinion mining have been secured from TM R&D and MOHE-FRGS. The current active project concerns text mining for emotion detection, whose outcome will be investigated for application in cyberbullying detection.

Research Projects

Title: Emotion Mining Approach for Cyberbully Detection (MOHE FRGS; ongoing)

The proliferation of Internet technologies has provided a platform for many to participate in, contribute to and respond to content in social media applications such as Facebook, Instagram and YouTube. Members of this community, particularly young adults and teenagers, spend a large amount of their time interacting with each other online. While these interactions are beneficial in most cases, online users are also exposed to offensive and vulgar language, known as cyberbullying. Cyberbullying occurs when users publish insulting posts, often with profanity, directed at specific users. Identifying cyberbullying at the earliest stage is crucial to prevent negative impact on the victims. Predominantly, supervised text mining that works on a labelled corpus has been used for cyberbullying detection. Labelled corpora are normally represented using words or word features, where cyberbullying-related words are given extra weight. Nevertheless, these methods suffer from sparse data (infrequent appearance of profanity), inaccurate word contexts (e.g. informal language) and dirty data (e.g. misspelled words). Moreover, none of the existing techniques takes into account the influence of culture on cyberbullying. In light of this, this project attempts to adapt emotion mining techniques for detecting cyberbullying. Emotions are social expressions of feelings, which are influenced by culture. Emotion mining aims to identify the emotion expressed in texts by using affective word resources such as WordNetAffect, Affective Norms for English Words and EmotionNet. Cyberbullying texts carry words that express and invite negative emotions such as anger, sadness and disgust. Understanding that cyberbullying can be culturally bound, this project aims to verify the hypothesis that emotion mining techniques can be adapted to detect cyberbullying effectively.
Upon completion of this project, a model for cyberbullying detection in Malaysian contexts will be proposed by adapting techniques used for emotion mining.
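
As a rough illustration of the lexicon-based idea described above, the following minimal Python sketch counts emotion-bearing words in a post and reports the share that are negative. It is not the project's model: the tiny `TOY_LEXICON`, the set `NEGATIVE_EMOTIONS`, and both function names are hypothetical and chosen only for illustration, and a real system would load an established affective resource and handle informal or misspelled text.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> emotion label), for illustration only.
# It stands in for a real affective resource and is not taken from one.
TOY_LEXICON = {
    "hate": "anger",
    "stupid": "anger",
    "ugly": "disgust",
    "gross": "disgust",
    "cry": "sadness",
    "lonely": "sadness",
    "love": "joy",
    "great": "joy",
}

# Emotions treated as negative in this sketch (an assumption).
NEGATIVE_EMOTIONS = {"anger", "sadness", "disgust"}


def emotion_counts(text):
    """Count the lexicon emotions triggered by the words in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON)


def negative_emotion_ratio(text):
    """Return the share of matched affective words that are negative."""
    counts = emotion_counts(text)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no affective words matched
    negative = sum(n for emotion, n in counts.items()
                   if emotion in NEGATIVE_EMOTIONS)
    return negative / total


if __name__ == "__main__":
    print(emotion_counts("You are stupid and ugly, I hate you"))
    print(negative_emotion_ratio("I love this great song"))
```

A score like this could serve as one feature for a downstream classifier; on its own it cannot tell whether a post is directed at a specific user.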

Title: An Opinion Summarization Holistic Model Using Extractive and Abstractive Text Summarization Techniques (MOHE FRGS; completed) 

Reading online reviews about products and services before making a purchase has become common practice among consumers. However, the enormous number of online product reviews makes going through every review and reaching a decision a laborious task. One solution to this problem is opinion summarisation, in which one of the most important tasks is aspect identification. Aspects are attributes of the target being discussed in a review. There are two types of aspects: explicit and implicit. To date, aspect identification remains challenging, and most proposed solutions focus only on explicit aspects, even though most sentences in reviews do not state their aspects explicitly. In this research, a model that is able to identify both explicit and implicit aspects has been proposed. The proposed model combines topic modelling, Natural Language Processing (NLP) techniques and dictionary-based techniques to identify both explicit and implicit aspects without requiring any training or annotated data. The impact of using part-of-speech (POS)-tagged documents for aspect identification via topic models has also been investigated: instead of using raw reviews for topic modelling, the proposed approach uses POS-tagged reviews. Of the various configurations, the topic dictionary model - direct combine (TDM-DC) achieves an F1-score of 58.70%, while the topic dictionary model - topic extended with dictionary (TDM-TED) achieves an F1-score of 58.34%. The two models outperform the baseline topic model and dictionary approaches as well as existing approaches. One of the main contributions of this research is that the proposed model is domain-independent and does not require any annotated training data. In addition, the performance of the proposed unsupervised NLP techniques in aspect identification is comparable to that of both supervised techniques and machine learning approaches.
More importantly, the proposed models are able to identify implicit aspects.
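
For readers unfamiliar with the metric, the F1-score quoted above is conventionally the harmonic mean of precision and recall. The short Python sketch below shows that standard definition applied to sets of aspects. The function names and the example aspect sets are hypothetical, and the project's exact evaluation protocol is not described here.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted aspects against gold aspects."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical aspects for a single product review.
    predicted = {"battery", "screen", "price"}
    gold = {"battery", "price", "camera", "size"}
    p, r = precision_recall(predicted, gold)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1_score(p, r):.3f}")
```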

Education

Unit Taught

FIT3161 - Computer Science Project

Local grants

  • Emotion Mining Approach for Cyberbully Detection, Soon Lay Ki, 5 August 2018 – 14 February 2020, MRSA-FRGS, RM60,800
  • AuD2T: Automated Data to Text Generator, Soon Lay Ki, 15 August 2017 – Ongoing, MoHE FRGS, RM46,000
  • Modelling Social Network based Civic Engagement Solutions for Dengue prevention in Malaysia, Soon Lay Ki, 1 August 2016 – 31 January 2019, MoHE FRGS, RM46,226.42
  • Pruning and Federation Techniques to Expedite Distributed XML Query Processing, Soon Lay Ki, 1 January 2015 – 30 June 2018, MoHE FRGS, RM35,000
  • An Opinion Summarization Holistic Model Using Extractive and Abstractive Text Summarization Techniques, Soon Lay Ki, 1 April 2015 – 31 January 2017, MoHE FRGS, RM74,000

Current supervision

PhD

Samini Subramaniam  
Effective Distributed Query Processing
2015 - Present
Multimedia University

Sini Govinda Pillai  
Web Entity Representation using Structured Domain Knowledge and Social Context
2018 - Present
Multimedia University

Completed supervision

Masters

Khong Wai Howe  
Aspect Identification using Combined Topic Model and Dictionary-based Approach
2018
Multimedia University

PhD

Suhaila Bt. Saee 
Morphological System for Under-Resourced Languages using Hybrid Approach
2018
Multimedia University

Saravadee Sae Tan 
Information Extraction using Semantic Relation Learning and Greedy Mapping
2018
Multimedia University

Chua Chong Chai 
Structural Semantic Correspondence for Example-based Machine Translation
2017 
Multimedia University

Selvaganesan Sethuramalingam
Dual Indexing and Mutual Summation Based Keyword Search Method for XML Databases
2015 
Multimedia University

Goh Hui Ngo 
Named Entity Recognition Based on Verbs Associated with Human Activities (VAHA)
2014
Multimedia University

Lim Lian Tze
Low Cost Multilingual Lexicon Construction for Under-Resourced Language
2013
Multimedia University

Local Award/Recognition/Exhibition/Stewardship

  • Best Paper Award – International Conference on Intelligent and Interactive Computing, 2018

Journal

Saee, Suhaila; Ranaivo-Malançon, Bali; Soon, Lay Ki; Lim, Tek Yong (2017) Crawling social media to create morphological resource of under-resourced language: Melanau language, Advanced Science Letters, (11503-11507), Volume: 23, Issue Number: 19366612, 10.1166/asl.2017.10316

Chua, Chong Chai; Lim, Tek Yong; Soon, Lay Ki; Tang, Enya Kong; Ranaivo-Malançon, Bali (2017) Meaning preservation in Example-based Machine Translation with structural semantics, Expert Systems with Applications, (242-258), Volume: 78, Issue Number: 09574174, 10.1016/j.eswa.2017.02.021

Tan, Saravadee Sae; Lim, Tek Yong; Soon, Lay Ki; Tang, Enya Kong (2016) Learning to extract domain-specific relations from complex sentences, Expert Systems with Applications, (107-117), Volume: 60, Issue Number: 09574174, 10.1016/j.eswa.2016.05.004

Subramaniam, Samini; Haw, Su Cheng; Soon, Lay Ki; Koong, Kok Leong (2016) RL-Frag: A framework for query processing over fragmented XML data stream, International Journal of Soft Computing, (289-294), Volume: 11, Issue Number: 18169503, 10.3923/ijscomp.2016.289.294

Goh, Hui Ngo; Soon, Lay Ki; Haw, Su Cheng (2015) Automatic discovery of person-related named-entity in news articles based on verb analysis, Multimedia Tools and Applications, (2587-2610), Volume: 74, Issue Number: 13807501, 10.1007/s11042-013-1618-2

Khong, Wai Howe; Soon, Lay Ki; Goh, Hui Ngo (2015) A comparative study of statistical and natural language processing techniques for sentiment analysis, Jurnal Teknologi, (155-161), Volume: 77, Issue Number: 01279696, 10.11113/jt.v77.6502

Lim, Lian Tze; Soon, Lay Ki; Lim, Tek Yong; Tang, Enya Kong; Ranaivo-Malançon, Bali (2014) Lexicon+TX: rapid construction of a multilingual lexicon with under-resourced languages, Language Resources and Evaluation, (479-492), Volume: 48, Issue Number: 1574020X, 10.1007/s10579-013-9253-0

Selvaganesan, S.; Haw, Su Cheng; Soon, Lay Ki (2014) XDMA: A dual indexing and mutual summation based keyword search algorithm for XML Data Bases, International Journal of Software Engineering and Knowledge Engineering, (591-615), Volume: 24, Issue Number: 02181940, 10.1142/S0218194014500223

Selvaganesan, S.; Haw, Su Cheng; Soon, Lay Ki (2014) Effective XML keyword search using dual indexing technique, Information Technology Journal, (643-651), Volume: 13, Issue Number: 18125638, 10.3923/itj.2014.643.651

Goh, Hui Ngo; Soon, Lay Ki; Haw, Su Cheng (2013) Automatic dominant character identification in fables based on verb analysis - Empirical study on the impact of anaphora resolution, Knowledge-Based Systems, (147-162), Volume: 54, Issue Number: 09507051, 10.1016/j.knosys.2013.09.009

Tee, Yong Jin; Soon, Lay Ki (2012) Tracing similarity within strongly connected components for intelligent web crawling, International Journal of Smart Home, (89-94), Volume: 6, Issue Number: 19754094, https://www.scopus.com/record/display.uri?eid=2-s2.0-84864364791&origin=resultslist

Goh, Hui Ngo; Kiu, Ching Chieh; Soon, Lay Ki; Ranaivo-Malançon, Bali (2011) Automatic ontology construction in fiction-based domain, International Journal of Software Engineering and Knowledge Engineering, (1147-1167), Volume: 21, Issue Number: 02181940, 10.1142/S0218194011005621

Book

Pillai, Sini Govinda; Soon, Lay Ki; Haw, Su Cheng (2019) Comparing DBpedia, Wikidata, and YAGO for web information retrieval, Lecture Notes in Networks and Systems, (525-535), Volume: 67, Issue Number: 23673370, 10.1007/978-981-13-6031-2_40

Bin Abdur Rakib, Tazeek; Soon, Lay Ki (2018) Using the Reddit Corpus for Cyberbully Detection, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), (180-189), Volume: 10751 LNAI, Issue Number: 03029743, 10.1007/978-3-319-75417-8_17

Bakar, Ameer Abu; Soon, Lay Ki; Goh, Hui Ngo (2018) An Exploratory Study on Latent-Dirichlet Allocation Models for Aspect Identification on Short Sentences, Lecture Notes in Electrical Engineering, (314-323), Volume: 488, Issue Number: 18761100, 10.1007/978-981-10-8276-4_30

Khong, Wai Howe; Soon, Lay Ki; Goh, Hui Ngo; Haw, Su Cheng (2018) Leveraging part-of-speech tagging for sentiment analysis in short texts and regular texts, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), (182-197), Volume: 11341 LNCS, Issue Number: 03029743, 10.1007/978-3-030-04284-4_13

Saee, Suhaila; Soon, Lay Ki; Lim, Tek Yong; Ranaivo-Malançon, Bali; Tang, Enya Kong (2013) Semi-automatic acquisition of two-level morphological rules for Iban language, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), (174-188), Volume: 7816 LNCS, Issue Number: 03029743, 10.1007/978-3-642-37247-6_15

Wong, Wein Pei; Chan, Ke Xin; Soon, Lay Ki (2012) Benchmarking the performance of support vector machines in classifying web pages, Communications in Computer and Information Science, (375-378), Volume: 295 CCIS, Issue Number: 18650929, 10.1007/978-3-642-32826-8_42

Tee, Jason Yong Jin; Soon, Lay Ki; Ranaivo-Malançon, Bali (2012) Finding web document associations using frequent pairs of adjacent words, Communications in Computer and Information Science, (360-363), Volume: 295 CCIS, Issue Number: 18650929, 10.1007/978-3-642-32826-8_39

Goh, Hui Ngo; Soon, Lay Ki; Haw, Su Cheng (2012) VAHA: Verbs associate with human activity - A study on fairy tales, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), (313-322), Volume: 7345 LNAI, Issue Number: 03029743, 10.1007/978-3-642-31087-4_33

Goh, Hui Ngo; Soon, Lay Ki; Haw, Su Cheng (2012) Automatic identification of protagonist in fairy tales using verb, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), (395-406), Volume: 7301 LNAI, Issue Number: 03029743, 10.1007/978-3-642-30220-6_33

Soon, Lay Ki; Lee, Sang Ho (2007) Explorative data mining on stock data -experimental results and findings, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), (562-569), Volume: 4632 LNAI, Issue Number: 03029743, https://www.scopus.com/record/display.uri?eid=2-s2.0-38049064106&origin=resultslist

Conference

Kannan, Rathimala; Govindasamy, Menagaeswary A.P.; Soon, Lay Ki; Ramakrishnan, Kannan (2019) Social media analytics for dengue monitoring in Malaysia, Proceedings - 8th IEEE International Conference on Control System, Computing and Engineering, ICCSCE 2018, (105-108), 10.1109/ICCSCE.2018.8685028

Subramaniam, Samini; Haw, Su Cheng; Soon, Lay Ki (2017) DGReLab+: Improving XML Path Query Processing by Avoiding Buffering Irrelevant Results, Procedia Computer Science, (804-811), Volume: 115, 10.1016/j.procs.2017.09.157

Tee, Jason Yong Jin; Soon, Lay Ki; Ting, Choo Yee (2015) Structural analyses of Malaysian Web and host graphs, Proceedings - 2015 International Conference on Future Internet of Things and Cloud, FiCloud 2015 and 2015 International Conference on Open and Big Data, OBD 2015, (443-450), 10.1109/FiCloud.2015.11

Subramaniam, Samini; Haw, Su Cheng; Soon, Lay Ki (2015) ReLab: A subtree based labeling scheme for efficient XML query processing, ISTT 2014 - 2014 IEEE 2nd International Symposium on Telecommunication Technologies, (121-125), 10.1109/ISTT.2014.7238189

Chong, Wei Yen; Selvaretnam, Bhawani; Soon, Lay Ki (2014) Natural Language Processing for Sentiment Analysis: An Exploratory Analysis on Tweets, Proceedings - 2014 4th International Conference on Artificial Intelligence with Applications in Engineering and Technology, ICAIET 2014, (212-217), 10.1109/ICAIET.2014.43

Tan, Saravadee Sae; Lim, Tek Yong; Soon, Lay Ki; Tang, Enya Kong (2014) Learning to match heterogeneous structures using partially labeled data, International Conference on Information and Knowledge Management, Proceedings, (45-48), Volume: 2014-November, 10.1145/2663792.2663797

Tan, Saravadee Sae; Soon, Lay Ki; Lim, Tek Yong; Tang, Enya Kong; Loo, Chu Kiong (2014) Learning the mapping rules for sentiment analysis, International Conference on Information and Knowledge Management, Proceedings, (19-22), Volume: 2014-November, 10.1145/2663792.2663796

Saee, Suhaila; Soon, Lay Ki; Lim, Tek Yong; Ranaivo-Malançon, Bali; Juk, Jovianna; Tang, Enya Kong (2014) Automatic acquisition of morphological resources for Melanau language, Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014, (203-206), 10.1109/IALP.2014.6973523

Lim, Lian Tze; Soon, Lay Ki; Lim, Tek Yong; Tang, Enya Kong; Ranaivo-Malançon, Bali (2013) Context-dependent multilingual lexical lookup for under-resourced languages, ACL 2013 - 51st Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, (294-299), Volume: 2, https://www.scopus.com/record/display.uri?eid=2-s2.0-84907308799&origin=resultslist

Soon, Lay Ki; Ku, Yee Ern; Lee, Sang Ho (2012) Web crawler with URL signature - A performance study, Conference on Data Mining and Optimization, (127-130), Issue Number: 21556938, 10.1109/DMO.2012.6329810

Tee, Jason Yong Jin; Soon, Lay Ki; Ting, Choo Yee (2012) WebSum: Enhanced SumBasic algorithm for Web site summarization, Conference on Data Mining and Optimization, (137-142), Issue Number: 21556938, 10.1109/DMO.2012.6329812

Alowais, Mohammed Ibrahim; Soon, Lay Ki (2012) Credit card fraud detection: Personalized or aggregated model, Proceedings - 2012 3rd FTRA International Conference on Mobile, Ubiquitous, and Intelligent Computing, MUSIC 2012, (114-119), 10.1109/MUSIC.2012.27

Dolatabadi, Hossein; Soon, Lay Ki; Shirazi, Mahdi Negahi; Mohammadi, Mohammad (2012) Clustering users in micro blogging social networks using probabilistic topic modeling - A framework, Proceedings - 12th International Conference on Computational Science and Its Applications, ICCSA 2012, (113-116), 10.1109/ICCSA.2012.28

Saee, Suhaila; Soon, Lay Ki; Lim, Tek Yong; Ranaivo-Malançon, Bali; Tang, Enya Kong (2012) From raw text to morphological rules for Iban morphological analyser, Proceedings - 2012 International Conference on Asian Language Processing, IALP 2012, (21-24), 10.1109/IALP.2012.71

Soon, Lay Ki; Lee, Sang Ho (2010) Classifying web pages using information extraction patterns - Preliminary results and findings, Proceedings of the 6th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2010, (195-202), 10.1109/SITIS.2010.42

Yean, Lee Chee; Ming, Lim Tong; Soon, Lay Ki (2010) Extended balancing ontological and operational factors in refining multiagent neighborhood using ACO, Proceedings 2010 International Symposium on Information Technology - System Development and Application and Knowledge Society, ITSim'10, (1398-1400), Volume: 3, 10.1109/ITSIM.2010.5561649

Soon, Lay Ki; Hwang, Kyu Baek; Lee, Sang Ho (2010) An empirical study on harmonizing classification precision using IE patterns, 2nd International Conference on Software Engineering and Data Mining, SEDM 2010, (251-256), https://www.scopus.com/record/display.uri?eid=2-s2.0-77956518408&origin=resultslist

Soon, Lay Ki; Lee, Sang Ho (2008) Enhancing URL normalization using metadata of web pages, Proceedings of the 2008 International Conference on Computer and Electrical Engineering, ICCEE 2008, (331-335), 10.1109/ICCEE.2008.112

Soon, Lay Ki; Lee, Sang Ho (2008) Identifying equivalent URLs using URL signatures, SITIS 2008 - Proceedings of the 4th International Conference on Signal Image Technology and Internet Based Systems, (203-210), 10.1109/SITIS.2008.21

Soon, Lay Ki; Lee, Sang Ho (2007) An empirical study of similarity search in stock data, Conferences in Research and Practice in Information Technology Series, Volume: 84, Issue Number: 14451336, https://www.scopus.com/record/display.uri?eid=2-s2.0-84860376679&origin=resultslist