A Partition Based Feature Selection Approach for Mixed Data Clustering
Abstract:
A feature contains a measured value. A value can be of an atomic type such as categorical (text only) or numerical (number only). In real-world settings, data often consist of both categorical and numerical features. Such datasets are called mixed data. The literature offers several clustering methods for analysing purely numerical or purely categorical data, but only a few clustering algorithms for handling mixed data. Clustering mixed data depends on the dissimilarities of its constituent features, and this dependence on data types may influence the clustering solution. Assigning appropriate weights to the features, so that the influence of data type is diminished, may improve the performance of a partitional clustering algorithm. A novel weighted feature selection approach for nominal features is proposed for a partitional clustering algorithm that can handle mixed data. The proposed approach exploits the pre-processing nature of the partitional clustering algorithm in selecting the weights assigned to nominal features. The benefits of weighting are demonstrated on both simulated and real-world mixed datasets. The experiments show improved clustering results when nominal features are weighted.
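As a rough illustration of how per-feature weights can temper the influence of data type, the sketch below computes a dissimilarity between two mixed records in the style of the k-prototypes measure: squared Euclidean distance on the numerical features plus weighted simple matching on the nominal features. This is not the approach proposed in the talk; the function name `mixed_dissimilarity`, the sample records, and the weight values are all hypothetical and chosen only for illustration.

```python
def mixed_dissimilarity(a, b, numeric_idx, nominal_idx, nominal_weights):
    """Dissimilarity between two mixed records (illustrative assumption only).

    Numerical features contribute their squared difference; each nominal
    feature contributes its weight when the two values differ, else 0.
    """
    numeric_part = sum((a[i] - b[i]) ** 2 for i in numeric_idx)
    nominal_part = sum(
        nominal_weights[i] * (a[i] != b[i]) for i in nominal_idx
    )
    return numeric_part + nominal_part


# Two hypothetical records: two numerical and two nominal features.
x = (1.0, 2.0, "red", "small")
y = (1.5, 2.0, "blue", "small")

# Hypothetical weights keyed by nominal feature index.
weights = {2: 0.5, 3: 1.0}
unit_weights = {2: 1.0, 3: 1.0}

d_weighted = mixed_dissimilarity(x, y, [0, 1], [2, 3], weights)
d_unweighted = mixed_dissimilarity(x, y, [0, 1], [2, 3], unit_weights)
print(d_weighted, d_unweighted)
```

With unit weights, a single nominal mismatch outweighs the numerical difference between the two records; lowering the weight of that nominal feature reduces its share of the total dissimilarity.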
Speaker's Profile:
Dr. Ashish has been working since 1999. His work experience lies primarily in two domains: the education sector and the Information Technology (IT) sector. As an educator, he was most recently with Asia Pacific University, where he taught data science subjects and R programming. He completed his research on mixed data clustering at the Department of Information Systems, University of Malaya, under the supervision of Assoc. Prof. Dr. Maizatul A. Ismail. In the IT sector, he has worked with technology companies such as Hewlett Packard (HP), Dell and IBM. He was most recently with Micron Technology, Penang, where he was employed as a senior data scientist and worked on a video analytics project for smart manufacturing of solid-state drives.