Talk show segmentation system based on Twitter using K-medoids clustering algorithm

Kharisma Jevi Shafira Sepyanto; Yulison Herry Chrisnanto; Fajri Rakhmat Umbara

doi:10.24036/jptk.v3i3.15123

Authors

Kharisma Jevi Shafira Sepyanto, Department of Informatic, Faculty of Science and Informatic, Universitas Jenderal Achmad Yani, INDONESIA
Yulison Herry Chrisnanto, Department of Informatic, Faculty of Science and Informatic, Universitas Jenderal Achmad Yani, INDONESIA
Fajri Rakhmat Umbara, Department of Informatic, Faculty of Science and Informatic, Universitas Jenderal Achmad Yani, INDONESIA

Keywords:

twitter segmentation, k-medoids clustering, cosine similarity, data transformation, silhouette coefficient

Abstract

Innovations in a television talk show can be a threat: they may split the audience into groups and lower the program's rating. Ratings affect which companies buy advertising services, and because advertising sales are a television company's biggest source of income, falling ratings can drive the company into bankruptcy. One way to address this is to analyze public opinion, since the results show how attractive the program is to the community. However, the analysis takes a long time and can be done only by a competent person, so a process is needed that produces results quickly and can be carried out by anyone. This study uses K-Medoids clustering to identify public opinion. Clustering, an unsupervised learning technique, is combined with a labeling process: tweets from the previous episode are labeled and then used to obtain predicted labels for the other cluster members. Before clustering, the tweets go through text preprocessing and are then transformed into numeric form based on word occurrence. The transformed data are clustered using cosine similarity as the proximity measure, and the label of each cluster's medoid is applied to the unlabeled tweets in that cluster. The clusters were evaluated with the Silhouette Coefficient, which gave a value of 0.19. Despite this low value, the method successfully predicted public opinion with an accuracy of 80%.
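The pipeline described in the abstract (word-occurrence vectors, cosine proximity, K-Medoids, labels taken from the medoids, Silhouette Coefficient) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy tweets, the "positive"/"negative" label names, the function names, the choice of initial medoids, and the simple alternating update rule are all assumptions made for illustration. Text preprocessing is reduced to lowercasing and whitespace splitting.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Turn a preprocessed tweet into word-occurrence counts."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: an empty tweet is maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def assign(dist, medoids):
    """Map every point to the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(vectors, initial_medoids, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        assignment = assign(dist, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if assignment[i] == c]
            # New medoid: the member with the smallest total distance to its
            # cluster; an empty cluster keeps its previous medoid.
            best = (min(members, key=lambda m: sum(dist[m][o] for o in members))
                    if members else old)
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids), dist


def silhouette(dist, assignment):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = sorted(set(assignment))
    scores = []
    for i, own in enumerate(assignment):
        def mean_to(c):
            others = [j for j, cj in enumerate(assignment) if cj == c and j != i]
            return sum(dist[i][j] for j in others) / len(others) if others else None
        a = mean_to(own)  # mean distance to the rest of the own cluster
        if a is None or len(clusters) < 2:
            scores.append(0.0)
            continue
        b = min(mean_to(c) for c in clusters if c != own)  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data, not taken from the study.
tweets = [
    "great show",
    "great show love it",
    "really great show",
    "boring episode",
    "boring episode so dull",
    "very boring episode",
]
vectors = [bag_of_words(t) for t in tweets]
medoids, assignment, dist = k_medoids(vectors, initial_medoids=[1, 4])

# Assumption: a label is available for each final medoid (tweets 0 and 3 here).
medoid_labels = {0: "positive", 3: "negative"}
predicted = [medoid_labels[medoids[c]] for c in assignment]
score = silhouette(dist, assignment)
```

In this sketch every tweet inherits the label of its cluster's medoid, so only the medoids need a label. The toy clusters share no words and therefore separate cleanly; real tweet data overlaps far more, which is consistent with the much lower Silhouette Coefficient reported in the abstract.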
