A Natural Language Processing Approach to Grouping Students by Shared Interests

Aravind Sasidharan Pillai

Abstract

This research introduces an automated, Natural Language Processing (NLP)-based method for assembling students into groups based on shared interests extracted from personal narratives. For the experiment, each student in the class wrote several stories of 300 to 400 words each, from which common phrases were extracted. These phrases were then used to cluster students according to the shared interests revealed in their stories. The study applied the Rapid Automatic Keyword Extraction (RAKE) algorithm, an unsupervised, language-agnostic technique for extracting keywords. Because RAKE does not depend on specific linguistic structures, it is broadly applicable across document types and fields. The algorithm operates in several distinct phases. The first phase uses stopwords and phrase delimiters as boundaries to split the text, which isolates candidate key phrases within the narrative. In contrast to traditional approaches based on TF-IDF (Term Frequency-Inverse Document Frequency) metrics, RAKE then builds a keyword score matrix from word frequency, word degree, and the degree-to-frequency ratio. In the final phase, RAKE selects the highest-scoring phrases among the candidates. These phrases, which represent the document's most significant themes or topics, serve as the basis for student grouping and capture the core interests expressed in the narratives.
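
The phases described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: the stopword list is a tiny placeholder, each word is scored by its degree-to-frequency ratio, and the helper names (`candidate_phrases`, `word_scores`, `rake`, `group_by_shared_phrases`) are hypothetical. The grouping rule, which places students together when they share an identical top-ranked phrase, is also an assumption, since the abstract does not specify how phrases were matched.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real use needs a fuller list).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "i", "my",
             "is", "was", "with", "for", "at", "we", "love", "me", "our"}


def candidate_phrases(text):
    """Phase 1: split text at phrase delimiters and stopwords."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\"\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9'-]+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def word_scores(phrases):
    """Phase 2: score each word by degree / frequency."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences within the phrase, including the word itself.
            degree[word] += len(phrase)
    return {word: degree[word] / freq[word] for word in freq}


def rake(text, top_n=3):
    """Phase 3: rank candidate phrases by the sum of their word scores."""
    phrases = candidate_phrases(text)
    scores = word_scores(phrases)
    ranked = {" ".join(p): sum(scores[w] for w in p) for p in phrases}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(ranked.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


def group_by_shared_phrases(stories, top_n=3):
    """Assumed grouping rule: students sharing a top phrase form a group."""
    groups = defaultdict(set)
    for student, text in stories.items():
        for phrase, _score in rake(text, top_n):
            groups[phrase].add(student)
    return {p: sorted(s) for p, s in groups.items() if len(s) > 1}


if __name__ == "__main__":
    # Hypothetical one-sentence "stories" standing in for 300-400 word narratives.
    stories = {
        "ana": "I love rock climbing.",
        "ben": "Rock climbing is fun.",
        "cy": "I play chess.",
    }
    print(group_by_shared_phrases(stories))
```

Exact phrase matching is the simplest possible grouping rule; with full-length stories, a looser criterion such as overlap between sets of top phrases may be more practical.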

Article Details

How to Cite
Pillai, A. S. (2022). A Natural Language Processing Approach to Grouping Students by Shared Interests. Journal of Empirical Social Science Studies, 6(1), 1–16. Retrieved from https://publications.dlpress.org/index.php/jesss/article/view/91
Section
Articles