TOXIC COMMENTS IDENTIFICATION SYSTEM FOR URDU TEXT
Keywords:
Toxic comments identification, TFIDF, CNN, Random Forest, TUC dataset
Abstract
The rapid growth of online platforms has made toxic comments a prevalent problem that threatens user experience and social stability. Existing toxic comment detection techniques frequently fail to provide viable solutions for languages such as Urdu, where linguistic complexity and cultural context are crucial. This research describes a novel Toxic Comments Detection System developed specifically for the Urdu language, with the goal of addressing these issues and encouraging healthy online interactions within Urdu-speaking communities. The proposed approach preprocesses and analyses Urdu text using natural language processing (NLP) and machine learning techniques, including deep learning. Using TFIDF for feature extraction, the system captures the linguistic characteristics of Urdu communication. The system determines the toxicity of incoming comments using a classifier trained on labeled data from our proposed Toxic Urdu Comments (TUC) dataset, aiming for strong performance across diverse language situations. An experimental evaluation demonstrates its effectiveness in identifying toxic comments in Urdu text. The Random Forest classifier performed well on the benchmark dataset, achieving an accuracy of 94.17%. By automatically identifying toxic content, the solution enables platform moderators and users to actively maintain a courteous, respectful, and welcoming online environment, enabling constructive conversation and improving user experience in Urdu-language online communities. The Toxic Comments Detection System offers a viable approach for combating toxic comments in Urdu text.
By employing modern machine learning techniques and considering Urdu's specific linguistic and cultural traits, the system provides a dependable tool for reducing toxicity and encouraging healthy online interactions. Future research can concentrate on improving the system's performance and scalability, as well as extending its application to other languages and cultural contexts.
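
To make the feature extraction step concrete, the following is a minimal sketch of TFIDF weighting in plain Python. It is an illustration under stated assumptions, not the system's actual implementation: the whitespace tokenizer, the smoothed IDF formula, the L2 normalization, the function names (tokenize, fit_idf, tfidf_vector), and the three benign sample sentences are all hypothetical choices and are not taken from the TUC dataset. In practice the resulting vectors would be passed to a classifier such as a Random Forest.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace splitting. Real Urdu text would likely
    # need normalization and a proper tokenizer before this step.
    return text.split()


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every term."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document.
        doc_freq.update(set(tokenize(doc)))
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def tfidf_vector(doc, idf):
    """Return an L2-normalized sparse TFIDF vector as {term: weight}."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


# Hypothetical, benign sample sentences (not from the TUC dataset).
docs = ["آپ بہت اچھے ہیں", "یہ بہت برا ہے", "آپ کا شکریہ"]
idf = fit_idf(docs)
vec = tfidf_vector(docs[0], idf)
```

Terms that occur in several comments receive a lower IDF than terms that occur in only one, so the vector for each comment emphasizes its more distinctive words.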