About 37% of young people between the ages of 12 and 17 have been bullied online. 30% have had it happen more than once.
Cyberbullying is a priority for social media companies at the moment, from Google to Twitter. Beyond social media, any organization whose website lets users comment, from schools to company websites, has to take cyberbullying seriously. The safety of the people who work on and use the website needs to be protected.
Cyberbullying occurs on every platform and in every country. As a company, it is your duty to ensure that nasty comments are flagged and taken off the platform. To do that, you need a deep learning model that can detect when a comment is toxic and which class(es) of toxicity it belongs to.
That is exactly what my web app does: you submit a comment and it tells you whether the comment is clean or toxic and, if it is toxic, which class(es) of toxicity apply: toxic, severe_toxic, obscene, threat, insult, and identity_hate.
Check it out here: https://comment-toxicity-classifier.onrender.com/
The data was sourced from Kaggle: https://www.kaggle.com/datasets/julian3833/jigsaw-toxic-comment-classification-challenge
I cleaned the data by removing hyperlinks, special characters and numbers.
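The cleaning step can be sketched with a few regular expressions. This is a minimal illustration, not the exact code behind the app: the function name `clean_comment`, the specific patterns, and the lowercasing step are assumptions added for the example.

```python
import re

# Hypothetical patterns; the original post does not show its exact rules.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_PATTERN = re.compile(r"[^a-z\s]")
WHITESPACE_PATTERN = re.compile(r"\s+")


def clean_comment(text: str) -> str:
    """Remove hyperlinks, special characters and numbers from a comment."""
    text = text.lower()                      # assumption: case is not kept
    text = URL_PATTERN.sub(" ", text)        # drop hyperlinks first
    text = NON_LETTER_PATTERN.sub(" ", text) # drop digits and punctuation
    return WHITESPACE_PATTERN.sub(" ", text).strip()


print(clean_comment("Check http://spam.example/x?id=1 NOW!!! 123"))
```

Removing links before special characters matters here: otherwise the URL would be broken into stray word fragments that survive the cleaning.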
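I used the Tf-Idf Text Vectorizer, which turns each comment into a numeric vector with a fixed number of features (6,000, matching the input shape of the network below). I chose it because it gives more weight to words that are distinctive for a comment while penalizing words that occur in most comments.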
To gain a better understanding of how the Tf-Idf Vectorizer works:
https://www.geeksforgeeks.org/understanding-tf-idf-term-frequency-inverse-document-frequency/
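To make the weighting idea concrete, here is a small pure-Python sketch of the textbook formula, tf × log(N / df). It is only an illustration: the `tfidf` helper and the toy comments are made up for this example, and library vectorizers such as scikit-learn's `TfidfVectorizer` apply smoothing and normalization by default, so their numbers will differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (hypothetical): "you" and "are" appear in every comment.
docs = ["you are kind", "you are awful", "you are late"]
for doc, scores in zip(docs, tfidf(docs)):
    print(doc, scores)
```

Words that appear in every comment get a weight of zero, while a word unique to one comment gets the largest weight in that comment.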
The model is a 3-layer neural network:
- 2 hidden layers with the ReLU activation function.
- Output layer with a Sigmoid activation function.
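from tensorflow import keras
from tensorflow.keras import layers

# 6000 Tf-Idf features in, six independent toxicity probabilities out.
model = keras.Sequential([
    layers.Dense(100, input_shape=(6000,), activation="relu"),
    layers.Dense(50, activation="relu"),
    layers.Dense(6, activation="sigmoid"),
])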
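As a quick sanity check on the size of this network, the number of trainable parameters can be worked out by hand: each dense layer has one weight per input-unit pair plus one bias per unit. The helper name `dense_params` is just for this sketch.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Input width followed by the width of each Dense layer in the model above.
layer_sizes = [6000, 100, 50, 6]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)  # [600100, 5050, 306] 605456
```

Almost all of the parameters sit in the first layer, which is a direct consequence of the 6,000-wide input.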
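Because a single comment can belong to several classes at once, this is a multi-label classification problem. To handle that, I compiled the model with a binary_crossentropy loss, which treats each of the six sigmoid outputs as its own yes/no decision.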
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["binary_accuracy"],
)
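To see what this loss measures, here is a plain-Python sketch of binary cross-entropy averaged over the six labels of one comment. It is a simplified illustration rather than Keras's implementation; the `eps` clipping value and the example predictions are assumptions chosen for the demo.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of the per-label binary cross-entropy for one comment."""
    losses = []
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return sum(losses) / len(losses)


# Hypothetical comment labelled toxic, obscene and insult.
y_true = [1, 0, 1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.1, 0.7, 0.1]
unsure = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]

print(binary_crossentropy(y_true, confident))
print(binary_crossentropy(y_true, unsure))
```

Each label contributes to the loss independently, so the model is never forced to pick exactly one class, which is what a softmax with categorical cross-entropy would do.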
The model performed well, with:
- Training Accuracy: 99.9%
- Testing Accuracy: 93.2%
The model was deployed via a Flask app and hosted using Render.
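Inside the app, the six sigmoid outputs still have to be turned into the answer shown to the user. A minimal sketch of that step follows; the `decode_prediction` name, the 0.5 threshold, and the label order (taken from the list earlier in this post) are assumptions, not the app's confirmed code.

```python
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def decode_prediction(probabilities, threshold=0.5):
    """Map six sigmoid outputs to toxicity labels, or ["clean"] if none fire."""
    if len(probabilities) != len(LABELS):
        raise ValueError("expected one probability per label")
    flagged = [label for label, p in zip(LABELS, probabilities) if p >= threshold]
    return flagged or ["clean"]


# Example outputs as they might come back from model.predict for one comment.
print(decode_prediction([0.91, 0.10, 0.72, 0.02, 0.64, 0.01]))
print(decode_prediction([0.03, 0.00, 0.01, 0.00, 0.02, 0.00]))
```

The threshold is a tuning knob: lowering it flags more comments at the cost of more false alarms.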