User:Biasbot AS

Biasbot AS
This user is a bot.
"B-Bot digging out stigma from the seas of Wikipedia."

Status: Semi-active
Operator: asoundd
Approved?: Semi (not approved for direct editing)
Flagged?: No
Task(s): Neutralizing bias and stigma
Automatic or manual?: Semi-automatic
Programming language(s): C++, PHP, Python
Exclusion compliant?: No

Introduction

Biasbot AS is a bot that attempts to semi-automate the enforcement of Wikipedia's neutral point of view policy. It scans articles for sentences that contain explicit or implicit forms of bias or stigma and offers neutralized alternatives.

Detection Algorithm

Model

Biasbot AS uses deep learning and unsupervised learning techniques to identify sentences containing stigma. The performance of any natural language processing model depends heavily on the size of its training corpus. Because publicly available labeled datasets on stigma and bias are scarce, the bot needs a model that maintains accuracy even with a small corpus.

Biasbot AS is an instance of the Bidirectional Encoder Representations from Transformers (BERT) model. BERT is pre-trained on billions of words from Wikipedia and BooksCorpus, making it well suited to this task; only an additional outer fine-tuning layer is necessary. The model was pre-trained on two tasks: masked language modeling (MLM), in which word embeddings are learned by predicting "masked" words, and next sentence prediction (NSP), in which the model predicts whether one sentence follows another, capturing longer-term dependencies across sentences. Both the cased and uncased versions of BERT-Base are used.
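
The bot's training code is not public; the following is a minimal sketch, assuming the Hugging Face transformers library and the uncased BERT-Base checkpoint, of how a single classification head can be attached to the pre-trained encoder:

    # Minimal sketch (not Biasbot's actual code): load pre-trained
    # BERT-Base and attach a two-way classification head, mirroring
    # the single "outer fine-tuning layer" described above.
    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2  # "stigma" vs. "no stigma"
    )

    inputs = tokenizer("An example sentence to score.",
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits      # raw scores, shape (1, 2)
    probs = torch.softmax(logits, dim=-1)    # probabilities summing to 1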

Mechanism

Because insufficiently moderated layer tuning can produce false positives, Biasbot AS only detects bias in articles and offers suggestions; the recommendations are reviewed by human editors, so the bot does not yet make direct edits. The bot is thus semi-automated.
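
As an illustration of this workflow, a suggestion record might be queued for human review rather than applied directly. The names below are hypothetical, not the bot's actual interface:

    # Hypothetical sketch: the bot records suggestions for human
    # review instead of editing the article itself.
    from dataclasses import dataclass

    @dataclass
    class Suggestion:
        article: str       # title of the tracked article
        original: str      # sentence flagged as stigmatizing
        proposed: str      # neutralized alternative
        confidence: float  # model probability for the "stigma" label

    review_queue: list[Suggestion] = []

    def propose_edit(suggestion: Suggestion) -> None:
        """Queue a suggestion; a human editor decides whether to apply it."""
        review_queue.append(suggestion)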

Dataset

Due to the vast scope of bias (i.e., which variants of stigma to tackle) and a lack of available data, Biasbot uses a hand-labeled dataset that focuses solely on mental health stigma. The bot therefore only tracks articles in Category:Mental health and its subcategories.

The dataset is relatively simple and consists entirely of sentences labeled either "stigma" or "no stigma."
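
A minimal sketch of loading such a dataset follows; the file format and two-column layout are assumptions, not the bot's actual files:

    # Illustrative loader for a two-column CSV of (sentence, label)
    # pairs, where each label is either "stigma" or "no stigma".
    import csv

    def load_dataset(path: str) -> list[tuple[str, int]]:
        """Return (sentence, label) pairs; 1 = stigma, 0 = no stigma."""
        examples = []
        with open(path, newline="", encoding="utf-8") as f:
            for sentence, label in csv.reader(f):
                examples.append((sentence, 1 if label == "stigma" else 0))
        return examples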

Artificial Neural Network

As with any other NLP model, Biasbot AS and its BERT-Base model make use of an artificial neural network. During backpropagation in fine-tuning, the loss is computed with the sparse categorical cross-entropy function, which the optimizer then minimizes.

During fine-tuning, dropout regularization with a probability of 0.1 is applied to prevent overfitting.
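
The loss and dropout settings above can be sketched as a Keras-style head on top of the encoder's pooled output; the encoder itself is omitted, and the optimizer choice is an assumption:

    # Sketch (assumed, not the bot's code) of the fine-tuning head:
    # dropout with rate 0.1 followed by a two-way classifier trained
    # with sparse categorical cross-entropy.
    import tensorflow as tf

    def build_head(hidden_size: int = 768) -> tf.keras.Model:
        # 768 is BERT-Base's hidden size; the encoder is omitted here.
        pooled = tf.keras.Input(shape=(hidden_size,))
        x = tf.keras.layers.Dropout(0.1)(pooled)   # dropout p = 0.1
        logits = tf.keras.layers.Dense(2)(x)       # "stigma" vs. "no stigma"
        model = tf.keras.Model(pooled, logits)
        model.compile(
            optimizer="adam",  # optimizer is an assumption
            # from_logits=True because the Dense layer outputs raw scores
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        )
        return model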

Threshold Calculation

Biasbot AS makes use of several activation functions and determines whether a sentence contains stigma through a numerical threshold. The model uses the GELU function as the activation for the classifier layer, while a final softmax maps the output scores to probabilities that sum to 1. GELU can be approximated as:
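
GELU(x) ≈ 0.5x(1 + tanh(√(2/π)(x + 0.044715x³)))

A short Python sketch of this approximation and of the thresholded decision follows; the cutoff of 0.5 is illustrative, since the bot's actual threshold is not stated:

    # GELU's standard tanh approximation, plus an illustrative
    # threshold check on the classifier's "stigma" probability.
    import math

    def gelu(x: float) -> float:
        """Tanh approximation of the GELU activation."""
        return 0.5 * x * (1.0 + math.tanh(
            math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

    def contains_stigma(p_stigma: float, threshold: float = 0.5) -> bool:
        """Flag a sentence when its stigma probability exceeds the threshold."""
        return p_stigma > threshold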