Bayesian Network — Modelling Using GeNIe

Emad Bin Abid
Published in Analytics Vidhya, Mar 5, 2020

This article walks through a mock example of a real-world use case for Bayesian modelling: predicting the validity of sexual harassment allegations.

In assessments of sexual harassment and predatory allegations, background information is often overlooked. The results may therefore reflect personal biases and favours. In this article, we present a detailed Bayesian network whose events (nodes) and their conditional and marginal probabilities are formulated so that analysing the network accounts for each possible event and its effect on the overall result.

Introduction

A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG) [1]. In our model, we use Bayesian networks to predict whether the accused person is actually a harasser. We use them to perform probabilistic analysis of given evidence based on prior, marginal and posterior probabilities. We aim for a simple yet effective model that predicts correct results and minimises the biases that affect decisions.
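The definition above can be stated compactly. As a standard property of Bayesian networks (not something specific to this article's model), the joint distribution over the variables factorises into one local conditional distribution per node, given that node's parents in the DAG. The symbols $X_i$ and $\mathrm{Pa}(X_i)$ are notation introduced here for illustration:

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

A node with no parents contributes its marginal (prior) probability, which is why the model needs both marginal and conditional probability tables.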

Description

In this section we describe the scenario in detail and identify the causal relationships that lead to a decision. Sexual harassment is a sensitive topic and needs careful model building. On studying the scenario, we identified several factors whose relationships with one another can lead to such a decision. The uncertainties/events relevant to the scenario are listed below. The causal relationships among them are discussed later in this section.

  1. bad profession: This event captures, in terms of probabilities, whether the profession the person is associated with is bad or not.
  2. good appearance: This variable quantifies how good-looking the person is.
  3. takes drugs: This variable captures how addicted the person is to drugs.
  4. good financial status: This variable quantifies the financial status of the accused person. The greater its value, the better the person's financial status.
  5. bad past conduct records: This variable captures how bad the person's previous conduct records are.
  6. gender: This variable stores whether the person is male or female.
  7. not in relationship: This variable captures whether the accused person is in a relationship or not.
  8. honesty: This variable quantifies how honest the person is in daily life.
  9. bad family background: This variable quantifies how bad the family background is in terms of business, moral and social values.
  10. bad past education: This variable stores whether the accused person studied at institutions with low social and moral rank.
  11. bad public places: This variable captures whether the person visits bad public places or not.
  12. rude behaviour: This variable captures the rude nature and behaviour of the person.
  13. bad rumours: This variable quantifies how widely bad rumours about the person have spread in society.
  14. threatening personality: This variable captures the threatening personality of the accused person.
  15. bad social circle: This variable captures whether the person keeps bad company or not.
  16. less time spent with family and at work: This is one of the most important variables because it captures an unusual scenario. If the person spends little time with family and at work, there is a possibility that the person is involved in unusual activities. The variable quantifies this information.
  17. flirty nature: This variable quantifies how flirtatious the person is with the opposite gender.

The events and uncertainties listed above are some of the most probable ones with a direct bearing on the decision of whether the accused person is a sexual harasser.

We now use these uncertainties to build the causal relationships that model the scenario, so that the results are as accurate as possible.

Generally, there is a perception that men are more likely to be harassers. Mapped onto our model, this means the gender variable has a strong influence on the probability that the person is a harasser. Pre-existing bad rumours about a person can likewise be mapped as a cause-and-effect relationship that influences the decision. flirty nature and bad rumours can be mapped directly as a cause-and-effect relationship. In general, our model works on the basis that the worse the variable values are, the more likely the accusation against the person is valid. bad family background and less time spent with family and at work have a direct cause-and-effect relationship in the real-world scenario. Similarly, takes drugs and threatening personality can have a direct cause-and-effect relationship, and so on. In the next section, we define the structure of our model and assign the prior probabilities based on network analysis, surveys and secondary research.

Model

In this section we define the parameters of our model. In the first half of this section we design the basic network structure, and in the latter half we assign the prior probabilities and calculate the marginal probabilities of each node/event.

Structure

To formulate the structure, we closely analyse the causal relationships identified in the previous section. We note that having a bad profession is directly related to bad social circle, threatening personality, rude behaviour, bad past conduct records and harasser. A change in the probability of bad profession changes the probabilities of all these events. One main cluster is formed by the events good appearance, good financial status, flirty nature and not in relationship. All of these directly affect the probability of harasser. Since there is a notion that men are more often involved in harassment, the probability of gender directly affects the probability of harasser. We also note that takes drugs directly affects honesty and harasser. These and other cause-and-effect relationships were studied, and the network was built on that basis. The network looks as follows.
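As a rough illustration of the structure described above, the edges named in the text can be written down as parent-to-child pairs and checked for acyclicity. This is a minimal sketch covering only the relationships mentioned in the paragraph, not the full GeNIe network; the helper names `parents_of` and `topological_order` are hypothetical and not part of GeNIe.

```python
# Edges named in the text (parent -> child). This is an assumed, partial
# subset of the full network built in GeNIe.
EDGES = [
    ("bad profession", "bad social circle"),
    ("bad profession", "threatening personality"),
    ("bad profession", "rude behaviour"),
    ("bad profession", "bad past conduct records"),
    ("bad profession", "harasser"),
    ("good appearance", "harasser"),
    ("good financial status", "harasser"),
    ("flirty nature", "harasser"),
    ("not in relationship", "harasser"),
    ("gender", "harasser"),
    ("takes drugs", "honesty"),
    ("takes drugs", "harasser"),
]


def parents_of(edges):
    """Map every node to the set of its parents."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(parent, set())
        parents.setdefault(child, set()).add(parent)
    return parents


def topological_order(parents):
    """Kahn's algorithm: order nodes so parents precede children.

    Raises ValueError if the graph has a cycle, i.e. is not a DAG.
    """
    remaining = {node: set(ps) for node, ps in parents.items()}
    order = []
    while remaining:
        # Nodes whose parents have all been placed already.
        ready = sorted(n for n, ps in remaining.items() if not ps)
        if not ready:
            raise ValueError("graph contains a cycle, so it is not a DAG")
        for node in ready:
            order.append(node)
            del remaining[node]
        for ps in remaining.values():
            ps.difference_update(ready)
    return order


PARENTS = parents_of(EDGES)
ORDER = topological_order(PARENTS)
```

Because every arrow points from a cause to its effect, a valid ordering exists and `harasser` appears after all of its parents. Each node's conditional probability table then needs one row per combination of its parents' states, which is why a node with many parents is the most expensive to specify.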

Bayesian model generated using GeNIe

Probabilities

We assign the conditional and marginal probabilities using real-world judgement. By studying surveys and carrying out secondary research, we devise the prior probabilities of the network. When a Bayesian network is learnt automatically, these probabilities are instead estimated from pre-existing labelled data supplied to the model.
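To make the idea of learning probabilities from labelled data concrete, here is a minimal sketch that estimates one conditional probability table by counting and then derives a marginal from it. The records are invented purely for illustration and are not the article's data; the function names `estimate_cpt` and `marginal` are hypothetical.

```python
from collections import Counter

# Hypothetical labelled records of the form (takes_drugs, honest).
# These values are invented for illustration only.
RECORDS = [
    (True, False), (True, False), (True, True), (True, False),
    (False, True), (False, True), (False, True), (False, False),
    (False, True), (False, True),
]


def estimate_cpt(records):
    """Estimate P(honest=True | takes_drugs) by relative frequency."""
    totals = Counter(parent for parent, _ in records)
    positives = Counter(parent for parent, child in records if child)
    return {parent: positives[parent] / totals[parent] for parent in totals}


def marginal(prior_parent_true, cpt):
    """P(honest=True), summing over both states of the parent."""
    return (prior_parent_true * cpt[True]
            + (1 - prior_parent_true) * cpt[False])


CPT_HONEST = estimate_cpt(RECORDS)
P_DRUGS = sum(1 for drugs, _ in RECORDS if drugs) / len(RECORDS)
P_HONEST = marginal(P_DRUGS, CPT_HONEST)
```

With these made-up records, the estimate of P(honest | takes drugs) is 1/4 and of P(honest | does not take drugs) is 5/6, giving a marginal P(honest) of 0.6, which matches the direct count of 6 honest people out of 10. Real learning procedures typically add smoothing so that unseen combinations do not get probability zero.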

Network Usage

To illustrate the questions this model can answer, we identify seven main types of sample inference that can be made with it.

  1. Given the evidence that a person X has a bad profession, predict whether X is a harasser or not.
  2. Given the evidence that a person X takes drugs, predict whether X is a harasser or not.
  3. Given the evidence that a person X does not have a bad profession or a bad family background but has bad past conduct records, a bad social circle and bad past education, predict whether X is a harasser or not.
  4. Given the evidence that a person X is honest, visits bad public places, takes drugs, has a good appearance, has a flirty nature and is not in a relationship, predict whether X is a harasser or not.
  5. Given the evidence that a person X is male and has good financial status, predict whether X is a harasser or not.
  6. Given the evidence that a person X is a harasser and does not have a flirty nature, predict whether X takes drugs or not.
  7. Given the evidence that a person X spends less time with family and at work, is not in a relationship, is not honest and visits bad public places, predict whether X is male or not.
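The questions above fall into two kinds: predicting the outcome from causes (questions 1 to 5) and reasoning backwards from the outcome to a cause (questions 6 and 7). Both can be answered by the same procedure, sketched here as inference by enumeration on a three-node fragment (takes drugs, gender, harasser). All numbers are hypothetical and chosen only for illustration; they are not the probabilities used in the article's GeNIe model, and GeNIe itself uses its own inference algorithms.

```python
from itertools import product

# Hypothetical numbers, invented for illustration; not the article's tables.
P_DRUGS = 0.2          # P(takes drugs = True)
P_MALE = 0.5           # P(male = True)
P_HARASSER = {         # P(harasser = True | takes drugs, male)
    (True, True): 0.30,
    (True, False): 0.15,
    (False, True): 0.10,
    (False, False): 0.05,
}


def bernoulli(p_true, value):
    """Probability of a boolean value given P(value = True)."""
    return p_true if value else 1.0 - p_true


def joint(drugs, male, harasser):
    """Joint probability as the product of each node's local distribution."""
    return (bernoulli(P_DRUGS, drugs)
            * bernoulli(P_MALE, male)
            * bernoulli(P_HARASSER[(drugs, male)], harasser))


def query(target, evidence):
    """P(target = True | evidence), enumerating every joint assignment."""
    names = ("drugs", "male", "harasser")
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        # Skip assignments that contradict the observed evidence.
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(**world)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator


# Predictive query, in the style of question 2.
p_harasser_given_drugs = query("harasser", {"drugs": True})
# Diagnostic query, in the style of question 6.
p_drugs_given_harasser = query("drugs", {"harasser": True})
```

Under these assumed numbers, P(harasser | takes drugs) is 0.225 and P(takes drugs | harasser) is 3/7, roughly 0.43. Enumeration grows exponentially with the number of nodes, so it is only practical for small fragments like this one.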

Conclusion

Deciding whether an accusation of sexual predation is valid is a critical task. The results are often affected by personal biases and favours. We provide an alternative approach to predicting whether a person is a harasser or not. For this, we used Bayesian networks to develop a network-based architecture and identified the uncertainties in the scenario. We assigned probabilities at each node using surveys and secondary research. We used GeNIe to model our network and perform the analysis.

References

  1. The network structure (as shown in the image above) is inspired by A Bayesian Decision-Support Tool for Child Sexual Abuse Assessment and Investigation.
