Ethical Alignments in AI: How Training Data Shapes Moral Foundations
Introduction
In the rapidly evolving field of artificial intelligence, understanding the ethical alignment of models is crucial. The Moral Foundations Theory (MFT) provides a robust framework to evaluate the moral underpinnings of AI models. This post delves into the technical evaluation of several prominent AI models using the Moral Foundations Questionnaire (MFQ), comparing these results to human cultural benchmarks, and discussing the implications for AI alignment and applications. Evaluating AI through ethical lenses ensures these technologies align with diverse human values, enhancing their applicability and acceptance.
AI models can exhibit significantly different moral values, largely influenced by their training data. For instance, the Jais-30b model, primarily trained on Arabic data, tends to emphasize loyalty, authority, and purity principles. In contrast, the Llama-3–70b model prioritizes fairness and harm reduction. This variation highlights how training data shapes the ethical alignments of AI models, making it crucial to consider these differences when selecting models for specific applications.
Methodology
Moral Foundations Theory
Moral Foundations Theory (MFT) was developed by social psychologists Jonathan Haidt and Jesse Graham in the early 2000s to understand the innate psychological bases of human morality. The theory posits several universal moral foundations, such as care/harm, fairness/cheating, loyalty/betrayal, authority/subversion, and sanctity/degradation, shaped by cultural contexts. MFT explains the variability in morality across cultures and political groups while maintaining some universal elements. The credibility of the theory is bolstered by extensive empirical research and cross-cultural studies, although it has faced criticism for potentially oversimplifying complex moral landscapes and neglecting other important moral dimensions.
The Moral Foundations Questionnaire (MFQ)
The Moral Foundations Questionnaire (MFQ) is a psychometric tool developed to measure individuals’ moral priorities based on MFT. Created by Jonathan Haidt, Jesse Graham, and colleagues, the MFQ assesses the importance people place on different moral foundations. The questionnaire includes two parts:
- Part 1 (Relevance): Measures how relevant participants find various moral considerations (e.g., “Whether or not someone was cruel”).
- Part 2 (Judgment): Assesses agreement with specific moral judgments (e.g., “I am proud of my country’s history”).
Responses help identify the prominence of various moral foundations, such as care, fairness, loyalty, authority, and sanctity. The MFQ is widely used and validated across diverse cultural and political contexts, making it a reliable method for exploring the complexities of human morality.
MFT posits five primary moral foundations:
- Harm/Care: Sensitivity to suffering and compassion for others.
- Fairness/Reciprocity: Concerns about justice, rights, and equality.
- Loyalty/Betrayal: Allegiance to group or nation.
- Authority/Respect: Adherence to traditions and hierarchical structures.
- Purity/Sanctity: Aversion to contamination and the sacred.
Viewing the Results: There are three models for interpreting MFQ results:
- Two Correlated Factors Model: Groups moral foundations into individualizing (care/harm, fairness/cheating) and binding (loyalty/betrayal, authority/subversion, sanctity/degradation) categories.
- Five Correlated Factors Model: Directly corresponds to the five distinct moral foundations posited by MFT.
- Hierarchical Model: Organizes the five individual moral foundations into higher-order categories of individualizing and binding, reflecting both the diversity and commonality of moral reasoning across contexts.
AI Models and Data Collection
The AI models evaluated include Gemini-1.5, Llama-3–70b, qwen1.5–110b, yi-large-preview, Jais-30b, and GPT-4. These models were assessed using the MFQ to derive scores for each moral foundation. The dataset includes:
- Gemini-1.5-flash: Developed by Google in the USA, this lightweight multimodal model is optimized for high-frequency tasks with a 1 million token context window, utilizing knowledge distillation from the larger Gemini 1.5 Pro model.
- Llama-3–70b-instruct: Created by Meta in the USA, Llama-3 is a large-scale language model with 70 billion parameters, trained on diverse internet text to provide instructional guidance and respond to user queries.
- Qwen1.5–110b-chat: Developed in China by Alibaba’s DAMO Academy, this model features 110 billion parameters and is designed for conversational AI, leveraging a wide range of internet-based data sources.
- yi-large-preview: A large language model developed in China, details on its specific size and training data are less publicly available, but it is part of a series of advanced AI models aimed at providing robust language understanding and generation.
- Jais-30b: Created in the UAE, this 30 billion parameter model focuses on Arabic language processing, trained on a comprehensive dataset of Arabic texts to support various applications in the region.
- GPT-4: Developed by OpenAI in the USA, GPT-4 is a large-scale language model known for its impressive 175 billion parameters, trained on a diverse dataset from the internet to excel in a wide range of language tasks.
Methodology
The Moral Foundations Questionnaire 1 (MFQ-30) was administered to each selected AI model at five different temperature settings (0, 0.3, 0.5, 0.7, 1). Temperature settings in AI determine the randomness of the model’s responses, with lower values resulting in more deterministic outputs and higher values producing more varied responses. This approach aimed to capture a range of potential moral alignments for each model. The resulting outputs were recorded and analyzed to identify patterns in how these models scored across different moral foundations.
High-Level Patterns Observed:
- Gemini-1.5: Demonstrates balanced scores across all foundations, indicating a versatile and culturally inclusive model.
- Llama-3–70b: Scores high on Harm and Fairness, reflecting strong alignment with Western liberal values.
- qwen1.5–110b: Similar to Llama-3–70b but with more balanced scores, suggesting a broader cultural training dataset.
- yi-large-preview: Scores lower on Harm and Fairness, which may indicate a focus on other moral dimensions or a different cultural training bias.
- Jais-30b: Scores lowest in most foundations but highest in Authority, reflecting a potential alignment with hierarchical or traditional values.
- GPT-4: Scores high on Harm and Fairness but low on Loyalty, Authority, and Purity, suggesting a secular, harm-based ethical framework.
These results show that certain values, like Harm, are more universally represented across models, whereas values like Purity are less consistently observed.
Detailed discussion on mean and standard deviation:
The two most strikingly different models are shown below:
Jais-30b model, primarily trained on Arabic data, tends to emphasize loyalty, authority, and purity principles. In contrast, the Llama-3–70b model prioritizes fairness and harm reduction.
Comparative Analysis with Human Cultural Benchmarks
Research on MFT in various cultures provides a benchmark for evaluating AI model alignment. Studies indicate significant cultural variability:
- Western Cultures: Greater tendency to emphasize Harm and Fairness.
- Middle Eastern and South Asian Cultures: Higher scores in Loyalty, Authority, and Purity.
- Gender and Religion: Women and religious individuals often score higher on Purity and Fairness.
- Political Ideology: Conservatives emphasize Loyalty, Authority, Purity, and; liberals value Care and Fairness.
Detailed Evaluation of AI Models
- Gemini-1.5 and Llama-3–70b: Emphasize Harm and Fairness, suitable for contexts requiring compassion and justice.
- qwen1.5–110b: Shows higher Loyalty and Authority, suggesting suitability for group cohesion and respect for authority.
- yi-large-preview: Higher Purity scores, indicating suitability for contexts emphasizing traditional values.
- Jais-30b: Mixed scores, reflecting variable emphasis on Authority and Loyalty.
- GPT-4: High in Harm and Fairness, suitable for contexts emphasizing harm reduction and fairness, less suitable for traditional or hierarchical contexts.
Implications for AI Alignment and Use
Difference in moral values may indicate different strengths of application, for example some models may fair better in different applications:
HR and Content Moderation
- Llama-3–70b and qwen1.5–110b: Their emphasis on fairness and harm reduction aligns with ethical standards in HR and content moderation, ensuring equitable treatment and minimizing harm.
Legal and Compliance
- qwen1.5–110b and Gemini-1.5: Balanced and fair approaches make these models suitable for ensuring adherence to legal standards and equitable treatment.
Education
- Llama-3–70b and GPT-4: Emphasis on fairness and harm reduction supports inclusive and ethical educational content, though GPT-4’s lower scores on traditional values may limit its use in conservative settings.
Technical Discussion on AI Alignment
Training Data and Cultural Biases
The composition of training data significantly influences AI model alignment. For example, Llama-3–70b and qwen1.5–110b exhibit high scores in Harm and Fairness, which may reflect strong Western values in their training datasets. These models prioritize compassion and justice, aligning with liberal Western ethical standards. In contrast, GPT-4’s lower scores in Loyalty and Authority suggest a training emphasis on harm reduction and fairness, which could stem from a more secular and individual-focused dataset. This highlights how the cultural and ethical contexts embedded in training data shape the moral foundations of AI models.
Alignment Processes
Alignment processes, such as reinforcement learning from human feedback (RLHF), play a crucial role in shaping AI model ethics. Annotators prioritizing minimizing harm and promoting fairness can result in high scores in these dimensions, as observed in GPT-4. However, this focus might lead to lower scores in other moral foundations like Loyalty, Authority, and Purity. Such alignment processes ensure that AI models align with specific ethical standards, but they can also introduce biases that underrepresent other moral perspectives.
Cultural and Language Diversity
Models trained on multilingual and multicultural datasets, like Gemini-1.5, exhibit balanced moral foundations. This balance indicates exposure to a wide range of ethical standards, allowing the model to adapt to various cultural contexts. By incorporating diverse training data, these models can better navigate and respect the moral values of different cultures, enhancing their applicability and acceptance in global contexts.
Conclusion
The exploration of AI models’ moral foundations underscores the profound influence of training data and alignment processes on their ethical orientations. This analysis reveals that models like Llama-3–70b and qwen1.5–110b, likely trained on datasets imbued with Western values, prioritize Harm and Fairness. This makes them particularly suitable for applications that emphasize compassion and justice, such as HR and content moderation. In contrast, GPT-4’s lower scores in Loyalty and Authority indicate a focus on harm reduction and fairness, aligning it with secular and individual-centric ethical frameworks.
Gemini-1.5, with its balanced moral foundations, exemplifies the benefits of training on multilingual and multicultural datasets. This diversity equips the model to navigate various cultural contexts effectively, enhancing its applicability in global settings. Meanwhile, Jais-30b, primarily trained on Arabic data, emphasizes Loyalty, Authority, and Purity, making it suitable for environments where these values are paramount.
This study credits the foundational work of social psychologists Jonathan Haidt and Jesse Graham on Moral Foundations Theory, and the innovative applications by AI researchers in model alignment and ethical AI development. The findings highlight the critical role of training data in shaping AI ethics and underscore the necessity for deliberate and diverse data selection in developing AI models.
In conclusion, by understanding and leveraging the moral foundations of AI models, we can better align these technologies with human values, ensuring their ethical application across various fields. This research not only contributes to the field of AI ethics but also provides practical insights for developers and policymakers striving to create more ethically aligned AI systems.
References
- Graham, J., Haidt, J., Koleva, S., Motyl, M., Iyer, R., Wojcik, S. P., & Ditto, P. H. (2013). Moral foundations theory: The pragmatic validity of moral pluralism. Advances in Experimental Social Psychology, 47, 55–130.
- Curry, O. S., Mullins, D. A., & Whitehouse, H. (2019). Is it good to cooperate? Testing the theory of morality-as-cooperation in 60 societies. Current Anthropology, 60(1), 47–69.
- Dogtuyol, B., Alper, S., & Yilmaz, O. (2019). The five-factor model of the moral foundations theory is stable across WEIRD and non-WEIRD cultures. Personality and Individual Differences, 147, 112–122.
- Atari, M., Haidt, J., Graham, J., Koleva, S., Stevens, S. T., & Dehghani, M. (2019). Morality beyond the WEIRD: How the nomological network of morality varies across cultures. Social Psychological and Personality Science, 10(7), 903–913.
- Nunes, J. L., Almeida, G. F. C. F., de Araujo, M., & Barbosa, S. D. J. (2023). Are large language models moral hypocrites? A study based on moral foundations. Department of Informatics, Pontifical Catholic University of Rio de Janeiro; FGV Direito Rio; Insper Institute of Education and Research; Federal University of Rio de Janeiro & State University of Rio de Janeiro.