AI’s Role in the Boardroom: Predicting Director Success with Data Science
Descriptive Analytics for Board Director Performance
Introduction
In this study, we employ an approach akin to the strategy illustrated in the movie Moneyball, transposed to the field of corporate governance. Just as Moneyball used player statistics to inform baseball team selection, we leverage a variety of metrics to understand the characteristics of successful board directors.
Our study’s primary objective is not to predict future board director performance, but to understand patterns and relationships that could relate director characteristics and influence configurations to company performance. This could involve identifying links between specific directors and performance, or examining how influence configurations (e.g., a board with consolidated influence versus one with dispersed influence) may correlate with performance.
Through this research, our ultimate aim is to offer valuable insights to organizations that could potentially help optimize their board selection process, thereby enhancing their corporate governance effectiveness. Our unique data on director influence sets our study apart, offering new perspectives in understanding the relationship between directors and company performance.
Methodology Overview
Dataset Description
Our dataset is a meticulously crafted amalgamation of data sourced from Free Float, LLC, supplemented with third-party data from MSCI ESG Research, ESGauge, S&P Global, and Preqin. It comprises 86,772 rows and 41 columns, offering a granular view of director profiles, their backgrounds, and professional influence.
Understanding the Calculation of Target Feature
‘Performance: Category’ is a rating system that ranks company directors by their influence on the company’s success. Here is a step-by-step explanation:
- Company Comparison: Each company’s performance is compared with its peers in the industry.
- Director Influence: The success or failure of the company is then attributed to each director, according to their influence. This results in ‘wins’ or ‘losses’ for each director.
- Standardization: These ‘wins’ and ‘losses’ are adjusted to a standard scale.
- Win Rate Calculation: A ‘win rate’ is then calculated for each director based on these adjusted scores.
- Director Ranking: The ‘win rate’ of each director is compared with all others.
- Performance Category Assignment: Based on their ranking, directors are placed in a performance category.
For example, if a director’s win rate exceeds that of 90% of all directors, the assigned Performance Category is ‘Hall of Famer’.
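The ranking steps above can be sketched in a few lines of pandas. This is only an illustration under stated assumptions: the win and loss totals are made up, the standardization step is skipped, and only the top-decile rule is taken from the text, so every other director receives a placeholder label `OTHER` rather than a real category.

```python
import pandas as pd

# Hypothetical, already-standardized win/loss totals for ten directors.
directors = pd.DataFrame({
    "director": [f"D{i}" for i in range(1, 11)],
    "wins":   [9, 8, 7, 6, 5, 5, 4, 3, 2, 1],
    "losses": [1, 2, 3, 4, 5, 6, 6, 7, 8, 9],
})

# Win rate: share of attributed outcomes that were wins.
directors["win_rate"] = directors["wins"] / (directors["wins"] + directors["losses"])

# Percentile rank of each win rate among all directors, in (0, 1].
directors["percentile"] = directors["win_rate"].rank(pct=True)

# Only the 90% rule comes from the text; "OTHER" is a placeholder label.
directors["category"] = directors["percentile"].gt(0.90).map(
    {True: "HALL OF FAMER", False: "OTHER"}
)
print(directors[["director", "win_rate", "percentile", "category"]])
```

With ten directors and distinct win rates, only the single best win rate clears the 90% threshold.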
Model Development Process
- Data Preprocessing: We began with thorough data cleaning, handling missing values with imputation methods suited to each variable and treating outliers according to their potential impact.
- Exploratory Data Analysis (EDA): This stage involved understanding data structure, observing variable relationships, and identifying potential anomalies, providing crucial insights for feature selection.
- Model Selection & Training: We compared several classification models, selected the best-suited one, and trained it on a 70/30 train-test split. Cross-validation was employed to detect and guard against overfitting.
- Model Evaluation: Models were evaluated on a mix of accuracy, precision, recall, and F1 score for a balanced and comprehensive performance assessment.
- Feature Selection and Model Optimization: We employed a combination of statistical and feature importance methods to identify the most influential features impacting the ‘PERFORMANCE: CATEGORY’. Leveraging hyperparameter tuning and feature scaling, we were able to iteratively refine and enhance the performance of the model.
- Results Interpretation: The final step involved analyzing the model’s predictions and understanding the contribution of different features in the prediction process.
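The cross-validation step in the list above is not shown in the code later in this article, so here is a minimal sketch of what it could look like with scikit-learn. The data is synthetic (`make_classification` stands in for the director dataset), and the fold count and the macro-F1 scoring choice are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the director dataset (assumption).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    n_classes=3, random_state=42,
)

# Stratified folds keep the class proportions similar in every fold,
# which matters when the target is imbalanced.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(
    DecisionTreeClassifier(max_depth=4, random_state=42),
    X_demo, y_demo, cv=cv, scoring="f1_macro",
)
print("Macro-F1 per fold:", scores.round(3))
print("Mean macro-F1:", scores.mean().round(3))
```

A large spread between folds, or a mean far below the training score, would be a hint that the model is overfitting.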
Data Analysis
Exploratory Data Analysis
Summary Statistic: Continuous variables
In our data, we found missing values in continuous variables, and we chose imputation methods based on each variable’s distribution. For normally distributed variables like age, we used mean imputation. For skewed data such as network power and influence, we used median imputation. Variables with over 80% missing values, such as diversity and another performance metric, were excluded to prevent unreliable and biased results. We also selected only one performance feature for the target variable to avoid multicollinearity, improving our model’s performance and interpretability.
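Handling missing values for numerical features
# Mean imputation for the roughly normally distributed age variable
age_mean = df['DIRECTOR: AGE'].mean()
df['DIRECTOR: AGE'] = df['DIRECTOR: AGE'].fillna(age_mean)
# Median imputation for the skewed influence variable
inf5_median = df['INFLUENCE: 5YR MEDIAN'].median()
df['INFLUENCE: 5YR MEDIAN'] = df['INFLUENCE: 5YR MEDIAN'].fillna(inf5_median)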
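The two rules described above (exclude variables with more than 80% missing values, then pick mean or median imputation depending on the distribution) can also be written generically. The sketch below uses a toy frame with illustrative column names, and the skewness cut-off of 1.0 is a hypothetical rule of thumb, not a value taken from the study.

```python
import numpy as np
import pandas as pd

# Toy frame; column names and values are illustrative only.
demo = pd.DataFrame({
    "age": [55, 61, np.nan, 48, 70, 52, 66, 59, np.nan, 63],
    "influence": [0.1, 0.2, 0.1, np.nan, 0.3, 0.2, 9.5, 0.1, 0.4, 0.2],
    "sparse_metric": [np.nan] * 9 + [1.0],  # 90% missing
})

# Drop every column whose share of missing values exceeds 80%.
missing_share = demo.isna().mean()
cleaned = demo.loc[:, missing_share <= 0.80].copy()

# Hypothetical rule of thumb: treat |skewness| > 1 as "skewed".
SKEW_THRESHOLD = 1.0
for col in cleaned.columns:
    use_median = abs(cleaned[col].skew()) > SKEW_THRESHOLD
    fill_value = cleaned[col].median() if use_median else cleaned[col].mean()
    cleaned[col] = cleaned[col].fillna(fill_value)

print(cleaned)
```

Here the single extreme value in `influence` makes the column strongly skewed, so its gap is filled with the median rather than the mean.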
Summary Statistic: Categorical features
We analyzed the distribution of categorical variables, finding an imbalanced target variable with most instances unrated and fewest as hall of famers. We also found gender disparity, with male directors outnumbering females significantly. Despite these imbalances, we didn't apply data balancing techniques initially, aiming to evaluate our models' performance on the original dataset first and consider balancing techniques if needed later.
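Handling missing values for categorical features
# Mode imputation: fill gaps with the most frequent sector
sector_mode = df['COMPANY: SECTOR'].mode()
df['COMPANY: SECTOR'] = df['COMPANY: SECTOR'].fillna(sector_mode[0])
# Mode imputation: fill gaps with the most frequent gender
gender_mode = df['DIRECTOR: GENDER'].mode()
df['DIRECTOR: GENDER'] = df['DIRECTOR: GENDER'].fillna(gender_mode[0])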
Encoding the Categorical features
Proper preprocessing, especially encoding categorical variables into a numerical format, is crucial in machine learning for accurate computations and predictions. Techniques like one-hot, ordinal, and binary encoding are commonly used. Similarly, streamlining target variable categories aids model performance and interpretability. In this study, we combined similar categories to reduce complexity. For example, ‘Hall of famer’ and ‘Allstar’ were grouped into category 2, ‘Starter’ and ‘Rotation’ into category 1, and ‘Benchwarmer’ and ‘Unrated’ into category 0. This approach aligns with best practices in machine learning to ensure efficient model training and reliable results.
Gender Encoding
from sklearn.preprocessing import LabelEncoder
# Map each gender label to an integer code
encoder = LabelEncoder()
df['DIRECTOR: GENDER'] = encoder.fit_transform(df['DIRECTOR: GENDER'])
Company Sector Encoding
# List every column except the raw company domicile and sector columns
columns_to_select = [col for col in df.columns if col not in ['COMPANY: DOMICILE', 'COMPANY: SECTOR']]
# Select all rows and only those columns using DataFrame.loc[]
filtered_df = df.loc[:, columns_to_select]
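Feature names reported later in this article, such as ‘Company: Domicile_US’ and ‘Company: Sector_Real Estate’, are consistent with one-hot encoding of the company columns. The sketch below shows how such columns could be produced with `pd.get_dummies`; it is an assumption about the approach, run on a made-up four-row frame, and not necessarily the exact step used in the study.

```python
import pandas as pd

# Made-up rows using the column names from this article.
demo = pd.DataFrame({
    "COMPANY: DOMICILE": ["US", "GB", "US", "JP"],
    "COMPANY: SECTOR": ["Real Estate", "Energy", "Energy", "Real Estate"],
    "DIRECTOR: AGE": [58, 63, 49, 71],
})

# One-hot encode the two categorical columns; each category becomes
# its own 0/1 column named "<original column>_<category>".
encoded = pd.get_dummies(
    demo,
    columns=["COMPANY: DOMICILE", "COMPANY: SECTOR"],
    dtype=int,
)
print(encoded.columns.tolist())
```

One-hot encoding avoids implying an order between sectors or countries, which an integer label encoding would.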
Director Performance Encoding
category_to_value = {
"HALL OF FAMER": 2,
"ALL STAR": 2,
"STARTER": 1,
"ROTATION": 1,
"BENCHWARMER": 0,
"UNRATED": 0
}
# Encode the values in the 'PERFORMANCE: CATEGORY' column
df['PERFORMANCE: CATEGORY'] = df['PERFORMANCE: CATEGORY'].map(category_to_value)
Handling Multicollinearity
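import matplotlib.pyplot as plt
import seaborn as sns
# Drop rows that still contain missing values
df = df.dropna(axis=0)
# Correlate the numeric columns only
corr = df.corr(numeric_only=True)
plt.figure(figsize=(100, 60))
sns.heatmap(corr, cmap='Blues', annot=True, annot_kws={"size": 30})
# Change the size of the axis labels
plt.xticks(fontsize=35)
plt.yticks(fontsize=35)
plt.show()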
We conducted a correlation analysis to understand the relationships among features, aiding feature selection and model decisions. This allowed us to detect potential multicollinearity that could impact model performance. Moreover, we used the Variance Inflation Factor (VIF) to pinpoint specific features causing multicollinearity.
Having identified ‘INFLUENCE: 2023’ and ‘INFLUENCE: MRY’ as problematic due to their high correlation, we removed them from our dataset to enhance the robustness and reliability of our analysis.
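For readers unfamiliar with VIF, here is a small self-contained sketch. The VIF of feature j is 1 / (1 - R²), where R² comes from regressing feature j on all other features, and it equals the j-th diagonal entry of the inverse correlation matrix. The data and column names below are synthetic, and the common cut-off of 10 is a rule of thumb rather than a threshold stated in this study.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500
base = rng.normal(size=n)

# Synthetic features: influence_b is almost a copy of influence_a.
demo = pd.DataFrame({
    "influence_a": base,
    "influence_b": base + rng.normal(scale=0.05, size=n),
    "age": rng.normal(size=n),
})

# VIF_j = 1 / (1 - R_j^2), which is the j-th diagonal element
# of the inverse of the feature correlation matrix.
corr_matrix = demo.corr().to_numpy()
vif = pd.Series(np.diag(np.linalg.inv(corr_matrix)), index=demo.columns, name="VIF")
print(vif.round(2))
```

The two near-duplicate columns receive very large VIF values, while the independent column stays close to 1.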
Model Selection & Training
Tree-Based Model
Decision Tree Model
We trained a Decision Tree model with a maximum tree depth of 4, which we chose as an initial attempt to balance the model’s complexity and interpretability. A depth of 4 can help reduce overfitting while still enabling the model to capture non-linear relationships within the data.
import time
from sklearn import tree
from sklearn.model_selection import train_test_split
# Separate the features from the target
X = df.drop(['PERFORMANCE: CATEGORY'], axis=1)
y = df['PERFORMANCE: CATEGORY']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# define classification algorithm
clf_tree = tree.DecisionTreeClassifier(max_depth=4)
# get start time
start_time = time.time()
# fit the model
clf_tree = clf_tree.fit(X_train, y_train)
# calculate and print the time taken to train the model
train_time = time.time() - start_time
print("Training time: ", train_time)
# reset start time
start_time = time.time()
# make predictions
y_pred_tree = clf_tree.predict(X_test)
# calculate and print the time taken to make predictions
predict_time = time.time() - start_time
print("Prediction time: ", predict_time)
The resulting confusion matrix summarizes the model’s predictions on the test set:
from sklearn.metrics import confusion_matrix
cm_tree = confusion_matrix(y_test, y_pred_tree)
# Create a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(cm_tree, annot=True, fmt='d', cmap='Blues',annot_kws={"size": 14})
plt.xlabel('Predicted', fontsize = 14)
plt.ylabel('Actual', fontsize = 14)
plt.title('Confusion Matrix for Decision Tree', fontsize = 14)
# Show the plot
plt.show()
Model Evaluation
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred_tree))
The Decision Tree model demonstrates high precision across all performance categories, so a director it assigns to a category usually belongs there. Precision is strongest for the lowest and top tiers, but recall in these categories is weaker: the model rarely misclassifies, yet it does not identify every potential top performer. Improving recall for top performers should therefore be the focus of optimization.
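The difference between precision and recall is easy to see from a confusion matrix. The matrix below is hypothetical (it is not the study’s result) and is constructed so that class 2 has high precision but low recall, the pattern described above.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = actual, columns = predicted.
cm = np.array([
    [90,  9,  1],
    [15, 84,  1],
    [10, 20, 20],
])

true_pos = np.diag(cm)
precision = true_pos / cm.sum(axis=0)  # of everything predicted as class k, how much is right
recall = true_pos / cm.sum(axis=1)     # of everything that is class k, how much was found
f1 = 2 * precision * recall / (precision + recall)

for k in range(3):
    print(f"class {k}: precision={precision[k]:.2f} recall={recall[k]:.2f} f1={f1[k]:.2f}")
```

In this made-up case, 20 of the 22 directors predicted as class 2 really are class 2, but 30 of the 50 actual class 2 directors are missed.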
Feature Importance
# Extract feature importances
importances = clf_tree.feature_importances_
# Create a DataFrame with feature names and their corresponding importances
features_importance = pd.DataFrame({
"Feature": X.columns,
"Importance": importances
})
# Sort the DataFrame by importance in descending order
features_importance = features_importance.sort_values("Importance", ascending=False)
# Select the top 10 most important features
top_10_features = features_importance.head(10)
# Plot the feature importances of the top 10 features
plt.figure(figsize=(10, 8))
sns.barplot(x="Importance", y="Feature", data=top_10_features, palette = 'Blues_r')
plt.title('Top 10 Feature Importance for Decision Tree')
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.show()
Random Forest Model (Bagging)
Next, we applied the Random Forest algorithm to compare its performance with the Decision Tree model, as it has several advantages. Random Forest can provide better generalization due to its ensemble nature, which combines multiple decision trees to make predictions. By constructing trees on different bootstrap samples of the data and aggregating their results, Random Forest can capture more complex relationships and is less prone to overfitting.
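To make the bagging idea concrete, the sketch below builds a tiny ensemble by hand: each tree is fitted on a bootstrap sample and the final prediction is a majority vote. It uses synthetic data and an arbitrary ensemble size of 25, and it is a simplified illustration of the principle, not a replacement for `RandomForestClassifier`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data (assumption; stands in for the director dataset).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    n_classes=3, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)

rng = np.random.default_rng(42)
n_trees = 25
all_preds = []
for _ in range(n_trees):
    # Bootstrap sample: draw training rows with replacement.
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    # max_features="sqrt" considers a random subset of features at each split.
    tree_i = DecisionTreeClassifier(
        max_features="sqrt", random_state=int(rng.integers(1_000_000))
    )
    tree_i.fit(X_tr[idx], y_tr[idx])
    all_preds.append(tree_i.predict(X_te))

# Majority vote across trees for every test row.
votes = np.stack(all_preds)  # shape: (n_trees, n_test)
y_vote = np.array([np.bincount(col, minlength=3).argmax() for col in votes.T])
print("Ensemble accuracy:", (y_vote == y_te).mean().round(3))
```

Because each tree sees a slightly different sample, their individual errors partly cancel out in the vote.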
from sklearn.ensemble import RandomForestClassifier
X = df.drop('PERFORMANCE: CATEGORY', axis=1)
y = df['PERFORMANCE: CATEGORY']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# create classifier object
clf_rdm = RandomForestClassifier(n_estimators=100, random_state=42)
# get start time
start_time = time.time()
# fit the classifier with the training data
clf_rdm = clf_rdm.fit(X_train, y_train)
# calculate and print the time taken to train the model
train_time = time.time() - start_time
print("Training time: ", train_time)
# reset start time
start_time = time.time()
# make predictions
y_pred_rdm = clf_rdm.predict(X_test)
# calculate and print the time taken to make predictions
predict_time = time.time() - start_time
print("Prediction time: ", predict_time)
We set the number of trees in the Random Forest model to 100 and obtained the following performance metrics:
cm_rdm = confusion_matrix(y_test, y_pred_rdm)
# Create a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(cm_rdm, annot=True, fmt='d', cmap='Blues',annot_kws={"size": 14})
# Add labels and title
plt.xlabel('Predicted', fontsize = 14)
plt.ylabel('Actual', fontsize = 14)
plt.title('Confusion Matrix for Random Forest', fontsize = 14)
# Show the plot
plt.show()
Model Evaluation
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred_rdm))
With the Random Forest model, we see an impressive enhancement in performance across all categories. Across all three classes (0: ‘Benchwarmer’ and ‘Unrated’; 1: ‘Starter’ and ‘Rotation’; 2: ‘Hall of Famer’ and ‘All Star’), the model exhibits strong precision, meaning that when it assigns a director to a class, that assignment is usually correct.
When juxtaposed with the Decision Tree model, the Random Forest model manifests a marked improvement. Its performance superiority is evident through high precision across all performance tiers, which substantiates the reliability of its predictions and reduces the possibility of misidentification.
From a business standpoint, the enhanced performance of the Random Forest model suggests it’s a more robust tool for predicting board director performance. With its high precision, we can place more trust in its predictions, thereby reducing potential errors in our decision-making process and supporting the identification of high-performing directors.
Feature Importance
# Extract feature importances
importances = clf_rdm.feature_importances_
# Create a DataFrame with feature names and their corresponding importances
features_importance = pd.DataFrame({
"Feature": X.columns,
"Importance": importances
})
# Sort the DataFrame by importance in descending order
features_importance = features_importance.sort_values("Importance", ascending=False)
# Select the top 10 most important features
top_10_features = features_importance.head(10)
# Plot the feature importances of the top 10 features
plt.figure(figsize=(10, 8))
sns.barplot(x="Importance", y="Feature", data=top_10_features, palette='Blues_r')
plt.title('Top 10 Feature Importance for Random Forest')
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.show()
For the Random Forest model, key features like ‘Resume: Network Power’, ‘Who: Network Power’, and ‘Influence: CEOs / Ex CEOs’ significantly predict a board director’s performance. This implies that a director’s professional network strength, influence from previous roles, and experience as a CEO are important factors. These findings stress the need to consider a director’s connections and prior high-ranking experience for effective board selection, guiding sponsors in their decision-making process.
XGBoost Model (Boosting)
Following our experimentation with the Random Forest model, we turned to the XGBoost algorithm. XGBoost, notable for its computational efficiency and enhanced model performance, presents a strong case for superior outcomes. Its gradient boosting framework allows for the rectification of previous model errors, often improving predictive accuracy and minimizing overfitting. Furthermore, it efficiently handles missing values. Hence, we employed XGBoost to explore its potential in boosting the prediction of board director performance.
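The phrase "rectification of previous model errors" can be illustrated with a stripped-down boosting loop. The sketch below is a deliberately simplified version of the idea: it uses squared-error regression on synthetic data, where each new shallow tree is fitted to the residuals of the current ensemble. XGBoost itself adds regularization, second-order gradient information, and a multi-class objective, none of which appear here.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic one-dimensional regression problem (assumption, for illustration).
rng = np.random.default_rng(42)
X_demo = rng.uniform(-3, 3, size=(300, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1
prediction = np.full_like(y_demo, y_demo.mean())  # start from a constant model
mse_history = [np.mean((y_demo - prediction) ** 2)]

for _ in range(50):
    # Residuals are the errors the current ensemble still makes.
    residuals = y_demo - prediction
    small_tree = DecisionTreeRegressor(max_depth=2, random_state=42)
    small_tree.fit(X_demo, residuals)
    # Add a damped correction from the new tree.
    prediction += learning_rate * small_tree.predict(X_demo)
    mse_history.append(np.mean((y_demo - prediction) ** 2))

print(f"Training MSE: start={mse_history[0]:.3f}, end={mse_history[-1]:.3f}")
```

The learning rate plays the same role as `learning_rate` in XGBoost: smaller values need more trees but make each correction more cautious.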
from xgboost import XGBClassifier
# early_stopping_rounds is set on the estimator; recent XGBoost releases no longer accept it in fit()
clf_xgb = XGBClassifier(objective='multi:softmax',
                        eval_metric='merror',
                        learning_rate=0.1,
                        max_depth=5,
                        n_estimators=1000,
                        verbosity=1,
                        early_stopping_rounds=10,
                        random_state=42)
# get start time
start_time = time.time()
# fit the model, stopping early once the evaluation-set error stops improving
clf_xgb.fit(X_train, y_train, verbose=True, eval_set=[(X_test, y_test)])
# calculate and print the time taken to train the model
train_time = time.time() - start_time
# reset start time
start_time = time.time()
print("Training time: ", train_time)
# Generate predictions
y_pred_xgb = clf_xgb.predict(X_test)
from sklearn.metrics import confusion_matrix
# Calculate the confusion matrix
conf_matrix_xgb = confusion_matrix(y_test, y_pred_xgb)
# Create a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix_xgb, annot=True, fmt='d', cmap='Blues', annot_kws={"size": 14})
# Add labels and title
plt.xlabel('Predicted', fontsize = 14)
plt.ylabel('Actual' , fontsize = 14)
plt.title('Confusion Matrix for XGBoost', fontsize = 14)
# Show the plot
plt.show()
Model Evaluation
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred_xgb))
The XGBoost model shows good precision across performance categories, but the Random Forest model outperforms it with superior precision across all tiers. The Random Forest model handles data variations better, offering more reliable predictions. From a business perspective, its high precision supports effective board management and strategic planning. Thus, despite XGBoost’s fair performance, Random Forest is the preferred choice for predicting director performance due to its higher precision.
Feature Importance
# Extract feature importances
importances = clf_xgb.feature_importances_
# Create a DataFrame with feature names and their corresponding importances
features_importance = pd.DataFrame({
"Feature": X.columns,
"Importance": importances
})
# Sort the DataFrame by importance in descending order
features_importance = features_importance.sort_values("Importance", ascending=False)
# Select the top 10 most important features
top_10_features = features_importance.head(10)
# Plot the feature importances of the top 10 features
plt.figure(figsize=(10, 8))
sns.barplot(x="Importance", y="Feature", data=top_10_features, palette = 'Blues_r')
plt.title('Top 10 Feature Importance for XGBoost')
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.show()
The XGBoost model highlights that a company’s location and sector significantly influence board director performance. The key feature ‘Company: Domicile_US’ suggests that directors in U.S.-based companies perform distinctly, potentially due to specific business environments. The model also underscores the importance of directors in various other countries, suggesting that regional business practices may impact director effectiveness. The feature ‘Company: Sector_Real Estate’ points to industry-specific dynamics influencing director performance. In essence, these insights suggest the necessity for sponsors to consider the unique conditions of their operational context for better board performance predictions.
Neural Network Model
Multi-layer Perceptron (MLP)
We extended our modeling to include a feedforward neural network, after favorable results from Random Forest and XGBoost models. This type of network, part of the deep learning family, excels at extracting complex features from data, albeit at the cost of computational resources and careful hyperparameter tuning. We built our model using Keras Sequential architecture, creating a linear stack of layers in the network.
Our model alternates between Dense and Dropout layers with an output layer. Dense layers with nodes numbered 128, 64, 32, and 16 apply the ‘relu’ activation function, crucial for learning intricate data relationships. Dropout layers at 0.5 rate prevent overfitting by randomly nullifying half of the input units during training, promoting model robustness. The output layer uses a ‘softmax’ activation function to output probabilities for the three performance classes.
The model uses the ‘adam’ optimizer and ‘sparse_categorical_crossentropy’ loss function, suitable for multi-class classification, with ‘accuracy’ as the metric. The architecture and parameters are optimized to balance model complexity and overfitting, based on iterative testing and refinement, further fine-tuned using cross-validation and grid search techniques.
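Two building blocks named above, dropout and softmax, are simple enough to write out in NumPy. This is a sketch for intuition only, with arbitrary example inputs; Keras applies the same ideas internally, including rescaling the surviving units during training.

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(activations, rate, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    # Rescaling keeps the expected activation unchanged.
    return activations * keep_mask / (1.0 - rate)

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 per row."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

hidden = np.ones((4, 16))             # pretend output of a 16-unit Dense layer
dropped = dropout(hidden, rate=0.5)   # roughly half the units become 0, the rest 2.0
probs = softmax(np.array([[2.0, 1.0, 0.1]]))  # scores for the three performance classes

print(dropped[0])
print(probs.round(3))
```

At prediction time dropout is switched off (`training=False`), so the full network is used.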
import numpy as np
from sklearn.preprocessing import StandardScaler
# Keras imports (assuming the TensorFlow-bundled Keras)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Standardize the features; fit the scaler on the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Build the Keras model as a linear stack of layers
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(X_train.shape[1],)))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(16, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3, activation='softmax'))
# Set the learning rate
learning_rate = 0.001
# Create the optimizer with the specified learning rate
optimizer = Adam(learning_rate=learning_rate)
# Compile once, with the configured optimizer
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=100, batch_size=32, verbose=1, validation_data=(X_test, y_test))
# Obtain the predicted probabilities from the neural network
y_pred_probs1 = model.predict(X_test)
# Convert the predicted probabilities to class labels
y_pred_nn1 = np.argmax(y_pred_probs1, axis=1)
# Compute the confusion matrix
cm_nn1 = confusion_matrix(y_test, y_pred_nn1)
# Create a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(cm_nn1, annot=True, fmt='d', cmap='Blues', annot_kws={"size": 14})
# Add labels and title
plt.xlabel('Predicted', fontsize = 14)
plt.ylabel('Actual' , fontsize = 14)
plt.title('Confusion Matrix for Neural Network' , fontsize = 14)
# Show the plot
plt.show()
Model Evaluation
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred_nn1))
The neural network model’s performance demonstrates considerable efficacy, with an overall accuracy of 81%. However, when compared to the XGBoost and Random Forest models, it exhibits slightly lower metrics in terms of accuracy, precision, and recall.
To optimize our deep learning model further, we proceeded to fine-tune the model’s hyperparameters. This process aims to enhance the model’s predictive capabilities and align its performance more closely with that of the XGBoost and Random Forest models.
Hyperparameter Tuning
We improved our model using Keras Tuner’s RandomSearch for hyperparameter tuning, adjusting parameters within certain ranges to maximize validation accuracy. Our build_model function helped streamline this tuning process.
In the function, we adjusted:
- The number of units in input and hidden layers (from 128 to 256 and 32 to 128, respectively, with step size 32).
- The dropout rate within layers (from 0 to 0.5 with a step size of 0.1), which helps reduce overfitting by randomly dropping neurons during training.
- The learning rate of the optimizer (three options: 1e-2, 1e-3, and 1e-4), which balances learning speed against the risk of overshooting the minimum.
Using RandomSearch, we carried out a hyperparameter search across five trials, with each trial using a different set of hyperparameters.
from keras_tuner import RandomSearch
# Define a function to create your Keras model
def build_model(hp):
    model = Sequential()
    model.add(Dense(units=hp.Int('units_input',
                                 min_value=128,
                                 max_value=256,
                                 step=32),
                    activation='relu',
                    input_shape=(X_train.shape[1],)))
    # 'units_hidden' and 'dropout' are reused by name below, so both
    # hidden layers share the same sampled value in a given trial
    model.add(Dense(units=hp.Int('units_hidden',
                                 min_value=32,
                                 max_value=128,
                                 step=32),
                    activation='relu'))
    model.add(Dropout(hp.Float('dropout', 0, 0.5, step=0.1)))
    model.add(Dense(units=hp.Int('units_hidden',
                                 min_value=32,
                                 max_value=128,
                                 step=32),
                    activation='relu'))
    model.add(Dropout(hp.Float('dropout', 0, 0.5, step=0.1)))
    model.add(Dense(16, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(3, activation='softmax'))
    learning_rate = hp.Choice('learning_rate', [1e-2, 1e-3, 1e-4])
    optimizer = Adam(learning_rate=learning_rate)
    model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model
start_time = time.time()
# Initialize the tuner
tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=5,  # the number of different models to try
)
# Perform the hyperparameter search
tuner.search(X_train, y_train, epochs=5, validation_data=(X_test, y_test))
# Get the optimal hyperparameters
best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]
# Build the model with the optimal hyperparameters and train it
model = tuner.hypermodel.build(best_hps)
history = model.fit(X_train, y_train, epochs=100, batch_size=32, verbose=1, validation_data=(X_test, y_test))
# Obtain the predicted probabilities from the neural network
y_pred_probs2 = model.predict(X_test)
# Convert the predicted probabilities to class labels
y_pred_nn2 = np.argmax(y_pred_probs2, axis=1)
# Compute the confusion matrix
cm_nn2 = confusion_matrix(y_test, y_pred_nn2)
end_time = time.time()
computation_time = end_time - start_time
print(f"The computation time is {computation_time} seconds.")
Our Neural Network model saw notable improvements in accuracy, precision, and recall following hyperparameter tuning. However, it still didn’t outperform XGBoost and Random Forest models for our specific problem.
Our aim was to explore how features affect optimal board director performance (class 2). Both XGBoost and Random Forest models showed better precision and recall for class 2 predictions, and a higher overall accuracy, indicating superior classification for all performance classes.
Model choice, however, depends on the dataset and specific use case. In some scenarios, the Neural Network model could excel if the data suits its inherent assumptions and computational advantages. It’s also worth noting that the Neural Network model had a longer computation time, about 85 seconds, which is a key consideration when time is crucial.
Linear Model
Logistic Regression
In our final phase of model comparison, we implemented a logistic regression model. Logistic regression, a statistically grounded algorithm, excels in binary and multiclass classification problems. It’s simpler than tree-based models or neural networks and is particularly valued for its interpretability and resistance to overfitting.
from sklearn.linear_model import LogisticRegression
# define logistic regression classifier
clf_logreg = LogisticRegression()
# get start time
start_time = time.time()
# fit the model
clf_logreg = clf_logreg.fit(X_train, y_train)
# calculate and print the time taken to train the model
train_time = time.time() - start_time
print("Training time: ", train_time)
# reset start time
start_time = time.time()
# make predictions
y_pred_logreg = clf_logreg.predict(X_test)
# calculate and print the time taken to make predictions
predict_time = time.time() - start_time
print("Prediction time: ", predict_time)
However, our logistic regression model struggled to accurately predict classes 0 and 2, as indicated by low recall scores of 0.29 and 0.17 respectively. This is likely due to the model’s inability to capture complex, non-linear relationships in the data. The results underscore that for our specific dataset, more complex models like tree-based models and neural networks seem better equipped.
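The limitation with non-linear relationships can be reproduced on a toy problem. In the sketch below, the label depends on an XOR-style interaction between two synthetic features, a pattern no single linear boundary can separate. The data is made up for illustration and says nothing about the actual structure of the director dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic XOR-style data: the class depends on the signs of both features.
rng = np.random.default_rng(42)
X_demo = rng.uniform(-1, 1, size=(1000, 2))
y_demo = ((X_demo[:, 0] > 0) ^ (X_demo[:, 1] > 0)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)

# A linear decision boundary cannot separate the four quadrants.
acc_logreg = LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)
# A tree can carve the plane into axis-aligned regions.
acc_tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr).score(X_te, y_te)

print(f"Logistic regression accuracy: {acc_logreg:.2f}")
print(f"Decision tree accuracy:       {acc_tree:.2f}")
```

Adding an explicit interaction feature (the product of the two inputs) would let logistic regression solve this toy case, which is one reason feature engineering matters for linear models.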
Conclusion
Exploring machine learning models for predicting board director performance, we found the Random Forest model to excel in precision and accuracy. This is consistent with the model’s known resistance to overfitting and ability to handle complex data relations.
The XGBoost model also performed well, albeit slightly less effectively than Random Forest. The Decision Tree model provided valuable insight into variable importance, but its predictive power was more limited, as can be expected from a single shallow tree.
Despite improvements post-hyperparameter tuning, Neural Networks were still outperformed by the Random Forest and XGBoost models. However, their potential shines with larger datasets and appropriate conditions.
Finally, using Logistic Regression emphasized the challenges of dealing with complex data and non-linear relationships. While known for its simplicity and interpretability, it struggled with our dataset, underscoring the need for complex models for complex datasets.
Recommendation
The findings suggest that sponsors and stakeholders should consider the following when selecting and evaluating board directors:
- The strength of a director’s professional network is essential. Directors with extensive networks, particularly those with influence and past experience as CEOs, are likely to perform better.
- Consider the influence dynamics and the composition of the board itself, including the gender power gap. A board that is balanced in terms of gender and has a healthy distribution of influence can lead to better performance.
- Location matters. Directors based in different regions may perform differently due to local business practices and environmental factors. Therefore, consider these aspects during selection and performance prediction.
- Industry-specific knowledge is vital. Directors operating in industries such as real estate need to have a deep understanding of the sector’s unique dynamics.
You can find the complete code for this project in the linked GitHub repository. Feel free to clone, fork, or star it. Here is the link: GitHub Repository