BIM + Machine Learning: Unlocking Patterns in Building Design
Predicting Door Frame Materials with Classical Machine Learning
What if you could instantly spot design patterns across a series of buildings of the same typology — and use them to guide or audit new projects? In this post, I’ll show you how I explored that question using a surprisingly simple approach: classic machine learning models applied to BIM-like tabular data.
A note: This project was actually completed several years ago, when I was just beginning my journey in data science. Revisiting it now, I’m sharing not only the workflow and results, but also a bit of that early excitement about applying ML to real architectural data. Architects, engineers, and anyone interested in practical ML: this one’s for you.
Why Even Try?
In real projects, information about building elements (like doors, rooms, or materials) is often scattered across different files, formats, and even languages. I wondered: could a model, trained on historical building data, reliably predict certain attributes — like door frame materials — across multiple projects?
This would not just save time. It could help reveal hidden regularities in your design library, and even flag unusual or inconsistent decisions.
Data: From Messy Reality to Model-Ready
Before modeling, there was a lot of good old-fashioned data cleaning. My raw data came from several finished office projects, spanning different years and even naming conventions. As is typical, preprocessing involved:
- Unifying formats (e.g., heights always in meters)
- Resolving inconsistent room names and materials
- Filtering for features that are available across all projects
Note: This step is essential for real-world BIM analytics but not the focus here — so I won’t show all the code. Just know it was standard but important data wrangling.
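For flavor, here is a minimal sketch of what that kind of wrangling can look like in pandas — the file list, column names, and mappings are purely illustrative, not the project’s actual code:

import pandas as pd

# Illustrative only: one exported CSV per office project (hypothetical paths)
project_dfs = [pd.read_csv(path) for path in project_files]

# Keep only the columns available in every project
common_cols = sorted(set.intersection(*(set(df.columns) for df in project_dfs)))
data = pd.concat([df[common_cols] for df in project_dfs], ignore_index=True)

# Unify units, e.g. widths exported in millimeters by some templates
data.loc[data['width'] > 10, 'width'] = data['width'] / 1000

# Harmonize inconsistent room naming
data['from_room'] = data['from_room'].str.strip().replace({'Garage': 'Tiefgarage'})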
Feature Selection: What Goes In?
To keep the experiment realistic and repeatable, I selected features that are commonly available and easy to extract:
- Related spaces (from_room, to_room, their areas)
- Door type parameters (leaf type, host material, floor, etc.)
The idea: if a model can predict “frame material” using only generic, accessible features, it should generalize to new projects.
feature_columns = [
'floor',
'door_leaf',
'width',
'r_l_opening',
'host_width',
'host_material',
'from_room',
'to_room',
'from_room_area',
'to_room_area'
]
data[feature_columns].head()
-----------------------------------------------------
| floor | door_leaf | width | r_l_opening | host_width | host_material | from_room | to_room | from_room_area | to_room_area |
|-------|-----------|-------|-------------|------------|---------------|-----------|-----------|----------------|--------------|
| 2_UG | 1FL | 1.32 | R | 0.30 | STB | Parken | Schleuse | 3804.75 | 6.73 |
| 2_UG | 1FL | 1.32 | R | 0.25 | STB | Parken | Schleuse | 3804.75 | 8.76 |
| 2_UG | 1FL | 1.32 | R | 0.22 | STB | Parken | Schleuse | 4049.80 | 5.14 |
| 2_UG | 1FL | 1.32 | R | 0.30 | STB | Parken | Schleuse | 4049.80 | 18.77 |
| 2_UG | 1FL | 1.32 | R | 0.25 | STB | Parken | Schleuse | 4049.80 | 12.56 |
office_train['project_code'].value_counts()
-----------------------------------------------------
project_code
office_2 770
office_1 323
office_3 299
office_4 257

Target: Why “Frame Material”?
y = ['frame_material']
train['frame_material'].value_counts()
-----------------------------------------------------
frame_material
S 1189
A 366
H 49
gD 45

I wanted to test the concept — not necessarily predict the “most important” feature. The target was chosen because “frame material” was present and relatively standardized across projects, despite differences in project year or software template.
Training and Testing: Cross-Building Validation
To make the challenge realistic, I trained models on four completed buildings (office_1 to office_4) and tested them on a completely unseen fifth building (office_5). This simulates deploying your prediction logic to a brand new project.
# Train on four completed buildings, test on the unseen fifth
train = data[data['project_code'].isin(['office_1', 'office_2', 'office_3', 'office_4'])].copy()
test = data[data['project_code'] == 'office_5'].copy()
test['frame_material'].value_counts()
-----------------------------------------------------
frame_material
S 458
A 96

The Models: Simple, Interpretable, Powerful
1. CatBoost
Chosen for its native handling of categorical features and ease of use. CatBoost gives robust results on structured, tabular data, making it ideal for the BIM world.
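The fit call below also references X, categorical, and a validation set val that aren’t defined in the snippets shown here; a minimal sketch of how they might be set up (the validation split itself is my assumption):

from sklearn.model_selection import train_test_split

X = feature_columns                        # model inputs listed above
categorical = ['floor', 'door_leaf', 'r_l_opening',
               'host_material', 'from_room', 'to_room']  # columns treated as categorical

# Hold out part of the training buildings as a validation set for early stopping
train, val = train_test_split(train, test_size=0.2, random_state=42,
                              stratify=train['frame_material'])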
from catboost import CatBoostClassifier

params_office_5 = {'verbose': 100,
                   'random_seed': 42,
                   'cat_features': categorical,
                   'learning_rate': 0.15}
model_catboost = CatBoostClassifier(**params_office_5)
model_catboost.fit(train[X], train[y], eval_set=(val[X], val[y]))

2. LightGBM
A gradient boosting framework known for speed and efficiency. Good for quick experiments and competitive baselines.
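The best_params_ used below came out of a hyperparameter search; a sketch of what that search might look like — the parameter grid is an illustrative assumption, and the categorical columns are assumed to already be of pandas 'category' dtype:

from lightgbm import LGBMClassifier
from sklearn.model_selection import RandomizedSearchCV

# Illustrative search space — the actual grid isn't shown in this post
param_distributions = {
    'n_estimators': [200, 400, 800],
    'learning_rate': [0.01, 0.03, 0.1],
    'max_depth': [6, 9, 12],
    'num_leaves': [31, 63, 127],
    'min_child_samples': [10, 20, 50],
    'colsample_bytree': [0.7, 0.9, 1.0],
    'reg_lambda': [0.0, 0.1, 1.0],
}

rand = RandomizedSearchCV(LGBMClassifier(random_state=42),
                          param_distributions, n_iter=30,
                          scoring='f1_macro', cv=3, random_state=42)
rand.fit(train[X], train[y].values.ravel())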
model_lgbm = LGBMClassifier(**rand.best_params_)
model_lgbm.fit(train[X], train[y].values.ravel(),
               eval_set=[(val[X], val[y].values.ravel())])
-----------------------------------------------------
LGBMClassifier(colsample_bytree=0.9, learning_rate=0.03, max_depth=9,
min_child_samples=10, n_estimators=400, num_leaves=127,
reg_lambda=1.0)

3. Random Forest
Classic, transparent, and requires one-hot encoding for categorical features. It’s always a solid benchmark — and often surprisingly strong.
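The one-hot encoded matrices used below (X_office_train_ohe and X_test_ohe) aren’t constructed in the snippets shown; one way to build them — assuming office_train is the combined training data — is:

import pandas as pd

# One-hot encode the categorical features; numeric columns pass through unchanged
X_office_train_ohe = pd.get_dummies(office_train[X], columns=categorical)
X_test_ohe = pd.get_dummies(test[X], columns=categorical)

# Align test columns with training columns (categories unseen in test become all-zero)
X_test_ohe = X_test_ohe.reindex(columns=X_office_train_ohe.columns, fill_value=0)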
from sklearn.ensemble import RandomForestClassifier

params_office_5 = {'n_estimators': 150,
                   'min_samples_split': 10,
                   'class_weight': 'balanced',
                   'max_features': 'log2',
                   'max_depth': 15,
                   'random_state': 42}
model_rfc = RandomForestClassifier(**params_office_5)
model_rfc.fit(X_office_train_ohe, office_train[y].values.ravel())

How Did They Do?
Each model was evaluated using F1 scores and confusion matrices to understand not just accuracy, but also where the models made mistakes. Here are a few key takeaways:
- CatBoost was the top performer, handling both numeric and categorical features well, with a high overall accuracy and balanced performance across classes.
from sklearn.metrics import f1_score

f1_score(test['frame_material'], test['prediction_frame_material_catboost'], average='macro')
-----------------------------------------------------
0.9346532985087201

- Random Forest also did well, but had more confusion between some classes — showing that encoding and model choice can make a difference.
f1_score(test['frame_material'], test['prediction_frame_material_rfc'], average='macro')
-----------------------------------------------------
0.8892929938851365

- LightGBM struggled with class imbalance, tending to over-predict the majority class.
f1_score(test['frame_material'], test['prediction_frame_material_lgbm'], average='macro')
-----------------------------------------------------
0.8264818729935008

Ensemble Voting:
By combining predictions — either through a simple majority vote or by averaging the predicted probabilities — the ensemble approach produced more robust and reliable results, particularly for those cases where the mean prediction probability exceeded 85%.
# Average the class probabilities of the three models (soft voting)
proba_cat = model_catboost.predict_proba(test[X])
proba_lgbm = model_lgbm.predict_proba(test[X])
proba_rfc = model_rfc.predict_proba(X_test_ohe)
mean_proba = (proba_cat + proba_lgbm + proba_rfc) / 3

# Pick the class with the highest mean probability and map it back to its label
final_pred = mean_proba.argmax(axis=1)
class_labels = model_catboost.classes_
test['final_pred'] = class_labels[final_pred]

# Keep the winning probability as a confidence score and filter for confident cases
max_mean_proba = mean_proba.max(axis=1)
test['mean_pred_proba'] = max_mean_proba
confident_preds = test[test['mean_pred_proba'] >= 0.85]
confident_preds[['final_pred', 'mean_pred_proba', 'frame_material'] + list(X)].head()
-----------------------------------------------------
| final_pred | mean_pred_proba | frame_material | floor | door_leaf | width | r_l_opening | host_width | host_material | from_room | to_room | from_room_area | to_room_area |
|------------|-----------------|----------------|-------|-----------|-------|-------------|------------|---------------|--------------|-----------|----------------|--------------|
| S | 0.921 | S | 2_UG | 1FL | 1.010 | L | 0.250 | STB | Schleuse | Lager | 10.28 | 10.88 |
| S | 0.916 | S | 2_UG | 1FL | 1.135 | R | 0.250 | STB | Schleuse | TRH | 9.48 | 26.19 |
| S | 0.908 | S | 2_UG | 1FL | 1.135 | L | 0.250 | STB | Tiefgarage | Schleuse | 1731.49 | 9.48 |
| S | 0.912 | S | 2_UG | 1FL | 1.135 | R | 0.250 | STB | Schleuse | TRH | 11.07 | 33.02 |
| S          | 0.921           | S              | 2_UG  | 1FL       | 1.135 | R           | 0.175      | MWK           | Tiefgarage   | Schleuse  | 1731.49        | 11.07        |

For door frames where the model’s confidence was lower, it makes sense to leave these fields unfilled, allowing for manual review and expert input. This ensures that only high-certainty predictions are automated, while ambiguous cases can be handled with domain knowledge, supporting both accuracy and practical decision-making in real projects.
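A minimal sketch of that gating step, reusing the 0.85 threshold from above (the frame_material_auto column name is just for illustration):

import numpy as np

# Automate only the confident predictions; leave the rest empty for manual review
test['frame_material_auto'] = np.where(test['mean_pred_proba'] >= 0.85,
                                       test['final_pred'], None)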
What Did the Model Learn?
To understand what drove the model’s decisions, I looked at feature importance from both CatBoost and LightGBM. Both algorithms highlighted that door width and the context of the rooms (such as “from_room”, “to_room”, and “host_material”) were among the most informative features.
model_catboost.get_feature_importance(prettified=True)
-----------------------------------------------------
| Feature Id | Importances |
|------------------|-------------|
| width | 22.68 |
| floor | 12.12 |
| host_material | 11.63 |
| from_room | 11.60 |
| to_room | 10.73 |
| door_leaf | 10.46 |
| host_width | 8.31 |
| to_room_area | 4.97 |
| from_room_area | 3.84 |
| r_l_opening      | 3.66        |

feature_importances = model_lgbm.feature_importances_
feature_names = model_lgbm.feature_name_
df_feature_importances = pd.DataFrame({
'Feature Name': feature_names,
'Importance (Split)': feature_importances
})
df_feature_importances = df_feature_importances.sort_values(by='Importance (Split)', ascending=False).reset_index(drop=True)
total_importance = df_feature_importances['Importance (Split)'].sum()
df_feature_importances['Importance (%)'] = (df_feature_importances['Importance (Split)'] / total_importance) * 100
df_feature_importances
-----------------------------------------------------
| Feature Name | Importance (Split) | Importance (%) |
|----------------|--------------------|----------------|
| width | 5786 | 24.51 |
| to_room_area | 5766 | 24.42 |
| from_room_area | 5417 | 22.95 |
| host_width | 4968 | 21.04 |
| r_l_opening | 902 | 3.82 |
| to_room | 307 | 1.30 |
| from_room | 239 | 1.01 |
| floor | 204 | 0.86 |
| door_leaf | 18 | 0.08 |
| host_material  | 1                  | 0.00           |

Limitations and Next Steps
This workflow is a proof of concept — not a turnkey solution. If you want to use ML for real design automation or audit, you’ll need:
- More training data (from a wider range of projects)
- Further feature engineering
- Careful architecture selection and model tuning
- Robust validation and uncertainty handling for real-world use
But even with “simple” models and features, we saw that patterns emerge — and can be detected — across projects.
The Takeaway
You don’t need deep learning or a massive BIM data lake to get actionable insights from your past projects. With well-prepared data and classical ML models, you can already:
- Detect design regularities
- Spot outliers or non-standard elements (see the sketch after this list)
- Support knowledge transfer between projects
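For the outlier-spotting point, one simple approach is to reuse the ensemble’s confidence scores from above and flag doors where a confident prediction disagrees with the value recorded in the model — candidates for review rather than automatic correction:

# Confident predictions that contradict the recorded frame material
suspicious = test[(test['mean_pred_proba'] >= 0.85) &
                  (test['final_pred'] != test['frame_material'])]
suspicious[['final_pred', 'mean_pred_proba', 'frame_material'] + list(X)]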
Curious to try this in your office?
Start by standardizing your data extraction, feature engineering, and basic ML workflows. As this experiment shows, the results might surprise you.
The code for this project is available on GitHub.