
BIM + Machine Learning: Unlocking Patterns in Building Design

7 min read · Aug 7, 2025

Predicting Door Frame Materials with Classical Machine Learning

What if you could instantly spot design patterns across a series of buildings of the same typology, and use them to guide or audit new projects? In this post, I’ll show you how I explored that question using a surprisingly simple approach: classic machine learning models applied to BIM-like tabular data.

A note: This project was actually completed several years ago, when I was just beginning my journey in data science. Revisiting it now, I’m sharing not only the workflow and results, but also a bit of that early excitement about applying ML to real architectural data. Architects, engineers, and anyone interested in practical ML: this one’s for you.

Why Even Try?

In real projects, information about building elements (like doors, rooms, or materials) is often scattered across different files, formats, and even languages. I wondered: could a model, trained on historical building data, reliably predict certain attributes — like door frame materials — across multiple projects?

This would not just save time. It could help reveal hidden regularities in your design library, and even flag unusual or inconsistent decisions.

Data: From Messy Reality to Model-Ready

Before modeling, there was a lot of good old-fashioned data cleaning. My raw data came from several finished office projects, spanning different years and even naming conventions. As is typical, preprocessing involved:

  • Unifying formats (e.g., heights always in meters)
  • Resolving inconsistent room names and materials
  • Filtering for features that are available across all projects

Note: This step is essential for real-world BIM analytics but not the focus here — so I won’t show all the code. Just know it was standard but important data wrangling.
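
To give a flavor of it anyway, here is a minimal, purely illustrative sketch; the file names, thresholds, and room-name mappings are made up, not the ones from the real projects:

import pandas as pd

# Hypothetical exports, one per project.
frames = [pd.read_csv(f) for f in ['office_1.csv', 'office_2.csv']]

# Keep only the columns available in every project.
common_cols = sorted(set.intersection(*(set(f.columns) for f in frames)))
data = pd.concat([f[common_cols] for f in frames], ignore_index=True)

# Unify units: convert widths that were exported in centimetres to metres.
data.loc[data['width'] > 10, 'width'] /= 100

# Harmonize room names across projects' naming conventions.
room_map = {'Parkplatz': 'Parken', 'Treppenhaus': 'TRH'}
for col in ['from_room', 'to_room']:
    data[col] = data[col].replace(room_map)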

Feature Selection: What Goes In?

To keep the experiment realistic and repeatable, I selected features that are commonly available and easy to extract:

  • Related spaces (from_room, to_room, their areas)
  • Door type parameters (leaf type, host material, floor, etc.)

The idea: if a model can predict “frame material” using only generic, accessible features, it should generalize to new projects.

feature_columns = [
    'floor',
    'door_leaf',
    'width',
    'r_l_opening',
    'host_width',
    'host_material',
    'from_room',
    'to_room',
    'from_room_area',
    'to_room_area',
]

data[feature_columns].head()
-----------------------------------------------------
| floor | door_leaf | width | r_l_opening | host_width | host_material | from_room | to_room | from_room_area | to_room_area |
|-------|-----------|-------|-------------|------------|---------------|-----------|-----------|----------------|--------------|
| 2_UG | 1FL | 1.32 | R | 0.30 | STB | Parken | Schleuse | 3804.75 | 6.73 |
| 2_UG | 1FL | 1.32 | R | 0.25 | STB | Parken | Schleuse | 3804.75 | 8.76 |
| 2_UG | 1FL | 1.32 | R | 0.22 | STB | Parken | Schleuse | 4049.80 | 5.14 |
| 2_UG | 1FL | 1.32 | R | 0.30 | STB | Parken | Schleuse | 4049.80 | 18.77 |
| 2_UG | 1FL | 1.32 | R | 0.25 | STB | Parken | Schleuse | 4049.80 | 12.56 |
train['project_code'].value_counts()
-----------------------------------------------------
project_code
office_2 770
office_1 323
office_3 299
office_4 257

Target: Why “Frame Material”?

y = ['frame_material']

train['frame_material'].value_counts()
-----------------------------------------------------
frame_material
S 1189
A 366
H 49
gD 45

I wanted to test the concept — not necessarily predict the “most important” feature. The target was chosen because “frame material” was present and relatively standardized across projects, despite differences in project year or software template.

Training and Testing: Cross-Building Validation

To make the challenge realistic, I trained models on four completed buildings (office_1 to office_4) and tested them on a completely unseen fifth building (office_5). This simulates deploying your prediction logic to a brand new project.

train = data[data['project_code'].isin(
    ['office_1', 'office_2', 'office_3', 'office_4'])]

test = data[data['project_code'] == 'office_5']

test['frame_material'].value_counts()
-----------------------------------------------------
frame_material
S 458
A 96
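
One housekeeping note: the training snippets below also reference `X` and a validation set `val`, which the original cells don’t show. A minimal, assumed version is a stratified split of the training doors (office_5 stays strictly test-only):

from sklearn.model_selection import train_test_split

X = feature_columns  # model inputs; the target y = ['frame_material'] was set above

# Hold out 20% of the training doors for monitoring during fitting;
# the unseen building (office_5) is never touched here.
train, val = train_test_split(
    train,
    test_size=0.2,
    stratify=train['frame_material'],
    random_state=42,
)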

The Models: Simple, Interpretable, Powerful

1. CatBoost

Chosen for its native handling of categorical features and ease of use. CatBoost gives robust results on structured, tabular data, making it ideal for the BIM world.
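
The parameters below reference a `categorical` list; a minimal way to derive it, assuming the object-dtype columns are exactly the categorical features:

# Assumption: every object-dtype feature column is categorical.
categorical = [c for c in feature_columns if data[c].dtype == 'object']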

from catboost import CatBoostClassifier

params_office_5 = {
    'verbose': 100,
    'random_seed': 42,
    'cat_features': categorical,  # categorical feature names, derived above
    'learning_rate': 0.15,
}
model_catboost = CatBoostClassifier(**params_office_5)
model_catboost.fit(train[X], train[y], eval_set=(val[X], val[y]))

2. LightGBM

A gradient boosting framework known for speed and efficiency. Good for quick experiments and competitive baselines.

from lightgbm import LGBMClassifier

# rand is a hyperparameter search (sketched below); best_params_ holds the winning set.
# Assumes categorical columns were cast to pandas 'category' dtype so LightGBM
# can handle them natively.
model_lgbm = LGBMClassifier(**rand.best_params_)
model_lgbm.fit(train[X], train[y].values.ravel(),
               eval_set=[(val[X], val[y].values.ravel())])
-----------------------------------------------------
LGBMClassifier(colsample_bytree=0.9, learning_rate=0.03, max_depth=9,
min_child_samples=10, n_estimators=400, num_leaves=127,
reg_lambda=1.0)
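
The search that produced `rand` isn’t shown in the post; here is a hedged sketch of what it could look like. The grid values are illustrative, not the ones actually used:

from sklearn.model_selection import RandomizedSearchCV
from lightgbm import LGBMClassifier

# Illustrative search space; assumes categoricals are 'category' dtype.
param_distributions = {
    'n_estimators': [200, 400, 800],
    'learning_rate': [0.01, 0.03, 0.1],
    'max_depth': [6, 9, 12],
    'num_leaves': [31, 63, 127],
    'min_child_samples': [10, 20, 50],
    'colsample_bytree': [0.7, 0.9, 1.0],
    'reg_lambda': [0.0, 0.1, 1.0],
}

rand = RandomizedSearchCV(
    LGBMClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=30,
    scoring='f1_macro',  # matches the macro F1 used for evaluation later
    cv=3,
    random_state=42,
)
rand.fit(train[X], train[y].values.ravel())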

3. Random Forest

Classic and transparent, though it requires one-hot encoding for categorical features. It’s always a solid benchmark, and often surprisingly strong.

from sklearn.ensemble import RandomForestClassifier

params_office_5 = {
    'n_estimators': 150,
    'min_samples_split': 10,
    'class_weight': 'balanced',
    'max_features': 'log2',
    'max_depth': 15,
    'random_state': 42,
}
model_rfc = RandomForestClassifier(**params_office_5)
model_rfc.fit(X_train_ohe, train[y].values.ravel())
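
The encoding itself isn’t shown above; a minimal sketch of how `X_train_ohe` and `X_test_ohe` could be built. handle_unknown='ignore' matters here, because the unseen building can contain room names the encoder never saw during training:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# One-hot encode the categoricals; pass numeric columns through unchanged.
# handle_unknown='ignore' keeps the transform from failing on categories
# that appear only in the unseen test building.
encoder = ColumnTransformer(
    [('cat', OneHotEncoder(handle_unknown='ignore'), categorical)],
    remainder='passthrough',
)
X_train_ohe = encoder.fit_transform(train[X])
X_test_ohe = encoder.transform(test[X])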

How Did They Do?

Each model was evaluated using macro F1 scores and confusion matrices to understand not just overall accuracy, but also where the models made mistakes. Here are a few key takeaways (the confusion-matrix code is sketched after the list):

  • CatBoost was the top performer, handling both numeric and categorical features well, with a high overall accuracy and balanced performance across classes.
f1_score(test['frame_material'], test['prediction_frame_material_catboost'], average='macro')
-----------------------------------------------------
0.9346532985087201
The confusion matrix shows that the CatBoost model performs well overall, with most predictions for both classes (“A” and “S”) being correct. There is a small number of misclassifications between the two: 9 samples of “A” are predicted as “S”, and 12 samples of “S” are predicted as “A”. The model distinguishes the classes well, with some residual overlap that may stem from feature similarity or data noise. Performance on the majority class (“S”) is slightly better, which is typical for imbalanced datasets.
  • Random Forest also did well, but had more confusion between some classes — showing that encoding and model choice can make a difference.
f1_score(test['frame_material'], test['prediction_frame_material_rfc'], average='macro')
-----------------------------------------------------
0.8892929938851365
The confusion matrix for the Random Forest model shows that most predictions are correct, especially for class “S” (441 out of 458). However, the model confuses 18 samples of class “A” with “S” and 17 samples of “S” with “A”, indicating that the two classes are somewhat harder for this model to separate.
  • LightGBM struggled with class imbalance, tending to over-predict the majority class.
f1_score(test['frame_material'], test['prediction_frame_material_lgbm'], average='macro')
-----------------------------------------------------
0.8264818729935008
The confusion matrix for LightGBM shows that the model tends to over-predict the ‘S’ class at the expense of the ‘A’ class. While it correctly classifies most of the ‘S’ samples (449 out of 458), it struggles with the ‘A’ class, misclassifying 39 out of 96 ‘A’ instances as ‘S’. This suggests that the model may be biased toward the majority class or not capturing enough signal to separate ‘A’ from ‘S’. Further tuning or data balancing techniques may be needed to improve recall for the minority class.
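
For reference, the confusion matrices discussed above can be produced with scikit-learn; a minimal sketch for the CatBoost predictions (the same pattern applies to the other two models):

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Rows are true labels, columns are predictions, in the order given by `labels`.
labels = ['A', 'S']
cm = confusion_matrix(
    test['frame_material'],
    test['prediction_frame_material_catboost'],
    labels=labels,
)
ConfusionMatrixDisplay(cm, display_labels=labels).plot()
plt.show()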

Ensemble Voting:
By combining predictions — either through a simple majority vote or by averaging the predicted probabilities — the ensemble approach produced more robust and reliable results, particularly for those cases where the mean prediction probability exceeded 85%.

proba_cat = model_catboost.predict_proba(test[X])
proba_lgbm = model_lgbm.predict_proba(test[X])
proba_rfc = model_rfc.predict_proba(X_test_ohe)

# Averaging the probabilities assumes all three models order their classes
# identically; verify with model.classes_ on each before combining.
mean_proba = (proba_cat + proba_lgbm + proba_rfc) / 3

final_pred = mean_proba.argmax(axis=1)

class_labels = model_catboost.classes_

test['final_pred'] = class_labels[final_pred]
max_mean_proba = mean_proba.max(axis=1)
test['mean_pred_proba'] = max_mean_proba

confident_preds = test[test['mean_pred_proba'] >= 0.85]

confident_preds[['final_pred', 'mean_pred_proba', 'frame_material'] + list(X)].head()
-----------------------------------------------------
| final_pred | mean_pred_proba | frame_material | floor | door_leaf | width | r_l_opening | host_width | host_material | from_room | to_room | from_room_area | to_room_area |
|------------|-----------------|----------------|-------|-----------|-------|-------------|------------|---------------|--------------|-----------|----------------|--------------|
| S | 0.921 | S | 2_UG | 1FL | 1.010 | L | 0.250 | STB | Schleuse | Lager | 10.28 | 10.88 |
| S | 0.916 | S | 2_UG | 1FL | 1.135 | R | 0.250 | STB | Schleuse | TRH | 9.48 | 26.19 |
| S | 0.908 | S | 2_UG | 1FL | 1.135 | L | 0.250 | STB | Tiefgarage | Schleuse | 1731.49 | 9.48 |
| S | 0.912 | S | 2_UG | 1FL | 1.135 | R | 0.250 | STB | Schleuse | TRH | 11.07 | 33.02 |
| S | 0.921 | S | 2_UG | 1FL | 1.135 | R | 0.175 | MWK | Tiefgarage | Schleuse | 1731.49 | 11.07 |

For door frames where the model’s confidence was lower, it makes sense to leave these fields unfilled, allowing for manual review and expert input. This ensures that only high-certainty predictions are automated, while ambiguous cases can be handled with domain knowledge, supporting both accuracy and practical decision-making in real projects.
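
One way to implement that hand-off is to write back only the confident predictions and leave the rest empty for review (the column name here is mine):

import numpy as np

# Fill only high-confidence predictions; leave the rest for expert review.
test['frame_material_auto'] = np.where(
    test['mean_pred_proba'] >= 0.85,
    test['final_pred'],
    None,  # unfilled -> flagged for manual checking
)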

What Did the Model Learn?

To understand what drove the model’s decisions, I looked at feature importance from both CatBoost and LightGBM. CatBoost ranked door width highest, followed by the room context (floor, host_material, from_room, to_room). LightGBM’s split-based importance, by contrast, is dominated by the continuous features (width, the room areas, host_width); split counts tend to favor continuous features, which get split more often, so the two rankings are not directly comparable.

model_catboost.get_feature_importance(prettified=True)
-----------------------------------------------------
| Feature Id | Importances |
|------------------|-------------|
| width | 22.68 |
| floor | 12.12 |
| host_material | 11.63 |
| from_room | 11.60 |
| to_room | 10.73 |
| door_leaf | 10.46 |
| host_width | 8.31 |
| to_room_area | 4.97 |
| from_room_area | 3.84 |
| r_l_opening | 3.66 |
feature_importances = model_lgbm.feature_importances_
feature_names = model_lgbm.feature_name_
df_feature_importances = pd.DataFrame({
    'Feature Name': feature_names,
    'Importance (Split)': feature_importances,
})
df_feature_importances = (
    df_feature_importances
    .sort_values(by='Importance (Split)', ascending=False)
    .reset_index(drop=True)
)

total_importance = df_feature_importances['Importance (Split)'].sum()
df_feature_importances['Importance (%)'] = (
    df_feature_importances['Importance (Split)'] / total_importance * 100
)

df_feature_importances
-----------------------------------------------------
| Feature Name | Importance (Split) | Importance (%) |
|----------------|--------------------|----------------|
| width | 5786 | 24.51 |
| to_room_area | 5766 | 24.42 |
| from_room_area | 5417 | 22.95 |
| host_width | 4968 | 21.04 |
| r_l_opening | 902 | 3.82 |
| to_room | 307 | 1.30 |
| from_room | 239 | 1.01 |
| floor | 204 | 0.86 |
| door_leaf | 18 | 0.08 |
| host_material | 1 | 0.00 |
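
Since split counts tend to overstate continuous features, gain-based importance usually gives a picture closer to CatBoost’s. A quick way to check (sketch):

# Gain-based importance weighs each split by the loss reduction it achieves,
# which treats categorical features more fairly than raw split counts.
gain = model_lgbm.booster_.feature_importance(importance_type='gain')
df_gain = pd.DataFrame({
    'Feature Name': model_lgbm.feature_name_,
    'Importance (Gain)': gain,
}).sort_values('Importance (Gain)', ascending=False)
df_gain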

Limitations and Next Steps

This workflow is a proof of concept — not a turnkey solution. If you want to use ML for real design automation or audit, you’ll need:

  • More training data (from a wider range of projects)
  • Further feature engineering
  • Careful architecture selection and model tuning
  • Robust validation and uncertainty handling for real-world use

But even with “simple” models and features, we saw that patterns emerge — and can be detected — across projects.

The Takeaway

You don’t need deep learning or a massive BIM data lake to get actionable insights from your past projects. With well-prepared data and classical ML models, you can already:

  • Detect design regularities
  • Spot outliers or non-standard elements
  • Support knowledge transfer between projects

Curious to try this in your office?
Start by standardizing your data extraction, feature engineering, and basic ML workflows. As this experiment shows, the results might surprise you.

The code for this project is available on GitHub.
