A Hardware Wonk’s Guide to Specifying the Best 3D and BIM Workstations

By Matthew Stachoni

“Wow, this workstation is just way too fast for me.”
— No one. Ever.

Working with today’s leading Building Information Modeling (BIM) and 3D visualization tools presents a special challenge to your IT infrastructure. Wrestling with the computational demands of the Revit software platform, as well as BIM-related applications such as 3ds Max, Navisworks, Rhino, Lumion, and others, means that one needs the right knowledge to make sound investments in workstation hardware. This article gets inside the mind of a certified (or certifiable) hardware geek to understand the variables to consider when purchasing hardware to support the demands of these BIM and 3D applications.

Specifying new BIM/3D workstations, particularly ones tuned for Autodesk’s 3D and BIM applications, can be a daunting task given all of the choices you have. You can spend quite a bit of time wading through online reviews and forums, and talking with salespeople who don’t understand what you do on a daily basis. Moreover, recent advancements in both hardware and software often challenge preconceptions of what is important.

Computing hardware had long ago met the relatively low demands of 2D CAD, but data-rich 3D BIM and visualization processes will tax any workstation to some extent. Many of the old CAD rules no longer apply; you are not working with small project files, as individual project assets can exceed a gigabyte as the BIM data grows and modeling gets more complex. The number of polygons in your 3D views in even modest models can be huge. Additionally, Autodesk’s high-powered BIM and 3D applications do not exactly fire up on a dime.

Today there exists a wide variety of tools to showcase BIM projects, so users who specialize in visualization will naturally demand the most powerful workstations you can find. However, the software barrier to entry for high-end visualization is dropping dramatically, as modern applications emerge that are both easy to learn and capable of creating incredible photorealistic images.

The capability and complexity of the tools in Autodesk’s various suites and collections improve with each release, and those capabilities can take their toll on your hardware. Iterating through adaptive components in Revit or using advanced rendering technologies such as the Iray engine in 3ds Max will tax your workstation’s subsystems differently. Knowing how to best match your hardware to your software challenges is important.

Disclaimer: In this article, I will often make references and tacit recommendations for specific system components. These are purely my opinion, stemming largely from extensive personal experience and research in building systems for myself, my customers, and my company. Use this handout as a source of technical information and a buying guide, but remember that you are spending your own money (or the money of someone you work for). Thus, the onus is on you to do your own research when compiling your specifications and systems. I have no vested interest in any component manufacturer and make no endorsements of any specific product mentioned in this article.

Identifying Your User Requirements

The first thing to understand is that one hardware specification does not fit all user needs. You must understand your users’ specific computing requirements. In general I believe we can classify users into one of three use-case scenarios and outfit them with a particular workstation profile.

1. The Grunts: These folks use Revit day in and day out, and rarely step outside of that to use more sophisticated software. They are typically tasked with the mundane jobs of project design, documentation, and project management, but do not regularly create complex, high end renderings or extended animations. Revit is clearly at the top of the process-consumption food chain, and nothing else they do taxes their system more than that. However, many Grunts will evolve over time into more complex workloads, so their workstations need to handle at least some higher-order functionality without choking.

2. The BIM Champs: These are your BIM managers and advanced users who not only use Revit all day for production support, but delve into the nooks and crannies of the program to help turn the design concepts into modeled reality. They not only develop project content, but create Dynamo scripts, manage models from a variety of sources, update and fix problems, and so on. BIM Champs may also regularly interoperate with additional 3D modeling software such as 3ds Max, Rhino, Lumion, and SketchUp, and pull light to medium duty creating visualizations. As such their hardware needs are greater than those of the Grunt, although perhaps in targeted areas.

3. The Viz Wizards: These are your 3D and visualization gurus who may spend as much time in visualization applications as they do in Revit. They routinely need to push models into and out of 3ds Max, Rhino, Maya, InfraWorks 360, SketchUp, and others. They run graphics applications such as Adobe’s Photoshop, Illustrator, and others — often concurrently with Revit and 3ds Max. They may extensively use the real-time rendering found in Unreal Engine 4 and Lumion. These users specialize in photorealistic renderings and animations, and develop your company’s hero imagery. The Viz Wiz will absolutely use as much horsepower as you can throw at them.

Ideally, each one of these kinds of users would be assigned a specific kind of workstation that is fully optimized for their needs. Given that you may find it best to buy systems in bulk, you may be tempted to specify a single workstation configuration for everyone without consideration for specific user workloads. I believe this is a mistake, as one size does not fit all. On the other hand, large disparities between systems can be an IT headache to maintain. Our goal is to establish workstation configurations that target these three specific user requirement profiles.

Industry Pressures and Key Trends

In building out any modern workstation or IT system, we need to first recognize the size of the production problems we are working with, and understand what workstation subsystems are challenged by a particular task. Before we delve too deeply into the specifics of hardware components, let’s review some key hardware industry trends which shape today’s state of the art and drive the future of computing:

  • Maximizing Performance per Watt (PPW)
  • The slowdown of yearly CPU performance advancement and the potential end of Moore’s Law
  • Realizing the potential that parallelism, multithreading, and multiprocessing bring to the game
  • Understanding the impact of PC gaming and GPU-accelerated computing for general design
  • Increased adoption of virtualization and cloud computing
  • Tight price differentials between computer components

Taken together these technologies allow us to scale workloads up, down, and out.

Maximizing Performance per Watt and Moore’s Law

Every year Intel, Nvidia, and AMD release new iterations of their hardware, and every year their products get faster, smaller, and cooler. Sometimes by a little, sometimes by a lot. A key design criterion in today’s microprocessor fabrication process is to maximize energy efficiency, measured in Performance per Watt, or PPW.

For years, the rate of improvement in integrated circuit design has been predicted quite accurately by Gordon E. Moore, a co-founder of Intel. Moore’s Law, first coined in his 1965 paper, “Cramming More Components onto Integrated Circuits,” is the observation that, over the history of computing hardware, the number of transistors in an integrated circuit has roughly doubled approximately every two years.

[Figure: Transistor count and Moore’s Law, from 1971 to 2011. Note the logarithmic vertical scale.]

How Transistors Work

A transistor is, at its heart, a relatively simple electrically driven switch that controls signaling current between two points. When the switch is open, no current flows and the signal has a value of 0. When the switch is closed, the current flows and you get a value of 1. We combine transistors together into larger circuits that can perform logical operations. Thus, the number of transistors on a processor directly determines what that chip can do, so cramming more of them in a certain amount of space is a critical path to performance improvement.
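The switch idea in the paragraph above can be sketched in code. This is a toy Python model, an idealized illustration rather than real CMOS (which builds gates from complementary n-type and p-type pairs), showing how simple voltage-controlled switches compose into logic gates:

```python
def nmos(gate: int, signal: int) -> int:
    """An idealized n-type MOSFET: passes the signal only when the gate is high."""
    return signal if gate == 1 else 0

def not_gate(a: int) -> int:
    # Idealized inversion; a real CMOS inverter pairs an n-MOSFET with a p-MOSFET.
    return 1 - a

def and_gate(a: int, b: int) -> int:
    # Two switches in series: current flows only if both gates are on.
    return nmos(b, nmos(a, 1))

def nand_gate(a: int, b: int) -> int:
    # NAND is AND followed by NOT. NAND is functionally complete:
    # every other gate can be built out of NANDs alone.
    return not_gate(and_gate(a, b))

def or_gate(a: int, b: int) -> int:
    # OR via De Morgan's law: NOT(NOT a AND NOT b)
    return nand_gate(not_gate(a), not_gate(b))
```

Each gate here costs a handful of switches, which is why transistor count translates so directly into what a chip can do.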

The most common transistor design is called a metal-oxide-semiconductor field-effect transistor, or MOSFET, which is a building block of today’s integrated circuits. Fundamentally, a MOSFET transistor has four parts: a source, a drain, a channel that connects the two, and a gate on top to control the channel. When the control gate has a positive voltage applied to it, it generates an electrical field that attracts negatively charged electrons in the channel underneath the gate, which then becomes a conductor between the source and drain. The switch is turned on.

Making transistors smaller is primarily accomplished by shrinking the space between the source and drain. This space is determined by the semiconductor technology node using a particular lithography fabrication process. A node/process is measured in nanometers (nm), or millionths of a millimeter.

Moore’s Law describes an exponential function, so the absolute rate of change keeps increasing, and it has largely held true until just recently. Every two to four years a new, smaller technology node makes its debut, and the fabrication process has shrunk from 10,000 nm (10 microns) wide in 1971 to only 14 nm wide today. To give a sense of scale, a single human hair is about 100,000 nm (100 microns) wide. Moving from 10,000 nm to only 14 nm is equivalent to shrinking a person who is 5 feet 6 inches tall down to the size of a grain of rice.
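The scale comparison above is easy to check with a quick back-of-the-envelope calculation (the rice-grain comparison at the end is approximate):

```python
# Process shrink: 10,000 nm (1971) down to 14 nm (today)
old_nm, new_nm = 10_000, 14
shrink = old_nm / new_nm                  # ~714x linear shrink

# Apply the same shrink to a 5'6" person (66 inches, in meters)
person_m = 66 * 0.0254
shrunk_mm = person_m / shrink * 1000      # result in millimeters

print(f"{shrink:.0f}x shrink -> {shrunk_mm:.1f} mm")  # a few mm: rice-grain scale
```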

Accordingly, transistor count has gone up from 2,300 transistors to somewhere between 1.35–2.6 billion transistors in today’s CPU models. Think about this: Boston Symphony Hall holds about 2,370 people (during Pops season). The population of China is about 1.357 billion people. Now squeeze the entire population of China into Boston Symphony Hall. That’s Moore’s Law for the past 45 years.

As a result of a smaller fabrication process, integrated circuits use less energy and produce less heat, which also allows for more densely packed transistors on a chip. In the late 1990s into the 2000s the trend was to increase on-die transistor counts and die sizes, but with the fabrication process still in the 60 nm to 90 nm range, CPUs simply got a lot larger. Energy consumption and heat dissipation became serious engineering challenges, and led to a new market of exotic cooling components such as large fans, CPU coolers with heat pipes, closed-loop water cooling solutions with pumps, reservoirs, and radiators, and even submerging the entire PC in a vat of mineral oil. Clearly, the future of CPU microarchitectures depended on shrinking the fabrication process for as long as technically possible.

Today’s 14 nm processors and 14–16 nm GPUs are not only physically smaller, but also have advanced internal power management optimizations that reduce power (and thus heat) when it is not required. Increasing PPW allows higher performance to be stuffed into smaller packages and platforms, which opened the floodgates to the vast development of mobile technologies that we all take for granted.

This had two side effects. First, the development of more powerful, smaller, cooler-running, and largely silent CPUs and GPUs allows you to stuff more of them into a single workstation without it cooking itself. At the same time, CPU clock speeds have been able to rise from about 2.4 GHz to 4 GHz and beyond.

Secondly, complex BIM applications can now extend from the desktop to mobile platforms, such as actively modeling in 3D using a small laptop during design meetings, running clash detections at the construction site using tablets, or using drone-mounted cameras to capture HD imagery.

Quantum Tunneling and the Impending End of Moore’s Law

While breakthroughs in MOSFET technology have enabled us to get down to a 14-nm process, we are starting to see the end of Moore’s law on the horizon. The space between the source and drain at 14 nm is only about 70 silicon atoms wide. At smaller scales, the ability to control current flow across a transistor without leakage becomes a significant problem.

By 2026 we expect to get down to a 5-nm process, which is only about 25 atoms wide. This 5-nm node is often assumed to be the practical end of Moore’s Law, as transistors smaller than 7 nm experience a sharp increase in something called “quantum tunneling,” which impairs transistor function. Quantum tunneling is the strange effect that arises when the process becomes so small that electrons have a significant probability of simply passing through the gate barrier. That leakage keeps the switch from doing its job reliably, and it puts a floor on how small a trustworthy transistor can be. To combat this, engineers have come up with 3D gate designs tall enough to minimize the probability of tunneling, but the pace of process shrinks is slowing. To paraphrase Intel Fellow Mark Bohr, we are simply running out of atoms to play with.

In the end, however, the future of microprocessor design will rely much less on shrinking the process and much more on clever, innovative rethinking of micro-architectures and superscalar system design. These kinds of improvements, though, will likely be much less dramatic than what we have traditionally experienced over recent years. In fact, our discussion of the latest Intel CPUs reflects exactly this trend.

Parallel Processing, Multiprocessing, and Multi-threading

It has long been known that key problems associated with BIM and 3D visualization, such as energy modeling, photorealistic imagery, and engineering simulations, are simply too big for a single processor to handle efficiently. Many of these problems are highly parallel in nature, where large tasks can often be neatly broken down into smaller ones that don’t rely on each other to finish before the next one can be worked on. This led to the development of operating systems that support multiple CPUs.

First, some terminology on CPUs and cores. According to Microsoft, “systems with more than one physical processor or systems with physical processors that have multiple cores provide the operating system with multiple logical processors. A logical processor is one logical computing engine from the perspective of the operating system, application or driver. A core is one processor unit, which can consist of one or more logical processors. A physical processor can consist of one or more cores. A physical processor is the same as a processor package, a socket, or a CPU.”

In other words, an operating system such as Windows 10 will see a single physical CPU that has four cores as four separate logical processors, each of which can have threads of operation scheduled and assigned. The 64-bit versions of Windows 7 and later support more than 64 logical processors on a single computer. This functionality is not available in 32-bit versions of Windows.

All modern processors and operating systems fully support both multiprocessing — the ability to push separate processes to multiple CPU cores in a system — and multi-threading, the ability to execute separate threads of a single process across multiple processors. Processor technology has evolved to meet this demand, first by allowing multiple physical CPUs on a motherboard, then by introducing more efficient multi-core designs in a single CPU package. The more cores your machine has, the snappier your overall system response, and the faster any highly parallel compute-intensive task such as rendering will complete.
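The logical-processor picture above is easy to see from code. A minimal Python sketch using only the standard library: ask the OS how many logical processors it exposes, then fan independent chunks of work out to a worker pool. (CPython threads share one interpreter lock, so a real CPU-bound renderer would use processes or native threads; the fan-out pattern is the same.)

```python
import concurrent.futures
import os

# Each core the OS exposes is a "logical processor" that can be
# scheduled independently.
logical_cpus = os.cpu_count()

def busy_work(n: int) -> int:
    # Stand-in for a compute-heavy, fully independent chunk of work.
    return sum(i * i for i in range(n))

# Fan eight independent chunks out across a pool of workers.
# (Threads are used here for simplicity; see the caveat above.)
with concurrent.futures.ThreadPoolExecutor(max_workers=logical_cpus) as pool:
    results = list(pool.map(busy_work, [10_000] * 8))
```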

These kinds of non-sequential workloads can be distributed to multiple processor cores on a CPU, multiple physical CPUs in a single PC, or even out to multiple physical computers that will chew on that particular problem and return results that can be aggregated later. Over time we’ve all made the mass migration to multi-core computing even if we aren’t aware of it, even down to our tablets and phones.

In particular, 3D photorealistic rendering lends itself very well to parallel processing. The ray tracing pipeline used in today’s rendering engines involves sending out rays from various sources (lights and cameras), accurately bouncing them off of or passing through objects they encounter in the scene, changing the data “payload” in each ray as it picks up physical properties from the object(s) it interacts with, and finally returning a color pixel value to the screen. This process is computationally expensive as it has to be physically accurate, and can simulate a wide variety of visual effects, such as reflections, refraction of light through various materials, shadows, caustics, blooms, and so on.

You can see this parallel processing in action when you render a scene using the mental ray rendering engine. mental ray renders scenes in separate tiles called buckets. Each processor core in your CPU is assigned a bucket and renders it before moving to the next one. The number of buckets you see corresponds to the number of cores available. The more cores, the more buckets, and the faster the rendering.
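The bucket scheme described above can be sketched generically: split the frame into tiles, render each tile independently, and composite the results. A toy Python sketch follows; the per-pixel "shading" function is just a stand-in, not a real ray tracer:

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT, BUCKET = 64, 48, 16  # a tiny frame with 16x16-pixel buckets

def shade(x: int, y: int) -> int:
    # Placeholder per-pixel shading; a real engine traces rays here.
    return (x * 31 + y * 17) % 256

def render_bucket(origin):
    # Each bucket is fully independent: no bucket waits on another.
    x0, y0 = origin
    return origin, [[shade(x, y) for x in range(x0, x0 + BUCKET)]
                    for y in range(y0, y0 + BUCKET)]

buckets = [(x, y) for y in range(0, HEIGHT, BUCKET)
                  for x in range(0, WIDTH, BUCKET)]

frame = [[0] * WIDTH for _ in range(HEIGHT)]
with ThreadPoolExecutor() as pool:           # workers pick up buckets as they free up
    for (x0, y0), tile in pool.map(render_bucket, buckets):
        for dy, row in enumerate(tile):      # composite the tile into the frame
            frame[y0 + dy][x0:x0 + BUCKET] = row
```

More workers means more buckets in flight at once, which is exactly why core count maps so directly to render time.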

Autodesk recognized the benefits of parallelization and provides the Backburner distributed rendering software with 3ds Max. You can create your own rendering farm where you send a rendering job out to multiple computers on your local area network; each renders a portion of the whole and sends its finished piece back to be assembled into a single image or animation. With enough machines, what would take a single PC hours can be created in a fraction of the time.

Just running an operating system and multiple concurrent applications is, in many ways, a parallel problem as well. Even without running any applications, a modern OS has many background processes running at the same time, such as the security subsystem, anti-virus protection, network connectivity, disk I/O, and the list goes on. Each of your applications may run one or more separate processes as well, and processes themselves can spin off separate threads of execution. For example, Revit’s rendering process is separate from the host Revit.exe process. In AutoCAD, the Visual LISP subsystem runs in its own separate thread.

While today you can maximize efficiency for highly parallel CPU workloads by outfitting a workstation with multiple physical CPUs, each with multiple cores, doing so is significantly expensive and a case of diminishing returns. Other advancements point in different directions than simply piling on CPU cores.

The Road to GPU Accelerated Computing and the Impact of Gaming

Recognizing the parallel nature of many graphics tasks, graphics processing unit (GPU) designers at AMD and Nvidia have created micro-architectures that are massively multiprocessing in nature and are fully programmable to boot. Given the right combination of software and hardware, we can now offload compute-intensive parallelized portions of a problem to the graphics card and free up the CPU to run other code. In fact these new GPU-compute tasks do not have to be graphics related, but could model weather patterns, run acoustical analysis, perform protein folding, and work on other complex problems.

Fundamentally, CPUs and GPUs process tasks differently, and in many ways the GPU represents the future of parallel processing. GPUs are specialized for compute-intensive, highly parallel computation — exactly what graphics rendering is about — and are therefore designed such that more transistors are devoted to raw data processing rather than data caching and flow control.

A CPU consists of a few — from 2 to 8 in most systems — relatively large cores which are optimized for sequential, serialized processing, executing a single thread at a very fast rate, between 3 and 4 GHz. Conversely, today’s GPU has a massively parallel architecture consisting of thousands of much smaller, highly efficient cores designed to execute many concurrent threads more slowly — between 1 and 2 GHz.

The GPU’s physical chip is also larger. With thousands of smaller cores, a GPU can have three to four times as many transistors on the die as a CPU. Indeed, it is by increasing the PPW that the GPU can cram so many cores into a single die.
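A rough back-of-the-envelope comparison shows why this architectural difference matters for parallel work. The core counts and clock speeds below are illustrative round numbers, not a specific product, and the math deliberately ignores IPC, SIMD width, and memory bandwidth:

```python
# Naive peak throughput ~ cores x clock. This is only an intuition pump;
# real performance depends on IPC, SIMD width, memory bandwidth, and more.
cpu_cores, cpu_ghz = 8, 4.0        # a typical workstation CPU (illustrative)
gpu_cores, gpu_ghz = 2560, 1.6     # a high-end consumer GPU (illustrative)

cpu_ops = cpu_cores * cpu_ghz      # "giga-operations" per second, naively
gpu_ops = gpu_cores * gpu_ghz

advantage = gpu_ops / cpu_ops      # ~128x on embarrassingly parallel work
```

The catch, of course, is that only embarrassingly parallel workloads (like shading millions of independent pixels) can actually use all of those slow, simple cores at once.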

Real Time Rendering in Gaming

Back in olden times traditional GPUs used a fixed-function pipeline, and thus had a much more limited scope of work they could perform. They did not really think at all, but simply mapped function calls from the application through the driver to dedicated logic in the GPU that was designed to support them in a hard-coded fashion. This led to all sorts of video driver-related issues and false optimizations.

Today’s graphics data pipeline is much more complex and intelligent. It is composed of a series of steps used to create a 2D raster representation from a 3D scene in real time. The GPU is fed 3D geometric primitive, lighting, texture map, and instructional data from the application. It then works to transform, subdivide, and triangulate the geometry; illuminate the scene; rasterize the vector information to pixels; shade those pixels; assemble the 2D raster image in the frame buffer; and output it to the monitor.

In games, the GPU needs to do this as many times a second as possible to maintain smoothness of play. For example, a detailed dissection of a rendered frame from Grand Theft Auto V reveals a highly complex rendering pipeline. The 3D meshes that make up the scene are culled and drawn in lower and higher levels of detail depending on their distance from the camera. Even the lights that make up an entire city nighttime scene are individually modeled — that’s tens of thousands of polygons being pushed to the GPU.

The rendering pipeline then performs a large array of multiple passes, rendering out many High Dynamic Range (HDR) buffers. These are screen-sized bitmaps of various types, such as diffuse, specular, normal, irradiance, alpha, shadow, reflection, etc. Along the way it applies effects for water surfaces, subsurface scattering, atmosphere, sun and sky, and transparencies. Then it applies tone mapping (i.e., photographic exposure), which converts the HDR information to a Low Dynamic Range (LDR) space. The scene is then anti-aliased to smooth out jagged edges of the meshes, a lens distortion is applied to make things more film-like, and the user interface (e.g., health, status, the mini-map of the city) is drawn on top of the scene. Finally, post effects are applied, such as lens flares, light streaks, anamorphic lens artifacts, heat haze, and depth of field to blur out things that are not in focus.

A game like GTA V needs to do all of this about 50 to 60 times a second to make the game playable. But how can all of these very highly complex steps be performed at such a high rate?

Shaders

Today’s graphics pipelines are manipulated through small programs called shaders, which work on scene data to make complex effects happen in real time. Both OpenGL and Direct3D (part of the DirectX multimedia API for Windows) are 3D graphics APIs that moved from the old-timey fixed-function hard-coded model to the newer programmable shader-based model (in OpenGL 2.0 and DirectX 8.0, respectively).

Shaders work on a specific aspect of a graphical object and pass it on to the next step in the pipeline. For example, a vertex shader processes vertices, performing transformation, skinning, and lighting operations. It takes a single vertex as an input and produces a single modified vertex as the output. Geometry shaders process entire primitives consisting of multiple vertices, edges, and polygons. Tessellation shaders subdivide simpler meshes into finer ones, allowing for level-of-detail scaling. Pixel shaders compute color and other attributes, such as bump mapping, shadows, specular highlights, and so on.
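A vertex shader’s contract, one vertex in and one transformed vertex out, can be mimicked in plain code. This Python sketch applies a transform matrix uniformly over every vertex, the way a shader would; the matrix and mesh data are made up for illustration, and a real shader would use 4x4 matrices in homogeneous coordinates and run on the GPU:

```python
# A vertex shader is a pure function: one vertex in, one vertex out.
def vertex_shader(vertex, matrix):
    x, y, z = vertex
    # Multiply the 3-vector by a 3x3 transform matrix (real shaders use
    # 4x4 matrices in homogeneous coordinates).
    return tuple(sum(m * v for m, v in zip(row, (x, y, z)))
                 for row in matrix)

# Uniform scale by 2 -- a stand-in for a model-view-projection matrix.
mvp = ((2, 0, 0),
       (0, 2, 0),
       (0, 0, 2))

mesh = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]   # one triangle, three vertices

# The GPU runs this same function in parallel across thousands of cores;
# here we map it sequentially.
transformed = [vertex_shader(v, mvp) for v in mesh]
```

Because every vertex is processed independently by the same small program, the workload parallelizes perfectly, which is exactly what the GPU’s thousands of cores are built for.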

Shaders are written to apply transformations to a large set of elements at a time, which is very well suited to parallel processing. This dovetails with newer GPUs with many cores to handle these massively parallel tasks, and modern GPUs have multiple shader pipelines to facilitate high computational throughput. The DirectX API, released with each version of Windows, regularly defines new shader models which increase programming flexibility and capability.

Modernizing Traditional Professional Renderers

Two of the primary 3D rendering engines in Autodesk’s AEC collection of applications are Nvidia’s mental ray and the new Autodesk Raytracer. With the recent acquisition of Solid Angle, 3ds Max and Maya now have the Arnold rendering engine as well, which may make it into Revit and other applications in the future. All support real-world materials and photometric lights for producing photorealistic images.

However, mental ray is owned and licensed by Nvidia, and Autodesk pays a licensing fee for each application it ships with. Autodesk simply takes the core mental ray code and retrofits a user interface around it for Revit, 3ds Max, etc.

Additionally, mental ray is almost 30 years old, whereas the Autodesk Raytracer and, to a lesser extent, Arnold are brand new. Both ART and Arnold are physically based renderers, whereas mental ray uses caching algorithms such as Global Illumination and Final Gather to simulate the physical world. As such, both ART and Arnold are ideal for interactive rendering via ActiveShade in 3ds Max.

For end users the primary difference between ART/Arnold and mental ray is in simplicity and speed, where these newer engines can produce images much faster, more efficiently, and with far less tweaking than mental ray. ART and Arnold also produce images that are arguably of better rendering quality. Autodesk Raytracer is currently in use in AutoCAD, Revit, 3ds Max, Navisworks, and Showcase. Arnold ships with Maya, and Arnold 0.5 (also called MAXtoA) is available as a preview release add-in for 3ds Max 2017.

CPU vs. GPU Rendering with Iray

However, none of mental ray, ART, Arnold, or other popular third-party renderers like V-Ray Advanced use the computational power of the GPU to accelerate rendering tasks. Rendering with these engines is almost entirely a CPU-bound process, so a 3D artist’s workstation would need to be outfitted with multiple (and expensive) physical multi-core CPUs. As mentioned previously, you can significantly lower render times in 3ds Max by throwing more PCs at the problem via setting up a render farm using the included Backburner software. However, each node on the farm needs to be pretty well equipped, and Backburner’s reliability through a heavy rendering session has always been shaky, to say the least. That has a huge impact on how easily you can manage rendering workloads and deadlines.

Designed for rasterizing many frames of simplified geometry to the screen per second, GPUs were not meant for performing ray-tracing calculations. This is rapidly changing as most of a GPU’s hardware is now devoted to 32-bit floating point shader processors. Nvidia exploited this in 2007 with an entirely new GPU computing environment called CUDA (Compute Unified Device Architecture), which is a parallel computing platform and programming model established to provide direct access to the massive number of parallel computational elements in their CUDA GPUs. Non-CUDA platforms (that is to say, AMD graphics cards) can use the Open Computing Language (OpenCL) framework, which allows for programs to execute code across heterogeneous platforms — CPUs, GPUs, and others.

Using CUDA/OpenCL platforms, we have the ability to perform non-graphical, general-purpose computing on the GPU, often referred to as GPGPU, as well as accelerating graphics tasks such as calculating game physics.

One of the most compelling areas where GPU compute can directly affect Autodesk applications is with the Nvidia Iray rendering engine. Included with 3ds Max, Nvidia’s Iray renderer fully uses the power of a CUDA-enabled (read: Nvidia) GPU to produce stunningly photorealistic imagery. We’ll discuss this in more depth in the section on graphics. Given the nature of parallelism, I would not be surprised to see GPU compute technologies exploited for other uses across all future BIM applications.

Using Gaming Engines for Architectural Visualization

Another tack is to exploit technology we have now. We have advanced shaders and relatively cheap GPU hardware that harnesses them, creating beautiful imagery in real time. So instead of using them to blow up demons on Mars or check some fool on the ice, why not apply them to the task of design visualization?

The advancements made in today’s game engines are quickly competing with, and sometimes surpassing, what dedicated rendering engines like mental ray, V-Ray, and others can create. A game engine is a complete editing environment for working with 3D assets. You typically import model geometry from 3ds Max or Maya, then develop more lifelike materials, add photometric lighting and animations, and write custom programming code to react to gameplay events. Instead of the same old highly post-processed imagery or “sitting in a shopping cart being wheeled around the site” type animations, the result is a free-running “game” that renders in real time, which you and your clients can explore and interact with. While 3D immersive games have been around for ages, the difference is that the overall image quality in these new game engines is now incredibly high and certainly good enough for design visualization.

For example, you may be familiar with Lumion, a very popular real-time architectural visualization application. Lumion is powered by the Quest3D engine, which Act-3D developed long ago (before most gaming engines were commercially available) as a general 3D authoring tool; on top of it sits extensive shader work and other optimizations, an easy UI, and lots of prebuilt content.

Currently the most well-known gaming engines available are Unreal Engine 4 and Unity 5, which are quickly becoming co-opted by the building design community. What’s great about both is their cost to the design firm — they’re free. Both Unreal and Unity charge game publishers a percentage of their revenue, but for design visualizations, there is no charge. The user community is growing every day, and add-ons, materials, models, and environments are available that you can purchase and drop into your project.

Matt Stachoni has over 25 years of experience as a BIM, CAD, and IT manager for a variety of architectural and engineering firms, and has been using Autodesk software professionally since 1987. Matt is currently a BIM specialist with Microsol Resources, an Autodesk Premier Partner in New York City, Philadelphia, and Boston. He provides training, BIM implementation, specialized consultation services, and technical support for all of Autodesk’s AEC applications.

Want more? Read on by downloading the full class handout at AU online: A Hardware Wonk’s Guide to Specifying the Best 3D and BIM Workstations, 2016 Edition.
