Diving into Silicon for the First Time

Loïc *WydD* Petit
19 min read · Apr 17, 2020


As you may have seen in my previous blog post, I like arcade boards: trying to understand how they behave and sometimes fixing them. A year ago, I got my hands on a “Street Fighter II New Challengers” CPS2 game board, but it had graphic glitches as you can see below.

Vintage game and computer collectors are no strangers to these problems. Because of corrosion, component degradation, battery leaks or other issues, those boards eventually break in various ways. To fix them, either you are dealing with a recurring, documented issue, or you need to actually probe your board to check that each part is working as expected.

You might wonder: how do you do that? Well, usually you use schematics of the board and check whether each part behaves as you would expect. For instance, if you know that the graphics data goes through a specific set of wires, you can test each of them for activity. If one shows none, you have a starting point to figure out what’s wrong and fix it. Of course, doing this well requires skills and knowledge, but what I would like to emphasise is the need for schematics.

A CPS2 game board, about a thousand wires to test.

In my case, experience told me that the issue looked like a broken trace: one line was cut somewhere and couldn’t transmit its data… but where? Sadly, the schematics for the CPS2 board are not available. As with most boards of this era, if the original schematics have not leaked, all knowledge of the board comes from community effort. I wanted to do my part, so I started painstakingly tracing all the wires of this board, hoping to find my issue. Eventually, after weeks of work, I found the broken trace, patched it, and it worked! Because I had traced most of the wires, I launched myself head-first into a new mad initiative: recreating the complete schematics for the CPS2.

Side note: I stress that schematics are useful for repairing systems, but they are also useful for the emulation community, whose aim is to simulate how game boards work in order to play those games on modern hardware. The CPS2 has decent software emulators, but these are not 100% accurate, and lately people have been doing more and more hardware emulation, which definitely requires schematics.

A custom chip specifically made for this board.

Unfortunately, on this board, Capcom chose to use five custom-made integrated circuits. On older arcade boards, you often have well-known core components (processor, memory, sound…) bundled with a lot of very well-known basic chips, all wired together to create the final board. However, custom chips allow more compact designs, and also make it possible to obfuscate the design and add security measures against counterfeits and bootleg copies of the game. This actually proved effective, as no bootleg boards of CPS2 games have been created. But it makes understanding the board more complex because, obviously, no datasheet exists for those custom chips.

This will be our goal in this article. We are going to try to understand what one of the chips does, the DL-1827 shown in the picture above, by looking at its internals and analysing how it is structured.

How are custom chips created?

The die at the middle of a package

There are plenty of ways to create an integrated circuit and technology evolves a lot in this domain. But we’re just going to focus on the global idea. At the core of every integrated circuit, you have what is called a “die”. This is what implements the behaviour of your chip; the rest is just plastic or ceramic packaging and the pins connected to the die.

3D view of a small part of a die with 3 layers of metal in yellow

In order to create the die, the process is split into two parts. The first one aims to create a silicon base called the “wafer”, on which all the main components (transistors, resistors, capacitors…) are engraved. The second part creates multiple layers of metal wires in order to connect the components together.

As you may imagine, creating a new chip is a costly process. Moreover, it requires skills and knowledge that game companies like Capcom, staffed at the time with programmers and electronics engineers, did not have in-house. So, manufacturers came up with a way to develop application-specific custom chips: gate arrays.

The idea behind gate arrays is simple: instead of creating everything from scratch, you start with a common wafer that contains a regular grid of transistors. Because all logic functions can be built using transistors and wires, you can recreate all the building blocks that electronics engineers are used to (logic gates, latches, etc.). The manufacturer designs those building blocks, called “standard cells”, by preparing their wiring, and the engineers can then create their own circuit in this miniature technology.

As an example, say I want to build a 3-input NAND gate (i.e. the output is LOW only if the three inputs are all HIGH). To build it, I have an 8x5 grid of transistors and my standard cells are an “AND” cell, which uses 2x3 transistors, and a “NOT” cell, which uses 1x2 transistors. Therefore, I can create a working design like this:

A 3 input NAND gate built out of 2 AND cells and a NOT cell on an 8x5 grid.
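To check that this composition really behaves like a 3-input NAND, here is a minimal Python sketch. The function names `and_cell`, `not_cell` and `nand3` are made up for the illustration; they only model the logic of the cells, not their layout on the grid.

```python
from itertools import product


def and_cell(a: bool, b: bool) -> bool:
    """Logic of the 2-input "AND" standard cell."""
    return a and b


def not_cell(a: bool) -> bool:
    """Logic of the "NOT" standard cell."""
    return not a


def nand3(a: bool, b: bool, c: bool) -> bool:
    """3-input NAND assembled from two AND cells followed by one NOT cell."""
    return not_cell(and_cell(and_cell(a, b), c))


# Full truth table: the output is LOW only when all three inputs are HIGH.
truth_table = {
    inputs: nand3(*inputs) for inputs in product([False, True], repeat=3)
}

for inputs, output in truth_table.items():
    print(inputs, "->", output)
```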

Obviously, this is a very simplified example with simple building blocks, but the idea is there. Engineers have a library of hundreds of standard cells to use, a large grid to place them on, and multiple additional layers of metal wires where they can route signals between the cells. Once this is prepared, the manufacturer can just take one of the prepared wafers, apply the metal layers for the cells, apply the metal layers for the inter-cell connectivity and embed the newly created die in the final package.

Now that we have understood how gate arrays work as a whole, we can look at the chip that we want to understand, see in detail how its gate array is structured and try to recreate the map of its cells.

A first look at the chip

Die shot of the DL-1827 found on the CPS2

As Eduardo Cruz demonstrated in his blog posts detailing his work on the CPS2 security, in our case most of the chips found on the game board are based on Fujitsu CG24 gate arrays. To kickstart my journey, Eduardo and John McMaster graciously gave me access to hi-res pictures of the custom chip dies. Creating these pictures is a process that requires skills and materials that I definitely don’t have. The basic idea is to dissolve the package of the chip in acid, then put the extracted die under a microscope and take hundreds of photos that are stitched together to create a very large picture.

Let’s now look at how Fujitsu created their gate array. If you look at the die shot, you can see a grid of the same pattern repeating over and over. Here is a 2x4 part of this grid. Those blocks are called “basic cells”.

View of the basic cells grid, basic cells are highlighted in black on the right

If we consult the datasheet of the Fujitsu gate array, we can better understand how the basic cell is structured on the wafer. If you ignore the vertical white bars (those are metal wires delivering power where needed), you can extract the following abstraction.

Fujitsu CG series basic cell abstraction

A-B-C-D-E-F-X-Y are all contact points. The blue layer with A-B-C represents P-tubs, the green layer with D-E-F represents N-tubs, and the orange layer with X-Y represents polysilicon layers. Together, they form 2 PMOS transistors (P1, P2) and 2 NMOS transistors (N1, N2) with prepared wiring as shown. If you are like me, you barely remember how these work. Let’s simplify and imagine these transistors as switches: P1 is closed only if X has a low voltage and, conversely, N1 is closed only if X has a high voltage. This will be our main unit, and metal conductors will connect to these contact points to wire things together. Now let’s analyse our first standard cell.

Analysis of the “V2B” cell

This first standard cell is composed of one basic cell with plenty of wires. Remember that we are only seeing a top-level view and that metal wires are arranged in layers at different heights, as we have seen previously; distinguishing between layers requires some experience. The black dots are “nails” or “vias” that cross between different layers. It becomes a game of connect-the-dots using the previous transistor schematic. After some work, we get the schematic on the right. If you have played with logic gates built from transistors, you will recognize that this is two “NOT” gates in parallel. If IN is low, P1 and P2 are active and N1 and N2 are not, therefore OUT is connected to Vcc, so it’s high. Conversely, if IN is high, P1 and P2 are inactive and N1 and N2 are active, therefore OUT is connected to ground, so it’s low.
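The "transistors as switches" reasoning can be written down as a small Python sketch. It is only a logical model under that simplification (no voltages, no timing), and the function names are made up for the illustration.

```python
def pmos_closed(gate_high: bool) -> bool:
    """A PMOS switch conducts only when its gate is at a low voltage."""
    return not gate_high


def nmos_closed(gate_high: bool) -> bool:
    """An NMOS switch conducts only when its gate is at a high voltage."""
    return gate_high


def v2b(in_high: bool) -> bool:
    """Two inverters in parallel: returns True when OUT is high."""
    # P1 and P2 sit in parallel between Vcc and OUT.
    pulled_up = pmos_closed(in_high) or pmos_closed(in_high)
    # N1 and N2 sit in parallel between OUT and ground.
    pulled_down = nmos_closed(in_high) or nmos_closed(in_high)
    # Exactly one of the two paths conducts, so OUT is never floating
    # and Vcc is never shorted to ground.
    assert pulled_up != pulled_down
    return pulled_up


for level in (False, True):
    print("IN high" if level else "IN low", "-> OUT high" if v2b(level) else "-> OUT low")
```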

You might ask, why two in parallel? Well, one transistor is only able to deliver a limited amount of current, so putting two in parallel doubles the drive capacity. In the datasheet, this is expressed in load units (lu): using X or Y requires 1lu, and a P-N transistor pair can provide 18lu. Having two in parallel consumes 2lu as input, and provides 36lu as output.
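The load-unit arithmetic is simple enough to sketch in Python. The two constants come from the figures quoted above; the `can_drive` check (the sum of the input loads wired to an output must not exceed its drive) is my reading of how such a budget is typically used, not a rule quoted from the datasheet.

```python
INPUT_LOAD_LU = 1       # load seen on X or Y, per transistor pair
DRIVE_PER_PAIR_LU = 18  # load one P-N pair can drive


def cell_loads(parallel_pairs: int) -> tuple[int, int]:
    """Return (input load, output drive) in lu for N pairs wired in parallel."""
    return parallel_pairs * INPUT_LOAD_LU, parallel_pairs * DRIVE_PER_PAIR_LU


def can_drive(parallel_pairs: int, connected_input_loads: list[int]) -> bool:
    """Assumed budget rule: total load on the output must fit in the drive."""
    _, drive = cell_loads(parallel_pairs)
    return sum(connected_input_loads) <= drive


print(cell_loads(1))  # single inverter
print(cell_loads(2))  # the V2B cell: two inverters in parallel
```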

A much more complex cell. This is a D latch with reset. Don’t be fooled, this took me hours to understand and identify with absolute certainty.

This is only one type of standard cell, and the datasheet provides around 200 possible cells. So the first task that we have to address is to identify all the cells that are used on this die. If you are interested in that, please consult the additional resources at the bottom of this article, but here’s a list of criteria that help to narrow down the list of candidates without going through the tedious process of retracing all metal wires to each transistor:

  • Number of input/outputs
  • Size of the cell (in “basic cells”)
  • The load of each input and output (in “load unit”)
  • Presence of subparts like simple “NOT” and “AND” gates
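These criteria amount to filtering the datasheet library by a few observable properties. Below is a rough Python sketch of that filtering. The `CellSpec` structure is an assumption, and apart from V2B (whose figures come from this article) the library entries are hypothetical placeholders, not real datasheet values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellSpec:
    name: str
    inputs: int
    outputs: int
    size: int                      # in basic cells
    input_loads: tuple[int, ...]   # in load units, one per input
    drive: int                     # in load units


# V2B matches the description in this article; the two HYP_* entries are
# invented placeholders used only to make the filter do something.
LIBRARY = [
    CellSpec("V2B", inputs=1, outputs=1, size=1, input_loads=(2,), drive=36),
    CellSpec("HYP_INV", inputs=1, outputs=1, size=1, input_loads=(1,), drive=18),
    CellSpec("HYP_GATE2", inputs=2, outputs=1, size=1, input_loads=(1, 1), drive=18),
]


def candidates(library, *, inputs, outputs, size, input_loads):
    """Keep only the cells compatible with what was observed on the die."""
    return [
        cell.name
        for cell in library
        if cell.inputs == inputs
        and cell.outputs == outputs
        and cell.size == size
        and sorted(cell.input_loads) == sorted(input_loads)
    ]


# Observed: one basic cell, one input loading 2lu, one output.
print(candidates(LIBRARY, inputs=1, outputs=1, size=1, input_loads=(2,)))
```

Whatever survives the filter still has to be confirmed by tracing the wires, but the list to check is much shorter.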

In total, after a lot of struggles, I have found and identified 35 types of standard cells. Hopefully, the next time I try to understand another Fujitsu gate array of the same period, there is a high chance that the standard cells will be similar and I will not need to do this again.

Making the process tool-assisted

Let’s recap: I have a ~16000x16000 image with hundreds of standard cells in it. The usual way of analysing the gate array from this point is to open your die shot in software like Inkscape (a vector graphics editor), draw each wire and each cell, and work your way up from there.

Screenshot of Inkscape after opening Furrtek’s DMG-CPU-B project (GameBoy CPU)

Other than using a couple of gigabytes of RAM just to open the project, this is a long and tedious effort. Other people have built custom software to speed up the annotation of the die, and I’m pretty sure that I am not aware of everything that has been done on this topic. But because I had already played with computer vision in the past, I wanted to see if I could speed up this process by identifying each cell automatically and maybe tracing wires.

What didn’t work: using a grid

Traditional ways to convert a picture into paths and polygons are unfortunately not adapted to our use case. After thinking about the problem for quite a while, I thought about another approach: consider the die as a grid. I’m pretty sure that during the manufacturing process, everything is aligned, so if we can reconstruct the pattern, we can maybe simplify the problem.

Using a grid, you can feel the classification that could be done

So, now the idea is to classify each element of this grid into metal, metal-with-via and background. We also want to keep track of the connection between each element of the grid to imagine rebuilding a path.

While I still think that the idea is good, that was a failure in the end. It was actually pretty hard to create heuristics that were robust enough even on easy stuff like the detection of a via. The via may be of various sizes, blurry and sometimes not perfectly in the middle. I tried various ideas to find working criteria (comparing levels between the centre and outer-ring, playing with Laplacian gradients) but I had very unstable results. In the end, what killed the idea was when I discovered that the grid would require critical fine-tuning because it is not perfectly regular. It might still work but it requires more thinking.
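For reference, here is roughly what the "centre versus outer ring" heuristic looks like, as a pure-Python sketch on a small square greyscale tile. The tile size, the split into thirds and the `min_contrast` threshold are all assumptions made for the illustration; this is exactly the kind of rule that turned out to be unstable on the real die shot.

```python
def looks_like_via(tile: list[list[int]], min_contrast: float = 40.0) -> bool:
    """Guess whether a square greyscale tile (0-255) has a dark dot in its centre."""
    n = len(tile)
    lo, hi = n // 3, n - n // 3  # bounds of the central square
    centre, ring = [], []
    for y in range(n):
        for x in range(n):
            inside = lo <= y < hi and lo <= x < hi
            (centre if inside else ring).append(tile[y][x])
    mean_centre = sum(centre) / len(centre)
    mean_ring = sum(ring) / len(ring)
    # A via shows up as a dark dot, so the ring should be clearly brighter.
    return mean_ring - mean_centre >= min_contrast


# Synthetic 6x6 tiles: bright metal with a dark 2x2 dot, and plain metal.
via_tile = [[30 if 2 <= y < 4 and 2 <= x < 4 else 200 for x in range(6)] for y in range(6)]
plain_tile = [[200] * 6 for _ in range(6)]

print(looks_like_via(via_tile), looks_like_via(plain_tile))
```

On clean synthetic tiles this works; a blurry or off-centre via shrinks the contrast and the answer flips, which matches the instability described above.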

What worked: template matching

I was inspired by the quick test that furrtek posted and I wanted to see how far I could go. The aim is to quickly find all standard cells on the die. The idea is simple: take a picture of the cell you want to find on the die, and ask OpenCV to detect it for you using template matching, i.e. find all positions in the picture that are similar enough to the template you want to find. This worked really well, but I could see limits. The main one is the following: cells can be crossed by additional metal layers used for inter-cell connections.

Two instances of the same standard cells.

While those two pictures don’t look the same at all at first glance, if you look close enough, especially if you focus on the dots representing the vias, you can maybe see the common pattern.

To mitigate this issue, I started using masks. The idea is to only match the most interesting parts of the template. I’m only focusing on wires and vias that belong to the standard cell.

I can then call the template matching algorithm with a mask: in the template picture, I paint the regions to ignore in deep black, and those pixels are left out of the comparison. Now the results are stable enough to be automated. Quick note: I obviously switched to black & white for the analysis because the colour data clearly carried no additional information.
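To make the masking idea concrete, here is a small pure-Python sketch of masked template matching on tiny greyscale arrays. It is not the code used for this project: a real run on a 16000x16000 image would rely on an optimised routine such as OpenCV's `matchTemplate`. The scoring formula (one minus the mean absolute difference), the threshold and the sample data are assumptions made for the illustration.

```python
IGNORED = 0  # deep black in the template means "do not compare this pixel"


def match_score(image, template, x0, y0):
    """Similarity in [0, 1] between the template and the image patch at (x0, y0)."""
    total, count = 0, 0
    for ty, row in enumerate(template):
        for tx, expected in enumerate(row):
            if expected == IGNORED:
                continue  # masked out: a foreign wire may cross here
            total += abs(image[y0 + ty][x0 + tx] - expected)
            count += 1
    return 1.0 - total / (255.0 * count) if count else 0.0


def find_matches(image, template, threshold=0.95):
    """Return (x, y, score) for every position scoring at least the threshold."""
    th, tw = len(template), len(template[0])
    hits = []
    for y0 in range(len(image) - th + 1):
        for x0 in range(len(image[0]) - tw + 1):
            score = match_score(image, template, x0, y0)
            if score >= threshold:
                hits.append((x0, y0, score))
    return hits


# The middle column of the template is masked out.
template = [
    [200, 0, 150],
    [50, 0, 220],
]
# Two instances of the cell: the second one is crossed by a bright wire (255).
image = [
    [100, 100, 100, 100, 100, 100, 100, 100],
    [200, 100, 150, 100, 200, 255, 150, 100],
    [50, 100, 220, 100, 50, 255, 220, 100],
    [100, 100, 100, 100, 100, 100, 100, 100],
]

print(find_matches(image, template))
```

Both instances are found with the same score even though one of them is crossed by a wire, because the crossing happens entirely inside the masked column.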

Great, assuming I have plenty of templates, my job is to figure out where those are. However, as in any project that combines binary classifiers into a multiclass one, we have to make choices to assemble the final pattern detection. Because some standard cells are very similar, or because some include others, we will definitely have to deal with multiple detections for the same position. For that, I’ve built a simple algorithm that takes the result of the template matching of each pattern and spits out the list of detected cells without overlaps. The program ensures the following constraints: a cell is included in the final result if and only if it passes the quality threshold and either

  • it intersects with no other detection
  • it intersects only with significantly smaller cells
  • it intersects only with similar patterns and has a better match rate than them

After optimising this step to make it fast enough (using structures like R-tree indices for fast intersection detection), I am able to take thousands of detections and produce the final result in less than a second.
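The selection rules can be sketched in a few lines of Python. This is a simplified reading of them, not the actual program: it uses a brute-force pairwise comparison instead of an R-tree, treats "significantly smaller" as an area ratio (`size_ratio`, an assumed parameter), and lets a detection win against a mix of smaller and lower-scoring rivals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    cell: str
    x: int
    y: int
    w: int
    h: int
    score: float  # match quality, higher is better

    @property
    def area(self) -> int:
        return self.w * self.h


def intersects(a: Detection, b: Detection) -> bool:
    """Axis-aligned rectangle overlap (touching edges do not count)."""
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.h and b.y < a.y + a.h


def beats(d: Detection, rival: Detection, size_ratio: float) -> bool:
    if d.area >= size_ratio * rival.area:
        return True   # the rival is significantly smaller
    if rival.area >= size_ratio * d.area:
        return False  # the rival is significantly larger
    return d.score > rival.score  # similar size: best match wins


def resolve(detections, threshold=0.8, size_ratio=2.0):
    """Keep the detections that pass the threshold and beat every overlapping rival."""
    good = [d for d in detections if d.score >= threshold]
    return [
        d
        for d in good
        if all(beats(d, o, size_ratio) for o in good if o is not d and intersects(d, o))
    ]


detections = [
    Detection("A", 0, 0, 4, 2, 0.85),   # large cell
    Detection("B", 1, 0, 1, 1, 0.95),   # small cell found inside A
    Detection("C", 10, 0, 2, 2, 0.90),  # C and D compete for the same spot
    Detection("D", 11, 0, 2, 2, 0.92),
    Detection("E", 20, 0, 2, 2, 0.50),  # below the quality threshold
]
print([d.cell for d in resolve(detections)])
```

Note that the larger cell wins even when the smaller one has a better score, which is precisely the behaviour that causes trouble with very small cells later on.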

Two identical overlapped cells

There is still one small problem I have to deal with. Sometimes, a pattern can be flipped entirely. This is possible because the basic cells are arranged in columns of two with horizontal symmetry. Adding the flipped pattern as a new template deals with the issue easily. However, some patterns are not rectangular, so the free space may be used by another cell. I have dealt with the issue by considering that a pattern may be a rectangle with one missing corner (represented as two rectangles during intersection detection). A few dozen annoying lines of code later, we have a working base. By trial and error, we can tune the quality threshold and get decent results with a 97% accuracy score.
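The "rectangle with one missing corner" trick can be illustrated as follows. This is only a sketch: rectangles are plain `(x, y, w, h)` tuples in image coordinates (y grows downwards), and the choice of the top-right corner as the missing one is arbitrary.

```python
def rects_intersect(a, b):
    """Overlap test for two (x, y, w, h) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def l_shape(x, y, w, h, cut_w, cut_h):
    """A w x h footprint at (x, y) whose top-right cut_w x cut_h corner is free."""
    return [
        (x, y, w - cut_w, h),                          # full-height left part
        (x + w - cut_w, y + cut_h, cut_w, h - cut_h),  # right part below the cut
    ]


def footprints_intersect(a_rects, b_rects):
    """Two footprints overlap if any pair of their rectangles overlaps."""
    return any(rects_intersect(a, b) for a in a_rects for b in b_rects)


big_cell = l_shape(0, 0, 4, 2, cut_w=2, cut_h=1)
neighbour_in_corner = [(2, 0, 2, 1)]  # sits in the free corner
real_overlap = [(2, 1, 1, 1)]         # collides with the cell body

print(footprints_intersect(big_cell, neighbour_in_corner))
print(footprints_intersect(big_cell, real_overlap))
```

With a plain 4x2 bounding box, the neighbour in the corner would have been rejected as an overlap; with the two-rectangle footprint it is accepted.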

The final die with the standard cells highlighted

What is still problematic: simple cells

Reaching a 95%+ score on any fuzzy search or classification is always impressive. The final question we may have is: why are some cells still problematic? Well, if we look at the results for each type of cell, we get this result sheet.

Detection results

As we can see, the only problems we have are with small cells. Apart from one missing detection on the B12 cell (size 4, mainly because the cell to detect is buried in wires and has bad contrast), all the remaining problems are with cells of size 1 and 2. The main issue is that the small cells are really simple. Because of this, they contain very few wires, and the number of pixels to match is small. This influences our scoring: the slightest defect can have dramatic consequences or, conversely, we may detect too much. For instance, we have 3 N3N that are falsely detected instead of N2N. The N2N cells are detected, but the algorithm picked N3N because it is larger.

What didn’t work: trying to be smart.

In order to tackle the last issues I had, I wanted to try another idea: improve the matching score by pre-processing the image to increase the contrast. The intuition behind this is that if I make the metal layers brighter compared to the rest, the metal borders darker and the vias more distinctive, it would be easier for the algorithm to detect my patterns.

After consultation with my data scientist friends, I narrowed my preparation phase to using a sigmoid function on each pixel.

Result of a sigmoid centred on 150. Note: colours vary between 0 (black) and 255 (white).

This function has the property of bounding the result between two values and being centred on a point that we can tune. We can also increase the contrast by tuning how steep the slope is at the middle point. Here are a couple of shots of a cell with our function centred on different values.
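As an illustration, a per-pixel sigmoid of this kind can be written as below. The exact formula and the `steepness` value are assumptions (only the centre values are mentioned in this article); the point is just to show the two tunable knobs.

```python
import math


def sigmoid_level(pixel: float, centre: float = 150.0, steepness: float = 0.05) -> float:
    """Map a grey level (0-255) through a sigmoid bounded between 0 and 255.

    `centre` is the grey level mapped to mid-grey; `steepness` controls how
    sharply levels below and above the centre are pushed apart.
    """
    return 255.0 / (1.0 + math.exp(-steepness * (pixel - centre)))


# Precompute a lookup table so each pixel of a large image is a simple index.
lut = [round(sigmoid_level(v)) for v in range(256)]

print(lut[100], lut[150], lut[200])
```

Levels below the centre are pushed towards black and levels above it towards white, which is the contrast boost described above.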

Those are only some examples. I tried a few sets of parameters, but unfortunately, I never got something significantly better than no preparation at all. You can find the detailed results at the end of this article if you want more details and performance numbers.

The long reverse engineering process

Enough talking about the matching process, we can assume it works for now. We have not talked about the wires between the cells yet. Sadly, this is not an easy process that can be solved with some basic computer vision algorithm. Here is a sample of a region of the die. I added a coloured version with the 16 different wires and the three connections to cells.

Wires all bundled in one place

This requires some experience but after following tons of traces, you somehow develop an eye for it. To see how weird the different metal layers can go, follow the violet wire (far left wire on the top edge) and see how it crosses the blue one.

So, I had to roll up my sleeves and just… follow each wire and note them in my schematics software.

To easily navigate inside the die, I designed a very small program that allowed me to create an overlay with my detected cells (with the name and the coordinates of the cell in the grid). I added keyboard and mouse control to pan and zoom. Also, I added colours to distinguish between different cells and a way to see which cell has been already reported in the schematics (title in blue = present, title in white = not present). At a glance, I could see quickly if a wire I’m following hits a cell I already noted.

Finally, there is one last part that was a bit tricky to understand at first: how pins are wired to the grid. A pin is not connected directly to the input/output of a logic cell that handles whatever you need. It goes through a combination of a part on the edge of the die and a dedicated cell that drives its logic. Here’s an example:

Two output pins with their power transistors (the grid in the middle) and two cells (in green on the right) which provide the necessary signal and power to drive them.

If you take a second look at the die shot with the highlighted cells, you can see that they form a square of cells in the middle. On the border are the I/O cells, which actually take a lot of time to note and trace.

Understanding the purpose of the chip

All of that, for what? Now is the moment where you need to have some knowledge of how your system works as a whole. Otherwise, you will have just a bundle of cells with no idea of what to do with it, and you will have learned nothing.

Part of the first drafts… only noting references and coordinates of cells, don’t mind the colours too much…

I knew one thing: I expected this chip to have a lot of useless pass-throughs because that’s what Eduardo noted in his blog. I found those, and obviously it is pretty simple to guess their usage, but I also got a lot of weirdly complex logic. I knew it was related to bus timings (e.g. signals that tell the processor when data is available after a request), but without additional context (after all, I didn’t even know the pinout of the chip), it was hard to pinpoint the usage of each wire.

After struggling and going back and forth to books about computer design with the 68000 CPU (the CPU on the game board), I recalled that the CPS1 schematics had leaked and contained all the bus timing. Hopefully, it would be close enough.

Nearly identical management of the /BR signal with a synchronisation on the falling edge of the 4MHz clock (CK250 or CLK4M) and on the rising edge of the CPU clock (10MHz or CLK16M on the CPS2)

That’s where it struck me. Even though it was different, there were a lot of identical parts. Starting from this, I could work my way up and put a name on all of the unknown signals.

After that task, it was a matter of tidying the layout, which is always a piece of art in itself, creating the documentation, triple-checking connections, and publishing the results.

Conclusion

That was a long one. Aside from learning something completely new to me, I wanted to show how much you can learn by just digging deeper and deeper. Of course, my experience in computer science as a whole helped a lot, and building tools is not a problem for me.

I will eventually release the tools I made as open source. For now, they are completely unusable for anyone else. If you are interested, feel free to contact me.

But let’s imagine for a moment that I already knew all about gate array technology: is my research on cell matching worth it? The question is actually harder to answer than you would think. Of course, this is a great exercise to stretch my computer vision skills. But I spent a lot of time adjusting and trying to automate things; if I had used this time to just painfully annotate the whole die by hand, I would have the same result. This is definitely true, and you can’t blindly accept my results because my algorithm is not 100% perfect. However, I can reuse the algorithm, the knowledge and the tools to speed up the process on my next die analysis, so hopefully, it will be worth it in the end.

What did I learn on this project as an engineer?

  • Always do the work by hand before automation. This is not new to me, but I tend to cut corners as soon as I have an idea. If I had done more of the wire tracing by hand first, I would never have tried to analyse the image with a grid.
  • Simple models work really well. The final result is pretty straightforward, and every time I tried to make it more complex, it resulted in more struggling and more failures. Better tuning and better adjustments go a long way.
  • Even the smallest hints help a lot. I was stuck plenty of times on small issues, and the solution was often given by a little note deep in a datasheet or in a blog post. The best example here would be how I/O cells work: at first, I just couldn’t wrap my head around them until I saw siliconpr0n’s wiki on the topic.

I hope you enjoyed this write-up. You can find the result of this work here as part of my CPS2 reverse engineering project.

Addendum: if you want to try this kind of project

  • furrtek has a complete guide on how to work with Fujitsu CG gate arrays. The sections on blind vias and on understanding larger cells are especially critical.
  • siliconpr0n has notes on Fujitsu dice, the section about I/O pads is very important to grasp.
  • If you can find the complete datasheet of your gate array technology, there is a lot of additional information in it on top of the standard cell library.
  • Make sure that you really understand how transistors work together to create logic gates. Train your cell recognition skills with furrtek’s examples. Additional note: inputs are not always wired to the gate of a transistor.
  • If you are able to find more hints and advice about the chip you are trying to reverse, do so. Any hints can go a long way, as I said. For me, the notes that Eduardo Cruz made on Capcom CPS2 security were invaluable. Also, reading the source code of MAME for your system helps you understand how your system works.
  • Detecting wires and contact points is a trial and error process. It’s long and requires experience to master it, so, good luck!

Addendum 2: classification performance results

Here is a study I made after the project was finished and all the fine-tuning done. In this table, you can compare how each preparation influences the score of each algorithm. If you are not familiar with the notions of precision, recall and F1, here’s what you need to know: “precision” measures how often the algorithm is right when it detects a cell, “recall” measures whether it detected all the cells, and “F1” is a combination of the two (their harmonic mean). In short, “Sigmoid 127” performs much worse than the baseline (mainly because it can’t detect some cells) and “Sigmoid 200” is very slightly better but nothing significant (that was one of my most successful tries). And because a simple approach is better than a complex one, I abandoned this idea.
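For readers who want the definitions spelled out, here is how the three metrics are computed from detection counts. The counts in the example call are made up purely to exercise the function; they are not results from this study.

```python
def precision_recall_f1(true_pos: int, false_pos: int, false_neg: int):
    """Compute precision, recall and F1 from raw detection counts.

    true_pos:  cells detected with the right type at the right place
    false_pos: detections that do not correspond to a real cell of that type
    false_neg: real cells that were not detected
    """
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0  # harmonic mean
    return precision, recall, f1


# Hypothetical counts, for illustration only.
print(precision_recall_f1(true_pos=8, false_pos=2, false_neg=2))
```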

Loïc *WydD* Petit

French - Computer Science PhD, Hardcore Nerd and dev. I talk about engineering stuff and games so bear with me