Custom 16x16 matrix display PCB

Erik van Zijst
Jul 17, 2020

For a previous project I explored what it would take to create a text marquee on an 8x8 LED matrix display without a microcontroller, using only 7400 chips, an old EEPROM and breadboard components.

That worked, but 8x8 is too small to do anything interesting, so I wanted to give it another go: create a larger 16x16 panel, design a custom PCB and this time hook it up to a microcontroller to write some games for it.

LED matrix panels

Matrix displays typically don’t have individual pins for each LED. Instead, internally the LEDs are connected in a grid pattern, exposing only the row and column terminals. Either the row pins are all anode and the column pins cathode, or vice versa.

For reasons unknown to me, the pin-out tends to be rather randomized and unintuitive.

You can uniquely address only one entire row or column at a time and so to display a steady image covering the entire display, you need to quickly scan over the rows/columns, refreshing them faster than the eye can see.

For this project I was going to connect 4 panels, creating a 16x16 screen with 256 single-color pixels.

KiCad schematic of four displays wired together

Row shift registers

While it would be possible to pick a microcontroller with enough GPIO pins to directly connect all 32 row and column pins, we can only draw one row or column at a time and so it makes sense to use a shift register to select the active row.

Like a running light, we set one bit to high and continuously shift that in a circle, connecting the register’s output back to its input. The register’s output pins then connect to the matrix’s anode rows, lighting up one row at a time.

Shift registers like these are made up internally of a string of D-type flip-flops, the input of each connected to the output of the previous one. All clock lines are connected together.

serial-in, parallel-out shift register (www.electronics-tutorials.ws)

To reach all 16 rows, I daisy-chained two 8-bit 74HC595’s and instead of creating a circle, I expose the data line of the first as an input of the display.

This way, when we start at the top of the screen, we pull ROWSDI high, toggle ROWCLK and write a 1 to the first bit (QA) of the first register. For the next 15 clock ticks we keep ROWSDI low.

As these shift registers also contain a storage register parallel to the shift register, we actually need to pulse LATCH after performing a shift to make the result visible on the outputs.
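As a rough host-side sketch of the behaviour described above, the following models the two daisy-chained registers as one 16-bit shift stage plus a latched output stage. The struct and function names are made up for illustration, and the model assumes bit 0 corresponds to QA of the first register.

```c
#include <stdint.h>

/* Host-side model of two daisy-chained 8-bit shift registers that share
 * a storage (latch) register. All names here are hypothetical. */
typedef struct {
    uint16_t shift;   /* bits currently held in the shift stages */
    uint16_t outputs; /* bits visible on the output pins */
} row_chain;

/* One ROWCLK pulse: every bit moves one stage down the chain and the
 * level on ROWSDI enters the first stage (bit 0). */
void row_clock(row_chain *c, int rowsdi) {
    c->shift = (uint16_t)((c->shift << 1) | (rowsdi ? 1u : 0u));
}

/* One LATCH pulse: copy the shift stages to the output pins. */
void row_latch(row_chain *c) {
    c->outputs = c->shift;
}

/* Advance the scan by one row. ROWSDI is high only for the top row. */
uint16_t row_step(row_chain *c, unsigned row) {
    row_clock(c, row == 0);
    row_latch(c);
    return c->outputs;
}
```

Calling row_step() with row counting 0 to 15 walks a single 1 from bit 0 to bit 15. On the next pass the old bit falls off the end of the chain just as the new one is shifted in.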

Column shift registers

Now that we can supply power to a single row at a time, we do a similar thing for the columns, where we can sink individual row LEDs to ground, closing their circuit and lighting them up.

For this we can also use shift registers. With a row selected, we then load up a binary pattern into the column shift registers where a 0 or low connects the output to ground. Again, we need 16 bits wide.

We expose the data and clock lines as COLSDI and COLCLK respectively, bringing the running total to 4 wires to control the matrix. We also expose a single, shared output enable (~OE) pin and latch (LE / LATCH) pin. Adding dedicated ground and power lines gives a total of 8 wires.

External interface to the display

Because of the shared LATCH we can stage the selection of the next row, as well as the column bit patterns sequentially and then apply it to the displays with a single, atomic operation.

Constant current and the TLC59025

LEDs being non-ohmic need a current limiting component to prevent them from burning out.

If we run the circuit at 3.3 volts and the LEDs’ forward voltage drop is 2V, we could use 16 identical 130Ω resistors to limit the current through each LED to 10mA. However, not all individual resistors and LEDs are the same and so slight differences in tolerance could lead to subtly different brightness levels.

It would also make it harder to support different input voltages for the board. At 5V we would need 300Ω resistors which, with a full row of 16 LEDs lit, would dissipate 480mW in heat, 1.5 times as much as the LEDs themselves.
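The resistor arithmetic can be checked with a few lines of C. This is only a worked example under the figures used in the text (2V forward drop, 10mA per LED, 16 LEDs lit at once because only one row is driven at a time). The function names are made up for illustration.

```c
/* Series resistor in ohms that limits one LED to i_led amps. */
double series_resistor_ohms(double v_supply, double v_forward, double i_led) {
    return (v_supply - v_forward) / i_led;
}

/* Heat dissipated in the series resistors of n lit LEDs, in watts. */
double resistor_heat_w(double v_supply, double v_forward, double i_led, int n) {
    return (v_supply - v_forward) * i_led * n;
}

/* Power used by the n lit LEDs themselves, in watts. */
double led_power_w(double v_forward, double i_led, int n) {
    return v_forward * i_led * n;
}
```

With these inputs, series_resistor_ohms() gives 130Ω at 3.3V and 300Ω at 5V. At 5V the resistors burn 0.48W against 0.32W in the LEDs.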

A better approach in this case is to use a constant current driver, a component that fixes the total amount of current in a circuit, regardless of the resistance of the load.

The TLC59025 is a chip that combines a 16-bit shift register with an adjustable constant current driver, aimed specifically at circuits like ours.

The current is set through the value of R-ext which is independent of the supply voltage:

I = (1.21V / R) × 15.5

I’m using 2.2K for about 9mA per LED. Since all components can operate on 3.3V as well as 5V, we’ll be able to use the board both on 3.3V Raspberry Pi’s, as well as 5V Arduinos.
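As a quick check of the R-ext formula quoted above, here is a small sketch that evaluates it in both directions. The function names are hypothetical, and the constants are taken straight from the formula in the text rather than verified against the datasheet.

```c
/* Per-output sink current in amps for a given R-ext in ohms,
 * using the formula quoted in the text: I = (1.21 V / R) x 15.5 */
double rext_to_current_a(double r_ext_ohms) {
    return 1.21 / r_ext_ohms * 15.5;
}

/* Inverse: the R-ext value in ohms for a target current in amps. */
double current_to_rext_ohms(double i_target_a) {
    return 1.21 * 15.5 / i_target_a;
}
```

For 2.2K this comes to roughly 8.5mA per output, in line with the "about 9mA" figure. A 10mA target would call for a resistor of around 1.9K.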

MOSFETs

We’re nearly there. If we built the circuit as described, we’d likely run into problems with the 595 shift registers not being powerful enough to supply up to 144mA to a row of 16 LEDs. In fact, the 74HC595 is rated to supply up to 35mA maximum and 144mA will likely cause damage.

Instead, we can use the 595’s output to switch a transistor which in turn supplies power to the row.

I have a stack of general purpose PN2222 NPN bipolar junction transistors (BJT) which are effectively current amplifiers. The input current through the base gets amplified on the output side. This is fine for low current applications, but at higher currents a Field Effect Transistor (FET) is generally a better choice.

Bipolar Junction vs Field Effect Transistors (quora)

Both BJTs and FETs apply junctions of P and N semiconductor material. A BJT allows current to flow from collector to emitter by sending a current between base and emitter. Higher current through the base translates to higher current from the collector.

A FET works differently and does not require any actual current flow on its input. Instead, applying a voltage to the input (gate) causes an electric field to extend across the semiconductor channel between source and drain which alters the conductivity as a function of the gate voltage.

Since a MOSFET’s gate is not electrically connected to either source or drain, but acts more like the plate of a capacitor, no current is drawn and so we also won’t need a current limiting resistor for each of our shift register outputs.

High-side P-channel switch

Our matrix panels are common anode, which is where we’ll need to apply the transistor. Since this is between VCC and the load (LEDs), we connect the MOSFET’s source to VCC and drain to the matrix’s row anode. This configuration is known as a high-side switch, as opposed to a low-side switch where a transistor sits between a load and ground.

Lots of MOSFETs!

Just as BJTs come in NPN and PNP variants, FETs come in either N-channel or P-channel. For our high-side configuration we use a DMP3085LSD dual P-channel MOSFET in an SO-8 package with a gate threshold voltage of -1V to -3V. This is the voltage differential between gate and source at which the transistor starts to conduct. The further the gate is pulled below the source, the more fully it opens.

This is a negative voltage because the source is connected to VCC and so to achieve a voltage differential, we need to apply a lower voltage to the gate to activate the transistor. With VCC at 3.3V and the gate at ground, the -3.3V will fully open it. When we have a high signal on the gate, there is zero voltage differential and the transistor will be closed.

This means that the signals from our 595's are now inverted and so instead of a single high bit, we need a single low bit to activate a screen row. We’ll do this in the firmware.
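A minimal sketch of that inversion, assuming bit n of the 16-bit row word drives the gate for row n (the function name is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Row-select word for the P-channel high-side switches: every bit high
 * (MOSFET off) except a single low bit for the active row. */
uint16_t row_select_active_low(unsigned row) {
    assert(row < 16);
    return (uint16_t)~(1u << row);
}
```

Row 0 yields 0xFFFE and row 15 yields 0x7FFF, the bitwise complement of the single-high-bit pattern used before the MOSFETs were added.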

KiCad PCB layout

As a novice not knowing where to source symbols and footprints, I created my own custom schematic symbol for the dual MOSFET package. A decision that would come back to bite me later.

KiCad PCB layout design

I decided to keep the PCB as compact as I could, with the chips underneath the matrix panels and the female connector pins recessed so as to not protrude.

The panels themselves come in different sizes and since I could not find a KiCad symbol, footprint, nor 3D model for my 37.90mm x 37.90mm variant, I butchered a 32mm x 32mm symbol I found online and hoped for the best.

KiCad render

I sent the design off to OSH Park, selected After Dark (transparent solder mask on black substrate) and, not to repeat an earlier mistake, kept the copper pours on the outside of the board for maximum effect.

Meanwhile I ordered the components from DigiKey.

Reflow and assembly

The boards came in and they look great.

As my soldering skills are pretty mediocre and I do not have a hot air gun or reflow capabilities, I gave all the SMT footprints extra long pads so I would have an easier time hand soldering.

The 25 mil pitch of the TLC59025 worried me in particular. I would have picked a larger package had it been available, but alas.

At this point I was lucky enough to get some help from a coworker who runs our lab at work. While I didn’t order a stencil, she applied just the right amount of solder paste by hand and showed me how to reflow the board. The results were spectacular.

I decided against soldering the matrix panels directly to the board. Partly because I needed clearance for the connector and the TLC59025 at the very center, but also because it would make it nearly impossible to get to the chips in case the board didn’t work. So instead I put in headers.

Hardware bugs

I whipped up some Python code to test the board through my 8-year-old Raspberry Pi 1B’s GPIO pins. Since my old Raspbian installation did not have a recent Python 3, I instead ran the scripts on my Macbook and used gpiozero’s remoting to control the pins on the RPi.

While remote GPIO access through Python works fine, it is unbelievably slow. So slow in fact that writing a single row to the screen took about 200ms, or about 3.2 seconds for an entire screen refresh. This is so slow you can clearly see the individual rows light up.

Alarmingly, instead of scanning rows neatly top to bottom, it swapped every pair of consecutive rows. Something was not wired correctly, but somehow not badly enough to cause shorts or to stop the board from working altogether.

When I addressed row 1, it activated row 2 and when I addressed 2, it would activate 1. The same was true for every other pair. What each pair of rows has in common is that they are powered by a shared dual MOSFET. Since the pins from the shift registers were correctly ordered, something was wrong with the wiring of the MOSFETs.

It turns out when I created the KiCad schematic symbol I messed up the pin-out and apparently numbered pins 5–8 bottom up instead of top-down.

Pay attention when creating your own schematic symbols

As a happy coincidence, these four pins are really only two outputs as the FET uses two pins for a single output, probably due to its high 4.9A output rating. This effectively swapped the outputs of each MOSFET.

We cannot easily change the row scanning order due to our use of a shift register and so row 2 really will scan before row 1. However, we should be able to correct for this in firmware where we can preemptively swap the contents of each row pair. I did fix the symbol, footprint and PCB layout for any future revision.

Firmware

With Python on the RPi being entirely too slow to drive the screen at a reasonable frame rate, I moved to a plain C implementation on the Arduino UNO.

To hold the contents of the screen I used a simple array of 16 unsigned ints with each integer representing an entire row. Since it’s a monochrome screen, each bit represents an individual LED.

The steps to draw the top row involve first bringing ROWSDI low and toggling ROWCLK. This stages a 0 on the shift registers that will activate the MOSFET for the first row. Then we write out each bit of the first unsigned int to COLSDI, toggling COLCLK after each bit. Finally we toggle LE to latch the new values of all registers. The first row is now lit.

The process for the remaining 15 rows is identical, except that we bring ROWSDI high. This ensures the row shift registers only contain a single 0 that automatically moves down one row at a time.

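volatile unsigned int screen[16];  // one 16-bit word per row, one bit per LED
volatile byte row = 0;             // row drawn by the next interrupt

// SIGNAL() is the older avr-libc spelling of ISR().
SIGNAL(TIMER2_COMPA_vect) {
  digitalWrite(RSDI, row != 0);    // shift in a 0 for the top row, a 1 otherwise
  pulse(RCLK);
  shiftOut(CSDI, CCLK, LSBFIRST, screen[row]);       // low byte
  shiftOut(CSDI, CCLK, LSBFIRST, screen[row] >> 8);  // high byte
  pulse(LE);                       // latch rows and columns together
  row = (row + 1) % DIM;           // DIM (defined elsewhere) is the row count
}

void pulse(unsigned int pin) {
  digitalWrite(pin, HIGH);
  digitalWrite(pin, LOW);
}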

The screen[] array is the display buffer that we can write to from the main program loop. However, because of the wiring mistake, we need to swap rows whenever we write to screen.

// Adjust for hardware wiring error in v01: swap each even/odd row pair.
#define fix(row) ((row) + (((row) & 1) ? -1 : 1))

void setpixel(unsigned int row, unsigned int col, bool on) {
  if (on) {
    screen[fix(row)] |= (0x8000 >> col);
  } else {
    screen[fix(row)] &= ~(0x8000 >> col);  // clear only this column's bit
  }
}
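The row swap and mask arithmetic are easy to sanity-check off the board. The following is a host-side sketch, not the firmware itself: it keeps its own copy of the buffer, writes the row swap as an XOR with 1 (which has the same effect as the fix() macro), and adds a get_pixel() read-back helper that does not exist in the original code.

```c
#include <stdbool.h>
#include <stdint.h>

uint16_t screen[16];  /* host-side copy of the display buffer */

/* Same mapping as the fix() macro: swap each even/odd row pair. */
unsigned fix_row(unsigned row) {
    return row ^ 1u;
}

void set_pixel(unsigned row, unsigned col, bool on) {
    uint16_t mask = (uint16_t)(0x8000u >> col);
    if (on) {
        screen[fix_row(row)] |= mask;
    } else {
        screen[fix_row(row)] &= (uint16_t)~mask;  /* leave other columns alone */
    }
}

/* Hypothetical read-back helper, not part of the original firmware. */
bool get_pixel(unsigned row, unsigned col) {
    return (screen[fix_row(row)] & (0x8000u >> col)) != 0;
}
```

Setting pixels on logical row 0 lands them in screen[1], and clearing one column leaves the others in that row untouched.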

Microcontroller timer interrupts

There’s no multithreading on a microcontroller, but we need to write to the display continuously at a high, fixed rate while also running the code that actually does something useful. A timer interrupt is a good fit for that.

The Arduino UNO’s ATmega328 has 3 internal timers that can be used for interrupt callbacks. A timer is essentially a binary counter directly hooked up to the 16MHz oscillator. It constantly increments and loops. There’s a comparator register that you can put a value in that gets matched against the timer at every tick. When it matches, your interrupt routine gets called and the timer gets reset to zero.

Of the three timers, timer 0 and 2 are just 8 bits wide, counting up to 255 and looping back to 0. Timer 1 is 16 bit.

At 16MHz, counting from 0 to 255 takes only 16us, which is not very useful for most scheduling, so there’s an additional clock divider that can be put in front of it to slow down the counting. This is the timer’s prescaler and, depending on the timer, it can be set to 1, 8, 64, 256, or 1024.

The combination of the processor’s internal clock, the prescaler and comparator value determine the timer’s frequency.

f(Hz) = 16,000,000 / (prescaler × (comparator + 1))

As an example, using a prescaler of 64 and a comparator value of 255 we get a frequency of 976.5625Hz, or about 1kHz. If we were to use that to drive our row function, we’d be painting about 1000 rows per second for a screen refresh rate of about 62Hz (1000 / 16).
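The formula translates directly into code. This is a small host-side sketch with made-up function names, useful for trying out prescaler and comparator combinations before touching the registers.

```c
/* Compare-match interrupt frequency in Hz for a timer that resets on
 * match: f = f_cpu / (prescaler * (comparator + 1)). */
double timer_freq_hz(double f_cpu_hz, unsigned prescaler, unsigned comparator) {
    return f_cpu_hz / ((double)prescaler * (comparator + 1u));
}

/* Full-screen refresh rate when each interrupt draws one of `rows` rows. */
double refresh_rate_hz(double row_rate_hz, unsigned rows) {
    return row_rate_hz / rows;
}
```

timer_freq_hz(16e6, 64, 255) returns 976.5625. Without rounding that to 1kHz first, the 16-row refresh rate comes out at about 61Hz.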

While 62Hz is not bad, it’s actually not really flicker-free. If you swing the display around you can clearly see the flicker. Same when you try to record a video of it on your cellphone.

To choose an appropriate refresh rate we need to know how long it takes to draw one line, which for the above code is 296us, or about 5ms for an entire screen. This puts a hard cap of about 200Hz on our refresh rate. If we go higher we won’t be able to finish the interrupt routine before the next interrupt and we’ll completely starve the main program loop.

A happy medium is 125Hz where we’re burning a little over half of the ATmega’s cycles on the screen driver, leaving the other half for useful stuff.

125Hz ×16 rows = 2kHz and working backwards from 2kHz and a 64 prescaler we get a comparator value of 124.
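Working backwards can be sketched the same way. The helper names below are hypothetical. The first solves the frequency formula for the comparator value on an 8-bit timer, the second estimates what share of CPU time the interrupt routine takes at a given rate.

```c
/* Comparator value for a target interrupt rate on an 8-bit timer, or -1
 * if it does not fit. Integer division truncates, so the result is only
 * exact when the rate divides the clock evenly. */
long comparator_for(unsigned long f_cpu_hz, unsigned long prescaler,
                    unsigned long rate_hz) {
    if (prescaler == 0 || rate_hz == 0) return -1;
    unsigned long ticks = f_cpu_hz / (prescaler * rate_hz);
    if (ticks == 0 || ticks > 256) return -1;
    return (long)ticks - 1;
}

/* Fraction of CPU time spent inside the interrupt routine. */
double isr_load(double isr_us, double rate_hz) {
    return isr_us * 1e-6 * rate_hz;
}
```

comparator_for(16000000, 64, 2000) gives 124, and isr_load(296, 2000) gives about 0.59, matching the "little over half" estimate. With a prescaler of 1 the same rate needs 8000 ticks, which does not fit in 8 bits, so the helper returns -1.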

We configure the timer in setup().

void setup() {
  cli(); // disable interrupts
  TCCR2A = 0; // set entire TCCR2A register to 0
  TCCR2B = 0; // same for TCCR2B
  TCNT2 = 0; // initialize counter value to 0
  // set compare match register for 2khz increments
  OCR2A = 124; // = (16*10^6) / (2000*64) - 1
  TCCR2A |= (1 << WGM21); // turn on CTC mode
  TCCR2B |= (1 << CS22); // Set CS22 bit for 64 prescaler
  TIMSK2 |= (1 << OCIE2A); // enable timer compare interrupt
  sei(); // enable interrupts
}

SIGNAL(TIMER2_COMPA_vect) {
  ...
}

These registers are not terribly well documented on the Arduino site, but can be found in the ATmega’s datasheet.

Logic analyzer

To validate the timing I hooked up the logic analyzer and zoomed in on the interval during which 2 rows were drawn.

Going from top to bottom it clearly shows how the interrupt routine starts by setting RSDI high and clocking RCLK (RSDI was already high, indicating that this is not the top row being drawn). Next the bit pattern for the 16 LEDs is written out on CSDI/CCLK, in this case 0000 0001 0100 0101. Finally LE is clocked to latch the new values for both the row and columns simultaneously.

It also nicely confirms what we measured in the firmware, that the interrupt routine takes almost 300us, just over half the time between interrupts, leaving the rest for main program loop execution.

Zooming in further still we can see that a full high/low pulse takes about 2 × 6us. We might be able to get that down a little by bypassing the higher level digitalWrite and shiftOut functions and instead writing to the port registers directly.
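As a sketch of what writing to the port registers directly could look like, the helpers below set and clear a single bit of a port register and bit-bang a byte LSB first, the same order shiftOut(..., LSBFIRST, ...) uses. The function names and the pin assignment in the usage note are assumptions, not the board's actual wiring. I have not measured how much time this would save.

```c
#include <stdint.h>

/* Drive one bit of a port register high or low. */
void pin_high(volatile uint8_t *port, uint8_t bit) {
    *port |= (uint8_t)(1u << bit);
}

void pin_low(volatile uint8_t *port, uint8_t bit) {
    *port &= (uint8_t)~(1u << bit);
}

/* High/low pulse with two register writes. */
void pulse_fast(volatile uint8_t *port, uint8_t bit) {
    pin_high(port, bit);
    pin_low(port, bit);
}

/* Bit-bang one byte, least significant bit first. */
void shift_out_fast(volatile uint8_t *port, uint8_t data_bit,
                    uint8_t clk_bit, uint8_t value) {
    for (uint8_t i = 0; i < 8; i++) {
        if (value & (1u << i)) {
            pin_high(port, data_bit);
        } else {
            pin_low(port, data_bit);
        }
        pulse_fast(port, clk_bit);
    }
}
```

On the UNO, with avr/io.h included, pulse_fast(&PORTB, 0) would pulse digital pin 8, provided a clock line were wired there and the pin already configured as an output.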

Marquee

On to doing something more interesting: scrolling text.

To get text displayed I first needed a very small font. I found a decent looking 5x5 pixel font that I translated into a large 2 dimensional array stored in program memory.

Minimalist 5x5 pixel font (www.dafont.com)

I encoded each character as an array of 5 bytes where each byte corresponds to one row of pixels, using the 5 low-order bits of every byte.

const byte font[43][5] PROGMEM = {
{0x1f, 0x13, 0x15, 0x19, 0x1f}, // 0
{0x04, 0x0c, 0x04, 0x04, 0x0e}, // 1
{0x1e, 0x01, 0x0e, 0x10, 0x1f}, // 2
{0x1f, 0x01, 0x0e, 0x01, 0x1f}, // 3
{0x10, 0x10, 0x14, 0x1f, 0x04}, // 4
{0x1f, 0x10, 0x1e, 0x01, 0x1e}, // 5
{0x1f, 0x10, 0x1f, 0x11, 0x1f}, // 6
{0x1f, 0x01, 0x02, 0x04, 0x04}, // 7
{0x1f, 0x11, 0x1f, 0x11, 0x1f}, // 8
{0x1f, 0x11, 0x1f, 0x01, 0x1f}, // 9
[...]
{0x1f, 0x11, 0x11, 0x1f, 0x11}, // A
{0x1f, 0x11, 0x1e, 0x11, 0x1f}, // B
{0x1f, 0x10, 0x10, 0x10, 0x1f}, // C
{0x1e, 0x11, 0x11, 0x11, 0x1e}, // D
{0x1f, 0x10, 0x1e, 0x10, 0x1f}, // E
{0x1f, 0x10, 0x1e, 0x10, 0x10}, // F
[...]
};
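To see how the table values map to pixels, here is a small host-side sketch that turns one row byte into text. It reads the pixels from bits 4 down to 0, with bit 4 as the leftmost column, which is what the table values suggest (0x1f is a fully lit row). The function name is made up for illustration.

```c
/* Render one row byte of a 5x5 glyph as a 5-character string,
 * '#' for a lit pixel and '.' for a dark one. Bit 4 is the leftmost
 * column. `out` must have room for 6 chars including the terminator. */
void glyph_row_to_text(unsigned char row_bits, char out[6]) {
    for (int col = 0; col < 5; col++) {
        out[col] = (row_bits & (0x10 >> col)) ? '#' : '.';
    }
    out[5] = '\0';
}
```

Feeding it the second row of the "0" glyph, 0x13, produces "#..##": the two edges of the digit plus the start of the diagonal stroke.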

By sticking to the same ordering and offset as ASCII, it is easy to take a normal string and, using the ordinal byte values of each char as the index into the font array, convert it to an array of symbols.

To then make that scroll across the screen we can iterate over the bits left-to-right, each time shifting the corresponding screen row left by one bit and shifting in a bit from the symbol.

void scroll(const char *msg, byte row) {
  byte chars[strlen(msg)][5];
  for (unsigned int i = 0; i < strlen(msg); i++) {
    // space is mapped to @
    const int j = msg[i] == 0x20 ? 0x10 : toupper(msg[i]) - 0x30;
    if (j < 0 || j >= 43) return; // unsupported char
    memcpy_P(chars[i], &(font[j]), 5);
  }
  scrollbytes(chars, sizeof(chars) / 5, row);
}

void scrollbytes(byte glyphs[][5], int len, byte row) {
  // each glyph takes 6 columns: one blank spacing column plus 5 pixel columns
  for (unsigned int pos = 0; ; pos = (pos + 1) % (len * 6)) {
    for (int r = 4; r >= 0; r--) {
      screen[fix(row + r)] <<= 1;
      screen[fix(row + r)] |= (
        (glyphs[pos / 6][r] & (0x20 >> (pos % 6))) ? 1 : 0);
    }
    delay(75);
  }
}

void loop() {
  scroll("HELLO WORLD ", 5); // scroll in the middle of the screen
}

To cut corners and due to the fact that the font does not actually cover the entire ASCII character set, I limited myself to just A-Z, 0–9, a few punctuation marks and space. 43 characters in all. Hence the odd 0x30 offset and boundary checks.
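The offset and bounds check can be isolated into a small helper. This is a sketch with a hypothetical name, written so it can be tested on a desktop. It assumes the table covers the 43 consecutive ASCII codes from '0' (0x30) through 'Z' (0x5A), with space borrowing the '@' slot as in the code above.

```c
#include <ctype.h>

#define FONT_GLYPHS 43  /* '0' (0x30) through 'Z' (0x5A) */

/* Index into the font table for a character, or -1 if there is no
 * glyph for it. Lowercase letters map to their uppercase glyphs. */
int font_index(char c) {
    if (c == ' ') {
        return '@' - '0';  /* space is drawn with the '@' slot */
    }
    int j = toupper((unsigned char)c) - '0';
    return (j >= 0 && j < FONT_GLYPHS) ? j : -1;
}
```

'0' maps to 0, 'A' and 'a' both map to 17, 'Z' maps to 42, and anything outside the range, such as '!' or '[', comes back as -1.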

Tetris

Lastly I figured I’d write a little Tetris game for it. With four push buttons for left, right, soft-drop and clockwise-rotate, I turned a breadboard into a poor man’s game controller. I then wrote a simple Tetris implementation, leveraging the marquee code to report scores.

Scoring mostly follows the original BPS scoring system. The full code is up on GitHub.

Future ideas

It would be nice to see what it would take to do an RGB version, or even just add grayscale support to the current board, although for the moment the bandwidth from the Arduino doesn’t seem quite high enough for that.

Resources

The entire project is on GitHub. The board is licensed under Creative Commons and the software uses the MIT license.
