Needle-in-a-Haystack Electronics Debugging
AKA Tips for analyzing the rare intermittent error that bugs the &%$* out of you and your system.
My last project used an accelerometer attached to an ARM processor through an I²C bus. For my application, it was important to process every acceleration sample — think free-fall detection with a custom signature. To minimize host processor load and lower overall system power consumption, I used the 32-level FIFO that comes with the accelerometer. To make double-sure I never missed a sample, the system monitors the FIFO overrun flag buried in a register of the accelerometer.
In the lab, I observed that about once every few days there were 1 or 2 overruns in the logs. This overrun event was my “needle in a haystack”, and as much as I tried, I could not root-cause it from the debug logs alone. So, I broke out my 5-Series Tektronix scope to set up some signal snooping.
This article presents a process for transforming your scope into a remote, cloud-connected datalogger. You’ll get some insight into advanced triggering and storage techniques, and you’ll try out Initial State, a tool for capturing, analyzing, and sharing your scope data. My process for the scenario above went something like this:
- Set up an advanced trigger to watch for the rare overrun occurrence.
- Auto-save pre and post-trigger data from probing the “rest of the world” (i.e. other test points on the board).
- Analyze the results.
- Present findings.
1) Advanced Scope Triggering
Scope triggering is key. If you can devise a trigger that isolates your problem, you’re halfway home. A triggered measurement can capture and store pre-trigger and post-trigger data, showing what led up to the problem and what resulted from it. This ain’t your daddy’s level trigger! The modern scope is a far cry from the standard triggering of the analog oscilloscopes of yore.
Here are some of the options in the Tek 5-Series scope that I have in the lab:
- Edge — Standard trigger when the signal rises (or falls, or both) through a specified voltage level.
- Pulse Width — Triggers after a pulse with the specified time width occurs. In addition, you can set logic like greater-than or less-than a width in time.
- Timeout — Similar to pulse width, except it looks for the signal to stay high or low for a certain amount of time.
- Runt — Looks for pulses that should have gone higher (i.e., through the whole transition window of a digital signal) but didn’t, often resulting from a metastable digital line.
- Window — It’s like a level trigger, except the signal must enter or exit an amplitude “window” specified by a high and a low threshold.
- Logic — Triggers on the instantaneous state of one or more digital lines hitting a specific logic pattern.
- Setup & Hold — Setup and hold time specifications ensure that digital signals stay in a state long enough to be used during a clock cycle. Trigger on specific setup & hold times to make sure your signals are adhering to the specs of your components.
- Rise/Fall time — Trigger on how fast a signal transitions from one state to another. Great for finding lazy signals burdened with slow slew rates.
- Bus — My scope can decode tons of digital busses on the fly. This trigger allows us to start acquisition in response to specific data decoded from a serial bus (like I²C, SPI, CAN, FlexRay, USB, Ethernet, and many more). SPOILER ALERT: This is what we are going to need to find that pesky overrun problem.
- Sequence — Triggers a measurement only after one of the above triggers fires and then a second one follows it. It gets complex, but sounds extra-interesting when searching for a “needle in a haystack”.
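To make one of these trigger types concrete, here is a minimal Python sketch of what a runt trigger looks for in sampled data. It is only an illustration of the idea, not how the scope implements it; the function name `find_runts` and the 0.8 V / 2.0 V thresholds are assumptions picked for the example.

```python
def find_runts(samples, low, high):
    """Return (start, end) index pairs of positive runt pulses.

    A positive runt rises above `low` but falls back to `low` or below
    without ever reaching `high`.
    """
    runts = []
    start = None          # index where the current pulse rose above `low`
    reached_high = False  # whether the current pulse ever reached `high`
    for i, v in enumerate(samples):
        if start is None:
            if v > low:
                start = i
                reached_high = v >= high
        else:
            if v >= high:
                reached_high = True
            if v <= low:
                if not reached_high:
                    runts.append((start, i))
                start = None
    return runts


# Hypothetical 3.3 V logic trace: one full pulse, one runt, one full pulse.
trace = [0.0, 0.1, 3.3, 3.2, 0.1, 0.0, 1.4, 1.5, 0.2, 0.0, 3.3, 0.1]
print(find_runts(trace, low=0.8, high=2.0))  # [(6, 8)]
```

Only the middle pulse is reported, because it crosses the low threshold but never makes it to the high one.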
Ok, so the accelerometer is talking on an I²C bus. The chip’s FIFO status has a dedicated overrun bit that’s set when the FIFO is full and unread. The overrun bit is sent on address 0x50 at bit D1. Technically, the bit is part of the interrupt mechanism supplied by the accelerometer, which is already being handled by the host processor. What I really wanted was a bunch of pre- and post-trigger data from the board around the time that the glitch occurs.
With that goal in mind, I configured the trigger to focus on D1 at address 0x50 on the I²C bus. All the other bits in that register (and across the rest of the bus, for that matter) are “Don’t Care”. Check out the trigger configuration in the screenshot below.
I specified my trigger type as “Bus” and chose the I²C bus. In the screenshot above, a measurement is already triggered in the waveform area. You can see the decoded address (0x50) and data (0x12). In binary, 0x12 is 00010010. That “1” in the 2nd LSB (Least Significant Bit) is the overrun bit being set.
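The “Don’t Care” matching boils down to a mask-and-compare. Here is a small Python sketch of that logic; the `matches` helper and the `XXXXXX1X` pattern string are made up for illustration and are not the scope’s own syntax.

```python
OVERRUN_BIT = 1  # D1, per the register description above


def matches(value, pattern):
    """Compare an 8-bit value to a pattern string such as 'XXXXXX1X'.

    'X' marks a don't-care bit; '0' and '1' must match exactly.
    The leftmost character is D7, the rightmost is D0.
    """
    mask = int("".join("0" if c == "X" else "1" for c in pattern), 2)
    want = int("".join("1" if c == "1" else "0" for c in pattern), 2)
    return (value & mask) == want


print(format(0x12, "08b"))               # 00010010
print(matches(0x12, "XXXXXX1X"))         # True
print(matches(0x10, "XXXXXX1X"))         # False
print(bool(0x12 & (1 << OVERRUN_BIT)))   # True
```

Every bit except D1 is masked out before the comparison, so any data byte with D1 set would fire the trigger.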
I had one last triggering requirement before moving on. Once the overrun occurs, it can happen a bunch of times in a row before the FIFO is read, clearing the flag. Because of this, I need to suppress the repeated flurry of possible triggers. Remember, we are looking for something that occurs every so often, on the order of hours: trigger once, then wait for the next overrun hours later. Enter the notion of holdoff…
Holdoff settings for delayed capture
Holdoff basically configures the scope to ignore triggers for a specified time period. This is exactly what I want in order to get a fresh acquisition on each of my triggered measurements. I set my holdoff to 10 seconds, which is plenty of time for the system to recover from the overrun and resume normal operation. I am not worried about missing a unique event during that time, because I’m more concerned with getting an acquisition sufficiently far away from the last event to reduce the possibility of confounding variables.
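As a rough model of what holdoff does to a burst of triggers, here is a Python sketch. It is a simplification (a real scope times its holdoff in hardware relative to the acquisition), and the event timestamps are invented for the example.

```python
def apply_holdoff(trigger_times, holdoff_s):
    """Return the trigger times that survive a holdoff window.

    After an accepted trigger, any trigger arriving less than
    `holdoff_s` seconds later is ignored.
    """
    accepted = []
    for t in sorted(trigger_times):
        if not accepted or t - accepted[-1] >= holdoff_s:
            accepted.append(t)
    return accepted


# Hypothetical event times in seconds: a burst of repeated overrun reads,
# then another burst roughly two hours later.
events = [100.0, 100.2, 100.4, 103.0, 7300.0, 7300.1]
print(apply_holdoff(events, holdoff_s=10.0))  # [100.0, 7300.0]
```

Each burst collapses to a single acquisition, which is the behavior the 10-second holdoff is after.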
Summary of triggering configuration
It may seem like a journey to get here, but I am confident that you can discern what is going on. I am probing the I²C bus between the processor and an accelerometer. The trigger is watching a specific decoded bit (i.e. the overrun flag), and waiting at least 10 seconds before allowing a new trigger to initiate another acquisition. I now have an automated system watching for a rare occurrence and triggering a measurement when it happens, even if it is days later.
Now I could just press the Run Button on the scope … and wait… but… Should I watch the scope? Check it in the morning? Will one instance of the glitch be enough? Should I debug directly on the scope? What if I need to get some help from a team member with the problem? I decide that I will save the data after a trigger, and wait until I have a few saved acquisitions before I start trying to diagnose the problem.
2) Scope acquisition settings for long-running triggered measurements
Ok, we’ve got the scope triggering to look for the FIFO overrun. Now we need to set up an acquisition that will help me identify the problem. My 5-Series Tek scope is cool because it has eight mixed-signal channels, giving me a lot of options. I probed some other digital busses, enables, interrupts, analog power rails, and some end-of-line test points. Most importantly, the scope also has an option to “Save on Trigger,” allowing me to automatically save data each time my overrun occurrence triggers a measurement.
Save on Trigger + Initial State
The ability to save data to a folder on the scope when a trigger occurs is a perfect marriage with the 5-Series scope’s integration with Initial State. Setting up the Initial State direct connect service on my scope creates a “magic folder” on the scope that automatically sends data to a safe, encrypted location in the cloud (think Dropbox/Google Drive for your scope acquisitions). Once the data is captured, the Initial State visualization system is by far the best in-browser waveform analysis experience you’ll find (sure, I am biased, but take a look fo’reelz).
Here’s what’s cool about this use-case coupled with Initial State. By simply targeting the long-running triggered measurement to my Initial State scope folder (i.e. the magic folder), the data is automatically cloud-synced, and I can access it anywhere. Maybe I am ten feet away from the scope and too lazy (ahem… busy) to walk over, or in another office, or already at home. I can pull up the data capture from anywhere on any device (including my phone) and see my results.
So, I went through the installation process of Initial State on the 5-Series, which creates the magic folder under my username. Now I can configure the scope to “Save on Trigger”, target that special folder, and wait for data to come to me (cue the make-it-rain gif). See the screenshot below for the scope configuration.
- I decided to save 5 instances of the triggered acquisition.
- Check “Save on Trigger” and click configure.
- Select the special Initial State folder (which appears after you pair the scope with your Initial State account).
- Choose to save “waveform” data.
Now it’s time to Set it… and forget it. With our triggering configured as “Save on Trigger,” the data is saved to my Initial State username folder and subsequently uploaded automatically.
- Click the “Single/Seq” button to start the acquisition. The trigger shows “ready” and the scope starts hunting for our FIFO overrun trigger.
- Once found, the measurement is triggered, data is saved to the scope, data is uploaded, and you can inspect the problem from a lawn chair on a beach. That’s gotta be better than being perched on a lab stool all day.
3) Analyze in Initial State
Ok, once you are in the Initial State environment, you have all kinds of great ways to analyze and debug. First, let’s locate my new buckets of data. Since we configured “Save on Trigger” to store 5 acquisitions, I see 5 new buckets in the Initial State application after all the overruns have been found. In my example, this took a few days, and I was able to see them show up along the way by checking in on my phone from time to time.
Next, I decoded I²C information from the raw digital signals using an expression. In the image above CH2_D0 is the SCL (clock) and CH2_D1 is the SDA (data). Initial State has serial decode built in, so I was able to write the following expression to extract the address and data.
Now I can zoom in and place an annotation at the trigger point showing the overrun flag being set, for easy reference. Since I’m getting pre- and post-trigger data, the trigger point should be in the middle of the capture. I found where the I²C bus showed Address: 0x50 and Data: 0x12 and marked it exactly where the FIFO overrun bit is reading as “set”.
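For a feel of what a serial decoder does to get from two raw digital lines to an address and a data byte, here is a minimal Python sketch. It is not the Initial State expression and not the scope’s decoder. The `encode_i2c` helper builds an idealized waveform (every byte ACKed, no clock stretching), and treating 0x50 as a 7-bit device address in a read transaction is an assumption made for the example.

```python
def decode_i2c(scl, sda):
    """Decode equal-length lists of sampled SCL/SDA levels (0 or 1).

    Returns a list of transactions; each is a list of (byte, acked) tuples.
    """
    transactions = []
    current = None   # bytes of the transaction in progress, None when idle
    bits = []        # bits collected for the byte in progress
    for i in range(1, len(scl)):
        scl_prev, scl_now = scl[i - 1], scl[i]
        sda_prev, sda_now = sda[i - 1], sda[i]
        if scl_prev == 1 and scl_now == 1 and sda_prev != sda_now:
            if sda_now == 0:
                # START or repeated START: SDA falls while SCL is high.
                if current:
                    transactions.append(current)
                current = []
            else:
                # STOP: SDA rises while SCL is high.
                if current is not None:
                    transactions.append(current)
                current = None
            bits = []
        elif current is not None and scl_prev == 0 and scl_now == 1:
            # Data is valid on the rising edge of SCL.
            bits.append(sda_now)
            if len(bits) == 9:  # 8 data bits (MSB first) plus the ACK bit
                byte = int("".join(map(str, bits[:8])), 2)
                current.append((byte, bits[8] == 0))
                bits = []
    return transactions


def encode_i2c(byte_values):
    """Build an idealized SCL/SDA sample stream for the given bytes."""
    scl, sda = [1, 1], [1, 0]   # idle, then START
    for value in byte_values:
        for bit in [(value >> n) & 1 for n in range(7, -1, -1)] + [0]:
            scl += [0, 1]       # SDA changes while SCL is low
            sda += [bit, bit]
    scl += [0, 1, 1]            # STOP
    sda += [0, 0, 1]
    return scl, sda


# Address byte = 7-bit address 0x50 shifted left, with the read bit set.
scl, sda = encode_i2c([(0x50 << 1) | 1, 0x12])
(addr_byte, _), (data, _) = decode_i2c(scl, sda)[0]
print(hex(addr_byte >> 1), "read" if addr_byte & 1 else "write", hex(data))
# 0x50 read 0x12
```

A real capture needs more care (thresholding the analog samples, glitch filtering, 10-bit addressing), but the START, rising-edge sampling, and STOP rules are the core of it.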
I will spare you the details of all the investigation that led to the root cause of the problem. In short, it turned out to be another component stretching the I²C lines (i.e. hogging the bus) during an intermittent and sizable data burst. We decimated that data transfer to free up bus time between segments, and all was well.
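A quick back-of-the-envelope calculation shows why a hogged bus turns into an overrun. The Python sketch below uses the 32-level FIFO depth from earlier; the 100 Hz output data rate and the stall durations are placeholders chosen for illustration, not measurements from this project.

```python
def fifo_fill_time_ms(depth, sample_rate_hz):
    """Milliseconds for an empty FIFO of `depth` samples to fill."""
    return 1000 * depth / sample_rate_hz


def samples_lost(depth, sample_rate_hz, stall_ms, level_at_stall=0):
    """Samples overwritten if the host cannot read for `stall_ms` milliseconds.

    `level_at_stall` is how many samples were already queued when the
    stall began.
    """
    produced = stall_ms * sample_rate_hz // 1000  # samples arriving during the stall
    free_slots = depth - level_at_stall
    return max(0, produced - free_slots)


DEPTH = 32     # FIFO depth of the accelerometer described earlier
RATE_HZ = 100  # hypothetical output data rate, not stated in this article

print(fifo_fill_time_ms(DEPTH, RATE_HZ))                               # 320.0
print(samples_lost(DEPTH, RATE_HZ, stall_ms=300))                      # 0
print(samples_lost(DEPTH, RATE_HZ, stall_ms=340))                      # 2
print(samples_lost(DEPTH, RATE_HZ, stall_ms=100, level_at_stall=24))   # 2
```

The last case is the sneaky one: if the FIFO is already mostly full when the bus gets hogged, even a short stall is enough to drop a couple of samples.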
The ability to have all of the data at your fingertips, access it from anywhere, and debug with a butter-smooth waveform viewer, all in the browser, is incredible. You may find that you like exploring waveforms in Initial State better than on the scope directly.
Share and Collaborate with your Team
With a solid data organization destination, waveform viewer, expression engine, and annotations all in the cloud, it’s a logical extension to create a place where teams can collaborate on data. Let’s say you want to share your findings with the team and give them access to data with the same dynamic ability to explore the waveforms.
If you are screen-shotting scope front panels (or heaven-forbid taking a picture with your phone), please stop and check this out.
I hope you’ve gleaned a little something about scope triggering, new-school debugging techniques, and team collaboration. Please contact us at Initial State through the support link if you have any questions.
Products used in the making of this tutorial: