A Deeper Look at Neuralink’s N1 Chip
Thorough Analysis of Neuralink’s ASIC Based on Their Most Recent Patent
Neuralink is an ambitious neurotechnology company that’s aiming to upgrade nature’s most complex organ — the human brain. Their team of exceptionally talented people has developed a sleek, innovative, ultra-high-bandwidth brain-machine interface system that far outshines the status quo.
Co-founded by Elon Musk, Neuralink is building next-generation brain-machine interfaces with scalable neural channel density and real-time data processing unmatched by anything else in the neurotech space.
All connected to their state-of-the-art ASIC, the “N1 Chip,” their current system comprises 3,072 electrodes distributed across 96 thin, flexible threads (~4–6 μm wide), far finer than the average human hair.
A typical chip life cycle (Design → Verification → Tape-Out) takes approximately one to several years. Neuralink co-designs their chip with the rest of the system, and this tight feedback loop has enabled their small team of analog and digital chip designers to achieve a record-breaking four tape-outs per year, each hyper-optimizing major levers in the neural implant (miniaturization, power consumption, reliability, economics, etc.).
After many great conversations with people who have worked, and are working, with Neuralink, the team @Neuronic 🧠 decided to spend the weekend breaking down the latest iteration of Neuralink’s N1 System on a Chip to come to a consensus on what’s going on at the:
Silicon Level of Elon Musk’s Human Brain-Machine Interface
Neuralink’s N1 System on a Chip
As sourced from their 2019 white paper, the N1 is a hermetically sealed SoC capable of processing 3,072 individual channels of neuronal activity, generating roughly 200 kbps of data per channel (about 200 Mbps per thousand channels) while using only 6.6 microwatts of power. The chip is capable of compressing neurological data up to 200 times, making it suitable for Bluetooth transmission. The chipset is implanted in the skull and connected to filaments called “threads.”
From a signal acquisition scope, a single N1 chip interfaces with 96 threads, each thread carrying 32 electrodes. These threads are embedded in the flesh of the brain by a custom-built, surgeon-assisted robot, which places each electrode less than 60 microns from its target neuron. Once the signal is picked up by the conductive threads, it’s up to the custom ASICs that Neuralink has designed to amplify, digitize, and filter the signal.
In the recording module lie 12 state-of-the-art ASIC chips, which you can see assembled in a 4x3 grid above. Each ASIC is capable of processing 256 channels, meaning there are 8 threads for every chip.
Snapshot of Specifications:
- 1 Recording Module / Printed Circuit Board
- 12 ASICs
- 96 Threads (8 threads/ASIC)
- 3,072 Channels + 3,072 Analog Pixels (32 channels + analog pixels per thread)
- 7.2 μV RMS Noise
- 6.6 μW Power
- 200× Compression
- 900 ns to Compute
- 0.2 μA Amplitude
- 7.8 μs Time Resolution
- 4 × 5 mm Die Size
Journey of a Neuron from Analog to Digital
From an electronics perspective, there are three fundamental stepping stones in the journey of a neuron’s signal from analog to digital through the implanted technology.
1. Analog Processing of Neuron Spikes (Action Potentials)
Before we convert analog neural signals into digital bits, we must begin by amplifying and filtering them. This is fundamentally the job of the “analog pixel.” As demonstrated in the figure above, an “analog pixel” simply comprises the analog circuitry: amplifiers, filters, etc. The ideal scenario is one analog pixel per electrode so that each can be configured independently. As such, in the case of the N1 SoC, there are 3,072 analog pixels, and each one takes up a significant portion of the physical space on the chip. How well these pixels work determines both the signal quality and the characteristics of the overall neural interface. When designing the analog pixel, there were three main considerations:
- Make it as small as possible so we can fit more
- As low power as possible so we can generate less heat and have longer battery runtimes
- As low noise as possible so we can get the best signals
The interesting part of designing the analog pixels is that the above considerations are at odds with each other. For example, we want to achieve lower noise on the amplifier so that more spikes can be detected, but as transistors get smaller, it becomes harder to get lower noise while keeping the power the same or less.
Another thing to note in this “stepping stone” is that spike amplitudes are typically less than 10 microvolts. Amplifying such signals requires a system gain of 43 dB to 60 dB in order to place them within the 10-bit resolution of the onboard ADC (~1 mV range).
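As a quick sanity check on those numbers, the standard decibel arithmetic can be worked out in a few lines of Python (a back-of-the-envelope sketch; the specific voltages are illustrative, not Neuralink’s design values):

```python
import math

def required_gain_db(v_in: float, v_target: float) -> float:
    """Voltage gain in dB needed to scale a signal up to a target amplitude."""
    return 20 * math.log10(v_target / v_in)

# A 10 uV spike brought up to the ~1 mV scale of the ADC input:
print(required_gain_db(10e-6, 1e-3))        # 40.0 dB (a 100x voltage gain)

# The quoted 43-60 dB range corresponds to these linear gains:
print(10 ** (43 / 20), 10 ** (60 / 20))     # ~141x and 1000x
```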
Impedance concerns compound the narrow ADC resolution because shrinking electrode geometries mean greater resistivity and noise in the system. In their paper published in August of 2019, Neuralink investigated two surface treatments, PEDOT doped with polystyrene sulfonate (PEDOT:PSS) and iridium oxide (IrOx), which have promising impedance characteristics.
2. Automatic Spike Detection
Once the signals are amplified, they are converted to 0s and 1s by the on-chip analog-to-digital converter (ADC). Spikes, or action potentials, are often critical for certain BMI tasks. Currently, there are several methods for detecting spikes, such as thresholding, PCA, etc.
One of the more robust ways that Neuralink is going about detecting spikes is by directly characterizing the shape (different from template matching). In some cases, this allows identifying different neurons from the same electrode based on shape. Analog pixels capture the entire neural signal at 20,000 samples per second with 10 bits of resolution, resulting in 200 kilobits per second of neural data per channel (over 200 megabits per second across a thousand channels).
Neuralink is able to stream the entire broadband signal through a single USB-C connector and cable. To eliminate cables and connectors entirely, they modified their algorithms to fit the hardware, making them both scalable and low power.
Currently, their algorithm compresses neural data by more than 200 times and takes only 900 nanoseconds to compute, which is faster than the time it takes for the brain to realize it happened.
3. Every Channel Stimulation (Generation of Action Potentials)
The final stepping stone for Neuralink was electrode stimulation. They wanted the N1 SoC to provide stimulation on top of recording. As of right now, the N1 chip can stimulate through any of its electrodes, in groups of 64 simultaneously. The ability to stimulate more neurons opens the door to more complex tasks.
A Deeper Look at Neuralink’s N1 SoC
Moving on, we dive into the main patent relating to the ASIC chip that describes in detail the way that neural signals pass through the N1 module as well as the various customizations that can be made to the setup outlined in the white paper.
Different Chip Architectures
As explained in the previous section, each N1 module carries multiple ASICs to maximize processing throughput. Several arrangements were proposed in the patent, most of which can be categorized either as:
- Linear
- Two-dimensional
For simplicity’s sake, the linear arrangement, where data is passed from one ASIC to the next, will be the focus of this review.
In this arrangement, the 1st ASIC receives data from its respective channels, packetizes the data, and pushes the processed signal in packets to the next ASIC in series. The 2nd ASIC then receives data from both the previous ASIC and its own electrodes, and passes on newly packetized data alongside the packets from the previous ASIC to the next chip.
This process repeats until the aggregate data packets of all the chips are offloaded to another computing system from the last ASIC. The specific amount of data passed on to the next ASIC depends on what data management techniques are employed to improve power efficiency, which will be discussed later on.
Neuralink’s ASIC is composed of a few fundamental components:
- ports for inter-chip data transfer (left in, right out)
- array of analog pixels/neural amplifiers
- analog-to-digital converter (ADC)
- digital multiplexer
- controller
- configuration circuitry
- compression engine
- merge circuitry
- serializer/deserializer: act as outbound and inbound packet queues
The data flow within the ASIC begins with the analog pixels, which are tuneable amplifiers that are organized in an 8x8 grid. The ASIC imaged in the 2019 white paper boasted 4 of these amplifier grids, totalling 256 analog pixels for a 1:1 ratio with the number of channels interfacing with the chip. Low noise signal amplification is crucial for acquiring and conditioning weak neural signals that are collected by the electrodes. Additionally, the array of amplifiers may also take part in signal compression by applying thresholds set by the config circuitry on the raw data coming from the electrodes.
Once amplified, the signal is digitized by ADCs. In the setup described in the patent, there are 8 ADCs that receive signals from each of the eight rows of amplifiers. In the case of the version in the white paper, there would be 32 ADCs.
The digitized signal is then passed to the multiplexer, which serializes the data and filters for specific rows and columns of the amplifier array. The configuration circuitry, which can be programmed via a scan chain or JTAG interface (a standard port for shifting instructions and test data into a chip) to enable the desired mode, instructs the multiplexer on which analog pixels to sample from.
From a low-level perspective, the digital multiplexer can be implemented with NAND gates that control which input signals get passed to the output. In a simple 2-input multiplexer, a separate control signal switches between modes: either input 1 or input 2 passes to the output. The number of selectable inputs scales as 2^n, where n is the number of control inputs.
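To make the gate-level construction concrete, here is a minimal Python model of a 2-input multiplexer built only from NAND gates (an illustrative sketch, not Neuralink’s actual netlist):

```python
def nand(a: int, b: int) -> int:
    """NAND of two bits."""
    return 0 if (a and b) else 1

def mux2(in0: int, in1: int, sel: int) -> int:
    """2-input multiplexer built purely from NAND gates.
    sel = 0 passes in0 to the output; sel = 1 passes in1."""
    not_sel = nand(sel, sel)  # NOT built from a self-wired NAND
    return nand(nand(in0, not_sel), nand(in1, sel))

# Exhaustive check of the truth table:
for in0 in (0, 1):
    for in1 in (0, 1):
        for sel in (0, 1):
            assert mux2(in0, in1, sel) == (in1 if sel else in0)
```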
The config circuitry is the main programming interface to the ASIC; it can switch the chip between several operating modes, including skip channel, scheduled column, and event voltage spikes. These modes are essentially sets of instructions that implement thresholds in different components of the ASIC, including the compression engine, merge circuitry, and multiplexer. A line in the patent summarizes the role of the config circuitry elegantly: “As the remaining circuitry are unique instruments in the orchestra playing specific roles, the config circuitry is the conductor.”
Program instructions are sent via a scan chain path to the config circuitry. Scan chains and boundary scans were originally created for testing integrated circuits without using physical probes. These scans work by utilizing test and scan cells to drive a signal onto a pin and across individual traces on a board. These cells are arranged like a shift register, where the initial scan-in input propagates throughout the circuit until the scan-out.
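A toy model of that shift-register behaviour, assuming ideal scan cells, might look like this:

```python
def shift_scan_chain(cells: list[int], scan_in: list[int]):
    """Shift bits through a chain of scan cells, one per clock cycle.
    Each cycle, every cell takes the value of its upstream neighbour;
    the last cell's old value appears at the scan-out pin."""
    scan_out = []
    for bit in scan_in:
        scan_out.append(cells[-1])   # last cell drives scan-out
        cells = [bit] + cells[:-1]   # all cells shift one position along
    return cells, scan_out

# Loading an 8-bit instruction into an 8-cell chain (the first bit
# shifted in ends up at the far end of the chain):
cells, out = shift_scan_chain([0] * 8, [1, 0, 1, 1, 0, 0, 1, 0])
print(cells)   # [0, 1, 0, 0, 1, 1, 0, 1]
```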
The config circuitry also sets all the parameters for the chip including the amp settings (# of amps to read out), polling frequency of electrodes, compression engine thresholds, etc. With stored program instructions, the config circuitry continues to send instructions to the rest of the chip during operation.
Continuing with the data flow, the serialized signal is sent to the controller, which communicates with the compression engine and merge circuitry. The main function of the controller is to packetize data. The controller may orchestrate the analog-to-digital conversion by communicating which columns of the amplifier array to sample from and when. The data packet architecture is illustrated in the next section. Additionally, the controller can modulate the sampling rate or stop sampling from the amplifiers at chosen steps. Controller instructions can be altered every 6.25 µs (160 kHz).
From this point on, the data flow diverges depending on the modes programmed by the config circuitry. Data packets may first pass through the compression engine for compression, or they may be sent straight to the merge circuitry. The compression engine performs the key function of effective data management when the array of amplifiers does not apply thresholds to the incoming data. In these scenarios, the compression engine receives the raw, high-bandwidth signals, which in some cases can be sampled at 20 kHz, from the amplifiers. Compression strategies predominantly involve applying thresholds to detect spikes in a specific range, computing summary statistics like channel-wise averages, and/or sending event-based triggers off-chip. Alternatively, information-theoretic lossless compression techniques like those behind PNG, TIFF, or ZIP may be used. In some examples, the bandwidth reduction from the compression engine can exceed 1,000-fold.
These thresholds may be set on the voltage of the signal or the frequency of the signal. Low-frequency and high-frequency signals may not be valuable to the recorder and can be filtered out by the compression engine. Non-spike signals are discarded, essentially reducing the size of the data packets, and compressing the signal. For voltage-based thresholds, a technique called non-linear energy operator (NEO) may be used to automatically find a threshold that accurately detects spikes.
Briefly reviewing NEO, it essentially filters the signals for the periods at which there are fast frequency and amplitude changes of spikes, which can be seen as short peaks in the NEO filtered output.
The NEO of a signal x(n), written 𝝍[x(n)], is computed as 𝝍[x(n)] = x(n)² − x(n+1)·x(n−1). It simply measures how much the signal at time step n deviates from its neighbours at time steps n−1 and n+1.
Furthermore, a threshold for NEO detection can be calculated as the mean of the NEO-filtered output multiplied by a factor C: Thr = C × (1/N) × Σ 𝝍[x(n)], summing over the N samples of the signal. C is found empirically and should be tested on several neural datasets beforehand to achieve the best results.
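Putting the operator and threshold together, here is a compact NumPy sketch of NEO-based spike detection (the factor C = 8 is a placeholder that, as noted above, must be tuned on real neural datasets):

```python
import numpy as np

def neo(x: np.ndarray) -> np.ndarray:
    """Non-linear energy operator: psi[x(n)] = x(n)^2 - x(n+1) * x(n-1).
    Endpoints are left at zero since they lack a neighbour."""
    psi = np.zeros_like(x, dtype=float)
    psi[1:-1] = x[1:-1] ** 2 - x[2:] * x[:-2]
    return psi

def neo_threshold(psi: np.ndarray, c: float) -> float:
    """Threshold = C * mean of the NEO-filtered output."""
    return c * psi.mean()

def detect_spikes(x: np.ndarray, c: float = 8.0) -> np.ndarray:
    """Return the sample indices where the NEO output crosses the threshold."""
    psi = neo(x)
    return np.flatnonzero(psi > neo_threshold(psi, c))
```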
Both the compression engine and controller play a crucial role in throttling the amount of data being generated by each chip. Throttling allows for power and performance efficiency improvements for the N1 system.
Alternatively, during the Neuralink launch event, DJ Seo introduced a novel on-chip spike detection algorithm that involves directly characterizing the shape of a spike. This method compresses neural data by more than 200x and takes only 900 nanoseconds to compute, which is faster than the time it takes for the brain to realize it happened. This technique even allows for identifying different neurons from the same electrode based on shape.
Lastly, the merge circuitry receives data packets from the controller, compression engine, and the de-serializer. If you recall, the de-serializer converts packets from off-chip links to on-chip, essentially queuing the data from the previous ASIC. The merge circuitry chooses which packets to send, and when, from among the packets created on-chip and those from the previous ASIC. The chosen packets are then sent off-chip through the serializer.
Data Packet Architectures
Bandwidth is one of the primary limiting factors of a neural implant, which calls for throttling and backpressure techniques to prevent packet overflow. The maximum bandwidth on a single chip breaks down as follows:
- Analog bandwidth is mainly driven by the signal characteristics of a neural spike, which occupies roughly 500 Hz–5 kHz. You want to double that to capture the fine detail/timing of the signal, and due to the Nyquist sampling theorem, you then need to sample at twice that frequency (hence ~20 kHz).
- Digital bandwidth is basically a calculation of how many bits of data you’re processing per unit of time. The ASIC generates a 10-bit number per sample, which means that at 20 kHz, each channel generates 200 kilobits of data per second (~200 megabits per second for 1,000 channels).
Knowing this is vital as Neuralink’s processing stage needs to be capable of handling this volume of data, which guides a lot of the design decisions.
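The digital side of that arithmetic is simple enough to verify directly (using the figures quoted above):

```python
# Back-of-the-envelope digital bandwidth check.
BITS_PER_SAMPLE = 10      # ADC resolution
SAMPLE_RATE_HZ = 20_000   # per the Nyquist argument above
CHANNELS = 1_000

per_channel_bps = BITS_PER_SAMPLE * SAMPLE_RATE_HZ
total_bps = per_channel_bps * CHANNELS

print(f"{per_channel_bps / 1e3:.0f} kbps per channel")        # 200 kbps
print(f"{total_bps / 1e6:.0f} Mbps for {CHANNELS} channels")  # 200 Mbps
```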
Backpressure is programmed into the ASIC network by instructing the controller to stall packets being sent by the merge circuitry as packets queue up, in order to fill the available bandwidth. Packets awaiting transfer are placed in buffers, such as a store-and-forward buffer. The system may also gauge the extent of backpressure and instruct the merge circuitry to drop back-logged packets past a certain predetermined threshold.
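A minimal sketch of that behaviour, with a hypothetical drop threshold, could be modelled as a bounded queue:

```python
from collections import deque

class MergeBuffer:
    """Toy store-and-forward buffer with a backpressure drop threshold
    (an illustrative model of the behaviour described above)."""
    def __init__(self, drop_threshold: int):
        self.queue = deque()
        self.drop_threshold = drop_threshold

    def push(self, packet) -> bool:
        """Stall packets by queueing them; drop once too back-logged."""
        if len(self.queue) >= self.drop_threshold:
            return False   # backlog past threshold: packet dropped
        self.queue.append(packet)
        return True

    def pop(self):
        """Forward the oldest queued packet when link bandwidth frees up."""
        return self.queue.popleft() if self.queue else None

buf = MergeBuffer(drop_threshold=4)
accepted = [buf.push(f"pkt{i}") for i in range(6)]
print(accepted)   # [True, True, True, True, False, False]
```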
Strategies that can be employed to reduce dependency on backpressure have been outlined in the previous section, primarily involving thresholds to only record neurological events. These strategies avoid polling channels at 20,000 times per second, and instead only need to sample less than ten times per second in some cases. Also, due to the “recharge interval” of a neuron remaining relatively constant, built-in refractory periods can be integrated to pause recording for a set time.
When these strategies are used, data packets coming from each chip have a variable size, as opposed to a fixed size where it is relatively easy to predict when and how much data is being generated. Instead of standardizing length by padding payloads with null data, the structure of each data packet can be minimized for efficiency. Packets may be customized according to the needs of the data they encapsulate, minimizing empty packets and wasted bandwidth, thereby relieving congestion.
In the figure above, an example packet structure is shown alongside a clock timer. The packet is composed of a header, the packet data, and a trailer, which doubles as the header for the next packet. The packet in this example is made up of 10-bit words: 1 header word, 8 data words (one for each row of amplifiers), and 1 trailer word.
Above is an example header; the Size value determines the number of bits per field (Name), and 2^Size gives the range of values shown in the Value column. The header includes a chip id, representing the position of the ASIC in a network of chips.
In a scenario where data compression takes the form of voltage spike detection, a data packet may be organized as above. For example, the word “00001001” represents spike events occurring in rows 0 and 3. In a full bandwidth stream, packets are always 80 bits, 10 bits per row of raw data, and may not have a header. Also, column addresses are implicit in full bandwidth packets, as the columns stream out in the same order. In other packet arrangements, the number of words in the packet data depends on the system mode.
The packet above shows an alternative variable-packet example where request and receive signals are used for packet transfer, and only 2 data words are present due to compression. The example signal sequence shown would control packet transfer between the various components of the chip, including the controller, merge circuitry, etc. The request signal going high indicates a packet is being requested; the receive signal going high indicates the receiver is ready to accept a packet. The request signal going low then indicates the packet has been sent, and the receive signal going low indicates the receiver is no longer ready to receive data. Since the system can support more bandwidth than this specific chip is requesting, the unused bandwidth can be allocated to other chips.
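Read as a sequence, the handshake has four phases; the encoding below is our reading of the prose, not a timing diagram from the patent:

```python
# The four phases of the request/receive handshake, in order:
PHASES = [
    ((1, 0), "request high: a packet transfer is being requested"),
    ((1, 1), "receive high: receiver ready, packet moves across the link"),
    ((0, 1), "request low: the packet has been sent"),
    ((0, 0), "receive low: receiver no longer ready; transfer complete"),
]
for (request, receive), meaning in PHASES:
    print(f"request={request} receive={receive}  {meaning}")
```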
When determining which signals to pass on, in some examples all rows in a column are read out at 160 kHz (8x faster than the 20 kHz high-bandwidth channel sample rate). For each column read, the controller then builds a packet based on information set in the config circuitry. For example, if only 2 rows in a particular column are requested, the packet would consist of a header (10 bits) and the ADC data for those 2 rows (20 bits), for a 20- or 30-bit packet depending on whether the header is included.
A 64-bit vector called SkipVec configures which of the 64 channels (in the 8x8 amp array) are sampled/skipped. Channel n is skipped if the nth entry of SkipVec is set to 1, i.e., SkipVec[n] = 1. For example, if column 1 is being processed by the controller, and SkipVec[15:8] = 00110011, the resulting packet would be (assuming chip id 2 and ADC data = 0 for all rows):
If you recall from the table with the example header, the first 4 bits are allocated to the chip id and the last 3 bits to the column number. The following words are the ADC data for the rows filtered by the SkipVec. The packet does not inherently maintain information about the origin of row data; hence, to interpret row data, the receiver must also know SkipVec.
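Here is a hypothetical sketch of that packet-building step; the header layout (4 bits of chip id at the top of the 10-bit word, 3 bits of column number at the bottom) follows the example table, and the unused middle bits are simply left at zero:

```python
def build_column_packet(chip_id: int, col: int, skipvec: list[int],
                        adc_data: list[int]) -> list[int]:
    """Assemble a packet of 10-bit words for one amplifier column.
    skipvec: 64 entries, 1 = skip that channel; channel = col * 8 + row.
    adc_data: 8 raw ADC samples for this column, one per row."""
    header = (chip_id << 6) | col          # toy header: 4-bit chip id, 3-bit column
    words = [header]
    for row in range(8):
        if skipvec[col * 8 + row] == 0:    # keep rows whose skip bit is 0
            words.append(adc_data[row])
    return words

# Column 1 with SkipVec[15:8] = 00110011 keeps rows 2, 3, 6 and 7:
skipvec = [0] * 64
for ch, bit in zip(range(15, 7, -1), "00110011"):
    skipvec[ch] = int(bit)
packet = build_column_packet(chip_id=2, col=1, skipvec=skipvec, adc_data=[0] * 8)
print(packet)   # [129, 0, 0, 0, 0]: header word followed by 4 ADC words
```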
When skipping columns, for each amp column read, the controller checks the particular time step (1 of 8, then repeating) and decides whether to send the entire column based on what the config circuitry instructs. When filtering columns, an 8x8 matrix called SkipCol may be used, where the column number and the time step are each indexed from 0 to 7. In such examples, SkipCol serves as the instruction for whether the amplifier passes event data. Let t be an integer representing absolute time, and k = t mod 8. Then column n is skipped at step k if SkipCol[n, k] = 1.
For example, if SkipCol[7:0, 3] = 11001100, this corresponds to columns 0, 1, 4, and 5 being sent on time steps 3, 11, 19, etc. These same columns might also be sent on other time steps as well.
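The same example can be checked in a few lines (the bit ordering, leftmost bit for column 7, is our assumption):

```python
def skip_column(skipcol: list[list[int]], n: int, t: int) -> bool:
    """Column n is skipped at absolute time t if SkipCol[n][t % 8] == 1."""
    return skipcol[n][t % 8] == 1

# Encode SkipCol[7:0, 3] = 11001100 (leftmost bit belongs to column 7):
skipcol = [[0] * 8 for _ in range(8)]
for n, bit in zip(range(7, -1, -1), "11001100"):
    skipcol[n][3] = int(bit)

# Columns sent whenever t % 8 == 3 (t = 3, 11, 19, ...):
print([n for n in range(8) if not skip_column(skipcol, n, t=3)])  # [0, 1, 4, 5]
```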
When filtering by columns, it is generally more efficient to send the entire column in a packet vs. sending subsets of a column, due to header word overhead.
Lastly, for packet management across the series of ASIC chips, traffic control is necessary to ensure sufficient amounts of information from each chip exit to off-chip systems. If each chip merely passed on all packets as they were received, the data flow off the chipset to the computer for storage and processing would be biased toward the chips closest to the off-chip link, especially in a 50:50 arrangement (each chip’s outgoing bandwidth is split half to packets from its own channels and half to packets from previous chips).
In the case of a 50:50 arrangement with full bandwidth (where D(n) is the data from chip n and C(n) represents the chip id):
- D1 w/ 100% weightage → C2: all of the packets from chip 1 transfer to chip 2
- [D1, D2] w/ 50% each → C3
- [D1, D2] w/ 25% each + D3 w/ 50% → C4, etc.
As you can see, as more packet transfer steps are added, the share of data from the earlier chips drops by a factor of 2 at each hop, creating an unfair bias toward the last chip.
Several solutions are proposed for this problem, including offsetting the bias at each transfer step. For example, when chip 3 merges its own packets with those received from chip 2, chip 3’s packets are not given 50% of the bandwidth; rather, they are passed with 33%, and those from the first and second chips share the remaining 66%. The fourth chip’s own packets would use only 25%, leaving the previous chips to share 75% in 25% splits.
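The bias and its fix can be simulated directly; `own_fraction` below is our own abstraction for the bandwidth slice each chip reserves for its own packets:

```python
from typing import Callable

def merge_shares(num_chips: int,
                 own_fraction: Callable[[int], float]) -> list[float]:
    """Final share of off-chip bandwidth per chip in a daisy chain.
    own_fraction(k) is the slice chip k reserves for its own packets
    when merging them with the stream arriving from upstream chips."""
    shares = [1.0]                                  # chip 1 starts the chain
    for k in range(2, num_chips + 1):
        own = own_fraction(k)
        shares = [s * (1 - own) for s in shares]    # upstream stream scaled down
        shares.append(own)                          # chip k's own packets
    return shares

print(merge_shares(4, lambda k: 0.5))    # naive 50:50 -> [0.125, 0.125, 0.25, 0.5]
print(merge_shares(4, lambda k: 1 / k))  # offset bias  -> [0.25, 0.25, 0.25, 0.25]
```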
The individual merge circuitry in each chip may be programmed with these metering instructions to create balanced data packet scenarios. In some examples, the buffers in the serializer and/or deserializer may also be instructed to aid the merge circuitry in this balancing act, or to meter the packets they pass along as well.
Final Notes and Applications
By leveraging the features of Neuralink’s system described above, many levels of customization and reprogrammability can be tailored to the user’s purpose.
For example, to get a snapshot of a calibration curve for spike detection, you could titrate high-fidelity information against compressed data by sampling different sets of amplifiers at different times. Chip parameters may be configured on the fly to help visualize the effects of different parameters from the user’s perspective in real time.
Furthermore, an important detail that Neuralink raises is that the systems outlined in the patent for their ASICs can also be modelled/replicated using software, processing components, hardware logic circuitry, etc.
- A general-purpose computer could be used to run the multi-chip system
- Field programmable logic arrays (e.g., FPGAs), which are configurable integrated circuits programmed with special-purpose instructions, can also be used to implement the modules described in the patent
- For the analog pixels/amplifiers, a variety of device technologies can be used: metal-oxide-semiconductor field-effect transistors (MOSFET), complementary metal-oxide-semiconductor (CMOS), polymer technologies, etc.
Key Gaps with Neuralink’s N1 SoC
Several recurring technological gaps are addressed throughout the patent and in other Neuralink sources at large.
The technological gaps with the N1 SoC can be summarized as follows:
- Signal-to-noise ratio from the amplifier scales with the size of the transistors, but as transistors get smaller, it becomes harder to get lower noise while keeping the power consumption the same or less.
- Manufacturing times for N1 systems scale disproportionately with the number of channels: it takes 5 times longer to build a 3,072-channel recording system vs. a 1,536-channel system.
- ASICs have limitations in electrode array-readout capacity, which is referred to as bandwidth in the patent. You can max out the number of electrodes in the brain, but if the ASIC overall cannot handle the high-throughput channels, that extra information goes unused.
- The reprogrammability feature of Neuralink’s ASIC is limited by the number of chips networked in a series. The scan chains used to update instructions for the system may take longer as the number of chips networked in series scales.
The magnitude and order of importance of these issues are somewhat ambiguous, but at a macro level, they accumulate to restrict Neuralink from scaling their systems and effectively recording more neurons.
Bringing Neuronic into the Picture….
In our last update, Neuronic outlined our firm belief that in order to truly move the needle in the Neurotech industry, we must begin by iterating on the tools we have today.
Since then, we have decided to scope down to ASIC design, as it seems to be the low-hanging fruit with the greatest potential impact in the space.
In the next little while, Neuronic 🧠 will be:
- Building low-grade + low-cost integrated chips and hardware projects
- Analyzing state-of-the-art ASICs through patents, conversations, hands-on experience etc.
When analyzing ASICs through patents, papers, etc., we will have a very problem-focused mentality in an attempt to get a solid download of the technological gaps related to ASICs. Besides the N1 chip, we will be doing a deep analysis on other industry-grade chips like:
- Utah Array by Blackrock Neurotech
- SoC by Intan Technologies
- SiNAPS by Plexon Inc and Corticale
- The Argo by Paradromics etc.
At Neuronic, we are on the path of:
Developing Cutting-Edge BMIs to Increase the Derivative of Innovation and Growth in Neurotechnology.
We are firm believers that each and every one of you reading this article today can drastically change the outlook of our project, and we would love to have a conversation with you. The future of neurotech is a long and enjoyable journey; come along for the show.
We want to talk to you! Sign up for a Quick Meeting using this Link: https://calendly.com/neuronic/30min
For those who want updates a bit faster, check out our Twitter or feel free to email at mikaelhaji@gmail.com.