VLSI Architecture for High-Performance, Low-Cost, On-chip Learning (X1)
What is the goal of a processor? High speed, multiple efficient cores, and low cost. Building such a processor requires a well-designed architecture that is adaptable, flexible, and easy to integrate into a system. The X1 architecture sets an example: it was designed with these goals in mind, though it did involve some compromises.
Adaptability is an essential quality for a chip intended for neural network research and for many applications. The primary goal here was to develop an adaptive system that learns efficiently and at high speed on-chip, without extra hardware.
Flexibility is the ability of a system to implement any neural network algorithm. The neural network field is changing rapidly and algorithms continue to evolve, which makes flexibility a necessary attribute for any neurocomputing engine. Flexibility implies programmability, and programmability implies a digital implementation, because building programmable analog systems sacrifices their cost-performance advantage. That is why the first major decision was to make the X1 an all-digital implementation.
Low cost is a basic requirement if neurocomputers are to proliferate into real-world applications. The cost of implementation is kept low by using a medium that allows the mass manufacture of complex systems, and an architecture that reduces cost within that medium. CMOS was chosen for the X1 architecture because competition in microprocessors and memories has driven CMOS to an unmatched level of price per function. Reducing the implementation cost in CMOS means keeping the silicon area very small, which is a major tradeoff; a centralized digital update helps reduce the silicon area requirements. There are other motivations for using digital techniques: Bailey and Hammerstrom have shown that multiplexed communication allows a more cost-effective silicon implementation of complex, high fan-out connectivity.
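The wiring argument behind multiplexed communication can be sketched with simple arithmetic. The sketch below is a hypothetical illustration (the function names and the 256x256 layer size are assumptions, not from the paper): fully connecting N sources to M destinations with dedicated point-to-point wires costs N x M connections, while a single time-multiplexed broadcast bus costs one shared bus plus N bus cycles per layer evaluation.

```python
# Hypothetical illustration of the interconnect-cost tradeoff behind
# multiplexed (broadcast) communication; not the paper's actual figures.

def dedicated_wires(n_src, n_dst):
    """Point-to-point interconnect: one wire per synaptic connection."""
    return n_src * n_dst

def broadcast_bus_cycles(n_src):
    """Time-multiplexed broadcast: each source value is placed on one
    shared bus once, and every destination taps the bus in parallel."""
    return n_src

# For a fully connected 256-input, 256-output layer:
print(dedicated_wires(256, 256))   # 65536 dedicated wires
print(broadcast_bus_cycles(256))   # 256 cycles on a single shared bus
```

The silicon cost of dedicated wiring grows quadratically with fan-out, while the broadcast bus trades that area for time, which is the cost-effectiveness result attributed to Bailey and Hammerstrom.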
Speed is increased by increasing concurrency in the computation. Neural networks are naturally suited for this since they are massively parallel. Ideally one would like to utilize all the parallelism that is available, including the parallel computation of all synapses, as one sees in biological systems.
By making these trade-offs, the X1 architecture was created. The X1 chip has a number of simple, digital-signal-processor-like PNs (Processor Nodes) operating in an SIMD (Single Instruction stream, Multiple Data stream) configuration. Broadcast interconnect provides inexpensive, high-performance communication. The PN architecture is optimized for traditional neural network applications, but is general enough to implement any neural network algorithm (learning and non-learning) and many feature extraction computations, including classical digital signal processing, pattern recognition, and rule-based processing (fuzzy and non-fuzzy). The X1 chip is the first implementation of the X family architecture and is well suited to a variety of research and industrial applications. It represents the first member of an evolving, upward-compatible line of neurocomputer architectures.
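The SIMD-plus-broadcast organization can be illustrated in software. This is a minimal sketch, not the actual X1 microarchitecture (the function name and layer sizes are assumptions): each PN holds the weight vector of one output neuron, each input value is broadcast to all PNs, and every PN executes the same multiply-accumulate instruction in lockstep.

```python
# Software sketch of SIMD Processor Nodes with broadcast input.
# Hypothetical model, not the real X1 datapath.

def simd_layer(inputs, weights_per_pn):
    """weights_per_pn[p][i] is PN p's weight for input i."""
    accum = [0.0] * len(weights_per_pn)       # one accumulator per PN
    for i, x in enumerate(inputs):            # broadcast input x to all PNs
        for p, w in enumerate(weights_per_pn):
            accum[p] += w[i] * x              # same instruction on every PN
    return accum

# Two PNs, three broadcast inputs:
print(simd_layer([1.0, 2.0, 3.0], [[1, 0, 1], [0, 1, 0]]))  # [4.0, 2.0]
```

The outer loop models the broadcast bus (one input per cycle), and the inner loop models what the PN array does in parallel under a single instruction stream, which is what makes one instruction sequencer sufficient for all PNs.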
Reference: Hammerstrom, D. (1990). A VLSI architecture for high-performance, low-cost, on-chip learning. 1990 IJCNN International Joint Conference on Neural Networks. doi:10.1109/ijcnn.1990.137621