Are ASICs the next big development in AI?

May 27th, 2019, Published in EngineerIT

Just as field programmable gate arrays (FPGAs) and graphics processing units (GPUs) are beginning to show promise in artificial intelligence (AI), new developments in application specific integrated circuits (ASICs) are vying to become the devices of choice for AI in the future. The debate began when Google announced its second-generation ASIC for accelerating machine learning. Not everyone is convinced, however; many believe there is plenty of life left in FPGAs and GPUs.

Fig. 1: Google’s tensor processing unit.

Google calls its ASIC chip the tensor processing unit (TPU) because it underpins TensorFlow, the software engine that drives its deep learning services. TensorFlow was released in 2015 under an open-source license, which means anyone can use and modify it. It is not clear if Google will share the designs for the TPU, but outsiders can make use of the company’s machine learning hardware and software.

Google is just one of many companies incorporating deep learning into a wide range of internet services. Facebook, Microsoft, and Twitter are also in the AI space. Currently they are driving their neural nets with GPUs made by companies like Nvidia. Some, including Microsoft, are also exploring the use of FPGAs, which can be programmed for specific tasks.

A TPU board fits into the same slot as a hard drive in data centres. Google says that its own chips provide “an order of magnitude better-optimised performance per watt for machine learning” than other hardware options.

Because TPUs are tailored to machine learning applications, the chip can tolerate reduced computational precision, which means it requires fewer transistors per operation. As a result, more operations per second can be squeezed into the silicon, supporting more sophisticated and powerful machine learning models. These models can also be applied faster, so users get more intelligent results more rapidly.
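To make the reduced-precision idea concrete, here is a minimal, hypothetical sketch (plain NumPy, not TPU code) that quantises a small matrix multiplication from 32-bit floats down to 8-bit integers and rescales the result; integer arithmetic of this kind needs far fewer transistors and less energy per operation.

```python
# Illustrative sketch only: symmetric int8 quantisation of a matrix multiply.
# Not TPU code - just the general idea of trading precision for throughput.
import numpy as np

def quantise_int8(x):
    """Map a float32 array onto int8 with a single scale factor."""
    scale = np.max(np.abs(x)) / 127.0
    return np.round(x / scale).astype(np.int8), scale

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8)).astype(np.float32)   # e.g. activations
w = rng.standard_normal((8, 3)).astype(np.float32)   # e.g. weights

a_q, a_scale = quantise_int8(a)
w_q, w_scale = quantise_int8(w)

# Integer multiply-accumulate, then rescale back to float.
y_int = a_q.astype(np.int32) @ w_q.astype(np.int32)
y_approx = y_int * (a_scale * w_scale)

y_exact = a @ w
print("max error:", np.max(np.abs(y_exact - y_approx)))  # small, but nonzero
```

The result is only approximate, which is exactly the point: neural networks tolerate this small loss of precision, so the hardware can spend its silicon on doing more operations rather than more precise ones.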

According to Karl Freund, consulting lead for high performance computing and deep learning at Moor Insights & Strategy, there are four major technologies that can be used to accelerate the training and use of deep neural networks: CPUs, GPUs, FPGAs, and ASICs. The good old standby CPU has the advantage of being infinitely programmable, with decent but not astronomical performance. It is used primarily in inference workloads where the trained neural network guides the computation to make accurate predictions about the input data item.

FPGAs from Intel and Xilinx, on the other hand, offer excellent performance at very low power, but also offer more flexibility by allowing the designer to change the underlying hardware to best support changing software. FPGAs are used primarily in machine learning inference, video algorithms, and thousands of small-volume specialised applications. “The skills needed to program the FPGA hardware are scarce, and the performance of an FPGA will not approach that of a high-end GPU for certain workloads”, Freund said.

“Technically, a GPU is an ASIC used for processing graphics algorithms. The difference is that a GPU offers an instruction set and libraries that allow it to be programmed to operate on locally stored data as an accelerator for many parallel algorithms. GPUs excel at performing matrix operations (primarily matrix multiplications). Basically, GPUs are very fast and relatively flexible”, he said.
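As a simple illustration of why matrix multiplication matters here, the forward pass of a fully connected neural-network layer is essentially one matrix multiply. The sketch below uses plain NumPy with made-up layer sizes rather than any GPU library, but it shows the operation an accelerator spends most of its time on.

```python
# Toy example: one dense-layer forward pass is a single matrix multiplication
# plus a bias and an activation - the matrix maths GPUs are built to parallelise.
import numpy as np

batch = np.random.rand(32, 784).astype(np.float32)     # 32 flattened inputs
weights = np.random.rand(784, 128).astype(np.float32)  # layer weights
bias = np.zeros(128, dtype=np.float32)

activations = np.maximum(batch @ weights + bias, 0.0)  # matmul, bias, ReLU
print(activations.shape)                                # (32, 128)
```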

The alternative is to design a custom ASIC dedicated to performing fixed operations extremely fast, since the entire chip’s logic area can be dedicated to a set of narrow functions. ASICs such as the Google TPU lend themselves well to a high degree of parallelism, and processing neural networks is an “embarrassingly parallel” workload. “Think of an ASIC as a drag racer; it can go very fast, but it can only carry one person in a straight line for a quarter mile. You couldn’t drive one around the block, or take it out on an oval racetrack”, Freund said.
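“Embarrassingly parallel” simply means the work splits into pieces that need no communication with one another. A rough, hypothetical sketch: each input in an inference batch can be scored independently, so the batch can be farmed out across as many workers (or accelerator cores) as are available.

```python
# Toy illustration of an embarrassingly parallel workload: every input is
# processed independently, so the batch splits cleanly across worker processes.
import numpy as np
from multiprocessing import Pool

rng = np.random.default_rng(0)
WEIGHTS = rng.standard_normal((784, 10)).astype(np.float32)  # toy "network"

def infer_one(x):
    """Score a single input; no shared state, no communication needed."""
    return int(np.argmax(x @ WEIGHTS))

if __name__ == "__main__":
    inputs = [rng.standard_normal(784).astype(np.float32) for _ in range(1000)]
    with Pool(processes=4) as pool:
        predictions = pool.map(infer_one, inputs)   # work divided across workers
    print(predictions[:10])
```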

Freund believes designing an ASIC can be an expensive endeavour, costing many tens or even hundreds of millions of dollars and requiring a team of fairly expensive engineers. “Paying for all that development means many tens or hundreds of thousands of chips are needed to amortise those expenses across the useful lifetime of the design (typically two to three years). Additionally, the chip will need to be updated frequently to keep abreast of new techniques and manufacturing processes. Finally, since the designers froze the logic early in the development process, they will be unable to react quickly when new ideas emerge in a fast-moving field, such as AI. On the other hand, an FPGA (and to a limited extent even a GPU) can be reprogrammed to implement a new feature.”

At this stage in the development of AI, none of the other companies is expected to jump the GPU ship and hop on board with their own ASIC designs. They seem to have their hands full developing machine learning models. Building an ASIC, and the software that enables it, is expensive and would take the focus off their current development of AI systems. Freund said it is more likely they will combine the performance of a GPU for training with the flexibility and efficiency of an FPGA for inference.

FPGAs and GPUs are not currently under threat, but with new developments accelerating faster than ever before, it is difficult to predict how long this will last. One reassuring factor is that AMD, a major manufacturer of GPUs, says there is still plenty of demand.
