Project Brainwave is Microsoft's new approach to deep learning inference. Below is a high-level video of the key components of Brainwave. A lot of elements go into Brainwave, so let's take a look at them individually before looking at what Brainwave is and the problems it tries to solve.
FPGA Architecture - Field Programmable Gate Array.
An FPGA is a chip which is designed to be reprogrammable, hence the "field programmable" element. FPGAs have been used for all sorts of problems and are nothing new; however, using them to accelerate AI models is new (Linn, 2016). Building a specialist chip can take years, whereas FPGAs allow an engineer to write algorithms directly on to the chip. The FPGA can also be reprogrammed as required to respond to a change in requirements. Linn asserts that researchers inside Microsoft noticed that FPGAs could be an option to combat the inevitable end of Moore's Law. FPGAs are used at a lot of key points in Microsoft's Azure platform. The benefit of using an FPGA over a CPU running an application is that the logic is programmed directly on to the FPGA, with no software layer acting as a middleman, so latency is very low. So in simple terms: a reprogrammable chip.
We need to define what an AI model is. When I talk about an AI model I am referring to a Deep Neural Network (DNN). This could be a Recurrent Neural Network (RNN) predicting labels on text data, a Convolutional Neural Network (CNN) performing image or audio processing, or even a Generative Adversarial Network (GAN). This is not a shallow machine learning model. There is a high degree of sophistication baked into these models, and as a result we need specialist hardware to accelerate their performance. Take ResNet-50 as an example: scoring a single image requires nearly 8 billion operations. Accelerating workloads of this size is what Brainwave is trying to achieve.
Linn notes that there are lots of large organisations and start-ups looking at the development of DNN processing units (DPUs). Brainwave has taken a different stance to most others in that it uses FPGAs for this function. Rather than building a specialist processing unit, using FPGAs means that Microsoft is able to innovate as new ideas are postulated and new ways of working are discovered.
Brainwave supports Microsoft's CNTK, TensorFlow and Caffe. There are plans to support ONNX, the open standard for DNN models. Brainwave converts models written in these frameworks into code which is optimised to run on FPGAs.
Brainwave is capable of running lots of different complicated workloads. Linn notes that as well as CNNs for image processing, which are complicated but for which performance is easy to measure, Brainwave can run more complex scenarios such as Long Short-Term Memory (LSTM) models without the need for batching. If you head over to my GitHub, you will find a link to a project which I wrote for generating new session abstracts from previous session abstracts submitted to technical conferences. Brainwave was demoed recently using a Gated Recurrent Unit (GRU). Microsoft demoed a model five times larger than ResNet-50 and achieved record-setting performance. Each request to the model was answered in less than one millisecond. This is incredibly powerful.
What alternative DNN processing units do is batch a sequence of images to be scored. This is because GPUs and NPUs (neural network processing units) are not optimised to deal with single requests. A batch is passed over (in sizes of 64, 128 and so on) and all requests in it are processed in parallel. You do not get a response back until every image in the batch has been scored, so what matters is the aggregate performance of the batch. There is a trade-off between low latency and high throughput. Brainwave eliminates batching, which significantly simplifies the process. Microsoft provides an interface and will score and respond as fast as you can stream your requests.
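To make the batching trade-off concrete, here is a minimal sketch of a toy latency model. It is not how Brainwave or any GPU runtime is implemented. The function names, the fixed arrival interval and every timing number are assumptions chosen purely for illustration.

```python
def batched_latencies(arrival_gap_ms, batch_size, batch_compute_ms):
    """Per-request latency (ms) for one batch, under a toy model.

    Assumes requests arrive at a fixed interval, the batch starts only
    once it is full, and every response is returned when the whole
    batch has finished.
    """
    last_arrival_ms = (batch_size - 1) * arrival_gap_ms
    finish_ms = last_arrival_ms + batch_compute_ms
    # Request i arrived at i * arrival_gap_ms and waits until finish_ms.
    return [finish_ms - i * arrival_gap_ms for i in range(batch_size)]


def streamed_latencies(n_requests, single_compute_ms):
    """Per-request latency (ms) when each request is scored on arrival.

    Assumes each request finishes before the next one arrives,
    so there is no queueing.
    """
    return [single_compute_ms] * n_requests


# Hypothetical numbers, not measurements of any real system.
batched = batched_latencies(arrival_gap_ms=1.0, batch_size=64, batch_compute_ms=20.0)
streamed = streamed_latencies(n_requests=64, single_compute_ms=0.9)

print(f"batched:  worst {max(batched):.1f} ms, mean {sum(batched) / len(batched):.1f} ms")
print(f"streamed: worst {max(streamed):.1f} ms, mean {sum(streamed) / len(streamed):.1f} ms")
```

With these made-up inputs the first request in the batch waits 83 ms, because it has to sit there while the other 63 requests arrive and then while the whole batch is processed. The batch may still deliver good aggregate throughput, which is exactly the latency versus throughput trade-off described above.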
As Brainwave removes the requirement to batch requests, real-time inference is possible. As AI models are applied more widely, speed of response becomes critical. Typically low latency means more expense: as you scale up compute, cost also increases. This does not appear to be the case with Brainwave, where performance, flexibility and scale are paramount and cost does not appear to be a problem. In the video below you will see reference to a cost of $0.20 per million images. Microsoft is working to decrease this cost!
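To put the quoted price in perspective, the arithmetic can be sketched as below. The only figure taken from the text is the $0.20 per million images. The function name and the workload of 5 million images are hypothetical, and real pricing may differ.

```python
import math


def cost_for_images(n_images, usd_per_million=0.20):
    """Estimated scoring cost in USD at a flat per-million-image rate."""
    return n_images / 1_000_000 * usd_per_million


# Hypothetical workload: 5 million images.
workload_cost = cost_for_images(5_000_000)
per_image_cost = cost_for_images(1)

print(f"5 million images: ${workload_cost:.2f}")
print(f"one image:        ${per_image_cost:.7f}")
```

At that rate a single image costs a fraction of a thousandth of a cent, which is why cost stops being the deciding factor in this comparison.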
It should be noted that Project Brainwave is designed for deployment, not for training. There are many great options for training DNNs in Azure; a Deep Learning Virtual Machine (DLVM), for example, would be a better fit. At the time of writing Brainwave is only available in the East US data centre. If you're planning on using it from Europe, watch out for data movement costs and the additional latency caused by that movement.
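The point about added latency can be illustrated with a back-of-the-envelope sketch. The round-trip time and inference time below are assumed values, not measurements, and the helper names are hypothetical.

```python
def end_to_end_ms(network_round_trip_ms, inference_ms):
    """Total time the caller waits: network round trip plus scoring."""
    return network_round_trip_ms + inference_ms


def network_share(network_round_trip_ms, inference_ms):
    """Fraction of the total wait that is spent on the network."""
    return network_round_trip_ms / end_to_end_ms(network_round_trip_ms, inference_ms)


# Assumed values: an 80 ms transatlantic round trip and 1 ms of inference.
total_ms = end_to_end_ms(network_round_trip_ms=80.0, inference_ms=1.0)
share = network_share(network_round_trip_ms=80.0, inference_ms=1.0)

print(f"total: {total_ms:.0f} ms, of which {share:.0%} is network")
```

Under these assumptions almost all of the wait is network time, so a millisecond-level model only pays off when the caller sits close to the region hosting it.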
As mentioned, Brainwave runs on FPGAs, which have been deployed into Azure over the last few years. But FPGAs can also be installed on your own servers or on an Edge machine (Edge is where the cloud ends and you begin). What if you have a lot of very secure information, or you are working in a disconnected environment such as an oil drilling facility? Then deploying to the Edge is a credible option for you. This is a great option for a lot of customers working in remote areas or with data they do not want to stream into the Cloud. The latency is a little worse, but the benefits far outweigh this.
Brainwave is a really interesting and impressive option for Deep Learning models. Below is a link to a video from Build 2018.
Research:
Burger, D (2017) Microsoft unveils Project Brainwave for real-time AI. Online. Available at: https://www.microsoft.com/en-us/research/blog/microsoft-unveils-project-brainwave/ [Accessed January 2019]
Burger, D (2018) Hyperscale hardware: ML at scale on top of Azure + FPGA : Build 2018. Online. Available at: Hyperscale hardware: ML at scale on top of Azure + FPGA : Build 2018 [Accessed January 2018]
Linn, A (2016) The Moonshot that succeeded: How Bing and Azure are using an AI Supercomputer in the cloud. Online. Available at: https://blogs.microsoft.com/ai/project_brainwave_catapult_moonshot/ [Accessed January 2019]