Microsoft made an impact in the world of dedicated AI hardware today. The company introduced a system for high-speed, low-latency serving of machine learning models. Called Brainwave, the new system will allow developers to deploy machine learning models onto programmable silicon and achieve performance beyond what they could get from a CPU or GPU.
This could push performance beyond what current processors deliver. Because the system does not rely on batching operations, the hardware can handle requests as it receives them. The model that Microsoft chose is many times larger than convolutional neural networks such as ResNet-50 and AlexNet, which other companies use for benchmarking their own hardware. As users don’t want to wait long for their apps to respond, providing low-latency insights is important for delivering machine learning systems at scale.
Doug Burger, a distinguished engineer with Microsoft Research, said: “We call it real-time AI because the idea here is that you send in a request, you want the answer back.”
He added: “If it’s a video stream, if it’s a conversation, if it’s looking for intruders, anomaly detection, all the things where you care about interaction and quick results, you want those in real time.”
“All of the numbers that [other] people are throwing around are juiced,” he said.
Microsoft is now using Brainwave across the army of FPGAs that it has installed in its data centers. According to Burger, Brainwave will allow Microsoft to support artificial intelligence features more robustly and more rapidly. Microsoft is also working on making Brainwave available to third-party customers through its Azure cloud platform.
However, the speed shown today was achieved on new hardware, and Burger says there is room for Intel and Microsoft to further optimize the hardware’s performance and Brainwave’s use of it, respectively.