Good morning — it’s great seeing all of you here. Thank you for coming to the session. I’m Ted, a program manager on the Azure Machine Learning team. Doug is a Distinguished Engineer at Microsoft Research. You heard the announcement of Project Brainwave, and what Doug will be going through is a deep dive into what Project Brainwave is, and I’ll talk about how you can use Project Brainwave. We’ll start off with Doug, who will give you an overview of Project Brainwave and how you can use it to accelerate everything. >> Great, thank you, Ted. Good morning, everyone. Thanks for coming. We’ll do 25 minutes for my half and 25 minutes for Ted’s, and this can be interactive. If you have questions, feel free to put your hand up; we have enough time for a live discussion. For those looking at my right eye and wondering what happened: it wasn’t a bar fight or a technical disagreement. I’m a squash player; I took a racket to the face Saturday morning. I had protective glasses on, so no problems here, but it does look a little grim, although it looked worse Saturday morning. >> Okay. So this is a really exciting moment for us, because we are taking this work that we’ve been doing across the company to accelerate our internal AI and bringing it to our customers. I run a team in Research that built a big chunk of the Brainwave architecture, which runs on top of the FPGA infrastructure we built, and Ted is the product owner who is really running the service, so Ted is your right point of contact. I’m always happy to answer questions and forward e-mail to Ted. Since he’s the hard product-truth guy, he’ll give you some serious technical depth. You know, I was a professor for 10 years; if I flip into professor mode, Ted may come up and pull me off. So what is Project Brainwave? It’s a hardware architecture, like NVIDIA’s or Google’s chips — neural processing units. The reason we call it a “project” is that we don’t have an official product name yet.
It is still in the research-project nomenclature phase. With this hardware architecture, we’ve chosen to synthesize down to FPGAs rather than building a custom chip. That is a deliberate choice, and we think it has been the right choice over time: it gives us a lot of flexibility and also the ability to get into a performance-leadership position for DNN inference. We’ve had a great collaboration with Intel, working to optimize the system, and we’ve benefited from their technology. Friends from Intel, thank you for your partnership and support. I will play a short video that we had produced to give you a sense of our overall FPGA project and what we’ve done over the past few years. It’s been a really interesting ride, because we’ve been able to leverage this for all sorts of different workloads. Project Brainwave is the latest iteration, which drives it for deep learning. So let me start with that, and then we’ll continue talking. >> In this era of exponential data growth, data centers need the cloud to keep up with growing appetites for compute power. Since the early days of cloud computing, Microsoft has been innovating with specialized processors to give CPUs a boost for critical workloads. Among accelerator options, FPGAs offer a unique combination of speed and flexibility, ideal for keeping pace with rapid innovation. On FPGAs, data flows through programmable silicon-level logic blocks that process instructions in parallel — a perfect approach for big data. A unique board-level architecture using Intel FPGAs with an interconnected, configurable compute layer lets Microsoft lead the industry in transforming data centers with programmable hardware. We were the first to prove the value of FPGAs for cloud computing, the first to deploy them at cloud scale, and the first to use them to accelerate enterprise-level applications, beginning with Bing.
Our leadership in accelerated networking delivered the world’s fastest cloud. Our pioneering use of FPGAs for distributed computing paved the way for breakthroughs in artificial intelligence.

Our FPGAs enable real-time AI with leading price/performance, and this is just the beginning. Microsoft has an aggressive roadmap of platform, architecture, and algorithmic innovation ahead. Azure will remain the world’s most intelligent cloud. >> Okay. So that is a little bit of the project history, and I think you saw that from 2016 to 2018 we were building a lot of AI capability onto the network of FPGAs we provisioned within the company. Really, what the announcement today — this week — has been about is bringing that to our Azure customers. So I’ll talk about the architecture, why it is disruptive, and how we generated the performance, and Ted will go into what the service looks like and how you can access it. One thing that is important to understand: when you work with raw FPGAs, you typically write HDL — I’m sure many of you have heard of it, hardware description languages like Verilog — languages in which you code as if you are designing a chip. You take that hardware code and run it through a different tool chain, a hardware compiler. Then you can either take that to a fab and produce your own custom chip, or run it through a different flow and synthesize it to an FPGA. With an FPGA, you can change the image every second if you want to, although it is hard to generate programs that fast. So what is really nice here is that we’ve built an engine — that is what Project Brainwave is — that has its own language, so we can write down to it in a higher-level language. When you have a DNN, we can work with customers to map the DNN to the engine rather than drop down to the bare metal of the FPGA, so it is a much easier developer experience than working with raw FPGAs; we’re abstracting the complexity of the programmable hardware chip from our developers and users. But the really nice thing about that model is that we can actually keep tuning the engine — turning the crank — and improving that hardware continuously under the hood.
It’s almost as if, while using a CPU, you got a new CPU without changing the chip in the socket. As we generate new architectures, we can slide those in, and they’ll remain software-compatible with the models that you port. You get a continually improving level of performance. I mean, Ted knows that the launch offering, three or four weeks ago, was maybe ten times as slow as it is today. The team was just tuning it and brought the latency down, and it’s now about 1.8-ish milliseconds per image. As customers, you will see those improvements in a steady stream; we rev the hardware more than once a month. One neat thing about that model is that when there is a discovery in the deep learning community — we see a research paper published with some new operator, some new activation function, a new algorithm, substantially similar to but different from what we’re deploying — we can pull those innovations in very fast and roll out a new image. It’s as if the hardware infrastructure running the models in your enterprises — for those enterprise customers — gets a steady stream of advances following closely behind the discoveries in the research literature. It really keeps you up to date and keeps you in a super compelling position. All right, I’ll go on with a few more details here. I don’t think we want to see the video again. Okay, so we talk a lot in the release about real-time AI, and I would like to be very precise about what that is. I think many of you know this, so apologies for redundant information. What many chips in the DNN acceleration community do is something called batching. The idea behind batching is that you take n requests — n could be two, four, eight, 256, 512, 1,024.
It doesn’t have to be a power of two, but it often is — and you lump them together into one big batch or package and ship that off to the hardware all at once. The reason that works with a lot of the chips like GPUs and TPUs today is that the chip doesn’t have enough bandwidth — memory bandwidth — to keep one request busy; you get low utilization if you send a single request. By sending a thousand requests, the chip can work on the first chunk of each of the thousand requests, then the second chunk of each of the thousand requests, and all thousand finish at once. The aggregate throughput is higher, but the latency you see is actually much worse. To date, people have typically had to make a trade-off: do I want to run with low latency, or do I want to run at low cost? What we tried to do is eliminate that distinction, and so we got rid of batching. The chip runs at nearly full throughput with a single request. You send one request to the Project Brainwave service, the chip gets very high utilization, and it sends you the answer as soon as it is processed. You are sending a stream of individual requests and getting a stream of responses back. And so you don’t have to play games in software: batching up a thousand separate requests, waiting for them to arrive, batching them together, doing data marshaling, waiting, and getting the results back. It really simplifies the way you use this system. You don’t have to think: well, what is my latency bound, how much should I batch, how much will it cost me to do that? We just say: we provide a single interface, you send requests as fast as you can, and responses come back really fast. That is really what you want in real-time scenarios. We’ve talked about Jabil, one of our lead customers. They have a manufacturing line, and they don’t want 256 boards to fly by on the manufacturing line before they ship the images off to the DNN to be processed and figure out which board is bad — by then the boards are down the line, in the next stage, being incorporated into products. Right? They look at a circuit board, send the image, and it comes back good or bad. If it is bad, they pull it off; if it’s good, it keeps going, and they keep up with the rate of the line. They can actually speed up the line now because of those capabilities. So deep learning is starting to cover many, many areas, and initially it is being integrated into workflows for things that are pretty common: vision, language, speech, question answering, knowledge.
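The batching trade-off described above can be put in rough numbers. This is a toy latency model — the figures are illustrative assumptions, not measured Brainwave numbers — contrasting a batch-of-256 accelerator with a batch-one design:

```python
def worst_case_latency_ms(batch_size, arrival_gap_ms, batch_compute_ms):
    """Latency seen by the first request in a batch: it waits for the
    rest of the batch to arrive, then for the whole batch to compute."""
    fill_ms = (batch_size - 1) * arrival_gap_ms
    return fill_ms + batch_compute_ms

# Illustrative assumptions: requests arrive 1 ms apart; a 256-image
# batch takes 50 ms to compute; a batch-one chip takes 1.8 ms per image.
batched = worst_case_latency_ms(256, 1.0, 50.0)  # batch-fill time dominates
batch_one = worst_case_latency_ms(1, 1.0, 1.8)   # no fill time at all
print(batched, batch_one)
```

Even when the batched hardware computes quickly, the time a single caller spends waiting for the batch to fill dominates its latency — which is exactly the distinction the talk draws between aggregate throughput and per-request latency.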
But what is happening more and more is that we’re discovering new uses in areas humans aren’t good at. Those are harder to find — things like sales lead generation, malware detection, bug finding — many, many applications of deep learning that we’re discovering work well, that we just didn’t know about, because we haven’t evolved as humans for them the way we have with speech and vision. So these capabilities are growing and becoming broader and broader, and we’re seeing them get incorporated into Microsoft products, and competitors are doing the same thing. It turns out many of these use cases end up being real-time inference. And so if you’re having an interactive session, or you’re integrating with some live process like a manufacturing line, or you’ve got a thousand cameras in a retail store and you want to know what is happening at any given point — is somebody shoplifting, is some product out of stock, is a refrigerator door left open? — these are things you want to find out as soon as they happen. That low latency is really critical. By eliminating the trade-off between cost and speed, we can now integrate real-time AI into processes with really no consequences or trade-offs. We think it is a pretty exciting offering. Prior to this release, DNNs have been pretty constrained by performance and power, which is why we’re starting to see new accelerators be generated; and because these accelerators tend to be pretty memory-bound — like I said — that is why you have needed batching until now. With this release, we can give you a hardware accelerator where you get real-time AI, get responses quickly, and get very competitive costs; the trade-offs really go away. One other interesting thing we’ve announced — and I’ll touch on this a little later — is that we do this for both the cloud and the edge.
So a lot of customers have real-time flows in their factories, their warehouses, on their oil platforms and oil wells — in their ships, their drones, their airplanes, in their office buildings. All of these things have servers. And very often the data you’re processing is so intensive that you don’t want to send it all the way to the cloud, or you don’t have time to send it all the way to the cloud. Real-time AI on the edge, in servers that are on customer premises, is really important. There are also scenarios in the cloud that people care about — like when your big data sets are in the cloud, or you have multiple data sets streaming from multiple places, satellite imagery you want to correlate with other events — so there are definitely real-time uses in the cloud as well, but also real-time uses on the edge. When we developed this capability, we thought it would be really important for customers to have both. Now, the cloud offering is a little ahead — we have people in production today, thanks to the great work of Ted and his team — and we are working through the details of the edge offering, but we are in preview and working with partners. Now, I already talked about why we chose FPGAs rather than synthesizing a custom chip, and I want to go over the advantages again, because there is a lot of noise in the community. I mean, there are start-ups that have raised hundreds of millions of dollars, and they’re talking about high levels of performance, and there is a lot of one-upmanship with metrics and benchmarks, and just a lot of chatter, because the economic stakes are so high. I really want to cut through that. The first thing we did with the Project Brainwave architecture: we wanted to do this real-time thing, to get rid of batching. That means you want both the high throughput that I mentioned and very low latency, so the performance on the left was one of our primary goals. What we’ve been able to show is that we have industry-leading latency for the requests we’re sending over to the FPGA. The launch model we have is ResNet-50, a general image classifier, a very popular image-based DNN. Of course we’re working with lots of partners on other models, and a ton of models within Microsoft, rolling these out in rapid succession, but on ResNet-50, no one has shown a level of performance at any batch size comparable to what we are achieving on the FPGA. We’re hitting, from the local host to the chip and back, about 1.8 milliseconds per image, and that number is continuing to drop. Now, partly because of all these economic stakes and numbers flying around, people have been pushing the meme that if you build a custom chip, it will be faster.
They talk about ASICs — application-specific integrated circuits — which people think of as efficient because they do one thing. It turns out all DNN accelerators are programmable. They are all programmable engines; they all have an architecture that you program to, because you want to run different models, and that programmability comes at a cost. You want to be able to do multiple things, and you make design choices when you build the chip. So TPUs, GPUs, the other NPUs — neural processing units from start-ups — the Project Brainwave engine on FPGAs, and the other neural network implementations people are doing on FPGAs are all examples of programmable accelerators. Really, what you have to look at is the architecture at the top layer, not the deployment strategy — FPGA or custom chip or something else. What we have shown here is that we’re able to achieve industry-leading performance with the FPGA by choosing the right architecture. A lot of people thought that wasn’t possible, but I think a lot of that is misconception and FUD. I talked a little about the flexibility: we can incorporate new discoveries quickly and tune the engine continuously, so you get this rapidly improving stream of innovation, really without changing your software model. We port the model, and then your engine gets better and better over time. Now of course, we can do things that break the architecture if we get a radical improvement — that’s a discussion with customers, an opportunity — but we will support backwards-compatible models. Another interesting thing we do, though — and this is something we haven’t talked about widely yet — is driven by the fact that DNN models are actually very different from one another. A convolutional network for image processing is very different from a recurrent network. Many different networks are coming out of the research literature, and different networks have different requirements.
And so what we do in the Project Brainwave system is actually generate images of the hardware accelerator — which gets mapped onto the FPGA — that are customized for individual models or classes of models. So if you have four big buckets of models, we can generate four variants of the engine, tailored and streamlined to those models, as opposed to a single fixed architecture that has to serve all models, where you make design compromises and pick them when you freeze the architecture, and then that has to work for two, three, four, five years. We’re able to shrink-wrap the engine to fit the models. In fact, we do things like pick data types: instead of 32-bit floating point or 16-bit floating point or 8-bit integer, we might pick nine bits for this model and eight bits for that model. That happens behind the scenes. In the tool, we have a flow that takes the model, casts it into the right data type, and pairs it with an image of the Project Brainwave architecture that has been optimized for that model. When you think about the edge offering — and I know we’ll get into details later — you’ll be running an Azure client on a server and connecting up to Azure ML; you’ve trained your model, and it will pull down the model and a version of the Project Brainwave engine optimized for that model. So really, the FPGA image of the architecture and the models are paired tightly together, so you can optimize on a per-model basis. That is a capability that very few other offerings have. It will give you another boost in efficiency beyond what we’re doing with the raw architecture, real-time AI, and the elimination of batching. Finally, in our cloud offering, we have scale. Microsoft has been putting one FPGA board in about every new server it buys for three years; Azure started in 2015 and has been doing this for about the same time frame. We’ve deployed massive numbers of these things. We have more scale than anybody else in the industry for this technology. And so there are some demos this week — the Azure demo in AI for Earth has been shown already, right? >> Mark Russinovich will show it. >> I won’t give it away. Should I say it? Okay. >> Go to Mark’s talk. >> Go to Mark’s talk and see the demo that emphasizes the large scale we’re operating at. For customers that want to integrate very cost-effective real-time AI into their businesses — at the business site — we can do that; our offering will do that. If you want to go into the cloud, intersect a flow, and do AI on flows, we can do that.
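To make the custom-precision idea concrete, here is a minimal sketch of symmetric linear quantization at an arbitrary bit width — my own illustration, not the actual Brainwave tool flow — showing why one extra bit (nine instead of eight) buys measurably lower reconstruction error:

```python
def quantize(weights, bits):
    """Map a list of floats onto a signed integer grid of the given
    bit width; returns the integer codes and the scale factor."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def max_error(weights, codes, scale):
    """Worst-case reconstruction error after dequantizing."""
    return max(abs(c * scale - w) for c, w in zip(codes, weights))

w = [0.5, -1.0, 0.25, 0.9]
q8, s8 = quantize(w, 8)      # 8-bit grid: 255 usable levels
q9, s9 = quantize(w, 9)      # 9-bit grid: 511 usable levels
err8 = max_error(w, q8, s8)
err9 = max_error(w, q9, s9)  # finer grid, lower reconstruction error
```

The per-model trick the talk describes amounts to picking, per model, the narrowest grid whose reconstruction error does not hurt accuracy.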
If you want massive scale, we can do that — that is one other advantage of the cloud, massive scale and that elasticity. If you want to be processing hundreds of millions of images in a short time frame, we can scale out and get the job done for you. That is a really compelling capability, and I think we’re just starting to understand what it really means, because there are all these big data sets and people haven’t been able to operate at these scales and speeds before. It will make new things possible, like continuous monitoring of very large data sets, large facilities, geographic regions. Think about the power grid and what you might be able to do there, with millions of producers and millions of consumers, as loads move up and down. Worldwide scale is an opportunity to really start building control systems, but you need large scale to run control systems with AI, and that is something we offer. Okay. Then — we don’t really talk about cost with other technologies, because there is a base cost and a margin; what we have been talking about is the cost to you, the customer. When you come to Azure for this offering and you want to get on Ted’s team’s service and start classifying images for your business, for your research, for your personal life, whatever — the current cost of the current offering is under 20 cents per million images. That is what it costs you to rent the service and process a million images: about $0.20. It says 21 on the slide, but because we’ve been tuning the engine, the costs have come down as throughput has gone up since we made the slide. I’m hopeful that in a few weeks we’ll be at 16 cents, then 15 cents. I can’t keep up — we’re going to leave it at 21 cents and speak to it. Actually, I think, Ted, we’ll be at 15 cents in a few weeks, a month. Okay. That is also an amazing thing, and this level of capability hasn’t been available before.
Go ahead. [Question inaudible] >> So — yeah, I’ll repeat the question for everyone: the low cost is at what latency? It is a little bit of a nuanced answer, and I will give you the honest answer rather than the marketing answer. The low cost is at that real-time AI, batch-one operating point. Today, from the service hosting the chip, like I said, it’s 1.8 milliseconds. Currently we haven’t tuned the software stack as much as we might, so it is still a few milliseconds over to the server and back.

Okay, so we’re seeing six to seven milliseconds, and we also haven’t tuned the flow of images. If you’re using all 24 cores on the host machine, for example, driving all that data to the FPGA and back, there is a bunch of queueing driving up another 10 milliseconds. So there is actually additional latency when driving at high throughput, but that latency is not fundamental, and we’ll be driving it out over the next few months. For example, the time to the server and back will go to near zero; we have networking capability — I might be getting over my skis, but I’ll keep pushing on that. So I think the marketing answer is also true: at that cost, there is no inherent latency built into the system. It is industry-leading latency on the chip, batch size one, real-time AI — that is all true. It’s just that today, when you are driving maximum throughput, there is a bunch of queueing elsewhere in the system, not on the chip itself, adding latency, and that will come down over time. Okay. So like I said before, really what we want is no compromises: giving you real-time AI, high performance, at that cost structure, which is continuing to drop. I’m confident we’ll get there, and it is already really good. Ted, do you want to add anything to that? >> Yeah, on the question about latency: like Doug said, if you were to send an image to the FPGA, it’s six milliseconds end to end, and the service offering is about 42 cents an hour. To run one million images takes about 31 minutes; that is where we come up with the 21-cent mark. >> Thanks. >> And it’s continuing to drop. I’ll emphasize that our team has been working so hard on this — every cent we squeeze down, they get excited. Kind of a sad life. Okay.
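Ted’s arithmetic can be checked directly from the two figures he quotes (about $0.42 per hour, and about 31 minutes for one million images):

```python
# Sanity-check the pricing quoted above: a VM at ~$0.42/hour that
# processes one million images in ~31 minutes.
hourly_rate_usd = 0.42
minutes_per_million = 31

cost_per_million = hourly_rate_usd * minutes_per_million / 60
images_per_second = 1_000_000 / (minutes_per_million * 60)

print(f"${cost_per_million:.3f} per million images")   # about $0.217
print(f"{images_per_second:.0f} images/second sustained")
```

That works out to roughly 21 cents per million images at a sustained rate of around 540 images per second, matching the slide.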
I think I’ve talked to a lot of this. I’m going to say a little bit about our internal technology, because we don’t have it quite ready for the public offering. Today, what we’ve deployed in Azure are servers with four Intel FPGAs on individual cards. Okay — that is a fairly standard high-end server; it has four cards. You already heard the price structure: you rent a VM, and you can rent multiple ones if you want more throughput, multiple boxes. What we have actually built within Microsoft, and haven’t brought to customers yet — I want you to see where this is going in the cloud — is different. Within our infrastructure, we attach the FPGAs directly to the network, so they are network-attached devices, which allows us to scale out with low latency. What you can see here, for large models — and this is something we do in worldwide production already — is that if we have a large model, we take it and stripe it across those FPGAs, with the FPGAs talking to the level-zero switches in the data center, up through the higher-level switches. This is something I would aspirationally like to bring to customers: the ability to run larger models at greater speed. The point is, we have a roadmap where there is advanced technology coming down the pipe, and we have to figure out the cadence to get it into Azure and bring it to customers. Of course, on the FPGA is the Project Brainwave hardware architecture, synthesized down to the hardware you can see over there on the right of the chip. So we’ve talked about the speed and about trying to eliminate the trade-off between cost and latency — we still have tuning to do — and we’ve talked about the flexibility: you get this continuous stream of updates and improvements, and you also get the ability to customize the engine for different models, which gives you an additional bump over what we are able to do with other technologies.
The last point that I would like to emphasize — and I think you have hopefully heard this — is that we want to support many frameworks. We want people to use any framework possible when they bring their models. We’d like to run TensorFlow at lower latency, and I think we are already there. We want you to bring PyTorch and many others, and we have the open-source ONNX format — the Open Neural Network Exchange — that we’ll be supporting. So many people working in different frameworks can run on this infrastructure and get great benefits without being locked into a vertical stack. Question in the back.

(Question inaudible) >> So the question was: do ASICs become more cost-efficient than FPGAs? I’d like to address that question now, because I think it is really important to address. So, I mean, I’m a long-time computer architecture researcher and a CPU architect by training — maybe this is a new adventure. If you want to be academic and nitpicky about ASICs — I’m sure you know this — ASIC means application-specific integrated circuit, and typically that means a fixed pipeline that does one thing. All these neural architectures people are bringing are programmable architectures that can be synthesized; they are not really ASICs in that sense, just architectures synthesized down to a chip — that is how people are using ASIC flows today, or FPGAs. So now, suppose you decide: here is an NPU — a neural-processing-unit architecture — that I want to run at very large scale, and I’m synthesizing it to FPGA; I’m convinced it will not change, and the class of models it supports, as opposed to other variants, is so big that I want to optimize for cost. I might be willing to pay $50 million and stamp out a hardened version of that, where the area is slightly lower. You take something you synthesized to FPGA and make a more efficient copy — although you have to freeze the design, it takes several years to do that and get it into deployment, and it costs tens of millions of dollars. There are scenarios where this makes sense, but at least for us — every quarter we ask ourselves whether it is time to take one of the many engines we synthesize and harden it — we haven’t hit that point yet; we’re still iterating too fast. In theory it is possible, but right now all the economics say no. The path we’re on is the one, and really, right now we’re in, I think, a very strong performance-leadership position in the industry, while retaining all that flexibility. It is a pretty good place to be. Yeah.
I mean, again, I think what really matters is the architecture you pick, above that custom-chip deployment vehicle. And just as an ex-academic — Ted might want to stop me when I’m running over; five minutes, okay — what is fascinating right now is that all of these different designs people are building are fundamentally different architectures. Brainwave is an out-of-order vector dataflow machine that looks, under the hood, like a superscalar processor made of vectors, but much wider. Other start-ups have massive numbers of tiles of statically scheduled tiny processors, with all the smarts in the compiler. Google has a systolic array. The old debates we had in the ’80s are replaying now, and no one knows what the answer is. People who pick the right architecture will do well regardless of how they deploy it. Again, we’re iterating and learning fast, and I’m pretty optimistic about where we are. Okay. We talked about scale, so I’ll skip over that. And I think this is slide 15, isn’t it? >> One more. >> One more? I’m going to turn it over to Ted; I think this is where he can pick it up. >> Okay, cool. I got done three minutes early. >> Thanks, Doug — this is the first time I have seen it: three minutes early. Thank you, Doug. I’m Ted, I’m a PM on the Azure Machine Learning team, and Doug has essentially described to you the engine of a Lamborghini: a beautiful engine, superbly, highly tuned and optimized, and it just hums. But it is an engine. So from the product team, I’m here to give you the keys to the Ferrari and show you how to drive it. In terms of Project Brainwave, the first thing to talk about is ResNet-50. ResNet-50 is a network, and the way I like to think about networks and how they can be flexible is in the context of bomb-sniffing dogs. Say you want to train a bomb-sniffing dog. A bomb-sniffing dog is a German shepherd. Imagine having a three-year-old German shepherd, with the infrastructure to be able to take in smells and distinguish among smells in a very sensitive way. That is what a German shepherd is.
A German shepherd is not a bomb-sniffing dog. You give it some smells: this smells like a bomb; this doesn’t smell like a bomb; this smells like a bomb; this doesn’t. Twenty boxes of kibble later, you have a bomb-sniffing dog. Training that German shepherd is not that difficult — building the German shepherd is the hard part. Essentially, ResNet-50, coming out of Microsoft Research, is this German shepherd. When we launch with ResNet-50, you have the capability of having the German shepherd: it takes in data — image data — produces features, and then you can train on those features to do the things you want to do. So, as with the German shepherd, you can now have a bomb-sniffing dog, a fruit-sniffing dog, a skin-cancer-sniffing dog — different things you can do. And I can’t emphasize enough, for Doug and his team, the kinds of world-class researchers we have to be able to deliver that industry-leading performance. So in terms of what we have from the Azure Machine Learning integration — Doug talked about no batching required — I’m going to now talk about what that looks like from Azure Machine Learning. Can we roll the sound? Let me try to replay the video. >> Connecting with people from different cultures. Finding and treating cancer earlier. Making the world accessible to everyone. Today’s breakthroughs in artificial intelligence come from deep neural networks using large, multi-layered models that need amazing amounts of computing power. Running models at high scale, low cost, and ultrafast speeds has always been extremely difficult. Not anymore. Project Brainwave unlocks the future of AI by unleashing programmable hardware using Intel FPGAs, delivering real-time AI at blazing speed with no batching, no compromises, and no need to choose between high performance and low cost. Azure Machine Learning accelerated models, powered by Project Brainwave, enable data scientists and developers to train deep neural networks and deploy them to the world’s largest configurable cloud with record-setting performance.
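The German-shepherd analogy is transfer learning: a frozen, pretrained featurizer plus a small trainable head. Here is a toy sketch of that pattern — a random stand-in featurizer and plain logistic regression, purely my own illustration, not the Azure ML API:

```python
import numpy as np

def featurize(x):
    """Stand-in for a frozen, pretrained featurizer (the "german
    shepherd", e.g. ResNet-50): fixed weights, never retrained."""
    W = np.random.default_rng(42).normal(size=(x.shape[1], 16))
    return np.tanh(x @ W)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))        # toy inputs standing in for images
y = (X[:, 0] > 0).astype(float)      # toy labels ("bomb" / "not bomb")

F = featurize(X)                     # features from the frozen model
w, b = np.zeros(16), 0.0             # only this small head is trained
for _ in range(500):                 # plain logistic regression
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
    grad = p - y
    w -= 0.1 * F.T @ grad / len(y)
    b -= 0.1 * grad.mean()

accuracy = float(((p > 0.5) == y).mean())
```

The "twenty boxes of kibble" step is the cheap part: only the tiny head is fit to your labels, while the expensive featurizer stays fixed.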
ResNet-50 is an image-classification model with eight billion operations. Project Brainwave leads the industry in speed on ResNet-50 models, at under two milliseconds per image, with image classification that can be customized with your data. Jabil, one of the most technologically advanced manufacturing companies on the planet, maintains rigorous standards of quality control. Jabil identifies defects in electronic components using human judgment, one set of photographs at a time. What if Jabil could analyze thousands of images in seconds, using deep learning to reliably identify anomalies for people to examine more closely, making humans more effective and making the whole process faster and more accurate? Now they can — by using Azure Machine Learning accelerated models to train and deploy a model, whether in the cloud or on the edge. What can you build with accelerated real-time image processing? Detect spills or open freezer doors in retail stores. Conduct real-time medical imaging analysis. Inspect equipment. Track endangered species. And this is only the beginning: accelerating with programmable hardware means Project Brainwave can evolve quickly to keep up with rapid innovation in deep learning. Microsoft is using many models for text, audio, speech, and natural language, and will bring those models to you soon. You will be able to accelerate custom models and design the next generation of real-time AI applications. Start accelerating with Azure Machine Learning and Project Brainwave today.
>> All right, so let's talk a little bit about Azure Machine Learning and how it integrates with Project Brainwave. Azure Machine Learning is an end-to-end data science platform. Think about what data scientists do today. Data science is the sexiest job of the 21st century, but talk to a data scientist and they are probably spending 80% of their time on boring tasks like data cleaning and data janitorial work. So starting from an intelligent data preparation perspective, we give you tools to help you get that data into a cleaner form that you can then process. In terms of model training, we give you the flexibility of training your model on the compute that makes the most sense to you. Maybe you want to train a model very quickly on a local workstation, trying out different models and different frameworks to see how you are

able to get something that seems promising; you want to do that quickly on a local machine. Then you might want to scale up to a Spark cluster, running on CPUs, submitting jobs to a big cluster you can train on. Maybe for deep learning models you spin up a GPU cluster using Azure Batch: spin up, submit your job to train, and then spin down again. We integrate all of these compute contexts, and recently we added integration with Databricks too, to train models quickly. From a model management perspective, this is also starting to be very important. We were talking to the CIO of a Wall Street firm: "We have 10,000 machine learning models in our firm. We make decisions from them, but we have no idea who created the models, no idea what the training results are, no idea of the performance." What happens in that situation? It would be great to be able to get the results from a model, trace the model back to the source code, know the data scientist that created the model, know the training results and all that; that is becoming very, very important from an enterprise perspective. We bring in the best of open source: whether you want to use Python, the Microsoft Cognitive Toolkit, TensorFlow, Caffe, you are able to use different frameworks in Azure Machine Learning, since so much is happening in open source. There are a lot of great features and functionality in open source, which is great, but some things open source does not care about are the boring things like compliance and security and VNets that the enterprise cares about. So what we do with Azure Machine Learning is give you the best of open source and the best of Microsoft, and then package it up into an enterprise-grade, end-to-end data science platform to build and train models. So the preview offering for Azure Machine Learning hardware accelerated models is incorporated into this entire infrastructure and platform. 
The same platform you are using to train models, deploy to CPU, deploy to GPU, deploy to the edge, now also lets you deploy onto an FPGA. This is Python, this is TensorFlow, creating a model and deploying it in a serverless architecture. From an infrastructure perspective, a quick overview: our first region is East US, so East US is what we're offering at launch if you deploy a model today. Each of the stamps has 20 racks, each rack has a lot of boxes, and each box has four FPGAs; it's like that riddle, "As I was going to St. Ives, I met a man with seven wives," and you multiply it out. Basically that is the number, how many of these we have in the data center. We're going to be adding new regions very soon: West Europe, Southeast Asia, and South Central US are coming in the next few months, so you can deploy models on FPGAs in those regions and expand out from there. In the actual Azure host itself, we have the Intel FPGAs, four of them, and there is a wire service that programs the FPGA. Doug talked about this image that you flash to the FPGA; today that is the highly optimized ResNet-50 image. Then in this host we have our VM. This is an Azure VM, currently available only to us as Azure Machine Learning: we provision the VM, flash the FPGA, and give you back an API. On this VM we have our VM extension, so when the VM is provisioned, the VM extension talks to the wire service, and the wire service flashes the FPGA. Then that Brainwave image is flashed to the FPGA; this is the way the VM talks to the actual FPGA, and on top of that is our Azure Machine Learning runtime. So basically we then expose the API, just the way people expose these today. 
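The St. Ives-style capacity description above is just a multiplication. A sketch of the arithmetic, with one caveat: the boxes-per-rack count was not stated in the talk ("a lot of boxes"), so the value below is a placeholder.

```python
# Capacity multiplication per stamp, per the description above.
stamps = 1
racks_per_stamp = 20
boxes_per_rack = 24   # placeholder; the talk only says "a lot of boxes"
fpgas_per_box = 4

fpgas = stamps * racks_per_stamp * boxes_per_rack * fpgas_per_box
print(fpgas)  # → 1920 under the placeholder boxes-per-rack assumption
```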
And that is essentially what happens under the covers when we give you this API. You have monitoring, etc., and the goodness you expect from an Azure service. In the pipeline, which I'll be talking about, the actual model you are deploying will be running in the Azure Machine Learning runtime, so part of it will be running on CPU. For example, maybe your preprocessing, converting a JPEG picture to tensors, runs on CPU. Then when those tensors are run on the FPGA, that goes to Brainwave, which processes them through ResNet-50 on the FPGA. And then you might have some post-processing, so

once you get the extracted features, you're going to be running your classifier, and that classifier runs on CPU. The idea is we have highly optimized operators that can run on CPU and FPGA, and we accelerate the featurization on the FPGA. When you actually deploy a model, again using Azure Machine Learning, and I will go through the code and notebook for this in a minute, you're writing TensorFlow: using Python and TensorFlow, doing preprocessing, converting JPEG images to tensors, deciding what runs on the FPGA, and then you might train a classifier at the final stage. After that, the service definition goes to the model management service, and this is the same model management service in Azure Machine Learning today, meaning if this model is destined for an FPGA, the model management service knows about it. Maybe you created another model running on a CPU cluster; it knows about that too. In this way you can manage the models in your enterprise: you know which models have been created and where they have been deployed, in one central location. The model management service talks to the control plane service to spin up and provision that model, and this is just another view of that VM, where we have the orchestrator. The orchestrator might run preprocessing on the CPU, run ResNet-50 on Brainwave, and then for the classifier you again have TensorFlow classifiers you might want to run. We can also front this with a software load balancer. Right now we're seeing about 1.8 milliseconds of latency and getting 530 images per second. Say you need 2,000 images per second: what we'll do is make lots of copies on multiple VMs and front them with a software load balancer to get the throughput you need. Essentially this is the architectural diagram of what happens when you click deploy; I'll show you that when you run one line to deploy a model, all of this is happening under the covers 
to give you the API. Let's run through that right now with our cloud service. Okay, sorry, let me see if I can... apologies for this. Doug, did you want to take questions while I remote to my machine? >> Yeah, okay. Anyone have any questions? Start here and go back. >> (Question inaudible.) >> Right, let me see. So there are a bunch of different architectures, like we discussed; which do I think is the right one for the future? Well, it's a great question; I wish I knew the answer. We've taken... >> (Inaudible.) >> Microsoft. Okay. The trade-offs are really, I think, threefold. One is how complex the software interface is. For example, if you are providing 1,000 cores and the schedule has to be static, that is one approach people are taking; you are presenting something like the Itanium world in a broader sense. It's a very, very software-intensive interface. I'm not a fan of those architectures; they haven't worked super well in the past, though this time might be different, might be more tractable for the software to manage. I'm waiting to see. The systolic array approach is a throughput approach.

Right? That is not one where you would... that is probably actually pretty good for training, because you can just bang through a lot of requests and work on slices. And then the approach we've taken is to expose a single thread of control with vector operations. So you have annotated code with vector operations, and those get broken into many, many, many suboperations in the hardware, scheduled on the fabric, and run as a full graph. The advantage of that is the programming model is pretty simple. The disadvantage is you need a pretty sophisticated hierarchy of decoders, which we've built and is working; you have seen the results. And then another advantage you get is software compatibility: if we want to roll out a new generation of FPGA or a new image, I don't have to change the program; I can just change the micro-architecture under the hood. To give you a sense of how wide this scales: in the Brainwave architecture, single-instruction matrix operations generate over a million math operations per instruction, and those get fanned out to 130,000 parallel units that run for 10 cycles. So we get 1.3 million operations from a single instruction. Okay, so I won't say which is right, but I do have my biases, and those are the trade-offs as I see them. If the compiler can schedule that stuff really well, you can really lower the complexity of the decoder that I talked about in hardware. Okay, maybe we'll go back to Ted and take a couple more questions once we are through the demo. >> Cool, thanks. >> Great question, thank you. >> All right, cool. So just to walk you through the demo in terms of what the experience will be: we have our GitHub repo with the notebook. You can set up the environment; everything is packaged in a nice conda environment, and by creating and activating it, this will install the dependencies you need in your environment. The next step here, you can see, is just Python and TensorFlow. 
In image preprocessing, you define how you convert the images. We have a bunch of nodes, and the output of one node, which is the input to the next, will be tensors; you take JPEG images, or whatever they are, and convert them to tensors in the preprocessing step. The next step here is where you load a quantized version of ResNet-50, the optimized version from Doug's team. What this means is that this quantized version gives the same results on CPU and GPU as on FPGA. So you can featurize your data on CPUs or GPUs; when training, maybe spin up a cluster and featurize your data with this featurizer. The features are the same as if running on the FPGA, so when you train your classifier, the classifier will perform very well. So this is that quantized version of ResNet-50, and moving forward we're going to be enabling different types of models. Think about the various models out there, YOLO, Inception, ResNet, a plethora of models; we'll be able to bring more models into the model gallery. And then after that, you train the classifier, which is essentially the thing that takes the features; and then we package all of this up into what we call a service definition. So the preprocessing step, the step that runs on the FPGA, and the classifier, all of this is the service definition file, and that is what we take. When you deploy, this is just your Azure subscription ID with your model management account, and again, this is the same model management account you are using for Azure Machine Learning, the account you use to deploy models to CPUs or GPUs; same account, everything incorporated into the end-to-end platform. Here is the one line I talked about. 
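What that one line triggers, per the architecture described earlier, can be simulated in miniature: register the service definition with the model management account, provision it, and hand back a scoring endpoint. Everything below is hypothetical scaffolding, not the real Azure ML SDK; the actual API is in the GitHub repo from the talk.

```python
# Pure-Python simulation of the deploy flow. All names are illustrative.

def make_service_definition(preprocess_fn, featurize_fn, classify_fn):
    # The service definition bundles the three pipeline stages and
    # records where each one runs.
    return {"stages": [
        {"name": "preprocess", "target": "cpu",  "fn": preprocess_fn},
        {"name": "featurize",  "target": "fpga", "fn": featurize_fn},
        {"name": "classify",   "target": "cpu",  "fn": classify_fn},
    ]}

class ModelManagement:
    # Stand-in for the model management account: one central registry
    # of which models exist and where they have been deployed.
    def __init__(self):
        self.registry = {}

    def deploy(self, name, service_def):
        # Register the definition and return a scoring endpoint
        # (placeholder URL, not a real Azure address).
        self.registry[name] = service_def
        return f"https://eastus.example.invalid/score/{name}"

mm = ModelManagement()
service_def = make_service_definition(
    lambda img: img, lambda t: t, lambda f: "pass")
endpoint = mm.deploy("jabil-inspection", service_def)
```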
This creates the service right here, and everything we saw in the previous slide happens: when you run this, it takes that service definition, registers it with the model management account, and goes to the FPGA VM; that model will now be on the VM. From a client perspective, now that you have this API exposed, you can call it. So here I have a picture of a

chip that Jabil might be interested in analyzing. Think about a circuit board and all its various components, and determining whether it passed or failed the inspection. Let's click right here, and you can see just how fast it came back. So this is 14 milliseconds. There is a little bit of overhead, but the time it takes for this picture to leave my client machine, go to the FPGA API, run two milliseconds of ResNet-50 (eight billion operations), run the classifier, get the results, and send them back to my client machine is 14 milliseconds. If you think about doing something on a CPU machine, a similar model running on CPU is about 150 milliseconds end to end, so this is about 10 times faster than a CPU implementation. So this is just the end-to-end view of how you can now easily create and deploy this model. And for those who were at the talk yesterday, in Joseph's session, we built an app that looked at the performance of a CPU running this model versus a box that has CPUs and FPGAs. In this case, you will see that the needle barely moves; we're getting four to six images per second on a CPU. So there's a model, and we deploy it on a CPU machine. It's running in the same region as the client app here, and we're using four concurrent threads to send images, as many as we can, to the CPU model. Here is the median latency, and here is the throughput, about six images per second. So let's now start sending images over to the API that is running the model on the FPGA. You can see that with just one thread, I'm getting a median latency of about six milliseconds: six milliseconds to go end to end from image to API and back again with the results. And the reason the throughput is pretty low is that I'm only using one thread to send one image after the other; I can't send images fast enough to this API. So let's slam this API, kick it up a notch, or eight notches here, and check out the kind of throughput we're seeing. 
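The end-to-end timing in this demo is measured client-side: wall clock around the scoring call. A minimal sketch of that measurement follows; the scorer is a stand-in (in the demo it is an HTTP POST of a JPEG to the deployed API), and the "10x" claim is just the ratio of the two latencies quoted on stage.

```python
import time

def timed_call(score_fn, payload):
    # Wrap any scoring callable and report its wall-clock latency in ms.
    start = time.perf_counter()
    result = score_fn(payload)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

# Stand-in scorer; a real client would POST the image to the API.
result, ms = timed_call(lambda img: "pass", b"\xff\xd8 fake jpeg bytes")

# Ratio of the two end-to-end latencies quoted in the talk:
# ~150 ms on CPU vs ~14 ms through the FPGA endpoint.
speedup = 150 / 14
```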
Due to what Doug was talking about, queueing latency, if you have essentially eight machines sending images to the API, you get six to seven milliseconds of latency and about 530 to 540 images processed per second on the FPGA. Again, this is the type of performance we're seeing: 42 cents an hour, 21 cents per million images, and that is for one chip; that is everything for our cloud offering. This is what you'll be able to do when you clone that GitHub repo; this is the type of performance you will be able to see. The FPGAs live in the East US data center, so your client machines should also live in East US; otherwise just sending images across the network to the API would add a lot of latency. So now, the next thing I want to talk about: the six to seven milliseconds we talked about is several milliseconds of going across the network and back. In the video we talked about the accelerated networking program, which we're not using yet but have announced; once we turn it on, the VM-to-VM latency using accelerated networking is 25 microseconds, actually quite a bit lower, though I don't think we are supposed to say that. It is going to drop quite a lot. You start to see how, with this hardware acceleration going end to end, it is just incredibly low latency all the way through. >> Okay, sorry for the interruption. >> Absolutely. The next thing I want to talk about is AI at the cutting edge. This is the integration with Azure IoT Edge. In this box right here, I have a rack with two servers: there is a Dell and a Hewlett-Packard server. Basically, now we can put these FPGA chips into these edge devices, all integrated with Azure Machine Learning and Azure IoT Edge. Let me take a moment to talk about what Azure IoT Edge is. Azure IoT Edge enables you to take a pipeline from the cloud and bring it to an edge machine. So why would an edge machine be important? 
Doug mentioned earlier, maybe you have an oil platform out in the middle of the ocean; maybe you have a super-secret nuclear research facility and you have data that you don't want to touch the internet. Maybe you have those thousand cameras in your facility that you want to process without having to send that data to the cloud.

What Azure IoT Edge enables you to do is take services and containerize them in Docker containers. Azure Functions, or custom Docker containers, are incorporated with the IoT Edge runtime, containerized, and then you are able to bring that down to the edge; now this is running on the edge device, and you manage everything from the cloud. So for example, maybe you have a thousand retail stores. From IoT Hub in the cloud, you manage all thousand edge devices; you configure them and say, these are the models I want to run in Northern California, these are the models I want to run in the eastern United States. You configure them in the cloud and push all of that to the edge, and the edge devices will pull the containers from their respective registries to bring them down and run them. So this enables you to manage from the cloud in disconnected scenarios, or scenarios in which there is very little connectivity, saving on bandwidth, saving on latency and cost when you bring this down to the edge. What this means is the same type of integration can work here, so let's jump over to a different VM. I know all the VMs look the same; I'll show this VM right here, and this is an actual edge machine. So let me just pull up the device manager and prove it to you: you can see here a Catapult FPGA device. So right here in this machine, one of the edge machines, we have FPGA cards with an FPGA. This is an actual FPGA in an edge device that you would essentially be able to deploy to your on-premises location. And then the same type of model can run. Let me open this config file real fast and just show you: this is the IP of the VM here, 10.123.80.134. I'm not making a call off the box; I'm making a call to the machine directly, right here. So let me run the demo again: the same app, except now running locally on the machine, and let's just start that again. 
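The switch shown on stage amounts to pointing the same client at a local host instead of the cloud API. A sketch with placeholder values: the IP mirrors the one shown in the config file on stage, while the port, key names, and path are hypothetical.

```python
import json

# Config read by the client; only the target host distinguishes the
# edge demo from the cloud demo. Key names and port are assumptions.
config = json.loads('{"scoring_ip": "10.123.80.134", "scoring_port": 50051}')

def endpoint_url(cfg):
    # Same client code either way; only the target host changes.
    return f"http://{cfg['scoring_ip']}:{cfg['scoring_port']}/score"

url = endpoint_url(config)
```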
Same idea. So what is happening here is the same thing, except now the images are being sent and processed locally. And then let's kick it up a little bit more, the number of threads. There are still some things we're trying to work out; basically, with IoT Edge everything is container-based, sending images from the container to the API and getting them back, so latencies are higher and throughput is different on this edge machine, but it is the same FPGA, the same type of architecture, and the ability now to process massive amounts of data locally on your edge device. So this is something we are working on in private preview; if you are interested in this type of scenario, where you want an FPGA on an edge device on-premises, feel free to reach out to us and we'll be able to chat with you more about it. >> For us, I mean, for the cluster, about eight threads, and then on the edge device maybe kick it up more. There is some overhead with containers; that is the main reason we have to work through it. And so really, just to summarize everything we've talked about: Azure Machine Learning hardware accelerated models powered by Project Brainwave; we're good at naming at Microsoft, you know what you are getting with our names. Essentially the models, as we covered, are easy to create, just using Python and TensorFlow, and deploy to the cloud. Deploy anywhere: everything I ran through in the Jupyter notebook with TensorFlow code, that model can be deployed to the cloud and also deployed to the edge without changing anything. Write once, deploy anywhere, using Azure IoT Edge. From the IoT Hub: the data scientist created a new optimized model, it works better, I'll push it to my edge devices; now those edge devices have the updated model running on the FPGA. So in terms of next steps, we'd love for you to just go check out our GitHub repo, give it a shot, and be able to deploy your own

FPGA server and unleash the power of real-time AI. Thank you very much. [Applause] >> We have about eight minutes left to take questions, so let me invite Doug back up to address any other questions folks might have. Let's start here. Uh-huh. >> Got it already. >> Brainwave, I really like the product. The question is really about... I believe public cloud is the future, but in the next few years there are still a lot of customers using private cloud; what is the offering for those users? >> Yeah, in terms of a private environment that does not touch the cloud? >> A private environment. >> Yeah, so this is where you might look at ML.NET and our server offerings. This is a cloud offering that gives you the services and model management and the different types of services that help you manage all that, but we have a whole suite of on-premises offerings that are also able to run these things without ever having to touch the cloud. Those are things we can talk about. Here is our engineering director for machine learning; I'm sure he has more context to speak to that. >> So in terms of running things on premises, we have the server offering, and you can use that. Models are containerized; you can take the container and run it on premises, and send to the cloud if you want, or if you choose not to, those options exist. As Ted said, Azure IoT Edge is another place you can run the models. >> Thank you. >> Thank you for the talk. For workloads that are bursty in nature, how long are we looking at to go out and add new instances on the fly? If I realize I need more capacity, how long does it take to increase it? >> That is a great question. This is on the order of seconds. Basically we have a VM pool. Today the pool is flashed with the ResNet-50 image, and it's just a matter of pushing your model file up and adding it to the load balancer. It is somewhat a manual process today; we don't have an automated way to detect higher traffic, so there is manual work you have to do to increase the pool. 
If I want to spin up some more, I'm able to add more fairly easily. Great question. And to that point, what I want to say, from the experiences we've had working with some customers in our preview: before, with GPU and CPU, it's expensive, so you want to be cognizant of how many you run at a given time and scale up and scale down as you need, because they are so expensive. The FPGA machines are so cheap that one customer just provisioned for their max throughput, even though they may only use this much at a time; and provisioning all that was cheaper than doing it on CPU or GPU clusters. So that way they didn't need to worry at all; it was cheaper that way. >> So maybe the elephant in the room, I don't know if you addressed it: is Brainwave addressing training for the models? Is that, when is that coming? >> Is my mic on here? So I don't know that it is an elephant in the room. Azure offers leading-edge GPUs today that you can rent to train. >> Yeah, it was from the perspective of different silicon. >> Yeah, that's right. We haven't announced anything with respect to training at this stage. So if there is an elephant in the room, I'm ignoring it. >> I think a lot of people assumed that already. >> They are certainly free to do so. But, you know, today the answer is GPUs for training, and if and when we have offerings that add value for customers, we'll get those out. >> The cost value... (inaudible). >> If we had something that provided good cost value, we would bring that out to our customers. >> Yep. >> Yeah. >> You used the example of...

(Inaudible.) >> So the question is what kind of hardware was on that drone that was introduced. We are working with a bunch of different hardware partners in terms of IoT; we have a session this afternoon at 4:30 that is going to focus on IoT and Azure Machine Learning and the various deployment targets for that. FPGAs, and we also announced a partnership with Qualcomm, the 605 chips, to containerize models, deploy them, and accelerate them on Qualcomm chips. The drone had GPUs running on it, as far as I know. >> So maybe two questions. One, how is the FPGA attached, like... (inaudible)? >> I think, Doug... PCIe? >> And the other question is... (question inaudible). >> Sorry, maybe I... >> I would be happy to take this. The question is: are we planning to take this to lighter-weight IoT edge devices with potentially smaller FPGAs? And I would say, if you look at the partnerships we've announced and containerized ML, Azure is really one of a small number of global clouds with a massive number of cloud hubs talking to billions and billions of IoT devices, and ML is a central part of that. I don't think we want to be tied to one particular technology. What we want to do is push the best AI we have to the endpoints. With this offering, what we're showing is that we can hit really differentiated high-end performance and cost in the cloud and on what I like to call the heavy edge; I don't think the terminology is settled. When it comes to other, lighter IoT nodes, there is, I think, a lot of churn in different technologies, and whatever ends up being the victor, or the set of technologies that stabilize and get deployed, we will serve those. If we think we can do it with smaller FPGAs, in partnership with Intel, for example, that would be a great offering. If it is other technologies, you know, that may also be good. I think what we'll try to do is make all of that better. >> Uh-huh. 
Yes. >> The on-premises IoT Edge... (inaudible). >> Yes, thank you. There is a booth with working servers, real FPGA servers actually running, in the IoT Edge booth in the expo hall. Thank you for reminding me; that is something to see. So we're running out of time. The thought I want to leave you with, in terms of the direction Microsoft is going with all of this: you probably saw the mail two months ago from Satya Nadella (you probably saw it sooner, because of the e-mails inside Microsoft); one main thing was the combination of the AI platform team and the cloud AI platform team from Azure into one team. Typically, you know, it can be a bit schizophrenic, with different personalities: what happens is the internal teams are dealing with their own scale problems and have a set of tools and things they are using for themselves, and then we make some other set of tools for our end users and customers, and that just didn't make sense. I think we're starting to see the light, and organizationally we've combined, so that the same problems those teams had been trying to solve, and the tools and technology they are using, come to you. This is one part of that; you will see that more and more, and we're super excited about all the great things we can bring to you as the end customer and the end user, starting with something like Brainwave and ResNet-50 to help you accelerate and get real-time AI.