[MUSIC PLAYING] ROB CRAFT: I have three demos I can show I can slot in an additional six slides, if you would prefer I’m betting if you came to a machine learning session this late, demos are probably where the action is at If anyone disagrees, feel free to swig beer right now There’s got to be a drinking game involved, guys There has to be OK My name is Rob Craft I’m responsible for the machine learning products for Google Cloud Platform on the product side I’m going to try to spend the next hour with you guys mostly with me talking Unfortunately, that’s kind of the way this stage thing works But I’m going to try to do my best to honor the shot clock– because they’re conveniently giving me a nice, big, blinking clock at the bottom here– with 10 to 12 minutes at the end for questions We do have microphones set up And then, there’s always the option to dogpile the stage at the end, which seems to be what most people choose to do I may or may not grab a beer and nod vaguely So questions publicly would receive better responses Does that sound cool? All right So we saw this guy today He does good talk, doesn’t he? He should do this for a living Turns out, a lot of what we’re trying to do with machine learning is public now So it seems new and inviting to a lot of people that perhaps don’t have a long history with machine learning, or artificial intelligence Depending on what university you went through, it’s either called AI or ML For some folks that have been practicing in the space, this has been the normal reaction when you tell people you have an advanced degree in machine learning or AI People do this face You ready? Yeah How’s that going? You have job? But you have job? OK, all right That’s good So you’re writing code, or? So it’s come out of the back end Remember Deep Blue? 
And winning chess matches, and Jeopardy, and– that was all brute force It was a dictionary-based system that had some heuristics on the back side The Jeopardy system was a natural language-based system that brute-forced on the BI system So at the end of the day, those were great demo cases to get people excited about a machine that can look and feel like it’s thinking, or that it’s doing cognition, or that it’s doing predictions on things Largely, these things went dark immediately after that And then everyone started working on it anyway So pharmaceuticals, trading applications for high speed trading, people that do protein folding modeling, drug interactions, oil gas field exploration, seismic study, climate modeling– you get the idea These things have been largely the realm of very expensive, very boutique, and very quiet innovation About a year ago, Google published something called TensorFlow from our Brain team, which is an open source, deep and wide set of frameworks that allow you to create your own set of systems with exactly the same code that we use internally The good news there is it’s pretty well tested, it’s supported by some wicked smart people The bad news is you’re still writing machine code So what I thought might be interesting, and I was foolish enough to be close enough to being correct that they gave me the job, was, why don’t we try to automate all the tough stuff? And leave the interesting stuff, which normally isn’t tough How many people in the room used to write assembly code for a living? God, and you’re still relatively sane At least you’re all wearing pants, which is kind of the bar at this point It is– it was really hard, right? 
You kind of care how the system worked But the line of code you wrote was hand-crafted It was a thing of beauty It worked at super high performance, and you understood everything about it We’re largely still at that level for most machine learning models that people create They handcraft it, they understand it deeply, or they wish they did, in the case of neural networks And I’ll explain a little bit why that’s harder than it was with wide models But at the end of the day, it isn’t a scalable system where you can say, OK, I’ll do the initial thing, but I have 50 people who can probably take this to the next level of scale Everybody’s writing that hand-crafted level thing So we’re at year one, you can think of it, of having large-scale access to human-producible machine learning code, which is quite good– not just OK, not a simple linear regression model, and not time series based stuff So this is kind of the beginning of the cycle that Eric called out And by the way, this was from Next 16, which was about a year ago It was held in a really cool facility in San Francisco with about one fourth of the people that came this year And it’s not about me I get it But the top five subscribed sessions at this Next event were all machine learning related, which indicates that we’re doing a very poor job explaining what machine learning is because everyone’s coming to the sessions So let’s get ourselves out of a job pretty quick I do this one mostly for the meta experience So for those of you that have a machine learning

pipeline already figured out in a system that already uses this stuff, you’re like OK, yeah, we’ve already gone through this stage I would wager most of us in the room have heard enough about machine learning that they think they should do something with it But they don’t necessarily understand how to understand discretely Should I be building my own models and hiring a team that knows how to do this stuff, and hand crafting, and dealing with open source tool sets, and a lot of other really hard work? Or should I go with a black box system that I’ve seen come into the market in the last 18 months, something that solves translation, that understands spoken word, that can identify five people wearing weird costumes, smiling, and two of them are happy, and three of them are sad– those sorts of things So this is meant to try to guide us through it Don’t worry It did say “pop quiz” in order to get everyone’s attention because most people are overachievers if they’re coming to a machine learning session at 6:40 at night, as we discussed But it’s both open book and open answer So let me quickly flip through And I even started drinking myself, if you can tell by the check marks A couple of things The first one that– let me just walk through one or two of these, and you guys will kind of get the idea Detect– B2B is business to business– company products in users’ living room photos OK So that probably means you can’t run on a public backbone based system because that might be creepy– but not creepy like Santa Claus, creepy like NSA So let’s not be creepy about it Let’s have a fully established system that’s built on a trusted environment that you have full control on And you will have full control of the features that you develop In this case, you would use ML Engine to build your own model And you would provision that model to perhaps even run on the smart TV Or you would have it run on the remote control in the room We’ll provision upwards of six billion new addressable 
devices For those of you that have been trolling the new IoT stage, you get the idea that we’re going to have a lot of devices coming online More and more of these devices will have sufficient compute capability to do predictions locally Several will have sufficient compute capability to do training locally And the science behind this will advance in the next year to 18 months for sure from us to allow a supervisory service to declare when training should occur, where, and with what model So this gets really, really interesting to enable device side stuff So I wanted to call this one out first that it’s in ML Engine today ML Engine today is distributed TensorFlow with a whole lot of magic behind it, automating a lot of the system provisioning, making it a managed service But over time, it will turn into much more of a drag-and-drop exercise, a guided experience like, data that looks like this? Here’s the four algorithms you should start with Here’s the pre-processing pipeline we recommend And here’s the overall tuning that we recommend on your neural network when you’re through your first trial So you get the idea– these things become increasingly more valuable and faster to use over time “Over time” used to mean years and years and years It’s moving so much faster And I’ll talk about translation in a moment as another example So let’s maybe offer a scandalous view on the state of silicon in the marketplace I spoke at our partner conference on Tuesday night, and NVIDIA was in the room They literally high fived when I came to this slide And I think they were smoking cigars and putting orders in for Maseratis But the short version of the story is, holy mother, are GPUs turning into a core facility to drastically improve the latency– shortening it– the time to train– shortening it– and the number of predictions you can handle per watt At Google, we typically think about feature representation per watt involved in our data center And that’s how we optimize our
workloads: efficiency is great Efficiency writ large looks like, are we powering the right facilities with the right capabilities? In this case, though, the idea behind this is largely the CPU market has stabilized “Flat” was a harsh word, according to my friends at Intel I had lunch with But what it might mean is, relative to compute through CPUs, compute through GPUs is explosively growing And it’s growing in particular in one area almost uniquely, and that is machine learning focused This chart, if you were to look at how Google manages its own data center systems, would largely represent what we think of as continuing services that we want to increase efficiencies with over time But growth increases So roughly, you reach a steady state after you get to a few million compute cores involved When you start adding GPUs to the mix, the math changes pretty quickly because server density is really hard You can’t pack 400 GPUs in one 42U rack space, right? The floor melts So you have to have some sort of density metric to figure these things out Turns out GPUs are relatively power hungry That’s our problem to solve, in our view, in this world Or it can be your problem to solve if you want to create your own ecosystem around this on-prem or through your own host or a partner What I will promise you, since I’ve gone through the exercise for Google, you will under-spec and underestimate the price of your spec for the GPUs that you want to buy Surprisingly enough, with 98% share in the market,

NVIDIA isn’t going to cut you too terribly good a deal unless you buy a whole bunch of these things So my argument for why you’re here is, why wouldn’t you let someone else worry about the hard stuff on provisioning these things with super high density? And for those of you that have been following the Google space for a while, there’s one thing not on this chart I’ll call it out: the word CPU is there, GPU is there There’s another three letter acronym Anyone want to win a t-shirt? Too many drinks TPU, Tensor Processing Unit Get that guy another sweater vest So the tensor processing units are dedicated for machine learning purposes only You can think of CUDA cores and other GPUs as really dumb, really simple, but really a lot of core compute that’s really great for parallelizable things TPUs are a family of things that Google’s been investing in for several years to drastically lower the watt curve but drastically increase the slope of the usage curve So we’ll be talking more about that over the years because nobody can keep a secret in the silicon business I’m sure we’ll be talking publicly about these things over the next while The while could be 18 months if we’re particularly lucky It could be shorter if we’re less lucky But the general idea here is up and to the right for all usage for machine learning inside of Google And we think that’s a corollary for certain segments of customers outside of Google as well So our math on this was as this curve goes up and to the right, why don’t we invest in tools and technologies to make it simpler to use at larger and larger scale? That set of tools, I’m very happy to say, is exactly what we’re going to offer you guys, not any simpler version or any longer range version You’ve heard us talk about Spanner a lot today– really cool stuff Been used for years inside of Google Everyone familiar with MapReduce?
Paper written almost a decade ago and no longer in use at Google because it doesn’t scale to what we need it to That was 10 years Then we went to about six years, then to three years The TensorFlow system was written and used immediately inside of Google and made available the same day So we’ve shortened the innovation cycle: what we think we need internally for the next stage of things is offered immediately, in the case of TensorFlow, even though it’s still really hard to do– but it’s really, really good And it took me 14 months– I’m sorry, a little slow– to turn this into a generalizable solution set that’s called ML Engine So we’ll talk about these things more in depth But I want to get across the cycle that’s meant to make this curve pay off in revenue for all of us This is one of the proof points We’re a search business Some of you guys have used us Hopefully, you found your way to Moscone using Maps It’s a pretty good product In this case, this is based on a blog article we put out against RankBrain, which is an internal machine learning system we use to optimize intent fulfillment and to get away from writing rules So my quick anecdote for search at Google is, upwards of nine years ago at this point, we got out of the rules business Everyone in this room probably writes rules for a living if you write code If this, then that Those are rules If the following conditions are met, the following thing should execute The stored procedure sees this, the stored procedure writes that Those are all rules-based systems What if you were able to declare, through a statistical model, here’s what good looks like and the confidence that good is this thing And why doesn’t the system then determine on its own how it should execute to get to that good thing?
That’s largely what a predictive-based system tries to get to And happily, when we gave up on writing rules inside of Google for our search engine, we had several million rules If I type the word “giants” and “baseball,” I got a result If I type “baseball” and then “giants,” in many cases, I got a different result And part of that was the way that it assumed my language type affected what my meaning was Part of it was just because someone wrote a rule and it didn’t debug to the same condition when we executed it With the condition that we’re in today, mapping those three million rules into the number of intents that people have, the number of intents people actually want is much, much smaller, drastically reducing the support mechanism required to prove that we’re fulfilling intent And it’s much easier for the bench code and test code to make sure these things work So does that logic make sense? For those of you considering, why should you use machine learning? The short answer is if you have some gnarly code that you no longer want to support, and the business rules for it change a lot, and you’re lucky enough to have a decent amount of data to understand what is a good outcome and a bad outcome for a particular business process, it’s a great candidate for a machine learning solution And we’ll talk a little bit more about some of the verticalized things I’ll do two further examples inside of the photo space White text on a white background is always a great idea, apparently Sorry about that And yes, the lower right is indeed a glacier So in the case of Photos, everyone used Google Photos at some time, played with it? It got drastically better about a year ago Everyone was like, oh, that’s actually surprisingly nice

What we were doing was labeling all the imagery behind the scenes and saying, here’s all the pictures of Mary We’re going to tag it with Mary Here’s all the pictures of birthday parties And now you can type in, show me birthday party pictures of Mary And it shows it to you And you’re like, well, of course it does I expect that A year ago, this was black magic Getting to the place where you can do that in video and at scale across thousands and thousands of videos against things that you haven’t seen before and thus don’t have labels for, that’s the challenge in front of us And that’s the opportunity in front of you guys if you want to extend the systems So all these core frameworks that were involved in creating photos and other things now are wrapped into a service called Vision API And I’ll talk about that in a little while But the core of Vision API is it provides labels with confidence that you can then write client side code on without writing all the thousands and millions of lines of hard end stuff to be able to get the core systems There were many PhDs in the space 20 years ago You saw [INAUDIBLE] shaking your head saying, man, we have video now I wish we had video at Stanford figured out That is really, really hard because, as we hopefully have known if we’ve ever– remember DVDs? 
If you froze a DVD and you saw the picture, it looked really bad You don’t remember the movie looking that bad But when you freeze frame a DVD image, it looks bad Well, at 30 frames a second, your eye handles all the badness on the imagery, which is how they can get away with three megabit kind of imagery per second Well, video at 30 frames a second, interpreted as if it were a standalone scene, doesn’t do very well at all It turns out you need a whole different set of techniques to go after video systems than you do straight 2D vision systems So more to come there This is my ironic slide 10% of Google Gmail is Gmail talking to Gmail at this point We’re working hard to try to get this higher as we go What this does is solve a UI problem Thousands of hours of PhD time spent tuning a machine learning model, all intended to solve the fact that your thumbs are bigger than your keyboard on your phone Literally, it was to save the 10 seconds of you fat-fingering your response and not paying attention to someone at the table while you do this Because we’re all going to be having dinner later, and you’re probably all going to be texting someone while you’re doing it And wouldn’t it be nice if at the bottom it could just say, I don’t know But I’ll get right back to you Because it was a question asked about something And it’s a person that you’ve talked with that you’ve responded to in a personal context before So you take the trouble to say, I’ll get right back to you If it’s someone that you normally interact with in a very casual style, it’ll say, no idea, dude The ability to do that is quite difficult to get right And when you get it wrong, boy, does it feel wrong More and more of the time, those three little answers at the bottom of the GIF that you guys are seeing is turning into something that, in-context, actually does not cause you to do a context switch You’re able to continue the conversation You’ll glance down, you’ll give a quick scan of the mail, you look at the
bottom One of those three will probably be close enough And then you move on with the rest of your day and you get to keep your attention on what you’re on It was a UX problem because the five inch screen and thumbs and math and machine learning was a solution here So two things are interesting One is the system is able to automate its own behaviors– in this case, response to mails And the second piece is it’s improved so fast so quickly So this was– gosh, I think less than 15 months old at this point And for my purposes, over 70% of the time at least, I’m picking one of the three at the bottom, whether it’s my boss or whether it’s my wife And you could argue that’s the same person OK Google Translate with neural machine translation This was a broad set of announcements we did all the way back in November Everyone in the room used Google Translation? Thank you My mortgage thanks you My family thanks you So we used to be pretty good at this whole translation stuff And what it requires is a pretty big system, dozens and dozens of languages, variants on languages, colloquialisms, localizations, mistakes handling, all that good stuff, all baked in a very, very strong team in Google Research About six months ago, they picked up the challenge to start using some TensorFlow systems and some accelerating hardware so they can do massive amounts of training on data set which before was relatively opaque Because they do billions of characters of translations a day, working through that backlog entirely– gosh, why would you bother because it only gets 1% or 2% better on a heuristic system if you go all the way soup to nuts? 
Then they applied a neural network based system on top of it with some help from the brain team We have 85% lower error rates on our eight top languages today We’re better than human translations at two of those top eight language pairs And we’re now able to translate languages that we don’t speak because the neural network between languages that are relatively close on family basis translate roughly the same between language pairs Really exciting space So take a look if you want to read after Or if you’re able to multitask, you can read the blog entry that we did on it today We rolled out to eight languages initially, which covers about 35% of the spoken or written language I think we rolled out another three days ago– Vietnamese and a couple of others

Know that Mandarin is our arch nemesis It is super hard Mandarin speakers think Mandarin is hard If you give them enough drink, they’ll tell you that We hope to largely solve for some of these canonically difficult languages within the next couple of years to human standard of translation At better than human standard, it sounds like that’s amazing It’s still not quite perfect because the way you would phrase it might be phraseologically different whether you’re speaking casually or whether you’re speaking formally Because one of the top feature areas people have asked for– this is a full service that you can use for business, rather than just for Google’s websites– is, I would like to have a casual translation service I would like one for medical I would like the system to understand that CAGR in French– Compound Annual Growth Rate– is CAGR They don’t translate it They flip between languages in professional settings many times Some of you whose first language isn’t English probably do this yourselves You’ll be speaking to the family at home All of a sudden, three or four words pop out in English just because it’s easier or more convenient, or the concept is cleaner for you Getting that kind of intermix is something we’re not quite good enough at yet, in my view We’re actively working on it OK Everyone in the room heard of chess? Some people in the room are probably pretty decent at it If you’ve got a computer science, math background, it tends to have some correlation to chess capability Anyone in the room any good at Go?
Oh, there’s some who profess to actually being good at Go, not playing Go That’s awesome It turned out that we had estimates of 20 to 30 years before a machine learning system would be able to beat a relatively good Go player And in about eight months’ worth of work, DeepMind, which is an Alphabet company that’s not part of core Google– but obviously, we talk– was able to write a system that was able to beat number three in the world in an enormous stadium full of people And boy, was that earth shattering for people in the Go space Next challenge that we’re trying to pick up in this area is strategic games Anyone know why so many people in machine learning and AI spend time trying to solve games? Can anyone describe in one sentence why you enjoy playing cards? And the answer is you probably can’t You would say things like, because it’s fun, or I like being with friends, or I like doing the math in my head if I’m a particular card shark about it But you would choose a different sentence, which means it’s a hard cognition problem The way that you learn is a way that someone else doesn’t learn, or they learn through a different vector That’s why there’s different teaching styles in schools So at the end of the day, machine learning is one of the techniques in the larger universe of artificial intelligence But if you can teach a game to a computer, you are so much closer to having the computer being able to cognate like a person and solve problems in a way that’s relatable to a human Because we’re not ever, in this space, trying to solve a machine being able to talk to a machine That exists It’s having a person be able to tersely and completely explain something to a computer and the reverse And I don’t have it in this slide deck, but we have self captioning, which is amazing We have a picture of a small child holding a stuffed animal And the self caption created by the machine learning system is “a cute child holding her stuffed animal.” What would you say in that
picture would be exactly the same words They got that one right There’s some wildly hilarious ones, which are absolutely not right, which typically I don’t put on a slide But you get the idea It’s like, when you get that one moment Normally at work, like when you have code that compiles clean the first time, you drop the mic You’re like, I’m getting a coffee, right? I am the best coder in the world As a machine learning scientist, once you get that one thing that’s at 94% accuracy, you’re like, don’t touch it Save, save, save You want to save over and over like you just beat the boss at the end of the level That tells you we’re a little bit early in the ability to debug and understand causation for some of the things we do But AlphaGo is doing some really significantly interesting things about true AI and the attempt to solve how things think They’re using a lot of advanced techniques in machine learning Powering these systems is Google Cloud, it may not surprise you to learn It also would not surprise you to learn that CPUs and GPUs weren’t up to the task And they used some specialized silicon to make this stuff make sense in near real time So let me get really practical I talked a little bit about wattage At the end of the day, Google is a power buyer We buy lots of power We buy hundreds of thousands of cores of compute every single quarter We buy tens of thousands per week We deploy thousands of racks per day You get the idea We’re deploying a new data center a month this year alone for Google Cloud We kind of are in the power business At the end of the day, we’ve spent many, many, many years increasing our power efficiencies And it’s handcrafted based on, OK, does this rack have a lot of compute cores in it? Well, then the fan should face this way If this one has a lot of disks, are they SSD or are they spinning disks? OK Well, then you want to balance the air flow this way

And over time, you work the air flow through the room You shift how you do power coupling You have higher and higher efficiency power distribution units, which leads to better networking performance You get the idea that at some point, it doesn’t make sense to cool a data center Most data centers are room temperature today Well, that was where you see ML Control on the left We dropped it 40% in about three months, which resulted in a 15% lower budget requirement for all Google data centers Holy crap You should have seen the faces on the data center team They were so sad because they thought they had it It’s like they’d dropped the mic We’re the best people in the– wait a minute In three months, we see a 40% reduction on the cooling bill, which results in 15% lower power draw And power draw is the largest capital expenditure you’ll probably see in Google’s 10K report So this is one of the opportunities that we see might be a repeatable thing if we can genericize the rule base to say, show me what your data center layout looks like Give me a description in UML for what is in each of the racks And we will tell you how to work the power flow and the power distribution in your data center, resulting in significant and real savings for your own data centers Maybe not to the scale of Google, because we do such interesting things that are custom But what if HP had this capability to include in the box? You buy the box, you get the cooling savings These sorts of things become natural complementary things And this worked in three months Normally a rack layout change for a Google center takes 18 to 24 months of design time We work intensely with providers of the racks What thickness of steel, where does the fan go, what’s the RPM, what’s the noise allowance? Are we going to do natural cooling or are we going to put it somewhere far north, where we can bring in environmental air?
All those sorts of things factor in This was largely free It was three people for less than three months Two months were them learning what a data center did because they’re mathematicians So let’s get right to the basics This is why I get head count and resources 90% of the world’s data doesn’t say, hey, I’m a picture of Rob’s three-year-old girl at a birthday party It just has, picture And as we discovered, turns out most people on the internet don’t actually label things with what they actually are Sometimes they’re very bad things and you don’t want them in your data center Sometimes they’re already in your data center And how do you just delete the things that shouldn’t be in the data center in order to preserve the value of the larger things? So all these things originate from the base point that we’re trying to drive unstructured data into structured understanding Once there’s structured data against it, then you can provide value on top of it That’s largely what Google’s been doing since Google’s been created That’s why we started with all the ML APIs first before I got the ML Engine out the door It was to largely try to solve for this problem space There’s four things that matter, Rob says There’s three things on the slide Data and algorithms The data side we can provide, you can provide, or you can buy from someone else But data exists Getting clean data is super hard 75% of the workload for machine learning scientists is doing data modeling 25% of the time, they get to play with math By the way, they’re mathematicians You want to see a grumpy person?
Ask a mathematician to do database stuff They really don’t like doing it So don’t make them do that So data Next piece is algorithms The algorithm space has been around for 40 plus years Many of these algorithms you get your PhD when you improve the algorithm’s efficiency by 2%, or it requires 30% less computational complexity, or it’s been [INAUDIBLE] for long running pipeline You get the idea that a lot of these are small tunes But overall, the techniques around them are enormously advancing Neural machine translation is a great example of that That art didn’t exist two or three years ago, and didn’t originate from research universities because the corpus required to do that work was beyond what a grad student was going to get a grant to go to work on So then you need commercial entities And we made a decision when we started this journey that we were going to open everything So I’m going to show you a natural language example in just a moment It’s based on an open source implementation that we released Anyone know the name of the open source natural language system that’s coming from Google? Brace for a weird story Parsey McParseface Yes, there are British people involved The Brits held a competition on what they should name the new boat that the British Navy was going to christen The name of the boat that won through popular demand in Britain was Boaty McBoatface So some Brits were on the brain team and thought it would be funny to say, why don’t we christen this with Parsey McParseface because they’re parsing language So that will save you maybe a free beer at a party The last point on the right is insights Insights are the predictions that you can connect to your rest of your business processes And the fourth thing is the people in the seats It’s you guys These things have to be created Someone has to write the code

Someone has to test the systems, get it sold, get it deployed None of this matters about what Google is releasing externally until someone uses it So happily, we’ve got 10,000 people And this turns out to be something pretty interesting for a lot of them I’ll quickly try to describe what a neural network does My huge apologies to the experts in the room that may find this well oversimplified to the point of offensiveness But it’s largely true Have a drink of beer So on the input side and the output side, you’ll see two rough squares there with little circles Think of the square as a layer And each layer as it forms turns into a network of things That’s why it’s called a neural network Each of the algorithms you can think of as a neuron And neuron, when the condition is met, fires an impulse That impulse is a signal that your brain uses to decide something In this case, each of those algorithms does a very simple, stupid, repetitively infinitely powerful thing So on the far left, let’s pick a Vision example, because I’m going to show you a Vision example in just a moment The idea here is, hey, maybe this is a pixel And the pixel is black, white, or something else It’s black, white, or something else is all the algorithm can detect The algorithm next to it might be able to say, oh, for sure it’s black or it’s white Another one can say, the pixel is in this shade range or this shade range You get the idea It’s very, very simple The next layer would say, ah, there’s actually two pixels And spatially, they touch The third layer says, these groups of pixels that touch, these ones spatially form a closed circuit The next layer might decide if the closed circuit is hollow or filled All the way at the end, you can look at it and it’ll say, this is the number six I just gave you the Hello World for machine learning, which is called the MNIST workload So if you want to try a sample on TensorFlow.org, that’s the one I would start with is training out your first MNIST model 
that can detect from very shaky handwritten, is it 0, 1, 2, 3, 4, 5, 6, 7, 8, 9? That’s literally what your hello world is In this case, since we have more than one layer between the input layer and the output layer, this is called a deep learning neural network Now you sound fancy If it didn’t have any layers in the middle, it’s called a neural network Are neural networks by themselves worth anything? Not so much There’s not a lot of applications for a two-layer network It turns out almost everything needs multiple layers So if you’re using neural networks, you can be fancier and just say, hey, I’m using deep learning And people go, awesome Your company’s worth 3x what it was just a moment ago OK Shall I flip to a demo since I’ve talked so much? I will skip the setup on cats, dogs, cars, and apples, in the interest of time So everyone see this OK? Star Trek moment You ready? Ooh OK So what this is There’s 80,000 images we dragged off Creative Commons and Wikimedia We grabbed Creative Commons so Rob still has a job and doesn’t get fired and doesn’t pay $3 million licensing images for a two minute demo So each one of these we ran through something called Vision API And what Vision API does is exactly what I’ve just shown on the chart with the neural network One side is raw imagery of pixels The other side, the right side, decides what those pixels are based on confidence So some of the things the models spit out is, that’s a cat, that’s a dog, that’s a tree, that’s a car And that’s only interesting if the model also tells you, it’s a cat, and we’re 99% sure it’s a cat In other words, we go to meetings at work not because we want to hang out with our friends, although that’s perfectly valuable That’s what bars are for We go to meetings so we can all agree that the data is sufficient enough to make a decision against that we’re going to apply resources for, and thus we defer risk So the whole idea for a meeting is that we establish a confidence interval that everyone finds 
sufficient Typically those aren’t mathematically relevant In this case, they are So on the far left, I’ve saved some core imagery But each one of those images that I ran through Vision API is one of these Let me see If I’m particularly careful, I can scroll to where it will actually render the image Each one of these dots is mapped to a logical scientific method namespace We remember this, right? There’s a null Then it’s an animal, vegetable, mineral If it’s an animal, is it hot blooded, cold blooded? Is it hot blooded, cold blooded with fur, not fur? Is it pretty, not pretty? All those sorts of things follow along the natural tree, which is exactly how a neural network does not work And I’ll show you that in just a moment So we think this way The neural network doesn’t think this way For those of you that are actively working on machine learning and using deep science, this is why your head hurts and why most of the days end in failure Because you can’t intuit why the neural network does a particular thing And that’s OK because it doesn’t mind if you intuit or not It does not care It does what it does So let me flip right over to the number one denizen of the internet It turns out this is our most popular image type that we can scrape
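The layer-by-layer description above can be sketched as a toy forward pass in plain Python. Everything here is invented for illustration: two inputs stand in for pixels, the weights are hand-picked rather than trained, and a real MNIST network would be vastly larger.

```python
import math

def sigmoid(x):
    # Each neuron "fires" more strongly as its weighted input grows
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # One layer: every neuron takes a weighted sum of ALL the previous
    # layer's signals, adds a bias, and fires through the sigmoid
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def forward(pixels, network):
    # Push the signal through every layer in turn; the last layer is the output
    signal = pixels
    for weights, biases in network:
        signal = layer(signal, weights, biases)
    return signal

# Two "pixels" in, one hidden layer of two neurons, one output neuron.
# These hand-picked weights make the output fire only when both pixels are on.
network = [
    ([[6.0, 6.0], [-6.0, -6.0]], [-3.0, 9.0]),  # hidden layer
    ([[8.0, -8.0]], [-2.0]),                    # output layer
]
print(forward([1.0, 1.0], network))  # prints a value near 1.0
print(forward([0.0, 0.0], network))  # prints a value near 0.0
```

Training is the process of finding weights like these automatically from labeled examples, which is exactly what the MNIST hello world on TensorFlow.org walks through.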

When we scrape the internet, turns out the internet’s made out of cats Everyone heard that the world sits on the back of an elephant What’s under the elephant? It’s elephants all the way down Turns out, no, it’s actually cats We can prove it In this case, this is what the system Vision API labels against this exact image We say it’s a cat It’s a pet, which is interesting because that’s a concept It’s not a thing A pet implies ownership and care and feeding But the system doesn’t mind that it’s a concept This picture is a pet And we will largely go, yep, that’s obviously not a wild animal Mammal, animal, British shorthair Oh Now we’re getting interesting All the way down to close up So now it’s a meta point about the fact that the cat’s head looks relatively large, given the focal distance, and there’s bokeh in the background, which indicates it’s probably relatively close to the sensor image So you get the idea that we can do sensor-based images Let me call out one quick thing on here Cat, 99% sure Vertebrate, 91% sure Wait, what? Welcome to neural networks A couple of other points And I’ll do as quickly as the business value so we can high five each other at the end of this Anyone in the room not seen a L’Oreal commercial? Perfumes, pretty people, everyone’s fit, those things They’re always in soft tones Everyone’s kind of out of focus and thus prettier That’s what I tell myself But they spend a lot of time managing their brand image through color, look and feel of their brand, as represented both in print ad and television Even radio, they always pick a similar music chain or a soft tone in the person doing the over voice So they care very deeply, because how in the world do you sell perfume on TV? 
Turns out they sell a lot of perfume based on TV commercials So what they do is they hire an army of people in Photoshop that make sure that all the imagery they use matches the color palette What if I could offer them an internet scale system that allows you to search by color distribution along a certain palette for cats with soft hair that are British shorthairs? We just made $1 million, folks So you get the idea that this drastically changes why you would even do a particular workload with similar or better quality outputs against it Under the covers, this is in a very advanced text format called JSON It’s just a REST call There’s six lines of Python code that iterated through this 80,687 times There was actually a little bit more than 80,000 Some of those images weren’t OK to show on screen And the reason I know that is if I flip back, you’ll see I have an inappropriate content detection system inside of this I can pick out medical procedures, violent images, spoofs, and the adult stuff I heard that’s on the internet somewhere So you guys get the idea this can serve as a massively helpful way for you to see how your brand is being recognized, where it’s being posted at, the color palette on your competitors’ products, how they’re being represented Does your brand appear next to other brands on the internet at worldwide scale? This is very, very exciting stuff And people are doing this today Two more examples and I’ll jump out of this demo Australians? Oi? Do I have an oi? Any other ois? Oi, oi? 
We don’t get to three oi’s, we can’t do the joke All right Two oi’s it is This is a real– by the way, the transition, we spent more time writing that code for that cool little flip than we did for the machine learning part I spent weekends on that That was awesome This is a real sign Tree kangaroos are a thing, and apparently they bite Let’s not go to Australia Everything wants to kill you What I want to call out here is it’s a sign, road sign, traffic sign So some poor computer scientist and machine learning person spent weeks of their life figuring out how to recognize signs across all font types, in poor lighting, with rain on the camera lenses, or internet things that were off angle So there was a lot of work that went in So this is a relatively clean picture We have surprisingly good efficiency because the Maps team has spent years killing themselves trying to figure out signage from a lot of Priuses with disco balls on top driving all the roads Some of those frames have really weird people standing in front of the lens doing this So they have to stitch together the image behind it to figure out what was on that one included image And then they get to blur the face of the idiot waving at the camera So all of that work is built into that one core system But we gain the benefits for something with the Vision API The thing I want to call out here is not only are we really, really sure that this is a road sign– 94%, 90, 90 So what I’m explaining to you guys here is I wouldn’t just say, hey, sign came back in the JSON It’s a sign I would say, let’s pull the other four or five top labels And if those are congruent in a language with being roughly the same entity, very, very sure that Google is correct If I’m at 93% or better, I haven’t seen us wrong from a human measurement point of view, meaning yep, that’s a cat, that’s a dog, that’s a lion
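The heuristic just described, take the top label and then check that the runner-up labels are roughly the same entity before trusting it, fits in a few lines of Python. The 0.93 cutoff and the sample labels echo the talk; the function shape and the hand-built congruence groups are invented for illustration (a Vision-style label response decodes from JSON to a list of description/score pairs).

```python
def confident_entity(labels, congruent_groups, threshold=0.93):
    # labels: [(description, score), ...] sorted best-first
    top, score = labels[0]
    if score < threshold:
        return None  # not sure enough to bet the business on
    # Find the rough entity group the top label belongs to
    group = next((g for g in congruent_groups if top in g), {top})
    # The next few labels should agree with it (sign / road sign / traffic sign)
    runners_up = [desc for desc, _ in labels[1:5]]
    return top if all(desc in group for desc in runners_up) else None

groups = [{"sign", "road sign", "traffic sign"}, {"cat", "pet", "mammal"}]
labels = [("sign", 0.94), ("road sign", 0.90), ("traffic sign", 0.90)]
print(confident_entity(labels, groups))  # prints: sign

# Stacking weaker signals also works: three independent 75% labels
# agreeing comes out to 1 - 0.25 ** 3 = 0.984375 combined
```

The stacking arithmetic assumes the three signals are independent, which labels from a single model are not, so treat it as a rule of thumb rather than a real probability.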

If it’s 93% or better, we’re pretty good But actually, you can make good business bets on anything north of 75 or so But we’re largely wrong in some cases But when you add up three 75’s in a row, eh, that gets to be pretty good So let me show something here We’re grabbing all the text We do this for 103 languages, all font types We can convert between the two Color palette is here And I’ll do one last one And this, in a way, is the one I’m kind of proudest of the team for Everyone, or most of us, know that we’re kind of close to a baseball stadium here in San Francisco It’s AT&T Park It turns out that if we were inside the baseball stadium, we would largely use the same language to describe verbally what a baseball stadium looks like There’s this grass park It’s kind of shaped like a snow cone There’s some dirt part And there’s stands around it And there’s a billboard in the background with a score We largely use the same verbiage What we’ve figured out here is it’s a stadium, team sport, ballgame, baseball All that stuff is perfectly fine Sports venue OK, we’re genericizing the case a little bit Multi-sport event, because they picked out that there’s multiple lines And it’s making a guess that maybe you can play football on this too So that’s getting pretty interesting The part I’m especially proud of– and it’s not in the EXIF data for the photo– is what exact baseball stadium in the world it is So we detect landmarks We do a pretty good job detecting the Eiffel Tower versus the Paris Hotel in Las Vegas versus a tricked out picture of the Eiffel Tower held in someone’s hand We’re not fooled, typically This is a case where, without any lat/long, we figured out what baseball stadium in the world it was And it’s Citi Field where the Mets play It might be the only chance any of us ever see it because no one goes to a Mets game, as we can see I’m here all week Let me flip back to the slides OK Slides are great We know that’s not an apple It’s for sure a cat This is the 
slide where I indicate that it doesn’t come for free It’s free like a baby, not free like a prize So you get to care, you get to feed, you get to maintain You retrain You operationalize You do worse than a spreadsheet most of the time for your first set of models Later, you start beating the spreadsheet You drop the mic and you go home And then you start trying to beat the simple models that used to be in place six months ago when you told your boss that you should do deep learning That is roughly kind of the cycle So you have data that comes in You apply an algorithm against it The algorithm probably isn’t the right one The data wasn’t clean You don’t have enough data I’m saying this is a whole workload that you will turn into something that you operationalize the same way you would saying, we have a data team, and they’re pretty good at doing data stuff You don’t say we just have database guys We have a data team Some people on the team specialize in this Some people specialize in that When you start operationalizing machine learning models at scale in production, you’ll have people that specialize in your data cleansing You’ll have some algorithmic experts that are really good at soft edge detection systems versus hard edge detection systems And yes, those are different ends of the math branch And then you’ll discover over time that some of these things, you know what, we don’t need to stay on the cutting edge We’re just going to use someone else’s black box ones like Google’s because they’re good enough And we want to write the specialized one where we have unique IP to lend against it Does this make sense? Not for free Baby, not prize This is the horrifying slide that has everyone going, man, that’s awful This literally is what our marketing team gets paid to create I do the names, they do the logos So the far right bar are our names So guess who’s overpaid? 
C’est moi Machine Learning Engine is the engine that you use to create machine learning models and training systems Our translation service is called Translation Our vision system is called Vision You get it So we try to stay pretty functionally named If you can read the hieroglyphics, this is a great bar game for itself The part I’ll call out– oops, I broke the internet– are these two layers, which are naturally connected with each other So things on the data side, of course, are naturally related to the machine learning side We’re getting a lot of free attention in this room and other rooms around machine learning because it’s new science and it’s unicorns and glitter It’s all magic at this point No data, no quality data, no machine data, no coalesced data out of 19 different databases into a single data store, no machine learning I have no solution for anyone in this room if you say, but a lot of my transactional data is in my Oracle financial system But my online system is in my e-commerce system, which is hosted somewhere else But don’t worry All my logging data which I want to combine into my learnings as well sits on my Apache servers, which is at my hoster Let’s do some machine learning And I’ll say, come back to me when you have big data So the only thing I can leave a lot of you guys with if you don’t have some sort of big data system figured out is go get some big data systems figured out We have a pretty good one in BigQuery that does a lot of things with Dataflow, Datalab, other things with the words “big” and “data” in them

other than Spanner And that was just too cool to change You guys get the idea This is one of the things that Google does at super large scale The reason we have the red column, Machine Learning, is because we had the data We don’t assume that everyone in this room is going to start ingesting a petabyte of data a week, exabyte of data a year That’s not a reasonable assumption What we can say is the systems that we use to train against a 10 gigabyte data size should execute in a non-batch UI, meaning I literally will run a job and I won’t grab that cup of coffee We’ve all had the compiler day, right? Today’s the day I’m going to build some stuff You compile If it’s non-trivial, you’ll go get a coffee or you go have some lunch You come back and you look at all the warnings and bugs, right? That’s the nature of the reality But what if you could just compile on the fly or always be compiling? In this case, what if you could always be training? Or, visually to you, the fractional unit of time of training always in the background as data changes, or as you do shifts on how you’re cleaning the data, if it’s always up to speed, you never have to take a break You spend so much more time iterating on the math and the algorithm choices, it moves you from a seven month cycle, on average, for a good model, to seven days Seven months to seven days And how do we do that? 
Let’s talk about that OK Customers have this problem We have this problem when we are a customer I need data access in a lot of places I need to develop, build analytical models A lot of these things are kind of non-negotiable We’ll all nod our heads These are the surface area that we’re trying to apply product to rather than just process to Today, largely, everybody solves these things in process if you’re doing machine learning We’d like this to be productized so you spend all your time doing other, more interesting things Because, again, 75% of the machine learner’s time is spent playing with data None of that time is time they want And for many people, that’s because they’re already on BigQuery For the people that aren’t using a big data system which is a fully managed service, they spend a lot of their time provisioning servers, misrouting networking, misconfiguring CPU choices, not understanding the fact that they lose those boxes at the end of the week because those virtual machines get reprovisioned by their CTO You get the idea that having these things fungibly available is actually part of the magic for how we don’t have to say there’s features here It’s implied Why in the world would you write things that do this if these things were just automated for you? 
And this is a little bit of my eye chart I’m happy to give the slides after this I think we ship out the slides next week For those of you with any particularly large screen, it’s fun to watch people take pictures with those If you’ve got an iPad, please hold those up and take photos Those are fun In this case, these are the overall vertical systems that we’re focused on Each one of those areas which right now are gray, I aspire to have a blue link– case studies, examples, partner lists, and model marketplaces that support these So let me pick one at reasonable random Let’s do financial services because money is always fun Sales and marketing campaign management and financial services That’s an odd one So what if I wanted to understand, what is the percent rate that I should publish on my website to gain high creditworthy people, West Coast US only, and I want people under the age of 50? That sounds like a pretty straightforward ad campaign, right? But what if I had some modeling data that was available from each individual cookie on the system that was [INAUDIBLE] clean, so none of that data is in the cookie But I can infer it based on a pattern of behavior by the sites they’ve gone to before, the sites they go to after me, what part of my website they look at, how long they linger per page That’s a straight up classification or recommendation problem People do that at scale today So if I had a simple enough experience around that, we would all shift from a heuristic-based system to machine learning system It’s just frankly better But the complexity involved means that only the people that specialize in this stuff would do it So the people that build recommendations for living, or the people in the retail space that do this, or e-commerce people will make this a very dark set of magic because that’s one of their core competencies There’s no reason it has to be that way And we’re going to post papers and release core models Let everybody do this stuff But we’re 
going to give the science behind how you get the data ready for it Each one of these will be a similar effort It’ll take us a while to bite at it And one of the questions I get pretty commonly at this point is, which ones are you going to focus on? And I’m going to give you a cloud answer You ready for a cloud answer? All of them You don’t get to say “or” in the cloud, if you’re in the cloud game If you’re a platform, you need sufficiency on all the areas Some may come a little bit sooner than others because they might be simpler on the science side or on the sales side But we intend to try to go for the full space Hopefully that’s helpful And I’ve broken the internet again So this is the last interesting slide, but I have more Don’t worry I’ll try to put this onto a spectrum so you guys kind of get the idea on why we do what we do

Because why is always more interesting than what What changes so often In this case, TensorFlow on the far left Very, very powerful, pretty flexible It is, I think, 4x adopted in Kaggle versus other deep learning frameworks at this point I can say that because we’ve now closed on Kaggle That’s exciting Welcome to Google, Kaggle folks Super, super fun guys In this space, TensorFlow is largely meant to advance the science But it also, in many cases, is completely sufficient for production use It is the foundational layer that we’re using to create the services further to the right So not only would it be an area that you would apply research and core R&D on That’s what we are doing as well So as we learn things that TensorFlow needs or should have, we write those things and contribute those things We absolutely encourage you to do the same And the middle section, it’s Cloud ML Engine We went through a rename This basically automates the sucky bits of TensorFlow I need 64 GPUs I need 12 of these in the US I need 14 in Asia I need the following SSD IOs per second I need it for 17 minutes probably, but it could be three hours because I’m using convolutional neural networks I have no idea how long this stuff is going to take The model itself signals itself it wants to be retrained Boy oh boy, how do you do that stuff? 
Well, the answer is have someone else do the hard science So we’re largely there for version one kind of capability that allows you to t-shirt size your environment I want small, medium, large I want it eventually, or it’s a trivial problem Or it’s a reasonable problem and I want it reasonably quickly and I’m willing to pay for it Or I really freaking want it now and I don’t care how hard the problem is And then there’s custom after that that you call Rob for So behind each one of those, currently we’re templating out the hardware required based on the model choice and the data shape that we’re detecting and forming our own belief system around In the future, we’ll arbitrage all of that away You’ll just say, I would like an answer in an hour And we’ll say an hour will cost you $3 You’ll say, actually, two hours is fine OK, two hours is $0.32 Based on your business, now you can start doing predictions on what it’s going to cost you to train or run predictions at scale for your system That’s where we’re going This is systems built on top of systems, elephants all the way down So on the far right side, this is the black box area And I will have solutions for you guys that I can’t talk about outside of NDA But follow this space over the next coming months We’ll have a solution that slots in between I want to train my own thing but I want it to be automated, or Google does everything for me Like, what if we don’t label a particular kind of couch as, this is French Renaissance? And you’re like, boy, I’m running a furniture antique site I really want it to label accurately to that level We would label couch, divan, purple We’re even getting to the area of human measurement-based labeling It’s pretty It’s comfortable The room is bright There’s a good view You can guess who I’m talking with– hotels, right? 
So those kind of people want to label, as people do reviews of their site, why can’t they then say, as you can see, we have 74 people that said the room was bright and airy, rather than having humans do it So this is the case where I want to add that capability myself, but I don’t want to have seven million images of couches to do it Maybe I only have a few thousand We have some science here that’s going to help us be able to do that And you’ll be able to either control whether you want that data to be public, meaning everyone can run predictions on that, and thus you have a model that you own or sell Or you want it to be available only for your own systems So this is our current gamut of experience Pause on this one Everyone can read I’ll look for a firm nod and a sip of beer Firm nods, sipping beers OK I did video already Let me flip back to this How are we on time? I will call this the last demo I do United States Congress This is a natural language demo So what this does is you type in a search query It will return the Wikipedia article It’ll apply natural language against the entirety of the article, identify all the entities, like what are the subjects of each individual sentence or the paragraph intent? Was it positive or negative in sentiment? And then what was the sentence construction? So we’ll do full syntactical analysis for you on the fly So in this case, you’ll see a lot of underlining So if I just highlight “US Congress,” you can see that we’re picking up the fact that legislature is the same entity as US Congress Congress is the same thing as Congress Kind of makes sense We also highlight the important entities on the side and relative matrix This demo will blow a weekend really fast So let’s do US House of Representatives So I click on the entity What we’re going to do here is the next logical step What are the entities related to this entity? 
There goes Sunday So you can follow it all the way through the system But when you look at relatively well-constructed articles, you can start figuring out where the articles are biased
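Under the covers this demo is again just REST calls returning JSON. The snippet below parses a response shaped like what the Natural Language API returns (entities carrying a salience, a document-level sentiment score); the entity names echo the demo, but the values and the summarizing function are invented stand-ins, not real API output.

```python
# A response shaped like the Natural Language API's JSON: each entity has a
# salience (its importance within the text), and the document as a whole
# has a sentiment score. The values below are invented for illustration.
response = {
    "entities": [
        {"name": "United States Congress", "type": "ORGANIZATION", "salience": 0.61},
        {"name": "legislature", "type": "OTHER", "salience": 0.12},
        {"name": "House of Representatives", "type": "ORGANIZATION", "salience": 0.09},
    ],
    "documentSentiment": {"score": 0.1, "magnitude": 4.2},
}

def summarize(response, top_n=2):
    # Rank entities by salience and turn the sentiment score into a verdict;
    # the +/- 0.05 neutral band is an arbitrary choice for this sketch
    entities = sorted(response["entities"], key=lambda e: e["salience"], reverse=True)
    score = response["documentSentiment"]["score"]
    verdict = "positive" if score > 0.05 else "negative" if score < -0.05 else "neutral"
    return [e["name"] for e in entities[:top_n]], verdict

names, verdict = summarize(response)
print(names, verdict)  # prints: ['United States Congress', 'legislature'] positive
```

A "slightly positive" or "slightly negative" verdict on an article that is supposed to be neutral is exactly the kind of bias signal the demo surfaces.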

And if you were to pull up Donald Trump’s, he’s a guy I hear he’s the president You’ll figure out that not everyone cares about Donald Trump because his Wikipedia article, which is meant to be down the middle of the fairway, is relatively negative Congress, by the way, is slightly positive You know the major contributor to the US Congress article? Aides from the US Congress Know your data, folks OK I said last one We can go long There’s no one coming after me The crew is– no, we can’t go long OK So let me quickly show this one, and then I can take questions Car lots I need to sell cars Where is my demo? So what we have here is somebody wanted to write an image detection service so when you go to a car auction in Japan, you buy the car basically sight undriven You just get to see it on the screen You bid on it You get the car But from the photos of the car, it’s so competitive in the used car space there, they want to take the photos, ship them directly back to the business back end And it’ll self-identify what model of vehicle it is and what they should have paid for it So in this case, they created a simple web page I have the exact imagery here from one of the guys that goes to the auction and takes the pictures We drag, drop directly on the screen It ingests those images as we go They have a self-classification model that they’ve written to identify vehicle types from all the imagery This takes about 45 seconds Let me cut right to Betty Crocker You guys can see that they’re 89% sure that they should have paid between $39,000 and $43,000 And it’s very, very likely it’s a 2012 Toyota Land Cruiser Prado edition Prado’s not offered in the US It’s Asian market only So you get the idea that you can see, per individual image, which ones they had images of And not only does it identify what the vehicle is It’s identified that the guy who went to the auction did a really bad job They’re missing photos They can’t do the listing without all the photos being present Go back to 
the vehicle you just bought and take the rearview camera version of the photo Take the one for the side rear emblem Those are missing from the site They’re not allowed to list it without all these images being present Kind of cool This is a car company in Japan that did this in about two months OK So let me try to fulfill my promise of moving to the end as fast as I can No one cares about related sessions I’ll leave the summary up And then I’ll stop here Thank you guys [APPLAUSE] [MUSIC PLAYING]