So thank you for inviting me. I thought what would be useful to spend some time talking about is trying to frame the conversation of how we think about data, how we think about these different areas of data and data science these days. The first thing to start off with is something that we kind of all know: we all know data is important. But if we ask ourselves, how do we really know that data is important? What tells us that? What is it out there that says, wow, this is something that we've got to know about?

The first thing is, anywhere you look, it's on the cover of magazines. It's the data deluge, it's in Popular Science and The Economist, and there are best-selling books on Amazon. It's really popular. I mean, people even put guys on lists, all right? There are news stories in the New York Times about this stuff. You've got to watch out for this guy, not because he's from Stanford but because he's going to show up a bit later. Even the demand for data scientists is at an all-time high. This is a graph from LinkedIn of that demand, and this just goes up to 2010. If you look at 2011, the growth is even more staggering; on top of that, it's another jump. So when you think about that, it's almost like data is the new black. It's this new hot thing.

But I think at some point we have to bring a little bit of balance back, especially here in an academic setting. We have to ask ourselves: what's really the balance of data? What's the real promise? What's overhyped? And I think sometimes it's useful to bring it back to a very simple experiment. So I brought an experiment with me, and this, in fact, is a very pink towel that covers the experiment. And here it is. If you guys can't see this, you're going to need to move, because that's about all I can move it, and you're going to want to see it. So, to start, this is a real simple device. In fact,
this is one of the first devices used to show that chaos theory existed. It's a double pendulum. It's very simple: it's one pendulum connected to another. You guys awake so far? Okay. So everything about this can be described with four variables. You only need four variables: you need the position of this, the position of this, the angular velocity of this, the angular velocity of this. That's it. Think about the last time you looked at a spreadsheet or some type of data that had four variables in it. All right? My kids' graphs for, like, their health care don't have just four variables. There's nothing out there that's only got four variables.

So here's what we're going to do. To get a sense of this, we're going to play a little game. To do this game, first you need to raise your right hand. Good so far; everyone in Berkeley is correct with their rights. All right, Simon didn't say put them down. Now raise your left hand. Okay, now what we're going to do is practice. We're going to clap on the count of three: one, two, three. This time we're going to clap with a little more enthusiasm on three: one, two, three. Okay, so now you guys have got the trick.

Now here's your job. Your job is to guess the last time this blue piece is going to go through there. Okay? The last time it's going to go through there. Now, you also want to be the first person to guess when this is going to be the last time. Honorary degrees will be given for this; the Berkeley Regents have told me so. Right? So if it goes through and you clap and it goes through again, what happens? You lose. Don't be a loser. Then, if it goes through and it's the last time and someone claps before you, you lose. Don't be a loser. Okay, so this is actually a little more tricky. You guys are going to want to move, because also, if it flies off, it goes that way. Okay, so you guys ready? Okay, now we know who the Type A people are; they're right in there. And there are the people that are too cool; they're leaning back, they're like, okay, I got
it. All right, so here we go. Good job so far. Now this is going to get a little trickier: instead of one clap, you're going to get two claps. So think about this: four variables. The last time you saw a spreadsheet with four variables? You don't see them, right? And here, every one of you is a data product. Why are you a data product? Your eyes are taking in observations, your brain is processing this, and it's making a prediction. And what is it telling you? That's the moment of surprise, when you're going, like, wow, what's that

doing? Right. Nice try. All right, that surprise: your brain is making this forecast, and it's wrong. This is telling you that, as a data product, you suck. You are a terrible data product, right? Four variables, and still. I never knew Berkeley was this conservative. That's pretty good. So I think that's it. That's pretty good.

So here's the thing. With this type of device, if I put two of these together and let them go at the same time, they would do exactly the same thing for about a second, maybe a second and a quarter, before they show very different behavior. In fact, you'll see that this time it'll do a whole bunch of different stuff. So suddenly, with four variables, this is a chaotic system, and it's incredibly hard to predict. It's incredibly hard to make a forecast. Yet we are talking about big data. Four variables, such a simple thing. So what does that tell us? What do we take away from this simple experiment with such complexity, especially when we as humans are trying to make these forecasts, to make these predictions, and yet we can't do a very good job of it?

Well, there are two very important lessons from it. The first lesson is that you really have to have high-speed, timely, accurate measurements. Now how do you think about this? Well, there's an easy way to think about it: it's a broomstick. You ever try to balance a broomstick? What do you do when you try to balance it? Right. When you try to balance a broomstick, what happens? You're trying to do these real micro-controls, right? Your eye is processing this information, you're taking in the observations, and then you're trying to balance it. You're doing real fast control. In fact, you can do the exact same thing here: you can actually design a very simple control algorithm that will balance as many of these double pendulums on top of each other as you like. It's actually pretty simple to write, and what you're doing there is the same exact micro-controls. But the important thing about that is that you have to have high-speed observations.
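The earlier claim that two identical double pendulums track each other briefly and then diverge can be checked numerically. The following is a minimal sketch, not the speaker's setup: it assumes equal arm lengths and masses, no friction, a fourth-order Runge-Kutta integrator, and a hypothetical one-microradian difference in the starting angle of the first arm. The constants and function names are illustrative, and the time it takes for the gap to become visible depends on those assumed values.

```python
import math

G = 9.81     # gravitational acceleration in m/s^2 (assumed)
ARM = 1.0    # length of each arm in metres (assumed equal)
MASS = 1.0   # mass of each bob in kilograms (assumed equal)


def derivatives(state):
    """Time derivatives of the four variables: two angles, two angular velocities."""
    th1, th2, w1, w2 = state
    delta = th1 - th2
    denom = ARM * (2 * MASS + MASS - MASS * math.cos(2 * delta))
    a1 = (-G * (2 * MASS + MASS) * math.sin(th1)
          - MASS * G * math.sin(th1 - 2 * th2)
          - 2 * math.sin(delta) * MASS
          * (w2 * w2 * ARM + w1 * w1 * ARM * math.cos(delta))) / denom
    a2 = (2 * math.sin(delta)
          * (w1 * w1 * ARM * (MASS + MASS)
             + G * (MASS + MASS) * math.cos(th1)
             + w2 * w2 * ARM * MASS * math.cos(delta))) / denom
    return (w1, w2, a1, a2)


def rk4_step(state, dt):
    """Advance the state by one time step with classical Runge-Kutta."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = derivatives(state)
    k2 = derivatives(shifted(state, k1, dt / 2))
    k3 = derivatives(shifted(state, k2, dt / 2))
    k4 = derivatives(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def separation_over_time(start_angle=2.5, nudge=1e-6, duration=20.0, dt=0.001):
    """Release two pendulums from rest, almost identically, and record
    (time, angular gap between them) at every step."""
    first = (start_angle, start_angle, 0.0, 0.0)
    second = (start_angle + nudge, start_angle, 0.0, 0.0)
    history = []
    for step in range(int(round(duration / dt)) + 1):
        gap = math.hypot(first[0] - second[0], first[1] - second[1])
        history.append((step * dt, gap))
        first = rk4_step(first, dt)
        second = rk4_step(second, dt)
    return history


if __name__ == "__main__":
    for t, gap in separation_over_time()[::2000]:
        print(f"t = {t:5.1f} s   gap = {gap:.3e} rad")
```

Printing the gap every two seconds shows how a difference far too small to see grows over the run, which is why a forecast from imperfect measurements goes stale so quickly.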
You have to be able to see it. Now how do you prove that to yourself? You ever try to balance a broomstick drunk? Right? That's your homework: balance it drunk. You're slow. You can't see the observations, your brain isn't processing as fast. It's latency. When that latency comes in, that's the instability; that's when you're going to drop it. The second thing is that you have to have high-speed control. Right? Low latency and high-speed control have to come together. So you've got to have the high-speed observations; you've got to be able to see something very quickly. If you doubt that, try to do it in a dark room, or try to do it with a strobe light. It's equally hard. So you need the high-speed observations combined with low-latency, fast corrections.

So what do we take away from that? How do we start thinking about that? Well, what I'd like to tell you about is a quick story about this power of data science, and how data science in some of these companies can massively change the game, in fact the whole industry. But before we start on that, I think it's useful to think about how data science has changed massively in just the last 10 years. And the way to think about that is the speed at which we can do analytics. This is one of my favorite graphs to explain that. So how many of you guys know that is Captain Kirk? How many know that is the Priceline guy? Right, that's how you tell the generational gap right there. All right. So this is from that famous scene where he's in there going, 'Khan!' And here's the great thing: what this person did is say, how many A's are there in 'Khan' out there on the web? So this is something that now we look at and go, of course, this is simple. This is an interview problem, right? All we do is write a quick script, we're going to scrape, and we're going to get the count. Think about what that took 10 or 15 years ago. First you've got to go get a line, an
internet connection line. You've got to get a rack of computers. Okay, now we need a database. What's our database going to be? We're going to have to go buy a contract with somebody to get a database. Oh wait, we might need an ETL layer; we might need some processing. It was a lot of money. In fact, if you're in a company, that's a lot of paperwork just to do something so simple. Now you spin up an EC2 cluster, you go scrape, and you get stuff. And here's the thing: the curvature of this graph, that's not really that surprising, right? But what's surprising is, hell, there's somebody out here with 81 A's. What's 81 A's? It's like your face is falling flat on the computer keyboard to get that many A's, right? And it's not just one person or one page; there are a hundred pages out there with 81 A's.
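The "quick script" for counting A's can be sketched as below. This is a minimal illustration, not the original analysis: the scraping step is left out, `pages` is a hypothetical list of already-downloaded page texts, and the function names and the `min_length` cutoff are made up for the example.

```python
import re
from collections import Counter

# Matches "Khan", "Khaaan", "KHAAAAN", ... and captures the run of A's.
KHAN_PATTERN = re.compile(r"\bkh(a+)n\b", re.IGNORECASE)


def pages_per_a_count(pages):
    """Return a Counter mapping number-of-A's -> how many pages use that spelling."""
    histogram = Counter()
    for text in pages:
        # Count each spelling once per page, so one page cannot dominate.
        lengths_on_page = {len(m.group(1)) for m in KHAN_PATTERN.finditer(text)}
        for length in lengths_on_page:
            histogram[length] += 1
    return histogram


def unusual_lengths(histogram, min_length=50):
    """Spellings long enough to be worth a closer look (cutoff is arbitrary)."""
    return sorted(length for length in histogram if length >= min_length)


if __name__ == "__main__":
    sample_pages = [
        "Khan!",
        "KHAAAN and again khaaan",
        "kh" + "a" * 81 + "n",
    ]
    counts = pages_per_a_count(sample_pages)
    print(sorted(counts.items()))
    print(unusual_lengths(counts))
```

The histogram gives the overall curve; the second function is the step that picks out a spike such as the 81-A spelling for follow-up.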

Now what we can use the data to do is say, ah, we can quickly process the data, but then we can take action on the data. We can go and actually investigate: what is it about this, what's unique about this element, that allows us to say, ah, there's something interesting there? We can look for these spikes and these clever things. So suddenly what we're doing is moving from just characterizing the data to being able to understand what's really behind it, what the actions are.

And so, back to that story of data science and how that exact idea comes to fruition. That comes down to this idea of People You May Know. Some of you may have seen this on LinkedIn; it's right here in the box. It's this real simple idea, and you actually all know about People You May Know. But the thing is, how do you think about building People You May Know for the first time for a social network site? Because when LinkedIn was first built, there wasn't a People You May Know; there wasn't anything on there. When you showed up to the site for the first time, you'd get there and you'd see a page, and then you'd have to do one of two things. The first is an address book import. It's like, really? On the first date? The second is a search: you've got to search for somebody right away. And if you're faced with a search box to find your friends, what do you do? It's like saying, which friend do I put in? If I asked you right now which friend you would enter in there, do you have one? You kind of think, oh gosh, which person should I enter? So it's not a very good experience.

And so along came this guy, Jonathan Goldman, a physicist out of Stanford, and he had an idea. He recognized something, and the idea kind of goes like this, and you actually all know it. When you walked into this conference area for the very first time and you went and registered, what's the first thing that you felt? Right? You didn't
see anybody you know. You're kind of there, you're kind of feeling alone in your space. So think about any moment where you've gone to a conference, maybe this one. When you get there for the first time, are you the type of person that does a clockwise walk around the circle? A counterclockwise walk? Or the person that goes and gets a drink, and you kind of hold it, and you're kind of like, okay, what's up? You look at people you kind of know, people you kind of recognize, and finally you see somebody you know, and then you're stuck to them, right? You're glued onto them until you're like, finally, thank God, there's somebody else I know and I can leave this person. And how does your feeling, how does your modality change? How does your whole body language, everything, start to change when you realize that your kind of people are here? These are your people. Suddenly you're like, oh, this is my space, I feel comfortable, I can let down my hair, I feel okay. Right?

That same notion: how do you replicate that in a social experience, on a social networking site like LinkedIn? Because here's the challenge. When you go to a social networking site like LinkedIn or Facebook or anything for the very first time, it's exactly like showing up to that conference for the very first time, but there are two big differences. The first is you can't look across the room; you can't see people. So it's as if you were walking into a room that was dark. The second is that you can teleport away instantly at any time. Why? Because people just close the browser window. So you've got to fight that. Jonathan's brilliant insight was to say, hey, you know what, we should be able to do this ourselves; we should show you who you should know on this site. He came up with a very lightweight heuristic algorithm, and everyone said, meh, seems like an okay idea. So Jonathan said, no, I think I can make this work. And so what he did is,
he realized, he said, I can actually figure out an algorithm that's going to do this. And so here's the secret sauce of People You May Know. You already know the entire algorithm. What's the first thing you say when you meet somebody for the first time? Where do you work? What do you do? Where do you live? Where'd you go to school? Congrats, now you actually know People You May Know. That's it: a bunch of heuristics. You know Quentin? I know Quentin! Oh my gosh, Quentin, that guy! Right? All of a sudden, think about how that changes once you realize what is known as triangle closing, that sudden shift of closing the triangle. That aspect is all you have to do: you just have to close the triangles. That simple idea. Jonathan said, hey, we should do this, and everyone was saying, it seems like an okay idea. So what Jonathan did is he said, there used to be an ad slot here; you know what, I'm going to take over the ad slot. I'm going to make, for every single one of you, a People You May Know, and I'm going to put three results in there.
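Triangle closing can be sketched in a few lines. This is a generic common-neighbors heuristic written for illustration, not LinkedIn's actual algorithm: the `connections` mapping, the member names, and the function name are all hypothetical, and the other signals mentioned in the talk (shared employer, school, location) are left out. It ranks people two hops away by how many triangles a new connection would close and returns the top three, matching the three results described for the ad slot.

```python
from collections import Counter


def people_you_may_know(connections, member, top_n=3):
    """Suggest non-connections ranked by number of mutual connections.

    connections: dict mapping each member to the set of members they know.
    """
    direct = connections.get(member, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in connections.get(friend, set()):
            # Skip the member themselves and people they already know.
            if candidate != member and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most shared connections first; break ties alphabetically for stable output.
    ranked = sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


if __name__ == "__main__":
    network = {
        "ana": {"bo", "cy"},
        "bo": {"ana", "dee", "eli"},
        "cy": {"ana", "dee"},
        "dee": {"bo", "cy"},
        "eli": {"bo"},
    }
    print(people_you_may_know(network, "ana"))
```

In the sample network, "dee" shares two connections with "ana" and "eli" shares one, so "dee" is suggested first.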

You know it, you're going to click on them, and it'll go to the right page. If you've already taken action on it and you see the ad again, you're still going to see the same results. But here's the thing: that ad had some of the highest click-through rates anybody's ever seen. Why? Because it's relevant content. And it was so powerful that suddenly you found sites all over the place starting to use it. Facebook said, oh my gosh, that's a great idea. Hi5 tried to use it. Everybody. All the sites out there to this day are using some variant. If you go to sites where it's not direct connections, like Twitter, they're trying to do recommendations in a very similar way. So suddenly what you have is one guy, one single data scientist, who has an idea, who's looking at the data, translates it and says, oh my gosh, we can have a massive lever arm in the business; we can massively change the way that we think about this. And what happens as a result? The whole industry transforms.

And that's really part of the essence of what this is about: data and what we can do nowadays. Because of the speed, because of the processing, the things that you can actually do with the data, suddenly you don't just have to be a back-office shop with data. It's about moving data all the way from the back end to the front end, where we're building products and actually doing things. And with that, I think there are some important lessons to take away. There's a lot of talk these days about all the great stuff around data. It's not just that it's big data, and soon it will be bigger data and faster data. That's great, that's important. But let's also take a moment and recognize some of the froth that can happen with this, and let's ask ourselves, if we're really honest with ourselves, about some of the challenges that can happen there. And the first lesson, I think, is to use the data to have a conversation rather than make a decision. And this guy is one of my favorite data scientists.
Why? Because what's the first thing Captain Kirk says when he's on the bridge and there are the Romulans or the Klingons out there? What's he say? 'What do you think, Spock?' Right? When's the last time you thought or heard of a data scientist directly on the bridge, the bridge of a military institution, maybe the front office of either a sports team or a Fortune 100 company? We don't. Yet here they do; they were far ahead of their time. So think about that for a second, because what is the analogy? That'd be like Kirk saying, hey, we need some data on these Klingons, and somebody would say, you know what, I'll go get the data team on it. So they send some ensign out to the turbolift, and he goes all the way down into the bowels of the ship, and he says, okay, the captain wants some data on this. They say, that's not the data he wants. He says, the captain says he wants it, just give us the data. They go, okay, fine, here's the data. He goes back up to the bridge, and the captain says, what is this? That's not what I asked for. He goes back down, and they say, I told you so. That's the exact same thing that happens every day in business. Why aren't we putting the data scientists on the bridge?

The second reason I think that happens is because when people walk into the room with the data, they always put it on the screen, and it's this crazy Excel-looking spreadsheet, and people are like, well, I don't understand it. And everyone's afraid to ask questions about the data, because what's going to happen? A decision is supposed to be made. How about, instead of having a decision made, we decide to have a conversation about the data? Maybe as a result of that conversation we'll have a decision, but instead we focus on the conversation, because that will uplevel everybody. It will increase the sophistication of everybody in the organization. No longer will data be a power play by those that hold the data, because everyone suddenly will be talking about the data. So
let's have a conversation rather than decisions. Number two: design your data products with failure in mind. I think it's so easy to have an arrogance that we love to put in our data products. Let's take that away. A great example of that is this clip here. My TiVo thinks I'm gay. What's TiVo? It's a device to record television shows that you pick, and then based on what you pick, it records other shows that it thinks you'll like. You record Star Trek, TiVo assumes you like that kind of thing, and then when you're at home it records The X-Files. So what's the problem? I had it record Will & Grace and a couple of episodes of Ellen. Right away the damn thing thinks I'm gay. Keeps recording Queer as Folk, every episode. Last night it recorded a Judy Garland. Call the company, just tell them you're not gay. I want to be there when you make that call. Exactly. I actually tried to outfox it, get it going the other way. Had it record MTV Spring Break, Playboy After Dark, swimsuit competitions. The thing won't budge. Insists

I'm gay. It's a problem.

So it's kind of a funny thing, but think about how true that is. When we interact with data products, often the thing is either, like, sorry, I can't do that, or it has this kind of coldness, right? The data products are cold. And you see the same thing if you look up failures on Amazon for collaborative filters; all the time you see all these kinds of crazy recommendation results. Or if you type into Google 'GPS' and 'driving into a river' or 'off a cliff', you'd be surprised at how many people do that. And it's like, why are we designing these products this way when we know they're going to fail on relevancy? Let's design them in a way that they have humility, that they have a way to fail. It's kind of the same analogy as when you have those 404 errors, or where your site is down: the Fail Whale. It almost became comical; it disarmed a product experience that was bad with comedy, with humor. What if we did that the same way? If you're designing an insulin pump that's supposed to do all sorts of great data analysis type things, and you're typing in a terrible number, what if it said 'Did you mean...?' instead of 'Sorry, I can't do that'? Pandora does a great job with this. When the data product fails us and you put thumbs down, it'll say, oh my God, I'm so sorry, I'll never do this again. It has a humility, it has a tone. Let's put that back in our data products.

Third: make the data actionable. Let's not do data vomit. It's so easy just to throw a bunch of data up here. This is one of my favorite products. It's a Zeo; it's a headband that makes you look like a dork when you sleep (trust me, my kids say it), and it tells you how you're sleeping. And so this is great: it tells me I sleep like crap. Thanks, because I needed a $100 device to tell me that. Now here's a data scientist's way of looking at it: it's like, oh, here's a bunch of scores and numbers. That's great. In fact, the other day I scored 113, and yet I was
terrible. And the reason I was terrible was because I was sick and I was coughing, but I slept for 10 hours. So instead, how about if this device told me, hey, you know what, you need to have between 55 minutes and an hour and five minutes of deep sleep in your first three hours of sleep, otherwise you're equivalent to being drunk for a day? I've tested it. No. And what if it also said, hey, you know what, as long as you have that, you only need five hours of sleep? It's the same thing with scales. These fancy scales that are out there that measure your weight and report it to the web, it's like, great, that's the same thing as a mirror. How about telling me what I should do? And that's why I'm picking on health care; it's easy to do that because we directly have it. The same thing is true when you make a dashboard or anything in your company. Instead of focusing on what data we can put up there, how about what we're going to do because of the data?

So I think I'll stop there. I don't know if we have time for questions or if we want to just jump to the panel. Let's go straight to the panel.
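The "make the data actionable" point from the sleep example can be sketched as a function that returns advice instead of a score. This is an illustration only: the thresholds are the figures quoted in the talk as a what-if (55 to 65 minutes of deep sleep in the first three hours, five hours total), not medical guidance, and the function name and messages are hypothetical.

```python
# Thresholds taken from the talk's hypothetical, not from any clinical source.
DEEP_SLEEP_MIN_MINUTES = 55
DEEP_SLEEP_MAX_MINUTES = 65
MIN_TOTAL_HOURS = 5


def sleep_advice(deep_minutes_first_3h, total_hours):
    """Turn two measurements into one sentence the user can act on."""
    in_range = DEEP_SLEEP_MIN_MINUTES <= deep_minutes_first_3h <= DEEP_SLEEP_MAX_MINUTES
    if not in_range:
        return (
            f"You got {deep_minutes_first_3h} minutes of deep sleep in your first "
            f"three hours; aim for {DEEP_SLEEP_MIN_MINUTES} to "
            f"{DEEP_SLEEP_MAX_MINUTES}. Treat today like a short-sleep day."
        )
    if total_hours < MIN_TOTAL_HOURS:
        return (
            f"Your deep sleep was on target, but {total_hours} hours is short; "
            f"try for at least {MIN_TOTAL_HOURS}."
        )
    return "Your deep sleep was on target and you slept long enough. No change needed."


if __name__ == "__main__":
    # The night described in the talk: ten hours in bed, but poor sleep.
    print(sleep_advice(deep_minutes_first_3h=30, total_hours=10))
    print(sleep_advice(deep_minutes_first_3h=60, total_hours=5.5))
```

The same two inputs could have produced a score like 113; returning a sentence about what to change is the design choice the talk argues for.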