[MUSIC PLAYING]

VIACHESLAV KOVALEVSKYI: My name is Slava. I'm a tech lead in Google Cloud, leading a small team that develops several products. One is AI Platform Notebooks, which just got announced. There is also Deep Learning VM and some other small initiatives. And that's not the name you see on the slide, because my full name is Viacheslav. But Slava is good enough; don't even try to pronounce Viacheslav, it's absolutely OK. And today with me, I have a co-presenter, my good friend and co-worker, Mike.

MICHAEL CHENG: Hello, everybody. My name is Mike. I'm also a software engineer on Slava's team in Google Cloud.

VIACHESLAV KOVALEVSKYI: OK. So before we jump to the actual talk, there are several things. First of all, we have Dory. Dory is one of our internal tools, but now it's external too, so you can use it to write down your questions. At the end of the talk, we will try to allocate some time to answer them. Even if we don't have time, we can answer them afterward; it will probably stay open for several days. We also might walk out into the corridor and do a session over there if we run out of time. Also, a small disclosure up front: I have a problem with my throat, so I might be coughing. I will try to switch off my mic if I do, but just bear with me.

OK. So we're going to set the stage for the topic we're presenting by doing a mock conversation between two artificial characters: a novice data scientist and a veteran developer. I don't have a hat, but imagine I'm putting one on. Hello, Mike.

MICHAEL CHENG: Hello, Slava. So our team's been really excited to use notebooks, but we've been having so many problems iterating as a team using notebooks.

VIACHESLAV KOVALEVSKYI: Oh, wait, wait, wait. Iterating as a team? So you've probably tried to put them on GitHub, or into some version control system, right?

MICHAEL CHENG: Yeah, and it's basically impossible to parse. There's some weird-looking JSON text. And the other thing we've been running into: we've noticed that it works on one guy's computer, but then he tries it in some other environment, tries to push it to the cloud, and it doesn't work. I don't know, do you have any solutions for that?

VIACHESLAV KOVALEVSKYI: Oh, so you're saying it's completely unreproducible?

MICHAEL CHENG: Yeah.

VIACHESLAV KOVALEVSKYI: OK, OK. Yeah, that makes perfect sense. So now we're taking off our imaginary hats. Question: how many of you have tried to do something similar, using Jupyter notebooks in production together with your co-workers, either by sharing through Git or any other system? Have tried at least once?
Ouch, yes, yes. I feel that pain. I feel your pain, yes. So today, we're going to talk about how to reduce that pain.

If you're working in a big company, your company already has a bunch of notebooks producing useful artifacts. Let's imagine you have some research department, or maybe an analytics department, and an analyst has created a nice Jupyter notebook that shows some analytics data. It could be anything that produces value for your analytics department. However, in your company there are probably two separate universes. One universe is this collection of useful notebooks, completely uncovered by any development practices, living on your laptop or your co-workers' desktops. The other, completely disjoint universe is the production-ready code that has a continuous integration system, a continuous delivery system, everything. In theory, you have these useful artifacts in one universe, and you would like to move them into the other, because, for instance, your analytics department would like to use the results of those notebooks. And you cannot just hand the notebook over to that analytics department, because those two universes are disjoint.

And what we hear a lot is that moving a solution from a notebook to actual production-ready code might take days, weeks, in some companies even months. The usual pattern we observe: people take the Jupyter notebook, look at what they have in their production environment, and then literally reimplement everything from the notebook to plug into that environment. Maybe you have this nice Jupyter notebook, and now you're writing code to plug it into the dashboarding system [INAUDIBLE] side of the company. This is kind of strange. Just think about it: you have JupyterLab, which already presents information as a dashboard, and in order to convert it into a dashboard, you need to rewrite the whole solution on the production stack. And there is usually no other way. And it gets even more complicated: if your researcher prototypes a new notebook, you need to start over from scratch.

So today, we're going to show you that there is effectively a way, by enforcing several rules and best practices of development, to use some of your Jupyter notebooks as-is in production. And this is important: we're not saying you can take just any Jupyter notebook and use it in production. It doesn't work that way, just as I can't promise that you can take any Python code, throw it in production, and have it work. No. But there is a subset of Jupyter notebooks for which you can. And if you do so, you will see that you reduce the time from prototyping to production, because you no longer need to rewrite your solution. And by reducing this time, you increase the likelihood that a useful notebook ever reaches your customers. There are a lot of artifacts, like Jupyter notebooks, that would be useful to a customer but never reach that customer, because you just cannot invest the time to rewrite them as a production solution.

So our talk is effectively about the principles one needs to follow in order to be able to use Jupyter notebooks in production, and examples of how exactly you can follow them. It will be less about exact, ready-to-use recipes, because you probably have your own unique CI/CD deployment. We're not going to enforce anything; we're going to show an example that you can replicate if you want. You can even take our examples as-is if they suit your needs: they're completely open source, and we will give you links. But, again, it's more about the principles.

So now, let me quickly walk you through all of them, and then we'll start diving into each one. The first one, probably the most important: follow established software development best practices. We're going to dive into this in a moment, but this one is probably the most important, and every other principle is just a derivative of it. The second: version control your notebooks. I know some of you will say this is crazy: "I already tried to put a notebook on GitHub. I ran git diff to see the difference from my commit, and it was just not usable." We're going to show you how this can be solved. The third: notebooks need to be reproducible. If you don't have reproducibility, you cannot have continuous integration; you cannot cover the notebook with tests. Next, your notebook should be parameterizable, so that you can override parts of the notebook without actually touching it; otherwise, again, continuous integration and continuous deployment are just not possible. And then, test your notebooks continuously and automate your artifact creation; we'll cover those later in the talk.

OK. All these principles are at this particular link. They are open source; it's just a GitHub link. We're working with the community: the community proposes new ones, and we're listening to our customers. We actually have a principle number seven that was contributed by one of our customers. If you want to hear about it, feel free to put a question on Dory. We don't have it here yet, because we don't have nice example coverage for that seventh principle.

So these are the six principles. During the talk, we're going to touch three main products. We'll touch more, but there are three key ones to mention. One is AI Platform Notebooks. It's a brand-new service that Google released as of this morning, which gives you managed JupyterLab notebooks on top of GCP. The second one is Cloud Deep Learning VM. This is effectively our solution for running [INAUDIBLE] any workload that requires GPUs: a preconfigured environment that is ready to utilize GPUs and has high-level frameworks like TensorFlow, PyTorch, and some others, optimized precisely for the hardware that we have on Google Compute Engine.
And the third product is Cloud Build. Cloud Build is effectively our CI system. So we're going to touch these three products during the talk to showcase examples of these principles.

But before we jump in, there is a P0 priority that sits on top of all of them. It's not on the list, but everyone knows it should be there: if you don't have an easy way to use your notebook, you can forget about everything else. That is the first thing you need to have. And before we jump to the principles, let me walk you through the new service, AI Platform Notebooks, which gives your research department exactly this: an easy way to use Jupyter notebooks.

So here is a quick demo. We will be using recordings a lot during our demos, mostly so we can show as many things as we're showing today; live, that wouldn't be possible. Oh, let me go back and do it again. This is the brand-new service, and this is the UI where you can create your notebook. You only need to answer two simple questions.

Which framework do you want it optimized for, and are you going to use CPU or GPU? With CPU, you get low-level Intel optimizations for the CPUs we have. With GPU, obviously, you get the full [INAUDIBLE] stack preconfigured: one click, plus your permission to install NVIDIA drivers on your behalf. That is it. You have a fully configured notebook, with a link gated by your account that you can use to access it, and you can just start researching. Everything is preconfigured; in this particular example, you have TensorFlow with access to the GPU under the hood. So this is the simplicity: in several clicks, you get from nothing to a fully working notebook on GCP.

Now, with that, we have solved the problem of simplicity, and we can start covering our first principle. I'm going to hand the mic over to Mike.

MICHAEL CHENG: Thank you, Slava. So let's cover the first principle: follow established software development best practices. Notebooks are code, and therefore you should follow the same practices that you would use for any piece of code. This means placing your code base under version control and code review; making sure your code is reproducible and consistent; making sure you have a comprehensive test suite for your notebook; separating artifacts per environment, meaning developer artifacts versus production artifacts; and integrating a continuous integration and deployment system for quicker iteration.

But you might say, well, notebooks to me are just a prototyping space. Why does this matter? Consider that you're a dev working in Android Studio, and you go up to your boss and say, "Boss, I'm a dev working in Android Studio. That means I don't need to write readable code or tests. Screw that, I'm just going to git push -f." Your boss is probably going to say, "Mike, let's have a talk." In the same way that the dev in Android Studio needs to follow good coding practices, a data scientist or engineer working with notebooks should follow these same best practices. Like your favorite IDE, notebooks are just a tool, and with any tool there are trade-offs in development, in this case the speed of prototyping versus a more opaque file format. But these trade-offs shouldn't influence the quality of your end product.

Let's jump into the next software development best practice: version control your notebooks. Version control is the cornerstone of iterating as a team, as it allows you to develop feature branches in parallel, perform code reviews, and see the revision history so that you know who's the expert in a certain code area. However, anybody who's tried using version control with notebooks knows that it's not a very user-friendly experience. So in this demo, we're going to showcase the default behavior, which isn't very user-friendly, and then we'll showcase the nbdime plug-in, developed by the Jupyter community, which simplifies this behavior. Let's jump into it.

So say I'm a new TensorFlow dev, and I've found this cool repo filled with sample notebooks that I want to explore. I'm going to clone it into an AI Platform Notebooks instance using the JupyterLab Git plug-in, which allows you to clone from the UI. Let that load. Great. So now let's take an example notebook; I think a good starting point is hello world. And now we're going to run this notebook through. Great. As expected: "Hello, TensorFlow!"
So now that I am a TensorFlow expert, I'm going to modify my notebook: "Hello, GCP NEXT." Let's run that through. And as expected: "hello, GCP NEXT!" Great. So now, imagine that I want to share these changes. Let's go to the sidebar, which shows my changes. OK, as expected, this file has changed. And let's do a diff of this file. You'd expect, because I've only changed one line, that only one line shows up. But in fact, let's run it, and you'll see a bunch of extra metadata, because Jupyter stores all the execution metadata: inputs, outputs, execution order, that empty cell at the bottom that I didn't even notice, and metadata about the environment you're executing in. You can imagine that somebody code reviewing this is not going to want to sift through a bunch of JSON to see what exactly has changed and what hasn't.
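To give a sense of the problem, here is a hedged illustration of the kind of noise a plain diff shows for a one-line notebook change. The file name and exact cell contents are hypothetical, but this is roughly what the .ipynb JSON format churns through:

```bash
git diff hello_world.ipynb
# -    "execution_count": 3,
# +    "execution_count": 7,
# -    "source": ["print(\"Hello, TensorFlow!\")"]
# +    "source": ["print(\"hello, GCP NEXT!\")"]
# -    "text": ["Hello, TensorFlow!\n"]
# +    "text": ["hello, GCP NEXT!\n"]
#      ... plus kernel spec, language, and other metadata changes ...
```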

They'll just say, let's not code review that. But we can improve on this behavior using the nbdime plug-in. So let's jump back to our notebook. You'll notice that in the top toolbar there is a Git icon. If you click this Git icon, you'll open up the nbdime diff, which provides a side-by-side comparison, so you can see exactly what you've changed: inputs, outputs, the empty cell. And if you have to, you can also check the metadata. For those of you who, like me, are not UI people and prefer the command line, nbdime also has an integration with Git. So let's enable that, and once it's enabled, we'll run a diff again. So we're enabling it, right? And then we diff. You'll notice that instead of that fat JSON blob, you now see a bunch of nicely divvied-up sections, so you can more easily tell your inputs from your outputs, your execution metadata, and so forth. Nbdime also supports three-way merge functionality, but in the interest of time, we're not demoing that here. A sketch of that command-line setup follows below. So with that, I'll pop it back to Slava to discuss reproducible notebooks.
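For reference, a minimal sketch of enabling that Git integration from a terminal. The notebook file name is hypothetical; nbdime's own setup command registers it as Git's diff and merge driver for .ipynb files:

```bash
pip install nbdime
nbdime config-git --enable --global   # make git use nbdime for .ipynb diffs and merges
git diff hello_world.ipynb            # now rendered cell by cell instead of as raw JSON
```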
VIACHESLAV KOVALEVSKYI: Thank you, Mike. So yeah, this is cool. And all of this is already fully preconfigured for you on AI Platform Notebooks.

OK, reproducible notebooks. We're going to start by talking about the problem. Many of you probably know how hard it is to make a notebook reproducible. You need to know the environment, you need to have your dependencies, and you know that a notebook's cells can be executed in any order. There are many, many problems. But we're going to focus on one additional problem that notebooks usually have: a notebook actually ties you to the hardware you need to train it. What does that mean? Let's say you have a notebook and you've prototyped your model. Now you need to train on a real GPU that you're willing to pay money for. This means you need to change the environment. Say that during the training you find a bug. What happens now? You don't want to keep paying for the GPU while you're prototyping, right? So you probably need to detach the GPU and start debugging again. When you've fixed it, you attach the GPU and do it again, and again, and again.

Let me first show how this currently looks on AI Platform Notebooks. This is a video of how you can attach a GPU to the notebook. Here is your running notebook. You effectively need to stop it; after some time, it reaches a full stop. As soon as it does, you get a menu to choose a particular GPU. We only show you the GPUs available in the particular zone you're using, and the GPU count. And now we need to start it again. That sounds trivial, but there is a catch: this process of reaching a full stop, attaching, and reaching a full start takes, when I last counted (let me check my slides), four minutes. This effectively means that if you have one bug, you go through this attach-and-reattach process once. If you have, say, two more bugs, you've already spent about eight minutes. So effectively, you're facing a tough choice. You either provision for the peak, for when you're training, stick with it, and pay for it; or you go through this process of attaching and reattaching.

Now, if you have fully reproducible notebooks, we can help. But first, let's answer the question: what exactly does it mean for a notebook to be fully reproducible? Effectively, it doesn't mean much; it's only a few items. First, your notebook needs a notion of the environment that was used to create it. In our particular case, that could be TensorFlow M20, which is a Deep Learning VM image and also an environment on AI Platform Notebooks. Second, all dependencies should be installed by the notebook itself, or recorded in the notebook's metadata. Our environments already ship with a lot of dependencies that are tested against each other, so we expect the majority of workloads to work as-is; but any additional dependency needs to be either embedded in the metadata or installed by the notebook. And last, the notebook should be executable from top to bottom, cell by cell, because obviously it cannot be reproducible if it expects a non-linear execution order.
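A minimal sketch of what those rules look like in practice. The pinned package versions and file names here are purely illustrative:

```bash
# Inside the notebook, a first cell installs its own pinned dependencies, e.g.:
#   !pip install --quiet pandas==0.24.2 seaborn==0.9.0
# Then "reproducible" can be checked mechanically: execute every cell, top to
# bottom, in a fresh kernel, and fail on the first error.
jupyter nbconvert --to notebook --execute my_notebook.ipynb --output executed.ipynb
```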

Now, if you can create notebooks like that, we can give you background training. What does that mean? Imagine you have a notebook on the instance I just showed you, and you're prototyping in it. This is an AI Platform Notebooks instance, with CPU only. If you have a fully reproducible notebook, you can technically send it to a background entity. That could be different things: it could be a Deep Learning VM, it could be an AI Platform training job, it could be [INAUDIBLE], thanks to Ferenc. And that background entity can do the training for you and give you back your results.

And here's an actual demo. Let me start this video. We are inside one of the notebooks. Our notebooks are based on top of a VM, so effectively, if you have a notebook, you have a VM on GCE where you can do whatever you want. Now, in the repository with the principles that we showed you, there is a special script that enables the experimental plug-in we're going to showcase. This is the process of installing that experimental plug-in; it's very simple. You SSH into your VM. As soon as you're inside, you just need to clone the repository (the link we gave you at the beginning) and run the one script that enables background training. I'm showing this process real quick: you clone the repository and execute the script; nothing too complex here, as sketched below. I think it's called enable_notebook_submission. Yes, enable_notebook_submission. Don't forget to make it executable before actually executing it. This process will take some time, and here I've just fast-forwarded. That's it, done.
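A sketch of that one-time setup as performed in the video. The repository URL is the one given at the start of the talk (not reproduced here), and the exact script file name may differ slightly from this guess:

```bash
# run over SSH on the notebook VM
git clone <repository-from-the-beginning-of-the-talk>
cd <repository>
chmod +x enable_notebook_submission.sh   # "don't forget to make it executable"
./enable_notebook_submission.sh          # enables the background-training plug-in
```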
Now, let's see what has changed on the notebook side. We open the notebook, and you'll notice one additional button on any of the notebooks. Let's use the same example, the test example, that I think Mike used in his presentation: we take one TensorFlow notebook and clone it. Done, it's cloned. Let's take one of the notebooks that is considered fully reproducible according to our definition. And now there is a button on top, this one, which looks like a nice GCP logo. If you press that button, it asks you one simple question: give me the configuration I need to use for training, the number of GPUs, the type of GPU. That is it. Literally, that is it. It's now training everything in the background. You can go to GCE, and you can see that one additional machine will be started at some point with the configuration you requested. As soon as that machine finishes training, it self-terminates, and your results are propagated back to your notebook with all the cells populated.

So let's come back to the actual notebooks. As soon as the VM is removed, it means the background training is finished. We can come back to our notebook. There is a folder, Job Results, where you will see the date and the particular notebook. This notebook has all the cells populated. It has been executed on a GPU somewhere else, and you have paid only for the time the training took. You have not paid for a GPU during prototyping. Even more, you can send as many notebooks to background training as you want.

To give some more detail on what exactly happened when I pressed that button: first, we showed you a notebook in [INAUDIBLE] AI Platform Notebooks where I cloned some notebook. Second, the plug-in sent that notebook to Google Cloud Storage and created a background entity. In this particular case, it was a Deep Learning VM with an exactly one-to-one mapping to the environment: we were using the TensorFlow environment, so it created the same version of the Deep Learning VM. That VM downloaded the notebook from Google Cloud Storage. Then we used Papermill, developed by Netflix for executing notebooks in the background, and created a resulting notebook with all the cells populated. And, as one can imagine, we uploaded the results back to Cloud Storage. In this particular case, even if your notebook instance is not up, your results are still stored on Google Cloud Storage, and you can retrieve them whenever needed. And the best part: we self-terminate the VM as soon as this is done.

This is a quick link for this extension. You can go install it on your AI Platform Notebooks. It's not going to work in other places outside [INAUDIBLE]. Actually, no, let me rephrase: if it does work, let us know, but it should not. [LAUGHTER] It's not that we deliberately block it, but we rely on a lot of things that we expect to be there. So if you just try to install it on a [INAUDIBLE] notebook, it's probably not going to work.
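Under the hood, the execution step is plain Papermill, the tool named above; a minimal sketch of what the background VM effectively runs (file names are illustrative):

```bash
pip install papermill
# execute the notebook top to bottom in a fresh kernel; the output notebook
# has every cell populated, and a failing cell makes the command fail
papermill input.ipynb output.ipynb
```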

OK. With this, just think about it: you now have the ability to execute notebooks in the background, right? You have reproducible notebooks. What else do you need to build a more-or-less complete CI that functionally tests your notebooks? Not much. You just need to verify that the notebook is executable, and not only execute it, but verify that it has been executed correctly, cell by cell.

So we're going to walk you through a CI that is already fully configured, which you can easily replicate from GitHub if you want. It works like this: the notebook instance sends changes to GitHub, and GitHub takes it from there. Let me show you an example where I push a broken notebook, to show that the CI is actually capable of catching something.

So here is a quick video. I have my Jupyter notebook instance, and I'm going to clone the demo repository with the CI. Our CI comes with a simple demo notebook. So I'm going to clone it, I'm going to break it, and I'm going to push it to GitHub, just to make sure that I get paged. I mean, I'm not actually going to get paged; my pager is not attached to that GitHub, but the CI will still [INAUDIBLE] find me. So here is the notebook we're using for our demo. Let me write something gibberish; I'm pretty sure that is not Python. So I'm going to leave it as-is, and I'm going to push these changes to make sure the CI sees them. We have the Git UI integrated, so pushing is fairly simple: [INAUDIBLE] push. The last part, the actual push, needs to be done from the CLI, because it asks for your login and password. We're working on moving even that part to the UI, but it's not there yet.

OK. So I have pushed a broken notebook; I just introduced something horrible. Next, what happens? Cloud Build on GCP sees that there is a new commit on GitHub. It pulls that commit, and it executes exactly the same process we just showed you with that UI: it looks at which environment was used for creating the notebook, spins up background training, and starts it. Here is a demo of that second step. As soon as the commit goes through (here is the actual commit), this yellow dot shows that training is in progress. We can go to Cloud Build and observe the log of the build process. Inside the log, you see everything that's going on in the background on the VM, the full output of your notebook, so we can actually investigate what's going on if something is broken. As soon as it's done, Cloud Build is able to figure out whether the execution happened successfully. No, it failed. It failed because we have a broken cell in our notebook. So now GitHub tells you that your notebook is broken.

And this is actually cool: this test literally tests your notebook. And not only that, it gives you back the result as another notebook, with all cells populated. It stores a result for each build in Cloud Storage, so when you get paged because someone just broke master, you can do some sort of investigation. A rough sketch of what such a build step amounts to is below.
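To make that concrete, a rough sketch of the commands a build step like this runs. The real configuration lives in the notebooks-ci repo linked later; the bucket and file names here are illustrative:

```bash
# executed by Cloud Build on every commit
papermill demo.ipynb output.ipynb                   # any failing cell fails the build
gsutil cp output.ipynb gs://my-ci-bucket/results/   # keep the executed notebook for debugging
```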
This last part effectively shows how quickly you can look at your GitHub commit and download the notebook that got executed. So this is the broken one. If we go through Cloud Build, there is a link to the particular artifact it produced, which is the broken notebook that got executed. You can easily download it to your AI Platform Notebooks instance with just one [INAUDIBLE] command. So here is the copy. As soon as the copy is done, let's open it. And bam, here is the problem. Here is the problem, here is the commit. I got notified, I got the notebook, I got everything I need to fix it. Then I can easily revert it and do the same thing again. This is a quick rewind: I'm pushing again, building in the background again, again, again, just to make sure that now it shows you a green dot instead of the red one, to confirm the CI is actually doing what it's supposed to do. So we have finished. Bam, success. Now the fix went through. I have the broken notebook, I have the fixed notebook, and you can actually go to that CI, see those two particular commits in action, and play with it.

OK. So this was two things: background training, and consequently, with background training, the ability to run CI. Now I'm going to hand it back over to Mike to cover the last two principles.

MICHAEL CHENG: All right. So reusability of code is another best practice of software development, and the best way to achieve it is through notebook parameterization. You can think of production-grade notebooks as functions or job specs: you take in some sort of input, whether that's a dataset or a list of variables, and then you generate some output. That could be a model, it could be a dashboard, whatever your use case calls for. So for example, let's say that I have a notebook that generates a model. And at some point during the modeling, you'll probably have to tune your hyperparameters.

So your data scientist, as a first iteration, is probably going to change these manually and rerun it. But he's going to get tired of doing that, so he writes some boilerplate to do a grid search. Great, now you have grid search. But then he gets tired of waiting for that grid search to run, so he writes some extra boilerplate to do it in parallel. So now you have parallel grid search on top of your model. But consider that you could separate these two things. Imagine that you keep the original notebook that generates a model and, along with Slava's background execution, you send this model training job off with extra parameters. That way you decouple your notebook from your notebook executor, and your grid search can be reused along with your notebook. Basically, why edit notebooks when you can use them directly?
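As a hedged sketch of that decoupling (Papermill parameter injection is shown properly in the next demo; the notebook name and parameter here are hypothetical), the sweep lives in the executor, and the notebook never changes:

```bash
# the grid search belongs to the executor, not the notebook
for lr in 0.1 0.01 0.001; do
  papermill train.ipynb "out/train_lr_${lr}.ipynb" -p learning_rate "${lr}" &
done
wait   # the three runs execute in parallel
```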
Let's consider a different use case, where we'll show how this is possible. Say you have a business intelligence dashboard. I'm an analyst for some bike rental company, and I have data stored in BigQuery. So I've written this demo notebook, which takes in a start and an end date, queries BigQuery for usage in that date range, and then generates a top-10 list of my most-used stations and an interactive station map. So you'll notice (let me full-screen that, actually) that we've tagged the parameter cell, aptly, with the tag "parameters". This allows Papermill, which Slava mentioned before, to substitute in an additional cell right after it, if parameters are provided, to override the defaults. But first, we're going to run this notebook through, just so you know what to expect from this demo notebook. So let's run it through: executing, executing, executing. All right, and then we'll head back to the top. As you can see, we're using the defaults, from 2000 to 2020, basically our entire dataset. It has generated our top-10 stations, the top being Central Ave., and we have our interactive map. For those of you who can't see, the middle section is more popular than the rest of SF.

All right, so now let's reuse this parameterized notebook using Papermill. Say that I want more recent data: instead of querying the entire dataset, let's query just last year's data. So we're going to set the start date to January 1st, 2018 and the end date to January 1st, 2019, and run that. So now it's executing in the background. Once this execution is finished, it generates an additional notebook file. We can nbconvert this notebook file into HTML to make it more readable, and then let's open up this parameterized version. As you can see, as expected, we have a new parameter cell that overrides the default parameter cell. Now we're only looking at data between 2018 and the 1st of 2019. You can see the top-10 stations list has been updated, with the Ferry Building now the highest, and the station map has also updated. As expected, for anybody who's lived here, SOMA is now the most popular area. The key here is that, by parameterizing, I can reuse my notebook without changing it to generate an updated dashboard.
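A minimal sketch of that step; the dates are the ones from the demo, while the file and parameter names are illustrative:

```bash
# inject an overriding cell after the cell tagged "parameters", then execute
papermill dashboard.ipynb dashboard_2018.ipynb \
  -p start_date 2018-01-01 -p end_date 2019-01-01
jupyter nbconvert --to html dashboard_2018.ipynb   # readable report for sharing
```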

So now that you have a CI system that generates a tested, reproducible, and parameterized notebook, let's automate the generation of the artifact with a continuous deployment system. Let's recap Slava's CI: the user works in his notebook instance and pushes a change to GitHub. This kicks off a Cloud Build, which background-trains the notebook and uploads the results to Cloud Storage. Let's add an additional step: if the tests are successful, Cloud Build also uploads a payload to Cloud Functions. And what this Cloud Functions payload does is perform the same build, just with parameters: it sends a parameterized version to Cloud Build, background-trains that, and sends the results to GCS. Now, let's set up the orchestration.

For this, we're going to use Cloud Pub/Sub, which is a message-passing framework, and Cloud Scheduler, which sends a message on a cron-like schedule. The first time the Cloud Function is deployed, if you set it up correctly, it creates a Pub/Sub topic that it subscribes to. Anytime this topic receives a message, it kicks off a request to Cloud Functions, and the function executes. To publish requests, we'll be using Cloud Scheduler, which sends messages based on time; but this could be any interface. For example, if you have a trigger on your GCS bucket for when new data is uploaded, you can use that. If you create a manual notebook executor, you can send job specs manually that way. Once you have that automation set up, that's it. Anytime the Cloud Scheduler fires, it sends a message to Pub/Sub, Cloud Functions picks up that message and executes a build, and the user just sees the results in Cloud Storage.

So let's showcase that in action. Say you want to generate weekly reports of this station map. Let's run it. We're going to start with the CI again. To recap the steps of the CI: we clone our repo, then we run the tests on the notebook to make sure it works. If the tests pass, we upload two payloads. The first is the notebook plus the execution code; this will be used later by the function. The second is the function code itself, which is deployed as the last step of the CI job.

All right, let's take a look at the function. The function defines its own Cloud Build spec, which takes in a start and an end date, puts them into a parameters file, and then executes the notebook with the given parameters file; pretty simple. We'll jump back to the main function. You'll see that we take in this Cloud Build spec and add an additional parameter, source, which points to the notebook and the execution code. This way, we always have the most up-to-date, tested notebook generated by the CI system. We also take in a date from the request; or, if no date is provided, we use today minus a year, because our dataset is not up to date. And then, since we want to generate weekly reports, we subtract seven days and set that as the start date. Then we execute it. You can also see it's set up to trigger from Cloud Pub/Sub with the topic demo-notebook; anytime a request comes in there, the function executes.

So let's hop over to the scheduling side. We've created a job here; let's inspect it. You'll notice that I wanted it to run every Sunday at midnight, so I've set the cron schedule to that. And then we're sending a request to the Pub/Sub topic demo-notebook with an empty payload, which makes it use the default current date.

All right, so this is one example of an execution. This one executed two Sundays ago at midnight. As you can see, we've substituted in the parameters, so it runs from 3/24 to 3/31, and we ran it successfully with those parameters. So if we go down to the logs, we can see the output in this directory over there. And you'll see that, indeed, we've created a parameterized notebook, and we've also nbconverted out two files. So let's inspect those. The first is, as we've shown before, readable code inline; you've seen this in the last demo. The second is a clean version, which is useful if you're doing something similar to this, using Jupyter notebooks as dashboards. So as you can see, we've updated the date, the top-10 stations list has changed, and our map is updated and fully interactive.
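A hedged sketch of creating that weekly trigger from the command line. The topic name is the one from the demo; the job name is made up:

```bash
gcloud scheduler jobs create pubsub weekly-dashboard-report \
  --schedule="0 0 * * SUN" \
  --topic=demo-notebook \
  --message-body="{}"   # empty payload, so the function falls back to its default date
```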
And that's it. So now you have an automated dashboard generation pipeline from a single notebook. If you're interested in trying the demo out yourself, the code is available at this bit.ly link, notebooks-ci. It includes the full CI/CD pipeline plus that demo notebook.

As a bonus, our partner team has created a cool demo UI for publishing Jupyter notebooks as dashboards, so we'd like to showcase that here. Here, we've ingested those weekly reports into our dashboard UI. As you can see, it's fully interactive as before, same deal: you can scroll through your top-10 stations, and you can interact with your station map as you would with any notebook. You can hop between the codeless version and the coded version, in case you want to quickly see what exactly you ran to get a certain output. Let's hop back to codeless. You can jump between older versions; say we want to go back in time and compare usage a month ago, right? And then you can also download a report from this UI in case you want to play around with it directly. And if you want to see the entire versions list, it's there for you. You can also imagine that later on there'd be functionality to manually submit a message through Pub/Sub and kick off a run, so you have the most recent data. Yup, that's it. So this is still very much in the ideation phase, but we wanted to showcase what you can build end-to-end with a notebook-centric system. And if your use case is similar to this one, Jupyter notebooks as dashboards, we encourage you to reach out to us.

And with that, we've taken you from a single notebook all the way to a production-ready product, using a notebook-centric system and following these best practices. So, to recap: developing notebooks should be no different from developing anything else; use your software engineering best practices. Second, version control your notebooks, and leverage nbdime, a Jupyter plug-in, to make that easier. Make sure your notebooks are reproducible and consistent across environments; this simplifies collaboration and unlocks background execution. Make sure you test continuously; that's pretty self-explanatory. Make your notebooks reusable through parameterization, so they can be used as job specs. And automate your artifact creation, which helps you get your work out faster.

To recap where you can find all of this: the first one is left as an exercise to the listener; you know, follow your best practices. But for the rest, we've integrated the plug-ins shown, JupyterLab Git and nbdime, into AI Platform Notebooks. The other demos are available through the links we mentioned before: the background execution plug-in is available at nova-extension, the CI/CD pipeline plus the demo notebook at notebooks-ci, and the manifesto itself at notebooks-best-practices. And that's it, almost.

[APPLAUSE]

VIACHESLAV KOVALEVSKYI: Wait, wait, wait, wait, almost. There is always an almost. You know, one more thing.

MICHAEL CHENG: So Slava, you've given me a lot to think about. I'm definitely going to try Jupyter notebook-centric development. But I have one more question. You've been focusing a lot on AI Platform Notebooks. How does that differ from Colab?

VIACHESLAV KOVALEVSKYI: That is a really, really great question. First of all, how many of you have heard about Google Colab? Nice, OK. How many of you have heard about AI Platform Notebooks? I mean, all of you; we just spoke about it, right?
[LAUGHTER]

But, OK. So this is the big elephant in the room: how do they compare? And thank you, Mike, for asking this very great question. Let's first focus on Colab. If you've tried Colab, you probably know that it has a really nice time-to-start: it's one click. If you have not tried it, please do. It gives you some free resources, like really free: one free GPU, one free TPU. And it has amazing collaboration features. If you have not tried it, just go try it. There are some cons; nothing comes for free. The cons here: you don't have persistency. The VM behind it that you use from time to time is what gives you the simplicity and fast time-to-access. It's not possible to use more resources: you can't say, "I need eight V100 GPUs for my production-ready environment." Doesn't work. You don't have versioning: you cannot say, "I need my notebook with TensorFlow 1.12." And you don't have the nice version control integration that Mike showcased on AI Platform Notebooks.

And you don't have integration with GCP, so you effectively cannot use the resources inside of your GCP project. On the other hand, comparing with AI Platform Notebooks: the biggest advantage is that it gives you persistency. Your resources, like disks or anything else you attach, stay available to you; they're not going anywhere. We have introduced the nice Git integration that was showcased to you earlier. It has flexible resources: whatever is available in GCP, you can attach. It has nice GCP integration. And it has versioning: you can say that you need a particular version of TensorFlow for your workload. Again, nothing comes for free. The cons: it's not free. Collaboration in the same notebook is effectively not yet supported, because of a limitation of JupyterLab; we're working closely with the community to address it. And time-to-start is higher.

So here are the two solutions. Today, we spoke with you about AI Platform Notebooks, but there is one thought that I want to convey. Think about Colab as a really powerful tool that gives you an initial scratchpad. It's like comparing Notepad with G Suite Docs: if the simpler solution works for your workload, stick with it; you probably don't need anything else. However, if you do need more enterprise readiness, like persistence, more resources, integration with GCP, or a more powerful permission system, then you need AI Platform Notebooks. This is a really good test for yourself: can you use Colab? If yes, just use it; it's an amazing tool. If whatever you're doing is just scratchpadding, it's an amazing tool. And we just covered AI Platform Notebooks. Now, thank you for being here with us.

[APPLAUSE]

[MUSIC PLAYING]