[MUSIC PLAYING]

JESSE ENDAHL: So quick background on Fleetsmith: we automate device setup, intelligence, patching, and security for your company's Apple devices. One of the coolest things we do is make it so that factory-fresh laptops and iPhones and iPads can be drop-shipped directly to remote employees without prior manual setup by IT. As soon as they're unboxed and connect to Wi-Fi, they automatically enroll, and all the apps and settings are installed automatically over the internet, securely. We built our product on GCP, in part, due to Google's high bar for security. We actually roll our own Kubernetes cluster on top of GCE and use HashiCorp Vault for secrets management, PKI, and crypto as a service. If we were going to deploy Kubernetes today, we would use GKE; we're only rolling our own cluster for some legacy reasons.

MAYA KACZOROWSKI: Awesome.

JESSE ENDAHL: So this talk assumes that you have some knowledge of containers and Kubernetes, and focuses on practically protecting your workloads in GKE. Specifically, we're going to dive into what's your responsibility versus what Google handles for you. I will say also, just in general, security in the cloud is a shared responsibility between your cloud provider and you. Google does a lot of things to protect the underlying infrastructure, like encryption at rest by default, and also provides capabilities to you that you can use to protect your workloads, like IAM. As newer infrastructure models emerge, it's not always easy to figure out what you're actually on the hook for. So that's really what we're diving into in this talk.

So let's jump in and review some Kubernetes basics, from a security point of view, to make sure that we're all on the same page. Within a Kubernetes cluster, there are really only three core, fundamental pieces. These are the parts of the diagram that you see in dark gray, and let's walk through them from the bottom up. At the bottom layer, you've got the storage: that's etcd. Think of this like the Kubernetes hard drive. Next up, you've got the master that controls your entire cluster. And then the next layer up, you've got the nodes. Those are the virtual machines that are running your pods, and pods are just collections of one or more containers.

So let's now walk through the damage that can be done at each layer, from the top down. At the top layer, you've got the containers, again, in the pod. If an attacker can control a container, they can abuse compute to do things like mine cryptocurrency, or schedule arbitrary workloads on the node. If they're able to escape the container, they can potentially access customer data. And if there isn't a strong security boundary there, they could potentially escape to the node.

From the node, if an attacker can control a node, that means they can control all the pods running on that node. They can do things like abuse compute at the node layer, kill existing workloads, and run arbitrary workloads. They may also try to attack the master, the next layer up, and do things like denial of service attacks, for example.

So next up, you've got the master itself. That controls your entire cluster: it controls workloads, where they're scheduled, everything. So if an attacker compromises your master, that's pretty bad. They can completely take your environment offline. And short of killing the entire cluster, there's really not much you're going to be able to do to react to that type of attack.

Finally, etcd. Again, like your Kubernetes hard drive, that stores the entire cluster state.
state So it’s kind of like stealing a hard drive They’re going to get all your configs The attacker could create, modify, or destroy your entire cluster, and they may even be able to recreate or model your entire cluster elsewhere MAYA KACZOROWSKI: So let’s look at Google’s shared responsibility model and how that applies to Google Cloud So you might have heard this term, shared responsibility model, before It refers to how, in the cloud, the cloud provider and the user split the responsibility to protect what you’re running So to be successful at this, your cloud provider has to be really clear about which aspects of security they’re responsible for and which aspects of security you’re responsible for Generally, providers are responsible for securing infrastructure, and you’re responsible for securing workloads Let’s break this down for Google Cloud So as you can see in the diagram here, Google Cloud manages the blue bits, whereas the user manages the green bits For infrastructure as a service, on the left, users are responsible for things like network security, the deployment itself, and web app security, whereas Google is responsible for just the core infrastructure For platform as a service, in the middle, Google takes on more of the responsibility, including components like authentication, identity, and operating the service The user is still responsible for deploying and securing their application And on the right, for software as a service, Google is also responsible for these pieces And so you, as the user, are only responsible for protecting your data and content using the appropriate access policies

This breakdown is actually available in a white paper that we published late last year on incident response, and I'll link to that at the end of the talk. But looking at this, where does GKE fall? It's probably closest to the middle. That is, Google manages a lot of the day-to-day running of the service, but you're still responsible for your specific workloads that run on GKE.

So let's quickly recap what actually happens at that bottom layer first, the blue that goes across all of the bottom, and how Google protects those services on Google Cloud. One of the key security differentiators for Google Cloud is our core infrastructure. The infrastructure doesn't rely on any single technology to make it secure. Rather, we build security through progressive layers that deliver defense in depth.

Starting from the bottom up with hardware: Google purpose-builds hardware in our data centers, from chip to network, in order to ensure a secure supply chain.

Service deployment: we have a build, test, and deployment pipeline that ensures that code running on our servers in our data centers is meant to be there, so that any application binary that runs on the infrastructure is purposely deployed. This lets us enforce requirements like ensuring that code is reviewed by a second reviewer, and other such policies and requirements we might have. Once deployed, we don't assume any trust between services; they actually need to authenticate to each other in the environment. We built this infrastructure to be multi-tenant from the get-go.

At the storage layer, data stored in the infrastructure is automatically encrypted at rest and distributed for availability and reliability. We also have strict data access controls and robust logging features at the storage layer, underlying several of the applications that you can use on Google Cloud.

Identity and access: all identities, users, and services are strongly authenticated. This authentication for users, for example, using a hardware second factor, helps protect against phishing attacks.

Secure communication: communications over the internet to our cloud services are encrypted, and communications between services internally are authenticated and encrypted when they cross physical boundaries.

And operational and device security: the scale of our infrastructure allows us to absorb a lot of attacks, including DDoS attacks, and our ops team works 24/7 to respond to threats and incidents.

All of this is explained in detail in another white paper, our "Infrastructure Security Design" white paper, that I'll also link to at the end of the talk. But just note that the way that Google approaches a lot of these things is unique. Anything that runs on top of this stack, including Google's workloads, but also the workloads that our Google Cloud users want to run, benefits from having these underlying protections.

So going to GKE, beyond that base infrastructure, let's look at Kubernetes. If you're running Kubernetes yourself, for example, on GCE, on prem, whatever it happens to be, you're responsible for managing all of it, including the control plane and day-to-day operational work. We often see users start with their own deployment, and then, when it becomes too much to handle, migrate over to GKE. If instead you use GKE, Kubernetes Engine does a lot of the heavy lifting for you. So for Kubernetes Engine, at a high level, the items on the left are what Google is responsible for protecting, in addition to the underlying infrastructure that I mentioned already. First, the Kubernetes distribution.
Kubernetes Engine makes available the latest upstream version of Kubernetes, supporting several minor versions, and providing updates to these is Google's responsibility.

The control plane: in Kubernetes Engine, Google manages the control plane, which includes the master VMs, the API server and other components running on those VMs, and the etcd database. This includes upgrades, including patching, scaling, and repairs, all backed by an SLO.

The operating systems that your nodes run, such as Container-Optimized OS, or COS, and Ubuntu: Kubernetes Engine promptly makes any patches to these images available. Note that this is the system running on your nodes, not the OS running inside your containers. Think about containers as having two OSes when you think about this.

And the last piece that Google manages for you are the Google Cloud integrations, for IAM, Cloud Audit Logging, Cloud Security Command Center, Cloud KMS, Stackdriver, et cetera. These make controls that are available for IaaS workloads across Google Cloud usable directly in GKE.

Conversely, the user is responsible for protecting the pieces on the right. The nodes that run your workloads, including VM images and your configurations: you're responsible for any extra software installed on the nodes or any configuration changes made to the defaults. You're also responsible for keeping your nodes updated. Google provides hardened VM images and configurations by default, manages the containers that are necessary to run GKE, and provides patches for your OS; you're just responsible for upgrading (see the example below). If you use node auto upgrade, it moves that responsibility of upgrading the nodes back to Google. And you're responsible for your workloads, including your application code, Dockerfiles, container images, content, and your running containers and pods. This includes leveraging GKE features and other GCP features to help protect what you're running in containers.

So let's go through a couple of the components here, the control plane, the worker nodes, and then your workloads, to see how we would protect each piece.
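As a rough illustration of that upgrade responsibility, the commands below sketch how you might check which versions your cluster is running and manually upgrade one node pool; the cluster and node pool names are hypothetical, and zone/project flags are omitted for brevity.

```bash
# Hypothetical cluster and node pool names.
# Check which versions the master and the nodes are currently running.
gcloud container clusters describe my-cluster \
  --format="value(currentMasterVersion,currentNodeVersion)"

# Upgrade the nodes in one node pool (by default, to the master's version).
gcloud container clusters upgrade my-cluster --node-pool=default-pool
```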

So the control plane first. Google, as I just mentioned, is responsible for securing the control plane, which is the component of Kubernetes that manages how Kubernetes communicates with the cluster and applies the user's desired state to the cluster. This is Google's responsibility. What does that mean? Well, look at how GKE is architected: in GKE, the hosted control plane components run in a Google-owned GCP project, and they control resources that are in a user's GCP project. Each customer's control plane components run on GCE instances, and these instances are single tenant, meaning that each instance runs the control plane and its components for only one customer.

So what's actually in that control plane? A couple of different pieces. First, the master VMs. This is a set of virtual machines that runs the master components. Like I just mentioned, these are single-tenant GCE VMs in a Google-owned project. Google uses Container-Optimized OS, or COS, for these and manages the health of these VMs, including updating the OS. The API server, scheduler, controller manager, and other controllers run on top of these master VMs. These are the typical functions of your Kubernetes cluster. After all, the API that you're interacting with when you use GKE is the same as a normal Kubernetes API that you get by running it yourself; it just happens to be managed by Google.

The control plane also manages etcd, which is the database behind Kubernetes that maintains state. Just like the rest of GCP, content stored in etcd in GKE is encrypted at the file system layer by default, so you don't have to do anything to get that. Etcd only listens on two TCP ports, one to communicate with the Kubernetes API and the other for server-to-server communication. When transmitting traffic from one etcd server to another, these communications are protected using mutual TLS.

Next piece: the control plane also runs the cluster CA and manages the cluster's root-of-trust key material. Each cluster has its own root certificate authority. The API server and kubelets rely on the Kubernetes cluster root CA for trust. Each cluster runs its own CA, so that if one cluster's CA were compromised, no other cluster's CA would be affected. An internal Google service manages the root keys for the CA, and these are privately held by Google outside of the cluster.

IAM authentication and authorization is handled by the master as well. This works the same way for the Kubernetes API server and etcd as it's done for other Google Cloud services. And lastly, the master includes the audit logging configuration for the master components. This provides a detailed record, available in Stackdriver, of calls that are made to the Kubernetes API server.

That was quite a lot. So the goal here, and throughout Kubernetes Engine, as you'll see, is to give you strong security everywhere, control where you need it, and sane defaults in case you never quite get around to configuring that thing yourself. Keeping the goal of sane defaults in mind, we've been making hardening changes to the control plane over time. So let's see what's actually changed recently, in some of the recent GKE releases.

First, in 1.7, the Kubernetes dashboard in GKE had its highly privileged admin access removed. You might have heard of a couple of public hacks of Kubernetes, in which case you've probably seen the dashboard prominently featured. Tesla, Aviva, and Weight Watchers were all affected in late 2017 and early 2018, where they had their dashboards deployed on the public web, not password protected, and had privileged cloud service account credentials in there.
Attackers found the dashboards, went in, took the credentials, logged into the cloud service, and started cryptocurrency mining. So on GKE, we had actually made this hardening change before these attacks occurred.

In 1.8, GKE enabled role-based access control, RBAC, by default in the v1 API. RBAC is a method of restricting access to resources based on roles and permissions. In Kubernetes, it's implemented to allow you to control access to pods, deployments, config maps, jobs, basically anything you can think of. It's meant to replace the prior authorization method of attribute-based access control, and provides a more intuitive way of handling permissions. By using RBAC, you can more easily follow the principle of least privilege to set rules with minimum permissions to resources inside your cluster (see the example below).

In 1.8.3, GKE enabled Cloud Audit Logging with the default Kubernetes policy, so that both your GCP API and Kubernetes API audit logs are written to Stackdriver for admin activities. This lets you easily go back and see who did what, and when, in your cluster.

In 1.10, in response to the attacks that I mentioned earlier, which were happening in the wild against Tesla and others, we disabled the Kubernetes dashboard by default in GKE. Instead, you can use the Cloud Console to get the same functionality to monitor your clusters.
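The RBAC approach described above can be sketched with a minimal Role and RoleBinding. The namespace, role, and group names here are hypothetical; the point is to grant only read access to pods in a single namespace rather than cluster-wide admin.

```bash
kubectl apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: payments            # hypothetical namespace
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: payments
  name: read-pods
subjects:
- kind: Group
  name: payments-oncall@example.com   # hypothetical group
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
EOF
```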

The next bit requires a little bit of explanation. Kubernetes offers several authentication methods. In GKE, the supported methods are OpenID Connect tokens, X.509 client certificates, and static passwords. GKE manages authentication for gcloud by using the OpenID Connect token method, setting up the Kubernetes configuration for you, getting an access token, and keeping it up to date. The other authentication methods, X.509 certs and static passwords, present a wider surface of attack. Basic auth has never really been a best practice, and X.509 certs in Kubernetes are poorly auditable. So unless your application specifically needs these, you shouldn't be enabling them in your cluster. Starting in 1.12, GKE no longer generates these credentials for you by default when you create a new cluster.

And also in 1.12, GKE removed access to the GCE legacy metadata server APIs for your cluster. Some practical attacks that were happening in the wild against Kubernetes rely on access to the VM's metadata server to extract the node's credentials. Notably, Shopify, last year, had a bug bounty where a researcher accessed this metadata and had broader access to the cluster than expected. So we removed that by default (see the example below for turning these off explicitly).

Note that all of these hardening efforts that I've been talking about change the defaults for new clusters. We can't do this for older clusters, to allow for backwards compatibility. So if you have an older cluster that you've been continually upgrading, please go in and make sure to change some of these defaults yourselves to get the leading-edge security.

Critically, though, Google is also responsible for patching anything running in the control plane. We have a high bar for security, but bugs do occur in open source projects and things that are running in your clusters, so we want to make sure to properly patch these. Google automatically upgrades the master VMs, but you're responsible for upgrading your nodes, and I'll talk more about node auto upgrade in a second.

Since there are so many different pieces here, let's quickly go over what this looks like in practice. It depends where the vulnerability is. If the vulnerability is in the kernel or an operating system, Google will apply the patch to affected components, including obtaining and applying the patch to host images for Kubernetes, COS, and Ubuntu. Spectre, Meltdown, and L1TF are examples of such vulnerabilities. If the vulnerability is in Kubernetes, Google is involved in Kubernetes' product security team and often helps develop and test patches for Kubernetes vulnerabilities. Since GKE is an official distribution, Google receives the patch as part of the private distributors list. If the vuln is in a component used in GKE's default configuration, for example Calico's container network interface, CNI, or in etcd, then in this case Google may receive a patch from upstream Kubernetes, from a partner, or from the distributors list of another open source project. And lastly, if the vuln is in GKE itself, for example if it's discovered through Google's Vulnerability Reward Program, then Google is responsible for developing and applying the fix.

In all of these cases, Google makes the patches available as part of general Kubernetes Engine releases, which are the patch releases and bug fixes, as soon as possible, given the level of risk, embargo time, and any other contextual factors. Where the vulnerability is severe enough or user action is required, we publish a security bulletin detailing this information on our website.
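As a hedged sketch of those hardening defaults applied explicitly to a new cluster: the cluster name is hypothetical, and on recent GKE versions several of these flags are already the default, so passing them mostly serves as documentation of intent.

```bash
# Hypothetical cluster name; zone/project flags omitted for brevity.
gcloud container clusters create hardened-cluster \
  --no-enable-basic-auth \
  --no-issue-client-certificate \
  --metadata disable-legacy-endpoints=true
```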
So that was the control plane. Now let's look at the nodes, the worker nodes. For your nodes, Google does most of the hard work by providing the components that you need, and the patches that you need for those when there's a security issue, but it's still your responsibility to upgrade in order to reap those benefits.

In Kubernetes, your worker nodes are where you run your workloads. Your worker nodes in GKE consist of a few different surfaces that need to be protected, including the node OS, the container runtime, Kubernetes components like the kubelet and kube-proxy, and Google system containers for monitoring and logging. Google provides all of these components and, where necessary, you decide what you want for your nodes. So you get to choose the operating system, between COS and Ubuntu, and you choose the runtime that you'd like. Separate containers run for Kubernetes components, like kube-proxy and kube-dns, and Google-specific add-ons that provide logging, monitoring, and other services. Google's responsible for those containers.

In all of these cases, Google is responsible for patching these components, but you are responsible for deciding when to upgrade to apply those patches. So, for example, a patch to COS, to containerd, or to Kubernetes would all be provided by GKE, but the user has to upgrade to apply them.

Let's talk a little bit more about the OS choice. GKE provides two options for your node OS. Again, this is the OS running on your node, not the OS running inside of your container.
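A minimal sketch of making that OS choice when creating a node pool; the cluster and pool names are hypothetical.

```bash
# --image-type selects the node OS (for example COS or UBUNTU),
# i.e. the OS on the node itself, not the one inside your containers.
gcloud container node-pools create cos-pool \
  --cluster=my-cluster \
  --image-type=COS
```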

You can use Container-Optimized OS, or COS, and Ubuntu. As mentioned earlier, GKE's control plane uses COS, so here you're deciding what you want to run on your worker nodes.

So why should you consider COS? Well, it has a couple of nice security properties. COS is based off of Chromium OS. Google maintains all of these components and is able to rebuild from source if a new vulnerability is discovered and needs to be patched, often allowing us to make a patch available sooner than if we were using a third-party distribution.

COS is minimal and has a smaller attack surface. COS is purpose built to run containers and so has a smaller footprint than other OSes. This means less code, less complexity, and therefore less patching. There's less stuff to patch if you never had the component in the first place.

COS is hardened with good security defaults. I really want to emphasize this; it was very critical for some of the recent vulnerabilities we've had, like runC, which were preventable because of COS' security properties. A firewall restricts all TCP and UDP traffic, except SSH on port 22. LoadPin restricts where kernel modules can be loaded from. A read-only root file system and verified boot make persistence for attackers more difficult. COS also allows security features like seccomp, to further constrain your application to its normal behavior, and eBPF, for third-party security products to monitor.

Lastly, COS offers automatic updates. In GKE, we provide COS updates on a weekly basis as part of the normal release process. Your nodes will automatically download the updates in the background and then upgrade when you reboot.

So we've been talking about nodes, and how your primary responsibility for these nodes, to protect them, is to update them, to make sure you benefit from the hard work that Google does to patch the pieces running on your nodes. What if even that, you could just hand over to Google? That's exactly what node auto upgrade is. It's funny, I think we talk a lot about node auto upgrade as being a reliability feature, but I entirely think about it as being a security feature. Node auto upgrade helps keep your nodes up to date with the latest stable version of Kubernetes. This means you stay up to date with feature changes, but it also means that Google can apply a security patch to your nodes when it's critical. Node auto upgrade is available for COS and also, more recently, for Ubuntu. So you have no reason not to turn it on.

I mentioned that node auto upgrade keeps you on the latest stable patch. How does this work if there's a newly disclosed security vulnerability? Well, a patch with the vulnerability fix is rolled into the first possible release after public disclosure. Often, this is the patch that's released the day of a disclosure, but it then takes multiple days to actually roll out and be available everywhere. Then, several weeks later, once a release is considered stable, it's applied to the whole fleet. For some security patches, where it's particularly critical, this is actually done much sooner. So why would you not turn on auto upgrade?
Well, you might need deeper validation of an update yourself, or want additional control and predictability for your SRE team that's on call. If this is the case, consider turning on auto upgrade with a maintenance window, so that it will only apply during that time period. We also see users turn off node auto upgrade around major launches or events like Black Friday, where changes to their environment are particularly sensitive.

Node auto upgrade works by creating a new node and moving the workload over to that node. Let me quickly explain how surge auto upgrade works, which is a newer development in node auto upgrade. First, we create a new surge VM, and once it's up and running, create the node on that VM. That takes a few minutes. Once the new node is ready, the old node is drained of the workload, and once it's fully drained, the workload resumes on the new node. Draining the old node takes a few seconds. And with the workload up and running on the new node, we can now delete the old VM. We heard talk about live migration in the keynote this morning; this is applying the same concept to the nodes running in your environment. Live migration also underlies all of GKE, so you get that automatically anyways.

And because of the security benefits and ease of maintenance, we're slowly moving to making auto upgrade the default for new clusters in GKE. For clusters created in the UI since about October, auto upgrade is checked by default. For clusters created with gcloud, starting with gcloud beta in 1.13, you no longer have to pass the --enable-autoupgrade flag to turn this on; if you don't want to use it, you can pass the --no-enable-autoupgrade flag. It's not yet the default in the API, though, so be careful: it's still opt-in if you're primarily using the API. In all of these cases, though, you can still opt out by unchecking a box or passing a flag.
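A hedged sketch of the flags just mentioned: enabling node auto upgrade on an existing node pool, constraining upgrades to a daily maintenance window, and opting out at creation time. All names are hypothetical and zone/project flags are omitted.

```bash
# Turn node auto-upgrade on for an existing node pool.
gcloud container node-pools update default-pool \
  --cluster=my-cluster \
  --enable-autoupgrade

# Constrain automated maintenance to a daily window starting at 03:00 UTC.
gcloud container clusters update my-cluster \
  --maintenance-window=03:00

# Opting out at creation time instead:
gcloud container clusters create my-cluster --no-enable-autoupgrade
```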

Great. Let's talk about workload security.

JESSE ENDAHL: So what we've been talking about so far is the underlying infrastructure that runs your workloads. But of course, you still have to worry about the workloads themselves, and application security, and other aspects of your workloads. Protecting that is your responsibility, not Google's.

So when we talk about hardening your workloads, there are really two places you need to look to secure them: your Kubernetes configs that pertain to your workloads and, obviously, the workloads themselves. Last year, Maya and I presented a roadmap for securing your organization on Kubernetes as it matures. Some of those things are handled automatically by GKE, and some aren't. Last year, specifically in terms of Kubernetes configs, we talked about a few of these things, not all of them: things like using a network policy, using namespaces, and setting a pod security policy. And one that we didn't go into in detail last year is protecting secrets, so we'll talk a little bit more about that today. In terms of additional controls you can put in place for your application, you can verify binaries, scan images for known vulnerabilities, and use sandboxing. To learn more about all this stuff, check out that talk from last year; it's called "Kubernetes for Enterprise Security Requirements." And for an up-to-date list of best practices, we recommend checking out the "Hardening Your Cluster's Security" guide.

This year, we wanted to focus on a few of those areas we didn't get to dive into last year, like I mentioned, such as protecting secrets. So what is a secret? A secret can be a lot of different things. Technically, it's any sensitive data that your application needs, either at build time or runtime. Some common examples are TLS private keys, API keys, and username and password combinations to access your database, for example, or external services like your payments processor, such as a Stripe API key. Things that are not considered secrets are your code and general configuration files. So let's say you're using nginx as your web server. The nginx config file is not a secret, but the TLS private key that's referenced by that config is a secret. It's also worth calling out that there is sensitive data, or personally identifiable information, PII. These things are not considered, quote unquote, "secrets," even though they're obviously sensitive data. You still want to protect that sensitive data or PII, but generally you want to do that by encrypting it before it's written to the database, for example, and you wouldn't typically store that type of sensitive data directly inside your secrets management system.

So by default in Kubernetes, secrets are stored in that storage layer, in etcd, but they have no additional protection; they're protected in the same way as the rest of etcd. A pod can access secrets via the file system, as an environment variable, or via an API call. And effectively, all nodes can access all secrets, and users with access to etcd can exfiltrate secrets in plaintext.
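A minimal sketch of that default behavior: creating a secret and mounting it into a pod as a read-only file, rather than baking it into the image or checking it into config. The secret, pod, and file names are hypothetical.

```bash
# Store a TLS private key as a Kubernetes secret.
kubectl create secret generic web-tls-key --from-file=tls.key=./tls.key

# Mount it into a pod as a read-only file under /etc/tls.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: web                      # hypothetical pod
spec:
  containers:
  - name: nginx
    image: nginx:1.15
    volumeMounts:
    - name: tls
      mountPath: /etc/tls
      readOnly: true
  volumes:
  - name: tls
    secret:
      secretName: web-tls-key
EOF
```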
Starting in Kubernetes 1.7, Kubernetes offers application-layer encryption for differential protection of secrets. Using this feature, your secrets are encrypted with a key that's specified in a local encryption config. But this option is actually not recommended, because it doesn't actually improve your security. If you think about your threat model, it's exactly the same threat model as it was before, since the critical data that an attacker would access is being stored in the same place as the key. So it doesn't really help you.

Starting in Kubernetes 1.10, Kubernetes offers something called envelope encryption of secrets using a KMS provider. We're not going to go into a lot of detail here, but a quick summary of how it works is that there are two keys, and then the data that's actually being encrypted. In this case, the data being encrypted, obviously, is the Kubernetes secrets that are stored in etcd. Key number one is called the key encryption key, or KEK. The only purpose of that key is to encrypt the second key. The second key is called the data encryption key, and as you can probably guess, its only purpose is to encrypt the data. For more details on how all this stuff works, check out the talk by Seth Vargo and Alexandr Tcherniakhovski called "Securing Kubernetes Secrets."

MAYA KACZOROWSKI: It's worth pointing out that on Google Cloud, we've built this Kubernetes functionality natively into GKE. We support this with a feature called application-layer secrets encryption for GKE, which is in beta. You can use a key in Cloud KMS to encrypt the data encryption key that encrypts your secrets in GKE. It addresses the encryption requirements that a lot of users have for those secrets, and provides separation of duties between who manages the secrets in your clusters and who manages keys.
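A hedged sketch of wiring this up, assuming a hypothetical project, key ring, and key. The Cloud KMS key acts as the key encryption key for the data encryption keys that protect your Kubernetes secrets; the flag was beta at the time of this talk, and the GKE service agent also needs encrypt/decrypt permission on the key.

```bash
# Create a key ring and key in Cloud KMS (hypothetical names and location).
gcloud kms keyrings create gke-secrets --location=us-central1
gcloud kms keys create etcd-kek \
  --keyring=gke-secrets --location=us-central1 --purpose=encryption

# Create a cluster whose secrets are envelope-encrypted with that key.
gcloud beta container clusters create encrypted-cluster \
  --region=us-central1 \
  --database-encryption-key=projects/my-project/locations/us-central1/keyRings/gke-secrets/cryptoKeys/etcd-kek
```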

JESSE ENDAHL: So at Fleetsmith, we currently use HashiCorp Vault for secrets management. This is because we currently run what I call a hybrid environment, where some of our application runs in VMs and some runs in containers. Because of that, we rely on something called the GCP authentication plugin for HashiCorp Vault to bootstrap secrets onto new instances. If you're also running a hybrid type of cloud infrastructure, where you have both instances or VMs as well as Kubernetes pods, or collections of containers, then Vault is actually a great solution for that. You do have that challenge, though, of how you safely bootstrap those secrets onto new instances and pods, and that's where things really get tricky. To make sure you're safely bootstrapping those secrets, you want to use the GCP auth method for your VMs with Vault, and you want to use the Kubernetes auth method to bootstrap secrets onto your pods with Kubernetes. And of course, regardless, you want to be really careful with your ACLs to make sure that you're scoping the access to your secrets to only the pods or only the VMs that actually need access to those secrets (see the sketch below).

I will say, though, if you're starting fresh today, and you don't have legacy reasons for rolling your own Kubernetes cluster, I highly recommend just going with GKE and using encrypted secrets. If you want to learn more on that topic, check out the talk "Keyless Entry: Securely Accessing GCP Services From Kubernetes." That talk is tomorrow, Wednesday, April 10th, from 2:10 to 3:00 PM.
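A hedged sketch of those two Vault auth methods, with hypothetical role, policy, service account, and namespace names. The idea is simply that VM logins are bound to a GCP service account, and pod logins are bound to a specific Kubernetes namespace and service account, so each workload can only reach the secrets it needs.

```bash
# GCP auth method: GCE VMs log in using their signed instance identity.
vault auth enable gcp
vault write auth/gcp/role/frontend-vms \
    type="iam" \
    policies="frontend-secrets" \
    bound_service_accounts="frontend@my-project.iam.gserviceaccount.com"

# Kubernetes auth method: pods log in with their service account tokens.
vault auth enable kubernetes
vault write auth/kubernetes/config \
    kubernetes_host="https://10.0.0.1:443" \
    kubernetes_ca_cert=@ca.crt \
    token_reviewer_jwt=@reviewer-token.jwt
vault write auth/kubernetes/role/payments-pods \
    bound_service_account_names="payments" \
    bound_service_account_namespaces="payments" \
    policies="payments-secrets"
```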
MAYA KACZOROWSKI: Awesome. So, other ways we can help with your workload security: Google offers a few products and features to make this easier. First, managed base images and distroless images. I was talking earlier a lot about the node image, the node OS; this is about what's running inside your container. You're going to have a base image that you build your container on top of, and you build it into an application image, and that's actually what you deploy. So your container image is built by taking that operating system base image and adding the libraries, the packages, and the binaries that you actually need.

Managed base images are one option. These are images that are patched for security vulnerabilities automatically by Google, with the most recently available patches from upstream. Google maintains several base images for building its own applications, including things like App Engine. Managed base images actually have a couple of nice security properties that make them a desirable choice for your applications. Specifically, they're regularly scanned for known vulnerabilities from the CVE database. This scanning is done using the same functionality as Container Registry vulnerability scanning, if you've seen that before, and where a patch exists, it's applied. They're built reproducibly, so that there is a verifiable path from source code to the binary; that means the image can be verified to ensure that no flaws have been introduced, by comparing it to source. And lastly, they're stored on Google Cloud, so you can pull them directly from your environment without having to traverse networks; you can pull these images using Private Google Access. Managed base images are available for Debian, Ubuntu, and CentOS.

Another option, an alternative to managed base images, are distroless images. These images contain only your application and its runtime dependencies, greatly reducing the potential surface of attack. A package with a newly discovered vulnerability can't affect you if you don't actually have the package in the first place. This is how you become more secure: less stuff. Distroless images remove package managers, shells, and other programs that you might find in a standard Linux distribution, so that you're focusing on what's actually important, which is dealing with the vulnerability scan results that you have, and leaving you with less to maintain. Distroless images are language focused and available for many languages, including Java, Python, Node.js, .NET, and more, as well as a static image for statically compiled languages like Go, Rust, and D.

Both distroless images and managed base images are good choices for your containers; it really depends on what works in your environment. If you need a full Linux distribution, including features like a package manager or a shell, then managed base images are a good choice. If you want the most locked-down option possible, then distroless images might be better for you (see the sketch below).

So now that you have a base image and you've actually built your container image, you have something that you're storing in your image registry that you're going to deploy. For those application images, you'll also want to avoid including any known vulnerabilities. To assess that in Google Container Registry, you can use vulnerability scanning. GCR vulnerability scanning, when enabled, scans all the images in your private registry for known CVEs from the CVE database. This is displayed in your registry with info as to whether a fix is available, the severity of the vulnerability, et cetera, so that you can actually take action. This scan is conducted on a new image when it's added to the registry, as well as, for existing images, when a new vulnerability is discovered and added to the database. GCR vulnerability scanning is available for Debian, Ubuntu, CentOS, and Alpine images.
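Tying those two ideas together, here is a hedged sketch of building an application onto a distroless base and pushing it to GCR, where vulnerability scanning, once the Container Scanning API is enabled, will pick it up. The project, image, and Dockerfile contents are hypothetical.

```bash
# Multi-stage build: compile a Go binary, then copy only the binary onto a
# distroless static base image (no shell, no package manager).
cat > Dockerfile <<'EOF'
FROM golang:1.12 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app .

FROM gcr.io/distroless/static
COPY --from=build /app /app
ENTRYPOINT ["/app"]
EOF

docker build -t gcr.io/my-project/app:v1 .
docker push gcr.io/my-project/app:v1

# Enable GCR vulnerability scanning for the project's private registry.
gcloud services enable containerscanning.googleapis.com
```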

And at the end of the day, you can also use Cloud Security Scanner to scan for common web application vulnerabilities. Like for other services, Security Scanner can be used on public-facing applications hosted on GKE. It scans for cross-site scripting, mixed content, outdated libraries, cleartext passwords, and other common OWASP Top 10-style vulns.

So now we've protected our infrastructure: the control plane, the worker nodes, and your workloads themselves. And unfortunately, in security, bad things still happen. So what do you do when you have to respond to a security incident? Now, it's also worth noting that I'm making a distinction here between incidents and vulnerabilities. A new vulnerability, if it's not currently being exploited in your environment, is not an incident; you should have that handled by your normal vulnerability assessment and response team. Responding to attacks on Google's infrastructure is Google's responsibility, whereas responding to attacks on user workloads is the user's responsibility. Basically, we're each responsible for responding to attacks on the pieces that we protect. That makes sense. So how do you handle incident response?

JESSE ENDAHL: So before we answer that question, we have to define what that actually means, being responsible for incident response on your workloads. Well, it really means, to start, you need to understand the incident response lifecycle. So let's just quickly go over those stages. That means being able to identify when an incident actually occurs; establishing a runbook for what happens once an incident is declared; staffing your own response team, who will actually be following that runbook in the case of an incident; understanding the scope of the incident by conducting forensics; and finally, notifying any affected users, if that's applicable, and developing and executing on a remediation plan.

So if you're responding to an incident and you're on Google Cloud, there are a few tools that you can leverage with GKE to help conduct your own incident response. First up is absolutely critical: you need to make sure you have a centralized log aggregation system set up. On Google Cloud, Stackdriver can help with this. It comes prepackaged with common queries that are helpful for addressing incidents, specifically ones that occur on GKE. So, for example, in the case of a cluster compromise, you might want to quickly check what resources were accessed by the attacker, and you can leverage those types of built-in queries instead of spending valuable minutes on writing those queries from scratch yourself (a sketch of such a query follows below). And next, Stackdriver Incident Response and Management is a new product from Google that's in alpha. It's a set of new tooling that they're specifically developing to help you reduce time to mitigation in an incident response scenario on Google Cloud.

If you're interested in more material on incident response, including how Google themselves uses a really awesome open source tool called GRR internally, I highly recommend a talk from last year's Google Cloud Next called "Cloud Forensics 101." And if you're a startup that's just getting started with incident response, there's a really great starting point in the form of a blog post by someone named Ryan McGeehan. It's titled "An Incident Response Plan for Startups," and you can find the link at the end of this talk. Finally, I do want to call out, Ryan's not a Googler. He's formerly a director of incident response at Facebook, as well as director of security at Coinbase, and he currently has his own consulting firm called R10N Security.
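As a hedged sketch of the kind of query you might run during an investigation: pull recent GKE admin-activity audit log entries made by a principal you suspect was compromised. The service account email is hypothetical; the filter fields are standard Cloud Audit Logging fields for GKE clusters.

```bash
gcloud logging read '
  resource.type="k8s_cluster" AND
  logName:"cloudaudit.googleapis.com" AND
  protoPayload.authenticationInfo.principalEmail="build-bot@my-project.iam.gserviceaccount.com"
' --limit=50 --format=json
```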
MAYA KACZOROWSKI: So if you are responding to an incident on GCP, then you probably want to look at the Cloud Security Command Center first. Cloud Security Command Center is the single pane of glass for security events in your environment. For scans that you run in your environment, you can see the results that apply to your GKE clusters there. As you're looking to further protect yourself from potential container-specific attacks, Google Cloud already has a range of container security partners integrated with Cloud SCC. These look for attacks against your container workloads and then alert you directly, natively, in Cloud SCC. They include Aqua Security, Capsule8, StackRox, Sysdig Secure, and Twistlock, and we're always looking to add more to our partner offering.

So we've talked a lot about a handful of security features that you can leverage in Google Kubernetes Engine and Google Cloud, but there's actually lots more. GKE also offers features like private clusters and pod security policy to help you protect your workloads. Another one that I didn't talk about, which I want to quickly emphasize, is Binary Authorization, which is inspired by what Google does internally to verify code before it's deployed. You can actually use that to enforce what gets put into your environment based on an attestation, for example, to enforce that code must come out of your build pipeline, or that it must be scanned by your vulnerability scanner (see the sketch below). Using these pieces together leads to a stronger security posture, because after all, strong security is really about defense in depth.
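A hedged sketch of Binary Authorization as just described, with a hypothetical cluster and attestor; the flags and policy fields reflect the beta commands available around the time of this talk.

```bash
# Create a cluster with Binary Authorization enforcement enabled.
gcloud beta container clusters create attested-cluster --enable-binauthz

# A minimal policy: allow Google system images, and otherwise require an
# attestation from a hypothetical "built-by-ci" attestor before deployment.
cat > policy.yaml <<'EOF'
globalPolicyEvaluationMode: ENABLE
defaultAdmissionRule:
  evaluationMode: REQUIRE_ATTESTATION
  enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
  requireAttestationsBy:
  - projects/my-project/attestors/built-by-ci
EOF

gcloud beta container binauthz policy import policy.yaml
```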

So putting it all together, let's answer that initial question of who protects what. In GKE, Google is responsible for protecting the control plane, which includes your master VMs, etcd, and controllers, and you're responsible for protecting your worker nodes, including deploying patches to the OS, runtime, and Kubernetes components, and, of course, securing your own workloads.

So what could you do today to start securing Kubernetes on GKE? We'd recommend that you stay at the forefront of security by using node auto upgrade; protect your workload from common image and application vulnerabilities using managed base images or distroless images and GCR vulnerability scanning; and follow the "Kubernetes Engine Hardening Guide" to get on the latest set of configurations that we'd recommend. And when you have a bit more time, check out some of the additional content that we covered and the additional documents we link to.

So here is a set of links to learn more. The first one is a blog post that covers the content of today's talk. There's a landing page on container security, a link to an overview on security, the hardening guide, and the security bulletins. That's what I mentioned earlier; when there's a new vulnerability disclosed, that's where we post information. It has an RSS feed, and you can use that to post directly to your team's Slack channel, for example. And then the last two links there are the incident response white paper and our infrastructure security design white paper.

And if you're at Next the rest of this week, check out some of the other talks on container security going on. We're in the upper left here, but there's a talk later today on secrets and on networking, then tomorrow on securely accessing GCP services from Kubernetes, and Thursday on software supply chain, policy management, and gVisor. That's it for us. Thank you so much for your time.

[MUSIC PLAYING]