Good afternoon, and welcome to this discussion of how data science techniques are starting to transform the energy sector. My name is Kyle Bradbury. I’m the managing director of the Energy Data Analytics Lab here at the Duke University Energy Initiative, and special thanks to the Alfred P. Sloan Foundation for making possible this event and the larger PhD Fellows Program that motivated it. So, today what we’re going to do is talk about how, like I said, data science techniques are starting to change how we think about energy systems problems, starting with a short explanation; I’m calling it the world’s shortest explanation of data science. I’ll go on to talk about how energy data themselves are increasing, how we’re getting more and more sources of energy data emerging through sensors and other aspects of the energy system, and then dive into some of the specific data science energy applications, and talk about a number of those that are really changing the face of how we generate, transmit, and consume energy. Also, of course, there are some challenges around data science in this space: security and privacy challenges. We’ll talk a little bit about that, and discuss how to get involved in Duke’s Energy Data Analytics Lab, both for PhD students who may be involved in the PhD Fellows Program and for undergraduates and graduate students interested in this space. So, you know, we have a wealth of emerging energy data resources. But to have the right tools to make use of these data, we need to think about what disciplines are involved and what tools are available. What is this data science thing?
So, one way to think about it is, it’s actually a fusion of a number of disciplines. Right, so probability, statistics, and math: certainly a very important part of it. Computer science and programming, another key component, but also domain expertise. And so if you look at the intersections of each of these, you know, domain expertise and computer science, you think of software development for specific applications. When you think of computer science and probability, stats, and math, well, one of the children of that couple happens to be machine learning. And then for domain expertise and prob, stats, and math, you find a lot of traditional research that fits into that space. And then at the nexus of those is where we see data science fitting. So it’s bringing together a number of disciplines for the purpose of gaining insights from increasing amounts of data. And so, you know, when we talk about data science, machine learning inevitably comes up as one of the key components of it. But there’s a lot of terminology that floats around. So you start off hearing about artificial intelligence, kind of the big umbrella. That’s everything from cybernetics and statistical learning to symbolic AI, which would be creating a lot of very specific rules written by experts, reflecting human thought directly into code. But then there’s a level below that, that gets more specific, honing in on the statistical learning piece, which is machine learning, which can do a number of things. So, we can uncover structure in data; that’s typically unsupervised learning. Make predictions from data.
Now, with predictions, I’m not always talking about the future. It’s a broader topic than that. But making predictions, that’s supervised learning, learning from examples. And then learning by doing: reinforcement learning. If any of you have heard of AlphaGo, or any of the key breakthroughs in, you know, a computer beating a chess master, that’s what you think of with reinforcement learning. Learning to take actions that help you achieve a strategy. Okay. And so then within machine learning, there’s deep learning, which basically is a set of techniques structured in such a way that it allows us to make the greatest use of advances in computation, things like graphics processing units, which allow extreme increases in computational speed, which are required to run the vast processes that train deep learning models. So that’s a little bit of the hierarchy there. But if we have these tools at the ready, we have to think about what sort of data we have in the energy space to work on.

So I mentioned at the very start of the talk, smart meters. As you start to see more and more of these deployed, that’s making available more and more of these large data sets of time series data at the individual building level. And then, of course, we have grid-level data: phasor measurement units, or PMUs, which measure voltage and current throughout the grid. We have smart appliances. So these are things that you might have heard of: a Nest thermostat, or a smart thermostat. There are certainly lighting solutions within this space that have additional sensors in them, and a whole connected-home type of environment that’s not only making your house slightly more comfortable or more energy efficient, but may also be helping to generate data about different types of behavioral patterns in the home that you might be able to utilize to better understand your own energy usage, or that third parties might be able to utilize to help make suggestions for improvements and cost savings. And then, of course, we have power markets producing more and more vast quantities of data, you know, every five minutes with some of the real-time markets that are available, and these can produce a number of different indicators about what’s driving electricity prices, and what’s the status of the capacity that’s available for use in electricity systems. And of course, we have vehicles, right? So electric vehicles and even some conventional vehicles are now generating data. How many of you have driven in a vehicle that has given you fuel economy feedback? Okay, okay. It’s giving behavioral feedback, which certainly we can use as humans, but which may also be able to be analyzed through data science techniques. And lastly, an unexpected source of energy data: satellite imagery. You can actually see different components of the energy system from above.
So we’re going to think a little bit about this and first talk about a roadmap of how we go about making use of these data for specific applications. It requires going through some of the different components of the energy system and talking about key areas that can most benefit from the application of data science techniques. So we’ll start from resources and go through end use. So of course, we have renewable generation and oil and gas systems, and we can certainly use those to make electricity and feed electricity markets. And of course, there’s transmission and distribution to consider there, and collectively, this top piece makes up our electricity system. And then, you know, the electricity can go to buildings, we can have oil and gas potentially going to heat buildings as well, and both oil and gas and electricity may be going to fuel our vehicles and transportation systems. And crossing all these boundaries is system-level assessment and planning: how we can take a bigger-picture look at all of these systems collectively. So we’re going to go through them in this order to talk about specific energy data applications. Starting with renewable generation, there are actually a number of interesting innovations going on in this space. So, first of all, generation prediction and forecasting. Here we’re talking about wind power forecasting and solar power forecasting. Stochastic generators on the grid lead to challenges in maintaining balance between supply and demand. And so improved predictions can significantly increase system reliability, but also reduce system cost by eliminating the need for additional backup.
So there have been a number of companies investing in this space. There’s an interesting anecdote from IBM. So they purchased The Weather Company, and a product called Deep Thunder, which is an interesting deep learning technique that is making hyperlocal weather predictions. So why might you care about that? Well, with hyperlocal weather predictions, we can be predicting specific cloud cover over individual solar panels, right? So this can significantly improve forecasts, and they’re advertising 50 percent improvements in prediction accuracy, so potentially significant gains. But when they announced the purchase of The Weather Company, there was a tongue-in-cheek remark that IBM had gotten confused about what it meant to do computation in the cloud.

So yeah, renewable generation: definitely significant potential for improvement there. And then, of course, you have other areas that can gain from data science and machine learning techniques: optimally siting wind and solar technologies to produce the most power, but also to do so in a way that best leads to reliable integration into the grid. And of course, optimal sizing to do the same. So, these are complex problems with many, many variables, and so making an exhaustive search of the possibilities is usually not feasible. That’s often where different machine learning and data science techniques can really shine. And then there’s another interesting area, materials discovery: finding ways of identifying new materials. So perhaps it’s photovoltaic materials, perhaps it’s energy storage materials, found by, again, looking at many possible combinations of molecular patterns to find new materials that otherwise may take an exceedingly long time to discover through trial-and-error research. So that’s another area where there are some enhancements going on. So, you know, moving on from renewable generation, there’s a lot of interesting stuff going on in oil and gas as well. On the exploration side, there’s analyzing seismic data. For those of you who are not aware, with seismic data, they have these large trucks that are trying to sense whether or not there is oil and gas below the ground. They do that by pounding the ground and waiting for that vibration to travel below the surface and bounce back, right? And by doing that, they can get information about what might potentially be down there. But this is a very complex process and results in these enormous three-dimensional cubes of data. So how do you identify, in a really efficient way, whether or not there is oil and gas? Of course, what’s at stake is a very expensive potential drilling process.
So it requires some careful analysis. Another piece is on the production side. So what can we try to do here? Well, optimize output while minimizing cost and impact. This might be choosing well pressures and flow rates very carefully, so that you can optimize each of these pieces. Of course, we have human expertise that can be put into play here. But are there ways that different types of data science, maybe reinforcement learning, can be used to improve that process and increase the efficiency of these types of systems? So then, when we take our resources, whether renewable or oil and gas, we can talk about electricity markets and how things might play out there. First of all, forecasting, again, is another really important topic here. So, you’re looking at market clearing prices and bids into the market: a generator may choose to bid X dollars per megawatt-hour or per megawatt, depending on whether it’s an energy or a capacity market. This type of information certainly would be of interest to traders, right, who are trading in these energy markets, and to players in the market as well. And the more information we have about these pieces, the greater the potential for increasing, once again, system efficiency. You’re going to hear a number of these terms, things like efficiency and optimization, repeated over and over again, but that’s critical because each of these will lead to some specific outputs. And then demand forecasting. So one question is, how much will the price of electricity be, or the price of oil and gas be? Another piece could be, how much demand will there be? What do I have to prepare for?
The closer we can get the forecasted demand to the actual demand, the greater the efficiency of the system, because I don’t need, again, to schedule lots and lots of reserves on the system. And then another one, which is probably the one least thought of when this slide was initially put up: enabling distributed peer-to-peer transactions. So the seemingly ethereal word “blockchain” gets thrown in here. So what are we talking about here?

So, imagine that you have solar PV on your house, and your neighbor would actually like to purchase your solar PV electricity. Right now there’s not really any way to facilitate that direct transaction. This, essentially, could allow something like that: distributed peer-to-peer transactions that are made secure because of what’s known as a distributed ledger. Blockchain in a nutshell is this idea. Let’s say you were a traditional bank, right? You have your central database. Somebody wants to make a transaction, and they tell you about making this transaction with this person. You register that in your database, and you are the one maintaining it. You have to keep that ultra secure and be very, very careful with all of it, because if it gets out, there’s a lot of potential loss of personally identifiable information. Or people could steal money, obviously. Now, what if this was distributed instead? What does that mean? Instead of just this central bank having all of the information about every transaction and every participant, thousands of nodes around the world have the entire set of transactions, so everybody is collectively recording this. And as you go through that, you can say, okay, if one person tries to fake something on the ledger and that copy is different from the other thousand, you can say, “Oh no, something weird’s going on there. We’re going to trust the masses in this case, because it would be very hard for one hacker to hit all of those different nodes.” So that’s the basic idea.
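The majority check described above can be sketched in a few lines of Python. This is a toy illustration only, not how any production blockchain works: the function names `ledger_digest` and `find_suspect_nodes`, the node names, and the transactions are all hypothetical, and every node is assumed to hold a full copy of the transaction list. Real systems add hash-chained blocks and a consensus protocol on top of this idea.

```python
import hashlib
import json
from collections import Counter


def ledger_digest(transactions):
    """Return a SHA-256 fingerprint of one node's full copy of the ledger."""
    encoded = json.dumps(transactions, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def find_suspect_nodes(node_ledgers):
    """Return the names of nodes whose copy disagrees with the majority copy."""
    digests = {name: ledger_digest(txs) for name, txs in node_ledgers.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, d in digests.items() if d != majority_digest)


# Hypothetical peer-to-peer solar sales, recorded identically on every honest node.
honest = [
    {"seller": "house_a", "buyer": "house_b", "kwh": 3.0},
    {"seller": "house_a", "buyer": "house_c", "kwh": 1.5},
]
node_ledgers = {f"node_{i}": [dict(tx) for tx in honest] for i in range(5)}

# One node tampers with its own copy of the ledger.
node_ledgers["node_3"][0]["kwh"] = 30.0

suspects = find_suspect_nodes(node_ledgers)
print(suspects)
```

The tampered copy produces a different fingerprint from the other four, so it is the one flagged. An attacker would have to alter a majority of the copies at once for the fake entry to win the vote.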
You distribute the accounting, essentially, across many different individuals, and that’s essentially the nature of blockchain. So this could significantly transform how we do energy transactions and enable these smaller-scale, peer-to-peer transactions. So it certainly would be something to look towards in the next few years. So then, moving on from energy markets: okay, we’ve sold things, there needs to be power flowing, and we need to get our energy from the generator to the end user. So what are our challenges or opportunities here? One: detecting and predicting line faults. Was there a failure, or is there going to be a failure? If we can do each of those more quickly, we can prevent outages and save system costs, thereby increasing reliability as well. Also, preventive maintenance: can we anticipate what parts of the system are going to need the most servicing going forward, and perform maintenance proactively, rather than reactively when something fails? And then non-technical loss, or “theft,” detection is another important piece. Can we see if there’s a line on which there’s some power flowing that’s not accounted for anywhere on the system, and is not accounted for by natural losses, where power flowing on the line loses a little bit of energy along the way due to heating and whatnot? Another piece, of course, is anomaly detection, which covers many of these things: can we just figure out if something weird is going on in the system? Again, it gets back to that top point on detecting and predicting faults. Can we detect that beforehand, or at least while it’s going on, before it disrupts the system significantly? Okay, so now we’ve got our power flowing onto the transmission lines from the generators, and we end up at one of our first categories of end users: buildings. Okay, so what can we do with data science for buildings when it comes to energy?
One piece is Internet of Things devices, and how we can gain energy insights from those. So, you can imagine, as we certainly talked about before, smart thermostats and different smart appliances. These appliances could potentially provide information that helps the individual building owner increase the efficiency of their home by allowing these systems to be coordinated, perhaps making use of time-of-use pricing, where the price is lower at some times of the day and higher at other times of the day; identifying inefficient appliances and getting insight into those appliances; and perhaps even acting as a little bit of decentralized demand response. So what’s demand response? Demand response is this: if there’s increased demand on the system and we don’t have enough generation to meet load, I can either increase generation or reduce demand. Both of them put the two back into sync.

But to do that typically requires some sort of demand response aggregator to be actively managing all of that, turning appliances off and on as needed. What if you had a system that was able to tell what the demand was on the larger grid, based on measuring voltage and current, and use that to proactively enable demand response? Certainly a possibility there, but it would take some data science techniques to enable that to happen. Automated demand management is another piece, highly related to what we were just talking about. You know, the idea of peak shifting, from expensive times of day to less expensive times of day. Arbitrage: buying energy when the price is low and selling when it’s high. This could be enabled if you had automated demand management systems. And demand charge reduction. So typically, we’re used to paying in dollars per kilowatt-hour, so a cost per unit of energy. But many customers, especially commercial and industrial customers, pay a dollar-per-kilowatt-hour cost, your energy cost, plus a capacity cost, in dollars per kilowatt. It’s not just the total amount of energy used, but what was the peak amount of demand that you added to the grid at any point in time. If you turned on your pool pump in the middle of the day, and things spiked by an extra 400 watts, then you’re going to pay an added cost based on that addition to your peak demand.
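To make the two-part bill concrete, here is a minimal sketch of the arithmetic, assuming hourly load readings over a single day. The function name `bill_components` and both rates are made-up illustrations and do not come from any actual tariff.

```python
def bill_components(load_kw, hours_per_interval, energy_rate, demand_rate):
    """Split a bill into an energy charge ($/kWh) and a demand charge ($/kW).

    load_kw: average power in each metering interval, in kilowatts.
    hours_per_interval: length of each interval, in hours.
    """
    energy_kwh = sum(load_kw) * hours_per_interval
    peak_kw = max(load_kw)
    energy_cost = energy_kwh * energy_rate
    demand_cost = peak_kw * demand_rate
    return energy_cost, demand_cost


# Hypothetical rates: $0.10 per kWh of energy, $10 per kW of peak demand.
ENERGY_RATE = 0.10
DEMAND_RATE = 10.0

# A flat 2 kW load for 24 hours, and the same day with a 400 W spike at noon.
flat_day = [2.0] * 24
spiky_day = list(flat_day)
spiky_day[12] = 2.4

flat_energy, flat_demand = bill_components(flat_day, 1.0, ENERGY_RATE, DEMAND_RATE)
spiky_energy, spiky_demand = bill_components(spiky_day, 1.0, ENERGY_RATE, DEMAND_RATE)

print(f"energy charge change: ${spiky_energy - flat_energy:.2f}")
print(f"demand charge change: ${spiky_demand - flat_demand:.2f}")
```

Under these assumed rates, the one-hour spike barely moves the energy charge, because it adds only 0.4 kWh, but it raises the demand charge by the full 0.4 kW times the demand rate. That gap is why predicting and shaving the peak is worth the effort.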
And so that’s another piece that can potentially be predicted and mitigated through the use of data science techniques. Automated building energy auditing: so what if we took all of this information that’s coming from our various smart meters and put it through a data science algorithm to break it down into device-level information that corresponds to how much energy my refrigerator is using, my HVAC system is using, my TV is using? That provides insight into how we’re consuming electricity and provides actionable feedback. Another term for that is non-intrusive load monitoring, because you don’t have an active sensor on each device in the home; you’re using the smart meter. Demand-side management aggregation: we talked about that a moment ago, this idea of aggregating all of these demand response units. Another piece is storage aggregation. Certainly, energy storage is something that has come down significantly in cost, but still has a ways to go. It’s still somewhat expensive, but there are a number of entrepreneurial companies looking at how they can install a lot of individual energy storage systems that provide backup and maybe demand charge reduction and energy management, but collectively form a grid resource that they can bid into the market. And so, storage aggregation and optimized operation: these are decisions that can’t easily be made by an operator in a control room when you have thousands of assets all around with very different needs. This is something where you need some sort of artificial intelligence to assist, and companies like Stem in California have been working on this for some time. And the last piece of this section is customer segmentation. So if we have information on smart meter use, then perhaps we can take that information and determine what type of customer is in this pile over here and what type of customer is in that pile over there. Maybe this type of
customer, you know, all these customers have swimming pools and happen to live in affluent areas. Well, maybe it’s useful to know that, to provide energy efficiency rebates that would be best targeted to them, or other types of targeted opportunities. So certainly there are many commercial customers and utilities that would be interested in that sort of technology.

That brings us to one of the other end uses, which is transportation. So transportation, obviously, involves vehicles, and these are very complex systems. Vehicles on roads and highways, and we’re not just talking about cars here; this could be buses, this could be, potentially, airplanes, trains. You know, the old movie, Planes, Trains and Automobiles. Whatever it may be! But because there is so much complexity in these systems, having some amount of infrastructure planning is pretty critical. So being able to eliminate bottlenecks and increase the efficiency there, optimizing traffic signals to make sure that the flow is a little bit smoother, and of course investment decisions in all of the above. Improving engine design in individual cars is another piece that’s quite important. You can imagine there are thousands of possible technologies that could be added in many different combinations, meaning there are millions or billions of possible ways that engines could be designed, right? So, for example, if you wanted to meet the Corporate Average Fuel Economy, or CAFE, standards or emission standards, what might be some of the most cost-effective ways of doing that? Maybe you can use data science techniques to help in the design process without having to build hundreds of prototypes, and actually find designs that will meet those standards in the least-cost way. Of course, if we had an entire fleet of cars available that we could autonomously control, that may allow us to very carefully route them in a way that leads to increased energy efficiency, among other objectives, safety being a major one, of course, right? But you can imagine autonomous vehicle fleets providing that service. In addition, if that fleet happens to be electric vehicles, maybe that’s also a great resource, but it would need management and potentially machine learning techniques to help out there. And the last component of the system: system-level assessment and planning. So here, it’s not a mistake that there’s a satellite image here. You know, there’s certainly a broad scope to a lot of these types of pieces. You can imagine using this to assess installed generation capacity.
Looking at solar photovoltaic arrays that are on rooftops and using the size of them to estimate distributed installed capacity. Maybe it’s finding power plants that have recently been installed, where there aren’t necessarily records available. It can also be the other problem, predicting where future distributed capacity may be installed; that may be of interest, of course, to solar PV installers looking to put it out there, but may also be of significant interest to utility planners and system planners, looking to see where the grid’s going to grow and how systems may need to be upgraded to accommodate that. And lastly, an interesting application: monitoring global oil supply. So there’s a company called Orbital Insight that actually uses images of oil tanks, and you can see the top of the oil tank go down or up, depending on the level of oil that’s contained within it. So by using satellite images that are taken frequently, you can see economic flows happening as the top of that oil tank goes up and down. So with all of this, there’s a really important takeaway here, which is, you’ve heard again and again things like improved forecasts, optimized operation, enhanced planning, and maybe in some cases, feeding into that, expanded system visibility, whether it’s through satellite imagery or other new data sources. One of the outcomes all of these can lead to is increased efficiency, and what does increased efficiency mean? Decreased energy consumption, decreased environmental impacts, and decreased costs. So collectively, even though there were a lot of applications that we’ve just discussed, they’re all going towards these overarching goals. So I think that’s pretty exciting. Of course, though, there are some challenges here. We have a bit of a trade-off. On one hand, we have wanting to maintain privacy and ensure absolute data security.
On the other hand, we have making data available, increasing the availability of data for researchers, policy makers, planners, whoever it may be. On the privacy side, there are a few things that you might be able to learn from some of the data that’s here. From smart meters, things like activities in a building, potentially, though to what level of resolution and accuracy, there’s a big question mark around.

But that might also reveal presence in a home, and some of these things can get at, potentially, some sensitive information, or information that some would view as sensitive. And some of the databases that are involved here, if it’s building data or whatnot, may contain other, more personally identifiable information. So it makes sharing some of that difficult. On the other hand, on the data availability side, not having access to large amounts of data on all topics is reducing innovation, to a certain extent, and inhibiting system understanding and insight. If we couple that with the fact that data are often proprietary or restricted, so you either have to pay for it or you might just not get access to it at all, this is a bit of a limitation. So increasing data availability versus increasing privacy: this is a question that comes up, and we have to be considering it with every application that was mentioned here. What’s more important in each case? So to wrap up this part of the talk, I just want to mention all of this work (there’s a lot of math, and we’re not going to go through all of this). The Energy Data Analytics Lab, which is the organization that is sponsoring this, is in part sponsored by the Duke University Energy Initiative, the Information Initiative at Duke, and the Social Science Research Institute, and is trying to come together to tackle a lot of the problems that we’re talking about here, investing in research to transform our energy applications into those system performance improvements that we talked about earlier.
So there are many of us in this room who are actively engaged in that, and we certainly hope that if there are opportunities for you to be involved, we would love to explore that. In particular, options to get involved: Data+ and Bass Connections. Data+ is a ten-week summer program of intensive exploration of a particular data science question, and this is showing a project from this past summer, where we started looking into how we can use satellite imagery to identify transmission and distribution lines and automatically assess the presence of those lines. And that project will continue as part of Bass Connections. So typically, there are a couple of undergrads per team and a graduate student mentor. So for grad students in the room, if that’s something of interest, there will certainly be an application that opens up. And Bass Connections is a two-semester opportunity during the academic year in which students can, again, engage in a project spanning a number of possible topics. We have one going on now with the Energy Data Analytics Lab that’s following up on the project that I just mentioned. Again, those applications will open up at the beginning of next year, so keep your eyes open for that if you’re interested in being a graduate student mentor in either of those. And for the PhD students in the room, we are gathered together here today because of the Energy Data Analytics PhD Fellows Program, again, with funding provided by the Alfred P. Sloan Foundation, and there are a number of benefits associated with this program. It’s really meant to bring together energy domain expertise and data science tools and research into single projects, and so if this sounds interesting and you are a full-time doctoral student, consider applying this year for that program.