Today on The Future of Everything: the future of neuromorphic computing. Now, we are all familiar with conventional computers to some degree. You plug them in, they use electricity, they calculate stuff: word processors, Excel, PowerPoint. They show movies and graphics, you can access the internet for social media, you can play games. Everybody with a desktop computer knows it needs to be plugged in and that it can get hot; some estimates are that a single personal computer runs about as hot as a 100-watt light bulb. Everyone with a mobile computer, phone, or tablet knows that the battery is key and that it can get hot while it's operating. Now, what does this have to do with anything? Well, recently artificial intelligence, AI, systems are emerging that can do pretty amazing things. They are often based on a technology called deep neural nets. They can recognize an image, they can translate and understand text, and sometimes their performance starts to be similar to humans, which is why they're called intelligent. But these systems are incredibly expensive, and they're expensive in a specific way: they're expensive in electricity to build. There are huge farms of computers, imagine millions of 100-watt light bulbs, constantly running to build these AI systems. A recent model for understanding human language took an estimated five million dollars to build, all those computers crunching numbers to make the model. And yet the human brain understands language and interprets images all the time, and it does not generate much more heat than a small light bulb. If we're going to continue to create powerful computer capabilities, we need to figure out how computers can run more like the human brain and less like these multi-million-dollar arrays of hot computers. That is the focus of the field of neuromorphic computing: building computers that are more closely modeled on the way the human brain works. It has always been fascinating, but there's now an urgency to it as researchers and companies around the globe demand more and more power for this AI technology. Professor Kwabena Boahen is a professor of bioengineering, electrical engineering, and computer science at Stanford. He has worked on neuromorphic computing models that promise to be much less power hungry than current computers while potentially providing even better and faster computational capabilities. Kwabena, welcome. Is there in fact a crisis in the power and electrical requirements of the computing industry, and how could neuromorphic computing help with that problem? Yeah, there sure is a crisis. We've seen this coming for a decade or two; the performance of your personal computer plateaued way back in 2007. So these upgrades I've been getting aren't that useful? Yeah, exactly. We used to get new servers every three years and so forth; we don't do that anymore, and even your phone now, same story. Anyway, we've seen this coming for decades, because the number of transistors on a chip has been following a very predictable trend called Moore's law, and now transistors are getting so small that it's getting harder to pack them in any tighter. Other than not having a better phone, what kind of crisis does this precipitate? It's exactly what you referred to: training these models. It used to be that compute kept getting cheaper; for the same amount of energy you could do more and more compute.
When you make transistors smaller and pack more of them in, they use less energy, so you get more compute for free. It turns out that, because that has plateaued, you just have to deploy more and more computers in these data centers, or clouds, and they use more and more power. The figure you quoted is basically the market rate for running these models on the cloud: it took 355 GPUs running for a whole year to train this language model you mentioned, GPT-3, and the market rate for that is 4.6 million dollars. And this has been doubling: the amount of compute used to train these models has been doubling every three and a half months. Three and a half months? Yes, that's 10 or 11 times per year. And if I'm not mistaken, that's much faster than Moore's law. Yeah, Moore's law is a doubling every two years, so this is like seven times faster than Moore's law. So next year it's going to cost 46 million dollars to train the state-of-the-art model.
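As a back-of-envelope check on those growth rates, here is a minimal Python sketch. It uses only the figures quoted in the conversation, and the one-year projection is simple compound doubling, not a forecast of actual cloud pricing.

```python
# Back-of-envelope: compute-demand growth vs. Moore's law,
# using only the figures quoted in the conversation.

gpu_years = 355                 # quoted GPU-years to train the language model (GPT-3)
cost_usd = 4.6e6                # quoted market-rate cost of that compute

compute_doubling_months = 3.5   # quoted: training compute doubles every ~3.5 months
moore_doubling_months = 24.0    # Moore's law: transistors double every ~2 years

# Growth factor per year for each trend: 2 ** (12 / doubling period in months)
compute_growth_per_year = 2 ** (12 / compute_doubling_months)   # ~10.8x per year
moore_growth_per_year = 2 ** (12 / moore_doubling_months)       # ~1.4x per year

print(f"compute demand grows ~{compute_growth_per_year:.1f}x per year")
print(f"Moore's law supplies ~{moore_growth_per_year:.1f}x per year")
print(f"doubling-rate ratio : ~{moore_doubling_months / compute_doubling_months:.1f}x faster")

# If cost scales with compute, a year out the bill is roughly the tenfold
# jump quoted above (the exact figure depends on the doubling assumption).
print(f"projected cost a year out: ~${cost_usd * compute_growth_per_year / 1e6:.0f}M")
```

Running it gives roughly an 11x-per-year growth in compute demand against a 1.4x-per-year Moore's law, which is where the "seven times faster" and tens-of-millions-of-dollars figures come from.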

Right. So you take your inspiration, and always have, from the human brain as a computer. Tell me the key insights that your group and people in your field have had about the brain, and how we are doing at translating some of that into silicon technologies. Our thinking about that has evolved over time as neuroscientists have discovered more and more about how the brain works, and about every 10 years we realize that our earlier thinking was very naive. I call it raising your consciousness: we keep raising our consciousness, and our respect for the brain, and obsoleting everything we did before. Which is fascinating, which is really exciting. So I can break it down like this. I think we went from what I'll call synaptocentric; synapses are the connections between the neurons, and these models you mentioned at the start, the deep neural networks, focus on just that aspect of the brain. That's why I say they're synaptocentric. The idea is that each connection has a strength, a setting, and so we build these neural networks, we connect them together, and the parameters are the strengths of these connections. When we are training them, we are tweaking those strengths, so it's totally synaptocentric. That was kind of what we thought was important in the brain, like 50 years ago. When you look at slides of the brain, which I've done, you're impressed with the number of cells, but even more with the number of connections between the cells. Yeah, they are the most numerous elements, so it makes sense. So that's the synaptocentric view, and deep neural networks are still synaptocentric. The next level of understanding came with what I'd call the somatocentric view. Somatocentric? Yeah, the soma is the body of the neuron; that's where the inputs get translated to outputs. And the outputs of neurons are these spikes: they send little pulses of electricity. They're literally electrical spikes? Exactly. In a deep neural network they abstract that away; they just say, oh, the neuron is really excited, which means it's firing a lot. They replace the discrete spikes with a continuous rate. When we go to the somatocentric model, we go back to spikes, and that's something neuromorphic computing has always used, these spikes, and it's fundamentally different from the representation a computer uses. Okay, so that seems key, because we know the computer is using way too much energy. Is it related to the energy? I think 10 years ago we thought that was the key, and we've pushed that for the last 10 years, but now we're seeing its limitations; I'll get into that. But that was the previous thinking. So, you were asking about the energy. The spike is... well, the reason computers are digital is that they use a binary representation, which means they have a high voltage or a low voltage, and those represent two symbols; that's what binary means, there are two symbols. Along with that comes a base-two representation: each place of the numeral doubles in weight. And so this is the representation computers are using.
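Before the unary part below, here is a minimal sketch of the rate-versus-spike distinction just described. It is purely illustrative, not the speaker's model: the "spiking" unit is a generic leaky integrate-and-fire neuron, and the weights, leak, and threshold are made-up parameters.

```python
import numpy as np

# Synaptocentric / rate view: a deep-net unit is a weighted sum of input
# *rates* pushed through a nonlinearity -- the spikes are abstracted away.
def rate_unit(input_rates, weights):
    return max(0.0, float(np.dot(weights, input_rates)))   # ReLU on a firing rate

# Somatocentric / spiking view: a leaky integrate-and-fire unit integrates
# weighted input *spikes* over time and emits a discrete spike at threshold.
def lif_unit(spike_trains, weights, threshold=1.0, leak=0.9):
    v, out_spikes = 0.0, []
    for spikes_t in spike_trains.T:                # step through time
        v = leak * v + float(np.dot(weights, spikes_t))
        if v >= threshold:
            out_spikes.append(1)                   # fire a spike...
            v = 0.0                                # ...and reset the membrane
        else:
            out_spikes.append(0)
    return out_spikes

rng = np.random.default_rng(0)
w = rng.normal(0.3, 0.1, size=5)
spikes = (rng.random((5, 20)) < 0.2).astype(float)  # 5 inputs, 20 time steps
print("rate output :", rate_unit(spikes.mean(axis=1), w))
print("spike output:", lif_unit(spikes, w))
```

The rate unit summarizes the whole spike train as one number; the spiking unit keeps the discrete, timed events, which is what the unary-code discussion below builds on.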
When you go to spikes it's fundamentally different, because there's only one symbol. There's no zero; zero is no spike, nothing is sent. So you are in a unary representation: you only have one symbol, so it's unary, and there's no place value, you're in base one. So how does it transmit information? People's basic understanding is that the nice thing about binary is you're either zero or one, and by giving me a long string of zeros and ones I can turn it into pictures and text and music. What do I do with spikes? What you do with spikes is like roll call in your class, when the students show up in grade school; we don't do that in college. Present, present. Exactly: you go along and everybody says present, present, present, and if somebody is not there they can't say "not present." After you're done, you count the presents you have, and that's how many people were in the room. That's how you represent the information: it's a tally code. It goes back to before al-Khwarizmi; who was the Arab mathematician credited with the concept of zero? Zero is really a very modern concept, like 3,000 years old, and before that there was no concept of zero.

It's a very abstract thing. So can these spike representations capture all the same information that binary can? Exactly, and that goes to the next level of understanding, which is just emerging: how do you most effectively use this unary representation? We've known it's unary for a long time, but we hadn't fundamentally addressed that, and we found that when we use spikes we don't actually code information as efficiently as binary does; that's why deep neural networks are still using this binary representation. Let me try to explain a little. Suppose I have a layer of, say, a thousand neurons, or in base two, 1,024, which is two to the ten, and I allow just one, any one, of those neurons to fire. Then that's one out of two-to-the-ten choices, which means I can represent two-to-the-ten patterns, which means that a single spike carries ten bits of information. Real information, not just binary high and low. Right. So this is how you represent information: activity is very sparse, and because you can choose one out of many possibilities, you have many codes you can use, and therefore just a few spikes can carry as much information. I would need ten binary signals to carry ten bits, but I can just use one spike out of 1,024 neurons. I see, so let me see if I've got this: in the binary system I have to turn on or off all thousand of those, so I'm going to use power on basically all thousand, whereas in unary, spike-based computing you only have to pay for the one you're turning on, perhaps. Exactly. So that's something like one one-thousandth of the power; I'm making that number up. Exactly. If you pack more bits into each signal, you can use fewer signals, and work is equal to force times distance. The things going the longest distance in the brain are these communications, sending the signal from this layer of neurons to the next. Applying the weight and computing something takes very little power; you do that locally, it doesn't travel a long distance. So if you can reduce how many of these signals you're sending over these long distances, you can be much more energy efficient. Great, that makes sense, and I'm with you. You've started to talk about distances, and I know that part of the innovation that's occurred in neuromorphic and other computing is this idea that we don't have to have these flat chips where everything is talking in a flat, two-dimensional world; we're now building real three-dimensional computers. Tell me what that means for this whole enterprise. Exactly, so this is very interesting; this is the key part. Now we are going to a dendrocentric view of computing. Okay, this is the third word: we had synapto, which was the connections; we had somato, which was the neurons themselves doing the integration; and now you're telling me about a third one. Yeah, dendrocentric. We talked about the output end of the neuron, and the input end of the neuron is the dendrite. When these spikes arrive at their destination, the synapse applies a weight, and that determines how strongly they excite or inhibit the neuron that's receiving them.
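To put numbers on the counting argument from a moment ago, here is a minimal Python sketch; the wire counts are relative, and only the ratios are meaningful.

```python
import math

n_neurons = 1024

# One spike from a layer in which exactly one of 1024 neurons fires:
bits_per_spike = math.log2(n_neurons)            # log2(1024) = 10 bits

# Sending 10 bits with a dense binary code means driving 10 wires high or low;
# sending the same 10 bits with the sparse spike code means sending 1 spike.
binary_wires_driven = 10
spikes_sent = 1
print(f"bits carried by one spike         : {bits_per_spike:.0f}")
print(f"long wires driven, binary vs spike: {binary_wires_driven} vs {spikes_sent}")

# If instead every one of the 1024 units reported its state each step (the
# dense picture in the exchange above), you would pay for ~1024 signals
# instead of 1 -- roughly the "one one-thousandth of the power" factor.
print(f"dense layer vs one spike          : {n_neurons} vs {spikes_sent}")
```

Since energy scales with the number of signals sent over long wires, fewer signals per bit is where the savings come from.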
The part of the neuron that receives this inhibition or excitation is the dendrite. Okay, so it's the input part of the cell. Yeah, the input end, exactly. We used to think that dendrites were very boring things that just sum all these inputs together; that's kind of the synaptocentric view. But it turns out, over the last 10 years, we've discovered that dendrites are actually doing amazing things, and what that implies is that these unary codes we talked about can be very sophisticated: dendrites can decode very sophisticated messages, and that allows us to pack even more bits into those unary spikes and save even more power. So, to what you were asking, but let me just finish the dendrocentric part first. The new discovery, which is actually about 10 years old but which we're just starting to translate into these neuromorphic systems we're building, is this. A dendrite is, how do I put this... I told you you've got these 1,024 neurons and you've got one spike happening.

So now let's say we do two spikes instead of just one. We still have 1,024 ways to choose the first spike, and then we have 1,023 neurons left, so we have 1,023 ways to choose the second spike. Because of that, the second spike is also carrying close to 10 bits, because it's one out of 1,023 possibilities. So each spike we add is actually giving us another 10 bits. Yep. And when you do this kind of thing, you are encoding information with sequences: this one goes first, then this one, then that one; or that one goes, then this one, then that one. Those are all different codes, and that keeps adding 10 bits per spike if you do it that way. So now you're coding information with sequences of spikes. And at the dendrite end, if you go one, two, three, like neuron one fires, then neuron two, then neuron three, that's a different code than if neuron three goes, then neuron one, then neuron two; one-two-three is very different from three-one-two. Exactly, and it turns out that dendrites can discriminate this. Same synapses, same weights, but if one comes in at the tip of the dendrite, and then two comes in, and then three, the dendrite actually generates what's called a dendritic spike. It's non-linear, it just kind of turns on, so it is sensitive to that sequence; the sequence can be deciphered by the dendrite, so it knows there's a different signal being sent, and it passes it on. It could be connected to all three, and if three came first, then two, then one, it won't respond. I like to say: if it goes ding, ding, ding, it fires; if the dings arrive in the wrong order, it doesn't. Okay, and I do understand how that sensitivity means, and this is a principle of what you've been telling me, that once again you can pack a lot of information, presumably with low energy, into these communication protocols, because if you are sensitive to the order, then you can get every additional spike from that same population carrying 10 bits. Right, it just expands your code space. So we've been spending a lot of time on, and you've been giving us, this great lesson on theory; how close are we to implementing this stuff in hardware that's not the human brain? Okay, so I think we touched on the 3D a little bit, which was the fact that work goes as force times distance, so if you can shrink distances you can save a lot of power. So it's this combination of making activity sparse and reducing the number of signals you're sending around. And this is basic physics of the discipline. Yeah, this is just the dissipation of energy as you go along a wire; first principles. This is how you figure out new exciting stuff. Unbelievable. And it's great because you can explain it to anybody: we all know a little bit of physics, and this is intuitive. You can think of it like LA: the way we're building chips is sort of like LA, we've been building the sprawl, except the city doesn't get bigger; we actually keep shrinking the rooms and shrinking the roads and squeezing these people in tighter and tighter, and it's all in 2D.
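Going back to the sequence-coding argument above, here is a minimal Python sketch. The counting function is just the argument from the conversation; the "dendrite" below is a toy order detector standing in for the real thing, not a biophysical model.

```python
import math

n = 1024

# Information in an *ordered* sequence of k distinct spikes from n neurons:
# n choices for the first spike, (n - 1) for the second, ... -> log2(n!/(n-k)!).
def sequence_bits(n, k):
    return sum(math.log2(n - i) for i in range(k))

for k in (1, 2, 3):
    b = sequence_bits(n, k)
    print(f"{k} ordered spike(s): {b:.2f} bits (~{b / k:.1f} bits per spike)")

# Toy stand-in for the dendrite: same inputs, same weights, but it only
# "fires" if the spikes arrive in its preferred order 1 -> 2 -> 3.
def dendrite_fires(arrival_order, preferred=(1, 2, 3)):
    return tuple(arrival_order) == preferred

print(dendrite_fires([1, 2, 3]))   # True:  ding, ding, ding -- it fires
print(dendrite_fires([3, 1, 2]))   # False: same spikes, wrong order
```

Each added spike contributes close to 10 more bits as long as the receiver, like the dendrite, can tell the orderings apart.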
And with that 2D squeezing we've gotten to the point where it's like the capsule hotels you have in Tokyo, that's how small the houses are; we keep packing them in, and the people in this case are electrons. We've squished them so tight that we can't make things any smaller, so we have to go to a Manhattan-style architecture, go into the third dimension, and that's the way to keep distances small. All right, well listen, this is The Future of Everything, I'm Russ Altman: more with turning LA into New York and turning flat computers into 3D computers, next on SiriusXM. Okay, believe it or not, that was 19 minutes, and that was great; let me just make a note here, 19 minutes. We're going to have a nine-minute session next, and I just want a road map, because now you've set up a lot of the good basics and I think people are going to want to say, okay, what does this mean for me? I think we'll start, if it's okay, by finishing this 3D idea, so I'll ask you whether 3D is happening in computers and whether it's realizing the kinds of benefits we're expecting, and then what do you think would be the next best topics for the remaining time? Okay, here are some of the things I was going to think about: you talked to Terry about neural and quantum supremacy, which is a pretty cool idea and gets us to a kind of macro discussion, the competitive aspects of this, what's at stake; and then I was going to go a little bit into the partnership of academia and industry.

This is of extreme interest, I would guess, to industry. Is that a good topic, though? Yeah. When you ask about what's happening now, I'm going to get into how 3D is already happening in memory, so you've got these chips in your phone, and that will tie into this industry-academia partnership. Okay. And then, the thing I realized... yeah. Are you in a position to give estimates of the savings? Because I think people would be very interested; I talked about 100 watts and 5 million dollars, and you told me to watch the talk, and I did, and you were down in the hundreds of thousands of dollars. I didn't give it in dollars, I gave it in energy. You gave it in energy, yeah. So I'm going to ask you about where we are now, but if you say that 4.6 million is the energy cost, then what you're saying is we'll get down 40,000-fold, and it'll be like... all right, I definitely want to get to that, because that's really good. Okay, all right, great. So I'm going to do a countdown; let me set this thing to nine minutes, hold on a sec. Nine minutes, good. Five, four, three. Welcome back to The Future of Everything, I'm Russ Altman. I'm speaking with Professor Kwabena Boahen, and we're talking about neuromorphic computing and its promise for bringing computational speed up and computational costs down. At the end of the last section you were telling us about this idea that we can get a lot more real estate on our computer chips by building up, and you used the New York versus LA comparison. So is that a pipe dream, or is that happening? No, it's actually happening. So, the memory industry: by memory industry I mean that people like Intel build processor chips, and people like Samsung and SK Hynix build these memory chips. It used to be that we had hard drives, these magnetic storage devices; we don't have them anymore. Now all your information is stored in a chip; it's just a specialized kind of chip, the same kind of technology, but really optimized for storing bits of information. Back in 2007 the memory industry decided: forget about shrinking any further, it's too expensive, because memory has to be cheap. So they started going 3D, and now they are up to 96 layers, planes of memory, in a single chip. Very similar to the Empire State Building, by the way. Exactly, roughly that number. So that's what they've been doing, and they've shown that it's very cost-effective, which I think is really exciting, and I think computing is going to follow the lead of the memory industry. And if we tie it back to what you started with, this language model called GPT-3, the one that cost five million dollars to train. Yeah, just in the power; that doesn't include the people and everything else. Exactly. The size of that network is measured in the number of parameters, which are the weights on the connections, and it's got 175 billion weights. Now, the memory industry just introduced a one-terabit chip; in fact, if you have an iPhone 11 or 12, you already have these.
With 64 layers of memory in there, you've got a half-terabit chip. And so it sounds like three of these one-terabit chips can store all the parameters of GPT-3, which means you could put this model on a phone. Okay, that's amazing; we think of it as big, but memory is amazing, 96 layers. So now the trend is something called compute-in-memory. Instead of carrying these 175 billion parameters from the memory chips to the processor, crunching on them, and bringing them back, over distances where work goes as force times distance, so you're moving a lot of bits and therefore spending a lot of power, burning a lot of energy, you compute inside the memory chip itself: you basically apply those parameters as weights right there and you get the result. It's called compute-in-memory. This is something that we in neuromorphics have been doing all along, and now it has become the most promising way to save power. So again, memory is leading the way; they've already built the technology, and now we just have to follow.
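As a rough check on that storage claim, here is a minimal Python back-of-envelope. The bits-per-parameter value is an assumption, since the conversation doesn't say at what precision the weights would be stored.

```python
import math

# Rough check: can a few one-terabit memory chips hold GPT-3's parameters?
params = 175e9            # 175 billion weights, as quoted
bits_per_param = 16       # ASSUMPTION: 16-bit (half-precision) weights; the
                          # conversation does not state the stored precision
chip_bits = 1e12          # one terabit per chip

total_bits = params * bits_per_param
chips_needed = math.ceil(total_bits / chip_bits)
print(f"total storage : {total_bits / 1e12:.1f} terabits")
print(f"chips needed  : {chips_needed} one-terabit chips")
# 2.8 terabits -> 3 chips at 16 bits per weight, consistent with the "three
# one-terabit chips" figure above; at 8-bit weights it would fit in 2.
```

Under that assumption the "three chips, so it could live on a phone" claim checks out as an order-of-magnitude statement.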

Okay, so the technologies you were telling us about, all those things about the cell bodies, the synapses, the dendritic models: they now move into this 96-layer chip, and you're doing the compute in the same place where the memory is being stored, and you're saving... what kind of power savings are we talking about here? So I've worked it out. This is... you know, COVID and lockdown have been great for just sitting and thinking. [Laughter] I salute you for your positivity. After a while you start feeling a little isolated, but I'm one of those people whose brain can entertain him for hours. So I've spent a lot of time thinking, and I'm telling you my latest thinking here; I'm really excited about these ideas. My inspiration for this thinking was really that I was looking at the physicists, and they talk about quantum supremacy, and I'm like, wow, these guys have got game; they've got a really nice story. And this is the idea that the people who get control of quantum computing first are going to have a huge advantage in many areas, and I think that's the same for AI? No, quantum supremacy is an idea; a professor at Caltech, John Preskill, coined, or at least popularized, that term, but the idea goes back to Feynman around 1988 or 1990. He basically laid out how you can build a computer, but instead of using classical bits you use quantum bits, and instead of using ANDs and ORs and negation as your primitive computations, you use superposition and entanglement; these are quantum primitives. And he showed that just by coding it that way, with quantum bits and these quantum primitives, it was supreme. Ah, so it wasn't a competition between nations. No, no. It's that you can take something that is exponentially hard for a traditional computer. Say I have q bits; we just talked about bits. In a computer those q bits can represent two-to-the-q different pieces of information, and I need two-to-the-q places to store them all. In a quantum computer, just q quantum bits, qubits, can store all those two-to-the-q patterns superimposed. Gotcha. Now, we only have a couple of minutes, and I want to make sure we get to the supremacy, to the neuromorphic supremacy. Yeah. The thing about supremacy is that as the problem gets harder, the gap between you and the one you're supreme over just expands. Yes, and this is what's important, because as you can see, the whole game with the brain and all this stuff is just building it bigger and bigger, and whoever is on the side that's opening that gap is just going to get further and further ahead. Gotcha. So if you combine this dendrocentric view and these unary codes in 3D, I'm able to show that it's supreme, neurally supreme, compared to these deep neural networks running on GPUs. Gotcha, and it leads to the power advantages. So in terms of numbers, for example: the amount of energy a network of a million neurons is using right now could be enough for a network with 40 billion neurons. Whoa. That same amount of energy could run a network 40,000 times bigger.
To translate it into dollars: you take those five million dollars, divide by something like 40,000, and you're down to maybe a hundred thousand dollars, if that, as your energy bill to train these networks. So that would be a total game changer in the ability of both industry and academics to build these models; they probably wouldn't just build the same models cheaper, they would probably build much bigger models. Yes, but you also have to develop a whole new stack. The point is that we are at the coding-and-primitives stage: we've identified the right codes to use and the right primitives to operate on them, just like the quantum guys, like Feynman, did around 1990, but we don't yet have the equivalent of Shor's factoring algorithm for prime numbers running on this, so now we have to develop that. And that's what you mean when you say the full stack: even when we have the hardware, there's the software, there's the debugging layers, there's the education and the workforce. But this is the vision for neuromorphic supremacy; that's your word for today. Thank you for listening to The Future of Everything. I'm Russ Altman. If you missed any of this episode, listen anytime on demand

with the SiriusXM app.