All right. Um, I haven't asked these questions before: who actually knows Hazelcast? Good. Who has used it so far? Awesome. All right then, let's start with distributed computing with Hazelcast. I'm normally walking around, so if I walk away from the microphone, just tell me. I guess you have seen those things a lot of times; it basically tells you: everything I say, don't take it for granted, don't make business decisions on it, blah blah blah. All right, have fun. The slides are uploaded already. So, I'm Chris, just call me Chris, because, as you see, nobody gets the full name right; that's my Twitter handle. I was doing eight-plus years of Java weirdness, performance stuff, GC, traffic, all that kind of thing, and I'm actually an Apache committer. I was working in the gaming industry, you won't see that here, but it was actually Ubisoft, and I was working in travel management at HRS, and that's pretty much it. So our short space trip today will be a quick introduction to Hazelcast, I think we can move over that quite briefly because most people already know it, then distributed computing, what it actually is and where it came from, the distributed executors we offer in Hazelcast, the entry processor, and the biggest section will be the new MapReduce API, with a short break afterwards for questions.

So Hazelcast is like picking the diamonds: it's the good Java API we all know, and we offer it as an in-memory data grid, transparently for the user, distributed around the cluster. We do data partitioning and, as I said, all those Java Collections API implementations and the Java concurrency API, just as a distributed computing platform. So why Hazelcast? Well, it's automatic partitioning; it doesn't give you the teardrop in your eye that you get from sharding, like with a database. It is fault tolerant by default: we normally have at least one backup, and you can define as many backups as you want to. We have asynchronous and synchronous backups, so you can say, for example, that you want one synchronous backup and a lot more asynchronous ones. We are fully distributed, and for sure in-memory, for high speed.
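As an aside, those backup settings are configured per data structure. A minimal hazelcast.xml sketch of what was just described might look like this (the numbers are only an illustration, not a recommendation):

```xml
<hazelcast>
  <map name="default">
    <!-- one synchronous backup (the default) -->
    <backup-count>1</backup-count>
    <!-- plus additional asynchronous backups -->
    <async-backup-count>2</async-backup-count>
  </map>
</hazelcast>
```

Here `backup-count` is the number of synchronous backups and `async-backup-count` the additional asynchronous ones.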
So how does Hazelcast work? You have your different nodes over here; every node owns some partitions and holds backups of other partitions. By default, as I said, there's one replica going on, and you see those are the different partitions. As I said, I guess most of you already know Hazelcast; if not, just come by the booth and we'll give you a bigger introduction.

So why do you want to use distributed computing? Well, we all know scaling up doesn't work anymore, at least not as well as it did in the past. What you normally do now is scale out: you use commodity hardware and spread it all over the place. You spread your data and your computation to those different machines, to calculate in parallel and to do requesting, querying, and calculating in a distributed way. By the way, I have three mugs, so the first three good questions actually get a mug, and I have some more pens, so just feel free to ask and interrupt me.

So why in-memory computing? Well, as I said, the trend is that you get more and more megabytes per dollar. When we started, I think it was 1983, you can't read that, it was like six thousand four hundred dollars per megabyte of RAM, and computers commonly had less than a megabyte of RAM, even less than 640 kilobytes. In 2013 it was about 16 gigabytes per computer on average, and a megabyte costs less than one cent. So it's true: memory gets cheaper and cheaper, and the cheaper memory gets, the more memory you get into your computer, so the more data you can hold in memory. Whoops, wrong one.

So with the higher speed of memory, for sure you can see that in-memory computing is way faster than computation on disk. That's the reason why people move away from Hadoop, where they have those kinds of nightly jobs: calculating things overnight, coming back the next day, and then you get your calculation. If you need some kind of near-real-time or real-time calculation, the way to go is in-memory computing.

So, distributed computing, or as I call it, multi-core on steroids. The basic idea is that you can build your own mini data center at home: you reuse your commodity hardware, like an old server, an old desktop, another old desktop, maybe your PlayStation 3, okay, now it would be a PlayStation 4 maybe, and some old laptops. So the basic idea is: just use standard hardware, don't buy special servers, but have more of those machines. When we started, it was like an 8080: a single CPU, a single core, and it was slow. But actually the idea of multi-core is not that new; even with the 386, people started to have multiple cores in a way, even if it's not really two cores in the same sense, because the floating-point and the non-floating-point operations were totally separated onto different chips. That was the next step: people started to say, okay, we don't want a single core but multi-core, we can do things in parallel; that means we have to work differently, but we can do a lot of things in parallel and speed things up. So they started to put multiple CPUs onto a single board. Eventually it got to those guys; we're at an Oracle conference, Sun is part of Oracle now, so I chose this picture. What they did was move the multiple cores from one board to multiple boards and interconnect them, so they started to build supercomputers. And we all know what supercomputers look like today: it's cloud computing. Everybody can achieve it, and everybody can afford their own supercomputer just by running it on Amazon or the Google Compute Engine.
So this is where we go. The way of calculating things in parallel was quite hard in the past: you had to do your own distribution, the distribution of the processing, the reducing of all the results, and the calculation of the results. Nowadays you can do this pretty quickly and pretty easily, and one of the ways is Hazelcast.

So we have three main ways of doing distributed computing. The first one is the distributed executor service. It's a java.util.concurrent ExecutorService implementation; it executes the normal Callables, the normal Runnables. What is important: the Callable and the Runnable need to be serializable, but they should not work on data. That is important, because if you work on data with the executor service, it means you have to lock the data: there can be multiple tasks processing data in parallel. If you just read, it's okay; if you try to mutate data, you might end up in an unexpected state, because you don't get an implicit lock. So you would either have to lock the data yourself, or you're just reading data and not mutating. So "shouldn't work on data" is not completely right; "should not mutate data" would be the right term. And the nice thing about the executor service in Hazelcast is that you can say: I want to execute it on one machine, on all machines, or maybe just on a few of them. So you can select which machines it should actually work on. A pretty basic example, and as you see, I'm a pretty big fan of lambdas: you just create your Runnable, you ask Hazelcast for the executor service instance by its name, and then you say, okay, please run it on all of the members. You just give it the Runnable, it serializes it, distributes it in the cluster, and executes it. So let's have a quick look at how to do that.

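What the demo is about to build, a lambda that can be serialized and shipped to another node, can be sketched in plain Java without any Hazelcast on the classpath. The class and interface names here are my own, not from the talk's code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableLambda {

    // A Runnable that is also Serializable: a lambda targeting this
    // interface gets serialization support generated by the compiler.
    interface SerializableRunnable extends Runnable, Serializable {}

    static byte[] serialize(Object o) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws Exception {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SerializableRunnable task = () -> System.out.println("hello from node");
        // Round-trip through serialization, as a cluster would do
        // before executing the task on another member.
        Runnable revived = (Runnable) deserialize(serialize(task));
        revived.run(); // prints: hello from node
    }
}
```

The key point, matching the talk: the lambda itself captures no external state, so what travels over the wire is just the recipe for reconstructing it.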
Let me make it bigger so it's actually readable; well, let's make it 20. All right. So what I did, because I like lambdas: I created an interface that is serializable, and that's pretty much it. Serializable just works as: please make this lambda implement Serializable too, so I can send it over the wire, because the normal Runnable won't be serializable otherwise. So we copy the template stuff and start a new class for the executor test, copying in the template. The template does nothing more than help me build the cluster; I'm pinning the network interface to just my computer. Even larger? Okay, let's try to make it even larger. There you go, awesome. So, by default Hazelcast uses multicast to find all the nodes, and if anybody here were running a Hazelcast cluster on their laptop, you would screw up my presentation. So what I do is say: please find only nodes that are on my local machine. All right. I have a wrap method that takes the lambda, just the lambda I create, and puts it into a runnable adapter, so I can send it over as a Runnable. So I say Runnable wrap, and then I'm going to use my lambda. I inject the Hazelcast instance because I want to print something quite nice, like println "hello from node", and then I get the cluster and my local member. What this does is get the Hazelcast instance of the node it's running on; so if I distribute it in a cluster, and I'm running on a three-node cluster, I will get three different Hazelcast instances. That means I just ask my normal Hazelcast instance: please give me an executor service, we call it "default". Wow, that's a nice name; let's call the variable "es". And then I say executeOnAllMembers and put in my Runnable, and that's pretty much it. So I created a lambda, I say, okay, please print out
something, and I put it in there. So if I execute that one... wow, that's even very, very small. Hmm, there we go. There you go: you see an execution on all three different members; we have the members on 5701, 5702, and 5703. Sorry, so the question is: if it only needs to be serializable, do I need to have the actual class on the classpath of the other nodes? Right.

Yeah, so that's true for normal classes: if you have a normal class, you have to have it on the classpath of every node. For lambdas, what is distributed is some kind of recipe: a recipe for how to deserialize the lambda and how to execute it. So as long as you don't capture any external state, it will just be distributed in the cluster. If you only use such runnables, you wouldn't have to roll out anything to the cluster, but if you, for example, create a whole class hierarchy and you want to execute something and change something, you would have to redistribute it. We don't have a remote class loader that retrieves the classes from your client. It could be done, and we thought about it, but we think it is more like a security backdoor, because everybody that can connect to your cluster, no matter whether it's a client or a node, could inject classes, even malicious classes, into your cluster. So we think deployment is the better way to do it. But I think that question definitely deserves a mug.

All right, going back to, where is it, going back to the presentation. The second part is the entry processor, and the entry processor is some kind of lock-free data operation. Lock-free because every entry processor implicitly runs on its own partition thread: if you have a key, for example key one, and you have multiple operations for key one, all of them will be serialized onto the same thread. So you don't get parallel data mutation; that means with an entry processor you can be sure nobody else modifies or removes your data while you're processing something. It prevents external locking, because we get some kind of implicit lock, and it guarantees atomicity: if you do something longer in an entry processor, all of the stuff you do is definitely atomic, as long as you stay in the same partition. What you should remember: don't do long operations. As I said, the operations on the same key are serialized on the same thread, and that is actually true for all
of the partition: you get an implicit lock on the partition, and if you do long operations, you block out every other operation on that partition. So it's some kind of cluster-wide operation. (Question about JSR-107, JCache.) Yes, the JCache implementation has an entry processor; it's not the same one I'm showing here, but I think the spec for the entry processor from JCache is fully implemented, as far as I know since yesterday: yesterday we released Hazelcast version 3.3.1, and that has JCache for the first time. For that, please come by the booth; we have the developer of the JCache implementation at the booth, so you can ask him directly. So yeah, it's some kind of cluster-wide thread-safe operation; that means if you mutate data on the same partition, you will be fully thread safe.

Again, a quick example: we're incrementing a counter. Even here I'm using method references; if anybody doesn't know those yet, just raise a hand. Okay. So I'm creating a method increment; it's given a Map.Entry, because I'm iterating over the entries in the map. What I do is get the old value, increment it by one, and set the value inside of the entry to mark it as dirty over here, so setValue marks the internal map entry as dirty, and I return the value to get it back to my execution. And I'm using a map to increment values, so I can have a map of multiple counters and increment them just by giving a different key and using the same entry processor to execute on them. So I ask the Hazelcast instance for a map, and I say: okay, please execute this entry processor on a given key, and give me back whatever it processed.

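As a plain-Java analogy for this guarantee (nothing Hazelcast-specific is used here): `ConcurrentHashMap.compute` also runs its remapping function atomically per key, so concurrent increments never lose an update, which is the same effect the entry processor gives you cluster-wide. The class name is my own:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PerKeyIncrement {

    // Runs `times` increments of the same key from `threads` threads
    // and returns the final counter value.
    static int increment(int threads, int times) throws InterruptedException {
        ConcurrentHashMap<String, Integer> counters = new ConcurrentHashMap<>();
        counters.put("foo", 0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < times; i++) {
            // compute() is atomic per key: no two mutations of "foo"
            // ever interleave, so no increment is lost.
            pool.submit(() -> counters.compute("foo", (k, v) -> v + 1));
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return counters.get("foo");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(increment(4, 1000)); // prints 1000, never less
    }
}
```

The analogy is per-JVM only, but the mental model (all mutations of one key serialized onto one thread) is the same.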
So again, let's go for a short demonstration. Even here, because our entry processor is not yet fully lambda-compatible, I created a serializable functional interface with just one method. Let's create a new class for the entry processor, and again copy the template code. We want to show that it is atomic and that we don't actually end up with duplicated values from the incrementer. So, first thing: I want to have one key, and I want to have multiple threads; let's create four of them, maybe. And then we create a task, which is actually the Runnable, so I can execute it inside the threads. Oh sure, implements, there you go. And we want the increment method; it returns int, sorry. Now, this will run directly on the nodes; I'm creating four threads and just executing them: entry.getValue() plus one. Okay, what I do here, I did it somewhat differently in the example, just let me have a quick look. Okay, maybe we just walk through that one; I actually forgot the code. Oh okay, I did it that way. So: I have a map "counters", I start with the key "foo", and I increment the value a thousand times. I have the increment method down below here, and I wrap it in the entry processor adapter. So that pretty much is my lambda; it's a method reference, meaning I'm referencing the method down below. I'm waiting for a thousand elements to be incremented, and I'm doing it with four different threads. That's what I wanted to show: even if I have multiple threads mutating the same key, you won't end up with the same value twice. I see some weird faces, okay, not clear yet: I have a thousand operations mutating the same key, so I'm always executing on key "foo", and I'm giving it my entry processor, which is actually the increment function. So I'm executing 1000 operations at the same time, and if
they were not atomic, there would be a chance of getting the same value multiple times. But if I execute

this one... sure, let's remove that one; it's always good to have some backup code. (Question.) Yes, they will wait: the threads will definitely wait at that line until the operations are executed. Let's try to... oh, it's still connecting, yeah, ready, let's make it bigger, there you go. So you see all the values are in order, and you never get the same value for the key two or three times. As I said, the operation is atomic, and it is executed in order. And the thing is, this is a short operation; if you do a long operation, exactly as you asked, yes, the threads will wait and you will slow down the application. So an entry processor can mutate data in a lock-free way, but it shouldn't be a long operation; for a long operation you'd probably go more for things like compare-and-set operations. All right, let's get back to the presentation.

So, the last piece of the presentation, half an hour left, okay: the black magic from Google. We all know that Google said they abandoned MapReduce now, but actually even their streaming API underneath still uses the MapReduce implementation. The typical use cases for MapReduce are things like log analysis, ETL, distributed data querying, distributed sorting; there are some nice sorting algorithms, fuzzy sorting and such. ETL is extracting data from something, transforming it into something else, and loading it back: into another map, into a file, into whatever you want. The simple steps are reading data, mapping, which means eventually transforming it, and reducing the different... (Question.) Okay, so the question is whether we have a persistence layer, in general or especially for MapReduce or the entry processor. The answer is pretty much no, but we have a MapStore interface that you can implement, and you can offer a backend storage to a database, to
Cassandra, to Couchbase, to whatever you want, even to a flat file if you have a distributed file system, for example.

Okay, so the basic steps are reading data, mapping, that is, transforming it, and eventually reducing the intermediate results into your final result. For Hazelcast, it's a bit more than that. First of all, all the phases you probably know from Hadoop run fully in parallel, so they aren't really the separate phases you might know and expect; in general they still exist somehow, they just run in parallel. You're reading data, and the read data is fed into the mapping phase. Eventually you might combine the results: combining means pre-combining before you actually send anything over the wire, so if you have multiple values for the same key, you can pre-combine them before sending them to the reducers. Grouping, or shuffling, doesn't exist one hundred percent in the Hadoop sense; grouping and shuffling mean something like partitioning: you take the key and you do some partitioning to spread it again over the cluster. For Hazelcast this is done in the typical Hazelcast way: we take the hash code of the key, use the modulo operator with the partition count, and then distribute the reduce operations again in the cluster. On the other side, you're reducing the intermediate data, and eventually there is the collating phase, which comes from Infinispan: you can make some final decisions on the reduced data before you actually give it back to the user. For example, say you want to search for the top

10 elements: you do some combining on the top 10 elements per chunk of data, you do the reducing and give back the top 10, and then you might have multiple top-10s; so in the collating phase you can take all the top-10 lists that come from the different reducers and find the real top 10 just before you give it back to the requesting user.

A small workflow example. I have three sentences; I'm not sure you can read that, I don't think so. It says "Saturn is a planet", "Earth is a planet", and "Pluto is not a planet anymore". In the mapping phase you split the sentences up into single words; it's the word count example, pretty much the hello world of the MapReduce world. You split it up into the different words, always emitting the word as the key and one as the value. In this example we don't have a combining phase, so I'm sending every single word over to the reducing phase. You see the grouping is called grouping because all the same words from the different sentences end up in the same reducer: if I take the word "is", you see it's coming from here, from here, and from here, and all of them end up in the same reducer. So in this way I'm sending the same value multiple times if I find the same word multiple times in the sentences. (Question.) No; so the question is whether the reducers run on the same node. No, they don't: mapping and combining run alongside each other, on the same mapper, on the same node, but the reducers are again distributed in the cluster. And that's the reason you have the combiner: if you have the same value multiple times on one node, you can pre-combine it. For the word count example it would be like this: you find the word "planet" multiple times, and instead of sending planet 1, planet 1, planet 1, you say okay, planet 3, and then you just get rid of two-thirds of the traffic. Okay, and in the collating
phase you're collecting the different values, and your final result is just the merged intermediate results from the reducer phase. So in the end it says: I found "Saturn" one time, "is" three times, "a" three times, "planet" three times, "not" and "anymore" both one time; and actually it should say "Earth" and "Pluto" one time as well, but for some reason they got lost. That doesn't actually happen in the real implementation. I used that picture a few times, and a colleague only told me about it, I guess, two weeks ago, so nobody found it for a long time. Absolutely, that's the reason for the small font.

So, in the basic pseudocode, mapping the data means: for every entry, for every key-value pair in the store, you get a call to your map function. In this example we're splitting the document up into the different words, and for every word in the document we emit the word itself and the value one, just as we saw in the workflow example. For combining, in this case I'm using a combiner: as you see in the code, I get the word and a list of integers, and the list of integers would be 1, 1, 1, 1 if I have multiple ones. What I do is emit a new key-value pair, which again is the word, but with the sum of all the different counts. As you might expect, the reducing phase looks quite similar, because combining and reducing are pretty much the same stuff. We again get the word and the list of integers, but this time, if you have a combiner, it might not say 1, 1, 1; it might say 5, 8, 11. And in this case I don't need the word anymore, because it's already grouped by key, so I'm just summing up the different counts and returning the sum. If somebody likes mathematics, have a quick look and tell me if it's wrong; I took it from Wikipedia, so I still hope it's correct, but so far nobody complained about it. And that is actually a picture I took from one of the Google MapReduce presentations, one of the very first ones; it's from about
October 2004, and you see they published the whitepaper somewhere at the beginning of 2003.

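Coming back to the word count pseudocode for a moment: the mapping, grouping, and reducing phases described above can be simulated in plain Java to make the data flow concrete. This is just the flow, not the Hazelcast API; the class name is my own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountFlow {

    static Map<String, Integer> wordCount(List<String> documents) {
        // Map phase: emit one (word, 1) pair per token.
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String doc : documents) {
            for (String word : doc.split(" ")) {
                emitted.add(Map.entry(word, 1));
            }
        }
        // Grouping/shuffling: all pairs with the same key land in the same bucket,
        // just as all identical words end up in the same reducer.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : emitted) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        // Reduce phase: sum the ones per key.
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, ones) ->
                result.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of(
                "saturn is a planet",
                "earth is a planet",
                "pluto is not a planet anymore")));
        // prints: {a=3, anymore=1, earth=1, is=3, not=1, planet=3, pluto=1, saturn=1}
    }
}
```

Note that here, unlike in the slide picture, "earth" and "pluto" correctly show up once each.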
And even by the end of 2004 there were about 1000 applications actually running on MapReduce, so they can't tell me they got rid of all the MapReduce operations yet; maybe in ten years, maybe not. So again, back to a demonstration. Let's close everything and try to build the word count example. All right, so what I have is again a small template: I'm again creating a Hazelcast cluster of three nodes, and I fill a map with data. I actually read in some amazingly important texts; you might guess it's lorem ipsum. If anybody can tell me what it says, I would be totally happy. So I have three files filled with different versions of lorem ipsum, just to count them. Nothing special: I read in every file, put it into the map, and use the file name as the key, so I'm really putting a full document inside a map. Okay, so let's create a new package, we call it wordcount; we're not using the predefined one. (Question about the key.) Okay, the question is whether the hash code is automatically used as the key, right? Not directly: what we do is serialize your key, and we use the hash code of the byte stream, and that is what's used for the partitioning calculation. So: we create a new WordCountMapper, which is actually a Mapper, and we want to go from (String, String), so file name and document, to (String, Integer), a word and a count. Oh, I did it again. All right, so what we now get is the key, let's make it nice, the document, and a Context of String and Integer. I'm amazed. All right, so for every entry in the map, which is now three because I have three files, I get a call to the map function, and I get my document. I'm using a StringTokenizer to split it up, and I add a nice function to actually clean it up,
because the StringTokenizer has a big problem with dots and all those kinds of things: WordCount.cleanWord, awesome. So while the tokenizer has more tokens, I say, okay, please give me the word, which is the next token, and please clean up my word, so I remove spaces and all kinds of punctuation, and then I say context.emit(word, 1). As we saw in the example code, in the pseudocode: we're splitting the document up into the different words, taking each word, and emitting it just as the word and the value one. Next thing: we're skipping the combiner for now, so we can see the different values. WordCountReducerFactory, let me guess, it's implements? All right, in this case it definitely is implements, because for a mapper you only have one mapper per node, but you might end up with multiple reducers per node, because multiple keys can be assigned to the same node for reducing. That means we're actually

creating the reducers on the side of the node. So we have a static class, let's call it MyReducer; I can't type, not "producer". And our reducer is still String, Integer, because we have our word and a number. For some reason I'm not good at typing today. As I said, you might end up with multiple keys on the same node for reducing, so you might end up with more than one reducer: every reducer is bound to exactly one key. Every time you get a new key here, you create a new reducer exactly for that key, because the reducer has to hold intermediate state. So either you would have to hold a map of the different reduced keys inside your reducer, or, the simpler way, you create one reducer per key, and the framework takes care of that for you. And for sure, this time it's extends. The important thing here is volatile. Why? It's not that you want to make the change of the value atomic; rather, reducers are implemented in some kind of actor way, which means they might do some thread hopping: if your reducer runs out of work, it goes to sleep mode, and it might be reactivated later on another thread. If those state changes are close together, it goes to sleep and is reactivated again within the same millisecond, you might end up with the old state and lose data, and you don't want that. By default a reducer runs only on a single thread at a time, but it might end up hopping to another thread over time. So we have, hmm, something's wrong there. Oh yeah, my fault, for sure it's Integer, Integer: we're getting an Integer sum, maybe with a predefined value, and we want to reduce to an Integer; not my best talk today. That's why it is Integer, Integer, not String, Integer. (Question.) Right, you get the key when you create your reducer, so if you want to take care of the key or use it for something, you just put it into your reducer. (Question.) The why
of the volatile? As I said, the reducers run in an actor-like way: they always run on only a single thread, but they might end up hopping to another thread in the same thread pool. Yeah, exactly; it's not about atomicity, it's about memory visibility if you end up on another thread. (Question.) Yes, that is the input to the reduce method, and that is the return value of the reducer; so if you, for example, get in a String and you know it might be an integer, you can transform it to an integer and reduce an integer. Same goes for here: String in. All right, so now we have a String key, and we create a new reducer on every call of newReducer. So let's get back here: what we do is get a JobTracker. The JobTracker is pretty much the entry point into the MapReduce framework. You can configure a JobTracker; it's like a shared resource. The JobTracker configures the thread pool, how many threads there are, and you create new map-reduce jobs via the JobTracker, which actually keeps track of the jobs; that's why it's called that.

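For reference, such a JobTracker section in hazelcast.xml might look roughly like this; I'm reciting the Hazelcast 3.x element names from memory, and the values are only illustrative, not defaults you should rely on:

```xml
<hazelcast>
  <jobtracker name="default">
    <!-- threads available to map-reduce jobs on this tracker -->
    <max-thread-size>5</max-thread-size>
    <!-- elements emitted before a combiner chunk is sent to the reducers -->
    <chunk-size>1000</chunk-size>
    <communicate-stats>true</communicate-stats>
  </jobtracker>
</hazelcast>
```

A tracker configured this way is then obtained in code by the same name, e.g. `hazelcastInstance.getJobTracker("default")`.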
If you want to configure fixed resource management, a fixed resource pool, you end up configuring the JobTracker, and you might even have multiple different ones. In the XML configuration there is a set of predefined values; they might not be the best ones. I think by default it's five threads for the thread pool in the tracker, but you need to think about it: you always have a mapping thread running, you might end up with at least one reducer thread, and if you have multiple jobs, you might end up with all five threads filled up all the time. So this is something you need to find out for yourself. Then we have a KeyValueSource, which you can actually implement on your own: out of any kind of input data, it creates a key-value pair per entry. You might want to implement that yourself; then we wouldn't need to use the map at all, we could read the files directly in the KeyValueSource. But for the sake of simplicity, we created the map with all the different values. So what I do is get the IMap again: IMap of String, String, map equals hazelcastInstance.getMap, and I think it's mapName, there you go. The KeyValueSource has static methods to transform the standard Hazelcast data structures into a KeyValueSource, so we put it in there. Now let's create a new job. The full API, as you see, is completely generic, so if you use the generics, there is pretty much no chance of making a mistake. We create a job: we ask the JobTracker for a new job and give it the input source. In the end, as an ICompletableFuture, we get back a Map of String and Integer, because that is our mapping between words and the counts of the words; let's call it future. And then we get to the real configuration: we say we have a new mapper, which is actually our WordCountMapper, and we have a reducer, which is the new WordCount
ReducerFactory, and then we just submit it to the framework, and we're good to go. So we can execute that one. Let me just add something here, because we want to see the difference between using a combiner and not using one, so I'm just printing out the different values that come into the reducer. Again it's starting up the cluster; oops, wrong one, but again we see the values are not mutated concurrently. Let's execute the correct one. Ten minutes left, okay, good. All right, so we see it actually counted up the words, and here we see that, no matter what, every incoming value was a one; so it really did all these single steps. So let's now create a combiner, a WordCountCombiner.

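Before the live coding, here is a plain-Java sketch of the combiner contract the talk describes: one combine call per emitted value on the mapper's thread, a finalizeChunk that produces the chunk result sent to the reducers, and a reset because combiner instances are reused between chunks. The class name is made up, and it does not extend Hazelcast's actual Combiner base class; it is only meant to mirror the lifecycle.

```java
// Dependency-free sketch of the combiner lifecycle:
//   combine()       is called once per emitted value, on the mapper's thread
//   finalizeChunk() is called when a chunk (by default 1,000 elements) is cut,
//                   and its result is sent over to the reducer
//   reset()         clears the state, because combiner instances are reused
// (The real class would extend com.hazelcast.mapreduce.Combiner and be
// created by a CombinerFactory.)
public class WordCountCombinerSketch {
    private int sum; // no volatile needed: only ever touched by the mapper thread

    public void combine(Integer value) {
        sum += value;
    }

    public Integer finalizeChunk() {
        return sum;
    }

    public void reset() {
        sum = 0;
    }
}
```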
And again it's Integer/Integer, so input value and output value type Integer... it's not my day. In difference to the reducer, here you don't need the volatile keyword, because the combiner always runs on the thread of the mapper. You can add it, but it's not needed. So we have a sum, and again we say plus-equals; let's call that "value", I don't like it. And, in difference to the reducer, the combiner works in chunks: by default, every 1,000 emitted elements a chunk is created and sent over to the reducers. That's what I said: everything works as much as possible in parallel, so we are sending new data over to the reducers all the time to keep them busy. You can configure the chunk size if you want to, though if you set it very low, say to 1 or 2, your combiner has pretty much nothing to do. So we return the sum, and we also reset, because after a chunk is created we want to reset the sum to zero again. The interesting thing about setting up a combiner is that it's just that single line, new WordCountCombiner... oh, for sure, it needs to be a factory, because we implement one; that's definitely not my day. Okay, and again, there we go. And not to forget: normally this is the step I totally forget, to replace the new one, and then you get a nice NullPointerException. But you can show people by that that exceptions are actually propagated back to the caller, which might not be the best way to show it. Okay, so it's starting up again, and hopefully we don't see the 1-1-1 but something else. Done, right? Yeah. So the question is: if something fails, what happens? There are different strategies for handling that. Currently the only implemented strategy is: it stops, and you have to retry. In newer versions there should be automatic retry, and hopefully in 3.4 I'm able to implement snapshotting, so you can snapshot all the time and restart at a certain point. And another way is giving you the option to implement a strategy where you say,
for example, "I don't care, just go ahead": you lose some data, but if that's not important for you, you can just go ahead. So again, we see the elements are counted up. Let's find something that is maybe not zero... whatever, it is zero, I don't know... oh, there we go. Okay, we see multiples of the same word counted up front, before they were actually sent to the reducer. But why were there so many "not zero"s? I don't know... oh, actually, ha, I know it. The reason is, and that is

important: let's make it, we can make it, an Integer. So every value that is returned from a reducer or from a combiner that should be no value at all, guess what it should be? Well, it should be null. If you don't want to return a value from a combiner, because nothing was done, you just return null and it's not sent over. So, am I happy now? It shouldn't send anything if there is no word... Well, the thing is, the key was used previously, so the word was found previously, and the combiner is still available, because I don't recreate the combiners all the time, I'm reusing them. That means the combiner is still there, but it has nothing to do in that chunk. This time, all the "not zero"s are gone. The question is: is there a plan to use the Optional type from Java 8? And I tell you no, because: how many people here in the room are using Java 8? Okay, how many are still on Java 7? Java 6? Turn around and tell me if there is an option. You know, that's the problem: Java 8 is, from my perspective, at least four years ahead, and you can't go for using Java 8 features. That's the reason why we actually have an ICompletableFuture: you have CompletableFuture in Java 8, but you can't use it, because none of your customers is actually on Java 8 yet. All right, last slide. Yeah, probably we'll create an IOptional type, but the Optional type has its own problems, like creating a new object even when you don't need one because your value is null. Although you, as a GC performance specialist, should know that... yeah, that's true, you don't need to create it every time. So, two more questions, let's see. Okay, so the question is: can the combiner and the reducer be different? Because in the example they were very, very similar. Actually, yes, they can. For example, if you do an average calculation, you have an amount and a count:
say I had 15 elements, and the sum of those 15 elements together is 20. What the combiner emits is a tuple of those two values, whereas your reducer just takes those tuples, sums up the sums and the counts of all the different tuples, and divides them, so you get an average value. So yes, it definitely can be different. Okay, the question is: can you have multiple mapping and reducing tasks, one after another? Currently not; that is part of the streaming API that is coming up, or hopefully coming up, and in the end you will be able to do that in an even more functional way, with functional interfaces and those kinds of things. We said a couple more questions, okay. Yes: not yet, but in the end you will be able to. Currently not, but we can talk after that, just come by the booth and we can discuss it. Currently it's not on the roadmap as far as I know, unless you guys want to have it.
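The average example from that last answer can be sketched the same way, again in plain Java with illustrative names rather than Hazelcast's real base classes. The combiner emits a (count, sum) tuple per chunk, or null when it combined nothing (the "return null to send nothing" convention mentioned earlier), and the reducer aggregates the tuples and divides only at the end, so combiner and reducer really do differ.

```java
// Sketch of the average calculation from the Q&A, showing a combiner and
// a reducer that are genuinely different. Names are illustrative, not
// Hazelcast's API.
public class AverageSketch {

    public static class AvgCombiner {
        private long count;
        private long sum;

        public void combine(long value) {
            count++;
            sum += value;
        }

        // Emit a (count, sum) tuple per chunk; return null when nothing
        // was combined, so nothing is sent to the reducer.
        public long[] finalizeChunk() {
            return count == 0 ? null : new long[] { count, sum };
        }

        public void reset() {
            count = 0;
            sum = 0;
        }
    }

    public static class AvgReducer {
        private long count;
        private long sum;

        // Aggregate the (count, sum) tuples from all combiners.
        public void reduce(long[] tuple) {
            if (tuple == null) return;
            count += tuple[0];
            sum += tuple[1];
        }

        // Divide once, at the very end, to get the average.
        public double finalizeReduce() {
            return count == 0 ? 0.0 : (double) sum / count;
        }
    }
}
```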