Okay, so having opened that database, do you see something like this? Okay, great. So let’s first explore a little bit what we’re dealing with here. This is the element of Tableau that allows you to connect to various data sources. Generally speaking, within a given Tableau workbook you will often connect to a single data source (a single database, excuse me), but you’ll be connecting for different types of analyses to different sub-tables within this database. Let’s go take a look at what those sub-tables look like, so that we can relate them to what we saw this morning with Kevin. I’d like you to go over here to... actually, let’s go first to Battery. Click on that little table icon that’s there. Let me show you how I did that again: when I hover over this, a small table icon appears. I’d like you to click on that, and it will load a certain number of rows, a small subset of that data, so that I can browse what’s there. And what do we see that lies before us? Can anyone narrate? What’s the first column; what do you think that represents? Yeah, a unique identifier of a given participant. There’s actually a little bit of a distinction I make: I’m assuming here that a participant is the same as a phone. Strictly, it identifies a particular phone, but for the most part that will correspond to a particular participant. You’ll notice that there’s an indication (this is the battery table, after all) of whether it’s plugged in or not: some are not plugged in, some are plugged into AC, and some will be plugged into USB. Then there’s the status, whether it’s discharging or charging; the current battery level; and whether it’s concealed, that is, whether the snooze button has been pressed. If you drag this across, you’ll see a number of other columns that I won’t go into but which Kevin briefly mentioned this morning. So Tableau provides you a way of browsing the data on a per-table basis and getting some feel for
what’s there. For sure, the tables in general will differ radically from each other. Let’s take a look, for example, at this Survey table. This is quite different from many of the others: it is not based on sensor data but instead on responses to surveys. Once again we see a certain amount of commonality, a certain orderliness to what we’re seeing: once again a creator ID and a timestamp. Remember, all the data collected here is timestamped, and it records who is sending it. But there are some differences. One column indicates whether the survey timed out or not. Remember that notion of a survey that will time out if you don’t answer it within a certain time; it may be set to sort of self-destruct after a certain amount of time. There’s also an is-concealed indicator, and there’s the answer this person gave as to whether they have smoked in the past half hour. This was a single-item, single-question questionnaire. You’ll note that there are actually several types of data within this table. Some of them are numbers: 0, and if you scroll down further you will in fact find 1 scattered among them. But then there’s the value null, which basically indicates that no meaningful data was collected; it’s almost like an NA, not applicable. Why not? Because it timed out, so there’s no answer, and it’s not meaningful to think of it as 0 or 1. You’ll notice that some of the timeouts also have nulls associated with them, which I believe are indicative of a case where the phone was actually not in a state to record data at that moment. I’d be interested to know what that is, but in any case you wouldn’t want to look at those that are null either. Now these ones, you know, the “no”s: that means it did not time out. Excuse me, well, nulls there mean it did not time out, which is a very
poor label, but that’s fine: this means it did not time out, and therefore there was data. Okay, so that is a survey table. Let’s take a
look at another one: Bluetooth. This was a study where no Bluetooth data was collected, so there’s actually no data here. If it did collect it, it would record basically the IDs of other phones that were detected at a certain moment in time, and whether or not the data recording was concealed. The concealed column indicates whether data is being actively suppressed by the user. If the user snoozes the phone, we want to know that the phone was capable of collecting data at this time and that it’s simply being censored by user request. That’s a useful thing to know, to understand that the phone didn’t simply run out of power or become unavailable and offline. Yeah, that’s hideous, and that should be banned. No, that’s horrible; it should be a Boolean. It’s trivial for any software engineer, and it rankles me. Oh man. Anyway. That’s like using integers for true and false? Well, no, most emphatically not; this would be a type system that’s well defined. In any case, let’s look at the accelerometer table. Here are the accelerometer readings. Here we have creator IDs, timestamps, and is-concealed, but we also have x, y, and z readings. As Kevin noted, and as Muhammad noted briefly, these are taken in the context of a three-axis accelerometer. Generally speaking they will be combined in order to process the data, but sometimes you may want to look at each axis independently for certain types of analysis. So this is just a glimpse at some of the data that’s collected here, and you’ll find that there are many other tables, some of which are derived from these and which we won’t be going into. My main focus today is to introduce you to some uses of Tableau for analyzing these sources of data, so that you get some feel for exploratory data analysis using this tool. So let’s talk a little bit about the
conceptual underpinnings for the sorts of analyses that will be undertaken here. Tableau is interactive data-visualization and exploration software. It connects to diverse data sources, which include traditional relational databases (names you may have heard, such as MySQL, Oracle, DB2, MS SQL Server, etc.). It can also connect to text files, spreadsheets like Excel, and NoSQL sources; NoSQL is a growing trend within the big-data area, used to handle data that doesn’t fit neatly, or with adequate performance, into a relational database context. The outputs from Tableau that we’ll be exploring will include both graphs and tables that we may wish to output in textual fashion. The conceptual model that assists you with Tableau is one of cross-tabulation, which I think most of the people in the room with a health-science background will be comfortable with. If I said that we’re working with a crosstab, would that make sense to most people? Have you seen that before? It’s a summary breakdown of, say, a population into different age and sex groups. You might have a total population of a thousand people; there’ll be males and females, and we’ll have different age groups here, broken into age categories, and
this might total up the number of males who are of this age group, say 0 to 4, and this would be the number of females of that age group; this would be 5 through 9. You’re familiar with this sort of construct. They’re also called contingency tables, and they are very popular within epi and in stats packages for slicing and dicing data from a given group. The basic mode of operation that we’ll be exploring is taking data from some table and aggregating it up, broken down into categories in a multi-dimensional fashion; typically we’ll be dealing with one or two dimensions. So we may break it down by male and female only, aggregating up across different ages; or we may break it down into age groups only, aggregating up across male and female; or we may break it down by both. And the data being broken down is not always simply counts. It could be the count of people in each of these groups, but we might use a crosstab to summarize, for example, the prevalence of pertussis in these different younger age categories; that will be a familiar use in epi. We can do a similar thing here: we will have data that we wish to summarize across different breakdowns of the population, where that summarization metric may be to take the average, to total it up, or to take a standard deviation. So the process we’ll be going through is: selecting the data to be aggregated, which might be counts, prevalences of pertussis, or data on income levels for different ages; selecting the dimensions by which to break it down; selecting the chart type to use; and then perhaps putting in place filtering or pages to see different subsets of the data. We may also want visually distinctive indicators for certain types of information. It sounds fancy, but we’re going to go through some exercises where you get a
concrete feel for what I mean by this. So we’re going to be summarizing, for example, accelerometry data; we’ll break it down according to people, or according to their demographics; we’ll select a chart type that provides a bar graph; and then we will limit it to people who are over a certain age, something like that. So I want to introduce you to the basic user interface for doing this. The first step is selecting the data to be aggregated. We will be using data from the database, selecting for example data on location, and here we will have data items related to latitude, longitude, whether the phone is concealed, the accuracy in meters, and the creator ID. We will be choosing from these data items which data we wish to display. So we may be seeking to summarize information about the accuracy, for example, or about the number of records of a sort. In the next step we’ll be selecting the breakdown that will occur. Not over here in the lower left: we’ll instead be dragging elements from here to this place in the upper right, for the Columns and the Rows. Those will determine the columns and the rows associated with the structure that we’re defining. Then we will select the chart type to impose: whether we wish to render this data as a table with text, as a bar chart, as a scatterplot, as a sort of Gantt chart, or as a map, for example. So we’re going to be rendering the data that we’ve broken down according to
certain possible representations, and that can be chosen over here in this Marks area, where you choose which type. It can also be assisted with this bar here, the one on the right, if you’re in automatic mode. So sometimes we’ll render the data as a pie chart, other times as a line graph, other times as a bar chart, and we’ll be using these widgets again and again. Another thing you can do is choose visual distinctions that you wish to impose on the data points: you may wish to distinguish them via color, via size, or via some label that indicates a particular type of information. You choose that in this Marks area, not with this drop-down, which sets the type of graph, but by dropping elements onto each of these items. So we might, for example, drop the creator ID onto Color so that we have visually distinct colors for different creator IDs, or we may drag the plugged-in description onto Color to have different colors indicate whether the device is plugged in or not. Another element that we may use is a filter. The filter will restrict the display to certain subsets of the data: we may, for example, wish to restrict the display of this crosstab here to children up to and including 19 years old, so that we don’t have other rows. And so it is here: in the Filters area you can drag an item, maybe the creator ID or maybe the timestamp, and then choose to only include certain subsets. For the creator ID, you could select which phones you want to see; for the timestamp, you may say, “I only want to see data after a certain date,” and it will only show data after that date. Finally, we can select pages as well. I guess they don’t show it here, but up at the top we can basically set it so that the graph I see right now is for one subset
of the population, but I can successively go through subsets of that population, displaying each of them in turn. So maybe I want to see the data just for the 0-to-4-year-olds, male and female, and then 5 to 9 on a separate graph, and then 10 to 14 on another, and so on, and we could do that through Pages. This is just orienting you; you’re going to be coming back again and again to these elements associated with how we compose things: filtering, indicating with colors and other visual distinctions, indicating the type of graph, the breakdown into rows and columns, and the data to be displayed. These elements on the left here will be an area to which we go back and forth, because it is from this data area that we’ll be dragging things to Color, dragging things to Columns and Rows, and dragging things to indicate what to display. So this is going to provide us with the palette of data items that we’ll be displaying and by which we’ll be breaking things down. That is a brief glimpse of some of the elements we’ll be using. Any questions on this before diving in? Want to go? What do you think? Okay, so let’s go through a bunch of exercises with this. For the first exercise, ladies and gentlemen, I would like you to choose, over here on the left-hand side, this Battery area. There’s going to be a battery data source. Remember, we’re not yet at the point where we can play with all those widgets; first we have to define a data source that will depend on Battery, and with that data source we can then start to drag those things over and break the data down in different ways. There may be many sheets where we create different visuals that all depend on Battery. So we’re going to take Battery here and drag it up to this table area. So Battery is going to be dragged up to the table area, and how did I do that?
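Outside Tableau, the idea that one data source feeds several sheets can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python’s built-in sqlite3 module with an in-memory database, and the table layout (a hypothetical battery table with creator_id, ts, plugged, and level columns) and the three rows are invented for the example, not taken from the study database. Each function plays the role of one sheet: the first is Number of Records with no breakdown, the second is the same count broken down by creator ID on Rows.

```python
import sqlite3

# A toy, in-memory stand-in for the study database. The table and column
# names (battery, creator_id, ts, plugged, level) and the three rows are
# invented for illustration; they are NOT the real schema or data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE battery (creator_id TEXT, ts TEXT, plugged TEXT, level REAL)"
)
conn.executemany(
    "INSERT INTO battery VALUES (?, ?, ?, ?)",
    [
        ("phone_a", "2015-03-16 09:00:00", "AC", 0.80),
        ("phone_a", "2015-03-16 09:05:00", "AC", 0.82),
        ("phone_b", "2015-03-16 09:00:00", "none", 0.40),
    ],
)


def total_records(conn):
    """Like dropping Number of Records on a sheet with no breakdown."""
    return conn.execute("SELECT COUNT(*) FROM battery").fetchone()[0]


def records_per_phone(conn):
    """Like adding Creator ID to Rows: the same count, one row per phone."""
    rows = conn.execute(
        "SELECT creator_id, COUNT(*) FROM battery "
        "GROUP BY creator_id ORDER BY creator_id"
    ).fetchall()
    return dict(rows)


print(total_records(conn))      # 3
print(records_per_phone(conn))  # {'phone_a': 2, 'phone_b': 1}
```

Both functions query the same connection, much as several sheets in a workbook can share one data source. Tableau generates its own queries behind the scenes, so this shows the idea of aggregating and breaking down, not what Tableau actually runs.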

I clicked on Battery, I dragged it over, and I released. So let me do that again: I will go down to this table area, click on this, drag it over, and release it. Get ready, because good things are going to come thick and fast; prepare for deployment. So are people in sync with me here? Are you in sync? Okay, great. Ladies and gentlemen, what I’d like you to do now is go down here, where it should say something like Sheet 1. Are you ready with that? Let’s click on Sheet 1. We’ll actually double-click here and give it a name: we will call it Battery Record Count. So type “Battery Record Count” here, just like that. Do you see that? Are you folks okay? No one needs TA help yet? So how did I get this? I double-clicked here and I entered it. Are you comfortable with that? Okay. Having done that, ladies and gentlemen, you will see up here on the upper left a Battery label. Where did that come from? From whence came it; who created that? Sorry? It was in the database? In fact, that’s what we just created. You created that, and you should be proud of it. The creator of that lies not in the stars but in yourself. It was called Battery by virtue of the fact that we dragged Battery over, and we could have named it something different if that were suitable. Does everyone see that there? Who needs TA help? Who would like a TA to sit next to them? Not yet? Great. So this is going to serve as our data source for this particular sheet, and it happens to be the only one, so it’s selected here. Now, ladies and gentlemen, remember I said that we’re going to specify what to aggregate. What we’re aggregating over here is Number of Records, so I’d like you to take Number of Records and drag it to this area. So
you’re going to click on it and drag it there. Are you comfortable with that? So I’m clicking and dragging, and what appears there? The number of records in the Battery table. This is a small study, small but expertly managed by Noches, and this is the number of battery records shown: that’s the total number of battery records in our database for this study. What we’re going to do is break this down into various types of crosstab. So, ladies and gentlemen, I would like you to drag Creator ID into Rows to break this down by rows. There we go. How did I do that? Again, I clicked here and dragged Creator ID into Rows, and it breaks the count down. So what is this? Anyone want to hazard a guess at what this is summarizing? These are the different phones on the left, and what do you think these numbers are here? The number of records per phone, that’s right. We’ve taken Number of Records, and by dragging Creator ID over we basically said: summarize for me the number of records on a per-phone basis. It’s as if we’ve created this component of the crosstab. Now let’s elaborate on it. Let’s drag over
here the timestamp to the Columns area. So I just clicked on Timestamp and dragged it over here. Now, where it says YEAR there: tell me, what does this show here? This shows something a wee bit interesting. That’s an interesting question; I’d like to know something about that myself. This is the most intriguing one here: two from ’99 and 2000. We’ve seen that before. If the phones are kept off for long periods of time and their internal battery runs out, they may collect a certain amount of data before they reconnect with the network, at which point they know they’re in 2015. 2016 is a bit of a mystery there. But let’s change this from Year; I’d like to change it to Day. Having changed it to Day, what are we seeing here? Does anyone want to hazard a guess? We’re actually not seeing quite the thing you think, and you have to be a little bit careful with this; that’s a reason I’d like to emphasize it. What’s the range over which we’re seeing this, up to what maximum value? Day here is actually the day within the month. If you want the actual calendar day, you can choose the Day down here, lower in the menu; this gives the absolute day, whereas the upper one is day within month. So you have to be a little bit careful about that. I’m going to click on the lower Day, and it will then show me various indicators here, and here are all the ones from 2015. So the next challenge for you folks, which you’re going to tell me: suppose I wanted to screen the rest out and only focus on those that are 2014 or later. How could I limit the data shown to just those? What could I use here? Filter, yeah. So let’s drag the timestamp into Filters here, and you’ll notice it asks what I want to filter on; I will say Range of Dates. How do I do that? I clicked here, I dragged it to
Filters, and I want to choose Range of Dates and click Next. You’ll notice it lets me select a range of dates using this slider mechanism. I would like to drag this over to 2014. I’m not particular about the exact date; well, let’s set it to, say, March 15, 2015, around early to mid-March. Are people okay with that? Do you see this? You can say Apply here, and it will do the requisite work and show us this type of data. That’s great. Now suppose we want the later date to only go through about the current date, so let’s make it through about August 11 or August 12, something like that; the study actually ended well before that. So we say OK, and now it’s going to... oh, okay, that’s interesting. I thought I did that properly, but here we go. I would have thought that that would
capture it, but it looks like... okay, sorry. Oh, I went back to 2014. Sorry. Oh man, thank you. All right, so this should be 2015; I set it to mid-March 2014, so I’m a year behind. And then we’ll want to go through about mid-August 2015. Yeah, there we go. So here we’re getting some more specific ones. You notice these are the numbers of records collected on different days, but this display is not terribly helpful: what we’re seeing are things labeled with numbers. We might be able to make them out, and after all, if we hover over these we can see how many records were created on different days, but we’d like a more meaningful summary of this. So, ladies and gentlemen, we will now make use of different renderings. I’d like to try a rendering using this scheme here. I just pressed this one, and what it does is visually summarize the number of records received on different days by the area of a square: smaller squares indicate fewer records and bigger squares indicate more records, on a per-person basis here. Do you see that? Are you able to use this item here? We could alternatively have specified it over there. So we can scroll over and see generally the ebb and flow of different people’s data collection on different days. This is the battery table, so as a general rule, if someone’s keeping the phone charged, there should be battery records received. You’ll find that on some days there are very few but on other days there are many, and different participants will vary in terms of the number of records that they’ve recorded. By hovering over these, you can get a total for different days, indicating the number of records received for that day. So this is an indication of the use of Tableau for one type of data. I’d like to now put in place another type of data, and this other type of data will be map
data. This map data will be drawn from a different part of the database: the same database, but a different table in it. So I just created a new worksheet using this little icon here; I clicked on it, and this worksheet canvas came up. Now what we need to do is set a data source, because we’re not going to be drawing information from Battery but instead from the location table. Who needs TA help? Are you okay so far? Ladies and gentlemen, I’d like you to right-click on the Battery icon and say Duplicate. This is what we in computer science call a hack. It’s a little trick, a little bit like encoding things with null: it’s convenient and quick, it does the job, but sometimes you learn to regret it later. So, ladies and gentlemen, we’ve just copied this. I’d like you now to right-click on it and edit it. Let me do that again; I hear the hubbub. I’m going to delete this and do it again. Let me just close this. Okay, let me do it again: I right-click on this. I think on the Macs it’s Command-click, or rather Ctrl
click. Okay, so Control-click on that, and I’d like you to choose Duplicate. Duplicate, boom. Now for this one I’d like you to Control-click and choose Edit Data Source. For this data source, I would like you to type Location as the name, drag out Battery, scroll down, and drag in Location. How did I do that? Originally Battery was here; you can either click the X here or drag it out, and then, as I said, I scrolled down and dragged in Location. There we go: Location. Are people okay? Are you synced up, or do you want another minute to do that? Who would like another minute? Okay, so it’s Location. Ladies and gentlemen, having done that, we now have recourse to a data source that provides location info. Let’s use this new data source to see what’s inside of it. What do you think would be inside of it? Riddle me that. Latitude and longitude, most emphatically. Accuracy in meters: now, this is a terribly misnamed column. When I see it, it kind of gives me the willies, because higher values mean it’s more inaccurate. Does that give you the heebie-jeebies also? Oh man, it really troubles me. As I understand it, this is roughly the standard deviation, in meters, associated with the measurements. Yeah. Then the timestamp, and the data source: note that some of these were judged by GPS, and I think “network” is an indication that it got the information from Wi-Fi or cellular, that is, cell towers. Yeah. Here’s the creator ID, again is-concealed, and some other ones like speed, which can be useful for distinguishing a vehicular context; speed is estimated from the location. We’re going to be making use of latitude and longitude as our main focus here. So you can close this. Are we in sync here, people? Okay. So now, ladies and gentlemen, having done that,
let’s go click on Sheet 2, and we will call it Participant Locations. So I double-clicked on this. Do you folks have, like, Neanderthal mice, single-click mice? Oh man. Can you double-click? Yeah, sometimes. Okay. So I double-clicked and typed Participant Locations. Are folks okay with that? This is on a new worksheet; are people okay? Who needs TA help? Great. Ladies and gentlemen, now what I’d like you to do is go someplace you’ve never gone before. We’re going to drag Latitude to Rows. Now you’ll notice that something weird happens, well, at least it’s weird in my book: it shows a bar graph that summarizes, I think, the average of the latitude. We actually need to do something different here, and this is very important: you need to click on this drop-down and set this as a dimension. You’re not trying to summarize information about it; you want to use it as an axis to display this
information. So how did I do that? I went up here, pulled this down, and chose Dimension, rather than leaving it as a measure, where I’d have to set whether it’s the sum or the average. It’s not something to be summarized; it’s something to use as an axis. So I pulled it down and went to Dimension. People okay with that? Are you hanging in there? It’s disconcerting, I know. Now let’s drag Longitude up to Columns, and similarly, let’s pull that down and choose Dimension. Now we see something of interest here. What do we see? Does anyone recognize certain geographic features, or even polygonal shapes that may be familiar? Darn right, darn right. So, ladies and gentlemen, if you hover over this map you will see that you have the ability to zoom in, and as you zoom in you will see additional features appear, but it is still somewhat opaque. We’ll do something more to liven it up in just a minute, but first, ladies and gentlemen, I want to distinguish the dots here. All these dots are blue. From whom are those dots drawn? From whence do they come; from which participant? All of them: they’re a summary of all, right? We haven’t told it to break the data down by participant. Suppose we wanted to make participants’ dots differ by color; how do we do that? Drag it onto Color. So let’s do that now: Creator ID, drag to Color. Now we have this chart broken down, and you’ll see that one participant, q>yg6, took a long sojourn to the fair province of Manitoba. But let’s zoom in on this cluster here; let’s zoom in, ladies and gentlemen, on our fair city of Saskatoon. I’d like you to drag this over and click successively in, and you’ll start to see different participants within our fair city delineated by color, and you’ll see the different areas that they traversed. This
was a very small study; we didn’t have many participants, but they did cover their bit of the city in their sojourns. Now, ladies and gentlemen, I’d like to add additional information to this map: I’d like to add street information. If you go up to Map, you should be able to pull down to Map Options, and you’ll notice that over here in Map Options there are options for displaying certain information. Certain of these features are less directly relevant to our Prairie context: the coastlines will confer little additional value for this graphic, although in prehistoric times inland seas lapped nearby shores. But I’d like you to click on Streets and Highways, and you will see arrayed before you the streets of our fair city, with the various participants superimposed on them. Do you see that? You could play around with other things here if you wanted to, but suffice it to say that if you need more specifics at a detailed level, you will be able to see them. So this is the location data for the participants, and you’ll notice that it exhibits a certain granularity; in other words, there’s nothing, for one reason or another, between these points, and that seems to be due to rounding associated with the received
values. Yeah, sorry. So here there’s a sort of limited resolution. Now, Kevin, is that inherent in the data received? Gee, I thought I’d seen it with more decimal places in the actual underlying database; that’s what I’m wondering. Yeah, exactly. Okay, thanks. So here we have information on a map, and I’d like you now to imagine that you want to see a single participant here. How would you do that? Give me at least one way: how would you limit this to a single participant, so that we see data separately for each participant? You could try clicking on a color. I’m not sure I’ve ever done that, and yes, oh, look at that, a thing of beauty! Oh great. Oops, oops, that was an unfortunate thing. Yes, sweet. But there’s another way, and to do this we’ll have to get rid of Map Options; you can close it here without fear with the X. We can drag the participant creator ID up to Pages, and now we can page between different participant IDs and display the values unique to different groups in the population. So this shows the wanderings of particular individuals within this database, in a map context. So you’re getting a bit of a feel for this. Shall we go further? Yes, Jeff? Sorry. Okay, the TAs are in front. Ladies and gentlemen, does anyone else need TA help here? So we’re exploring a lot of the basic mechanisms here, and I’d like now to examine some additional options for analysis. Ladies and gentlemen, I’d like you to create a new worksheet. That’s great, yep. I’ll take that a level higher, maybe. I really like that idea and we’ll do this, but I’m also going to elaborate on it in a way that gives additional information. So, ladies and gentlemen, if we looked at this graph, could we say anything about where that person was
spending the most time right now each of these could be you know 40 different dots it could be just a single time they were there it’s not so clear from context so ladies and gentlemen what I’d like to do is I’d like to take right click on this on this thing here control club and I’d like to say duplicate sheet and this will allow us to to duplicate this sheet and I’d like to change it to say participant I’m going to say participant location count hmm how’s that does anyone okay is everyone okay with that participant location come okay now what I’d like to do that is right now we’re only doing one participant at the time because we’re paging by participants I’d like to take therefore we don’t need to distinguish them by color because it’s only 14 different types so I’d like to drag creator ID out of color just drag it out if they’re so we no longer distinguish things by color they will all become blue which is going to be fine for our purpose and now ladies and
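Paging by participant amounts to showing one participant's rows at a time. A minimal sketch of the same idea in SQL, using Python's built-in sqlite3 module; the table name pp_location and its columns (creator_id, ts, lat, lon) are hypothetical stand-ins for the study's location table, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pp_location (creator_id TEXT, ts TEXT, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO pp_location VALUES (?, ?, ?, ?)",
    [
        ("owz", "2015-04-01 09:05:00", 52.131, -106.655),
        ("owz", "2015-04-01 09:00:00", 52.130, -106.660),
        ("abc", "2015-04-01 09:02:00", 52.120, -106.640),
    ],
)

# Each "page" corresponds to one distinct creator ID.
pages = [row[0] for row in conn.execute(
    "SELECT DISTINCT creator_id FROM pp_location ORDER BY creator_id")]


def page(creator_id):
    """The rows shown on one page: a single participant's track, in time order."""
    return conn.execute(
        "SELECT ts, lat, lon FROM pp_location WHERE creator_id = ? ORDER BY ts",
        (creator_id,),
    ).fetchall()


print(pages)        # ['abc', 'owz']
print(page("owz"))  # owz's two readings, earliest first
```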

Now, ladies and gentlemen, I would actually like to summarize this information according to the number of records. Suppose I wanted to color the points darker or lighter according to the number of records: how might you approach that? Yeah, okay. Size would scale them differently according to the number of records, but if we just drag Number of Records to Color, we'll get a lot of the way. So we drag it to Color here, and what we're going to see is that it colors the darkness of each point according to the number of records reported at that location. Okay, all of these... Actually, what we need is to do it for that particular individual. For example, if we go to this one here, we find it had a lot of readings: 957 readings were right here. But this is actually the total number of records across all individuals. So here is a distinction: because we have different pages for different individuals, what I'm going to do is drag Creator ID out of Pages and into Filters, and I will choose, for example, the first creator ID. Here's the first creator ID, and you'll notice that for this particular individual we had a disproportionate number of records at that point. If we went to another creator ID, say the second one, we would have more records there. So in short, we can summarize where people spent the most time according to the number of records that landed at a certain point: some of these points represent many, many readings, but many of them represent just an occasional reading. Now, to lend further insight into this, I'd like to choose one of these in the filter, the first one here, OWZ. How did I do that? Let's try that again: I went to the filter and chose OWZ. I'm going to zoom out here, and I'm going to tell it further that I would like to filter by accuracy. How would I filter it by accuracy, ladies and gentlemen? What could I do? Drag Accuracy over to Filters, and I want to filter on all values. I'd like to not include values above a certain point; remember, this maximum is 2879 meters, so that's almost 2.9 kilometers. I'd like to drag this down to include only values below, say, 100 meters, something on that order. I'm going to click Apply, and now I've got a subset of the data with higher accuracy. Maybe, just to bring it out more clearly, I'll get rid of Number of Records. Here we're focusing only on cases where fairly accurate data is available, and you'll notice that, accordingly, we have people, for example, crossing the bridge here, and they're not that far off, although this one is a little bit in the middle of the river. Okay, so this is maps, and you've seen how you can filter things, display things, and distinguish things.
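Coloring by Number of Records, with a participant filter and an accuracy filter, corresponds roughly to a filtered GROUP BY count. A sketch under the same assumptions as a plain SQL illustration (hypothetical pp_location table, made-up rows, accuracy assumed to be in meters), not the query Tableau itself issues.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pp_location (creator_id TEXT, lat REAL, lon REAL, accuracy REAL)")
conn.executemany(
    "INSERT INTO pp_location VALUES (?, ?, ?, ?)",
    [
        ("owz", 52.130, -106.660, 20.0),
        ("owz", 52.130, -106.660, 30.0),
        ("owz", 52.130, -106.660, 2879.0),  # a very imprecise fix
        ("owz", 52.131, -106.655, 40.0),
        ("abc", 52.130, -106.660, 15.0),    # another participant at the same spot
    ],
)


def location_counts(creator_id, max_accuracy_m=None):
    """Records per location for one participant, optionally keeping only accurate fixes."""
    sql = "SELECT lat, lon, COUNT(*) AS n_records FROM pp_location WHERE creator_id = ?"
    params = [creator_id]
    if max_accuracy_m is not None:
        sql += " AND accuracy < ?"
        params.append(max_accuracy_m)
    sql += " GROUP BY lat, lon ORDER BY n_records DESC"
    return conn.execute(sql, params).fetchall()


print(location_counts("owz"))       # 3 records at one point, 1 at the other
print(location_counts("owz", 100))  # the 2879 m fix is dropped: 2 and 1
```

Filtering on creator_id first is what keeps the other participant's reading at the same spot out of the count.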

I'd like now to go through two or three additional exercises. So one thing I'd like you to do is to create a new worksheet here, and we're going to examine this; this will be Plugged In Status. We're going to be using the battery table here, and I'd like to drag Timestamp over to the Columns area. Within this area, it will say YEAR(Timestamp). So how did I do this? I selected battery and I dragged Timestamp over to Columns, and I'd like to take this and say use the exact date. Okay. Next, ladies and gentlemen, I'd like to take the... okay, right. Next I'd like to put in the plugged-in description... excuse me, Creator ID: I'd like to drag it into Rows. There we go. So what do you think that will produce if we drag Creator ID into Rows? Anyone? Mm-hmm. This is taking a long time. Okay, so you've actually seen something like this before; do you remember where you saw that? Yeah. So we're going to expand this further, but I want to get rid of dates that are before 2015. So again, what do I do? There are actually two ways to do it. I could right-click on the axis and edit its scale if you'd like, or what can I do to only show things in a certain range? Filter. So I can drag Timestamp to Filters, choose a range of dates, and set it to start in mid-March, so 3/14, and go to the middle of August or so, and you should see something like that. Do you see that? Okay, now how would I color it so that we can delineate cases where it's plugged in or not? What items over here would allow you to see whether it's plugged in? Yeah, the plugged-in description, good. So let's drag that down to Color. Okay, the database is getting hit by a lot of things at the same time here. Okay, so that would be good. So here we have a graph showing three different states for a given user: cases where it's not plugged in, plugged in via AC, or plugged in via USB. So we can look at, for example, the patterns of charging associated with this device for a given user.
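The plugged-in chart is essentially a sequence of state changes over time. A small sketch, assuming hypothetical state labels and made-up (timestamp, state) pairs for one user, that applies the mid-March to mid-August date filter (the year 2015 is an assumption) and collapses consecutive identical states into runs. Each run ends at the last reading seen in that state, which is not necessarily the moment the state actually changed.

```python
from datetime import datetime
from itertools import groupby


def plugged_intervals(readings, start, end):
    """Collapse (timestamp, state) readings within [start, end] into
    (state, first_seen, last_seen) runs, one per stretch of unchanged state."""
    in_range = sorted(r for r in readings if start <= r[0] <= end)
    runs = []
    for state, group in groupby(in_range, key=lambda r: r[1]):
        group = list(group)
        runs.append((state, group[0][0], group[-1][0]))
    return runs


readings = [
    (datetime(2015, 1, 2, 10, 0), "not plugged in"),  # falls before the date filter
    (datetime(2015, 3, 20, 8, 0), "not plugged in"),
    (datetime(2015, 3, 20, 8, 10), "not plugged in"),
    (datetime(2015, 3, 20, 22, 0), "AC"),
    (datetime(2015, 3, 20, 23, 0), "AC"),
    (datetime(2015, 3, 21, 9, 0), "USB"),
]
runs = plugged_intervals(readings, datetime(2015, 3, 14), datetime(2015, 8, 15))
for state, first, last in runs:
    print(state, first, last)
```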

Okay, so that's another visual associated with this sort of data. Ladies and gentlemen, we're going to do something a little bit more advanced now before finishing up. I'd like you to create a new worksheet here, Sheet 5, and I'd like you to create a new data source. We're going to use a data source associated with smoking status, and we're going to call this sheet Smoking Status by Time. I will be releasing the workbook for you to look at in case any of you had trouble following; this will be Tableau exercise 8 2015 Saskatoon. Okay, great. So, ladies and gentlemen, how would I create a new data source that will tap into smoking information? What do I need to do to create a data source? Does anyone remember how we created this location one? Yeah: we right-click on this one and say Duplicate, and then we can right-click on it and say Edit Data Source. We're going to pull out battery and call this one Smoking Status, and we're going to find the appropriate table here. Smoking status is indicated... excuse me, I'm going to do it via the PP smoking table. I'm going to drag PP smoking up there. Where is that? It's located down here in the tables: PP smoking. Now let's see what's in PP smoking. We're going to click on it, and you can see there's a start time and a stop time associated with the smoking, and an indicator here, Is Smoking, yes or no. Are people okay with that? Who needs TA help? Okay, TA, deploy to sector 2. Who else needs TA help? TA, deploy to sector 2, left. Okay, who else needs help? Are people following along? So what did I do? Let's rehearse it. I went to create Smoking Status by Time. I then right-clicked on battery and did Duplicate. I then went to Edit Data Source, called it Smoking Status, dragged out the battery table, went down, and dragged in PP smoking. Are people comfortable with that? Remember, this was a study run by Narges that had intervals of smoking reported, and in fact intervals of non-smoking for certain periods. Narges, how long was the first phase? How many days did they have to record both smoking and non-smoking intervals? Okay: so smoking was recorded throughout, but non-smoking only for three days. So, ladies and gentlemen... sorry, yes?

Yes: because they recorded when they stopped smoking, a non-smoking interval runs from the last time they smoked to the next one, and a smoking interval is the time over which they smoke, very specifically. Correct. After three days, they only indicated when they smoked. Yeah. Okay, so, ladies and gentlemen, I'd like you to now go back to Smoking Status by Time and make sure Smoking Status is selected here as the data source. Now what I'd like to do is drag Start Time to Columns. Initially, when you drag Start Time to Columns, it should say YEAR(Start Time); do you see that? I'd like you to pull it down and choose Exact Date. What do you think that shows? What lies before us? Yeah, everyone all together. It includes some noise, but it's pretty clean in the earlier phases. Okay, so that's great. Ladies and gentlemen, suppose now that we want to break it down by creator ID; we want to see different rows for different people. How do we do that? Instruct me: what would I do? Put Creator ID in Rows. There we go. Now we see it broken down by person. Great. Now suppose I wanted to distinguish by color whether or not someone was smoking at that time; how would I do that? Well, you'll notice there's Is Smoking. What could I do? Drag it to Color. Okay, now you'll notice that it says something very strange: SUM(Is Smoking). We don't want this to be considered as a sum; we instead want it to be considered as a dimension, and in fact a discrete dimension. So it should be Dimension and then Discrete. What do we see here? What did Narges say not five minutes ago? Null means there's no information about it, zero is an indication that they're not smoking, and one is an indication that they are smoking. What do we see in the first several days? Sorry? Yeah, there are a lot of non-smoking intervals, with gaps between them if, for example, they had the phone off their person, weren't carrying it with them, and stopped that interval. Then there are some periods of smoking interspersed here, and then a bunch of other periods where they reported smoking. Now, I'm glossing over some things: we haven't given these marks a horizontal extent corresponding to the duration of smoking. We can do that by taking the difference between the stop time and the start time, but time is short and we need to move on to additional topics. Suffice it to say that here we can delineate visually the periods of smoking and their duration, and distinguish them from the periods where active non-smoking was recorded. Are people comfortable with that? So I'm giving you some ways you can explore this hugely voluminous data set, some ways to grapple with all this data, which could otherwise be challenging. But all of the analyses we've done so far have been distinguished by the fact that they depend on only a single table.
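Why Is Smoking should be a discrete dimension rather than a SUM can be seen in a few lines of Python: summing the flag collapses everything into a count of smoking intervals, while treating it as a category keeps the three meanings (1, 0, and NULL for no information) separate. The sketch also finds the gaps between consecutive intervals. The interval values are made up, and the label mapping is an assumption based on the description above.

```python
from collections import Counter
from datetime import datetime

# Assumed mapping of the is_smoking flag to discrete categories (NULL arrives as None).
LABELS = {1: "smoking", 0: "not smoking", None: "no information"}


def label_states(intervals):
    """Treat the flag as a category per interval rather than something to sum."""
    return [LABELS[flag] for _start, _stop, flag in intervals]


def gaps(intervals):
    """(gap_start, gap_end) wherever one interval stops before the next one starts."""
    ordered = sorted(intervals, key=lambda iv: iv[0])
    return [(a[1], b[0]) for a, b in zip(ordered, ordered[1:]) if a[1] < b[0]]


intervals = [  # (start_time, stop_time, is_smoking) for one participant
    (datetime(2015, 3, 14, 8, 0), datetime(2015, 3, 14, 12, 0), 0),
    (datetime(2015, 3, 14, 12, 0), datetime(2015, 3, 14, 12, 10), 1),
    (datetime(2015, 3, 14, 14, 0), datetime(2015, 3, 14, 20, 0), 0),
    (datetime(2015, 3, 15, 9, 0), datetime(2015, 3, 15, 9, 5), None),
]
print(Counter(label_states(intervals)))
print(sum(iv[2] or 0 for iv in intervals))  # what a plain SUM collapses this to: 1
print(gaps(intervals))
```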

We haven't linked up data, say, from smoking with data regarding who's around them, or data regarding physical activity from the accelerometer. I'd like to give you a sense of how you can do that. While I don't have time to cover it all, I'll give you one final example where we link together smoking status and location, and if this proceeds well, we may examine one other small thing. So, ladies and gentlemen, I would like to create a new worksheet. How do I do that? Riddle me that. Yeah, we could click on this icon right here. I'd like to call it Smoking by Location. For this, ladies and gentlemen, we're going to have to define a new data source, because this is going to draw not on one table but on a pair of tables. So how do I define a new data source? We can go up here, right-click on Smoking Status and say what? Duplicate, right; remember that. And I'm going to edit this, and you'll notice that PP smoking is already there. Are people in sync with me? I'm going to call this one Smoking and Location. This is a data source from which we'll draw cross-linked information regarding people's location and their smoking status. Are you ready to proceed? Are you ready to go where no one in this room has gone before? Well, okay, a couple of us have gone before, but none of you have. Fair enough. So I'd like you to drag PP location up to join PP smoking within this area, and you will see something unfamiliar, and this thing will open a vista onto new areas for your learning. So, ladies and gentlemen, here we are: it is indicating that we're cross-linking PP location and PP smoking. In other words, we are going to have people's locations tied in with their smoking data, so we can ask: when someone is smoking, what's their location? The trick will be having the logic to link these up, and that logic lies in the overlap between the circles there. So, ladies and gentlemen, I'd like you to click on that, just like that, and you'll notice there's a pitiful attempt to link them up, which is going to be insufficient. This is not what we want, and you have to be very careful about this. This is what's called performing a join in a relational database. So from this data source, I would like you to drag down and choose Creator ID here. We're going to need to align these so that when we have a piece of data from the location data, it's matched to the corresponding smoking data, and that's going to involve three criteria. One of those criteria is that the rows have to be from the same person, right? We don't want Joe's location data matched up with Jill's smoking data; that would not be fruitful. So we want the Creator ID of one to equal the Creator ID of the other. Must it not? It must. Next, we're going to line up the times.
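To see why the join needs more than one criterion, compare row counts on a toy version of the two tables. This sketch uses Python's sqlite3 with hypothetical tables pp_location and pp_smoking and made-up rows for Joe and Jill; it applies only the first criterion, matching creator IDs, so the time criteria are still missing here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pp_location (creator_id TEXT, ts TEXT);
CREATE TABLE pp_smoking  (creator_id TEXT, start_time TEXT, stop_time TEXT, is_smoking INTEGER);
INSERT INTO pp_location VALUES
  ('joe',  '2015-04-01 09:05:00'),
  ('joe',  '2015-04-01 11:00:00'),
  ('joe',  '2015-04-01 13:00:00'),
  ('jill', '2015-04-01 09:30:00');
INSERT INTO pp_smoking VALUES
  ('joe',  '2015-04-01 09:00:00', '2015-04-01 09:10:00', 1),
  ('joe',  '2015-04-01 10:00:00', '2015-04-01 12:00:00', 0),
  ('jill', '2015-04-01 09:00:00', '2015-04-01 10:00:00', 1);
""")


def count(sql):
    """Run a COUNT(*) query and return the single number."""
    return conn.execute(sql).fetchone()[0]


# No criteria at all: every location paired with every interval (4 x 3).
print(count("SELECT COUNT(*) FROM pp_location CROSS JOIN pp_smoking"))  # 12
# Same person only: Joe's 3 locations x Joe's 2 intervals, plus Jill's 1 x 1.
print(count("""SELECT COUNT(*) FROM pp_location AS l
               JOIN pp_smoking AS s ON l.creator_id = s.creator_id"""))  # 7
```

Even with matching creator IDs, each of Joe's locations is still paired with both of his intervals regardless of time, which is why the time conditions are needed.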

Ladies and gentlemen, there's going to be a time stamp associated with the location data. If you pull this down, there's a Timestamp for PP location, and we need that time stamp to fall within the start and stop times of the smoking interval, or the non-smoking interval, as it were. So, ladies and gentlemen, we're going to pull over here on the left the Start Time. Some of you may have the opposite ordering, maybe PP location on the left; the point is, for PP location you've got to choose the Timestamp. That's a location for a certain instant in time, and we need that instant to fall within the interval. The interval has a Start Time, so what do you think: if the time stamp for the location needs to fall within the interval, does it need to be equal to the start time, less than the start time, greater than, or greater than or equal to? What, at the least, does it need to do? Greater than, and I would say or equal to. There's a little bit of room for debate, but if the reading was taken at exactly five o'clock and the interval starts at five o'clock, we'll treat it as inside, so I'll say greater than or equal to. And the other clause: the time stamp, ladies and gentlemen, needs to be what? If it's within the interval, we need to consider the start time and the what? The stop time. Okay, so we need... oh, excuse me, excuse me, what am I doing? The start time needs to be less than or equal to the time stamp; in other words, the time stamp can't be before the start time. And the stop time needs to be greater than or equal to the time stamp. Right: the time stamps have to fall between the start time and the stop time. Does that make sense? So this one has to be greater than or equal to that, and this one has to be less than or equal to that. Are people okay with that? That's a fairly complex join, but it's one that's required here; often you'll need two elements in it. Who would like this up a little bit longer? Okay. Quite commonly you might match by creator ID and the time, or by creator ID and the duty cycle; matching on which interval something falls within is very common. In this case we need a little bit more logic, because we have a start and a stop time. Are people okay? Who would like a little bit more time? Who would like TA help? The TAs would be glad to offer their assistance. Okay, so if we're done with that, ladies and gentlemen, we can close this; it will remember the join after we close it. I'd like now to go to Smoking by Location, and I'd like you to tell me, ladies and gentlemen, how to first create a map. How do I create a map? Remind me what I need to do. Okay: Longitude and Latitude. Latitude goes to what? Rows. Good, thank you. So let's drag it to Rows. Now you'll notice it's annoyingly doing this calculation here when it doesn't need to. Let me teach you a trick; it's not quite a hack, but it's a trick to prevent that from happening. Up here, ladies and gentlemen, under this cylinder, which computer scientists use to denote databases, there's a thing called Auto Update Worksheet. Turn that off for a minute, and then it won't try to refresh every time you add something and redo it from the start. So let me do that again: click Auto Update Worksheet to turn it off.
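The full set of join criteria (same person, start_time <= timestamp, timestamp <= stop_time) can be written as one SQL join condition. A minimal sketch with hypothetical tables and made-up rows; it assumes timestamps are stored as ISO-8601 text in a single consistent format, which makes string comparison agree with time order. It illustrates the join logic, not the query Tableau generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pp_location (creator_id TEXT, ts TEXT, lat REAL, lon REAL);
CREATE TABLE pp_smoking  (creator_id TEXT, start_time TEXT, stop_time TEXT, is_smoking INTEGER);
INSERT INTO pp_location VALUES
  ('joe',  '2015-04-01 09:05:00', 52.130, -106.660),
  ('joe',  '2015-04-01 09:10:00', 52.131, -106.661),  -- exactly on an interval boundary
  ('joe',  '2015-04-01 13:00:00', 52.120, -106.640),  -- inside no recorded interval
  ('jill', '2015-04-01 09:30:00', 52.140, -106.670);
INSERT INTO pp_smoking VALUES
  ('joe',  '2015-04-01 09:00:00', '2015-04-01 09:10:00', 1),
  ('joe',  '2015-04-01 09:10:00', '2015-04-01 12:00:00', 0),
  ('jill', '2015-04-01 09:00:00', '2015-04-01 10:00:00', 1);
""")

# Three criteria: same person, start_time <= ts, and ts <= stop_time.
JOIN_SQL = """
SELECT l.creator_id, l.ts, s.is_smoking
FROM pp_location AS l
JOIN pp_smoking AS s
  ON  l.creator_id = s.creator_id
  AND s.start_time <= l.ts
  AND l.ts <= s.stop_time
ORDER BY l.creator_id, l.ts, s.start_time
"""
rows = conn.execute(JOIN_SQL).fetchall()
for row in rows:
    print(row)
```

Because both bounds are inclusive, Joe's 09:10 reading matches both adjacent intervals, and the 13:00 reading drops out because an inner join keeps only rows that satisfy every condition. If double matches at shared boundaries matter, one option is to make one bound strict, for example l.ts < s.stop_time.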

This has some cost to it: it doesn't provide as much guidance and assistance, and I think some features stop working without Auto Update being on, so in some ways it's not as user-friendly. But it doesn't go hit the database every time. So you can now drag Latitude there and choose... what do we have to do? We need to change it to Dimension. Well, you folks are good. Okay, and now what do we have to do? Drag Longitude to Columns, and we have to make that a dimension too, right? Shall we not? Okay. So now, ladies and gentlemen, if you want it to calculate, you use Run Update: see this little thing here? You can press that and it will go hit the database. At least it didn't hit it all along the way. It'll go and fetch the data, and hopefully it won't take horribly long. Joins take long, but Winchell worked on this one. Yes: in general, with joins in place it has to do quite a bit more work to match up the data. This is a case where it's trying to find all these different pieces of data and figure out whether they meet the criteria to be paired up, and that's often somewhat expensive, and the fact that we're all doing it together is going to make it more expensive. So are people comfortable with that notion of a join, of that cross-linking of the data so that for a given item we have both the location and the smoking status? I'm glossing over a little bit of detail having to do with this: suppose there are multiple locations for a given interval; how are we going to deal with them? Well, the answer is that that's what we handle through aggregation: it may have to take the average, or the median, etc. Another way to easily make it faster, some say, is that it's cheaper to write data multiple times. For example, if you want to run the same query over and over again on different data sets, you might as well just write a copy of the data that has all the elements of the different parts included, so you don't have to join over and over again. Some types of databases don't even have joins; the idea there is that the data is duplicated to satisfy the needs that were previously satisfied by joins. Right. So in this case, I think Winchell sometimes puts in place views that significantly speed it up. In this case, Winchell, this is taking a very long time. We did PP location and PP smoking; those are the ones you had worked on? All right, right. Yep. Okay, so this may not converge in an adequate time here. This does bring up a feature of this data in general: when we need to do large joins, which are difficult, there's often some work on the back end that someone like Winchell will do to try to make it much, much faster.
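Two ideas from the discussion above, aggregating several locations per interval and writing the joined data once so it need not be re-joined, can be combined in a short sketch. It assumes the same hypothetical tables and made-up rows, materializes the join with CREATE TABLE ... AS SELECT, and then averages the coordinates per interval. SQLite has no built-in median aggregate, so only the average is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pp_location (creator_id TEXT, ts TEXT, lat REAL, lon REAL);
CREATE TABLE pp_smoking  (creator_id TEXT, start_time TEXT, stop_time TEXT, is_smoking INTEGER);
INSERT INTO pp_location VALUES
  ('joe', '2015-04-01 09:02:00', 52.130, -106.660),
  ('joe', '2015-04-01 09:06:00', 52.132, -106.664),
  ('joe', '2015-04-01 11:00:00', 52.120, -106.640);
INSERT INTO pp_smoking VALUES
  ('joe', '2015-04-01 09:00:00', '2015-04-01 09:10:00', 1),
  ('joe', '2015-04-01 10:00:00', '2015-04-01 12:00:00', 0);

-- Pay for the interval join once and store the result as an ordinary table.
CREATE TABLE smoking_location AS
SELECT l.creator_id, l.ts, l.lat, l.lon, s.start_time, s.is_smoking
FROM pp_location AS l
JOIN pp_smoking AS s
  ON l.creator_id = s.creator_id AND s.start_time <= l.ts AND l.ts <= s.stop_time;
""")

# Several locations can fall in one interval; summarise them per interval.
summary = conn.execute("""
    SELECT start_time, is_smoking, COUNT(*) AS n, AVG(lat) AS mean_lat, AVG(lon) AS mean_lon
    FROM smoking_location
    GROUP BY creator_id, start_time, is_smoking
    ORDER BY start_time
""").fetchall()
for row in summary:
    print(row)
```

The trade-off is the one mentioned in the lecture: queries against smoking_location are cheap, but the copy must be rebuilt whenever the source tables change.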

There are ways that you can put in place mechanisms that will significantly speed up queries. Some of them are custom queries, some are in the form of indexing to match up data quickly from one table to the other, and others are in the form of derived tables or views. In this case, it looks like there were not enough mechanisms put in place to make it fast enough for all of us to hit it simultaneously, and it may take several minutes to complete. But this is the basic mechanism by which we join data from one source to another, and that allows us to then graph it out or summarize it in this cross-linked fashion. So I think we're going to have to leave that going. There are some other examples that I could do, but I think it would be better to just let this complete in the fullness of time. So, any questions on the use of Tableau as a tool? Yes? Right. The export functionality of Tableau is pretty good, in my experience. We've used it to produce a lot of visuals, which we've exported in one form or another. What I know we've also made use of is Tableau's ability to import data from a variety of sources. For exporting, I'm trying to remember: Winchell, did you do some interfaces between Tableau and R at all, for exporting from Tableau to R? Yeah, so output from Tableau could be read from an Excel file, and by many types of processing software: SAS, SPSS, R. Yeah, I've done that quite a lot, quite heavily. For example, box plots are very handy to do within this. I've also done a lot in R, but Tableau is very visual and very quick, and it gives you the ability to rapidly iterate, and then once the plots are obtained, you can just take a screenshot of them. Yep, Phil? Yeah, I would do the analysis in R, for sure. That's right. And I'm trying to think; I don't remember any specific cases of it, but I think you can annotate these graphs quite handily with additional text and so on, so rather than being static objects, you can put in some information to accompany them that will be helpful. Any comments from any of the TAs? Winchell, have you placed annotations, textually or graphically, on Tableau charts? That's right. You get a good import from Excel sheets.
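Indexes and views, two of the speed-up mechanisms mentioned above, look like this in SQLite. A minimal sketch with hypothetical table, index, and view names: the indexes cover the join keys, and the view stores the join logic once (unlike a materialized table, a view re-runs the join each time it is queried). Whether an index is actually used depends on the engine and the data; EXPLAIN QUERY PLAN shows SQLite's choice, though its output format varies across versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pp_location (creator_id TEXT, ts TEXT, lat REAL, lon REAL);
CREATE TABLE pp_smoking  (creator_id TEXT, start_time TEXT, stop_time TEXT, is_smoking INTEGER);

-- Indexes on the join keys let the engine find candidate rows without scanning whole tables.
CREATE INDEX idx_location_creator_ts   ON pp_location (creator_id, ts);
CREATE INDEX idx_smoking_creator_start ON pp_smoking (creator_id, start_time);

-- A view stores the join logic once; every query against it re-runs the join.
CREATE VIEW smoking_location_v AS
SELECT l.creator_id, l.ts, l.lat, l.lon, s.is_smoking
FROM pp_location AS l
JOIN pp_smoking AS s
  ON l.creator_id = s.creator_id AND s.start_time <= l.ts AND l.ts <= s.stop_time;

INSERT INTO pp_location VALUES ('joe', '2015-04-01 09:05:00', 52.130, -106.660);
INSERT INTO pp_smoking  VALUES ('joe', '2015-04-01 09:00:00', '2015-04-01 09:10:00', 1);
""")

indexes = {name for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")}
print(sorted(indexes))
print(conn.execute("SELECT creator_id, is_smoking FROM smoking_location_v").fetchall())

# Inspect the plan SQLite chose for a query against the view.
for step in conn.execute("EXPLAIN QUERY PLAN SELECT * FROM smoking_location_v"):
    print(step)
```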

And yeah, so in short, you could use it to visualize and interactively explore data that's originally in an Excel file; you just shouldn't use it as the interface to modify that Excel file. Oh, yeah, yeah. In fact, I'll cancel out of this, but you can keep yours going; let me show you briefly how to do that. If I wanted to do that here, what I could do is go to Smoking Status, drag Start Time here and make it an exact date, and drag Creator ID to Rows. One thing I could do, and this is where it gets a bit specific, is define what's called a calculated field, and this calculated field would be, say, Smoking Duration. Now, I have to be careful here, because this is graphed out in days; you'll notice the axis is labeled with dates. So I have to be a little bit careful to express this in days. What I could do is use DATEDIFF, and I'm trying to remember; I think it's 'minute'. I want to calculate the precise time, but in units of days, so the start date would be Start Time, and then Stop Time. That will calculate the number of minutes, and I'll divide by the number of minutes in a day, 1440: DATEDIFF('minute', [Start Time], [Stop Time]) / 1440. That would provide a smoking duration, which I could then use to indicate the end of each interval. Then I could convert this to a Gantt chart... oops, I want to do this by exact date. Then I could drag in Smoking Duration and use it to set the size, and I will set the average of this duration as the size. So you'll see that some of these are now measured in fractions of a day that they spent smoking. So that's how I would do that. Okay, any other questions? Is it free for students? Yeah, good question. I'd need to learn more about Tableau's pricing model; I'm not certain about that, but I would be surprised if it was that different. I know, for example, SAS is, what is it, per year per seat, which is really expensive, and I don't think Tableau adheres to that sort of model. What I do know is that they have a version for instructors, if you're going to use it for classes, and a version for students, which I think is geared towards education, but as you say, it's kind of a fuzzy area for research.
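The calculated field above divides a minute count by 1440, the number of minutes in a day. A Python sketch of the same arithmetic with made-up intervals; as I understand Tableau's DATEDIFF, it counts minute boundaries crossed, so for timestamps that carry seconds it can differ by one minute from this floor-of-elapsed-minutes version.

```python
from datetime import datetime
from statistics import mean

MINUTES_PER_DAY = 24 * 60  # 1440


def smoking_duration_days(start, stop):
    """Whole minutes between start and stop, expressed as a fraction of a day."""
    minutes = (stop - start).total_seconds() // 60
    return minutes / MINUTES_PER_DAY


intervals = [  # hypothetical (start_time, stop_time) pairs for one participant
    (datetime(2015, 3, 14, 22, 0), datetime(2015, 3, 15, 4, 0)),  # 6 hours
    (datetime(2015, 3, 16, 9, 0), datetime(2015, 3, 16, 9, 12)),  # 12 minutes
]
durations = [smoking_duration_days(a, b) for a, b in intervals]
print(durations)        # [0.25, 0.00833...]
print(mean(durations))  # the average duration, as used for the Gantt bar size
```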

Yeah, so time is going on and we need to move to other items, but I will say that Tableau is really, really good for interactive visualization. If you have data coming in over time, you can refresh the data and it will run the query and show the new data. It's very good for drilling down and getting an understanding of some of the patterns; you can actually get item information on particular data items here. In general, it's a very flexible and accessible tool, with many features that I obviously don't have time to explore; you can drill down to the specifics of particular data items that are shown. This is not a tool that I would use for final statistical analysis; for that I would draw the data into R. We do have examples of SQL queries where you can, within R, pull data from these databases, potentially with joins like this one, and you can then manipulate the data as data frames in R, so you can treat it as a large set of data, as you would data from a spreadsheet or what have you. You can also write data back to databases from R, although we haven't done that; mostly we just analyze the data originally drawn from the databases. And there are some ways in R, I'm told, although I haven't used them, to run interactive queries against data frames; this is what I was hearing yesterday. So you can actually use SQL queries to select subsets of the data from a data frame. SQL is an extremely powerful language; it's a language that I'm going to be teaching this fall to students, including Cheryl and maybe others in the room, who are not from a computer science background but want to learn it. In general, it's a fairly accessible language that can be used within R to draw the same sort of data in and do your final visualizations. R is also the tool that we use for a lot of the machine learning analyses, such as those that William is working on assiduously in the back, which Narges and Winchell also worked on, and in fact Rahim as well. So anyway, that's a glimpse of Tableau and some high-level comments on R. Any final questions before we move on? Yes? For GIS data? Yeah, I get your question. Very interesting. One very attractive thing about Excel... oh, sorry, about Tableau, is that even though it's drag-and-drop and point-and-click and so on, it actually is highly customizable. For example, you can write a new custom SQL string, say SELECT * FROM PP Wi-Fi LEFT JOIN with whatever, and put in place custom SQL queries.
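The custom SQL mentioned above used a LEFT JOIN. Unlike an inner join, a left join keeps every location row, filling the smoking columns with NULL when no interval matches. A minimal sketch with hypothetical tables and made-up rows (PP Wi-Fi is not used, since its columns aren't described here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pp_location (creator_id TEXT, ts TEXT);
CREATE TABLE pp_smoking  (creator_id TEXT, start_time TEXT, stop_time TEXT, is_smoking INTEGER);
INSERT INTO pp_location VALUES ('joe', '2015-04-01 09:05:00'), ('joe', '2015-04-01 13:00:00');
INSERT INTO pp_smoking  VALUES ('joe', '2015-04-01 09:00:00', '2015-04-01 09:10:00', 1);
""")

# LEFT JOIN keeps every location row. The interval conditions belong in ON, not WHERE;
# in WHERE they would filter the unmatched rows (with NULL smoking columns) back out.
rows = conn.execute("""
SELECT l.creator_id, l.ts, s.is_smoking
FROM pp_location AS l
LEFT JOIN pp_smoking AS s
  ON  l.creator_id = s.creator_id
  AND s.start_time <= l.ts
  AND l.ts <= s.stop_time
ORDER BY l.ts
""").fetchall()
print(rows)  # [('joe', '2015-04-01 09:05:00', 1), ('joe', '2015-04-01 13:00:00', None)]
```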

I can also do a fair bit of customization with some of these components, so I wouldn't rule it out yet, but I have to say I haven't seen anything about GIS shapefiles. It does seem to be able to import from different sources, but I can't say anything about exporting. Yeah, yeah. For what we want today, it's for exploration; I think that's kind of where it's supposed to be: interactive exploration, once you think you've got a handle on what's going on. By the way, Sherrill or someone asked about annotation; you can see that here. You can annotate a point or you can annotate an area, and I just put in place that annotation. As Kevin was saying, it's fairly flexible in terms of writing custom text for different regions: you can color it, format it, and put it in place. So there's a fair bit of customizability there once you drill down, and there's also a fair bit of ease of use. This is a product which has been around for a while and has gotten refined over the years. It has heavy use in industry right now, and I think industry forms the bulk of their financial support. It came out of Stanford, didn't it? Okay, so that was a glimpse of a