Hi folks, my name is Jeff cots, and I'm here today to talk to you about continuous probability distributions. If you recognize any of these slides, it's likely that you also recognize the textbook from whence they came: Basic Statistics for Business and Economics, by Lind, Marchal, and Wathen. If you've been following along in this series, that should come as no surprise. Moving on: if you don't have these slides and you don't have that textbook, hopefully you'll still be able to gather something from the stuff I'm going to show you.

The things we're going to cover today are the properties of the normal distribution, so we can pick up where we left off last time; how to use the z-score formula; and lastly, we'll start to introduce the notion of how you can take a raw score to a probability or proportion, and how you can take a probability or proportion and go the other direction to find the appropriate raw score. This is the type of stuff we're going to be looking at for Chapter 7.

To let you know, this is going to come in two parts. The first part explains some of the theoretical stuff. The second part talks a lot about examples of all the different types, to reinforce the idea that you can take a raw score to a probability and a probability to a raw score. That's something that's evolved over the years I've been teaching this topic: exactly how, procedurally, to take information as small as a raw score and get the answer we need, always going through calculating a z-score and using the z-score table to find the area associated with it. A lot of times that part of the process gets messed up a little bit, but hopefully, as I start laying out procedures, it will make a lot more sense to you.

So where did we leave off? In chapter, oh gosh, let's say three, we introduced the notion of the empirical rule. What the empirical rule says is that if you have normally distributed data, like this picture here, you can use a couple of cutoffs (one, two, and three standard deviations away from the mean) to help understand how frequently values in that distribution occur. For example, 68% of all values, of all respondents, of all things, whatever, 68% of all stuff is within one standard deviation below and one standard deviation above the mean. That's what you can see right there: where it says mu, that's your mean value, and then you can add a standard deviation or subtract a standard deviation, and that's 68% of the data. The 95% rule is two standard deviations, and practically all, 99.7%, gets covered at three standard deviations.

So we've seen the empirical rule before, but right now it lacks a lot of precision. It's great for shorthand estimates: if you know your data is normally distributed, it can give you an idea of how likely something is to occur, but it doesn't have all the precision that we need.

Now here's an example. As part of its quality assurance program, the Autolite Battery Company conducts tests on its D-cell alkaline battery. I'm going to go ahead and say that's an A, B, C, D sort of thing. We find out that the mean is 19 hours, so on average batteries last about 19 hours, with a standard deviation of 1.2 hours. The questions: about 68% of batteries failed between what two values? About 95% of batteries failed between what two values? And virtually all the batteries failed between what two values? That's to say, at some point each of these batteries has a lifespan, and where it ends, that's how long it lasts; so its failure is how long it lasts.

We go ahead and use the empirical rule here. Number one: about 68% fall within 17.8 and 20.2 hours; that's just taking the mean and subtracting one standard deviation or adding one standard deviation. About 95% fall within two standard deviations, 16.6 to 21.4 hours, and practically all within three standard deviations, 15.4 to 22.6.

With all of these we're using percents. Percents are decimals, and decimals are less than one. We can use percents whether we derive them empirically, for empirical probabilities, or, if we can actually demonstrate the normality of the data with a picture like the one here, we're able to take that another step.
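The battery arithmetic above can be checked with a few lines of Python. This is only a sketch of the empirical rule as described in the lecture; the function name `empirical_interval` is made up for illustration and is not from the textbook.

```python
# Empirical-rule intervals for the battery example (mean 19 h, sd 1.2 h).
MEAN_HOURS = 19.0
SD_HOURS = 1.2


def empirical_interval(mean, sd, k):
    """Return (low, high) for k standard deviations either side of the mean."""
    return (mean - k * sd, mean + k * sd)


# The 68 / 95 / 99.7 shares are the rule-of-thumb figures, not exact areas.
for k, share in [(1, "about 68%"), (2, "about 95%"), (3, "about 99.7%")]:
    low, high = empirical_interval(MEAN_HOURS, SD_HOURS, k)
    print(f"{share} of batteries last between {low:.1f} and {high:.1f} hours")
```

Each interval is just the mean plus or minus a whole number of standard deviations, which is why the rule is quick but coarse.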

Because they're less than one, and they're percents, we can derive an empirical probability, and we can use this for predictions based on how likely certain values are to occur. The connection between the proportion or percentage here and the probability is a very crucial one. We're going to use those in continuous probability distributions, which is the crux of Chapter 7.

The normal distribution itself looks like this: it's the bell-shaped curve. We've been discussing this now for a couple of chapters. The normal curve is symmetrical, with two identical halves on each side. Theoretically it extends out to infinity; those tails are asymptotic, so they never actually touch the x-axis, but they get very, very close. And if you have a normally distributed data set like this, you know that the mean, the median, and the mode, all three measures of central tendency, are the same. That point divides the data in half (median), it's the highest point (mode), and it's where the distribution balances (mean). All three central tendency points are the same in a normal distribution.

However, there's a whole plethora of normal distributions, and they can differ on a couple of characteristics. First of all, their standard deviations can differ, like we have here. They all have the same mean of 20 years of service, right there, but the one in red, the Camden plant, has a relatively small standard deviation: 3.1, compared to 3.9 and 5.0. We end up with a narrow, or leptokurtic, distribution, which means it's very peaked and pointed, unlike the blue one here, the Elmira plant, which has a larger standard deviation, so it's much flatter. We call that platykurtic. Somewhere in the middle we have the green curve; its standard deviation is between the leptokurtic and the platykurtic, so we're going to call it mesokurtic.

A couple of easy mnemonics for that type of stuff: first of all, meso starts with M and it's in the middle. All right, meso's in the middle. Platykurtic: plat sounds like flat, or you can think of a platypus, which is one of those flat animals with the beak and the beaver-like tail, and I think it lays eggs but it's a mammal. Anyway, it's flat, so plat, flat: platykurtic is at the bottom. And then leptokurtic is the narrow one; it's like you're leaping or jumping, you're trying to get up. Those are the three types of curves. Strictly speaking, kurtosis is a separate measure and every normal curve has the same kurtosis, so take these as loose labels for narrow, middle, and flat.

The big takeaway is this: the smaller the standard deviation, the narrower your curve. If you have a large standard deviation, your data is very spread out; if you have a small standard deviation, your data is very close together and you end up with a narrow curve like the one in red here.

By the way, the only thing that affects a normal curve's shape is its standard deviation. Shape is determined by standard deviation; position, however, is determined by the mean. You can see across the bottom that these three curves have different means; by their shapes you can tell they have different standard deviations, but their position is determined by the mean. Or we can hold the shape constant and have them all have the same standard deviation, where the only thing that differs between them is their means, which determine their positions.

I mentioned earlier the crucial point of percents being decimals: that allows us to make predictions, predictions about proportions. In a normal probability distribution, a lot of the same things apply as in the probability distributions we talked about in Chapter 6. But in the case of a normal probability distribution: first of all, it's bell-shaped, and we've been discussing that for chapters now. Two, it's symmetrical; hey, we've seen that before. It's asymptotic; we talked about that earlier. The mean, median, and mode are all equal; we discussed that as well. The total area under the curve is one. This is like saying that everything the normal curve can be used to predict, everything that can possibly happen, sits under it. What can occur runs across the x-axis, and how likely it is shows up as the vertical distance. So all possible occurrences are under the curve, which is why the sum is one.

And the last one is sort of a corollary of having a symmetrical distribution where the total area is one: all the area to the left of the mean is 0.5 and all the area to the right is 0.5. These are properties of the distribution, and they're going to be pretty useful when we start doing examples.

The standard normal probability distribution is so important. What it allows us to do is take a data set, whatever that data set is, and convert it to a standard unit. Those units are z units, represented by the formula down there. The standard normal distribution is a normal, bell-shaped distribution where we define two things. One is that the mean of the standard normal distribution is always zero; zero is in the middle. The other is that the standard deviation is one; that's how dispersed, or how spread out, the data is. Once we've defined normally distributed data with a mean of zero and a standard deviation of one, we want to convert whatever data we have to that distribution. Why? Because it gives us a fantastic means of comparison. Whatever my data set is, if I standardize it, if I put it in terms of z, and you tell me the z-score was zero, I know you're talking about the mean. If you say the z-score was positive one, I know you're talking about something greater than the mean by a certain distance. If you tell me a z-score was negative 0.5, I know it's less than the mean, and by a small amount.

These are things that we'll learn about the standard normal distribution. Of the defining characteristics here, the third is pretty important, and that's the z-scores I was just talking about. First of all, it is a signed distance. Signed means that it's positive or negative; z-scores can have both positive and negative values. This isn't like probabilities, which can never be negative. Your z-scores can be negative if we're talking about a raw score that's less than the mean. So we're going to use some X value: you're going to tell me some raw score X, and I'm going to generate from it a signed distance. That signed distance is the z-score: positive or negative, and how far away. The z value is the distance between a selected value X and the population mean mu, divided by the population standard deviation sigma: z = (X - mu) / sigma. That's the formula down there. That's it; there are only three parts to it.

The first thing I notice is that if I choose an X value that's the same as my mean, if X and mu are equal, then when I subtract one from the other I end up with zero as the numerator of my z formula. Zero divided by anything, given a nonzero standard deviation, is still zero. So if I choose X as the same value as the population mean, the z-score is zero, which means it's not different from the mean.

The second thing I notice is what happens if I have, let's say, a relatively large difference on the top. Relative to what? Well, relative comes from the standard deviation. A relatively large difference on top could be a difference of a hundred. Let's say the mean for my particular data set is 100; I'm talking about IQ scores, where typically the mean is set at 100 and the standard deviation is, let's say, 15. Now suppose someone has an IQ of 300; I don't even know how that would be possible. 300 minus the mean of 100 gives me a numerator of 200. Is that a big difference or a small difference? I don't know until I divide it by the standard deviation; that tells me whether the difference is big or small. Whatever that numerator difference is, if I divide by a big number, then that large difference turns out to be relatively small. If I have that big number on top and I divide by a small number, a big number divided by a small number is still a big number. That thing I divide by, sigma on the bottom, tells me how spread out my data is.
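Here is a small Python sketch of the z-score formula just described, using the IQ numbers from the lecture (mean 100, standard deviation 15). The function name `z_score` and the extra sigma values of 100 and 400 are assumptions chosen only to show how the same difference on top shrinks as the spread grows.

```python
def z_score(x, mu, sigma):
    """Signed distance of x from the mean mu, measured in units of sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# IQ illustration from the lecture: mean 100, standard deviation 15.
print(z_score(100, 100, 15))  # x equals the mean, so the numerator is zero
print(z_score(300, 100, 15))  # a difference of 200 on a spread of 15

# Same 200-point difference, hypothetical sigmas chosen only to show the effect:
# the larger the spread, the smaller the resulting z-score.
for sigma in (15, 100, 400):
    print(sigma, round(z_score(300, 100, sigma), 2))
```

The numerator never changes in the loop; only the denominator does, which is the whole point about sigma telling you whether a difference is big or small.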

So if I have a difference on top, whatever it is, and I have a small standard deviation on the bottom, any number divided by a small standard deviation is going to give me a big z-score. In sum, the sigma across the bottom that we divide by tells us how dispersed our data is. If you have a small sigma, your data is not very spread out; most things are the same, or at least similar. Then if you have something that is very far away from the mean, it is not typical, because a small standard deviation defines what's typical. Therefore a big difference divided by a small standard deviation gives me a large z-score, indicating that the value is far away from the mean. Or, if everything is very spread out, our standard deviation is very large, and it is typical for things to be far away from the mean. With a large denominator, whatever difference I have on top, divided by the fact that everything is mostly spread out, is likely going to lead to smaller z-scores. So what we might have thought was a big difference in another distribution, perhaps one very narrow in terms of its standard deviation, makes for a big z-score there; a large difference where things are typically spread out is not a big deal, and occurs quite frequently in fact. These are properties of the z-score. The more you use the z-score formula, the clearer those properties become; that's why there are so many examples in Part Two.

We can see how the z-score works in this particular example. Weekly incomes of shift foremen in the glass industry follow the normal probability distribution with a mean of $1,000 and a standard deviation of $100. What is the z value for the income, let's call it X, of a foreman who earns $1,100 per week? So X is $1,100, and that's the first example we have right here. We put $1,100 in for X, $1,000 in for mu, and $100 in for the standard deviation. We do the arithmetic, boop-boop, and we've got a z value of one. That means $1,100 is one z unit, or one standard deviation, above the mean. Above the mean because it's positive; it's a signed distance. $900, on the other hand, is still one standard deviation away, but this one's negative. So we can have positive distance or negative distance from the mean. That's the basic z-score example.

And we come back to this drawing, which we talked about earlier with the empirical rule. Now look at that: we've got two scales here, a scale of X and a scale of z. The scale of X is in your raw units: your dollars, your hours, your people, whatever those units are. So $1,000 was the mean, $1,100 is over here, $900 is over here; those are all in X units. I talked about how important it is to convert to z units; that's the standard unit, and that's down here. Notice that zero is at the mean; that's a property of z. And we've got units down here: one standard deviation, two standard deviations. That one corresponds to that one, that two corresponds to that two, that negative three corresponds to that negative three. In other words, when we convert things to z units, all we're doing is talking about how far away something is from the mean in terms of standard deviations. If we go back to the foreman example, this is $1,000; we go over here and it's $1,100 because we've added one standard deviation. Well, why don't I just go ahead and say it's one, or it's negative one instead of $900?

This could be particularly useful if, let's say, there's another industry, like the steel industry, where we find a similar mean but a different standard deviation, or a different mean and a similar standard deviation, whatever the case may be. We can't expect steel workers to make the same amount of money as glass workers. However, within each industry we can still say that someone who earns $1,100 makes more money than this many, whatever the number ends up being, 84 percent, of his or her comrades. When we go to the steel industry, someone might make more money in terms of dollar amount, but relative to their peer group they might not make more money than as many people. Like we said, if someone is one standard deviation above the mean, they make more money than 84 percent of their peers. That's the reason the z unit down here makes for such a great comparison between data sets.

Z values are more specific than the empirical rule, and than those things I was just talking about in terms of the ones, the twos, and the threes. The mean value of X will always have a z-score of zero. Z-scores bigger than two are a bit suspicious, and z-scores bigger than three are a bit strange. The reason is how different they are, how far away they are from the mean. They're not impossible; they're just things that might need some investigation. Similarly for z-scores less than negative two or less than negative three: same stuff. Now, z is a signed distance. The sign indicates whether the value falls above or below the mean, but ultimately it's the absolute value, how far away from the mean, that we're investigating.

This is another example that extends from the problem we were just talking about, the glass factory foremen. I know that someone who makes $1,100 makes more than about 84 percent of their peer group. By the way, that number is one I just knew; you're going to figure out how to figure that out later. In this question we're trying to find the answer to: what's the probability, if I select one of these foremen at random, that this person makes between $1,000 and $1,100? That's a question we might ask. Hey, we asked it. How do we answer it?

Let's think back to when we were talking about discrete probabilities; this comes from Chapter 6. If we wanted to determine the likelihood that between one and three flights, inclusive, would be late, all we'd have to do is add up the heights of each of these bars. The random variable X across the axis indicates zero late flights, one late flight, two late flights, and the height of each bar is the probability. Because it's discrete, I know I just have to add up one, two, three things. I add up those three things and I know the probability of a certain range of late flights occurring. I can do that because it's discrete, because they're just rectangles; they have heights, and I just add up three things. With a continuous distribution I don't have individual bars whose heights I can measure. What's a boy to do? As it turns out, you just make those rectangles very, very, very narrow; so narrow, in fact, that together they trace out a smooth curve. It's called calculus, folks, and that's where this comes from. We're not going to get into the calculus, because I want to use it for the probability aspect, but suffice it to say that changing from discrete to continuous is a matter of integration.

We found the z-scores for these values in the previous example: I know that this is a zero and this is a one. So really I'm trying to find the area between a z-score of zero and a z-score of one. The standard normal probability distribution has a definite shape that does not change: its mean is zero and its standard deviation is one, and standard deviation determines the shape of a curve.
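The "very narrow rectangles" idea can be sketched in Python. This is an illustration, not how the table in the textbook was produced: it adds up midpoint rectangles under the standard normal curve between z = 0 and z = 1. The function names and the rectangle counts are assumptions picked for the demonstration.

```python
import math


def standard_normal_density(z):
    """Height of the standard normal curve at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def area_by_rectangles(z_low, z_high, n_rectangles):
    """Midpoint Riemann sum: add up n narrow rectangles between z_low and z_high."""
    width = (z_high - z_low) / n_rectangles
    total = 0.0
    for i in range(n_rectangles):
        midpoint = z_low + (i + 0.5) * width
        total += standard_normal_density(midpoint) * width  # height times width
    return total


# More, narrower rectangles settle toward the area between z = 0 and z = 1.
for n in (1, 4, 100, 10_000):
    print(n, round(area_by_rectangles(0.0, 1.0, n), 4))
```

With one fat rectangle the answer is rough; with thousands of thin ones it settles at about 0.3413, the area between the mean and one standard deviation.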

The shape of the curve does not change, so given any two points along the bottom of the curve I can find the area that fills up between those two points. Even with this weirdo, beautiful-looking curve up top I can do that, because that weirdo-looking curve is always shaped exactly like that. We use a table to find these answers. Well, we don't have to. We could, one, do the integration and the calculus and do it that way, and I don't have time for that. Or, two, we could do it in Excel, and I definitely have time for that. But the third way is the one those of you who have me for class will likely be seeing on an exam, and that's using this z table.

The way you use this z table is to first note that on the outside are the axes. The z values are over here: down the left we have the whole-number position as well as the decimal for the tenths spot, and across the top we have the hundredths spot. This is an abbreviated table, because there's no .09 over here, and it would go down below 1.8 there. To use this table, let's say we wanted to find the area between zero and one. This table always starts at zero; it always says that one of the boundaries is zero, and I'm going to the right to whatever z-score you decide to select. In this case I'm going over to 1.00. First, I find 1.0 on the left; there we go. Second, I find .00 across the top; zero is the hundredths spot. I find where those two things intersect, and right there, .3413 is the area between z equals zero and z equals one. In other words, the area in that range between $1,000 and $1,100 is 0.3413. That area is the probability; that area is the proportion. This means there's a 34.13 percent chance that a foreman selected at random will have a weekly income between $1,000 and $1,100.

Go back to thinking about the beginning; call back to the arrows. We started with the raw scores of $1,000 and $1,100. We converted those raw scores to z-scores: a z-score of zero and a z-score of one. We used those z-scores, on the outside of the table, to find the area under the curve. All of these decimals inside the table indicate areas under the curve, which is good, because they're under the axis; it's straight, not curved, but whatever, it's under there. All those things under there (underwear? yes, under there) are the areas; the outside indicates the position. So we take the raw scores, we convert them to z-scores, and we use the z-scores to find the area. That area in here is the probability and the proportion: 34.13 percent of the area under the curve is there, and because the total area under the curve is one, that's also the probability. I'll use probability and proportion to mean the same thing quite frequently. So that's going from raw scores to probabilities. Hey, we just got there.

If we didn't want to use the table, we could use Excel. There are a couple of different Excel functions. This one here is NORMDIST; I believe NORMSDIST might also work, or NORM.DIST. NORMDIST and NORM.DIST take the same arguments: I need an X value, I need a mean, and I need a standard deviation, while NORMSDIST works directly on a z-score. There's also a cumulative argument, where you choose cumulative or not cumulative; I forget what term Excel uses for the other option, but generally speaking you want cumulative anyway, because you're looking for areas. Keep in mind that the cumulative result is all the area to the left of X, not just the area between the mean and X like the table gives. So that's how Excel does it. But when we do it in class, there are pretty specific procedures that I want people to follow.
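For readers without the table or Excel, the same lookup can be sketched with Python's standard library. This is an assumption-laden stand-in, not the textbook's method: `cumulative_area` plays roughly the role of a cumulative spreadsheet lookup (area to the left of X), and `table_area` mimics what the z table lists (area between the mean and z). Both helper names are invented for this example.

```python
from statistics import NormalDist


def cumulative_area(x, mean, sd):
    """Area to the left of x under a normal curve with the given mean and sd."""
    return NormalDist(mu=mean, sigma=sd).cdf(x)


def table_area(z):
    """Area between the mean (z = 0) and z, which is what the z table lists."""
    return abs(NormalDist().cdf(z) - 0.5)


# Foreman example: mean $1,000, standard deviation $100.
left_of_1100 = cumulative_area(1100, 1000, 100)
left_of_1000 = cumulative_area(1000, 1000, 100)
between = left_of_1100 - left_of_1000

print(round(table_area(1.00), 4))  # 0.3413, the table entry for z = 1.00
print(round(between, 4))           # 0.3413, P($1,000 < X < $1,100)
print(round(left_of_1100, 4))      # 0.8413, everything below $1,100
```

The last figure is where the "about 84 percent" from the foreman comparison comes from: the 0.5 below the mean plus the 0.3413 between the mean and one standard deviation above it.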

The first step: always draw the picture. I want you to draw a normal curve. It doesn't have to be perfect, but it needs to have, let's say, four elements. First, it should have the curve. Second, an x-axis. Third, the line that marks the middle. Fourth, you write the mean in here. So make sure you have all those points. Start by drawing the curve. We've got to draw it every time, because there are so many different scenarios that we're going to be converting to standard normal probability distributions. I'll show you.

For the area under the curve we always use our z table; that's how you're going to have to do it for exams as well. The area the table gives is always the area between the mean and whatever z value you select; always between the mean and the z value, or the X value that you convert to z. However, there are so many different scenarios that you can get them confused pretty easily. We just saw an example where one value is the mean and the other value is some standard deviations away. That's fine; I know how to calculate that, and that's the number the table gives me. We might have to go to the left side of the axis and know that we're subtracting standard deviations, not adding them. Maybe we want to find the area just in this little tail, not the body. Maybe we need to get this sliver between two chunks; the table is only going to give me that amount as well as that amount, in other words the area captured between zero and 2.50 and the area between zero and 1.50, so in order to find this sliver we need to subtract them. Or, heck, we might have some on both sides of the mean, but we have to use a different z-score for each side. It's absolutely wrong, in this bottom example here, to say I could just look up 3.6 in the z table because 1.6 plus 2.0 is 3.6 and that's how many z units are there. Absolutely wrong; don't do that. We need to calculate each side on its own because of the shape of the curve.

Sometimes we'll even be given a proportion, and that proportion cuts off a certain portion of the tail, and I want to know what X value, what raw score, marks off that proportion. That's a circumstance where you start with the proportion and work backwards: from the proportion we find this area, that area we use to find a z, and that z we use to calculate a raw score. So sometimes we go from raw scores to probabilities and proportions, and sometimes we use proportions to determine raw scores. We can go in both directions.

If we want to go from a raw score to a probability or proportion, always draw that curve first. Write the process: write this thing out so you know what step you need to do, and draw the curve. That's so important. On every problem, just get into the habit of doing it; it'll keep you in the mindset of, I know what I need to do next, I know what I have, and so on. So write the process, draw the curve. Second, or third at this point, you extract the essential information from the problem. What do you have? Do you know X? Do you know mu? Do you know sigma? Label what you can on that drawing. If I know the mean is 43, I should put 43 right here, because I know it's in the middle. The other pieces of information, for example X, we should put relative to the mean: if X is 40 we put it on the left side; if X is 49 we put it on the right side. Left and right tell us direction. Z is a signed distance, and that sign is so important: if it's on the left side, z is going to be negative; if it's on the right side, z is going to be positive. It's important to label as much of that information on the drawing as you can.

The next step is to convert those raw scores to z-scores. You use the z-score formula: you put in X, you put in mu, you put in sigma, and now you've got your z-score. Label the z-score right where you put your raw score. It should be easy for the mean; it should be easier than easy for the mean, because the mean is always zero in terms of z units. Once you've got the z-scores on your drawing, you can use those z-scores to go to the table and find the areas that you need.
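The scenarios described above (a sliver between two z values, areas on both sides of the mean, and a tail) can be sketched in Python. Treat this as a hedged illustration: `table_area` is a made-up helper standing in for the z table, putting the 1.6 on the left of the mean and the 2.0 on the right is an assumption about the slide, and the tail at z = 2.00 is an example value chosen here.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def table_area(z):
    """Area between the mean and z (what the z table lists); always positive."""
    return abs(STANDARD_NORMAL.cdf(z) - 0.5)


# Sliver between z = 1.50 and z = 2.50: same side of the mean, so subtract.
sliver = table_area(2.50) - table_area(1.50)

# Span from z = -1.60 to z = 2.00: opposite sides of the mean, so add each side.
both_sides = table_area(-1.60) + table_area(2.00)

# Tail beyond z = 2.00: half the curve minus the table area.
tail = 0.5 - table_area(2.00)

# The tempting shortcut of looking up 1.6 + 2.0 = 3.6 gives a different number.
wrong_shortcut = table_area(3.6)

print(round(sliver, 4), round(both_sides, 4), round(tail, 4), round(wrong_shortcut, 4))
```

Comparing `both_sides` with `wrong_shortcut` shows why each side has to be looked up on its own: the shortcut cannot exceed 0.5, while the true two-sided area is well above it.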

Again, in some circumstances it may be as easy as finding one area. Maybe you recognize it's on the left side and you've got to handle that; maybe you've got to subtract something; maybe you've got something on both sides. Whatever the case, you've now got the z-scores, and the z-scores are the things you need to look up those areas. The last thing you do is take the areas you've looked up and do whatever computing you need to do, whether it's subtracting them like this one, adding them like this one, or maybe taking this sort of complement thing, 0.5 minus the area; not exactly a complement, but sort of. We use those areas to answer the probability or proportion question. Again, sometimes you have to add them, sometimes you have to subtract them, but once you get that far, most of the work is done.

Alternatively, you may be given a probability or proportion and asked to find the raw score that cuts it off. Always draw that first. Write the process, draw the curve; write the process, draw the curve. Extract the essential information and label what you can on the drawing. As it turns out, those steps are the same in almost all places. In this case, instead of labeling stuff across the axis down here, you're likely to draw one of these lines and write in the area that's there. Or if it says, let's say, cut off the top 5%, you're going to draw a line over here and say that 5% of the area is in the tail that you cut off, meaning 45% is between the mean and where you cut off the tail. Label what you can with your given information. Next, color in the area that represents the proportion in question, so you know what you're looking at on your drawing. Use that area to go backwards out of the z table.

For example, say we're trying to find 43 percent of the area in here, so in our drawing we've written .4332. We would find that number in the body of the table and then go outward to find the z value that's associated with that area. Once we find the z value that represents your area, please remember to attach the plus or minus sign. Once you have a z-score on there, you can use the z-score formula again: you put the z-score on the left-hand side, fill in mu and sigma on the right, and solve for X. Boom, you're done; now you've got the appropriate raw score. Sometimes you have to find one value, sometimes you have to find a range of values. Whatever the case, this is the process if you're working backwards from a probability or proportion to a raw score. That's why you draw the picture every single time.

What we've covered today are the properties of the normal distribution; how to use the z-score formula, knowing its essential elements; and this process of taking a raw score to a probability or proportion, or taking a proportion or probability back to a raw score. That's it for the first half. Come back for part two and we'll talk exclusively about examples, about how to use these, and about all those different shapes of bell curves that got covered here. So I'm not going anywhere, but I kind of am, and I guess I'll see you around. Bye.
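The backwards direction covered in this part, from an area to a z value to a raw score, can be sketched as follows. This is a rough illustration under stated assumptions: the function name is invented, the inverse lookup uses Python's `NormalDist.inv_cdf` in place of reading the table body, and the foreman numbers (mean $1,000, standard deviation $100) are reused here only to have something to plug in.

```python
from statistics import NormalDist


def raw_score_from_table_area(area_from_mean, mu, sigma, above_mean=True):
    """Go from an area between the mean and a cutoff back to (z, raw score X)."""
    if not 0.0 <= area_from_mean < 0.5:
        raise ValueError("area between the mean and the cutoff must be in [0, 0.5)")
    # Reverse table lookup: find the z whose area from the mean matches.
    z = NormalDist().inv_cdf(0.5 + area_from_mean)
    if not above_mean:
        z = -z  # attach the sign: left of the mean means a negative z
    # Solve z = (X - mu) / sigma for X.
    return z, mu + z * sigma


# Area of .4332 between the mean and the cutoff, on the right side.
z, x = raw_score_from_table_area(0.4332, mu=1000, sigma=100)
print(round(z, 2), round(x))

# Cutting off the top 5% leaves 45% between the mean and the cutoff.
z_top5, x_top5 = raw_score_from_table_area(0.45, mu=1000, sigma=100)
print(round(z_top5, 3), round(x_top5, 1))
```

The `above_mean` flag is the code version of remembering to attach the plus or minus sign before solving for X.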