41 best Statistics jokes images on Pinterest | Statistics, Math humor and Psychology
Likewise, if you were looking for your average rising time and woke up at 11pm, 12am, 2am, and 3am, you would use -1, 0, 2, and 3. Don't conflate inability to work with time properly with the confusion surrounding statistics. A better way of dealing with this would be to measure the amount of time asleep and awake: going to bed at 1am after 48 hours of being awake is not the same as going to bed at 1am after being awake for 16 hours.
By Chas. Owens on 24 Mar
Actually, Chas. Owens, Peter does know what he's talking about. It is a statistical problem, and there are whole books on the subject.
I'm having a hard time finding a good introductory source on this, but here's the Wikipedia article.
By Dave on 24 Mar
To Chas Owens: I was using that as an example of how you can go wrong with the mean.
Whether you say the problem is about "working with the data properly" or "confusion about statistics" is, to me, irrelevant. Your method is one way of getting a right answer. It's even more complex than I thought.
Chas Owens' solution (which is what I would have suggested) will, I think, work in most cases. It bogs down when the angles (that is, times) are uniformly distributed. But if people generally go to bed around the same time (which seems likely), then I think the methods are roughly equivalent, but right now I don't have time to check.
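The two approaches discussed in this thread can be compared with a short sketch. It is not taken from any of the comments: the function names, the noon cutoff used for shifting, and the tolerance are assumptions made for illustration. It treats each clock time as an angle on a 24-hour circle, averages the sines and cosines, and converts the resulting direction back to an hour.

```python
import math

def naive_mean(hours):
    """Plain arithmetic mean of clock hours (the approach that goes wrong)."""
    return sum(hours) / len(hours)

def shifted_mean(hours):
    """Offset approach: count hours after noon as negative (23 -> -1).

    The noon cutoff is an assumption; it only works when the times
    cluster around midnight.
    """
    shifted = [h - 24 if h > 12 else h for h in hours]
    return (sum(shifted) / len(shifted)) % 24

def circular_mean_hour(hours, tol=1e-9):
    """Circular mean: average the unit vectors for each time of day.

    Returns None when the vectors cancel out (times spread evenly
    around the clock), because no direction is preferred.
    """
    angles = [2 * math.pi * h / 24 for h in hours]
    s = sum(math.sin(a) for a in angles) / len(angles)
    c = sum(math.cos(a) for a in angles) / len(angles)
    if math.hypot(s, c) < tol:
        return None
    return (math.atan2(s, c) * 24 / (2 * math.pi)) % 24

bedtimes = [23, 0, 2, 3]  # 11pm, 12am, 2am, 3am

naive = naive_mean(bedtimes)            # 7.0, i.e. 7am, which is nonsense
shifted = shifted_mean(bedtimes)        # 1.0, i.e. 1am
circular = circular_mean_hour(bedtimes) # approximately 1.0 as well

print(naive, shifted, circular)
print(circular_mean_hour([0, 6, 12, 18]))  # evenly spread: None
```

For times clustered around midnight the shifted and circular results agree, which matches the guess above that the methods are roughly equivalent in that case. The evenly spread input shows where the angle-based method has nothing useful to report.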
By Peter on 25 Mar
"The mean is a tricky thing. It's not nearly as informative as you might hope. A very typical example of what's wrong with it is an old joke:"
The basic reasoning is that the probabilistic expectation for samples is that they'll be narrower than the full population. Using the "N-1" denominator is a compensation for that.
By bill on 25 Mar
Much appreciated, thank you! I've never studied the formal derivations of the sample standard deviation, so I may well be wrong. My father, when he taught me this stuff, told me that it was purely an empirical thing.
The fact that the sample is likely to be narrow should be sort of clear: That's what narrows the standard deviation. So the fact that some correction will help describe that should be fairly obvious.
When you do a linear regression, the denominator in the unbiased estimator of the variance is N-p, where p is the number of parameters being estimated. Estimating the mean in the manner described by MarkCC can be viewed as a special case of linear regression where there is only one parameter being estimated, hence N-1. You can calculate it if you've got the mathematical chops. I don't have the chops, but I get the same formula from Bayesian posterior expectations. Here's the intuitive explanation, which I got from David MacKay's book.
When you estimate the distribution mean using the sample mean, the estimated mean minimizes the sum of the squares of the residuals (SSR). Any other estimate of the distribution mean would give a larger SSR, and in particular, the true distribution mean would give a larger SSR. The denominator N-p exactly counteracts, in expectation, the shrinkage of the SSR. This is what people mean when they say that you use up a degree of freedom estimating the parameters.
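A small simulation can make the shrinkage argument concrete. The sketch below is not from the thread. It assumes normally distributed data with a true mean of 0 and a true standard deviation of 1, samples of size 5, and a fixed random seed, all chosen only for illustration. It checks that the SSR around the sample mean is never larger than the SSR around the true mean, and compares the N and N-1 denominators averaged over many samples.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable (an arbitrary choice)

TRUE_MEAN = 0.0   # assumed population mean
TRUE_SD = 1.0     # assumed population SD, so the true variance is 1.0
N = 5             # sample size
TRIALS = 20000    # number of simulated samples

def ssr(sample, center):
    """Sum of squared residuals of the sample around a chosen center."""
    return sum((x - center) ** 2 for x in sample)

divide_by_n_total = 0.0
divide_by_n_minus_1_total = 0.0
sample_mean_always_minimizes = True

for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    sample_mean = sum(sample) / N
    ssr_sample = ssr(sample, sample_mean)
    ssr_true = ssr(sample, TRUE_MEAN)
    # The sample mean minimizes the SSR, so the true mean cannot do better.
    if ssr_true < ssr_sample - 1e-9:
        sample_mean_always_minimizes = False
    divide_by_n_total += ssr_sample / N
    divide_by_n_minus_1_total += ssr_sample / (N - 1)

divide_by_n_avg = divide_by_n_total / TRIALS
divide_by_n_minus_1_avg = divide_by_n_minus_1_total / TRIALS

print("SSR/N averaged over samples:    ", divide_by_n_avg)
print("SSR/(N-1) averaged over samples:", divide_by_n_minus_1_avg)
print("sample mean always minimized SSR:", sample_mean_always_minimizes)
```

Under these assumptions the N denominator should average out near (N-1)/N = 0.8 of the true variance, while the N-1 denominator should land near 1.0.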
By Canuckistani on 25 Mar
edit: I misspoke, I should have said "Don't conflate inability to work with time properly with the confusion surrounding what the mean, average, and other statistical functions mean." Another example of meaningless input could be taking the mean of "running shoes", "socks", "slacks", "underwear", and "shirt" to try to get the average price of the clothes a person is wearing.
In this case it is obvious that the understanding of the data is at fault, not the understanding of the statistical function being used, because the inputs don't look like numbers the way times do. Like the time problem, this is not an issue of the statistical functions producing data that is not very enlightening about the population (as is the case with the salaries from the article), but rather a problem of how to represent the data in such a way that the functions can operate on them.
The time issue would be a wonderful thing to bring up if the article were about the GIGO rule, but this article is about what the various statistical functions mean and how to use them to get information about a population.
By Chas. Owens on 25 Mar
Chas, Peter's example was entirely appropriate for the article. If you read the article carefully you will see that MCC uses the example of Bill Gates walking into a homeless shelter to illustrate misuse of mean values.
Peter's bedtime example leads to similarly humorous results. He knew full well that it was a silly way to compute a mean. The average of "running shoes" and "shirts" is, as you point out, obvious nonsense. The average time of going to bed is perfectly meaningful, and, as pointed out in another comment, your solution (which, I admit, was mine too) wasn't even fully correct.
Is this a data problem? But it's also a problem with understanding what the mean is. Bill Gates walking into a homeless shelter skews the mean and increases the standard deviation, but the mean is still correct.
Trying to take the mean of 11pm and 1am produces garbage if you naively average 23 and 1, producing an average of 12. It isn't a matter of the result having a large standard deviation; the result is pure garbage because the inputs were pure garbage, as bad as my example with the clothes. The data just doesn't look like garbage because they are numbers. The example from the article shows how people abuse valid results; the time example is not a valid result.
By Chas. Owens on 25 Mar
Peter: The clothes example suffers from the same problem as the time problem. It is entirely possible to find the mean cost of the clothing a person is wearing, but first you must convert the names of the items of apparel to their monetary value. Both problems have nothing to do with the mean, except that when the mean is presented with garbage its output is also garbage.
By Chas. Owens on 25 Mar
Why is the sum of squares used in the standard deviation instead of the absolute value?
I have wondered about this for a long time, but no one has been able to give me a good answer.
A measure called the standard deviation from the mean will be bigger when the numbers are more spread out. Lots of results will cluster within 1 standard deviation (SD), and most will be within 2 standard deviations. From here, it's a hop and a skip to another calculation based on the mean that you often come across in health studies. It's a way to standardize the differences in means (average results) called the standardized mean difference (SMD).
The SMD needs to be used when outcomes have been measured in similar, but different, ways in groups that researchers are comparing. For example, there are several scales used to measure fatigue in people with cancer. When researchers wanted to find out whether exercise reduces or increases fatigue for people with cancer, the clinical trials of exercise they found used different scales to measure fatigue. To get a perspective on the results of these trials, the SMD gave them the tool they needed to standardize the result from each trial.
Having one standard way of seeing whether fatigue went up or down meant the study results could be combined and compared. Exercise reduces fatigue in people with cancer.
There's a lot you can make sense of when you know what the means mean! The SMD is calculated by dividing the difference between the means of two groups by the standard deviation. Feel like testing your knowledge of the mean, median, and mode?
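The calculation can be sketched in a few lines. The text does not say which standard deviation goes in the denominator, so this sketch assumes the pooled standard deviation of the two groups, which is one common choice. The fatigue scores are made up for illustration and do not come from any trial.

```python
import math
import statistics

def standardized_mean_difference(group_a, group_b):
    """Difference in group means divided by the pooled standard deviation.

    Using the pooled SD is an assumption of this sketch; other
    denominators (for example, the control group's SD) are also used.
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (N-1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical fatigue scores on a 0-10 scale (lower means less fatigue).
exercise_0_to_10 = [3, 4, 5, 4, 4]
control_0_to_10 = [6, 5, 7, 6, 6]

# The same hypothetical results expressed on a 0-100 scale.
exercise_0_to_100 = [x * 10 for x in exercise_0_to_10]
control_0_to_100 = [x * 10 for x in control_0_to_10]

smd_small_scale = standardized_mean_difference(exercise_0_to_10, control_0_to_10)
smd_large_scale = standardized_mean_difference(exercise_0_to_100, control_0_to_100)

print(smd_small_scale, smd_large_scale)
```

Both calls give the same value even though the raw differences in means are 2 points and 20 points. That scale independence is what lets results measured on different fatigue scales be combined.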
The mode is the number in a set that occurs the most often:
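As a minimal illustration, Python's standard `statistics` module can find the mode directly. The numbers below are made up for the example.

```python
import statistics

values = [1, 2, 2, 3, 2, 4]          # hypothetical data
most_common = statistics.mode(values)  # 2 appears three times

# When several values tie for most frequent, multimode returns all of them.
tied = statistics.multimode([1, 1, 2, 2, 3])

print(most_common)
print(tied)
```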