Making Sense of Statistics
Sidney Tyrrell, Coventry University

Making sense of statistical concepts

Mean and median
Probability
Standard deviation
Expected values
Data analysis
Sampling
The value of charts
Tricky tables
Hypothesis testing

How would you describe this collection of stones?

Measures: averages
You have just looked at some of these.
What about the mode? You need it if you want to describe colour.
All are summary statistics.
What don't they tell you about the stones?

Measures: standard deviation
The challenge: to calculate the standard deviation of 51 53 51 53 in under 3 minutes, knowing nothing except that standard deviation measures spread.

Measures spread
0 2 0 2
Deviation from what? The mean, which is 1.
What is the deviation of each of those numbers from the mean?

Now look at 1 3 1 3
Standard deviation measures spread.
Are the numbers more or less spread out, or do they have the same spread?
What's the standard deviation?

1 3 1 3 has a standard deviation of 1.
Add 50 to give 51 53 51 53. What's the standard deviation?

Double the numbers: 2 6 2 6
Standard deviation measures spread.
Are the numbers more or less spread out, or do they have the same spread?
What's the standard deviation, at a guess?

What about 6 6 6 6?
Standard deviation measures spread.
What's the standard deviation?
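The questions above can be checked with a short Python sketch. It uses pstdev from the standard library, which divides by the number of values, the same convention that gives 1 3 1 3 a standard deviation of 1. The variable names are only illustrative.

```python
from statistics import pstdev

base = [1, 3, 1, 3]
shifted = [x + 50 for x in base]   # 51 53 51 53: same spread, moved along
doubled = [2 * x for x in base]    # 2 6 2 6: twice as spread out
constant = [6, 6, 6, 6]            # no spread at all

for label, data in [("base", base), ("add 50", shifted),
                    ("double", doubled), ("all equal", constant)]:
    print(label, data, pstdev(data))
```

Adding a constant leaves the standard deviation unchanged, doubling the numbers doubles it, and identical numbers give 0.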

What next?
Start with easy numbers: 2, 3, 7.
Attach a meaning and units: time spent brushing one's teeth, in minutes. (You can think of a better example.)

Measures spread. Deviation from what?
The mean, which is 4 minutes.
What is the deviation of each of those numbers from the mean?

Deviations: 2, 3, 7 each minus 4 gives -2, -1, 3.
Standard deviation implies an average.
There's a problem! The average deviation is 0.
Coincidence? Good teaching point.

To get round this we square the deviations, which gives 4, 1, 9.
The sum is 14 and the mean is about 4.67.
Hold it: the units are square minutes. How does this measure a spread?
Take the square root, which gives a standard deviation of about 2.16 minutes.

Use a spreadsheet to see what happens if you change a number.
2, 3, 10 minutes, for instance, gives a standard deviation of 3.56 minutes.
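The steps above (deviations, squares, mean of the squares, square root) can be sketched in Python. This is a minimal illustration, not a replacement for the spreadsheet; the function name is made up for the example, and it divides by the number of values as the slides do. Spreadsheet functions that divide by n - 1 will give slightly larger answers.

```python
from math import sqrt

def standard_deviation_steps(values):
    """Return the mean, deviations, squared deviations and standard deviation."""
    mean = sum(values) / len(values)
    deviations = [v - mean for v in values]          # these always sum to 0
    squared = [d ** 2 for d in deviations]           # units are now squared
    mean_square = sum(squared) / len(values)         # divide by n, as in the slides
    return mean, deviations, squared, sqrt(mean_square)

for minutes in ([2, 3, 7], [2, 3, 10]):
    mean, deviations, squared, sd = standard_deviation_steps(minutes)
    print(minutes, "mean", mean, "deviations", deviations,
          "squares", squared, "sd", round(sd, 2))
```

Changing the 7 to a 10 moves the mean from 4 to 5 and raises the standard deviation from about 2.16 to about 3.56 minutes.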

One value can have a large effect.

Data analysis is about letting the data tell its story
Use real data that students can identify with, e.g. CensusAtSchool: http://www.censusatschool.org.uk/
It is closed for the time being but the resources are still there. See examples in my resources.

Data Stories
http://askten.co.uk/newsletter
A weekly bulletin of 10 interesting facts.

Mark Carney to examine soaring credit card debt. Credit card debt is rising at its fastest rate in more than a decade, says the Bank of England, increasing by 9.3% in the year to February. Households slapped £600m on their cards last month and now owe £67.3bn.
How do they know?

Bankers 'to blame for weak UK productivity'. The Office for National Statistics says that just 5 sectors are responsible for two-thirds of the decline in productivity growth: bankers, telecoms companies, energy producers, management consultants, and legal and accounting services. (The Independent)
How do they know?

More real data
Full Fact (https://fullfact.org/) explores the facts behind news stories.
Significance magazine: http://www.statslife.org.uk/significance
Mori Polls: www.ipsos-mori.com/
Neighbourhood Statistics (brill!)

Data should prompt questions
What would be interesting to know?
Off-syllabus answers should be permissible.

First of all, LOOK at it!
Anscombe's data sets
Looked? Now chart.
Why chart?

To help us understand the data. Sometimes a chart tells us far more than just computations.
Frank Anscombe produced 4 data sets that were all described by the same linear model.
Was this what you expected?

[Figure: scatter plots of Anscombe's four data sets]

Anscombe had a moral: whatever the figures say, the model may not be appropriate.
Here it is only appropriate in the first case.
That's why we check for a pattern in the residuals.
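Anscombe's point can be reproduced with a short Python sketch. The values below are the commonly published figures for Anscombe's four data sets (they are not taken from these slides), and fit_line is an illustrative helper that computes an ordinary least squares line by hand.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (xs, ys) in quartet.items():
    slope, intercept = fit_line(xs, ys)
    # Residuals: what is left over after the line has been fitted.
    residuals = [round(y - (intercept + slope * x), 2) for x, y in zip(xs, ys)]
    print(name, "slope", round(slope, 2), "intercept", round(intercept, 2))
    print("   residuals", residuals)
```

All four fits come out at roughly y = 3 + 0.5x, yet the residuals for sets II to IV show a clear pattern, which is the cue that the straight line is not the right model.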

Regression: an aside
Do they understand the equation of a straight line? Even if they did have an A* at GCSE.

The value of charts
Graphical excellence consists of complex ideas communicated with clarity, precision, and efficiency.
It gives the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.
It tells the truth.
Edward Tufte

What chart illustrates all this?
Found in the newspaper. What is it?
Consider for a moment just how much information is here.

And a few more
Napoleon's retreat from Moscow: Minard's chart
Dr. John Snow's map of cholera deaths, 1854. The small lines represent deaths and were centred round the Broad Street pump.
Florence Nightingale: Cockscomb chart
The Challenger Disaster

28 January 1986

Time for a break
Look at your bottle of water. Find the e symbol. What does this mean?

The E (e) mark
Weights and Measures (Packaged Goods) Regulations 2006. These Directives set out three rules with which packers must comply:
First rule: the actual contents of the packages should not be less, on average, than the nominal quantity;
Rule 2: the proportion of packages which are short of the stated quantity by a defined amount (the tolerable negative error or TNE) should be less than a specified level; and
Rule 3: no package should be short by more than twice the TNE.

The Tolerable Negative Error
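As a rough sketch, the three rules can be written as a check on a sample of measured contents. Everything numeric here is an assumption for illustration: the 500 ml nominal quantity, the 15 ml TNE, the 2.5% level for rule 2 and the sample values are not taken from the slides, and the function name is made up. The real TNE and level should be read from the regulations.

```python
def check_three_rules(contents, nominal, tne, max_short_fraction):
    """Return (rule 1, rule 2, rule 3) as booleans for a sample of packages."""
    n = len(contents)
    # Rule 1: on average, contents are not less than the nominal quantity.
    average_ok = sum(contents) / n >= nominal
    # Rule 2: the proportion short by more than the TNE is below the level.
    short = [c for c in contents if c < nominal - tne]
    proportion_ok = len(short) / n < max_short_fraction
    # Rule 3: no package is short by more than twice the TNE.
    none_very_short = all(c >= nominal - 2 * tne for c in contents)
    return average_ok, proportion_ok, none_very_short

# Hypothetical samples of bottle contents in ml.
good_batch = [502, 498, 505, 501, 499, 503]
bad_batch = [502, 468, 505, 510, 509, 508]

for batch in (good_batch, bad_batch):
    print(batch, check_three_rules(batch, nominal=500, tne=15,
                                   max_short_fraction=0.025))
```

The second batch passes rule 1 because the average is above 500 ml, but one very short bottle breaks rules 2 and 3. An average alone does not tell the whole story.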

How does the production manager cope? Using sampling.

What's the general idea?
How much of the picture are you getting? Look at the jigsaw pieces.

The jigsaw
How much did you get? Compare the jigsaws.

What are the problems with sampling?
Important for the pollsters!
The best sample is a simple random sample, which means that every item has an equal chance of being chosen and so does every subset of your items of a given size.

A good example
20 people in a room: 10 men and 10 women. We want a random sample of 2 people.
A simple random sample of 2 people would include the possibility of 2 men.
If we randomly select 1 man from the 10 men and 1 woman from the 10 women, each member of the population would have an equal chance of being chosen. But we would never get a sample of 2 men.

Random numbers
13962 70992 65172 28053 02190 83634 66012 70305 66761 88344 43905 46941
72300 11641 43548 30455 07686 31840 03261 89139 00504 48658 38051 59408
Can also use for simulation.
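The room of 20 people is small enough to list every possible sample. The Python sketch below does that for both schemes; the labels M1..M10 and W1..W10 and the helper names are made up for the illustration.

```python
from fractions import Fraction
from itertools import combinations, product

men = [f"M{i}" for i in range(1, 11)]
women = [f"W{i}" for i in range(1, 11)]
people = men + women

# Simple random sample of 2: every pair of people is equally likely.
simple_samples = list(combinations(people, 2))
# One man and one woman: every (man, woman) pair is equally likely.
one_of_each_samples = list(product(men, women))

def chance_of_selection(person, samples):
    """Chance that a given person ends up in the sample."""
    return Fraction(sum(person in s for s in samples), len(samples))

def count_two_men(samples):
    """Number of possible samples made up of two men."""
    return sum(all(p in men for p in s) for s in samples)

for name, samples in [("simple random", simple_samples),
                      ("one man, one woman", one_of_each_samples)]:
    print(name, "| possible samples:", len(samples),
          "| chance M1 is chosen:", chance_of_selection("M1", samples),
          "| samples of 2 men:", count_two_men(samples))
```

In both schemes each person has a 1 in 10 chance of being chosen, but only the simple random sample gives every pair a chance: 45 of its 190 possible samples are two men, against none when one man and one woman are picked.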

Different types of samples: simple, systematic, stratified, cluster.
Will you get the same answer each time? Even if you use the same technique but start in a different place?
Unitown exercise.

Unitown
Spreadsheet of responses for each address, enabling you to carry out a survey of the town using different methods.

I have a hypothesis
I need to collect some data
And then evaluate the evidence

Here goes: choose and write down one of the following numbers:
1 2 3 4

Collect the data
The hypothesis was: more people choose 3 than any other number.

What do you think? What does the evidence suggest? How do you make up your mind?

Here's one collected earlier.
What does the evidence suggest? How do you make up your mind?

What about this one?
What does the evidence suggest? Can you make up your mind?

This illustrates the basic steps:
State the hypothesis
Collect the data
Evaluate the evidence
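One way to evaluate the evidence for the "choose a number" exercise is sketched below in Python. It assumes that, if people had no preference, each of the four numbers would be chosen with probability 1/4, and asks how surprising the tally for 3 would then be. The class size and tally are hypothetical, not data from the slides, and upper_tail is an illustrative helper.

```python
from math import comb

def upper_tail(k, n, p):
    """Chance of k or more successes in n tries, each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_people = 40       # hypothetical number of people in the room
chose_three = 18    # hypothetical number who wrote down 3

# If nobody favoured any number, about 10 of the 40 would choose 3.
p_value = upper_tail(chose_three, n_people, 0.25)
print(f"Chance of {chose_three} or more threes by luck alone: {p_value:.4f}")
```

A small probability suggests the tally is hard to explain by luck alone; a tally close to 10 would not let you make up your mind.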

The hypothesis test
Uses the evidence to decide whether the null hypothesis can be accepted or not.
A good analogy is an English court of law: the defendant is assumed not guilty until there is sufficient evidence to find otherwise.
And there can be mistakes!
Which is why statisticians are picky: say "do not accept" rather than "reject", and "do not reject" rather than "accept".

Different tests
A good example for stats tests is clinical trials data for a weight reduction drug.
See Excel 2013 stats tests.xls for the data and step by step instructions for the tests.

Significance level
A practical example: ESP (extra sensory perception) using playing cards.

5% significance level (95% confidence)
Look at the squares. Any idea how many? 126,000.
The yellow bit of paper is 5%.

The Salk Vaccine Trial
A problem to be solved: a challenge.

An experiment to test the efficacy of a new polio vaccine in the U.S.A.
Biggest public health experiment ever: the 1954 field trial of the Salk Poliomyelitis Vaccine.

The Salk Vaccine Trial: test the efficacy of a new polio vaccine
What would you do?
Resources almost boundless.
Difficult to predict the next outbreak, where and when.
How will you set up a control group?
How large will your sample be?
Are there any ethical issues?

What happened
Interpreting the results. Using tables.
But tables can be tricky. 2 stories might help.
The first: the Titanic survivors.

Tricky Tables: Titanic data

What % of survivors were 1st class?

                 Class
Survived    1st     2nd     3rd     Total
No          15%     19%     66%     100%
Yes         43%     26%     31%     100%
Total       25%     21%     54%     100%

Tricky Tables: Titanic data
What % of 1st class were survivors?

                 Class
Survived    1st     2nd     3rd     Total
No          40%     58%     81%     66%
Yes         60%     42%     19%     34%
Total       100%    100%    100%    100%

Not the same question.
Be clear in your own mind what you want to compare.

Characteristics of passenger classes? Characteristics of the survivor groups?

Do men exercise more than women?
Put each group in a separate room in your mind. You want the %s to add to 100% for that room.
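The "separate rooms" idea can be sketched in Python. The counts below are hypothetical: they are not from the slides, and were chosen only so that the percentages come out the same as in the men and women tables. The two functions are illustrative helpers, one per choice of room.

```python
# Hypothetical counts of people by sex and activity level.
counts = {
    "men":   {"very active": 255, "not very active": 85},
    "women": {"very active": 40,  "not very active": 40},
}

def percent_within_sex(table):
    """Rooms are men and women: what % of each sex is at each activity level?"""
    result = {}
    for sex, row in table.items():
        room_total = sum(row.values())
        result[sex] = {level: round(100 * n / room_total)
                       for level, n in row.items()}
    return result

def percent_within_activity(table):
    """Rooms are activity levels: what % of each level are men, or women?"""
    levels = list(next(iter(table.values())))
    result = {}
    for level in levels:
        room_total = sum(row[level] for row in table.values())
        result[level] = {sex: round(100 * row[level] / room_total)
                         for sex, row in table.items()}
    return result

print(percent_within_sex(counts))       # each sex adds to 100%
print(percent_within_activity(counts))  # each activity level adds to 100%
```

The same counts give 75% when the room is "men" and 86% when the room is "very active", so the question has to fix the room before the percentage means anything.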

Tables are tricky

           Very active   Not very active   Total
Men        75%           25%               100%
Women      50%           50%               100%

           Very active   Not very active
Men        86%           68%
Women      14%           32%
Total      100%          100%

What % of men are very active?
What % of the very active are men?
This is not the same question. Be clear in your own mind what you want to compare.
Characteristics of men v women? Characteristics of the activity level groupings?

Look at the tables again

                  Men     Women
Very active       75%     50%
Not very active   25%     50%
Total             100%    100%

                  Men     Women   Total
Very active       86%     14%     100%
Not very active   68%     32%     100%

Do men exercise more than women?

This brings us to probability, hypothesis testing, sampling, and the second (simple) story.

Probability
Corks and long term relative frequency
Expected values
Coincidence?

Expected values
Choose 1 of 3 boxes: £1, 10p, 1p (total 111p).
Expected value is 37p (111p ÷ 3).
What would you expect to get?

Coincidence? A card trick: see the end of the slide show.

Now look at the beans
There are 5 bags, each a sample of 30 beans. In statistics this is considered a large sample.
Suppose your hypothesis was that in the population of beans 30% were white.
Which sample or samples would support that hypothesis? The 10%, 20%, 30%, 40% or 50%?
Would those samples also support a different hypothesis?
This is a problem!
How can we improve our confidence? The answer is to increase the sample size.

Now for the normal distribution and the Central Limit Theorem
The knock out of statistics
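The beans question can be explored with a rough Python sketch. It assumes the usual normal approximation for a sample proportion and a 1.96 multiplier for about 95% of samples; both are assumptions added here, and the function names are illustrative.

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """Approximate range of sample proportions around p for samples of size n."""
    return z * sqrt(p * (1 - p) / n)

def compatible_samples(hypothesis, n, observed=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Observed proportions that fall within the margin around the hypothesis."""
    margin = margin_of_error(hypothesis, n)
    return [o for o in observed if abs(o - hypothesis) <= margin]

for n in (30, 300):
    for hypothesis in (0.3, 0.4):
        print(f"n={n}, hypothesis {hypothesis:.0%}: "
              f"margin {margin_of_error(hypothesis, n):.3f}, "
              f"compatible samples {compatible_samples(hypothesis, n)}")
```

With 30 beans the 30% and 40% bags are compatible with both a 30% and a 40% hypothesis. With 300 beans the margin shrinks to about a third of its size and the two hypotheses can be told apart.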

And here
We used DISCUS sampling, which is in the resources.

The End (but the card trick follows)
Thank you.

Coincidence? The card trick
Is it a coincidence? Algebra shows not!

Spread the 52 cards out face down.
Now to use your ESP:
Person 1 picks up 8 cards which they think are black. DO NOT LOOK AT THEM.
Person 2 picks up 7 cards which they think are red.
Person 3 picks up 6 cards which they think are black.
Person 4 picks up 5 cards which they think are red.

One person takes those cards, shuffles them and holds on to them. Leave the rest face down on the table.

The person with the shuffled cards turns one over and places it face up.
If it's black, someone else chooses what they think is a black card and places it next to it, face down.
If it's red, choose a red card.
Continue turning over the shuffled cards until you have 4 piles:
Red turned over + what you think are red, face down
Black turned over + what you think are black, face down

How many?
How many black cards do you think there are in the face down pile next to the black cards?
How many red cards do you think there are in the face down pile next to the red cards?
Count them!

Coincidence? A good exercise in algebra.
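The trick can be simulated in Python as a check on the algebra. The sketch assumes nobody has any ESP, so every "choice" is just the next card from a shuffled deck; the function name is made up for the illustration.

```python
import random

def play_card_trick(rng):
    """Play the trick once; return (blacks beside black, reds beside red)."""
    deck = ["black"] * 26 + ["red"] * 26
    rng.shuffle(deck)
    picked = deck[:26]    # the 8 + 7 + 6 + 5 cards picked up and shuffled
    table = deck[26:]     # the 26 cards left face down on the table

    beside_black, beside_red = [], []
    for turned_over in picked:
        guess = table.pop()             # a face down card chosen blind
        if turned_over == "black":
            beside_black.append(guess)
        else:
            beside_red.append(guess)
    return beside_black.count("black"), beside_red.count("red")

rng = random.Random(2024)
for _ in range(5):
    print(play_card_trick(rng))
```

The two counts agree in every game. If B black and R red cards are turned over, then B + R = 26, and the 26 black cards in the pack are split between the B turned over, the blacks beside the black pile and the blacks beside the red pile. Writing that out shows the blacks beside black must equal the reds beside red.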