© 2024 Khan Academy
Mosaic plots and segmented bar charts
We can display categorical data from a two-way table with a segmented bar chart or a mosaic plot. Created by Sal Khan.
Want to join the conversation?
- Are segmented bar charts used a lot?
- They are mainly used when you have to represent 2 data points at once, like adults and antibodies.
- How would you be able to show the number of people tested in an age group having no knowledge about the other graphs/tables? For example, if we didn't have the tables, how would we be able to find out how many total people were tested and how many people were tested in the children group?
Video transcript
- [Instructor] Let's say we're looking at some type of disease and we want to see if
there's any relationship between people having
antibodies for that disease and whether they are
adults, children or infants. And if you don't know what antibodies are, these are things that your
immune system keeps around so it's very easy to
recognize future infection. But you don't have to
worry too much about that for this video. In this video, we're just
trying to think about how we can visualize data to understand if there's a relationship
between having antibodies and the age of the individual. So let's say we go out and
collect a bunch of data. So we test 120 adults and 114 have antibodies, six don't. We test 60 children, 54 have antibodies, six don't. We test 20 infants, and then eight have antibodies and 12 don't. So we can just look at this data, but this really still doesn't give us a visual representation
of what's going on. One step we can take,
which still doesn't give us a fully visual representation, is to just think about percentages that might help us think
about the likelihood of having antibodies. So if we calculate the percentages, we might see something like this. For example, 114 over 120 is 95%, so 95% of the adults have antibodies. And then the number that
don't have antibodies, this six right over here, that is 5%, six over 120. And you can do that for
each of the categories. 54 over 60 is 90%. While six over 60, you can do that math in your head, is 10%. And we can do the same
thing for the infants. Eight out of 20 is 40% while 12 out of 20 is 60%. So that helps us a little bit. It helps us think about, well, what's the percentage of adults that have the antibody
or children or infants? But if we really want to visualize it, we can look at two different
types of visualizations. One we can call a segmented bar chart, and I will show a segmented bar chart for this data right over here. Now in a segmented bar chart, we have a bar for each category, and here we're making adults,
children and infants the different categories
because we're thinking maybe that has something
to do with the likelihood of having antibodies. And then for each bar, for
example, this adult bar, you can see the percentage
that have the antibodies and the percentage that don't. So 95% of the adult bar is filled in blue. That's for yes, they have the antibodies. And 5% is filled in red. And then for children, you can see that 90% is filled in blue and 10% is filled in red because 10% don't have the antibodies. And then for infants, you can see that 40% is filled in blue and 60% don't have the antibodies. Now, this by itself is pretty useful to visually see, alright,
it looks like adults are somewhat more likely
to have the antibodies than children, and children
are far more likely to have the antibodies than infants. And so it looks like
this idea of making a bar for each of adults, children or infants was a good way to start to understand the likelihood of having antibodies. You could have done it other ways. You could have had a
bar for have antibodies and another bar for not have antibodies. And then you could have
segmented the bar chart by whether they are adults,
children, or infants. But if you did that, that would have been trying to understand whether having antibodies
or not having antibodies is predictive of whether you're
an adult, child or infant while this one makes, at least to me, a little bit more sense that
whether you're an adult, child or infant might be predictive of whether or not you have antibodies. But there is some information lost in this segmented bar chart. For example, we have lost the fact that we have tested a lot more adults than children and far more children than infants. So to incorporate that information back into a visualization, essentially showing how many people you sampled in each of these categories, we can generate what's
known as a mosaic plot. So this is a mosaic plot right over here. And one way to think about it is we have just adjusted the width of each of these bars based on how many people we tested. So we tested 200 people. And so you can view this
width right over here as being 200. And you can see that we tested 120 adults. So the width of this first bar, I guess you could say, although now we're dealing
with a mosaic plot, this width right over here would be 60% of this entire width, which you can see that it is. And then the children are 60
of the 200 that we tested. And so this width right over here would be 60 over the entire 200, or it would be
30% of the entire width. And we can see that we tested the fewest number of infants. And so this 20 right over here represents the 20 infants we tested, which is 10% of the entire width. This mosaic plot conveys more information: it conveys all the same information that our segmented bar chart does, but it also gives us
a sense that we tested more adults than children and far more children than infants. And it's also easy to then
look at it and say, okay, of the total number of people
who don't have the antibodies, so that would be the red
area right over here, even though we tested the
fewest number of infants, it looks like infants
make up a large chunk of the total number of folks who don't have antibodies. So I'll leave you there. The whole point of this
video is to just understand why a segmented bar chart or a mosaic plot will be useful. In future videos, we'll get more practice analyzing them.
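If you want to check the numbers from the video yourself, here is a minimal Python sketch of the arithmetic behind both charts. It only uses the counts given in the transcript. The names `counts`, `summarize`, `pct_yes`, `pct_no` and `width_pct` are made up for this illustration and are not from the video. The segment heights within each bar are the row percentages, and the mosaic plot widths are each group's share of all 200 people tested.

```python
# Counts from the transcript: (have antibodies, don't have antibodies).
# The variable and function names are illustrative assumptions.
counts = {
    "adults": (114, 6),
    "children": (54, 6),
    "infants": (8, 12),
}


def summarize(counts):
    """Return, per group, the segment percentages and the mosaic width."""
    total = sum(yes + no for yes, no in counts.values())
    rows = {}
    for group, (yes, no) in counts.items():
        n = yes + no  # number of people tested in this group
        rows[group] = {
            # Heights of the two segments in the group's bar.
            "pct_yes": 100 * yes / n,
            "pct_no": 100 * no / n,
            # Width of the group's bar in the mosaic plot.
            "width_pct": 100 * n / total,
        }
    return rows


total_tested = sum(yes + no for yes, no in counts.values())
summary = summarize(counts)

# Share of all people without antibodies who are infants
# (the red area of the mosaic plot that sits in the infant column).
total_no = sum(no for _, no in counts.values())
infant_share_of_no = 100 * counts["infants"][1] / total_no

for group, row in summary.items():
    print(
        f"{group}: {row['pct_yes']:.0f}% yes, {row['pct_no']:.0f}% no, "
        f"width {row['width_pct']:.0f}% of the plot"
    )
print(f"infants are {infant_share_of_no:.0f}% of those without antibodies")
```

The last line puts a number on the "large chunk" mentioned above: infants are only 20 of the 200 people tested, but 12 of the 24 people without antibodies.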