<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Data Visualization | Selina Baldauf</title><link>https://selinabaldauf.com/tags/data-visualization/</link><atom:link href="https://selinabaldauf.com/tags/data-visualization/index.xml" rel="self" type="application/rss+xml"/><description>Data Visualization</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Fri, 02 May 2025 00:00:00 +0000</lastBuildDate><image><url>https://selinabaldauf.com/media/icon_hu11112823492780087531.png</url><title>Data Visualization</title><link>https://selinabaldauf.com/tags/data-visualization/</link></image><item><title>Clear figures - stronger stories</title><link>https://selinabaldauf.com/post/data-visualization-lecture/</link><pubDate>Fri, 02 May 2025 00:00:00 +0000</pubDate><guid>https://selinabaldauf.com/post/data-visualization-lecture/</guid><description>&lt;p>In today&amp;rsquo;s post, I want to share what I&amp;rsquo;ve learned about creating clear figures that effectively communicate scientific results. This is the written version of a lecture from my &lt;a href="https://selinazitrone.github.io/tools_and_tips/sessions/09_data_visualisation.html" target="_blank" rel="noopener">&amp;ldquo;Scientific Workflows&amp;rdquo; lecture series&lt;/a>.&lt;/p>
&lt;p>I began exploring data visualization because I often disliked my own figures but wasn&amp;rsquo;t sure why. Then, I encountered visuals, like the one from the New York Times below, that show Arctic sea ice extent&amp;mdash;simple yet elegant and powerful. I used to wonder why my figures never looked like that. So, I set out to learn more about the principles of effective figures and how to create them in a scientific context.&lt;/p>
&lt;figure>
&lt;img src="images/nyt_ice_cover.png" data-fig-alt="A line graph of the annual changes in Arctic sea ice cover by Derek Watkins (New York Times). Every line represents one year and it becomes clear how big changes from 2010 to 2014 were." alt="Annual changes in Arctic sea ice cover by Derek Watkins (New York Times)" />
&lt;figcaption aria-hidden="true">Annual changes in Arctic sea ice cover by &lt;a href="https://www.nytimes.com/interactive/2015/03/24/science/earth/arctic-ice-low-winter-maximum.html?_r=0">Derek Watkins (New York Times)&lt;/a>&lt;/figcaption>
&lt;/figure>
&lt;p>Before diving into details, it&amp;rsquo;s helpful to reflect on what defines a &amp;ldquo;good&amp;rdquo; figure. In my view, a good figure should be:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Correct and transparent&lt;/strong>: It should represent the data truthfully and maintain integrity.&lt;/li>
&lt;li>&lt;strong>Useful&lt;/strong>: It should convey and support your main point.&lt;/li>
&lt;li>&lt;strong>Easy to read and understand&lt;/strong>: It should be accessible to the entire intended audience.&lt;/li>
&lt;li>&lt;strong>Beautiful&lt;/strong>: It should be visually interesting and pleasing (keeping in mind that we are scientists, not designers).&lt;/li>
&lt;li>&lt;strong>Appropriate&lt;/strong>: Different contexts have different requirements.&lt;/li>
&lt;/ul>
&lt;p>In this post, I&amp;rsquo;ll guide you through 7 steps to meet these requirements.&lt;/p>
&lt;p>Data visualization is a vast topic and I can only scratch the
surface here. If you want to dig deeper, I highly recommend looking at the following free books:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://clauswilke.com/dataviz/" target="_blank" rel="noopener">Fundamentals of Data Visualization (Wilke)&lt;/a>: No coding, but filled with examples and practical tips for creating accurate, beautiful and effective scientific figures.&lt;/li>
&lt;li>&lt;a href="https://socviz.co/lookatdata.html" target="_blank" rel="noopener">Data Visualization: A Practical Introduction (Healy)&lt;/a>: An excellent introduction with practical examples using R and ggplot2.&lt;/li>
&lt;li>&lt;a href="https://ggplot2-book.org/" target="_blank" rel="noopener">ggplot2: Elegant Graphics for Data Analysis (Wickham et al.)&lt;/a>: Focused on learning to create plots with ggplot2.&lt;/li>
&lt;/ul>
&lt;h2 id="-1-consider-the-context">👣 1: Consider the context&lt;/h2>
&lt;p>Before designing a figure, consider the context in which it will be used. Ask
yourself &lt;strong>who your audience is&lt;/strong> and how familiar they are with the topic. The
complexity of your figure will depend on whether you present it to lab colleagues or
a general audience.&lt;/p>
&lt;p>Also, &lt;strong>consider common practices&lt;/strong> in your field. If there are established
plot types or colors for indicating certain elements, it&amp;rsquo;s best to adhere to them. This helps
your audience quickly understand the content.&lt;/p>
&lt;p>Another important factor is &lt;strong>where&lt;/strong> you will present your figure, as different contexts require different designs.&lt;/p>
&lt;p>In a &lt;strong>paper&lt;/strong>, readers typically have the time and expertise to digest complex visuals. While figures are usually viewed on a screen, some may print papers in black and white, so your figure should function without color.&lt;/p>
&lt;p>&lt;strong>Posters&lt;/strong> offer more design freedom, which can help attract people (e.g. with unconventional figures or flashy colors). Although people can spend some time at a poster, simplicity is still important, as readers may move on to the next poster as soon as they feel overwhelmed by complexity.&lt;/p>
&lt;p>During a &lt;strong>talk&lt;/strong>, figures should be much simpler than in a paper. Your audience often has only a minute to understand the figure before you move to the next slide. Copying complex figures from
a paper into a slideshow therefore rarely works well, so simplify your figures for talks. For more complex figures, take advantage of animations to build the figure up as you speak.&lt;/p>
&lt;h2 id="-2-make-your-data-transparent">👣 2: Make your data transparent&lt;/h2>
&lt;p>When you make a figure, you always choose &lt;strong>what to show&lt;/strong> (and what not to show) and &lt;strong>how to show it&lt;/strong>.
There are many principles here, and I want to talk about two of them.&lt;/p>
&lt;h3 id="principle-1-summary-statistics-hide-data">Principle 1: Summary statistics hide data&lt;/h3>
&lt;p>You may know the famous &amp;ldquo;Datasaurus&amp;rdquo; figure below. It shows 13 completely different data sets with identical summary statistics. Therefore, &lt;strong>visualizing the raw data points&lt;/strong> is crucial for understanding what&amp;rsquo;s going on and how the 13 data sets differ.&lt;/p>
&lt;figure>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/show-fig-datasaurus-stats-1.svg" width="768" alt="Datasaurus Dozen: 13 different distributions with the same summary statistics (mean and regression line)." />
&lt;figcaption aria-hidden="true">Datasaurus Dozen: 13 different distributions with the same summary statistics (mean and regression line).&lt;/figcaption>
&lt;/figure>
&lt;p>On a more serious note, the &lt;em>different data - same statistics&lt;/em> phenomenon is important
to consider when interpreting bar graphs with error bars (commonly
showing mean and standard error of the mean or standard deviation). In their &lt;a href="https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002128" target="_blank" rel="noopener">short
paper from 2015&lt;/a>, Weissgerber et al. advocate for abandoning bar plots for this reason.
In their figure below, they demonstrate that data with different characteristics (symmetric, outliers, bimodal distribution, unequal sample size) yield different statistical p-values, yet all data sets have the &lt;em>same summary bar plot&lt;/em>.&lt;/p>
&lt;figure>
&lt;img src="images/beyond_bar_paper_tests.png" alt="Figure 1 from Weissgerber et al. 2015 showing 4 data sets (B-E) with different characteristics and p-values for the group comparisons (highlights added by me - green = significant, blue = not significant). However, all data sets have the same bar plot (A)." />
&lt;figcaption aria-hidden="true">Figure 1 from &lt;a href="https://doi.org/10.1371/journal.pbio.1002128">Weissgerber et al. 2015&lt;/a> showing 4 data sets (B-E) with different characteristics and p-values for the group comparisons (highlights added by me - green = significant, blue = not significant). However, all data sets have the same bar plot (A).&lt;/figcaption>
&lt;/figure>
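&lt;p>To make the &lt;em>different data - same statistics&lt;/em> idea concrete, here is a minimal Python sketch. The two samples are made up for illustration (they are not from Weissgerber et al.), and the helper names &lt;code>bar_plot_summary&lt;/code> and &lt;code>describe&lt;/code> are hypothetical. Both samples would produce the same bar and the same error bar, although their shapes differ.&lt;/p>

```python
import statistics

# Two made-up samples (hypothetical values, not taken from the post or the paper).
symmetric = [8, 9, 9, 11, 11, 12]
skewed = [9, 9, 9, 10, 10, 13]


def bar_plot_summary(values):
    """Return what a bar with error bars shows: mean and standard error of the mean."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / len(values) ** 0.5
    return mean, sem


def describe(values):
    """Return two statistics that reveal a bit more of the shape of the data."""
    return {"median": statistics.median(values), "max": max(values)}


for name, sample in [("symmetric", symmetric), ("skewed", skewed)]:
    mean, sem = bar_plot_summary(sample)
    print(f"{name}: mean={mean:.2f}, sem={sem:.2f}, {describe(sample)}")
```

&lt;p>The mean and standard error agree for both samples, while the median and the maximum do not. A bar plot alone would not reveal that difference.&lt;/p>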
&lt;h4 id="alternatives-to-bar-plots">Alternatives to bar plots&lt;/h4>
&lt;p>So, what can you use instead of bar plots? Let&amp;rsquo;s explore some alternatives using
an example data set.&lt;/p>
&lt;figure>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/create-barplot-1.svg" width="768" alt="A bar plot showing mean and standard error of the mean as error bars but hiding a lot of the underlying data in the 4 groups." />
&lt;figcaption aria-hidden="true">A bar plot showing mean and standard error of the mean as error bars but hiding a lot of the underlying data in the 4 groups.&lt;/figcaption>
&lt;/figure>
&lt;p>A first step to displaying more of your data is to use a &lt;strong>boxplot&lt;/strong> instead. Boxplots provide
more information about the distribution of the data:&lt;/p>
&lt;figure>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/anatomy-of-boxplot-1.svg" width="768" alt="Anatomy of a boxplot: The different summary statistics a boxplot shows." />
&lt;figcaption aria-hidden="true">Anatomy of a boxplot: The different summary statistics a boxplot shows.&lt;/figcaption>
&lt;/figure>
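&lt;p>The numbers behind a boxplot can be sketched in a few lines of Python. This is an illustration under assumptions: it uses the common Tukey convention (whiskers reach the most extreme points within 1.5 times the interquartile range of the box), which may differ in detail from the figure above. The function name &lt;code>boxplot_stats&lt;/code> and the sample values are hypothetical.&lt;/p>

```python
import statistics


def boxplot_stats(values):
    """Compute the numbers a Tukey-style boxplot draws (an assumed convention)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme points that still lie within the fences.
    inside = [v for v in values if upper_fence >= v >= lower_fence]
    # Everything beyond the fences is drawn as an individual outlier point.
    outliers = [v for v in values if v > upper_fence or lower_fence > v]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical sample with one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(boxplot_stats(sample))
```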
&lt;p>In our example data set, the boxplot below already provides a clearer view of the
data distribution. We can see, for example, that group
B has several outliers and that the interquartile range is wide in group C and narrow in group D (i.e., the box is large and small, respectively).&lt;/p>
&lt;figure>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/print-boxplot-1.svg" width="768" alt="A boxplot showing the summary distribution of the data in the example data set." />
&lt;figcaption aria-hidden="true">A boxplot showing the summary distribution of the data in the example data set.&lt;/figcaption>
&lt;/figure>
&lt;p>We can improve further by adding the raw data as jittered points to the boxplot.
Jittering means adding a small, random value to the x-value of the points to prevent overlapping. This
is useful when you have categories on the x-axis, where slight shifts to the left or right do not affect interpretation.
This plot makes the raw data transparent and provides a better understanding of the data distribution without being overly complex. We can see that group A has a symmetric distribution, group B has many
outliers, group C is bimodal, and group D has only four data points.&lt;/p>
&lt;figure>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/print-boxplot-with-points-1.svg" width="768" alt="A boxplot with jittered points showing the summary distribution and the raw data from the example data set." />
&lt;figcaption aria-hidden="true">A boxplot with jittered points showing the summary distribution and the raw data from the example data set.&lt;/figcaption>
&lt;/figure>
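&lt;p>Jittering as described above can be sketched in plain Python. The function name &lt;code>jitter&lt;/code>, the default width and the fixed seed are assumptions made for this illustration. Plotting libraries usually offer their own jitter option.&lt;/p>

```python
import random


def jitter(x_positions, width=0.2, seed=42):
    """Shift each x position by a random offset between -width and +width."""
    rng = random.Random(seed)  # fixed seed so the plot looks the same on every run
    return [x + rng.uniform(-width, width) for x in x_positions]


# Four hypothetical points that all belong to category 1 on the x-axis.
x = [1, 1, 1, 1]
y = [2.0, 2.0, 2.1, 2.1]  # y-values stay untouched, only x is shifted

x_jittered = jitter(x)
print(list(zip(x_jittered, y)))
```

&lt;p>Only the categorical axis is jittered, so the measured values on the y-axis are still shown exactly.&lt;/p>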
&lt;p>&lt;strong>Raincloud plots&lt;/strong> go one step further. They show everything from
raw data points to summary statistics and the data distribution.&lt;/p>
&lt;figure>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/print-raincloud-plot-1.svg" width="768" alt="A raincloud plot showing raw data, boxplot and density distribution of the data in the example data set." />
&lt;figcaption aria-hidden="true">A raincloud plot showing raw data, boxplot and density distribution of the data in the example data set.&lt;/figcaption>
&lt;/figure>
&lt;h3 id="principle-2-the-principle-of-proportional-ink">Principle 2: The Principle of proportional ink&lt;/h3>
&lt;p>Another important principle of transparent data presentation is the &lt;strong>principle of proportional ink&lt;/strong> (see &lt;a href="https://callingbullshit.org/tools/tools_proportional_ink.html" target="_blank" rel="noopener">this post by Bergstrom and West&lt;/a> for more examples and explanations). This principle states that the sizes of shaded areas in plots should be proportional to the data values they represent.&lt;/p>
&lt;p>The most important example of this is the bar plot. Bar plots encode values in two ways:
the length of the bar and its position on the y-axis. To adhere to the proportional ink principle,
bars must always start at zero. If they don&amp;rsquo;t, the bar lengths no longer represent &lt;strong>relative data proportions&lt;/strong>.
This misleads readers who intuitively compare bar lengths and thus overestimate the differences between the groups.&lt;/p>
&lt;figure>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/barplots-start-at-zero-1.svg" width="768" alt="Bar plot of the same two groups with different y-axis starting points. Plot a) is correct, starting at 0, where the red bar is slightly higher than the blue one. Plot b) is misleading; it starts at 75%, making the red bar appear three times larger than the blue bar. This misrepresentation confuses readers who intuitively compare bar lengths that are no longer proportional to the data values." />
&lt;figcaption aria-hidden="true">Bar plot of the same two groups with different y-axis starting points. Plot a) is correct, starting at 0, where the red bar is slightly higher than the blue one. Plot b) is misleading; it starts at 75%, making the red bar appear three times larger than the blue bar. This misrepresentation confuses readers who intuitively compare bar lengths that are no longer proportional to the data values.&lt;/figcaption>
&lt;/figure>
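&lt;p>The distortion can be expressed as simple arithmetic. The sketch below uses hypothetical percentages (84 and 78), chosen only so that an axis starting at 75 reproduces a threefold exaggeration like the one in the caption. The function name &lt;code>drawn_ratio&lt;/code> is also an assumption.&lt;/p>

```python
def drawn_ratio(value_a, value_b, baseline=0.0):
    """Ratio of the drawn bar lengths when the axis starts at `baseline`."""
    return (value_a - baseline) / (value_b - baseline)


# Hypothetical percentages chosen for illustration only.
red, blue = 84.0, 78.0

print(drawn_ratio(red, blue))               # axis starts at 0: true ratio of the values
print(drawn_ratio(red, blue, baseline=75))  # axis starts at 75: ratio the reader sees
```

&lt;p>With a baseline of 0, the ratio of the bar lengths equals the ratio of the values (about 1.08 here). With a baseline of 75, the red bar is drawn three times as long as the blue one.&lt;/p>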
&lt;p>Not every plot needs to start at zero though; there are valid reasons to begin the y-axis elsewhere. Below, you see two plots displaying the same data: life expectancy across different continents. Both plots are effective but highlight different aspects of the data. In the bar plot (panel a), relative comparisons are easy because all bars start at 0. In the point plot (panel b), the y-axis begins at the minimum value, simplifying absolute comparisons
by setting the baseline at the African continent. Since we use points and their positions on the y-axis rather than shaded areas to represent values, the reader is not misled if the y-axis does not start at 0.&lt;/p>
&lt;figure>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/proportional-ink-life-expectancy-1.svg" width="768" alt="Two plots showing life expectancy in different continents. The bar plot (a) starts at 0 and allows for relative comparisons. The point plot (b) starts at the minimum value of the data and highlights absolute differences." />
&lt;figcaption aria-hidden="true">Two plots showing life expectancy in different continents. The bar plot (a) starts at 0 and allows for relative comparisons. The point plot (b) starts at the minimum value of the data and highlights absolute differences.&lt;/figcaption>
&lt;/figure>
&lt;h2 id="-3-choose-the-right-chart-type">👣 3: Choose the right chart type&lt;/h2>
&lt;p>There are countless chart types for various data and messages.
However, exploring all of them is not the purpose of this post. Instead, I want to provide you with
two excellent online resources to explore different chart types that allow you to
specifically find options that fit your data structure and the message you want to convey:&lt;/p>
&lt;div class="gdoc-columns gdoc-columns--regular flex gap-16 flex-mobile-column">
&lt;div class="gdoc-columns__content gdoc-markdown--nested flex-even">
&lt;!-- begin columns block -->
&lt;figure>
&lt;img src="images/data_to_vis.png" style="width:50.0%" alt="From Data to Viz: Browse through a decision tree depending on your data, get ideas for chart types and links to code examples in different programming languages" />
&lt;figcaption aria-hidden="true">&lt;a href="https://www.data-to-viz.com/">From Data to Viz&lt;/a>: Browse through a decision tree depending on your data, get ideas for chart types and links to code examples in different programming languages&lt;/figcaption>
&lt;/figure>
&lt;/div>
&lt;div class="gdoc-columns__content gdoc-markdown--nested flex-even">
&lt;!-- magic separator, between columns -->
&lt;figure>
&lt;img src="images/datavis_100.png" style="width:50.0%" alt="The dataviz project: A nice overview of different chart options and for which types of data they can be used." />
&lt;figcaption aria-hidden="true">&lt;a href="https://datavizproject.com/">The dataviz project&lt;/a>: A nice overview of different chart options and for which types of data they can be used.&lt;/figcaption>
&lt;/figure>
&lt;/div>
&lt;/div>
&lt;p>Something fundamental to keep in mind for all different chart types is that different channels
for coding the data are perceived with different accuracy by humans.&lt;/p>
&lt;p>Below, you see different channels with which we can represent the same data.
The accuracy with which we can judge the data decreases from left to right.&lt;/p>
&lt;figure>
&lt;img src="images/different_channels.png" alt="Different channels to represent the same data: the differences in numbers between 3 groups. The accuracy of our judgment decreases from left to right. This means that the chart types on the left are suitable for accurate judgement, the ones on the right for more generic judgement." />
&lt;figcaption aria-hidden="true">Different channels to represent the same data: the differences in numbers between 3 groups. The accuracy of our judgment decreases from left to right. This means that the chart types on the left are suitable for accurate judgement, the ones on the right for more generic judgement.&lt;/figcaption>
&lt;/figure>
&lt;p>You can see this quite clearly:
In the point plot, the differences between the three groups are evident, and I can read the exact
numbers from the axis. I would argue that this holds true for the bars and the pie chart as well. However, it becomes more challenging to obtain precise numbers from these charts, especially as the number of groups increases. In the area and color plots, comparing the groups is extremely difficult. Can you see, for example, that the green area is half the size of the blue area?&lt;/p>
&lt;p>That doesn&amp;rsquo;t mean color is bad&amp;mdash;it&amp;rsquo;s effective for showing trends or patterns when exact values aren&amp;rsquo;t critical (like in heatmaps). However, be mindful of which visual channel suits your purpose. If you need accurate judgment, use chart types that rely on the left channels. For a more general assessment, you can utilize the right channels.&lt;/p>
&lt;p>And you can always &lt;strong>combine channels&lt;/strong> &amp;mdash; like using both length and position, or adding labels to increase the accuracy of judgement.&lt;/p>
&lt;figure>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/compare-lollipot-pie-1.svg" width="768" alt="A lollipop chart showing the number of different penguin species (y-axis) in the data set. The plot combines position (point) and length (line) channels, adding exact numbers for increased accuracy of judgement." />
&lt;figcaption aria-hidden="true">A lollipop chart showing the number of different penguin species (y-axis) in the data set. The plot combines position (point) and length (line) channels, adding exact numbers for increased accuracy of judgement.&lt;/figcaption>
&lt;/figure>
&lt;h2 id="-4-focus-on-the-core-message">👣 4: Focus on the core message&lt;/h2>
&lt;p>The reader&amp;rsquo;s attention is limited, so be concise and focus on your main message.
Identify which &lt;em>variables you need&lt;/em> and which &lt;em>variables you
can omit&lt;/em>. Then make design choices that help to convey and highlight the main message.&lt;/p>
&lt;h3 id="arrange-your-plot-so-its-easy-to-extract-the-main-message">Arrange your plot so it&amp;rsquo;s easy to extract the main message&lt;/h3>
&lt;p>The same data can be arranged in different ways that focus the reader&amp;rsquo;s attention on
different main messages.&lt;/p>
&lt;p>For example, if you plot life expectancy in Asia and Europe over time using side-by-side bars (left plot), you emphasize the year-by-year comparison, highlighting that Europe is consistently higher than Asia, but the gap is closing. If you plot the two continents separately (right plot), you emphasize the temporal trend within each continent. This difference is subtle, but if you include more continents, the distinction in messages between the left and right plots becomes much more pronounced.&lt;/p>
&lt;figure>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/arrange-plots-1.svg" width="768" alt="Two plots display life expectancy data in Asia and Europe over time. The left plot uses side-by-side bars for a year-by-year comparison, while the right plot uses separate plots to highlight the temporal trend within each continent." />
&lt;figcaption aria-hidden="true">Two plots display life expectancy data in Asia and Europe over time. The left plot uses side-by-side bars for a year-by-year comparison, while the right plot uses separate plots to highlight the temporal trend within each continent.&lt;/figcaption>
&lt;/figure>
&lt;h3 id="choose-an-appropriate-plot-type">Choose an appropriate plot type&lt;/h3>
&lt;p>In addition to arranging the data, you can choose plot types that emphasize your main message. Different plots tell different stories, even when showing the same data.&lt;/p>
&lt;p>If our main message from the life expectancy data is to highlight the closing gap between Asia and Europe, we can use a &lt;strong>dumbbell plot&lt;/strong> (top plot below). This type is effective for showing differences between two groups over time, clearly focusing the reader&amp;rsquo;s attention on the distance between the groups. If our main message is to show trends over time, we can use a &lt;strong>time series line plot&lt;/strong> (bottom plot below). This type is ideal for showing the trends, and readers will immediately recognize it as representing a time series.&lt;/p>
&lt;figure>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/life-expectancy-story-1.svg" width="768" alt="Two plots showing life expectancy data in Asia and Europe over time. The dumbbell plot (top) emphasizes the closing gap between the two continents, while the line plot (bottom) highlights the trends over time." />
&lt;figcaption aria-hidden="true">Two plots showing life expectancy data in Asia and Europe over time. The dumbbell plot (top) emphasizes the closing gap between the two continents, while the line plot (bottom) highlights the trends over time.&lt;/figcaption>
&lt;/figure>
&lt;h3 id="keep-it-simple">Keep it simple&lt;/h3>
&lt;p>Don&amp;rsquo;t overcomplicate your figures and bury your main message. It may be tempting to include all variables in a figure since you&amp;rsquo;ve already measured them and can only put a limited number of figures in a paper. However, remember that the reader&amp;rsquo;s attention is limited; we should focus on the main message and avoid unnecessary distractions.&lt;/p>
&lt;p>In the (admittedly exaggerated) example below, I plotted all possible variables from the
life expectancy data set in one figure. It shows everything, but readers will likely be
lost and need to spend a lot of time extracting any message from it.&lt;/p>
&lt;figure>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/expectancy-complex-1.svg" width="768" alt="An overly complex plot showing all variables in the gapminder dataset at once. The plot is cluttered and difficult to read, making it hard to extract any meaningful information." />
&lt;figcaption aria-hidden="true">An overly complex plot showing all variables in the &lt;code>gapminder&lt;/code> dataset at once. The plot is cluttered and difficult to read, making it hard to extract any meaningful information.&lt;/figcaption>
&lt;/figure>
&lt;p>Let&amp;rsquo;s say our main message focuses on life expectancy and GDP differences worldwide.
Why then plot population sizes and trendlines? Why use a scatter plot? To show differences in life expectancy and GDP between continents, I could use two ridgeline plots instead and indicate the world average with a vertical line.
This approach makes it easier and quicker to understand the results related to the main message.&lt;/p>
&lt;p>So only plot what you really need to convey your message. For additional variables, you can
always use the appendix.&lt;/p>
&lt;figure>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/expectancy-simplified-1.svg" width="768" alt="Two ridgeline plots showing only the variables needed to convey the main message about differences in life expectancy and GDP. The plot is cleaner and much easier to read, allowing for a clearer understanding of the message." />
&lt;figcaption aria-hidden="true">Two ridgeline plots showing only the variables needed to convey the main message about differences in life expectancy and GDP. The plot is cleaner and much easier to read, allowing for a clearer understanding of the message.&lt;/figcaption>
&lt;/figure>
&lt;h2 id="-5-consider-the-journey">👣 5: Consider the journey&lt;/h2>
&lt;p>I know this sounds strange, but reading a figure is an &lt;strong>experience that unfolds over time&lt;/strong>. What
I mean by that is that we don&amp;rsquo;t look at a figure and understand everything at once.
We look at the elements step by step before we come back to understand the figure as a whole.&lt;/p>
&lt;p>So when creating or improving a figure, put yourself in the reader&amp;rsquo;s shoes and consider
their journey through it. What will they look at first, second, etc.? How many steps does it take to understand all the elements? The goal is to make the journey that their eyes (and their brains) take to understand the
entire figure as short as possible.&lt;/p>
&lt;p>So let&amp;rsquo;s say we want to tell a story about the GDP of China and India compared to the rest of Asia.
We might start with a plot like the following:&lt;/p>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/no-highlight-countries-1.svg" width="768" />
&lt;p>So, let&amp;rsquo;s consider the journey. First, the reader reads the title and sees that it&amp;rsquo;s about China and India, so they will look for the bars for China and India. This is challenging, as the bar labels are not in alphabetical order, and the reader must rotate their neck to read them. Once they find China and India, they want to compare them to other countries. However, if they look away, they risk losing the bars for China and India and have to search for them again.&lt;/p>
&lt;p>So the question is: How can we make the journey easier and more pleasant?&lt;/p>
&lt;h3 id="flip-the-axes">Flip the axes&lt;/h3>
&lt;p>First, we can flip the plot axes to avoid the neck rotation. This makes it easier to read the labels and compare the bars. Flipping the axes is often possible in bar plots and is usually a good idea to improve readability of long text labels.&lt;/p>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/rotate-barplot-1.svg" width="768" />
&lt;h3 id="highlight-the-main-message">Highlight the main message&lt;/h3>
&lt;p>In the next step, we can highlight our main message about China and India. In general, we can achieve this by highlighting the
important parts of the plot, while simultaneously de-emphasizing the less important parts.&lt;/p>
&lt;p>Consider the following text:&lt;/p>
&lt;p>&lt;span style="color:rgb(164, 163, 163); ">Effective visualization helps us understand data quickly. &lt;/span>&lt;span style="color: #4169E1; font-weight: bold;">Patterns&lt;/span>&lt;span style="color: rgb(164, 163, 163); "> emerge naturally, while &lt;/span>&lt;span style="color: #4169E1; font-weight: bold;">colors&lt;/span>&lt;span style="color: rgb(164, 163, 163); "> enhance meaning. Good &lt;/span>&lt;span style="color: #4169E1;font-weight: bold;">design&lt;/span>&lt;span style="color: rgb(164, 163, 163); "> choices and proper &lt;/span>&lt;span style="color: #4169E1; font-weight: bold;">emphasis&lt;/span>&lt;span style="color: rgb(164, 163, 163); "> make insights accessible to everyone.&lt;/span>&lt;/p>
&lt;p>The blue words pop out immediately because they are highlighted against the light grey words.
These highlights draw the reader&amp;rsquo;s pre-attentive focus to specific elements, which they notice without conscious thought. You can use color, size, shapes, or arrows to create such highlights.&lt;/p>
&lt;p>In our example, we can highlight the bars for China and India while de-emphasizing the other countries. This makes it clear that those two bars are essential to the message without even reading the title.&lt;/p>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/highlight-countries-1.svg" width="768" />
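&lt;p>The highlight-and-mute idea boils down to mapping each bar to one of two colors. Here is a minimal Python sketch. The country list and the function name &lt;code>bar_colors&lt;/code> are hypothetical, and the two colors are simply the blue and grey from the text example above, not necessarily those used in the figure.&lt;/p>

```python
HIGHLIGHT = "#4169E1"  # the blue from the text example above
MUTED = "#A4A3A3"      # the light grey, rgb(164, 163, 163)


def bar_colors(countries, highlighted):
    """Return one color per bar: blue for highlighted countries, grey otherwise."""
    return [HIGHLIGHT if country in highlighted else MUTED for country in countries]


# Hypothetical subset of countries, in plotting order.
countries = ["Japan", "China", "Thailand", "India", "Vietnam"]
print(bar_colors(countries, highlighted={"China", "India"}))
```

&lt;p>The resulting list can be passed to whatever plotting tool you use as the per-bar fill color.&lt;/p>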
&lt;p>We can make understanding even faster by using different colors for China and India and introducing them already in the title. This way, we place the information where the reader&amp;rsquo;s eyes already are. When they first read the title, they will know what the colors represent without the need for a legend or axis labels to distinguish between China and India.&lt;/p>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/highlight-countries-2-1.svg" width="768" />
&lt;h3 id="order-your-data-purposefully">Order your data purposefully&lt;/h3>
&lt;p>Another way to make the journey shorter for the reader is to &lt;strong>order your data purposefully&lt;/strong> instead
of using the default ordering, which is usually alphabetical.&lt;/p>
&lt;p>Consider the following two plots that display the same data but with different country orders. Both figures are effective for different purposes. The left plot is ordered by GDP, making it great for comparing values across countries, but it is not ideal for quickly locating a specific country. In contrast, the right plot allows for quick country lookups, but comparing GDP between countries is quite cumbersome.&lt;/p>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/order-categories-1.svg" width="768" />
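&lt;p>The two orderings compared above correspond to two different sort keys. The following Python sketch uses made-up GDP values in arbitrary units (not real data) to show both.&lt;/p>

```python
# Hypothetical GDP values per country (arbitrary units, not real data).
gdp = {"India": 3.0, "China": 5.0, "Japan": 30.0, "Nepal": 1.0}

# Alphabetical order: good for looking up a specific country.
by_name = sorted(gdp)

# Ordered by value, largest first: good for comparing countries and seeing ranks.
by_value = sorted(gdp, key=gdp.get, reverse=True)

print(by_name)
print(by_value)
```

&lt;p>Whichever order you choose, use it as the category order of the axis so that the plot follows the purpose rather than the default.&lt;/p>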
&lt;p>Below, you find another example of how ordering data shortens the journey. In the
figure, the legend is not arranged alphabetically, which would be the default.
Instead, I ordered the legend to match the lines. This allows the reader to follow the lines and the legend easily from top to bottom. This simple trick makes the reader&amp;rsquo;s journey much more convenient.&lt;/p>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/order-legend-1.svg" width="768" />
&lt;h2 id="-6-less-is-more">👣 6: Less is more&lt;/h2>
&lt;h3 id="the-importance-of-differences">The importance of differences&lt;/h3>
&lt;p>Humans perceive differences (e.g. in colors, shapes, sizes, etc.) very well and attribute meaning to them.
You already saw this when we talked about highlights that pop out:&lt;/p>
&lt;p>&lt;span style="color: rgb(164, 163, 163); ">Effective visualization helps us understand data quickly. &lt;/span>&lt;span style="color: #4169E1; font-weight: bold;">Patterns&lt;/span>&lt;span style="color: rgb(164, 163, 163); "> emerge naturally, while &lt;/span>&lt;span style="color: #4169E1; font-weight: bold;">colors&lt;/span>&lt;span style="color: rgb(164, 163, 163); "> enhance meaning. Good &lt;/span>&lt;span style="color: #4169E1; font-weight: bold;">design&lt;/span>&lt;span style="color: rgb(164, 163, 163); "> choices and proper &lt;/span>&lt;span style="color: #4169E1; font-weight: bold;">emphasis&lt;/span>&lt;span style="color: rgb(164, 163, 163); "> make insights accessible to everyone.&lt;/span>&lt;/p>
&lt;p>But if there are too many differences, we don&amp;rsquo;t see anything specific anymore:&lt;/p>
&lt;p>&lt;span style="font-size: 90%; font-style: italic; color: #8B4513;">Effective&lt;/span> &lt;span style="color: #4169E1; font-weight: bold;">visualization&lt;/span> &lt;span style="color: #FF69B4;">helps&lt;/span> &lt;span style="font-style: italic;">us&lt;/span> &lt;span style="font-size: 110%; color: #9932CC;">understand&lt;/span> &lt;span style="font-size: 85%;">data&lt;/span> &lt;span style="color: #CD853F; font-style: italic;">quickly.&lt;/span> &lt;span style="color: #4169E1; font-size: 120%; font-weight: bold;">Patterns&lt;/span> &lt;span style="color: #FF4500; font-size: 110%;">emerge&lt;/span> &lt;span style="font-size: 95%; color: #4682B4;">naturally,&lt;/span> &lt;span style="font-style: italic; color: #DA70D6; font-weight: bold;">while&lt;/span> &lt;span style="color: #4169E1; font-size: 120%; font-weight: bold;">colors&lt;/span> &lt;span style="color: #20B2AA; font-size: 105%;">enhance&lt;/span> &lt;span style="font-size: 115%; color: #CD5C5C;">meaning.&lt;/span> &lt;span style="font-style: italic; color: #4B0082;">Good&lt;/span> &lt;span style="color: #4169E1; font-size: 120%; font-weight: bold;">design&lt;/span> &lt;span style="color: #8FBC8F; font-size: 110%;">choices&lt;/span> &lt;span style="font-style: italic; color: #B8860B;">and&lt;/span> &lt;span style="font-size: 95%;">proper&lt;/span> &lt;span style="color: #4169E1; font-size: 120%; font-weight: bold;">emphasis&lt;/span> &lt;span style="color: #6A5ACD; font-size: 105%;font-weight: bold;">make&lt;/span> &lt;span style="font-style: italic; color: #FF7F50;">insights&lt;/span> &lt;span style="color: #9370DB;">accessible&lt;/span> &lt;span style="font-size: 90%; color: #556B2F;font-weight: bold;">to&lt;/span> &lt;span style="font-style: italic; color: #8B008B;">everyone.&lt;/span>&lt;/p>
&lt;p>The same is also true for figures. Therefore, differences should always be used to communicate and not to
decorate.&lt;/p>
&lt;p>For example, in the left plot below, I differentiated countries by color. This is not necessary, and the colors are just a distraction: readers intuitively look for meaning in the color differences even though there is none. In the right plot, I use only one color for all countries. This makes the figure much cleaner, and it&amp;rsquo;s easier to focus on the important message.&lt;/p>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/differences-to-communicate-1.svg" width="768" />
&lt;h3 id="declutter-your-figure">Declutter your figure&lt;/h3>
&lt;p>A helpful concept for decluttering figures is the &lt;strong>data-to-ink ratio&lt;/strong> &amp;mdash; the idea that as much of the &amp;ldquo;ink&amp;rdquo; in a figure as possible should represent actual data rather than decorative or redundant elements. While this is partly a matter of taste, consider which parts of your figure are essential and which may be duplicated or unnecessary. However, be cautious not to remove elements that assist the viewer in reading and understanding the figure.&lt;/p>
&lt;p>Below you see the same plot, showing unemployment over time in the US, in a more (left) and less (right) cluttered version.&lt;/p>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/declutter-1.svg" width="768" />
&lt;p>To declutter the figure on the left, I removed the x-axis title since it is clear that it represents a time series. I also replaced the &amp;ldquo;in thousands&amp;rdquo; label on the y-axis with a &amp;ldquo;k&amp;rdquo; after the numbers and removed the grey background. These changes made the plot cleaner and more elegant.
Other decisions are debatable and depend on the figure&amp;rsquo;s purpose, such as whether to keep all grid lines. While they help the viewer read exact values, removing some can make the plot cleaner and emphasize the overall trend.&lt;/p>
&lt;p>Ultimately, what you keep or remove should align with your message and how you want the reader to interpret the figure. Decluttering often enhances your figure&amp;mdash;just make sure you don&amp;rsquo;t oversimplify.&lt;/p>
&lt;h2 id="-7-make-it-accessible">👣 7. Make it accessible&lt;/h2>
&lt;h3 id="element-size">Element size&lt;/h3>
&lt;p>Make sure that all elements are adequately sized, including text, line width, and point size. The appropriate size depends on the context. For instance, when creating a figure for a presentation, use larger text and element sizes, and test them
on a projector beforehand. Default sizes that look fine on your own computer screen are often too small for other settings.&lt;/p>
&lt;figure>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/element-size-1.svg" width="768" alt="The same lollipop plot with different element sizes." />
&lt;figcaption aria-hidden="true">The same lollipop plot with different element sizes.&lt;/figcaption>
&lt;/figure>
&lt;h3 id="contrast">Contrast&lt;/h3>
&lt;p>Ensure the contrast between elements is strong enough for clear readability. If you&amp;rsquo;re unsure, use tools to check your contrast levels, such as &lt;a href="https://colourcontrast.cc" target="_blank" rel="noopener">this one&lt;/a> or &lt;a href="https://www.figma.com/color-contrast-checker/" target="_blank" rel="noopener">this one&lt;/a>.&lt;/p>
&lt;figure>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/contrast-1.svg" width="768" alt="Different contrast between background and text colors" />
&lt;figcaption aria-hidden="true">Different contrast between background and text colors&lt;/figcaption>
&lt;/figure>
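&lt;p>To get a feeling for what such a checker computes, here is a rough Python sketch. It assumes the contrast ratio defined in the WCAG 2.x accessibility guidelines, which is a common choice but not necessarily what every tool uses; the function names are purely illustrative:&lt;/p>

```python
def relative_luminance(hex_color):
    """Relative luminance of an sRGB color such as '#4169E1' (WCAG 2.x definition)."""
    hex_color = hex_color.lstrip("#")
    # Split 'RRGGBB' into three channels scaled to the range 0..1.
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma encoding for each channel.
    linear = [
        ((c + 0.055) / 1.055) ** 2.4 if c > 0.03928 else c / 12.92
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors: 1 for identical colors, 21 for black on white."""
    lum_a, lum_b = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


print(round(contrast_ratio("#000000", "#FFFFFF"), 1))  # 21.0
```

&lt;p>As a reference point, WCAG recommends a ratio of at least 4.5:1 for normal-sized text. Treat this as a guideline for text and labels rather than a strict rule for every plot element.&lt;/p>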
&lt;h3 id="intuitive-colors">Intuitive Colors&lt;/h3>
&lt;p>Use colors that are logical and intuitive, keeping in mind our associations with them.
This &lt;a href="https://informationisbeautiful.net/visualizations/colours-in-cultures/" target="_blank" rel="noopener">can be cultural&lt;/a>, as different societies link the same colors to various concepts and emotions.
However, some associations are widely shared, and going against them causes confusion (e.g., red for hot, blue for cold, green for forest, and blue for lakes).&lt;/p>
&lt;figure>
&lt;img src="images/intuitive_colors.png" alt="Don’t use unintuitive colors. Taken from a blogpost on colors by Lisa Charlotte Muth (Datawrapper)" />
&lt;figcaption aria-hidden="true">Don’t use unintuitive colors. Taken from a &lt;a href="https://www.datawrapper.de/blog/colors">blogpost on colors&lt;/a> by Lisa Charlotte Muth (Datawrapper)&lt;/figcaption>
&lt;/figure>
&lt;h3 id="colorblind-friendly-colors">Colorblind-friendly colors&lt;/h3>
&lt;p>Choose color palettes that are colorblind-friendly. There are many options available, such as the &lt;a href="https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html" target="_blank" rel="noopener">&lt;em>Viridis&lt;/em> color palettes&lt;/a> and the &lt;a href="https://colorbrewer2.org" target="_blank" rel="noopener">&lt;em>ColorBrewer&lt;/em> palettes&lt;/a>. If you are unsure about your color choice, you can use tools to test your palette against various types of color blindness:&lt;/p>
&lt;figure>
&lt;img src="images/viz_palette.png" alt="The Viz Palette tool is a great tool to test your colors in different contexts and for different types of color blindness" />
&lt;figcaption aria-hidden="true">The &lt;a href="https://projects.susielu.com/viz-palette">Viz Palette tool&lt;/a> is a great tool to test your colors in different contexts and for different types of color blindness&lt;/figcaption>
&lt;/figure>
&lt;p>If you are an R user, have a look at &lt;a href="https://journal.r-project.org/articles/RJ-2023-071/" target="_blank" rel="noopener">this article&lt;/a> and the &lt;a href="https://colorspace.r-forge.r-project.org/index.html" target="_blank" rel="noopener">&lt;em>colorspace&lt;/em> package&lt;/a>, which provides many tools for selecting and manipulating colors.&lt;/p>
&lt;h3 id="redundancy">Redundancy&lt;/h3>
&lt;p>Redundancy increases the likelihood that everyone can see the differences. Instead of relying solely on color to differentiate between groups, you can also use shapes (at least for data with few groups). This benefits people with color blindness and those who print your paper in black and white.&lt;/p>
&lt;img src="index.markdown_strict_files/figure-markdown_strict/add-redundancy-1.svg" width="768" />
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>We have now looked at &lt;em>7 steps&lt;/em> to make our figures better:&lt;/p>
&lt;ol>
&lt;li>Consider the &lt;strong>context&lt;/strong>&lt;/li>
&lt;li>Make your data &lt;strong>transparent&lt;/strong>&lt;/li>
&lt;li>Choose the &lt;strong>right chart type&lt;/strong>&lt;/li>
&lt;li>Focus on the &lt;strong>core message&lt;/strong>&lt;/li>
&lt;li>Consider the &lt;strong>journey&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Less is more&lt;/strong>&lt;/li>
&lt;li>Make it &lt;strong>accessible&lt;/strong>&lt;/li>
&lt;/ol>
&lt;p>Of course, each step could be a separate blog post, and I&amp;rsquo;ve only scratched the surface here.
And it&amp;rsquo;s also not necessary to think about all this for every figure. But even a small tweak can
sometimes make a big difference.
For me, it was already immensely helpful to learn these core concepts. I started using them to improve my own figures and analyze the figures I see in papers or posters.&lt;/p>
&lt;p>Let me know in the comments if you have other tips and tricks for improving figures!&lt;/p></description></item><item><title>Introduction to data analysis with R</title><link>https://selinabaldauf.com/workshops/intro_r/</link><pubDate>Wed, 01 Jan 2025 00:00:00 +0000</pubDate><guid>https://selinabaldauf.com/workshops/intro_r/</guid><description>&lt;p>Duration: 3 days + 1 day for project work (6 hours per day)&lt;/p>
&lt;p>Check out the &lt;a href="https://selinazitrone.github.io/intro-r-data-analysis/" target="_blank" rel="noopener">workshop website&lt;/a> for more information.&lt;/p>
&lt;h2 id="workshop-description">Workshop description&lt;/h2>
&lt;p>The workshop’s main goal is to equip you with essential R skills for analyzing your own research data, covering data processing and visualization. After the workshop, you will be able to confidently advance your R skills for your specific research needs.&lt;/p>
&lt;h2 id="outline">Outline&lt;/h2>
&lt;p>In four workshop days, we will cover the following topics:&lt;/p>
&lt;ul>
&lt;li>Day 1: Introduction to R and RStudio, good programming practice, reading data into R&lt;/li>
&lt;li>Day 2: Data visualization with the ggplot2 package, data wrangling with dplyr&lt;/li>
&lt;li>Day 3: Data cleaning with tidyr, statistical tests, AI tools&lt;/li>
&lt;li>Day 4: Bring your own research data (or get some real-life data from me)&lt;/li>
&lt;/ul>
&lt;h2 id="concept">Concept&lt;/h2>
&lt;p>To learn programming, it is essential to practice writing code yourself. Therefore, every input session will be followed by practical exercises. On day 4, you can apply the methods you learned to your own research data or a provided real-life data set.&lt;/p>
&lt;p>The workshop will be in English and will be in an online format.&lt;/p>
&lt;h2 id="for-whom-is-this-workshop">For whom is this workshop?&lt;/h2>
&lt;p>This workshop is for beginners without prior experience in R or any other programming language. If you already know other programming languages, you are still welcome to participate.&lt;/p></description></item></channel></rss>