Five days of R

A crash-course in R

Author

January Weiner

Introduction

Five days of R

I have been teaching R to biologists and medical students for many years now. At the Core Unit for Bioinformatics at the Berlin Institute of Health, Charité - Universitätsmedizin Berlin, we have developed a five-day R crash course (five hours per day) that has been running for the last three years. This book is a companion to that course.

The course is also the reason the materials in this book are arranged the way they are. Rather than discussing everything about vectors first, then everything about matrices, and so on, we start with easy things and return to them later to build on them. I call this “helical learning”1 – we spiral around the same topics, each time going a bit deeper, and each time you will understand a bit more. This is also why some topics are spread across several days – by trial and error, we have found the amount of material that can be covered in a day of learning.

1 I got this idea from Professor Barbara Płytycz of the Jagiellonian University, who taught me my first “helix” of the immune system.

This book has two goals. The first is that after five days of learning R, you will be able to load, inspect, manipulate and save data files (such as Excel tables or CSV files), make some basic plots and perform simple statistical tests. The second is that you will be in a good starting position to continue learning R on your own.

In other words, this course should give you a jump start, allowing you to overcome the first big hurdle in learning R.
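To give a rough idea of what the first goal looks like in practice, here is a minimal sketch of such a workflow. It is only an illustration under a few assumptions: it uses base R functions and the built-in `iris` data set, and the names `csv_file`, `flowers`, `two_species` and `out_file` are made up for this example. The course itself may well use different functions for the same steps.

```r
# Create a CSV file to work with (a temporary file, built from the iris data set)
csv_file <- tempfile(fileext = ".csv")
write.csv(iris, csv_file, row.names = FALSE)

# Load the file and inspect it
flowers <- read.csv(csv_file)
head(flowers)      # first few rows
summary(flowers)   # basic summary of every column

# Manipulate: keep two of the three species and add a new column
two_species <- flowers[flowers$Species %in% c("setosa", "versicolor"), ]
two_species$Petal.Ratio <- two_species$Petal.Length / two_species$Petal.Width

# Save the modified table to a new CSV file
out_file <- tempfile(fileext = ".csv")
write.csv(two_species, out_file, row.names = FALSE)

# A basic plot: petal length in the two species
boxplot(Petal.Length ~ Species, data = two_species)

# A simple statistical test: do the two species differ in petal length?
t.test(Petal.Length ~ Species, data = two_species)
```

Do not worry if none of this makes sense yet; every step will be introduced slowly over the five days.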

Prerequisites

Installing R and RStudio

Before you dive into the course, please install both R and RStudio: you need R (obviously), and RStudio is a great and very popular interface for R. Minimal installation guide:

  1. Install R. Go to the R website, download the version for your operating system and install it.
  2. Install RStudio. Go to the RStudio installation page and follow the instructions.

This should be it. However, if you get stuck or have problems installing R or RStudio, please search the internet – installation, and especially the problems that come with it, unfortunately depend on your operating system and the version of the software, so it is hard to give general advice.

Potential problems:

  1. You already have R installed on your system. Having multiple R versions may be a source of problems. If you have an old version of R, unless you have a good reason to keep it, uninstall it.

  2. You need to compile packages. Packages come in two forms: precompiled packages and source packages. Some exotic packages might not have a precompiled version for your system. If you need to compile packages, you will have to install additional tools. For example, on Windows you need to install Rtools; make sure you download the version that corresponds to your version of R.
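Both problems are easier to sort out if you know exactly which R you are running. As a small sketch (the variable name `r_version` is just chosen for this example), you can ask R itself in the console:

```r
# Full version string of the R session you are currently running
R.version.string

# The same information as a version object that can be compared
r_version <- getRversion()
r_version >= "4.0.0"   # TRUE if your R is version 4.0.0 or newer

# Directory in which this R is installed
# (useful for spotting which of several installations is being used)
R.home()
```

The version reported here is the one to keep in mind when choosing which Rtools to download.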

Installing R packages

During the course we will use several R packages that you need to install on your computer. On Day 2, we will discuss installing and loading packages, and we will make a note to install the required packages when they are needed. However, you can also install them right now, after you have installed R and RStudio. This can be more efficient, since installing packages might take some time.

Here is the list of packages that you will have to install:

  • tidyverse
  • ggplot2
  • skimr
  • pander
  • readxl
  • writexl
  • janitor
  • broom
  • cowplot
  • ggbeeswarm
  • pheatmap
  • tinytex (only if you want to produce PDF output from Rmarkdown)

You can install them by running the following code in your RStudio:

install.packages(c("tidyverse", "ggplot2", "skimr", "pander", 
                   "readxl", "writexl", "janitor", "broom", 
                   "cowplot", "ggbeeswarm", "pheatmap", "tinytex"))
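If you are not sure whether the installation worked, or some of the packages were already on your computer, you can check which ones are still missing. The following is a minimal sketch; the names `required` and `missing` are just variables made up for this example.

```r
# The packages used in this course
required <- c("tidyverse", "ggplot2", "skimr", "pander",
              "readxl", "writexl", "janitor", "broom",
              "cowplot", "ggbeeswarm", "pheatmap", "tinytex")

# installed.packages() returns a table with one row per installed package;
# the row names are the package names
missing <- setdiff(required, rownames(installed.packages()))

if (length(missing) == 0) {
  message("All required packages are installed.")
} else {
  message("Still missing: ", paste(missing, collapse = ", "))
}
```

If anything is reported as missing, running `install.packages(missing)` afterwards should install only those packages rather than the whole list.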

The structure of this book

This book is divided into five chapters, each corresponding to one day of the course. At the beginning of each chapter, you will find a short list of topics for the given day.

Some parts of the book are highlighted:

Code blocks with output:

# this is a comment
x <- 1 + 1

The numbers on the left (if present) are not part of the code – they are just line numbers. You can copy the code by clicking on the “Copy” button (📋) and it will not copy these numbers.

Exercise 1 (Example)  

  • This is what an exercise looks like!
  • Please do all exercises. It helps a lot.
  • Some things are learned only through exercises.

Some exercises have a solution which you can click to reveal.

Useful tips

  • Some exercises have solutions in the “Solutions” chapter. If they do, please read the solution after you have completed the exercise – often there will be a comment or a hint that will help you understand the material.

Remember!

  • Run all code chunks in your RStudio.
  • Do all exercises.
  • Go through the “Review” section at the end of each chapter and make sure you understand everything on the list.

Take a look at the right margin!

New concepts are highlighted on the right margin of the book

And again2!

2 And also footnotes.

In each chapter there are several exercises. However, that does not mean that you should only do the exercises. In fact, you should try out every piece of code that is in the book. Copy it (there is a 📋 button next to each code block that will do it for you), paste it into your RStudio and run it. Then try to modify it and see what happens.

Exercises in this book are important. They are not only there to check whether you understood the material; they can also introduce new concepts or ideas. This is because this book is not only about learning R, but also about learning how to learn about R. So, for example, sometimes we will want you to figure things out on your own rather than give you a ready-made answer.

Many of the exercises have a solution provided, either inline (you have to expand it by clicking on the “Solution” button) or in the “Solutions” section of the book. It is really important that you do not give up too quickly. Try to solve the exercise on your own, and only then look at the solution.

Each chapter ends with a “Review” section, which contains a list of things that you have learned that day. It is really important that you go through that list and make sure that you understand everything on it. Some of the new things appeared in the exercises, so if you skipped them, you might want to go back and do them.

If you do all that, I personally guarantee you that by the end of this course you will be able to use R in your work.

General advice

The course is a real crash course. There is a lot of material coming at you in a very short time. You will feel overloaded and overwhelmed – this is normal. Don’t worry! It will soon get better, and in a few days you will be able to do fairly advanced things with R.

The key is to keep playing with R: trying out new things, breaking things. Please go through all exercises in this book, even if they seem simple at first glance (some of them are tricky, others are used to smuggle in new concepts and useful tidbits of information).

Whenever you feel you don’t understand something, stop and try to figure it out. Use internet search liberally. Many answers can be found on sites such as Stack Overflow, R-bloggers, or simply in the R documentation3. Try out the code you find in these sources – just copy-paste it into your RStudio and adapt it to your needs. Feel free to use Large Language Models (such as GPT) – they are very good at explaining code, especially when you are learning basic concepts.

3 You can access the R documentation by typing ?function_name in the console. You can search for concepts using ??keyword.
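The help commands are worth trying right away. Here is a short sketch using the function `mean` and the keyword `median` as arbitrary examples; any other function name or keyword works the same way.

```r
# Open the help page of a function (both lines do the same thing)
?mean
help("mean")

# Search all installed documentation for a keyword
??median

# Run the examples from the bottom of a help page
example(mean)

# List objects whose names start with "mean"
apropos("^mean")
```

In RStudio, the help pages open in the “Help” pane, so you can read them side by side with your code.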

However, if you want to learn R, simply doing this course will not be enough. You need to start using it in a real-world setting. Unfortunately – the better you already are at Excel, Word and other such tools, the harder it will be to switch to R: tasks that are a breeze in Excel will at first require you to spend substantially more time in R. However, trust me: it pays off in the long run. Therefore, for best results, force yourself to use R even if at first it is less efficient than other tools.

Finally: programming can be fun. Most programmers I know simply enjoy it. It is a satisfaction similar to building Lego models, solving puzzles or reading a crime story – and also very much like doing experiments. Unlike with experiments, however, you can’t break expensive equipment, and the results of your attempts are usually visible immediately; there is no need to wait weeks for them. You can play, experiment and try out various things at no expense and no risk. If you can get into this mindset, learning R will be much, much easier.

Acknowledgements

I have been teaching statistics and bioinformatics for more than two decades now. About ten years ago, I started teaching the first “R crash course” with the goal of introducing PhD students and postdocs to R as quickly and as painlessly as possible. The course evolved over the years, and I had many partners and co-teachers.

First of all, I would like to thank Carlo Pecoraro from Physalia Courses for the opportunity to teach my first R crash course for Physalia.

Several times I have taught the course together with my colleague, Dr. Manuela Benary from the Core Unit of Bioinformatics at the Berlin Institute of Health, Charité - Universitätsmedizin Berlin. Manuela has been a great partner in teaching the course, and many ideas in this book are actually hers.

I would also like to thank the many colleagues and friends who had the patience to go through this book and provide feedback.

Finally, I thank you, the Reader, for your interest in this book. I hope it will be useful and that you will enjoy using R in your research.