Download

Please download the files from GitHub.

Introduction

What are my objectives in giving this training?

  • Show basic usage of Rmarkdown
  • Share some personal advice and reflections about its use

Why Rmarkdown?

In my M.Sc. analysis folder, I have… 313 documents with a “.R” extension!

How many times do you think I searched through more than five folders to find the code that reproduces a result? How many times did I open the right R file without finding the code of interest?

How many duplicates do you think this mess contains… That is where Git comes in, but that is another story.

Rmarkdown allows you to:

  • Clearly write down your line of reasoning when trying something or analysing your data set
  • Display reproducible code for anyone interested in your data/results/analysis (Research Director??)
  • Produce professional-quality documents with figures without worrying about formatting (Supplementary material!!)
  • Create interactive documents!

In one sentence, Rmarkdown favours reproducibility, communication, structured thinking, and aesthetics.

Rmarkdown has some advantages:

  • It is easy (“If you can write an emoticon, you can write RMarkdown”)
  • It is quick (far quicker than producing plots at the right dimensions, copy-pasting them into a Word document, then screaming at your computer about the layout)
  • It lets you write web-friendly documents (even books!) automatically, without having to worry about HTML, CSS and so on.
  • It produces plain-text files, which work well with version control!

I see one drawback:

  • We (partially) lose Word's reviewing features.

Let’s begin!

Code and figures

Chunks

Code is written inside chunks, which are delimited by ```{r} and ```. We can insert a chunk in the following ways:

  • Ctrl + Alt + I (Cmd + Option + I on macOS)
  • The Insert chunk button in the editor toolbar
  • Code > Insert chunk

Let’s try something very simple:

plot(iris)

Chunk options

  • include = FALSE makes the chunk run without displaying its code or its results; the objects it creates remain available to the other chunks!

The line loading the ggplot2 package does not appear in the document, but the ggplot2 package is available!

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  theme_minimal()

  • echo = FALSE displays only the results of a chunk, without showing its code!

  • message = FALSE hides the chunk’s messages in the output. Let’s add it to the previous chunk!

  • With eval = FALSE, the code inside a chunk is not evaluated. Let’s add it to the previous chunk and cut down our waiting time.

  • warning = FALSE hides warning messages in the final output. Errors work differently: by default an error stops the rendering, while error = TRUE displays the error message and lets the rendering continue.

cor(c(1, 1), c(2, 3))
## [1] NA
  • fig.cap inserts a caption below your plot!

  • You can specify fig.height, fig.width and many other figure properties. See the reference guide for the full list!

  • We can set the chunk options for the whole document with the following command, placed in a chunk named setup:

knitr::opts_chunk$set(echo = TRUE)

  • We can name the chunks, which is very convenient in long projects! The name won’t appear in the rendered document, but will help navigation when working on it, or in notebooks.
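To see what these options do without rendering a whole document, one can knit a small piece of R Markdown held in a character vector: knitr::knit() with its `text` argument returns the resulting Markdown as a string. The sketch below assumes knitr is installed; the chunk labels (load, mean, corr) and their contents are made up for illustration.

```r
library(knitr)

# A tiny, illustrative R Markdown source with three chunks
rmd <- c(
  "```{r load, include = FALSE}",
  "x <- 1:10",
  "```",
  "",
  "```{r mean, echo = FALSE}",
  "mean(x)",
  "```",
  "",
  "```{r corr, warning = FALSE}",
  "cor(c(1, 1), c(2, 3))",
  "```"
)

# knit() with `text =` returns the Markdown output instead of writing a file
md <- paste(knit(text = rmd, quiet = TRUE), collapse = "\n")
cat(md)
```

In the printed Markdown, the load chunk leaves no trace but still defines x, the mean chunk shows only "## [1] 5.5", and the corr chunk shows its code and NA without the zero standard deviation warning.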

Exercise: Create a new Rmarkdown document with the following characteristics:

  • Warnings should never be displayed
  • The ggplot2 library should be loaded in a chunk that is neither evaluated nor displayed
  • Create a plot without displaying its code
  • Create a ggpairs plot (GGally package) with the code displayed but without the associated messages
  • Generate a warning

Other languages

It is worth knowing that chunks can also highlight and run code in other languages, such as:

  • Bash
hostname
## lucasd-HP-EliteBook-2560p
  • Python
x = "Hello, Python!"
print(x.split(" "))
## ['Hello,', 'Python!']
  • Stan
// saved as 8schools.stan
data {
  int<lower=0> J;         // number of schools
  array[J] real y;              // estimated treatment effects
  array[J] real<lower=0> sigma; // standard error of effect estimates
}
parameters {
  real mu;                // population treatment effect
  real<lower=0> tau;      // standard deviation in treatment effects
  vector[J] eta;          // unscaled deviation from mu by school
}
transformed parameters {
  vector[J] theta = mu + tau * eta;        // school treatment effects
}
model {
  target += normal_lpdf(eta | 0, 1);       // prior log-density
  target += normal_lpdf(y | theta, sigma); // log-likelihood
}
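Each language corresponds to a knitr engine, chosen in the chunk header: {bash}, {python} or {stan}. A quick way to check which engines the installed knitr knows about is to list them, as sketched below (assuming knitr is installed). The engines still need the matching software: a shell for Bash, Python (usually run through the reticulate package) and rstan for Stan. For Stan chunks, the chunk option output.var names the R object that receives the compiled model, e.g. {stan, output.var = "schools_model"}, where the name is only an example.

```r
library(knitr)

# Names of all chunk engines registered in the installed knitr version
engines <- names(knit_engines$get())

# Are the three engines used above among them?
c("bash", "python", "stan") %in% engines
```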

Inline code

It is possible to display the result of code directly in the text. We will illustrate this with an lm example:

fit <- lm(data = iris, Sepal.Length ~ Sepal.Width)
coef(fit)
## (Intercept) Sepal.Width
##   6.5262226  -0.2233611

Globally, without accounting for differences among species, a 1 cm increase in sepal width implies a 0.22 cm decrease in sepal length.
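Rather than typing 0.22 by hand, the sentence can compute the number with inline code: a backtick, the letter r, an R expression and a closing backtick. The sketch below shows an expression such an inline call could contain and the value it would print; the variable name `slope` is only for illustration.

```r
fit <- lm(data = iris, Sepal.Length ~ Sepal.Width)

# Change in sepal length (cm) per cm of sepal width
slope <- coef(fit)[["Sepal.Width"]]

# In the text, this could be written inline as
#   `r round(abs(coef(fit)[["Sepal.Width"]]), 2)`
round(abs(slope), 2)
```

Since the number is recomputed at each rendering, the sentence stays correct if the data or the model change.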

Tables

Formatting R output exported as CSV in Excel is often tedious, especially when the results are subject to change and the formatting has to be redone again and again. Fortunately, knitr can format tables. With a little effort, one can even produce publication-standard tables.

Classical kable

A model selection table, keeping models with weight > 0.1
Total sleep (%) Body weight (g) Brain weight (g) Diet (Dummy) AICc \(\Delta_{AICc}\) Model weight
0.20 114.01 0.00 0.4211620
0.21 0 116.05 2.03 0.1524904
0.20 0.35 116.09 2.08 0.1490451
0.17 + 116.26 2.25 0.1370259
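The table above looks like knitr::kable() output. Since the data behind that model selection table is not given here, the sketch below applies kable() to a summary of iris instead; the caption and column names are illustrative. Inside a chunk, the same call renders as a formatted table in HTML, PDF or Word output.

```r
library(knitr)

# Mean sepal dimensions per species (base R aggregate)
sepal_means <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species,
                         data = iris, FUN = mean)

# Round to 2 decimals and give readable headers with units
tab <- kable(sepal_means, digits = 2,
             col.names = c("Species", "Sepal length (cm)", "Sepal width (cm)"),
             caption = "Mean sepal dimensions by species")
tab
```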

Publication quality table

One can easily format publication-quality tables in PDF documents using LaTeX. In HTML, we have to use a little trick involving Ghostscript and kableExtra.
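A minimal sketch with kableExtra, assuming the package is installed: kbl() builds the table and kable_styling() adds Bootstrap classes for HTML output or booktabs-style rules for LaTeX/PDF output. It does not reproduce the Ghostscript step, and the styling choices are only examples.

```r
library(kableExtra)

sepal_means <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species,
                         data = iris, FUN = mean)

# HTML (html_document): striped rows, hover highlight, compact width
html_tab <- kable_styling(
  kbl(sepal_means, format = "html", digits = 2,
      caption = "Mean sepal dimensions by species"),
  bootstrap_options = c("striped", "hover"),
  full_width = FALSE
)

# LaTeX (pdf_document): booktabs rules (\toprule, \midrule, \bottomrule)
latex_tab <- kable_styling(
  kbl(sepal_means, format = "latex", booktabs = TRUE, digits = 2),
  latex_options = "striped"
)
```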

Exercises

  • Could you display the relevant information to understand and evaluate the results of the above linear regression?
  • The stargazer package automatically produces useful tables for comparing models. Would you try it with the best and second-best models?

Output formats

Let’s explore the parameters for our current html output!

Documents

  • html_notebook - Interactive R Notebooks
  • html_document - HTML document w/ Bootstrap CSS
  • pdf_document - PDF document (via LaTeX template)
  • word_document - Microsoft Word document (docx)
  • odt_document - OpenDocument Text document
  • rtf_document - Rich Text Format document
  • md_document - Markdown document (various flavors)

Presentations

  • ioslides_presentation - HTML presentation with ioslides
  • revealjs::revealjs_presentation - HTML presentation with reveal.js
  • slidy_presentation - HTML presentation with W3C Slidy
  • beamer_presentation - PDF presentation with LaTeX Beamer
  • powerpoint_presentation - PowerPoint presentation
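Any of these formats can also be requested from R with rmarkdown::render(), the function the Knit button calls. The sketch below writes a small, hypothetical analysis.Rmd to a temporary folder and renders it only when rmarkdown and pandoc are available. With ioslides_presentation, each ## header starts a new slide.

```r
# Hypothetical document written to a temporary folder
rmd_file <- file.path(tempdir(), "analysis.Rmd")
writeLines(c(
  "---",
  "title: \"Iris analysis\"",
  "output: html_document",
  "---",
  "",
  "## Sepal dimensions",
  "",
  "```{r}",
  "plot(iris)",
  "```"
), rmd_file)

if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  # Several document formats from the same source
  rmarkdown::render(rmd_file,
                    output_format = c("html_document", "word_document"),
                    quiet = TRUE)
  # The same source as ioslides slides, under another file name
  rmarkdown::render(rmd_file, output_format = "ioslides_presentation",
                    output_file = "analysis_slides.html", quiet = TRUE)
}
```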

An example presentation can be downloaded here.

Table of contents

One can include a beautiful table of contents by modifying the YAML options!
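For an HTML document, the table of contents options sit under html_document in the YAML header, and they map onto arguments of rmarkdown::html_document(). The values below (depth 2, floating sidebar) are only an example.

```r
# YAML equivalent, placed in the document header:
# output:
#   html_document:
#     toc: true
#     toc_depth: 2
#     toc_float: true
fmt <- rmarkdown::html_document(toc = TRUE, toc_depth = 2, toc_float = TRUE)

# fmt can then be passed to rmarkdown::render(..., output_format = fmt)
```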

Exercise

  • Would you try to transform the present document into a decent presentation?

Basic formatting

Let’s have a look at the Rmarkdown cheatsheet

# Header 1

## Header 2

### Header 3

...

> and a blockquote is superior!

  1. And an
    1. unordered
      1. list
  2. to conclude this part!

Equations

Inline equations

One can insert inline LaTeX equations simply by wrapping them in $. For example, \(y_i = \beta_1 x_i + \beta_0 + \epsilon_i\) represents a simple linear equation. This wonderful functionality allows you to use any mathematical notation. It is useful, for instance, to indicate that \(\beta_1\) is the slope and \(\beta_0\) the intercept!

Block equations

One can insert block LaTeX equations using $$. For example, the parameters of a linear model with \(p\) coefficients (an intercept and \(p-1\) predictors, with \(p < n\)) can be estimated from the following matrix formulation with a little bit of linear algebra.

\[ \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \quad \mathbf{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 1 & x_{1,2} & \cdots & x_{1,p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,2} & \cdots & x_{n,p} \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_{p-1} \end{bmatrix}, \quad \boldsymbol{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{bmatrix} \]
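The least squares estimate solves the normal equations, \(\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{Y}\). As a sanity check, the sketch below computes it on iris with two predictors chosen for illustration (so \(p = 3\)) and compares it with lm(), which relies on a QR decomposition internally.

```r
# Design matrix X: a column of 1s (intercept) followed by the predictors
X <- model.matrix(~ Sepal.Width + Petal.Length, data = iris)
Y <- iris$Sepal.Length

# Solve (X'X) beta = X'Y rather than explicitly inverting X'X
beta_hat <- solve(crossprod(X), crossprod(X, Y))

# Same estimates as lm()
fit2 <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)
cbind(normal_equations = drop(beta_hat), lm = coef(fit2))
```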

References

It is easy to cite a reference in the text and to automatically create a reference section, provided the citations are stored in a BibTeX file.

First, add the following line to the YAML header:

bibliography: My_Collection.bib

Then, one can cite the reference as follows: Gelman et al. (2013) is a wonderful introduction to hierarchical models and Bayesian estimation methods.
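In the source, a citation is written with the entry's BibTeX key: @key gives an in-text citation such as Gelman et al. (2013), while [@key] gives a parenthetical one. The key itself (for instance gelman2013) is hypothetical here, since the .bib file is not shown. For R packages, knitr::write_bib() can generate the BibTeX entries, keyed R-<package> by default; the sketch below writes them to a temporary packages.bib, which could then be listed next to My_Collection.bib in the bibliography field.

```r
# BibTeX entries for R itself and for knitr, keyed R-base and R-knitr
bib_file <- file.path(tempdir(), "packages.bib")
knitr::write_bib(c("base", "knitr"), file = bib_file)
readLines(bib_file)

# YAML, then: bibliography: [My_Collection.bib, packages.bib]
# Text, then: "... rendered with knitr [@R-knitr]"
```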

Bookdown

It is possible to write books with Rmarkdown and the bookdown package by following a set of rules:

  1. Create a new folder
  2. Create one ordered file for each part of the document, for example 1_Introduction.Rmd
  3. The first file must begin with the standard YAML header specifying the title, author(s) and date
  4. Each of the other files must start directly with the header of its section
  5. Create a file named _output.yml, it will contain the output parameters.

An example _output.yml could be:

bookdown::html_document2:
  6. Once everything is done, run `bookdown::render_book(input = "1_Introduction.Rmd")`.
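The steps above can be sketched in R by writing the files into a temporary folder. The chapter names, title, author, date and the toc option are illustrative, and the rendering step only runs when bookdown and pandoc are available.

```r
# Hypothetical book folder with two chapters
book_dir <- file.path(tempdir(), "my_book")
dir.create(book_dir, showWarnings = FALSE)

# The first file carries the YAML header
writeLines(c("---",
             "title: \"My analysis\"",
             "author: \"Me\"",
             "date: \"2024-01-01\"",
             "---",
             "",
             "# Introduction",
             "",
             "Some text."),
           file.path(book_dir, "1_Introduction.Rmd"))

# The other files start directly with their section header
writeLines(c("# Methods", "", "More text."),
           file.path(book_dir, "2_Methods.Rmd"))

# Output parameters
writeLines(c("bookdown::html_document2:", "  toc: true"),
           file.path(book_dir, "_output.yml"))

# Render from inside the book folder, restoring the working directory afterwards
if (requireNamespace("bookdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  old_wd <- setwd(book_dir)
  tryCatch(bookdown::render_book(input = "1_Introduction.Rmd"),
           finally = setwd(old_wd))
}
```

By default, bookdown writes the result into a _book subfolder.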

Some advantages of bookdown (everything can be found in the manual):

  • Numbered and referenced equations, theorems, special headers and text references.
  • The possibility to put each chapter under version control independently; this kind of tool is poorly suited to a single plain book file!
  • The possibility to compile chapters separately and to prepare the whole book for web publication.
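As an example of numbered and referenced equations, bookdown output formats (such as bookdown::html_document2) accept a label written as (\#eq:label) inside an equation environment. The sketch below reuses the simple linear model from the Equations section, with a made-up label:

```latex
\begin{equation}
  y_i = \beta_1 x_i + \beta_0 + \epsilon_i
  (\#eq:simple-lm)
\end{equation}
```

Writing \@ref(eq:simple-lm) in the text then prints the equation number as a link.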

References

Gelman, Andrew, Hal S. Stern, John B. Carlin, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. Chapman & Hall/CRC.