Best Practices for Writing R Code (2024)

Last updated on 2024-04-02

Overview

Questions

How can I write R code that other people can understand and use?

Objectives

Describe changes made to code for future reference.
Understand the importance of requirements and dependencies for your code.
Know when to use setwd().
Identify and segregate distinct components in your code using # or #-.
Apply additional best practice recommendations.

Keep track of who wrote your code and its intended purpose

Starting your code with an annotated description of what the code does when it is run will help you when you have to look at or change it in the future. Just one or two lines at the beginning of the file can save you or someone else a lot of time and effort when trying to understand what a particular script does.

R

# This is code to replicate the analyses and figures from my 2014 Science
# paper. Code developed by Sarah Supp, Tracy Teal, and Jon Borelli

Also consider version-controlling your code, as taught, for example, in Software Carpentry’s git-novice lesson. Frequent commits with explanatory, synoptic commit messages are an elegant way to comment code. RStudio supports this with its Git integration.

Be explicit about the requirements and dependencies of your code

Loading all of the packages your code needs at the top of the script (using library) is a clear way of indicating which packages are required. It can be frustrating to make it two-thirds of the way through a long-running script only to find out that a dependency hasn’t been installed.

R

library(ggplot2)
library(reshape)
library(vegan)
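To fail early rather than partway through a long run, a script can check up front whether its packages are installed. The following is a minimal sketch, not part of the original lesson: `find_missing_packages` is a hypothetical helper, and the made-up package name is only there to show what a missing dependency looks like. Note that `requireNamespace()` loads a package's namespace as a side effect when the package is found.

```r
# Hypothetical helper: return the names of packages that are not installed.
find_missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# "stats" and "utils" ship with R; the last name is made up and should not exist.
missing_packages <- find_missing_packages(
  c("stats", "utils", "notARealPackage12345")
)

if (length(missing_packages) > 0) {
  message("Please install: ", paste(missing_packages, collapse = ", "))
}
```

In a real script you would pass the names from your own library calls and could replace `message()` with `stop()` so that the script halts before any long-running work begins.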

Another way you can be explicit about the requirements of your code and improve its reproducibility is to limit the “hard-coding” of the input and output files for your script. If your code will read in data from a file, define a variable early in your code that stores the path to that file. For example,

R

input_file <- "data/data.csv"
output_file <- "data/results.csv"

# read input
input_data <- read.csv(input_file)
# get number of samples in data
sample_number <- nrow(input_data)
# generate results
results <- some_other_function(input_file, sample_number)
# write results
write.table(results, output_file)

is preferable to

R

# read input
input_data <- read.csv("data/data.csv")
# get number of samples in data
sample_number <- nrow(input_data)
# generate results
results <- some_other_function("data/data.csv", sample_number)
# write results
write.table(results, "data/results.csv")

It is also worth considering what the working directory is. If the working directory must change, it is best to do that at the beginning of the script.

Be careful when using `setwd()`

One should exercise caution when using setwd(). Changing directories in a script file can limit reproducibility:

setwd() will return an error if the directory to which you’re trying to change doesn’t exist or if the user doesn’t have the correct permissions to access that directory. This becomes a problem when sharing scripts between users who have organized their directories differently.
If/when your script terminates with an error, you might leave the user in a different directory than the one they started in, and if they then call the script again, this will cause further problems. If you must use setwd(), it is best to put it at the top of the script to avoid these problems. The following error message indicates that R has failed to set the working directory you specified:

Error in setwd("~/path/to/working/directory") : cannot change working directory

It is best practice to have the user running the script begin in a consistent directory on their machine and then use relative file paths from that directory to access files (see below).
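If a change of directory cannot be avoided, one way to limit both problems described above is to check that the directory exists and to restore the original directory even when an error occurs. The following is a minimal sketch under those assumptions: `run_in_dir` is a hypothetical helper, and `tempdir()` stands in for a real project directory.

```r
# Hypothetical helper: run `task` inside `dir`, then return to the
# directory the user started in, even if `task` fails.
run_in_dir <- function(dir, task) {
  if (!dir.exists(dir)) {
    stop("Directory does not exist: ", dir)
  }
  old_dir <- setwd(dir)                # setwd() returns the previous directory
  on.exit(setwd(old_dir), add = TRUE)  # restore it when the function exits
  task()
}

start_dir <- getwd()

# Simulate a script that fails partway through; try() keeps this example running.
try(run_in_dir(tempdir(), function() stop("something went wrong")),
    silent = TRUE)

# Check that we are back in the directory we started from.
print(identical(getwd(), start_dir))
```

Because the restore is registered with `on.exit()`, it runs whether `task()` finishes normally or stops with an error.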

Identify and segregate distinct components in your code

It’s easy to annotate and mark your code using # or #- to set off sections of your code and to make finding specific parts of your code easier. For example, it’s often helpful when writing code to separate the function definitions. If you create only one or a few custom functions in your script, put them toward the top of your code. If you have written many functions, put them all in their own .R file and then source that file. source will define all of these functions so that your code can make use of them as needed.

R

source("my_genius_fxns.R")
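As an illustration of section markers, here is a minimal sketch of a short script laid out with #- comment lines. The section names, the `value_range` helper, and the data are all made up for this example. In RStudio, a comment line ending in four or more dashes is also treated as a foldable code section.

```r
# This script summarizes a small example data set.

#- Functions ----------------------------------------
# Hypothetical helper: spread (max minus min) of a numeric vector.
value_range <- function(x) {
  max(x) - min(x)
}

#- Data ---------------------------------------------
measurements <- c(3, 7, 1, 9)

#- Analysis -----------------------------------------
measurement_range <- value_range(measurements)
print(measurement_range)
```

With only one small function, keeping it at the top of the script is enough; once the Functions section grows, it is a natural candidate to move into its own file.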

Other ideas

Use a consistent style within your code. For example, name all matrices something ending in _mat. Consistency makes code easier to read and problems easier to spot.
Keep your code in bite-sized chunks. If a single function or loop gets too long, consider looking for ways to break it into smaller pieces.
Don’t repeat yourself: automate! If you are repeating the same code over and over, use a loop or a function to repeat that code for you. Needless repetition doesn’t just waste time; it also increases the likelihood you’ll make a costly mistake!
Keep all of your source files for a project in the same directory, then use relative paths as necessary to access them. For example, use

R

dat <- read.csv(file = "files/dataset-2013-01.csv", header = TRUE)

rather than:

R

dat <- read.csv(file = "/Users/Karthik/Documents/sannic-project/files/dataset-2013-01.csv", header = TRUE)

R can run into memory issues. It is a common problem to run out of memory after running R scripts for a long time. To inspect the objects in your current R environment, you can list the objects, search current packages, and remove objects that are currently not in use. A good practice when running long lines of computationally intensive code is to remove temporary objects after they have served their purpose. However, sometimes, R will not clean up unused memory for a while after you delete objects. You can force R to tidy up its memory by using gc().

R

# Sample dataset of 1000 rows
interim_object <- data.frame(rep(1:100, 10),
                             rep(101:200, 10),
                             rep(201:300, 10))
object.size(interim_object) # Reports the memory size allocated to the object
rm("interim_object") # Removes only the object itself and not necessarily the memory allotted to it
gc() # Force R to release memory it is no longer using
ls() # Lists all the objects in your current workspace
rm(list = ls()) # If you want to delete all the objects in the workspace and start with a clean slate

Don’t save the workspace when you quit R (by default, R asks whether you want to save it to an .RData file). Instead, start in a clean environment so that older objects don’t remain in your environment any longer than they need to. Leftover objects can lead to unexpected results.
Wherever possible, keep track of sessionInfo() somewhere in your project folder. Session information is invaluable because it records the R version and the versions of the packages loaded in the current session. If a newer version of a package changes the way a function behaves, you can always go back and reinstall the version that worked (Note: At least on CRAN, all older versions of packages are permanently archived).
Collaborate. Grab a buddy and practice “code review”. Review is used for preparing experiments and manuscripts; why not use it for code as well? Our code is also a major scientific achievement and the product of lots of hard work! Reviews are built into GitHub’s pull request feature.
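The session information mentioned above can be written to a text file so that it travels with the project. The following is a minimal sketch: the file name `session_info.txt` is an assumption, and `tempdir()` stands in for your project folder.

```r
# Hypothetical output location; in a real project, use a path inside
# the project folder instead of tempdir().
session_file <- file.path(tempdir(), "session_info.txt")

# capture.output() turns the printed session information into lines of text.
writeLines(capture.output(sessionInfo()), session_file)

# Read the first few lines back to confirm the file was written.
print(readLines(session_file, n = 3))
```

Saving this file at the end of each analysis run, and committing it alongside the code, gives a record of which package versions produced a given set of results.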

Best Practice

What other suggestions do you have for coding best practices?
What are some specific ways we could restructure the code we worked on today to make it easier for a new user to read? Discuss with your neighbor.
Make two new R scripts called inflammation.R and inflammation_fxns.R. Copy and paste code into each script so that inflammation.R “does stuff” and inflammation_fxns.R holds all of your functions. Hint: you will need to add source to one of the files.

BASH

cat inflammation.R

OUTPUT

# This code runs the inflammation data analysis.

source("inflammation_fxns.R")
analyze_all("inflammation.*csv")

BASH

cat inflammation_fxns.R

OUTPUT

# This is code for functions used in our inflammation data analysis.

analyze <- function(filename, output = NULL) {
  # Plots the average, min, and max inflammation over time.
  # Input:
  #    filename: character string of a csv file
  #    output: character string of pdf file for saving
  if (!is.null(output)) {
    pdf(output)
  }
  dat <- read.csv(file = filename, header = FALSE)
  avg_day_inflammation <- apply(dat, 2, mean)
  plot(avg_day_inflammation)
  max_day_inflammation <- apply(dat, 2, max)
  plot(max_day_inflammation)
  min_day_inflammation <- apply(dat, 2, min)
  plot(min_day_inflammation)
  if (!is.null(output)) {
    dev.off()
  }
}

analyze_all <- function(pattern) {
  # Directory name containing the data
  data_dir <- "data"
  # Directory name for results
  results_dir <- "results"
  # Runs the function analyze for each file in the data directory
  # that contains the given pattern.
  filenames <- list.files(path = data_dir, pattern = pattern)
  for (f in filenames) {
    pdf_name <- file.path(results_dir, sub("csv", "pdf", f))
    analyze(file.path(data_dir, f), output = pdf_name)
  }
}

Key Points

Start each program with a description of what it does.
Then load all required packages.
Consider what working directory you are in when sourcing a script.
Use comments to mark off sections of code.
Put function definitions at the top of your file, or in a separate file if there are many.
Name and style code consistently.
Break code into small, discrete pieces.
Factor out common operations rather than repeating them.
Keep all of the source files for a project in one directory and use relative paths to access them.
Keep track of the memory used by your program.
Always start with a clean environment instead of saving the workspace.
Keep track of session information in your project folder.
Have someone else review your code.
Use version control.
