Introduction to R
This section is designed to ensure you are familiar with the R environment.
0.2 Getting started with R at home
Given that we’re all working from home these days, you’ll need to download R and RStudio onto your own devices. R is the name of the programming language that we’ll be using for coding exercises; RStudio is the IDE (“Integrated Development Environment”), i.e., the piece of software that almost everyone uses when working in R.
You can download both of these on Windows and Mac easily and for free. This is one of the first reasons to use an “open-source” programming language: it’s free and everyone can contribute!
IT Services at the University of Edinburgh have provided a walkthrough of what is needed for you to get started. I also break this down below:
Install R for Mac from here: https://cran.r-project.org/bin/macosx/. Install R for Windows from here: https://cran.r-project.org/bin/windows/base/.
Download RStudio for Windows or Mac from here: https://rstudio.com/products/rstudio/download/, choosing the Free version: this is what most people use and is more than enough for all of our needs.
All of these programs are free. Make sure to install everything listed above for your operating system, or R will not work properly!
0.3 Some basic information
A script is a text file in which you write your commands (code) and comments.
If you put the # character in front of a line of text this line will not be executed; this is useful to add comments to your script!
R is case sensitive, so be careful when typing.
To send code from the script to the console, highlight the relevant line of code in your script and click on Run, or select the line and hit ctrl+enter on PC or cmd+enter on Mac.
Access help files for R functions by preceding the name of the function with ? (e.g., ?table).
By pressing the up key in the console, you can go back to the commands you have used before.
Press the tab key to auto-complete variable names and commands.
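A few of the points above can be tried out directly. The sketch below is purely illustrative; the object name `answer` is made up for the example.

```r
# This whole line is a comment, so R ignores it
answer <- 1 + 1  # a comment can also follow code on the same line

# R is case sensitive: `answer` exists, but `Answer` does not
exists("answer")
exists("Answer")

# Open the help file for table(); equivalent to typing ?table
if (interactive()) {
  help("table")
}
```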
0.4 Getting Started in RStudio
Begin by opening RStudio (located on the desktop). Your first task is to create a new script (this is where we will write our commands). To do so, click File –> New File –> R Script.
Your screen should now have four panes:
the Script (top left)
the Console (bottom left)
the Environment/History (top right)
Files/Plots/Packages/Help/Viewer (bottom right)
0.5 A simple example
The Script (top left) is where we write our commands for R. You can try this out for the first time by writing a small snippet of code as follows:
x <- "I can't wait to learn Computational Text Analysis" #Note the quotation marks!
To tell R to run the command, highlight the relevant line in your script and click the Run button (top right of the Script), or press ctrl+enter on Windows or cmd+enter on Mac. This sends the command to the Console (bottom left), where the actual evaluation and calculations take place. These shortcut keys will become very familiar to you very quickly!
Running the command above creates an object named ‘x’, that contains the words of your message.
You can now see ‘x’ in the Environment (top right). To view what is contained in x, type in the Console (bottom left):
print(x)
## [1] "I can't wait to learn Computational Text Analysis"
# or alternatively you can just type:
x
## [1] "I can't wait to learn Computational Text Analysis"
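Once x exists, a few base R functions can tell you more about it. This is a small sketch that repeats the assignment from above so that it runs on its own.

```r
x <- "I can't wait to learn Computational Text Analysis"

ls()        # lists the objects in your environment, including "x"
class(x)    # "character": x holds text rather than numbers
nchar(x)    # counts the characters in the message
toupper(x)  # returns an upper-case copy; x itself is unchanged
```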
0.6 Loading packages
The ‘base’ version of R is very powerful, but it cannot do everything on its own, at least not with ease. For more technical or specialized forms of analysis, we will need to install and load additional packages.
A ‘package’ is a program that includes new tools (i.e., functions) to carry out specific tasks. You can think of packages as ‘extensions’ that enhance R’s capacities.
To take one example, we might want to do something a little more exciting than print how excited we are about this course. Let’s make a plot instead.
This might sound technical. But the beauty of the packaged extensions of R is that they contain functions to perform specialized types of analysis with ease.
We’ll first need to install one of these packages, which you can do as below:
install.packages("tidyverse")
After the package is installed, we then need to load it into our environment by calling library() with the name of the package.
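The loading command is cut off in the text above. A minimal sketch, assuming the package meant is the tidyverse installed in the previous step:

```r
# Install once per machine; skipped if the package is already there
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Load at the start of every new R session
library(tidyverse)
```

Installation only has to happen once, whereas library() needs to be run again each time you restart R.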
What now? Well, let’s see just how easy it is to visualize some data using ggplot2, a package that comes bundled with the larger tidyverse. The ggplot() function below draws a scatter plot from the mpg dataset that ships with the package.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
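The same plot can be extended by mapping another variable to colour and adding axis labels. This is only a sketch of one possible variation, using the mpg data that ships with ggplot2; the label text and the object name `p` are illustrative.

```r
library(ggplot2)  # already attached if you ran library(tidyverse)

# Build the plot and store it in an object instead of drawing it straight away
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, colour = class)) +
  labs(x = "Engine displacement (litres)",
       y = "Highway miles per gallon",
       colour = "Type of car")

print(p)  # draws the plot in the Plots pane
```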
If we wanted to save where we’d got to with making our plots, we would want to save our scripts, and maybe the data we used as well, so that we could return to it at a later stage.
0.7 Saving your objects, plots and scripts
Saving scripts: To save your script in RStudio (i.e. the top left panel), all you need to do is click File –> Save As (and choose a name for your script). Your script will be something like: myfilename.R.
Saving plots: If you have made any plots you would like to save, click Export (in the plotting pane) and choose a relevant file extension (e.g. .png, .pdf, etc.) and size.
To save individual objects (for example x from above) from your environment, run the following command (choosing a suitable filename):
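The command is missing from the text at this point. A minimal sketch, assuming base R's save() function and the same example filename used elsewhere in this section:

```r
x <- "I can't wait to learn Computational Text Analysis"

# Save only the object x to a file in the working directory
save(x, file = "myfilename.RData")
```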
- To save all of your objects (i.e. everything in the top right panel) at once, run the following command (choosing a suitable filename):
save.image(file="myfilename.RData")
- Your objects can be re-loaded into R during your next session by running:
load(file="myfilename.RData")
There are many other file formats you might use to save any output. We will encounter these as the course progresses.
0.8 Knowing where R saves your documents
If you are at home, when you open a new script make sure to check and set your working directory (i.e. the folder where the files you create will be saved). To check your working directory use the getwd() command (type it into the Console or write it in your script in the Source Editor):
getwd()
To set your working directory, run the following command, substituting the file directory of your choice. Remember that anything following the # symbol is simply a clarifying comment and R will not process it.
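The command is missing from the text here. A minimal sketch, assuming base R's setwd(); the folder path in the comment is a made-up example, and tempdir() is used only so that the code runs on any machine.

```r
old_dir <- getwd()  # remember where we started

# On your own machine you would write a real folder path, for example:
# setwd("C:/Users/yourname/Documents/text-analysis")  # hypothetical Windows path

setwd(tempdir())    # a temporary folder that always exists
getwd()             # confirm that the working directory has changed

setwd(old_dir)      # go back to the original folder
```

Note that R expects forward slashes (/) in file paths, even on Windows.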
0.9 Practicing in R
The best way to learn R is to use it. These workshops on text analysis will not be the place to become fully proficient in R. They will, however, be a chance to conduct some hands-on analysis with applied examples in a fast-expanding field. And the best way to learn is through doing. So give it a shot!
For some further practice in the R programming language, look no further than Wickham and Grolemund (2017) and, for tidy text analysis, Silge and Robinson (2017).
The free online book by Hadley Wickham “R for Data Science” is available here
The free online book by Julia Silge and David Robinson “Text Mining with R” is available here
For more practice with R, you may want to consult a set of interactive tutorials, available through the package “learnr.” Once you’ve installed this package, you can go through the tutorials yourselves by calling:
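The call itself is missing from the text. A minimal sketch, assuming the learnr functions available_tutorials() and run_tutorial(); the tutorial name "ex-data-basics" is assumed to be one of those bundled with learnr, so check the printed list first and substitute a name that appears there.

```r
# Check whether learnr is installed (install.packages("learnr") if not)
has_learnr <- requireNamespace("learnr", quietly = TRUE)

if (has_learnr) {
  # List the tutorials that come with the learnr package
  print(learnr::available_tutorials("learnr"))

  # Launch one of them; this only makes sense in an interactive session
  if (interactive()) {
    learnr::run_tutorial("ex-data-basics", package = "learnr")
  }
}
```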
0.10 One final note
Once you’ve dipped into the “R for Data Science” book you’ll hear a lot about the so-called tidyverse in R. This is essentially a set of packages that use an alternative, and more intuitive, way of interacting with data.
The main difference you’ll notice here is that, instead of having separate lines for each function we want to run, or wrapping functions inside functions, sets of functions are “piped” into each other using the “pipe” operator, which looks like this: %>%.
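To make the contrast concrete, the sketch below computes the same result twice, once by wrapping functions inside functions and once with pipes. It assumes the magrittr package (installed alongside the tidyverse) as the source of %>%; the numbers and object names are made up for illustration.

```r
library(magrittr)  # provides %>%; also available after library(tidyverse)

values <- c(1, 4, 9, 16)

# "Base" style: functions wrapped inside functions, read from the inside out
nested <- round(mean(sqrt(values)), 1)

# "Tidy" style: each step is piped into the next, read from left to right
piped <- values %>%
  sqrt() %>%
  mean() %>%
  round(1)

identical(nested, piped)  # both approaches give the same answer
```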
I will be using “tidy” syntax in the weekly exercises for these computational text analysis workshops. If anything is unclear, I can provide the equivalents in “base” R too. But a lot of the useful text analysis packages are now composed with ‘tidy’ syntax.