Chapter 16 Introduction to R

(Introduction to R)

16.1 Overview

16.1.1 Abstract:

This page collects the learning units for an introduction to R.

16.1.2 Objectives:

...

16.1.3 Outcomes:

...

16.1.4 Deliverables:

No separate deliverables: This unit collects other units and has no deliverables on its own.

16.1.5 Prerequisites:

This unit builds on material covered in the following prerequisite units:

RPR-Coding_style (R Coding Style)

This is a "milestone unit". Its purpose is merely to collect a number of preparatory units into a single, common prerequisite. It has no contents of its own; you are expected to be familiar and competent with all preparatory material at this point.

16.2 The ABC RStudio Project

R scripts and other resources for the learning units of this course are collected in an RStudio project. This makes it easy to update and distribute code: I push updated material for any unit to the project's GitHub repository, and all you need to do is pull the updated project to receive all updates and new files on your computer. Version control is really useful for this. However, there is one issue that you need to be aware of. If you create your own local files and then commit them, git will complain that an update would overwrite such local material. As long as you don't commit your files, all should be fine. This means you'll need to do your own "versioning" by saving your own scripts under a different name from time to time. Once again, in this context:

  • saving your own files is fine;
  • committing your own files to version control will cause problems;
  • changes you make to course material files and save under the same filename (like adding comments and notes) will not persist: they will be overwritten with the next update. You need to "Save As..." with a new filename (for example, prefix the original name with "my").
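
The "Save As..." step can also be done from the console. The following is a minimal sketch and not part of the course code: saveAsMine() is a hypothetical helper, the "my" prefix follows the example above, and the demonstration uses a stand-in file in a temporary directory rather than a real course script. It copies a script to a prefixed filename and declines to overwrite a personal copy that already exists.

```r
# Hypothetical helper (not part of the ABC-units project): copy a course
# script to a personal file whose name carries a prefix such as "my".
saveAsMine <- function(path, prefix = "my") {
  stopifnot(file.exists(path))
  myPath <- file.path(dirname(path), paste0(prefix, basename(path)))
  # overwrite = FALSE protects notes already saved in the personal copy
  copied <- file.copy(path, myPath, overwrite = FALSE)
  if (!copied) {
    warning("Not copied: ", myPath, " already exists.")
  }
  return(invisible(myPath))
}

# Demonstration in a temporary directory, with a stand-in for a course script
demoScript <- file.path(tempdir(), "RPR-Introduction.R")
writeLines("# course material", demoScript)
myCopy <- saveAsMine(demoScript)
basename(myCopy)
```

Edits then go into the prefixed copy, while the original file stays identical to the version in the repository.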

16.3 Task 28

  • Open RStudio and create a New Project... cloned from a git version control directory. The repository URL is https://github.com/hyginn/ABC-units. Create this in the same way as you did for the R-tutorial.
  • As requested on the console, type init(). This will create a file called .myProfile.R and ask you for your UofT eMail address and Student ID. You need to enter the correct values because other scripts will assume that these variables exist and are valid.
  • Work through the task: "Local script" in the RPR-Introduction.R script.
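
Since other scripts rely on the variables that .myProfile.R defines, it can be useful to check what the file actually contains once init() has run. The sketch below is an illustration only: profileVariables() is a hypothetical helper, and the variable names and values in the demonstration file (myEMail, myStudentNumber) are assumptions, not necessarily what init() writes. The file is evaluated in a separate environment, so the workspace itself is not modified.

```r
# Hypothetical helper: return the variables that a profile script defines,
# without assigning anything in the global workspace.
profileVariables <- function(path = ".myProfile.R") {
  if (!file.exists(path)) {
    stop("No profile found at ", path, ". Did you run init()?")
  }
  env <- new.env()
  sys.source(path, envir = env)       # evaluate the file inside env only
  return(mget(ls(env), envir = env))  # named list of what the file defined
}

# Demonstration with a stand-in profile in a temporary directory.
# The variable names and values here are assumptions for illustration.
demoProfile <- file.path(tempdir(), ".myProfile.R")
writeLines(c('myEMail <- "first.last@example.org"',
             'myStudentNumber <- "1234567890"'),
           demoProfile)
vars <- profileVariables(demoProfile)
names(vars)
```

In the project directory, calling profileVariables() without arguments would inspect the real .myProfile.R in the working directory.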

16.4 Self-evaluation

  • Understanding the setup
    • Imagine you made a typo when you entered your eMail address and now the file .myProfile.R contains a mistake. How do you fix this? (See note 24.)

  1. and when you click on the arrow to the left, this will take you back to where you came from

  2. Proportional fonts are for elegant document layout. Monospaced fonts are needed to properly align characters in columns. For code and sequences, we always use a monospaced font.

  3. [1] means: the following is the first (often only) element of a vector.

  4. A "wrapper" program uses another program's functionality in its own context. RStudio is a wrapper for R: it does not duplicate R's functions; instead, it runs the actual R in the background.

  5. For example, C:\Documents\new would be interpreted as C:\Documents, followed by a line break and ew, because \n is the linebreak character. Even though that is the actual path name on Windows, in an R command you have to write C:/Documents/new.

  6. Projects that I create for teaching are configured to use this option by default, thus once the project is loaded, the Working Directory should already be correctly set.

  7. Actually, the first script that runs is Rprofile.site, which is found in the etc directory of the R installation, for example C:\Program Files\R\R-{version}\etc on Windows machines, but not on Macs.

  8. Operating systems commonly hide files whose name starts with a period "." from normal directory listings. All files, however, are displayed in RStudio's File pane. Nevertheless, it is useful to know how to view such files by default. On Macs, you can configure the Finder to show you such "hidden files" by default. To do this: (i) Open a terminal window; (ii) Type: defaults write com.apple.Finder AppleShowAllFiles YES (iii) Restart the Finder by accessing Force quit (under the Apple menu), selecting the Finder and clicking Relaunch. (iv) If you ever want to revert this, just do the same thing but set the default to NO instead.

  9. We use a predictive mental contents-model when we type - something like an inbuilt autocorrect-suggestion mechanism; thus if you type something unfamiliar or surprising (e.g. a subtle detail of syntax), you will notice and be able to figure out the issue. Pasting code is a merely mechanical activity.

  10. A GUI is a Graphical User Interface; it has windows and menu items, as opposed to a "command line interface".

  11. lastNum < 6 | lastNum > 10

  12. lastNum >= 10 & lastNum < 20

  13. ((((9/7) - ((((9/7) * 10) %/% 1 )/10)) * 100) %/% 1 )^(1/3) == 2

  14. We call these "variables" because of the function they perform in our code; they actually are R "objects".

  15. and this means [, ] is correct.

  16. That assumes the worst case, in which the attacker knows the pattern with which the password is formed, i.e. the number of characters and the alphabet that we chose from. But note that there is an even worse case: the attacker might have access to our code and to the seed of our random number generator. If you start the random number generator e.g. with a new seed that is generated from Sys.time(), the possible space of seeds can be devastatingly small. But even if a seed is set explicitly with the set.seed() function, the seed is a 32-bit integer (check this with .Machine$integer.max) and thus can take only a bit more than 4 × 10^9 values, more than five orders of magnitude fewer than the 10^15 password complexity we thought we had! It turns out that the code may be a much greater vulnerability than the password itself. Keep that in mind. Keep it secret. Keep it safe.

  17. The terms parameter and argument have similar but distinct meanings. A parameter is an item that appears in the function definition; an argument is the actual value that is passed into the function.

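  18. countDown <- function(n) {
        # Returns a character vector that counts down from n to 0
        # and ends with "Lift Off!".
        start <- n
        countdown <- start
        txt <- as.character(start)

        while (countdown > 0) {
          countdown <- countdown - 1
          txt <- c(txt, countdown)   # number is coerced to character
        }
        txt <- c(txt, "Lift Off!")
        return(txt)
      }

      countDown(7)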

  19. I'm serious: I have reformatted major pieces of code more than once after learning of a better approach, and if that creates better code it is very satisfying.

  20. It is happening more and more frequently that functions in different packages we load have the same name. Then our code's behaviour will depend on the order in which the libraries were loaded. Evil.

  21. For a complementary perspective, see here.

  22. In my opinion, base R uses far too many function names that would be useful for variables. But we're not going to change that. So I often just prefix my variable names with my- or this-, e.g. myDf, thisLength etc.

  23. Here are more names that may seem attractive as variable names but that are in fact functions in the base R package and thus may cause confusion: all(), args(), attr(), beta(), body(), col(), date(), det(), diag(), diff(), dim(), dir(), dump(), eigen(), file(), files(), gamma(), kappa(), length(), list(), load(), log(), max(), mean(), min(), open(), q(), raw(), row(), sample(), seq(), sub(), summary(), table(), type(), url(), vector(), and version(). I'm sure you get the idea - composite names of the type proposed above in CamelCase are usually safe.

  24. .myProfile.R is itself a file in the local working directory. Simply open it with the RStudio editor, fix the error, and save. Then type source(".myProfile.R") into the console to overwrite the old (wrong) definition with the corrected one.
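
The effect of sourcing the file again can be tried out safely on a throwaway file. This is a sketch under assumptions: the variable name myEMail and both addresses are hypothetical (open your own .myProfile.R to see the names it really uses), and the file lives in a temporary directory rather than in the project.

```r
demoProfile <- file.path(tempdir(), "demoProfile.R")

# A stand-in profile with a typo in the address ("exmaple" for "example")
writeLines('myEMail <- "first.last@exmaple.org"', demoProfile)
source(demoProfile)
before <- myEMail

# Fix the file (in practice: edit and save it in the RStudio editor) ...
writeLines('myEMail <- "first.last@example.org"', demoProfile)
# ... then source it again: the assignment in the file replaces the old value
source(demoProfile)
after <- myEMail

c(before = before, after = after)
```

Saving the corrected file is not enough on its own: the variable in the workspace keeps its old value until the file is sourced again.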

