My least favourite thing about R is its desire to keep on running when it should have errored on something about 50 lines earlier, happily spitting out some nonsense result - maybe with a warning, often not.
One of my previous jobs basically turned into being an in-house R consultant for a department in a pharmaceutical company, and I caught so many bugs while investigating some other issue which meant the results people were reporting were completely wrong. A really common one is multiplying 2 vectors of unequal length where broadcasting shouldn't be possible: R just recycles the shorter vector - but hey, it ran without error and there's an output, so many researchers don't notice.
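A minimal illustration of the recycling behaviour being described (toy numbers, nothing from a real analysis):

    x <- c(1, 2, 3, 4, 5, 6)
    y <- c(10, 100)
    x * y             # 10 200 30 400 50 600 - y silently recycled, no warning at all
    c(1, 2, 3) * y    # only a warning (not an error), because 3 isn't a multiple of 2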
Not to mention trying to handle errors is pretty miserable, if you want to catch a specific error you have to match the error string, unfortunately the error message changes depending on the locale the R session is running in.
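A sketch of the string-matching workaround being described; the message text of built-in errors is translated per locale, so a check like this can silently stop matching when the session runs under a different locale:

    res <- tryCatch(
      matrix(1:4, nrow = 2)[3, 1],   # errors with "subscript out of bounds"
      error = function(e) {
        if (grepl("subscript out of bounds", conditionMessage(e))) NA else stop(e)
      }
    )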
I can't recommend the "R for Data Science" (https://r4ds.had.co.nz) book enough, which is written by one of the creators of the tidyverse, Hadley Wickham. This opinion might get challenged here, but if you're going to use R primarily for data science/analysis and not for programming I think it's a better idea to start learning it with the tidyverse than with base R (beyond the basics, of course, which are also covered in the book).
I use R professionally for biostatistics and I can't remember the last time I had to use the base syntax because something couldn't be done with the tidyverse approach.
Would be interesting if you could expand.
I've used R (data.table) extensively over the last few years for biostatistics in a research organization. I was able to get away with not learning the tidyverse and stick to data.table.
The main reason for choosing data.table was speed - I'm working with tens to hundreds of GB of data at once.
What's worked for me is reading Hadley Wickham's "Tidy Data" paper[0] and then applying the concepts with data.table. The speed is nice, but I really love what's possible with data.table syntax and how many packages work with it. That's opposed to what many people have decided "tidy" means, with non-standard evaluation and functions that take whole tables and symbols of column names instead of vectors.
Compared to data.table, the tidyverse offers significantly better readability and ergonomics in exchange for worse computational and memory efficiency, with the magnitude of the performance gap ranging from negligible to catastrophic depending on the operation and your data volume. At that data volume, you're probably doing some things that would OOM or hang for days if you translated your data.table code to the corresponding tidyverse code.
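For anyone weighing the two, here is the same toy summary both ways (mtcars is just a stand-in; the performance gap only shows up on real data volumes):

    library(data.table)
    library(dplyr)

    dt <- as.data.table(mtcars)
    dt[mpg > 20, .(mean_hp = mean(hp), n = .N), by = cyl]   # data.table

    mtcars %>%                                              # tidyverse
      filter(mpg > 20) %>%
      group_by(cyl) %>%
      summarise(mean_hp = mean(hp), n = n())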
Agreed. IMO Tidyverse is a fantastic suite of R packages and worth learning after understanding how to use base R/with minimal dependencies. I personally started with base R and evolved to use tidyverse. Now I use base R when writing R packages and use tidyverse for data analysis/modeling workflows.
I’ll second this, though with some hesitation. If you just want to get stuff done, start with tidyverse. But if and when it’s time to start writing classes and packages, you may have to go back and gather some of the fundamentals.
I'm a base R purist personally, but that's mostly because of how long ago I picked it up - I don't get any improvement in development speed from dplyr verbs, with a few exceptions. But I disagree with this take for beginners, especially non-programmers: with the advent of the tidyverse it is incredible how fast newcomers pick up enough fluency to handle basic data massaging, analysis and visualisation.
I think exceptions where base-R is necessary can be taught as they arise.
There are several comments below that suggest not using tidyverse because "base R" is the foundation for everything.
I think it is important to use tidyverse because of the many quirks, surprises, and inconsistencies in base R. It would be helpful if others share their reasoning, or at least point to their favorite blog explanation, so that beginners can understand the problems they will face.
Unfortunately 5 minutes of Googling failed to produce a reference for me --- the start of some advanced R book that begins by asking "do you need to read this?" and showing examples whose results are predicted incorrectly by most people. Perhaps another user can provide the info.
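I can't say which book that is, but these are the kind of base-R surprises such quizzes tend to open with:

    df <- data.frame(x = 1:3, y = 4:6)
    class(df[, 1])                     # "integer" - single-column `[` drops to a bare vector
    class(df[, 1:2])                   # "data.frame"

    sapply(list(1:2, 3:4), identity)   # a 2x2 matrix
    sapply(list(1:2, 3:5), identity)   # a list - the return type depends on the data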
This depends on what you are using R for. Tidyverse is focused on handling data.frame objects and everything that comes with them. Even ggplot2 uses a data.frame as a default input. And tidyverse has a competitor - data.table, which can be substituted instead (given that you are familiar with base R).
However, some data are better suited to be represented in the form of matrices. Putting matrix-like data in a data.frame is silly, since performance will suffer and you would have to convert it back and forth for many matrix-friendly operations like PCA, tSNE, etc. The creator of data.table shares this opinion [1]. And similar opinions are generally given by people who are familiar with problems that fall outside the data.frame model [2].
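A small example of the kind of matrix-friendly operation meant here (iris is just a placeholder dataset):

    m <- as.matrix(iris[, 1:4])    # numeric measurements kept as a matrix
    pca <- prcomp(m, scale. = TRUE)
    head(pca$x, 3)                 # principal-component scores, still a matrix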
Is this really unique to R, or do all programming languages have some foibles? For example, I spent an hour recently debugging C++ because I forgot that it loves to do integer division despite the result going into an explicitly typed double. No error, no warning. You just have to know, and I highly doubt it's the desired behavior in most cases.
Most researchers are not programmers and don't care about programming. It's a tool to get the job done and I think you'd run into similar problems with other languages.
If you divide two integers, you get an integer. You can then cast it to whatever you want. Or, if you want some other type, you need to cast it before the operation is done.
Okay. But I'm storing it in a variable explicitly declared to be a double. That should be enough. If I divide two integers in python or R or Julia or a dollar store calculator I don't get an integer and I don't even have to explicitly type the variable. You have to know that C++ will do that. It's not common sense just like R recycling shorter vectors.
I agree with your point that all languages have their quirks. This is a very poor example however. If it automatically converted to float what would you do if you wanted integer division? I think automatic casting tends to get messy/be pretty evil in general but of course there are exceptions.
You could always do something like:
    int divRes = intA / intB;          // integer division, made explicit
    double something = divRes * 5.342;
At the very least it could warn me. I just tried it in Rust, and that will error out if you try to divide two ints and store the result in a float, which is fine by me.
Hi, would it be possible to contact you to ask some career questions related to the pharmaceutical industry and data science? I'm a biostatistician who uses R for everything and lately I've been thinking about doing a career change, but I'm a bit lost with all the available options.
My least favorite thing so far was indices starting at 1. It seems blasphemous, in a way.
On a more serious note, I agree that R being too charitable in interpreting things (seemingly without warning) seems to be a problem. You'll have to do some debugging to make sure it actually does what you intended it to do. I've only dabbled in it a bit though.
> My least favorite thing so far was indices starting at 1. It seems blasphemous, in a way.
In the real world we start counting from 1. CS people cannot stop complaining about it but it makes sense in languages used for mathematics and statistics. Zero-indexing is not very relevant if you don’t care about memory layout.
> It's a bit of a joke, like arguing over tabs vs. spaces though.
It is taken very seriously, though. This “issue” comes up very often when some people come and lecture others about how stupid the language they use is.
> May I recommend you this fabulous short essay by Dijkstra
That essay is not fabulous, it is obnoxious. I know you either love or hate Dijkstra, and he enjoyed being a contrarian, but he's unconvincing. The only point that surfaces during arguments on 0-indexing is iterating over 0..N-1 instead of 1..N. That's basically what he wrote himself. This could have been solved with just a bit of syntax if it were really a problem, and it remains largely because C did it that way to simplify pointer arithmetic. It does not change the fact that for the vast majority of people, the first element in a list is, well, first.
The proper way of handling this is to allow for arbitrary indices, because you will always find contexts where a different scheme makes sense (e.g. iterating from -10 to 10 is sometimes natural, and would otherwise require some index gymnastics). Insisting that one narrow view is the correct one is just annoying.
I dunno, it seems you misunderstood me. I clearly said that it is completely arbitrary to choose one over the other, and expressing a preference for either one is just a way of poking fun at people who are anal about choosing a specific one. So there isn't really any disagreement, though I'm always amazed at the lengths people go to to express what they think, when they're really just arguing about the definition of something.
> It is taken very seriously, though.
And those who do take it terribly seriously deserve being poked at ;)
Honestly indices starting from 1 fits really nicely in most situations. 1-based indexing together with ranges and inclusive range-based indexing makes loops and subsetting code really readable IMO
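For instance:

    x <- c("a", "b", "c", "d", "e")
    x[1]             # the first element really is element 1
    x[2:4]           # inclusive range: "b" "c" "d"
    x[2:length(x)]   # "everything after the first", no off-by-one bookkeeping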
R is frequently compared with Python and Julia, which are general purpose programming languages, but it is not really a proper comparison. Once you approach R as a domain specific language / system, its various quirks and peculiarities become more palatable and explainable: they are in a sense the price to pay for tapping a large domain of statistical analysis expertise that is not available elsewhere.
This is mental gymnastics. People have some job to do and are looking for an appropriate tool for it; sometimes that’s R and other times it isn’t. Who cares if you call it a DSL or a general purpose language. If I want to do something and the language makes it difficult, telling myself “oh but it’s a DSL” doesn’t get me any closer to solving my problem.
> If I want to do something and the language makes it difficult, telling myself “oh but it’s a DSL” doesn’t get me any closer to solving my problem.
Unless the thing that makes the language difficult is your expectations. In that case, offering you an alternative mental model that helps you make better decisions when using the language does get you closer to solving your problem.
Yes, sure, as long as you recognize that as a very subjective determination.
From the statistician's non-programmer POV, the syntax of R or some other language is similarly opaque. Learning one vs. another will present similar investments in time. From their perspective, R does not make things more difficult, and the fact that it's more of the lingua franca within the field has its own benefits.
The people I see complain about R are usually people who learned a different general purpose language first and find that, when work requires data analysis, they much prefer that language for working through the non-analytical portions of their work. (Especially with Python, where pandas and numpy have made the less specialized tasks much easier.)
From a statistician's POV the R syntax is great. Here is the t-test:
    t.test(x, y = NULL,
           alternative = c("two.sided", "less", "greater"),
           mu = 0, paired = FALSE, var.equal = FALSE,
           conf.level = 0.95, ...)
A statistician opens the vignette and already knows what all of these variables represent mathematically, and can begin producing analysis immediately.
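For instance, on R's built-in sleep data (the options below are spelled out only to show how directly they map to the statistical choices):

    t.test(extra ~ group, data = sleep,
           alternative = "two.sided", var.equal = FALSE, conf.level = 0.95)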
Yes, precisely. Very much not the pythonic way but that only matters if your prior background before R was python. If your background was SPSS then many of these would be drop downs or check boxes, and (IMO) it's superior to the SPSS scripting language as well.
Heck, my background before using R was python and SPSS and I still prefer R for precisely the example you gave: fine-grained control built in as above, specifying how to handle missing values etc.
It's important to keep this in mind though, because R (or rather S) is primarily meant to be used interactively. A prof of mine used to just launch the R REPL and go on from there. He called an editor from the REPL, wrote source files from the REPL, etc. Once you see someone working with R like that, you start seeing R for what it is.
As beautiful as it is to use interactively, it really takes a lot of practice to write reliable code that doesn't abort with some error now and then.
I think the point about interactivity is pretty well understood. Another comment in the thread pointed out how the majority of people who write R do it in RStudio and RStudio's defaults push an interactive workflow on the users (the nature of the work you do has a similar effect). So even for someone very new to the language it's pretty obvious.
Saying that R is a domain-specific language for statisticians, and thus its quirks are ignorable, is an incomplete answer. An R program is never just a series of calls to specialized library functions. Programs still need to ingest and emit data, manipulate data ad hoc, take conditional branches based on some runtime condition, and so on. And that glue code must still be written in R. I've had to write a lot of that glue code in R.
As someone who mostly writes not-R, my own R irritation comes from a handful of things:
- The dot character "." has no semantic meaning in identifiers. It's just a valid character for names. Looking at function names like "is.numeric" really messes with my reading comprehension.
- Ambiguously, "." is also what separates a generic function from the class it dispatches on in one of R's type systems (S3): in some cases, calling `foo(bar)` actually runs `foo.someclass(bar)` under the hood. But only in some cases (a few of these "." uses are sketched after this list).
- Even better, a popular R library defines a function `.()` (i.e., its name is just a single period character), whose job is to expose a surprising quote/unquote expression evaluation semantics.
- This is not to mention the special meaning of "." in formula literals, which are fairly ubiquitous in R.
- Different authors use different naming conventions. Base prefers "as.numeric," Tidyverse might have "to_factor," another library might prefer camel case.
- Finally, R has a surprisingly extensive syntax, exercised by different libraries to different extents, and a correspondingly rich semantics, with "types," "modes," multiple class systems, "expression" objects, immediate and lazy evaluation, expression quoting and unquoting, metaprogramming, and homoiconicity. It is a zoo of a language.
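A few of these "." uses side by side (the `.()` shown here is data.table's, which may or may not be the library the parent means):

    is.numeric(1)              # "." is just a character in this function's name
    print(data.frame(x = 1))   # ...but here print() dispatches to print.data.frame()

    library(data.table)
    dt <- data.table(g = c("a", "a", "b"), v = 1:3)
    dt[, .(total = sum(v)), by = g]    # data.table's `.()` is an alias for list()

    lm(mpg ~ ., data = mtcars)         # and in a formula, "." means "all other columns"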
Once you include the statistical packages, ggplot2, and dplyr, there is nothing that beats R in ease of prototyping for data exploration, model fits and sanity checks, and data visualisation of high dimensional data.
I don't know if you've heard about it, because it is a relatively recent development, but the tidymodels ecosystem of packages (https://www.tidymodels.org) is also bridging the gap from data exploration/visualization to advanced modeling and machine learning in a way that feels really natural if you're used to the tidyverse way of doing things. It's developed by RStudio as the improved version of caret. I've been using it for differential gene expression analysis and it's a game changer in how much time it saves me.
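If it helps anyone, a minimal sketch of what a tidymodels pipeline looks like (mtcars and a plain linear model are just stand-ins):

    library(tidymodels)

    split <- initial_split(mtcars, prop = 0.8)
    rec <- recipe(mpg ~ ., data = training(split)) %>%
      step_normalize(all_predictors())
    wf <- workflow() %>%
      add_recipe(rec) %>%
      add_model(linear_reg() %>% set_engine("lm"))
    fitted <- fit(wf, data = training(split))
    predict(fitted, new_data = testing(split))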
What about python and its countless packages? (Honest question, I 'grew up' using python in an academic setting, but haven't caught up with the latest developments)
Python isn't really an advancement. But it's a more obvious choice for people with a background in software engineering. I have some hopes for Julia though.
As someone who used Ruby (yes, real Ruby, not Rails) before Python or R, I definitely think R is better for data science and Ruby better for everything else. Sadly, I predict a future where Python rules over everything.
I've been using https://exploratory.io/ a lot, which is R in a really nice wrapper where you can do everything point-and-click, by writing code by hand, or a mix.
I love R. Once you get it, there is something beautiful about its functional approach. I like using either tidyverse or data.table with pipes, split, map, reduce. The code looks like layers of a filter that data flows through.
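For example, a pipeline in that style (mtcars just as a stand-in):

    library(dplyr)
    library(purrr)

    slopes <- mtcars %>%
      split(.$cyl) %>%                                  # one data frame per group
      map(function(d) lm(mpg ~ wt, data = d)) %>%       # one model per group
      map_dbl(function(m) coef(m)[["wt"]])              # one number per model
    reduce(slopes, `+`) / length(slopes)                # fold them back down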
On the other hand, if it were lispiness that was the issue, surely xlispstat would be the winner. I love xlispstat. I used it in grad school in the 1990s and even maintain the github repository https://github.com/jhbadger/xlispstat . But the fact is xlispstat never appealed to the general statistical community and R did.
I thought xlispstat was a big deal in statistics at its peak? I suppose both R and xlispstat are (to varying degrees) lisp-based, so another way of looking at it is that statisticians like lisp?
I don't think it got as much popularity in its day as R does now, but it was popular to a degree. But that was also because at the time it was pretty much the only free statistics programming environment -- at the time the choice was either xlispstat or pay for a licence for S-PLUS, SAS, or the like.
Seconded. I was taught ggplot by a great stats professor and the framing of visualizations as a language (gg actually stands for the grammar of graphics!) describing the relation between data and visual elements (layers in the graph) really made something click.
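For example, each `+` below adds another layer mapping data to visual elements:

    library(ggplot2)

    ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
      geom_point() +                            # a layer of points
      geom_smooth(method = "lm", se = FALSE) +  # a layer of fitted lines per colour
      labs(colour = "cylinders")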
The amount of consideration and careful design behind tidyverse APIs (tidyr, ggplot, dplyr) really astounds me. I've never felt the need to actually memorize any of them but they come to me so naturally whenever I type "library(tidyverse)". Very few DSLs, libraries or APIs have ever made me feel this way, and certainly NOT Python and the mess that pandas/matplotlib/scikit is. Even more impressive that he managed to build such a consistent layer atop the hack that is base R.
Note that I've nothing against base R. It really appeals to the hacker in me and it certainly has a ton of cool features (a condition system, multiple function evaluation forms - in what other language are `if`, `while`, `repeat` and even parentheses `(` and the BLOCK STATEMENT `{` all implemented as functions?) but damn if it isn't a mess of corner cases and gotchas.
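For anyone who hasn't seen it, those really are ordinary functions you can call (or even redefine):

    `if`(TRUE, "yes", "no")   # "yes"
    `{`(1, 2, 3)              # 3 - a braced block returns its last expression
    `(`(42)                   # 42
    `while`(FALSE, NULL)      # NULL, invisibly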
I don't have a source for this, but I think R as a language has one of the highest concentrations of users in a single IDE - and for good reason - something like 80% use the free (and amazing) RStudio IDE.
I use R with Vim. Usually the R script file is open on top and there is a :terminal buffer with R running below. And I use a small vim-plugin [1] for sending commands from the editor to the REPL.
This has a few advantages, the major one being that you can run any language with a dynamic REPL this way, without changing your setup. Or you can even have two files, written in two different languages, open side by side with a corresponding REPL running beneath each of them. The downside of course is that you miss out on auto-completion and other integrations like that. These are not impossible, but you would have to torture your Vim setup quite a bit in order to implement them.
However, you do indeed get autocompletion and many IDE amenities with the language server protocol. Naturally it’s not at the same level as RStudio. But one tool to play with any language is a very nice thing.
Guessing here: probably Jupyter notebooks, Emacs and vscode, and perhaps the (very minimal) R IDE (if we can call it that) that comes with the installation of base R.
I use R directly from the terminal quite a bit for any small jobs, like calculations, purely due to the <1000ms boot time.
In my domain q/kdb is used extensively. I don't have a decade to master obscure syntax/grammar just for one simple purpose of extracting some data set from a larger population and maybe do some basic statistics on it.
If you're like me, R is a godsend. You'll also love the tonnes of free packages. You can't go wrong with R if you appreciate simplicity and intuitiveness.
In practice the difference is almost non-existent, unless you start doing assignments within function calls, which is a popular style among some R stars, like Martin Machler [1]. But on the other hand some of them have resolved to just always use "=" everywhere, including one of R's creators - Ross Ihaka [2].
Anyhow, explaining the difference at that part of the tutorial is not easy, so I chose to omit it for now. But might introduce it later, along with "<<-" and "->>", probably after describing closures.
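A quick illustration of where the "<-" vs "=" difference actually bites, i.e. assignment inside a call:

    x = 5                      # same effect as x <- 5 at top level
    median(y <- rnorm(10))     # assigns y as a side effect, then takes its median
    # median(y = rnorm(10))    # would instead try to pass an argument named `y`,
                               # and fail because median()'s first argument is `x`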
Maybe someone can help me with this: how do you integrate R as a CLI tool? I'm in a mostly-R shop, but its integration with other tools is so confusing and/or bad that we usually just rewrite everything in Python for integration (which is obviously a huge waste of time). R packages etc. have me, as an outsider, confused, though they seem like the obvious choice?
I love R, nothing better for data analysis, stats and plotting. However, if I was making software for other people to use, repeatedly, I would probably pick another language. The R language does have breaking changes, especially in commonly used packages.
In this case you should probably use the "here" or the "rprojroot" packages (libraries in conventional R parlance). They both simplify the usage of relative paths inside a project/repository.
If you have a project root with folders like code, data, etc., and are running a project in /path/root/code, you can then just call data_dir <- here::here("data") for the data folder, as the here package uses several heuristics to find the root of a project (e.g., looking for a .git folder).
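A minimal sketch (the data file name is made up):

    library(here)                  # assumes the package is installed
    here()                         # the detected project root
    data_dir <- here("data")
    # read.csv(here("data", "measurements.csv"))   # hypothetical file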
Personally I use my programming language of choice to generate a ".r" script and then use the os exec system call of said language to call Rscript scriptname.r... If I'm understanding your question correctly.
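For the CLI question specifically, a small self-contained script run as `Rscript summarise.R input.csv` (both names are just examples) might look like:

    #!/usr/bin/env Rscript
    # summarise.R (example name): read a CSV given on the command line, print a summary
    args <- commandArgs(trailingOnly = TRUE)   # everything after the script name
    dat <- read.csv(args[[1]])
    print(summary(dat))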
Great overview. Slight shame it leaves out 'lapply' etc though (and says as much at the top). I just remember realising that you can have lists and run functions on them when I was learning R, and it seemed like a superpower.
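For anyone who hasn't had that realisation yet, it's essentially this:

    # one linear model per group, then one statistic per model - all via lists
    models <- lapply(split(mtcars, mtcars$cyl), function(d) lm(mpg ~ wt, data = d))
    sapply(models, function(m) summary(m)$r.squared)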
This was probably written by a programmer, and for that reason (and reading the 'why R is bad' comments) it shows how misunderstood R is by most programmers. It's like giving someone an introduction to the English language by showing them the alphabet and listing the punctuation. Yes, technically all true, but none of it will stick.
Yes, there is a lot of R-bashing by people used to imperative languages designed for efficiency in repetitive tasks, not a functional language designed for numerical analysis. The complaints fall into these categories:
1. It's not zero-indexed (even though most numerical languages aren't)
2. Loops are slow (though if you're looping in R you're probably doing it wrong - see the sketch below)
3. It's inconsistent
4. The syntax is weird.
But people don't talk about the somewhat beautiful functional ability of the language to wrangle data almost magically. Its basis in lisp allows for the tidyverse and data.table to exist[1], and ggplot is a formidable analysis/plotting platform that Python doesn't come close to.
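On point 2 above, the usual illustration is that the idiomatic version is both shorter and far faster:

    x <- rnorm(1e6)
    squares <- numeric(length(x))
    for (i in seq_along(x)) squares[i] <- x[i]^2   # the loop version
    squares2 <- x^2                                # the vectorised version
    identical(squares, squares2)                   # same result, very different run time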
I attended an intro to R workshop and found it very confusing. Being "functional" had nothing to do with it. Inconsistent, yes very much so in my opinion. It felt like a lot of little separately developed tools thrown together into a bundle. But I think mostly my difficulty with R is that I'm not a researcher or statistician. My exposure to and experience with those domains was an undergrad class or two many decades ago. If you don't deeply understand the problem space for which R is intended, you will be lost and confused trying to learn it.
It's a very different language to imperative languages out there, so it's not surprising that an introductory course would be confusing. There are several ways to do things in R (for example subsetting data, or pulling out elements of structured data), but that doesn't mean it's inconsistent - they are convenience functions. As you say, you have to do some statistics 'in anger' to really get why R is so good. When I've taught introductory sessions on R I focus more on a very short analysis to demonstrate what it is good at.
It works and is IMO quite okay, because NA is not the same as NaN (not a number). NA _does_ actually stand for a number, it's just that we don't know which one.
Which is an interesting detail in R that should be mentioned anyway: the difference between NA and NaN. Anyone used to languages which have just NaN may confuse NA for that non-value.
Except that 1^NaN is also 1... now that, IMO, is wrong. But you can try the same in your browser's JS console and you will get 1 as a result too, so R is not the only one.
There are several NA values in R - NA_integer_, NA_real_, NA_complex_ and NA_character_, and the results will be different if you use some of them. NA_character_ and NA_complex_ will produce errors (different ones).
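For reference, a few of the behaviours being discussed in this subthread:

    is.na(NaN)    # TRUE  - NaN counts as missing
    is.nan(NA)    # FALSE - but NA is not NaN
    NA^0          # 1 - "NA stands for *some* number", and anything^0 is 1
    1^NA          # 1 - 1 to any power is 1
    1^NaN         # 1 - the case objected to above
    NA > 1        # NA - most other operations stay unknown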
Interesting. I must admit I've never used substitute.
I tried dims but: Error in dims(iris) : could not find function "dims"
I do find the occasional oddity. I've noticed more very useful messages/warnings (particularly in common tidyverse functions) recently, so I think they help.
To be fair, these quirks are generally very uncommon in day to day use.