solving for better small-data systems

R, I Love You

It is easier to critique than it is to create. I write this post with much gratitude for R, the R community and particularly R-Core who are paid $0 to bring us R. I’d like to offer an idea and I’m wondering if people are interested in rallying around it.

Julia, I’m in a committed relationship

You might have caught the post titled “Julia, I Love You”. It’s the top article on R-bloggers. Perhaps you had the same reaction I did. I read the material, repeated “wow” a few times, and slipped into a contemplative space. Am I betting on an outdated technology? I slapped myself (figuratively) and snapped back to reality. I vaguely remember that when Revolution Analytics released side-by-side performance figures, there was pushback about an apples-to-oranges comparison. Some tests had to be reworked (or am I just making that up?). People have added comments to the Julia post with performance fixes to the R code used to benchmark against Julia. And in the end, languages come and go, but R has withstood the test of time.

I use R | Julia because. . .

Why do people use R? In my (informal, anecdotal, not rigorous, no medals of honor conferred) survey, the reasons people use R are:

  1. It’s free
  2. There are lots of packages on CRAN
  3. It's easy to code

Food for thought: Julia has #1 and #3 covered, and #2 is just a matter of time if the adoption curve is upward sloping. All things being equal…

Performance IS an issue

Something is bugging me: how defeated I feel when I see R benchmarked against anything. I see the figures and I think, “it is what it is.” I’m sure the R-defenders will point to all sorts of stats and tell me that I’m wrong. I’ll still shrug. I use R on a daily basis and it consistently feels like an order of magnitude slower than my internal benchmark (lots of hand-waving there). Yeah, I know I can use C++. It's not my cup of tea. If it were, I would just use C++ and I wouldn’t try to shim it into R. I’m using R because I want to write in R. I do byte-compile all packages, which helps a little.
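For anyone who hasn't tried byte-compiling, here is a minimal sketch using `cmpfun()` from the `compiler` package that ships with R. The function `sum_of_squares` and the input sizes are made up for illustration. Timings will vary by machine and R version, and depending on the version, R may already byte-compile functions automatically, in which case the gap can be small or nonexistent.

```r
library(compiler)  # ships with R

# Hypothetical example function: a deliberately loop-heavy sum of squares.
sum_of_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i^2  # `^` returns a double, so no integer overflow
  }
  total
}

# cmpfun() returns a byte-compiled copy of the function.
sum_of_squares_bc <- cmpfun(sum_of_squares)

# Both versions must return the same answer; only the speed may differ.
stopifnot(identical(sum_of_squares(1000), sum_of_squares_bc(1000)))

# Compare timings on your own machine.
print(system.time(sum_of_squares(1e6)))
print(system.time(sum_of_squares_bc(1e6)))
```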

Also, R is lacking on multicore. On Windows it's not possible to run tasks in parallel, in-process (if you have gotten this to work, please let me know). Out-of-process communication is slow and there’s the memory tax of maintaining multiple instances of R. Most of the time all cores but one are idle. Tech moves fast. If we don’t challenge ourselves then we decay.
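To make the out-of-process cost concrete, here is a rough sketch using the `parallel` package that ships with R. `makeCluster()` starts separate R worker processes, which is the route available on Windows. The worker count of 2 and the toy function `slow_square` are assumptions for illustration only.

```r
library(parallel)  # ships with R

# Hypothetical toy task standing in for real work.
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# Each worker is a separate R process with its own memory footprint.
cl <- makeCluster(2)

# Inputs and results are serialized between the master and the workers.
results <- parLapply(cl, 1:4, slow_square)

# Always shut the workers down, or the extra R processes linger.
stopCluster(cl)

print(unlist(results))
```

Every input and result crosses a process boundary here, and each worker carries its own copy of R. That is the communication overhead and memory tax I'm complaining about.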

R, I Love You, so here’s my idea

Stop writing new features. R has enough features. Let's make 2.16.0 a performance fix release. This is not an uncommon practice. Instead of treating performance as something we tackle incrementally, let's steer our open source mindshare towards the single goal of performance for a release or two. Let's excite developers who get their jollies from optimization.

Heck, let's pay for it if we have to. There’s an R Foundation, right? Kickstarter has been enormously successful in raising funds for projects that people care about. If a team on Kickstarter can raise $1M for a video game that has a limited shelf life, surely we can raise capital for a tool that some of us practically live in. People will donate because they stand to benefit.

We can organize around 3 broad categories:

  1. High performance, multi-core math libraries
  2. JIT
  3. Lightweight, in-process task parallelization on all major platforms

So that’s my idea.


I welcome your reaction and/or a comment (anonymous or not), below!
