Showing content from https://vincebuffalo.com/blog/2012/03/07/thoughts-on-julia-and-r.html below:

Thoughts on Julia and R

Hello, Julia

Julia is an exciting new technical computing language. It's still in its infancy, but it's fast (see below), and already does a lot.

Comparison of Julia to other languages

There's been some excitement on Twitter about Julia. Excitement combined with open source often yields development, which then leads to further excitement, until a mature open source project arises. One of Julia's explicit goals is to challenge other statistical computing environments, including R.

What's wrong with R?

R is, without a doubt, changing the world. It's being used by industry giants like Facebook and Google, while also providing academic researchers in statistics, biology, psychology, and countless other fields with not only a free and open source statistical environment, but also a huge number of user-contributed packages through CRAN. Methods papers in many fields are now often accompanied by CRAN or Bioconductor packages. It's also a brilliant platform for reproducible, open research, as Bioconductor beautifully illustrates with packaged and version-controlled genomes, microarray probesets, etc.

However, R is suffering from growing pains. For example, there are now 64-bit versions of R, yet vector indexing is still limited by R_len_t (see the definition in src/include/Rinternals.h):

/* type for length of vectors etc */
typedef int R_len_t; /* will be long later, LONG64 or ssize_t on Win64 */
#define R_LEN_T_MAX INT_MAX

It might seem that one could simply change this to a long and recompile to increase the longest possible addressable vector, but that is not enough. Take a look at R_euclidean in library/stats/src/distance.c for an example of why: almost all variables for iterating over elements in vectors are declared as plain int and don't use this type. One would have to read through every function and every line of code to fix this.
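To make the limit concrete, here is a minimal Python sketch (not R's own code) of what a 32-bit signed index can hold. It assumes int is 32 bits, as it is on common platforms, and as_c_int is a hypothetical helper name. The wraparound it shows is what typically happens in practice, although signed overflow is formally undefined in C.

```python
import ctypes

# Assumption: C int is 32 bits, so INT_MAX (and hence R_LEN_T_MAX) is 2**31 - 1.
R_LEN_T_MAX = 2**31 - 1


def as_c_int(n):
    """Hypothetical helper: the value a 32-bit signed int would hold for n.

    ctypes does no overflow checking, so out-of-range values wrap around.
    """
    return ctypes.c_int32(n).value


print(as_c_int(R_LEN_T_MAX))      # 2147483647: the last valid length
print(as_c_int(R_LEN_T_MAX + 1))  # -2147483648: one past the limit wraps negative
```

This is why widening the typedef alone would not help: any loop counter still declared as int would stop being a valid index at the same point.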

R_len_t is just one example. Another issue is that R has been slow to adopt new compiler technologies (e.g. JIT, optional type indications, etc.). R almost always gains speed from pushing stuff to C (the recent bytecode compiler is an exception). This isn't a problem in itself, but it's a huge limitation to require developers to know not only R, but also C, and also how to interface the two. More modern languages (Java, as well as Python and Julia come to mind) spend more time tracking compiler technology developments and implementing them than R core does (again, Luke Tierney and the bytecode compiler are exceptions). It's still sometimes efficient to use C with these languages (consider Cython), but developers in these languages aren't cracking open Kernighan and Ritchie every time they need to have a for loop do something quickly.

Another gripe I have is that R language development is somewhat closed. Despite a quickly expanding user base, the number of R core contributors is not increasing. I find it hard to believe this is due to lack of interest. It seems much more likely this is due to institutional reasons that need to be changed. The nice thing about language development is that it's really hard, so opening up R to more contributors won't likely flood the existing core with bad ideas and patches. Personally, I would dedicate much more time to profiling, reading the source, and working on the R language if it were more open.

The last gripe I have is that R is fragmented. Consider Python:

import re
re.search(r'R-(\d+)\.(\d+)', "R-2.15").groups()

Now, consider R:

gsub("R-([0-9]+)\\.([0-9]+)", "\\1", "R-2.15")

# or

library(stringr)
str_match("R-2.15", "R-([0-9]+)\\.([0-9]+)")

Now, Python also has PyPI's re2, but most developers are using re. The motivation behind stringr is that R's current family of string processing functions is horribly inconsistent:

# (the ... is mine, to avoid writing out all parameters)
grep(pattern, x, ...)
regexpr(pattern, text, ...)
gsub(pattern, replacement, x, ...)
strsplit(x, split, ...)

But rather than deprecate these and move forward, we now have two sets of string processing functions. Both are being used. I'm not saying Hadley Wickham is to blame here; quite the contrary, he's trying to fix a very annoying problem in the language and should be commended. I think the community needs to be more open; for example, before writing a package that processes strings, let's discuss an implementation plan, a schedule for deprecating old functions, etc. If not, in the future R will be highly fragmented, and end up with five different object orientation systems... oh, wait.
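For contrast, here is a small sketch of what a consistent argument order looks like in Python's standard re module: searching, replacing, and splitting all take the pattern first and the target string last. The variable names are mine and purely illustrative.

```python
import re

version = "R-2.15"
pattern = r"R-(\d+)\.(\d+)"

# Search: pattern first, then the string.
major, minor = re.search(pattern, version).groups()

# Replace: pattern first, then the replacement, then the string.
swapped = re.sub(pattern, r"\2.\1", version)

# Split: pattern first, then the string.
parts = re.split(r"[-.]", version)

print(major, minor)  # 2 15
print(swapped)       # 15.2
print(parts)         # ['R', '2', '15']
```

Compare this with strsplit(x, split, ...) above, where the string comes first and the pattern second.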

What would it take to "challenge" R?

Contributors to Julia are optimistic they can challenge R based on a solid foundation of JIT compiling, parallelism, and nice language semantics. I salute this optimism, but I think we need to realistically consider what it would take to "challenge" R.

First, we would need to build an equivalent statistical computing environment. Consider moving all of stats, MASS, graphics, grid, etc. to Julia. Will Julia still be sufficiently faster than R by the time these base packages have been ported? Remember, R is a moving target; despite my few earlier gripes, R will evolve and get faster. Now, consider adding the extremely popular CRAN packages like ggplot2 and lattice to Julia. By the time these have been ported, will Julia still be sufficiently faster than R?

Suppose it is still faster than R. What about after we port the rest of CRAN, and all of Bioconductor, to Julia? My point isn't that it's unimaginable that Julia will surpass R. It's that developers should really dissect what makes a successful language successful before they try to challenge it. I don't have a horse in this race; I would love to see Julia surpass R. But if all developers want is a fast environment to analyze large data sets using a wealth of methods and libraries, it may be a lot easier to make R faster than to develop a new fast language and hope/wait/beg the community to move over.

Update: Julia core developer Jeff Bezanson sent me a very kind email on March 9th, 2012 about this post. In it, he said the "challenge R" statement was made by a community member and is in no way the mission of the Julia language. He had many kind words to say about the R language and its statistical functionality.
