If you know English, you will be able to code in R…

Numbers around us
4 min read · Oct 27, 2022

Every programming language has its own advantages and disadvantages. Each has its most appropriate paradigm (sometimes the only appropriate one…) and its own syntax, but from what I’ve observed so far, no programming language is ever quite enough for programmers. That’s why new languages are created, and why frameworks and dialects appear inside existing languages. And finally, that’s why new versions are still under development.

SQL is nice, right? So why does limiting rows look different in MS SQL and MySQL?
Python is used in versions 2.x and 3.x, which do many things differently.
Why do we need Angular or React if we have CSS, JS and HTML?
* Yes, I know that not every language here is a programming language, but query and markup languages follow these rules as well.

And what is the answer to the questions above?

Evolution.

Some people simply used a certain language and got ideas like:
- Why not do it this way?
- OK, I understand it, but I need someone else to understand it as well.
- Hey, it is all understandable, but these words look unfriendly.
- I need this language to express my thoughts better.

But what was there at the very beginning? As usual in computer science… zeroes and ones. This is so-called machine language, which tells the computer which sequence of bits means which action. But we humans understand nothing of it at all (except for a few individuals).

Then comes the second generation of programming languages: assembly languages. These are no longer only zeroes and ones; short mnemonics and hexadecimal numbers appear, for example. It is still not readable for humans, except experts, and one instruction written in this language is still, as a rule, one operation on the processor.

Later comes the third generation (3GL), when ordinary people could finally guess what is going on. Abstraction goes up, but performance goes down.

But why? Development should mean better performance, shouldn’t it? Think of biological evolution: bacteria feed themselves much faster and much more efficiently than mammals, because their processes are simpler under the hood, not because they look simpler. An elephant, for example, has to spend a huge amount of energy to gain some. The process looks simpler, but it is cosmically more complicated inside.

3GL languages are like the higher forms of animal evolution. They look much easier to read and even to write, but there is some “magic” involved. This magic is translation to a lower-level language and, in the end, to machine language. And this translation is the reason why performance is the price of a nicer language.

The third generation includes all the C’s (C, Objective-C, C++, C#), Python, Scala, Ruby, Java, Fortran, BASIC and many more. There are differences between them: some are more difficult, some easier, and they use different paradigms, but usually they are general-purpose languages.

And here comes the knight on a white horse… the fourth generation of languages. I omitted the word “programming” because not all of them are strictly programming languages. This generation usually holds highly specialized, domain-specific or purpose-specific languages such as SQL, Matlab or our long-awaited friend… R.

They are usually very readable and understandable for an ordinary person. And as I said a few paragraphs above, they have to be translated to lower levels, which costs some performance. From my experience, the speed of writing usually makes up for the lost speed of execution.

What was this long story for? It is another thing about computer science that I learned later than I should have. This story can show you whether your journey with data science is starting in the wrong place. It can let you know that your level of abstraction is closer to another language, without kicking you out of the programming world.

In the post title I said that if you speak English (or even just understand it), you will be able to code in R. Probably the same could be said about other high-level languages, but I’ll focus on R. Why?
- because it is almost a pure language (with its own grammar, even grammars ;D)
- because it is domain specific for the field I have worked in and still work in: data analysis.

As I wrote above, R has grammars, but what does that mean? That, as in other languages, there are dialects which can change many things, from readability to performance.

Let me tell you about a few basic ones. There is base R, where you write the way the creators of the language intended; then there is “tidy R”, with Hadley Wickham’s philosophy of tidy (tabular) data; and finally “data.table”, which comes with better performance but looks a little less readable at first sight. I personally prefer the tidy approach.
Oh, and there is also the grammar of graphics in the ggplot2 library, based on Leland Wilkinson’s idea of a grammar of graphics, and a few smaller ones.
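
To make the three dialects a bit more concrete, here is a minimal sketch that asks the same question (mean weight of 12-year-olds, by gender) in each of them. The school_kids table and its values are made up purely for illustration, and the code assumes the dplyr and data.table packages are installed. Treat it as one possible way to write each version, not the canonical one.

```r
# Hypothetical sample data, invented only for this illustration.
school_kids <- data.frame(
  age    = c(12, 12, 12, 13, 12),
  gender = c("F", "M", "F", "M", "M"),
  weight = c(40, 45, 44, 50, 47)
)

# Base R: pick the rows with subset(), then aggregate with a formula.
base_result <- aggregate(weight ~ gender,
                         data = subset(school_kids, age == 12),
                         FUN = mean)

# Tidy R (dplyr): one verb per step, chained with the pipe.
library(dplyr)
tidy_result <- school_kids %>%
  filter(age == 12) %>%
  group_by(gender) %>%
  summarise(mean_weight = mean(weight))

# data.table: row filter, computed column and grouping
# all live inside one pair of square brackets.
library(data.table)
dt_result <- as.data.table(school_kids)[age == 12,
                                        .(mean_weight = mean(weight)),
                                        by = gender]
```

All three results hold the same two numbers, one per gender. What differs is how much of the code you can read aloud as a sentence.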

And finally, the proof for the claim from the title. Imagine that you have a database/table/data source about pupils in the schools in your county, containing age, class, weight, height and gender. Here is your sample code (%>% should be read as “and then”).

library(dplyr)  # provides %>%, filter(), group_by() and summarise()

school_kids %>%
  filter(age == 12) %>%
  group_by(gender) %>%
  summarise(mean_weight = mean(weight), mean_height = mean(height))
And in English:
TAKE school_kids TABLE AND THEN
FILTER KIDS WHO ARE 12 YEARS OLD AND THEN
GROUP THEM BY gender AND THEN
GIVE ME THEIR mean weight AND mean height.
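
If you want to try this yourself, here is a small self-contained sketch. The school_kids table below is invented for illustration (the real one would come from your own data source), and the code assumes the dplyr package is installed. For comparison, it also writes the same steps without the pipe, as nested function calls, which you have to read from the inside out.

```r
library(dplyr)

# Hypothetical sample data, invented only for this illustration.
school_kids <- data.frame(
  age    = c(12, 12, 12, 13, 12),
  gender = c("F", "M", "F", "M", "M"),
  weight = c(40, 45, 44, 50, 47),
  height = c(150, 155, 152, 160, 157)
)

# Piped version: reads top to bottom, like the English sentence.
piped <- school_kids %>%
  filter(age == 12) %>%
  group_by(gender) %>%
  summarise(mean_weight = mean(weight), mean_height = mean(height))

# Nested version: the same calls, but the first step (filter)
# is buried in the middle and the last step (summarise) comes first.
nested <- summarise(
  group_by(filter(school_kids, age == 12), gender),
  mean_weight = mean(weight),
  mean_height = mean(height)
)

print(piped)
# gender F: mean_weight 42, mean_height 151
# gender M: mean_weight 46, mean_height 156
```

Both versions produce the same table. Only the piped one follows the order in which you would say it.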

This so-called piping (or chaining) can be much longer and more sophisticated, but this way of writing mirrors the human order of thinking, which in a domain like data analysis or data science can be a very big help.
Just learn English, if it is not your native language.

My next post will be the next step into the world of R, and specifically the “tidyverse”.

Numbers around us

Self developed analyst. BI Developer, R programmer. Delivers what you need, not what you asked for.