INTRODUCTION TO BASIC R
To start with, R is an interpreted language, not a compiled one. This means that all commands typed on the console are executed directly, without the need to build a complete program as in most programming languages.
Second, R’s syntax is very simple and intuitive. For example, a linear regression can be done with the command lm(y ~ x), which means “fitting a linear model with y as response and x as predictor”. In R, a function must always be written with parentheses in order to be executed, even if there is nothing inside them (e.g., ls()). If one just types the name of a function without parentheses, R will display the content of the function.
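The following sketch illustrates both points using simulated data, so the values of x and y are made up for illustration only. It fits a linear model with lm() and then calls ls() with and without parentheses.

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(1)
x <- 1:10
y <- 2 * x + rnorm(10)

# Fit a linear model with y as response and x as predictor
fit <- lm(y ~ x)
coef(fit)   # estimated intercept and slope

# With parentheses, the function is executed: it lists the objects in memory
ls()

# Without parentheses, R prints the content (code) of the function instead
ls
```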
When R is running, variables, data, functions, results, etc., are stored in the active memory of the computer in the form of objects which have a name. The user can act on these objects with operators (arithmetic, logical, comparison, ...) and functions (which are themselves objects). The use of operators is relatively intuitive.
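A short illustration of operators acting on named objects stored in memory. The object names x and y are arbitrary.

```r
x <- 5                 # assignment creates an object named x
y <- x * 2 + 1         # arithmetic operators: y is 11
x > 3                  # comparison operator: TRUE
(x > 3) & (y < 10)     # logical AND: TRUE & FALSE gives FALSE
is.function(sum)       # functions are objects too: TRUE
```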
An R function may require no argument: either all arguments are defined by default (and their values can be modified with the options), or no argument has been defined in the function. All the actions of R are done on objects stored in the active memory of the computer: no temporary files are used.
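A sketch of the two cases described above. The functions greet() and no_args() are hypothetical and defined only for this example: the first has a single argument with a default value, the second is defined with no arguments at all.

```r
# Hypothetical function whose only argument has a default value
greet <- function(name = "world") {
  paste("Hello,", name)
}
greet()            # uses the default: "Hello, world"
greet("R user")    # overrides the default: "Hello, R user"

# Hypothetical function defined with no arguments
no_args <- function() "No arguments needed"
no_args()          # still requires the parentheses to run
```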
Data with R
R works with objects which are, of course, characterized by their names and their content, but also by attributes which specify the kind of data represented by an object. In order to understand the usefulness of these attributes, consider a variable that takes the value 1, 2, or 3: such a variable could be an integer variable (for instance, the number of eggs in a nest), or the coding of a categorical variable (for instance, sex in some populations of crustaceans: male, female, or hermaphrodite). It is clear that the statistical analysis of this variable will not be the same in both cases: with R, the attributes of the object give the necessary information. More technically, and more generally, the action of a function on an object depends on the attributes of the latter. All objects have two intrinsic attributes: mode and length. The mode is the basic type of the elements of the object; there are four main modes: numeric, character, complex, and logical (FALSE or TRUE). Other modes exist but they do not represent data, for instance function or expression. The length is the number of elements of the object. To display the mode and the length of an object, one can use the functions mode and length, respectively.
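The following sketch checks the mode and length of a few objects, then revisits the 1, 2, 3 example with hypothetical egg counts and a factor for sex. Note that a factor is stored internally as integer codes (its mode is "numeric"). It is the class and levels attributes that tell R to treat it as categorical, so summary() gives a different result for each object.

```r
# The four main modes of data
mode(1)                   # "numeric"
mode("a")                 # "character"
mode(2+3i)                # "complex"
mode(TRUE)                # "logical"

# Modes that do not represent data
mode(sum)                 # "function"
mode(expression(x + 1))   # "expression"

# length() gives the number of elements
length(c(1, 2, 3))        # 3

# The same codes 1, 2, 3 as counts or as a categorical variable
eggs <- c(1, 2, 3)        # hypothetical number of eggs in three nests
sex  <- factor(c(1, 2, 3), labels = c("male", "female", "hermaphrodite"))
attributes(sex)           # levels and class attributes
summary(eggs)             # numeric summary (min, quartiles, mean, max)
summary(sex)              # count of observations in each level
```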
A vector is a variable in the commonly admitted meaning. A factor is a categorical variable. An array is a table with k dimensions, a matrix being a particular case of array with k = 2. Note that the elements of an array or of a matrix are all of the same mode. A data frame is a table composed of one or several vectors and/or factors, all of the same length but possibly of different modes. A ‘ts’ is a time series data set and so contains additional attributes such as frequency and dates. Finally, a list can contain any type of object, including lists! For a vector, its mode and length are sufficient to describe the data. For other objects, other information is necessary, and it is given by non-intrinsic attributes. Among these attributes, we can cite dim, which corresponds to the dimensions of an object. For example, a matrix with 2 rows and 2 columns has a dim equal to the pair of values c(2, 2), but its length is 4.
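A brief check of the distinction between dim and length on a 2 by 2 matrix, compared with a plain vector that has no non-intrinsic attributes. The object names are arbitrary.

```r
m <- matrix(1:4, nrow = 2, ncol = 2)
dim(m)          # 2 2
length(m)       # 4: the number of elements
attributes(m)   # a single non-intrinsic attribute: dim

v <- c(1, 2, 3, 4)
attributes(v)   # NULL: mode and length are enough for a vector

# A list can contain any type of object, including another list
lst <- list(1, "a", list(TRUE, FALSE))
```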
Creating objects
It is possible to create an object while specifying its mode, length, type, and so forth. This approach is useful from the point of view of manipulating objects. One can, for example, create an ‘empty’ object and then modify its elements successively, which is more efficient than putting all its elements together with c(). It can also be useful to create objects from others. For example, if one wants to fit a series of models, it is simple to put the formulae in a list, and then to extract the elements successively to insert them in the function lm.
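A minimal sketch of both ideas. vector() creates an ‘empty’ numeric vector of a given length that is then filled element by element, and a list of formulae is passed successively to lm(). The built-in cars data set and the two formulae are chosen purely for illustration.

```r
# Create an 'empty' numeric vector of length 5, then fill it progressively
sq <- vector(mode = "numeric", length = 5)
for (i in seq_along(sq)) sq[i] <- i^2
sq   # 1 4 9 16 25

# Put a series of model formulae in a list, then fit them one after another
formulae <- list(dist ~ speed, dist ~ speed + I(speed^2))
fits <- lapply(formulae, function(f) lm(f, data = cars))
sapply(fits, function(m) length(coef(m)))   # 2 3
```

Using lapply() here is one convenient way to extract the formulae successively; an explicit for loop over the list would work as well.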
Matrix
A matrix is actually a vector with an additional attribute (dim) which is itself a numeric vector with length 2, and defines the numbers of rows and columns of the matrix. A matrix can be created with the function matrix:
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
The option byrow indicates whether the values given by data must successively fill the columns (the default) or the rows (if TRUE). The option dimnames allows one to give names to the rows and columns, as a list of two character vectors.
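A small example contrasting the default column-wise filling with byrow = TRUE, and naming rows and columns through dimnames. The names r1, r2, c1 and c2 are arbitrary.

```r
m1 <- matrix(1:6, nrow = 2, ncol = 3)                # fills the columns first
m2 <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # fills the rows first
m1[1, ]   # 1 3 5
m2[1, ]   # 1 2 3

# Row and column names are given as a list of two character vectors
m3 <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
m3["r2", "c1"]   # 2
```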
Data frame
We have seen that a data frame is created implicitly by the function read.table; it is also possible to create a data frame with the function data.frame. The vectors included in the data frame must be of the same length, or if one of them is shorter, it is “recycled” a whole number of times.
If a factor is included in a data frame, it must be of the same length as the vector(s). It is possible to change the names of the columns with, for instance, data.frame(A1=x, A2=n). One can also give names to the rows with the option row.names, which must be, of course, a vector of mode character and of length equal to the number of rows of the data frame. Finally, note that data frames have a dim attribute, like matrices.
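A sketch of recycling and naming with data.frame(). The objects x, n and M and the row names are hypothetical.

```r
x <- 1:4
n <- 10
M <- c(10, 35)

data.frame(x, n)   # n (length 1) is recycled 4 times
data.frame(x, M)   # M (length 2) is recycled 2 times

# Column names set explicitly, row names given with row.names
df <- data.frame(A1 = x, A2 = n, row.names = c("a", "b", "c", "d"))
names(df)   # "A1" "A2"
dim(df)     # 4 2
```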
List
A list is created in a way similar to data frames, with the function list. There is no constraint on the objects that can be included. In contrast to data.frame(), the names of the objects passed to list() are not used as component names by default.
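A quick comparison of unnamed and named list components. The object and component names are arbitrary.

```r
x <- 1:4
n <- 10
L1 <- list(x, n)           # the names x and n are not kept
L2 <- list(A = x, B = n)   # names given explicitly
names(L1)   # NULL
names(L2)   # "A" "B"
L2$A        # components can then be extracted by name
```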
Time-series
The function ts creates an object of class “ts” from a vector (single time-series) or a matrix (multivariate time-series), and some options which characterize the series. The options, with their default values, are:
ts(data = NA, start = 1, end = numeric(0), frequency = 1, deltat = 1, ts.eps = getOption("ts.eps"), class, names)
data: a vector or a matrix
start: the time of the first observation, either a number, or a vector of two integers giving the time unit and the observation within it (e.g., c(1990, 1) for January 1990 with monthly data)
end: the time of the last observation, specified in the same way as start
frequency: the number of observations per time unit
deltat: the fraction of the sampling period between successive observations (e.g., 1/12 for monthly data); only one of frequency or deltat should be given
ts.eps: tolerance for the comparison of series. The frequencies are considered equal if their difference is less than ts.eps
class: class to give to the object; the default is "ts" for a single series, and c("mts", "ts") for a multivariate series (recent versions of R also add "matrix" to this vector)
names: a vector of mode character with the names of the individual series in the case of a multivariate series; by default the names of the columns of data, or Series 1, Series 2, ...
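A sketch using made-up values: a monthly series starting in January 1990, with start given as a vector of two integers, and a two-column multivariate series that receives the default names.

```r
# Hypothetical monthly data: 24 observations starting in January 1990
ts_monthly <- ts(1:24, start = c(1990, 1), frequency = 12)
start(ts_monthly)      # 1990 1
end(ts_monthly)        # 1991 12
frequency(ts_monthly)  # 12
deltat(ts_monthly)     # 1/12, i.e. about 0.0833

# A multivariate series built from a matrix with two unnamed columns
mts_obj <- ts(matrix(1:20, ncol = 2), start = 2000, frequency = 1)
class(mts_obj)         # includes "mts" and "ts"
colnames(mts_obj)      # "Series 1" "Series 2"
```

Giving deltat = 1/12 instead of frequency = 12 would describe the same monthly sampling; only one of the two needs to be supplied.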
Expression
The objects of mode expression have a fundamental role in R. An expression is a series of characters which makes sense for R. All valid commands are expressions. When a command is typed directly on the keyboard, it is then evaluated by R and executed if it is valid. In many circumstances, it is useful to construct an expression without evaluating it: this is what the function expression is made for. It is, of course, possible to evaluate the expression subsequently with eval(). Expressions can be used, among other things, to include equations in graphs. An expression can also be created from a variable of mode character, for instance with parse(text = ...).
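A minimal sketch of building an expression without evaluating it, evaluating it later with eval(), and creating one from a character string with parse(). The values of x, y and z are arbitrary.

```r
x <- 3; y <- 2.5; z <- 1

# Build the expression without evaluating it
e <- expression(x / (y + exp(z)))
mode(e)    # "expression"
eval(e)    # evaluated with the current values of x, y and z

# Create an expression from a character string
e2 <- parse(text = "x * y")
eval(e2)   # 7.5
```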
The following link points to the R operator code uploaded to GitHub for reference.