November 24, 2015
R is an implementation of S, a programming language designed for statistics.
Ask questions clearly and with a reproducible example.
Get R from CRAN.
Get RStudio from its website, under Product -> RStudio.
Note: R can be used on its own. RStudio requires R.
Help
Use ? or the help function.
?sum
help(mean)
Use ?? to search the help pages.
??read
Packages -> Install
install.packages('rmarkdown')
25 + 3 # 25 - 3
## [1] 28
23 # 2/3
## [1] 23
12 > 4 # 12 >= 4; 12 < 4; 12 <= 4
## [1] TRUE
!TRUE # (negation); TRUE & FALSE (and); TRUE | FALSE (or)
## [1] FALSE
class(2)
## [1] "numeric"
class(2L)
## [1] "integer"
class("2")
## [1] "character"
class(TRUE)
## [1] "logical"
NULL # NULL object
## NULL
NA # missing value indicator (not available)
## [1] NA
NaN # 0/0 (not a number)
## [1] NaN
Inf # Infinity
## [1] Inf
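The special values above behave differently inside calculations. A minimal sketch, assuming a small hypothetical vector `x` that is not part of the original slides:

```r
# Hypothetical vector with one missing entry
x <- c(1, NA, 3)

missing_flags <- is.na(x)            # TRUE where a value is missing
plain_mean <- mean(x)                # NA: missing values propagate
clean_mean <- mean(x, na.rm = TRUE)  # drop the NA before averaging

nan_is_na <- is.na(NaN)        # NaN also counts as missing for is.na()
na_is_nan <- is.nan(NA)        # but NA is not NaN
null_length <- length(NULL)    # NULL is an empty object of length 0
inf_finite <- is.finite(1 / 0) # 1 / 0 is Inf, so this is FALSE
```

Most summary functions accept `na.rm = TRUE`, which is usually the simplest way to deal with NA in a quick analysis.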
Homogeneous data:
Heterogeneous data:
data.frames: 2D (the usual database format)
c(1,2,3)
## [1] 1 2 3
c("a", "b", "c", "d")
## [1] "a" "b" "c" "d"
c(TRUE, TRUE, FALSE, TRUE)
## [1] TRUE TRUE FALSE TRUE
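Because a vector holds homogeneous data, mixing types inside c() triggers coercion to a common type. A small sketch of that behaviour (the variable names are only illustrative):

```r
# Mixing types: everything is coerced to the most general type
mixed <- c(1, "a", TRUE)
mixed_class <- class(mixed)        # "character"

num_class <- class(c(1L, 2.5))     # integer + double -> "numeric"
int_class <- class(c(TRUE, 2L))    # logical + integer -> "integer"

# Explicit conversion back from text
parsed <- as.numeric("3.14")
```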
1:3
## [1] 1 2 3
seq(1, 3)
## [1] 1 2 3
seq(from = 10, to = 0, by = -2)
## [1] 10 8 6 4 2 0
c(1,2,3)
## [1] 1 2 3
c(1,2,3) - c(0, 2, 1)
## [1] 1 0 2
c(1,2,3) + c(0, 2, 1)
## [1] 1 4 4
c(1,2,3) %*% c(0, 2, 1)
##      [,1]
## [1,]    7
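The difference between element-wise arithmetic and the matrix product %*% can be checked by hand. A short sketch using the same two vectors, with hypothetical names `a` and `b`:

```r
a <- c(1, 2, 3)
b <- c(0, 2, 1)

elementwise <- a * b       # multiplies position by position: 0 4 3
inner <- sum(a * b)        # 0 + 4 + 3 = 7, the inner product
as_matrix <- a %*% b       # same value, returned as a 1 x 1 matrix
shifted <- a + 10          # a single value is recycled: 11 12 13
```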
un_vector <- c("a", "b", "c", "d") # <- (or =) assigns to a variable
un_vector[1]
## [1] "a"
un_vector[c(2,3)]
## [1] "b" "c"
un_vector[-1]
## [1] "b" "c" "d"
un_vector[c(TRUE, TRUE, FALSE, TRUE)]
## [1] "a" "b" "d"
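In practice the logical index is usually computed from a condition rather than typed by hand. A minimal sketch, reusing the same vector:

```r
un_vector <- c("a", "b", "c", "d")

keep <- un_vector != "c"                   # TRUE TRUE FALSE TRUE
without_c <- un_vector[keep]               # "a" "b" "d"
position_c <- which(un_vector == "c")      # position of "c": 3
last_item <- un_vector[length(un_vector)]  # "d"
```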
numFac <- factor(c(13,22,38, 13))
numFac
## [1] 13 22 38 13
## Levels: 13 22 38
as.numeric(numFac)
## [1] 1 2 3 1
as.numeric(as.character(numFac)) # convert to numeric
## [1] 13 22 38 13
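The reason as.numeric() alone returns 1 2 3 1 is that a factor stores integer codes that point into its levels. A short sketch of what is stored, plus an equivalent conversion idiom described in ?factor:

```r
numFac <- factor(c(13, 22, 38, 13))

labels <- levels(numFac)      # the levels are text: "13" "22" "38"
codes <- as.integer(numFac)   # internal codes: 1 2 3 1

# Convert the levels once, then index them by the codes
values <- as.numeric(levels(numFac))[numFac]   # 13 22 38 13
```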
un_df <- data.frame(x = 1:25, y = runif(25), ynorm = rnorm(25),
                    z = sample(c("a", "b", "c", "d"), 25, replace = TRUE),
                    t = "RUGBCN")
head(un_df) # tail(un_df)
##   x         y      ynorm z      t
## 1 1 0.9956050  0.6770899 d RUGBCN
## 2 2 0.8284240 -1.2556696 d RUGBCN
## 3 3 0.9692599  1.8618999 b RUGBCN
## 4 4 0.7036978 -2.4980486 a RUGBCN
## 5 5 0.2674196 -2.3462070 d RUGBCN
## 6 6 0.8795207 -0.1480149 d RUGBCN
dim(un_df) # nrow(un_df) ncol(un_df)
## [1] 25 5
names(un_df) # colnames(un_df) rownames(un_df)
## [1] "x" "y" "ynorm" "z" "t"
summary(un_df)
##        x            y              ynorm         z          t
##  Min.   : 1   Min.   :0.0670   Min.   :-2.4980   a:7   RUGBCN:25
##  1st Qu.: 7   1st Qu.:0.2674   1st Qu.:-1.2012   b:5
##  Median :13   Median :0.4896   Median :-0.1457   c:5
##  Mean   :13   Mean   :0.5121   Mean   :-0.2281   d:8
##  3rd Qu.:19   3rd Qu.:0.7037   3rd Qu.: 0.5735
##  Max.   :25   Max.   :0.9956   Max.   : 1.8619
un_df[1, 2]
## [1] 0.995605
un_df[1, 1:3]
##   x        y     ynorm
## 1 1 0.995605 0.6770899
un_df[1, ]
##   x        y     ynorm z      t
## 1 1 0.995605 0.6770899 d RUGBCN
un_df$x[2]
## [1] 2
un_df[1, c("x", "y")]
##   x        y
## 1 1 0.995605
cond <- un_df$x > 20
un_df[cond, ]
##     x          y      ynorm z      t
## 21 21 0.68035988 -0.4926122 d RUGBCN
## 22 22 0.07550184 -1.5778731 d RUGBCN
## 23 23 0.23681802  0.5657970 a RUGBCN
## 24 24 0.70004275  0.1931108 a RUGBCN
## 25 25 0.84435685 -0.1456663 a RUGBCN
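Conditions can be combined with & and |, and subset() is a common shorthand for the same filter. A sketch on a small deterministic data frame (`df` is hypothetical, used instead of un_df so the result does not depend on random numbers):

```r
# Hypothetical data: x runs 1..10, z alternates "a", "b"
df <- data.frame(x = 1:10, z = rep(c("a", "b"), 5))

# Bracket syntax: rows where both conditions hold, all columns
rows <- df[df$x > 6 & df$z == "b", ]

# subset() evaluates the condition inside the data frame
same_rows <- subset(df, x > 6 & z == "b")

# Keep only some columns for the selected rows
only_x <- df[df$x > 6 & df$z == "b", "x"]
```

Only x = 8 and x = 10 satisfy both conditions here, since the even rows carry z = "b".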
max, min, prod, sum
cummax, cummin, cumprod, cumsum, diff
range
mean, median, cor, sd, var
max(un_df$ynorm)
## [1] 1.8619
mean(un_df$ynorm)
## [1] -0.228119
sd(un_df$ynorm)
## [1] 1.176321
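The cumulative functions and diff from the list above are easiest to follow on a short vector. A sketch with a hypothetical vector `v`:

```r
v <- c(3, 1, 4, 1, 5)

running_total <- cumsum(v)   # 3 4 8 9 14
running_max <- cummax(v)     # 3 3 4 4 5
steps <- diff(v)             # differences between neighbours: -2 3 -3 4
limits <- range(v)           # minimum and maximum: 1 5
```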
head(un_df) # tail
##   x         y      ynorm z      t
## 1 1 0.9956050  0.6770899 d RUGBCN
## 2 2 0.8284240 -1.2556696 d RUGBCN
## 3 3 0.9692599  1.8618999 b RUGBCN
## 4 4 0.7036978 -2.4980486 a RUGBCN
## 5 5 0.2674196 -2.3462070 d RUGBCN
## 6 6 0.8795207 -0.1480149 d RUGBCN
summary(un_df)
##        x            y              ynorm         z          t
##  Min.   : 1   Min.   :0.0670   Min.   :-2.4980   a:7   RUGBCN:25
##  1st Qu.: 7   1st Qu.:0.2674   1st Qu.:-1.2012   b:5
##  Median :13   Median :0.4896   Median :-0.1457   c:5
##  Mean   :13   Mean   :0.5121   Mean   :-0.2281   d:8
##  3rd Qu.:19   3rd Qu.:0.7037   3rd Qu.: 0.5735
##  Max.   :25   Max.   :0.9956   Max.   : 1.8619
table(un_df$z)
##
## a b c d
## 7 5 5 8
?rbind / cbind: Combine objects by rows or columns.
merge: Combine data.frames by common columns.
Ejemplo.R
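A minimal sketch of rbind, cbind and merge, assuming two small hypothetical data frames (`left` and `right`, with made-up ids, names and scores) that share an `id` column:

```r
left <- data.frame(id = c(1, 2, 3), name = c("ana", "ben", "carla"))
right <- data.frame(id = c(2, 3, 4), score = c(7.5, 9.0, 6.0))

# rbind: add rows (column names must match)
stacked <- rbind(left, data.frame(id = 4, name = "dan"))

# cbind: add columns (same number of rows)
wide <- cbind(left, flag = c(TRUE, FALSE, TRUE))

# merge: join on the common column; only ids present in both survive
inner <- merge(left, right, by = "id")

# all = TRUE keeps every id and fills the gaps with NA
outer <- merge(left, right, by = "id", all = TRUE)
```

By default merge behaves like an inner join; `all`, `all.x` and `all.y` control which unmatched rows are kept.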
The ?tapply, ?apply, ?lapply, ?aggregate, etc. family of functions.
Each applies a function over a collection of data.
tapply(un_df$y, list(un_df$z), FUN = 'sum')
##        a        b        c        d
## 3.374749 2.623353 1.988986 4.816182
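The members of this family differ mainly in what they iterate over and what they return. A sketch on a small hypothetical data frame `df` with a grouping column `g` and a value column `v`:

```r
df <- data.frame(g = c("a", "b", "a", "b"), v = c(1, 2, 3, 4))

# tapply: one result per group, returned as a named array
by_group <- tapply(df$v, df$g, sum)

# aggregate: same computation, returned as a data.frame
by_group_df <- aggregate(v ~ g, data = df, FUN = sum)

# sapply: apply over the elements of a list, simplify to a vector
list_means <- sapply(list(1:3, 4:6), mean)

# apply: apply over the rows (1) or columns (2) of a matrix
row_sums <- apply(matrix(1:6, nrow = 2), 1, sum)
```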
There are faster packages such as data.table (faster) and dplyr (easier).
Based on Advanced R. The Vocabulary chapter contains a broad set of useful functions. A sample follows:
# Ordering and tabulating
duplicated, unique
merge
order, rank, quantile
sort
table, ftable

# Linear models
fitted, predict, resid, rstandard
lm, glm
anova, coef, confint, vcov

# Random variables
(q, p, d, r) * (beta, binom, cauchy, chisq, exp, f, gamma, geom, hyper, lnorm, logis, multinom, nbinom, norm, pois, signrank, t, unif, weibull, wilcox, birthday, tukey)
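The random variable functions combine a prefix (d for density, p for cumulative probability, q for quantile, r for random draws) with a distribution name. A sketch using the normal distribution as the example:

```r
set.seed(42)                 # arbitrary seed, only to make the draws repeatable

density_at_0 <- dnorm(0)     # height of the standard normal density at 0
prob_below <- pnorm(1.96)    # P(X <= 1.96), roughly 0.975
quantile_975 <- qnorm(0.975) # inverse of pnorm, roughly 1.96
draws <- rnorm(5)            # five random values from N(0, 1)

# The same prefixes work for the other distributions, e.g. uniform
unif_draws <- runif(3, min = 0, max = 1)
```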
?plot
plot(un_df$x, un_df$y, main = "My first plot")
plot options are described in ?par
plot(un_df$x, un_df$y, type = "l", col = "red", lwd = 3, lty = 2)
?title (with ann = FALSE in plot)
plot(un_df$x, un_df$y, type = "l", col = "red", lwd = 3, lty = 2, ann = FALSE)
title(main = "My third plot", xlab = "x axis", ylab = "y axis")
?legend
plot(un_df$x, un_df$y, type = "l", col = "red", lwd = 3, lty = 2, ann = FALSE)
title(main = "My third plot", xlab = "x axis", ylab = "y axis")
legend(1, 0.2, "points", lwd = 3, lty = 2, col = "red")
hist(un_df$y)
densidad <- density(un_df$ynorm)
plot(densidad)
boxplot(un_df$ynorm)
From the Environment pane -> Import Dataset
The ?read.table function and its relatives
read.table("mtcars.txt", header = TRUE)
read.csv2("mtcars.csv", header = TRUE)
The foreign and haven packages import different formats such as SAS, SPSS, DBF, Stata, etc.
The XLConnect package imports Excel files.
?write.table and its relatives
write.table(mtcars, file = "mtcars.txt")
write.csv2(mtcars, file = "mtcars.csv")
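Writing a file and reading it back is a quick way to check that the separator and decimal settings match. A sketch of such a round trip, assuming a temporary file instead of the file names used in the slides:

```r
# Temporary path, so nothing is left behind in the working directory
path <- tempfile(fileext = ".csv")

# write.csv2 uses ";" as separator and "," as decimal mark
write.csv2(mtcars, file = path)

# read.csv2 expects the same conventions; column 1 holds the row names
back <- read.csv2(path, row.names = 1)

same_shape <- identical(dim(back), dim(mtcars))
same_names <- identical(names(back), names(mtcars))

unlink(path)  # remove the temporary file
```

Pairing write.csv2 with read.csv2 (or write.csv with read.csv) avoids most separator mismatches; read.table and write.table need `sep` and `dec` set explicitly when the defaults do not fit.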
XLConnect can export to Excel.
Plots -> Export
?png and its relatives
png("Primer_plot.png", width = 640, height = 480)
plot(rnorm(10))
dev.off()
Questions? Suggestions? Insults?
Lluís Ramon