r - Merging files on the basis of columns -
r - Merging files on the basis of columns -
i have multiple files many rows , 3 columns , need merge them on basis of first 2 columns match. file1
12 13 13 15 b 14 17 c 4 9 d . . . . . . 81 23 h
file 2
12 13 e 3 10 b 14 17 c 4 9 j . . . . . . 1 2 k
file 3
12 13 m 13 15 k 1 7 x 24 9 d . . . . . . 1 2 h
and on. want merge them obtain next result
12 13 e m 13 15 b k 14 17 c c 4 9 d j 3 10 b 24 9 d . . . . . . 81 23 h 1 2 k 1 7 x
the first thing comes mind these types of problems merge
, perhaps in conjunction reduce(function(x, y) merge(x, y, = "somecols", = true), yourlistofdataframes)
.
however, merge
not efficient function, since looks want "collapse" values fill in rows left right, not default merge
behavior.
instead, suggest stack 1 long data.frame
, reshape after have added index variable.
here 2 approaches:
option 1: "dplyr" + "tidyr" usemget
set of data.frame
s list
. use rbind_all
convert list
single data.frame
. use sequence(n())
in mutate
"dplyr" grouping info , create index. use spread
"tidyr" transform "long" format "wide" format. library(dplyr) library(tidyr) combined <- rbind_all(mget(ls(pattern = "^file\\d"))) combined %>% group_by(v1, v2) %>% mutate(time = sequence(n())) %>% ungroup() %>% spread(time, v3, fill = "") # source: local info frame [7 x 5] # # v1 v2 1 2 3 # 1 1 7 x # 2 3 10 b # 3 4 9 d j # 4 12 13 e m # 5 13 15 b k # 6 14 17 c c # 7 24 9 d
option 2: "data.table" use mget
set of data.frame
s list
. use rbindlist
convert list single data.table
. use sequence(.n)
generate sequence groups. use dcast.data.table
convert "long" data.table
"wide" one. library(data.table) dcast.data.table( rbindlist(mget(ls(pattern = "^file\\d")))[, time := sequence(.n), = list(v1, v2)], v1 + v2 ~ time, value.var = "v3", fill = "") # v1 v2 1 2 3 # 1: 1 7 x # 2: 3 10 b # 3: 4 9 d j # 4: 12 13 e m # 5: 13 15 b k # 6: 14 17 c c # 7: 24 9 d
both of these answers assume starting next sample data:
file1 <- structure( list(v1 = c(12l, 13l, 14l, 4l), v2 = c(13l, 15l, 17l, 9l), v3 = c("a", "b", "c", "d")), .names = c("v1", "v2", "v3"), class = "data.frame", row.names = c(na, -4l)) file2 <- structure( list(v1 = c(12l, 3l, 14l, 4l), v2 = c(13l, 10l, 17l, 9l), v3 = c("e", "b", "c", "j")), .names = c("v1", "v2", "v3"), class = "data.frame", row.names = c(na, -4l)) file3 <- structure( list(v1 = c(12l, 13l, 1l, 24l), v2 = c(13l, 15l, 7l, 9l), v3 = c("m", "k", "x", "d")), .names = c("v1", "v2", "v3"), class = "data.frame", row.names = c(na, -4l))
r merge
Comments
Post a Comment