R issues with merge/rbind/concatenate two data frames -
R issues with merge/rbind/concatenate two data frames -
i beginner r apologise in advance if question asked elsewhere. here issue:
i have 2 info frames, df1 , df2, different number of rows , columns. 2 frames have 1 variable (column) in mutual called "customer_no". want merged frame match records based on "customer_no" , rows in df2 only.both data.frames have multiple rows each customer_no.
i tried following:
merged.df <- (df1, df2, by="customer_no",all.y=true)
the problem assigns values of df1 df2 instead should empty. questions are:
1) how can tell command leave unmatched columns empty? 2) how can see merged file row came df? guess if resolve above question should easy see empty columns.
i missing in command don't know what. if question has been answered somewhere else, still kind plenty rephrase in english language here r beginner?
thanks!
data example:
df1: customer_no country year 10 uk 2001 10 uk 2002 10 uk 2003 20 2007 30 au 2006 df2: customer_no income 10 700 10 800 10 900 30 1000
merged file should this:
merged.df: customer_no income country year 10 uk 2001 10 uk 2002 10 uk 2003 10 700 10 800 10 900 30 au 2006 30 1000
so: puts columns together, adds values of df2 right after lastly 1 of df1 based on same customer_no , matches customer_no df2 (merged.df not have customer_no 20). also, leaves empty other cells.
in stata utilize append not sure in r...perhaps join?
thanks!!
try:
df1$id <- paste(df1$customer_no, 1, sep="_") df2$id <- paste(df2$customer_no, 2, sep="_") res <- merge(df1, df2, by=c('id', 'customer_no'),all=true)[,-1] res1 <- res[res$customer_no %in% df2$customer_no,] res1 # customer_no country year income #1 10 uk 2001 na #2 10 uk 2002 na #3 10 uk 2003 na #4 10 <na> na 700 #5 10 <na> na 800 #6 10 <na> na 900 #8 30 au 2006 na #9 30 <na> na 1000
if want alter na
''
,
res1[is.na(res1)] <- '' #but, leave `na` there `numeric` columns.
or, utilize rbindlist
data.table
(using original datasets)
library(data.table) indx <- df1$customer_no %in% df2$customer_no rbindlist(list(df1[indx,], df2),fill=true)[order(customer_no)] # customer_no country year income #1: 10 uk 2001 na #2: 10 uk 2002 na #3: 10 uk 2003 na #4: 10 na na 700 #5: 10 na na 800 #6: 10 na na 900 #7: 30 au 2006 na #8: 30 na na 1000
r merge
Comments
Post a Comment