R: Aggregating by date and hour and placing into a separate matrix -




I am looking to take a data frame that has data ordered through time, aggregate it at an hourly level, and place the result in a separate data frame. It's best explained with an example:

tradedata dataframe:

    time                 amount
    2014-05-16 14:00:05  10
    2014-05-16 14:00:10  20
    2014-05-16 14:08:15  30
    2014-05-16 14:23:09  51
    2014-05-16 14:59:54  84
    2014-05-16 15:09:45  94
    2014-05-16 15:24:41  53
    2014-05-16 16:30:51  44

The data frame above contains the data I want to aggregate. Below is the data frame I want to insert it into:

hourlydata dataframe:

    time                 profit
    2014-05-16 00:00:00  100
    2014-05-16 01:00:00  200
    2014-05-16 02:00:00  250
    ...
    2014-05-16 14:00:00  30
    2014-05-16 15:00:00  -50
    2014-05-16 16:00:00  67
    ...
    2014-05-16 23:00:00  -8

I would like to aggregate the data in the tradedata data frame and place it in the right rows of the hourlydata data frame, as below:

new hourlydata dataframe:

    time                 profit  amount
    2014-05-16 00:00:00  100     0
    2014-05-16 01:00:00  200     0
    2014-05-16 02:00:00  250     0
    ...
    2014-05-16 14:00:00  30      0
    2014-05-16 15:00:00  -50     195 (10+20+30+51+84)
    2014-05-16 16:00:00  67      147 (94+53)
    2014-05-16 17:00:00  20      44
    ...
    2014-05-16 23:00:00  -8      0
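The table above labels each hour with its end time: trades between 14:00:00 and 14:59:59 are summed into the 15:00:00 row. Here is a minimal base R sketch of that binning. The names `trades`, `hourly_grid`, `sums` and `out`, the four-row hourly grid, and the choice of `tz = "UTC"` are all assumptions made for illustration, not part of the question.

```r
# Hypothetical sample data mirroring the question (tz = "UTC" is an assumption)
trades <- data.frame(
  time = as.POSIXct(c("2014-05-16 14:00:05", "2014-05-16 14:00:10",
                      "2014-05-16 14:08:15", "2014-05-16 14:23:09",
                      "2014-05-16 14:59:54", "2014-05-16 15:09:45",
                      "2014-05-16 15:24:41", "2014-05-16 16:30:51"),
                    tz = "UTC"),
  amount = c(10, 20, 30, 51, 84, 94, 53, 44)
)

# Floor each timestamp to the start of its hour, then add one hour so the
# bucket is labelled by the END of the hour, as in the desired output
trades$bucket <- as.POSIXct(floor(as.numeric(trades$time) / 3600) * 3600 + 3600,
                            origin = "1970-01-01", tz = "UTC")

# Sum the amounts per bucket
sums <- aggregate(trades["amount"], by = list(time = trades$bucket), FUN = sum)

# Hypothetical hourly grid covering 14:00 to 17:00
hourly_grid <- data.frame(
  time = seq(as.POSIXct("2014-05-16 14:00:00", tz = "UTC"),
             by = "hour", length.out = 4)
)

# Left join onto the grid; hours without trades get 0
out <- merge(hourly_grid, sums, by = "time", all.x = TRUE)
out$amount[is.na(out$amount)] <- 0
out
```

With this sample the 15:00, 16:00 and 17:00 rows receive 195, 147 and 44, and the 14:00 row stays at 0.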

Using the solution provided by akrun below, I was able to get a solution for most instances. However, there appears to be an issue when an event occurs within the last hour of the day, as below:

tradedata

    time                 amount
    2014-08-15 22:09:07  11037.778
    2014-08-15 23:01:33  13374.724
    2014-08-20 23:25:40  133373.000

hourlydata

    time                 amount
    2014-08-15 23:00:00  11037.778 (correct)
    2014-08-18 00:00:00  0         (incorrect)
    2014-08-21 00:00:00  133373    (correct)

The formula appears to skip the second trade in the tradedata data frame when aggregating into the hourlydata data frame. It appears as though this only happens for trades that occur in the last hour of Friday, because (I imagine) no row exists for Saturday at 12am, i.e. Fri 11pm + 1 hour. It works for a trade occurring in the last hour of Monday to Thursday.
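The suspected cause can be checked directly. This sketch takes the Friday 23:01:33 trade from the example, computes its end-of-hour bucket, and inspects the weekday. The variable names and `tz = "UTC"` are assumptions for illustration.

```r
# The second trade from the example; 2014-08-15 is a Friday (tz is assumed)
trade_time <- as.POSIXct("2014-08-15 23:01:33", tz = "UTC")

# End-of-hour bucket: floor to the hour, then add one hour
bucket <- as.POSIXct(floor(as.numeric(trade_time) / 3600) * 3600 + 3600,
                     origin = "1970-01-01", tz = "UTC")

bucket_label <- format(bucket, "%Y-%m-%d %H:%M:%S")
bucket_wday  <- as.POSIXlt(bucket)$wday  # 0 = Sunday, ..., 6 = Saturday

bucket_label  # "2014-08-16 00:00:00"
bucket_wday   # 6, i.e. Saturday
```

The bucket lands on Saturday 00:00:00. If hourlydata only has weekday rows, a left join finds no row with that timestamp, which would explain why the amount is dropped.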

Any ideas on how to adjust the algorithm? Please let me know if anything is unclear.

Thanks,

Mike

Try:

    library(dplyr)

    # Sum amounts per hour, label each hour by its end time (+3600 seconds),
    # then left join the sums onto the hourly data frame
    res <- left_join(df2,
                     df %>%
                       group_by(hour = as.POSIXct(cut(time, breaks = 'hour')) + 3600) %>%
                       summarise(amount = sum(amount)),
                     by = c('time' = 'hour'))

    # Hours without any trades get 0 instead of NA
    res$amount[is.na(res$amount)] <- 0
    res
    #                 time profit amount
    #1 2014-05-16 00:00:00    100      0
    #2 2014-05-16 01:00:00    200      0
    #3 2014-05-16 02:00:00    250      0
    #4 2014-05-16 14:00:00     30      0
    #5 2014-05-16 15:00:00    -50    195
    #6 2014-05-16 16:00:00     67    147
    #7 2014-05-16 23:00:00     -8      0

Or, using data.table:

    library(data.table)
    dt  <- data.table(df)
    dt2 <- data.table(df2)

    # Sum amounts per end-of-hour bucket
    dt1 <- dt[, list(amount = sum(amount)),
              by = list(time = as.POSIXct(cut(time, breaks = 'hour')) + 3600)]
    setkey(dt1, time)

    # Join onto the hourly table and replace missing amounts with 0
    dt1[dt2][is.na(amount), amount := 0][]
    #                  time amount profit
    #1: 2014-05-16 00:00:00      0    100
    #2: 2014-05-16 01:00:00      0    200
    #3: 2014-05-16 02:00:00      0    250
    #4: 2014-05-16 14:00:00      0     30
    #5: 2014-05-16 15:00:00    195    -50
    #6: 2014-05-16 16:00:00    147     67
    #7: 2014-05-16 23:00:00      0     -8

Update

Based on the weekend information:

    # TRUE for trades in the last hour of Friday
    indx <- with(df, as.numeric(format(time, '%H')) == 23 &
                     as.numeric(format(time, '%S')) > 0 &
                     format(time, '%a') == 'Fri')

    # Start of the hour each trade falls in
    grp <- with(df, as.POSIXct(cut(time, breaks = 'hour')))

    # Friday 23:00 + 49 hours = Monday 00:00; every other trade moves 1 hour
    grp[indx]  <- grp[indx] + 3600 * 49
    grp[!indx] <- grp[!indx] + 3600
    df$time <- grp

    df %>%
      group_by(time) %>%
      summarise(amount = sum(amount))
    # in the example dataset, 3 rows
    #                 time    amount
    #1 2014-08-15 23:00:00  11037.78
    #2 2014-08-18 00:00:00  13374.72
    #3 2014-08-21 00:00:00 133373.00

Data

    df <- structure(list(time = structure(c(1400263205, 1400263210, 1400263695,
        1400264589, 1400266794, 1400267385, 1400268281, 1400272251),
        class = c("POSIXct", "POSIXt"), tzone = ""),
        amount = c(10L, 20L, 30L, 51L, 84L, 94L, 53L, 44L)),
        .Names = c("time", "amount"), row.names = c(NA, -8L),
        class = "data.frame")

    df2 <- structure(list(time = structure(c(1400212800, 1400216400, 1400220000,
        1400263200, 1400266800, 1400270400, 1400295600),
        class = c("POSIXct", "POSIXt"), tzone = ""),
        profit = c(100L, 200L, 250L, 30L, -50L, 67L, -8L)),
        .Names = c("time", "profit"), row.names = c(NA, -7L),
        class = "data.frame")

New data

    df <- structure(list(time = structure(c(1408158000, 1408334400, 1408593600),
        tzone = "", class = c("POSIXct", "POSIXt")),
        amount = c(11037.778, 13374.724, 133373)),
        .Names = c("time", "amount"), row.names = c(NA, -3L),
        class = "data.frame")
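As a variation on the update above (not the answerer's code), the weekend shift can be applied after bucketing instead of before: compute the end-of-hour bucket for every trade, then push any bucket that lands on Saturday 00:00 forward by 48 hours to Monday 00:00. The helper name `end_of_hour_bucket`, the sample `trades` frame, and `tz = "UTC"` are assumptions for illustration. Using `as.POSIXlt(...)$wday` avoids depending on the locale's day abbreviations.

```r
# Hypothetical helper: end-of-hour bucket, with Saturday-midnight buckets
# (produced by trades in the last hour of Friday) rolled to Monday 00:00
end_of_hour_bucket <- function(time, tz = "UTC") {
  bucket <- as.POSIXct(floor(as.numeric(time) / 3600) * 3600 + 3600,
                       origin = "1970-01-01", tz = tz)
  lt <- as.POSIXlt(bucket)
  sat_midnight <- lt$wday == 6 & lt$hour == 0   # wday 6 = Saturday
  bucket[sat_midnight] <- bucket[sat_midnight] + 48 * 3600
  bucket
}

# Trades from the question's second example (tz is an assumption)
trades <- data.frame(
  time = as.POSIXct(c("2014-08-15 22:09:07", "2014-08-15 23:01:33",
                      "2014-08-20 23:25:40"), tz = "UTC"),
  amount = c(11037.778, 13374.724, 133373)
)

trades$bucket <- end_of_hour_bucket(trades$time)
bucket_labels <- format(trades$bucket, "%Y-%m-%d %H:%M:%S")
bucket_labels
```

For these three trades the buckets are 2014-08-15 23:00:00, 2014-08-18 00:00:00 and 2014-08-21 00:00:00, matching the rows the question expects. The sketch only handles the Friday-night case described in the question; trades timestamped during the weekend itself would need a separate rule.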

