R: Aggregating by date and hour and placing into a separate matrix
I am looking to take a dataframe that has data ordered through time, aggregate it at an hourly level, and place the result into a separate dataframe. It's best explained with an example:
tradedata dataframe:

    time                 amount
    2014-05-16 14:00:05  10
    2014-05-16 14:00:10  20
    2014-05-16 14:08:15  30
    2014-05-16 14:23:09  51
    2014-05-16 14:59:54  84
    2014-05-16 15:09:45  94
    2014-05-16 15:24:41  53
    2014-05-16 16:30:51  44
The matrix above contains the data I want to aggregate. Below is the dataframe I want to insert it into:

hourlydata dataframe:

    time                 profit
    2014-05-16 00:00:00  100
    2014-05-16 01:00:00  200
    2014-05-16 02:00:00  250
    ...
    2014-05-16 14:00:00  30
    2014-05-16 15:00:00  -50
    2014-05-16 16:00:00  67
    ...
    2014-05-16 23:00:00  -8
I want to aggregate the data in the tradedata dataframe and place it in the right place in the hourlydata dataframe, as below. Each hourly row holds the sum of the trades from the hour that ends at that timestamp.

New hourlydata dataframe:

    time                 profit  amount
    2014-05-16 00:00:00  100     0
    2014-05-16 01:00:00  200     0
    2014-05-16 02:00:00  250     0
    ...
    2014-05-16 14:00:00  30      0
    2014-05-16 15:00:00  -50     195 (10+20+30+51+84)
    2014-05-16 16:00:00  67      147 (94+53)
    2014-05-16 17:00:00  20      44
    ...
    2014-05-16 23:00:00  -8      0
Using the solution provided by akrun below, I was able to get a solution for most instances. However, there appears to be an issue when an event occurs within the last hour of the day, as below:

tradedata

    time                 amount
    2014-08-15 22:09:07  11037.778
    2014-08-15 23:01:33  13374.724
    2014-08-20 23:25:40  133373.000

hourlydata

    time                 amount
    2014-08-15 23:00:00  11037.778  (correct)
    2014-08-18 00:00:00  0          (incorrect)
    2014-08-21 00:00:00  133373     (correct)
The formula appears to skip the data for the second trade in the tradedata dataframe when aggregating into the hourlydata dataframe. It appears as though this only happens for trades that occur in the last hour of Friday, because (I imagine) the data doesn't exist for Saturday at 12am, i.e. Fri 11pm + 1 hour. It works for a trade occurring in the last hour of Monday to Thursday.

Any ideas on how to adjust the algo? Please let me know if this is unclear.

Thanks,
Mike
Try

    library(dplyr)

    # Bucket each trade by hour, label the bucket with the END of the hour
    # (+3600 seconds), sum the amounts, then join onto the hourly table.
    res <- left_join(df2,
                     df %>%
                       group_by(hour = as.POSIXct(cut(time, breaks = 'hour')) + 3600) %>%
                       summarise(amount = sum(amount)),
                     by = c('time' = 'hour'))

    # Hours without any trades come back as NA; set them to 0
    res$amount[is.na(res$amount)] <- 0
    res
    #                  time profit amount
    #1 2014-05-16 00:00:00    100      0
    #2 2014-05-16 01:00:00    200      0
    #3 2014-05-16 02:00:00    250      0
    #4 2014-05-16 14:00:00     30      0
    #5 2014-05-16 15:00:00    -50    195
    #6 2014-05-16 16:00:00     67    147
    #7 2014-05-16 23:00:00     -8      0
Or using data.table

    library(data.table)

    dt <- data.table(df)
    dt2 <- data.table(df2)

    # Sum amounts per hour, labelling each bucket with the end of the hour
    dt1 <- dt[, list(amount = sum(amount)),
              by = list(time = as.POSIXct(cut(time, breaks = 'hour')) + 3600)]
    setkey(dt1, time)

    # Join onto the hourly table and replace missing amounts with 0
    dt1[dt2][is.na(amount), amount := 0][]
    #                   time amount profit
    #1: 2014-05-16 00:00:00      0    100
    #2: 2014-05-16 01:00:00      0    200
    #3: 2014-05-16 02:00:00      0    250
    #4: 2014-05-16 14:00:00      0     30
    #5: 2014-05-16 15:00:00    195    -50
    #6: 2014-05-16 16:00:00    147     67
    #7: 2014-05-16 23:00:00      0     -8
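If neither package is available, the same hour-ending aggregation can probably be done in base R with `aggregate()` and `merge()`. The following is only a sketch: the names `trades`, `hourly`, `sums` and `res` are hypothetical stand-ins for the question's tables, only four hourly rows are included, and the times are assumed to be in UTC.

```r
# Hypothetical stand-ins for tradedata and hourlydata (times assumed UTC)
trades <- data.frame(
  time = as.POSIXct(c("2014-05-16 14:00:05", "2014-05-16 14:00:10",
                      "2014-05-16 14:08:15", "2014-05-16 14:23:09",
                      "2014-05-16 14:59:54", "2014-05-16 15:09:45",
                      "2014-05-16 15:24:41", "2014-05-16 16:30:51"),
                    tz = "UTC"),
  amount = c(10, 20, 30, 51, 84, 94, 53, 44)
)
hourly <- data.frame(
  time = as.POSIXct(c("2014-05-16 14:00:00", "2014-05-16 15:00:00",
                      "2014-05-16 16:00:00", "2014-05-16 17:00:00"),
                    tz = "UTC"),
  profit = c(30, -50, 67, 20)
)

# Truncate each trade to the start of its hour, then add one hour so the
# bucket is labelled with the END of the hour, as in the desired output
bucket <- as.POSIXct(trunc(trades$time, units = "hours")) + 3600

# Sum the amounts per bucket
sums <- aggregate(trades["amount"], by = list(time = bucket), FUN = sum)

# Left join onto the hourly table; hours without trades become NA, then 0
res <- merge(hourly, sums, by = "time", all.x = TRUE)
res$amount[is.na(res$amount)] <- 0
res
```

With these inputs the `amount` column should come out as 0, 195, 147 and 44, matching the desired table in the question.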
Update based on the weekend info:

    # Flag trades in the 23:00 hour of a Friday (the original condition also
    # requires a non-zero seconds component)
    indx <- with(df, as.numeric(format(time, '%H')) == 23 &
                     as.numeric(format(time, '%S')) > 0 &
                     format(time, '%a') == 'Fri')

    grp <- with(df, as.POSIXct(cut(time, breaks = 'hour')))
    grp[indx] <- grp[indx] + 3600 * 49   # Friday 23:00 + 49 hours = Monday 00:00
    grp[!indx] <- grp[!indx] + 3600      # all other trades: end of their hour
    df$time <- grp

    df %>% group_by(time) %>% summarise(amount = sum(amount))
    # in the example dataset there are only 3 rows
    #                  time    amount
    #1 2014-08-15 23:00:00  11037.78
    #2 2014-08-18 00:00:00  13374.72
    #3 2014-08-21 00:00:00 133373.00
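The Friday test above compares the output of `format(time, '%a')` with an English day name, so it depends on the session locale. A locale-independent variant could use the numeric weekday from `POSIXlt` instead. This is a sketch only: `bucket_end` is a hypothetical helper, the times are assumed to be in UTC, and it assumes the hourly table has no Saturday or Sunday rows, so any bucket label that lands on a weekend is moved to Monday 00:00.

```r
# Hypothetical helper: label each trade with the end of its hour, then push
# any label that falls on a weekend forward to the following Monday 00:00.
bucket_end <- function(time) {
  end <- as.POSIXct(trunc(time, units = "hours")) + 3600
  wday <- as.POSIXlt(end)$wday                  # 0 = Sunday, 6 = Saturday
  midnight <- as.POSIXct(trunc(end, units = "days"))
  end[wday == 6] <- midnight[wday == 6] + 2 * 86400   # Saturday -> Monday
  end[wday == 0] <- midnight[wday == 0] + 1 * 86400   # Sunday   -> Monday
  end
}

# The three trades from the question (times assumed UTC)
trades <- data.frame(
  time = as.POSIXct(c("2014-08-15 22:09:07", "2014-08-15 23:01:33",
                      "2014-08-20 23:25:40"), tz = "UTC"),
  amount = c(11037.778, 13374.724, 133373)
)
trades$bucket <- bucket_end(trades$time)
format(trades$bucket, "%Y-%m-%d %H:%M:%S")
```

For the three example trades this should give 2014-08-15 23:00:00, 2014-08-18 00:00:00 and 2014-08-21 00:00:00, the same labels as the update above. The `bucket` column can then be summed and joined as in the earlier snippets.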
Data

    df <- structure(list(time = structure(c(1400263205, 1400263210, 1400263695,
        1400264589, 1400266794, 1400267385, 1400268281, 1400272251),
        class = c("POSIXct", "POSIXt"), tzone = ""),
        amount = c(10L, 20L, 30L, 51L, 84L, 94L, 53L, 44L)),
        .Names = c("time", "amount"), row.names = c(NA, -8L),
        class = "data.frame")

    df2 <- structure(list(time = structure(c(1400212800, 1400216400, 1400220000,
        1400263200, 1400266800, 1400270400, 1400295600),
        class = c("POSIXct", "POSIXt"), tzone = ""),
        profit = c(100L, 200L, 250L, 30L, -50L, 67L, -8L)),
        .Names = c("time", "profit"), row.names = c(NA, -7L),
        class = "data.frame")
New data

    df <- structure(list(time = structure(c(1408158000, 1408334400, 1408593600),
        tzone = "", class = c("POSIXct", "POSIXt")),
        amount = c(11037.778, 13374.724, 133373)),
        .Names = c("time", "amount"), row.names = c(NA, -3L),
        class = "data.frame")
r date aggregate