R: Aggregating by date and hour and placing into a separate matrix

I am looking to take a dataframe that has data ordered through time, aggregate it at an hourly level, and place the result in a separate dataframe. It's best explained with an example:

tradedata dataframe:

time                     amount
2014-05-16 14:00:05          10
2014-05-16 14:00:10          20
2014-05-16 14:08:15          30
2014-05-16 14:23:09          51
2014-05-16 14:59:54          84
2014-05-16 15:09:45          94
2014-05-16 15:24:41          53
2014-05-16 16:30:51          44

The dataframe above contains the data I want to aggregate. Below is the dataframe I want to insert it into:

hourlydata dataframe:

time                     profit
2014-05-16 00:00:00         100
2014-05-16 01:00:00         200
2014-05-16 02:00:00         250
...
2014-05-16 14:00:00          30
2014-05-16 15:00:00         -50
2014-05-16 16:00:00          67
...
2014-05-16 23:00:00          -8

I want to aggregate the data in the tradedata dataframe and place it in the right place in the hourlydata dataframe, as below:

new hourlydata dataframe:

time                     profit   amount
2014-05-16 00:00:00         100        0
2014-05-16 01:00:00         200        0
2014-05-16 02:00:00         250        0
...
2014-05-16 14:00:00          30        0
2014-05-16 15:00:00         -50      195 (10+20+30+51+84)
2014-05-16 16:00:00          67      147 (94+53)
2014-05-16 17:00:00          20       44
...
2014-05-16 23:00:00          -8        0

Using the solution provided by akrun below, I was able to get this working in most instances. However, there appears to be an issue when an event occurs within the last hour of the day, as below:

tradedata

time                        amount
2014-08-15 22:09:07      11037.778
2014-08-15 23:01:33      13374.724
2014-08-20 23:25:40     133373.000

hourlydata

time                        amount
2014-08-15 23:00:00      11037.778 (correct)
2014-08-18 00:00:00              0 (incorrect)
2014-08-21 00:00:00         133373 (correct)

The formula appears to skip the second trade in the tradedata dataframe when aggregating into the hourlydata dataframe. It appears as though this occurs only for trades in the last hour of Friday, because (I imagine) data doesn't exist for Saturday at 12am, i.e. Friday 11pm + 1 hour. It works for a trade occurring in the last hour of Monday to Thursday.
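The suspected cause can be checked with a small base R sketch. It assumes UTC timestamps and an hourly table with no weekend rows, and it uses trunc() rather than cut() to floor the timestamp; the variable names are made up for illustration.

```r
# A trade in the last hour of Friday 2014-08-15 (UTC is an assumption)
trade_time <- as.POSIXct("2014-08-15 23:01:33", tz = "UTC")

# Floor to the start of the hour, then label the bucket by the END of that hour
hour_end <- as.POSIXct(trunc(trade_time, units = "hours")) + 3600

# Hourly table without weekend rows (assumed): Friday 23:00 is followed by Monday 00:00
hourly_times <- as.POSIXct(c("2014-08-15 23:00:00", "2014-08-18 00:00:00"),
                           tz = "UTC")

# wday is 0 for Sunday through 6 for Saturday
lands_on_saturday <- as.POSIXlt(hour_end)$wday == 6
has_matching_row  <- hour_end %in% hourly_times

print(hour_end)
print(lands_on_saturday)
print(has_matching_row)
```

If the bucket falls on Saturday and has no matching row, a left join onto the hourly table silently drops that trade, which would explain the zero on Monday.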

Any ideas on how to adjust the algorithm? Please let me know if this is unclear.

thanks

mike

try

    library(dplyr)

    # Sum amounts per hour, labelling each bucket by the end of its hour
    # (+3600 seconds), then join the sums onto the hourly table
    res <- left_join(df2,
                     df %>%
                       group_by(hour = as.POSIXct(cut(time, breaks = 'hour')) + 3600) %>%
                       summarise(amount = sum(amount)),
                     by = c('time' = 'hour'))

    # Hours with no trades come back as NA; set them to 0
    res$amount[is.na(res$amount)] <- 0
    res
    #                  time profit amount
    #1 2014-05-16 00:00:00    100      0
    #2 2014-05-16 01:00:00    200      0
    #3 2014-05-16 02:00:00    250      0
    #4 2014-05-16 14:00:00     30      0
    #5 2014-05-16 15:00:00    -50    195
    #6 2014-05-16 16:00:00     67    147
    #7 2014-05-16 23:00:00     -8      0

or using data.table

    library(data.table)

    dt  <- data.table(df)
    dt2 <- data.table(df2)

    # Sum amounts per hour, keyed by the end of each hour
    dt1 <- dt[, list(amount = sum(amount)),
              by = list(time = as.POSIXct(cut(time, breaks = 'hour')) + 3600)]
    setkey(dt1, time)

    # Join onto the hourly table and set hours with no trades to 0
    dt1[dt2][is.na(amount), amount := 0][]
    #                   time amount profit
    #1: 2014-05-16 00:00:00      0    100
    #2: 2014-05-16 01:00:00      0    200
    #3: 2014-05-16 02:00:00      0    250
    #4: 2014-05-16 14:00:00      0     30
    #5: 2014-05-16 15:00:00    195    -50
    #6: 2014-05-16 16:00:00    147     67
    #7: 2014-05-16 23:00:00      0     -8
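For anyone without dplyr or data.table, the same idea can be sketched in base R. This is only an illustration: it assumes UTC timestamps, rebuilds a few rows of the question's tables by hand, and swaps cut() for trunc() to floor each timestamp to its hour. The names trades, hourly, sums and idx are made up here.

```r
# Trade data from the question (UTC is an assumption)
trades <- data.frame(
  time = as.POSIXct(c("2014-05-16 14:00:05", "2014-05-16 14:00:10",
                      "2014-05-16 14:08:15", "2014-05-16 14:23:09",
                      "2014-05-16 14:59:54", "2014-05-16 15:09:45",
                      "2014-05-16 15:24:41", "2014-05-16 16:30:51"),
                    tz = "UTC"),
  amount = c(10, 20, 30, 51, 84, 94, 53, 44)
)

# A few rows of the hourly table from the question
hourly <- data.frame(
  time = as.POSIXct(c("2014-05-16 14:00:00", "2014-05-16 15:00:00",
                      "2014-05-16 16:00:00", "2014-05-16 17:00:00"),
                    tz = "UTC"),
  profit = c(30, -50, 67, 20)
)

# Floor each trade to its hour and label the bucket by the end of that hour
trades$hour_end <- as.POSIXct(trunc(trades$time, units = "hours")) + 3600

# Sum the amounts within each bucket
sums <- aggregate(amount ~ hour_end, data = trades, FUN = sum)

# Look up each hourly row in the sums; hours with no trades get 0
idx <- match(hourly$time, sums$hour_end)
hourly$amount <- ifelse(is.na(idx), 0, sums$amount[idx])

print(hourly)
```

The match() lookup plays the role of the left join: rows of hourly keep their order, and unmatched hours are filled with 0.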

update

Based on the weekend info,

    # Flag Friday trades in hour 23 whose seconds field is non-zero
    indx <- with(df, as.numeric(format(time, '%H')) == 23 &
                     as.numeric(format(time, '%S')) > 0 &
                     format(time, '%a') == 'Fri')

    # Start of each trade's hour
    grp <- with(df, as.POSIXct(cut(time, breaks = 'hour')))

    grp[indx]  <- grp[indx] + 3600 * 49   # Friday 23:00 + 49 hours = Monday 00:00
    grp[!indx] <- grp[!indx] + 3600       # all other trades: end of the same hour
    df$time <- grp

    df %>%
      group_by(time) %>%
      summarise(amount = sum(amount))
    # in the example dataset, 3 rows
    #                 time    amount
    #1 2014-08-15 23:00:00  11037.78
    #2 2014-08-18 00:00:00  13374.72
    #3 2014-08-21 00:00:00 133373.00
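Because format(time, '%a') returns a locale-dependent weekday name, a variant that reads the weekday and hour from POSIXlt fields may be more portable. The sketch below is an assumption-laden illustration, not the answer's code: hour_end_skip_weekend is a hypothetical helper, UTC is assumed, it flags every Friday 23:xx trade (without the seconds check), and it uses trunc() instead of cut().

```r
# Hypothetical helper: label each trade by the end of its hour, but roll
# Friday's last hour forward to Monday 00:00 (49 hours after Friday 23:00)
hour_end_skip_weekend <- function(time) {
  lt <- as.POSIXlt(time)
  start <- as.POSIXct(trunc(time, units = "hours"))
  # wday is 0 for Sunday through 6 for Saturday, so 5 is Friday
  friday_last_hour <- lt$wday == 5 & lt$hour == 23
  start + ifelse(friday_last_hour, 49 * 3600, 3600)
}

# The three trades from the question's second example (UTC is an assumption)
trades <- data.frame(
  time = as.POSIXct(c("2014-08-15 22:09:07", "2014-08-15 23:01:33",
                      "2014-08-20 23:25:40"), tz = "UTC"),
  amount = c(11037.778, 13374.724, 133373)
)

trades$hour_end <- hour_end_skip_weekend(trades$time)
hourly <- aggregate(amount ~ hour_end, data = trades, FUN = sum)
print(hourly)
```

With these inputs the three trades land in three separate buckets, so the Friday 23:01:33 trade is carried to Monday 00:00 instead of being dropped.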

data

    df <- structure(list(time = structure(c(1400263205, 1400263210, 1400263695,
      1400264589, 1400266794, 1400267385, 1400268281, 1400272251), class = c("POSIXct",
      "POSIXt"), tzone = ""), amount = c(10L, 20L, 30L, 51L, 84L, 94L,
      53L, 44L)), .Names = c("time", "amount"), row.names = c(NA, -8L
      ), class = "data.frame")

    df2 <- structure(list(time = structure(c(1400212800, 1400216400, 1400220000,
      1400263200, 1400266800, 1400270400, 1400295600), class = c("POSIXct",
      "POSIXt"), tzone = ""), profit = c(100L, 200L, 250L, 30L, -50L,
      67L, -8L)), .Names = c("time", "profit"), row.names = c(NA, -7L
      ), class = "data.frame")

newdata

    df <- structure(list(time = structure(c(1408158000, 1408334400, 1408593600
      ), tzone = "", class = c("POSIXct", "POSIXt")), amount = c(11037.778,
      13374.724, 133373)), .Names = c("time", "amount"), row.names = c(NA,
      -3L), class = "data.frame")

r date aggregate
