Efficient Partitioning of Pandas DataFrame Rows Between Sandwiched Indicator Variables

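Suppose I have a pandas DataFrame in which an indicator of 1 marks the last row of each period, so the indicator rows sandwich the periods between them. For example: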

    In [9]: pd.DataFrame({'col1': np.arange(1, 11), 'indicator': [0, 1, 0, 0, 0, 1, 0, 0, 1, 1]})
    Out[9]:
       col1  indicator
    0     1          0
    1     2          1
    2     3          0
    3     4          0
    4     5          0
    5     6          1
    6     7          0
    7     8          0
    8     9          1
    9    10          1

What I want to do is use groupby to select the partitions separated by the indicators.

For example:

Group 1

       col1  indicator
    0     1          0
    1     2          1

Group 2

       col1  indicator
    2     3          0
    3     4          0
    4     5          0
    5     6          1

Group 3

       col1  indicator
    6     7          0
    7     8          0
    8     9          1

Group 4

       col1  indicator
    9    10          1

The naive solution is to take the indicator column out as a list, run a for-loop through it, and label each part. My dataset is big, though, so I want to avoid a for-loop. Is there something more clever that can be done here to separate out the different groups?

Thanks!
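For reference, the naive loop the question describes might look roughly like this sketch. The helper name `label_groups_loop` and the `group` column are hypothetical; the idea is simply to walk the indicator list once, giving each row the current label and bumping the label after every 1, so each indicator row stays in the group it closes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': np.arange(1, 11),
                   'indicator': [0, 1, 0, 0, 0, 1, 0, 0, 1, 1]})

def label_groups_loop(indicator):
    """Hypothetical helper: label rows, starting a new group after each 1."""
    labels = []
    current = 0
    for flag in indicator:
        labels.append(current)   # the row belongs to the current group
        if flag == 1:            # an indicator closes the current group
            current += 1
    return labels

df['group'] = label_groups_loop(df['indicator'].tolist())

for key, part in df.groupby('group'):
    print(key, part['col1'].tolist())
```

This is the pure-Python loop the question wants to avoid; it is correct but runs one interpreter step per row, which is what the vectorized cumsum approach sidesteps.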

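Just assign a column holding the cumulative sum of the indicator, then apply groupby; that should do the trick. Because each indicator closes its group, the cumulative sum is taken in reverse order: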

    # Cumulative sum in reverse order so that each indicator lands at the end
    # of its group, then reverse back to the original order
    df['grouped'] = df['indicator'].loc[::-1].cumsum().loc[::-1]

    for g in df.groupby('grouped', sort=False):
        print(g)

Output:

    (4,    col1  indicator  grouped
    0     1          0        4
    1     2          1        4)
    (3,    col1  indicator  grouped
    2     3          0        3
    3     4          0        3
    4     5          0        3
    5     6          1        3)
    (2,    col1  indicator  grouped
    6     7          0        2
    7     8          0        2
    8     9          1        2)
    (1,    col1  indicator  grouped
    9    10          1        1)
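A forward-direction variant of the same cumsum idea is sketched below; it is an alternative, not the answer's code. Shifting the indicator down one row (using `Series.shift` with `fill_value`, available in pandas 0.24 and later) means the running count increments on the row after each 1, so the indicator row stays in the group it closes and the labels come out ascending from 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': np.arange(1, 11),
                   'indicator': [0, 1, 0, 0, 0, 1, 0, 0, 1, 1]})

# Shift the indicator down one row; the cumulative sum then starts a new
# group on the row immediately after each indicator of 1.
df['grouped'] = df['indicator'].shift(fill_value=0).cumsum()

# Collect the partitions in order of appearance
groups = [part for _, part in df.groupby('grouped', sort=False)]
for part in groups:
    print(part['col1'].tolist())
```

One difference worth noting: with the reverse cumsum, any trailing rows after the last indicator get label 0, whereas this forward version places them in a final group of their own.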

Jaimee