python - Efficient Partitioning of Pandas DataFrame rows between sandwiched indicator variables -




Suppose I have a pandas DataFrame in which indicator rows sandwich each period. For example:

In [9]: pd.DataFrame({'col1': np.arange(1, 11), 'indicator': [0, 1, 0, 0, 0, 1, 0, 0, 1, 1]})
Out[9]:
   col1  indicator
0     1          0
1     2          1
2     3          0
3     4          0
4     5          0
5     6          1
6     7          0
7     8          0
8     9          1
9    10          1

What I want to do is use groupby to select the partitions separated by the indicators.

For example:

Group 1

   col1  indicator
0     1          0
1     2          1

Group 2

2     3          0
3     4          0
4     5          0
5     6          1

Group 3

6     7          0
7     8          0
8     9          1

Group 4

9    10          1

The naive solution is to take the indicator column out as a list, run a for-loop through it, and label each part. But suppose the dataset is big and I want to avoid the for-loop. Is there something more clever that can be done here to separate out the different groups?

Thanks!
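For reference, the naive loop described in the question might look like the sketch below. The function name label_groups_loop and the column name group are hypothetical, chosen only for illustration; the rule assumed here is that a row with indicator 1 closes the current group.

```python
import pandas as pd


def label_groups_loop(indicator):
    """Label each row with a group number; a 1 closes the current group."""
    labels = []
    current = 1
    for flag in indicator:
        labels.append(current)
        if flag == 1:
            # The indicator row is the last row of its group,
            # so the next row starts a new one.
            current += 1
    return labels


df = pd.DataFrame({'col1': range(1, 11),
                   'indicator': [0, 1, 0, 0, 0, 1, 0, 0, 1, 1]})

# 'group' is a hypothetical column name for the resulting labels.
df['group'] = label_groups_loop(df['indicator'].tolist())
```

This works, but it visits every row in Python, which is the cost the question is trying to avoid.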

Just assign a new column holding the cumsum of the indicator and apply groupby on it; that should do the trick:

# Reverse the order so that each indicator ends up at the end of its group,
# take the cumsum, then reverse back.
df['grouped'] = df['indicator'].loc[::-1].cumsum().loc[::-1]

for g in df.groupby('grouped', sort=False):
    print(g)

(4,    col1  indicator  grouped
0     1          0        4
1     2          1        4)
(3,    col1  indicator  grouped
2     3          0        3
3     4          0        3
4     5          0        3
5     6          1        3)
(2,    col1  indicator  grouped
6     7          0        2
7     8          0        2
8     9          1        2)
(1,    col1  indicator  grouped
9    10          1        1)
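If ascending group labels are preferred, one variant (not part of the answer above) shifts the indicator down by one row instead of reversing it. This is a minimal sketch, assuming pandas 0.24 or later for the fill_value argument of Series.shift; the column name group is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'col1': range(1, 11),
                   'indicator': [0, 1, 0, 0, 0, 1, 0, 0, 1, 1]})

# Shift the indicator down one row so that each 1 bumps the label of the
# *following* row, then take the cumsum. Adding 1 makes labels start at 1.
df['group'] = df['indicator'].shift(fill_value=0).cumsum() + 1

# Collect the col1 values of each partition, keyed by group label.
groups = {int(label): part['col1'].tolist()
          for label, part in df.groupby('group')}
```

Here groups maps labels 1 through 4 to the same four partitions listed in the question, in their original order, without an explicit Python loop over the rows.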

python pandas group-by
