python - Efficient Partitioning of Pandas DataFrame rows between sandwiched indicator variables -
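Suppose I have a pandas DataFrame with an indicator column whose 1s mark the end of each period, so the rows of a period are sandwiched between indicators. Ex.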
    In [9]: pd.DataFrame({'col1': np.arange(1, 11), 'indicator': [0, 1, 0, 0, 0, 1, 0, 0, 1, 1]})
    Out[9]:
       col1  indicator
    0     1          0
    1     2          1
    2     3          0
    3     4          0
    4     5          0
    5     6          1
    6     7          0
    7     8          0
    8     9          1
    9    10          1
What I want to do is use groupby to select the partitions separated by the indicators.
Ex.

Group 1

       col1  indicator
    0     1          0
    1     2          1

Group 2

    2     3          0
    3     4          0
    4     5          0
    5     6          1

Group 3

    6     7          0
    7     8          0
    8     9          1

Group 4

    9    10          1
The naive solution is to take the indicator column out as a list, run a for-loop through it, and label each part. But suppose the dataset is big and I want to avoid a for-loop. Is there something more clever that can be done here to separate out the different groups?

Thanks!
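For reference, the naive labeling described above might look like the following sketch. The column name `group` and the variable names are assumptions made for illustration; they are not from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': np.arange(1, 11),
                   'indicator': [0, 1, 0, 0, 0, 1, 0, 0, 1, 1]})

# Naive baseline: walk the indicator column row by row and start a new
# label after every row whose indicator is 1.
labels = []
current = 1
for flag in df['indicator'].tolist():
    labels.append(current)
    if flag == 1:
        current += 1

df['group'] = labels  # 'group' is a hypothetical column name
```

This runs a Python-level loop over every row, which is the cost the question is trying to avoid on a large dataset.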
Just assign a new column holding the cumsum of the indicator, then apply groupby on it. That should do the trick:
    # Reverse the order so the indicator sits at the end of each group,
    # take the cumsum, then reverse back
    df['grouped'] = df['indicator'].loc[::-1].cumsum().loc[::-1]

    for g in df.groupby('grouped', sort=False):
        print(g)

    (4,    col1  indicator  grouped
    0     1          0        4
    1     2          1        4)
    (3,    col1  indicator  grouped
    2     3          0        3
    3     4          0        3
    4     5          0        3
    5     6          1        3)
    (2,    col1  indicator  grouped
    6     7          0        2
    7     8          0        2
    8     9          1        2)
    (1,    col1  indicator  grouped
    9    10          1        1)
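If you would rather have labels that count up from 1 in row order, one possible variant (not the answer's method) is to shift the indicator down by one row before taking the cumsum, so that each 1 closes its own group and opens the next. This is a minimal, self-contained sketch; the column name `group` and the `parts` dictionary are assumptions for illustration, and `shift(fill_value=...)` needs a reasonably recent pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': np.arange(1, 11),
                   'indicator': [0, 1, 0, 0, 0, 1, 0, 0, 1, 1]})

# Shift the indicator down one row so a 1 starts the *next* group,
# then take a running total. Adding 1 makes the labels start at 1.
df['group'] = df['indicator'].shift(fill_value=0).cumsum() + 1

# Collect the col1 values of each partition, keyed by group label.
parts = {int(key): part['col1'].tolist()
         for key, part in df.groupby('group')}
```

On the example data this puts rows 0-1, 2-5, 6-8, and 9 into groups 1 through 4, the same partitions as the reversed cumsum, only numbered in the opposite direction.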
python pandas group-by