piper.verbs.summarise
piper.verbs.summarise(df: pandas.core.frame.DataFrame, *args, **kwargs) → pandas.core.frame.DataFrame
Summarise or aggregate data.
This is a wrapper around pandas aggregation, used in place of calling e.g. df.agg() directly. For details of *args and **kwargs, see help(pd.DataFrame.agg).
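Parameters
df – dataframe, or a groupby object (see Notes)
*args – positional arguments passed to the wrapped aggregation function
**kwargs – keyword arguments passed to the wrapped aggregation function
Returns
A pandas DataFrame
Return type
pandas.core.frame.DataFrame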
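As a rough illustration of how *args and **kwargs reach the wrapped function, the sketch below simply forwards them to .agg(), which exists on both pandas DataFrames and groupby objects. summarise_sketch is a hypothetical name, not piper's source. Converting a Series result into a DataFrame is an assumption made here so that the return type matches the documented one.

```python
import pandas as pd


def summarise_sketch(df, *args, **kwargs):
    """Hypothetical stand-in for summarise(); not piper's actual implementation.

    Forwards *args/**kwargs to .agg(), which is available on pandas
    DataFrames and on DataFrameGroupBy objects alike.
    """
    result = df.agg(*args, **kwargs)
    # Assumption: some aggregations (e.g. df.agg('sum')) return a Series,
    # so wrap it in a DataFrame to match the documented return type.
    if isinstance(result, pd.Series):
        result = result.to_frame()
    return result


sales = pd.DataFrame({"region": ["a", "a", "b"], "value": [1, 2, 3]})

# Ungrouped: the string 'sum' is passed straight through to DataFrame.agg
totals = summarise_sketch(sales[["value"]], "sum")

# Grouped: named-aggregation keywords are passed through to DataFrameGroupBy.agg
by_region = summarise_sketch(sales.groupby("region"), total=("value", "sum"))
print(by_region)
```

The design choice being illustrated is that one forwarding function can serve both the ungrouped and grouped cases, because pandas exposes .agg() on both types.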
Notes
Behaviour differs from pd.DataFrame.agg() as follows:
If only the dataframe is passed, summarise() returns the count (n) and the sum of every column; non-numeric columns show nan for the sum. See also info().
    %piper sample_sales() >> summarise()
    names            n         sum
    location       200         nan
    product        200         nan
    month          200         nan
    target_sales   200  5423715.00
    target_profit  200   487476.42
    actual_sales   200  5376206.33
    actual_profit  200   482133.92
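For readers coming from pandas, a table of the same shape can be built directly: count() gives n, and sum(numeric_only=True) skips non-numeric columns, so aligning the two Series leaves nan in their sum. This is a sketch on made-up data, not a description of how piper computes it.

```python
import pandas as pd

df = pd.DataFrame({
    "location": ["London", "Paris", "London"],
    "actual_sales": [10.0, 20.0, 5.0],
})

summary = pd.DataFrame({
    "n": df.count(),                   # non-null values per column
    "sum": df.sum(numeric_only=True),  # numeric columns only
}).reindex(df.columns)                 # keep the original column order; missing sums become NaN
summary.index.name = "names"
print(summary)
```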
If you pass a groupby object to summarise() with no other parameters, summarise() sums all numeric columns.
This is equivalent to: %piper df >> group_by(['col1', 'col2']) >> summarise(sum)
    %%piper
    sample_sales() >> group_by(['location', 'product']) >> summarise() >> head(tablefmt='plain')
                              target_sales  target_profit  actual_sales  actual_profit
    ('London', 'Beachwear')         407640          41372        388762          38940
    ('London', 'Footwear')          275605          21449        274674          21411
    ('London', 'Jeans')             414488          42200        404440          40691
    ('London', 'Sportswear')        293384          28149        291561          28434
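In plain pandas, a result of the same shape comes from summing the numeric columns of a grouped frame. The data below is made up for illustration; this sketches the equivalent operation rather than piper's code.

```python
import pandas as pd

df = pd.DataFrame({
    "location": ["London", "London", "London", "Paris"],
    "product": ["Jeans", "Jeans", "Footwear", "Jeans"],
    "target_sales": [100, 50, 80, 70],
    "actual_sales": [90, 60, 75, 65],
})

# Sum every numeric column within each (location, product) group;
# the group keys become a MultiIndex on the rows
grouped_totals = df.groupby(["location", "product"]).sum(numeric_only=True)
print(grouped_totals)
```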
Examples
Syntax 1
column_name = ('existing_column', function)
Note
Common functions such as 'sum', 'mean', 'count' and 'nunique' can be given either as quoted strings or by function name (e.g. sum in the example below).
    %%piper
    sample_sales() >> group_by('product') >> summarise(totval1=('target_sales', sum), totval2=('actual_sales', 'sum'))
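The tuple form matches pandas' named aggregation. Assuming summarise() hands these keywords on to the groupby's .agg() (as the reference to pd.DataFrame.agg suggests), the example corresponds roughly to the plain-pandas sketch below, on made-up data. String aliases such as 'sum' are used here; recent pandas versions may emit a FutureWarning when a builtin like sum is passed to a groupby aggregation instead.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["Jeans", "Jeans", "Footwear"],
    "target_sales": [100, 50, 80],
    "actual_sales": [90, 60, 75],
})

# new_column_name=(existing_column, aggregation)
result = df.groupby("product").agg(
    totval1=("target_sales", "sum"),
    totval2=("actual_sales", "sum"),
)
print(result)
```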
Syntax 2
column_name = pd.NamedAgg('existing_column', function)
    %%piper
    sample_sales() >> group_by('product') >> summarise(totval1=pd.NamedAgg('target_sales', 'sum'), totval2=pd.NamedAgg('actual_sales', 'sum'))
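pd.NamedAgg is a pandas namedtuple with the fields column and aggfunc, which is why it can stand in for the plain tuple of Syntax 1. A plain-pandas sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["Jeans", "Jeans", "Footwear"],
    "target_sales": [100, 50, 80],
    "actual_sales": [90, 60, 75],
})

# NamedAgg is a namedtuple, so it compares equal to the equivalent plain tuple
spec = pd.NamedAgg(column="target_sales", aggfunc="sum")
print(spec == ("target_sales", "sum"))  # True

result = df.groupby("product").agg(
    totval1=spec,
    totval2=pd.NamedAgg(column="actual_sales", aggfunc="sum"),
)
print(result)
```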
Syntax 3
{'existing_column': function}
{'existing_column': [function1, function2]}
    %%piper
    sample_sales() >> group_by('product') >> summarise({'target_sales': ['sum', 'mean']})
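In pandas, a list of functions per column yields two-level column labels (column, function). The sketch below, on made-up data, shows that shape and one common way to flatten it; the flattening step is an illustration, not something summarise() is documented to do.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["Jeans", "Jeans", "Footwear"],
    "target_sales": [100, 50, 80],
})

result = df.groupby("product").agg({"target_sales": ["sum", "mean"]})
print(result.columns.tolist())  # [('target_sales', 'sum'), ('target_sales', 'mean')]

# Flatten the two-level labels into names such as 'target_sales_sum'
result.columns = ["_".join(col) for col in result.columns]
print(result)
```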
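Syntax 4
{'existing_column': lambda x: ...}, where the lambda receives the column's values (per group, when grouped) as a Series and should reduce them to a single value.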
The example below lists the unique products sold at each location.
    %%piper
    sample_sales() >> group_by('location') >> summarise({'product': lambda x: set(x.tolist())})
    # Alternative: collect the unique products as a list, then explode to one row per product
    # sample_sales() >> group_by('location') >> summarise(products=('product', lambda x: list(set(x)))) >> explode('products')
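Both variants have direct pandas counterparts, sketched below on made-up data: a lambda in a dict spec, and a named aggregation returning a list followed by DataFrame.explode. This illustrates the pandas mechanics and is not taken from piper's implementation.

```python
import pandas as pd

df = pd.DataFrame({
    "location": ["London", "London", "Paris", "London"],
    "product": ["Jeans", "Footwear", "Jeans", "Jeans"],
})

# The lambda receives each group's 'product' values as a Series
# and returns one value per group (here, a set of unique products)
unique_sets = df.groupby("location").agg({"product": lambda x: set(x.tolist())})

# List variant, exploded to one row per (location, product) pair
exploded = (
    df.groupby("location")
    .agg(products=("product", lambda x: list(set(x))))
    .explode("products")
    .reset_index()
)
print(exploded)
```

Note that explode() must be given the name of the new column ('products' here), not the original 'product'.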