piper.verbs.summarise

piper.verbs.summarise(df: pandas.core.frame.DataFrame, *args, **kwargs) → pandas.core.frame.DataFrame

Summarise or aggregate data.

This is a wrapper function, provided as an alternative to calling e.g. df.agg() directly. For details of args and kwargs, see help(pd.DataFrame.agg).

Parameters
  • df – dataframe

  • *args – arguments for wrapped function

  • **kwargs – keyword-parameters for wrapped function

Returns
  A pandas DataFrame

Return type
  pandas.core.frame.DataFrame

Notes

Behaviour differs from pd.DataFrame.agg() in the following cases.

If only the dataframe is passed, the function returns a ‘count’ (column n) and a ‘sum’ for every column; the sum is nan for non-numeric columns. See also info().

%piper sample_sales() >> summarise()

  names             n          sum
  location        200          nan
  product         200          nan
  month           200          nan
  target_sales    200   5423715.00
  target_profit   200    487476.42
  actual_sales    200   5376206.33
  actual_profit   200    482133.92
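For comparison, a similar count-and-sum table can be sketched in plain pandas. This is only an illustration of the output shape, not piper's implementation; the small dataframe is a hypothetical stand-in for sample_sales().

```python
import pandas as pd

# Hypothetical stand-in for sample_sales(); not piper's data.
df = pd.DataFrame({
    "location": ["London", "Paris", "London"],
    "actual_sales": [100.0, 250.0, 50.0],
})

# Count non-null values in every column; sum only the numeric columns.
# Columns without a sum (e.g. 'location') are filled with NaN on alignment.
summary = pd.DataFrame({
    "n": df.count(),
    "sum": df.sum(numeric_only=True),
})
print(summary)
```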

If you pass a groupby object to summarise() with no other parameters, summarise will sum all numeric columns.

This is equivalent to: %piper df >> group_by(['col1', 'col2']) >> summarise(sum)

%%piper
sample_sales()
>> group_by(['location', 'product'])
>> summarise()
>> head(tablefmt='plain')

                            target_sales    target_profit    actual_sales    actual_profit
('London', 'Beachwear')           407640            41372          388762            38940
('London', 'Footwear')            275605            21449          274674            21411
('London', 'Jeans')               414488            42200          404440            40691
('London', 'Sportswear')          293384            28149          291561            28434
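The grouped default resembles the following plain pandas call. This is a sketch under the assumption that only numeric columns are summed, as described above; the data is hypothetical.

```python
import pandas as pd

# Hypothetical data; 'month' is non-numeric and should be left out of the sums.
df = pd.DataFrame({
    "location": ["London", "London", "Paris"],
    "product": ["Jeans", "Jeans", "Jeans"],
    "month": ["Jan", "Feb", "Jan"],
    "actual_sales": [100, 50, 250],
})

# Sum the numeric columns within each (location, product) group.
totals = df.groupby(["location", "product"]).sum(numeric_only=True)
print(totals)
```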

Examples

Syntax 1

column_name = ('existing_column', function)

Note

Common aggregation functions such as ‘sum’, ‘mean’, ‘count’ and ‘nunique’ can be given as a quoted name, or you can pass the function itself (e.g. sum), as shown below.

%%piper
sample_sales() >>
group_by('product') >>
summarise(totval1=('target_sales', sum),
          totval2=('actual_sales', 'sum'))

Syntax 2

column_name = pd.NamedAgg('existing_column', function)

%%piper
sample_sales() >>
group_by('product') >>
summarise(totval1=pd.NamedAgg('target_sales', 'sum'),
          totval2=pd.NamedAgg('actual_sales', 'sum'))
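Syntax 1 and Syntax 2 mirror pandas named aggregation. The sketch below uses plain pandas (not piper) on hypothetical data to show that the tuple form and the pd.NamedAgg form give the same result.

```python
import pandas as pd

# Hypothetical data standing in for sample_sales().
df = pd.DataFrame({
    "product": ["Jeans", "Jeans", "Footwear"],
    "target_sales": [100, 200, 50],
    "actual_sales": [90, 210, 55],
})
grouped = df.groupby("product")

# Syntax 1: new_column=('existing_column', function)
by_tuple = grouped.agg(
    totval1=("target_sales", "sum"),
    totval2=("actual_sales", "sum"),
)

# Syntax 2: new_column=pd.NamedAgg(column, aggfunc)
by_namedagg = grouped.agg(
    totval1=pd.NamedAgg(column="target_sales", aggfunc="sum"),
    totval2=pd.NamedAgg(column="actual_sales", aggfunc="sum"),
)
print(by_tuple)
```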

Syntax 3

{'existing_column': function}
{'existing_column': [function1, function2]}

%%piper
sample_sales()
>> group_by('product')
>> summarise({'target_sales':['sum', 'mean']})
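In plain pandas, passing a list of functions for one column produces two-level column labels. The sketch below (hypothetical data, pandas only) shows this and one way to flatten the labels; whether piper flattens them for you is not stated here.

```python
import pandas as pd

# Hypothetical data standing in for sample_sales().
df = pd.DataFrame({
    "product": ["Jeans", "Jeans", "Footwear"],
    "target_sales": [100, 200, 50],
})

# Dictionary form: {'existing_column': [function1, function2]}
stats = df.groupby("product").agg({"target_sales": ["sum", "mean"]})

# The columns are (column, function) pairs; join them into single names.
flat = stats.copy()
flat.columns = ["_".join(col) for col in flat.columns]
print(flat)
```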

Syntax 4

{'existing_column': lambda x: x+1}

The example below identifies the unique products sold at each location.

%%piper
sample_sales() >>
group_by('location') >>
summarise({'product': lambda x: set(x.tolist())})

# Alternative coding, returning a 'list' ;)
# summarise(products=('product', lambda x: list(set(x)))) >>
# explode('product')
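The same idea can be sketched in plain pandas on hypothetical data. As a variation on the example above, the lambda returns a sorted list rather than a set so that the order is predictable, and explode() then gives one row per product.

```python
import pandas as pd

# Hypothetical data standing in for sample_sales().
df = pd.DataFrame({
    "location": ["London", "London", "London", "Paris"],
    "product": ["Jeans", "Beachwear", "Jeans", "Footwear"],
})

# Collect the distinct products per location as a sorted list.
products = df.groupby("location").agg(
    {"product": lambda x: sorted(set(x))}
)

# One row per (location, product) pair.
exploded = products.explode("product")
print(exploded)
```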