piper.verbs.transform
piper.verbs.transform(df: pandas.core.frame.DataFrame, index: Optional[Union[str, List[str]]] = None, **kwargs) → pandas.core.frame.DataFrame
Add a group calculation to a grouped DataFrame.
transform() is based on the pandas pd.DataFrame.transform() function. For details of args and kwargs, see help(pd.DataFrame.transform).
Given a dataframe and a grouping index, it creates new aggregated columns from the keyword/value arguments (kwargs) supplied.
By default, it calculates the group % of the first numeric column, in proportion to the first ‘index’ or grouping value.
Examples
Calculate group percentage value
%%piper
sample_data()
>> group_by(['countries', 'regions'])
>> summarise(TotSales1=('values_1', 'sum'))

|                      |   TotSales1 |
|:---------------------|------------:|
| ('France', 'East')   |        2170 |
| ('France', 'North')  |        2275 |
| ('France', 'South')  |        2118 |
| ('France', 'West')   |        4861 |
| ('Germany', 'East')  |        1764 |
| ('Germany', 'North') |        2239 |
| ('Germany', 'South') |        1753 |
| ('Germany', 'West')  |        1575 |
To add a group percentage based on the countries:
%%piper
sample_data()
>> group_by(['countries', 'regions'])
>> summarise(TotSales1=('values_1', 'sum'))
>> transform(index='countries', g_percent=('TotSales1', 'percent'))
>> head(8)

|                      |   TotSales1 |   g_percent |
|:---------------------|------------:|------------:|
| ('France', 'East')   |        2170 |       19    |
| ('France', 'North')  |        2275 |       19.91 |
| ('France', 'South')  |        2118 |       18.54 |
| ('France', 'West')   |        4861 |       42.55 |
| ('Germany', 'East')  |        1764 |       24.06 |
| ('Germany', 'North') |        2239 |       30.54 |
| ('Germany', 'South') |        1753 |       23.91 |
| ('Germany', 'West')  |        1575 |       21.48 |
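For readers who want to check the g_percent figures without piper, the calculation can be approximated in plain pandas. This is a minimal sketch, not piper's implementation: it assumes that 'percent' means each row's value divided by its group total, times 100, rounded to two decimals. The summarised totals from the example are typed in by hand rather than produced by sample_data().

```python
import pandas as pd

# Summarised totals copied from the example output above
# (entered by hand; sample_data() is not called here).
index = pd.MultiIndex.from_tuples(
    [
        ("France", "East"), ("France", "North"),
        ("France", "South"), ("France", "West"),
        ("Germany", "East"), ("Germany", "North"),
        ("Germany", "South"), ("Germany", "West"),
    ],
    names=["countries", "regions"],
)
totals = pd.DataFrame(
    {"TotSales1": [2170, 2275, 2118, 4861, 1764, 2239, 1753, 1575]},
    index=index,
)

# Broadcast each country's total back onto its rows, then take the share.
# Assumption: this mirrors what the built-in 'percent' function computes.
country_totals = totals.groupby(level="countries")["TotSales1"].transform("sum")
totals["g_percent"] = (totals["TotSales1"] / country_totals * 100).round(2)

print(totals)
```

The groupby(...).transform("sum") call returns a Series aligned to the original rows, which is why the division works row by row without a merge.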
- Parameters
df – dataframe to calculate grouped value
index – grouped column(s) (str or list) to be applied as grouping index.
kwargs –
Similar to ‘assign’: keyword arguments naming the new dataframe columns, each given as a tuple of column name and function, e.g. new_column=('existing_col', 'sum')
If no kwargs are supplied, calculates the group percentage (‘g%’) using the first index column as the index key and the values of the first column.
Note
- transform() has built-in functions:
percent: calculate group % associated with index group
rank: dense rank (ascending order)
rank_desc: dense rank (descending order)
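The two rank functions listed above can be illustrated with the pandas GroupBy.rank method. This is a sketch under the assumption that 'rank' and 'rank_desc' correspond to method="dense" with ascending=True and ascending=False respectively; it does not show how piper implements them. The data reuses the totals from the example on this page, and the output column names are chosen for illustration only.

```python
import pandas as pd

# Totals from the example on this page, as a flat frame.
sales = pd.DataFrame(
    {
        "countries": ["France"] * 4 + ["Germany"] * 4,
        "regions": ["East", "North", "South", "West"] * 2,
        "TotSales1": [2170, 2275, 2118, 4861, 1764, 2239, 1753, 1575],
    }
)

by_country = sales.groupby("countries")["TotSales1"]

# Dense rank within each country: ties share a rank and no ranks are skipped.
sales["rank"] = by_country.rank(method="dense").astype(int)
sales["rank_desc"] = by_country.rank(method="dense", ascending=False).astype(int)

print(sales)
```

With ascending order the smallest total in each country gets rank 1; with descending order the largest does, so France/West (4861) is rank 4 ascending and rank 1 descending.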
- Returns
original dataframe with additional grouped calculation column
- Return type
pandas.core.frame.DataFrame