piper.verbs.duplicated
piper.verbs.duplicated(df: pandas.core.frame.DataFrame, subset: Optional[Union[str, List[str]]] = None, keep: bool = False, sort: bool = True, column: str = 'duplicate', ref_column: Optional[str] = None, duplicates: bool = False, loc: str = 'first') → pandas.core.frame.DataFrame

Locate duplicate data.
Note
Returns a copy of the input dataframe object
Examples
    from piper.factory import simple_series
    from piper.verbs import duplicated

    df = simple_series().to_frame()
    df = duplicated(df, keep='first', sort=True)
    df

      ids  duplicate
    2   C      False
    0   D      False
    1   E      False
    3   E       True
- Parameters
df – pandas dataframe
subset – column label or sequence of labels, optional. Only consider these columns when identifying duplicates. Default None: consider all dataframe columns.
keep – {‘first’, ‘last’, False}, default False (per the signature above).
    first : Mark duplicates as True except for the first occurrence.
    last : Mark duplicates as True except for the last occurrence.
    False : Mark all duplicates as True.
sort – If True, sort the returned dataframe using the subset fields as the key. Default True.
column – Name of the inserted column that flags whether each row is a duplicate (True/False), default ‘duplicate’
duplicates – If True, return only rows with duplicate keys. Default False (per the signature above).
- Returns
- Return type
pandas dataframe
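
The keep, sort, column and duplicates parameters described above can be approximated with plain pandas. The sketch below is not piper's implementation: mark_duplicates is a hypothetical stand-in built on pandas' own DataFrame.duplicated, it assumes the documented semantics hold as written, and it leaves out ref_column and loc because this page does not describe them.

```python
import pandas as pd


def mark_duplicates(df, subset=None, keep=False, sort=True,
                    column="duplicate", duplicates=False):
    """Hypothetical stand-in for illustration, not piper's code."""
    out = df.copy()  # work on a copy so the input frame is left untouched
    # Flag rows using pandas' own duplicate detection.
    out[column] = df.duplicated(subset=subset, keep=keep)
    if sort:
        # Sort by the subset columns, or by every original column if none given.
        keys = [subset] if isinstance(subset, str) else list(subset or df.columns)
        out = out.sort_values(keys, kind="stable")
    if duplicates:
        # Keep only the rows flagged as duplicates.
        out = out[out[column]]
    return out


df = pd.DataFrame({"ids": ["D", "E", "C", "E"]})

# keep='first': only the second 'E' is flagged.
first = mark_duplicates(df, keep="first")

# keep=False with duplicates=True: both 'E' rows are flagged and returned.
every = mark_duplicates(df, keep=False, duplicates=True)

print(first)
print(every)
```

With keep='first' the first occurrence of each key stays unflagged, which matches the example output earlier on this page; with keep=False every member of a duplicated group is flagged, which is the more useful setting when the goal is to inspect all conflicting rows side by side.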