piper.verbs.duplicated

piper.verbs.duplicated(df: pandas.core.frame.DataFrame, subset: Optional[Union[str, List[str]]] = None, keep: bool = False, sort: bool = True, column: str = 'duplicate', ref_column: Optional[str] = None, duplicates: bool = False, loc: str = 'first') → pandas.core.frame.DataFrame

Locate duplicate rows in a dataframe and flag them in a boolean column.

Note

Returns a copy of the input dataframe; the input itself is not modified.

Examples

from piper.factory import simple_series
from piper.verbs import duplicated
df = simple_series().to_frame()

df = duplicated(df, keep='first', sort=True)
df

    ids    duplicate
 2  C      False
 0  D      False
 1  E      False
 3  E      True

Parameters
  • df – pandas dataframe

  • subset – column label or sequence of labels, optional. Only consider these columns when identifying duplicates. Default None: consider ALL dataframe columns

  • keep – {‘first’, ‘last’, False}, default False
      first : Mark duplicates as True except for the first occurrence.
      last : Mark duplicates as True except for the last occurrence.
      False : Mark all duplicates as True.

  • sort – If True, sort the returned dataframe using the subset fields as key, default True

  • column – Name of the inserted column identifying whether a row is a duplicate (True/False), default ‘duplicate’

  • duplicates – If True, return only duplicate key rows, default False
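
The three keep options are easiest to compare side by side. The sketch below uses pandas' own DataFrame.duplicated rather than piper, on the assumption that piper's keep follows the same semantics as the parameter description suggests. The frame is hand-built to match the values in the example above.

```python
import pandas as pd

# Same values as the example output above: a single repeated 'E'.
df = pd.DataFrame({"ids": ["D", "E", "C", "E"]})

# keep='first': flag every repeat after the first occurrence.
first = df.duplicated(subset="ids", keep="first").tolist()

# keep='last': flag every repeat before the last occurrence.
last = df.duplicated(subset="ids", keep="last").tolist()

# keep=False: flag every member of a duplicated group.
all_dupes = df.duplicated(subset="ids", keep=False).tolist()

print(first)      # [False, False, False, True]
print(last)       # [False, True, False, False]
print(all_dupes)  # [False, True, False, True]
```

With keep=False both 'E' rows are flagged, which is usually what is wanted when inspecting duplicate groups rather than removing them.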

Returns

A copy of the input dataframe with the duplicate indicator column added

Return type

pandas dataframe
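
The documented behaviour can be approximated in plain pandas. The function below is a hypothetical stand-in named flag_duplicates, not piper's implementation. It covers only df, subset, keep, sort, column and duplicates, and it ignores ref_column and loc because their behaviour is not described here.

```python
import pandas as pd


def flag_duplicates(df, subset=None, keep=False, sort=True,
                    column="duplicate", duplicates=False):
    """Hypothetical sketch of the documented behaviour (not piper's code)."""
    result = df.copy()  # work on a copy so the caller's frame is untouched

    # Accept a single label or a list of labels, as the subset parameter does.
    keys = [subset] if isinstance(subset, str) else subset

    # Add the boolean indicator column.
    result[column] = result.duplicated(subset=keys, keep=keep)

    if duplicates:
        # Keep only the rows flagged as duplicates.
        result = result[result[column]]

    if sort:
        # Sort on the subset keys, or on all original columns if none given.
        sort_keys = keys if keys is not None else [
            c for c in result.columns if c != column
        ]
        # mergesort is stable, so tied rows keep their original order.
        result = result.sort_values(sort_keys, kind="mergesort")

    return result


df = pd.DataFrame({"ids": ["D", "E", "C", "E"]})
flagged = flag_duplicates(df, keep="first", sort=True)
only_dupes = flag_duplicates(df, keep=False, duplicates=True)
print(flagged)
print(only_dupes)
```

On this input, flagged has the same row order and flags as the example output above, and only_dupes contains just the two 'E' rows.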