Remove the top N percent of active users, ranked by the number of data points per user. Although most users are real people, some accounts are run by algorithms ('bots') and others can be considered spam accounts. Removing the top N percent of the most active users is a commonly used way to reduce the number of such accounts in the final dataset.

remove_top_users(df, user = "u_id", counts = "n_points", topNpct_user = 1)

Arguments

df

A dataframe with one column for the user ID and one for the number of data points per user

user

Name of the column that holds the unique identifier for each user

counts

Name of the column that holds the number of data points for each user

topNpct_user

A decimal number that represents the percentage of the most active users to remove
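
Examples

The filtering idea described above can be sketched in base R as follows. This is only an illustration, not the package's implementation: the helper name remove_top_users_sketch, the toy data, and the choice to round the number of removed users down are all assumptions, and the sketch expects one row per user.

```r
# Hypothetical stand-in for illustration only; not the package's own code.
remove_top_users_sketch <- function(df, user = "u_id", counts = "n_points",
                                    topNpct_user = 1) {
  # Number of users to drop: N percent of all rows, rounded down (assumption).
  n_drop <- floor(nrow(df) * topNpct_user / 100)
  if (n_drop == 0) return(df)

  # Rank users from most to least active and collect the IDs to drop.
  ord <- order(df[[counts]], decreasing = TRUE)
  drop_ids <- df[[user]][ord[seq_len(n_drop)]]

  # Keep every row whose user ID is not among the dropped ones.
  df[!(df[[user]] %in% drop_ids), , drop = FALSE]
}

# Toy data (made up): 200 users with 1 to 200 data points each.
toy <- data.frame(u_id = sprintf("user_%03d", 1:200),
                  n_points = 1:200,
                  stringsAsFactors = FALSE)

kept <- remove_top_users_sketch(toy, user = "u_id", counts = "n_points",
                                topNpct_user = 1)
nrow(kept)          # 198: the two most active users were removed
max(kept$n_points)  # 198
```

With 200 users and topNpct_user = 1, the sketch drops the two users with the highest counts. How ties and fractional user counts are handled is a design choice of this sketch and may differ from the behaviour of remove_top_users itself.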