Feature/drop duplicates#31

Merged
kishyassin merged 2 commits into kishyassin:main from cs168898:feature/drop-duplicates
Jan 7, 2026


@cs168898 cs168898 commented Jan 7, 2026

Drop Duplicates method

  • Added a DropDuplicates method that lets users drop duplicate rows from the dataframe.
  • Added a DropDuplicatesOption struct for fine-tuning the method, with these fields:
    Subset -> Column names to check for duplicates.
    Keep -> Which row to keep when duplicates occur.
    Inplace -> Whether to modify the existing dataframe or create a new dataframe with the modified rows.
  • Implemented getRowKey, which builds a unique key for a row from its column names and values in string format. The key is stored in a map to recognise duplicates.
  • Implemented getSubslice, which takes the row indexes to keep (decided by the logic that runs before it is called) and returns those rows from each column.
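
The row-key and subslice steps described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code merged in this PR: the `frame` type and the `rowKey`, `rowsToKeep` and `subslice` functions are hypothetical stand-ins, and the exact key format is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// frame is a hypothetical stand-in for the library's dataframe:
// column name -> column values, with all columns the same length.
type frame map[string][]any

// rowKey builds a string key for row i from the given column names and the
// string form of each value. Values that print the same share a key.
func rowKey(f frame, cols []string, i int) string {
	var b strings.Builder
	for _, c := range cols {
		// %q quotes each part so separators inside values cannot collide.
		fmt.Fprintf(&b, "%q=%q;", c, fmt.Sprint(f[c][i]))
	}
	return b.String()
}

// rowsToKeep returns the indexes of the rows that survive de-duplication.
// keep is "first", "last" or "none", mirroring the Keep option.
func rowsToKeep(f frame, cols []string, nRows int, keep string) []int {
	first := map[string]int{} // key -> index of first occurrence
	last := map[string]int{}  // key -> index of last occurrence
	count := map[string]int{} // key -> number of occurrences
	keys := make([]string, nRows)
	for i := 0; i < nRows; i++ {
		k := rowKey(f, cols, i)
		keys[i] = k
		if _, seen := first[k]; !seen {
			first[k] = i
		}
		last[k] = i
		count[k]++
	}
	var out []int
	for i, k := range keys {
		switch keep {
		case "first":
			if first[k] == i {
				out = append(out, i)
			}
		case "last":
			if last[k] == i {
				out = append(out, i)
			}
		case "none":
			if count[k] == 1 {
				out = append(out, i)
			}
		}
	}
	return out
}

// subslice picks the values at the given row indexes from one column.
func subslice(col []any, idx []int) []any {
	out := make([]any, 0, len(idx))
	for _, i := range idx {
		out = append(out, col[i])
	}
	return out
}

func main() {
	f := frame{
		"name": {"ann", "bob", "ann", "cat"},
		"age":  {30, 25, 30, 41},
	}
	cols := []string{"name", "age"}
	for _, keep := range []string{"first", "last", "none"} {
		idx := rowsToKeep(f, cols, 4, keep)
		fmt.Println(keep, idx, subslice(f["name"], idx))
	}
}
```

In this sketch rows 0 and 2 are duplicates, so "first" keeps rows 0, 1, 3, "last" keeps rows 1, 2, 3, and "none" keeps rows 1, 3.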

Syntax:

// To define the options:
options := goframe.DropDuplicatesOption{
    Subset:  []string{}, // Enter the column names to check for duplicates
    Keep:    "first",    // Enter "first", "last" or "none"
    Inplace: false,      // true modifies the existing dataframe; false creates a new dataframe with the modified rows
}
// To call the method (signature: DropDuplicates(options ...DropDuplicatesOption)):
df.DropDuplicates(options)

Important things to note

  • If no options are passed when calling the method, the defaults are:
    Subset -> All columns
    Keep -> Keep only the first occurrence
    Inplace -> False, creates a new dataframe without the duplicate rows.
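
One way the variadic argument and these defaults could fit together is sketched below. This is an assumption-laden illustration, not the merged implementation: `resolveOptions` is a hypothetical helper, the field types are guessed from the description, and falling back to the default for an empty Subset or Keep in a partially filled struct is my assumption.

```go
package main

import "fmt"

// DropDuplicatesOption mirrors the three fields named in the PR description.
// The field types are assumptions.
type DropDuplicatesOption struct {
	Subset  []string
	Keep    string
	Inplace bool
}

// resolveOptions is a hypothetical helper that applies the documented
// defaults (all columns, keep "first", not in place) when no option is
// passed, and validates Keep otherwise. Only the first option is used.
func resolveOptions(allCols []string, opts ...DropDuplicatesOption) (DropDuplicatesOption, error) {
	resolved := DropDuplicatesOption{Subset: allCols, Keep: "first", Inplace: false}
	if len(opts) == 0 {
		return resolved, nil
	}
	o := opts[0]
	if len(o.Subset) > 0 {
		resolved.Subset = o.Subset
	}
	if o.Keep != "" {
		resolved.Keep = o.Keep
	}
	resolved.Inplace = o.Inplace
	switch resolved.Keep {
	case "first", "last", "none":
		return resolved, nil
	default:
		return DropDuplicatesOption{}, fmt.Errorf("invalid Keep value %q", resolved.Keep)
	}
}

func main() {
	cols := []string{"name", "age"}

	// No options: the documented defaults apply.
	defaults, _ := resolveOptions(cols)
	fmt.Printf("%+v\n", defaults)

	// Explicit options override the defaults.
	custom, _ := resolveOptions(cols, DropDuplicatesOption{Subset: []string{"name"}, Keep: "last", Inplace: true})
	fmt.Printf("%+v\n", custom)

	// An unknown Keep value is rejected.
	_, err := resolveOptions(cols, DropDuplicatesOption{Keep: "middle"})
	fmt.Println(err)
}
```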

Things to improve

  1. An Ignore Index option should be implemented so users can choose whether to re-index the dataframe after the method call.
    please add sort,describe,toDatabase,readFromDatabase,dropDuplicates #26

Added drop duplicates feature so users can drop duplicate rows in the dataframe. Options can be passed to the method to select custom column names to check, keep the first, last or none of the duplicated rows, and choose whether to modify the original dataframe or create another dataframe with the modified rows.
Added drop duplicates tests for each option, as well as for Subset and Keep combined in the same option struct.
@cs168898 cs168898 marked this pull request as ready for review January 7, 2026 06:35
@cs168898 cs168898 added this to the dropDuplicates milestone Jan 7, 2026
@cs168898 cs168898 self-assigned this Jan 7, 2026
@cs168898 cs168898 linked an issue Jan 7, 2026 that may be closed by this pull request
@kishyassin kishyassin merged commit fe9bc79 into kishyassin:main Jan 7, 2026
1 check passed