ValueError: You are trying to merge on object and float64 columns. #22

@rpetit3

I'm trying to run PlasmidID via the Bioconda release and am running into an issue with Pandas. Might be user error though!

CREATING SUMMARY REPORT (Thu Jun 30 01:24:20 UTC 2022)
 An html report with miniatures of the images will be generate with useful statistics to determine the correct plasmids in the sample.
Namespace(group=False, input_folder='/home/robert_petit/temp/test/plasmid/NO_GROUP/SRX4563634')
Creating summary
You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat
Traceback (most recent call last):
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/bin/summary_report_pid.py", line 465, in <module>
    main()
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/bin/summary_report_pid.py", line 457, in main
    summary_df = complete_report_df(complete_file, len_description_df, percentage_df)
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/bin/summary_report_pid.py", line 116, in complete_report_df
    df = len_description_df.merge(covered_df, on='id', how='left')
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/lib/python3.7/site-packages/pandas/core/frame.py", line 9203, in merge
    validate=validate,
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/lib/python3.7/site-packages/pandas/core/reshape/merge.py", line 119, in merge
    validate=validate,
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/lib/python3.7/site-packages/pandas/core/reshape/merge.py", line 703, in __init__
    self._maybe_coerce_merge_keys()
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/lib/python3.7/site-packages/pandas/core/reshape/merge.py", line 1256, in _maybe_coerce_merge_keys
    raise ValueError(msg)
ValueError: You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat

---------------------------------------

ERROR in Script plasmidID on or near line 1089; exiting with status 1
MESSAGE:

See /home/robert_petit/temp/test/plasmid/logs/plasmidID.log for more information.
command:
summary_report_pid.py -i /home/robert_petit/temp/test/plasmid/NO_GROUP/SRX4563634 -g

---------------------------------------

Command Used

plasmidID -d plasmidFinder_01_26_2018.fsa -s SRX4563634 -c SRX4563634.fna -T 4

Here are the files used (added .txt so GitHub would allow upload)
plasmidFinder_01_26_2018.fsa.txt
SRX4563634.fna.txt

Update 1

Doing some digging, covered_df might be the issue. It looks like this:

print(covered_df)
            id  len_covered
0  500039.4128         2363

print(covered_df.dtypes)
id             float64
len_covered      int64
dtype: object

Going to play around with this some more
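
A minimal sketch of what may be going on, assuming the coverage table is loaded with pandas' default type inference (how summary_report_pid.py actually reads it is not shown here). An id such as 500039.4128 looks numeric, so it is parsed as float64, while the other frame keeps it as text. The column values other than the id and len_covered are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the coverage table; only the id and
# len_covered values come from the output above.
covered_text = "id\tlen_covered\n500039.4128\t2363\n"

# Default inference: the id column becomes float64.
inferred = pd.read_csv(io.StringIO(covered_text), sep="\t")

# Forcing the id column to str keeps the identifier as text.
as_text = pd.read_csv(io.StringIO(covered_text), sep="\t", dtype={"id": str})

# Hypothetical left-hand frame whose ids are strings.
len_description_df = pd.DataFrame({"id": ["500039.4128"], "length": [2500]})

# Merging text ids against float64 ids is rejected by pandas.
try:
    len_description_df.merge(inferred, on="id", how="left")
    merge_error = None
except ValueError as exc:
    merge_error = str(exc)

# With both id columns as text, the merge goes through.
merged = len_description_df.merge(as_text, on="id", how="left")
print(inferred.dtypes)
print(merge_error)
print(merged)
```

Casting after the fact with astype(str) can also work, but a float id may not round-trip to the same text that appears in the other frame, so reading the column as str up front seems safer.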

Update 2

Converted the ID to a string, and now I get this error:

Columns must be same length as key
Traceback (most recent call last):
  File "./summary_report_pid.py", line 470, in <module>
    main()
  File "./summary_report_pid.py", line 462, in main
    summary_df = complete_report_df(complete_file, len_description_df, percentage_df)
  File "./summary_report_pid.py", line 126, in complete_report_df
    df['contig_name'] = df.apply(lambda x: set_to_list(x), axis=1)
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/lib/python3.7/site-packages/pandas/core/frame.py", line 3602, in __setitem__
    self._set_item_frame_value(key, value)
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/lib/python3.7/site-packages/pandas/core/frame.py", line 3729, in _set_item_frame_value
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

Update 3

Looks like the dataframe is empty

print(df)
Empty DataFrame
Columns: [id, length, species, description, fraction_covered, contig_name]
Index: []

    .... Code is below ... from complete_report_df()
    del df['len_covered']
    df = df.merge(contigs_df, on='id', how='left')
    df = df.dropna()
    print(df)
    df['contig_name'] = df.apply(lambda x: set_to_list(x), axis=1)

Not sure if it matters, but the percentage_file (e.g. *.coverage_adapted_clustered_percentage) does not exist.
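
One possible workaround, sketched under assumptions: skip the apply step when dropna() leaves no rows, since a row-wise apply on an empty frame may not return the single column the assignment expects. Both add_contig_names and the body of set_to_list below are hypothetical stand-ins, because the real helper is not shown in this issue.

```python
import pandas as pd


def set_to_list(row):
    # Hypothetical stand-in for the script's helper: de-duplicate and
    # sort the contig names stored in the row.
    return sorted(set(row["contig_name"]))


def add_contig_names(df):
    # Hypothetical wrapper around the last steps of complete_report_df().
    df = df.dropna()
    if df.empty:
        # Nothing matched, so return the empty frame instead of calling apply.
        return df
    df = df.copy()
    df["contig_name"] = df.apply(set_to_list, axis=1)
    return df


empty_result = add_contig_names(pd.DataFrame(columns=["id", "contig_name"]))
filled_result = add_contig_names(
    pd.DataFrame({"id": ["500039.4128"], "contig_name": [["c2", "c1", "c1"]]})
)
print(empty_result)
print(filled_result)
```

This only avoids the crash. It does not explain why the merge produces no rows, which may be related to the missing percentage file.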
