ValueError: Unknown residues in the input sequence. #1

@sarah872

Hi,
I am running razor on my proteins as:

python3 razor.py -f proteins.fasta -o test

They come from an assembled transcriptome/ORFs called by transdecoder.

I am getting the following error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/apps/python3/3.7.0/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/apps/python3/3.7.0/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 64, in global_worker
    return _func(x)
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 116, in wrapper
    **kwargs
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/data_types/series.py", line 20, in worker
    return series.apply(func, *args, **kwargs)
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandas/core/series.py", line 3848, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2327, in pandas._libs.lib.map_infer
  File "razor.py", line 139, in <lambda>
    df['Analysis_'] = df['Sequence'].parallel_apply(lambda x: razor_predict(x, m))
  File "razor.py", line 71, in razor_predict
    newObj = detector.RAZOR(seq=seq, max_scan=max_scan)
  File "/scratch/user/razor/Razor/libs/detector.py", line 31, in __init__
    self.seq = functions.validate(seq, self.max_scan)
  File "/scratch/user/razor/Razor/libs/functions.py", line 77, in validate
    "Unknown residues in the input "
ValueError: Unknown residues in the input sequence.
 Only standard amino acid codes are allowed.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "razor.py", line 159, in <module>
    main()
  File "razor.py", line 139, in main
    df['Analysis_'] = df['Sequence'].parallel_apply(lambda x: razor_predict(x, m))
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 462, in closure
    map_result,
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 396, in get_workers_result
    results = map_result.get()
  File "/apps/python3/3.7.0/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: Unknown residues in the input sequence.
 Only standard amino acid codes are allowed.

I tried to check the file with seqkit seq -v -V proteins.fasta, but that doesn't report any offending residue. Do you have any other ideas for what I could try?
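
One further check that might narrow this down is a small standalone scan of the FASTA file. This is a minimal sketch, not razor's own validation: it assumes that "standard amino acid codes" means the 20 canonical one-letter codes, and the names `STANDARD` and `find_unknown_residues` are made up for illustration. It lists every record that contains any other character, together with a count of each offending character.

```python
import sys
from collections import Counter

# Assumption: "standard amino acid codes" = the 20 canonical one-letter codes.
# razor's actual validate() may use a different alphabet.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


def find_unknown_residues(lines):
    """Yield (header, Counter of non-standard characters) per offending record."""
    header, bad = None, Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: report the previous one if it had problems.
            if header is not None and bad:
                yield header, bad
            header, bad = line[1:], Counter()
        else:
            # Case is ignored here; lowercase residues are not flagged.
            bad.update(ch for ch in line.upper() if ch not in STANDARD)
    # Report the last record in the file.
    if header is not None and bad:
        yield header, bad


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for header, bad in find_unknown_residues(handle):
            print(header, dict(bad), sep="\t")
```

Saved under a hypothetical name such as check_residues.py, it would be run as python3 check_residues.py proteins.fasta. Characters worth looking for in particular are `*` (TransDecoder peptide output may mark stop codons this way) and ambiguity codes such as `X`, `B`, `Z`, `U`, or `O`.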
