Skip to content

Replace sub-processed ppfp usage with the pp-format python library for extract validation #621

@mo-jareddrayton

Description

@mo-jareddrayton

There's a significant overhead introduced by subprocessing out to the ppfp command. We can rewrite the get_stash_from_pp function using ppformat directly with something like the following.

from pp_format.pp_format import list_fields

def get_stash_from_pp(filepath) -> dict[str, int]:
    fields = list_fields(filepath, ["STASH"])
    stash_list = [str(field_dict["STASH"]) for field_dict in fields]
    return Counter(stash_list)

The tricky bit is how we manage the dependency as pp-format is on neither pypi or conda.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions