
Tweaking ED pipeline #121

@abhinavkulkarni

Hi,

Thanks again for the great work!

I am currently evaluating REL for ED purposes and comparing it against other ED techniques, chiefly BLINK from Facebook AI Research. Both take into account the context in which a mention occurs, are two-stage pipelines, and use neural approaches. BLINK does well, but it can be slow and requires a GPU to run, which is a limitation for me.

Although REL is fast and lightweight, I find that it often misses some obvious cases. I am looking for guidance on how I can tweak the internal workings of REL to get more accurate results.

The following results were obtained by running REL on a podcast description and the description of a particular episode, joined by a newline.

That is, the request is built like this:

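import requests

# API_URL is the endpoint of the REL server being queried.
text_doc = podcast_summary + '\n' + episode_summary
el_result = requests.post(API_URL, json={
    "text": text_doc,
    "spans": []  # empty spans: REL does its own mention detection
}).json()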
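
To make the mentions, entities, scores, and NER tags quoted below easier to read off, the response can be turned into labeled records. This is only a minimal sketch: the field order in `FIELDS` is an assumption about the layout of each annotation returned by the REL API (it is not stated in this issue), and `to_records` and `describe` are hypothetical helper names.

```python
from typing import Any, Dict, List

# ASSUMPTION: each annotation is a 7-element list in this order.
# Adjust FIELDS if the REL version in use returns a different layout.
FIELDS = ("start", "length", "mention", "entity", "ed_score", "md_score", "tag")


def to_records(el_result: List[List[Any]]) -> List[Dict[str, Any]]:
    """Turn positional annotations into dicts keyed by the assumed field names."""
    records = []
    for row in el_result:
        if len(row) != len(FIELDS):
            # Fail loudly rather than silently mislabel the fields.
            raise ValueError(f"unexpected annotation shape: {row!r}")
        records.append(dict(zip(FIELDS, row)))
    return records


def describe(record: Dict[str, Any]) -> str:
    """One-line summary of a single linked mention."""
    return (
        f"{record['mention']!r} -> {record['entity']} "
        f"(ED score={record['ed_score']}, tag={record['tag']})"
    )
```

With this in place, `for r in to_records(el_result): print(describe(r))` lists every linked mention next to its ED score and tag.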
  • For this episode, the mention Shadi Hamid is linked to Brookings_Institution with score 0.9991938769817352 and NER tag PER. This is particularly egregious: Shadi Hamid's own Wikipedia page is not returned as the first candidate.

  • For this episode, the mention Lauren Bonner from the podcast description is linked to Lauren_Samuels with score 0.9993583559989929, even though the last names are quite different. The mention Ray J is (correctly) linked to Ray_J, but with a lower score of 0.8136761486530304.

  • For this episode, the mention Charlamagne Tha God from the podcast description gets a score of only 0.7140538295110067, even though words like comedians, outspoken celebrities, and thought-leaders appear in the context. These should make it easy to match his embedding, which is learned from a Wikipedia profile containing similar words.

  • For this episode, the mention Dave Smith is always linked to Dave_Smith_(engineer) with very high confidence, even though the correct answer, Dave_Smith_(comedian), appears in the candidate set. The context even contains words such as government, foreign policy, and all things Libertarian, which should have matched his Wikipedia description more closely.

The last point is particularly important since Dave Smith is quite a common name: there are at least 4 Dave Smiths in Wikipedia, each with a very different description.
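
As a rough way to surface cases like the first two automatically, a surface-form check can flag links where REL is confident but the entity title barely resembles the mention. This is a sketch of a post-hoc diagnostic, not part of REL: `title_similarity` and `suspicious` are hypothetical helpers, and the thresholds are arbitrary assumptions. It cannot catch the Dave Smith case, because every candidate there shares the mention's surface form.

```python
from difflib import SequenceMatcher


def title_similarity(mention: str, entity: str) -> float:
    """Similarity in [0, 1] between a mention and a Wikipedia-style entity title."""
    title = entity.replace("_", " ")
    # Drop a trailing disambiguator such as " (engineer)".
    if title.endswith(")") and " (" in title:
        title = title[: title.rindex(" (")]
    return SequenceMatcher(None, mention.lower(), title.lower()).ratio()


def suspicious(
    mention: str,
    entity: str,
    ed_score: float,
    min_similarity: float = 0.8,  # assumed cutoff, tune on real data
    min_score: float = 0.9,       # assumed cutoff, tune on real data
) -> bool:
    """True when the linker is confident but the title barely resembles the mention."""
    return ed_score >= min_score and title_similarity(mention, entity) < min_similarity
```

For example, `suspicious("Lauren Bonner", "Lauren_Samuels", 0.9993583559989929)` flags the second case above, while a correct link such as Ray J to Ray_J is never flagged because the title matches the mention exactly.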

Thanks!
