Added pair counting fmeasure metric by dinarior · Pull Request #220 · JuliaStats/Clustering.jl

dinarior · 2021-08-22T17:14:11Z

I often use this metric, I think it's worth having.

refs:
https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html

https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.214.7233&rep=rep1&type=pdf

Included also is precision and recall for clustering, I was not sure about the proper name (e.g. precision is already in use by Julia base).

The _pair_confusion_matrix is translated from sklearn's https://github.com/scikit-learn/scikit-learn/blob/2beed55847ee70d363bdbfe14ee4401438fba057/sklearn/metrics/cluster/_supervised.py#L154

there is a small duplication with the rand index, which also require this matrix, but as I did not want to modify it to use my new function right now, but in a separated or (if at all).

kmsquire · 2021-10-26T17:14:13Z

The _pair_confusion_matrix is translated from sklearn's https://github.com/scikit-learn/scikit-learn/blob/2beed55847ee70d363bdbfe14ee4401438fba057/sklearn/metrics/cluster/_supervised.py#L154

If it's a direct translation, you'll have to include the license here (or ask the scikit-learn folks if a translation of their code can be MIT licensed. But I'm guessing that would be difficult.).

I don't do much with this package, but I can review this. However, we'll need to figure out the license stuff first (i.e., do we really want to include BSD licensed code here.)

If _pair_confusion_matrix is simple (and it sounds like it should be), you could just include a description of the code, remove it from here, and someone else can implement it. Maybe me, but it can be anyone that hasn't seen the scikit-learn code. That way, we wouldn't have to worry about the license.

nalimilan · 2021-11-27T17:43:26Z

Given how short and simple the code is, it probably won't have to be considered as derived from NumPy if you adapt it to make it more Julian and more efficient, as in the end the only think that will remain from NumPy is the algorithm. For example, sum(c.*c) should we written as sum(abs2, c), sum(c,dims=1)[:] as vec(sum(c, dims=1)) and so on.

dinarior · 2021-11-27T18:22:02Z

Thanks,
I intend to rewrite it, maybe extract common functionalities from the ARI metric, hopefully, will get to it soon enough.

wildart · 2021-12-25T20:02:44Z

I've just re-implemented this functionality in #227 to fix ARI calculations.

dinarior · 2021-12-26T07:23:59Z

Great!, I will wait for it to get pushed and update this commit accordingly.

fmeasure

0161f88

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added pair counting fmeasure metric#220

Added pair counting fmeasure metric#220
dinarior wants to merge 1 commit intoJuliaStats:masterfrom
dinarior:fmeasure

dinarior commented Aug 22, 2021 •

edited

Loading

Uh oh!

kmsquire commented Oct 26, 2021

Uh oh!

nalimilan commented Nov 27, 2021

Uh oh!

dinarior commented Nov 27, 2021

Uh oh!

wildart commented Dec 25, 2021 •

edited

Loading

Uh oh!

dinarior commented Dec 26, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

dinarior commented Aug 22, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kmsquire commented Oct 26, 2021

Uh oh!

nalimilan commented Nov 27, 2021

Uh oh!

dinarior commented Nov 27, 2021

Uh oh!

wildart commented Dec 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dinarior commented Dec 26, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

dinarior commented Aug 22, 2021 •

edited

Loading

wildart commented Dec 25, 2021 •

edited

Loading