Taking weighting seriously #487

Open

wants to merge 208 commits into base: master

Changes from all commits (208 commits)
1754cbd
WIP
gragusa Jun 10, 2022
1d778a5
WIP
gragusa Jun 15, 2022
12121a3
WIP
gragusa Jun 15, 2022
4363ba4
Taking weights seriously
gragusa Jun 17, 2022
ca702dc
WIP
gragusa Jun 18, 2022
e2b2d12
Taking weights seriously
gragusa Jun 21, 2022
bc8709a
Merge branch 'master' of https://github.com/JuliaStats/GLM.jl into Ju…
gragusa Jun 21, 2022
84cd990
Add depwarn for passing wts with Vector
gragusa Jun 22, 2022
cbc329f
Cosmettic changes
gragusa Jun 22, 2022
23d67f5
WIP
gragusa Jun 23, 2022
f4d90a9
Fix loglik for weighted models
gragusa Jul 4, 2022
6b7d95c
Fix remaining issues
gragusa Jul 15, 2022
c236b82
Final commit
gragusa Jul 15, 2022
d4bd0c2
Merge branch 'master'
gragusa Jul 15, 2022
8bdfb55
Fix merge
gragusa Jul 15, 2022
3eb2ca4
Fix nulldeviance
gragusa Jul 16, 2022
63c8358
Bypass crossmodelmatrix drom StatsAPI
gragusa Jul 16, 2022
e93a919
Delete momentmatrix.jl
gragusa Jul 16, 2022
7bb0959
Delete scratch.jl
gragusa Jul 16, 2022
ded17a8
Delete settings.json
gragusa Jul 16, 2022
3346774
AbstractWeights are required to be real
gragusa Sep 5, 2022
7376e78
Update src/glmfit.jl
gragusa Sep 5, 2022
a738268
Apply suggestions from code review
gragusa Sep 5, 2022
c9459e7
Merge pull request #2 from JuliaStats/master
gragusa Sep 5, 2022
6af3ca5
Throw error if GlmResp are not AbastractWeights
gragusa Sep 5, 2022
0ded1d4
Addressing review comments
gragusa Sep 5, 2022
d923e48
Reexport aweights, pweights, fweights
gragusa Sep 5, 2022
84f27d1
Fixed remaining issues with null loglikelihood
gragusa Sep 6, 2022
8804dc1
Fix nullloglikelihood tests
gragusa Sep 6, 2022
7f3aa36
Do not dispatch on Weights but use if
gragusa Sep 6, 2022
f67a8e0
Do not dispatch on Weights use if
gragusa Sep 6, 2022
23a3e87
Fix inferred test
gragusa Sep 6, 2022
5481284
Use if instead of dispatching on Weights
gragusa Sep 6, 2022
d12222e
Add doc for weights and fix output
gragusa Sep 7, 2022
a17e812
Fix docs failures
gragusa Sep 7, 2022
58dec0c
Fix pweights stderror even for rank deficient des
gragusa Sep 7, 2022
a6f5c66
Add test for pweights stderror
gragusa Sep 7, 2022
92ddb1e
Export UnitWeights
gragusa Sep 7, 2022
0c61fff
Fix documentation
gragusa Sep 7, 2022
8b0e8e1
Mkae cooksdistance work with rank deficient design
gragusa Sep 7, 2022
f609f06
Test cooksdistance with rank deficient design
gragusa Sep 7, 2022
23f3d03
Fix CholeskyPivoted signature in docs
gragusa Sep 8, 2022
2749b84
Make nancolidx v1.0 and v1.1 friendly
gragusa Sep 8, 2022
82e472b
Fix signatures
gragusa Sep 9, 2022
2d6aaed
Correct implementation of momentmatrix
gragusa Sep 9, 2022
dbc9ae9
Test moment matrix
gragusa Sep 9, 2022
e0d9cdf
Apply suggestions from code review
gragusa Sep 23, 2022
46e8f92
Incorporate suggestions of reviewer
gragusa Sep 23, 2022
6df401b
Deals with review comments
gragusa Sep 24, 2022
ca15eb8
Small fix
gragusa Sep 24, 2022
0c18ae9
Small fix
gragusa Sep 25, 2022
54d68d1
Apply suggestions from code review
gragusa Oct 3, 2022
422a8cd
Merge branch 'master' into JuliaStats-master
gragusa Oct 3, 2022
d6d4e6b
Fix vcov dispatch for vcov
gragusa Oct 3, 2022
b457d74
Fix dispatch of _vcov
gragusa Oct 3, 2022
b087679
Revert changes
gragusa Oct 3, 2022
a44e137
Update src/glmfit.jl
gragusa Oct 3, 2022
11db2c4
Fix weighted keyword in modelmatrix
gragusa Oct 3, 2022
b649d4f
perf in nulldeviance for unweighted models
gragusa Oct 3, 2022
170148c
Merge branch 'JuliaStats-master' of github.com:gragusa/GLM.jl into Ju…
gragusa Oct 3, 2022
29c43cb
Fixed std error for probability weights
gragusa Oct 19, 2022
279e533
Getting there (& switch Analytics to Importance)
gragusa Oct 20, 2022
afb145e
.= instead of copy!
gragusa Oct 20, 2022
2cead0a
Remove comments
gragusa Oct 20, 2022
a1ec49f
up
gragusa Oct 20, 2022
97bf28d
Speedup cooksdistance
gragusa Oct 23, 2022
9ce2d89
Revert back to AnalyticWeights
gragusa Oct 24, 2022
9bddf63
Add extensive tests for AnalyticWeights
gragusa Oct 24, 2022
3fe045a
Add extensive tests for AnalyticWeights
gragusa Oct 24, 2022
852e307
Delete scratch.jl
gragusa Oct 25, 2022
d1ba3e5
Delete analytic_weights.jl
gragusa Oct 25, 2022
831f280
Follow reviewer suggestions [Batch 1]
gragusa Nov 15, 2022
b00dc16
Follow reviewer's suggestions [Batch 2]
gragusa Nov 15, 2022
0825324
probability weights vcov uses momentmatrix
gragusa Nov 15, 2022
48d15fb
Fix ProbabilityWeights vcov and tests
gragusa Nov 16, 2022
3338eab
Use leverage from StasAPI
gragusa Nov 17, 2022
c27c749
Merge branch 'master' into JuliaStats-master
gragusa Nov 17, 2022
970e26e
Rebase against master
gragusa Nov 17, 2022
8832e9d
Fix test
gragusa Nov 17, 2022
9eb2390
Merge remote-tracking branch 'origin/master' into JuliaStats-master
gragusa Dec 20, 2022
587c129
Test on 1.6
gragusa Dec 20, 2022
fa63a9a
Address reviwer comments
gragusa Dec 29, 2022
807731a
Merge branch 'master' of github.com:JuliaStats/GLM.jl into JuliaStats…
gragusa Jun 16, 2023
72996fc
Merge branch 'master' into JuliaStats-master
andreasnoack Nov 19, 2024
1ee383a
Merge remote-tracking branch 'upstream/master' into JuliaStats-master
gragusa Nov 19, 2024
ba52ce9
Merge from origin
gragusa Nov 19, 2024
5e790df
Fix broken test of dof_residual
gragusa Nov 19, 2024
50c1a96
Fix testing issues
gragusa Nov 19, 2024
c4f7959
Fix docs
gragusa Nov 19, 2024
d2b5cb0
Added tests for ftest. They throw for pweights
gragusa Nov 25, 2024
cd165d7
Make ftest throw if a model weighted by pweights is passed
gragusa Nov 25, 2024
606a419
Fix how loglikelihood throws for pweights weighted models
gragusa Nov 25, 2024
a1a1e10
Merge branch 'master' of github.com:JuliaStats/GLM.jl into JuliaStats…
gragusa Nov 25, 2024
5d948de
Remove StatsPlots dependence.
gragusa Nov 25, 2024
4fb18df
Fix weighting with :qr method.
gragusa Nov 25, 2024
56d81ae
Add filter to jldoctest string
gragusa Dec 11, 2024
a2357cf
Fix problem with docstrings
gragusa Dec 11, 2024
6068d2a
Update docs/src/index.md
gragusa Dec 12, 2024
930a8cb
Remove trailing white spaces
gragusa Dec 12, 2024
107d17d
Add mention of UnitWeights in the weights discussion
gragusa Dec 12, 2024
a003b10
Remove trailing white spaces
gragusa Dec 12, 2024
1c06c7e
Change delbeta! signature
gragusa Dec 12, 2024
b41cce7
Add tests for dropcollinear=false
gragusa Dec 12, 2024
2730277
Minor cosmethic changes
gragusa Dec 12, 2024
cdeb1a3
Add weighting information in COMMON_FIT_KWARGS_DOCS
gragusa Dec 12, 2024
95d506e
Add test for leverage
gragusa Dec 13, 2024
f124589
[wip] work on leverage
gragusa Dec 13, 2024
cbdadbc
Use inverse
gragusa Dec 13, 2024
2386ab9
Test leverage
gragusa Dec 13, 2024
36326ff
Comment cookdistance
gragusa Dec 13, 2024
f26bc0e
Committed by mistake
gragusa Dec 13, 2024
2bc2138
leverage returns a vec
gragusa Dec 13, 2024
0569600
Fix cookdistance return type
gragusa Dec 13, 2024
dd1b4a8
Update docs/src/index.md
gragusa Dec 18, 2024
1c5953d
Update docs/src/index.md
gragusa Dec 18, 2024
cd39578
Update src/glmfit.jl
gragusa Dec 18, 2024
574ec69
Update src/linpred.jl
gragusa Dec 18, 2024
60f43a8
Adress outstanding issues
gragusa Mar 31, 2025
cb7b0a0
Merge branch 'JuliaStats-master' of github.com:gragusa/GLM.jl into Ju…
gragusa Mar 31, 2025
a8a4a34
Merge branch 'master' into JuliaStats-master
gragusa Apr 11, 2025
0f27b2f
Merge recent PRs
gragusa Apr 11, 2025
9f778ff
The final frontier
gragusa Apr 12, 2025
cb0a518
Merge branch 'master' into JuliaStats-master
gragusa Apr 12, 2025
ce6d8fd
The true final frontier
gragusa Apr 12, 2025
4d33245
Dealete dead code
gragusa Apr 13, 2025
9d63624
Update test/runtests.jl
gragusa Apr 13, 2025
214ce5e
Remove alien file
gragusa Apr 13, 2025
d6a5757
Merge branch 'JuliaStats-master' of github.com:gragusa/GLM.jl into Ju…
gragusa Apr 13, 2025
59781e2
Update test/runtests.jl
gragusa Apr 14, 2025
5023d30
Update src/lm.jl
gragusa Apr 14, 2025
992dcaa
Update docs/src/examples.md
gragusa Apr 14, 2025
4948b5a
Update src/linpred.jl
gragusa Apr 14, 2025
a005614
Merge branch 'JuliaStats-master' of github.com:gragusa/GLM.jl into Ju…
gragusa Apr 14, 2025
841da03
General cleanup [ci skip]
gragusa Apr 15, 2025
1d27c53
Add tests [ci skip]
gragusa Apr 15, 2025
6c2f3e5
Fix sparse GLM
gragusa Apr 15, 2025
4e9b765
Changes to make sparse reuse code
gragusa Apr 15, 2025
89a1d7b
Add tests for sparse and other corner cases
gragusa Apr 15, 2025
e661db9
Do not access private API for weights [no ci]
gragusa Apr 15, 2025
8db3d07
Re-add description of weights
gragusa Apr 15, 2025
e06acb5
Remove leftover [no ci]
gragusa Apr 15, 2025
c2a9103
Update probability_weights.jl
gragusa Apr 15, 2025
36997f1
Delete .vscode directory [no ci]
gragusa Apr 15, 2025
c0263aa
Update glmfit.jl
gragusa Apr 15, 2025
960f2fd
Update glmfit.jl [no ci]
gragusa Apr 15, 2025
9db9bf4
Update analytic_weights.jl
gragusa Apr 15, 2025
8a1bafb
Fix test_logs of warning in the binomial case
gragusa Apr 16, 2025
b977dd1
Update src/glmfit.jl
gragusa Apr 17, 2025
3901150
FIx sparse (broken in master) and make loglikelihood error for analyt…
gragusa Apr 17, 2025
6086c74
Improve Sparse case (vcov uses the dense code). Add tests for the spa…
gragusa Apr 17, 2025
e42d452
Update src/glmtools.jl [no ci]
gragusa Apr 18, 2025
74ce605
Update ext/GLMSparseArraysExt.jl [no ci]
gragusa Apr 18, 2025
cfa10d2
Update src/glmfit.jl [no ci]
gragusa Apr 18, 2025
02383cd
Add test for Sparse LM with probability weighted
gragusa Apr 18, 2025
025d10c
Format code
gragusa Apr 18, 2025
78db11d
Merge branch 'master' into JuliaStats-master
gragusa Apr 18, 2025
b7fec95
Implement cookdistance for glm. More testing.
gragusa Apr 18, 2025
0a78140
Merge branch 'JuliaStats-master' of github.com:gragusa/GLM.jl into Ju…
gragusa Apr 18, 2025
5a2af1c
Last commit.
gragusa Apr 18, 2025
5dbd074
Update src/glmfit.jl [no ci]
gragusa Apr 18, 2025
18d653a
Update src/lm.jl [no ci]
gragusa Apr 18, 2025
0ff9ecb
Update src/lm.jl [no ci]
gragusa Apr 18, 2025
328e898
Update src/lm.jl [no ci]
gragusa Apr 18, 2025
c06a174
Update src/glmfit.jl [no ci]
gragusa Apr 18, 2025
3b07d1e
Update src/glmfit.jl [no ci]
gragusa Apr 18, 2025
6efda93
Test residuals
gragusa Apr 18, 2025
ce6d39e
Merge branch 'JuliaStats-master' of github.com:gragusa/GLM.jl into Ju…
gragusa Apr 18, 2025
92b6894
Correct how we test for wts
gragusa Apr 18, 2025
4cef4c5
Add tests. Remove RDatasets from deps.
gragusa Apr 18, 2025
88d4767
Fix negbin and add more tests
gragusa Apr 19, 2025
caa8c0f
Update src/glmfit.jl [no ci]
gragusa Apr 21, 2025
3709d5e
Update src/glmfit.jl [no ci]
gragusa Apr 21, 2025
18e8f4c
Update src/lm.jl [no ci]
gragusa Apr 23, 2025
7ef083b
Update test/analytic_weights.jl [no ci]
gragusa Apr 23, 2025
419c835
Update test/analytic_weights.jl [no ci]
gragusa Apr 23, 2025
8507db1
Update src/lm.jl [no ci]
gragusa Apr 23, 2025
c6852a9
Update src/lm.jl [no ci]
gragusa Apr 23, 2025
a88b247
Update test/probability_weights.jl [no ci]
gragusa Apr 23, 2025
6d66c88
Update src/glmfit.jl [no ci]
gragusa Apr 23, 2025
d4a9a51
Update src/linpred.jl [no ci]
gragusa Apr 23, 2025
175b130
Update src/linpred.jl [no ci]
gragusa Apr 23, 2025
740ddb7
Update src/linpred.jl [no ci]
gragusa Apr 23, 2025
8f06f2a
Update src/linpred.jl [no ci]
gragusa Apr 23, 2025
8668eb6
Update src/linpred.jl [no ci]
gragusa Apr 23, 2025
a1c1a99
Update src/linpred.jl [no ci]
gragusa Apr 23, 2025
9523ae6
Update src/linpred.jl [no ci]
gragusa Apr 23, 2025
e7fbeac
Update src/linpred.jl [no ci]
gragusa Apr 23, 2025
c36c526
Update src/linpred.jl [no ci]
gragusa Apr 23, 2025
ce233b4
Update src/linpred.jl [no ci]
gragusa Apr 23, 2025
a3ff516
Update src/lm.jl [no ci]
gragusa Apr 23, 2025
81299f1
Update src/lm.jl [no ci]
gragusa Apr 23, 2025
fa23a79
Update src/negbinfit.jl [no ci]
gragusa Apr 23, 2025
28a950c
Update src/glmfit.jl [no ci]
gragusa Apr 23, 2025
38ff9a2
Update src/glmfit.jl [no ci]
gragusa Apr 23, 2025
064ad35
Update src/glmfit.jl [no ci]
gragusa Apr 23, 2025
7635b10
Update src/glmfit.jl [no ci]
gragusa Apr 23, 2025
53c2f8c
Update src/glmfit.jl [no ci]
gragusa Apr 23, 2025
125a44f
Update src/glmfit.jl [no ci]
gragusa Apr 23, 2025
5a5ae3d
Update src/glmfit.jl [no ci]
gragusa Apr 23, 2025
b839090
Fix comment in GlmResp
gragusa Apr 23, 2025
9ecf9b5
Fix how wts are checked for empty size
gragusa Apr 23, 2025
efbcbf6
Fix formatting [noci]
gragusa Apr 29, 2025
94d1044
Remove loglil_apweights_obs for Bernoulli and Binomial
gragusa Apr 29, 2025
ef6c86b
Fix formatting
gragusa Apr 29, 2025
786dd99
Fix formatting
gragusa Apr 29, 2025
a3d68bc
Test linkinv with CauchitLink
gragusa Apr 29, 2025
aff48d6
Fix CauchitLink linkinv
gragusa Apr 29, 2025
39300ab
Update src/lm.jl [no ci]
gragusa Apr 30, 2025
5 changes: 4 additions & 1 deletion docs/Project.toml
@@ -1,15 +1,18 @@
[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
Optim = "429524aa-4258-5aef-a3af-852621145aeb"
RDatasets = "ce6b1742-4840-55fa-b093-852dadbb1d8b"
StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
StatsFuns = "4c63d2b9-4356-54db-8cca-17b64c39e42c"
StatsModels = "3eaba693-59b7-5ba5-a881-562e759f1c8d"

[compat]
DataFrames = "1"
Documenter = "1"
Optim = "1.6.2"
Optim = "1.6.2"
2 changes: 1 addition & 1 deletion docs/src/api.md
@@ -2,7 +2,7 @@

```@meta
DocTestSetup = quote
using CategoricalArrays, DataFrames, Distributions, GLM, RDatasets
using CategoricalArrays, DataFrames, Distributions, GLM, RDatasets, StableRNGs
end
```

43 changes: 11 additions & 32 deletions docs/src/examples.md
@@ -61,7 +61,7 @@ julia> dof(ols)
3

julia> dof_residual(ols)
1.0
1

julia> round(aic(ols); digits=5)
5.84252
@@ -214,8 +214,8 @@ sales ^ 2 -6.94594e-9 3.72614e-9 -1.86 0.0725 -1.45667e-8 6.7487e-10
```jldoctest
julia> data = DataFrame(X=[1,2,2], Y=[1,0,1])
3×2 DataFrame
Row │ X Y
│ Int64 Int64
Row │ X Y
│ Int64 Int64
─────┼──────────────
1 │ 1 1
2 │ 2 0
@@ -319,8 +319,8 @@ julia> using GLM, RDatasets

julia> form = dataset("datasets", "Formaldehyde")
6×2 DataFrame
Row │ Carb OptDen
│ Float64 Float64
Row │ Carb OptDen
│ Float64 Float64
─────┼──────────────────
1 │ 0.1 0.086
2 │ 0.3 0.269
@@ -473,8 +473,8 @@ julia> dobson = DataFrame(Counts = [18.,17,15,20,10,21,25,13,13],
Outcome = categorical([1,2,3,1,2,3,1,2,3]),
Treatment = categorical([1,1,1,2,2,2,3,3,3]))
9×3 DataFrame
Row │ Counts Outcome Treatment
│ Float64 Cat… Cat…
Row │ Counts Outcome Treatment
│ Float64 Cat… Cat…
─────┼─────────────────────────────
1 │ 18.0 1 1
2 │ 17.0 2 1
@@ -513,29 +513,8 @@ In this example, we choose the best model from a set of λs, based on minimum BI
```jldoctest; filter = r"(\d*)\.(\d{6})\d+" => s"\1.\2"
julia> using GLM, RDatasets, StatsBase, DataFrames, Optim

julia> trees = DataFrame(dataset("datasets", "trees"))
31×3 DataFrame
Row │ Girth Height Volume
│ Float64 Int64 Float64
─────┼──────────────────────────
1 │ 8.3 70 10.3
2 │ 8.6 65 10.3
3 │ 8.8 63 10.2
4 │ 10.5 72 16.4
5 │ 10.7 81 18.8
6 │ 10.8 83 19.7
7 │ 11.0 66 15.6
8 │ 11.0 75 18.2
⋮ │ ⋮ ⋮ ⋮
25 │ 16.3 77 42.6
26 │ 17.3 81 55.4
27 │ 17.5 82 55.7
28 │ 17.9 80 58.3
29 │ 18.0 80 51.5
30 │ 18.0 80 51.0
31 │ 20.6 87 77.0
16 rows omitted

julia> trees = DataFrame(dataset("datasets", "trees"));

julia> bic_glm(λ) = bic(glm(@formula(Volume ~ Height + Girth), trees, Normal(), PowerLink(λ)));

julia> optimal_bic = optimize(bic_glm, -1.0, 1.0);
@@ -554,9 +533,9 @@ Coefficients:
────────────────────────────────────────────────────────────────────────────
(Intercept) -1.07586 0.352543 -3.05 0.0023 -1.76684 -0.384892
Height 0.0232172 0.00523331 4.44 <1e-05 0.0129601 0.0334743
Girth 0.242837 0.00922555 26.32 <1e-99 0.224756 0.260919
Girth 0.242837 0.00922556 26.32 <1e-99 0.224756 0.260919
────────────────────────────────────────────────────────────────────────────

julia> round(optimal_bic.minimum, digits=5)
156.37638
```
```
158 changes: 134 additions & 24 deletions docs/src/index.md
@@ -17,6 +17,7 @@ The [RDatasets package](https://github.com/johnmyleswhite/RDatasets.jl) is usefu
Two methods can be used to fit a Generalized Linear Model (GLM):
`glm(formula, data, family, link)` and `glm(X, y, family, link)`.
Their arguments must be:

- `formula`: a [StatsModels.jl `Formula` object](https://juliastats.org/StatsModels.jl/stable/formula/)
referring to columns in `data`; for example, if column names are `:Y`, `:X1`, and `:X2`,
then a valid formula is `@formula(Y ~ X1 + X2)`
@@ -123,10 +124,115 @@ x: 4 -0.032673 0.0797865 -0.41 0.6831 -0.191048 0.125702
───────────────────────────────────────────────────────────────────────────
```

## Weighting

Both `lm` and `glm` allow weighted estimation. The four different
[types of weights](https://juliastats.org/StatsBase.jl/stable/weights/) defined in
[StatsBase.jl](https://github.com/JuliaStats/StatsBase.jl) can be used to fit a model:

- `AnalyticWeights` describe a non-random relative importance (usually between 0 and 1) for
each observation. These weights may also be referred to as reliability weights, precision
weights or inverse variance weights. These are typically used when the observations being
weighted are aggregate values (e.g., averages) with differing variances.
- `FrequencyWeights` describe the number of times (or frequency) each observation was seen.
These weights may also be referred to as case weights or repeat weights.
- `ProbabilityWeights` represent the inverse of the sampling probability for each observation,
providing a correction mechanism for under- or over-sampling certain population groups.
These weights may also be referred to as sampling weights.
- `UnitWeights` attribute a weight of 1 to each observation, which corresponds
to unweighted regression (the default).

Reviewer comment (Contributor): can we add a comment somewhere how these weights are later treated in estimation?

To indicate which kind of weights should be used, the vector of weights must be wrapped in
one of the three weight types and then passed to the `wts` keyword argument.
Short-hand functions `aweights`, `fweights`, and `pweights` can be used to construct
`AnalyticWeights`, `FrequencyWeights`, and `ProbabilityWeights`, respectively.
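
For instance, a brief sketch of wrapping a raw weight vector (`aweights`, `fweights`, and `pweights` are re-exported by GLM; `uweights` from StatsBase builds `UnitWeights` from the number of observations):

```julia
using StatsBase

w = [1, 2, 3, 4]

aweights(w)   # AnalyticWeights
fweights(w)   # FrequencyWeights
pweights(w)   # ProbabilityWeights
uweights(4)   # UnitWeights of length 4 (the unweighted default)
```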

We illustrate the API with randomly generated data.

```jldoctest weights
julia> using StableRNGs, DataFrames, GLM

julia> data = DataFrame(y = rand(StableRNG(1), 100), x = randn(StableRNG(2), 100), weights = repeat([1, 2, 3, 4], 25));

julia> m = lm(@formula(y ~ x), data)
LinearModel

y ~ 1 + x

Coefficients:
──────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
──────────────────────────────────────────────────────────────────────────
(Intercept) 0.517369 0.0280232 18.46 <1e-32 0.461758 0.57298
x -0.0500249 0.0307201 -1.63 0.1066 -0.110988 0.0109382
──────────────────────────────────────────────────────────────────────────

julia> m_aweights = lm(@formula(y ~ x), data, wts=aweights(data.weights))
LinearModel

y ~ 1 + x

Coefficients:
──────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
──────────────────────────────────────────────────────────────────────────
(Intercept) 0.51673 0.0270707 19.09 <1e-34 0.463009 0.570451
x -0.0478667 0.0308395 -1.55 0.1239 -0.109067 0.0133333
──────────────────────────────────────────────────────────────────────────

julia> m_fweights = lm(@formula(y ~ x), data, wts=fweights(data.weights))
LinearModel

y ~ 1 + x

Coefficients:
─────────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
─────────────────────────────────────────────────────────────────────────────
(Intercept) 0.51673 0.0170172 30.37 <1e-84 0.483213 0.550246
x -0.0478667 0.0193863 -2.47 0.0142 -0.0860494 -0.00968394
─────────────────────────────────────────────────────────────────────────────

julia> m_pweights = lm(@formula(y ~ x), data, wts=pweights(data.weights))
LinearModel

y ~ 1 + x

Coefficients:
───────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
───────────────────────────────────────────────────────────────────────────
(Intercept) 0.51673 0.0287193 17.99 <1e-32 0.459737 0.573722
x -0.0478667 0.0265532 -1.80 0.0745 -0.100561 0.00482739
───────────────────────────────────────────────────────────────────────────

```

!!! warning

In the old API, weights were passed as `AbstractVectors` and were silently treated in
the internal computation of standard errors and related quantities as `FrequencyWeights`.
Passing weights as `AbstractVector` is still allowed for backward compatibility, but it
is deprecated. When weights are passed following the old API, they are now coerced to
`FrequencyWeights` and a deprecation warning is issued.
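
For example, a sketch of the old and new calls reusing the `data` frame from the example above (the exact warning text may differ):

```julia
# Old API (deprecated): a plain vector is coerced to FrequencyWeights
# and a deprecation warning is issued.
lm(@formula(y ~ x), data, wts=data.weights)

# New API: state the intended weight type explicitly.
lm(@formula(y ~ x), data, wts=fweights(data.weights))
```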

The type of the weights will affect the variance of the estimated coefficients and the
quantities involving this variance. The coefficient point estimates will be the same
regardless of the type of weights.

```jldoctest weights
julia> loglikelihood(m_aweights)
-16.29630756138424

julia> loglikelihood(m_fweights)
-25.518609617564483
```
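
As a quick sanity check of the statement above (a sketch using the models fitted earlier in this section):

```julia
# The fitted coefficients agree across weight types (up to floating point):
coef(m_aweights) ≈ coef(m_fweights) ≈ coef(m_pweights)   # true
```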

## Comparing models with F-test

Comparisons between two or more linear models can be performed using the `ftest` function,
which computes an F-test between each pair of subsequent models and reports fit statistics:

```jldoctest
julia> using DataFrames, GLM, StableRNGs

@@ -149,6 +255,7 @@ F-test: 2 models fitted on 50 observations
## Methods applied to fitted models

Many of the methods provided by this package have names similar to those in [R](http://www.r-project.org).

- `adjr2`: adjusted R² for a linear model (an alias for `adjr²`)
- `aic`: Akaike's Information Criterion
- `aicc`: corrected Akaike's Information Criterion for small sample sizes
@@ -175,9 +282,8 @@ Many of the methods provided by this package have names similar to those in [R](
- `stderror`: standard errors of the coefficients
- `vcov`: variance-covariance matrix of the coefficient estimates


Note that the canonical link for negative binomial regression is `NegativeBinomialLink`, but
in practice one typically uses `LogLink`.
Note that the canonical link for negative binomial regression is `NegativeBinomialLink`,
but in practice one typically uses `LogLink`.

```jldoctest methods
julia> using GLM, DataFrames, StatsBase
@@ -209,7 +315,9 @@ julia> round.(predict(mdl, test_data); digits=8)
9.33333333
```

The [`cooksdistance`](@ref) method computes [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation used to fit a linear model, giving an estimate of the influence of each data point.
The [`cooksdistance`](@ref) method computes
[Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation
used to fit a linear model, giving an estimate of the influence of each data point.
Note that it's currently only implemented for linear models without weights.

```jldoctest methods
@@ -223,45 +331,47 @@ julia> round.(cooksdistance(mdl); digits=8)
## Separation of response object and predictor object

The general approach in this code is to separate functionality related
to the response from that related to the linear predictor. This
to the response from that related to the linear predictor. This
allows for greater generality by mixing and matching different
subtypes of the abstract type ```LinPred``` and the abstract type ```ModResp```.
subtypes of the abstract type `LinPred` and the abstract type `ModResp`.

A ```LinPred``` type incorporates the parameter vector and the model
matrix. The parameter vector is a dense numeric vector but the model
matrix can be dense or sparse. A ```LinPred``` type must incorporate
A `LinPred` type incorporates the parameter vector and the model
matrix. The parameter vector is a dense numeric vector but the model
matrix can be dense or sparse. A `LinPred` type must incorporate
some form of a decomposition of the weighted model matrix that allows
for the solution of a system ```X'W * X * delta=X'wres``` where ```W``` is a
for the solution of a system `X'W * X * delta=X'wres` where `W` is a
diagonal matrix of "X weights", provided as a vector of the square
roots of the diagonal elements, and ```wres``` is a weighted residual vector.
roots of the diagonal elements, and `wres` is a weighted residual vector.
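
For intuition, here is a minimal stand-alone sketch of that solve (illustrative only, not the package's internal decomposition code), assuming a dense `X` and the diagonal of `W` given as a plain vector `w`:

```julia
using LinearAlgebra

X    = randn(100, 3)     # model matrix
w    = rand(100)         # diagonal elements of W
wres = randn(100)        # weighted residual vector

# delta solves X'W X * delta = X'wres; a QR factorization of
# sqrt.(w) .* X is the slower but more accurate alternative.
delta = cholesky(Symmetric(X' * (Diagonal(w) * X))) \ (X' * wres)
```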

Currently there are two dense predictor types, ```DensePredQR``` and
```DensePredChol```, and the usual caveats apply. The Cholesky
Currently there are two dense predictor types, `DensePredQR` and
`DensePredChol`, and the usual caveats apply. The Cholesky
version is faster but somewhat less accurate than the QR version.
The skeleton of a distributed predictor type is in the code
but not yet fully fleshed out. Because Julia by default uses
but not yet fully fleshed out. Because Julia by default uses
OpenBLAS, which is already multi-threaded on multicore machines, there
may not be much advantage in using distributed predictor types.

A ```ModResp``` type must provide methods for the ```wtres``` and
```sqrtxwts``` generics. Their values are the arguments to the
```updatebeta``` methods of the ```LinPred``` types. The
```Float64``` value returned by ```updatedelta``` is the value of the
A `ModResp` type must provide methods for the `wtres` and
`sqrtxwts` generics. Their values are the arguments to the
`updatebeta` methods of the `LinPred` types. The
`Float64` value returned by `updatedelta` is the value of the
convergence criterion.

Similarly, ```LinPred``` types must provide a method for the
```linpred``` generic. In general ```linpred``` takes an instance of
a ```LinPred``` type and a step factor. Methods that take only an instance
of a ```LinPred``` type use a default step factor of 1. The value of
```linpred``` is the argument to the ```updatemu``` method for
```ModResp``` types. The ```updatemu``` method returns the updated
Similarly, `LinPred` types must provide a method for the
`linpred` generic. In general `linpred` takes an instance of
a `LinPred` type and a step factor. Methods that take only an instance
of a `LinPred` type use a default step factor of 1. The value of
`linpred` is the argument to the `updatemu` method for
`ModResp` types. The `updatemu` method returns the updated
deviance.

## Debugging failed fits

In the rare cases when a fit of a generalized linear model fails, it can be useful
to enable more output from the fitting steps. This can be done through
the Julia logging mechanism by setting `ENV["JULIA_DEBUG"] = GLM`. Enabling debug output
will result in output like the following

```julia
┌ Debug: Iteration: 1, deviance: 5.129147109764238, diff.dev.:0.05057195315968688
└ @ GLM ~/.julia/dev/GLM/src/glmfit.jl:418
```