Skip to content

Commit 290f9d0

Browse files
authored
Run CodeQL as part of the CI (#771)
This PR adds a [CodeQL](https://codeql.github.com/) workflow to this repository. CodeQL is touted as a "semantic code analysis engine", i.e. its intention isn't really a full industry-grade static code analyzer like Coverity (which we [already run in our CI builds](https://github.com/microsoft/git/actions/workflows/coverity.yml)). This shows in the number of queries we have to suppress because they would result in an unwieldy number of false positives. Or, one might argue, it shows how convoluted part of Git's logic is, which may not only confuse CodeQL but also human readers. The latter might not be a _direct_ security issue, but as Tony Hoare [said](https://en.wikiquote.org/wiki/C._A._R._Hoare#The_Emperor's_Old_Clothes): > There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. I tried to make the code simpler with gitgitgadget#1888 and with gitgitgadget#1890, but the discussion went astray from my original purpose, instead veering off into the direction of arguing in complicated lines of reasoning that the current code is actually correct. Nevertheless, I do include these patches here, structured via merge commits to make it easier to drop them should too many merge conflicts in rebases to upstream Git versions make that advisable. However, for most of the alerts, the approach I took is exclude the queries wholesale, when they caused alerts. I could have taken the approach to suppress the alerts in a fine-grained way (via `codeql[<query-name>]`) so that true positives aren't suppressed in addition to false positives. However, I am loathe to take that approach because a voice inside me fully expects backlash on the Git mailing list along the lines of :"You're littering the code just to appease CodeQL!". Theoretically, an alternative exists: To develop modified versions of those CodeQL queries, e.g. to ignore paths inside the `.git/` directory in the `cpp/toctou-race-condition` query. However, CodeQL is a language that is not only (intentionally?) difficult to develop in by virtue of being declarative instead of imperative, its debugging facilities are pretty much non-existent. Given all of the above, the obvious question begs itself: Why bother with CodeQL at all? The truthful answer is: It is mandated by the Secure Future Initiative. And who knows, maybe CodeQL will come in handy in the future? After all, it is a framework more than a solution, and should in principle be able to help with answering questions like: "Which call paths into `libgit.a` could result in `die()` being called?".
2 parents e45b252 + 3e70801 commit 290f9d0

File tree

23 files changed

+254
-17
lines changed

23 files changed

+254
-17
lines changed

.github/codeql/codeql-config.yml

Lines changed: 131 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,131 @@
1+
name: "CodeQL config"
2+
3+
queries:
4+
- uses: security-extended
5+
6+
query-filters:
7+
- exclude:
8+
# yes, this extra indentation is intentional
9+
# too common in Git's source code
10+
id: cpp/trivial-switch
11+
- exclude:
12+
id: cpp/loop-variable-changed
13+
- exclude:
14+
# we override this locally with a modified version
15+
id: cpp/non-constant-format
16+
- exclude:
17+
# Git does not consider this a problem
18+
id: cpp/irregular-enum-init
19+
- exclude:
20+
# Git has many long functions, this alert would match too many
21+
id: cpp/poorly-documented-function
22+
- exclude:
23+
# In Git, there is a lot of commented-out code
24+
id: cpp/commented-out-code
25+
- exclude:
26+
# While it is true that long switch cases are hard to read and
27+
# validate, Git has way too many for us to allow this query to
28+
# churn out alerts left and right
29+
id: cpp/long-switch
30+
- exclude:
31+
# CodeQL does not expect Git to heed the umask(), but it does
32+
id: cpp/world-writable-file-creation
33+
- exclude:
34+
# Git uses the construct `if (<not this>) ; else ...` often, to
35+
# avoid an extra indentation level. CodeQL does not like that.
36+
id: cpp/empty-block
37+
- exclude:
38+
# This rule unfortunately triggers some false positives, e.g.
39+
# where Git tries to redact URLs or where Git specifically
40+
# asks for a password upon GIT_SSL_CERT_PASSWORD_PROTECTED.
41+
id: cpp/user-controlled-bypass
42+
- exclude:
43+
# This rule fails to recognize that xmallocz() _specifically_
44+
# makes room for a trailing NUL, and instead assumes that this
45+
# function behaves like malloc(), which does not.
46+
id: cpp/invalid-pointer-deref
47+
- exclude:
48+
# CodeQL fails to recognize that xmallocz() accounts for the NUL,
49+
# instead assuming malloc() semantics.
50+
id: cpp/no-space-for-terminator
51+
- exclude:
52+
# Git does exchange plain-text passwords via stdin/stdout e.g.
53+
# with helpers in the credential protocol, or in credential-cache.
54+
# This rule, though, assumes that writing to _any_ file descriptor
55+
# is unsafe.
56+
id: cpp/cleartext-storage-file
57+
- exclude:
58+
# When storing the value of the environment variable `PWD` as the
59+
# current directory in absolute_pathdup(), or when allocating memory
60+
# for a binary patch where the size is specified in the patch itself,
61+
# CodeQL assumes that this can lead to a denial of service because
62+
# of an unbounded size, but Git's code works as designed here.
63+
id: cpp/uncontrolled-allocation-size
64+
- exclude:
65+
# lock_repo_for_gc() has admittedly obtuse logic to parse the
66+
# process ID out of the `gc.pid` file, which is correct, but
67+
# due to its construction throws a false positive here.
68+
id: cpp/missing-check-scanf
69+
- exclude:
70+
# discard_cache_entry() overwrites the name in a FLEX_ARRAY struct
71+
# if GIT_TEST_VALIDATE_INDEX_CACHE_ENTRIES is set, which CodeQL fails
72+
# to recognize as valid.
73+
id: cpp/overrun-write
74+
- exclude:
75+
# Since `time_t` can be signed or unsigned, there is unfortunately
76+
# no way to avoid letting this rule report a potential
77+
id: cpp/integer-multiplication-cast-to-long
78+
- exclude:
79+
# There are many, many legitimate code paths in Git where a path is
80+
# constructed from an environment variable, e.g. GIT_DIR. Let's suppress
81+
# this slightly overzealous query.
82+
id: cpp/path-injection
83+
- exclude:
84+
# Git has 99 instances of this at the time of writing :-(
85+
id: cpp/declaration-hides-variable
86+
- exclude:
87+
id: cpp/declaration-hides-parameter
88+
- exclude:
89+
id: cpp/local-variable-hides-global-variable
90+
- exclude:
91+
id: cpp/complex-condition
92+
- exclude:
93+
# Nested, long-winded switch statements are hard to read and hard
94+
# to reason about. Looking at you, `format_commit_one()`.
95+
id: cpp/complex-block
96+
- exclude:
97+
# There are four instances of this at time of writing, all intentional.
98+
# However, it is very easy to introduce unintentional re-use of loop
99+
# variable names, therefore we will most likely want to either change these
100+
# instances or add suppressions.
101+
id: cpp/nested-loops-with-same-variable
102+
- exclude:
103+
# zOMG so many FIXMEs
104+
id: cpp/fixme-comment
105+
- exclude:
106+
# Git assumes quite a bit about the user's control of the current worktree
107+
# Therefore, it kind of assumes that TOCTOU issues are not a thing when
108+
# it comes to files.
109+
id: cpp/toctou-race-condition
110+
- exclude:
111+
# Too many results in Git where the code was, however, intentionally written
112+
# the way it is.
113+
id: cpp/stack-address-escape
114+
- exclude:
115+
id: cpp/inconsistent-null-check
116+
- exclude:
117+
# This would trigger alerts in the functions in `help.c` that want to open
118+
# external programs to show manual pages.
119+
id: cpp/uncontrolled-process-operation
120+
- exclude:
121+
# The code in t/unit-tests/u-ctype.c implicitly exercises the `sane_istest()`
122+
# macro extensively, and CodeQL seems to miss the cast to `(unsigned char)`,
123+
# thereby mistaking the accesses for being past the end of the array (which
124+
# is incorrect).
125+
#
126+
# Ideally, we would exclude test programs from CodeQL anyways, but
127+
# unfortunately there is no Makefile rule in Git's code base to build only
128+
# the production code, and CodeQL's `paths-ignore` directive described at
129+
# https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/customizing-your-advanced-setup-for-code-scanning#specifying-directories-to-scan
130+
# unfortunately is _ignored_ for compiled languages.
131+
id: cpp/overflow-buffer

.github/workflows/codeql.yml

Lines changed: 64 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,64 @@
1+
name: "CodeQL"
2+
3+
on:
4+
push:
5+
pull_request:
6+
workflow_dispatch:
7+
8+
jobs:
9+
analyze:
10+
name: Analyze
11+
runs-on: ubuntu-latest
12+
permissions:
13+
actions: read
14+
contents: read
15+
security-events: write
16+
17+
strategy:
18+
fail-fast: false
19+
matrix:
20+
language: ["cpp"]
21+
22+
steps:
23+
- name: Checkout repository
24+
uses: actions/checkout@v3
25+
26+
- name: Install dependencies
27+
run: ci/install-dependencies.sh
28+
if: matrix.language == 'cpp'
29+
env:
30+
jobname: codeql
31+
32+
# Initializes the CodeQL tools for scanning.
33+
- name: Initialize CodeQL
34+
uses: github/codeql-action/init@v3
35+
with:
36+
languages: ${{ matrix.language }}
37+
config-file: ./.github/codeql/codeql-config.yml
38+
39+
- name: Build
40+
if: matrix.language == 'cpp'
41+
run: |
42+
cat /proc/cpuinfo
43+
make -j$(nproc)
44+
45+
- name: Perform CodeQL Analysis
46+
uses: github/codeql-action/analyze@v3
47+
with:
48+
upload: False
49+
output: sarif-results
50+
51+
- name: debug
52+
shell: bash
53+
run: ls -la sarif-results
54+
55+
- name: publish sarif for debugging
56+
uses: actions/upload-artifact@v4
57+
with:
58+
name: sarif-results
59+
path: sarif-results
60+
61+
- name: Upload SARIF
62+
uses: github/codeql-action/upload-sarif@v3
63+
with:
64+
sarif_file: sarif-results/cpp.sarif

.gitignore

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -258,3 +258,5 @@ Release/
258258
CMakeSettings.json
259259
/contrib/libgit-rs/target
260260
/contrib/libgit-sys/target
261+
/.github/codeql/.cache/
262+
/.github/codeql/codeql-pack.lock.yml

branch.c

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -224,6 +224,8 @@ static int inherit_tracking(struct tracking *tracking, const char *orig_ref)
224224
skip_prefix(orig_ref, "refs/heads/", &bare_ref);
225225

226226
branch = branch_get(bare_ref);
227+
if (!branch)
228+
BUG("could not get branch for '%s", bare_ref);
227229
if (!branch->remote_name) {
228230
warning(_("asked to inherit tracking from '%s', but no remote is set"),
229231
bare_ref);

builtin/cat-file.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -106,7 +106,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
106106
struct object_id oid;
107107
enum object_type type;
108108
char *buf;
109-
unsigned long size;
109+
unsigned long size = 0;
110110
struct object_context obj_context = {0};
111111
struct object_info oi = OBJECT_INFO_INIT;
112112
unsigned flags = OBJECT_INFO_LOOKUP_REPLACE;

builtin/describe.c

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -324,6 +324,8 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
324324
unsigned int unannotated_cnt = 0;
325325

326326
cmit = lookup_commit_reference(the_repository, oid);
327+
if (!cmit)
328+
die(_("could not look up commit '%s'"), oid_to_hex(oid));
327329

328330
n = find_commit_name(&cmit->object.oid);
329331
if (n && (tags || all || n->prio == 2)) {

builtin/fetch.c

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -555,7 +555,7 @@ static struct ref *get_ref_map(struct remote *remote,
555555
if (remote &&
556556
(remote->fetch.nr ||
557557
/* Note: has_merge implies non-NULL branch->remote_name */
558-
(has_merge && !strcmp(branch->remote_name, remote->name)))) {
558+
(has_merge && branch && !strcmp(branch->remote_name, remote->name)))) {
559559
for (i = 0; i < remote->fetch.nr; i++) {
560560
get_fetch_map(remote_refs, &remote->fetch.items[i], &tail, 0);
561561
if (remote->fetch.items[i].dst &&
@@ -573,6 +573,7 @@ static struct ref *get_ref_map(struct remote *remote,
573573
* Note: has_merge implies non-NULL branch->remote_name
574574
*/
575575
if (has_merge &&
576+
branch &&
576577
!strcmp(branch->remote_name, remote->name))
577578
add_merge_config(&ref_map, remote_refs, branch, &tail);
578579
} else if (!prefetch) {
@@ -2563,6 +2564,11 @@ int cmd_fetch(int argc,
25632564
die(_("must supply remote when using --negotiate-only"));
25642565
gtransport = prepare_transport(remote, 1);
25652566
if (gtransport->smart_options) {
2567+
/*
2568+
* Intentionally assign the address of a local variable
2569+
* to a non-local struct's field.
2570+
* codeql[cpp/stack-address-escape]
2571+
*/
25662572
gtransport->smart_options->acked_commits = &acked_commits;
25672573
} else {
25682574
warning(_("protocol does not support --negotiate-only, exiting"));

builtin/push.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -90,7 +90,7 @@ static void refspec_append_mapped(struct refspec *refspec, const char *ref,
9090
if (push_default == PUSH_DEFAULT_UPSTREAM &&
9191
skip_prefix(matched->name, "refs/heads/", &branch_name)) {
9292
struct branch *branch = branch_get(branch_name);
93-
if (branch->merge_nr == 1 && branch->merge[0]->src) {
93+
if (branch && branch->merge_nr == 1 && branch->merge[0]->src) {
9494
refspec_appendf(refspec, "%s:%s",
9595
ref, branch->merge[0]->src);
9696
return;

builtin/stash.c

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -284,7 +284,7 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
284284
memset(&opts, 0, sizeof(opts));
285285

286286
tree = parse_tree_indirect(i_tree);
287-
if (parse_tree(tree))
287+
if (!tree || parse_tree(tree))
288288
return -1;
289289

290290
init_tree_desc(t, &tree->object.oid, tree->buffer, tree->size);
@@ -1395,6 +1395,11 @@ static int do_create_stash(const struct pathspec *ps, struct strbuf *stash_msg_b
13951395
goto done;
13961396
} else {
13971397
head_commit = lookup_commit(the_repository, &info->b_commit);
1398+
if (!head_commit) {
1399+
ret = error(_("could not look up commit '%s'"),
1400+
oid_to_hex (&info->b_commit));
1401+
goto done;
1402+
}
13981403
}
13991404

14001405
if (!check_changes(ps, include_untracked, &untracked_files)) {

builtin/submodule--helper.c

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1934,6 +1934,9 @@ static int determine_submodule_update_strategy(struct repository *r,
19341934
const char *val;
19351935
int ret;
19361936

1937+
if (!sub)
1938+
return error(_("could not retrieve submodule information for path '%s'"), path);
1939+
19371940
key = xstrfmt("submodule.%s.update", sub->name);
19381941

19391942
if (update) {

0 commit comments

Comments
 (0)