
[Bug] The abnormal results in data flow analysis may be caused by an error in the REACHING_DEF edge. #5652

@yikesoftware


Joern Version


     ██╗ ██████╗ ███████╗██████╗ ███╗   ██╗
     ██║██╔═══██╗██╔════╝██╔══██╗████╗  ██║
     ██║██║   ██║█████╗  ██████╔╝██╔██╗ ██║
██   ██║██║   ██║██╔══╝  ██╔══██╗██║╚██╗██║
╚█████╔╝╚██████╔╝███████╗██║  ██║██║ ╚████║
 ╚════╝  ╚═════╝ ╚══════╝╚═╝  ╚═╝╚═╝  ╚═══╝
Version: 4.0.421
Type `help` to begin

Bug Description

When I run this query:

//var source = cpg.call.name("<operator>.addition").codeExact("hdrlen + datalen").l
var source = cpg.identifier.nameExact("total_len").lineNumber(10).l

cpg.call("memcpy").where(_.argument(3).reachableBy(source)).l

on the following C code:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

int main(void) {
    char *hdr = "hdr";
    char *data = "data";
    size_t hdrlen = strlen(hdr);
    size_t datalen = strlen(data);
    size_t total_len = hdrlen + datalen;
    char *combined = malloc(total_len + 1);
    char *dst = malloc(0x100);
    memcpy(combined, hdr, hdrlen);
    memcpy(combined + hdrlen, data, datalen);
    memcpy(dst, combined, total_len);
}

I got the following results:

val res13: List[io.shiftleft.codepropertygraph.generated.nodes.Call] = List(
  Call(
    argumentIndex = -1,
    argumentName = None,
    code = "memcpy(combined, hdr, hdrlen)",
    columnNumber = Some(value = 5),
    dispatchType = "STATIC_DISPATCH",
    dynamicTypeHintFullName = IndexedSeq(),
    lineNumber = Some(value = 13),
    methodFullName = "memcpy",
    name = "memcpy",
    offset = None,
    offsetEnd = None,
    order = 13,
    possibleTypes = IndexedSeq(),
    signature = "",
    staticReceiver = None,
    typeFullName = "ANY"
  ),
  Call(
    argumentIndex = -1,
    argumentName = None,
    code = "memcpy(combined + hdrlen, data, datalen)",
    columnNumber = Some(value = 5),
    dispatchType = "STATIC_DISPATCH",
    dynamicTypeHintFullName = IndexedSeq(),
    lineNumber = Some(value = 14),
    methodFullName = "memcpy",
    name = "memcpy",
    offset = None,
    offsetEnd = None,
    order = 14,
    possibleTypes = IndexedSeq(),
    signature = "",
    staticReceiver = None,
    typeFullName = "ANY"
  ),
  Call(
    argumentIndex = -1,
    argumentName = None,
    code = "memcpy(dst, combined, total_len)",
    columnNumber = Some(value = 5),
    dispatchType = "STATIC_DISPATCH",
    dynamicTypeHintFullName = IndexedSeq(),
    lineNumber = Some(value = 15),
    methodFullName = "memcpy",
    name = "memcpy",
    offset = None,
    offsetEnd = None,
    order = 15,
    possibleTypes = IndexedSeq(),
    signature = "",
    staticReceiver = None,
    typeFullName = "ANY"
  )
)

This is unexpected. Semantically, only the last memcpy call (line 15) passes 'total_len' as its third argument, so only that call should be reachable from 'total_len'. Instead, all three memcpy calls are returned.
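
To see why each call is reported, it may help to print the flows themselves rather than only the matching calls. The following is a minimal sketch for the Joern shell, assuming the CPG of the C file above is already loaded as cpg. The names source and sink are only illustrative, and the output is not shown here because it depends on the Joern version.

```scala
// Joern shell sketch; assumes the CPG of the C file above is loaded as `cpg`.

// Source: the identifier total_len on line 10 of the C file.
def source = cpg.identifier.nameExact("total_len").lineNumber(10)

// Sink: the third (length) argument of every memcpy call.
def sink = cpg.call.nameExact("memcpy").argument(3)

// Print every flow path found from the source to the sink,
// one table per path, so the intermediate steps become visible.
sink.reachableByFlows(source).p
```

If the paths for lines 13 and 14 pass through combined before arriving at hdrlen or datalen, that would point at the same edges described below.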

Inspecting the CPG, I found that, for example, on line 13 the combined and hdrlen arguments have REACHING_DEF edges to each other, which looks confusing. Perhaps this is indeed a bug?
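
The edges mentioned above can be listed directly from the shell. This is a hedged sketch: it assumes the data-flow language extensions (io.joern.dataflowengineoss.language) and an implicit semantics are in scope, as they normally are in the interactive Joern shell, and it uses the ddgIn step, which follows incoming data-dependence edges as filtered by the active semantics rather than the raw REACHING_DEF edges. The names dstArg and lenArg are illustrative.

```scala
// Joern shell sketch; assumes `cpg` holds the C file above and that the
// data-flow language extensions are in scope (an assumption, see lead-in).

// First argument (combined) of the memcpy call on line 13.
def dstArg = cpg.call.nameExact("memcpy").lineNumber(13).argument(1)

// Third argument (hdrlen) of the same call.
def lenArg = cpg.call.nameExact("memcpy").lineNumber(13).argument(3)

// List the nodes that reach each argument through the data-dependence graph,
// as (code, line number) pairs.
dstArg.ddgIn.map(n => (n.code, n.lineNumber)).l
lenArg.ddgIn.map(n => (n.code, n.lineNumber)).l
```

If combined appears among the predecessors of hdrlen and hdrlen among those of combined, that matches the mutual edges reported here.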

Labels: bug