Skip to content

Edited the third testing chapter - arrays.md #139

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
149 changes: 115 additions & 34 deletions docs/getting-started/tutorials/arrays.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,31 @@
# Arrays and loops
# Arrays and Loops

Another common datatype in C is arrays. Reasoning about memory ownership for arrays is more difficult than for the datatypes we have seen so far, for two reasons: (1) C allows the programmer to access arrays using _computed pointers_, and (2) the size of an array does not need to be known as a constant at compile time.
In C programming, arrays are a fundamental data structure, but
reasoning about their about their behavior can be challenging.
This complexity arises from two primary factors:

To support reasoning about code manipulating arrays and computed pointers, CN has _iterated resources_. For instance, to specify ownership of an `unsigned` array with 10 cells starting at pointer `p`, CN uses the following iterated resource:
1. Pointer Arithmetic: C allows direct manipulation of array elements
through pointers, which can lead to subtle bugs if not handled
carefully.
2. Dynamic Sizes: The size of an array isn't always known at compile
time.

{{ hidden("AZ: Replaced _computed pointers_ with pointer arithmetic
as it did not seem to be a standardized term.")}}

CN address these challenges via _iterated resources_. An
_iterated resource_ is a way to express ownership over a sequence of
memory locations, such as the elements of an array. This concept is
crucial for reasoning about the safety and correctness of array
manipulations.

### Iterated Resources: Array Ownership

Suppose you have an array of type `unsigned int` and length 10. To
express that you own the entire array — i.e., that it’s safe to read
from or write to any of its cells — you use an _iterated resource_.
Let's say that the array starts at pointer `p`, then CN uses the
following iterated resource:

{{ todo("JWS: I think these should be `u64`s (per the warning),
but then the sizes `n` need to be cast to `u64` from `u32`
Expand All @@ -13,76 +36,105 @@ each (u32 i; i < 10u32)
{ RW<unsigned int>(array_shift<unsigned int>(p,i)) }
```

In detail, this can be read as follows:

- for each index `i` of CN type `u32`, …

- if `i` is between `0` and `10`, …

- assert ownership of a resource `RW<unsigned int>` …
This can be read as follows:

- for each index `i` of type `u32`, if `i` is between `0` and `10`, {{ todo("AZ: The upper limit should be 9 instead of 10.")}}
- assert ownership of a resource `RW<unsigned int>`
- for cell `i` of the array with base-address `p`.

Here `array_shift<unsigned int>(p,i)` computes a pointer into the array at pointer `p`, appropriately offset for index `i`.
Here `array_shift<unsigned int>(p,i)` computes a pointer into the array at pointer `p`, appropriately offset for index `i`. {{ todo("AZ: Should it be describes/specifies instead of computes?")}}

In general, the syntax of `each` is as follows:
The general syntax of `each` is as follows:

```c
each (<type> <var>; <boolean expr>) { <expr> }
```

### First array example
Where: {{ todo("AZ: Please review this for correctness.")}}

- `<type> <var>` introduces a variable `<var>` of type `<type>`,
- `<boolean expr>` is a condition that limits the range of `<var>`,
- `<expr>` is the resource expression that uses `<var>`.

This is CN’s way of writing a quantified ownership assertion — a loop
over memory, but in the spec.
{{ todo("AZ: This might be a good place to add more context on `each` and when to reach out for it when writing CN specifications.")}}

Let’s see how this applies to a simple array-manipulating function. Function `read` takes three arguments: the base pointer `p` of an `unsigned int` array, the length `n` of the array, and an index `i` into the array; `read` then returns the value of the `i`-th array cell.
### Reading From Arrays

Let’s see how this applies to a simple simple function that reads an
element from an array. Function `read` takes three arguments: the
base pointer `p` to an `unsigned int` array, the length `n` of the
array, and an index `i` into the array; `read` then returns the value
of the `i`-th array cell.

```c title="exercises/array_load.test.c"
--8<--
exercises/array_load.test.c
--8<--
```

The CN precondition requires
In this CN specification, the precondition requires:

- ownership of the array on entry — one `RW<unsigned int>` resource for each array index between `0` and `n - 1`and
- the ownership of the array on entry — one `RW<unsigned int>` resource for each array index between `0` and `n - 1`, and,
- that `i` lies within the range of RW indices.

On exit the array ownership is returned again. The postcondition also asserts that the return value of the function is indeed equal to the value of the array at index `i`.
On exit, the postcondition ensures that:

- the array ownership is returned, and
- the return value of the function is equal to the value of the array
at index `i`.


### Writing to arrays
### Writing to Arrays

Consider this next function `writei`, which sets the value of the `i`-th array cell to be `val`.
Consider this next function `writei`, which sets the value of the `i`-th array cell to be `v`.

```c title="exercises/array_write.test.c"
--8<--
exercises/array_write.test.c
--8<--
```

The specification closely resembles that of `read`, except for the last line, which now asserts that `A_post[i]`, the value of the array _after_ the function executes at index `i`, is equal to `val`.
Here, the specification closely resembles that of `read`, except for
the last line, which now asserts that `A_post[i] == v`, i.e. - the
value of the array _after_ the function executes at index `i`, should
be equal to `v`.

What if we additionally wanted to make assertions about values in the array _not_ being modified? In the prior `read` example, we could add
What if we additionally wanted to make assertions about values in the
array _not_ being modified? In the prior `read` example, we could add
```c
A == A_post
```
to assert that the array before execution has the same values as the array after execution. In this `writei` example, we can replace the last line with
{{ later("AZ: Adding the spec that read function leaves the array
unchanged in the previous example will make the presentation more
linear")}}
```c
A[i:v] == A_post
```
to assert that the array before execution, with index `i` updated to `v`, has the same values as the array after execution. Note that the update syntax can be sequenced, e.g., `A[i:v1][j:v2]`.

#### Exercises
### Exercises

_Exercise:_ Write a specification for `array_swap`, which swaps the values of two cells of an int array. Assume that `i` and `j` are different.
_Exercise:_ Swapping Array Elements.
Write a specification for `array_swap`, a function which swaps the
values of two cells of an int array. Assume that `i != j`.

```c title="exercises/array_swap.c"
--8<-
exercises/array_swap.c
--8<--
```

### Iterated conditions
### Iterated Conditions

Suppose we are writing a function that returns the maximum value in an array. Informally, we would want a postcondition that asserts that the returned value is greater than or equal to _each_ value in the array. Formally, for an array `A` of length `n`, we use an _iterated condition_ to write
Ownership is one part of the story — correctness is the other. Suppose
we are writing a function `array_max` that returns the maximum value
in an array. We would want the postcondition to assert that the
returned value is greater than or equal to every element in the array.
For an array `A` of length `n`, here is how you do that with an
_iterated condition_:

```c
each (u32 i; i < n)
Expand All @@ -103,15 +155,17 @@ Then, together with our usual conditions about ownership, as well as a precondit
@*/
```

We next test this specification on this implementation of `array_max`.
Next, we would like to test this specification on the below
implementation of `array_max`.

```c title="exercises/array_max.broken.c"
--8<--
exercises/array_max.broken.c
--8<--
```

If we run cn test, we get this error, and an associated counterexample
If we run `cn test`, we get the below error and an associated
counterexample.

```
************************ Failed at *************************
Expand All @@ -123,7 +177,8 @@ Load failed.
This "not owned" error suggests that we are trying to access a region of memory that we do not have ownership over — an index out-of-bounds issue, perhaps. Indeed, if we inspect our implementation, we realize that we need to fix the `i <= n` to `i < n`.


If we run `cn test` again, we now get
If we run `cn test` on the fixed version, it still fails with the
below message.

```
************************ Failed at *************************
Expand All @@ -146,27 +201,53 @@ array_max(p, n);

```

Examining the generated counterexample, we see that the elements of the array are `{18, 0, 17}`, so the first element is the maximum. And if we generate counterexamples a few more times, we see that this pattern persists. Inspecting our implementation a bit further, we find and fix another bug: in the first line, we should initialize `max` to be `p[0]`, not `0`.
Examining the generated counterexample, we see that the elements of
the array are `{18, 0, 17}`. In this case, the first element, 18, is
clearly the maximum. However, our function violates the postcondition.
{{ todo("AZ: From the error message, it is not clear to me that the
postcondition is violated. There are multiple usages of each in the
spec and it is not clear which one is the source of error. Neither
does the CounterExample talk about the return value (max).
Suggestion: Adding line numbers in the listing might help here since
the error message seems to produce a line number which might
disambiguate which each the error is referring to. ")}}
If we run the test multiple times, we consistently see similar
patterns in the counterexamples - arrays where the first element holds
the maximum value, yet the function fails to return it. This suggests
a deeper issue with how the initial value of max is chosen. Looking
more closely at the implementation, we realize that the problem lies
in the line that initializes `max` as `int max = 0;`. We should have
initialized `max` to be `p[0]`, and not `0`.

!!! tip
If the generated counterexample arrays are too large, try
temporarily restricting their length `n` to debug more easily.
For example, we can restrain them to a small length of 3 by adding
a precondition such as `n <= 3u32`.

(An additional debugging tip: if the generated counterexample arrays are too large, we can restrain them by adding a temporary precondition such as `n <= 3u32`, to force the array to be of length at most three, or some other suitable bound.)
{{ later("JWS: I personally found this a useful hack, but I don't
know if we want to advertise it as an officially sanctioned tip. (How
close are folks to adding shrinking?)") }}

Now, `cn test` will succeed!

#### Exercises
### Exercises

_Exercise:_ Write a specification for `array_add3`, which should add three to each array value. Your specification should succeed on this correct implementation, and fail when bugs are inserted (e.g., if four is added instead):
_Exercise:_ Adding to Each Element
Write a specification for `array_add3`, which adds `3` to each array
value. Your specification should succeed on this correct
implementation, and fail when bugs are inserted (e.g., if four is
added instead):

```c title="exercises/array_add3.test.c"
--8<--
exercises/array_add3.test.c
--8<--
```

_Exercise:_ Write a specification for `array_sort`, which should sort
an array into increasing order. Your specification should succeed on
_Exercise:_ Sorting an Array
Write a specification for `array_sort`, which sorts an array into
increasing order. Your specification should succeed on
this correct implementation below
(yes, [it's correct](https://arxiv.org/abs/2110.01111)), and fail
when bugs are inserted:
Expand Down