diff --git a/docs/getting-started/tutorials/arrays.md b/docs/getting-started/tutorials/arrays.md index 73c4d8c2..8b835a75 100644 --- a/docs/getting-started/tutorials/arrays.md +++ b/docs/getting-started/tutorials/arrays.md @@ -1,8 +1,31 @@ -# Arrays and loops +# Arrays and Loops -Another common datatype in C is arrays. Reasoning about memory ownership for arrays is more difficult than for the datatypes we have seen so far, for two reasons: (1) C allows the programmer to access arrays using _computed pointers_, and (2) the size of an array does not need to be known as a constant at compile time. +In C programming, arrays are a fundamental data structure, but +reasoning about their about their behavior can be challenging. +This complexity arises from two primary factors: -To support reasoning about code manipulating arrays and computed pointers, CN has _iterated resources_. For instance, to specify ownership of an `unsigned` array with 10 cells starting at pointer `p`, CN uses the following iterated resource: +1. Pointer Arithmetic: C allows direct manipulation of array elements + through pointers, which can lead to subtle bugs if not handled + carefully. +2. Dynamic Sizes: The size of an array isn't always known at compile + time. + +{{ hidden("AZ: Replaced _computed pointers_ with pointer arithmetic +as it did not seem to be a standardized term.")}} + +CN address these challenges via _iterated resources_. An +_iterated resource_ is a way to express ownership over a sequence of +memory locations, such as the elements of an array. This concept is +crucial for reasoning about the safety and correctness of array +manipulations. + +### Iterated Resources: Array Ownership + +Suppose you have an array of type `unsigned int` and length 10. To +express that you own the entire array — i.e., that it’s safe to read +from or write to any of its cells — you use an _iterated resource_. +Let's say that the array starts at pointer `p`, then CN uses the +following iterated resource: {{ todo("JWS: I think these should be `u64`s (per the warning), but then the sizes `n` need to be cast to `u64` from `u32` @@ -13,27 +36,37 @@ each (u32 i; i < 10u32) { RW(array_shift(p,i)) } ``` -In detail, this can be read as follows: - -- for each index `i` of CN type `u32`, … - -- if `i` is between `0` and `10`, … - -- assert ownership of a resource `RW` … +This can be read as follows: +- for each index `i` of type `u32`, if `i` is between `0` and `10`, {{ todo("AZ: The upper limit should be 9 instead of 10.")}} +- assert ownership of a resource `RW` - for cell `i` of the array with base-address `p`. -Here `array_shift(p,i)` computes a pointer into the array at pointer `p`, appropriately offset for index `i`. +Here `array_shift(p,i)` computes a pointer into the array at pointer `p`, appropriately offset for index `i`. {{ todo("AZ: Should it be describes/specifies instead of computes?")}} -In general, the syntax of `each` is as follows: +The general syntax of `each` is as follows: ```c each ( ; ) { } ``` -### First array example +Where: {{ todo("AZ: Please review this for correctness.")}} + +- ` ` introduces a variable `` of type ``, +- `` is a condition that limits the range of ``, +- `` is the resource expression that uses ``. + +This is CN’s way of writing a quantified ownership assertion — a loop +over memory, but in the spec. +{{ todo("AZ: This might be a good place to add more context on `each` and when to reach out for it when writing CN specifications.")}} -Let’s see how this applies to a simple array-manipulating function. Function `read` takes three arguments: the base pointer `p` of an `unsigned int` array, the length `n` of the array, and an index `i` into the array; `read` then returns the value of the `i`-th array cell. +### Reading From Arrays + +Let’s see how this applies to a simple simple function that reads an +element from an array. Function `read` takes three arguments: the +base pointer `p` to an `unsigned int` array, the length `n` of the +array, and an index `i` into the array; `read` then returns the value +of the `i`-th array cell. ```c title="exercises/array_load.test.c" --8<-- @@ -41,16 +74,21 @@ exercises/array_load.test.c --8<-- ``` -The CN precondition requires +In this CN specification, the precondition requires: -- ownership of the array on entry — one `RW` resource for each array index between `0` and `n - 1` — and +- the ownership of the array on entry — one `RW` resource for each array index between `0` and `n - 1`, and, - that `i` lies within the range of RW indices. -On exit the array ownership is returned again. The postcondition also asserts that the return value of the function is indeed equal to the value of the array at index `i`. +On exit, the postcondition ensures that: + +- the array ownership is returned, and +- the return value of the function is equal to the value of the array + at index `i`. + -### Writing to arrays +### Writing to Arrays -Consider this next function `writei`, which sets the value of the `i`-th array cell to be `val`. +Consider this next function `writei`, which sets the value of the `i`-th array cell to be `v`. ```c title="exercises/array_write.test.c" --8<-- @@ -58,21 +96,30 @@ exercises/array_write.test.c --8<-- ``` -The specification closely resembles that of `read`, except for the last line, which now asserts that `A_post[i]`, the value of the array _after_ the function executes at index `i`, is equal to `val`. +Here, the specification closely resembles that of `read`, except for +the last line, which now asserts that `A_post[i] == v`, i.e. - the +value of the array _after_ the function executes at index `i`, should +be equal to `v`. -What if we additionally wanted to make assertions about values in the array _not_ being modified? In the prior `read` example, we could add +What if we additionally wanted to make assertions about values in the +array _not_ being modified? In the prior `read` example, we could add ```c A == A_post ``` to assert that the array before execution has the same values as the array after execution. In this `writei` example, we can replace the last line with +{{ later("AZ: Adding the spec that read function leaves the array +unchanged in the previous example will make the presentation more + linear")}} ```c A[i:v] == A_post ``` to assert that the array before execution, with index `i` updated to `v`, has the same values as the array after execution. Note that the update syntax can be sequenced, e.g., `A[i:v1][j:v2]`. -#### Exercises +### Exercises -_Exercise:_ Write a specification for `array_swap`, which swaps the values of two cells of an int array. Assume that `i` and `j` are different. +_Exercise:_ Swapping Array Elements. +Write a specification for `array_swap`, a function which swaps the +values of two cells of an int array. Assume that `i != j`. ```c title="exercises/array_swap.c" --8<- @@ -80,9 +127,14 @@ exercises/array_swap.c --8<-- ``` -### Iterated conditions +### Iterated Conditions -Suppose we are writing a function that returns the maximum value in an array. Informally, we would want a postcondition that asserts that the returned value is greater than or equal to _each_ value in the array. Formally, for an array `A` of length `n`, we use an _iterated condition_ to write +Ownership is one part of the story — correctness is the other. Suppose +we are writing a function `array_max` that returns the maximum value +in an array. We would want the postcondition to assert that the +returned value is greater than or equal to every element in the array. +For an array `A` of length `n`, here is how you do that with an + _iterated condition_: ```c each (u32 i; i < n) @@ -103,7 +155,8 @@ Then, together with our usual conditions about ownership, as well as a precondit @*/ ``` -We next test this specification on this implementation of `array_max`. +Next, we would like to test this specification on the below +implementation of `array_max`. ```c title="exercises/array_max.broken.c" --8<-- @@ -111,7 +164,8 @@ exercises/array_max.broken.c --8<-- ``` -If we run cn test, we get this error, and an associated counterexample +If we run `cn test`, we get the below error and an associated +counterexample. ``` ************************ Failed at ************************* @@ -123,7 +177,8 @@ Load failed. This "not owned" error suggests that we are trying to access a region of memory that we do not have ownership over — an index out-of-bounds issue, perhaps. Indeed, if we inspect our implementation, we realize that we need to fix the `i <= n` to `i < n`. -If we run `cn test` again, we now get +If we run `cn test` on the fixed version, it still fails with the +below message. ``` ************************ Failed at ************************* @@ -146,18 +201,43 @@ array_max(p, n); ``` -Examining the generated counterexample, we see that the elements of the array are `{18, 0, 17}`, so the first element is the maximum. And if we generate counterexamples a few more times, we see that this pattern persists. Inspecting our implementation a bit further, we find and fix another bug: in the first line, we should initialize `max` to be `p[0]`, not `0`. +Examining the generated counterexample, we see that the elements of +the array are `{18, 0, 17}`. In this case, the first element, 18, is +clearly the maximum. However, our function violates the postcondition. +{{ todo("AZ: From the error message, it is not clear to me that the +postcondition is violated. There are multiple usages of each in the +spec and it is not clear which one is the source of error. Neither +does the CounterExample talk about the return value (max). +Suggestion: Adding line numbers in the listing might help here since +the error message seems to produce a line number which might +disambiguate which each the error is referring to. ")}} +If we run the test multiple times, we consistently see similar +patterns in the counterexamples - arrays where the first element holds +the maximum value, yet the function fails to return it. This suggests +a deeper issue with how the initial value of max is chosen. Looking +more closely at the implementation, we realize that the problem lies +in the line that initializes `max` as `int max = 0;`. We should have +initialized `max` to be `p[0]`, and not `0`. + +!!! tip + If the generated counterexample arrays are too large, try + temporarily restricting their length `n` to debug more easily. + For example, we can restrain them to a small length of 3 by adding + a precondition such as `n <= 3u32`. -(An additional debugging tip: if the generated counterexample arrays are too large, we can restrain them by adding a temporary precondition such as `n <= 3u32`, to force the array to be of length at most three, or some other suitable bound.) {{ later("JWS: I personally found this a useful hack, but I don't know if we want to advertise it as an officially sanctioned tip. (How close are folks to adding shrinking?)") }} Now, `cn test` will succeed! -#### Exercises +### Exercises -_Exercise:_ Write a specification for `array_add3`, which should add three to each array value. Your specification should succeed on this correct implementation, and fail when bugs are inserted (e.g., if four is added instead): +_Exercise:_ Adding to Each Element +Write a specification for `array_add3`, which adds `3` to each array +value. Your specification should succeed on this correct +implementation, and fail when bugs are inserted (e.g., if four is +added instead): ```c title="exercises/array_add3.test.c" --8<-- @@ -165,8 +245,9 @@ exercises/array_add3.test.c --8<-- ``` -_Exercise:_ Write a specification for `array_sort`, which should sort -an array into increasing order. Your specification should succeed on +_Exercise:_ Sorting an Array +Write a specification for `array_sort`, which sorts an array into +increasing order. Your specification should succeed on this correct implementation below (yes, [it's correct](https://arxiv.org/abs/2110.01111)), and fail when bugs are inserted: