Arrays in C never really felt boring, but... always less interesting.
Like, I can't decide the dimensions of an integer array at run time. Thats so uncool!!
So, I tried to implement a system similar to how numpy implements arrays. Its shape can be anything!
And numpy is fast because its implemented in C!! (Besides the optimizations they have applied to make it faster and the vectorization they use which makes it so cool!!)
The main idea behind this is if you wanna use a 2D or a higher dimensional array, you don't really need a 2D array (like int A[1][2]), instead you can have a 1D array and some way to traverse through it, you can do all the stuff you were able to do with 2D or higher arrays in a much cooler way!
We need some sort of metadata (which stores stuff like dimension, how many rows, columns, the height, etc.).
I saw this sort of implementation here and that has heavily inspired this project. (Do check it out, it's really nice.)
The metadata we store is like this:
typedef struct {
int *traverse; //The shape of the array basically
int *strides; //The strides, which help in accessing data from each row and column
int dim; //The dimension of the array
int count; //How many elements have been filled in
int total; //How many elements can be filled (from the shape)
int capacity; //How many elements can be stored (from the memory stuff)
long make_sure; //To make sure that the integer pointer we got as argument is initialised by init()
//helps in not corrupting memory when we try to vectorize it...
int mode; //whether it was declared by the user or by us
int index; //the index of this headr
} god_stuff;Now, this goes right before our array. How? Watch it here or continue reading.
What we essentially do is store the array with its metadata like this:
[metadata][array]
^(metadata is of type god_stuff and array is of type int here)
And metadata + 1 takes us to the array:
[metadata][array]
^This way, we can go from the array to its metadata to know about it, and move through it like we know everything about it (even though we do).
We initialize our array like this:
int *init(int dim,int *shape) {
god_stuff *headr = malloc(sizeof(god_stuff) + (sizeof(int)*CAPACITY));
headr->make_sure = 28602529; //Some key to ensure that the operation we are doing is on our array and not just some garbage in the memory
headr->count = 0;
headr->dim = dim;
headr->total = 1;
headr->traverse = malloc(sizeof(int)*headr->dim);
headr->strides = malloc(sizeof(int)*headr->dim);
init_to_num(headr->strides,headr->dim,1);
headr->capacity = CAPACITY;
//printf("shape >> ");
for (int i=0;i<headr->dim;i++) {
//scanf("%d",&headr->traverse[i]);
//getchar();
headr->traverse[i] = shape[i];
headr->total *= headr->traverse[i];
}
for (int i=0;i<headr->dim;i++) {
for(int j=headr->dim - 1;j>=1+i;j-- ) {
*(headr->strides + i) *= *(headr->traverse + j);
}
}
if(headr->total > headr->capacity) {
while (headr->total > headr->capacity) {
headr->capacity *= 2;
}
headr = realloc(headr,sizeof(god_stuff) + (sizeof(int)*headr->capacity));
}
int *numbrs = (int *)(headr + 1);
return numbrs;
}So, it's something like this:
[headr][numbrs]
^Now we got our function, lets initialise in main():
int *numbrs = init(3,(int[]){3,3,3}) //Dimension is 3. It doesn't matter, works for n dimensions (n must be natural)
//A 3x3x3 cube basically... Obviously can be decided at runtime tooTo get metadata of numbrs, just do something like this:
god_stuff *metadata = (god_stuff *)numbrs - 1;
//And, now you have access to everything!
metadata->dim //dimension
metadata->count //How many have we filled
metadata->traverse //How to traverse through the array
metadata->strides //which no. u gotta multiply with the location coordinates to get that value...
//(cuz our array is flattened into 1 dimension and doesn't exist as some higher dimension array in the memory)The main idea behind this is that you can store the data in the manner u like, and do the desired as long as u interpret it correctly.
Suppose I want a 2 dimensional integer array, I'll usually do int arr[a][b]; to declare it, and later I can put values into it by:
arr = {{41,42,43,...},{67,68,69,...},{419,420,421,...},...};But, what if there was a 'better' way of doing that?
What if we did something like this?
Every array is stored as an one dimensional array but interpreted as you wanted? Here's where strides come into play and they do most of the cool stuff!
let's say I want a 2x3 integer array, so, i do int *numbrs = init(2,(int[]){2,3});
What does it do? it initialises a god_stuff array with dim 2 and the traverse is {2,3}. suppose I wanna store 1 to 6 in that.
My data is stored like this
1 2 3 4 5 6
^
[headr][numbrs]But i wanna interpret it as something like this:
numbrs[0][0] = 1 numbrs[0][1] = 2 numbrs[0][2] = 3
numbrs[1][0] = 4 numbrs[1][1] = 5 numbrs[1][2] = 6My data is stored as 1D array, now, the strides for numbrs are:
strides[0] = 3;
strides[1] = 1;I can get numbrs[a][b] by numbrs[strides[0]*a + strides[1]*b]. So, numbrs[1][1] is numbrs[strides[0]*1 + strides[1]*1] which is numbrs[3*1 + 1*1] which is numbrs[4] which is 5. which is the same if i were to do it with a 2D std C array.
So, my strides basically say how much to move when u go a dimension above. Here, every row had 3 columns, so, for moving to the next row, u gotta go after 3 elements. Simple.
Now, that we understand how it works, we can implement them into storing and fetching desired elements!!!
You can push elements into the array, and the count will keep track of the index!
#define push(numbrs,...) push_with_size(numbrs,sizeof((int[]){__VA_ARGS__})/sizeof(int),__VA_ARGS__) //because u dont know how many elements you are pushing, so, a number which u dont need to worry about can be used internally
int push_with_size(int *numbrs,int wow,...) {
va_list arg;
god_stuff *headr = (god_stuff *)numbrs - 1;
va_start(arg,wow);
for(int i=0;i<wow;i++) {
if(headr->count >= headr->total) {
printf("\nFull!\n"); //If it exceeds how many elements I need, shape will be ruined...
return 1;
}
numbrs[headr->count] = va_arg(arg,int);
headr->count += 1; //for the next element to take the correct position
}
va_end(arg);
return 0;
}Similarly, you can summon the element by giving its coordinates:
int summon(int *numbrs,...) {
va_list arg;
god_stuff *headr = (god_stuff *)numbrs - 1;
int location[headr->dim];
va_start(arg,numbrs);
for(int i=0;i<headr->dim;i++) {
location[i] = va_arg(arg,int);
}
va_end(arg);
int val = 0;
for(int i=0;i<headr->dim;i++) {
val += location[i]*headr->strides[i]; //See? strides make it easier!
}
return numbrs[val];
}So, in numpy array, suppose numbrs is a numpy array, numbrs + 5 is valid and would add 5 to every element of numbrs.
And we can do that here too!!
But, I didn't wanna call different functions for different operations! Obviously, had to write different functions for different operations, I just wanted to call them by one single function!
Here comes _Generic, the closest thing thing to Generics in C. So, what I did is essentially function overloading but I manually had to write different functions and assign their pointers to the arguments accordingly.
#define edd(a,b) _Generic((a), \
int: _Generic((b), \ //Check for b if a is int
int: edd_num, \
int*: edd_num_array), \
int*: _Generic((b), \ //Check for b is a is int*
int: edd_array_num, \
int*: edd_array) \
)(a,b)And the functions are here:
int *edd_array(int *a,int *b);
int *edd_num_array(int a,int *b);
int *edd_array_num(int *a,int b) {
return edd_num_array(b,a);
}
int edd_num(int a,int b) {
return a + b;
}
int *edd_num_array(int a,int *b) {
god_stuff *headr = (god_stuff *)b - 1;
if(headr->make_sure != 28602529) {
ERROR("can't vectorize standard C arrays");
NOTE("enter god_stuff arrays as arguments instead"); //These are macros defined in colors.h
exit(1);
}
int *numbrs = init(headr->dim,headr->traverse);
god_stuff *sum_headr = (god_stuff *)numbrs - 1;
for (int i=0;i<headr->count;i++) { //We don't know, the array might not be filled completely, so, we can't update the count of `sum` to the total, so, to avoid that, we iterate it like this.
push(numbrs,b[i] + a);
}
return numbrs;
}
But, we gotta be more safe while adding two different arrays, cuz different shape arrays cant be added. So, a function for that would be enough.
int check_shape (int *a,int *b) {
god_stuff *headr1 = (god_stuff *)a - 1;
god_stuff *headr2 = (god_stuff *)b - 1;
int shape = 1;
if (headr1->make_sure != 28602529 || headr2->make_sure != 28602529) {
shape = -1;
return shape;
}
if (headr1->dim != headr2->dim) {
shape = 0;
return shape;
}
for(int i=0;i<headr1->dim;i++) {
if(shape == 0) {
break;
}
shape = (headr1->traverse[i] == headr2->traverse[i]);
}
return shape;
}This returns -1 if the array is not having proper metadata, and 0 if the dimensions or shape don't match.
So, our edd_array is basically:
int min(int a,int b) {
return (((a-b) >= 0) ? b:a);
}
int *edd_array(int *a,int *b){
god_stuff *headr1 = (god_stuff *)a - 1;
god_stuff *headr2 = (god_stuff *)b - 1;
if (check_shape(a,b) == 0) {
ERROR("can't add two god_stuff arrays of different shapes"); //Prints that specific error message and exits
exit(1);
}
else if (check_shape(a,b) == -1) {
ERROR("can't perform 'edd' operation on standard C arrays");
NOTE("enter god_stuff arrays as arguments instead"); //Prints that specific error message and exits
}
int *sum = init(headr1->dim,headr1->traverse);
god_stuff *sum_headr = (god_stuff *)sum -1;
for(int i=0;i<min(headr1->count,headr2->count);i++) { //We don't know, some elements of one array might not be filled completely...
push(sum,a[i]+b[i]); //this keeps track of count too!
}
return sum;
}This was enough, but I just wanted to add colors for the error messages and its corresponding note... So, went ahead and wrote the colors.h. It has the ERROR and NOTE macros.
The output for the program written in trial.c is:
What if I somehow wanted to log the god_stuff arrays initalised? Like after initialising, I somehow get a record that it was initilaised correctly? with the dim and shape and all documeneted in some sort of a temporary file. And after the program ends, I can go into that file and see what happened? It sounds so cool, right?
Well, how do we approach that? The first approach I thought was something like this:
So, we initialise the header file, i. e. the metadata in the function init, so, we can print into the logs from that fn, that god_stuff array is initialised...
Thats what I did:
fprintf(f,"\ngod_stuff array initailized\nlocation (of data): [%p]\ndimension: %d, shape (row major order): ",numbrs,dim);
for (int i=0;i<headr->dim;i++) {
fprintf(f,"%d%s",shape[i],(i == (headr->dim)-1) ? "\n":", ");
}If we do this:
int *numbrs = init(2,(int[]){2,2});that would print the output into logs.txt as something like:
god_stuff array initailized
location (of data): [0x2b579d38]
dimension: 2, shape (row major order): 2, 2This is fine, but i was bored and I had just completed writing a lexical analyzer a week ago, so I got a bit ambitious... (Next section is probably the coolest part in this README...)
What if i also wanted the name of the integer pointer whose metadata is in my logs.txt? That'd be so cool, right?
My initial idea was something like this:
source file -> tokenizer ------------> (get the identifier name just before that) -> put that in a temp file -> interesting.h reads from that temp file -> knows the name puts that in "logs.txt" -> Done!!!
read "init" (that must be our variable name)
About the tokenizer:
It tokenizes the C source file and categorizes the tokens into identifiers, numbers, keywords, operators, comments, whitespaces, punctuations, etc. It is similar to a finite state machine, has different modes when tokenizing that stuff...
Anyways, it can identify tokens, so obviously, when calling init(), the identifier just before it has to be the name of the ptr, right? No? It has to be! It's a rule!!
Now that being said, int *numbrs = init(); and
int *numbrs;
numbrs = init();both follow that. So, somehow, if we got that name of that identifier just before init, (init is a an identifier), put that name into input.txt, and i could read from that file in the header file...
Did some modifications in the tokenizer:
else if (is_identifier(token)) {
//fprintf(f,"identifier %s\n",token);
if (strcmp(token,"init") == 0) {
strcpy(name,temp_token); //since the prev token was the name!!
fprintf(h,"%d %s\n",init_count,temp_token); //print that name with index in the file (input.txt)
init_count += 1;
}
strcpy(temp_token,token); //store the current identifier into temp_token
return 1; //1 is the index of identifier in the category, which is used for printing, read that for undertanding it better...
} And, in the interesting.h, we add this in the init():
fscanf(g,"%d %s",&ptr_count,name); //read the name from the file(input.txt)
//printf("%d %s\n",ptr_count,name);
fprintf(f,"\ngod_stuff array initailized\n\nname: %s\nlocation (of data): [%p]\ndimension: %d, shape (row major order): ",name,numbrs,dim);
for (int i=0;i<headr->dim;i++) {
fprintf(f,"%d%s",shape[i],(i == (headr->dim)-1) ? "\n":", ");
} //and print the name into logs.txtso, doing this:
int *numbrs = init(2,(int[]){2,3});
int *stuff = init(3,(int[]){3,3,3});would print into logs.txt as:
god_stuff array initailized
name: numbrs
location (of data): [0x2b579d38]
dimension: 2, shape (row major order): 2, 3
god_stuff array initailized
name: stuff
location (of data): [0x2b57bdb8]
dimension: 3, shape (row major order): 3, 3, 3
But this has flaws...
Like, what if my init was inside a conditional statement? My tokenizer puts the names with indices in the order it encounters them, so, i can have an if condition and inside that i can do init, and another init in the else block, my tokenizer would put both of them in input.txt and that would be not correct, if the condition was not satified, then the one in the else got initialised, but obviously the one in the if block's name would be written for the metadata of the one in the else block...
The problem is that the condition being satisfied or not is something we know at runtime... So, we cant do anything pre-run time or complile time...
Only somehow, if we give the name of the pointer as one of the arguments while calling init(), then, only the ones truly initialised would be logged, that's what we need to do!!
Okay, I can't really explain everything, but the idea is that we tokenize, and put everything we have encountered before init() into a temp stuff.c, then switch chaos to 8, yeah, after millions of switching wtfs, that is something really simple. Now, if we are in chaos == 8 mode, then when u encounter ;, that means your init line has ended. So, u find the first ) before that, and insert ,"name"); into that (we are doing this all in stuff.c because overwritting our main source file would obviously lose the data, i. e. the stuff written after the ;, so doing that in stuff.c would be much much better!).
I did that with fseek and fprintf like this (this is inside the ispunct() condition):
if ((unsigned char)c == ';' && chaos == 8) {
//printf("at the end\n");
fseek(temp,-2,SEEK_CUR); //go back 2 characters.
char d = (unsigned char)fgetc(temp); //going back 2 characters and doing fgetc is bacically getting the character just before where u initially were because using fgetc advances by one character, so, ftell before fgetc and ftell now would differ by 1.
while (d != ')') {
fseek(temp,-2,SEEK_CUR);
d = (unsigned char)fgetc(temp);
} //u know, we can have white spaces between the ) and ;
fseek(temp,-1,SEEK_CUR); //u are currently at the ')', so go back one character.
fprintf(temp,",\"%s\");",name); //put the thing here,
chaos = 0; //switch chaos to 0 again, our work is done!!
continue;
}What we are doing essentially, as mentioned before is when we find init, we switch to chaos = 8 and wait for ;. And then, go back to the closest ) and one step back from there, and then inject the string.
When we find ;, it could have been something like init(...) ; which means, directly going one step back wont help. Somehow, we need to ensure that we landing on the ) just before ;, for that the while loop is there...
This works for all!! Even conditional!!
So, doing this:
if (yes == 0) {
int *numbrs = init(1,(int[]){6});
}
else {
int *stuff = init(2,(int[]){2,2});
}and if yes is 0, then the log would say:
god_stuff array initialised
name: numbrs
location (of data): [0x2b579d38]
dimension: 1, shape (row major order): 6else it would say soemthing like:
god_stuff array initialised
name: stuff
location (of data): [0x2b579d38]
dimension: 2, shape (row major order): 2, 2Hmm, all of this is good until, the user does some shit like printf("ex ptr: %p\n",init(3,(int[]){3,3,3})); now, our init call has no variable name!
In the logs, it'd be logged something like name: printf because that is the identifier just before init.
To solve that we can use our chaos! So, the logic goes like this:
Whenever u are assigning a ptr to a variable, i. e. using the =, switch to chaos = 9 and then when u encounter init, check for chaos == 9 if it is, then copy the name from temp_token to name, else, put the name as something like init was declared without a ptr name...
But, after knowing that our = wasnt used or used for our init declaration, we need to switch it back to 0. So, somthing like this would work:
if (((unsigned char)c == ';' || (unsigned char)c == ',') && chaos == 9) {
chaos = 0; //Line ends or one assigning is done, then switch back. U cant assign 2 ptrs or values to one variable anyway...
} if (chaos == 9) {
//printf("bonjour, je suis chaos et je est 9\n");
strcpy(name,temp_token);
fprintf(stdout,"found init, and the variable name was %s\n",name);
}
//fprintf(f,"chaos est %d\n",chaos);
//fprintf(stdout,"found init, and the variable name was %s\n",name);
//fprintf(h,"%d %s\n",init_count,temp_token);
if (chaos != 9) {
strcpy(name,"init was called without any ptr name");
fprintf(stdout,"found init, and the variable name wasnt declared\n");
}Now, taking the previous example of printf("ex ptr: %p\n",init(3,(int[]){3,3,3})); doing the fseek surgery wont work here, because the outermost bracket doesnt belong to init but printf! Doing that fseek surgery would create code thats can't be compiled!
The approach I used here was to track the nestedness (if that's a word) of my position... Like, in init(...) u can see that the identifier init and the outermost ) are in the same depth and the arguments inside init, i. e. ... have depth one more than init, cuz they are one level inside the (... This seems simple enough, so, wrote a function that does exactly this.
Declare depth as global variable and set it to 0 in main().
void nested (unsigned char c) {
depth += ((c == '(') ? 1:-1);
}and we need to call it from the ispunct() condition:
if (i == 0 && ((unsigned char)c == ')' || (unsigned char)c == '(')) {
nested((unsigned char)c); //since we are calling that function iff we encounter '(' or ')', we can assert that we dont run into errors here... We still might, but this is just a fun project!!!
}After we know the depth of our position, we can do the surgery in a much better way!!!
You see, we do something like this:
encountered init && chaos is 9 ---> switch chaos to 8 ---> After the first encounter of '(' ----> switch chaos to 10 -----> find the ')' which is at depth = init_at_depth, then go one character back and put the name string!
(and store the depth at init_at_depth) (the mode for searching the ')' with same depth as init)So, implemented exactly that:
if ((unsigned char)c == '(' && chaos == 8) {
chaos = 10;
}
if ((unsigned char)c == ')' && chaos == 10 && depth == init_at_depth) {
fseek(temp,-1,SEEK_CUR);
fprintf(temp,",\"%s\")",name);
chaos = 0;
continue;
}
Now, this is independent of ; or ,! So, this would work everywhere! Other than loops obviously because arr[i] = init(...); would need a parser to identify arr[i] as the thing i am assigning my ptr to...
Anyways, this whole process would convert this:
int *stuff = init(2,B), *numbrs = init(3,A);
int p = 3;
int *numbrs = &p;
init(p,A);
printf("ptr = %p\n",init(3,A));to:
int *stuff = init(2,B,"stuff"), *numbrs = init(3,A,"numbrs");
int p = 3;
int *numbrs = &p;
init(p,A,"init was called without any ptr name");
printf("ptr = %p\n",init(3,A,"init was called without any ptr name"));See, it works when we call inside from inside the printf() too!!
So happy!!!
freeing the memory after using it, so u can manually do that by calling free_ptr() or u can say adieu to all of them by goodbye_everybody().
I used an std C array of god_stuff pointers to get some sort of an index and put all the pointers of the headrs in one place...
god_stuff *people[CAPACITY];
int how_many;and then initialise them in main().
Then index them like this in init():
headr->index = how_many;
people[how_many] = headr;
//printf("%p bearing %s est %d\n",people[how_many],people[how_many]->name,how_many);
how_many += 1;free_ptr() is defined this way, just to you know, not mess up the other free_ptr()'s calls:
void free_ptr(int *numbrs) {
//printf("%s\n",headr->name);
//printf("%s %s\n",headr->name,people[headr->index]->name);
god_stuff *headr = (god_stuff *)numbrs - 1;
god_stuff *temp = people[headr->index];
people[headr->index] = people[how_many - 1];
people[how_many - 1] = temp;
people[headr->index]->index = headr->index;
people[how_many - 1]->index = how_many - 1;
//printf("%s\n",headr->name);
printf("%s %s\n",people[headr->index]->name,people[how_many - 1]->name);
how_many -= 1;
char* to_put = "\nfreed up %s [%p] from existence\n";
free(headr->traverse);
free(headr->strides);
fprintf(f,to_put,(headr->mode == 0) ? headr->name:"an array not explicitly declared by the user but was probably needed",numbrs);
free(headr);
}The reason for such weirdness of this simple freeing the ptr call is to make sure that whatever you did didn't somehow mess up for other calls.
If it were to be a simple call, which would display the message and free the pointer, then when we use something like goodbye_everybody(), we'd be doomed with bugs we haven't seen yet.
This whole array is dependent on the indices, if I desired to free some pointer in between the array, then it would create problems for goodbye_everybody(). So, we swap them (the last one and this one) and then decrease the count by 1. That way, all my valid pointers are together and I can access them through indices.
So, we gotta swap the indices too, because if we dont do that then the last one would have the index as how_many - 1 even if it is now sitting at the first place. And since we dont know at which position headr is, the only way to access it is by headr->index. And if we didn't have changed that index, then headr->index would've shown it original index and not the updated one, which obviously we don't want.
Then, for all of them, call goodbye_everybody() from main() which is defined:
void goodbye_everybody() {
int i = how_many - 1;
for (;i >= 0;i--) {
free_ptr(people[i]);
}
}which frees everything (declared by us or by the user)
To put this all together, I used make:
start_machine: tokenizer.c
clang tokenizer.c -o tokenizer
get_names: tokenizer $(MAIN)
: > stuff.c && ./tokenizer $(MAIN) //wipe stuff.c, if doesn't exist, then create, and tokenize the source file.
do_stuff: stuff.c input.txt
clang stuff.c -o stuff && ./stuff //run stuff
run: start_machine get_names do_stuff //call all the targets together!So, just make run MAIN=trial.c would show u the magic!!
I even used this in my Linear_Regression.c, obviously after a bit of modification, and it worked!!
You might see init_with_name(0,...), that is basically the mode, 0 is for the arrays decalared by the user, only those are logged, yet... and 1 is for the arrays decalared in some function like edd_num_array() or edd_array()... These don't need to be logged, yet.
The logging in init_with_name() happens only when mode is 0...
I recently got to know that c++ has built in Generics.
In the C part, we had to do the function overloading manually, i. e. _Generic() but in c++ you can just write the same function and make it do different things (redeclare it, etc.) based on the arguments' types and during compile time, it would create different functions based on the arguments' type. It seemed interesting, so I tried doing that in c++, commented out the _Generic() part and changed the functions' names to edd() in interesting.hpp and changed the Makefile to include the c++ compilation stuff.
And guess what? It works!
This is the output for trial.cpp which is the same as trial.c except it includes the interesting.hpp as header file which has the _Generic() commented out...
just doing make walk MAIN=trial.cpp would give this:
- Writing this was fun.
- This is cool af!!
- I'll use this all the time!! (advise u to do it too!!)
- The Saltwater Room is such a great song!!!!

