jakx-c-kernel-decompiled

A repository to work on a part of the Jak X decompilation, with the purpose of porting the game to PC. This port requires the decompilation of the C++ kernel and the GOAL code that is run by that kernel.

Decompilation overview

C++ decompilation

This project's primary purpose is to provide the game/jakx and common/jakx C++ (kernel) code for the OpenGOAL project, which is necessary to run Jak X. As the game also uses networking, the secondary goal is to recover the labels of the remaining SCE-RT functions (Medius, etc.) so that the game can hopefully be connected to unofficial fan-hosted servers. Even if we step away from Medius code on the client side of the game, it is still useful to know the names of the network-related C functions that are called from GOAL code during GOAL decompilation. This way, people decompiling that GOAL code can better guess what the code calling those functions is doing.

GOAL decompilation

As said, this project focuses on the C++ part of the codebase, with the intention of merging it back into the OpenGOAL project later. Regarding the part of the codebase that consists of GOAL code, initial work has already been done to start decompilation in this pull request.

Code overview

The following code can be found in this project:

  • elf/kernel/jakx and elf/kernel/common: The Jak X C++ code. For now, this is still pseudo-code that has to be converted to valid and buildable C++ code. Next, it will have to be debugged so that it behaves as expected or similarly to Jak 3's kernel, where applicable.
  • elf/cpp-dump: The pseudo-code generated by Ghidra with all the symbols we have added. Months of work were poured into it, so it is often nowhere near what an export of the original ELF would look like. It is useful as a reference for the binary, in case that becomes necessary and I am unavailable.

ELF overview

Using one of the scripts, you can generate an exhaustive label overview of the ELF. There are 4512 occurrences of the pattern "function" in that dump, which should be the exact number of functions reversed. There might be a handful of functions that have not been disassembled (discovered) yet, but that number should be low. There are 1310 occurrences of the regex pattern "FUN_........", hence a little under a third of all functions have no name information at all (yet). There are 1659 occurrences of the regex pattern ".*FUN_.........*", which should be the number of functions that could not be matched; additional information is available on about 350 of them (1659 - 1310 = 349). These functions are not necessary to port the game, but knowing them might make it a little bit easier.
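The bookkeeping above can be sketched in a few lines of Python. This is only an illustration under assumptions: it expects one symbol name per line (the real dump format may differ), and it reads "no information" as a bare FUN_ name and "additional information" as a FUN_ name with a trailing hint such as FUN_00133cc8_addPurchase. The function name summarize_labels and the sample list are hypothetical.

```python
import re

# Assumption: a bare Ghidra default name is FUN_ plus eight hex digits.
BARE = re.compile(r"^FUN_[0-9a-f]{8}$")
# Assumption: an annotated name keeps the FUN_ prefix and appends a hint.
ANNOTATED = re.compile(r"^FUN_[0-9a-f]{8}_\w+$")


def summarize_labels(names):
    """Count bare, annotated and fully named function labels."""
    bare = sum(1 for n in names if BARE.match(n))
    annotated = sum(1 for n in names if ANNOTATED.match(n))
    return {
        "total": len(names),
        "bare": bare,
        "annotated": annotated,
        "named": len(names) - bare - annotated,
    }


# Hypothetical sample; the names are taken from examples in this README.
sample = ["sceFsReset", "FUN_002730e4", "FUN_002732e4",
          "FUN_00133cc8_addPurchase", "MC_run"]
summary = summarize_labels(sample)
print(summary)
```

Running the same counts after each matching session gives a rough progress measure without opening Ghidra.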

Symbol matching

In order to make sense of the ELF, I've been primarily adding symbol names, which is what I refer to as "symbol matching". Unlike more efficient approaches, I've been working from the ground up: I've invested time in adding as many symbols as possible before trying to export decompiled code.

Matching sources

I matched against several BSim servers that held the definitions of the following games. I looked at games from around the same date that would share a codebase (i.e., Jak and Daxter, obviously, Medius, or simply anything from around 2005, the release year of Jak X). Retro Reversing has listed a few PS2 games with unstripped symbols and PS2 demos with symbols.

Next to that, I also got symbols from the PS2SDK project, where I could compare strings at best, or compare enums or other variables at worst. Sometimes I also copied over signatures from there.

Naming convention

My naming scheme changed over time as I noticed I needed to be more precise about where my symbols came from and how well I could trust them. This means that I cannot describe something here that will definitely fit all situations. I usually copied over the names and added a suffix, e.g. foo_G.

  • _G: Usually, I find these symbol names in other guess symbols, but I'm not entirely sure as they are not matching perfectly. Careful though, the names may also be completely made up, so check the above reference symbols. With global variables, this is usually made clear by using ALL_CAPS_STYLE_G, but not consistently.
  • _S: These symbol names are based on a string. This means I'm already confident I'm correct (I used to use _G or _Q for this as well). Later on, I typically remove these labels either way. Some names cannot be verified, and in those cases I rarely removed the suffix. (Example: FUN_00133cc8_addPurchase calls addPurchase_S and also prints "addPurchase error" after checking its result.)
  • _Q: These symbol names are either guessed by matching functions recursively in BSim search windows, or based on strings. In general, I was moderately confident they were right, but wanted to come across another occurrence I could verify to be absolutely sure. I later started to use _S for string sources that would give away a name.
  • _W: I'm guessing a name wildly, based on some function body or data structure that is related somewhere. (I used to use _G for this as well.)
  • _T: The source is one of the tables; depending on how certain I am, it will be combined with W, G or nothing.
  • _M: These symbols are usually structures that were given away by the memory dump and otherwise hardly visible.
  • no suffix: This may mean I'm sure it's correct, or that I named the symbol that way early on when I wasn't careful, in which case it might be made up or simply a guess.
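When reviewing exported symbols, it can help to split a name into its base and its provenance suffix. The sketch below is an illustration, not one of the repository's scripts: the helper name split_suffix is hypothetical, and it assumes that a combined suffix is written as adjacent letters (such as _TW), which the list above does not specify. It only looks at a suffix at the very end of a name, so a name like sceMc2Init_G_Proxy is left untouched.

```python
import re

# Suffix letters from the list above, with a short summary of each source.
SUFFIX_SOURCE = {
    "G": "guess from similar symbols",
    "S": "string",
    "Q": "BSim or string, awaiting confirmation",
    "W": "wild guess",
    "T": "string table",
    "M": "memory dump",
}

# Assumption: one or two suffix letters directly after the last underscore.
_SUFFIX = re.compile(r"^(.*?)_([GSQWTM]{1,2})$")


def split_suffix(name):
    """Return (base_name, list_of_suffix_letters) for a symbol name."""
    match = _SUFFIX.match(name)
    if match is None:
        return name, []
    return match.group(1), list(match.group(2))


for symbol in ("addPurchase_S", "foo_TW", "MC_run"):
    base, letters = split_suffix(symbol)
    sources = [SUFFIX_SOURCE[letter] for letter in letters] or ["no suffix"]
    print(base, sources)
```

A name without a recognized suffix comes back unchanged with an empty list, which matches the "no suffix" case: it says nothing about how trustworthy the name is.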

Symbol Transfer

The addresses of the binary dump from PCSX2 and the decrypted ELF match exactly. As most of the initial work was still in the memory dump, which was the most relevant for the decompilation of the game/jakx and common/jakx C++ code, I created a few scripts that go through the code and, usually interactively, ask whether to override a symbol or not.

To execute them, I simply copied the script's code into Ghidrathon (the Python window should normally work too, with some small changes). Most scripts are horribly coded but work fine, and they allow you to quit execution at any time. It is recommended, though, to understand at least minimally what the scripts do; they are small anyway. If you're not familiar with the Ghidra API, some knowledge of it is always handy. You could ask Perplexity/ChatGPT, as they know the API surprisingly well!

Note however that a range of sce functions have an 8-byte address mismatch with my memory dump, for some reason. That might be a bug in the function label porting script, where I used +/- to offset and navigate the code, so it's possible you won't come across it.

                      **************************************************************
                      *                          FUNCTION                          *
                      **************************************************************
                      int __stdcall sceFsReset(void)
      int               v0_lo:4        <RETURN>
      undefined8        Stack[-0x10]:8 local_10                                XREF[2]:     001192cc(W), 
                                                                                            001192e4(R)  
                      sceFsReset                                      XREF[1]:     InitIOP:00269774(c)  
001192c0 f0 ff bd 27     addiu      sp,sp,-0x10
001192c4 1f 00 02 3c     lui        v0,0x1f
                      sceFsReset
001192c8 20 00 04 3c     lui        a0,0x20
001192cc 00 00 bf ff     sd         ra,0x0(sp)=>local_10
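One way to cope with such shifted labels when porting is to accept exact address matches first and only then try a fixed offset. The following is a minimal sketch in plain Python rather than the actual Ghidra script: plan_label_transfer, its parameters and the two "hypothetical_" entries are assumptions for illustration, and the direction of the shift is not assumed, so both +8 and -8 are tried.

```python
def plan_label_transfer(dump_labels, elf_function_starts, shift=8):
    """Map labels from memory-dump addresses onto function starts in the ELF.

    dump_labels: dict of address -> name taken from the memory dump.
    elf_function_starts: iterable of function entry addresses in the ELF.
    Returns (plan, unmatched), where plan maps ELF address -> name.
    """
    starts = set(elf_function_starts)
    plan = {}
    shifted = []

    # First pass: labels whose address is a function start always win.
    for addr, name in sorted(dump_labels.items()):
        if addr in starts:
            plan[addr] = name
        else:
            shifted.append((addr, name))

    # Second pass: accept a label that sits exactly `shift` bytes away from
    # a single, still unclaimed function start; report everything else.
    unmatched = []
    for addr, name in shifted:
        candidates = [a for a in (addr - shift, addr + shift)
                      if a in starts and a not in plan]
        if len(candidates) == 1:
            plan[candidates[0]] = name
        else:
            unmatched.append((addr, name))
    return plan, unmatched


# sceFsReset is recorded 8 bytes past its entry point, as in the listing.
dump_labels = {
    0x001192C8: "sceFsReset",
    0x002730E4: "hypothetical_exact",
    0x00300000: "hypothetical_orphan",
}
elf_function_starts = [0x001192C0, 0x002730E4]
plan, unmatched = plan_label_transfer(dump_labels, elf_function_starts)
for addr, name in sorted(plan.items()):
    print(f"{addr:08x} {name}")
```

Labels that end up in unmatched are the ones worth checking by hand, since neither an exact nor a shifted function start was found for them.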

In one of the scripts, you might get the error below when you try to apply a signature override in Ghidra. (I came across it when doing this manually.) It should occur whenever the function call has a BLUE label "ptr_addr1_addr2". If the label is either WHITE or simply "LAB_addr", then it's fine and the error shouldn't happen.

Error overriding signature: ghidra.util.exception.InvalidInputException: DataTypeSymbol has a reference
---------------------------------------------------
Build Date: 2024-Jun-07 1416 EDT
Ghidra Version: 11.1
Java Home: C:\Program Files\Eclipse Adoptium\jdk-17.0.11.9-hotspot
JVM Version: Eclipse Adoptium 17.0.11
OS: Windows 10 10.0 amd64
Workstation: REUBUS

To stop the above error from appearing (for that function call signature override), simply remove the ptr label, so that it becomes a blue "LAB_addr" label, and try again. You might then be able to execute the signature override; I wasn't, as I think I ran into a bug that might get fixed in the future. (The code is of course perfectly fine ;p)

String Tables

From what I understand from Perplexity, string tables are used for symbol resolution and serve as debugging information. All 10 that I found reside in .text, but sadly, somewhere around function entry, they stopped appearing. I have labeled the start of each string table with CPP_FILE.

I've used a prompt to apply these names, as Perplexity is very good at transforming these strings into signatures, but the results need double-checking. Additionally, the return types appear to be unreliable (are they even part of those strings?). Further down in this conversation, you can find a few examples that are useful for learning how to interpret these strings. The prompt that gave okay results is the following:

I'm trying to figure out the signature of a symbol that I found in a C++ string table for symbol resolution and debugging. What would be the signature of the following mangled name: _videoCallbackEP7sceMpegP16sceMpegCbDataStrPv.
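The parameter part of such a string can also be decoded by hand, which is a useful cross-check on the model's answer. The sketch below assumes the text after the "E" follows the Itanium C++ ABI type encoding (P for pointer, K for const, a length-prefixed class name, single letters for built-in types); whether these tables really use that scheme is an assumption. In that scheme, the return type of an ordinary non-template function is not encoded at all, which would explain why return types in the answers are unreliable. Substitutions, templates and references are not handled, and decode_params is a hypothetical helper.

```python
# Built-in type codes from the Itanium C++ ABI mangling scheme (subset).
BUILTINS = {
    "v": "void", "b": "bool", "c": "char", "a": "signed char",
    "h": "unsigned char", "s": "short", "t": "unsigned short",
    "i": "int", "j": "unsigned int", "l": "long", "m": "unsigned long",
    "x": "long long", "y": "unsigned long long", "f": "float", "d": "double",
}


def _decode_type(encoded, pos):
    """Decode one type starting at pos; return (type_text, next_pos)."""
    ch = encoded[pos]
    if ch == "P":  # pointer to the type that follows
        inner, pos = _decode_type(encoded, pos + 1)
        return inner + "*", pos
    if ch == "K":  # const-qualified type that follows
        inner, pos = _decode_type(encoded, pos + 1)
        return inner + " const", pos
    if ch.isdigit():  # <length><name>, e.g. 7sceMpeg
        end = pos
        while end < len(encoded) and encoded[end].isdigit():
            end += 1
        length = int(encoded[pos:end])
        return encoded[end:end + length], end + length
    if ch in BUILTINS:
        return BUILTINS[ch], pos + 1
    raise ValueError(f"unsupported code {ch!r} at offset {pos}")


def decode_params(encoded):
    """Decode a run of parameter types into a list of C++ type strings."""
    params, pos = [], 0
    while pos < len(encoded):
        text, pos = _decode_type(encoded, pos)
        params.append(text)
    return params


# Parameter part of the example name above (everything after the "E").
params = decode_params("P7sceMpegP16sceMpegCbDataStrPv")
print(params)
```

Under these assumptions the example decodes to three pointer parameters: a sceMpeg pointer, a sceMpegCbDataStr pointer and a void pointer.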

Todo list

Large tasks:

  • Although I don't expect to gain much from it, one can try to match the functions against those of Jak 1 or Jak 2.
  • gcc2_compiled. functions in other (demo) games apparently have additional labels that give away their names. I noticed this too late, but matching these gcc functions would help to locate other nameless functions.
  • Apply mangled symbol names from tables (under CPP_FILE) if reliable.
  • Find source of orphaned strings (001eba50, 001e78b0, 001e78d0, ``)

Less important details to check:

  • Why is IOP_MODULE_DATA_WS not referenced by _AddModuleArgs? It occurs in the function's body.
  • Is DAT_001f63e0 (or lower) an array of thread ids?
  • What are these functions for?
    DAT_001f5b78_func1 = 0;
    DAT_001f5b7c_func2 = 0;
    DAT_001f5b80_func3 = 0;
  • Compare print functions with new sources. NOTE: the print functions are a mess; don't try to fix their names, as the sources will contradict each other. (For example, fiprintf calls _vfiprintf_r in every binary, but differently in each.)
  • Iterate over all matches of the regex pattern 0x(1|2)[0-9a-f]{5} to find addresses that should be labeled instead. Currently, there are 787 in the decrypted ELF.
  • MC_run is very large in Jak 3, but consists of 4 function calls in Jak X. This might also expose the missing functions: mc_get_filename, mc_get_filename_no_dir, mc_print, mc_get_total_bank_size, mc_checksum if they exist. (No bank0,1,2... found, but a save1,2,3,4 does exist.)
    void MC_run(void) {
      s32 sema_id = DAT_002d3908_mc_sema_id;
      WaitSema(DAT_002d3908_mc_sema_id);
      FUN_002730e4(); // only also called in FUN_0027387c
      FUN_002732e4(); // only also called in FUN_0027387c too
      SignalSema(sema_id);
    }
  • Attempt to find cb_reprobe_format, cb_unformat, cb_reprobe_createfile, cb_reprobe_save, ... (some of which are listed in kmemcard.h).
  • Find the exact start of the array at 00283740 of 0x8c0 sized elements? Also see sceMc2Init_G_Proxy where uVar6 * 0x230 as well as uVar6 * 0x8c0 occur.
  • Find the exact start of the array over 00283864, 00283874 of 0x230 sized elements (see mc_get_secrets_S)
  • Find the exact start of the array over 00283860 of 0x230 sized elements (see MC_shutdown_G)
  • Find usage of mc_slot_info and mc_file_info
  • ... many more that I forgot to write down.
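For the raw-address item in the list above, a small scan over the exported code can list every hit together with the lines it occurs on. This is a sketch that uses the regex from that item unchanged; find_raw_addresses and the sample text are hypothetical, and the addresses in the sample are borrowed from other items in the list.

```python
import re

# The pattern from the todo item: 0x, then 1 or 2, then five hex digits.
ADDRESS = re.compile(r"0x(1|2)[0-9a-f]{5}")


def find_raw_addresses(text):
    """Return a dict mapping each matched address to its line numbers."""
    hits = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ADDRESS.finditer(line):
            hits.setdefault(match.group(0), []).append(lineno)
    return hits


# Hypothetical excerpt of exported pseudo-code.
sample = (
    "puVar1 = (undefined4 *)0x1f63e0;\n"
    "foo(0x283740, 0x10);\n"
    "bar(0x1f63e0);\n"
)
hits = find_raw_addresses(sample)
for address, lines in sorted(hits.items()):
    print(address, lines)
```

Addresses that show up on many lines are good candidates to label first, because one label removes several raw constants at once.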

Exporting C++

Exporting C++ code is easy with Ghidra's options, but there is only so much you can clean up in Ghidra, and the code is far from perfect. In this phase, I tried to map the content of each function as closely as possible to the existing codebase from Jak 3.

My methodology was to refactor the functions one by one. If code was moved around or non-trivial changes were made, I split the work into separate commits so that I could backtrack in case I messed up. I'd recommend splitting in these situations:

  • When moving code under a label to each goto, separate for each label/block. Bring over all the remainders of the code execution until you hit either the end of a parent loop, a return statement or the end of the function.
  • Moving variable declarations.
  • Converting/Forming for or while loops.

After this is done, the tasks in the todo list below remain.

Todo list

  • Add Ptr<> or other OpenGOAL++ types.
  • Fix header files where needed.
  • Compile and run jak3 code.
  • Compile jakx code.
  • Debug jakx code, comparing with jak3's kernel.

Log

  • I dumped the PAL game's EE memory using PCSX2
  • I added a ton of debugging symbols from other games (such as R&C for SCE-RT) by comparing the functions
  • I added types (structs, enums, etc) where possible.
  • At that point, I could just reimplement the engine, because it was reversed sufficiently and I could compare with Jak 3's implementation.
  • However, I wanted to squeeze out all the information I could find, so I kept on reversing. After all, I reasoned that for the online component, it was necessary to understand what calls were made, as well as to make it easier to reverse the GOAL code.
  • Then I got fed up with having searched through GOAL code a few times without realizing it, so I decided to look again at the original, encrypted PAL version. There, I recognized a function I had already come across in my memory dump (through other games). That's how I was able to decrypt the game's ELF, with the help of Ziemas. The "decryption" prototype can be found on GitHub.
  • Then, in December 2024, I started creating scripts to port the symbols from the memory dump to the decrypted ELF.
  • In January, I started exporting the C++ code. This took much longer than expected, as for some functions, some structure types were not discovered or applied, and others were very complex and/or different from their Jak 3 version. I first modified the functions gradually so that they resemble the current Jak 3 functions as closely as possible, whilst retaining their execution paths.
  • In mid-February, all the functions have been refactored to resemble the Jak 3 version as much as possible, ignoring a few functions that are not necessary or are probably the same as in previous versions. Currently, some rough edges still need to be handled before building is possible.
