Hitman: Codename 47 into Unity

Teario
Jan 27, 2020


A while ago I started a project where I would recreate the first level of a number of my favourite games. Unfortunately that didn’t end well, as I chose the Mega Drive game Jungle Strike as my first attempt. I assumed it’d be easy to find sprite sheets for it online but couldn’t find any, so I figured I could just rip them out of the game’s data myself. Two frustrating months later I finally lost interest.

That was over a year ago, though, and since then I’ve learnt not to get distracted by those kinds of issues. I took up the idea once more and decided this time to start with Hitman: Codename 47. The first level of this one is really simple to implement: the player simply has to take an elevator to the top of a building, use a sniper rifle and then run back to the starting position. Okay, that’s actually the second level, but I’m discounting the sanitarium escape as it’s a tutorial.

Off I went, creating a capsule as a placeholder for the characters, making a third-person camera controller, and adding some movement and some buildings. The scale just didn’t seem right, though, and how could I faithfully recreate the level without a map of the original to reference? I figured someone would have published the maps online, but no. What if I just ripped them from the game’s data myself?

It’s now one frustrating month later and I’m going to write about the experience.

My initial thoughts on getting the data out of the game were that there’d be some mod tools available and I could export a few models from the game. That would help me get the scale accurate and then I could just screenshot the maps to get an idea of the layout.

After a bit of searching I really couldn’t find any tools that would do this for me, but there was information on unpacking the data format, so I started browsing around the game’s files. The file structure makes it fairly easy to guess what the data is going to be. Each campaign from the game has its own folder, and each level that makes up the campaign has a set of zip files containing its data. The first level is “C1_1.zip”, and I assumed C1_1_Pre.zip and C1_1_Laptop.zip were for the events that happen before going into the level (the weapon setup and map screens appear on a laptop).

Inside the zip file for level 1 are a couple of folders and some files, all named Pack or PackRepeat.

Some of these we can easily identify, as we already know a wav file is audio and a dxt is texture data. I guessed ANM would be animation related and LGT might be lighting data. I knew from my prior searches that the spk file contains the data for the level, including meshes. Curious, I also took a look into the Scripts folder and sure enough there are some sdl files in there with the code easily readable.

For the spk format I expected the data to begin with a FourCC, which appeared to be the case. Next came some unidentifiable data followed by a path to a script. Some other text was readable too; it seemed like this might be a bit of configuration data.

Scrolling further through the file there is other data, like what might be animation names and sound file names, and a whole lot of stuff I couldn’t identify. Fortunately I had come across a repository with a file describing the SPK format. Using the information there as a reference, I started writing code to extract the contents.

PNAM

Very quickly I was able to get at the different “chunks” and start listing them in the console. (I called them sectors in my code because I assumed “chunks” was just what Alex Kimov had decided to call them. Later I found references to ‘chunks’ in the actual game data, so I guess that’s the real term, but I’m going to call them sectors here so it matches my code.) Once the sectors were being extracted, I decided to tackle the “PNAM” data first, which holds the names of the objects.

That was actually quite straightforward, as they are all just null-terminated strings packed in a row. I could read the bytes into a buffer and convert it to a string each time I reached a null byte.

The data in the spk file viewed in a hex editor
The same data extracted and logged in Unity
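As a rough C# sketch of that approach (not the project's actual code), assuming the PNAM body is already in a byte array and the names are plain ASCII: the hypothetical `ReadNames` helper below keys each name by the byte offset where it starts, which is convenient later because other sectors refer to names by offset.

```csharp
using System.Collections.Generic;
using System.Text;

public static class PnamSketch
{
    // Splits a block of null-terminated strings into a lookup keyed by the
    // byte offset at which each string starts.
    public static Dictionary<int, string> ReadNames(byte[] body)
    {
        var names = new Dictionary<int, string>();
        int start = 0;
        for (int i = 0; i < body.Length; i++)
        {
            if (body[i] != 0)
                continue;
            // Hit a terminator: the bytes since 'start' form one name.
            names[start] = Encoding.ASCII.GetString(body, start, i - start);
            start = i + 1;
        }
        return names;
    }
}
```

For example, the bytes `Hero\0Sound_Chime#0\0` give two entries: "Hero" at offset 0 and "Sound_Chime#0" at offset 5. Trailing bytes without a terminator are ignored in this sketch.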

After that easy win I decided to just go through the file in order, so the next sector would be PSCR, script data.

PSCR

The first readable script path in the data was “Scripts\AllLevels\Std\Pedestrian.sdl”. I opened up the file and saw this at the top:

NativeImport
{
    ConditionList PathList2;
    REF Path;
    int Looping;
}

The data in the spk file looks like this:

This contains the “PathList2”, “Path” and “Looping” values defined in the script.

There are also values above those, like “MaxReaction” and “MinMorale”, which don’t appear in Pedestrian.sdl at all.

At the top of Pedestrian.sdl is this:

#include "Neutral.sdl"

If we look in Neutral.sdl we don’t find either of those, but Neutral does include “Man.sdl”, and if we open that file we finally find them:

NativeImport
{
    float MinMorale = 0.0;
    float MaxReaction = 0.3;
}

Following this logic, the first values we encounter after the script name are “MaxHitpoints” and then “DNA_Override”, so by following the #include chain all the way up we eventually get to “being.sdl” and find these:

NativeImport
{
    int MaxHitpoints = 20;
    // used to override iDNACode to change the type of the being
    int DNA_Override = 0;
    Ref LevelControl;
}

This nicely lines up with what is seen in the PSCR sector. There is more data there than just some default variable overrides though. I think what’s there might actually be some sort of compiled version of the entire script, but after some digging I didn’t manage to make any more progress on this bit of the data.
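The ordering seen above (the root script's variables first, then each including script's own) behaves like a depth-first walk up the #include chain. The C# sketch below only illustrates that ordering idea; `Script`, `SdlSketch` and `CollectImports` are hypothetical, it assumes one #include per file as in the snippets quoted here, and I haven't confirmed that the compiled PSCR data follows exactly this order.

```csharp
using System.Collections.Generic;

// Hypothetical model of an .sdl file: its #include (null at the root of the
// chain) and the names in its NativeImport block, in declaration order.
public sealed class Script
{
    public string Include { get; }
    public string[] NativeImports { get; }

    public Script(string include, params string[] nativeImports)
    {
        Include = include;
        NativeImports = nativeImports;
    }
}

public static class SdlSketch
{
    // Recurses into the included file first, so the root script's variables
    // come out before those of the scripts that include it.
    // Note: include cycles are not detected here.
    public static List<string> CollectImports(
        string file, IReadOnlyDictionary<string, Script> scripts)
    {
        var result = new List<string>();
        if (!scripts.TryGetValue(file, out Script script))
            return result; // unknown file: nothing to add
        if (script.Include != null)
            result.AddRange(CollectImports(script.Include, scripts));
        result.AddRange(script.NativeImports);
        return result;
    }
}
```

With a simplified chain where Man.sdl includes being.sdl directly (the real chain has more steps in between), collecting from Pedestrian.sdl starts with MaxHitpoints and DNA_Override and ends with Looping.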

PROT & PHEA

The next sector in the file is PROT, but that chunk actually points to other chunks and it seemed like it all needed to be extracted together.

The first thing, then, was to write some code to extract the scene graph and recreate it in Unity with the correct GameObject names. The template shows this for extracting the PHEA sector data:

typedef struct
{
    UINT Unknown;
    UINT exc;
    UINT NameOffset;
    Pos2 = FTell();
    FSeek(CH.array[10].ChunkOffset + NameOffset);
    // jump to PNAM chunk
    string Name;
    // FSeek(Pos2);
    UINT Unknown;
    UINT Unknown;
    UINT16 Unknown;
    UINT16 Size;
    if (Size == 32)
    {
        UINT VertexOffset;
        UINT QuatsOffset;
        UINT TextureOffset;
        UINT FaceTextOffset;
        UINT VertexCount;
        UINT QuatsCount;
        UINT TextureCount;
        UINT Unknown;
        UINT Flags;
    };
} PHNODE;

Plus, to get to the PHEA data you first have to read the PROT data and extract the offset:

typedef struct
{
    UINT Offset;
    Pos1 = FTell();
    // jump to PHEA chunk
    FSeek(CH.array[11].ChunkOffset + Offset & bitmask);
    PHNODE PHNode;
    FSeek(Pos1); // get back to PROT chunk
    UINT MN; //
    if ((MN >> 24) == 128)
    {
        UINT SectionSize;
        UINT ncount;
        struct
        {
            PROTTree(1);
        } SUB_NODES[ncount] <optimize=false>;
    };
} PRNODE <name=getNodeName>;

This means constructing the hierarchy with correct object names requires reading the PROT to get the PHEA location, reading that to get the PNAM location, and then reading that to get the object’s name. Eventually I wanted to read each sector into its own container class that exposes its data, so rather than jumping between the PHEA and PNAM sectors while reading, I’d have some scaffolding code to put the data together afterwards. For the initial test, though, I decided to implement it the way the template does, so that it’d be easier to compare the two if I had issues.

This worked like a dream and I had the Unity scene hierarchy being generated by reading the file data.
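The PROT → PHEA → PNAM chain can be sketched as a single lookup in C#. This is a hypothetical helper, not the project's code; it assumes little-endian data, offsets relative to the start of each sector's body, and names already split into an offset-keyed lookup. The template only calls the mask `bitmask` without giving its value, so the sketch takes it as a parameter.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

public static class NameLookupSketch
{
    // Follows PROT -> PHEA -> PNAM for one node and returns its name,
    // or null if the name offset isn't in the lookup.
    //   protOffset: raw Offset value read from the PROT node
    //   offsetMask: strips any flag bits from protOffset (value unknown)
    //   pheaBody:   PHEA sector body
    //   pnamNames:  names keyed by their byte offset within PNAM
    public static string ResolveName(uint protOffset, uint offsetMask,
        byte[] pheaBody, IReadOnlyDictionary<int, string> pnamNames)
    {
        int node = (int)(protOffset & offsetMask);
        // Per the template, a PHEA node starts with Unknown, exc, NameOffset
        // (three uint32s), so NameOffset sits 8 bytes into the node.
        uint nameOffset = BinaryPrimitives.ReadUInt32LittleEndian(
            pheaBody.AsSpan(node + 8, 4));
        return pnamNames.TryGetValue((int)nameOffset, out string name) ? name : null;
    }
}
```

Keeping each step as a plain offset lookup is what later makes it easy to move the reading into separate per-sector classes and only join the data up at the end.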

Moving on to the next sector would not go quite as easily.

PCLP

The next sector is PCLP, but when reading it I kept ending up at the wrong point in the file; the id I read in would just be random garbage. This part confused me for a long time. I’d got to the point where I knew the header for each sector was a 4-byte id followed by some information packed into a 4-byte int. The template masks that info to take the lower 28 bits as the size, but from looking at each sector I suspected the size was really the lower 30 bits, with 2 flags in the top bits. At this point I discovered that the template file is for 010 Editor, so I downloaded that and ran the template through it, but it hit errors when processing the PROT chunk. Although that didn’t tell me whether my size mask suspicion was right, it did tell me the template wasn’t a complete breakdown of the file.

After some more digging I found another tool, c47edit, which also claimed it could read the file. It actually did load the spk for the first level, so I knew my answer lay somewhere in that codebase. The header parser in c47edit not only confirmed that I was right about the two flags, but could also name them for me. I updated my own header parser to read the size and both flags.

// 4-byte sector id, e.g. "PROT"
string sectorId = GetString( ref offset, 4 );
// packed info: lower 30 bits are the sector size, top two bits are flags
uint sectorInfo = GetUInt( ref offset );
uint sectorSize = (sectorInfo & 0x3FFFFFFF);
bool hasSubdata = (sectorInfo & (1u << 31)) != 0;
bool hasMultidata = (sectorInfo & (1u << 30)) != 0;
uint dataSize = GetUInt( ref offset );
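The snippet above leans on two helpers, GetString and GetUInt, that aren't shown. One plausible shape for them is sketched below, assuming little-endian data held in a byte array and an offset that advances as bytes are consumed; `SectorReader` and `ReadHeader` are hypothetical names, not the project's.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

public class SectorReader
{
    private readonly byte[] data;
    public SectorReader(byte[] data) => this.data = data;

    // Reads 'count' bytes as ASCII text and advances the offset.
    public string GetString(ref int offset, int count)
    {
        string s = Encoding.ASCII.GetString(data, offset, count);
        offset += count;
        return s;
    }

    // Reads a little-endian uint32 and advances the offset.
    public uint GetUInt(ref int offset)
    {
        uint v = BinaryPrimitives.ReadUInt32LittleEndian(data.AsSpan(offset, 4));
        offset += 4;
        return v;
    }

    // Same header layout as above: id, packed info, data size.
    public (string Id, uint SectorSize, bool HasSubdata, bool HasMultidata, uint DataSize)
        ReadHeader(ref int offset)
    {
        string id = GetString(ref offset, 4);
        uint info = GetUInt(ref offset);
        uint dataSize = GetUInt(ref offset);
        return (id, info & 0x3FFFFFFF, (info & 0x80000000) != 0,
                (info & 0x40000000) != 0, dataSize);
    }
}
```

With the top bit set on an info value of 39724 (PROT's sector size), the mask still yields a size of 39724 and `HasSubdata` comes out true.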

The sector info contains the sector size, and the next 4 bytes are the data size. So far the two sizes follow a pattern: the data size is simply the sector size minus 8, which makes sense because the sectorId and sectorInfo together take up 8 bytes. This is confirmed by the first two entries in the file:

KPS has a sector size of 4759848 and data size of 4759840.
PSCR has a sector size of 232632 and data size of 232624.
PROT has a sector size of 39724 and data size of 39720.

Okay, so PROT doesn’t add up: its two sizes differ by only 4 bytes, not 8. Additionally, PROT reports having 18 subsectors. Up to this point the only other sector reporting subsectors was KPS, with 23. Given what I knew about the data, I assumed this was a push/pop type scenario, where the chunks are laid out one after another in the file and the unpacker keeps track of how deep the hierarchy is to correctly assign a parent.

That would mean that, as the data is laid out, KPS is the first sector with children, so the next sector, PSCR, is a child of KPS. The following sector, PROT, is also a child of KPS, but since PROT has children, whatever sector comes next should have PROT as its parent. Once PROT has 18 children, any following sectors become children of KPS again.

Since this wasn’t working even after the header parsing changes, the issue was either a bug in the code or flawed logic. I tested the logic by manually looking at the data in the file and hand-crafting the hierarchy based on the sector order and the given child counts. It was flawed.

Now, this clearly doesn’t make any sense. KPS has 23 subsectors but only 2 direct children, and PROT has 18 subsectors but in my tree there’s only one child. That is not what the subsector count means.

I decided to go back to something much simpler and just try reading the sectors with a recursive function. They all follow the same structure, so if I could write a function to load one sector, it should be able to unpack them all and generate the correct tree structure. I made a function that loads a sector, reads the data, prints the sector id and then calls itself for each of the reported subsectors.

I saw it print out KPS into the console. Then PSCR, followed by PROT, followed by a blank line where I expected to see PCLP. I was further thrown off by the fact that KPS and PROT seemed to describe their contents differently. KPS reports 23 subsectors and a data size which encompasses the entirety of those 23 sectors (i.e. the subsectors are part of the parent sector’s body data). PROT reports 18 subsectors but specifies a data range that ends right before PCLP (i.e. the subsectors are not part of the parent sector’s body data). Nothing in the data for KPS seemed to suggest it should be handled any differently.

I knew that c47edit could load this file so in the end I just copied the LoadChunk function from that and pasted it into my class then converted one line at a time from C++ to C# taking care to ensure the logic was identical. Because I knew c47edit was able to load the file with no problems, I discarded the console output and just added the tree structure building code in from the start. When I ran it, this is what I ended up seeing:

Exactly the same issue!

As it turns out, the subsectors I could identify, like PCLP and PVER, are not the only sectors that exist. The first child of PROT isn’t PCLP; it’s a genuine sector, but its id is an int specifying an offset into the PHEA sector. I’d thought my code wasn’t working because the sector id wasn’t one I could see in the file, but it was actually working fine. There really is no special handling for root sectors, and the PROT data size covers all of its subsectors. I just didn’t properly understand the format of the data I was trying to process.
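To show the shape of a recursive loader, here is a C# sketch over a deliberately simplified layout. It is not the real SPK layout or c47edit's LoadChunk: I'm assuming each sector is a 4-byte id and a 4-byte info value whose lower 30 bits give the total sector size (header included) and whose top bit is the hasSubdata flag, followed, when that bit is set, by a uint32 child count and the children back to back. The point is the recursion and the id handling: four printable bytes are shown as a FourCC, anything else (like PROT's children, whose ids are PHEA offsets) as a number. `Sector`, `Load` and `FormatId` are hypothetical names.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Text;

public class Sector
{
    public uint RawId;    // id as a number, e.g. a PHEA offset
    public string Id;     // printable FourCC, or RawId as text
    public List<Sector> Children = new List<Sector>();
}

public static class SectorTreeSketch
{
    // Loads the sector at 'offset' and, recursively, its children,
    // leaving 'offset' just past the end of the sector.
    public static Sector Load(byte[] data, ref int offset)
    {
        int start = offset;
        uint info = BinaryPrimitives.ReadUInt32LittleEndian(data.AsSpan(start + 4, 4));
        var sector = new Sector
        {
            RawId = BinaryPrimitives.ReadUInt32LittleEndian(data.AsSpan(start, 4)),
            Id = FormatId(data, start),
        };
        offset = start + 8;
        if ((info & 0x80000000) != 0) // hasSubdata
        {
            uint childCount = BinaryPrimitives.ReadUInt32LittleEndian(data.AsSpan(offset, 4));
            offset += 4;
            for (uint i = 0; i < childCount; i++)
                sector.Children.Add(Load(data, ref offset));
        }
        // Lower 30 bits = total sector size; skip any remaining body bytes.
        offset = start + (int)(info & 0x3FFFFFFF);
        return sector;
    }

    // Heuristic: four printable ASCII bytes are treated as a FourCC,
    // anything else is shown as the little-endian number it encodes.
    static string FormatId(byte[] data, int at)
    {
        for (int i = at; i < at + 4; i++)
            if (data[i] < 0x20 || data[i] > 0x7E)
                return BinaryPrimitives.ReadUInt32LittleEndian(data.AsSpan(at, 4)).ToString();
        return Encoding.ASCII.GetString(data, at, 4);
    }
}
```

Printing numeric ids instead of skipping them is exactly what would have made a "blank line" like the one described above show up as an offset.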

Now that the data could be unpacked into sector objects, I wrote some code to read them and start constructing the scene again. I started by running through the PROT tree data and creating the hierarchy, using the PHEA offsets as the object names.

Once the tree structure was coming out as expected I added a lookup for the PHEA data and from there I could get the name index and use it for the GameObject names.

Seeing this was both uplifting and disappointing. I’d done a lot of work only to end up with something I’d already had in the first couple of hours. The addition that I did have this time was a far more generic loader for the data, which should be able to handle any spk file rather than just the one for the first level. I decided not to continue with the sectors in order and instead prioritise the ones that would help me build the scene.

PPOS

The sector “PPOS” is where the object positions are packed. It is just a stream of 4-byte floats, and each group of 3 floats makes up a vector describing an object’s position.

The PHEA data describes an offset into the PPOS sector body where the vector begins, so skipping to that point and reading 12 bytes is really all that is required. As I wanted to unpack the data before trying to build the scene, I simply scanned through the data at the start and built up a key-value pairing where the key is the position the data was read from and the value is a Unity Vector3 object.

Now when I want a position I can take its offset from the PHEA object and use that as a key to retrieve it from the PPOS sector class. I generated a quad for each object to make them more visible, and although I couldn’t yet tell whether the positions were correct, they were definitely not all sitting at the origin any more.
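A C# sketch of that offset-keyed lookup, under a few assumptions: the PPOS body is already in a byte array, the PHEA offsets are byte offsets into it, and both the data and the host are little-endian. `System.Numerics.Vector3` stands in for Unity's `Vector3` so the example runs outside Unity; `PposSketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public static class PposSketch
{
    // Reads consecutive (x, y, z) float triples and keys each vector by the
    // byte offset it was read from, so an offset taken from PHEA can be used
    // directly as the lookup key.
    public static Dictionary<int, Vector3> ReadPositions(byte[] body)
    {
        var positions = new Dictionary<int, Vector3>();
        for (int at = 0; at + 12 <= body.Length; at += 12)
        {
            // BitConverter uses the host's byte order (little-endian assumed).
            positions[at] = new Vector3(
                BitConverter.ToSingle(body, at),
                BitConverter.ToSingle(body, at + 4),
                BitConverter.ToSingle(body, at + 8));
        }
        return positions;
    }
}
```

Decoding everything up front like this keeps the sector classes independent of the scene-building code, at the cost of holding every position in memory.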

PVER & PFAC

It seemed like there was a big payoff in sight. If I could just extract the mesh data then I’d finally be able to see the level show up in Unity.

Pulling out the vertex data seemed simple since it was a series of vectors and I already had positions being extracted the same way. I hard coded 3 indices for the triangle data and generated the meshes. I told myself this was to see if the mesh generation code was working but I really just wanted to see even the tiniest bit of real geometry show up after all this work.

It was a pointless mess, which is exactly what I was expecting but it felt good to see. Once the indices were being read properly this would transform into the level. I made an assumption about the data for this, that it’d be a series of 3 uint32 indices that describe a triangle. That’s how I read it into the lookup table but then the lookup keys and the actual data didn’t match up. Ok, that makes sense since quads could be described with 4 indices, so I just read each individual index into its own lookup table entry. I also eventually realised the values are shorts (2 bytes wide) and not uint32s (4 bytes wide).

Now I could get numbers from the data, but they weren’t valid indices. I was getting things like index 300 for an object that only has 80 verts. At this point I’d been up late every night and I was so tired and frustrated that nothing made sense any more. For example, why would sounds have geometry data?

0x000002635acf7636 "Sound_Chime#0" Verts: 26697, 40. Tris: 10541, 36. Quads: 10649, 20.
0x000002635acf7644 "Sound_Chime#1" Verts: 26697, 40. Tris: 10541, 36. Quads: 10649, 20.
0x000002635acf7652 "Sound_Chime#2" Verts: 26697, 40. Tris: 10541, 36. Quads: 10649, 20.

Because of the indices issue I couldn’t even draw that to see what it was and figure out what object it should have been part of instead of the sound. On a different object that really should have had geometry I tried listing out the values I was extracting.

Verts: 25, Tri Inds: 36 - 0, 2, 10, 8, 2, 6, 14, 10, 6, 4, 12, 14, 4, 0, 8, 12, 156, 154, 152, 156, 152, 158, 156, 158, 160, 156, 160, 154, 6, 2, 0, 4, 14, 10, 8, 12

I just could not figure this out: 25 verts, but the indices go up to 160.
I logged more data and tested a different object, and it got even weirder.

Num Verts: 18, Num Tris: 16, Index 0 = 52428
Num Verts: 18, Num Tris: 16, Index 1 = 2
Num Verts: 18, Num Tris: 16, Index 2 = 0
Num Verts: 18, Num Tris: 16, Index 3 = 4
Num Verts: 18, Num Tris: 16, Index 4 = 6
Num Verts: 18, Num Tris: 16, Index 5 = 4
[...]
Num Verts: 18, Num Tris: 16, Index 46 = 18
Num Verts: 18, Num Tris: 16, Index 47 = 34

I began seeing patterns that may or may not have been real. For example, 52428 is a meaningless number, except that in binary it looks like this: 1100110011001100.

What is the mystery of this number?

With this conspiracy left unsolved I finally got some much needed sleep.

In the morning, as was always inevitable, I immediately saw I’d made a stupid mistake. The offset for reading the indices was specified as a number of shorts (2 bytes each), but I was treating it as a number of bytes. This meant that if the offset was 5, I’d start reading at byte 5 when I should have been reading at byte 10.
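A small C# sketch of reading the indices correctly, assuming little-endian 16-bit indices in a byte array and an offset counted in shorts. `IndexSketch.ReadIndices` is a hypothetical helper; the key line is the multiplication that converts the element offset into a byte position.

```csharp
using System;
using System.Buffers.Binary;

public static class IndexSketch
{
    // Reads 'count' 16-bit indices. 'offsetInShorts' counts 2-byte elements,
    // so it is doubled to get a byte position; using it as a byte offset
    // directly is the mistake described above.
    public static ushort[] ReadIndices(byte[] body, int offsetInShorts, int count)
    {
        var indices = new ushort[count];
        int at = offsetInShorts * sizeof(ushort);
        for (int i = 0; i < count; i++, at += sizeof(ushort))
            indices[i] = BinaryPrimitives.ReadUInt16LittleEndian(body.AsSpan(at, 2));
        return indices;
    }
}
```

With an offset of 5, this starts reading at byte 10, which is the 6th short in the buffer.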

At the sight of this guy I very nearly broke down in tears. All that effort and here it is, a wireframe Buddha. Suddenly it was all worthwhile. I went through the hierarchy and saw “Hero”. Time to see that bald head in all its outline glory.

Damn.

The data for the hitman is completely messed up. In fact the data for all characters is messed up, they appear as weird scrunched up balls. However if viewed from a very specific angle, it is possible to make sense of it.

That’s his tie, btw

I don’t think this is an issue with the code, I think they are genuinely packed this way. I assume once they have animation applied it expands into the real model.

Oh yeah and I finally figured out why sounds have mesh data.

I poked around some more and was delighted by this choice of naming convention. I guess developers are the same everywhere.

The limo seems to be made up of two limos, but I’m not sure why one is rotated. Some other objects do this too, like the helicopter in this level and a train in another level.

These aren’t the full models though. The data stores tris and quads separately and although I’d got the tris extracted I did still need to do the quads. After that I’d have to merge them together into one mesh to place on the MeshFilter.

By unpacking the triangle data into one set of GameObjects and the quads into another set, the wireframe model of the first level can be made out.

In the centre are some steps leading up to a pagoda with the Buddha statue in the middle. I added some more code to merge these into a single mesh and applied a material to be able to see it more clearly.
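One way to do that merge, sketched in C#: keep the triangle indices as they are and split every quad into two triangles, producing a single `int[]` of the kind Unity's `Mesh.triangles` expects. This assumes each quad's four indices go around its edge in order and that the quads are convex; splitting along the diagonal from a to c is my choice, not necessarily what the game does. `QuadSketch` is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class QuadSketch
{
    // Merges a triangle index list with a quad index list into one triangle
    // list. Each quad (a, b, c, d) becomes (a, b, c) and (a, c, d), which
    // keeps the quad's winding order.
    public static int[] MergeTrisAndQuads(IReadOnlyList<int> tris, IReadOnlyList<int> quads)
    {
        var result = new List<int>(tris);
        for (int q = 0; q + 3 < quads.Count; q += 4)
        {
            int a = quads[q], b = quads[q + 1], c = quads[q + 2], d = quads[q + 3];
            result.AddRange(new[] { a, b, c, a, c, d });
        }
        return result.ToArray();
    }
}
```

For example, one triangle (0, 1, 2) plus one quad (3, 4, 5, 6) gives the list 0, 1, 2, 3, 4, 5, 3, 5, 6.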

The normals were not extracted here, so there isn’t a lot of detail. But by using Unity’s utility functions we can easily calculate the surface normals and then apply a material.

It’s much better now, except for the missing faces. This is clearly a winding issue that is causing the front faces to be culled instead of the back faces. I changed the mesh generation to insert the indices counter-clockwise and bam!

It’s finally done. The models are imported.
Except why is that one bit of his hands missing?
Actually it’s being culled. If viewed upwards from inside the model it’s visible. The Buddha is all tri data except for those two missing parts which are quads, so maybe the quad winding is still not right. Which would be strange because the lamp next to the Buddha is made entirely of quads and yet it is being shown with no issues.

More exploration showed that buildings, which are almost entirely quads, were also wrong. So I had another look at the code and it turned out it really was still a winding issue. The lamp only shows up fine because you can see inside it, so it has to be modelled with faces pointing both ways.
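Unity's default back-face culling treats triangles whose vertices appear clockwise from the camera as front faces, so reversing each triangle's index order flips which side gets culled. The C# sketch below uses hypothetical helpers, with `System.Numerics` standing in for Unity's vector types: it swaps the last two indices of every triangle and shows that the cross-product face normal flips sign when the order is reversed.

```csharp
using System.Numerics;

public static class WindingSketch
{
    // Swaps the 2nd and 3rd index of every triangle, reversing its winding.
    public static void FlipWinding(int[] triangles)
    {
        for (int t = 0; t + 2 < triangles.Length; t += 3)
            (triangles[t + 1], triangles[t + 2]) = (triangles[t + 2], triangles[t + 1]);
    }

    // Face normal from the cross product of two edges; its direction
    // depends on the order in which a, b and c are given.
    public static Vector3 FaceNormal(Vector3 a, Vector3 b, Vector3 c)
        => Vector3.Normalize(Vector3.Cross(b - a, c - a));
}
```

The same swap has to be applied to triangles generated from quads too, otherwise they keep the old winding.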

Here it is, then. Hitman: Codename 47 level 1 loaded into Unity.
The code is available on GitHub.

As if this wasn’t enough of a tangent from my original idea of remaking this one level in Unity, now that I had the data from the original game I figured I could write the meshes out to files so they’d be in the project as assets.

Or maybe extracted into some other 3D modelling software for a bit of tweaking. Perhaps one day even going so far as to slice it.

For a 3D printer.
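For getting a mesh out to other tools, one common plain-text route is the Wavefront OBJ format, which most modelling tools and many slicers can import. A minimal C# writer is sketched below, assuming a vertex list and a triangle index list like the ones given to a Unity mesh; `ObjSketch` is a hypothetical name and `System.Numerics.Vector3` stands in for Unity's type.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Numerics;
using System.Text;

public static class ObjSketch
{
    // Writes vertices as "v x y z" lines and triangles as "f i j k" lines.
    // OBJ indices are 1-based, hence the +1. InvariantCulture keeps '.' as
    // the decimal separator regardless of the machine's locale.
    public static string ToObj(IReadOnlyList<Vector3> vertices, IReadOnlyList<int> triangles)
    {
        var sb = new StringBuilder();
        foreach (var v in vertices)
            sb.Append("v ")
              .Append(v.X.ToString(CultureInfo.InvariantCulture)).Append(' ')
              .Append(v.Y.ToString(CultureInfo.InvariantCulture)).Append(' ')
              .Append(v.Z.ToString(CultureInfo.InvariantCulture)).Append('\n');
        for (int t = 0; t + 2 < triangles.Count; t += 3)
            sb.Append("f ")
              .Append(triangles[t] + 1).Append(' ')
              .Append(triangles[t + 1] + 1).Append(' ')
              .Append(triangles[t + 2] + 1).Append('\n');
        return sb.ToString();
    }
}
```

Unity uses a left-handed coordinate system while many modelling tools are right-handed, so depending on the target an axis may need flipping (and the winding reversing) on export.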

Having this tiny Buddha feels like a great achievement and I bet whoever modelled it all the way back in the late 90s never thought someone would go to this much trouble to produce it in physical form. It wasn’t even my initial goal but when I was struggling to figure out all of these issues with the meshes and I realised I could print it if I got to the end, it really kept me going.

Now I’ve got the data extracted I can get on with my original idea of remaking it and having it feel just like the original did. Oh wait, what’s that weird issue with the roof?

Oh right, I forgot to extract the data for rotating the objects properly.

It’s gonna be another long night.
