Manipulating Zend Hash Table (2)

c9s
PHP Hacks
Published in
3 min readOct 20, 2015

OK, Since you’ve already knew how to “reset”, “foreach”, “next” the hash of a PHP array. We can take a closer took at Zend Hash.

Please note the code below are based on PHP 5.6.x. There are many API changes in PHP7.

HashTable

Zend Hash consists of one HashTable and many Buckets. The HashTable structure contains the infomation of table size, table mask, number of elements, free elements, internal pointer, head bucket, tail bucket…

The definition of HashTable is defined in “Zend/zend_hash.h”:

typedef struct _hashtable {
uint nTableSize;
uint nTableMask;
uint nNumOfElements;
ulong nNextFreeElement;
Bucket *pInternalPointer; /* Used for element traversal */
Bucket *pListHead;
Bucket *pListTail;
Bucket **arBuckets;
dtor_func_t pDestructor;
zend_bool persistent;
unsigned char nApplyCount;
zend_bool bApplyProtection;
#if ZEND_DEBUG
int inconsistent;
#endif
} HashTable;

The “arBuckets” points to a list of Buckets, and each element is actually a linked list.

The ‘pDestructor’ is a function which handles value destruction. The ‘persistent’ indicates if the HashTable object is in persistent memory.

Now you might ask: what is persistent in Zend Engine?

In Zend Engine, every request allocates their own memory to run PHP scripts, the allocated resources will be free after the request is finished. Nothing will be left after the request, except persistent resources.

And a persistent resource means it will be left in the engine, and it can be reused for future requests.

However the “persistent” flag only means the HashTable itself is persistent, which doesn’t mean values in the HashTable are persistent. So you have to ensure your ZVALs are allocated in persistent memory by the persistent APIs.

Ah, we went too far, let me explain the persistent memory in other stories.

Let’s go back to the HashTable, the “nTableMasks” is used to calculate the bucket pool index (arBuckets):

nIndex = h & ht->nTableMask;p = ht->arBuckets[nIndex];while (p != NULL) {….

It starts from 0 (when hash table is initialized) and grows up to “tableSize-1” when table is resized (see zend_hash_do_resize function):

if ((ht->nTableSize << 1) > 0) { /* Let’s double the table size */  t = (Bucket **) perealloc(ht->arBuckets, (ht->nTableSize << 1) * sizeof(Bucket *), ht->persistent);  HANDLE_BLOCK_INTERRUPTIONS();  ht->arBuckets = t;  ht->nTableSize = (ht->nTableSize << 1);  ht->nTableMask = ht->nTableSize — 1;  zend_hash_rehash(ht);  HANDLE_UNBLOCK_INTERRUPTIONS();}

Bucket

Now you know arBuckets are linked lists, then it’s simple enough to understand the definition:

typedef struct bucket {  ulong h; /* Used for numeric indexing */  uint nKeyLength;  void *pData;  void *pDataPtr;  struct bucket *pListNext;  struct bucket *pListLast;  struct bucket *pNext;  struct bucket *pLast;  const char *arKey;} Bucket;

The “ulong h” is used for numeric index, for example: $array[0], $array[1]… etc

You might notice there are two data pointers: pData and pDataPtr.

The ‘pDataPtr’ is used only when the data size equals to the size of (void*) pointer. (the size is platform dependent) thus it’s faster for small data since the program don’t need to lookup another address.

And other fields are just like linked-list. It’s pretty simple, isn’t it?

--

--

c9s
PHP Hacks

Yo-an Lin, yet another programmer in 21 century