Python Data Structure collection

Tafadzwa L Nyamukapa
Analytics Vidhya

--

A datastructure_collection written in Python which helps developers and big-data scientists in implementing fast and efficient algorithms.

Package

The Pypi datastructer_collection package can be found here.

Getting Started

The datastructure_collection has three data structure classes :

  1. BinarySearchTree
  2. HashMap
  3. LinkedList

I Look forward to add more data-structures in the future.

Installation

Run the following to install package :

pip install datastructure_collection

Usage

Example

To use this package :

from datastructure_collection import BinarySearchTreefrom datastructure_collection import HashMapfrom datastructure_collection import LinkedList

1. Binary Search Tree

The Binary Search Tree operations and the time complexities are shown in the table below:

As seen from the table above the Binary Search Tree has an advantage over a Linear List because of its searching mechanism. The tree will be sorted where the left children of a node are less than the right children , and because of its sorted nature this gives it a Best Case runtime of O(logN) compared to a List search of O(N). The Worst Case of a Binary Search Tree O(N) occurs when the elements of the tree are ordered linearly i.e (elements are inserted with increasing order) eg 2 -> 3 ->4 -> 5 …

  • NB The add,remove,minValue,maxValue ,contains and valueOf operators uses the search mechanism to locate the target.

The Worst Case of a Binary Search Tree can however be improved by implementing a Balanced Search Tree with data structures like (AVL trees, splay trees, and red-black trees) which i look forward to add in the future.

Example BinarySearchTree

2. Hash Map

The HashMap is the most commonly used data structure in solving big data and mapping problems. In most data structure collection ,searching is the most important operation, and as such we need to do it fast and efficiently. Unlike most data structures like Lists, Trees which are based on key comparison when searching for a target, the Hashmap uses a concept of hashing the keys upon searching which runs at constant time O(1) to locate an index of a specific key. I have used the concept of double hashing in implementing the hashing algorithm and closed hashing/open addressing for Probing . The hashing algorithm is as follows:

Double hashing reduces primary and secondary clusters, thus reducing collisions.

The table below shows the operations and time complexities of a HashMap

From the table above a HashMap is one of the strongest data structure in implementing a map, as its fundamental core operations ie __getitem__, __setitem__,__deltitem__, runs at constant time O(1) at Best Case. The Hash Map worst case run-time can always be enhanced by implementing SortedTableMap which improves the Worst Case O(N) to O(logN), which i hope to add the data structure in the future.

Example HashMap

3. Linked List

One might ask why implement a linked List data structure if we already have a list in Python. Insertions and Deletions operations in a List requires items to be shifted when rehashing the collection . This however can be time consuming especially for a large collection of data. The add operator in a Linked List requires O(1) time where as the Python List requires O(N)

The table below shows the operations and time complexities of a LinkedList

NB. Unlike the add operation which runs at a constant time O(1) , the append operation in a linked list requires O(N) time since it has to navigate all the way down to the tail.

Example Linked List

Pull Requests

I Welcome and i encourage all Pull Requests. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Github repo can be found here

Created and Maintained by

--

--

Tafadzwa L Nyamukapa
Analytics Vidhya

Software Engineer with a huge passion for building software and explore new technologies. FinTech | CKA | Java | Django | AI | Js | System Architect | Cloud