Exploring Pure C Techniques for Detecting Duplicates in Array

Alexey Medvecky
6 min readMay 10, 2023

--

I’ve always been interested in “old school” programming in the 80s style. There was yet to be such a massive number of libraries and frameworks.
When every happy owner of an 8-bit computer felt like a pioneer, there were few tools and information — but how enormous was the field for creativity?

To plunge into this atmosphere a little, you can again try to solve typical problems using one of the most conservative C programming languages still in use.

The task of identifying duplicates in a one-dimensional integer array is relatively straightforward. To compare the effectiveness of different approaches, we can solve the problem using C++ and its libraries for one method and pure C for another.

The first approach is to sort the original array — and then look for repeated neighbouring elements.
In C++, everything looks simple.

#include <iostream>
#include <vector>
#include <algorithm>

using namespace std;

class DuplicateDetector
{
public:
bool containsDuplicate( vector<int> & nums )
{
sort( nums.begin(), nums.end() );

for ( size_t index = 0; index < nums.size() - 1; index++ )
{
if ( nums[ index ] == nums[ index + 1 ] ) return true;
}

return false;
}
};

int main()
{
DuplicateDetector duplicateDetector;

vector<int> nums_1 { 1, 2, 3, 1 };
vector<int> nums_2 { 1, 2, 3, 4 };

bool result_1 = duplicateDetector.containsDuplicate( nums_1 );
bool result_2 = duplicateDetector.containsDuplicate( nums_2 );

cout << boolalpha << result_1 << endl;
cout << result_2 << endl;
cout << noboolalpha;

cout << "End of programm." << endl;

return EXIT_SUCCESS;
}

Almost the same in C

#include <stdio.h>
#include <stdbool.h>
#include <stdlib.h>

int compare( const void * a, const void * b )
{
const int * x = (const int*) a;
const int * y = (const int*) b;
return *x - *y;
}

bool containsDuplicate( int * nums, int numsSize )
{
qsort( nums, numsSize, sizeof(int), compare );

for ( size_t index = 0; index < numsSize - 1; index++ )
{
if ( nums[ index ] == nums[ index + 1 ] )
{
return true;
}
}

return false;
}

int main()
{
int nums[] = { 1, 2, 3, 1 };
int nums2[] = { 1, 2, 3, 4 };
bool result = containsDuplicate( nums, 4 );
printf( "Result: [ %s ]\n", result ? "true" : "false" );
result = containsDuplicate( nums2, 4 );
printf( "Result: [ %s ]\n", result ? "true" : "false" );

puts( "End of program." );
return EXIT_SUCCESS;
}

You can take a more technical approach and check repetitions using a set.

In C ++, the code will not be much more complicated than the previous one.

#include <iostream>
#include <unordered_set>
#include <vector>

using namespace std;

class DuplicateDetector
{
public:
bool containsDuplicate( vector<int> & nums )
{
unordered_set<int> numsSet;

for ( int number : nums )
{
if ( numsSet.count( number ) > 0 )
{
return true;
}

numsSet.insert( number );
}

return false;
}
};

int main()
{
DuplicateDetector duplicateDetector;

vector<int> nums_1 { 1, 2, 3, 1 };
vector<int> nums_2 { 1, 2, 3, 4 };

bool result_1 = duplicateDetector.containsDuplicate( nums_1 );
bool result_2 = duplicateDetector.containsDuplicate( nums_2 );

cout << boolalpha << result_1 << endl;
cout << result_2 << endl;
cout << noboolalpha;

cout << "End of programm." << endl;

return 0;
}

Everything will already be more attractive in C because there is no set implementation in the standard C library.

The simplest method is to implement a set on an array.
For this task, you only need the ability to add an element to the set and check for the presence of a component in the set.

typedef struct 
{
int elements[ MAX_SIZE ];
int size;
} Set;

void init( Set * set )
{
set->size = 0;
}

bool Set_contains( Set * set, int element )
{
for ( int i = 0; i < set->size; i++ )
{
if ( set->elements[ i ] == element )
{
return true;
}
}
return false;
}

void add( Set * set, int element )
{
if ( !Set_contains( set, element ) )
{
set->elements[ set->size++ ] = element;
}
}

The driver and containsDuplicate function is almost the same as in the C++ version.
Everything works — but if you submit a large enough array — you can notice subsidence in speed. Linear search still needs to be more efficient.
Sorting the array and using binary search can help to resolve speed issues. Still, this approach is only partially practical because it requires sorting the array every time a new element is added.
To speed up the search in implementing the set, you can replace the array with a hash table based on an array and a linked list.

#include <stdio.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct Node
{
int key;
struct Node * next;
} Node;

typedef struct Set
{
int size;
Node ** table;
} Set;

Set * Set_create( size_t size )
{
Set * set = (Set*) malloc( sizeof( Set ) );
set->size = 0;
set->table = ( Node** ) calloc( size, sizeof( Node* ) );

return set;
}

void Set_destroy( Set * set, size_t size )
{
for ( size_t index = 0; index < size; index++ )
{
Node * node = set->table[ index ];
while ( node != NULL )
{
Node * temp = node;
node = node->next;
free( temp );
}
}

free( set->table );
free( set );
}

unsigned int hash( int key, size_t size )
{
return key % size;
}

Node * findNode( Node * node, int key )
{
while ( node != NULL )
{
if ( node->key == key )
{
return node;
}
node = node->next;
}

return NULL;
}

void Set_add( Set * set, int key, size_t size)
{
unsigned int index = hash( key, size );
Node * node = findNode( set->table[ index ], key );

if ( node == NULL)
{
node = (Node*) malloc( sizeof( Node ) );
node->key = key;
node->next = set->table[ index ];
set->table[index] = node;
set->size++;
}
}

bool Set_contains( Set * set, int key, size_t size )
{
unsigned int index = hash( key, size );
Node * node = findNode( set->table[ index ], key );

return ( node != NULL );
}

bool containsDuplicate( int * nums, int numsSize )
{
Set * set = Set_create( numsSize );

for ( size_t index = 0; index < numsSize; index++ )
{
if ( Set_contains( set, nums[ index ], numsSize ) )
{
Set_destroy( set, numsSize );
return true;
}

Set_add( set, nums[ index ], numsSize );
}

Set_destroy( set, numsSize );

return false;
}

int main()
{
int nums[] = { 1, 2, 3, 1 };
int nums2[] = { 1, 2, 3, 4 };
bool result = containsDuplicate( nums, 4 );
printf( "Result: [ %s ]\n", result ? "true" : "false" );
result = containsDuplicate( nums2, 4 );
printf( "Result: [ %s ]\n", result ? "true" : "false" );

puts( "End of program." );
return EXIT_SUCCESS;
}

Bingo! The search is much faster, and the function works better with large arrays.

In the course of the solution in the “old” style — I had to remember the algorithms and data structures and invent a couple of bicycles with square wheels.
Of course, the old style C approach has no value in today’s enterprise development world.
But this makes you feel like an engineer and inventor again, as in those days. Even if not in reality, the charge of positivity and motivation is quite actual.

--

--