Spider R&D - Medium

Understanding Zero-Knowledge Proofs for Enabling Privacy Preserving Identity Verification

KaranSinghBisht — Sun, 02 Mar 2025 17:48:37 GMT

What if I told you that you could prove who you are without revealing any personal information — and I could still verify your identity with complete confidence? Sounds magical, doesn’t it? But it’s not magic — it’s mathematics. Let me explain.

The problem with traditional identity verification, commonly known as KYC (Know Your Customer), is that even when a website only needs to confirm if you’re over 18, it often requires you to upload an official identity document. That document contains far more personal details than necessary — details you might not want to share.

This creates two major concerns:

The website might sell your data without your knowledge.
Even if they don’t, they might store it insecurely, leaving it vulnerable to breaches. If a data leak occurs, your sensitive information is exposed.

Even in an ideal scenario where a company promises not to save or misuse your data, there’s no way for you to verify their claim. A common workaround is PII redaction — blurring or censoring unnecessary information before submitting documents.

But what if I told you that you could prove your identity without revealing any details at all?

Well, that’s exactly what Zero-Knowledge Proofs (ZKPs) enable — and they are already revolutionizing privacy in identity verification.

What are Zero Knowledge Proofs?

A zero-knowledge protocol is a method by which one party (the prover) can prove to another party (the verifier) that something is true, without revealing any information apart from the fact that this specific statement is true. (Source)

To illustrate this with a real-life example, I’ll share the one I understood best, which was also my first introduction to ZKPs, given to me by my brother. It is explained both in this blog and in this YouTube video.

Imagine you’re playing Where’s Waldo with a friend. To win, you need to find Waldo on the map and prove to your friend that you found him. Normally, the only way to convince them is by pointing directly at Waldo — but then they, too, will know his location.

A zero-knowledge protocol, however, would allow you to prove you know where Waldo is — without revealing his exact position.

Here’s how we can do that:

Or you can also read the Ali Baba Cave Story to understand them.

One of my mentors explaining the Ali Baba Cave Story to me [I was supposed to be the one explaining :)]
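For readers who prefer code to caves, here is a minimal sketch of one classic interactive protocol in the same spirit: Schnorr's identification scheme, where the prover shows knowledge of a secret exponent x without sending x. It is swapped in here purely as an illustration and is not taken from the blog or video mentioned above. The tiny parameters (P = 23, Q = 11, G = 2) and all function names are assumptions chosen for readability, and the numbers are far too small to be secure.

```python
import secrets

# Toy public parameters (assumption, insecure): G generates a subgroup
# of prime order Q inside the integers modulo P.
P, Q, G = 23, 11, 2

def keygen():
    """Return (secret x, public y) with y = G^x mod P."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)

def commit():
    """Prover picks a fresh random nonce r and sends t = G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)

def respond(x, r, c):
    """Prover answers the verifier's challenge c with s = r + c*x mod Q."""
    return (r + c * x) % Q

def verify(y, t, c, s):
    """Verifier accepts when G^s == t * y^c (mod P)."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P

def run_round(x, y):
    """One full round: commit, random challenge, response, check."""
    r, t = commit()
    c = secrets.randbelow(Q)
    s = respond(x, r, c)
    return verify(y, t, c, s)

x, y = keygen()
print(all(run_round(x, y) for _ in range(20)))
```

The verifier only ever sees t, c and s. Because r is fresh and random in every round, s on its own says nothing about x, yet the check only passes consistently for someone who knows x.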

Now that we understand how zero-knowledge proofs work, let us look at coding one.

Writing your first ZK Circuit

I am going to use Noir (noir-lang), developed by Aztec. Noir is an open-source domain-specific language for safe and seamless construction of privacy-preserving zero-knowledge programs, requiring no previous knowledge of the underlying mathematics or cryptography.

The installation steps for Noir are taken from the official documentation. You can either follow along with this guide or refer directly to the docs.

Step 1 — Installation of Nargo — the CLI tool

Nargo is the CLI tool used to manage Noir projects. To install it, run

curl -L https://raw.githubusercontent.com/noir-lang/noirup/refs/heads/main/install | bash
noirup

Step 2 — Installation of Proving Backend

Proving backends provide functionalities such as generating proofs, verifying proofs, and generating smart contracts for your Noir programs. To install the proving backend, run

curl -L https://raw.githubusercontent.com/AztecProtocol/aztec-packages/refs/heads/master/barretenberg/bbup/install | bash
bbup

Step 3 — Setting up the hello_world project

Once Noir is set up, we can create the traditional hello_world example project:

nargo new hello_world

This will generate 2 files

src/main.nr contains a simple boilerplate circuit
Nargo.toml contains environmental options, such as name, author, dependencies, and others.

Generating the Prover.toml File

We can now use nargo to generate a Prover.toml file, where our input values will be specified

nargo check

This will create a Prover.toml file. Fill in the following inputs there:

x = "11"
y = "2017"

We’re now ready to compile and execute our Noir program. By default the nargo execute command will do both, and generate the witness that we need to feed to our proving backend:

nargo execute

The witness corresponding to this execution will then be written to the file ./target/hello_world.gz (by default, the witness is named after the package).

The command also automatically compiles your Noir program if it has not been compiled yet or has been edited since; you may notice the compiled artifacts being written to the file ./target/hello_world.json.

With the circuit compiled and the witness generated, we’re ready to prove.

Step 4 — Proving Backend

bb prove -b ./target/hello_world.json -w ./target/hello_world.gz -o ./target/proof

The proof is now generated in the target folder. To verify it, we first need to compute the verification key from the compiled circuit and then use it to verify:

bb write_vk -b ./target/hello_world.json -o ./target/vk
bb verify -k ./target/vk -p ./target/proof

Congratulations, you have now created and verified a proof for your very first Noir program!

If you execute the circuit with the same value for both variables, it throws an error.

Say your input was

x = "1"
y = "1"

The execution fails, as it should, because the assertion assert(x != y) in the circuit no longer holds.
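To see why these inputs are rejected, here is a plain Python model of the single constraint mentioned above. It assumes the boilerplate circuit checks nothing beyond x != y. It is not Noir code and produces no proof, and the function name satisfies_circuit is made up for illustration.

```python
def satisfies_circuit(inputs: dict) -> bool:
    """Model of the circuit's only constraint: the two inputs must differ."""
    # Prover.toml stores the values as strings, so convert them first.
    x, y = int(inputs["x"]), int(inputs["y"])
    return x != y

print(satisfies_circuit({"x": "11", "y": "2017"}))  # True
print(satisfies_circuit({"x": "1", "y": "1"}))      # False
```

When the constraint cannot be satisfied there is no valid witness, so there is nothing for the proving backend to prove.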

Exploring Existing Initiatives for Privacy-Preserving Identity Verification in the Web3 Space

Privado ID — Identity Ownership in Web3

One of the most well-structured identity solutions in Web3, Privado ID, provides middleware infrastructure for privacy-preserving digital identity. Unlike traditional ID systems that require revealing personal details, Privado ID enables users to own their data and share only the necessary parts with their consent.

How does this work?

Users are issued Verifiable Credentials (VCs), signed cryptographically and secured by blockchain to ensure they are tamper-proof.
These credentials are stored in an identity wallet, much like how a crypto wallet holds private keys.
Whenever verification is needed, instead of sharing full credentials, the wallet generates a zero-knowledge proof, proving only what’s required — like “I am over 18” without exposing date of birth.

You can read more about them here.

OpenPassport — Proving Humanity Without Exposing Identity

Sybil attacks (where bad actors create multiple fake accounts) are a huge problem in Web3. OpenPassport helps prevent this by allowing users to prove they are human without compromising privacy.

Instead of revealing personal details, users generate a cryptographic proof linked to their identity — without disclosing any private information.

You can learn more about them here. The linked workshop was part of Devcon Bangkok, held from 12–15 November 2024.

zkPassport — Zero-Knowledge Passport Verification

zkPassport brings identity verification into Web3 while maintaining anonymity. It allows users to scan the NFC chip of their passport or ID card and generate a zero-knowledge proof.

You can learn more about them here. The linked workshop was part of Devcon Bangkok, held from 12–15 November 2024.

Key Discussions from ETHDenver 2025 on Decentralized Identity

Decentralized identity is evolving rapidly, and ETHDenver 2025 had some major discussions on privacy, KYC alternatives, and the role of AI in identity verification.

Bye Bye Biometrics: AI Demands Stronger Security Standards

Traditional biometric authentication (fingerprints, face scans) is increasingly being bypassed using AI-powered attacks. This talk featured experts from:

The discussion emphasized how zero-knowledge identity proofs will play a crucial role in securing user identity without centralized storage of biometrics. You can watch the session here.

Decentralized Identity: Unlocking Interoperability & User Control

A session by Patrick Young from Galxe, one of the biggest Web3 identity & reputation platforms, discussing:

Self-sovereign identity — Users control their credentials, rather than relying on Google/Facebook logins.
Zero-knowledge credential verification — Users can prove their reputation, participation, or age without revealing personal info.
Interoperability — Users can reuse credentials across different ecosystems.

You can watch the session here.

Privacy Without KYC — The Labyrinth Protocol Approach

A session by Amit Chaudhary from Labyrinth. Labyrinth Protocol explores the idea of privacy-friendly KYC alternatives, focusing on on-chain selective disclosure.

Users remain completely private unless flagged for suspicious activity.
If a compliance check is triggered, data remains encrypted and is only shared when necessary.
Unlike centralized KYC databases, Labyrinth ensures privacy by default while allowing regulated access when needed.

You can watch the session here.

Exploring Existing Initiatives for Privacy-Preserving Identity Verification in the Web2 Space

I’ve placed this section at the end since it primarily explores India-specific identity verification challenges, which may not be relevant to all readers.

Understanding Aadhaar and the Privacy Concerns

The Aadhaar card is a unique identity document for residents of India, issued by the government. It contains a 12-digit Aadhaar number, along with personal details such as name, date of birth, and address.

With Aadhaar being widely shared online — especially with centralized institutions for identity verification — privacy concerns arose. Cases of fraud, identity theft, and unauthorized use of Aadhaar details became a major discussion point, prompting the government to take corrective measures.

Government Measures to Improve Aadhaar Privacy

To mitigate these risks, the government introduced privacy-enhancing features:

Masked Aadhaar → Hides the first eight digits of the Aadhaar number while keeping most other details visible. (Not so private)
Virtual ID (VID) → A temporary, revocable 16-digit identifier that can be used instead of the Aadhaar number for verification.

Does VID Truly Solve the Privacy Issue?

At first glance, VID seems like a great privacy improvement. Since banks (including SBI’s Video KYC) now accept either Aadhaar or VID, your Aadhaar number can remain hidden. Additionally, the VID is derived from the Aadhaar number in a one-way manner, so it is not reversible, meaning institutions cannot derive your Aadhaar number from it.

However, this raises a critical concern:

If every institution starts accepting VID instead of Aadhaar, doesn’t VID effectively become the new Aadhaar?
Yes, VID can be changed, but if someone uses your VID before you reset it, doesn’t that defeat the privacy purpose?

What is Anon Aadhaar?

Anon Aadhaar is a zero-knowledge protocol that enables Aadhaar ID holders to prove their identity in a privacy-preserving way. I haven’t explored it in depth, but you can check out their documentation here. Since it’s open-source, you can also explore the GitHub repository.

Anon Aadhaar in Action

One notable use case of Anon Aadhaar was its integration at ETHIndia 2024 for Sybil-resistant voting in the judging process. This implementation showcased how zero-knowledge proofs can enhance fairness and security in decentralized voting.

If you’re interested in learning more, you can read about its application in quadratic voting at ETHIndia 2024 here.

Final Thoughts

Zero-Knowledge Proofs (ZKPs) are redefining privacy in identity verification, eliminating unnecessary data exposure while ensuring security.

If you have any questions, please feel free to ask them in the comments, or reach out directly to me on LinkedIn or X.

Cheers! 🙌

Karan Singh Bisht

LinkedIn
X (Twitter)

Understanding Zero-Knowledge Proofs for Enabling Privacy Preserving Identity Verification was originally published in Spider R&D on Medium, where people are continuing the conversation by highlighting and responding to this story.

CASCA: Coflow Aware Selective Compression Accelerator

Ashutosh Anand — Tue, 16 Apr 2024 14:58:19 GMT

In Smart India Hackathon ’23, one problem statement revolved around the challenges faced by backbone networks, where the demand for efficient data transmission and network optimization keeps increasing. As participants, we were asked to develop a solution that compresses data effectively, minimizes bandwidth usage, and enhances overall network performance, all at once.

For this problem statement, we built CASCA, a selective compression system for backbone networks. Before settling on a design for CASCA, we read through a lot of current research papers to find the best way to tackle these issues.

In this article, we share our thought process in approaching the problem statement, the work done after the hackathon, and how we plan to proceed with this project in the near future.

Understanding the Backbone Networks:

What is a backbone network?

Think of backbone networks as the superhighways of the internet, connecting cities of data across the digital landscape. They’re like the vital veins of our online world, making sure information gets where it needs to go, fast. But there’s also a problem lurking in these digital highways: too much traffic. As more and more people use the internet for everything from streaming videos to sending emails, these networks are getting clogged up with data. It’s like rush hour traffic on the internet!

That’s where the need for data compression comes in. It’s like packing your suitcase efficiently before a trip and squeezing out extra air to fit more stuff. Similarly, with data compression, we can make the files and messages traveling through these networks smaller to take up less space and move faster.

Backbone Network For a LAN

This was our motivation for developing a better data compression system. What we are doing is making files smaller and smarter to make networks run smoother and keep the internet flowing :)

Delving Deep into the data transmission:

To make backbone networks faster and more efficient, we need to understand what’s happening with the data flowing through them. But here’s the thing: we don’t control where the data comes from or where it goes. So, we use a clever trick called NetfilterQueue to intercept the data as it moves through the network.

You can imagine it as watching traffic on a highway from a bridge above.
Now, this interception happens at a specific level of the network: the network (IP) layer, which is where Netfilter hooks into the packet path. It’s like catching cars as they drive by, but in our case, we’re grabbing packets of data instead. By peeking at these packets, we can gather important information about where they’re coming from, where they’re going, and what they contain. This helps us understand how the network is used and find ways to improve it.

How does it happen?

In backbone networks, prioritizing efficiency necessitates a selective approach to data compression. Rather than compressing entire packets, our focus is solely on compressing the payload — the actual transmitted data — while leaving packet headers intact. This ensures critical information like source and destination addresses, error control details, and routing information remains unchanged.

While compressing at the end systems would be the most efficient approach, both the sender and the receiver would then need to be aware of the algorithm used. What CASCA does is move the compression out of the application layer in the sender and receiver and down to the network (IP) layer by intercepting packets. Thus, by implementing compression through CASCA, compression can occur within the routers themselves. This ensures a transparent process where neither the sender nor the receiver needs to be aware of compression taking place. This approach is particularly advantageous for handling streams of data commonly encountered in backbone networks.

While compressing the payload reduces the amount of data that needs to be transmitted, it also introduces overhead in processing time and computational resources. To address this tradeoff, we employ a selective compression method. This approach allows us to adjust our compression strategy dynamically based on the data's characteristics and current network conditions.

Surfing the waves of information with precision and speed :

The journey of a packet from point A to point B is influenced by various factors, each contributing to the overall transit time. These factors include:

Routing Delay Time: The time taken for routers to process and forward the packet along the optimal path to its destination.
Processing Time: The time required for processing tasks such as packet inspection, routing table lookup, and Quality of Service (QoS) enforcement.
Propagation Time: The time it takes for the packet to travel across the physical distance between nodes.
Transmission Time: The time taken to transmit the packet over the network link, influenced by the link’s bandwidth and the packet’s size.

We employ a predictive approach to data compression to optimize network performance and reduce bandwidth utilization. This involves using a neural network model trained on system metrics, such as the number of CPU cores, CPU utilization, CPU frequency, total memory, and memory availability, to predict the compression rate.

Furthermore, the transit time of packets is estimated using round trip time (RTT) and bandwidth considerations. By factoring in delay, queue, propagation, and transmission times, we estimate the total transit time.
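As a rough illustration of how these components might be combined, here is a minimal sketch. The article does not give CASCA's exact formula, so the choices below are assumptions: one-way propagation is taken as half the RTT, transmission time is packet size divided by link bandwidth, and the function and parameter names are hypothetical.

```python
def transmission_time(size_bytes: int, bandwidth_bps: float) -> float:
    """Seconds needed to push the packet's bits onto the link."""
    return size_bytes * 8 / bandwidth_bps

def transit_time(size_bytes: int, bandwidth_bps: float, rtt_s: float,
                 queue_s: float = 0.0, processing_s: float = 0.0) -> float:
    """Estimated one-way transit time as the sum of the delay components."""
    propagation_s = rtt_s / 2  # assumption: symmetric forward and return paths
    return (processing_s + queue_s + propagation_s
            + transmission_time(size_bytes, bandwidth_bps))

# A 1500-byte packet on a 10 Mbit/s link with a 40 ms round trip time.
print(transit_time(1500, 10_000_000, 0.040))
```

Only the transmission term depends on the packet size, which is why shrinking the payload pays off most on slow or congested links.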

Compression Conditions

However, compression itself introduces latency, as it takes time to compress and decompress data. Thus, we carefully evaluate whether performing compression on intercepted packets is advantageous. This decision hinges on whether the total time required for compression, transit with compression and decompression is less than the transit time for uncompressed packets.

Compression time + Transit time (compressed) + Decompression time < Transit time (original)

Suppose a lot of data from various streams arrives at the router. The system load is then high, so compressing any single packet takes a long time, and sending the data uncompressed might be more efficient.

In a very heavily congested network, on the other hand, we would need to compress data aggressively to reduce the network transit time.
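The compression condition can be sketched as a small decision function. This is only an illustration under stated assumptions: the compression and decompression times are passed in as plain numbers (in CASCA they would come from the prediction model), ratio means original size divided by compressed size, and the name should_compress is hypothetical.

```python
def should_compress(size_bytes: int, ratio: float, bandwidth_bps: float,
                    compress_s: float, decompress_s: float,
                    fixed_delay_s: float = 0.0) -> bool:
    """True when compressing shortens the total delivery time."""
    # Transit time of the packet as it is.
    original_s = fixed_delay_s + size_bytes * 8 / bandwidth_bps
    # Transit time of the smaller, compressed packet.
    compressed_s = fixed_delay_s + (size_bytes / ratio) * 8 / bandwidth_bps
    return compress_s + compressed_s + decompress_s < original_s

# Congested 1 Mbit/s link, lightly loaded router: compression wins.
print(should_compress(1400, 2.0, 1e6, 0.0005, 0.0002))  # True
# Same link, heavily loaded router (slow compression): send as is.
print(should_compress(1400, 2.0, 1e6, 0.008, 0.0002))   # False
# Fast 10 Gbit/s link: transit is already cheaper than compressing.
print(should_compress(1400, 2.0, 1e10, 0.0005, 0.0002)) # False
```

Delays that do not depend on packet size (propagation, queueing) appear on both sides of the inequality and cancel out, so the decision comes down to compression cost versus the saved transmission time.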

Tech behind CASCA:

Network simulation

We simulate a sample network using GNS3, creating a virtual representation of our target network setup. This involves designing a network topology similar to our actual backbone network and configuring virtual machines within GNS3 to mimic the hardware specifications of real-world machines. We then limit the bandwidth of the network links to replicate congestion scenarios and generate traffic to observe how our proposed model behaves under different loads.

GNS3 Setup

Once the network environment is set up, we integrate our proposed compression model into the simulation. This includes configuring compression and decompression processes within network nodes to mirror real-world implementation. We then conduct performance evaluations, measuring metrics like transit time, throughput, and compression ratio under varying network conditions.

Compression Technology

Since we’re compressing data at the network (IP) layer, we’re dealing with packets regardless of their initial content or form. Our compression process remains the same whether the payload is encrypted data or any other format.

Here’s how it works:
We take the entire payload of each packet, compress it using our chosen compression algorithm, and then extract and retain only the relevant information from the packet headers. This information forms a new header for the compressed data.

It’s like taking each packet, separating out the header information, compressing the payload, and then prepending the relevant header information to the compressed payload.

By compressing only the payload and retaining essential header information, we ensure that the critical details, such as source and destination addresses, error control, and routing information, remain intact. This allows for efficient data transmission while preserving essential network functionality.
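The header and payload split described above can be sketched in a few lines. This is a simplified illustration rather than CASCA's code: zlib from the Python standard library stands in for Zstandard, the header is treated as an opaque prefix of known length, and the function names are hypothetical.

```python
import zlib

def compress_packet(packet: bytes, header_len: int) -> bytes:
    """Keep the first header_len bytes untouched, compress the rest."""
    header, payload = packet[:header_len], packet[header_len:]
    return header + zlib.compress(payload)

def decompress_packet(packet: bytes, header_len: int) -> bytes:
    """Inverse of compress_packet for the same header length."""
    header, payload = packet[:header_len], packet[header_len:]
    return header + zlib.decompress(payload)

# 28 bytes stands for an IPv4 header (20) plus a UDP header (8).
original = b"H" * 28 + b"some repetitive payload " * 50
smaller = compress_packet(original, 28)
print(len(original), len(smaller))
print(decompress_packet(smaller, 28) == original)  # True
```

In a real packet the length and checksum fields inside the header would also have to be updated after the payload changes size.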

Packet manipulation using Scapy:

Scapy is used to create a Scapy packet object from the raw packet data received from the Netfilter Queue. This object allows us to easily access and manipulate the various layers and fields of the packet. Scapy’s built-in layer detection capabilities are then used to determine if the intercepted packet contains a Raw layer, which typically carries payload data. We extract different headers from the packet, allowing the script to access information such as source and destination IP addresses, source and destination ports, and protocol type.

The payload data is extracted from the Raw layer of the packet and compressed or decompressed using Zstandard compression, depending on whether the packet is outgoing or incoming. After compression or decompression, we reconstruct the packet with the modified payload data. This involves updating the packet’s Raw layer with the compressed or decompressed payload.
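Changing the payload size means some header fields no longer match the packet. Scapy can recompute these when the length and checksum fields are cleared before the packet is rebuilt. The sketch below shows the same bookkeeping by hand for an IPv4/UDP packet, using only the standard library. It is an illustration under stated assumptions, not CASCA's code: zlib stands in for Zstandard, the UDP checksum is simply set to zero (permitted over IPv4), and the function names are hypothetical.

```python
import struct
import zlib

def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum over the header's 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def replace_udp_payload(packet: bytes, new_payload: bytes) -> bytes:
    """Swap the UDP payload and fix the length and checksum fields."""
    ihl = (packet[0] & 0x0F) * 4            # IPv4 header length in bytes
    ip = bytearray(packet[:ihl])
    udp = bytearray(packet[ihl:ihl + 8])
    struct.pack_into("!H", udp, 4, 8 + len(new_payload))        # UDP length
    struct.pack_into("!H", udp, 6, 0)                           # UDP checksum unused
    struct.pack_into("!H", ip, 2, ihl + 8 + len(new_payload))   # IP total length
    struct.pack_into("!H", ip, 10, 0)                           # zero before summing
    struct.pack_into("!H", ip, 10, ipv4_checksum(bytes(ip)))
    return bytes(ip) + bytes(udp) + new_payload

payload = b"a" * 100
ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28 + len(payload), 1, 0,
                        64, 17, 0, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
udp_header = struct.pack("!HHHH", 5000, 6000, 8 + len(payload), 0)
packet = ip_header + udp_header + payload
rebuilt = replace_udp_payload(packet, zlib.compress(payload))
print(len(packet), len(rebuilt))
```

A header carrying a valid checksum sums to zero under ipv4_checksum, which gives a quick way to confirm the rebuilt packet is consistent.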

NetfilterQueue and iptables rules:

Netfilter Queue is a mechanism provided by the Linux kernel’s Netfilter framework that allows userspace programs to intercept and modify network packets. It operates by providing a queueing mechanism through which packets matching given rules can be handed to userspace applications for processing.

In the context of CASCA, Netfilter Queue is being utilized to intercept network packets at the firewall level and to capture outgoing and incoming IP packets. When a packet is intercepted, it is passed to the user space program for further processing.

Within the callback function registered with the queue, the code inspects each intercepted packet using Scapy. It then examines the IP and UDP headers to determine the source, destination, and type of traffic.

For outgoing packets, it compresses the payload data using Zstandard compression and updates the packet accordingly before allowing it to continue its journey.

Zstandard for compressing

Using Zstandard for data compression in the context of backbone networks offers several advantages, particularly due to its high compression and decompression rates while maintaining excellent compression ratios. This makes it an attractive choice for optimizing network performance without sacrificing data integrity.

One of the key considerations in selecting a compression algorithm is the tradeoff between compression speed and compression ratio. Compression speed refers to how quickly data can be compressed, while compression ratio refers to the level of compression achieved, typically measured as the ratio of original data size to compressed data size.

Below is a comparison between different compression algorithms:

In the case of Zstandard, a balance between speed and compression ratio is struck. It achieves high compression ratios, meaning it can significantly reduce the size of data packets while also maintaining relatively fast compression and decompression speeds. This is crucial in backbone networks, where efficient data transmission is essential, and latency must be minimized.
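The speed versus ratio tradeoff is easy to measure yourself. The sketch below is only an illustration: Zstandard is not assumed to be installed, so three codecs from the Python standard library are compared instead, the sample data is made up, and the ratio is taken as original size divided by compressed size. Timings depend on the machine, so no expected numbers are given.

```python
import bz2
import lzma
import time
import zlib

def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Original size divided by compressed size (higher is better)."""
    return len(original) / len(compressed)

def measure(compress, data: bytes):
    """Return (ratio, seconds) for one compression function."""
    start = time.perf_counter()
    compressed = compress(data)
    elapsed = time.perf_counter() - start
    return compression_ratio(data, compressed), elapsed

# Made-up sample: repetitive text resembling HTTP request headers.
sample = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 200

for name, fn in [("zlib", zlib.compress), ("bz2", bz2.compress),
                 ("lzma", lzma.compress)]:
    ratio, seconds = measure(fn, sample)
    print(f"{name}: ratio {ratio:.1f}, {seconds * 1000:.2f} ms")
```

Running the same measurement on real traffic samples, with Zstandard added to the list, is the kind of comparison that motivates the choice of algorithm.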

Working of CASCA

Compressing Image and transferring it using CASCA

We can visualize the work by using the demonstration above; here, we’re transferring an image from our local machine (X) to another host (Y). Underneath, NetfilterQueue is being used to intercept all the packets transmitted towards Y from source X.

Scapy plays a critical role in this process. It dissects each packet, meticulously identifying the headers and extracting the payload (the actual image data). Subsequently, we use Zstandard, a powerful compression algorithm, to selectively compress these packets on the fly.

Finally, on the receiving end (Y), the packets are decompressed and reassembled, resulting in the original file.

Novelty and scalability:

Our solution, using Zstandard for data compression, can be deployed directly within routers and can offer a seamless experience for users. Unlike existing solutions that may require additional software or configurations, our approach integrates compression directly into the network infrastructure. This means developers can implement our solution effortlessly, using a simple command without a complex setup.

Going forward, we plan to integrate Coflow Awareness into CASCA to enhance its network management capabilities. By incorporating Coflow Awareness, CASCA will be able to intelligently prioritize and manage data flows based on their Coflow relationships. This means that CASCA will be able to optimize the transmission of Coflows, grouping related flows together and ensuring they are transmitted efficiently through the network. This enhancement will lead to improved network resource utilization, reduced latency, and enhanced application performance, particularly for data-intensive distributed computing tasks.

This article is published as a part of the ‘Decoding the Architecture Series’ under Spider Research and Development Club, NIT Trichy.

CASCA: Coflow Aware Selective Compression Accelerator was originally published in Spider R&D on Medium, where people are continuing the conversation by highlighting and responding to this story.

Statik: Revolutionizing Version Control with Decentralization

Bharanichandraprabhu — Tue, 09 Apr 2024 15:33:19 GMT

In the ever-evolving software development landscape, version control is like a super-organized file cabinet for software developers. It helps us work together smoothly, keeps track of changes we make to our code, and ensures everything stays in order. It’s like a map that helps us navigate different versions of our code and merge our work. Overall, it’s a big help in keeping our projects neat.

As more people lean towards decentralization, old-fashioned centralized systems for managing different code versions are starting to feel outdated. That’s where Statik comes in. It’s a new way to handle and work together on code, spreading it across many places to keep it safer and more reliable.

Statik is our project leading the charge in decentralized version control. By harnessing the capabilities of IPFS (InterPlanetary File System), Statik changes how we handle and collaborate on code. For those eager to delve deeper into the underlying technology, check out our in-depth article on IPFS: Welcome to the world of decentralised file storage systems!

But before we embark on our journey through the intricacies of Statik, let’s take a step back and ponder: what exactly is version control, and why does it hold such paramount importance in software engineering?

Version Control

Version control is a system that records changes to files over time. It allows you to revisit specific versions of a file or project by tracking modifications and who made them. This is crucial for collaboration, tracking progress, and reverting to previous states if needed.

Git

Git is a distributed version control system that efficiently handles projects of any size. It tracks changes to files, allows branching for parallel development, and facilitates collaboration among developers. Git’s decentralized nature means every developer has a complete repository copy, enabling offline work and faster operations.

Key Git Features:

Branching and Merging: Git allows developers to create branches for parallel development, making it easy to work on new features or bug fixes without affecting the main codebase. Merging branches back into the main branch is seamless, preserving the project’s history and integrity.
Fast Performance: Git is known for its speed and efficiency, making it ideal for large projects with many files and contributors. Operations like committing changes, branching, and merging are typically fast, even in complex projects.
Distributed Development: Git’s distributed nature means that every developer has a complete copy of the repository. This allows offline work and enables teams to collaborate more effectively, especially in remote or distributed environments.
Data Integrity: Git uses cryptographic hashing to ensure the integrity of your data. Every change is checksummed, and the history of changes is stored in a Merkle tree, making it virtually impossible to change or corrupt historical data without detection.
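The data integrity point can be made concrete. Git identifies a file's content (a blob) by the SHA-1 hash of a short header followed by the content itself, so any change to the content changes its identifier. The sketch below reproduces that calculation for blobs only. The function name is hypothetical, and trees and commits, which hash references to other objects in the same way, are left out.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Object id Git assigns to a blob with this exact content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"version 1") == git_blob_id(b"version 2"))  # False
```

Because each commit refers to its files and its parent commit through such identifiers, altering any past content would change every identifier that follows it.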

GitHub

GitHub is a web-based platform built around Git that provides hosting for Git repositories. It adds a social and collaborative aspect to version control, allowing developers to share code, contribute to open-source projects, and manage workflows. GitHub offers features like pull requests, issue tracking, and project management tools, making it a central hub for many developers and teams.

In conclusion, version control systems like Git and platforms like GitHub have revolutionized the way developers work, enabling efficient collaboration, tracking changes, and ensuring project integrity. Statik builds on these principles, offering a decentralized approach to version control, empowering users with more control over their data and projects.

  ____  _        _   _ _    
 / ___|| |_ __ _| |_(_) | __
 \___ \| __/ _` | __| | |/ /
  ___) | || (_| | |_| |   < 
 |____/ \__\__,_|\__|_|_|\_\

Statik is a decentralized version control tool built on top of IPFS, the InterPlanetary File System. Git itself is distributed, but in practice teams rely on a central hosting service to store and share the canonical repository; Statik instead leverages the decentralized nature of IPFS. In IPFS, data is divided into chunks and distributed across a network of peers, creating a decentralized and immutable data structure.

Statik is currently under active development, developed as an open-source project, focusing on providing a robust alternative to centralized repository hosting services like GitHub. By enabling users to manage versions of their files without relying on a central server, Statik ensures data integrity, availability, and security. Join us as we delve into the world of Statik and explore its innovative approach to version control.

Architecture:

The basic workflow of Statik is a function call made via the CLI that creates and manipulates folders and files in the directory where we wish to initialize a Statik repository, mainly inside the “.statik/” folder. Within this folder, we have the /SNAPSHOTS, /HEAD, and /head/branch folders, whose significance will be discussed as we go through the different commands that this CLI tool offers. The start script in the src folder (i.e. index.ts) uses the commander package to create the command-line interface (CLI) for Statik and, based on the command-line arguments, calls the respective functions.

Init (CWD, ipfs_node_url) — It takes the current working directory and the ipfs_node_url (the URL of the IPFS node we connect to) as input, and when it is called, the following directories are created and manipulated.

/.statik/heads contains the list of branches.
/.statik/heads/main holds the latest committed contents of the main branch (when we create new branches, the /heads/(branch name) files hold the previously committed contents of the corresponding branch as CIDs).
/.statik/HEAD/ holds the current branch, which is main when the Statik repo is initialized.
The CONFIG file has the IPFS node URL.

Add(CWD, path of the directory to be staged)- It first checks if the current directory is a Statik repository and if any file paths are specified for adding. If not, it provides a helpful hint.

- It then initializes an IPFS client based on the configuration fetched from the local Statik setup. Depending on whether there are previous commits, it either creates a new snapshot of the specified files or updates an existing one.

- The function iterates over the provided paths, checks if they are directories, and adds the files recursively to the snapshot. If there are no changes compared to the previous snapshot, it notifies the user accordingly.

Finally, it writes the updated snapshot path to the repository and exits the process gracefully, handling any errors encountered during the process.

Commit(CWD, message) - It initializes an IPFS client based on the local Statik configuration, retrieves the current branch, and reads the previous commit from the repository. It constructs a commit object containing the previous commit hash, the snapshot hash, the commit message, and a timestamp.

This commit object is added to IPFS, and its resulting hash is written to the repository’s branch file. The function then clears the snapshot file and logs the successful commit along with its hash before gracefully exiting the process. Any encountered errors are logged, and the process exits with an error status.

The chaining of CIDs of commit data:

After the statik commit command, the CID of the commit content (which is in the form of JSON data) is written to the branch file in the heads/ folder. The structure of the commit content looks like this:

commit = {
  prevCommit: prevCommit,
  snapshot: snapshot,
  message: message,
  timestamp: Date.now()
};

It can be seen that the prevCommit field holds the CID of the commit that was already at the head before the current commit.

Hence, the commit data links, by CID, to both the previous commit and the current snapshot. This structure helps when we compare the previous commit data with the current staging data to find the mismatches and decide whether any changes have been made.

Log(cwd) - It is similar to git log. Starting from the commit CID stored in the /heads/branch file, we fetch each commit's data with client.cat() and print the commit messages and commit times one after another through asynchronous iteration.

As mentioned before, the commit history is like a linked list in which each commit holds a CID pointing to the previous commit. Using these links, we can extract the entire commit history from just the single head commit's CID.
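The chaining described above can be sketched with an in-memory stand-in for IPFS. This is only an illustration under stated assumptions: `ToyStore`, `toyCid`, `commit`, and `log` are hypothetical names, the hash is a toy FNV-1a value rather than a real CID, and a real Statik repository would talk to an IPFS client instead of a local object.

```typescript
// Hypothetical commit shape, mirroring the fields shown in the article.
interface Commit {
  prevCommit: string; // CID of the previous commit, "" for the first commit
  snapshot: string;   // CID of the staged snapshot
  message: string;
  timestamp: number;
}

// Toy content hash (FNV-1a, 32-bit). A stand-in for a real CID, not one.
function toyCid(content: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < content.length; i++) {
    h ^= content.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return "toy-" + h.toString(16);
}

// In-memory content-addressed store standing in for an IPFS node (assumption).
class ToyStore {
  private blocks: { [cid: string]: string } = {};

  add(content: string): string {
    const cid = toyCid(content);
    this.blocks[cid] = content;
    return cid;
  }

  cat(cid: string): string {
    const content = this.blocks[cid];
    if (content === undefined) throw new Error("unknown CID: " + cid);
    return content;
  }
}

// Store a commit that links back to the current head; returns the new head CID.
function commit(store: ToyStore, head: string, snapshot: string, message: string, timestamp: number): string {
  const c: Commit = { prevCommit: head, snapshot: snapshot, message: message, timestamp: timestamp };
  return store.add(JSON.stringify(c));
}

// Walk the linked list of commits from the head back to the first commit.
function log(store: ToyStore, head: string): string[] {
  const messages: string[] = [];
  let cid = head;
  while (cid !== "") {
    const c = JSON.parse(store.cat(cid)) as Commit;
    messages.push(c.message);
    cid = c.prevCommit;
  }
  return messages;
}

const demoStore = new ToyStore();
let demoHead = "";
demoHead = commit(demoStore, demoHead, demoStore.add("files v1"), "first commit", 1);
demoHead = commit(demoStore, demoHead, demoStore.add("files v2"), "second commit", 2);
const commitMessages = log(demoStore, demoHead); // newest commit first
```

Only the head CID is kept outside the store; every older commit is reached by following `prevCommit`.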

List(cwd) - Lists all the branches in the repository. First, it ensures that the current directory is a Statik repository. It then reads the current branch from the repository’s HEAD file and retrieves a list of all branch files from the heads directory.

- It iterates through each branch file, printing its name to the console. The current branch is highlighted with arrow symbols to differentiate it from the others. Any errors that occur during the process are caught and logged to the console. Overall, this function provides a simple way for users to view all branches within the Statik repository from the command line.

Jump(cwd,branch_name)-It first verifies that the current directory is a Statik repository. Then, it reads the current branch and checks if the requested branch matches the current one. If so, it logs a message indicating that the branch has already been checked out.

It further checks for staged changes, if any, and prevents switching branches without committing them. If the requested branch doesn’t exist, it creates a new branch by copying the current head commit.

- If the requested branch exists, it fetches the head commit of that branch from IPFS and compares it with the current working directory’s files to determine any unstaged changes. If unstaged changes are found, it lists them and terminates the process. Next, it checks for overriding changes and handles them similarly.

- Finally, it switches to the requested branch, updates the working directory with the branch’s files, and updates the HEAD pointer accordingly. Any encountered errors are caught and logged.

Why contribute to the project?

Learning Opportunities: Contribute to a project that combines IPFS and version control, gaining knowledge in both domains.

Open Source Spirit: Delve deep into the open-source spirit, fostering transparency and allowing contributors to showcase their skills to a broader audience.

Impact: Make meaningful contributions that have the potential to revolutionize how version control is approached in a decentralized world.

New Technology: Work with the latest decentralized technologies and version control systems advancements.

Statik is an open-source project hosted on GitHub. You can contribute to the development of Statik by visiting our GitHub repo.

Statik represents a significant leap in version control, leveraging IPFS’s decentralized architecture for secure, efficient, and transparent versioning. By embracing Statik, developers can explore a new paradigm in collaboration and data management, shaping the future of decentralized version control.

This article is published as a part of the ‘Decoding the Architecture Series’ under Spider Research and Development Club, NIT Trichy.

This article was co-written by Lok Visnu and Gabriel, who are also contributors to the Spider Research and Development Club’s ‘Decoding the Architecture Series’ at NIT Trichy.

Check out our previous article that discusses the architecture of the Gym Pool Registration Portal: Gym and Pool Registration Portal Architecture

Statik: Revolutionizing Version Control with Decentralization was originally published in Spider R&D on Medium, where people are continuing the conversation by highlighting and responding to this story.

Welcome to the world of decentralised file storage systems!

Bharanichandraprabhu — Tue, 09 Apr 2024 15:32:31 GMT

Image source: Github

Imagine a world where files are stored and shared without the limitations of traditional client-server models. This world exists in the InterPlanetary File System (IPFS), a decentralized network where files are distributed across nodes, ensuring quick access and reliability.

In traditional client-server models, when files are stored on centralized servers, there’s a vulnerability known as single-point failure. If the server crashes or experiences issues, access to those files is completely cut off until the server is restored. It’s like putting all your eggs in one basket — if something happens to that basket, you’re out of luck.

However, with IPFS (InterPlanetary File System), things work differently. Instead of relying on one central server, files are distributed across multiple nodes on various systems. Each node stores a copy of the file, so if one node goes down or experiences issues, plenty of other nodes still have a copy. It’s like having backups of your files scattered across different locations — even if one area fails, you can still access your files from another location.

This decentralized approach makes the system more resilient to failure and enhances security. Since files are distributed across multiple nodes, it becomes much more challenging for malicious actors to tamper with or manipulate them. Additionally, because no central authority controls access to the files, IPFS resists censorship. This means that even in environments where access to certain information may be restricted, IPFS ensures that files remain accessible to those who need them.

The shift towards IPFS represents a significant advancement in storing and accessing files. It offers increased resilience, enhanced security, and greater freedom of access, addressing critical concerns in today’s digital landscape.

The internet's rise and increasing reliance on technology has highlighted the need for more open, transparent, and resilient systems. Decentralization, exemplified by protocols like IPFS, offers a solution to these challenges, paving the way for a more equitable distribution of wealth and opportunities in the digital realm.

Join us on a journey to explore the world of IPFS, where files are not just stored but are part of a distributed, interconnected web that transcends borders and central authorities.

Watch the recording of our workshop about IPFS here to learn more about the topic.

https://medium.com/media/b59aaa3d08f774eeafb3cfbaaf546bcb/href

This is a demo of adding files to IPFS and viewing them through the local gateway. Here, we upload a video, inspect its Content Identifier (CID), and view the DAG (Directed Acyclic Graph) into which the data behind that CID is split up. You can view the video through the local gateway. If you click any of the leaves of the DAG, the DAG of that leaf's CID is shown, and if you open that CID in the local gateway, you will probably see a small part of the complete video. This demonstrates how the data is split into chunks.

How Content Delivery Works in Centralised Networks

Let’s imagine you’re sitting at home browsing the internet and deciding to watch a popular YouTube video. When you click on the video link, your web browser sends a request to YouTube’s servers asking for the video data. In a location-based addressing system, this request is directed to a specific server based on your geographic location or network proximity.

For instance, if you’re in India, the request might be routed to a YouTube server located in a data centre in India. The internet infrastructure makes this routing decision, considering network latency and server availability. The YouTube server retrieves the video data and streams it back to your web browser. During this process, the video content travels through various network links and infrastructure before reaching your device.

While location-based addressing optimizes content delivery by directing requests to nearby servers, it does have limitations. For instance, if the designated YouTube server in India experiences high traffic or technical issues, it could lead to delays or interruptions in video streaming. Additionally, as internet usage grows in India and more users access YouTube simultaneously, scalability becomes a concern as individual servers may struggle to handle the increasing load.

How It Works in IPFS

When you click on a video link, your web browser sends a request to retrieve the video data. In IPFS, instead of routing the request to a specific server based on your geographic location, the request names the content itself, and your node asks its peers in the IPFS network which of them hold that content. The video content isn’t stored on a centralized server. Instead, it’s broken down into smaller chunks called blocks, each identified by a Content Identifier (CID) derived from a cryptographic hash of the block. These blocks are distributed across multiple nodes in the IPFS network.

When your request reaches the IPFS network, nodes with the requested blocks respond by delivering them directly to your web browser. These blocks may come from multiple nodes across the network. Your browser then assembles the blocks to reconstruct the original video content.
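The block flow described above can be sketched in a few lines. This is a toy model under stated assumptions: `toyCid` is a 32-bit FNV-1a hash standing in for a real CID (which wraps a cryptographic hash), `chunkContent` and `fetchAndAssemble` are hypothetical helper names, and the "nodes" are plain objects rather than real IPFS peers.

```typescript
// Toy content hash (FNV-1a, 32-bit); a stand-in for a real CID.
function toyCid(content: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < content.length; i++) {
    h ^= content.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return "toy-" + h.toString(16);
}

interface ToyBlock {
  cid: string;
  data: string;
}

// Split content into fixed-size blocks, each identified by the hash of its own data.
function chunkContent(content: string, blockSize: number): ToyBlock[] {
  const result: ToyBlock[] = [];
  for (let i = 0; i < content.length; i += blockSize) {
    const data = content.slice(i, i + blockSize);
    result.push({ cid: toyCid(data), data: data });
  }
  return result;
}

// Each toy node holds some subset of the blocks, keyed by CID.
type NodeStore = { [cid: string]: string };

// Fetch each wanted block from whichever node has it, verify it against
// its CID, and reassemble the blocks in the requested order.
function fetchAndAssemble(wanted: string[], nodes: NodeStore[]): string {
  let out = "";
  for (const cid of wanted) {
    let found: string | undefined = undefined;
    for (const node of nodes) {
      const data = node[cid];
      if (data !== undefined && toyCid(data) === cid) {
        found = data;
        break;
      }
    }
    if (found === undefined) throw new Error("no node has block " + cid);
    out += found;
  }
  return out;
}

const demoVideo = "frame1|frame2|frame3|frame4|";
const demoBlocks = chunkContent(demoVideo, 7);

// Spread the blocks over two toy nodes.
const nodeA: NodeStore = {};
const nodeB: NodeStore = {};
for (let i = 0; i < demoBlocks.length; i++) {
  const target = i % 2 === 0 ? nodeA : nodeB;
  target[demoBlocks[i].cid] = demoBlocks[i].data;
}

const wantedCids = demoBlocks.map((b) => b.cid);
const rebuiltVideo = fetchAndAssemble(wantedCids, [nodeA, nodeB]);
```

Because each block is checked against its own identifier, the requester does not need to trust whichever node happened to answer.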

Having seen how content delivery happens in centralized networks and in IPFS, we will now dive deeper into how IPFS works and what its key features are.

Key features of IPFS

Content addressing with CID:

IPFS uses Content Identifiers (CIDs) to identify content uniquely. To explain the underlying concept of CID, let us look at this analogy. Think of the CID as a unique fingerprint for each book in the library. Just as each person has a unique fingerprint, each book has a unique CID that distinguishes it from others. This allows users to identify and retrieve specific files in the IPFS network quickly.

Visualisation of how CIDs are chained:

Below is the visualization made with a DAG visualizer available on the internet. It presents how the bytes of data are split into chunks that form a Directed Acyclic Graph. Each chunk has its own CID, which points to the bytes of data allotted to that node.

In this example, we added a PPT file, and the resulting DAG is already big. The shape of the DAG depends on the size and structure of the input data. For a folder, video, or picture, the graph can be even larger and more spread out.

Peer to Peer network:

Imagine a network of post offices where each can send and receive mail directly to and from other post offices without needing a central sorting facility. This decentralized approach allows for efficient communication and sharing of information, similar to how IPFS nodes communicate directly with each other to share files.

Decentralized Storage

In IPFS, files are divided into smaller chunks, and each chunk is stored on multiple nodes in the network. This decentralized storage approach ensures that files remain accessible even if some nodes go offline, making the system more resilient to failures and censorship.

Here is an easy analogy to help you understand the concept better. Picture a community pantry storing food items in multiple households instead of a single central pantry. If one household runs out of a particular item, they can borrow it from another. Similarly, in IPFS, files are stored across multiple nodes, ensuring that even if some nodes go offline, the files remain accessible.

Versioning and History

IPFS supports versioning through content addressing. Content is immutable, so every change produces a new CID, and users can still access previous versions of files by referencing their CIDs as long as some node keeps them. This allows for efficient tracking of changes over time.

Offline Access

IPFS allows users to access content offline by caching content locally. When a user requests a piece of content, IPFS checks its local cache before retrieving it from the network, reducing latency and improving accessibility.

Secure and Encrypted

IPFS lets users add security and privacy by encrypting their data before storing it on the network; IPFS does not encrypt stored content by itself. Encrypting first ensures that only users holding the key can read the content, while content addressing protects against tampering.

Having explored the key features of IPFS, let’s now delve into its practical applications. From decentralized file sharing to innovative storage solutions, IPFS offers diverse use cases that showcase its transformative potential in various domains.

IPFS Use Cases

Decentralization in IPFS: IPFS revolutionizes file management through decentralized file sharing, storage, and content distribution. It enables direct file exchange, eliminating intermediaries and central servers for peer-to-peer connectivity. Data dispersal across interconnected nodes ensures redundancy, resilience, and security, empowering users with greater control over their data. IPFS optimizes content distribution, enhancing download speeds and minimizing latency, benefiting content creators and overcoming centralized platform limitations.

Decentralized applications (dApps): IPFS can be used to build decentralized applications (dApps) that run on a distributed network of nodes. This allows for greater security and privacy and increased reliability and scalability compared to traditional centralized applications.

Permanent web hosting: IPFS can keep content available for as long as at least one node continues to store (pin) it, without a single hosting provider being able to take it down. This can be particularly useful for hosting content that is politically sensitive, controversial, or subject to censorship.

Filecoin: Filecoin integrates with IPFS by utilizing IPFS as its underlying storage layer. IPFS provides the decentralized and content-addressable storage infrastructure for Filecoin. When users store data on Filecoin, it is chunked, encrypted, and distributed across IPFS nodes in the network. Filecoin introduces a layer of economic incentives and market mechanisms on top of IPFS to incentivize storage providers and ensure the reliability and availability of stored data.

Having examined the applications of IPFS, it’s clear that its decentralized model offers unique solutions to various challenges. Now, let’s shift our focus to the advantages IPFS provides.

Advantages of IPFS:

Quick Access to Content: IPFS enables fast content retrieval by storing files on a distributed network, reducing latency compared to centralized systems.

Increased Data Availability: By distributing files across multiple nodes, IPFS ensures data availability even if some nodes are offline, enhancing system resilience.

Improved Security: IPFS enhances security using cryptographic hashes to identify and verify content, preventing tampering and ensuring secure file storage and sharing.

While the advantages of IPFS are compelling, understanding its comparative strengths and weaknesses against centralized storage systems is crucial for informed decision-making. Let’s delve into scenarios where IPFS shines and where it might fall short compared to traditional centralized storage solutions.

Image source: IPFS Docs

When is IPFS Better/Worse than Centralized Storage:

Better: IPFS is better than centralized storage for scenarios where data availability and resilience are critical. For example, in disaster-prone areas with unreliable internet connectivity, IPFS can ensure that files remain accessible even during outages.
Worse: IPFS may be worse than centralized storage in scenarios requiring strict access control and centralized management. For example, a traditional centralized storage solution may be more suitable in a corporate environment where files must be accessed and managed centrally.

In conclusion, IPFS represents a significant shift in internet content storage and access. It offers resilience to failures by distributing files across multiple nodes, akin to a library storing books in different branches. IPFS is also resistant to censorship and can scale efficiently by adding new nodes. IPFS presents a decentralized and resilient alternative to traditional centralized systems, promising a transformative impact on the Internet’s future.

We are building a decentralised version control tool called Statik on top of IPFS. To learn more about it, check out our article: Statik: Revolutionizing Version Control with Decentralization.

This article was co-written by Lok Visnu and Gabriel.

Welcome to the world of decentralised file storage systems! was originally published in Spider R&D on Medium, where people are continuing the conversation by highlighting and responding to this story.

Subset Sum DP -further optimizations

Ashwanth K — Mon, 18 Sep 2023 10:46:40 GMT

Prerequisites: DP, Standard subset sum dp, maths
Topic difficulty: Hard

It is recommended to have a good understanding of the subset sum dp problem before reading this blog.

Dynamic Programming - Subset Sum Problem

General Problem:
Given an array of size N and a target_sum, you need to determine whether any subset of this array has its sum = target_sum.

Constraints:
1≤ N ≤ 2000
1 ≤ target_sum ≤ N
0 ≤ Array[i] ≤ N

The above problem is called the standard subset sum problem which is usually solved using an O(N²) dp approach like dp[index][target_sum].

However, a variation of this problem can be further optimized to O(N*sqrt(N)*logN) using some observations. This variant problem and its interesting solution is what we will see in our blog.

Consider a variant of this subset sum problem where you are again asked to find whether the target_sum is achievable using some subset of elements.

But with additional constraints:
1 ≤ N ≤ 2*10⁴
sum of all array elements ≤ 2*10⁴

In our variant problem, we are given the additional information that the sum of the input array is bounded by the same limit as N (written below as sum ≤ N). Can we use this to optimize our algorithm?

Yes, we can use this additional piece of information to improve our complexity. Let's see how this can be done.

Observation 1:
If a positive integer N is represented as a sum of positive integers, such a sum always contains at most O(sqrt(N)) distinct numbers.

The reason for this is that to construct a sum that contains the maximum number of distinct numbers, we should choose the smallest numbers.
If we choose the numbers 1,2,…,k, the resulting sum is
k(k+1)/2, which must not exceed N (total_sum).
Thus, the maximum number of distinct numbers is k = O(sqrt(N)).
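The bound can be written out as a short worked inequality. The symbols follow the text above: k is the number of distinct values and N is the total sum.

```latex
\[
1 + 2 + \dots + k \;=\; \frac{k(k+1)}{2} \;\le\; N
\quad\Longrightarrow\quad k^2 < k(k+1) \le 2N
\quad\Longrightarrow\quad k < \sqrt{2N} = O(\sqrt{N}).
\]
```

For the limit used in this variant, N = 2*10⁴ gives k < 200, so the array holds fewer than 200 distinct positive values.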

Conclusion:
My input array always contains at most O(sqrt(N)) distinct numbers. This observation gives us some hope that we can somehow compress our array into a smaller one and run our standard dp algorithm on it to achieve a good complexity.

Observation 2:
If a positive integer is repeated K times in the input array, we can compress it into just log(K) elements while retaining all possible subset sum information.

Example:

Let's say I have the number x repeated 10 times: {x,x,x,x,x,x,x,x,x,x}
Try to express 10 as a sum of consecutive powers of 2, starting from 1.
Since 1+2+4+8 = 15 > 10, we stop at 4 and express it as 1+2+4+(remaining) 3:
10 = (1 + 2 + 4) + 3

{x,x,x,x,x,x,x,x,x,x} this array can generate subset sums: {x , 2x , … , 10x}
Even our compressed array: {x, 2x, 4x, 3x} can also express the same set of sums as above.

Similarly {x,x,x,… 15 times } can be compressed as {x,2x,4x,8x} where all my subset sums are retained.
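The splitting step can be sketched as a small function. The names `splitCount` and `countsReachable` are hypothetical helpers introduced for illustration; the second one only brute-forces which multiples of x remain reachable, to check that nothing is lost.

```typescript
// Split a frequency k into parts 1, 2, 4, ... plus a remainder, so that
// every count from 0 to k is the sum of some subset of the parts.
function splitCount(k: number): number[] {
  const parts: number[] = [];
  let power = 1;
  let remaining = k;
  while (power <= remaining) {
    parts.push(power);
    remaining -= power;
    power *= 2;
  }
  if (remaining > 0) parts.push(remaining);
  return parts;
}

// Count how many totals in 0..sum(parts) can be formed by some subset of parts.
function countsReachable(parts: number[]): number {
  let total = 0;
  for (const p of parts) total += p;
  const reachable: boolean[] = [];
  for (let s = 0; s <= total; s++) reachable.push(s === 0);
  for (const p of parts) {
    // Go downwards so each part is used at most once.
    for (let s = total; s >= p; s--) {
      if (reachable[s - p]) reachable[s] = true;
    }
  }
  let count = 0;
  for (const r of reachable) {
    if (r) count++;
  }
  return count;
}
```

For a frequency of 10, `splitCount(10)` yields the parts 1, 2, 4, 3, which correspond to the compressed elements {x, 2x, 4x, 3x} from the example.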

Conclusion:
My input array has O(sqrt(N)) distinct elements. Each distinct element with frequency K can be compressed into log(K) elements. So the compressed version of my array contains only O(sqrt(N)log(N)) elements.

Running our standard dp on O(sqrt(N)log(N)) elements with target sum ≤ N takes O(Nsqrt(N)log(N)) time complexity.

Code snippet for compressing array:

https://medium.com/media/06902e346545c22e3a3e96e959f348ad/href
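As a rough sketch of the whole approach (not the article's original snippet), the compression and the standard dp can be combined as follows. The function names `compress` and `canReach` are assumptions made for this example.

```typescript
// Replace k copies of a value x by x*1, x*2, x*4, ..., x*remainder.
function compress(values: number[]): number[] {
  const freq = new Map<number, number>();
  for (const v of values) {
    // Zeros never change a subset sum, so they are skipped.
    if (v > 0) freq.set(v, (freq.get(v) || 0) + 1);
  }
  const items: number[] = [];
  freq.forEach((count, x) => {
    let remaining = count;
    for (let power = 1; power <= remaining; power *= 2) {
      items.push(x * power);
      remaining -= power;
    }
    if (remaining > 0) items.push(x * remaining);
  });
  return items;
}

// Standard 0/1 subset-sum dp, run on the compressed items.
function canReach(values: number[], target: number): boolean {
  const items = compress(values);
  const dp: boolean[] = [];
  for (let s = 0; s <= target; s++) dp.push(s === 0);
  for (const item of items) {
    // Iterate downwards so each compressed item is used at most once.
    for (let s = target; s >= item; s--) {
      if (dp[s - item]) dp[s] = true;
    }
  }
  return dp[target] === true;
}
```

With the array {3,3,3,3,5}, the compressed items are {3, 6, 3, 5}, and a target of 14 is reachable as 3 + 6 + 5.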

Problem for practice (Hard):

Problem - E1 - Codeforces

Summary:

In competitive coding, it is always good to use all the information provided in the problem statement. From the above, we can see that a small piece of additional information has improved the time complexity of our program.

This article is published as a part of the ‘Algo Reads’ under Spider Research and Development Club, NIT Trichy.

Resource:

CSES Book chapter 27

[Tutorial] Subset Sum Square Root Optimisation - Codeforces

Subset Sum DP -further optimizations was originally published in Spider R&D on Medium, where people are continuing the conversation by highlighting and responding to this story.

Contribution technique-I

Ashwanth K — Sun, 10 Sep 2023 05:54:34 GMT

Pre Requisites: trees basic dfs, calculating subtree sizes, basic math
Difficulty: Medium

Today, we will see a good technique in CP known as contribution sums. This technique can be used in many counting problems to make our computations faster.

The basic idea behind this technique is to identify the entities (basic elements) that constitute the final answer. We need an answer to this question: “What is my final answer made up of?” Then, we would iterate on each entity and find its own contribution to the final answer.

Solving a counting problem with the contribution technique might be faster than a normal brute force.

This post will specifically deal with the contribution technique on trees.

Let's start with a simple problem to understand this technique.

Q) Given a tree (where a node can have any number of children) of N vertices and N-1 edges, rooted at node 1, find the sum of the path lengths (the number of edges on each path) between all possible pairs of nodes in the tree.

Sample example and explanation:

Brute-force:

Consider the above tree with 6 Nodes. There are 15 pairs of nodes that I can choose. I have to find the path length for each pair and add it to my final answer. Let's do a walkthrough on this brute-force approach.

The above image is self-explanatory. Basically, we have tried all possible combinations of choosing two nodes and iterating through its path. In the worst case, each path length can be O(N), and there are O(N²) such pairs of nodes.

So, our brute force solution will have an O(N³) complexity. It may be optimized to O(N²) with some observations, but that still gives TLE when N is as large as 10⁵.

Efficient solution: O(N) using contribution technique

Let's see how we can use the idea of contribution here. If we look carefully, our final answer is made up only of the edges of our tree, (i.e.,) it is just a collection of edges, each counted many times. So, the basic entity of the final answer is an edge of the tree.

Let's iterate on each edge and find its contribution. Basically, I fix an edge, say (u,v), and try to find how many of the paths contain this (u,v) edge.

See the image below for a better understanding:

Example:
3–4 edge is contained in 5 paths: 4 →3, 4 →1, 4 →2, 4 → 5, 4 → 6.
So, the (3–4) edge contributes a value of 5 to my final answer. Similarly, this can be calculated for other edges too.

Now, a question arises: “How do we efficiently compute occurrences of each edge? ”

Let's do some maths and observations here:

Let's try to find the contribution of (1–3) edge

Any node in the green region paired with any node in the orange region gives a path that passes through the (1–3) edge. The green region contains three nodes, and the orange region (outside green) also contains three nodes. So 3 x 3 = 9 pairs (u,v) have a path containing this edge.

The green region contains basically subtree_size[u] nodes. My orange region contains (N - subtree_size[u]) nodes.

So the contribution of edge (u, parent[u]) is basically: subtree_size[u] * (N - subtree_size[u])

So, just summing up the above equation for all “u” gives my answer.

subtreeSize[u] for all nodes can be calculated with a simple dfs on the tree, and summing the contributions of all edges then takes O(N) overall.

https://medium.com/media/7f96f330834361b563f0449dae4f0666/href
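A minimal sketch of this computation is shown below; it is an illustration rather than the article's original snippet. The function name `sumOfAllPathLengths` is an assumption, nodes are numbered 1..n with node 1 as the root, and the dfs is written iteratively to avoid deep recursion on long chains.

```typescript
// Sum of path lengths (in edges) over all pairs of nodes of a tree.
function sumOfAllPathLengths(n: number, edges: Array<[number, number]>): number {
  const adj: number[][] = [];
  for (let i = 0; i <= n; i++) adj.push([]);
  for (const [u, v] of edges) {
    adj[u].push(v);
    adj[v].push(u);
  }

  const parent: number[] = new Array<number>(n + 1).fill(0);
  const size: number[] = new Array<number>(n + 1).fill(1);

  // Iterative dfs from the root; every node appears after its parent in `order`.
  const order: number[] = [];
  const stack: number[] = [1];
  while (stack.length > 0) {
    const u = stack.pop() as number;
    order.push(u);
    for (const v of adj[u]) {
      if (v !== parent[u]) {
        parent[v] = u;
        stack.push(v);
      }
    }
  }

  // Process nodes in reverse order so that size[u] is final before it is used.
  let total = 0;
  for (let i = order.length - 1; i > 0; i--) {
    const u = order[i];
    size[parent[u]] += size[u];
    // Edge (u, parent[u]) lies on size[u] * (n - size[u]) paths.
    total += size[u] * (n - size[u]);
  }
  return total;
}
```

As a usage example on a hypothetical 6-node tree with edges 1-2, 1-3, 1-6, 3-4, 3-5, the edge (1,3) contributes 3 x 3 = 9, each of the other four edges contributes 1 x 5 = 5, and the function returns 29.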

Summary:

Thinking in terms of contributions can lead to a better solution with good time complexity. This technique is common in competitive coding. We just need to figure out the basic elements and how they affect the final answer. I hope you all have some basic ideas on contributions; for more interesting problems, you can refer to the resource below.

This article is published as a part of the ‘Algo Reads’ under Spider Research and Development Club, NIT Trichy.

Resource:

Sums and Expected Value - part 1 - Codeforces

Contribution technique-I was originally published in Spider R&D on Medium, where people are continuing the conversation by highlighting and responding to this story.

Matrix Exponentiation

Ashwanth K — Fri, 25 Aug 2023 14:24:17 GMT

“A nice technique you will appreciate once known.”

Pre Requisites: Binary Exponentiation, Linear Recurrence relations, Matrices, Fibonacci Numbers
Topic Difficulty: Medium

In the previous post, we saw that we can calculate powers of integers in logarithmic complexity. Is this idea of binary exponentiation restricted to integers? Can we go beyond this and extend the same idea to matrices?

Yes. As the topic suggests, we can also calculate powers of a matrix with a logarithmic number of multiplications. Note, however, that if the matrix is of size K x K, each multiplication of matrices costs O(K³).

So raising a matrix to a power costs an overall complexity of
O(K³ logN), where K x K is the size of the matrix and N is the power to be raised. (Refer to the previous blog if you are not clear about this complexity.)

But why are we so concerned about matrices? What problems does this technique actually solve? We shall see the answers to these questions soon below…

Note: We are dealing with only square matrices here of size K x K

Code snippet on matrix exponentiation:

For simplicity, all matrices are 2x2 in the below code, but they can be easily generalized to any K x K matrix.

https://medium.com/media/fdd142f66269dc029876ab72f8b15647/href
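A sketch of matrix exponentiation for a general K x K matrix might look like the following; it is not the article's original snippet. `MOD`, `mulMod`, `matMul`, and `matPow` are names chosen for this example, entries are assumed to lie in [0, MOD), and `mulMod` exists only because JavaScript numbers cannot hold the full product of two values near 10⁹ exactly.

```typescript
const MOD = 1000000007;

// (a * b) % MOD without leaving the exact integer range of a double:
// b is split into a high part and a 15-bit low part.
function mulMod(a: number, b: number): number {
  const high = Math.floor(b / 32768);
  const low = b % 32768;
  return (((a * high) % MOD) * 32768 + a * low) % MOD;
}

type Matrix = number[][];

// Multiply two K x K matrices modulo MOD in O(K^3).
function matMul(x: Matrix, y: Matrix): Matrix {
  const k = x.length;
  const out: Matrix = [];
  for (let i = 0; i < k; i++) {
    const row: number[] = [];
    for (let j = 0; j < k; j++) {
      let sum = 0;
      for (let t = 0; t < k; t++) {
        sum = (sum + mulMod(x[i][t], y[t][j])) % MOD;
      }
      row.push(sum);
    }
    out.push(row);
  }
  return out;
}

// K x K identity matrix, the starting value of the result.
function identity(k: number): Matrix {
  const out: Matrix = [];
  for (let i = 0; i < k; i++) {
    const row: number[] = [];
    for (let j = 0; j < k; j++) row.push(i === j ? 1 : 0);
    out.push(row);
  }
  return out;
}

// Binary exponentiation on matrices: O(K^3 log power).
function matPow(base: Matrix, power: number): Matrix {
  let result = identity(base.length);
  let b = base;
  let p = power;
  while (p > 0) {
    if (p % 2 === 1) result = matMul(result, b);
    b = matMul(b, b);
    p = Math.floor(p / 2);
  }
  return result;
}
```

The loop has the same shape as the integer version: the only changes are that multiplication becomes `matMul` and the starting value 1 becomes the identity matrix.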

I hope that the above snippet is clear. If not, compare this snippet's pow function with the pow function of the previous blog.

Let's deal with a problem now….

Find the Nth Fibonacci number in log(N) time complexity.

Yes, you heard it right. Till now, we would have only seen the easy O(N) solution for finding the Nth Fibonacci number.

But the truth is that there exists an O(logN) solution to the same problem of finding just the Nth Fibonacci number.

Try to think intuitively about what happens in the O(N) algorithm. We compute all of the 1st, 2nd, …, Nth Fibonacci numbers. But my problem requires only the Nth Fibonacci number, so we are doing unnecessary computation here. We can intuitively think that there could be a better solution that finds the required Fibonacci number directly, without computing all previous values.

And Yes, matrix exponentiation comes to the rescue.

Now let's see how my Fibonacci number problem can be converted to matrices and solved using this technique.

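Consider f(n) = the Nth Fibonacci number, and write {a, b} for a column vector. Look at the matrix equation below carefully:

{f(n), f(n-1)} = {{1,1},{1,0}} × {f(n-1), f(n-2)}

This matrix equation can be easily verified.
Consider the RHS:
1 * f(n-1) + 1 * f(n-2) = f(n-1) + f(n-2) = f(n) by definition {1st value}
Similarly, 1 * f(n-1) + 0 * f(n-2) = f(n-1) {2nd value}

Applying the same equation once more to {f(n-1), f(n-2)}, we can see that:

{f(n), f(n-1)} = {{1,1},{1,0}}² × {f(n-2), f(n-3)}

You would have got an idea of what we are doing here. Going on further, we can see that:

{f(n), f(n-1)} = {{1,1},{1,0}}^(n-1) × {f(1), f(0)}

Here my base cases are f(1) = 1 and f(0) = 0.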

It's done. We have converted our problem into a problem of matrix exponentiation. We need to raise the {{1,1},{1,0}} base matrix to the power N-1, which can be solved with matrix expo in O(k³ logN) complexity.
In this example k = 2, so this is O(8 logN), which is far faster than the O(N) loop for large N.

https://medium.com/media/ee6a1c9dd65aa21162481d215b983c14/href
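For illustration, here is a compact sketch specialised to the 2 x 2 Fibonacci matrix, with answers taken modulo 10⁹+7. It is not the article's original snippet: `fib`, `mul2`, and `mulMod` are names assumed for this example, and a matrix {{a,b},{c,d}} is stored as the flat tuple [a, b, c, d].

```typescript
const MOD = 1000000007;

// (a * b) % MOD without leaving the exact integer range of a double.
function mulMod(a: number, b: number): number {
  const high = Math.floor(b / 32768);
  const low = b % 32768;
  return (((a * high) % MOD) * 32768 + a * low) % MOD;
}

// A 2 x 2 matrix {{a,b},{c,d}} stored row by row.
type M2 = [number, number, number, number];

function mul2(x: M2, y: M2): M2 {
  return [
    (mulMod(x[0], y[0]) + mulMod(x[1], y[2])) % MOD,
    (mulMod(x[0], y[1]) + mulMod(x[1], y[3])) % MOD,
    (mulMod(x[2], y[0]) + mulMod(x[3], y[2])) % MOD,
    (mulMod(x[2], y[1]) + mulMod(x[3], y[3])) % MOD,
  ];
}

// Nth Fibonacci number modulo MOD, with f(0) = 0 and f(1) = 1.
function fib(n: number): number {
  if (n === 0) return 0;
  let result: M2 = [1, 0, 0, 1]; // identity
  let base: M2 = [1, 1, 1, 0];   // the {{1,1},{1,0}} base matrix
  let p = n - 1;
  while (p > 0) {
    if (p % 2 === 1) result = mul2(result, base);
    base = mul2(base, base);
    p = Math.floor(p / 2);
  }
  // {f(n), f(n-1)} = base^(n-1) x {1, 0}, so f(n) is the top-left entry.
  return result[0];
}
```

Calling `fib(10)` gives 55, and the loop runs about log₂(N) times regardless of how large N is.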

NOTE:

Fibonacci numbers grow very quickly with N and soon stop fitting in an integer datatype. So problems based on this technique generally ask for the answer modulo P. Also, use long long for the intermediate products.

SUMMARY:
It can be shown that any linear recurrence of the form
f(n) = c1*f(n-1) + c2*f(n-2) + … + ck*f(n-k), where c1,..,ck are integers,
can be solved with matrix expo using a k x k base matrix.
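One standard way to frame the base matrix for such a recurrence is the companion form below, written with the same f and c1,..,ck as above. The first row carries the coefficients and every other row simply shifts the previous values down by one position.

```latex
\[
\begin{pmatrix} f(n) \\ f(n-1) \\ \vdots \\ f(n-k+1) \end{pmatrix}
=
\begin{pmatrix}
c_1 & c_2 & \cdots & c_{k-1} & c_k \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}
\begin{pmatrix} f(n-1) \\ f(n-2) \\ \vdots \\ f(n-k) \end{pmatrix}
\]
```

For k = 2 and c1 = c2 = 1, this reduces to the {{1,1},{1,0}} matrix used for the Fibonacci numbers.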

Even popular linear-dp problems like dice rolling and Staircase Dp have a matrix expo solution running in O(k³ logN) complexity, where k is the order of the recurrence.

Coming up with a correct base matrix is sometimes harder, but it can be easily framed by considering the recurrence relation.

This article is published as a part of the ‘Algo Reads’ under Spider Research and Development Club, NIT Trichy.

Resource: Usaco Guide

Matrix Exponentiation was originally published in Spider R&D on Medium, where people are continuing the conversation by highlighting and responding to this story.

Binary Exponentiation

Ashwanth K — Fri, 18 Aug 2023 03:19:57 GMT

Topic Difficulty: EASY
Pre Requisites: Basic Math, general idea on time complexity.

Can we compute powers faster in O(logN) complexity? This topic uses divide and conquer; let's see how we can achieve this complexity …

Let's take this problem: Count Good Numbers. Try to spend some time on this problem before proceeding with this article.

This problem somehow boils down to finding expressions like pow(A, B) where A and B are integers.

Here, I am given two numbers, A and B.

My Aim: Compute pow(A, B) efficiently.

1. Naive Approach: Just brute force, initialize ans = 1, and multiply it by A, B times.

2. Divide and Conquer Style: The main idea is to split the work in half and combine the results. It can be observed that:
pow(A, B) = pow(A, B/2) * pow(A, B/2) if B is even
pow(A, B) = pow(A, (B-1)/2) * pow(A, (B-1)/2) * A if B is odd
pow(A, 0) = 1

Analyzing the above relation shows that my power factor gets reduced by half every time. Assuming that each multiplication operation takes O(1) time, our overall complexity becomes O(log(N)), where N is the power to be raised.

Code: (Recursive approach)
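A minimal sketch of the recursive approach is given here as an illustration. The function name `power` is an assumption, and the result is assumed to stay small enough to be represented exactly (below 2⁵³ for JavaScript numbers); larger cases need a modulus.

```typescript
// Recursive binary exponentiation: computes a^b with O(log b) multiplications.
function power(a: number, b: number): number {
  if (b === 0) return 1;
  // Solve the half-sized problem once and reuse the result.
  const half = power(a, Math.floor(b / 2));
  return b % 2 === 0 ? half * half : half * half * a;
}
```

The key point is that `power(a, b / 2)` is computed only once and then squared; calling it twice would bring the cost back to O(b).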

We shall see a small example to show how the above code works. Let's say I want to compute pow(3,14)

pow(3,14) = pow(3,7) * pow(3,7)
pow(3,7) = pow(3,3) * pow(3,3) * 3
pow(3,3) = pow(3,1) * pow(3,1) * 3
pow(3,1) = pow(3,0) * pow(3,0) * 3

pow(3,0) = 1
pow(3,1) = 1 * 1 * 3 = 3
pow(3,3) = 3 * 3 * 3 = 27
pow(3,7) = 27 * 27 * 3 = 2187
pow(3,14) = 2187 * 2187 = 4782969

An iterative approach also exists, which exploits the bitwise representation of my exponent number. Let's take a look at this too.

Main Logic: break down my exponent number into powers of 2 and combine the results.

Since 14 = 1110 in base 2, 14 = 8 + 4 + 2.
We can see that: 3¹⁴ = (3⁸)*(3⁴)*(3²)
Here we compute the values 3¹, 3², 3⁴, 3⁸ and leave 3¹ out of the product, because the lowest bit of 14 is 0.

Since the number N has at most log(N)+1 bits, we can see that we do O(logN) multiplications if we know the powers a¹,a²,a⁴,a⁸,a¹⁶,… and so on.

Luckily these numbers {a¹,a²,a⁴,a⁸,a¹⁶,…} can be easily computed by just squaring the previous number.
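The iterative idea can be sketched as follows, here with results taken modulo 10⁹+7 since such problems usually ask for that. `powMod` and `mulMod` are names assumed for this example, and `mulMod` is needed only because JavaScript numbers cannot hold the full product of two values near 10⁹ exactly.

```typescript
const MOD = 1000000007;

// (a * b) % MOD without leaving the exact integer range of a double.
function mulMod(a: number, b: number): number {
  const high = Math.floor(b / 32768);
  const low = b % 32768;
  return (((a * high) % MOD) * 32768 + a * low) % MOD;
}

// Iterative binary exponentiation: a^n modulo MOD.
function powMod(a: number, n: number): number {
  let result = 1;
  let base = a % MOD; // holds a^1, a^2, a^4, a^8, ... in turn
  let e = n;
  while (e > 0) {
    // If the current lowest bit of the exponent is set, include this power.
    if (e % 2 === 1) result = mulMod(result, base);
    base = mulMod(base, base); // square to get the next power of two
    e = Math.floor(e / 2);     // move on to the next bit
  }
  return result;
}
```

For `powMod(3, 14)` the bits of 14 are 1110, so the loop skips 3¹ and multiplies 3², 3⁴, and 3⁸ into the result.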

Summary:

We can compute power(A, N) in O(log(N)) multiplications.

Though this may look too easy, this technique has many applications.
In the next blog, we will see about a more powerful technique in CP called Matrix Exponentiation, which has binary exponentiation as its prerequisite.

This article is published as a part of the ‘Algo Reads’ under Spider Research and Development Club, NIT Trichy.

Resource: cp-algorithms

Binary Exponentiation was originally published in Spider R&D on Medium, where people are continuing the conversation by highlighting and responding to this story.

Gym and Pool Registration Portal Architecture

Naveen Nair — Fri, 11 Aug 2023 07:47:42 GMT

NIT Trichy has taken a significant leap towards enhancing user experience with its state-of-the-art online gym and swimming pool registration portal, meticulously crafted by the Spider R&D club. The days of enduring long queues have been replaced by the convenience of a few simple clicks. Remarkably, every available slot now gets snatched up within a mere minute of opening. This article will delve into the underlying architecture and share the insights gained from our experimental journey to handle burstable loads with thousands of concurrent requests.

The Registration Portal’s Homepage

UNDERSTANDING THE REQUIREMENTS AND THE PROBLEM

The requirement is a website that lets students log in and register for the gym and swimming pool facilities of NIT Trichy. At first glance, this is a simple registration portal where users log in, click register, and are done. However, there are 300 different slots, and at least a thousand students apply for them at the same instant. Scaling the server vertically would have been the easier route, but we took on the challenge of building an efficient system on average specs and optimizing the design instead. Additionally, we wanted the application to be real-time.

Breaking down the underlying architecture

After identifying two issues in our system — the possibility of overbooking slots for students and a lack of clarity regarding the number of available free slots during registration — we set out to find a solution. Our research led us to discover gRPC, a technology that enables us to display real-time free slots to users while they are registering. This way, students can clearly understand the available slots and avoid any potential overbooking problems.

Think of gRPC as a way for different computer programs to talk to each other. It's like a common language they can use to communicate and exchange information. gRPC supports four types of RPC, three of which involve streaming.

1. Unary — send a single message and get a single message back

2. Server streaming — continuous flow of messages from the server

3. Client streaming — continuous flow of messages from the client

4. Bi-directional streaming — continuous back-and-forth communication

The four RPC types in gRPC

For our registration purpose, we need the client to send one message requesting details about the availability of slots, and the server needs to send multiple responses in a particular time interval. So we used server streaming.

This process also leads to many reads from the database, which could overload it. To avoid this, we used Redis, an in-memory data store that offers higher throughput and lower latency. Serving data from Redis reduces the load on the backend systems and gives users faster responses. Additionally, we used Redis's publish/subscribe (pub/sub) feature to listen internally for slot allocation and registration status changes and trigger the streaming, which provides a real-time experience for end users.
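To make the flow concrete, here is a library-free sketch of the pattern: one subscription, many responses, driven by published updates. It uses an in-memory queue as a stand-in for the pub/sub channel and a generator as a stand-in for the server stream. The class and function names are hypothetical, and nothing here is the Redis or gRPC API or the portal's actual code.

```python
import queue
from typing import Iterator, List, Optional


class SlotUpdates:
    """In-memory stand-in for a pub/sub channel (hypothetical, not Redis)."""

    def __init__(self) -> None:
        self._subscribers: List[queue.Queue] = []

    def subscribe(self) -> queue.Queue:
        # Each subscriber gets its own inbox so that every client sees every update.
        inbox: queue.Queue = queue.Queue()
        self._subscribers.append(inbox)
        return inbox

    def publish(self, free_slots: Optional[int]) -> None:
        # Fan the update out to all current subscribers.
        for inbox in self._subscribers:
            inbox.put(free_slots)


def stream_free_slots(inbox: queue.Queue) -> Iterator[int]:
    """Models server streaming: yields one response per published update."""
    while True:
        update = inbox.get()
        if update is None:  # sentinel: the channel was closed
            return
        yield update


channel = SlotUpdates()
client_a = channel.subscribe()
client_b = channel.subscribe()

# Each successful registration publishes the new number of free slots.
for remaining in (3, 2, 1, 0):
    channel.publish(remaining)
channel.publish(None)  # close the stream

print(list(stream_free_slots(client_a)))  # [3, 2, 1, 0]
```

The point of the design is that clients never poll the database: a change is pushed once and fanned out to every open stream.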

We also used a MySQL instance to store other information, such as information about different slots, allocation, etc. We chose MySQL because it is well known for its high performance and ease of use.

Although we use gRPC for server-side streaming, one issue was that gRPC runs over a different version of HTTP (HTTP/2) than our NextJS frontend, which operates on HTTP/1. To overcome this, we implemented a gRPC wrapper. This wrapper acts as a bridge, converting requests and responses between HTTP/1 and HTTP/2 and ensuring smooth communication between the server and the front end.

HTTP/2 vs HTTP/1

One crucial aspect of gRPC that we leverage in our project is its efficient server-to-server connection. This feature is especially beneficial during the authentication process. When our clients authenticate through LCA (Lynx Central Auth), we must make server-to-server requests to verify their identity. To protect against spam requests, we use reCAPTCHA. We have also used DDoS-deflate, an open-source tool, to temporarily block IP addresses suspected of performing a Denial of Service attack.

Moreover, gRPC truly shines in handling heavy server loads efficiently. It allows us to manage the authentication process smoothly, even under high demand. Additionally, gRPC’s streaming feature comes in handy for our project. It allows us to set up real-time communication between different parts of our system. For example, we can update the front end with the latest information about available slots during registration. This ensures that clients have accurate and up-to-date details while making their selections. By utilizing gRPC effectively, we address various challenges in our project, such as ensuring secure authentication, managing server loads, and providing real-time updates to our users.

Gym Registration Frontend

For the front end, we use NextJS, a framework built on top of ReactJS and known for its speed and server-side rendering capabilities. Pages load quickly because part of the rendering happens on the server, which gives a smoother user experience.

For the backend, we opted for Golang (Go) owing to its speed and concurrency capabilities. gRPC uses Protocol Buffers as its data interchange format; its compact binary encoding keeps messages small, which makes the whole process faster.

The backend has two services: the main service and the streaming service. The main service deals with the allocation and user profiles, whereas the streaming service is responsible for the real-time nature of the application. We have containerized every application for better portability and scaling.

By combining NextJS on the front end and Golang on the back end, we ensure that our application performs exceptionally well, delivering information to users swiftly and handling requests optimally. The result is a seamless and responsive system that caters to our users’ needs effectively.

The Overall Architecture

By now, we have gone through the different tech stacks used, the various challenges faced while building the project, and how they were handled. One crucial edge case remains: what if fewer seats remain than the number of clients registering for a slot at the same time? How do we decide which users get the remaining seats?

This is a classic race condition. If only one seat remains but two people register for it simultaneously, we rely on the unique key constraint of the MySQL database to handle it: clients are rejected once the number of remaining seats reaches zero.
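The article does not describe the schema, so the following is only a sketch of how a unique key can settle such a race. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the bookings table, its columns, and the claim_seat helper are all hypothetical. The idea is that each seat of a slot can appear in the table only once, so the database itself rejects the second claim.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings ("
    " slot_id INTEGER NOT NULL,"
    " seat_no INTEGER NOT NULL,"
    " user_id TEXT NOT NULL,"
    " UNIQUE (slot_id, seat_no))"  # one booking per seat per slot
)


def claim_seat(slot_id: int, seat_no: int, user_id: str) -> bool:
    """Try to book a seat; return False if someone else already holds it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO bookings (slot_id, seat_no, user_id) VALUES (?, ?, ?)",
                (slot_id, seat_no, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique key was violated: the seat is already taken.
        return False


# Two users go for the last seat of the same slot.
first = claim_seat(1, 300, "alice")
second = claim_seat(1, 300, "bob")
print(first, second)  # True False
```

Because the check and the write happen in a single INSERT, there is no window between "is the seat free?" and "book it" in which a second request could slip through.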

We’ve created an easy-to-use admin panel to ensure a smooth and well-organized experience for our clients. This panel gives our administrators the tools to create slots and manage the registration process independently and effortlessly.

RESULTS AND STATS

We tested the server simulating 2000 users sending requests concurrently, and we were happy to find that the server performed exceptionally well, considering it was a relatively low-specification machine. Despite its modest hardware configuration, the server demonstrated impressive response times and handled the burstable load without any noticeable performance degradation.

A graph representing the time taken to get the responses

At the end of the registration process, we received positive feedback from both users and administrators, indicating the success of this project. That said, the project still has scope for improvement. So stay tuned for more engineering!

This article is published as a part of the ‘Decoding the Architecture Series’ under Spider Research and Development Club, NIT Trichy.

Gym and Pool Registration Portal Architecture was originally published in Spider R&D on Medium, where people are continuing the conversation by highlighting and responding to this story.

Surviving the Unthinkable ft. Blockchain

Rutujeet Suryawanshi — Wed, 12 Apr 2023 12:35:09 GMT

Human nature doesn’t change. That struggle existed a hundred thousand years ago. That struggle exists today. What does change is technology. A hundred thousand years ago, an enraged human being could maybe kill 10 other people. And today, an enraged human being can kill a hundred million people. So as technology advances, our thoughts about ethics and morality and technology and civility have to evolve.

Michael Saylor

Experts in every field have spent a lot of time thinking about the potential implications of global catastrophes like nuclear war. It’s not a pleasant topic, but it’s a necessary one. The fact is, the risk of nuclear war is always present, and we need to consider how we can best prepare ourselves and our communities for the aftermath.

In recent years, we’ve seen the power of blockchain to provide secure and decentralized storage of information. But what if we could take this one step further? What if we could use blockchain as a tool for survival in the face of a nuclear war? This blog will explore the potential applications of blockchain technology in a post-apocalyptic world and how it could help us rebuild and thrive. It’s time to start thinking about the unthinkable and considering the role of blockchain in our future survival.

In a blockchain network, data is stored on a decentralized network of nodes rather than on a centralized server. Each node in the network stores a copy of the blockchain, which is a ledger of all transactions that have taken place on the network. This means that if one node goes down, the data on the network is still accessible from other nodes.
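As a toy illustration of this replication idea, the sketch below builds a small hash-linked ledger, copies it to three nodes, and then recovers it from the only node left standing. It is a deliberately simplified model with no consensus, networking, or mining, and every name in it is hypothetical.

```python
import copy
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, data: str, prev_hash: str) -> str:
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: list, data: str) -> None:
    """Add a block that is linked to the current tip of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({"index": index, "data": data, "prev": prev_hash,
                  "hash": block_hash(index, data, prev_hash)})


def is_valid(chain: list) -> bool:
    """Check that every block's hash and back-link are consistent."""
    prev_hash = GENESIS_PREV
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True


ledger: list = []
for record in ("alice pays bob 5", "bob pays carol 2"):
    append_block(ledger, record)

# Every node keeps its own full copy of the ledger.
nodes = {name: copy.deepcopy(ledger) for name in ("node_a", "node_b", "node_c")}

# Two nodes go offline; the surviving copy is still complete and verifiable.
del nodes["node_a"], nodes["node_b"]
survivor = nodes["node_c"]
print(is_valid(survivor), [block["data"] for block in survivor])
```

Because each block's hash covers the previous block's hash, a surviving copy can be checked for tampering without trusting the node that supplied it.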

Example: Map shows the concentration of reachable Bitcoin nodes found in countries around the world

For example, storing identity information, financial records, and medical records on the blockchain network would be like having a plant that has spread its seeds globally. As long as a single seed (a node) is still around, the information and assets can be completely regrown. Smart contracts can be used to enforce strict access controls and permissions, ensuring that only authorized parties can view or modify the information stored on the blockchain.

In addition to providing secure storage of information, blockchain technology could also serve as a tool for trade and commerce in a post-apocalyptic world. After the nuclear blast (if you survive), are you going to wait for the government to reboot the banking system? Will you trust them to honour your previous account balances (lol)? After a global calamity, anyone with blockchain-based cryptocurrency will be transacting with each other far before legacy financial systems are rebooted.

Humanity will do everything that it can to rebuild the internet after a global calamity. When the electricity comes back on and some form of global communication resumes, blockchain networks will return.

But what if a global nuclear war completely wipes out humanity? Well then,

That's a galactic shame.

Surviving the Unthinkable ft. Blockchain was originally published in Spider R&D on Medium, where people are continuing the conversation by highlighting and responding to this story.