How I made hpskloud: My very own Storage Bucket

Saarang
9 min read · Nov 25, 2023

A lot of you guys probably use storage buckets on a day-to-day basis, whether that be Google Drive, Dropbox, or, for some developers out there, object storage from Cloudflare or elsewhere. But have you ever thought about making your own? It may sound like an absurd idea, but it popped into my head one day, and out came hpskloud.

Why I decided to make one

My original intent for making hpskloud was that it would just be a project like any other, and I would just make it for fun. However, there are some real benefits to making your own storage. For example, you can:

  • Control how the UI behaves
  • Control authentication (who can read / write)
  • Make it public to the outside world
  • Add your own backend to control how objects get processed

And probably more that I haven't listed here. For the average user, these benefits may seem trivial, but I love the fact that I can customize almost anything about what I build. I mean, why do you think I run my own MongoDB instance? So, I started working.

Information about hpskloud

Hpskloud is a storage bucket I built entirely myself, and I use it primarily for my biggest website, SongFileHub. If you want to see the site itself, visit https://storage.hpsk.me or, if you want to make your own version, clone my GitHub Repository and create a .env.local file with these keys:

bucket=[path to your bucket]
keyPath=[path to MongoDB X509 certificate. If you use SCRAM auth, change the code accordingly]
jwtToken=[your jwt token]
MONGODB_URI=[your MongoDB URI, no extra parameters required]
NEXT_PUBLIC_URL=[base URL to your storage bucket]
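
Next.js loads .env.local into process.env on the server, so the app reads these keys along these lines (the variable names here just mirror the list above):

// Read server-side; only NEXT_PUBLIC_* keys are also exposed to the browser
const bucketPath = process.env.bucket          // filesystem path where objects live
const mongoUri = process.env.MONGODB_URI       // MongoDB connection string
const baseUrl = process.env.NEXT_PUBLIC_URL    // public base URL of the bucket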

How did I make hpskloud?

Now that I've given all the information about what I've created, you might be asking yourself: how did he make this? While I can't fully answer that question, I can walk through my design thought process and how I arrived at a finished product.

First things first, I needed a decent framework so that I could make a relatively interactive site without much work on my part. My personal go-to is Next.js + Bootstrap, as it is very easy to get started with. I didn't use a custom Express server or anything, just Next.js serverless functions with the Pages router. I started off by creating a homepage of some sort, which looked something like this:

First prototype of hpskloud

As you can see on the top left, my original idea was actually a CDN, but then I realized that what I was building was more than just that later on.
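
For context, with the Pages router every file under pages/api/ becomes a serverless endpoint; a minimal handler (this exact route is just an illustration, not hpskloud code) looks like:

// pages/api/hello.ts — a minimal Pages-router serverless function
import type { NextApiRequest, NextApiResponse } from "next"

export default function handler(req: NextApiRequest, res: NextApiResponse) {
  res.status(200).json({ message: "hello from hpskloud" })
}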

After I created the homepage, I added functionality to upload files from the client to my backend server. At first, I uploaded each file in one fell swoop, which became a problem for really big files: a browser can't reliably push hundreds of megabytes to a server in a single request.

Uploading very large files from my client to the server

To solve this issue, I had the server generate a cryptographic key and send it to the client, which then passes it back via the x-secret-token header on each upload request.

import crypto from "crypto"
import bcrypt from "bcrypt"

// Generate a random 128-bit key and hex-encode it
let key = crypto.generateKeySync("hmac", { length: 128 }).export().toString("hex")
// Store only a bcrypt hash of the key, never the key itself
let cryptoKey = await bcrypt.hash(key, 10)
await transactions.updateOne(
  { path: specifiedPath.path + (alreadyCreated ? alreadyCreated.url : "/" + url) },
  { $set: { cryptoKey } },
  { upsert: true }
)
// Hand the plaintext key back to the client for the upload that follows
return res.status(201).send({ key })

The above code generates a random cryptographic key using Node's crypto module, hashes it with bcrypt, stores that hash in the database, and then sends the plaintext key to the client.

let res = await fetch(`/api/bucket/file/[folderHash]?name=[fileName]`, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-secret-token": key
  },
  body: JSON.stringify(array)
})

The above code shows how you would upload data to the server using the key. The array variable is one 8 MB chunk of the file, represented as a Uint8Array. To end a transaction, your body would just be the word "END", and to cancel and completely revert that transaction, you would send the word "CANCEL". (I'll sketch the full client-side loop a little further below.)

// "END" closes the transaction and returns the file's hash to the client
if (req.body == "END") {
  await transactions.deleteOne({ path: alreadyCreated.path })
  return res.status(201).send({ hash: alreadyCreated.url.split("/")[1] })
}
// "CANCEL" reverts everything: the transaction, the mapping, and the partial file on disk
if (req.body == "CANCEL") {
  await transactions.deleteOne({ path: alreadyCreated.path })
  await mappings.deleteOne({ path: alreadyCreated.path })
  await fs.rm(bucket as string + alreadyCreated.path).catch((e) => {
    console.log(e)
    return res.status(400).send({ error: "400 BAD REQUEST", message: "This group does not exist!" })
  })
  return res.status(204).send(null)
}
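
One detail these snippets leave implicit is how the server validates the x-secret-token on each request; presumably it compares the header against the bcrypt hash stored earlier, along the lines of this sketch (the exact lookup and error shape are my guesses):

// Sketch: before touching any bytes, verify the client's key against the stored hash
const transaction = await transactions.findOne({ path: specifiedPath.path })
const token = req.headers["x-secret-token"] as string | undefined
if (!transaction || !token || !(await bcrypt.compare(token, transaction.cryptoKey))) {
  return res.status(401).send({ error: "401 UNAUTHORIZED", message: "Invalid or missing upload key" })
}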

This method turned out to work pretty well actually, and I was able to upload a file of virtually any size I wanted.
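
Putting the client side together, the full upload loop might look something like this sketch (uploadFile and the bracketed route segments are placeholders, and the exact serialization is my reading of the code above):

// Hypothetical sketch of the client-side chunked upload loop
async function uploadFile(file: File, key: string) {
  const CHUNK = 8_000_000 // 8 MB, matching the server's chunk size
  const endpoint = `/api/bucket/file/[folderHash]?name=[fileName]`
  for (let offset = 0; offset < file.size; offset += CHUNK) {
    // Read one 8 MB slice at a time so the whole file never sits in memory
    const bytes = new Uint8Array(await file.slice(offset, offset + CHUNK).arrayBuffer())
    await fetch(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json", "X-secret-token": key },
      body: JSON.stringify(Array.from(bytes)) // serialize the Uint8Array as a plain array
    })
  }
  // Close the transaction; sending "CANCEL" instead would revert it
  await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json", "X-secret-token": key },
    body: JSON.stringify("END") // JSON-encoded so it parses to the string "END" on the server
  })
}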

After I crossed this hurdle, I had no problems making all the other editing features, and I also created a settings page specifically for the root user, who could add / remove permissions from certain individuals.

2nd Prototype of hpskloud
Editing system for hpskloud (prototype)

But then I came across another problem: how would I stream very large files from my bucket to the client? If I sent too big of a file, it would crash my server because too much memory would be used up.

Streaming very large files from my server to the client

So how did I go about fixing this? Originally, I was thinking about using chunks, just like I did for uploading, and came up with a Readable Stream approach like so:

// The flaw: read the entire file into memory first, then push it out in 8 MB chunks
// (fs is fs/promises here; Readable comes from Node's stream module)
const file = await fs.readFile(bucket as string + "/" + (req.query.path as string[]).join("/"))
const stream = new Readable()
stream._read = () => { }
for (let i = 0; i < file.length; i += 8000000) {
  stream.push(file.subarray(i, i + 8000000))
}
stream.push(null) // signal end-of-stream so the "end" event actually fires
await new Promise((resolve) => {
  stream.pipe(res)
  stream.on("end", resolve)
})
return res.end()

However, this didn't really end up working, as I still had to read the entire file before piping it, which took up a lot of memory. Eventually, I did find a native way to do this, via fs.createReadStream():

// Stream from disk in 1 MB chunks instead of loading the whole file at once
const file = createReadStream(bucket as string + specifiedPath.path, { highWaterMark: 1024 * 1024 })
file.on("data", (chunk) => res.write(chunk))
file.on("end", () => res.end())

This would read incrementally unlike the method I had before, so it took up very little memory, and I could easily stream files that were 500MB+. I personally found the ideal chunk size to be 1MB.

Even though this method worked pretty well, I quickly found out that, especially for very large files, I can't just pipe the entire file to the browser, as it would take too long to load or skip through. Instead, I would need to send the browser only the chunks it asks for. Luckily, there is a header called Range that carries exactly this information: the browser sends it to tell the server which portion of the file it needs. Using this header, I created the following:

// Detect Safari / iOS (uses req from the enclosing handler), since neither handles partial chunks well
function iOS() {
  return [
    'iPad',
    'iPhone',
    'iPod'
  ].find(e => (req.headers["user-agent"] || "").includes(e)) || /^((?!chrome|android).)*safari/i.test(req.headers['user-agent'] || "");
}
// File extension for the content-type lookup; the parentheses matter, as "." + undefined would be truthy and skip the fallback
let str = "." + (specifiedPath.name.split(".").at(-1)?.toLowerCase() || "bin")
let c_size = 1000000 // 1 MB chunk size
// Parse the requested start byte from a header like "bytes=2000000-"
let start = iOS() ? 0 : parseInt(req.headers.range?.split("=")?.[1] || "0")
let end = stat.size
if (req.headers.range && !iOS()) {
  end = c_size + start > stat.size ? stat.size : c_size + start
  res.setHeader("Content-Range", `bytes ${start}-${end - 1}/${stat.size}`)
} else {
  res.setHeader("Content-Disposition", `${req.query.download ? "attachment" : "inline"}; filename="${req.query.name ? req.query.name + "." + specifiedPath.name.split(".").at(-1) : specifiedPath.name}"`)
}
// 206 Partial Content for range requests, plain 200 otherwise
res.writeHead(req.headers.range && !iOS() ? 206 : 200, {
  'content-length': end - start,
  'accept-ranges': 'bytes',
  'content-type': (types as any)[str] || "application/octet-stream"
})
// createReadStream's end option is inclusive, so stop at end - 1 to send exactly end - start bytes
createReadStream(bucket as string + specifiedPath.path, { highWaterMark: c_size, start, end: end - 1 }).pipe(res)

I first check whether the browser is an iOS device or Safari, as neither of those handles partial chunks very well. If it is, I use the same old method as before, which works great with Safari and iOS. If it isn't, I parse the start value from the Range header sent by a browser like Chrome, set the end value to start + 1 MB, and send a 206 response, stating that this was indeed a partial chunk. Using this, I could send 1 MB chunks to the browser, and it would automatically fetch more chunks as needed.
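
To make the exchange concrete, here's what a manual range request against such an endpoint looks like (the route is a placeholder; in practice the browser's media player sends the Range header for you):

// Ask for the slice starting at byte 2,000,000; the server caps the response at 1 MB
const res = await fetch(`/api/bucket/raw/[fileHash]`, {
  headers: { Range: "bytes=2000000-" }
})
console.log(res.status)                        // 206 Partial Content
console.log(res.headers.get("Content-Range"))  // e.g. "bytes 2000000-2999999/52428800"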

This method of sending data to the browser worked really well, and I finally thought that my CDN was fully functional and ready for personal use, so I did a redesign and called it a day:

3rd prototype of hpskloud

But then I realized something: I couldn't create just any directory name I wanted. This might not sound like a big issue (what directory names could you possibly need, right?), but when using a mounted drive (I was using a mounted DigitalOcean Spaces bucket), you can't use special characters AT ALL, including things like the % sign. I knew this had to change, but I also knew it would require reworking my entire backend and frontend.

Mapping objects to files on a hard drive

When it comes to mapping URLs (or hashes) to actual files on a hard drive, you might think the best way is to store each object in the bucket under its original name and then map those names to hashes in the database. However, this method really fails if you have object names that a computer simply cannot store. So the solution? Store the hashes as file names and store the actual file name in the database. As you can see in the 2nd photo below, I also store the field "virtualPath", which records what the actual path would be in the file system.

files stored on mounted drive
file mappings in database
Files displayed on hpskloud
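
For a concrete picture, a single mapping document might look roughly like this (path, name, and virtualPath appear in the snippets above; the exact shape is my guess):

// Rough sketch of one document in the mappings collection
interface FileMapping {
  path: string        // hash-based location on disk, e.g. "/a1b2c3d4"
  name: string        // the real file name, special characters and all
  virtualPath: string // what the path would look like as a normal directory tree
}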

This method also makes the whole file system more secure, as reading anything requires either visiting my website with a certain hash or mapping the hashes yourself, which is much harder than just reading names in plain text, I'd say. I can't show everything I did to reformat my site into this new system, as this blog would get far too long, so I'll leave a link to the commit here.

Conclusion

After I changed my entire backend, the site was pretty much ready for full production, including for my personal use. Of course, there will always be bugs that I need to fix, but it does work great as of right now.

hpskloud final product

Overall, I learned a lot from this experience, including:

  • How to host a mounted drive
  • How to use read streams effectively
  • How to map files effectively
  • How to create transactions with the crypto library

And more stuff that I can’t even name. It was a very fun experience overall, and I hope you guys learned something after reading this blog. Have a good day, and thanks for reading!

Information about me

Hello everyone, my name is Saarang, and I am a 14-year-old web developer from California. I love anything that has to do with web development, and if you guys would like to contact me, you can find my socials on my website https://hpsk.me.
