Database Lookup Services & Magical Girls
Many of you may be familiar with LeakedSource, LeakBase, Snusbase and many others. For the longest time LeakedSource was the gold standard for database lookup services, and I wanted to beat it. Unfortunately it was no longer up by the time I started, so I couldn’t get any timings on how long its lookups take. So I made LeakBase the gold standard for my project.
These stats are provided by PublicDBHost / cuck @ greysec. Snusbase seemed to offer the fastest lookups. He is doing what I do, with a different DBMS and a much, much larger server. So with my small server (16 GB RAM, 8 cores) we’re going to create a LeakBase of our own. I have about 500 million records in it currently and serve it entirely for free, at about 300 ms at most for a no-limit record search. I also don’t currently have wildcard search, due to some issue I didn’t care enough to finish fixing. But let’s get into the project of magical girls.
Below you can see a 188 ms lookup. This is faster than all the results above; however, I only have 500M records. I can see it being around 200–250 ms with 2–3.2B.
DBMS — Database Management Systems
Selection
NoSQL by nature is built to scale, and personally I find MySQL, MariaDB and whatever other fork disgusting, so I went with MongoDB. It also provides extremely versatile indexing. Documentation available here.
Structure
I decided against putting every DB into its own collection because of the nightmare of indexing it all, and because it’s harder to find() across them. The only advantage I would gain is being able to remove a database from my results instantly instead of having to find the records first. So all records are in the collection data.
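To make the trade-off concrete, here is a minimal sketch of the single-collection layout. The post does not show the actual schema, so the source field name and both helper functions are assumptions made up for illustration. The point is only that removing one database becomes a filtered delete rather than dropping a collection.

```python
from typing import Any, Dict, Optional


def make_record(source: str, username: Optional[str] = None,
                email: Optional[str] = None,
                ip: Optional[str] = None) -> Dict[str, Any]:
    """Build one document for the shared collection.

    "source" is a hypothetical field naming the database the record came from.
    Fields that are unknown are left out instead of being stored as None.
    """
    record = {"source": source, "username": username, "email": email, "ip": ip}
    return {key: value for key, value in record.items() if value is not None}


def remove_source(collection, source: str) -> int:
    """Delete every record that came from one database.

    `collection` is expected to be a PyMongo Collection. The server has to
    find the matching records first, which is the cost mentioned above.
    """
    result = collection.delete_many({"source": source})
    return result.deleted_count
```

With one collection per database the same removal would be a single drop, but every new collection would need its own set of indexes.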
Web App
I’m not sharing my source for this because it is beyond simple and I don’t want to spoon-feed too hard. But I use the PyMongo library to authenticate, then take the content GET parameter. With that information I do:
data.find({"$query": {lookup['type']: lookup['data']}, "$maxTimeMS": 10000}).collation(Collation(locale='en', strength=1))
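Since the field name in that query comes straight from the request, it is worth checking it before it reaches MongoDB. The following is a rough sketch of how that could look, not the author's code: build_lookup, run_lookup and ALLOWED_TYPES are hypothetical names, and it passes max_time_ms and collation as keyword arguments to find() instead of using the $query / $maxTimeMS form. As far as I know PyMongo accepts either a Collation instance or an equivalent mapping for collation; the mapping is used here so the sketch needs no imports beyond the standard library.

```python
from typing import Any, Dict, List

# Only the fields that have an index in this post. Anything else is rejected,
# so a request cannot force a scan over an unindexed field.
ALLOWED_TYPES = {"username", "email", "ip"}

# Same settings as the collation on the username and email indexes.
CASE_INSENSITIVE = {"locale": "en", "strength": 1}


def build_lookup(lookup_type: str, value: str) -> Dict[str, Any]:
    """Return keyword arguments for Collection.find()."""
    if lookup_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported lookup type: {lookup_type!r}")
    if not isinstance(value, str) or not value:
        raise ValueError("lookup value must be a non-empty string")

    kwargs: Dict[str, Any] = {
        "filter": {lookup_type: value},
        "max_time_ms": 10000,  # give up after 10 seconds, as in the post
    }
    # Assumption: only the collated indexes get the collation. MongoDB can
    # only use an index for string comparisons when the query collation
    # matches the index collation, and the ip index has none.
    if lookup_type in ("username", "email"):
        kwargs["collation"] = CASE_INSENSITIVE
    return kwargs


def run_lookup(collection, lookup_type: str, value: str) -> List[dict]:
    """Run the lookup against a PyMongo Collection and return all matches."""
    return list(collection.find(**build_lookup(lookup_type, value)))
```

Something like run_lookup(db.data, lookup['type'], lookup['data']) would then stand in for the one-liner above.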
Displaying Results
I generate basic tables using Python and display them through Flask / Jinja2. This can be done better but I just hacked it together.
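For illustration, building such a table in plain Python might look roughly like this. render_table and the column names are hypothetical, since the post does not show the real code. The part that matters is escaping every value, because the records are untrusted text.

```python
from html import escape
from typing import Iterable, Mapping, Sequence


def render_table(rows: Iterable[Mapping[str, object]],
                 columns: Sequence[str]) -> str:
    """Render result documents as a basic HTML table.

    Every header and cell is HTML-escaped. Missing fields become empty cells.
    """
    head = "".join(f"<th>{escape(column)}</th>" for column in columns)
    body = []
    for row in rows:
        cells = "".join(
            f"<td>{escape(str(row.get(column, '')))}</td>" for column in columns
        )
        body.append(f"<tr>{cells}</tr>")
    return (
        f"<table><thead><tr>{head}</tr></thead>"
        f"<tbody>{''.join(body)}</tbody></table>"
    )
```

Flask's Jinja2 setup autoescapes strings in HTML templates, so a prebuilt table like this has to be marked safe in the template. Passing the rows to the template and looping there is probably the cleaner option.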
Querying and result speed
There is a good chance you’re here wondering how I got results to be as fast as I did. It’s simple: it’s done with indexes. I researched fulltext indexes and normal indexes for a while to figure out what I needed, and concluded that I needed a fulltext index for the username and email and just a normal index for IPs.
I ended up with invalid UTF-8 in my records somehow and I didn’t want to reimport 500M records, so instead I just went with collation indexes for the username and email.
db.data.createIndex({"username": 1}, {collation: {locale: "en", strength: 1}})
db.data.createIndex({"email": 1}, {collation: {locale: "en", strength: 1}})
db.data.createIndex({"ip": 1})
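One way to confirm that a lookup really hits these indexes is to look at the query plan. The helper below is a sketch of my own, not something from the post. It assumes the usual explain layout with a queryPlanner.winningPlan tree and simply searches that tree for an IXSCAN stage, so it should tolerate some variation in how the plan is nested between server versions.

```python
from typing import Any


def _contains_stage(node: Any, stage: str) -> bool:
    """Search nested dicts and lists for a plan node with the given stage."""
    if isinstance(node, dict):
        if node.get("stage") == stage:
            return True
        return any(_contains_stage(child, stage) for child in node.values())
    if isinstance(node, list):
        return any(_contains_stage(child, stage) for child in node)
    return False


def plan_uses_index(explain: dict) -> bool:
    """Return True if the winning plan contains an index scan.

    Only the winning plan is inspected; rejected plans are ignored.
    """
    winning = explain.get("queryPlanner", {}).get("winningPlan", {})
    return _contains_stage(winning, "IXSCAN")


# Example with PyMongo, assuming `data` is the collection from this post:
#   plan_uses_index(data.find({"ip": "203.0.113.7"}).explain())
```

For the username and email indexes the query has to specify the same collation as the index, otherwise the plan falls back to a collection scan.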
Conclusion
TL;DR: use indexes. This is nowhere near a problem for people who know even roughly how a DBMS works. Thanks for reading!