Effective Strategies for Handling Database with Millions of Records and Mass User Access

Untara Vivi Chahya
3 min readApr 26, 2024

As an application runs over time, its database naturally expands due to the need to store historical activity. With a growing volume of data, there’s also an increasing number of concurrent users accessing it. This leads to a rise in server operational demands, impacting both memory and processor resources. Continuously escalating server loads necessitate scaling up server numbers, consequently driving up operational costs.

Various strategies can be employed to address the escalating server loads. These include implementing indexing on frequently used columns for filtering and utilizing memory caching.

Consider a scenario involving an employee attendance application. It continuously interacts with two main tables: attendance and employee records. The attendance table steadily accumulates data daily, accelerating with a larger employee dataset. Challenges arise when this dataset reaches million records. Server loads spike notably during peak attendance periods but remain idle during lulls in attendance activity. Simply increasing server capacity, be it processors or RAM, isn’t a prudent solution.

The primary database query process in this attendance system involves fetching employee data for authentication, recording attendance, validating entries in the attendance table, and appending new records.

To alleviate server loads, we’re exploring data migration strategies. Essentially, we periodically transfer data from the main attendance table to a separate historical attendance table. This reduces the data volume in the main table, thereby significantly boosting read speeds.

This migration process is automated using available scheduling tools like crontab. Executing migrations during low server load periods, typically during late-night hours, ensures minimal disruption to system performance.

In this scenario, we’re utilizing PHP with the Laravel framework and MySQL as our database backend. To manage approximately 15,000 attendance records, we’re operating on a robust setup with a 12-core processor and 16 GB of RAM, which is quite substantial. Here’s an outline of the steps we’re taking:

Determine the retention period for attendance records in the primary table. This decision is driven by specific needs. In our case, we retain records in the primary table for two months to facilitate payroll calculations.

function DatabaseMigration(integer $limit = 10): void
{
// Decide the interval
$date = date('Y-m', strtotime(date('Y-m-d') . ' -2 months'));

// Get data more than 2 months ago
$attendances = Attendance::whereDate('created_at', '<', $date . '-01')
->limit($limit)
->get();

// Itterate to migrate data. Ensure all data are migrated.
foreach ($attendances as $attendance) {
if (!HistorycalAttendance::find($attendance->id)) {
try {
$create = HistorycalAttendance::create($attendance->toArray());
if ($create) {
$attendance->delete();
}
} catch (\Exception $exception) {
// Log when error happen
Log::error('migration failed');
}
}
}
}

Schedule the crontab function to run during off-peak hours, typically in the middle of the night.

protected function schedule(Schedule $schedule)
{
$schedule->call(function () {
if(date('H') < 5){
// Call function here

}
})->everyMinute();
}

Implement a pivoting mechanism in our reporting system. Often, we require historical attendance data for reports. When querying data older than two months, the system automatically accesses the historical attendance table. In Laravel, we utilize a multi-Model approach to seamlessly switch between these tables.

function report(Request $request){
// Get month differences from query
$date_query = new \DateTime($request->date);
$date_now = new \DateTime(date('Y-m-d');

$difference_time = $date_query->diff($date_now);
$difference_month = ($difference_time->format('%y') * 12) + (int)$difference_time->format('%m');

// Switch Model
if($difference_month > 2){
$attendances = Attendance::query()->get();
} else {
$attendances = HistoricalAttendance::query()->get();
}

return $attendances;
}

By adopting these strategies, we aim to significantly reduce server costs while maintaining optimal performance.

--

--