Jan 24, 2024

Further speed-up with continual learning

Perpetual Learning, a novel approach from our company, offers a significant leap forward in machine learning efficiency. It achieves 100x faster initial training compared to traditional methods and enables continual learning with speed-ups that keep growing as more data arrives. This blog post walks through the mechanics of Perpetual Learning, showing how it supports continual learning without sacrificing accuracy.

We illustrate the potential of continual learning on the California Housing dataset, which we split into 24 equally sized batches. The initial model is trained on the first 12 batches combined, and the subsequent 11 batches are then learned incrementally. After each incremental step, the model's performance is evaluated on the following batch. To benchmark the efficacy of continual learning, we also retrain a separate model from scratch at each step on all data seen so far.
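To make this protocol concrete, here is a minimal sketch of the batching and evaluation loop. It uses scikit-learn's SGDRegressor with partial_fit purely as a generic stand-in for an incrementally updatable model; it is not the Perpetual Learning implementation, and its times and errors will not match Table 1.

```python
# Sketch of the continual-learning protocol described above.
# SGDRegressor.partial_fit is a generic stand-in, not the Perpetual Learning model.
import time

import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

X, y = fetch_california_housing(return_X_y=True)

# Split the 20,640 rows into 24 batches of 860 rows each.
X_batches = np.array_split(X, 24)
y_batches = np.array_split(y, 24)

scaler = StandardScaler().fit(np.vstack(X_batches[:12]))
model = SGDRegressor(random_state=0)

# Initial training on the first 12 batches combined (row 0 in Table 1).
X0 = scaler.transform(np.vstack(X_batches[:12]))
y0 = np.concatenate(y_batches[:12])
start = time.process_time()
model.partial_fit(X0, y0)
print(f"initial training CPU time: {time.process_time() - start:.2f}")

# Incremental learning on batches 12..22, each step evaluated on the next batch.
for i in range(12, 23):
    start = time.process_time()
    model.partial_fit(scaler.transform(X_batches[i]), y_batches[i])
    cpu = time.process_time() - start
    mse = mean_squared_error(
        y_batches[i + 1], model.predict(scaler.transform(X_batches[i + 1]))
    )
    print(f"step {i - 11}: CPU {cpu:.2f}, test MSE {mse:.3f}")
```

The retraining baseline in Table 1 would, at each step, fit a fresh model on all rows accumulated so far instead of updating the existing model with the new batch only.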

Table 1: Continual Learning vs. Retraining
| Batch # | Batch Size | Cumulative Data Size | Continual CPU Time | Retraining CPU Time | Continual Cumulative CPU Time | Retraining Cumulative CPU Time | Speed-up |
|---|---|---|---|---|---|---|---|
| 0 | 10320 | 10320 | 28.6 | 30.2 | 28.6 | 30.2 | 1.0x |
| 1 | 860 | 11180 | 22.3 | 31.3 | 50.9 | 61.5 | 1.2x |
| 2 | 860 | 12040 | 17.6 | 37.2 | 68.5 | 98.7 | 1.4x |
| 3 | 860 | 12900 | 11.5 | 33.9 | 80.0 | 132.6 | 1.7x |
| 4 | 860 | 13760 | 10.4 | 36.6 | 90.4 | 169.2 | 1.9x |
| 5 | 860 | 14620 | 11.1 | 41.2 | 101.5 | 210.4 | 2.1x |
| 6 | 860 | 15480 | 12.5 | 40.6 | 114.0 | 251.0 | 2.2x |
| 7 | 860 | 16340 | 14.2 | 46.9 | 128.2 | 297.9 | 2.3x |
| 8 | 860 | 17200 | 19.6 | 49.1 | 147.8 | 347.0 | 2.3x |
| 9 | 860 | 18060 | 13.3 | 57.0 | 161.1 | 404.0 | 2.5x |
| 10 | 860 | 18920 | 17.4 | 49.4 | 178.5 | 453.4 | 2.5x |
| 11 | 860 | 19780 | 19.6 | 60.0 | 198.1 | 513.4 | 2.6x |

The results show that the cumulative speed-up of continual learning over retraining grows with every incremental step, reaching 2.6x by step 11, while the average test mean squared error (MSE) stays consistent at 0.197. Because the gap widens at each step, the theoretical speed-up grows without bound as the number of incremental steps increases, as visualized in the charts below.
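For reference, the Speed-up column is the ratio of the two cumulative CPU time columns; the short check below reproduces it from Table 1 (to within rounding):

```python
# Cumulative CPU times copied from Table 1.
continual_cum = [28.6, 50.9, 68.5, 80.0, 90.4, 101.5,
                 114.0, 128.2, 147.8, 161.1, 178.5, 198.1]
retraining_cum = [30.2, 61.5, 98.7, 132.6, 169.2, 210.4,
                  251.0, 297.9, 347.0, 404.0, 453.4, 513.4]

for step, (c, r) in enumerate(zip(continual_cum, retraining_cum)):
    print(f"step {step}: speed-up {r / c:.1f}x")  # e.g. step 11 -> 2.6x
```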

The following chart shows the CPU time required at each step. Perpetual Learning exhibits near-constant CPU time per step because each incremental batch has the same size. Conversely, retraining CPU time grows roughly linearly, since each retrain must process the entire accumulated dataset.

The following chart shows the cumulative CPU time across all steps. Perpetual Learning's cumulative CPU time grows linearly with the number of batches, while retraining's grows quadratically. This is why the speed-up of Perpetual Learning keeps increasing without bound as the number of batches grows.
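A rough back-of-the-envelope model makes this explicit. Suppose each incremental step costs an approximately constant time $t_c$ (the batch size is fixed), while retraining at step $n$ costs roughly $n \cdot t_r$ because it processes all data accumulated so far (here $t_c$ and $t_r$ are illustrative constants, not measured values). Then after $n$ steps:

$$
T_{\text{continual}}(n) \approx n\,t_c,
\qquad
T_{\text{retraining}}(n) \approx \sum_{i=1}^{n} i\,t_r = \frac{n(n+1)}{2}\,t_r,
$$

so the cumulative speed-up $T_{\text{retraining}}(n) / T_{\text{continual}}(n) \approx \frac{(n+1)\,t_r}{2\,t_c}$ grows linearly in $n$ and is unbounded as $n \to \infty$.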

These results show that Perpetual Learning not only delivers 100x faster initial training but also learns from a growing dataset in time linear in the total amount of data, rather than the quadratic cost of repeated retraining.
