Further speed-up with continual learning
Perpetual Learning, a novel approach from our company, offers a significant leap forward in machine learning efficiency. It achieves 100x faster initial training than traditional methods and keeps delivering speed-ups through continual learning. This blog post walks through the mechanics of continual learning with Perpetual Learning and shows that it learns from new data far faster than retraining while maintaining accuracy.
We illustrate continual learning on the California Housing dataset, split into 24 equal batches. The initial model is trained on the first 12 batches, then updated incrementally on each of the next 11 batches. After each incremental step, the model is evaluated on the following, still-unseen batch. To benchmark continual learning against the conventional alternative, we also retrain a separate model from scratch on all data seen so far at each step; the sketch below outlines this setup.
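The following is a minimal sketch of the benchmark loop. The Perpetual Learning API itself is not shown in this post, so scikit-learn's `SGDRegressor` with `partial_fit` stands in for the continually updated model, CPU time is measured with `time.process_time()`, and batches are taken in dataset order.

```python
import time

import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

# Load California Housing (20640 rows) and split it into 24 equal batches of 860 rows.
X, y = fetch_california_housing(return_X_y=True)
X_batches = np.array_split(X, 24)
y_batches = np.array_split(y, 24)

# Batch #0 in the table: initial training on the first 12 batches.
X_init = np.concatenate(X_batches[:12])
y_init = np.concatenate(y_batches[:12])
scaler = StandardScaler().fit(X_init)  # keeps the SGD stand-in numerically stable

model = SGDRegressor(random_state=0)   # stand-in for the Perpetual Learning model
start = time.process_time()
model.partial_fit(scaler.transform(X_init), y_init)
print(f"batch 0: continual CPU time {time.process_time() - start:.1f}")

# Batches #1..#11: incremental update, evaluation on the next unseen batch,
# and a from-scratch retraining baseline on all data seen so far.
for step in range(12, 23):
    X_new, y_new = X_batches[step], y_batches[step]
    X_test, y_test = X_batches[step + 1], y_batches[step + 1]

    start = time.process_time()
    model.partial_fit(scaler.transform(X_new), y_new)  # stand-in for the incremental update
    continual_time = time.process_time() - start

    mse = mean_squared_error(y_test, model.predict(scaler.transform(X_test)))

    X_seen = np.concatenate(X_batches[: step + 1])
    y_seen = np.concatenate(y_batches[: step + 1])
    start = time.process_time()
    SGDRegressor(random_state=0).fit(scaler.transform(X_seen), y_seen)  # retraining baseline
    retrain_time = time.process_time() - start

    print(f"batch {step - 11}: continual {continual_time:.1f}, "
          f"retraining {retrain_time:.1f}, test MSE {mse:.3f}")
```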
| Batch # | Batch Size | Cumulative Data Size | Continual CPU Time | Retraining CPU Time | Continual Cumulative CPU Time | Retraining Cumulative CPU Time | Speed-up |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 10320 | 10320 | 28.6 | 30.2 | 28.6 | 30.2 | 1.0x |
| 1 | 860 | 11180 | 22.3 | 31.3 | 50.9 | 61.5 | 1.2x |
| 2 | 860 | 12040 | 17.6 | 37.2 | 68.5 | 98.7 | 1.4x |
| 3 | 860 | 12900 | 11.5 | 33.9 | 80.0 | 132.6 | 1.7x |
| 4 | 860 | 13760 | 10.4 | 36.6 | 90.4 | 169.2 | 1.9x |
| 5 | 860 | 14620 | 11.1 | 41.2 | 101.5 | 210.4 | 2.1x |
| 6 | 860 | 15480 | 12.5 | 40.6 | 114.0 | 251.0 | 2.2x |
| 7 | 860 | 16340 | 14.2 | 46.9 | 128.2 | 297.9 | 2.3x |
| 8 | 860 | 17200 | 19.6 | 49.1 | 147.8 | 347.0 | 2.3x |
| 9 | 860 | 18060 | 13.3 | 57.0 | 161.1 | 404.0 | 2.5x |
| 10 | 860 | 18920 | 17.4 | 49.4 | 178.5 | 453.4 | 2.5x |
| 11 | 860 | 19780 | 19.6 | 60.0 | 198.1 | 513.4 | 2.6x |
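The derived columns follow directly from the per-step measurements: the cumulative columns are running totals, and the Speed-up column is the ratio of cumulative retraining CPU time to cumulative continual-learning CPU time. A quick check against the table's numbers:

```python
import numpy as np

# Per-step CPU times from the table, batch #0 through #11.
continual = np.array([28.6, 22.3, 17.6, 11.5, 10.4, 11.1, 12.5, 14.2, 19.6, 13.3, 17.4, 19.6])
retraining = np.array([30.2, 31.3, 37.2, 33.9, 36.6, 41.2, 40.6, 46.9, 49.1, 57.0, 49.4, 60.0])

# Running totals give the cumulative columns; their ratio gives the Speed-up column.
speedup = retraining.cumsum() / continual.cumsum()
print(np.round(speedup, 1))
# [1.1 1.2 1.4 1.7 1.9 2.1 2.2 2.3 2.3 2.5 2.5 2.6]
# (the table reports the first row, its own baseline, as 1.0x; the rest match)
```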
The results show that continual learning accelerates relative to retraining with each incremental step, while the average test mean squared error (MSE) stays constant at 0.197. In theory, the speed-up grows without bound as the number of incremental steps increases, as visualized in the accompanying charts.
The first chart shows the CPU time required at each step. Perpetual Learning's per-step CPU time stays nearly constant because each incremental batch has the same size, whereas retraining's CPU time grows linearly because the training set grows with every step.
The second chart shows cumulative CPU time across all steps. For Perpetual Learning it grows linearly, while for retraining it grows quadratically, which is why the speed-up keeps increasing and, in theory, becomes unbounded as the number of batches approaches infinity.
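A simple back-of-the-envelope model makes this scaling explicit. Assuming training cost is roughly proportional to the number of training rows, with initial size $n_0$, incremental batch size $b$, and per-row cost $c$, the cumulative CPU time after $k$ incremental steps is approximately

$$
T_{\text{continual}}(k) \approx c\,(n_0 + k\,b),
\qquad
T_{\text{retrain}}(k) \approx \sum_{i=0}^{k} c\,(n_0 + i\,b) = c\,(k+1)\,n_0 + c\,b\,\frac{k(k+1)}{2},
$$

so the cumulative speed-up $T_{\text{retrain}}(k)/T_{\text{continual}}(k)$ grows roughly like $k/2$ and increases without bound as $k \to \infty$.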
These results show that Perpetual Learning not only delivers 100x faster initial training but also learns from new data in roughly constant time per batch, so total training cost grows only linearly with the amount of data, regardless of scale.