Further speed-up with continual learning
Perpetual Learning, a novel approach from our company, offers a significant leap forward in machine learning efficiency. It achieves 100x faster initial training than traditional methods and keeps delivering speed-ups through continual learning. This blog post walks through the mechanics of continual learning with Perpetual Learning and shows that it learns from new data far faster than retraining while maintaining accuracy.
We illustrate continual learning on the California Housing dataset, split into 24 equal batches. The initial model is trained on the first 12 batches, then updated incrementally on each of the next 11 batches. After each incremental step, the model is evaluated on the following, still-unseen batch. To benchmark continual learning against the conventional alternative, we also retrain a separate model from scratch on all data seen so far at each step; the sketch below outlines this setup.
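The following is a minimal sketch of the benchmark loop. The Perpetual Learning API itself is not shown in this post, so scikit-learn's `SGDRegressor` with `partial_fit` stands in for the continually updated model, CPU time is measured with `time.process_time()`, and batches are taken in dataset order.

```python
import time

import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

# Load California Housing (20640 rows) and split it into 24 equal batches of 860 rows.
X, y = fetch_california_housing(return_X_y=True)
X_batches = np.array_split(X, 24)
y_batches = np.array_split(y, 24)

# Batch #0 in the table: initial training on the first 12 batches.
X_init = np.concatenate(X_batches[:12])
y_init = np.concatenate(y_batches[:12])
scaler = StandardScaler().fit(X_init)  # keeps the SGD stand-in numerically stable

model = SGDRegressor(random_state=0)   # stand-in for the Perpetual Learning model
start = time.process_time()
model.partial_fit(scaler.transform(X_init), y_init)
print(f"batch 0: continual CPU time {time.process_time() - start:.1f}")

# Batches #1..#11: incremental update, evaluation on the next unseen batch,
# and a from-scratch retraining baseline on all data seen so far.
for step in range(12, 23):
    X_new, y_new = X_batches[step], y_batches[step]
    X_test, y_test = X_batches[step + 1], y_batches[step + 1]

    start = time.process_time()
    model.partial_fit(scaler.transform(X_new), y_new)  # stand-in for the incremental update
    continual_time = time.process_time() - start

    mse = mean_squared_error(y_test, model.predict(scaler.transform(X_test)))

    X_seen = np.concatenate(X_batches[: step + 1])
    y_seen = np.concatenate(y_batches[: step + 1])
    start = time.process_time()
    SGDRegressor(random_state=0).fit(scaler.transform(X_seen), y_seen)  # retraining baseline
    retrain_time = time.process_time() - start

    print(f"batch {step - 11}: continual {continual_time:.1f}, "
          f"retraining {retrain_time:.1f}, test MSE {mse:.3f}")
```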
| Batch # | Batch Size | Cumulative Data Size | Continual CPU Time | Retraining CPU Time | Continual Cumulative CPU Time | Retraining Cumulative CPU Time | Speed-up |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 10320 | 10320 | 28.6 | 30.2 | 28.6 | 30.2 | 1.0x |
| 1 | 860 | 11180 | 22.3 | 31.3 | 50.9 | 61.5 | 1.2x |
| 2 | 860 | 12040 | 17.6 | 37.2 | 68.5 | 98.7 | 1.4x |
| 3 | 860 | 12900 | 11.5 | 33.9 | 80.0 | 132.6 | 1.7x |
| 4 | 860 | 13760 | 10.4 | 36.6 | 90.4 | 169.2 | 1.9x |
| 5 | 860 | 14620 | 11.1 | 41.2 | 101.5 | 210.4 | 2.1x |
| 6 | 860 | 15480 | 12.5 | 40.6 | 114.0 | 251.0 | 2.2x |
| 7 | 860 | 16340 | 14.2 | 46.9 | 128.2 | 297.9 | 2.3x |
| 8 | 860 | 17200 | 19.6 | 49.1 | 147.8 | 347.0 | 2.3x |
| 9 | 860 | 18060 | 13.3 | 57.0 | 161.1 | 404.0 | 2.5x |
| 10 | 860 | 18920 | 17.4 | 49.4 | 178.5 | 453.4 | 2.5x |
| 11 | 860 | 19780 | 19.6 | 60.0 | 198.1 | 513.4 | 2.6x |
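The derived columns follow directly from the per-step measurements: the cumulative columns are running totals, and the Speed-up column is the ratio of cumulative retraining CPU time to cumulative continual-learning CPU time. A quick check against the table's numbers:

```python
import numpy as np

# Per-step CPU times from the table, batch #0 through #11.
continual = np.array([28.6, 22.3, 17.6, 11.5, 10.4, 11.1, 12.5, 14.2, 19.6, 13.3, 17.4, 19.6])
retraining = np.array([30.2, 31.3, 37.2, 33.9, 36.6, 41.2, 40.6, 46.9, 49.1, 57.0, 49.4, 60.0])

# Running totals give the cumulative columns; their ratio gives the Speed-up column.
speedup = retraining.cumsum() / continual.cumsum()
print(np.round(speedup, 1))
# [1.1 1.2 1.4 1.7 1.9 2.1 2.2 2.3 2.3 2.5 2.5 2.6]
# (the table reports the first row, its own baseline, as 1.0x; the rest match)
```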
The results show that continual learning accelerates relative to retraining with each incremental step, while the average test mean squared error (MSE) stays constant at 0.197. In theory, the speed-up grows without bound as the number of incremental steps increases, as visualized in the accompanying charts.
The first chart shows the CPU time required at each step. Perpetual Learning's per-step CPU time stays nearly constant because each incremental batch has the same size, whereas retraining's CPU time grows linearly because the training set grows with every step.
The second chart shows cumulative CPU time across all steps. For Perpetual Learning it grows linearly, while for retraining it grows quadratically, which is why the speed-up keeps increasing and, in theory, becomes unbounded as the number of batches approaches infinity.
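A simple back-of-the-envelope model makes this scaling explicit. Assuming training cost is roughly proportional to the number of training rows, with initial size $n_0$, incremental batch size $b$, and per-row cost $c$, the cumulative CPU time after $k$ incremental steps is approximately

$$
T_{\text{continual}}(k) \approx c\,(n_0 + k\,b),
\qquad
T_{\text{retrain}}(k) \approx \sum_{i=0}^{k} c\,(n_0 + i\,b) = c\,(k+1)\,n_0 + c\,b\,\frac{k(k+1)}{2},
$$

so the cumulative speed-up $T_{\text{retrain}}(k)/T_{\text{continual}}(k)$ grows roughly like $k/2$ and increases without bound as $k \to \infty$.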
These results show that Perpetual Learning not only delivers 100x faster initial training but also learns from new data in roughly constant time per batch, so total training cost grows only linearly with the amount of data, regardless of scale.