Deep Learning has revolutionized the fields of
computer vision, natural language understanding, speech
recognition, information retrieval, and more. Over the past decade, many
techniques have evolved that make models lighter, faster, and more
robust, with better generalization. However, many deep learning
practitioners continue to rely on pre-trained models and architectures
trained mostly on standard datasets such as ImageNet, MS-COCO, the
IMDB-Wiki dataset, and Kinetics-700, and are either hesitant to redesign
architectures from scratch or unaware that doing so can lead to better
performance. This reliance leads to inefficient models that are not
suitable for resource-constrained devices such as mobile, edge, and fog
devices. In
addition, these conventional training methods are a concern because
they consume a great deal of computing power. In this paper, we revisit
various state-of-the-art (SOTA) techniques that address architectural
efficiency (Global Average Pooling, depth-wise convolutions,
squeeze-and-excitation, BlurPool), learning-rate scheduling (Cyclical
Learning Rate), data augmentation (Mixup, Cutout), label manipulation
(label smoothing), weight-space manipulation (Stochastic Weight
Averaging), and optimization (Sharpness-Aware Minimization). We
demonstrate how an efficient deep convolutional network can be built
in a phased manner by sequentially reducing the number of trainable
parameters and applying the techniques mentioned above. We achieved
a SOTA accuracy of 99.2% on the MNIST dataset with just 1,500
parameters and an accuracy of 86.01% with just over 140K parameters
on the CIFAR-10 dataset.
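
As an illustration of how some of the architecture-efficiency and
label-manipulation techniques listed above fit together, the following is
a minimal sketch, assuming PyTorch as the framework. The module names
(SqueezeExcite, DepthwiseSeparableBlock, TinyNet), layer widths, and the
smoothing factor are hypothetical choices for exposition and do not
reproduce the paper's exact configuration or parameter counts.

# Illustrative sketch only: a compact CNN combining depth-wise separable
# convolutions, squeeze-and-excitation, and Global Average Pooling, with
# label smoothing in the loss. PyTorch is assumed; all names and sizes
# here are hypothetical, not the paper's exact architecture.
import torch
import torch.nn as nn


class SqueezeExcite(nn.Module):
    """Channel re-weighting via global pooling and a small bottleneck MLP."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Squeeze: global average pool to a per-channel descriptor.
        w = x.mean(dim=(2, 3))
        # Excite: per-channel gates in [0, 1], broadcast back over H x W.
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w


class DepthwiseSeparableBlock(nn.Module):
    """Depth-wise 3x3 conv (per-channel) followed by a 1x1 point-wise conv."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch, bias=False),
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            SqueezeExcite(out_ch),
        )

    def forward(self, x):
        return self.block(x)


class TinyNet(nn.Module):
    """A small classifier ending in Global Average Pooling instead of large FC layers."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            DepthwiseSeparableBlock(3, 16),
            nn.MaxPool2d(2),
            DepthwiseSeparableBlock(16, 32),
            nn.MaxPool2d(2),
            DepthwiseSeparableBlock(32, 64),
        )
        self.head = nn.Conv2d(64, num_classes, kernel_size=1)  # 1x1 conv as classifier

    def forward(self, x):
        x = self.features(x)
        x = self.head(x)
        return x.mean(dim=(2, 3))  # Global Average Pooling -> logits


if __name__ == "__main__":
    model = TinyNet(num_classes=10)
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # label smoothing
    images = torch.randn(8, 3, 32, 32)       # CIFAR-10-sized dummy batch
    targets = torch.randint(0, 10, (8,))
    loss = criterion(model(images), targets)
    print(sum(p.numel() for p in model.parameters()), loss.item())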