Abstract

We present SLIMING (Singular vaLues-drIven autoMated filter prunING), an automated filter pruning method that uses singular values to formalize the pruning process as an optimization problem over filter tensors. Because this original formulation is combinatorial, we replace it with a two-step process that uses singular values consistently in each phase: (\(i\)) determining the pruning configuration, which specifies the number of filters to retain in each layer, and (\(ii\)) selecting the filters themselves. We show that this approach preserves the filters' multidimensional structure throughout the pruning process, and we propose a straightforward algorithm for each step. To validate each part of our approach, we perform a numerical simulation on an overparameterized synthetic toy example. We also conduct extensive experiments across eight architectures, four benchmark datasets, and four vision tasks, which validate the efficacy of our framework.

🎮 Toy example

This paper proposes a method that leverages tools from linear and multilinear algebra to provide a new solution for automated filter pruning. We detect network redundancy through the dynamics of singular values and the nuclear norm, and we highlight the relationship between filter redundancy within a neural network and the observable variations in its singular values. To illustrate the rationale behind our approach, we create a synthetic dataset, dubbed SVGG, whose original model mimics the architecture of the VGG network with \(L=5\), \(d_l=3\), and \(\{C_l\}_{l=1}^L = \{64, 128, 256, 512, 512\}\). We set the redundancy rates of these layers (the ratio of the number of redundant filters to the total number of filters, i.e., \(\frac{C_l-N_l}{C_l}\)) to \(\{0.25, 0.3, 0.35, 0.4, 0.45\}\), respectively, so that \(\{N_l\}_{l=1}^L = \{48, 90, 166, 307, 282\}\) and the total number of retained filters is \(N=893\). In the \(l\)-th layer, we initialize \(N_l\) core filters from the standard normal distribution, while the remaining redundant filters are copied from the core filters with a small noise of variance \(\epsilon = 0.01\). The "multilinear" singular values are visualized in Figure 1. Note that the overparameterized model contains many near-zero singular values, indicating redundancy. The random search approach, CHIP, FPC, and SPSRC yield suboptimal results, whereas GEM successfully identifies all unique filters, achieving 100% accuracy, on par with the complete search, at reduced computational overhead.

Figure 1: Distribution of singular values (left). Non-redundant selected filters (right).
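
The construction of the toy layers described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the source filter of each redundant copy is chosen at random, the second layer is assumed to take 64 input channels, the singular values are those of a single unfolding (one row per filter) rather than the full "multilinear" spectrum, and the gap heuristic is a simple stand-in rather than the SLIMING selection rule.

```python
import numpy as np

# Layer widths and redundancy rates as stated in the toy example.
widths = [64, 128, 256, 512, 512]
rates = [0.25, 0.30, 0.35, 0.40, 0.45]
retained = [round(c * (1 - r)) for c, r in zip(widths, rates)]


def make_redundant_layer(c_out, n_core, c_in, d=3, noise_var=0.01, seed=0):
    """Build a (c_out, c_in, d, d) filter bank with n_core independent filters."""
    rng = np.random.default_rng(seed)
    core = rng.standard_normal((n_core, c_in, d, d))
    # Assumption: each redundant filter is a noisy copy of a random core filter.
    n_red = c_out - n_core
    src = rng.integers(0, n_core, size=n_red)
    noise = rng.normal(0.0, np.sqrt(noise_var), size=(n_red, c_in, d, d))
    return np.concatenate([core, core[src] + noise], axis=0)


def filter_mode_singular_values(weights):
    """Singular values of the unfolding that has one row per filter."""
    unfolded = weights.reshape(weights.shape[0], -1)
    return np.linalg.svd(unfolded, compute_uv=False)


def estimate_rank(singular_values):
    """Index of the largest multiplicative gap in the sorted spectrum."""
    ratios = singular_values[:-1] / singular_values[1:]
    return int(np.argmax(ratios)) + 1


# Second toy layer: 128 filters, of which 90 are core filters.
layer = make_redundant_layer(c_out=widths[1], n_core=retained[1], c_in=widths[0])
sv = filter_mode_singular_values(layer)
estimated = estimate_rank(sv)
```

In this sketch the trailing singular values are small relative to the leading ones, and the largest gap in the spectrum is expected to fall at the number of core filters, which is the kind of signal the toy example is designed to expose.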

🚩 Main results

To showcase SLIMING's adaptability, we evaluate it on five architecture families: VGG-16-BN, GoogLeNet with inception modules, ResNet-20/32/56/110 with residual blocks, DenseNet-40 with dense blocks, and MobileNetV2 with inverted residual blocks. These models are tested on the CIFAR-10/100 datasets. To further validate SLIMING's scalability, we perform experiments on the ImageNet dataset using the ResNet-50 and MobileNetV2 architectures. Additionally, the compressed ResNet-50 model is used as the backbone for Faster R-CNN-FPN, Mask R-CNN, and Keypoint R-CNN on the COCO-2017 dataset. We compare SLIMING with 56 related works, as detailed in the paper; for brevity, we present only the ResNet-50 results on ImageNet in Table 1 and the ResNet-110 results on CIFAR-10 in Table 2. In both tables, SLIMING attains the highest Top-1 accuracy at each compression level, with MACs and parameter counts at or below those of the listed methods.

Table 1. Compression results of ResNet-50 on ImageNet

| Method | Auto | Top-1 (%) | Top-5 (%) | MACs (↓%) | Params (↓%) |
|---|---|---|---|---|---|
| ResNet-50 (CVPR'16) | | 76.15 | 92.87 | 4.12G (00) | 25.56M (00) |
| REAF (TIP'23) | | 75.17 | 92.44 | 2.16G (48) | 14.57M (43) |
| RGP (TNNLS'24) | | 75.30 | 92.55 | 2.30G (44) | 14.34M (44) |
| Chen et al. (TNNLS'23) | | 75.60 | 92.58 | 2.21G (46) | N/A |
| C-SGD (TNNLS'23) | | 75.80 | 92.65 | 2.19G (47) | 14.58M (43) |
| CHIP (NeurIPS'21) | | 76.15 | 92.91 | 2.10G (49) | 14.23M (44) |
| SFI-FP (Pattern Recognition'24) | | 76.29 | 93.08 | 2.10G (49) | 14.23M (44) |
| PEEL (Pattern Recognition'24) | | 76.50 | N/A | 2.20G (46) | N/A |
| SLIMING (Ours) | | 76.74 | 93.43 | 2.09G (49) | 13.27M (48) |
| CIE (Neural Networks'24) | | 74.06 | 91.87 | 1.56G (62) | 9.98M (61) |
| RGP (TNNLS'24) | | 74.58 | 92.09 | 1.92G (53) | 11.99M (53) |
| MFP (TNNLS'23) | | 74.86 | 92.43 | 1.88G (54) | N/A |
| FPWT (Neural Networks'24) | | 75.01 | 92.45 | 1.89G (54) | 12.86M (50) |
| Torque (WACV'24) | | 75.07 | N/A | 1.99G (51) | 9.68M (62) |
| OTOv2 (ICLR'23) | | 75.20 | 92.22 | 1.53G (63) | N/A |
| FiltDivNet (TNNLS'24) | | 75.23 | 92.50 | 1.66G (59) | 15.62M (39) |
| ASTER (TNNLS'24) | | 75.27 | 92.47 | 1.51G (63) | N/A |
| C-SGD (TNNLS'23) | | 75.29 | 92.39 | 1.82G (55) | 12.37M (52) |
| Hu et al. (Pattern Recognition'24) | | 75.30 | 92.40 | 1.81G (56) | 17.86M (30) |
| HSC (TPAMI'25) | | 75.46 | 92.40 | 1.57G (62) | N/A |
| DCFF (TPAMI'23) | | 75.60 | 92.55 | 1.52G (63) | 11.05M (57) |
| HTP-URC (TNNLS'24) | | 75.81 | N/A | 1.88G (54) | 15.81M (38) |
| SLIMING (Ours) | | 75.96 | 93.29 | 1.51G (63) | 9.68M (62) |
| HBFP (Neurocomputing'24) | | 69.17 | N/A | 0.94G (76) | 8.09M (68) |
| CHIP (NeurIPS'21) | | 72.30 | 90.74 | 0.95G (77) | 8.01M (69) |
| SNACS (TNNLS'24) | | 72.60 | N/A | 1.98G (52) | 7.92M (69) |
| RGP (TNNLS'24) | | 72.68 | 91.06 | 0.94G (77) | 8.13M (68) |
| FPWT (Neural Networks'24) | | 72.82 | 91.14 | 1.02G (75) | 6.38M (75) |
| SFI-FP (Pattern Recognition'24) | | 73.48 | 92.87 | 0.96G (77) | 8.03M (69) |
| ACSC (Neurocomputing'24) | | 73.68 | N/A | 1.03G (75) | 6.31M (75) |
| DCFF (TPAMI'23) | | 73.81 | 91.59 | 1.02G (75) | 6.56M (74) |
| Guo et al. (IJCV'24) | | 73.84 | 92.07 | 1.19G (71) | 6.25M (75) |
| SLIMING (Ours) | | 73.88 | 92.07 | 0.87G (79) | 5.68M (78) |

Table 2. Compression results of ResNet-110 on CIFAR-10

| Method | Auto | Top-1 (%) | MACs (↓%) | Params (↓%) |
|---|---|---|---|---|
| ResNet-110 (CVPR'16) | | 93.50 | 256.04M (00) | 1.73M (00) |
| HSC (TPAMI'25) | | 94.01 | 88.26M (65) | 0.69M (60) |
| SLIMING (Ours) | | 94.52 | 87.59M (66) | 0.61M (65) |
| HSC (TPAMI'25) | | 93.56 | 71.31M (72) | 0.51M (70) |
| SLIMING (Ours) | | 93.64 | 54.50M (79) | 0.28M (84) |
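
The bracketed percentages in Tables 1 and 2 can be checked with a short calculation. The sketch below assumes they are relative reductions with respect to the unpruned baseline, rounded to the nearest integer; the helper name `reduction_percent` is hypothetical, and the numbers are copied from the SLIMING rows of Table 1.

```python
def reduction_percent(baseline, compressed):
    """Percentage by which `compressed` is smaller than `baseline`."""
    return 100.0 * (baseline - compressed) / baseline


# ResNet-50 baseline from Table 1: 4.12G MACs and 25.56M parameters.
BASE_MACS_G = 4.12
BASE_PARAMS_M = 25.56

# (MACs in G, parameters in M) for the three SLIMING rows of Table 1.
sliming_rows = [(2.09, 13.27), (1.51, 9.68), (0.87, 5.68)]

reductions = [
    (
        round(reduction_percent(BASE_MACS_G, macs)),
        round(reduction_percent(BASE_PARAMS_M, params)),
    )
    for macs, params in sliming_rows
]
```

Under that assumption, the computed pairs agree with the values reported in the table: (49, 48), (63, 62), and (79, 78).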

🚀 Throughput acceleration

To illustrate the practical benefits of SLIMING, we compare a baseline model with a compressed model on an object detection task. Using the FasterRCNN_ResNet50_FPN architecture on an RTX 3060 GPU, the baseline model reaches an inference speed of approximately 9 FPS, while the SLIMING-compressed model achieves roughly twice that throughput. The accompanying GIFs show the two models running side by side.

Figure 2: Baseline (left) vs Pruned (right) model inference.
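
A throughput comparison of this kind can be reproduced with a simple timing loop. The sketch below is a generic, dependency-free illustration and not the authors' benchmark script: `measure_fps` and `dummy_infer` are hypothetical names, and the dummy workload merely stands in for one forward pass of a detector.

```python
import time


def measure_fps(infer, n_warmup=3, n_iters=20):
    """Average number of calls to `infer` completed per second."""
    # Warm-up calls are excluded so one-off start-up costs do not skew the result.
    for _ in range(n_warmup):
        infer()
    start = time.perf_counter()
    for _ in range(n_iters):
        infer()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed


def dummy_infer():
    # Hypothetical stand-in for a single forward pass.
    return sum(i * i for i in range(10_000))


fps = measure_fps(dummy_infer)
```

When timing a PyTorch model on a GPU, kernels run asynchronously, so one would typically call `torch.cuda.synchronize()` before reading the clock at the start and end of the timed loop; otherwise the measured time may not include the actual computation.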

🌈 Visualizing feature preservation

We complement the numerical results with a qualitative evaluation of feature preservation. We randomly select 5 images from the ImageNet validation set and examine three compression levels applied to the original ResNet-50 model: 44%, 63%, and 79%. Using GradCAM, we visually compare the feature maps of the original and compressed models. The visualizations show that our framework retains the crucial features across a diverse range of classes and does so consistently at different compression rates (CRs). This suggests that the framework remains reliable across compression levels, making it a versatile choice for network compression.

Figure 3: Qualitative assessment of feature preservation in compressed models. Columns, left to right: Input, CR=0%, CR=44%, CR=63%, CR=79%.
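
For readers unfamiliar with GradCAM, the heatmaps in such a comparison come from a short computation on one convolutional layer. The NumPy sketch below shows the standard Grad-CAM formula on precomputed arrays; it is illustrative only, the function name `grad_cam` is hypothetical, and it assumes the layer's activations and the gradients of the class score with respect to them have already been extracted from the network.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM heatmap from arrays of shape (K, H, W).

    activations: feature maps of the chosen layer for one image.
    gradients:   gradient of the class score w.r.t. those feature maps.
    """
    # Channel weights: global average of the gradients over spatial positions.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the feature maps, followed by ReLU.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    # Scale to [0, 1] for display, unless the map is identically zero.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Tiny hand-made example with K=2 channels on a 2x2 grid.
acts = np.array([[[1.0, 1.0], [1.0, 1.0]],
                 [[1.0, 0.0], [0.0, 0.0]]])
grads = np.stack([np.full((2, 2), 1.0), np.full((2, 2), -1.0)])
heatmap = grad_cam(acts, grads)
```

In the tiny example the second channel has a negative weight, so the location where it is active is suppressed and the remaining three positions receive the maximum value.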

🔖 Citation

If the code and paper help your research, please kindly cite:


@misc{pham2024singular,
  title={Singular Values-Driven Automated Filter Pruning},
  author={Pham, Van Tien and Zniyed, Yassine and Nguyen, Thanh Phuong},
  howpublished={\url{https://sliming-ai.github.io/}},
  year={2024}
}

👍 Acknowledgements

This work was granted access to the high-performance computing resources of IDRIS under the allocation 2023-103147 made by GENCI. Specifically, our experiments were conducted on the Jean Zay supercomputer, located at IDRIS, the national computing center for the National Centre for Scientific Research (CNRS).

We thank the Agence Nationale de la Recherche (ANR) for partially supporting our work through the ANR ASTRID ROV-Chasseur project (ANR-21-ASRO-0003).

⏩ More & Moore 📈

The ever-accelerating progress of technology… gives the appearance of approaching some essential singularity. — John von Neumann, 1958
The singularity is nearer. — Ray Kurzweil, 2024