DL Seminar: MobileOne: An Improved One millisecond Mobile Backbone

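DL Seminar (Paper Introduction)
MobileOne: An Improved One millisecond Mobile Backbone
Graduate School of Information Science and Technology, Hokkaido University
Harmonious Systems Engineering Laboratory, Complex Information Engineering, Division of Computer Science and Information Technology
2nd-year doctoral student: 森雄斗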
2023/11/20

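Copyright © 2020 Harmonious Systems Engineering Laboratory - Complex Information Engineering, Division of Computer Science and Information Technology, Graduate School of Information Science and Technology, Hokkaido University - All rights reserved.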
Paper Information 2
Title
MobileOne: An Improved One millisecond Mobile Backbone
Authors
Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel, Anurag Ranjan
Apple
Venue
CVPR 2023
URLs
GitHub
https://github.com/apple/ml-mobileone
Paper
https://openaccess.thecvf.com/content/CVPR2023/html/Vasu_MobileOne_An_Improved_One_Millisecond_Mobile_Backbone_CVPR_2023_paper.html

PyTorch Image Models (timm)
Added on 2023/8/25

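Overview 4
The latency problem of NNs (neural networks) on mobile devices
Conventional efficiency metrics such as FLOPs and parameter count sometimes correlate only weakly with latency
FLOPs: the number of floating-point multiply and add operations
Proposes "MobileOne", a mobile NN backbone designed around the findings above
Faster than prior models of comparable accuracy (under 1 ms on iPhone 12)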

Results 5

Background 6
Mobile NN backbones have advanced by reducing FLOPs (FLoating-point OPerationS)
and parameter count while improving accuracy
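[Figure: accuracy (e.g., Top-1) vs. FLOPs (or parameter count). Point 1: conventional method; point 2: typical proposed method. Annotations: "better because it reaches the same accuracy with fewer FLOPs" and "better because it reaches higher accuracy with the same FLOPs".]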
A recent study [1] showed that FLOPs and parameter count, the conventional metrics,
do not correlate clearly with a model's actual efficiency

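Evidence for the Claim 7
Parameter sharing within a network
FLOPs go up while model size goes down
= more computation, but fewer parameters
Skip connections [2] and branching [3,4]
Add no parameters,
but incur a large memory access cost
Figures: skip connection in ResNet [2]; overview diagram of DenseNet [3]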
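The parameter-sharing and skip-connection points can be made concrete with a little arithmetic. This is a minimal Python sketch under stated assumptions: the layer shape (3x3 conv, 64 to 64 channels, 56x56 map), the convention FLOPs = 2 × MACs, and float32 activations are all hypothetical choices for illustration, not figures from the paper.

```python
# Hypothetical layer shape used throughout this sketch (not from the paper)
C_IN, C_OUT, K, H, W = 64, 64, 3, 56, 56


def conv_params(c_in, c_out, k):
    """Weight count of a k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def conv_flops(c_in, c_out, k, h, w):
    """FLOPs of a stride-1, same-padded conv: one multiply + one add per MAC."""
    return 2 * k * k * c_in * c_out * h * w


def stacked_cost(num_layers, shared):
    """(params, FLOPs) when a same-shaped conv is applied num_layers times.

    shared=True reuses a single weight tensor for every application.
    """
    params = conv_params(C_IN, C_OUT, K)
    if not shared:
        params *= num_layers
    flops = conv_flops(C_IN, C_OUT, K, H, W) * num_layers
    return params, flops


shared = stacked_cost(4, shared=True)
unshared = stacked_cost(4, shared=False)
print("shared:  ", shared)
print("unshared:", unshared)

# A skip connection adds no parameters, but its input activation must stay in
# memory until the addition; with float32 (4 bytes per value) that is:
skip_bytes = C_IN * H * W * 4
print("activation kept alive by the skip:", skip_bytes, "bytes")
```

Both variants cost 924,844,032 FLOPs, but the shared one stores 36,864 weights versus 147,456, so at a fixed parameter budget sharing lets a network spend more FLOPs. The skip connection, meanwhile, holds 802,816 bytes of activations without adding a single parameter.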

Goals of This Work 8
1. Identify the network structures that affect
on-device latency
2. Identify the bottlenecks that need optimizing
3. Propose a network architecture that reduces
on-device latency cost

Related Work 9
Optimizing parameter count
SqueezeNet [5], MobileViT [6]
Optimizing FLOPs
MobileNets [7], MobileNeXt [8], GhostNet [9], Mobile-Former [10]
These models are not guaranteed to run fast on mobile devices
Base method of the proposed MobileOne
RepVGG [11]: introduces re-parameterizable skip connections
(explained on slides 17 and 18)

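Correlation Between Latency and Conventional Metrics 10
Experiment
Models converted to run on an iPhone 12
PyTorch → ONNX → Core ML package
Compared against the major mobile models
What was examined
On-device latency vs. FLOPs
On-device latency vs. parameter count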

FLOPs vs Latency 11
Many models with more FLOPs
actually have lower latency
1. MobileNetV1
2. EfficientNet-B0
3. ShuffleNetV2-2.0
4. ShuffleNetV2-1.0
5. MobileNext-1.4
6. MobileNetV2
7. MobileNetV3-S
8. MobileNetV3-L
9. MixNet-S
10. MNASNet-A1
11. MobileOne-S1
12. MobileOne-S0

Parameter Count vs Latency 12
Same as with FLOPs:
many models with more parameters
actually have lower latency
1. MobileNetV1
2. EfficientNet-B0
3. ShuffleNetV2-2.0
4. ShuffleNetV2-1.0
5. MobileNext-1.4
6. MobileNetV2
7. MobileNetV3-S
8. MobileNetV3-L
9. MixNet-S
10. MNASNet-A1
11. MobileOne-S1
12. MobileOne-S0

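Comparing Both 13
CNN-based models have lower latency than Transformer-based models
with the same FLOPs and parameter count
When the model architecture differs, the conventional metrics
may not be a reliable guide
Figures: FLOPs vs. Latency; parameter count vs. Latency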

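Correlation Coefficients 14
Spearman's rank correlation coefficient [12]
Used when the variables do not follow a normal distribution
Converts each variable's values to ranks and computes the correlation coefficient on those ranks
Results
FLOPs vs. latency: moderate correlation
Parameter count vs. latency: weak correlation
On a desktop CPU the correlation is even weaker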
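Spearman's coefficient is Pearson's correlation computed on ranks instead of raw values. A minimal pure-Python sketch follows; tied values receive the average of the positions they occupy, and the sample numbers are made up for illustration, not measurements from the paper.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Grow the run while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for t in range(start, end + 1):
            ranks[order[t]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r on the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up data that is monotone but nonlinear (e.g. a cost metric vs. latency)
x = [1, 2, 3, 4, 5]
y = [2, 4, 8, 16, 32]
print(round(spearman(x, y), 3), round(pearson(x, y), 3))
```

For this monotone but nonlinear data, Spearman gives exactly 1 while Pearson on the raw values is only about 0.93. A rank-based coefficient only asks whether "more cost" means "more latency", which suits metrics whose relationship to latency need not be linear or normally distributed.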

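Identifying Bottlenecks 15
Activation functions (left figure)
ReLU is the fastest
MobileOne uses ReLU as its activation function
Architectural blocks (right figure)
Comparison of SE blocks and skip connections
SE block: an operation spanning all channels (activation values must be kept in memory)
Skip connections: a mechanism that increases memory access cost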
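Why an SE block is awkward for latency is easier to see in code. Below is a minimal NumPy sketch of a Squeeze-and-Excitation block on a single (C, H, W) feature map; the reduction ratio r, the random weights, and the shapes are hypothetical. The global average pool must read every spatial position before any output can be produced, and the input x has to stay in memory until the per-channel scales are ready.

```python
import numpy as np


def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation on a (C, H, W) map; w1: (C//r, C), w2: (C, C//r)."""
    # Squeeze: global average pooling reads every spatial position of each channel
    s = x.mean(axis=(1, 2))                        # (C,)
    # Excitation: bottleneck MLP, ReLU then sigmoid
    z = np.maximum(w1 @ s + b1, 0.0)               # (C//r,)
    scale = 1.0 / (1.0 + np.exp(-(w2 @ z + b2)))   # (C,), each value in (0, 1)
    # Scale: x must still be held in memory after the whole squeeze/excite path
    return x * scale[:, None, None], scale


rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4  # hypothetical sizes and reduction ratio
x = rng.standard_normal((C, H, W))
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
y, scale = se_block(x, w1, b1, w2, b2)
print(y.shape, scale.round(3))
```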

MobileOne Architecture 16
As in structural re-parameterization [11],
the architecture differs between training and inference
The basic structure follows MobileNet-V1 [7]:
3x3 depthwise layers and 1x1 pointwise layers

Structural Re-parameterization 17
Introduced in the paper "RepVGG: Making VGG-style ConvNets Great Again" [11]
Uses re-parameterization so that the structure differs between training and inference
The inference-time model
has a VGG-like structure

Structural Re-parameterization 18
[Figure: re-parameterization diagram from RepVGG [11]; symbols M(1), M(2), W(3), W(1)]
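1. Once its parameters are no longer updated, a BN layer reduces to a
per-channel scale and bias, which can be folded into the preceding conv
2. Convert the 1x1 conv and the identity mapping into
3x3 convs
3. Add all the 3x3 convs together into
a single 3x3 conv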
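The three steps can be checked numerically. The NumPy sketch below builds a RepVGG-style training-time block (3x3 conv + BN, 1x1 conv + BN, identity + BN, outputs summed), applies steps 1 to 3, and confirms that one fused 3x3 conv gives the same output. The naive `conv2d` helper, the random parameters, and the shapes are assumptions for illustration; this is not the code from apple/ml-mobileone. The nonlinearity after the sum is omitted because re-parameterization only touches the linear part.

```python
import numpy as np


def conv2d(x, w, b):
    """Naive stride-1 conv, zero padding that keeps H and W.
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k; b: (C_out,)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, h, wd = x.shape
    out = np.empty((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out + b[:, None, None]


def bn(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm: frozen statistics, applied per channel."""
    scale = gamma / np.sqrt(var + eps)
    return scale[:, None, None] * (x - mean[:, None, None]) + beta[:, None, None]


def fuse_bn(w, gamma, beta, mean, var, eps=1e-5):
    """Step 1: fold a frozen BN into the preceding bias-free conv -> (kernel, bias)."""
    scale = gamma / np.sqrt(var + eps)
    return w * scale[:, None, None, None], beta - mean * scale


def pad_1x1_to_3x3(w):
    """Step 2a: a 1x1 kernel is a 3x3 kernel that is zero except at the center."""
    return np.pad(w, ((0, 0), (0, 0), (1, 1), (1, 1)))


def identity_as_3x3(c):
    """Step 2b: identity mapping as a 3x3 conv (center tap 1, channel i -> i)."""
    w = np.zeros((c, c, 3, 3))
    for i in range(c):
        w[i, i, 1, 1] = 1.0
    return w


rng = np.random.default_rng(0)
C, H, W = 4, 5, 5  # hypothetical sizes; the identity branch needs C_in == C_out


def random_bn(c):
    """Random (gamma, beta, running mean, running var) with positive variance."""
    return (rng.uniform(0.5, 1.5, c), rng.standard_normal(c),
            rng.standard_normal(c), rng.uniform(0.5, 1.5, c))


x = rng.standard_normal((C, H, W))
w3, bn3 = rng.standard_normal((C, C, 3, 3)), random_bn(C)
w1, bn1 = rng.standard_normal((C, C, 1, 1)), random_bn(C)
bn_id = random_bn(C)
no_bias = np.zeros(C)

# Training-time block: three parallel branches, outputs summed
y_train = (bn(conv2d(x, w3, no_bias), *bn3)
           + bn(conv2d(x, w1, no_bias), *bn1)
           + bn(x, *bn_id))

# Steps 1-3: fuse each BN, express every branch as 3x3, then add kernels and biases
k3, b3 = fuse_bn(w3, *bn3)
k1, b1 = fuse_bn(pad_1x1_to_3x3(w1), *bn1)
kid, bid = fuse_bn(identity_as_3x3(C), *bn_id)
w_fused, b_fused = k3 + k1 + kid, b3 + b1 + bid

# Inference-time block: a single 3x3 conv with bias
y_infer = conv2d(x, w_fused, b_fused)
print(np.allclose(y_train, y_infer))  # True
```

The merge works because every branch is linear in x, so summing branch outputs equals convolving once with the summed kernel. The same argument lets MobileOne collapse its k over-parameterized branches.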

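Structure of MobileNet-V1 [7] 19
(a) Standard convolution
(b) MobileNet convolution (lower figure)
Instead of convolving over the spatial and channel dimensions at once,
apply a depthwise (spatial) conv and then a pointwise (channel) conv in sequence
For a standard m×m convolution with n output channels,
the number of MACs (multiply-accumulate operations) drops by roughly a factor of m²
(precisely, to 1/n + 1/m² of the original)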
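The MAC reduction can be verified with a few lines of arithmetic. This sketch uses the slide's notation (m×m kernel, n output channels) plus an input-channel count c and an output feature-map size f; the concrete layer (f=56, c=64, n=128, m=3) is hypothetical.

```python
def standard_conv_macs(f, c, n, m):
    """MACs of a stride-1 m x m conv: c input channels, n outputs, f x f output map."""
    return m * m * c * n * f * f


def depthwise_separable_macs(f, c, n, m):
    """MACs of an m x m depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = m * m * c * f * f  # one m x m filter per input channel (spatial only)
    pointwise = c * n * f * f      # 1x1 conv mixes channels (channel only)
    return depthwise + pointwise


f, c, n, m = 56, 64, 128, 3  # hypothetical layer
standard = standard_conv_macs(f, c, n, m)
separable = depthwise_separable_macs(f, c, n, m)
ratio = separable / standard
print(standard, separable, round(ratio, 4))  # 231211008 27496448 0.1189
print(round(1 / n + 1 / m**2, 4))            # 0.1189
```

Here the separable layer needs about 8.4× fewer MACs, close to m² = 9 because the 1/n term is small when n is large.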

Model Scaling 20
Uses the same depth scaling as MobileNet-V2 [13]
Changes the number of channels before the depthwise conv
α = used for scaling and for the strength of AutoAugment [15]
k = over-parameterization factor
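As a rough illustration of what scaling channels by α does to cost, the sketch below applies a width multiplier to both channel counts of a depthwise-separable layer. The layer shape is hypothetical, and whether MobileOne applies α exactly this way is an assumption; the point is only that the pointwise term grows with α² while the depthwise term grows with α.

```python
def separable_macs(f, c_in, c_out, m=3):
    """MACs of an m x m depthwise conv + 1x1 pointwise conv on an f x f map."""
    return m * m * c_in * f * f + c_in * c_out * f * f


def width_scaled_macs(alpha, f=28, c_in=128, c_out=128):
    """Hypothetical layer whose channel counts are multiplied by alpha."""
    return separable_macs(f, round(alpha * c_in), round(alpha * c_out))


base = width_scaled_macs(1.0)
for alpha in (0.75, 1.0, 1.5, 2.0):
    print(alpha, round(width_scaled_macs(alpha) / base, 3))
```

With these shapes, α = 2 costs about 3.87× the base MACs, close to α² = 4, because the pointwise conv dominates.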

Model Scaling 21
S0 uses k=4
S1 and larger reach high accuracy with k=1

Overview of MobileOne 22
A mobile network model that builds on the MobileNet-V1 structure,
achieves fast inference through re-parameterization,
and supports the same depth scaling as MobileNet-V2

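Training Dataset and Benchmarks 23
Trained on the ImageNet dataset
S0, S1: only random resized cropping and horizontal flipping
S2, S3, S4: AutoAugment automatically selects
the augmentation policy
Latency benchmark: the median of 1000 runs
on each device
Mobile: iPhone 12
CPU: Intel Xeon Gold 5118 processor
GPU: NVIDIA RTX 2080 Ti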
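The latency protocol (median of 1000 runs) can be sketched in plain Python. The warm-up calls and the dummy workload are assumptions added for illustration; on a real device you would time the converted model's inference call instead.

```python
import statistics
import time


def benchmark_median_ms(fn, runs=1000, warmup=10):
    """Call fn() `runs` times and return the median wall-clock latency in ms."""
    for _ in range(warmup):  # untimed warm-up calls (assumption, not from the paper)
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def dummy_model():
    """Hypothetical stand-in workload; replace with a real inference call."""
    return sum(i * i for i in range(1000))


median_ms = benchmark_median_ms(dummy_model)
print(f"median latency: {median_ms:.4f} ms")
```

The median is preferred over the mean because a few slow runs (e.g., from OS scheduling or thermal throttling) would pull the mean up but barely move the median.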

Results (ImageNet-1k) 24

Results (ImageNet-1k) 25
Compared with EfficientNet-B0, MobileOne-S3 is
1% more accurate and 11% faster

Other Benchmarks 26
Object detection on MS-COCO
Semantic segmentation on
Pascal VOC and ADE20k
High accuracy on tasks beyond image classification as well

Object Detection 27

Semantic Segmentation 28

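Summary 29
Showed that latency may correlate only weakly
with metrics such as parameter count and FLOPs
Proposed MobileOne, which has
a re-parameterizable structure
Substantially reduces latency while maintaining high accuracy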

References 30
[1] Mostafa Dehghani, Anurag Arnab, Lucas Beyer, Ashish Vaswani, and Yi Tay. The efficiency misnomer. arXiv preprint arXiv:2110.12894, 2021.
[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[3] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[4] Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton. On the importance of initialization and momentum in deep learning. In Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.
[5] Forrest N. Iandola, Matthew W. Moskewicz, Khalid Ashraf, Song Han, William J. Dally, and Kurt Keutzer. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. CoRR, 2016.
[6] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, 2019.
[7] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv, abs/1704.04861, 2017.
[8] Daquan Zhou, Qibin Hou, Yunpeng Chen, Jiashi Feng, and Shuicheng Yan. Rethinking bottleneck structure for efficient mobile network design. In Proceedings of the European Conference on Computer Vision (ECCV), 2020.
[9] Kai Han, Yunhe Wang, Qi Tian, Jianyuan Guo, Chunjing Xu, and Chang Xu. GhostNet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[10] Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Xiaoyi Dong, Lu Yuan, and Zicheng Liu. Mobile-Former: Bridging MobileNet and Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[11] Xiaohan Ding, Xiangyu Zhang, Ningning Ma, Jungong Han, Guiguang Ding, and Jian Sun. RepVGG: Making VGG-style ConvNets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[12] Jerrold H. Zar. Spearman rank correlation. Encyclopedia of Biostatistics, 7, 2005.
[13] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4510–4520, 2018.
