Coding Tips: 9 Ways to Iterate Over a Python List, with Benchmarks

6 min read · Apr 2, 2021

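The Python list is one of the most commonly used data structures in data processing, and we often apply an element-wise operation to every item in one, which this article calls List Iteration. While interviewing candidates at my company recently, though, I found that not many of them could write a List Iteration both correctly and cleanly. So I decided to write this article and include the source code of my experiments, so that you can study and measure the speed, strengths, and weaknesses of each approach yourself.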

This article uses Python 3 to discuss the following 9 ways of writing a List Iteration, and compares their performance on CPU-bound and IO-bound work:

for loop
for loop + range
for loop + enumerate
map
ThreadPoolExecutor.map
ProcessPoolExecutor.map
List Comprehension
List Comprehension + range
List Comprehension + enumerate

The code is on GitHub; feel free to download it, modify it, and practice with it:

YangRice/python-practice

Test performance for some python scripts. You can clone this code and run on your environment to check the performance…

github.com

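Examples and explanation of the 9 approaches

Suppose the task is to multiply every element of a list by two and output a new list. The sections below go through the standard form of each of the 9 approaches.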

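for loop, for loop + range, for loop + enumerate

All three are common loop forms. The difference is that a plain for loop only reads each element's value, for loop + range iterates by index, and for loop + enumerate gives you the element value and its index at the same time.

Note that because the task is to return a new list, these loop versions hand back their values with yield. The result is therefore a generator, which has to be read item by item or converted to a list before you can get at the element values.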
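As a minimal sketch of the three loop forms described above (the function names are hypothetical and chosen for illustration; they are not taken from the article's repository), each one can be written as a generator function:

```python
def double_for(values):
    # Plain for loop: reads each element's value directly.
    for value in values:
        yield value * 2


def double_for_range(values):
    # for loop + range: iterates over indexes and looks each element up.
    for i in range(len(values)):
        yield values[i] * 2


def double_for_enumerate(values):
    # for loop + enumerate: gets the index and the value together.
    # The index is not needed for this task; it is shown only for illustration.
    for i, value in enumerate(values):
        yield value * 2


numbers = [1, 2, 3]
gen = double_for(numbers)                   # a generator; nothing is computed yet
print(list(gen))                            # [2, 4, 6]
print(list(double_for_range(numbers)))      # [2, 4, 6]
print(list(double_for_enumerate(numbers)))  # [2, 4, 6]
```

Converting with list() is what actually runs the loop bodies.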

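map, ThreadPoolExecutor.map, ProcessPoolExecutor.map

All three belong to the map family. Note that Python's map is not the same thing as C++'s map:

Python map: applies the same function to every element.
C++ map: a key-value data structure, similar in purpose to Python's dict. (std::map is an ordered container, usually implemented as a balanced tree; the hash-based counterpart is std::unordered_map.)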

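In Python, the three kinds of map differ as follows:

map(func, list): returns a lazy iterator (a map object, which behaves much like a generator). It can only be read in order, and func runs only when an item is read.
ThreadPoolExecutor.map(func, list): runs func in multiple threads and also returns an iterator over the results. Unlike the built-in map, the calls are submitted to the pool right away rather than when the results are read.
ProcessPoolExecutor.map(func, list): runs func in multiple processes; the return value behaves like that of ThreadPoolExecutor.map.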
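The three map variants can be sketched as follows. This is an illustrative sketch, not the article's benchmark code: the helper names and the max_workers values are assumptions made for the example.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def double(value):
    # Defined at module level so that ProcessPoolExecutor can pickle it.
    return value * 2


def double_with_map(values):
    # Built-in map: lazy; double() runs only while list() consumes the result.
    return list(map(double, values))


def double_with_threads(values, max_workers=4):
    # Thread pool: the calls are submitted immediately and the results
    # come back in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(double, values))


def double_with_processes(values, max_workers=2):
    # Process pool: same interface, but arguments and results are pickled
    # and sent between processes.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(double, values))
```

For example, double_with_threads([1, 2, 3]) returns [2, 4, 6]. On platforms that start worker processes with spawn (Windows, for instance), double_with_processes should be called from under an `if __name__ == "__main__":` guard, because the workers need to import the main module safely.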

List Comprehension, List Comprehension + range, List Comprehension + enumerate

A List Comprehension does the same job as the for loop, but in a single line of code, and it builds the new list directly.
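A short sketch of the three comprehension forms for the same multiply-by-two task (the variable names are illustrative):

```python
numbers = [1, 2, 3]

# List Comprehension: reads each value directly.
doubled = [value * 2 for value in numbers]

# List Comprehension + range: iterates by index.
doubled_by_index = [numbers[i] * 2 for i in range(len(numbers))]

# List Comprehension + enumerate: index and value together
# (the index is unused here and shown only for illustration).
doubled_enumerated = [value * 2 for i, value in enumerate(numbers)]

print(doubled)  # [2, 4, 6]
```

Unlike the yield-based loops, each of these already is a list, so no conversion step is needed.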

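Performance comparison

In addition to the 9 approaches, I tested every combination with 3 different kinds of operation:

Handling the operation directly in the code, or with a lambda function
Calling a defined function
Calling an IO-related function (for example wget; sleep is used here to simulate it)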
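To make the three kinds of operation concrete, here is a rough sketch with a small timing helper. The names double, slow_double, and average_seconds, the 1 ms delay, and the list size are all assumptions for illustration; the article's repository may measure things differently.

```python
import time
import timeit


def double(value):
    # Kind 2: a defined function.
    return value * 2


def slow_double(value, delay=0.001):
    # Kind 3: a stand-in for an IO call, simulated with sleep.
    time.sleep(delay)
    return value * 2


def average_seconds(func, repeat=3):
    # Average wall-clock time of one call to func over `repeat` runs.
    return timeit.timeit(func, number=repeat) / repeat


numbers = list(range(20))

# Kind 1: the operation written inline, or as a lambda.
inline_result = [value * 2 for value in numbers]
lambda_result = list(map(lambda value: value * 2, numbers))
function_result = [double(value) for value in numbers]
io_result = [slow_double(value) for value in numbers]

print("inline  :", average_seconds(lambda: [v * 2 for v in numbers]))
print("function:", average_seconds(lambda: [double(v) for v in numbers]))
print("io      :", average_seconds(lambda: [slow_double(v) for v in numbers]))
```

All four result lists hold the same values; only the cost of producing them differs.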

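The test environment was Ubuntu (WSL). The results are shown in the figure below; the time reported is the average time taken to process one list.

Left: inline code. Middle: calling a function. Right: an IO-related function.

I also ran the same experiment on Windows. Apart from ProcessPoolExecutor crashing, the results were not much different.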

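Putting the experimental data together leads to the following conclusions:

A generator returns immediately, but you don't get the values yet

for loop + yield and the map family all return a generator-like lazy object, so the call itself takes very little time. Printing the result does not show the values, though; you have to convert it to a list to see them. For a generator or the built-in map, all of the computation, and all of the time it costs, happens only when the result is converted or its values are read.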
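The lazy behaviour can be observed directly. In this sketch, double_logged and the calls list are hypothetical helpers added only to record when the function actually runs:

```python
def double_lazy(values):
    for value in values:
        yield value * 2


calls = []


def double_logged(value):
    # Records every call so we can see when the work actually happens.
    calls.append(value)
    return value * 2


numbers = [1, 2, 3]

gen = double_lazy(numbers)
print(gen)  # prints something like <generator object double_lazy at 0x...>

lazy = map(double_logged, numbers)
calls_before = list(calls)  # []: creating the map object called nothing
result = list(lazy)         # [2, 4, 6]: the calls happen here
calls_after = list(calls)   # [1, 2, 3]

first_pass = list(gen)      # [2, 4, 6]
second_pass = list(gen)     # []: a generator can only be consumed once
```

The last two lines show a practical consequence: if the values are needed more than once, they should be stored in a list.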

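Without an extra function call, the List Comprehension family is fastest

This comes down to how CPython executes the code. A list comprehension is compiled into a tight loop that appends each result with a dedicated bytecode instruction, whereas an explicit loop that builds a list has to look up and call the append method on every iteration. An operation that can be expressed entirely inside the comprehension therefore carries less interpreter overhead per element.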
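One way to check this conclusion on your own machine is to time a List Comprehension against an explicit loop that builds the same list. This is only a sketch: with_append_loop is an eager stand-in written for the comparison (not the article's yield-based loop), and the list size and repeat count are arbitrary assumptions. Absolute numbers will vary by machine and Python version.

```python
import timeit

numbers = list(range(1000))


def with_append_loop(values):
    # Explicit loop that builds the list with append.
    result = []
    for value in values:
        result.append(value * 2)
    return result


def with_comprehension(values):
    # The same work expressed as a list comprehension.
    return [value * 2 for value in values]


loop_seconds = timeit.timeit(lambda: with_append_loop(numbers), number=200)
comp_seconds = timeit.timeit(lambda: with_comprehension(numbers), number=200)

print(f"append loop:   {loop_seconds:.4f} s")
print(f"comprehension: {comp_seconds:.4f} s")
```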

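For a non-IO function, map is fastest

This is because map is a built-in designed specifically for applying a function element-wise (in CPython it is implemented in C, so the loop around the function call adds very little interpreter overhead). ThreadPoolExecutor and ProcessPoolExecutor are slower because of their extra system overhead (creating threads, creating processes).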

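For an IO function, ThreadPoolExecutor.map and ProcessPoolExecutor.map are fastest

Of the 9 approaches, the other 7 are single-threaded, so the average time they print is bound by how long the function itself takes. Only ThreadPoolExecutor and ProcessPoolExecutor can process elements in parallel, using multiple threads or multiple processes.

In this experiment ThreadPoolExecutor was also faster than ProcessPoolExecutor, because creating processes and synchronizing data between them costs far more system overhead than creating threads. In practice, though, the use cases for each can be summed up in one sentence:

If the bottleneck is CPU, use ProcessPoolExecutor; if the bottleneck is IO, use ThreadPoolExecutor.

The reason is CPython's global interpreter lock: only one thread runs Python bytecode at a time, so threads help while a call is waiting on IO but not with pure-Python computation, whereas separate processes can compute in parallel. You can modify the experiment's source code yourself to check whether this rule holds.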
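As a starting point for that kind of experiment, the sketch below compares a sequential loop with ThreadPoolExecutor.map on a simulated IO task. fake_download, the 0.05 s delay, the 8 items, and the timed helper are assumptions made for the example, not values from the article.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item, delay=0.05):
    # Simulated IO: the thread just waits, as it would on a network call.
    time.sleep(delay)
    return item * 2


def timed(func):
    # Returns func's result together with the elapsed wall-clock seconds.
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


def run_sequential(items):
    return [fake_download(item) for item in items]


def run_threaded(items, max_workers=8):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_download, items))


items = list(range(8))
sequential_result, sequential_seconds = timed(lambda: run_sequential(items))
threaded_result, threaded_seconds = timed(lambda: run_threaded(items))

print(f"sequential: {sequential_seconds:.3f} s")
print(f"threaded:   {threaded_seconds:.3f} s")
```

The sequential version waits for the delays one after another, while the threaded version overlaps them. Replacing fake_download with a CPU-heavy function, and ThreadPoolExecutor with ProcessPoolExecutor, is one way to test the other half of the rule.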

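Practical recommendations

For short operations, use a List Comprehension; both readability and performance are better.
For complex operations, wrap them in a function and use map; both readability and performance are better.
For IO-related operations (e.g. downloading, writing files), use ThreadPoolExecutor.map.
For CPU-heavy operations (e.g. model inference), use ProcessPoolExecutor.map.
Avoid writing an explicit for loop where one of the forms above fits; you give up the speedup that comprehensions and map get from the interpreter, and the code is less concise.
When you don't need the index, don't use range or enumerate.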

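I encourage everyone to write Python code that is as short and readable as possible; after all, being concise and readable at the same time is Python's greatest strength. With practice you can code more efficiently, improve readability, and reduce bugs. In an interview, concise and efficient code can also earn you extra points for coding style.

Life is short, I use Python.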