Multiprocessing - Pipe vs Queue

151

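What are the fundamental differences between queues and pipes in Python's multiprocessing package?

In what scenarios should one choose one over the other? When is it advantageous to use Pipe()? When is it advantageous to use Queue()?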

— Jonathan

281

A Pipe() can only have two endpoints.
A Queue() can have multiple producers and consumers.
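As a minimal sketch of the two-endpoint model (the names echo_upper and round_trip are made up for this illustration and are not part of the answer's benchmark), one end of a Pipe() stays in the parent and the other end is handed to a single child process:

```python
from multiprocessing import Pipe, Process


def echo_upper(conn):
    """Receive one message on this endpoint and send back an upper-cased reply."""
    message = conn.recv()
    conn.send(message.upper())


def round_trip(text):
    """Send text to a child process through a Pipe() and return its reply."""
    # Pipe() is duplex by default, so both endpoints can send and receive.
    parent_end, child_end = Pipe()
    child = Process(target=echo_upper, args=(child_end,))
    child.start()
    parent_end.send(text)
    reply = parent_end.recv()
    child.join()
    return reply


if __name__ == "__main__":
    print(round_trip("ping"))
```

There is no third endpoint to give to another process; anything beyond a pair needs more pipes or a Queue().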

When to use them

If you need more than two points to communicate, use a Queue().
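A rough sketch of the more-than-two-points case, assuming three producer processes and the parent acting as the only consumer (producer and run_producers are hypothetical names for this illustration):

```python
from multiprocessing import Process, Queue


def producer(queue, producer_id, count):
    """Put `count` tagged items on the shared queue."""
    for n in range(count):
        queue.put((producer_id, n))


def run_producers(num_producers=3, count=5):
    """Start several producers that all write to one Queue() and collect their items."""
    queue = Queue()
    workers = [
        Process(target=producer, args=(queue, producer_id, count))
        for producer_id in range(num_producers)
    ]
    for worker in workers:
        worker.start()
    # Drain the queue before joining so no producer is left waiting to flush its data.
    items = [queue.get() for _ in range(num_producers * count)]
    for worker in workers:
        worker.join()
    return sorted(items)


if __name__ == "__main__":
    print(run_producers())
```

Items from different producers arrive interleaved in no guaranteed order, which is why the sketch sorts them before returning.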

If you need absolute performance, a Pipe() is much faster because Queue() is built on top of Pipe().

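Performance benchmarking

Let's assume you want to spawn two processes and send messages between them as quickly as possible. These are the timing results of a drag race between similar tests using Pipe() and Queue().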

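FYI, I threw in results for JoinableQueue() as a bonus. JoinableQueue() accounts for tasks when queue.task_done() is called (it doesn't even know about the specific task, it just counts unfinished tasks in the queue), so that queue.join() knows the work is finished.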
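The task accounting can be sketched as follows. This is an illustration only, separate from the benchmark script; worker, run_batch and the None sentinel are assumptions made for the example:

```python
from multiprocessing import JoinableQueue, Process


def worker(queue):
    """Consume items until a None sentinel arrives; return how many were handled."""
    handled = 0
    while True:
        item = queue.get()
        try:
            if item is None:
                return handled
            handled += 1  # stand-in for real work on `item`
        finally:
            # One task_done() per get(), including the sentinel.
            queue.task_done()


def run_batch(items):
    """Hand items to a consumer process and wait until all of them are accounted for."""
    queue = JoinableQueue()
    consumer = Process(target=worker, args=(queue,))
    consumer.start()
    for item in items:
        queue.put(item)
    queue.put(None)
    queue.join()  # returns once every put() has been matched by a task_done()
    consumer.join()


if __name__ == "__main__":
    run_batch(["a", "b", "c"])
    print("all tasks done")
```

The queue only keeps a counter of unfinished tasks, so forgetting a single task_done() call makes queue.join() wait forever.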

The code for each is at the bottom of this answer...

mpenning@mpenning-T61:~$ python multi_pipe.py 
Sending 10000 numbers to Pipe() took 0.0369849205017 seconds
Sending 100000 numbers to Pipe() took 0.328398942947 seconds
Sending 1000000 numbers to Pipe() took 3.17266988754 seconds
mpenning@mpenning-T61:~$ python multi_queue.py 
Sending 10000 numbers to Queue() took 0.105256080627 seconds
Sending 100000 numbers to Queue() took 0.980564117432 seconds
Sending 1000000 numbers to Queue() took 10.1611330509 seconds
mpenning@mpenning-T61:~$ python multi_joinablequeue.py 
Sending 10000 numbers to JoinableQueue() took 0.172781944275 seconds
Sending 100000 numbers to JoinableQueue() took 1.5714070797 seconds
Sending 1000000 numbers to JoinableQueue() took 15.8527247906 seconds
mpenning@mpenning-T61:~$

In summary, Pipe() is about three times faster than a Queue(). Don't even think about the JoinableQueue() unless you really must have the benefits.
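The "about three times" figure can be checked directly from the timings printed above. A small sketch (the dictionary layout and the slowdown helper are just for this illustration):

```python
# Timings in seconds, copied from the benchmark output above
# for 10**4, 10**5 and 10**6 messages.
timings = {
    "Pipe": [0.0369849205017, 0.328398942947, 3.17266988754],
    "Queue": [0.105256080627, 0.980564117432, 10.1611330509],
    "JoinableQueue": [0.172781944275, 1.5714070797, 15.8527247906],
}


def slowdown(name):
    """Return how many times slower `name` is than Pipe() at each message count."""
    return [t / p for t, p in zip(timings[name], timings["Pipe"])]


for name in ("Queue", "JoinableQueue"):
    ratios = ", ".join("{0:.2f}x".format(r) for r in slowdown(name))
    print("{0} vs Pipe: {1}".format(name, ratios))
```

On these numbers Queue() comes out roughly 2.8x to 3.2x slower than Pipe(), and JoinableQueue() roughly 4.7x to 5x slower.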

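BONUS MATERIAL 2

Multiprocessing introduces subtle changes in information flow that make debugging hard unless you know some shortcuts. For instance, you might have a script that works fine when indexing through a dictionary under many conditions, but infrequently fails with certain inputs.

Normally we get clues to the failure when the entire Python process crashes; however, you don't get unsolicited crash tracebacks printed to the console if the multiprocessing function crashes. Tracking down unknown multiprocessing crashes is hard without a clue to what crashed the process.

The simplest way I have found to track down multiprocessing crash information is to wrap the entire multiprocessing function in a try / except and use traceback.print_exc():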

import traceback
def run(self, args):
    try:
        # Insert stuff to be multiprocessed here
        return args[0]['that']
    except Exception:
        # Report which arguments failed, then print the full traceback
        print("FATAL: reader({0}) exited while multiprocessing".format(args))
        traceback.print_exc()

Now, when you find a crash, you see something like:

FATAL: reader([{'crash': 'this'}]) exited while multiprocessing
Traceback (most recent call last):
  File "foo.py", line 19, in __init__
    self.run(args)
  File "foo.py", line 46, in run
    KeyError: 'that'

Source code:

"""
multi_pipe.py
"""
from multiprocessing import Process, Pipe
import time

def reader_proc(pipe):
    ## Read from the pipe; this will be spawned as a separate Process
    p_output, p_input = pipe
    p_input.close()    # We are only reading
    while True:
        msg = p_output.recv()    # Read from the output pipe and do nothing
        if msg=='DONE':
            break

def writer(count, p_input):
    for ii in range(0, count):
        p_input.send(ii)             # Write 'count' numbers into the input pipe
    p_input.send('DONE')

if __name__=='__main__':
    for count in [10**4, 10**5, 10**6]:
        # Pipe() returns two connected endpoints; this test only sends p_input ------> p_output
        p_output, p_input = Pipe()  # writer() writes to p_input from _this_ process
        reader_p = Process(target=reader_proc, args=((p_output, p_input),))
        reader_p.daemon = True
        reader_p.start()     # Launch the reader process

        p_output.close()       # We no longer need this part of the Pipe()
        _start = time.time()
        writer(count, p_input) # Send a lot of stuff to reader_proc()
        p_input.close()
        reader_p.join()
        print("Sending {0} numbers to Pipe() took {1} seconds".format(count,
            (time.time() - _start)))

"""
multi_queue.py
"""

from multiprocessing import Process, Queue
import time

def reader_proc(queue):
    ## Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()         # Read from the queue and do nothing
        if (msg == 'DONE'):
            break

def writer(count, queue):
    ## Write to the queue
    for ii in range(0, count):
        queue.put(ii)             # Write 'count' numbers into the queue
    queue.put('DONE')

if __name__=='__main__':
    pqueue = Queue() # writer() writes to pqueue from _this_ process
    for count in [10**4, 10**5, 10**6]:             
        ### reader_proc() reads from pqueue as a separate process
        reader_p = Process(target=reader_proc, args=((pqueue),))
        reader_p.daemon = True
        reader_p.start()        # Launch reader_proc() as a separate python process

        _start = time.time()
        writer(count, pqueue)    # Send a lot of stuff to reader()
        reader_p.join()         # Wait for the reader to finish
        print("Sending {0} numbers to Queue() took {1} seconds".format(count, 
            (time.time() - _start)))

"""
multi_joinablequeue.py
"""
from multiprocessing import Process, JoinableQueue
import time

def reader_proc(queue):
    ## Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()         # Read from the queue and do nothing
        queue.task_done()

def writer(count, queue):
    for ii in range(0, count):
        queue.put(ii)             # Write 'count' numbers into the queue

if __name__=='__main__':
    for count in [10**4, 10**5, 10**6]:
        jqueue = JoinableQueue() # writer() writes to jqueue from _this_ process
        # reader_proc() reads from jqueue as a different process...
        reader_p = Process(target=reader_proc, args=((jqueue),))
        reader_p.daemon = True
        reader_p.start()     # Launch the reader process
        _start = time.time()
        writer(count, jqueue) # Send a lot of stuff to reader_proc() (in different process)
        jqueue.join()         # Wait for the reader to finish
        print("Sending {0} numbers to JoinableQueue() took {1} seconds".format(count, 
            (time.time() - _start)))

— Mike Pennington

@Jonathan "In summary, Pipe() is about three times faster than a Queue()"

— James Brady

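Excellent! Good answer, and nice that you provided benchmarks. I only have two tiny quibbles: (1) "orders of magnitude faster" is a bit of an overstatement. The difference is x3, which is about a third of one order of magnitude. Just saying. ;-) (2) A fairer comparison would be N workers, each communicating with the main thread over a point-to-point pipe, versus N workers all pulling from a single point-to-multipoint queue.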

— JJC, 2012

To your "Bonus Material"... Yeah. If you are subclassing Process, put the bulk of the 'run' method in a try block. That is also a useful way to do logging of exceptions. To replicate the normal exception output: sys.stderr.write(''.join(traceback.format_exception(*(sys.exc_info()))))

— travc
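A sketch of what travc describes, assuming a hypothetical Reader subclass whose run() does a dictionary lookup that can fail (Reader and format_current_exception are names invented for this example):

```python
import sys
import traceback
from multiprocessing import Process


def format_current_exception():
    """Format the exception being handled the way the interpreter normally prints it."""
    return "".join(traceback.format_exception(*sys.exc_info()))


class Reader(Process):
    """Hypothetical worker whose run() body is wrapped in a try block."""

    def __init__(self, record):
        super().__init__()
        self.record = record

    def run(self):
        try:
            # Stand-in for the real work; raises KeyError for some inputs.
            value = self.record["that"]
            print("read {0!r}".format(value))
        except Exception:
            sys.stderr.write(format_current_exception())


if __name__ == "__main__":
    worker = Reader({"crash": "this"})
    worker.start()
    worker.join()
```

Swapping sys.stderr.write for a logging call in the except branch would give the exception logging mentioned in the comment.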

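@alexpinho98 - but you're going to need some out-of-band data, and an associated signalling mode, to indicate that what you're sending is not regular data but error data. Seeing as the originating process is already in an unpredictable state, this may be too much to ask.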

— scytale, 2013
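One simple way to approximate that signalling, shown here only as a sketch, is to tag every message so the receiver can tell results from errors. The ("ok", ...) / ("error", ...) convention and the function names are assumptions for this example, and it only helps while the failing process is still healthy enough to send:

```python
import traceback
from multiprocessing import Pipe, Process


def lookup_worker(conn, record, key):
    """Send ("ok", value) on success or ("error", traceback_text) on failure."""
    try:
        conn.send(("ok", record[key]))
    except Exception:
        conn.send(("error", traceback.format_exc()))


def lookup_in_child(record, key):
    """Run the lookup in a child process and return its tagged message."""
    parent_end, child_end = Pipe()
    child = Process(target=lookup_worker, args=(child_end, record, key))
    child.start()
    kind, payload = parent_end.recv()
    child.join()
    return kind, payload


if __name__ == "__main__":
    kind, payload = lookup_in_child({"crash": "this"}, "that")
    print(kind)
    print(payload)
```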

@JJC To quibble with your quibble, 3x is about half an order of magnitude, not a third: sqrt(10) =~ 3.

— jab
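The arithmetic behind that correction: an order of magnitude is a factor of 10, so the number of orders of magnitude in a ratio is its base-10 logarithm. A quick check (orders_of_magnitude is a helper named for this illustration):

```python
import math


def orders_of_magnitude(ratio):
    """Express a speed ratio as a number of base-10 orders of magnitude."""
    return math.log10(ratio)


print(round(orders_of_magnitude(3), 3))  # 0.477, i.e. close to one half
print(round(math.sqrt(10), 3))           # 3.162, the ratio that is exactly half an order
```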

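One additional feature of Queue() that is worth noting is the feeder thread. The multiprocessing documentation notes: "When a process first puts an item on the queue a feeder thread is started which transfers objects from a buffer into the pipe." An unlimited number of items (or up to maxsize, if one is set) can be inserted into a Queue() without any call to queue.put() blocking. This allows you to store multiple items in a Queue() until your program is ready to process them.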
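A small single-process sketch of that buffering behaviour; the function names are invented for the illustration, and queue.Full is the standard-library exception that put_nowait() raises on a full queue:

```python
import queue as queue_errors  # standard-library module that defines Full and Empty
from multiprocessing import Queue


def buffer_then_drain(n):
    """Put n items with nobody reading yet, then read them all back."""
    q = Queue()
    for i in range(n):
        q.put(i)  # returns right away; the feeder thread moves data into the pipe
    return [q.get(timeout=5) for _ in range(n)]


def third_put_is_rejected():
    """With maxsize=2, a third non-blocking put fails instead of being buffered."""
    q = Queue(maxsize=2)
    q.put("a")
    q.put("b")
    try:
        q.put_nowait("c")
        rejected = False
    except queue_errors.Full:
        rejected = True
    # Drain the queue so nothing is left pending when the process exits.
    q.get(timeout=5)
    q.get(timeout=5)
    return rejected


if __name__ == "__main__":
    print(len(buffer_then_drain(1000)))
    print(third_put_is_rejected())
```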

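Pipe(), on the other hand, has a finite amount of storage for items that have been sent to one connection but not yet received from the other. Once this storage is used up, calls to connection.send() block until there is room to write the entire item. This stalls the thread doing the writing until some other thread reads from the pipe. Connection objects give you access to the underlying file descriptor. On *nix systems, you can prevent connection.send() calls from blocking by using the os.set_blocking() function. However, this causes problems if you try to send a single item that does not fit in the pipe's buffer. Recent versions of Linux allow you to increase the size of that buffer, but the maximum size allowed varies with system configuration. You should therefore never rely on Pipe() to buffer data. Calls to connection.send() could block until data gets read from the pipe somewhere else.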
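The blocking behaviour can be observed with a sketch like the one below. It assumes the 16 MiB payload is larger than whatever buffer the operating system gives the pipe, and it uses a thread as the writer only so the stall is easy to observe from the same script:

```python
import threading
from multiprocessing import Pipe


def send_blocks_until_read(payload_size=16 * 1024 * 1024, wait=0.5):
    """Return (was the sender still blocked after `wait` seconds, bytes received)."""
    recv_end, send_end = Pipe(duplex=False)  # one-way pipe: recv_end reads, send_end writes
    payload = b"x" * payload_size

    # The writer runs in a thread so the main thread can watch it stall.
    sender = threading.Thread(target=send_end.send_bytes, args=(payload,), daemon=True)
    sender.start()
    sender.join(timeout=wait)
    blocked = sender.is_alive()  # still inside send_bytes(): the pipe is full

    received = recv_end.recv_bytes()  # reading frees space and lets the send finish
    sender.join()
    recv_end.close()
    send_end.close()
    return blocked, len(received)


if __name__ == "__main__":
    print(send_blocks_until_read())
```

With a Queue() in place of the pipe, the put() would return immediately and the feeder thread would be the one left waiting.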

In conclusion, Queue is a better choice than Pipe when you need to buffer data, even if you only need to communicate between two points.

— Roger Iyengar