How can I use Parallel and delayed so that the parallelized for loop stops once the output drops below a threshold?

8

Suppose I have the following code:

from numpy import arange, asarray  # current SciPy no longer re-exports these NumPy functions
import multiprocessing as mp
num_cores = mp.cpu_count()
from joblib import Parallel, delayed
import matplotlib.pyplot as plt

def func(x,y):
    return y/x
def main(y, xmin,xmax, dx):
    x = arange(xmin,xmax,dx)
    output = Parallel(n_jobs=num_cores)(delayed(func)(i, y) for i in x)
    return x, asarray(output)
def demo():
    x,z = main(2.,1.,30.,.1)
    plt.plot(x,z, label='All values')
    plt.plot(x[z>.1],z[z>.1], label='desired range') ## This is better to do in main()
    plt.show()

demo()

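I would like to compute output only while output > a given number (you can assume that the elements of output decrease monotonically as x increases) and then stop, rather than computing every value of x and filtering afterwards, which is inefficient for my purpose. Is there a way to do that with Parallel, delayed, or any other multiprocessing tool?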

— user247534

You can use numpy as well. I have added some numbers. To make the code more efficient, the selection [z > .1] in the demo function should be done in the main function.

— user247534

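I know it would be messy, but I would create one list and pass it to the function, which appends its result to that list. Then, outside the function, I would check whether the list contains a value past the threshold and terminate the threads somehow. Now that I think about it, there are probably smarter ways to do this, such as a queue.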

— Maxxik CZ

1

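No value was specified for `output > a given number`, so I made one up. After testing, I had to reverse the condition to `output < a given number` for it to work properly.

I would use a pool and launch the processes with a callback function that checks the stop condition, then terminate the pool once the condition is met. However, this causes a race condition: results from processes that were still running and not allowed to finish can be lost. I think this method needs the fewest changes to your code and is very easy to read. The order of the list is NOT guaranteed.

Pros: very little overhead.
Cons: results may be missing.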

Method 1)

from numpy import arange, asarray  # current SciPy no longer re-exports these NumPy functions
import multiprocessing

import matplotlib.pyplot as plt


def stop_condition_callback(ret):
    output.append(ret)
    if ret < stop_condition:
        worker_pool.terminate()


def func(x, y, ):
    return y / x


def main(y, xmin, xmax, dx):
    x = arange(xmin, xmax, dx)
    print("Number of calculations: %d" % (len(x)))

    # add calculations to the pool
    for i in x:
        worker_pool.apply_async(func, (i, y,), callback=stop_condition_callback)

    # wait for the pool to finish/terminate
    worker_pool.close()
    worker_pool.join()

    print("Number of results: %d" % (len(output)))
    return x, asarray(output)


def demo():
    x, z_list = main(2., 1., 30., .1)
    plt.plot(z_list, label='desired range')
    plt.show()


output = []
stop_condition = 0.1

worker_pool = multiprocessing.Pool()
demo()
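
Because the question states that the output decreases monotonically with x, one variation on Method 1 is to consume results in submission order and stop at the first value below the threshold, which sidesteps both the ordering caveat and the missing-result caveat. The following is a minimal sketch under that assumption, not a drop-in replacement: `compute_in_order` and its `pool_factory` parameter are hypothetical names introduced here, while `Pool.imap` and the pool's context-manager behaviour (it calls `terminate()` on exit) are standard `multiprocessing` features.

```python
import multiprocessing
from functools import partial


def func(x, y):
    return y / x


def compute_in_order(y, xs, threshold, pool_factory=multiprocessing.Pool):
    """Collect y / x in input order until a value drops below threshold."""
    kept = []
    # Leaving the with-block calls pool.terminate(), discarding queued work.
    with pool_factory() as pool:
        # imap yields results in the same order as xs, one at a time.
        for value in pool.imap(partial(func, y=y), xs):
            if value < threshold:
                break  # monotonic output: nothing after this is wanted
            kept.append(value)
    return kept
```

With the question's inputs this could be called as `compute_in_order(2.0, arange(1.0, 30.0, 0.1), 0.1)`, from under an `if __name__ == "__main__":` guard. Note that `imap` queues tasks eagerly, so some values past the threshold may still be computed before the pool is terminated.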

Method 2) This method has more overhead, but it allows processes that have already started to finish.

from numpy import arange, asarray  # current SciPy no longer re-exports these NumPy functions
import multiprocessing

import matplotlib.pyplot as plt


def stop_condition_callback(ret):
    if ret is not None:
        if ret < stop_condition:
            worker_stop.value = 1
        else:
            output.append(ret)


def func(x, y, ):
    if worker_stop.value != 0:
        return None
    return y / x


def main(y, xmin, xmax, dx):
    x = arange(xmin, xmax, dx)
    print("Number of calculations: %d" % (len(x)))

    # add calculations to the pool
    for i in x:
        worker_pool.apply_async(func, (i, y,), callback=stop_condition_callback)

    # wait for the pool to finish/terminate
    worker_pool.close()
    worker_pool.join()

    print("Number of results: %d" % (len(output)))
    return x, asarray(output)


def demo():
    x, z_list = main(2., 1., 30., .1)
    plt.plot(z_list, label='desired range')
    plt.show()


output = []
worker_stop = multiprocessing.Value('i', 0)
stop_condition = 0.1

worker_pool = multiprocessing.Pool()
demo()

Method 3) Pros: no results are left out.
Cons: this steps well outside what you would normally do.

Take Method 1 and add:

def stopPoolButLetRunningTaskFinish(pool):
    # NOTE: relies on private attributes of Pool (underscore-prefixed names)
    # Prevent new tasks from starting by emptying the queue that all worker processes draw from
    while pool._task_handler.is_alive() and pool._inqueue._reader.poll():
        pool._inqueue._reader.recv()
    # Send a sentinel to each worker process so it exits after its current task
    for a in range(len(pool._pool)):
        pool._inqueue.put(None)

Then change stop_condition_callback:

def stop_condition_callback(ret):
    # func from Method 1 returns a scalar, so compare ret directly
    if ret < stop_condition:
        # used instead of worker_pool.terminate()
        stopPoolButLetRunningTaskFinish(worker_pool)
    else:
        output.append(ret)
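
Method 3 reaches into private attributes of `Pool` (`_task_handler`, `_inqueue`, `_pool`), which may change between Python versions. A similar effect, letting running tasks finish while dropping queued ones, can be sketched with the public `concurrent.futures` API, assuming Python 3.9 or later for the `cancel_futures` argument of `Executor.shutdown`. The names `compute_until_below` and `executor_factory` are hypothetical, and `func` is changed here to return the input alongside the result so the values can be sorted afterwards.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def func(x, y):
    # Return the input too, because completion order is not guaranteed.
    return x, y / x


def compute_until_below(y, xs, threshold, executor_factory=ProcessPoolExecutor):
    """Return sorted (x, y / x) pairs whose value is at least threshold."""
    executor = executor_factory()
    futures = [executor.submit(func, x, y) for x in xs]
    try:
        for future in as_completed(futures):
            if future.result()[1] < threshold:
                break  # stop condition reached
    finally:
        # Cancel tasks that have not started; wait for those already running.
        executor.shutdown(wait=True, cancel_futures=True)
    # After shutdown every future is either cancelled or finished.
    finished = [f.result() for f in futures if not f.cancelled()]
    return sorted(pair for pair in finished if pair[1] >= threshold)
```

The trade-off is the same as in Method 3: tasks already handed to a worker still run, so a few values past the threshold are computed and then filtered out. With a process-based executor the call belongs under an `if __name__ == "__main__":` guard.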

— ロン

0

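I would use Dask to execute in parallel, specifically the futures interface, which gives real-time feedback on results as they complete. When done, you could cancel the remaining futures in flight, leave the unneeded ones to finish asynchronously, or close the cluster.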

from dask.distributed import Client, as_completed
from numpy import arange  # needed for arange below
client = Client()  # defaults to ncores workers, one thread each
y, xmin, xmax, dx = 2.,1.,30.,.1

def func(x, y):
    return x, y/x
x = arange(xmin,xmax,dx)
outx = []
output = []
futs = [client.submit(func, val, y) for val in x]
for future in as_completed(futs):
    outs = future.result()
    outx.append(outs[0])
    output.append(outs[1])
    if outs[1] < 0.1:
        break

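Notes:
- I assume you mean "less than", because the first value already exceeds the threshold (y / xmin > 0.1).
- If you pick up results as they become ready, the outputs are not guaranteed to be in the order you submitted them, although with such a quick calculation they perhaps always are (this is why I had func return the input value as well).
- If you stop the calculation, the output will be shorter than the full set of inputs, so I'm not sure what you want to print.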

— mdurant