Why are Set Returning Functions (SRFs) slower when run in the FROM clause?

This is a database internals question. I'm using PostgreSQL 9.5. Why do Set Returning Functions (SRFs), also known as Table Valued Functions (TVFs), run slower when they are in the FROM clause, for example when I run these commands?

CREATE TABLE foo AS SELECT * FROM generate_series(1,1e7);
SELECT 10000000
Time: 5573.574 ms

That is consistently and substantially slower than

CREATE TABLE foo AS SELECT generate_series(1,1e7);
SELECT 10000000
Time: 4622.567 ms

Is there a general rule that can be drawn here? For example, should Set Returning Functions always be run outside the FROM clause?

— Evan Carroll

Let's start by comparing the execution plans:

tinker=> EXPLAIN ANALYZE SELECT * FROM generate_series(1,1e7);
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..10.00 rows=1000 width=32) (actual time=2382.582..4291.136 rows=10000000 loops=1)
 Planning time: 0.022 ms
 Execution time: 5539.522 ms
(3 rows)

tinker=> EXPLAIN ANALYZE SELECT generate_series(1,1e7);
                                           QUERY PLAN                                            
-------------------------------------------------------------------------------------------------
 Result  (cost=0.00..5.01 rows=1000 width=0) (actual time=0.008..2622.365 rows=10000000 loops=1)
 Planning time: 0.045 ms
 Execution time: 3858.661 ms
(3 rows)

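OK, so now we know that SELECT * FROM generate_series() is executed using a Function Scan node, while SELECT generate_series() is executed using a Result node. Whatever causes these queries to perform differently boils down to the difference between these two nodes, and we know exactly where to look.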

Another interesting thing in the EXPLAIN ANALYZE output: note the timings. SELECT generate_series() shows actual time=0.008..2622.365, while SELECT * FROM generate_series() shows actual time=2382.582..4291.136. The Function Scan node starts returning records at about the time the Result node has finished returning them.

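What was PostgreSQL doing between t=0 and t=2382 in the Function Scan plan? That is about how long it takes to run generate_series(), so I would wager that is exactly what it was doing. The answer begins to take shape: Result seems to return rows immediately, while Function Scan seems to materialize the results first and then scan them.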
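The difference in startup time can be illustrated outside PostgreSQL. The following Python sketch is only an analogy, not PostgreSQL's executor code; the names result_style, function_scan_style, and rows_produced_before_first are made up for illustration. It counts how many rows the underlying function has to produce before the consumer sees the first one.

```python
from typing import Callable, Iterator, List


def generate_series(start: int, stop: int, produced: List[int]) -> Iterator[int]:
    """Yield start..stop inclusive, counting produced rows in produced[0]."""
    for value in range(start, stop + 1):
        produced[0] += 1
        yield value


def result_style(start: int, stop: int, produced: List[int]) -> Iterator[int]:
    """Streaming: hand each row to the caller as soon as it is produced."""
    yield from generate_series(start, stop, produced)


def function_scan_style(start: int, stop: int, produced: List[int]) -> Iterator[int]:
    """Materializing: drain the function into a buffer, then scan the buffer."""
    buffer = list(generate_series(start, stop, produced))
    yield from buffer


def rows_produced_before_first(
    strategy: Callable[[int, int, List[int]], Iterator[int]], n: int
) -> int:
    """Return how many rows had been produced when the first row arrived."""
    produced = [0]
    rows = strategy(1, n, produced)
    next(rows)  # fetch only the first row
    return produced[0]


if __name__ == "__main__":
    print(rows_produced_before_first(result_style, 1000))
    print(rows_produced_before_first(function_scan_style, 1000))
```

With n = 1000, the streaming strategy has produced a single row when the first row arrives, while the materializing strategy has already produced all 1000. That is the same shape as the startup times in the plans above: 0.008 ms for Result versus 2382.582 ms for Function Scan.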

With EXPLAIN out of the way, let's check the implementation. The Result node lives in nodeResult.c, which says:

 * DESCRIPTION
 *
 *      Result nodes are used in queries where no relations are scanned.

The code is simple enough.

Function Scan lives in nodeFunctionScan.c, and it does indeed appear to take a two-phase execution strategy:

/*
 * If first time through, read all tuples from function and put them
 * in a tuplestore. Subsequent calls just fetch tuples from
 * tuplestore.
 */

And for clarity, let's see what a tuplestore is:

 * tuplestore.h
 *    Generalized routines for temporary tuple storage.
 *
 * This module handles temporary storage of tuples for purposes such
 * as Materialize nodes, hashjoin batch files, etc.  It is essentially
 * a dumbed-down version of tuplesort.c; it does no sorting of tuples
 * but can only store and regurgitate a sequence of tuples.  However,
 * because no sort is required, it is allowed to start reading the sequence
 * before it has all been written.  This is particularly useful for cursors,
 * because it allows random access within the already-scanned portion of
 * a query without having to process the underlying scan to completion.
 * Also, it is possible to support multiple independent read pointers.
 *
 * A temporary file is used to handle the data if it exceeds the
 * space limit specified by the caller.

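Hypothesis confirmed. Function Scan executes up front, materializing the function's results, which for large result sets means spilling to disk. Result does not materialize anything, but it also only supports trivial operations.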
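To make the "store, then regurgitate, spilling to a temporary file past a limit" behavior concrete, here is a toy Python sketch. It is an assumption-laden stand-in, not a port of tuplestore.c: the class name TupleStoreSketch and the row-count limit max_rows_in_memory are hypothetical (the real space limit is given by the caller, not counted in rows), and it does not model multiple read pointers or reading while writing.

```python
import pickle
import tempfile
from typing import Any, BinaryIO, Iterator, List, Optional


class TupleStoreSketch:
    """Toy buffer: append rows, then read them back in insertion order."""

    def __init__(self, max_rows_in_memory: int) -> None:
        # Hypothetical limit, counted in rows to keep the sketch simple.
        self.max_rows_in_memory = max_rows_in_memory
        self._memory: List[Any] = []
        self._file: Optional[BinaryIO] = None  # created only on overflow
        self._count = 0

    @property
    def spilled(self) -> bool:
        """True once the rows have been moved to a temporary file."""
        return self._file is not None

    def put(self, row: Any) -> None:
        """Store one row, spilling everything to disk once the limit is hit."""
        self._count += 1
        if self._file is None:
            if len(self._memory) < self.max_rows_in_memory:
                self._memory.append(row)
                return
            # Limit exceeded: move the buffered rows to a temporary file.
            self._file = tempfile.TemporaryFile()
            for buffered in self._memory:
                pickle.dump(buffered, self._file)
            self._memory.clear()
        self._file.seek(0, 2)  # always append at the end of the file
        pickle.dump(row, self._file)

    def scan(self) -> Iterator[Any]:
        """Yield every stored row, from memory or from the temporary file."""
        if self._file is None:
            yield from self._memory
            return
        self._file.seek(0)
        for _ in range(self._count):
            yield pickle.load(self._file)


if __name__ == "__main__":
    store = TupleStoreSketch(max_rows_in_memory=3)
    for i in range(5):
        store.put((i,))
    print(store.spilled, list(store.scan()))
```

The extra work visible here (buffering every row, and possibly writing and re-reading a file) is the kind of overhead a streaming node avoids, which is consistent with the timings measured above.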

— Willglynn