PostgreSQL：同じクエリでの複雑な中間結果の再利用

8

PostgreSQLの（8.4）は、私はいくつかのテーブルから様々な結果をまとめたビューを作成しています使用して（例えば、列を作成しa、b、cビュー内）、その後、私は（一緒に同じクエリでこれらの結果の一部を結合する必要があるなどa+b、a-b、(a+b)/c、...）、最終結果を生成するようにします。私が気づいているのは、同じクエリ内で実行された場合でも、中間結果が使用されるたびに完全に計算されることです。

毎回同じ結果が計算されるのを避けるためにこれを最適化する方法はありますか？

これは、問題を再現する簡単な例です。

CREATE TABLE test1 (
    id SERIAL PRIMARY KEY,
    log_timestamp TIMESTAMP NOT NULL
);
CREATE TABLE test2 (
    test1_id INTEGER NOT NULL REFERENCES test1(id),
    category VARCHAR(10) NOT NULL,
    col1 INTEGER,
    col2 INTEGER
);
CREATE INDEX test_category_idx ON test2(category);

-- Added after edit to this question
CREATE INDEX test_id_idx ON test2(test1_id);

-- Populating with test data.
INSERT INTO test1(log_timestamp)
    SELECT * FROM generate_series('2011-01-01'::timestamp, '2012-01-01'::timestamp, '1 hour');
INSERT INTO test2
    SELECT id, substr(upper(md5(random()::TEXT)), 1, 1),
               (20000*random()-10000)::int, (3000*random()-200)::int FROM test1;
INSERT INTO test2
    SELECT id, substr(upper(md5(random()::TEXT)), 1, 1),
               (2000*random()-1000)::int, (3000*random()-200)::int FROM test1;
INSERT INTO test2
    SELECT id, substr(upper(md5(random()::TEXT)), 1, 1),
               (2000*random()-40)::int, (3000*random()-200)::int FROM test1;

最も時間がかかる操作を実行するビューを次に示します。

CREATE VIEW testview1 AS
    SELECT
       t1.id,
       t1.log_timestamp,
       (SELECT SUM(t2.col1) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='A') AS a,
       (SELECT SUM(t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='B') AS b,
       (SELECT SUM(t2.col1 - t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='C') AS c
    FROM test1 t1;

SELECT a FROM testview1生成し、この計画を（経由EXPLAIN ANALYZE）：

 Seq Scan on test1 t1  (cost=0.00..1787086.55 rows=8761 width=4) (actual time=12.877..10517.575 rows=8761 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=203.96..203.97 rows=1 width=4) (actual time=1.193..1.193 rows=1 loops=8761)
           ->  Bitmap Heap Scan on test2 t2  (cost=36.49..203.95 rows=1 width=4) (actual time=1.109..1.177 rows=0 loops=8761)
                 Recheck Cond: ((category)::text = 'A'::text)
                 Filter: (test1_id = $0)
                 ->  Bitmap Index Scan on test_category_idx  (cost=0.00..36.49 rows=1631 width=0) (actual time=0.414..0.414 rows=1631 loops=8761)
                       Index Cond: ((category)::text = 'A'::text)
 Total runtime: 10522.346 ms

SELECT a, a FROM testview1生成し、この計画を：

 Seq Scan on test1 t1  (cost=0.00..3574037.50 rows=8761 width=4) (actual time=3.343..20550.817 rows=8761 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=203.96..203.97 rows=1 width=4) (actual time=1.183..1.183 rows=1 loops=8761)
           ->  Bitmap Heap Scan on test2 t2  (cost=36.49..203.95 rows=1 width=4) (actual time=1.100..1.166 rows=0 loops=8761)
                 Recheck Cond: ((category)::text = 'A'::text)
                 Filter: (test1_id = $0)
                 ->  Bitmap Index Scan on test_category_idx  (cost=0.00..36.49 rows=1631 width=0) (actual time=0.418..0.418 rows=1631 loops=8761)
                       Index Cond: ((category)::text = 'A'::text)
   SubPlan 2
     ->  Aggregate  (cost=203.96..203.97 rows=1 width=4) (actual time=1.154..1.154 rows=1 loops=8761)
           ->  Bitmap Heap Scan on test2 t2  (cost=36.49..203.95 rows=1 width=4) (actual time=1.083..1.143 rows=0 loops=8761)
                 Recheck Cond: ((category)::text = 'A'::text)
                 Filter: (test1_id = $0)
                 ->  Bitmap Index Scan on test_category_idx  (cost=0.00..36.49 rows=1631 width=0) (actual time=0.426..0.426 rows=1631 loops=8761)
                       Index Cond: ((category)::text = 'A'::text)
 Total runtime: 20557.581 ms

ここでは、selecting a, aはselectingの2倍の時間がかかりますaが、実際には1回しか計算できません。たとえば、を使用するとSELECT a, a+b, a-b FROM testview1、サブプランをa3回およびb2回実行しますが、実行時間は合計時間の2/5に短縮できます（ここでは+および-は無視できると仮定）。

未使用の列（bおよびc）が不要な場合は計算しないのは良いことですが、ビューから同じ使用済みの列を一度だけ計算する方法はありますか？

編集： @Frank Heikensは、上記の例では欠落していたインデックスの使用を正しく提案しました。各サブプランの速度は向上しますが、同じサブクエリが複数回計算されるのを防ぐことはできません。申し訳ありませんが、明確にするために、最初の質問にこれを含めるべきでした。

performance postgresql

— ブルーノ
ソース

6

（自分の質問に回答してお詫びしますが、この無関係な質問と回答を読んだ後、代わりにCTEを使用してみるべきだと思いました。それは機能します。）

これはtestview1、質問と同様の別のビューですが、共通テーブル式を使用しています。

CREATE VIEW testview2 AS
    WITH testcte AS (SELECT
       t1.id,
       t1.log_timestamp,
       (SELECT SUM(t2.col1) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='A') AS a,
       (SELECT SUM(t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='B') AS b,
       (SELECT SUM(t2.col1 - t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='C') AS c
      FROM test1 t1)
    SELECT * FROM testcte;

（これは単なる例であり、ビューとCTEを組み合わせることが必ずしも良いアイデアであることを示唆しているわけではありません。CTEで十分かもしれません。）

異なりtestview1、クエリのためには、計画SELECT a FROM testview2、今も計算bし、c未使用であるため無視されました、testview1：

Subquery Scan testview2  (cost=395272.42..395535.25 rows=8761 width=8) (actual time=0.256..607.941 rows=8761 loops=1)
   ->  CTE Scan on testcte  (cost=395272.42..395447.64 rows=8761 width=36) (actual time=0.255..604.106 rows=8761 loops=1)
         CTE testcte
           ->  Seq Scan on test1 t1  (cost=0.00..395272.42 rows=8761 width=12) (actual time=0.252..589.358 rows=8761 loops=1)
                 SubPlan 1
                   ->  Aggregate  (cost=15.02..15.03 rows=1 width=4) (actual time=0.021..0.021 rows=1 loops=8761)
                         ->  Bitmap Heap Scan on test2 t2  (cost=4.28..15.02 rows=1 width=4) (actual time=0.015..0.015 rows=0 loops=8761)
                               Recheck Cond: (test1_id = $0)
                               Filter: ((category)::text = 'A'::text)
                               ->  Bitmap Index Scan on test_if_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=0.009..0.009 rows=3 loops=8761)
                                     Index Cond: (test1_id = $0)
                 SubPlan 2
                   ->  Aggregate  (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
                         ->  Bitmap Heap Scan on test2 t2  (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=8761)
                               Recheck Cond: (test1_id = $0)
                               Filter: ((category)::text = 'B'::text)
                               ->  Bitmap Index Scan on test_if_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
                                     Index Cond: (test1_id = $0)
                 SubPlan 3
                   ->  Aggregate  (cost=15.02..15.04 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=8761)
                         ->  Bitmap Heap Scan on test2 t2  (cost=4.28..15.02 rows=1 width=8) (actual time=0.013..0.014 rows=0 loops=8761)
                               Recheck Cond: (test1_id = $0)
                               Filter: ((category)::text = 'C'::text)
                               ->  Bitmap Index Scan on test_if_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
                                     Index Cond: (test1_id = $0)

ただし、同じクエリ内で複数回使用された結果（これが目的でした）は再計算されません。

異なり、testview1これとSELECT a, a, a, a, a比べて5倍の時間がかかったSELECT a、ここSELECT a, a, a, a, a, b, c, a+b, a+c, b+c FROM testview2と同じように時間がかかりSELECT a FROM testview2またはSELECT a, b, c FROM testview2。それだけで経由するa、bとc一度：

 Subquery Scan testview2  (cost=395272.42..395600.96 rows=8761 width=24) (actual time=0.147..562.790 rows=8761 loops=1)
   ->  CTE Scan on testcte  (cost=395272.42..395447.64 rows=8761 width=36) (actual time=0.144..554.194 rows=8761 loops=1)
         CTE testcte
           ->  Seq Scan on test1 t1  (cost=0.00..395272.42 rows=8761 width=12) (actual time=0.140..542.657 rows=8761 loops=1)
                 SubPlan 1
                   ->  Aggregate  (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
                         ->  Bitmap Heap Scan on test2 t2  (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.013 rows=0 loops=8761)
                               Recheck Cond: (test1_id = $0)
                               Filter: ((category)::text = 'A'::text)
                               ->  Bitmap Index Scan on test_if_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
                                     Index Cond: (test1_id = $0)
                 SubPlan 2
                   ->  Aggregate  (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
                         ->  Bitmap Heap Scan on test2 t2  (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=8761)
                               Recheck Cond: (test1_id = $0)
                               Filter: ((category)::text = 'B'::text)
                               ->  Bitmap Index Scan on test_if_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=0.006..0.006 rows=3 loops=8761)
                                     Index Cond: (test1_id = $0)
                 SubPlan 3
                   ->  Aggregate  (cost=15.02..15.04 rows=1 width=8) (actual time=0.018..0.019 rows=1 loops=8761)
                         ->  Bitmap Heap Scan on test2 t2  (cost=4.28..15.02 rows=1 width=8) (actual time=0.012..0.012 rows=0 loops=8761)
                               Recheck Cond: (test1_id = $0)
                               Filter: ((category)::text = 'C'::text)
                               ->  Bitmap Index Scan on test_if_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
                                     Index Cond: (test1_id = $0)

— ブルーノ
ソース

1

謝罪する必要はありません！:-)実際には、あなた自身の質問に答えるための自己学習バッジがあります。

— Gaius

3

変更するテーブルtest2のtest1_idのインデックスが必要です。

Seq Scan on test1 t1  (cost=0.00..301450.63 rows=8761 width=12) (actual time=0.108..229.859 rows=8761 loops=1)
  SubPlan 1
    ->  Aggregate  (cost=11.45..11.46 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=8761)
          ->  Bitmap Heap Scan on test2 t2  (cost=3.27..11.45 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=8761)
                Recheck Cond: (test1_id = t1.id)
                Filter: ((category)::text = 'A'::text)
                ->  Bitmap Index Scan on idx_id  (cost=0.00..3.27 rows=3 width=0) (actual time=0.003..0.003 rows=3 loops=8761)
                      Index Cond: (test1_id = t1.id)
  SubPlan 2
    ->  Aggregate  (cost=11.45..11.46 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=8761)
          ->  Bitmap Heap Scan on test2 t2  (cost=3.27..11.45 rows=1 width=4) (actual time=0.006..0.006 rows=0 loops=8761)
                Recheck Cond: (test1_id = t1.id)
                Filter: ((category)::text = 'B'::text)
                ->  Bitmap Index Scan on idx_id  (cost=0.00..3.27 rows=3 width=0) (actual time=0.003..0.003 rows=3 loops=8761)
                      Index Cond: (test1_id = t1.id)
  SubPlan 3
    ->  Aggregate  (cost=11.46..11.47 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=8761)
          ->  Bitmap Heap Scan on test2 t2  (cost=3.27..11.45 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=8761)
                Recheck Cond: (test1_id = t1.id)
                Filter: ((category)::text = 'C'::text)
                ->  Bitmap Index Scan on idx_id  (cost=0.00..3.27 rows=3 width=0) (actual time=0.003..0.003 rows=3 loops=8761)
                      Index Cond: (test1_id = t1.id)
Total runtime: 232.419 ms

— フランク・ハイケンズ
ソース

ありがとう、これは確かにパフォーマンスを向上させますが、私の実際のテーブルには、実際に対応するインデックスがすでにあります。初期の問題はまだ存在しています。SELECT a, a, a, a, a FROM testview1それでも、の5倍の時間がかかりますSELECT a FROM testview1。

— ブルーノ