Fastest Python code to find a set of winning words in this game

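This is a word game from a set of activity cards for children. Below the rules is code that finds the best triplet of words using /usr/share/dict/words. I thought it was an interesting optimization problem, and I'm wondering whether people can find improvements.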

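Rules

1. Choose one letter from each of the sets below.
2. Choose a word using the chosen letters (and any others).
3. Score the word.
   - Each letter chosen from a set scores the number shown with that set (repeats included).
   - AEIOU count 0.
   - All other letters count -2.
4. Repeat steps 1-3 above twice more (without reusing letters in step 1).
5. The final score is the sum of the three word scores.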

Sets

(Set 1 scores 1 point, set 2 scores 2 points, and so on.)

1. LTN
2. RDS
3. GBM
4. CHP
5. FWV
6. YKJ
7. QXZ
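As a rough illustration of the scoring rules, here is a minimal sketch that scores a single word for one particular choice of letters. The function name `score_word` and the `chosen` dictionary are made up for this example; the set values (LTN = 1 through QXZ = 7) are taken from the `points` dictionary in the code below.

```python
VOWELS = set("AEIOU")


def score_word(word, chosen):
    """Score `word`, where `chosen` maps each picked letter to its set's points.

    Hypothetical helper for illustration only; not part of the original code.
    """
    total = 0
    for letter in word.upper():
        if letter in chosen:
            total += chosen[letter]   # chosen letter: counts every time it occurs
        elif letter not in VOWELS:
            total -= 2                # any other consonant costs 2 points
        # vowels contribute 0
    return total


# One letter from each set, valued by the set number (assumed: LTN=1 ... QXZ=7).
chosen = {"N": 1, "S": 2, "M": 3, "C": 4, "F": 5, "K": 6, "Z": 7}
print(score_word("KNICKKNACK", chosen))
```

With this choice, the four Ks, two Ns and two Cs are all chosen letters, so the word loses nothing to the -2 penalty.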

Code:

from itertools import permutations
import numpy as np

points = {'LTN' : 1,
          'RDS' : 2,
          'GBM' : 3,
          'CHP' : 4,
          'FWV' : 5,
          'YKJ' : 6,
          'QXZ' : 7}

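def tonum(word):
    # Count occurrences of each letter A-Z; returned as a 26x1 column vector.
    word_array = np.zeros(26, dtype=int)  # np.int was removed in NumPy 1.24
    for l in word:
        word_array[ord(l) - ord('A')] += 1
    return word_array.reshape((26, 1))

def to_score_array(letters):
    # Build a 1x26 row of per-letter scores: -2 by default, 0 for vowels,
    # and 1..7 for the chosen letters (one per set, in set order).
    score_array = np.zeros(26, dtype=int) - 2
    for v in 'AEIOU':
        score_array[ord(v) - ord('A')] = 0
    for idx, l in enumerate(letters):
        score_array[ord(l) - ord('A')] = idx + 1
    return np.matrix(score_array.reshape(1, 26))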

def find_best_words():
    wlist = [l.strip().upper() for l in open('/usr/share/dict/words') if l[0].lower() == l[0]]
    wlist = [l for l in wlist if len(l) > 4]
    orig = [l for l in wlist]
    for rep in 'AEIOU':
        wlist = [l.replace(rep, '') for l in wlist]
    wlist = np.hstack([tonum(w) for w in wlist])

    best = 0
    ct = 0
    bestwords = ()
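    # The order of the first set is fixed: permuting it as well would only
    # reorder the three words, so 6**6 combinations cover every case.
    for c1 in ['LTN']: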
        for c2 in permutations('RDS'):
            for c3 in permutations('GBM'):
                for c4 in permutations('CHP'):
                    for c5 in permutations('FWV'):
                        for c6 in permutations('YJK'):
                            for c7 in permutations('QZX'):
                                vals = [to_score_array(''.join(s)) for s in zip(c1, c2, c3, c4, c5, c6, c7)]
                                ct += 1
                                print ct, 6**6
                                scores1 = (vals[0] * wlist).A.flatten()
                                scores2 = (vals[1] * wlist).A.flatten()
                                scores3 = (vals[2] * wlist).A.flatten()
                                m1 = max(scores1)
                                m2 = max(scores2)
                                m3 = max(scores3)
                                if m1 + m2 + m3 > best:
                                    print orig[scores1.argmax()], orig[scores2.argmax()], orig[scores3.argmax()], m1 + m2 + m3
                                    best = m1 + m2 + m3
                                    bestwords = (orig[scores1.argmax()], orig[scores2.argmax()], orig[scores3.argmax()])
    return bestwords, best


if __name__ == '__main__':
    import timeit
    print timeit.timeit('print find_best_words()', 'from __main__ import find_best_words', number=1)

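The matrix version is what I came up with after writing one version in pure Python (using dictionaries and scoring each word independently) and another in numpy that used indexing rather than matrix multiplication.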

The next optimization would be to remove the vowels from the scoring entirely (and use a modified ord() function), but I wonder whether there are even faster approaches.
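For what it's worth, the vowel-removal idea might look something like the sketch below. This is only an assumption about what is meant: vowels always score 0, so the count and score vectors can shrink from 26 entries to 21, with a lookup table standing in for the "modified ord()". The names `CONSONANT_INDEX`, `consonant_counts`, `score_vector` and `score` are invented for this example, and it uses plain lists rather than numpy to stay self-contained.

```python
import string

# Consonants only: vowels always score 0, so they can be left out of the vectors.
CONSONANTS = [c for c in string.ascii_uppercase if c not in "AEIOU"]
# Hypothetical replacement for ord(l) - ord('A'): maps a consonant to 0..20.
CONSONANT_INDEX = {c: i for i, c in enumerate(CONSONANTS)}


def consonant_counts(word):
    """Count each consonant in `word`; vowels (and anything else) are skipped."""
    counts = [0] * len(CONSONANTS)
    for letter in word.upper():
        idx = CONSONANT_INDEX.get(letter)
        if idx is not None:
            counts[idx] += 1
    return counts


def score_vector(chosen_letters):
    """-2 for every consonant except the chosen ones, which score 1..7 in set order."""
    scores = [-2] * len(CONSONANTS)
    for points, letter in enumerate(chosen_letters, start=1):
        scores[CONSONANT_INDEX[letter]] = points
    return scores


def score(word, chosen_letters):
    """Dot product of the word's consonant counts with the score vector."""
    counts = consonant_counts(word)
    scores = score_vector(chosen_letters)
    return sum(c * s for c, s in zip(counts, scores))


print(score("KNICKKNACK", "NSMCFKZ"))
```

The same dot product is what the matrix version computes for every word at once; the only change here is the shorter vectors.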

EDIT: added timeit.timeit code.

EDIT: I'm adding a bounty, which I'll give to whichever improvement I like most (or possibly to several answers, though I'd need to earn more reputation first in that case).

fastest-code python optimization

— thouis

By the way, I wrote the code to give my eight-year-old three words to memorize for when he played the game against his mother. Now I know what xylopyrography means.

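This is a fun question. I think you're more likely to get answers if you provide the following: (1) a link to an online word list, so that everyone works with the same data set; (2) your solution placed in a single function; (3) a run of that function using the timeit module, showing the timings; (4) the loading of the dictionary data kept outside the function, so that we aren't testing disk speed. People can then use the existing code as a framework for comparing their solutions.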

I'll rewrite it to use timeit, but for a fair comparison I would have to use my own machine (which I'm happy to do for people posting solutions). A word list should be available on most systems; if not, there are several at wordlist.sourceforge.net.

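A fair comparison is possible if each user times their own solution and the other posted solutions on their own machine. There will be some differences across platforms, but in general this method works.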

Hmm, in that case I wonder whether this is the right site. I think SO would have been the best fit.

— Joey

Answers:

Using Keith's idea of precomputing the best possible score for each word, I managed to reduce the execution time to about 0.7 seconds on my computer (using a list of 75,288 words).

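The trick is to go through the combinations of words to be played instead of all combinations of letters picked. All but a few combinations of words (203 with my word list) can be ignored, because they cannot get a higher score than one that has already been found. Nearly all of the execution time is spent precomputing the word scores.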

Python 2.7:

import collections
import itertools


WORDS_SOURCE = '../word lists/wordsinf.txt'

WORDS_PER_ROUND = 3
LETTER_GROUP_STRS = ['LTN', 'RDS', 'GBM', 'CHP', 'FWV', 'YKJ', 'QXZ']
LETTER_GROUPS = [list(group) for group in LETTER_GROUP_STRS]
GROUP_POINTS = [(group, i+1) for i, group in enumerate(LETTER_GROUPS)]
POINTS_IF_NOT_CHOSEN = -2


def best_word_score(word):
    """Return the best possible score for a given word."""

    word_score = 0

    # Score the letters that are in groups, choosing the best letter for each
    # group of letters.
    total_not_chosen = 0
    for group, points_if_chosen in GROUP_POINTS:
        letter_counts_sum = 0
        max_letter_count = 0
        for letter in group:
            if letter in word:
                count = word.count(letter)
                letter_counts_sum += count
                if count > max_letter_count:
                    max_letter_count = count
        if letter_counts_sum:
            word_score += points_if_chosen * max_letter_count
            total_not_chosen += letter_counts_sum - max_letter_count
    word_score += POINTS_IF_NOT_CHOSEN * total_not_chosen

    return word_score

def best_total_score(words):
    """Return the best score possible for a given list of words.

    It is fine if the number of words provided is not WORDS_PER_ROUND. Only the
    words provided are scored."""

    num_words = len(words)
    total_score = 0

    # Score the letters that are in groups, choosing the best permutation of
    # letters for each group of letters.
    total_not_chosen = 0
    for group, points_if_chosen in GROUP_POINTS:
        letter_counts = []
        # Structure:  letter_counts[word_index][letter] = count
        letter_counts_sum = 0
        for word in words:
            this_word_letter_counts = {}
            for letter in group:
                count = word.count(letter)
                this_word_letter_counts[letter] = count
                letter_counts_sum += count
            letter_counts.append(this_word_letter_counts)

        max_chosen = None
        for letters in itertools.permutations(group, num_words):
            num_chosen = 0
            for word_index, letter in enumerate(letters):
                num_chosen += letter_counts[word_index][letter]
            if num_chosen > max_chosen:
                max_chosen = num_chosen

        total_score += points_if_chosen * max_chosen
        total_not_chosen += letter_counts_sum - max_chosen
    total_score += POINTS_IF_NOT_CHOSEN * total_not_chosen

    return total_score


def get_words():
    """Return the list of valid words."""
    with open(WORDS_SOURCE, 'r') as source:
        return [line.rstrip().upper() for line in source]

def get_words_by_score():
    """Return a dictionary mapping each score to a list of words.

    The key is the best possible score for each word in the corresponding
    list."""

    words = get_words()
    words_by_score = collections.defaultdict(list)
    for word in words:
        words_by_score[best_word_score(word)].append(word)
    return words_by_score


def get_winning_words():
    """Return a list of words for an optimal play."""

    # A word's position is a tuple of its score's index and the index of the
    # word within the list of words with this score.
    # 
    # word played: A word in the context of a combination of words to be played
    # word chosen: A word in the context of the list it was picked from

    words_by_score = get_words_by_score()
    num_word_scores = len(words_by_score)
    word_scores = sorted(words_by_score, reverse=True)
    words_by_position = []
    # Structure:  words_by_position[score_index][word_index] = word
    num_words_for_scores = []
    for score in word_scores:
        words = words_by_score[score]
        words_by_position.append(words)
        num_words_for_scores.append(len(words))

    # Go through the combinations of words in lexicographic order by word
    # position to find the best combination.
    best_score = None
    positions = [(0, 0)] * WORDS_PER_ROUND
    words = [words_by_position[0][0]] * WORDS_PER_ROUND
    scores_before_words = []
    for i in xrange(WORDS_PER_ROUND):
        scores_before_words.append(best_total_score(words[:i]))
    while True:
        # Keep track of the best possible combination of words so far.
        score = best_total_score(words)
        if score > best_score:
            best_score = score
            best_words = words[:]

        # Go to the next combination of words that could get a new best score.
        for word_played_index in reversed(xrange(WORDS_PER_ROUND)):
            # Go to the next valid word position.
            score_index, word_chosen_index = positions[word_played_index]
            word_chosen_index += 1
            if word_chosen_index == num_words_for_scores[score_index]:
                score_index += 1
                if score_index == num_word_scores:
                    continue
                word_chosen_index = 0

            # Check whether the new combination of words could possibly get a
            # new best score.
            num_words_changed = WORDS_PER_ROUND - word_played_index
            score_before_this_word = scores_before_words[word_played_index]
            further_points_limit = word_scores[score_index] * num_words_changed
            score_limit = score_before_this_word + further_points_limit
            if score_limit <= best_score:
                continue

            # Update to the new combination of words.
            position = score_index, word_chosen_index
            positions[word_played_index:] = [position] * num_words_changed
            word = words_by_position[score_index][word_chosen_index]
            words[word_played_index:] = [word] * num_words_changed
            for i in xrange(word_played_index+1, WORDS_PER_ROUND):
                scores_before_words[i] = best_total_score(words[:i])
            break
        else:
            # None of the remaining combinations of words can get a new best
            # score.
            break

    return best_words


def main():
    winning_words = get_winning_words()
    print winning_words
    print best_total_score(winning_words)

if __name__ == '__main__':
    main()

This returns the solution ['KNICKKNACK', 'RAZZMATAZZ', 'POLYSYLLABLES'] with a score of 95. With the words from Keith's solution added to the word list, I get the same result as he does. With thouis's "xylopyrography" added, I get ['XYLOPYROGRAPHY', 'KNICKKNACKS', 'RAZZMATAZZ'] with a score of 105.

— Flornquake

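Here's an idea: you can avoid checking lots of words by noticing that most words have awful scores. Say you've found a pretty good play that scores 50 points. Then any play scoring more than 50 points must contain a word worth at least ceil(51/3) = 17 points. So any word that cannot possibly score 17 points can be ignored.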
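The arithmetic behind that cutoff can be sketched as a small helper. The function name `min_word_score` and its parameters are made up for this illustration; it only restates the ceil(51/3) reasoning above and extends it to the case where some words have already been scored.

```python
def min_word_score(best_so_far, words_remaining, points_so_far=0):
    """Score that at least one remaining word must reach to beat best_so_far.

    Hypothetical helper: `points_so_far` is the total of words already fixed.
    """
    needed = best_so_far + 1 - points_so_far
    # Integer ceiling division: ceil(needed / words_remaining).
    return -(-needed // words_remaining)


print(min_word_score(50, 3))       # three words still to pick
print(min_word_score(50, 2, 20))   # first word already scored 20
print(min_word_score(50, 1, 36))   # two words already scored 36 in total
```

As the best known total rises, this threshold rises with it, so fewer and fewer words remain worth checking.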

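Here's some code that does the above. It computes the best possible score for each word in the dictionary and stores the words in an array indexed by score. It then uses that array to check only the words that reach the required minimum score.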

from itertools import permutations
import time

S={'A':0,'E':0,'I':0,'O':0,'U':0,
   'L':1,'T':1,'N':1,
   'R':2,'D':2,'S':2,
   'G':3,'B':3,'M':3,
   'C':4,'H':4,'P':4,
   'F':5,'W':5,'V':5,
   'Y':6,'K':6,'J':6,
   'Q':7,'X':7,'Z':7,
   }

def best_word(min, s):
    global score_to_words
    best_score = 0
    best_word = ''
    for i in xrange(min, 100):
        for w in score_to_words[i]:
            score = (-2*len(w)+2*(w.count('A')+w.count('E')+w.count('I')+w.count('O')+w.count('U')) +
                      3*w.count(s[0])+4*w.count(s[1])+5*w.count(s[2])+6*w.count(s[3])+7*w.count(s[4])+
                      8*w.count(s[5])+9*w.count(s[6]))
            if score > best_score:
                best_score = score
                best_word = w
    return (best_score, best_word)

def load_words():
    global score_to_words
    wlist = [l.strip().upper() for l in open('/usr/share/dict/words') if l[0].lower() == l[0]]
    score_to_words = [[] for i in xrange(100)]
    for w in wlist: score_to_words[sum(S[c] for c in w)].append(w)
    for i in xrange(100):
        if score_to_words[i]: print i, len(score_to_words[i])

def find_best_words():
    load_words()
    best = 0
    bestwords = ()
    for c1 in permutations('LTN'):
        for c2 in permutations('RDS'):
            for c3 in permutations('GBM'):
                print time.ctime(), c1, c2, c3
                for c4 in permutations('CHP'):
                    for c5 in permutations('FWV'):
                        for c6 in permutations('YJK'):
                            for c7 in permutations('QZX'):
                                sets = zip(c1, c2, c3, c4, c5, c6, c7)
                                (s1, w1) = best_word((best + 3) / 3, sets[0])
                                (s2, w2) = best_word((best - s1 + 2) / 2, sets[1])
                                (s3, w3) = best_word(best - s1 - s2 + 1, sets[2])
                                score = s1 + s2 + s3
                                if score > best:
                                    best = score
                                    bestwords = (w1, w2, w3)
                                    print score, w1, w2, w3
    return bestwords, best


if __name__ == '__main__':
    import timeit
    print timeit.timeit('print find_best_words()', 'from __main__ import find_best_words', number=1)

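The minimum score rises rapidly to 100, which means that only the words in /usr/share/dict/words that can score 33 points or more need to be considered. It runs in about 30 minutes on my machine and produces: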

(('MAXILLOPREMAXILLARY', 'KNICKKNACKED', 'ZIGZAGWISE'), 101)

— Keith Randall

Nice. I'll add this to the matrix solution (removing words as their score falls too low), but this is already much better than any of the pure Python solutions I had come up with.

— thouis

I don't think I've ever seen so many nested for loops before.

— Peter Olson

Combining your idea with matrix scoring (and a tighter upper bound on the best possible score) cuts the time on my machine from about an hour to about 80 seconds. Code here.

— 11年

Most of that time is spent precomputing the best possible scores.

— thouis