How can I prevent grep from printing the same string multiple times?

If I grep a file containing the following:

These are words
These are words
These are words
These are words

...for the word "These", grep prints the string "These are words" four times.

How can I stop grep from printing a repeated string more than once? Failing that, how can I process grep's output to remove the duplicate lines?

command-line bash grep

— トレー

Do the matches need to keep their original order in the output? If not, the command John1024 posted will work.

— コス
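
If the order does matter, sort -u is not suitable because it reorders the lines. A common idiom (not taken from the answers here, so treat this as a sketch) is to filter grep's output through awk so that only the first occurrence of each line is printed, in its original position. The temporary file and its contents are made-up sample data; sort -u on the same input would print "These also appear" first.

```shell
# Sketch: drop duplicate grep matches but keep their original order.
# The sample file contents are hypothetical.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
These are words
Unrelated line
These are words
These also appear
These are words
EOF

# seen[$0]++ evaluates to 0 (false) the first time a line is seen,
# so !seen[$0]++ is true only for each line's first occurrence.
result=$(grep These "$tmp" | awk '!seen[$0]++')
printf '%s\n' "$result"
# prints:
# These are words
# These also appear
rm -f "$tmp"
```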

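The Unix philosophy is to have tools that each do one thing and do it well. In this case, grep is the tool for selecting text from a file. To find out whether there are duplicates, you sort the text, and to remove the duplicates, you use sort's -u option. So: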

grep These filename | sort -u
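
A self-contained way to try this, assuming a throwaway file created with mktemp holds the four lines from the question:

```shell
# Recreate the file from the question (four identical lines) in a temp file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
These are words
These are words
These are words
These are words
EOF

# sort -u sorts the matches and keeps only one copy of each distinct line.
result=$(grep These "$tmp" | sort -u)
printf '%s\n' "$result"   # prints: These are words
rm -f "$tmp"
```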

sort has many options: see man sort. If you want to count the duplicates, or to use a more elaborate scheme for deciding what counts as a duplicate, pipe the sorted output to uniq: grep These filename | sort | uniq, and see man uniq for its options.

— John1024
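
To see what uniq adds, here is a sketch with made-up data containing one repeated and one unique matching line: uniq -c prefixes each distinct line with its count, and uniq -d prints only the lines that occur more than once. Both need sorted input, because uniq only compares adjacent lines.

```shell
# Hypothetical sample: one line repeated three times, one line that occurs once.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
These are words
These are other words
These are words
These are words
EOF

# uniq only collapses adjacent duplicates, so sort first.
counts=$(grep These "$tmp" | sort | uniq -c)   # each distinct line with its count
dupes=$(grep These "$tmp" | sort | uniq -d)    # only lines that appear more than once
printf '%s\n' "$counts"
printf '%s\n' "$dupes"    # prints: These are words
rm -f "$tmp"
```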

If you only want grep to print a single matching line, use an additional switch:

grep -m1 'These' filename
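
A quick check against the question's data (recreated in a temporary file, which is an assumption of this sketch): with -m1, grep stops reading after the first matching line, so the string is printed once, and combined with -c the reported count is capped at 1.

```shell
# Recreate the question's file: four identical lines.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
These are words
These are words
These are words
These are words
EOF

result=$(grep -m1 'These' "$tmp")   # stops after the first match
n=$(grep -c -m1 'These' "$tmp")     # count never exceeds NUM (here 1)
printf '%s\n' "$result"   # prints: These are words
printf '%s\n' "$n"        # prints: 1
rm -f "$tmp"
```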

From man grep:

-m NUM, --max-count=NUM
        Stop reading a file after NUM matching lines.  If the input is
        standard input from a regular file, and NUM matching lines are
        output, grep ensures that the standard input is positioned  to
        just  after  the  last matching  line  before exiting, regardless
        of the presence of trailing context lines.  This enables a calling
        process to resume a search.  When grep stops after NUM matching
        lines, it outputs any trailing context lines.  When the -c or
        --count option is also used, grep does not output a count greater
        than NUM.  When the -v or --invert-match option is also used, grep
        stops after outputting NUM non-matching lines.
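
The note about standard input can be observed directly. This sketch assumes GNU grep (other implementations may not reposition the input) and uses made-up file contents: when a regular file is redirected to standard input, grep -m1 leaves the file offset just after the first match, so the next command reading the same input continues from there.

```shell
# Hypothetical sample file with matching and non-matching lines.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
These are words
line two
These are words
line four
EOF

# Both commands in the group share the same stdin (the regular file).
# grep -m1 prints the first match and leaves the offset right after it;
# cat then prints everything that follows.
result=$(
  {
    grep -m1 These
    echo '--- rest of file ---'
    cat
  } < "$tmp"
)
printf '%s\n' "$result"
rm -f "$tmp"
```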

Or use awk ;)

awk '/These/ {print; exit}' foo

— AB

In my opinion the -m flag is the most appropriate answer; I'd suggest putting it at the top of your answer. Very good answer!

— Sergiy Kolodyazhnyy

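This doesn't work if you are using a regular expression: grep stops right after the first match, so you get only one line instead of one copy of each distinct line that could match.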

— csvan
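
To illustrate the point about regular expressions, here is a sketch with hypothetical data where one pattern matches two different lines: -m1 returns only the first of them, while piping the full output through sort -u keeps one copy of each distinct match.

```shell
# Hypothetical sample: the pattern matches two different lines, each twice.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
These are words
Those are words
These are words
Those are words
EOF

# -m1 stops after the first matching line, whatever it is.
first=$(grep -m1 -E 'Th(e|o)se' "$tmp")
# Without -m, every match is printed; sort -u then removes the repeats.
distinct=$(grep -E 'Th(e|o)se' "$tmp" | sort -u)
printf '%s\n' "$first"      # prints: These are words
printf '%s\n' "$distinct"   # prints: These are words, then Those are words
rm -f "$tmp"
```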