Linux command or script counting duplicated lines in a text file?

If I have a text file with the following content:

red apple
green apple
green apple
orange
orange
orange

is there a Linux command or script that I can use to get the following result?

1 red apple
2 green apple
3 orange

— timeon

Send it through sort (to put identical lines next to each other), then through uniq -c to get the counts:

sort filename | uniq -c
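To see why the sort step matters, here is a minimal Python sketch of what this pipeline computes, assuming the lines have already been read into a list (sort_uniq_c is an illustrative name, not an existing tool). itertools.groupby, like uniq, only merges adjacent equal items, so the input is sorted first. Python orders strings by code point, which matches LC_ALL=C sort but may differ from sort under other locales, and the real uniq -c also pads the counts with spaces.

```python
from itertools import groupby


def sort_uniq_c(lines):
    """Return (count, line) pairs the way `sort | uniq -c` would list them."""
    # Sorting puts identical lines next to each other...
    ordered = sorted(lines)
    # ...and groupby, like uniq, collapses each run of adjacent equal lines.
    return [(sum(1 for _ in run), line) for line, run in groupby(ordered)]


lines = ["red apple", "green apple", "green apple",
         "orange", "orange", "orange"]
for count, line in sort_uniq_c(lines):
    print(count, line)
```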

And to get the list sorted by frequency (most frequent first), you can do:

sort filename | uniq -c | sort -nr
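For comparison, a Python sketch of the frequency-sorted listing that uses a different technique: collections.Counter counts lines in a hash table instead of sorting them, and most_common() returns them highest count first. Lines with equal counts stay in first-seen order, so ties may come out in a different order than sort -nr prints them. The function name counts_by_frequency is illustrative.

```python
from collections import Counter


def counts_by_frequency(lines):
    """Return (count, line) pairs, most frequent first (cf. `sort | uniq -c | sort -nr`)."""
    # most_common() with no argument lists every distinct line, highest count first;
    # equal counts keep the order in which the lines were first seen.
    return [(count, line) for line, count in Counter(lines).most_common()]


lines = ["red apple", "green apple", "green apple",
         "orange", "orange", "orange"]
for count, line in counts_by_frequency(lines):
    print(count, line)
```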

— borrible

Almost the same as borrible's answer, but if you add the d parameter to uniq, it only shows duplicates.

sort filename | uniq -cd | sort -nr
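A Python sketch of the same idea, assuming the lines are already in a list (repeated_lines is an illustrative name): count runs of identical lines after sorting, keep only runs longer than one line (what -d does), then order by count, highest first. Because Python's sort is stable, ties stay in sorted-line order, which may not match how GNU sort orders them.

```python
from itertools import groupby


def repeated_lines(lines):
    """Return (count, line) pairs for duplicated lines only (cf. `sort | uniq -cd | sort -nr`)."""
    runs = [(sum(1 for _ in run), line) for line, run in groupby(sorted(lines))]
    # uniq -d: keep only runs that contain more than one line.
    dups = [(count, line) for count, line in runs if count > 1]
    # sort -nr: highest count first; the stable sort keeps ties in line order.
    return sorted(dups, key=lambda pair: pair[0], reverse=True)


lines = ["red apple", "green apple", "green apple",
         "orange", "orange", "orange"]
for count, line in repeated_lines(lines):
    print(count, line)  # "red apple" appears once, so it is not printed
```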

— ジャベリノ

+1 for the little -d note.

— sepehr

uniq -c file

If the file is not already sorted:

sort file | uniq -c
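Plain uniq -c works on the question's file only because identical lines already sit next to each other. A small Python sketch (uniq_c is an illustrative name) makes that adjacency rule visible: without sorting, the same line separated by other lines is counted as separate runs.

```python
from itertools import groupby


def uniq_c(lines):
    """Return (count, line) pairs for runs of adjacent identical lines, like plain `uniq -c`."""
    # No sorting here: groupby only merges neighbours, as uniq does.
    return [(sum(1 for _ in run), line) for line, run in groupby(lines)]


# The question's file is already grouped, so this gives the requested result.
print(uniq_c(["red apple", "green apple", "green apple",
              "orange", "orange", "orange"]))
# [(1, 'red apple'), (2, 'green apple'), (3, 'orange')]

# Unsorted input: the two "a" lines are not adjacent, so they are counted separately.
print(uniq_c(["a", "b", "a"]))
# [(1, 'a'), (1, 'b'), (1, 'a')]
```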

— ミフリッツ

Try this:

cat myfile.txt | sort | uniq

— Rahul

Without the -c or -d flag, uniq doesn't distinguish duplicate lines from non-duplicate ones. Or am I missing something?

— drevicko

cat <filename> | sort | uniq -c

— パジトン

Can you live with an alphabetically ordered list?

echo "red apple
green apple
green apple
orange
orange
orange
" | sort -u

green apple
orange
red apple

or

sort -u FILE

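-u stands for unique; sort can only guarantee uniqueness because sorting brings identical lines together. Note that this prints each distinct line once, without a count.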
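In Python terms, sort -u corresponds roughly to building a set and sorting it. Here the set removes duplicates on its own and sorting only fixes the output order; as with sort -u, no counts come out. A minimal sketch (sort_unique is an illustrative name; code-point ordering matches LC_ALL=C sort rather than every locale):

```python
def sort_unique(lines):
    """Return each distinct line once, in sorted order (cf. `sort -u`)."""
    # The set drops repeats wherever they occur; sorted() then orders the survivors.
    return sorted(set(lines))


print(sort_unique(["red apple", "green apple", "green apple",
                   "orange", "orange", "orange"]))
# ['green apple', 'orange', 'red apple']
```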

A solution that preserves the original order:

echo "red apple
green apple
green apple
orange
orange
orange
" | { old=""; while IFS= read -r line; do if [[ $line != "$old" ]]; then echo "$line"; old=$line; fi; done; }
red apple
green apple
orange

And with a file:

cat file | {
old=""
while IFS= read -r line
do
  if [[ $line != "$old" ]]
  then
    echo "$line"
    old=$line
  fi
done; }

The last two only remove duplicates that immediately follow one another, which fits your example. For input like this:

echo "red apple
green apple
lila banana
green apple
" ...

both green apple lines are printed, because the banana line separates them.
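The while-read loop in this answer keeps a line only when it differs from the previous one. A Python sketch of that adjacent-only rule, shown next to a different technique (dict.fromkeys) that also drops the banana-separated repeat while keeping first-seen order. Both function names are illustrative, and neither function counts lines.

```python
def drop_adjacent_duplicates(lines):
    """Keep a line only if it differs from the line just before it."""
    kept = []
    previous = None  # unlike old="" in the shell loop, an empty first line is kept
    for line in lines:
        if line != previous:
            kept.append(line)
            previous = line
    return kept


def drop_all_duplicates(lines):
    """Keep the first occurrence of every line, wherever the repeats are."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(lines))


fruits = ["red apple", "green apple", "lila banana", "green apple"]
print(drop_adjacent_duplicates(fruits))  # both green apples survive
print(drop_all_duplicates(fruits))       # the second green apple is dropped
```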

— user unknown

To get the counts:

$> egrep -o '\w+' fruits.txt | sort | uniq -c

      3 apple
      2 green
      1 oragen
      2 orange
      1 red
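The egrep -o step above splits lines into words before counting. A rough Python equivalent, using the question's file contents as sample input (word_counts is an illustrative name): Python's \w is Unicode-aware for str patterns, so results may differ from grep on non-ASCII text depending on the locale.

```python
import re
from collections import Counter


def word_counts(text):
    """Return (count, word) pairs in sorted word order, like the egrep | sort | uniq -c pipeline."""
    # \w+ matches runs of letters, digits and underscores.
    counts = Counter(re.findall(r"\w+", text))
    return [(counts[word], word) for word in sorted(counts)]


text = "red apple\ngreen apple\ngreen apple\norange\norange\norange\n"
for count, word in word_counts(text):
    print(count, word)
```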

To get the counts sorted (lowest first):

$> egrep -o '\w+' fruits.txt | sort | uniq -c | sort -nk1
      1 oragen
      1 red
      2 green
      2 orange
      3 apple

Edit

Ah, this splits along word boundaries, which isn't what was asked. Here is the command to use for whole lines:

$> cat fruits.txt | sort | uniq -c | sort -nk1
      1 oragen
      1 red apple
      2 green apple
      2 orange
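A Python sketch of the whole-line version, sorted least frequent first, assuming the lines are in a list (line_counts_ascending is an illustrative name). Sorting (count, line) tuples breaks ties by line text, which roughly mirrors sort's whole-line tie-break; locale collation may still order ties differently.

```python
from collections import Counter


def line_counts_ascending(lines):
    """Return (count, line) pairs, least frequent first (cf. `sort | uniq -c | sort -nk1`)."""
    counts = Counter(lines)
    # Tuples sort by count first, then by line text for equal counts.
    return sorted((count, line) for line, count in counts.items())


lines = ["red apple", "green apple", "green apple",
         "orange", "orange", "orange"]
for count, line in line_counts_ascending(lines):
    print(count, line)
```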

— Chris Eberle

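Here is a simple Python script using the Counter type. Its advantage is that the file doesn't need to be sorted, and memory use grows only with the number of distinct lines, not with the size of the file: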

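import collections
import fileinput
import json

# Read lines from the files named on the command line (or stdin if none),
# strip surrounding whitespace, count identical lines, and print the counts as JSON.
print(json.dumps(collections.Counter(map(str.strip, fileinput.input())), indent=2))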

Output:

$ cat filename | python3 script.py
{
  "red apple": 1,
  "green apple": 2,
  "orange": 3
}

Or you can use a simple one-liner:

$ cat filename | python3 -c 'print(__import__("json").dumps(__import__("collections").Counter(map(str.strip, __import__("fileinput").input())), indent=2))'
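The question asks for plain "count line" output in first-appearance order rather than JSON. A minimal sketch of that with the same Counter approach; format_counts is a hypothetical helper name. Unlike str.strip in the script above, it removes only the trailing newline, so leading and trailing spaces remain part of the line.

```python
from collections import Counter


def format_counts(lines):
    """Count lines and format each as '<count> <line>' in first-seen order."""
    # Drop only the line terminator so indentation still distinguishes lines.
    counts = Counter(line.rstrip("\n") for line in lines)
    # Counter keeps insertion order (Python 3.7+), i.e. first-appearance order.
    return [f"{count} {line}" for line, count in counts.items()]


sample = ["red apple\n", "green apple\n", "green apple\n",
          "orange\n", "orange\n", "orange\n"]
print("\n".join(format_counts(sample)))
```

With a real file, iterating the open file object yields its lines, so something like: with open("filename") as f: print("\n".join(format_counts(f))).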

— Orestisf