Remove repeated words from a string


12

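Remove all repeated words from an input sentence.

The input will be something like cat dog cat dog bird dog Snake snake Snake, and the output should be cat dog bird Snake snake. There will always be exactly one space separating words.

The output order must be the same as the input order (see the example).

You don't need to handle punctuation, but handling of capital letters is required.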
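For reference, here is an ungolfed sketch of what the challenge asks for. It is not taken from any answer in this thread; the function name remove_repeated_words is made up for illustration, and it relies on Python dicts keeping insertion order (guaranteed from Python 3.7).

```python
def remove_repeated_words(sentence):
    """Keep the first occurrence of each word, preserving input order."""
    # The challenge guarantees exactly one space between words.
    words = sentence.split(" ")
    # dict keys are unique and iterate in insertion order (Python 3.7+).
    return " ".join(dict.fromkeys(words))


print(remove_repeated_words("cat dog cat dog bird dog Snake snake Snake"))
# prints: cat dog bird Snake snake
```

Snake and snake both survive because the comparison is case-sensitive, which matches the example.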


13
I suggest waiting at least a few days before accepting an answer. There may still be a shorter solution.
Alex A.

1
Similar solutions to uniqchars are to be expected, except that built-ins for removing duplicates are not banned here.
xnor

2
Looking at the example, there is no special handling of capital letters. Snake and snake are simply treated as different words.
edc65

@AlexA.: Actually, there already is one. codegolf.stackexchange.com/questions/62044/...
ev3commander

Answers:


1

gs2, 3 bytes

,É-

Encoded in CP437.

STDIN is pushed at the start of the program. , splits on spaces. É is uniq, which filters out duplicates. - joins with spaces.


10

CJam, 7 characters

qS/_&S*

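This can probably be made much shorter... but I have hardly ever used CJam. ^.^

q reads the input, S/ splits on spaces, _& duplicates the list and applies a setwise AND (thus getting rid of duplicates), and S* rejoins with spaces.

Online interpreter link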


1
How could you make it much shorter than 7? lol
Launcher

Someone did.
Alien G

8

Haskell, 34 bytes

import Data.List
unwords.nub.words

Usage example: (unwords.nub.words) "cat dog cat dog bird dog Snake snake Snake" -> "cat dog bird Snake snake"


8

APL, 22 20 bytes

{1↓∊∪(∊∘' '⊂⊢)' ',⍵}

This creates an unnamed monadic function that accepts a string on the right and returns a string.

Explanation:

               ' ',⍵}    ⍝ Prepend a space to the input string
     (∊∘' '⊂⊢)          ⍝ Split the string on spaces using a fork
    ∪                    ⍝ Select the unique elements
{1↓∊                     ⍝ Join into a string and drop the leading space

Try it online

Saved 2 bytes thanks to Dennis!


3
I love the answers that use non-esoteric, non-golfing languages.
Darth Egregious


7

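JavaScript (ES6), 33

(See this answer)

Test by running the snippet below in an EcmaScript 6 compliant browser (one implementing Set, the spread operator, template strings and arrow functions; I use Firefox).

Note: the conversion to Set drops all duplicates, and Set keeps the original order.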

f=s=>[...Set(s.split` `)].join` `

function test() { O.innerHTML=f(I.value) }

test()
#I { width: 70% }
<input id=I value="cat dog cat dog bird dog Snake snake Snake"/><button onclick="test()">-></button>
<pre id=O></pre>


Wow, wow... I am constantly amazed by your ability to cut 25% or more off the solution I had in mind. +1
ETHproductions

1
I looked at the problem and immediately thought of Set... only to find that you had already done it =P Very nice!
Mwr247

How can a set preserve the original order?
njzk2

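@njzk2 Ask the language developers. It could be that the Set is internally an array, with a check at each insertion to reject duplicates. It's an implementation detail anyway.
edc65

@njzk2 I don't know how, but I do know that this is specified by the language: Set objects are collections of values, and you can iterate over their elements in insertion order. A value in a Set may only occur once; it is unique in the Set's collection. developer.mozilla.org/it/docs/Web/JavaScript/Reference/...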
edc65
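To make the speculation in the comment above concrete, here is a toy sketch in Python of "an array plus a duplicate check at each insertion". The class name InsertionOrderedSet is invented for this illustration, and this is not how any JavaScript engine is known to implement Set; it only shows that such a structure gives both uniqueness and insertion order.

```python
class InsertionOrderedSet:
    """Toy set: a list for ordering plus a membership check on every insert."""

    def __init__(self, items=()):
        self._items = []
        for item in items:
            self.add(item)

    def add(self, item):
        # Reject duplicates at insertion time. This is O(n) per insert,
        # which is fine for a sketch but not for real workloads.
        if item not in self._items:
            self._items.append(item)

    def __iter__(self):
        # Iteration follows insertion order.
        return iter(self._items)


words = "cat dog cat dog bird dog Snake snake Snake".split(" ")
print(" ".join(InsertionOrderedSet(words)))
# prints: cat dog bird Snake snake
```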

6

TeaScript, 12 bytes

TeaScript is JavaScript for golfing.

xs` `u()j` `

This is pretty short. It splits on each space, filters out duplicates, then rejoins.

Try it online


Is it tee-a script or tee script?

@MathiasFoster it would be "tee-script"
Downgoat

Does TeaScript have letters reserved for variable names? Most of them appear to be shorthands for built-in properties.
intrepidcoder

@intrepidcoder yes all of these: cdfghijklmnopstuvw are reserved for variables, they are all pre-initialized to 0. b is also reserved for a variable name, it is pre-initialized to an empty string
Downgoat

6

PowerShell, 15 Bytes

$args|select -u

Whoa, an actual entry where PowerShell is somewhat competitive? That's unpossible!

Takes the string as input arguments, pipes to Select-Object with the -Unique flag. Spits out an array of strings, preserving order and capitalization as requested.

Usage:

PS C:\Tools\Scripts\golfing> .\remove-repeated-words-from-string.ps1 cat dog cat dog bird dog Snake snake Snake
cat
dog
bird
Snake
snake

If this is too "cheaty" in assuming the input can be as command-line arguments, then go for the following, at 24 21 Bytes (saved some bytes thanks to blabb). Interestingly, using the unary operator in this direction happens to also work if the input string is demarcated with quotes or as individual arguments, since the default -split is by spaces. Bonus.

-split$args|select -u

Relying on the environment's behavior of spoon-feeding the code with readily split up input…?
manatwork

@manatwork I've added a clarification if the first usage is considered too "cheaty" -- since it's not clear exactly how the input is specified, we'll leave it up to the OP.
AdmBorkBork

And now it is clear how efficient PowerShell's own features are. That 24 really deserves an upvote.
manatwork

@timmyD you can chop 3 bytes off the uncheaty ?? version by using the unary split, with no need for "" '' in the command-line args either:
:\>ls -l split.ps1 & type split.ps1 & echo.&powershell -nologo -f split.ps1 cat dog cat dog bird dog Snake snake Snake
-rw-rw-rw- 1 Admin 0 21 2015-11-02 19:06 split.ps1
-split$args|select -u
cat
dog
bird
Snake
snake
blabb

4

Julia, 29 bytes

s->join(unique(split(s))," ")

This creates an unnamed function that splits the string into a vector on spaces, keeps only the unique elements (preserving order), and joins the array back into a string with spaces.


4

R, 22 bytes

cat(unique(scan(,"")))

This reads a string from STDIN and splits it into a vector on spaces using scan(,""), selects only unique elements, then concatenates them into a string and prints it to STDOUT using cat.


4

Retina, 22 bytes

 (\w+)\b(?<=\b\1\b.+)

Save the file with a trailing linefeed and run it with the -s flag.

This is fairly straightforward: it matches a single word, and the lookbehind checks whether that same word has appeared earlier in the string. The trailing linefeed causes Retina to work in Replace mode with an empty replacement string, removing all matches.
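The same idea can be sketched in Python. Python's re module does not accept variable-width lookbehinds such as the one above, so this sketch (an assumption-laden emulation, not a port of the Retina program) replaces the lookbehind with a callback that searches the text before each match. The function name drop_repeats is made up for illustration.

```python
import re


def drop_repeats(sentence):
    def replace(match):
        word = match.group(1)
        earlier = sentence[:match.start()]
        # Emulates the lookbehind: has this exact word appeared before?
        if re.search(r"\b" + re.escape(word) + r"\b", earlier):
            return ""  # drop the repeat together with its leading space
        return match.group(0)

    # Every word except the first is preceded by a single space.
    return re.sub(r" (\w+)\b", replace, sentence)


print(drop_repeats("cat dog cat dog bird dog Snake snake Snake"))
# prints: cat dog bird Snake snake
```

The \b anchors matter: without them, cat would be treated as a repeat of category.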


4

Mathematica, 43 39 bytes

StringRiffle@*Keys@*Counts@*StringSplit

Kudos for using StringRiffle[].
Michael Stern

could use Keys@Counts instead of DeleteDuplicates
branislav

@branislav Does Keys@Counts preserve order?
LegionMammal978

@LegionMammal978 Counts[list] gives an association whose keys are in the same order as they first occur as elements of list.
branislav
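For comparison, Python's collections.Counter behaves much like the Counts association described above: it is a dict subclass, so from Python 3.7 its keys iterate in order of first occurrence. A minimal sketch of the Keys@Counts trick under that assumption (the function name unique_via_counts is made up):

```python
from collections import Counter


def unique_via_counts(sentence):
    # Count every word; the counts themselves are discarded afterwards.
    counts = Counter(sentence.split(" "))
    # Iterating a Counter yields its keys in first-insertion order (Python 3.7+).
    return " ".join(counts)


print(unique_via_counts("cat dog cat dog bird dog Snake snake Snake"))
# prints: cat dog bird Snake snake
```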


3

C++11, 291 bytes

#include<iostream>
#include<string>
#include<list>
#include<sstream>
#include<algorithm>
using namespace std;main(){string s;getline(cin,s);list<string>m;stringstream b(s);while(getline(b,s,' '))if(find(m.begin(),m.end(),s)==m.end())m.push_back(s);for(auto a:m)cout<<a<<' ';cout<<endl;}

I don't see a whole lot of C++ answers compared to golfing languages, so why not. Note that this uses C++11 features, so if your compiler is sufficiently old, you may need to pass a special compilation switch to make it use the C++11 standard. For g++, it's -std=c++11 (only needed for versions before 6.1, where the default became gnu++14). Try it online


If you compare the number of bytes with other languages, you will see why no one is using C++.
CroCo

3
@CroCo If you realize the point of this site is to find the shortest solution in each language, you will see why I posted this answer.
Mego

Sorry, I wasn't aware of that.
CroCo

1
Why not use a set? It allows no duplicates by design. Just push into it.
edmz

1
@black A set is not guaranteed to have the items in the same order they were added.
Mego

3

K5, 9 bytes

" "/?" "\

FYI, this is a function.

Explanation

     " "\    Split the input on spaces
    ?        Find all the unique elements
" "/         Join them back together

2

Matlab: 18 Bytes

unique(d,'stable')

where d is d = {'cat','dog','cat','dog','bird','dog','Snake','snake','Snake'}.

The result is 'cat' 'dog' 'bird' 'Snake' 'snake'


4
Welcome to Programming Puzzles and Code Golf! Submissions here need to either be full programs that read from STDIN and write to STDOUT, or functions which accept input and return output. As it stands, this is merely a snippet; it assumes the variable d is already assigned. You can rectify this by using a function handle: @(d)unique(d,'stable'), at the cost of 4 bytes.
Alex A.

2

Python 3, 55

l=[]
for x in input().split():l+=[x][x in l:]
print(*l)

Yeesh, this is long. Unfortunately, Python's set doesn't keep the order of the elements, so we have to do the work ourselves. We iterate through the input words, appending each word to a list l only if it isn't already in l ([x][x in l:] is [x] when x is new and [] otherwise). Then we print the contents of l, space-separated.

A string version of l would not work if some words are substrings of other words.
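A small ungolfed sketch of why the string version fails. Both function names are invented for illustration: the list version compares whole words, while the string version does a substring test, so cat is wrongly treated as already seen once category has been stored.

```python
def dedupe_with_list(sentence):
    seen = []
    for word in sentence.split(" "):
        if word not in seen:  # compares whole words
            seen.append(word)
    return " ".join(seen)


def dedupe_with_string(sentence):
    seen = ""
    for word in sentence.split(" "):
        if word not in seen:  # substring test: this is the bug
            seen += word + " "
    return seen.rstrip(" ")


print(dedupe_with_list("category cat"))    # prints: category cat
print(dedupe_with_string("category cat"))  # prints: category
```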


2

C#, 38 bytes

String.Join(" ",s.Split().Distinct());

2
I'm not sure you can assume input is already populated in s, I think you should get it as an argument.
Jacob

3
Welcome to PPCG! Please have a look at our default answer formats. Answers should either be full programs or functions. Unnamed functions (like lambda literals) are fine, but snippets which expect the code to already exist in some variable/on the stack etc. or require a REPL environment are generally disallowed unless the OP explicitly permits them.
Martin Ender

2

Perl 6, 14 bytes

As a whole program the only way you would write it is 21 bytes long

say $*IN.words.unique # 21 bytes

As a lambda expression the shortest is 14 bytes

*.words.unique # 14 bytes
say ( *.words.unique ).('cat dog cat dog bird dog Snake snake Snake')

my &foo = *.words.unique;
say foo $*IN;

While the output is a List, if you put it in a stringifying context it will put a space between the elements. If it was a requirement to return a string you could just add a ~ to the front ~*.words.unique.


If snippets were allowed, you could shorten it to 13 bytes by removing the *.

$_ = 'cat dog cat dog bird dog Snake snake Snake';

say .words.unique

1

Python 3, 87 80 bytes

turns out the full program version is shorter

s=input().split(' ')
print(' '.join(e for i,e in enumerate(s)if e not in s[:i]))

Did it without regex, I am happy

Try it online


1

Lua, 94 bytes

function c(a)l={}return a:gsub("%S+",function(b)if l[b]then return""else l[b]=true end end)end

An anonymous user suggested to replace ... return""else l[b]=true end end... with ...return""end l[b]=""end....
Jonathan Frech

1

awk, 25

BEGIN{RS=ORS=" "}!c[$0]++

Output:

$ printf "cat dog cat dog bird dog Snake snake Snake" | awk 'BEGIN{RS=ORS=" "}!c[$0]++'
cat dog bird Snake snake $ 
$ 
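For readers who don't know awk: RS=ORS=" " makes both the input and output record separators a space, and the pattern !c[$0]++ is true only the first time a record is seen, because an unset c[$0] counts as 0 and is incremented after the test. A rough Python sketch of that idiom (the name first_occurrences is made up, and the trailing separator awk prints is not reproduced):

```python
from collections import defaultdict


def first_occurrences(records):
    counts = defaultdict(int)  # unseen records start at 0, like awk's c[$0]
    for record in records:
        seen_before = counts[record] > 0
        counts[record] += 1    # the post-increment in c[$0]++
        if not seen_before:    # the leading ! in the awk pattern
            yield record


words = "cat dog cat dog bird dog Snake snake Snake".split(" ")
print(" ".join(first_occurrences(words)))
# prints: cat dog bird Snake snake
```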

1

JavaScript, 106 102 100 bytes

function(s){o={};s.split(' ').map(function(w){o[w]=1});a=[];for(w in o)a.push(w);return a.join(' ')}

// way too long for JS :(


Try using JS (aka ECMAScript) 6 arrow functions, which should save 6 bytes. Also, I can already see porting this to CoffeeScript will save at least 30 bytes.
kirbyfan64sos

This answer is in native JavaScript (ECMA5), there is edc65's one for es6.
Jacob


1

PHP 64 59 bytes

function r($i){echo join(" ",array_unique(split(" ",$i)));}

explode() → split(), implode() → join()?
manatwork

Thanks! Good suggestions. It seems split is being deprecated, but I guess that does not matter for code golfing.
Jeroen

1

AppleScript, 162 bytes

Interestingly, this is almost identical to the non-repeating characters thing.

set x to(display dialog""default answer"")'s text returned's words
set o to""
repeat with i in x
considering case
if not i is in o then set o to o&i&" "
end
end
o

I didn't actually know the considering keyword before this. the more you know...


1

Burlesque, 6 bytes

blsq ) "cat dog cat dog bird dog Snake snake Snake"wdNBwD
cat dog bird Snake snake

Rather simple: split words, nub (nub = remove duplicates), convert back to words.


1

Gema, 21 characters

*\S=${$0;$0}@set{$0;}

(Very similar to the unique character solution: there are no arrays in Gema, so allowing built-in unique functions does not help us much.)

Sample run:

bash-4.3$ gema '*\S=${$0;$0}@set{$0;}' <<< 'cat dog cat dog bird dog Snake snake Snake'
cat dog bird Snake snake 

1

Scala, 44 47 bytes

(s:String)=>s.split(" ").distinct.mkString(" ")

EDIT: using toSet might not preserve order, so I'm now using distinct // that just cost me 3 bytes :(


0

PHP, 37 Bytes

Assuming $s is the input string.

print_r(array_flip(explode(' ',$s)));
Licensed under cc by-sa 3.0 with attribution required.