Remove unique elements from a string


12

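I came up with this question because finding the unique characters in a string seems to be a very common use case. But what if we want to get rid of them?

The input contains only lowercase letters, a to z. The input length is 1 to 1000 characters.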

Example:
Input: helloworld
Output: llool

Objective: shortest code
Preferred languages: any of the TIOBE top 20 languages

Answers:


7

Perl, 28 24 characters (includes 1 for the 'p' option)

s/./$&x(s!$&!$&!g>1)/eg

Usage:

> perl -pe 's/./$&x(s!$&!$&!g>1)/eg'
helloworld
llool

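At first I thought I could do this with a negative lookahead and a negative lookbehind, but it turns out that negative lookbehinds must be fixed-length. So I used nested regexes instead. Thanks to mob for the $& tip.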
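The nested-substitution idea can be sketched in Python for readers who do not use Perl. This is an illustration only, not the answer's code: the function name `drop_unique` is hypothetical, and `str.count` stands in for the inner substitution, whose return value in Perl is the number of matches.

```python
import re

def drop_unique(text: str) -> str:
    # Visit every character; keep it only if it occurs more than once
    # in the whole input. Multiplying a string by a bool repeats it
    # 1 or 0 times, which mirrors the `$& x (count > 1)` trick.
    return re.sub(
        r".",
        lambda m: m.group(0) * (text.count(m.group(0)) > 1),
        text,
    )

print(drop_unique("helloworld"))  # llool
```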


+1. I naively thought I could beat this with my Ruby answer.
Steven Rumbalski

I tried this with Chinese text and it didn't work. =(
ixtmixilix

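@ixtmixilix - then run perl with the -CDS option
mob

@ixtmixilix I don't know enough about Unicode and Perl's support for it to suggest a way of making this work with Chinese text. Luckily, the question only asks for lowercase a to z.
Gareth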

1
Replace all of the $1 with $&, and two pairs of parentheses can go away.
mob

12

GolfScript, 15 13 characters

:;{.;?);>?)},

GolfScript isn't one of the top 20, but a code golf without GolfScript... (run it yourself)

Previous version: (run the script)

1/:;{;\-,;,(<},

1
:;? You're deliberately trying to confuse newbies, aren't you? ;)
Peter Taylor

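@PeterTaylor You're right. I should have chosen a ) instead, which would have made it a smiley :). Unfortunately I didn't even find a way to get rid of the digit 1. (Note for GolfScript newbies: you can replace every ; in the code with x, or with any other letter or digit, or any character not otherwise used in the script. In this special case ; is just a variable name and does not have its usual meaning of "pop and discard". In GolfScript almost all tokens are variables anyway, and reusing predefined symbols is a great way to make scripts even more unreadable for outsiders ;-).)
Howard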

Another 13-character solution: :a{]a.@--,(},
Ilmari Karonen 14

7

J, 12 characters

Having entered a valid Perl answer, here's an invalid (language not in the TIOBE top 20) answer.

a=:#~1<+/@e.

Usage:

   a 'helloworld'
llool

Declares a verb a which outputs only non-unique items.



4

Ruby 46 40 36

gets.chars{|c|$><<c if$_.count(c)>1}

You may save 4 chars if you inline s and use $_ for the second appearance (the space before is then dispensable).
Howard

@Howard: Nice catch. Thanks. I have about zero experience with Ruby.
Steven Rumbalski

2

Perl 44

$l=$_;print join"",grep{$l=~/$_.*$_/}split""

Execution:

perl -lane '$l=$_;print join"",grep{$l=~/$_.*$_/}split""' <<< helloworld
llool


2

Python 2.7 (52 51), Python 3 (52)

I didn't expect it to be so short.

2.7: a=raw_input();print filter(lambda x:a.count(x)>1,a)

3.0: a=input();print(''.join(i for i in a if a.count(i)>1))

raw_input(): store input as a string (input() = eval(raw_input()))
(Python 3.0: input() has been turned into raw_input())

filter(lambda x:a.count(x)>1,a): Filter through all characters within a if they are found in a more than once (a.count(x)>1).
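For comparison, here is a hedged Python 3 sketch of the same filter idea. In Python 3, `filter` returns an iterator rather than a string, so the result is joined explicitly. The function name `drop_unique` is an assumption made for illustration.

```python
def drop_unique(a: str) -> str:
    # Keep each character that occurs more than once in `a`.
    # filter() is lazy in Python 3, so join the surviving characters.
    return "".join(filter(lambda x: a.count(x) > 1, a))

print(drop_unique("helloworld"))  # llool
```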


If you use python 3 instead, you can use input() rather than raw_input(). Although you have to add one character for a closing bracket, since print is a function in python 3.
Strigoides

@Strigoides: I have added a Python 3 code snippet to my answer.
beary605

Python 3's filter returns an iterator... You'll need to do ''.join(...)
JBernardo

@JBernardo: :( Dang. Thanks for notifying me. As you can see, I don't use 3.0.
beary605

2

sed and coreutils (128)

Granted this is not part of the TIOBE list, but it's fun (-:

<<<$s sed 's/./&\n/g'|head -c -1|sort|uniq -c|sed -n 's/^ *1 \(.*\)/\1/p'|tr -d '\n'|sed 's:^:s/[:; s:$:]//g\n:'|sed -f - <(<<<$s)

De-golfed version:

s=helloworld
<<< $s sed 's/./&\n/g'        \
| head -c -1                  \
| sort                        \
| uniq -c                     \
| sed -n 's/^ *1 \(.*\)/\1/p' \
| tr -d '\n'                  \
| sed 's:^:s/[:; s:$:]//g\n:' \
| sed -f - <(<<< $s)

Explanation

The first sed converts input into one character per line. The second sed finds characters that only occur once. Third sed writes a sed script that deletes unique characters. The last sed executes the generated script.
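The same four stages can be sketched in Python to make the data flow easier to follow. This is a rough equivalent under stated assumptions, not a translation of the shell pipeline: `drop_unique` is a hypothetical name, `Counter` replaces `sort | uniq -c`, and a translation table replaces the generated sed script.

```python
from collections import Counter

def drop_unique(s: str) -> str:
    # Stages 1-2: count every character (the job of sort | uniq -c).
    counts = Counter(s)
    # Stage 2: pick the characters that occur exactly once.
    singles = "".join(c for c, n in counts.items() if n == 1)
    # Stage 3: build the "delete these characters" script.
    table = str.maketrans("", "", singles)
    # Stage 4: run the script against the original input.
    return s.translate(table)

print(drop_unique("helloworld"))  # llool
```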


2

Brachylog (v2), 8 bytes

⊇.oḅlⁿ1∧

Try it online!

Function submission. Technically noncompeting because the question has a limitation on what languages are allowed to compete (however, several other answers have already ignored the restriction).

Explanation

⊇.oḅlⁿ1∧
⊇         Find {the longest possible} subset of the input
  o       {for which after} sorting it,
   ḅ        and dividing the sorted input into blocks of identical elements,
    lⁿ1     the length of a resulting block is never 1
 .     ∧  Output the subset in question.
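The declarative reading above can be imitated with a brute-force search in Python. This is only a sketch of the idea, assuming that candidates are tried from longest to shortest; it is exponential in the input length and suitable for short strings only. The names `no_singleton_blocks` and `drop_unique` are hypothetical.

```python
from itertools import combinations, groupby

def no_singleton_blocks(chars) -> bool:
    # Sort, split into runs of identical elements, and require that
    # no run has length 1.
    return all(len(list(run)) != 1 for _, run in groupby(sorted(chars)))

def drop_unique(s: str) -> str:
    # Try order-preserving sublists from longest to shortest; the
    # first one that satisfies the constraint is the answer.
    for size in range(len(s), -1, -1):
        for picked in combinations(s, size):
            if no_singleton_blocks(picked):
                return "".join(picked)
    return ""

print(drop_unique("helloworld"))  # llool
```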

Why do you CW all your solutions?
Shaggy

1
@Shaggy: a) because I'm fine with other people editing them, b) to avoid gaining reputation if they're upvoted. In general I think the gamification of Stack Exchange is a huge detriment to the site – there's sometimes a negative correlation between the actions that you can take to improve rep and the actions you can take to actually improve the site. Additionally, being at a high reputation count sucks; the site keeps nagging you to do admin tasks, and everything you do is a blunt instrument (e.g. when you're at low rep you can suggest an edit, at high rep it just gets forced through).
ais523

2

Japt, 6 5 bytes

ÆèX É

-1 byte thanks to @Oliver

Try it online!


2
Welcome to Japt! There is actually a shortcut for o@: Æ
Oliver

@Oliver Another shortcut that I missed, cool, thanks :)
Quintec

@Oliver, the better question is how the feck did I miss it?! :\
Shaggy

1

Python (56)

Here's another (few chars longer) alternative in Python:

a=raw_input();print''.join(c for c in a if a.count(c)>1)

If you accept output as a list (e.g. ['l', 'l', 'o', 'o', 'l']), then we could boil it down to 49 characters:

a=raw_input();print[c for c in a if a.count(c)>1]

Hey, >1 is a good idea! May I incorporate that into my solution?
beary605

@beary605 Sure no problem at all - easy way to trim a character off :D
arshajii

1

Mathematica 72 63

Ok, Mathematica isn't among the top 20 languages, but I decided to join the party anyway.

x is the input string.

"" <> Select[y = Characters@x, ! MemberQ[Cases[Tally@y, {a_, 1} :> a], #] &]


1

C# – 77 characters

Func<string,string>F=s=>new string(s.Where(c=>s.Count(d=>c==d)>1).ToArray());

If you accept the output as an array, it boils down to 65 characters:

Func<string,char[]>F=s=>s.Where(c=>s.Count(d=>c==d)>1).ToArray();

1

Ocaml, 139 133

Uses ExtLib's ExtString.String

open ExtString.String
let f s=let g c=fold_left(fun a d->a+Obj.magic(d=c))0 s in replace_chars(fun c->if g c=1 then""else of_char c)s

Non-golfed version

open ExtString.String
let f s =
  let g c =
    fold_left
      (fun a c' -> a + Obj.magic (c' = c))
      0
      s
  in replace_chars
  (fun c ->
    if g c = 1
    then ""
    else of_char c)
  s

The function g returns the number of occurrences of c in the string s. The function f replaces each char with either the empty string or the string containing that char, depending on the number of occurrences. Edit: I shortened the code by 6 characters by abusing the internal representation of bools :-)

Oh, and ocaml is 0 on the TIOBE index ;-)


f*** the TIOBE index.
ixtmixilix

I agree. Also, thanks for the upvote. Now I can comment :-)
ReyCharles

1

PHP - 70

while($x<strlen($s)){$c=$s[$x];echo substr_count($s,$c)>1?$c:'';$x++;}

with the assumption $s = 'helloworld'.


1

Java 8, 90 bytes

s->{for(char c=96;++c<123;s=s.matches(".*"+c+".*"+c+".*")?s:s.replace(c+"",""));return s;}

Explanation:

Try it online.

s->{                         // Method with String as both parameter and return-type
  for(char c=96;++c<123;     //  Loop over the lowercase alphabet
    s=s.matches(".*"+c+".*"+c+".*")?
                             //   If the String contains the character more than once
       s                     //    Keep the String as is
      :                      //   Else (only contains it once):
       s.replace(c+"",""));  //    Remove this character from the String
  return s;}                 //  Return the modified String
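The loop described in the comments can be sketched in Python as follows. It is an illustration rather than a port: `drop_unique` is a hypothetical name, and `str.count` replaces the `.*c.*c.*` regex test for "at least two occurrences".

```python
import string

def drop_unique(s: str) -> str:
    # Loop over the lowercase alphabet (char codes 97..122).
    for c in string.ascii_lowercase:
        # Fewer than two occurrences: remove the character entirely.
        if s.count(c) < 2:
            s = s.replace(c, "")
    return s

print(drop_unique("helloworld"))  # llool
```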

1

PowerShell, 59 bytes

"$args"-replace"[^$($args|% t*y|group|?{$_.Count-1}|% n*)]"

Try it online!

Less golfed:

$repeatedChars=$args|% toCharArray|group|?{$_.Count-1}|% name
"$args"-replace"[^$repeatedChars]"

Note: $repeatedChars is an array. By default, PowerShell joins array elements with a space when it converts an array to a string, so the regexp contains spaces (in this example, [^l o]). The spaces do not affect the result because the input string contains only letters.
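A hedged Python sketch of the same negated-character-class technique follows. The function name `drop_unique`, the explicit handling of the empty class, and the use of `re.escape` are assumptions added for illustration; they are not part of the PowerShell answer.

```python
import re
from collections import Counter

def drop_unique(s: str) -> str:
    # Collect the characters that occur more than once.
    repeated = [c for c, n in Counter(s).items() if n > 1]
    if not repeated:
        # "[^]" is not a valid pattern, so handle this case separately.
        return ""
    # Join with spaces to mirror the note above, e.g. "[^l o]".
    pattern = "[^" + " ".join(re.escape(c) for c in repeated) + "]"
    # Delete everything that is not a repeated character.
    return re.sub(pattern, "", s)

print(drop_unique("helloworld"))  # llool
```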


1

APL (Dyalog Extended), 8 bytesSBCS

Anonymous tacit prefix function.

∊⊢⊆⍨1<⍧⍨

Try it online!

⍧⍨ count-in selfie (count occurrences of argument elements in the argument itself)

1< Boolean mask where one is less than that

⊢⊆⍨ partition the argument by that mask (beginning a new partition on 1s and removing on 0s)

∊ ϵnlist (flatten)
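The three steps (count, mask, partition and flatten) can be approximated in Python. This is a loose sketch rather than an exact model of the APL primitives: `drop_unique` is a hypothetical name, and `itertools.groupby` over the mask is used as a stand-in for partitioning.

```python
from itertools import groupby

def drop_unique(s: str) -> str:
    # Count occurrences of each element in the argument itself.
    counts = [s.count(c) for c in s]
    # Boolean mask: True where 1 < count.
    mask = [n > 1 for n in counts]
    # Group consecutive positions by mask value and keep only the
    # runs where the mask is True.
    parts = [
        "".join(c for c, _ in run)
        for keep, run in groupby(zip(s, mask), key=lambda pair: pair[1])
        if keep
    ]
    # Flatten the kept partitions into one string.
    return "".join(parts)

print(drop_unique("helloworld"))  # llool
```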



1

R, 70 bytes

a=utf8ToInt(scan(,''));intToUtf8(a[!a%in%names(table(a)[table(a)<2])])

Try it online!

A poor attempt, even from a TIOBE top 20 language. I know something can be done about the second half, but at the moment, any golfs escape me.




0

PHP - 137

Code

implode('',array_intersect(str_split($text),array_flip(array_filter(array_count_values(str_split($text)),function($x){return $x>=2;}))));

Normal Code

$text   = 'helloworld';
$filter = array_filter(array_count_values(str_split($text)), function($x){return $x>=2;});
$output = implode('',array_intersect(str_split($text),array_flip($filter)));

echo $output;

0

PHP - 83 78

<?for($a=$argv[1];$i<strlen($a);$r[$a[$i++]]++)foreach($ras$k=>$c)if($c>1)echo$k

Improved version:

<?for($s=$argv[1];$x<strlen($s);$c=$s[$x++]) echo substr_count($s,$c)>1?$c:'';

Of course this needs notices to be turned off

Edit: Improvement inspired by @hengky mulyono

I am so bad at codegolf :)


0

C++, 139 bytes

string s;cin>>s;string w{s}; auto l=remove_if(begin(s),end(s),[&w](auto&s){return count(begin(w),end(w),s)==1;});s.erase(l,end(s));cout<<s;

ungolfed:

#include <algorithm>
#include <string>
#include <iostream>

int main() {
  using namespace std;
  string s;
  cin >> s;
  const string w{s};
  auto l = remove_if(begin(s), end(s), [&w](auto& s) {
                                         return count(begin(w), end(w), s) == 1;
                                       });
  s.erase(l, end(s));
  cout << s;
  return 0;
}
Licensed under cc by-sa 3.0 with attribution required.