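Regex: Match an egalitarian series

Introduction

I don't see many regex challenges on here, so I would like to offer up this deceptively simple one, which can be done in a number of ways using a number of regex flavours. I hope it provides regex enthusiasts with a bit of fun golfing time.

Challenge

The challenge is to match what I've very loosely dubbed an "egalitarian" series: a series of equal numbers of different characters. This is best described with examples.

Match: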

aaabbbccc
xyz 
iillppddff
ggggggoooooollllllffffff
abc
banana

Do not match:

aabc
xxxyyzzz
iilllpppddff
ggggggoooooollllllfff
aaaaaabbbccc
aaabbbc
abbaa
aabbbc

To generalize, we want to match a subject of the form (c₁)ⁿ(c₂)ⁿ(c₃)ⁿ...(c_k)ⁿ for any list of characters c₁ to c_k, where c_i != c_(i+1) for all i, k > 1, and n > 0.
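
As a plain-code cross-check of this definition (not a valid answer, since answers must be a single regex), the condition can be sketched in Python. The function name `is_egalitarian` is made up for this illustration.

```python
from itertools import groupby


def is_egalitarian(subject: str) -> bool:
    """Return True if subject has the form (c_1)^n (c_2)^n ... (c_k)^n, k > 1, n > 0."""
    # groupby splits the string into maximal runs of one repeated character,
    # so adjacent runs automatically satisfy c_i != c_(i+1).
    run_lengths = [len(list(run)) for _, run in groupby(subject)]
    # k > 1 means at least two runs; equal n means one distinct run length.
    return len(run_lengths) > 1 and len(set(run_lengths)) == 1


print(is_egalitarian("banana"))  # True
print(is_egalitarian("abbaa"))   # False
```

Note that "banana" passes because every run has length 1, even though individual letters recur later in the string.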

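Clarifications:

Input will not be empty.
A character may repeat itself later in the string (e.g. "banana").
k > 1, so there will always be at least 2 different characters in the string.
You can assume that only ASCII characters will be passed as input and that no character will be a line terminator.

Rules

(Thank you to Martin Ender for this excellently stated block of rules)

Your answer should consist of a single regex, without any additional code (except, optionally, a list of regex modifiers required to make your solution work). You must not use features of your language's regex flavour that allow you to invoke code in the hosting language (e.g. Perl's e modifier).

You can use any regex flavour which existed before this challenge, but please specify the flavour.

Do not assume that the regex is implicitly anchored: for example, if you're using Python, assume that your regex is used with re.search and not with re.match. Your regex must match the entire string for valid egalitarian strings and yield no match for invalid strings. You may use as many capturing groups as you wish.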
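
To make the anchoring rule concrete, here is a small Python sketch of how a submission might be checked with re.search. The helper name `meets_rule` and the toy patterns are assumptions for illustration only; the toy patterns have nothing to do with egalitarian strings.

```python
import re


def meets_rule(pattern: str, subject: str, should_match: bool) -> bool:
    """Check one subject against the rule: full match if valid, no match at all if invalid."""
    found = re.search(pattern, subject)
    if should_match:
        # A valid subject must be matched in its entirety.
        return found is not None and found.group(0) == subject
    # An invalid subject must not produce any match, not even a partial one.
    return found is None


# The unanchored toy pattern finds "abab" inside "xabab", so it breaks the rule.
print(meets_rule(r"(ab)+", "xabab", should_match=False))    # False
# Anchoring it explicitly fixes that.
print(meets_rule(r"^(ab)+$", "xabab", should_match=False))  # True
```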

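You may assume that the input will always be a string of two or more ASCII characters not containing any line terminators.

This is regex golf, so the shortest regex in bytes wins. If your language requires delimiters (usually /.../) to denote regular expressions, don't count the delimiters themselves. If your solution requires modifiers, add one byte per modifier.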
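
The scoring rule can be written down as a short Python sketch. The function name `golf_score` is hypothetical, and counting bytes as UTF-8 is an assumption (the rules only say "bytes"); the regex is passed without its delimiters.

```python
def golf_score(regex: str, modifiers: str = "") -> int:
    """Bytes of the regex body (delimiters excluded) plus one byte per modifier."""
    # Assumption: bytes are counted in UTF-8; for pure-ASCII regexes this
    # equals the number of characters.
    return len(regex.encode("utf-8")) + len(modifiers)


# A toy regex that needs one modifier: 4 bytes + 1.
print(golf_score(r"^a+$", "i"))  # 5
# The first .NET regex posted in the answers of this thread.
print(golf_score(r"^(.)\1*((?<=(\5())*(.))(.)(?<-4>\6)*(?!\4|\6))+$"))  # 48
```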

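Criteria

This is good old-fashioned golf, so forget efficiency and just try to get your regex as small as possible.

Please mention which regex flavour you have used and, if possible, include a link showing an online demo of your expression in action.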

code-golf string regular-expression

— jaytea

Is this specifically regex golf? You should probably clarify that, along with the rules for it. Most challenges on this site are golfs of assorted programming languages.

— LyricLy

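@LyricLy Thanks for the advice! Yes, I would like it to be purely regex, i.e. a single regular expression in a regex flavour of the submitter's choice. Are there any other rules I should be mindful to include?

— jaytea

I don't understand your definition of "egalitarian", such that banana is egalitarian.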

— msh210

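@msh210 When I came up with the term "egalitarian" to describe the series, I didn't consider that I would be allowing characters to repeat later in the series (such as in "banana" or "aaabbbcccaaa"). I just wanted a term to represent the idea that every chunk of repeated characters is the same size. Since "banana" has no repeated characters, this definition holds true for it.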

— jaytea

Answers:

.NET flavour, 48 bytes

^(.)\1*((?<=(\5())*(.))(.)(?<-4>\6)*(?!\4|\6))+$

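Try it online! (using Retina)

It turns out that not negating the logic is simpler after all. I'm making this a separate answer, because the two approaches are completely different.

Explanation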

^            # Anchor the match to the beginning of the string.
(.)\1*       # Match the first run of identical characters. In principle, 
             # it's possible that this matches only half, a quarter, an 
             # eighth etc. of the first run, but that won't affect the
             # result of the match (in other words, if the match fails with 
             # matching this as the entire first run, then backtracking into
             # only matching half of it won't cause the rest of the regex to
             # match either).
(            # Match this part one or more times. Each instance matches one
             # run of identical letters.
  (?<=       #   We start with a lookbehind to record the length
             #   of the preceding run. Remember that the lookbehind
             #   should be read from the bottom up (and so should
             #   my comments).
    (\5())*  #     And then we match all of its adjacent copies, pushing an
             #     empty capture onto stack 4 each time. That means at the
             #     end of the lookbehind, we will have n-1 captures on stack 4,
             #     where n is the length of the preceding run. Due to the 
             #     atomic nature of lookbehinds, we don't have to worry 
             #     about backtracking matching less than n-1 copies here.
    (.)      #     We capture the character that makes up the preceding
             #     run in group 5.
  )
  (.)        #   Capture the character that makes up the next run in group 6.
  (?<-4>\6)* #   Match copies of that character while depleting stack 4.
             #   If the runs are the same length that means we need to be
             #   able to get to the end of the run at the same time we
             #   empty stack 4 completely.
  (?!\4|\6)  #   This lookahead ensures that. If stack 4 is not empty yet,
             #   \4 will match, because the captures are all empty, so the
             #   backreference can't fail. If the stack is empty though,
             #   then the backreference will always fail. Similarly, if we
             #   are not at the end of the run yet, then \6 will match 
             #   another copy of the run. So we ensure that neither \4 nor
             #   \6 are possible at this position to assert that this run
             #   has the same length as the previous one.
)+
$            # Finally, we make sure that we can cover the entire string
             # by going through runs of identical lengths like this.
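
The run-by-run bookkeeping described above can be imitated in Python, with an integer counter standing in for capture stack 4. This is only a hand-written, greedy simulation and not how the .NET engine works internally. The function name is made up, and the sketch relies on the challenge's input guarantee (at least two distinct characters), so strings made of a single repeated character are not modelled faithfully.

```python
def matches_run_by_run(subject: str) -> bool:
    """Greedy simulation of: first run, then one or more runs of equal length."""
    if not subject:
        return False
    pos = 1
    # (.)\1*  : consume the whole first run.
    while pos < len(subject) and subject[pos] == subject[0]:
        pos += 1
    later_runs = 0
    while pos < len(subject):
        # Lookbehind: count the copies behind the last character of the
        # preceding run, i.e. n-1 entries on "stack 4".
        previous = subject[pos - 1]
        stack = 0
        back = pos - 2
        while back >= 0 and subject[back] == previous:
            stack += 1
            back -= 1
        # (.)  : first character of the next run ("group 6").
        current = subject[pos]
        pos += 1
        # (?<-4>\6)*  : consume copies while popping the stack.
        while stack > 0 and pos < len(subject) and subject[pos] == current:
            stack -= 1
            pos += 1
        # (?!\4|\6)  : fail if the stack is not empty or the run continues.
        if stack > 0 or (pos < len(subject) and subject[pos] == current):
            return False
        later_runs += 1
    # ( ... )+ requires at least one run after the first; reaching the end is $.
    return later_runs >= 1


print(matches_run_by_run("aaabbbccc"))  # True
print(matches_run_by_run("aabbbc"))     # False
```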

— Martin Ender

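I love seeing the seesawing between the two methods! I also thought the negative approach should really be shorter, until I actually tried it and found it much more awkward (even though it sounds like it should be simpler). I have 48b in PCRE and 49b in Perl with a completely different method, and with your third method in .NET coming in at about the same size, I'd say this is a pretty cool regex challenge :D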

— jaytea

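@jaytea I'd love to see those. If no one comes up with anything for a week or so, I hope you'll post them yourself. :) And yes, agreed, it's nice that the approaches are so close in byte count.

— Martin Ender

I just might! Also, the Perl one has been golfed down to 46b ;)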

— jaytea

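So I figured you might want to see these now! Here's 48b in PCRE: ((^.|\2(?=.*\4\3)|\4(?!\3))(?=\2*+((.)\3?)))+\3$ I was experimenting with \3* in place of (?!\3) to make it 45b, but that fails on "aabbbc" :( The Perl version is easier to understand, and it's down to 45b now: ^((?=(.)\2*(.))(?=(\2(?4)?\3)(?!\3))\2+)+\3+$ The reason I call it Perl, even though it seems to be valid PCRE, is that PCRE thinks (\2(?4)?\3) could recurse indefinitely, whereas Perl is a little smarter/more forgiving!

— jaytea

@jaytea Ah, those are really neat solutions. You should really post them in a separate answer. :)

— Martin Ender

.NET flavour, 54 bytes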

^(?!.*(?<=(\2)*(.))(?!\2)(?>(.)(?<-1>\3)*)(?(1)|\3)).+

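Try it online! (using Retina)

I'm pretty sure that this is suboptimal, but it's the best I'm coming up with for balancing groups right now. I've got one alternative at the same byte count, which is mostly the same: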

^(?!.*(?<=(\3())*(.))(?!\3)(?>(.)(?<-2>\4)*)(\2|\4)).+

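Explanation

The main idea is to invert the problem: match non-egalitarian strings and put the whole thing in a negative lookahead to negate the result. The benefit is that we don't have to keep track of n throughout the entire string in order to check that all runs have the same length (due to the nature of balancing groups, you usually consume n when checking it). Instead, we just look for a single pair of adjacent runs which don't have the same length. That way, I only need to use n once.

Here is a breakdown of the regex.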

^(?!.*         # This negative lookahead means that we will match
               # all strings where the pattern inside the lookahead
               # would fail if it were used as a regex on its own.
               # Due to the .* that inner regex can match from any
               # position inside the string. The particular position
               # we're looking for is between two runs (and this
               # will be ensured later).

  (?<=         #   We start with a lookbehind to record the length
               #   of the preceding run. Remember that the lookbehind
               #   should be read from the bottom up (and so should
               #   my comments).
    (\2)*      #     And then we match all of its adjacent copies, capturing
               #     them separately in group 1. That means at the
               #     end of the lookbehind, we will have n-1 captures
               #     on stack 1, where n is the length of the preceding
               #     run. Due to the atomic nature of lookbehinds, we
               #     don't have to worry about backtracking matching
               #     less than n-1 copies here.
    (.)        #     We capture the character that makes up the preceding
               #     run in group 2.
  )
  (?!\2)       #   Make sure the next character isn't the same as the one
               #   we used for the preceding run. This ensures we're at a
               #   boundary between runs.
  (?>          #   Match the next stuff with an atomic group to avoid
               #   backtracking.
    (.)        #     Capture the character that makes up the next run
               #     in group 3.
    (?<-1>\3)* #     Match as many of these characters as possible while
               #     depleting the captures on stack 1.
  )
               #   Due to the atomic group, there are two possible
               #   situations that cause the previous quantifier to stop
               #   matching.
               #   Either the run has ended, or stack 1 has been depleted.
               #   If both of those are true, the runs are the same length,
               #   and we don't actually want a match here. But if the runs
               #   are of different lengths then either the run ended but
               #   the stack isn't empty yet, or the stack was depleted but
               #   the run hasn't ended yet.
  (?(1)|\3)    #   This conditional matches these last two cases. If there's
               #   still a capture on stack 1, we don't match anything,
               #   because we know this run was shorter than the previous
               #   one. But if stack 1 is empty, we want to match another copy of
               #   the character in this run to ensure that this run is 
               #   longer than the previous one.
)
.+             # Finally we just match the entire string to comply with the
               # challenge spec.
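
The inverted idea can also be sketched in Python: search for one pair of adjacent runs of different lengths and negate the result. This is a plain-code analogue of the logic, not the .NET regex itself, and both function names are made up for the illustration.

```python
from itertools import groupby


def has_unequal_adjacent_runs(subject: str) -> bool:
    """True if some run is shorter or longer than the run right before it."""
    lengths = [len(list(run)) for _, run in groupby(subject)]
    # Pair every run with its successor, like the regex looking at a boundary.
    return any(before != after for before, after in zip(lengths, lengths[1:]))


def matches_inverted(subject: str) -> bool:
    # ^(?! ... ).+  : a non-empty string with no offending boundary anywhere.
    return len(subject) > 0 and not has_unequal_adjacent_runs(subject)


print(matches_inverted("iillppddff"))  # True
print(matches_inverted("xxxyyzzz"))    # False
```

Because only neighbouring runs are compared, equal lengths across the whole string follow by transitivity, which is why a single mismatching pair is enough to reject.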

— Martin Ender

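I tried to make it fail on: banana, aba, bbbaaannnaaannnaaa, bbbaaannnaaannnaaaaaa, The Nineteenth Byte, 11, 110, ^(?!.*(?<=(\2)*(.))(?!\2)(?>(.)(?<-1>\3)*)(?(1)|\3)).+, bababa. I'm the one who failed. :( +1

— Erik the Outgolfer

That moment when you finish your explanation and then figure out that you can save 1 byte by using the exact opposite approach... I guess I'll be making another answer in a bit... :|

— Martin Ender

@MartinEnder ...And then realise that you could golf this one by 2 bytes (haha) :P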

— Mr Xcoder

@Mr.Xcoder It would have to be 7 bytes now, so I hope I'm safe. ;)

— Martin Ender