How do I pass a regular expression to string.replace?

317

I need some help declaring a regex. My input looks like this:

this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. 
and there are many other lines in the txt files
with<[3> such tags </[3>

The required output is:

this is a paragraph with in between and then there are cases ... where the number ranges from 1-100. 
and there are many other lines in the txt files
with such tags

I have tried this:

#!/usr/bin/python
import os, sys, re, glob
for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
    # Open each file; the original snippet iterated over an undefined reader
    with open(infile) as reader:
        for line in reader:
            line2 = line.replace('<[1> ', '')
            line = line2.replace('</[1> ', '')
            line2 = line.replace('<[1>', '')
            line = line2.replace('</[1>', '')

            print(line)

I have also tried this (but it seems I am using the wrong regex syntax):

    line2 = line.replace('<[*> ', '')
    line = line2.replace('</[*> ', '')
    line2 = line.replace('<[*>', '')
    line = line2.replace('</[*>', '')

I don't want to hard-code the replace calls for 1 through 99...

— alvas

4

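The accepted answer already covers your problem and solves it. Do you need anything else?

— HamZa, 2013

What should the result be for: where the<[99> number ranges from 1-100</[100> ?

— utapyngo, 2013

The number inside the <...> tags is removed as well, so the output should be: where the number ranges from 1-100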

— alvas

565

This tested snippet should do it:

import re
line = re.sub(r"</?\[\d+>", "", line)

Edit: here is a commented version explaining how it works:

line = re.sub(r"""
  <    # Match a literal '<'
  /?   # Optionally match a '/'
  \[   # Match a literal '['
  \d+  # Match one or more digits
  >    # Match a literal '>'
  """, "", line, flags=re.VERBOSE)  # re.VERBOSE enables free-spacing mode

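Regexes are fun! But I would strongly recommend spending an hour or two learning the basics. For starters, you need to know which characters are special: the "metacharacters" that must be escaped (that is, preceded by a backslash; the rules differ inside and outside character classes). There is an excellent online tutorial at www.regular-expressions.info. The time you spend there will pay for itself many times over. Happy regexing!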
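
To make the point about escaping concrete, here is a minimal sketch. The sample strings and variable names are made up for illustration. It shows a metacharacter escaped outside a character class, the same kind of characters used literally inside one, and re.escape() building a pattern from a fixed string:

```python
import re

# Outside a character class, '[' and '.' are metacharacters,
# so they need a backslash to be matched literally.
outside = re.compile(r"\[\d+\.")
outside_hits = outside.findall("see [12. and [7.")

# Inside a character class, '.' and '+' lose their special meaning.
inside = re.compile(r"[.+]")
inside_hits = inside.findall("1+1=2.")

# re.escape() adds the backslashes for you when the text is a fixed string.
literal_tag = re.compile(re.escape("<[1>"))
stripped = literal_tag.sub("", "a<[1>b")

print(outside_hits, inside_hits, stripped)
```

re.escape() is only useful when the whole tag is known in advance; for "any number between the brackets" you still need a hand-written pattern such as the one in this answer.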

— ridgerunner

Yep, it works! Thanks, but could you briefly explain the regex?

— alvas

9

There is also the book on regular expressions: "Mastering Regular Expressions" by Jeffrey Friedl.

— 著

Another good reference is w3schools.com/python/python_regex.asp

— Carson

38

str.replace() does fixed (literal) replacements. Use re.sub() instead.
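
A small sketch of the difference, using a shortened version of the question's input (the variable names are only for illustration). str.replace() looks for the exact characters it is given, so the '*' attempt from the question changes nothing, while re.sub() interprets its first argument as a pattern:

```python
import re

line = "with<[1> in between</[1> and the<[99> number</[99>."

# str.replace() searches for the literal text "<[*>", which never occurs.
literal = line.replace("<[*>", "")

# re.sub() treats the first argument as a regular expression.
cleaned = re.sub(r"</?\[\d+>", "", line)

print(literal == line)
print(cleaned)
```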

— Ignacio Vazquez-Abrams

3

Also worth noting that your pattern should look something like "</{0-1}\d{1-2}>", or whatever variant of regex notation Python uses.

3

What does "fixed replacement" mean?

— avi, 2015

@avi He probably meant replacing a fixed word, as opposed to locating partial words through a regex.

— Gunay Anach

Fixed (literal, constant) strings.

— vstepaniuk

23

I would do it like this (the regex is explained in the comments):

import re

# If you need to use the regex more than once it is suggested to compile it.
pattern = re.compile(r"</{0,}\[\d+>")

# <\/{0,}\[\d+>
# 
# Match the character “<” literally «<»
# Match the character “/” literally «\/{0,}»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «{0,}»
# Match the character “[” literally «\[»
# Match a single digit 0..9 «\d+»
#    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Match the character “>” literally «>»

subject = """this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. 
and there are many other lines in the txt files
with<[3> such tags </[3>"""

result = pattern.sub("", subject)

print(result)

If you want to learn more about regex, I recommend reading "Regular Expressions Cookbook" by Jan Goyvaerts and Steven Levithan.

— Lorenzo Persichetti

2

You can simply use * instead of {0,}

— HamZa, 2013

3

From the Python docs: {0,} is the same as *, {1,} is equivalent to +, and {0,1} is the same as ?. It is better to use *, + or ? when you can, simply because they are shorter and easier to read.
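
A quick way to check the equivalence for this particular pattern (the sample tags below are invented for the test). Note that both * and {0,} also accept repeated slashes, which ? does not:

```python
import re

samples = ["<[1>", "</[1>", "<//[1>", "<[>", "</[123>"]

star = re.compile(r"</*\[\d+>")        # zero or more '/'
braces = re.compile(r"</{0,}\[\d+>")   # same meaning, longer spelling
optional = re.compile(r"</?\[\d+>")    # at most one '/'

star_results = [bool(star.fullmatch(s)) for s in samples]
brace_results = [bool(braces.fullmatch(s)) for s in samples]
optional_results = [bool(optional.fullmatch(s)) for s in samples]

print(star_results == brace_results)
print(optional_results)
```

If a tag like <//[1> should not be removed, the ? form is probably the safer choice.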

— winklerrr, 2017

15

The easiest way:

import re

txt='this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>.  and there are many other lines in the txt files with<[3> such tags </[3>'

# Removes anything between '<' and the next '>'
out = re.sub(r"(<[^>]+>)", '', txt)
print(out)

— Ezequiel Marquez

Are the parentheses really necessary? Wouldn't <[^>]+> be the same regex? By the way, I think your regex matches too much (for example, something like <html>).
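
A short sketch of both points, on a made-up string that mixes HTML-like tags with the numbered tags from the question. The capturing group makes no difference to the substitution, and the broad pattern also strips <html> and <b>:

```python
import re

text = "keep <html> and <b>bold</b>, drop<[1> this</[1>"

with_group = re.sub(r"(<[^>]+>)", "", text)
broad = re.sub(r"<[^>]+>", "", text)      # same result without the group
narrow = re.sub(r"</?\[\d+>", "", text)   # only the numbered tags

print(with_group == broad)
print(broad)
print(narrow)
```

Whether the broad match is a problem depends on the input files; if they never contain other angle-bracket text, both patterns give the same output.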

— winklerrr

10

The replace method of string objects does not accept regular expressions, only fixed strings (see the documentation: http://docs.python.org/2/library/stdtypes.html#str.replace).

You have to use the re module:

import re
newline = re.sub(r"</?\[[0-9]+>", "", line)

— Zac

4

You should use \d+ instead of [0-9]+
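
One caveat, sketched below: for str patterns in Python 3 the two are close but not identical, because \d also matches non-ASCII decimal digits unless re.ASCII is set. The fullwidth-digit tag here is an invented example:

```python
import re

ascii_tag = "<[12>"
fullwidth_tag = "<[\uff11\uff12>"   # fullwidth digits, not ASCII 0-9

both_match_ascii = (
    re.fullmatch(r"<\[\d+>", ascii_tag) is not None
    and re.fullmatch(r"<\[[0-9]+>", ascii_tag) is not None
)

unicode_digits = re.fullmatch(r"<\[\d+>", fullwidth_tag)               # matches
ascii_class = re.fullmatch(r"<\[[0-9]+>", fullwidth_tag)               # no match
ascii_flag = re.fullmatch(r"<\[\d+>", fullwidth_tag, flags=re.ASCII)   # no match

print(both_match_ascii, unicode_digits is not None, ascii_class, ascii_flag)
```

For the input in the question, which uses plain ASCII digits, either spelling behaves the same.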

— winklerrr, 2017

3

You don't have to use a regular expression (for your sample string):

>>> s
'this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. \nand there are many other lines in the txt files\nwith<[3> such tags </[3>\n'

>>> for w in s.split(">"):
...   if "<" in w:
...      print(w.split("<")[0])
...
this is a paragraph with
 in between
 and then there are cases ... where the
 number ranges from 1-100
.
and there are many other lines in the txt files
with
 such tags

— kurumi

3

import os, sys, re, glob

# Matches opening and closing tags such as <[1> and </[99>
pattern = re.compile(r"</?\[\d+>")

for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
    with open(infile) as reader:
        for line in reader:
            # Pattern.sub takes (replacement, string)
            retline = pattern.sub("", line)
            sys.stdout.write(retline)

— Abena Sarka