数日前、数値の文字列から末尾のゼロを削除する問題に取り組みました。
その問題の続きで、これは問題をカンマを含む数値に広げるので、これは興味深いものだと思います。
私は、以前に取り組んだその問題で書いた正規表現のパターンを採用し、この問題の答えとしてカンマ付きの数値を処理できるように改善しました。
私は私の熱意と正規表現の好みに夢中になりました。結果がマイケルプレスコットが表明したニーズにぴったり合うかどうかはわかりません。私の正規表現で過剰または不足している点を知り、それを修正してあなたにとってより適切なものにしたいと思います。
さて、この正規表現での長いセッションの後、私は一種の脳の重みを持っているので、私は多くの説明をするのに十分なほど新鮮ではありません。ポイントが曖昧な方、気になる方がいらっしゃいましたら、是非ご質問ください。
正規表現は、科学表記法で表現された数値を検出できるように構築されています 2E10または5,22,454.12E-00.0478でで、そのような数値の2つの部分の不要なゼロも削除されます。指数がゼロに等しい場合は、指数がなくなるように数値が変更されます。
例として'12 ..57 'が一致しないため、一部の特定のケースが一致しないように、パターンに検証を入れました。しかし '、111'では、文字列'111'、先行するコンマは数字ではなく文のコンマと見なされるため一致します。
インドの番号付けでは、コンマの間に2桁しかないと思われるので、コンマの管理を改善する必要があると思います。修正するのは難しくないと思います
以下は、私の正規表現がどのように機能するかを示すコードです。二つの機能は、1つの数字たい場合には、必要に応じ、あります」0.1245'に変換する『0.1245』かではありません。数値文字列の特定のケースでエラーまたは不要な一致または不一致が残る場合でも、私は驚かないでしょう。次に、これらのケースを知り、欠陥を理解して修正したいと思います。
Pythonで書かれたこのコードをお詫び申し上げますが、正規表現はトランスランゲージであり、誰もがreexのパターンを理解できるようになると思います
import re
regx = re.compile('(?<![\d.])(?!\.\.)(?<![\d.][eE][+-])(?<![\d.][eE])(?<!\d[.,])'
'' #---------------------------------
'([+-]?)'
'(?![\d,]*?\.[\d,]*?\.[\d,]*?)'
'(?:0|,(?=0)|(?<!\d),)*'
'(?:'
'((?:\d(?!\.[1-9])|,(?=\d))+)[.,]?'
'|\.(0)'
'|((?<!\.)\.\d+?)'
'|([\d,]+\.\d+?))'
'0*'
'' #---------------------------------
'(?:'
'([eE][+-]?)(?:0|,(?=0))*'
'(?:'
'(?!0+(?=\D|\Z))((?:\d(?!\.[1-9])|,(?=\d))+)[.,]?'
'|((?<!\.)\.(?!0+(?=\D|\Z))\d+?)'
'|([\d,]+\.(?!0+(?=\D|\Z))\d+?))'
'0*'
')?'
'' #---------------------------------
'(?![.,]?\d)')
def dzs_numbs(x,regx = regx): # ds = detect and zeros-shave
if not regx.findall(x):
yield ('No match,', 'No catched string,', 'No groups.')
for mat in regx.finditer(x):
yield (mat.group(), ''.join(mat.groups('')), mat.groups(''))
def dzs_numbs2(x,regx = regx): # ds = detect and zeros-shave
if not regx.findall(x):
yield ('No match,', 'No catched string,', 'No groups.')
for mat in regx.finditer(x):
yield (mat.group(),
''.join(('0' if n.startswith('.') else '')+n for n in mat.groups('')),
mat.groups(''))
NS = [' 23456000and23456000. or23456000.000 00023456000 s000023456000. 000023456000.000 ',
'arf 10000 sea10000.+10000.000 00010000-00010000. kant00010000.000 ',
' 24: 24, 24. 24.000 24.000, 00024r 00024. blue 00024.000 ',
' 8zoom8. 8.000 0008 0008. and0008.000 ',
' 0 00000M0. = 000. 0.0 0.000 000.0 000.000 .000000 .0 ',
' .0000023456 .0000023456000 '
' .0005872 .0005872000 .00503 .00503000 ',
' .068 .0680000 .8 .8000 .123456123456 .123456123456000 ',
' .657 .657000 .45 .4500000 .7 .70000 0.0000023230000 000.0000023230000 ',
' 0.0081000 0000.0081000 0.059000 0000.059000 ',
' 0.78987400000 snow 00000.78987400000 0.4400000 00000.4400000 ',
' -0.5000 -0000.5000 0.90 000.90 0.7 000.7 ',
' 2.6 00002.6 00002.60000 4.71 0004.71 0004.7100 ',
' 23.49 00023.49 00023.490000 103.45 0000103.45 0000103.45000 ',
' 10003.45067 000010003.45067 000010003.4506700 ',
' +15000.0012 +000015000.0012 +000015000.0012000 ',
' 78000.89 000078000.89 000078000.89000 ',
' .0457e10 .0457000e10 00000.0457000e10 ',
' 258e8 2580000e4 0000000002580000e4 ',
' 0.782e10 0000.782e10 0000.7820000e10 ',
' 1.23E2 0001.23E2 0001.2300000E2 ',
' 432e-102 0000432e-102 004320000e-106 ',
' 1.46e10and0001.46e10 0001.4600000e10 ',
' 1.077e-300 0001.077e-300 0001.077000e-300 ',
' 1.069e10 0001.069e10 0001.069000e10 ',
' 105040.03e10 000105040.03e10 105040.0300e10 ',
' +286E000024.487900 -78.4500e.14500 .0140E789. ',
' 081,12.40E07,95.0120 0045,78,123.03500e-0.00 ',
' 0096,78,473.0380e-0. 0008,78,373.066000E0. 0004512300.E0000 ',
' ..18000 25..00 36...77 2..8 ',
' 3.8..9 .12500. 12.51.400 ',
' 00099,111.8713000 -0012,45,83,987.26+0.000,099,88,44.or00,00,00.00must',
' 00099,44,and 0000,099,88,44.bom',
'00,000,00.587000 77,98,23,45., this,that ',
' ,111 145.20 +9,9,9 0012800 .,,. 1 100,000 ',
'1,1,1.111 000,001.111 -999. 0. 111.110000 1.1.1.111 9.909,888']
for ch in NS:
print 'string: '+repr(ch)
for strmatch, modified, the_groups in dzs_numbs2(ch):
print strmatch.rjust(20),'',modified,'',the_groups
print
結果
string: ' 23456000and23456000. or23456000.000 00023456000 s000023456000. 000023456000.000 '
23456000 23456000 ('', '23456000', '', '', '', '', '', '', '')
23456000. 23456000 ('', '23456000', '', '', '', '', '', '', '')
23456000.000 23456000 ('', '23456000', '', '', '', '', '', '', '')
00023456000 23456000 ('', '23456000', '', '', '', '', '', '', '')
000023456000. 23456000 ('', '23456000', '', '', '', '', '', '', '')
000023456000.000 23456000 ('', '23456000', '', '', '', '', '', '', '')
string: 'arf 10000 sea10000.+10000.000 00010000-00010000. kant00010000.000 '
10000 10000 ('', '10000', '', '', '', '', '', '', '')
10000. 10000 ('', '10000', '', '', '', '', '', '', '')
10000.000 10000 ('', '10000', '', '', '', '', '', '', '')
00010000 10000 ('', '10000', '', '', '', '', '', '', '')
00010000. 10000 ('', '10000', '', '', '', '', '', '', '')
00010000.000 10000 ('', '10000', '', '', '', '', '', '', '')
string: ' 24: 24, 24. 24.000 24.000, 00024r 00024. blue 00024.000 '
24 24 ('', '24', '', '', '', '', '', '', '')
24, 24 ('', '24', '', '', '', '', '', '', '')
24. 24 ('', '24', '', '', '', '', '', '', '')
24.000 24 ('', '24', '', '', '', '', '', '', '')
24.000 24 ('', '24', '', '', '', '', '', '', '')
00024 24 ('', '24', '', '', '', '', '', '', '')
00024. 24 ('', '24', '', '', '', '', '', '', '')
00024.000 24 ('', '24', '', '', '', '', '', '', '')
string: ' 8zoom8. 8.000 0008 0008. and0008.000 '
8 8 ('', '8', '', '', '', '', '', '', '')
8. 8 ('', '8', '', '', '', '', '', '', '')
8.000 8 ('', '8', '', '', '', '', '', '', '')
0008 8 ('', '8', '', '', '', '', '', '', '')
0008. 8 ('', '8', '', '', '', '', '', '', '')
0008.000 8 ('', '8', '', '', '', '', '', '', '')
string: ' 0 00000M0. = 000. 0.0 0.000 000.0 000.000 .000000 .0 '
0 0 ('', '0', '', '', '', '', '', '', '')
00000 0 ('', '0', '', '', '', '', '', '', '')
0. 0 ('', '0', '', '', '', '', '', '', '')
000. 0 ('', '0', '', '', '', '', '', '', '')
0.0 0 ('', '', '0', '', '', '', '', '', '')
0.000 0 ('', '', '0', '', '', '', '', '', '')
000.0 0 ('', '', '0', '', '', '', '', '', '')
000.000 0 ('', '', '0', '', '', '', '', '', '')
.000000 0 ('', '', '0', '', '', '', '', '', '')
.0 0 ('', '', '0', '', '', '', '', '', '')
string: ' .0000023456 .0000023456000 .0005872 .0005872000 .00503 .00503000 '
.0000023456 0.0000023456 ('', '', '', '.0000023456', '', '', '', '', '')
.0000023456000 0.0000023456 ('', '', '', '.0000023456', '', '', '', '', '')
.0005872 0.0005872 ('', '', '', '.0005872', '', '', '', '', '')
.0005872000 0.0005872 ('', '', '', '.0005872', '', '', '', '', '')
.00503 0.00503 ('', '', '', '.00503', '', '', '', '', '')
.00503000 0.00503 ('', '', '', '.00503', '', '', '', '', '')
string: ' .068 .0680000 .8 .8000 .123456123456 .123456123456000 '
.068 0.068 ('', '', '', '.068', '', '', '', '', '')
.0680000 0.068 ('', '', '', '.068', '', '', '', '', '')
.8 0.8 ('', '', '', '.8', '', '', '', '', '')
.8000 0.8 ('', '', '', '.8', '', '', '', '', '')
.123456123456 0.123456123456 ('', '', '', '.123456123456', '', '', '', '', '')
.123456123456000 0.123456123456 ('', '', '', '.123456123456', '', '', '', '', '')
string: ' .657 .657000 .45 .4500000 .7 .70000 0.0000023230000 000.0000023230000 '
.657 0.657 ('', '', '', '.657', '', '', '', '', '')
.657000 0.657 ('', '', '', '.657', '', '', '', '', '')
.45 0.45 ('', '', '', '.45', '', '', '', '', '')
.4500000 0.45 ('', '', '', '.45', '', '', '', '', '')
.7 0.7 ('', '', '', '.7', '', '', '', '', '')
.70000 0.7 ('', '', '', '.7', '', '', '', '', '')
0.0000023230000 0.000002323 ('', '', '', '.000002323', '', '', '', '', '')
000.0000023230000 0.000002323 ('', '', '', '.000002323', '', '', '', '', '')
string: ' 0.0081000 0000.0081000 0.059000 0000.059000 '
0.0081000 0.0081 ('', '', '', '.0081', '', '', '', '', '')
0000.0081000 0.0081 ('', '', '', '.0081', '', '', '', '', '')
0.059000 0.059 ('', '', '', '.059', '', '', '', '', '')
0000.059000 0.059 ('', '', '', '.059', '', '', '', '', '')
string: ' 0.78987400000 snow 00000.78987400000 0.4400000 00000.4400000 '
0.78987400000 0.789874 ('', '', '', '.789874', '', '', '', '', '')
00000.78987400000 0.789874 ('', '', '', '.789874', '', '', '', '', '')
0.4400000 0.44 ('', '', '', '.44', '', '', '', '', '')
00000.4400000 0.44 ('', '', '', '.44', '', '', '', '', '')
string: ' -0.5000 -0000.5000 0.90 000.90 0.7 000.7 '
-0.5000 -0.5 ('-', '', '', '.5', '', '', '', '', '')
-0000.5000 -0.5 ('-', '', '', '.5', '', '', '', '', '')
0.90 0.9 ('', '', '', '.9', '', '', '', '', '')
000.90 0.9 ('', '', '', '.9', '', '', '', '', '')
0.7 0.7 ('', '', '', '.7', '', '', '', '', '')
000.7 0.7 ('', '', '', '.7', '', '', '', '', '')
string: ' 2.6 00002.6 00002.60000 4.71 0004.71 0004.7100 '
2.6 2.6 ('', '', '', '', '2.6', '', '', '', '')
00002.6 2.6 ('', '', '', '', '2.6', '', '', '', '')
00002.60000 2.6 ('', '', '', '', '2.6', '', '', '', '')
4.71 4.71 ('', '', '', '', '4.71', '', '', '', '')
0004.71 4.71 ('', '', '', '', '4.71', '', '', '', '')
0004.7100 4.71 ('', '', '', '', '4.71', '', '', '', '')
string: ' 23.49 00023.49 00023.490000 103.45 0000103.45 0000103.45000 '
23.49 23.49 ('', '', '', '', '23.49', '', '', '', '')
00023.49 23.49 ('', '', '', '', '23.49', '', '', '', '')
00023.490000 23.49 ('', '', '', '', '23.49', '', '', '', '')
103.45 103.45 ('', '', '', '', '103.45', '', '', '', '')
0000103.45 103.45 ('', '', '', '', '103.45', '', '', '', '')
0000103.45000 103.45 ('', '', '', '', '103.45', '', '', '', '')
string: ' 10003.45067 000010003.45067 000010003.4506700 '
10003.45067 10003.45067 ('', '', '', '', '10003.45067', '', '', '', '')
000010003.45067 10003.45067 ('', '', '', '', '10003.45067', '', '', '', '')
000010003.4506700 10003.45067 ('', '', '', '', '10003.45067', '', '', '', '')
string: ' +15000.0012 +000015000.0012 +000015000.0012000 '
+15000.0012 +15000.0012 ('+', '', '', '', '15000.0012', '', '', '', '')
+000015000.0012 +15000.0012 ('+', '', '', '', '15000.0012', '', '', '', '')
+000015000.0012000 +15000.0012 ('+', '', '', '', '15000.0012', '', '', '', '')
string: ' 78000.89 000078000.89 000078000.89000 '
78000.89 78000.89 ('', '', '', '', '78000.89', '', '', '', '')
000078000.89 78000.89 ('', '', '', '', '78000.89', '', '', '', '')
000078000.89000 78000.89 ('', '', '', '', '78000.89', '', '', '', '')
string: ' .0457e10 .0457000e10 00000.0457000e10 '
.0457e10 0.0457e10 ('', '', '', '.0457', '', 'e', '10', '', '')
.0457000e10 0.0457e10 ('', '', '', '.0457', '', 'e', '10', '', '')
00000.0457000e10 0.0457e10 ('', '', '', '.0457', '', 'e', '10', '', '')
string: ' 258e8 2580000e4 0000000002580000e4 '
258e8 258e8 ('', '258', '', '', '', 'e', '8', '', '')
2580000e4 2580000e4 ('', '2580000', '', '', '', 'e', '4', '', '')
0000000002580000e4 2580000e4 ('', '2580000', '', '', '', 'e', '4', '', '')
string: ' 0.782e10 0000.782e10 0000.7820000e10 '
0.782e10 0.782e10 ('', '', '', '.782', '', 'e', '10', '', '')
0000.782e10 0.782e10 ('', '', '', '.782', '', 'e', '10', '', '')
0000.7820000e10 0.782e10 ('', '', '', '.782', '', 'e', '10', '', '')
string: ' 1.23E2 0001.23E2 0001.2300000E2 '
1.23E2 1.23E2 ('', '', '', '', '1.23', 'E', '2', '', '')
0001.23E2 1.23E2 ('', '', '', '', '1.23', 'E', '2', '', '')
0001.2300000E2 1.23E2 ('', '', '', '', '1.23', 'E', '2', '', '')
string: ' 432e-102 0000432e-102 004320000e-106 '
432e-102 432e-102 ('', '432', '', '', '', 'e-', '102', '', '')
0000432e-102 432e-102 ('', '432', '', '', '', 'e-', '102', '', '')
004320000e-106 4320000e-106 ('', '4320000', '', '', '', 'e-', '106', '', '')
string: ' 1.46e10and0001.46e10 0001.4600000e10 '
1.46e10 1.46e10 ('', '', '', '', '1.46', 'e', '10', '', '')
0001.46e10 1.46e10 ('', '', '', '', '1.46', 'e', '10', '', '')
0001.4600000e10 1.46e10 ('', '', '', '', '1.46', 'e', '10', '', '')
string: ' 1.077e-300 0001.077e-300 0001.077000e-300 '
1.077e-300 1.077e-300 ('', '', '', '', '1.077', 'e-', '300', '', '')
0001.077e-300 1.077e-300 ('', '', '', '', '1.077', 'e-', '300', '', '')
0001.077000e-300 1.077e-300 ('', '', '', '', '1.077', 'e-', '300', '', '')
string: ' 1.069e10 0001.069e10 0001.069000e10 '
1.069e10 1.069e10 ('', '', '', '', '1.069', 'e', '10', '', '')
0001.069e10 1.069e10 ('', '', '', '', '1.069', 'e', '10', '', '')
0001.069000e10 1.069e10 ('', '', '', '', '1.069', 'e', '10', '', '')
string: ' 105040.03e10 000105040.03e10 105040.0300e10 '
105040.03e10 105040.03e10 ('', '', '', '', '105040.03', 'e', '10', '', '')
000105040.03e10 105040.03e10 ('', '', '', '', '105040.03', 'e', '10', '', '')
105040.0300e10 105040.03e10 ('', '', '', '', '105040.03', 'e', '10', '', '')
string: ' +286E000024.487900 -78.4500e.14500 .0140E789. '
+286E000024.487900 +286E24.4879 ('+', '286', '', '', '', 'E', '', '', '24.4879')
-78.4500e.14500 -78.45e0.145 ('-', '', '', '', '78.45', 'e', '', '.145', '')
.0140E789. 0.014E789 ('', '', '', '.014', '', 'E', '789', '', '')
string: ' 081,12.40E07,95.0120 0045,78,123.03500e-0.00 '
081,12.40E07,95.0120 81,12.4E7,95.012 ('', '', '', '', '81,12.4', 'E', '', '', '7,95.012')
0045,78,123.03500 45,78,123.035 ('', '', '', '', '45,78,123.035', '', '', '', '')
string: ' 0096,78,473.0380e-0. 0008,78,373.066000E0. 0004512300.E0000 '
0096,78,473.0380 96,78,473.038 ('', '', '', '', '96,78,473.038', '', '', '', '')
0008,78,373.066000 8,78,373.066 ('', '', '', '', '8,78,373.066', '', '', '', '')
0004512300. 4512300 ('', '4512300', '', '', '', '', '', '', '')
string: ' ..18000 25..00 36...77 2..8 '
No match, No catched string, No groups.
string: ' 3.8..9 .12500. 12.51.400 '
No match, No catched string, No groups.
string: ' 00099,111.8713000 -0012,45,83,987.26+0.000,099,88,44.or00,00,00.00must'
00099,111.8713000 99,111.8713 ('', '', '', '', '99,111.8713', '', '', '', '')
-0012,45,83,987.26 -12,45,83,987.26 ('-', '', '', '', '12,45,83,987.26', '', '', '', '')
00,00,00.00 0 ('', '', '0', '', '', '', '', '', '')
string: ' 00099,44,and 0000,099,88,44.bom'
00099,44, 99,44 ('', '99,44', '', '', '', '', '', '', '')
0000,099,88,44. 99,88,44 ('', '99,88,44', '', '', '', '', '', '', '')
string: '00,000,00.587000 77,98,23,45., this,that '
00,000,00.587000 0.587 ('', '', '', '.587', '', '', '', '', '')
77,98,23,45. 77,98,23,45 ('', '77,98,23,45', '', '', '', '', '', '', '')
string: ' ,111 145.20 +9,9,9 0012800 .,,. 1 100,000 '
,111 111 ('', '111', '', '', '', '', '', '', '')
145.20 145.2 ('', '', '', '', '145.2', '', '', '', '')
+9,9,9 +9,9,9 ('+', '9,9,9', '', '', '', '', '', '', '')
0012800 12800 ('', '12800', '', '', '', '', '', '', '')
1 1 ('', '1', '', '', '', '', '', '', '')
100,000 100,000 ('', '100,000', '', '', '', '', '', '', '')
string: '1,1,1.111 000,001.111 -999. 0. 111.110000 1.1.1.111 9.909,888'
1,1,1.111 1,1,1.111 ('', '', '', '', '1,1,1.111', '', '', '', '')
000,001.111 1.111 ('', '', '', '', '1.111', '', '', '', '')
-999. -999 ('-', '999', '', '', '', '', '', '', '')
0. 0 ('', '0', '', '', '', '', '', '', '')
111.110000 111.11 ('', '', '', '', '111.11', '', '', '', '')