Print the word containing a string, together with the first word of the line

10

I want to search for a string in a line of text and print that string (the word between spaces) together with the first word of the phrase.

For example:

"This is a single text line"
"Another thing"
"It is better you try again"
"Better"

The list of strings is:

text
thing
try
Better

What I am trying to get is a table like this:

This[tab]text
Another[tab]thing
It[tab]try
Better

I tried grep, but nothing happened. Any suggestions?

command-line text-processing regex

— Felipe Lira

So basically: "if the line contains the string, print the first word + the string". Is that right?

— Sergiy Kolodyazhnyy

12

bash / grep version:

#!/bin/bash
# string-and-first-word.sh
# Finds a string and the first word of the line that contains that string.

text_file="$1"
shift

for string; do
    # Find string in file. Process output one line at a time.
    grep "$string" "$text_file" | 
        while read -r line
    do
        # Get the first word of the line.
        first_word="${line%% *}"
        # Remove special characters from the first word.
        first_word="${first_word//[^[:alnum:]]/}"

        # If the first word is the same as the string, don't print it twice.
        if [[ "$string" != "$first_word" ]]; then
            echo -ne "$first_word\t"
        fi

        echo "$string"
    done
done

Call it like this:

./string-and-first-word.sh /path/to/file text thing try Better

Output:

This    text
Another thing
It  try
Better

— wjandrea

9

Perl to the rescue!

#!/usr/bin/perl
use warnings;
use strict;

my $file = shift;
my $regex = join '|', map quotemeta, @ARGV;
$regex = qr/\b($regex)\b/;

open my $IN, '<', $file or die "$file: $!";
while (<$IN>) {
    if (my ($match) = /$regex/) {
        print my ($first) = /^\S+/g;
        if ($match ne $first) {
            print "\t$match";
        }
        print "\n";
    }
}

Save it as first-plus-word and run it like this:

perl first-plus-word file.txt text thing try Better

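It builds a regular expression from the words given on the command line. Each line is then matched against that regex; if there is a match, the first word of the line is printed, and if the matched word differs from the first word, the matched word is printed as well.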
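
For readers who don't use Perl, the same idea can be sketched in Python. This is only an illustration under stated assumptions: the function name `first_plus_word` is made up for this sketch, `re.escape` stands in for Perl's `quotemeta`, and the first word is taken to be the leading run of non-whitespace characters, as in the script above.

```python
import re


def first_plus_word(lines, words):
    """Return 'first<TAB>match' (or just 'first') for each matching line.

    Hypothetical helper written for illustration only.
    """
    # Escape each word and join them into one alternation bounded by \b.
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    results = []
    for line in lines:
        found = pattern.search(line)
        if not found:
            continue
        first = re.match(r"\S+", line)
        first_word = first.group(0) if first else ""
        match = found.group(1)
        # Add the matched word only when it differs from the first word.
        if match != first_word:
            results.append(first_word + "\t" + match)
        else:
            results.append(first_word)
    return results


if __name__ == "__main__":
    sample = [
        "This is a single text line",
        "Another thing",
        "It is better you try again",
        "Better",
    ]
    for row in first_plus_word(sample, ["text", "thing", "try", "Better"]):
        print(row)
```

Because of the `\b` word boundaries, a search word only matches as a whole word, so `thing` would not match inside `nothing`.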

— choroba

9

Here's an awk version:

awk '
  NR==FNR {a[$0]++; next;} 
  {
    gsub(/"/,"",$0);
    for (i=1; i<=NF; i++)
      if ($i in a) printf "%s\n", i==1? $i : $1"\t"$i;
  }
  ' file2 file1

where file2 contains the word list and file1 contains the phrases.
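
The lookup logic of that awk program can be sketched in Python as follows. It is a rough equivalent, not a line-for-line port: the function name `match_fields` is an assumption of this sketch, and a Python set plays the role of the awk array `a`.

```python
def match_fields(word_lines, phrase_lines):
    """Mimic the awk logic: exact field lookups against a set of words.

    Hypothetical helper written for illustration only.
    """
    # First file: remember every word (the NR==FNR block in awk).
    wanted = {w.rstrip("\n") for w in word_lines}
    out = []
    for line in phrase_lines:
        # Drop double quotes, then split on whitespace like awk's fields.
        fields = line.replace('"', "").split()
        for i, field in enumerate(fields):
            if field in wanted:
                # Print the field alone when it is the first word itself.
                out.append(field if i == 0 else fields[0] + "\t" + field)
    return out


if __name__ == "__main__":
    words = ["text", "thing", "try", "Better"]
    phrases = [
        '"This is a single text line"',
        '"Another thing"',
        '"It is better you try again"',
        '"Better"',
    ]
    print("\n".join(match_fields(words, phrases)))
```

Since whole fields are compared, a search word matches only complete whitespace-separated words, and a line containing several listed words produces one output row per word.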

— steeldriver

2

Nice! I've put it into a script file at paste.ubuntu.com/23063130, just for convenience.

— Sergiy Kolodyazhnyy

8

Here's a Python version:

#!/usr/bin/env python
from __future__ import print_function 
import sys

# List of strings that you want
# to search for in the file. Change it
# as you see fit. Remember the commas.
strings = [
          'text', 'thing',
          'try', 'Better'
          ]


with open(sys.argv[1]) as input_file:
    for line in input_file:
        # Split the line once; the first item is the first word.
        words = line.strip().split()
        for string in strings:
            if string in line:
                print(words[0], end="")
                if len(words) > 1:
                    # Join the tab and the string into one argument so that
                    # print() does not put a space between them.
                    print("\t" + string)
                else:
                    print("")

Demo:

$> cat input_file.txt                                                          
This is a single text line
Another thing
It is better you try again
Better
$> python ./initial_word.py input_file.txt                                      
This    text
Another     thing
It  try
Better

Side note: the script is python3 compatible, so you can run it with either python2 or python3.

— Sergiy Kolodyazhnyy

7

Try this:

$ sed -En 's/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/p' File
This    text
Another thing
It      try
        Better

If the tab before Better is a problem, try this:

$ sed -En 's/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/; ta; b; :a; s/^\t//; p' File
This    text
Another thing
It      try
Better

The above was tested with GNU sed (called gsed on OSX). BSD sed may need some minor changes.

How it works

s/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/

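This looks for a word, [[:alnum:]]+, followed by a space, [[:space:]], followed by anything, .*, followed by one of your words, text|thing|try|Better, followed by anything. If that is found, it is replaced with the first word on the line (if any), a tab, and the matched word.

ta; b; :a; s/^\t//; p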

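If the substitution command made a substitution, meaning that one of your words was found on the line, then the ta command tells sed to jump to label a. If not, we branch (b) to the next line. :a defines the label a. So, if one of your words was found, we (a) do the substitution s/^\t//, which removes a leading tab if there is one, and (b) print (p) the line.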
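
The control flow described above can be mirrored in Python, which may help when reading the sed script. This is a sketch under assumptions: the name `sed_like` is invented here, the POSIX classes [[:alnum:]] and [[:space:]] are approximated by `[A-Za-z0-9]` and `\s`, and Python's backtracking regex engine is assumed to pick the same match as sed does for inputs like the sample lines.

```python
import re

# Approximation of the sed pattern; group 2 is the first word (optional),
# group 3 is the word that was searched for.
PATTERN = re.compile(r"(([A-Za-z0-9]+)\s.*)?(text|thing|try|Better).*")


def sed_like(line):
    """Return the rewritten line, or None when sed would print nothing.

    Hypothetical helper written for illustration only.
    """
    # s/.../\2\t\3/ : an unmatched group 2 becomes the empty string.
    new, count = PATTERN.subn(
        lambda m: (m.group(2) or "") + "\t" + m.group(3), line, count=1
    )
    if count == 0:
        # "b": no substitution was made, so skip to the next line.
        return None
    # ":a; s/^\t//; p": strip a leading tab, then print the line.
    return re.sub(r"^\t", "", new)


if __name__ == "__main__":
    for text in ["This is a single text line", "Another thing",
                 "It is better you try again", "Better"]:
        result = sed_like(text)
        if result is not None:
            print(result)
```

As in the sed command, the search words are matched as plain substrings, so a word such as `thing` is also found inside `nothing`.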

— John1024

7

A simple bash / sed approach:

$ while read w; do sed -nE "s/\"(\S*).*$w.*/\1\t$w/p" file; done < words 
This    text
Another thing
It  try
    Better

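The while read w; do ...; done < words loop iterates over each line of the file words and saves it as $w. The -n makes sed print nothing by default. The sed command then replaces a double quote followed by non-whitespace (\"(\S*); the parentheses "capture" what \S* matches, the first word, so we can refer to it later as \1), then 0 or more characters (.*), then the word we are looking for ($w), and then 0 or more characters again (.*). If this matches, the whole thing is replaced with just the first word, a tab and $w (\1\t$w), and the line is printed (that is what the p in s///p does).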
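
The loop structure explained above, one full pass over the file per search word, can be sketched in Python like this. The name `per_word_pass` is hypothetical, and unlike the shell version, which interpolates $w into the pattern as-is, this sketch escapes each word with `re.escape`; that is a deliberate deviation, not what the sed command does.

```python
import re


def per_word_pass(words, lines):
    """For each word, scan every line once, like the shell loop around sed.

    Hypothetical helper written for illustration only.
    """
    out = []
    for w in words:  # while read w; do ...; done < words
        # "(\S*) captures the first word right after the opening double quote.
        pattern = re.compile(r'"(\S*).*' + re.escape(w) + r".*")
        for line in lines:  # one sed run over the whole file per word
            new, count = pattern.subn(
                lambda m: m.group(1) + "\t" + w, line, count=1
            )
            if count:  # the p flag prints only when a substitution happened
                out.append(new)
    return out


if __name__ == "__main__":
    phrases = [
        '"This is a single text line"',
        '"Another thing"',
        '"It is better you try again"',
        '"Better"',
    ]
    print("\n".join(per_word_pass(["text", "thing", "try", "Better"], phrases)))
```

Because the outer loop runs over the words, the rows come out in word-list order rather than in file order, and a line consisting of only the search word keeps its leading tab, as in the output shown above.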

— terdon

5

Here's a Ruby version:

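str_list = ['text', 'thing', 'try', 'Better']

File.open(ARGV[0]) do |f|
  f.each_line do |line|
    words = line.split(' ')
    # Test every search string against every line, rather than pairing
    # line N with string N (which fails once the file has more lines than strings).
    str_list.each do |str|
      next unless line.include?(str)
      if words.length == 1
        puts words[0]
      else
        puts words[0] + "\t" + str
      end
    end
  end
end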

With the sample text file hello.txt containing

This is a single text line
Another thing
It is better you try again
Better

running ruby source.rb hello.txt results in

This    text
Another thing
It      try
Better

— Anwar