How can I find all files in a directory that contain a UTF-8 BOM (byte order mark)?

8

On Windows, I need to find all files in a directory that contain a UTF-8 BOM (byte order mark). Which tool can do that?

It could be a PowerShell script, the advanced search feature of a text editor, or anything else.

windows search utf-8

15

Here is an example PowerShell script. It looks in the C: path for all files whose first 3 bytes are 0xEF, 0xBB, 0xBF.

Function ContainsBOM
{   
    return $input | where {
        # Read the whole file, then compare its first 3 bytes with the UTF-8 BOM.
        $contents = [System.IO.File]::ReadAllBytes($_.FullName)
        $_.Length -gt 2 -and $contents[0] -eq 0xEF -and $contents[1] -eq 0xBB -and $contents[2] -eq 0xBF }
}

get-childitem "C:\*.*" | where {!$_.PsIsContainer } | ContainsBOM

Is it necessary to "ReadAllBytes"? Maybe reading just the first few bytes would perform better?

Fair point. Here is an updated version that reads only the first 3 bytes.

Function ContainsBOM
{   
    return $input | where {
        # Read only the first 3 bytes instead of the whole file.
        $contents = new-object byte[] 3
        $stream = [System.IO.File]::OpenRead($_.FullName)
        # Close the stream even if the read fails.
        try { $stream.Read($contents, 0, 3) | Out-Null }
        finally { $stream.Close() }
        $contents[0] -eq 0xEF -and $contents[1] -eq 0xBB -and $contents[2] -eq 0xBF }
}

get-childitem "C:\*.*" | where {!$_.PsIsContainer -and $_.Length -gt 2 } | ContainsBOM
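
The same check can be sketched outside PowerShell. The Python version below is a minimal sketch and not part of the original answer: the names `has_utf8_bom` and `find_bom_files` are hypothetical, and skipping unreadable files is an assumption. It reads only the first three bytes of each file, compares them with `codecs.BOM_UTF8`, and also walks subdirectories.

```python
import codecs
import os


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:
        # Read at most 3 bytes; shorter files simply fail the comparison.
        return f.read(3) == codecs.BOM_UTF8


def find_bom_files(root):
    """Yield paths of all files under `root` (recursively) that start with a UTF-8 BOM."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if has_utf8_bom(path):
                    yield path
            except OSError:
                # Assumption: files that cannot be opened (locked, no permission) are skipped.
                continue
```

Iterating over `find_bom_files(r"C:\Temp")`, for example, would yield each matching path under that directory; the directory name here is only a placeholder.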

— vcsjones

1

Cool. Before I mark this as the answer, is it necessary to "ReadAllBytes"? Maybe reading just the first few bytes would perform better?

— Borek Bernard

@Borek See the edit.

— vcsjones 2012

2

This saved my day! I also learned that get-childitem -recurse handles subdirectories too.

— diynevala

I wonder whether there is a way to use the script above to remove the BOM.

— tom_mai78101

2

As a side note, here is a PowerShell script I use to strip the UTF-8 BOM character from my source files.

$files = get-childitem -Path . -Include @("*.h","*.cpp") -Recurse
foreach ($f in $files)
{
    (Get-Content $f.PSPath) |
        Foreach-Object {$_ -replace "\xEF\xBB\xBF", ""} |
        Set-Content $f.PSPath
}
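
A byte-level variant avoids depending on how a file is decoded and re-encoded. The Python sketch below is an illustration rather than a port of the script above: `strip_utf8_bom` and `strip_bom_in_tree` are hypothetical names, and only the `.h`/`.cpp` filter is taken from the script above. It rewrites a file only when its first three bytes are EF BB BF and leaves every other byte untouched.

```python
import codecs
import os


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from the file at `path`.

    Returns True if the file was rewritten, False if it had no BOM.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as f:
        # Write back everything after the 3-byte BOM, unchanged.
        f.write(data[len(codecs.BOM_UTF8):])
    return True


def strip_bom_in_tree(root, extensions=(".h", ".cpp")):
    """Strip the BOM from matching files under `root`; return the paths that changed."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare extensions case-insensitively.
            if name.lower().endswith(extensions):
                path = os.path.join(dirpath, name)
                if strip_utf8_bom(path):
                    changed.append(path)
    return changed
```

Files are rewritten in place, so it is probably worth running something like this on a copy or under version control first.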

— Scott Smith

I had a handful of files, some with a BOM and some without. Your answer was exactly what I needed to clean them all up. Thanks!

— Tevya 2018

1

If you are on an enterprise computer with restricted permissions (like me) and cannot run PowerShell scripts, you can do the job with a portable Notepad++ and the PythonScript plugin, using the following script.

import os

# Root directory to convert (adjust as needed).
filePathSrc = "C:\\Temp\\UTF8"
# Binary file types to skip, compared case-insensitively.
skipExts = ('.jar', '.ear', '.gif', '.jpg', '.jpeg', '.xls', '.png', '.cab', '.ico')

for root, dirs, files in os.walk(filePathSrc):
    for fn in files:
        if not fn.lower().endswith(skipExts):
            path = os.path.join(root, fn)
            # notepad and console are objects provided by the PythonScript plugin.
            notepad.open(path)
            console.write(path + "\r\n")
            notepad.runMenuCommand("Encoding", "Convert to UTF-8 without BOM")
            notepad.save()
            notepad.close()

Credit goes to https://pw999.wordpress.com/2013/08/19/mass-convert-a-project-to-utf-8-using-notepad/

— Hoang Long