Download a working local copy of a webpage [closed]

210

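I would like to download a local copy of a web page and get all of the css, images, javascript, etc.

In previous discussions (e.g. here and here, both of which are more than two years old), two suggestions are generally put forward: wget -p and httrack. However, both of these suggestions fail. I would very much appreciate help with using either of these tools to accomplish the task. Alternatives are also welcome.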

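Option 1: wget -p

wget -p successfully downloads all of the web page's prerequisites (css, images, js). However, when I load the local copy in a web browser, the page is unable to load the prerequisites because the paths to them have not been changed from the version on the web.

For example:

In the page's html, <link rel="stylesheet" href="https://stackoverflow.com/stylesheets/foo.css" /> will need to be corrected to point to the new relative path of foo.css
In the css file, background-image: url(/images/bar.png) will similarly need to be adjusted.

Is there a way to modify wget -p so that the paths are correct?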

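Option 2: httrack

httrack seems like a great tool for mirroring entire websites, but it is unclear to me how to use it to create a local copy of a single page. There is a great deal of discussion in the httrack forums about this topic (e.g. here), but no one seems to have a bullet-proof solution.

Option 3: another tool?

Some people have suggested paid tools, but I just can't believe there isn't a free solution out there.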

download wget offline-browsing

— ブラウン

If the answers don't work for you, try: wget -E -H -k -K -p http://example.com. This is the only thing that worked for me. Credit: superuser.com/a/136335/94039

— its_me

There is also software that does this: Teleport Pro.

— 2016

wget --random-wait -r -p -e robots=off -U mozilla http://www.example.com

— davidcondrey 2017

Possible duplicate of download webpage and dependencies, including css images.

— jww

262

wget is capable of doing what you are asking. Just try the following:

wget -p -k http://www.example.com/

The -p will get you all the required elements to view the site correctly (css, images, etc). The -k will change all links (including those for CSS and images) so that you can view the page offline as it appeared online.

From the Wget docs:
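As a minimal sketch of the command above, the two short flags can be written with their long names and wrapped in a small shell function. The function name save_page and its optional destination-directory argument are assumptions made for illustration; they are not part of the original answer.

```shell
# save_page URL [DEST_DIR]
# Hypothetical helper: the name and the DEST_DIR argument are illustrative.
# The body runs in a subshell so its variables do not leak into the caller.
save_page() (
  url=$1
  dest=${2:-.}   # default to the current directory

  # --page-requisites  (-p): also fetch the CSS, images and scripts the page needs
  # --convert-links    (-k): rewrite links for local viewing once downloads finish
  # --directory-prefix (-P): directory under which the downloaded tree is saved
  wget --page-requisites --convert-links --directory-prefix="$dest" "$url"
)

# Example (requires network access):
#   save_page http://www.example.com/ ./example-copy
```

Wget usually saves the files under a subdirectory named after the host, so the page to open in a browser would typically sit somewhere below ./example-copy.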

‘-k’
‘--convert-links’
After the download is complete, convert the links in the document to make them
suitable for local viewing. This affects not only the visible hyperlinks, but
any part of the document that links to external content, such as embedded images,
links to style sheets, hyperlinks to non-html content, etc.

Each link will be changed in one of the two ways:

    The links to files that have been downloaded by Wget will be changed to refer
    to the file they point to as a relative link.

    Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also
    downloaded, then the link in doc.html will be modified to point to
    ‘../bar/img.gif’. This kind of transformation works reliably for arbitrary
    combinations of directories.

    The links to files that have not been downloaded by Wget will be changed to
    include host name and absolute path of the location they point to.

    Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to
    ../bar/img.gif), then the link in doc.html will be modified to point to
    http://hostname/bar/img.gif. 

Because of this, local browsing works reliably: if a linked file was downloaded,
the link will refer to its local name; if it was not downloaded, the link will
refer to its full Internet address rather than presenting a broken link. The fact
that the former links are converted to relative links ensures that you can move
the downloaded hierarchy to another directory.

Note that only at the end of the download can Wget know which links have been
downloaded. Because of that, the work done by ‘-k’ will be performed at the end
of all the downloads.
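To make the first conversion rule in the quoted documentation concrete, here is a rough illustration of how a relative link between two downloaded files could be computed. This is not Wget's implementation; relative_link is a hypothetical helper written in plain POSIX shell, and it assumes both arguments are absolute, already-normalized paths.

```shell
# relative_link FROM_FILE TO_FILE
# Hypothetical helper: prints the path of TO_FILE relative to the directory
# that contains FROM_FILE. Both arguments must be absolute paths without
# "." or ".." components. Runs in a subshell to keep variables local.
relative_link() (
  case $1 in /*) ;; *) exit 1 ;; esac
  case $2 in /*) ;; *) exit 1 ;; esac

  from_dir=${1%/*}   # /foo/doc.html -> /foo
  target=$2
  up=

  # Climb out of from_dir one level at a time until it is a prefix of target,
  # adding one "../" for every level climbed.
  while [ "${target#"$from_dir"/}" = "$target" ]; do
    from_dir=${from_dir%/*}
    up="../$up"
  done

  printf '%s%s\n' "$up" "${target#"$from_dir"/}"
)

# Example from the documentation quoted above:
#   relative_link /foo/doc.html /bar/img.gif    prints ../bar/img.gif
```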

— セルク

I tried this, but somehow internal links such as index.html#link-to-element-on-same-page stopped working.

— rhand 2013

Entire site: snipplr.com/view/23838/downloading-an-entire-web-site-with-wget

— Fedir RYKHTIK

Some servers will respond with a 403 code if you use wget without a User Agent. You can add -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4'

— nikoskip

If you find you are still missing images etc., try adding the following code. It worked!

— John Hunt

To get resources from external hosts, use -H, --span-hosts

— davidhq
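
The flags suggested in the comments above (-E -H -k -K -p, plus -U for servers that reject requests without a user agent) could be combined roughly as follows. This is only a sketch: the function name fetch_page_spanning and the default user-agent string are assumptions for illustration, and whether every flag is needed depends on the site.

```shell
# fetch_page_spanning URL [USER_AGENT]
# Hypothetical wrapper: the name and the default user-agent are illustrative.
# Runs in a subshell so its variables do not leak into the caller.
fetch_page_spanning() (
  url=$1
  ua=${2:-Mozilla/5.0}

  # --adjust-extension (-E): save HTML/CSS with matching file extensions
  #                          (older Wget releases call this --html-extension)
  # --span-hosts       (-H): allow fetching requisites from other hosts
  # --convert-links    (-k): rewrite links for local viewing
  # --backup-converted (-K): keep a .orig copy of each file before conversion
  # --page-requisites  (-p): fetch the CSS, images and scripts the page needs
  # --user-agent       (-U): send this User-Agent header
  wget --adjust-extension --span-hosts --convert-links --backup-converted \
       --page-requisites --user-agent="$ua" "$url"
)

# Example (requires network access):
#   fetch_page_spanning http://example.com
```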