USGS Earth Explorerにログインして、PythonでLandsat画像をダウンロードします

USGS Earth Explorer Webサイト（http://earthexplorer.usgs.gov/）にPython でログインし、Landsat Archiveコレクションからユーザー名、パスワード、パス/行（または緯度/経度）、日付を入力するだけで画像をダウンロードしたい、および雲量のしきい値。私はすでにこれを試しました：https : //github.com/olivierhagolle/LANDSAT-Download/wiki うまくいきませんでした。それから私はこれを調べました：https : //github.com/developmentseed/landsat-util しかし、私は仕事の制限のためにツールを追加できなかったのでそれを使用できません。

そう。私はPythonの経験が限られているため、それを使用してWebサイトにアクセスしたことがありません。Windows 7でPython 2.7.8を使用していて、Chromeを介してWebサイトにアクセスしようとしています（バージョン51.0.2704.106 mが役立つ場合）。

Webサイトへのアクセス、ログイン（すでにログイン資格情報を持っている）、画像の検索、フォルダーへのダウンロードのための簡単なスクリプトのアイデアはありますか？Pythonでできることのように思えますが、私にとってオンラインソリューションはすべて高度すぎるようです。必要なのは、さまざまな座標や日付で繰り返し使用できるスタンドアロンスクリプトだけです。

— MattS
ソース

landsat-utilは単なるpythonスクリプトです。依存関係を追加できない場合は、すでに持っているものを言う必要があるかもしれません。mapbox.github.io/usgs（またはgithub.com/mapbox/usgs）を使用できますか？

— BradHards 2016

「うまくいかなかった」と言うことは、人々にあなたの質問を無視させる確実な方法であることに注意してください。「うまくいかなかった」ということは、何も役に立たないことです。それを機能させるためのヘルプが必要な場合は、詳細を使用して質問を編集してください。

— user2856

@ルーク：同様に、私はそれを実行しましたが、タスクを実行しませんでした。Webサイトへのログインやファイルのダウンロードは行われませんでした。それがどのように誤解される可能性があるのかはわかりません。スクリプトが行います。ない。作業。だから、私は別の方法を探しています。

— MattS 2016

悪魔の支持者であるだけで、一括ダウンロード用のBDAアプリケーションの何が問題になっていますか？

— Hornbydd

@Hornbydd：BDAに問題はありません。いつも使っています。それが私の唯一の本当の選択肢だと思います。Python的な方法はありません。

— MattS

PathパラメータとRowパラメータを設定して、すべてのLandsatデータをダウンロードするために使用する非常に厄介なコードを取得しました。また、必要のないいくつかのarcpy関数やカスタム関数もあります。このコードをクリアして、目的に合わせて変更できます（コメントはロシア語です）：

# -*- coding: utf-8 -*-
import os
import sys
import shutil
import subprocess
import traceback
import time
import requests
from pprint import pprint

# выключить предупреждение
requests.packages.urllib3.disable_warnings()
# -----------------------------------------------------------
def find_scenes_in_DB(FC_in_GDB, field_names_list = ['ID', 'ARCHIVE']):
    with arcpy.da.SearchCursor(FC_in_GDB, field_names_list) as DB_cursor:
        uniq_sat_name = sorted({row[0] for row in DB_cursor})
        DB_cursor.reset()
        uniq_sat_archive = sorted({row[1] for row in DB_cursor})
    uniq_sat_name += uniq_sat_archive  
    return uniq_sat_name
# -----------------------------------------------------------
# замеры времени
def Time_now():
    return time.time()
# -----------------------------------------------------------
def Time_elapsed(time_start, time_end):
    return time_end - time_start
# -----------------------------------------------------------
# сгенерировать текст запроса (URL) и вернуть его в виде текста

def generate_json_request(
        request_code,
        json_request_content,
        http_service_endpoint=r'https://earthexplorer.usgs.gov/inventory/json/v/1.4.0/'
):
    return str(http_service_endpoint + request_code + r'?jsonRequest=' + json_request_content)
# -----------------------------------------------------------
# авторизироваться на сайте и получить токен доступа
def login(username, password, catalogId, authType = 'EROS'):
    URL = generate_json_request(
        request_code='login',
        json_request_content='{' +
        '"username":"'    + username  +
        '","password":"'  + password  +
        '","authType":"'  + authType  + 
        '","catalogId":"' + catalogId +
        '"}'
    )

    # послать POST запрос
    answer = requests.post(URL)

    # проверка прошёл ли запрос
    check_answer = (answer.status_code, answer.reason)

    if check_answer != (200, 'OK'):
        print u'Ошибка! Сайт не доступен.', check_answer, u'Ссылка:'
        print URL[:150], '...'
        return None

    else:
        # если ошибка (авторизация провалена)
        if answer.json()['errorCode'] is not None:
            print u'Ошибка!', answer.json()['errorCode'] + ':', answer.json()['error']

            # else:
            #     print u'Авторизация успешна! Токен:', answer.json()['data']
        return answer.json()['data']

# удалить токен авторизации
def logout(apiKey):
    if apiKey is not None:

        URL = generate_json_request(
            request_code='logout',
            json_request_content='{"apiKey":"' + apiKey + '"}'
        )

        # послать POST запрос
        answer = requests.post(URL)

        # проверка прошёл ли запрос
        check_answer = (answer.status_code, answer.reason)

        if check_answer != (200, 'OK'):
            print u'Ошибка! Сайт не доступен.', check_answer, u'Ссылка:'
            print URL[:150], '...'

        else:
            # если ошибка
            if answer.json()['errorCode'] is not None:
                #            print u'Ошибка!', answer.json()['errorCode'] + ':', answer.json()['error']
                return None

            else:
                #            if  answer.json()['data'] == False:
                #                print u'Токен НЕ удалён.'
                #                pprint(answer.json())
                #            else:
                #                print u'Токен удалён.'
                return answer.json()['data']

def datasetfields(datasetName, apiKey, node='EE'):
    URL = generate_json_request(
        request_code='datasetfields',
        json_request_content='{"apiKey":"' + apiKey + '", "node": "' + node + '", "datasetName": "' + datasetName + '"}'
    )

    # послать POST запрос
    answer = requests.post(URL)

    # проверка прошёл ли запрос
    check_answer = (answer.status_code, answer.reason)

    if check_answer != (200, 'OK'):
        print u'Ошибка! Сайт не доступен.', check_answer, u'Ссылка:'
        print URL[:150], '...'

    else:
        # если ошибка
        if answer.json()['errorCode'] is not None:
            print u'Ошибка!', answer.json()['errorCode'] + ':', answer.json()['error']
            return None
        else:
            return answer.json()

# сформировать критерий поиска снимков на основе списка Path Row
def additionalCriteria(list_Path_Row, datasetName):
    additionalCriteria = '{"filterType":"or", "childFilters":['
    # первая итерация?
    first_time = True

    for x in datasetfields(datasetName, apiKey).get('data'):
        if x.get('name') == 'WRS Path': Path_fieldId = x.get('fieldId')
        if x.get('name') == 'WRS Row':   Row_fieldId = x.get('fieldId')

    for PR in list_Path_Row:
        Path = str(PR[0])
        Row = str(PR[1])

        filter_Path = '{"filterType":"value", "fieldId":' + str(Path_fieldId) + ', "value":"' + Path + '", "operand":"="}'  # "fieldId":10036 - PATH
        filter_Row = '{"filterType":"value", "fieldId":' + str(Row_fieldId) + ', "value":"' + Row + '", "operand":"="}'  # "fieldId":10038 - ROW
        filter_And = '{"filterType":"and", "childFilters":[' + filter_Path + ',' + filter_Row + ']}'

        # если первая - не добавляем запятую в начало
        if first_time == True:
            additionalCriteria += filter_And
            first_time = False
        else:
            additionalCriteria += ' ,' + filter_And

    additionalCriteria += ']}'
    return additionalCriteria

# поиск снимков
def search_scenes(datasetName, list_Path_Row, apiKey,
                  startDate='2016-05-01T00:00:00Z',
                  endDate='2100-01-01T00:00:00Z',
                  months=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                  maxResults=10,
                  ):
    URL = generate_json_request(
        request_code='search',
        json_request_content='{"datasetName":"' + datasetName + '"' + \
                             ',"temporalFilter":{"dateField": "search_date","startDate":"' + startDate + '","endDate":"' + endDate + '"}' + \
                             ',"months":' + str(months) + \
                             ',"maxResults":' + str(maxResults) + \
                             ',"additionalCriteria":' + additionalCriteria(list_Path_Row, datasetName) + \
                             ',"node":"EE","apiKey":"' + apiKey + '"}'
    )

    # послать POST запрос
    answer = requests.post(URL)

    # проверка прошёл ли запрос
    check_answer = (answer.status_code, answer.reason)

    if check_answer != (200, 'OK'):
        print u'Ошибка! Сайт не доступен.', check_answer, u'Ссылка:'
        print URL[:150], '...'
        print
        print URL

    else:
        # если ошибка
        if answer.json()['errorCode'] is not None:
            print u'Ошибка!', answer.json()['errorCode'] + ':', answer.json()['error']
            return None
        else:
            if answer.json()['data'] in (None, ''):
                print u'Снимков не найдено.'
            else:
                number_of_scenes = answer.json()['data']['totalHits']

            dwnld_URL_list = []
            for scene in answer.json()['data']['results']:
                dwnld_URL_list.append(scene['entityId'])

            return dwnld_URL_list

# вернуть список URL для прямой закачки снимков
def get_download_list(datasetName, entityIds, apiKey, node='EE', products='["STANDARD"]'):
    entityIds = str(['"' + x.encode('UTF8') + '"' for x in entityIds]).replace("'", '')

    URL = generate_json_request(
        request_code='download',
        json_request_content='{' + \
                             '"datasetName": "' + datasetName + '",' + \
                             '"apiKey": "' + apiKey + '",' + \
                             '"node": "' + node + '",' + \
                             '"entityIds": ' + str(entityIds) + ',' + \
                             '"products": ' + products + \
                             '}'
    )

    # послать POST запрос
    answer = requests.post(URL)

    # проверка прошёл ли запрос
    check_answer = (answer.status_code, answer.reason)

    if check_answer != (200, 'OK'):
        print u'Ошибка! Сайт не доступен.', check_answer, u'Ссылка:'
        print URL[:150], '...'
        return None

    else:
        # если ошибка (авторизация провалена)
        if answer.json()['errorCode'] is not None:
            print u'Ошибка!', answer.json()['errorCode'] + ':', answer.json()['error']
        return answer.json()['data']

# делаем список уже закаченных файлов
def list_of_scenes_in_archive(archive_path):
    # archive_path = archive_path.decode('utf-8')
    loaded = []
    dirList = os.listdir(archive_path)
    for fname in dirList:
        fname = fname[:fname.rfind('.')]  # удаляются 4 последних знака (в моём случае это ".rar")
        loaded.append(fname)
    return loaded

# скачать и сохранить файл
def download_file(url, file_path):
    r = requests.get(url, timeout=120, stream=True)
    with open(file_path, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            f.write(chunk)
    return file_path

# тестировать файл формата tar.gz
def test_gz_archive(archive_path, GZIP_path):
    coding = 'cp1251'
    GZIP_path = GZIP_path.encode(coding)
    archive_path = archive_path.encode(coding)
    cmd_list = [GZIP_path, '-t', archive_path]
    # запуск консоли и выполнение команды в скрытном режиме с выводом
    subprocess.check_output(cmd_list, stderr=subprocess.STDOUT, shell=True)

def main(username, password, catalogId, datasets, WRS_2_Path_Row_list, WRS_1_Path_Row_list, archive_path, temp_dwnld_folder, FC_in_GDB):
    while True:

        try:
            print time.strftime("%d.%m.%Y %H:%M:%S"),
            global apiKey  # глобальная переменная!!!
            apiKey = login(username, password, catalogId)  # авторизация

            # список уже закаченных снимков
            loaded = list_of_scenes_in_archive(archive_path)
            print u"Всего файлов в локальном хранилище:", len(loaded)

            loaded_2 = find_scenes_in_DB(FC_in_GDB)
            loaded += loaded_2

            for dataset in datasets:
                # print '---------------------------'
                print u'Набор данных:', dataset

                if dataset in 'LANDSAT_MSS':
                    list_Path_Row = WRS_1_Path_Row_list
                else:
                    list_Path_Row = WRS_2_Path_Row_list

                # делаем запросы на каждый отдельный Path-Row, чтобы не превысить ограничение по длине URL
                for path_row in list_Path_Row:

                    entityIds = search_scenes(datasetName=dataset,
                                              list_Path_Row=[path_row],
                                              apiKey=apiKey,
                                              maxResults=50000)

                    # список сцен которых нет в архиве
                    dwnld_list = list(set(entityIds) - set(loaded))

                    if len(dwnld_list) > 0:
                        print time.strftime("%d.%m.%Y %H:%M:%S"),
                        print u'Path Row:', str(path_row[0]), str(path_row[1]) + '.',
                        print u'Всего снимков:', str(len(entityIds)) + '.',
                        print u'Новых:', str(len(dwnld_list)) + '.',
                        print u'Закачка файлов:'

                        # получаем сслыку на закачку для каждой отдельной сцены
                        for (scene_num, dwnld_scene) in enumerate(dwnld_list):

                            DICTs_list = get_download_list(dataset, [dwnld_scene], apiKey, node='EE',
                                                          products='["STANDARD"]')

                            # скачать снимки
                            for DICT in DICTs_list:
                                print time.strftime("%d.%m.%Y %H:%M:%S"), u'Cнимок:',
                                URL = DICT[u'url']

                                # формирование пути
                                scene_product_ID = URL[URL.rfind('/') + 1:URL.rfind('.tar.gz?')]   # название снимка по-новому
                                scene_ID = DICT[u'entityId'] # название снимка по-старому

                                scene_path = os.path.join(temp_dwnld_folder, scene_product_ID + u'.tar.gz')

                                print str(scene_num + 1), scene_product_ID

                                time_start = Time_now()  # текущее время (начало закачки)

                                try:
                                    ##                                    print
                                    ##                                    print '-'*10
                                    ##                                    print URL
                                    ##                                    print scene_path
                                    real_file_len = requests.head(URL).headers[
                                        'content-length']  # размер файла на сервере
                                    download_file(URL, scene_path)  # закачка
                                    dwnld_file_len = str(long(os.path.getsize(scene_path)))  # размер закаченного файла                                    

                                    if os.path.isfile(scene_path):
                                        if real_file_len != dwnld_file_len:                                            
                                            os.remove(scene_path)
                                            print u'<= DEL, файл не докачался'

                                    # print real_file_len
                                    # print dwnld_file_len
                                    # print real_file_len == dwnld_file_len
                                    # print '-'*10

                                    # проверка архива
                                    # test_gz_archive(scene_path, GZIP_path = ur'G:\Install\GZIP\gzip.exe')

                                    if scene_num + 1 != len(dwnld_list):  # вывести запятую
                                        print ',',

                                # если ошибка при закачке - удалить недокаченный файл
                                except:
                                    print u'<= DEL,',
                                    if os.path.isfile(scene_path):
                                        os.remove(scene_path)

                                    traceback.print_exc()  # напечатать ошибку

                                time_end = Time_now()  # текущее время (конец закачки)

                                # если закачка была менее или равно 60 минут назад
                                if Time_elapsed(time_start, time_end) <= 60 * 60:
                                    logout(apiKey)
                                    apiKey = login(username, password, catalogId)  # авторизация

                                elif Time_elapsed(time_start, time_end) > 60 * 60:
                                    apiKey = login(username, password, catalogId)  # авторизация

            # если авторизация была успешной - удалить токен авторизации
            if apiKey is not None: logout(apiKey)

        # если в процессе выполнения была ошибка
        except:
            try:
                logout(apiKey)  # удалить токен авторизации
            except:
                pass
            print
            traceback.print_exc()  # напечатать ошибку

        print time.strftime("%d.%m.%Y %H:%M:%S") + u' =========== Повтор через час ===========\n'
        time.sleep(3600)


if __name__ == "__main__":

    WRS_2_Path_Row_list=[
        [165, 14], [166, 14],
    ]

    WRS_1_Path_Row_list=[
        [182, 13], [176, 14],
    ]


    main(
        username='username',
        password='password',
        catalogId = "EE",  # возможные параметры catalogId: "EE" "GLOVIS" "HDDS" "LPCS"

        datasets=[
            'LANDSAT_8_C1',
##            'LANDSAT_8_PREWRS',
##            'LANDSAT_ETM_C1',
##            'LANDSAT_TM_C1',
##            'LANDSAT_MSS',
        ],
        WRS_2_Path_Row_list=WRS_2_Path_Row_list,
        WRS_1_Path_Row_list=WRS_1_Path_Row_list,
        archive_path=ur"S:\Landsat",
        temp_dwnld_folder=ur'G:\temp'
    )

    # LANDSAT_8_C1          Landsat 8 Operational Land Imager and Thermal Infrared Sensor Collection 1 Level-1
    # LANDSAT_8_PREWRS      Landsat 8 Operational Land Imager and Thermal Infrared Sensor Pre-WRS-2: 2013
    # LANDSAT_ETM_C1        Landsat 7 Enhanced Thematic Mapper Plus Collection 1 Level-1
    # LANDSAT_TM_C1         Landsat 4-5 Thematic Mapper Collection 1 Level-1
    # LANDSAT_MSS           Landsat 1-5 Multispectral Scanner: 1972-2013

— 同志チェ
ソース

セレンWebドライバーを使用して、Webサイトにアクセスしてナビゲートできます。BeatifulSoupを使用して、ダウンロードをこすり、特定することもできます。これらのパッケージを一緒に使用すると、問題が解決します。

https://www.seleniumhq.org/

https://www.crummy.com/software/BeautifulSoup/bs4/doc/

これらのパッケージを使用してWebデータのダウンロードを自動化する作業コードの例を次に示します。このスクリプトは、ニーズに合わせて変更できるはずです。

import time
import os
import urllib2
import re
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import Select
import requests

#Set Output
output = (r"C:\DataDownloads")

#Get current date
current_date = (time.strftime("%m%d%Y"))


# This script requires a web driver to run, and must be downloaded prior to executing the script
# For instance, chrome webdriver (https://sites.google.com/a/chromium.org/chromedriver/downloads), or webdriver for browser of choice.
# This scripts webdriver is currently set to the chrome browser. 


#                                      **Make sure the webdriver in your PATH or else the script will fail.**
driver = webdriver.Chrome()


##################################################################################################################
Automate website data download
##################################################################################################################

#Web address for the data site
url = "https://data-mrgis.opendata.arcgis.com/datasets/madison-county-parcels-live"

#initialize webdriver
driver.get(url)

# wait till the web page is fully loaded
time.sleep(8)

# make the dropdown options available for scraping
element = driver.find_element_by_id('download-button')
element.click()

# scrape the page in its current state and close browser
content = driver.page_source.encode('utf-8').strip()
driver.close()

# Use BeaurifulSoup to scrape the page data
soup = BeautifulSoup(content,"html.parser")

# Find everything with "li" (list) tags
li_tags = soup.find_all("li")

# create empty list
zip_tags = []

# get a list of li tags that have a zip file path in them
for n in li_tags:
    s = str(n.contents[0])

    if ".zip" in s:
        zip_tags.append(s)

#Find our download link using regex
for n in zip_tags:
    result = re.search('href="(.*)" id', n)
    dwnld_url = result.group(1)

#Check if the output directory exists. If not, than create new directory.
Idaho = os.path.join(output, "ID")
if not os.path.exists(Idaho):
    os.makedirs(Idaho)

#Create path for data export
Complete_Path = os.path.join(Idaho, "Madison_" + current_date + ".zip")

#Read and Write download to file output location
with open(Complete_Path, "wb") as Madison:
    ID_data = urllib2.urlopen(dwnld_url)
    ID_data_write = ID_data.read()
    print ("Downloading Data")

— Pdavis327
ソース

BDAアプリケーションを使用して、シーン全体をダウンロードします。しかしながら; 私は最近、Google Earth EngineのPython APIとモジュール（https://github.com/loicdtx/landsat-extract-gee）を使用しており、単一ピクセルの抽出に非常にうまく機能しますが、シーン全体も簡単に実行できます。設定が簡単で、きちんと文書化されています。「ホワイトリスト」承認プロトコルを必ず実行してください。そうしないと、403エラーが発生します。

— プラネットサンドマン
ソース