UnicodeEncodeError: 'latin-1' codec can't encode character

Question 1

What causes this error when I try to insert a foreign character into a database?

>>UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 0: ordinal not in range(256)

And how can I resolve it?

Thanks!

Question 2

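The character U+201C LEFT DOUBLE QUOTATION MARK does not exist in the Latin-1 (ISO-8859-1) encoding.

It does exist in code page 1252 (Western European). This is a Windows-specific encoding that is based on ISO-8859-1 but places extra characters in the range 0x80-0x9F. Code page 1252 is often confused with ISO-8859-1, and it is an annoying but now-standard web browser behaviour that if you serve your pages as ISO-8859-1, the browser treats them as cp1252 instead. However, they really are two distinct encodings: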

>>> u'He said \u201CHello\u201D'.encode('iso-8859-1')
UnicodeEncodeError
>>> u'He said \u201CHello\u201D'.encode('cp1252')
'He said \x93Hello\x94'

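If you are using your database only as a byte store, you can use cp1252 to encode “ and the other characters present in the Windows Western code page. But any other Unicode character that is not present in cp1252 will still cause an error.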
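As a rough illustration of that limitation (written for Python 3, whereas the session above is Python 2), the sketch below checks a few characters against both codecs. The helper name `can_encode` and the sample characters are assumptions chosen for this example.

```python
def can_encode(text, codec):
    """Return True if `text` can be represented in `codec` (hypothetical helper)."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Sample characters picked for illustration only.
samples = {
    "left double quote": "\u201c",  # present in cp1252, absent from Latin-1
    "e with acute": "\u00e9",       # present in both
    "CJK character": "\u4e2d",      # present in neither
}

for name, ch in samples.items():
    print(name, can_encode(ch, "iso-8859-1"), can_encode(ch, "cp1252"))
```

Switching to cp1252 only moves the boundary: the CJK sample fails under both codecs.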

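You can use encode(..., 'ignore') to suppress the errors by discarding the offending characters, but really, in this century you should be using UTF-8 in both your database and your pages. That encoding can represent any character. Ideally you should also tell MySQL that you are using UTF-8 strings (by setting the database connection character set and the collation on string columns), so that case-insensitive comparison and sorting work correctly.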
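To make the trade-off concrete, here is a minimal Python 3 sketch comparing the 'ignore' error handler with a UTF-8 round trip. The sample string is the one used earlier in this answer; the variable names are only illustrative.

```python
text = "He said \u201cHello\u201d"

# 'ignore' silently drops every character Latin-1 cannot represent,
# so both quotation marks are lost.
lossy = text.encode("iso-8859-1", "ignore")

# UTF-8 can represent every Unicode code point, so the round trip is exact.
utf8_bytes = text.encode("utf-8")
restored = utf8_bytes.decode("utf-8")

print(lossy)
print(utf8_bytes)
print(restored == text)
```

The 'ignore' result raises no error but cannot be turned back into the original text, which is why it only hides the problem.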

Question 3

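I ran into the same issue when using the Python MySQLdb module. Since MySQL lets you store just about any binary data you want in a text field regardless of character set, I found my solution here:

Using UTF8 with Python MySQLdb

Edit: quoting from the URL above to satisfy the request in the first comment...

"UnicodeEncodeError: 'latin-1' codec can't encode character ..."

This is because MySQLdb normally tries to encode everything to latin-1. It can be fixed by executing the following commands right after you have established the connection: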

db.set_character_set('utf8')
dbc.execute('SET NAMES utf8;')
dbc.execute('SET CHARACTER SET utf8;')
dbc.execute('SET character_set_connection=utf8;')

"db" is the result of MySQLdb.connect(), and "dbc" is the result of db.cursor().

Question 4

The best solution is:

1. Set MySQL's character set to 'utf-8'.
2. Do as this comment suggests (add use_unicode=True and charset="utf8"):

db = MySQLdb.connect(host="localhost", user="root", passwd="", db="testdb", use_unicode=True, charset="utf8") – KyungHoon Kim Mar 13 '14 at 17:04

For details, see:

class Connection(_mysql.connection):

    """MySQL Database Connection Object"""

    default_cursor = cursors.Cursor

    def __init__(self, *args, **kwargs):
        """

        Create a connection to the database. It is strongly recommended
        that you only use keyword parameters. Consult the MySQL C API
        documentation for more information.

        host
          string, host to connect

        user
          string, user to connect as

        passwd
          string, password to use

        db
          string, database to use

        port
          integer, TCP/IP port to connect to

        unix_socket
          string, location of unix_socket to use

        conv
          conversion dictionary, see MySQLdb.converters

        connect_timeout
          number of seconds to wait before the connection attempt
          fails.

        compress
          if set, compression is enabled

        named_pipe
          if set, a named pipe is used to connect (Windows only)

        init_command
          command which is run once the connection is created

        read_default_file
          file from which default client values are read

        read_default_group
          configuration group to use from the default file

        cursorclass
          class object, used to create cursors (keyword only)

        use_unicode
          If True, text-like columns are returned as unicode objects
          using the connection's character set.  Otherwise, text-like
          columns are returned as strings.  columns are returned as
          normal strings. Unicode objects will always be encoded to
          the connection's character set regardless of this setting.

        charset
          If supplied, the connection character set will be changed
          to this character set (MySQL-4.1 and newer). This implies
          use_unicode=True.

        sql_mode
          If supplied, the session SQL mode will be changed to this
          setting (MySQL-4.1 and newer). For more details and legal
          values, see the MySQL documentation.

        client_flag
          integer, flags to use or 0
          (see MySQL docs or constants/CLIENTS.py)

        ssl
          dictionary or mapping, contains SSL connection parameters;
          see the MySQL documentation for more details
          (mysql_ssl_set()).  If this is set, and the client does not
          support SSL, NotSupportedError will be raised.

        local_infile
          integer, non-zero enables LOAD LOCAL INFILE; zero disables

        autocommit
          If False (default), autocommit is disabled.
          If True, autocommit is enabled.
          If None, autocommit isn't set and server default is used.

        There are a number of undocumented, non-standard methods. See the
        documentation for the MySQL C API for some hints on what they do.

        """

Question 5

I hope your database is at least UTF-8. If so, you need to call yourstring.encode('utf-8') before you put the string into the database.

Question 6

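You are trying to store the Unicode code point \u201c using an encoding, ISO-8859-1 / Latin-1, that cannot represent that code point. You may need to alter the database to use utf-8 and store the string data with an appropriate encoding, or you may want to sanitise your inputs before storing the content, for example with something like Sam Ruby's excellent i18n guide. It discusses the problems that windows-1252 can cause, suggests how to handle them, and links to sample code.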
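One possible form of input sanitising is sketched below in Python 3. It is not taken from the guide mentioned above; the mapping table and the `sanitise` helper are assumptions for illustration, and the table covers only a handful of common typographic characters.

```python
# Hypothetical mapping: only a few common cp1252 punctuation characters are listed.
PUNCTUATION_TO_ASCII = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def sanitise(text):
    """Replace typographic punctuation with plain ASCII (illustrative helper)."""
    return text.translate(PUNCTUATION_TO_ASCII)


cleaned = sanitise("\u201cHello\u201d \u2013 it\u2019s fine\u2026")
print(cleaned)
# After sanitising, this sample encodes to Latin-1 without error.
print(cleaned.encode("iso-8859-1"))
```

Characters outside the table pass through unchanged, so this reduces the problem rather than removing it. Switching the database to UTF-8 avoids the need for it altogether.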

Question 7

SQLAlchemy users can simply declare the field with convert_unicode=True.

Example: sqlalchemy.String(1000, convert_unicode=True)

SQLAlchemy will then simply accept unicode objects and return them, handling the encoding itself.

Docs

Question 8

Latin-1 (also known as ISO 8859-1) is a single-octet character encoding scheme, and you cannot fit \u201c (“) into one byte.

Did you mean to use the UTF-8 encoding?
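A quick Python 3 check of the single-byte argument, as a minimal sketch (the variable name `ch` is arbitrary):

```python
ch = "\u201c"

print(ord(ch))                  # the code point as an integer
print(ord(ch) < 256)            # Latin-1 covers only code points 0..255
print(len(ch.encode("utf-8")))  # number of bytes UTF-8 uses for this character
```

The code point is 0x201C (8220 decimal), well above 255, and UTF-8 spends three bytes on it.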

Question 9

Use the snippet below to strip the accents from Latin-script text, leaving the plain unaccented letters:

import unicodedata
def strip_accents(text):
    return "".join(char for char in
                   unicodedata.normalize('NFKD', text)
                   if unicodedata.category(char) != 'Mn')

strip_accents('áéíñóúü')

Output:

'aeinouu'

Question 10

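Python: add # -*- coding: UTF-8 -*- as the first line of the Python file. Then append .encode('ascii', 'xmlcharrefreplace') to the text you want to encode. This replaces every non-ASCII character with its XML numeric character reference (for example, &#8220; for \u201c), so the result contains only ASCII.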
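A minimal Python 3 sketch of what the 'xmlcharrefreplace' error handler produces. The sample string is an assumption, and `html.unescape` from the standard library is used only to show that the references can be reversed.

```python
import html

text = "He said \u201cHello\u201d"

# Characters that ASCII cannot represent become numeric character references.
escaped = text.encode("ascii", "xmlcharrefreplace")
print(escaped)

# The references can be turned back into the original characters.
print(html.unescape(escaped.decode("ascii")) == text)
```

The output is only meaningful where such references are interpreted, for example in HTML or XML. Stored as-is in a database column, the text contains the literal sequence &#8220; rather than the quotation mark.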