Comparing two columns using pandas

Question 1

Using this as a starting point:

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

Out[8]: 
  one  two three
0  10  1.2   4.2
1  15   70  0.03
2   8    5     0

I want to use something like an if statement within pandas.

if df['one'] >= df['two'] and df['one'] <= df['three']:
    df['que'] = df['one']

Basically, check each row via the if statement and create a new column.

The docs say to use .all, but there is no example...

Question 2

You could use np.where. If cond is a boolean array, and A and B are arrays, then

C = np.where(cond, A, B)

defines C to be equal to A where cond is True, and B where cond is False.
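As a quick illustration of that definition, here is a minimal sketch using small made-up arrays (the values of cond, A, and B below are illustrative and not taken from the question):

```python
import numpy as np

# Illustrative inputs, made up for this sketch
cond = np.array([True, False, True])
A = np.array([1, 2, 3])
B = np.array([10, 20, 30])

# Element-wise: take from A where cond is True, otherwise from B
C = np.where(cond, A, B)
print(C)
```

The same call works when A is a DataFrame column and B is a scalar such as np.nan.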

import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

df['que'] = np.where((df['one'] >= df['two']) & (df['one'] <= df['three'])
                     , df['one'], np.nan)

yields

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN

If you have more than one condition, you could use np.select instead. For example, if you want df['que'] to equal df['two'] when df['one'] < df['two'], then

conditions = [
    (df['one'] >= df['two']) & (df['one'] <= df['three']), 
    df['one'] < df['two']]

choices = [df['one'], df['two']]

df['que'] = np.select(conditions, choices, default=np.nan)

yields

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03   70
2   8    5     0  NaN

If we can assume that df['one'] >= df['two'] whenever df['one'] < df['two'] is False, then the conditions and choices can be simplified to

conditions = [
    df['one'] < df['two'],
    df['one'] <= df['three']]

choices = [df['two'], df['one']]

(The assumption may not hold if df['one'] or df['two'] contain NaNs.)
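The simplification relies on np.select using the first condition that is True for each row. A minimal sketch, assuming the same string-valued DataFrame as above, that runs both versions side by side (the names full and simplified are introduced here only for illustration):

```python
import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

# Full version: each condition is spelled out completely
full = np.select(
    [(df['one'] >= df['two']) & (df['one'] <= df['three']),
     df['one'] < df['two']],
    [df['one'], df['two']],
    default=np.nan)

# Simplified version: the second condition is only reached
# when the first one (one < two) is False for that row
simplified = np.select(
    [df['one'] < df['two'], df['one'] <= df['three']],
    [df['two'], df['one']],
    default=np.nan)

print(list(full))
print(list(simplified))
```

On this data both versions pick the same value for every row.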

Note that

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

defines a DataFrame with string values. Since they look numeric, you might be better off converting those strings to floats:

df2 = df.astype(float)

This changes the results, however, since strings are compared character by character, while floats are compared numerically.

In [61]: '10' <= '4.2'
Out[61]: True

In [62]: 10 <= 4.2
Out[62]: False
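To see the effect on the condition from the question, here is a small sketch that evaluates it on both the string DataFrame and its float conversion (cond_str and cond_num are names introduced just for this comparison):

```python
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
df2 = df.astype(float)

# Same condition, evaluated on strings and on floats
cond_str = (df['one'] >= df['two']) & (df['one'] <= df['three'])
cond_num = (df2['one'] >= df2['two']) & (df2['one'] <= df2['three'])

print(cond_str.tolist())  # character-by-character comparison
print(cond_num.tolist())  # numeric comparison
```

With strings, row 0 satisfies the condition because '10' <= '4.2'; with floats, no row does, since 10 is not less than or equal to 4.2.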

Question 3

You can use .equals on columns or on entire dataframes.

df['col1'].equals(df['col2'])

If they are equal, that statement returns True, otherwise False.
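Note that .equals returns a single boolean for the whole column, not a per-row result. A minimal sketch with made-up data (col1 and col2 hold illustrative values) contrasting it with the element-wise == operator:

```python
import numpy as np
import pandas as pd

# Illustrative data: identical columns, including a NaN in the same position
df = pd.DataFrame({'col1': [1.0, np.nan, 3.0],
                   'col2': [1.0, np.nan, 3.0]})

# One True/False for the whole column; NaNs in the same place count as equal
whole = df['col1'].equals(df['col2'])

# One True/False per row; NaN == NaN is False here
elementwise = df['col1'] == df['col2']

print(whole)
print(elementwise.tolist())
```

So .equals answers "are these two columns identical?", which is a different question from the row-by-row test in the original post.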

Question 4

You could use apply() and do something like this

df['que'] = df.apply(lambda x : x['one'] if x['one'] >= x['two'] and x['one'] <= x['three'] else "", axis=1)

or, if you prefer not to use a lambda

def que(x):
    if x['one'] >= x['two'] and x['one'] <= x['three']:
        return x['one']
    return ''
df['que'] = df.apply(que, axis=1)

Question 5

One way is to use a Boolean series to index the column df['one']. This gives you a new column where the True entries have the same value as the same row of df['one'] and the False entries are NaN.

The Boolean series is just given by your if statement (although it is necessary to use & instead of and):

>>> df['que'] = df['one'][(df['one'] >= df['two']) & (df['one'] <= df['three'])]
>>> df
  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN

If you want the NaN values to be replaced by other values, you can use the fillna method on the new column que. I've used 0 instead of the empty string here:

>>> df['que'] = df['que'].fillna(0)
>>> df
  one  two three que
0  10  1.2   4.2  10
1  15   70  0.03   0
2   8    5     0   0
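A closely related option, not used in the answer above, is Series.where, which keeps values where the mask is True and substitutes a replacement elsewhere in one step. A minimal sketch on the same data (the name mask and the choice of '' as the replacement are assumptions made for this example):

```python
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

mask = (df['one'] >= df['two']) & (df['one'] <= df['three'])

# Without a replacement, rows where mask is False become NaN
with_nan = df['one'].where(mask)

# With a replacement, those rows get the given value instead
df['que'] = df['one'].where(mask, '')
print(df)
```

This should give the same rows as the boolean indexing followed by fillna, with the replacement value passed directly.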

Question 6

Wrap each individual condition in parentheses, then use the & operator to combine the conditions:

df.loc[(df['one'] >= df['two']) & (df['one'] <= df['three']), 'que'] = df['one']

You can fill the non-matching rows by just using ~ (the "not" operator) to invert the match:

df.loc[~ ((df['one'] >= df['two']) & (df['one'] <= df['three'])), 'que'] = ''

You need to use & and ~ rather than and and not because the & and ~ operators work element by element.

The final result:

df
Out[8]: 
  one  two three que
0  10  1.2   4.2  10
1  15   70  0.03    
2   8    5     0
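To make the difference between and and & concrete, here is a small sketch on the same data. It shows that and fails on whole columns, because Python asks for a single truth value of the entire Series, while & and ~ produce one boolean per row (the variable names error and mask are introduced only for this illustration):

```python
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

error = None
try:
    # `and` needs one True/False for the whole Series, which is ambiguous
    (df['one'] >= df['two']) and (df['one'] <= df['three'])
except ValueError as exc:
    error = exc
    print(type(exc).__name__)

# `&` and `~` are applied row by row
mask = (df['one'] >= df['two']) & (df['one'] <= df['three'])
print(mask.tolist())
print((~mask).tolist())
```

This is the same error the if statement in the original question runs into.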

Question 7

Use np.select if you have multiple conditions to check against the dataframe and want to output a specific choice for each into a different column

# condition1, condition2 and default_value are placeholders
conditions = [(condition1), (condition2)]
choices = ["choice1", "choice2"]

df["new column"] = np.select(conditions, choices, default=default_value)

Note: the number of conditions and the number of choices must match. If two different conditions share the same choice, repeat that choice in the list.
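A runnable sketch of that pattern, using hypothetical data (the temp column, the labels, and the thresholds are all made up for illustration), where two different conditions map to the same choice and so the choice is repeated:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'temp': [-5, 12, 35]})

conditions = [df['temp'] < 0, df['temp'] > 30]
# One choice per condition; the same label is repeated on purpose
choices = ['extreme', 'extreme']

# Rows matching no condition get the default
df['label'] = np.select(conditions, choices, default='mild')
print(df)
```

Here conditions and choices both have length two, as np.select requires.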

Question 8

I think the closest to the OP's intuition is an inline if statement:

df['que'] = (df['one'] if ((df['one'] >= df['two']) and (df['one'] <= df['three']))