Sample Data

By Column

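Square brackets on a DataFrame index by column.
Passing a single column name inside the brackets returns a Series.
Passing a list of column names (a second pair of brackets) returns a DataFrame.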

x = df['ticker']

type(x)
--------------------------
pandas.core.series.Series

x = df[['ticker']]

type(x)
----------------------------
pandas.core.frame.DataFrame
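
The difference is easy to check on a tiny frame. The sketch below uses a made-up stand-in for the sample data (the values are hypothetical, not the article's) and compares the shape of each result.

```python
import pandas as pd

# Hypothetical stand-in for the article's data; the values are made up.
df = pd.DataFrame({
    'ticker': ['AK홀딩스', '삼성전자', '현대건설'],
    '매출액(억원)': [100.0, 200.0, 300.0],
})

s = df['ticker']      # single label -> one-dimensional Series
f = df[['ticker']]    # list of labels -> two-dimensional DataFrame

print(type(s).__name__, s.shape)   # Series (3,)
print(type(f).__name__, f.shape)   # DataFrame (3, 1)
```

The list form is handy when later code expects a DataFrame even though only one column is selected.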

df.filter(like='(배)').head(3)

# Raw string so that \( and \) reach the regex engine as escaped parentheses
df.filter(regex=r'\([배원]\)').head(3)
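
A minimal sketch of how the two filters differ, assuming hypothetical column names such as 'PER(배)' and '주가(원)' that may not match the real data. `like` is a plain substring test on the column label, while `regex` is matched with a regular-expression search.

```python
import pandas as pd

# Hypothetical columns chosen only to exercise the two patterns.
df = pd.DataFrame({
    'ticker': ['A', 'B'],
    'PER(배)': [10.0, 12.0],
    'PBR(배)': [1.0, 1.5],
    '주가(원)': [5000, 7000],
    '매출액(억원)': [100.0, 200.0],
})

# Substring match: every label containing '(배)'
like_cols = df.filter(like='(배)').columns.tolist()

# Regex match: '(' followed by exactly one of 배/원, then ')'
regex_cols = df.filter(regex=r'\([배원]\)').columns.tolist()

print(like_cols)
print(regex_cols)
```

Note that '매출액(억원)' is not selected by the regex, because the character right after '(' is '억'.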

By Type

df.select_dtypes(include=['object']).head(3)
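
The same method also works in the other direction. As a small sketch on made-up data (the 'n_shares' column is hypothetical), `include='number'` keeps the numeric columns and `exclude='number'` keeps everything else.

```python
import pandas as pd

# Hypothetical frame: one text column, one float column, one integer column.
df = pd.DataFrame({
    'ticker': ['삼성전자', '현대건설'],
    '매출액(억원)': [100.0, 200.0],
    'n_shares': [10, 20],
})

numeric = df.select_dtypes(include='number')       # int and float columns
non_numeric = df.select_dtypes(exclude='number')   # the remaining columns

print(numeric.columns.tolist())
print(non_numeric.columns.tolist())
```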

By Row

df.set_index('ticker', inplace=True)
df.index.name = '종목명'

# iloc: select by integer position (rarely used)
df.iloc[[0, 1], 0:5]
df.iloc[[0, 1], [0, 3, 5]]

# loc: select by index label and column label
df.loc[['AK홀딩스', '삼성전자'], ['매출액(억원)', '순이익률(%)']]

df.sort_index(ascending=True, inplace=True)

# Once the index is sorted, range slicing works even when a boundary label such as '삼성' is not in the index
df.loc['삼성':'삼성생명']
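
A small sketch of this behavior on a hypothetical five-row index (the values are made up). On the sorted index the missing label '삼성' acts as a lower bound; on the unsorted one the same slice raises a KeyError.

```python
import pandas as pd

# Hypothetical data; only the index labels matter here.
unsorted_df = pd.DataFrame(
    {'매출액(억원)': [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=['현대건설', '삼성전자', 'AK홀딩스', '삼성생명', '삼성SDI'],
)
sorted_df = unsorted_df.sort_index()

# '삼성' is not a label, but the sorted index can locate where it would fall.
sliced = sorted_df.loc['삼성':'삼성생명']
print(sliced.index.tolist())

# Without sorting, a missing boundary label cannot be resolved.
try:
    unsorted_df.loc['삼성':'삼성생명']
    raised = False
except KeyError:
    raised = True
print(raised)
```

Both ends of a label slice are inclusive, so '삼성생명' itself is part of the result.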

By At

# The label 100 implies an integer index here, i.e. the frame before set_index('ticker')
%timeit df.loc[100, '순이익률(%)']
----------------------------------------------------------------------------
3.73 µs ± 4.38 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# df.at is more than twice as fast
%timeit df.at[100, '순이익률(%)']
------------------------------------------------------------------------------
1.73 µs ± 2.25 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
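
For reference, a minimal sketch (hypothetical values, integer index assumed) showing that `at` and `loc` return the same scalar for a single row and column label, and that `at` can also assign one cell.

```python
import pandas as pd

# Hypothetical frame with integer labels 100..102.
df = pd.DataFrame({'순이익률(%)': [5.0, 25.0, 12.5]}, index=[100, 101, 102])

via_loc = df.loc[100, '순이익률(%)']   # general label-based lookup
via_at = df.at[100, '순이익률(%)']     # scalar-only lookup
print(via_loc == via_at)

# at also supports setting a single value in place
df.at[101, '순이익률(%)'] = 30.0
print(df.loc[101, '순이익률(%)'])
```

`at` only accepts a single row label and a single column label; for lists or slices, `loc` is still needed.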

By Boolean

condition = df['순이익률(%)'] > 20

df[condition].head(3)

print(f'Number of companies with net margin above 20%: {condition.sum()}')
print(f'Share of companies with net margin above 20%: {condition.mean()*100:.1f}%')
-----------------------------------
Number of companies with net margin above 20%: 22
Share of companies with net margin above 20%: 3.2%

# A Boolean Series can also be passed to loc
df.loc[condition].head(3)
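
The counting trick works because True is treated as 1 and False as 0. A short sketch on four made-up values:

```python
import pandas as pd

# Hypothetical margins; two of the four exceed 20.
df = pd.DataFrame({'순이익률(%)': [5.0, 25.0, 12.5, 40.0]})

condition = df['순이익률(%)'] > 20   # Boolean Series aligned with df's index

count = int(condition.sum())   # number of True values
ratio = condition.mean()       # fraction of True values

# Plain brackets and loc select the same rows for a Boolean mask
same = df[condition].equals(df.loc[condition])

print(count, ratio, same)
```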

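# Examples of various conditions
# (df['ticker'] requires 'ticker' to be a regular column, i.e. the frame before set_index('ticker'))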
condition1 = df['순이익률(%)'] > 20
condition2 = df['ticker'] == '현대건설'
condition3 = df['ticker'].isin(['삼성전자', 'LG디스플레이'])
condition4 = df['매출액(억원)'].isin([1389.7075])
condition5 = df['ticker'].str.contains('LG')
condition6 = condition3 & condition5
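
Conditions combine element-wise with `&` (and), `|` (or) and `~` (not). The sketch below uses a hypothetical four-row frame; note the parentheses around the comparison, which are needed because `&` binds more tightly than `>`.

```python
import pandas as pd

# Hypothetical rows chosen so each combination selects something different.
df = pd.DataFrame({
    'ticker': ['삼성전자', 'LG디스플레이', 'LG전자', '현대건설'],
    '순이익률(%)': [25.0, 3.0, 22.0, 8.0],
})

in_list = df['ticker'].isin(['삼성전자', 'LG디스플레이'])
is_lg = df['ticker'].str.contains('LG')

both = in_list & is_lg      # in the list AND contains 'LG'
either = in_list | is_lg    # in the list OR contains 'LG'
neither = ~either           # negation of the OR
high_lg = (df['순이익률(%)'] > 20) & is_lg   # comparison wrapped in parentheses

for name, mask in [('both', both), ('either', either),
                   ('neither', neither), ('high_lg', high_lg)]:
    print(name, df.loc[mask, 'ticker'].tolist())
```

Python's `and` / `or` keywords do not work on a Series, since they ask for a single truth value rather than an element-wise result.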

# True only if every value satisfies the condition
(df['순이익률(%)'] > 20).all()
----------------------------------
False

# True if at least one value satisfies the condition
(df['순이익률(%)'] > 20).any()
----------------------------------
True