Posted 2021-04-30 | Updated 2021-04-30 | 8 hours read (About 70138 words)

Bayesian Optimization

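Bayesian optimization is a way to find good hyperparameters quickly by cutting down on unnecessary repeated hyperparameter searches.
There is no formalized method for choosing hyperparameters, so they usually have to be chosen from experience.

Why use it

With grid search or random search, every candidate hyperparameter value has to be applied to the model one by one and the resulting scores compared afterwards. Bayesian optimization instead searches for the best value within the parameter ranges set by the user.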
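To make the cost difference concrete, here is a small back-of-the-envelope sketch. It assumes a search space of 10 hyperparameters (the size of the `lgb_params` dictionary used later in this post) and a hypothetical grid of 5 candidate values per hyperparameter; the Bayesian optimization budget is the `init_points=5, n_iter=10` setting used later.

```python
# Assumption: 10 hyperparameters, 5 grid values each (the 5 is hypothetical).
n_params = 10
values_per_param = 5

# A full grid search trains one model per combination.
grid_size = values_per_param ** n_params

# Bayesian optimization budget used later in this post:
# 5 random initial points + 10 guided iterations.
init_points, n_iter = 5, 10
bo_budget = init_points + n_iter

print(f"grid search fits: {grid_size:,}")
print(f"Bayesian optimization fits: {bo_budget}")
```

The comparison is only about the number of model fits; it says nothing about which method finds the better score on a given problem.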

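Core components of Bayesian optimization

Surrogate model

Models the hidden relationship between the data collected so far and the evaluation metric.

Acquisition function

Uses the surrogate model to decide which point to explore next.

Bayesian optimization uses a surrogate model, which estimates generalization performance from the hyperparameter sets evaluated so far, together with an acquisition function to decide on the best combination.

The Bayesian optimization process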


lgb_params = {
    'num_leaves': (2, 50),
    'colsample_bytree':(0.1, 1), 
    'subsample': (0.1, 1),
    'max_depth': (1, 50),
    'reg_alpha': (0, 0.5),
    'reg_lambda': (0, 0.5), 
    'min_split_gain': (0.001, 0.1),
    'min_child_weight':(0, 50),
    'subsample_freq': (2, 50),
    'max_bin': (5,200),
}

First, a set of hyperparameter settings is taken as the X variable.


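from bayes_opt import BayesianOptimization
from lightgbm import LGBMClassifier
from sklearn.metrics import roc_auc_score

# bayes_x, bayes_y (training split) and bayes_x_test, bayes_y_test (validation split)
# are assumed to be defined earlier; they are not shown in this post.
def lgb_roc_eval(num_leaves, colsample_bytree, subsample, max_depth, reg_alpha, reg_lambda, min_split_gain, min_child_weight, subsample_freq, max_bin):
    
    params = {
        'learning_rate':0.01,
        'num_leaves': int(round(num_leaves)),   # the optimizer passes floats, so integer hyperparameters are cast to int
        'colsample_bytree': colsample_bytree, 
        'subsample': subsample,
        'max_depth': int(round(max_depth)),
        'reg_alpha': reg_alpha,
        'reg_lambda': reg_lambda, 
        'min_split_gain': min_split_gain,
        'min_child_weight': min_child_weight,
        'subsample_freq': int(round(subsample_freq)),
        'max_bin': int(round(max_bin)),
    }
    lgb_model = LGBMClassifier(**params)
    # Written for lightgbm 3.x; in lightgbm 4.x early stopping and logging are passed via callbacks instead.
    lgb_model.fit(bayes_x, bayes_y, eval_set=[(bayes_x_test, bayes_y_test)], early_stopping_rounds=100, eval_metric="auc", verbose=False)
    # Note: num_iteration=10 scores with only the first 10 boosting rounds;
    # omit it to predict with the best iteration found by early stopping.
    valid_proba = lgb_model.predict_proba(bayes_x_test, num_iteration=10)[:,1]
    roc_preds = roc_auc_score(bayes_y_test, valid_proba)
    
    return roc_preds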

The model's performance is then taken as the Y variable, which gives a dataset of (X, Y) pairs.


BO_lgb = BayesianOptimization(lgb_roc_eval, lgb_params, random_state=2121)


BO_lgb.maximize(init_points=5, n_iter=10)

BO_lgb.max


|   iter    |  target   | colsam... |  max_bin  | max_depth | min_ch... | min_sp... | num_le... | reg_alpha | reg_la... | subsample | subsam... |
-------------------------------------------------------------------------------------------------------------------------------------------------
|  1        |  0.8342   |  0.329    |  196.3    |  41.25    |  45.29    |  0.06231  |  28.93    |  0.1433   |  0.4287   |  0.3861   |  46.96    |
|  2        |  0.838    |  0.6284   |  15.83    |  39.32    |  44.95    |  0.04359  |  24.72    |  0.4127   |  0.000694 |  0.7192   |  13.98    |
|  3        |  0.8338   |  0.3291   |  72.7     |  38.96    |  25.69    |  0.06552  |  20.02    |  0.03046  |  0.3621   |  0.54     |  12.71    |
|  4        |  0.8277   |  0.2622   |  34.52    |  13.99    |  3.131    |  0.0492   |  11.02    |  0.1738   |  0.4319   |  0.308    |  17.25    |
|  5        |  0.8271   |  0.4439   |  138.9    |  9.848    |  48.94    |  0.02053  |  14.95    |  0.4408   |  0.213    |  0.8062   |  14.36    |
|  6        |  0.8363   |  0.6169   |  74.36    |  36.76    |  28.7     |  0.07251  |  19.8     |  0.2751   |  0.05854  |  0.3577   |  13.47    |
|  7        |  0.761    |  0.1915   |  13.88    |  33.33    |  44.99    |  0.06156  |  29.77    |  0.3761   |  0.4008   |  0.801    |  8.465    |
|  8        |  0.8006   |  0.1517   |  72.77    |  32.76    |  25.55    |  0.05765  |  22.96    |  0.03691  |  0.3594   |  0.9731   |  14.69    |
|  9        |  0.8348   |  0.9296   |  74.38    |  40.81    |  29.16    |  0.024    |  16.55    |  0.3314   |  0.1303   |  0.75     |  15.69    |
|  10       |  0.8271   |  0.4366   |  17.04    |  39.74    |  46.38    |  0.09246  |  23.49    |  0.1639   |  0.09529  |  0.518    |  19.89    |
|  11       |  0.8382   |  0.5201   |  16.37    |  39.06    |  48.15    |  0.05973  |  19.88    |  0.3476   |  0.06915  |  0.6018   |  13.36    |
|  12       |  0.7623   |  0.1818   |  75.99    |  41.25    |  32.32    |  0.04383  |  22.84    |  0.4678   |  0.4575   |  0.4314   |  12.78    |
|  13       |  0.8294   |  0.2367   |  15.62    |  40.93    |  41.88    |  0.09569  |  18.08    |  0.4144   |  0.2859   |  0.6626   |  11.77    |
|  14       |  0.7633   |  0.1896   |  197.7    |  41.72    |  41.81    |  0.0144   |  29.92    |  0.1247   |  0.1963   |  0.2601   |  43.21    |
|  15       |  0.8383   |  0.838    |  75.9     |  37.78    |  28.52    |  0.04416  |  15.26    |  0.3148   |  0.3394   |  0.736    |  14.43    |
=================================================================================================================================================
{'target': 0.8382516227577165,
 'params': {'colsample_bytree': 0.8380432818402,
  'max_bin': 75.89932895205534,
  'max_depth': 37.77537008850686,
  'min_child_weight': 28.524551188271897,
  'min_split_gain': 0.04415839847288688,
  'num_leaves': 15.259817309189941,
  'reg_alpha': 0.31476726664722326,
  'reg_lambda': 0.3394257380973246,
  'subsample': 0.7360076401454746,
  'subsample_freq': 14.425454096134617}}
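The optimizer returns every hyperparameter as a float, so the best result cannot be passed to LightGBM as-is. The sketch below rounds the integer-valued hyperparameters in the same way `lgb_roc_eval` does. The helper name `to_lgb_params` and the constant `INT_PARAMS` are hypothetical, introduced only for this illustration; the values are copied from the output above.

```python
# Best result as printed by BO_lgb.max above.
best = {
    "target": 0.8382516227577165,
    "params": {
        "colsample_bytree": 0.8380432818402,
        "max_bin": 75.89932895205534,
        "max_depth": 37.77537008850686,
        "min_child_weight": 28.524551188271897,
        "min_split_gain": 0.04415839847288688,
        "num_leaves": 15.259817309189941,
        "reg_alpha": 0.31476726664722326,
        "reg_lambda": 0.3394257380973246,
        "subsample": 0.7360076401454746,
        "subsample_freq": 14.425454096134617,
    },
}

# Hyperparameters that lgb_roc_eval casts with int(round(...)).
INT_PARAMS = ("num_leaves", "max_depth", "subsample_freq", "max_bin")


def to_lgb_params(raw_params, int_params=INT_PARAMS):
    """Round the integer-valued hyperparameters and leave the rest untouched."""
    return {
        name: int(round(value)) if name in int_params else value
        for name, value in raw_params.items()
    }


final_params = to_lgb_params(best["params"])
final_params["learning_rate"] = 0.01  # fixed value used inside lgb_roc_eval
print(final_params)
```

The resulting dictionary could then be unpacked into `LGBMClassifier(**final_params)` to refit a final model with the tuned settings.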

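The relationship between X and Y in this dataset serves as prior knowledge. The procedure then repeatedly finds the new data point (an X variable, that is, a hyperparameter combination) that is expected to give the best generalization performance and adds it to the dataset.

Two components appear in this process: the surrogate model and the acquisition function.

The surrogate model is a function that takes X as input and outputs a distribution describing the expected generalization performance and its uncertainty.
The acquisition function is the function that finds the most promising x value to evaluate next.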
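The loop described above can be sketched on a one-dimensional toy problem. This is a minimal illustration under stated assumptions, not the internals of any particular library: the surrogate is a scikit-learn Gaussian process, the acquisition function is an upper confidence bound (mean plus a multiple of the standard deviation), and `objective` is a hypothetical stand-in for "train a model and return its validation score". The factor 2.0 and the candidate grid are arbitrary choices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def objective(x):
    # Hypothetical stand-in for an expensive model evaluation; maximum at x = 2.
    return -(x - 2.0) ** 2 + 4.0


rng = np.random.default_rng(0)
bounds = (0.0, 5.0)
candidates = np.linspace(bounds[0], bounds[1], 501).reshape(-1, 1)

# Initial random evaluations (the role of init_points).
X = rng.uniform(bounds[0], bounds[1], size=(3, 1))
y = objective(X).ravel()

for _ in range(10):  # guided iterations (the role of n_iter)
    # Surrogate model: fit a GP to all (X, y) pairs observed so far.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
    gp.fit(X, y)

    # The surrogate returns a predicted score and an uncertainty for each candidate.
    mean, std = gp.predict(candidates, return_std=True)

    # Acquisition function: prefer points that look good or are still uncertain.
    ucb = mean + 2.0 * std
    x_next = candidates[np.argmax(ucb)].reshape(1, 1)

    # Evaluate the chosen point and add it to the dataset.
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

best_x = X[np.argmax(y), 0]
best_y = y.max()
print(f"best x: {best_x:.3f}, best score: {best_y:.3f}")
```

Each pass through the loop refits the surrogate with one more observation, which is why the search concentrates on promising regions instead of covering the whole range evenly.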

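Advantages

It can be efficient in terms of search time.

Disadvantages

As the number of hyperparameters grows, the dimensionality of the search space increases rapidly, and predicting performance becomes increasingly difficult.

Reference: https://www.kaggle.com/elon4773/titanic-visualization-bayesian-optimization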

Posted 2021-04-28 | Updated 2021-04-28 | 43 minutes read (About 6523 words)

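Modeling with PyCaret

Purpose of this note
Until now, I had submitted LightGBM and CatBoost models built with scikit-learn to Kaggle.
This time, I use PyCaret to tune the hyperparameters of LightGBM and CatBoost automatically and compare the result with the previous scores.

Result
The hyperparameters tuned with PyCaret's tune_model gave a higher Kaggle score than the ones I tuned by hand with scikit-learn.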

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

!pip install scikit-learn==0.23.2

Collecting scikit-learn==0.23.2
  Downloading scikit_learn-0.23.2-cp37-cp37m-manylinux1_x86_64.whl (6.8 MB)
Requirement already satisfied: scipy>=0.19.1 in /opt/conda/lib/python3.7/site-packages (from scikit-learn==0.23.2) (1.5.4)
Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn==0.23.2) (2.1.0)
Requirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.7/site-packages (from scikit-learn==0.23.2) (1.0.1)
Requirement already satisfied: numpy>=1.13.3 in /opt/conda/lib/python3.7/site-packages (from scikit-learn==0.23.2) (1.19.5)
Installing collected packages: scikit-learn
  Attempting uninstall: scikit-learn
    Found existing installation: scikit-learn 0.24.1
    Uninstalling scikit-learn-0.24.1:
      Successfully uninstalled scikit-learn-0.24.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pyldavis 3.3.1 requires numpy>=1.20.0, but you have numpy 1.19.5 which is incompatible.
pdpbox 0.2.1 requires matplotlib==3.1.1, but you have matplotlib 3.4.0 which is incompatible.
imbalanced-learn 0.8.0 requires scikit-learn>=0.24, but you have scikit-learn 0.23.2 which is incompatible.
Successfully installed scikit-learn-0.23.2

https://www.kaggle.com/udbhavpangotra/tps-apr21-eda-model

https://www.kaggle.com/hiro5299834/tps-apr-2021-voting-pseudo-labeling

Kaggle study

import pandas as pd
import numpy as np
import random
import os

from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split, KFold, StratifiedKFold

import lightgbm as lgb
import catboost as ctb
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier, export_graphviz

import graphviz
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.simplefilter('ignore')

TARGET = 'Survived'

N_ESTIMATORS = 1000
N_SPLITS = 10
SEED = 2021
EARLY_STOPPING_ROUNDS = 100
VERBOSE = 100

# Set the random seeds for reproducibility
def set_seed(seed=42):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    
set_seed(SEED)

Data preprocessing

Load data

train_df = pd.read_csv('../input/tabular-playground-series-apr-2021/train.csv')
test_df = pd.read_csv('../input/tabular-playground-series-apr-2021/test.csv')
submission = pd.read_csv('../input/tabular-playground-series-apr-2021/sample_submission.csv')
#test_df['Survived'] = pd.read_csv("../input/submission-merged3/submission_merged3.csv")['Survived']

all_df = pd.concat([train_df, test_df]).reset_index(drop=True)
# reset_index rebuilds the index as 0..n-1; drop=True discards the old index instead of keeping it as a column.
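The effect of `reset_index(drop=True)` after `pd.concat` is easier to see on a tiny example. The frames `a` and `b` below are hypothetical stand-ins for `train_df` and `test_df`.

```python
import pandas as pd

# Two small frames, each with its own 0..1 index (stand-ins for train and test).
a = pd.DataFrame({"v": [10, 20]})
b = pd.DataFrame({"v": [30, 40]})

# concat keeps the original labels, so the index contains duplicates.
stacked = pd.concat([a, b])
print(stacked.index.tolist())

# drop=True replaces the index with a fresh 0..n-1 range and discards the old labels.
clean = stacked.reset_index(drop=True)
print(clean.index.tolist())

# Without drop=True the old labels are kept as an extra column named "index".
kept = stacked.reset_index()
print(kept.columns.tolist())
```

A unique positional index matters later in the notebook, where the first 100,000 rows are sliced off again with `iloc` to recover the training set.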

print('Rows and Columns in train dataset:', train_df.shape)
print('Rows and Columns in test dataset:', test_df.shape)

Rows and Columns in train dataset: (100000, 12)
Rows and Columns in test dataset: (100000, 11)

Print the number of missing values

print('Missing values per columns in train dataset')
for col in train_df.columns:
    temp_col = train_df[col].isnull().sum()
    print(f'{col}: {temp_col}')
print()
print('Missing values per columns in test dataset')
for col in test_df.columns:
    temp_col = test_df[col].isnull().sum()
    print(f'{col}: {temp_col}')

Missing values per columns in train dataset
PassengerId: 0
Survived: 0
Pclass: 0
Name: 0
Sex: 0
Age: 3292
SibSp: 0
Parch: 0
Ticket: 4623
Fare: 134
Cabin: 67866
Embarked: 250

Missing values per columns in test dataset
PassengerId: 0
Pclass: 0
Name: 0
Sex: 0
Age: 3487
SibSp: 0
Parch: 0
Ticket: 5181
Fare: 133
Cabin: 70831
Embarked: 277

Filling missing values

# Fill missing ages with the mean age.
all_df['Age'] = all_df['Age'].fillna(all_df['Age'].mean())

# Cabin: fill missing values with 'X', then keep only the first character.
# strip() removes surrounding whitespace; since x[0] is already a single character, it changes little here.
all_df['Cabin'] = all_df['Cabin'].fillna('X').map(lambda x: x[0].strip())


#print(all_df['Ticket'].head(10))
# Ticket: fill missing values with 'X', split the string and take the first token.
# split() splits on whitespace by default; tickets with a text prefix separate it from the number with a space,
# so tickets with more than one token keep their prefix and all others become 'X'.
all_df['Ticket'] = all_df['Ticket'].fillna('X').map(lambda x:str(x).split()[0] if len(str(x).split()) > 1 else 'X')

# Build a dictionary of the median Fare for each Pclass.
fare_map = all_df[['Fare', 'Pclass']].dropna().groupby('Pclass').median().to_dict()
# Fill missing fares by mapping each row's Pclass to that class's median Fare.
all_df['Fare'] = all_df['Fare'].fillna(all_df['Pclass'].map(fare_map['Fare']))
# Some fares are unusually high or low, so take log(1 + Fare) to reduce the influence of outliers.
all_df['Fare'] = np.log1p(all_df['Fare'])


# Fill missing ports of embarkation with 'X'.
all_df['Embarked'] = all_df['Embarked'].fillna('X')

# Keep only the surname (the part before the comma).
all_df['Name'] = all_df['Name'].map(lambda x: x.split(',')[0])
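The same filling rules can be checked on a few hand-made rows. The `toy` frame and its values are hypothetical; only the transformations are taken from the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical rows covering the interesting cases: missing values,
# a ticket with a text prefix, and a ticket that is only a number.
toy = pd.DataFrame({
    "Pclass": [1, 1, 3, 3],
    "Fare": [100.0, np.nan, 8.0, 10.0],
    "Cabin": ["C123", np.nan, "E46", np.nan],
    "Ticket": ["PC 17599", "113803", np.nan, "A/5 21171"],
})

# Cabin: missing -> 'X', otherwise the first character.
toy["Cabin"] = toy["Cabin"].fillna("X").map(lambda x: x[0].strip())

# Ticket: keep the prefix when there is more than one token, otherwise 'X'.
toy["Ticket"] = toy["Ticket"].fillna("X").map(
    lambda x: str(x).split()[0] if len(str(x).split()) > 1 else "X"
)

# Fare: fill missing values with the median fare of the same Pclass.
fare_map = toy[["Fare", "Pclass"]].dropna().groupby("Pclass").median().to_dict()
toy["Fare"] = toy["Fare"].fillna(toy["Pclass"].map(fare_map["Fare"]))

print(toy)
```

Note that a ticket consisting only of a number ends up in the same 'X' bucket as a missing ticket, which is why 'X' dominates the ticket counts printed in the next cell.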

data_1=all_df.loc[all_df['Pclass']==1].groupby('Ticket')['Ticket'].count().sort_values(ascending=False)
print(data_1)
print()
data_2=all_df.loc[all_df['Pclass']==2].groupby('Ticket')['Ticket'].count().sort_values(ascending=False)
print(data_2)
print()
data_3=all_df.loc[all_df['Pclass']==3].groupby('Ticket')['Ticket'].count().sort_values(ascending=False)
print(data_3)
print()

Ticket
X             36336
PC            16814
C.A.            338
SC/Paris        334
SC/PARIS        260
W./C.           206
S.O.C.          192
S.C./PARIS      191
PP              186
F.C.            183
SC/AH           178
F.C.C.          167
STON/O          163
CA.             161
SOTON/O.Q.      123
A/4             115
A/5.            108
W.E.P.           94
WE/P             92
SOTON/OQ         87
CA               81
STON/O2.         81
A/5              70
C                67
A/4.             66
P/PP             66
SC               59
SOTON/O2         48
A./5.            46
S.O./P.P.        40
A.5.             33
AQ/4             27
A/S              23
SCO/W            19
S.P.             17
SC/A4            16
SW/PP            16
SC/A.3           15
S.O.P.           15
C.A./SOTON       14
A.               14
SO/C             14
S.C./A.4.        14
STON/OQ.         13
W/C              13
LP               11
S.W./PP          11
AQ/3.             8
Fa                7
A4.               6
Name: Ticket, dtype: int64

Ticket
X             31337
A.              997
C.A.            717
SC/PARIS        470
STON/O          387
PC              330
S.O.C.          313
PP              308
SC/AH           284
W./C.           259
SOTON/O.Q.      219
F.C.C.          203
A/5.            200
A/4             152
SC/Paris        135
S.C./PARIS      119
SOTON/O2        112
CA.             107
STON/O2.        106
C               104
F.C.            100
WE/P             92
SOTON/OQ         86
A/5              82
CA               66
W.E.P.           60
A./5.            60
S.O./P.P.        54
P/PP             50
A/4.             46
SCO/W            36
SC               33
A.5.             29
AQ/4             29
LP               25
SC/A.3           20
C.A./SOTON       19
A/S              19
SC/A4            17
Fa               15
S.W./PP          13
SO/C             13
S.C./A.4.        13
STON/OQ.         12
W/C              11
S.P.             10
SW/PP             9
S.O.P.            9
A4.               7
AQ/3.             6
Name: Ticket, dtype: int64

Ticket
X             84781
A.             6420
C.A.           2615
STON/O         1508
A/5.            918
SOTON/O.Q.      719
PP              679
SC/PARIS        642
W./C.           623
PC              595
F.C.C.          541
A/5             420
CA.             368
STON/O2.        363
SC/AH           331
A/4             268
SOTON/O2        264
S.O.C.          231
C               227
SC/Paris        177
S.O./P.P.       177
SOTON/OQ        172
CA              172
W.E.P.          154
F.C.            131
S.C./PARIS      127
A./5.           122
WE/P            121
SC              106
A/4.            104
SCO/W            74
A.5.             72
P/PP             68
SC/A4            67
AQ/4             56
LP               41
Fa               37
STON/OQ.         37
S.W./PP          32
SC/A.3           31
C.A./SOTON       31
SW/PP            30
A/S              28
SO/C             28
AQ/3.            26
S.P.             24
S.C./A.4.        23
S.O.P.           21
W/C              20
A4.              20
Name: Ticket, dtype: int64

Encoding

Each variable is encoded differently depending on its type.

label_cols = ['Name', 'Ticket', 'Sex','Pclass','Embarked']
onehot_cols = [ 'Cabin',]
numerical_cols = [ 'Age', 'SibSp', 'Parch', 'Fare']

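# Label-encoding helper: takes a column c and returns it fit-transformed to integer codes.
def label_encoder(c):
    le = LabelEncoder()
    return le.fit_transform(c)


# StandardScaler(): removes the mean and scales the data to unit variance.
# Outliers affect the mean and standard deviation, so the spread of the transformed data can change considerably when they are present.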
scaler = StandardScaler()

onehot_encoded_df = pd.get_dummies(all_df[onehot_cols])
label_encoded_df = all_df[label_cols].apply(label_encoder)
numerical_df = pd.DataFrame(scaler.fit_transform(all_df[numerical_cols]), columns=numerical_cols)
target_df = all_df[TARGET]

all_df = pd.concat([numerical_df, label_encoded_df,onehot_encoded_df, target_df], axis=1)
#all_df = pd.concat([numerical_df, label_encoded_df, target_df], axis=1)
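A small example shows what each of the three encodings produces. The `toy` frame is hypothetical; the calls mirror the ones used above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical rows with one column per encoding type.
toy = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Cabin": ["C", "X", "E"],
    "Age": [20.0, 30.0, 40.0],
})

# Label encoding: each category becomes an integer code (classes are sorted).
label_encoded = toy[["Sex"]].apply(lambda c: LabelEncoder().fit_transform(c))

# One-hot encoding: one indicator column per category.
onehot_encoded = pd.get_dummies(toy[["Cabin"]])

# Standard scaling: subtract the mean, divide by the standard deviation.
scaled = pd.DataFrame(StandardScaler().fit_transform(toy[["Age"]]), columns=["Age"])

print(pd.concat([scaled, label_encoded, onehot_encoded], axis=1))
```

Label encoding keeps a single column per feature, which suits high-cardinality columns such as Name and Ticket, while one-hot encoding adds a column per category and is used here only for Cabin.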

Modeling

drop_list=['Survived','Parch']

Without pseudo-labeling

train = all_df.iloc[:100000, :]  # first 100,000 rows (train)
test = all_df.iloc[100000:, :]   # remaining rows (test)
# iloc indexes by integer position
test = test.drop('Survived', axis=1)  # drop the target column from the test set
model_results = pd.DataFrame()
folds = 5

y = train.loc[:, 'Survived']
X = train.drop(drop_list, axis=1)

PyCaret

caret_train=train.drop('Parch',axis=1)


!pip install pycaret==2.2.3

Requirement already satisfied: nest-asyncio in /opt/conda/lib/python3.7/site-packages (from nbclient<0.6.0,>=0.5.0->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets->pycaret==2.2.3) (1.4.3)
Requirement already satisfied: async-generator in /opt/conda/lib/python3.7/site-packages (from nbclient<0.6.0,>=0.5.0->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets->pycaret==2.2.3) (1.10)
Requirement already satisfied: webencodings in /opt/conda/lib/python3.7/site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets->pycaret==2.2.3) (0.5.1)
Requirement already satisfied: packaging in /opt/conda/lib/python3.7/site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets->pycaret==2.2.3) (20.9)
Requirement already satisfied: prometheus_client in /opt/conda/lib/python3.7/site-packages (from prometheus-flask-exporter->mlflow->pycaret==2.2.3) (0.9.0)
Requirement already satisfied: future in /opt/conda/lib/python3.7/site-packages (from pyLDAvis->pycaret==2.2.3) (0.18.2)
Requirement already satisfied: funcy in /opt/conda/lib/python3.7/site-packages (from pyLDAvis->pycaret==2.2.3) (1.15)
Requirement already satisfied: numexpr in /opt/conda/lib/python3.7/site-packages (from pyLDAvis->pycaret==2.2.3) (2.7.3)
Requirement already satisfied: sklearn in /opt/conda/lib/python3.7/site-packages (from pyLDAvis->pycaret==2.2.3) (0.0)
Collecting pyLDAvis
  Downloading pyLDAvis-3.3.0.tar.gz (1.7 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
    Preparing wheel metadata ... done
  Downloading pyLDAvis-3.2.2.tar.gz (1.7 MB)
Requirement already satisfied: statsmodels in /opt/conda/lib/python3.7/site-packages (from pyod->pycaret==2.2.3) (0.12.2)
Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /opt/conda/lib/python3.7/site-packages (from spacy->pycaret==2.2.3) (1.0.0)
Requirement already satisfied: plac<1.2.0,>=0.9.6 in /opt/conda/lib/python3.7/site-packages (from spacy->pycaret==2.2.3) (1.1.3)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from spacy->pycaret==2.2.3) (2.0.5)
Requirement already satisfied: blis<0.8.0,>=0.4.0 in /opt/conda/lib/python3.7/site-packages (from spacy->pycaret==2.2.3) (0.7.4)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /opt/conda/lib/python3.7/site-packages (from spacy->pycaret==2.2.3) (1.0.5)
Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /opt/conda/lib/python3.7/site-packages (from spacy->pycaret==2.2.3) (0.8.2)
Requirement already satisfied: thinc<7.5.0,>=7.4.1 in /opt/conda/lib/python3.7/site-packages (from spacy->pycaret==2.2.3) (7.4.5)
Requirement already satisfied: srsly<1.1.0,>=1.0.2 in /opt/conda/lib/python3.7/site-packages (from spacy->pycaret==2.2.3) (1.0.5)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /opt/conda/lib/python3.7/site-packages (from spacy->pycaret==2.2.3) (3.0.5)
Requirement already satisfied: patsy>=0.5 in /opt/conda/lib/python3.7/site-packages (from statsmodels->pyod->pycaret==2.2.3) (0.5.1)
Requirement already satisfied: pynndescent>=0.5 in /opt/conda/lib/python3.7/site-packages (from umap-learn->pycaret==2.2.3) (0.5.2)
Building wheels for collected packages: alembic, databricks-cli, gunicorn, prometheus-flask-exporter, pyLDAvis, pyod
  Building wheel for alembic (setup.py) ... done
  Created wheel for alembic: filename=alembic-1.4.1-py2.py3-none-any.whl size=158155 sha256=d8f392ff192a5f199c46190c11c01ebd9ee89c41e962f8c53ca200a8f722820d
  Stored in directory: /root/.cache/pip/wheels/be/5d/0a/9e13f53f4f5dfb67cd8d245bb7cdffe12f135846f491a283e3
  Building wheel for databricks-cli (setup.py) ... done
  Created wheel for databricks-cli: filename=databricks_cli-0.14.3-py3-none-any.whl size=100555 sha256=9a7dcf0f0828c4d66f92d9a9aa6a9648bf13c7e106e6f2a319b7f4e0f8b869d3
  Stored in directory: /root/.cache/pip/wheels/3b/60/14/6930445b08959fbdf4e3029bac7e1f2cccb2e94df8afa00b29
  Building wheel for gunicorn (setup.py) ... done
  Created wheel for gunicorn: filename=gunicorn-20.1.0-py3-none-any.whl size=78917 sha256=27823621640949d8ef7678db5b24d401299123365598476d18f7734668b6a362
  Stored in directory: /root/.cache/pip/wheels/48/64/50/67e9a3524590218acb6a0c0f94038c0d60815866c52a667d57
  Building wheel for prometheus-flask-exporter (setup.py) ... done
  Created wheel for prometheus-flask-exporter: filename=prometheus_flask_exporter-0.18.1-py3-none-any.whl size=17158 sha256=b30adb8b36411d5ac696a62d1f72fd5f3f0a6071d76737d1901d929229e1120e
  Stored in directory: /root/.cache/pip/wheels/c4/b6/b5/e76659f3b2a3a226565e27f0a7eb7a3ac93c3f4d68acfbe617
  Building wheel for pyLDAvis (setup.py) ... done
  Created wheel for pyLDAvis: filename=pyLDAvis-3.2.2-py2.py3-none-any.whl size=135593 sha256=f2c6a61232597db4377bb34194186a0c15694c4ec4bf0451684c68495c1ac812
  Stored in directory: /root/.cache/pip/wheels/f8/b1/9b/560ac1931796b7303f7b517b949d2d31a4fbc512aad3b9f284
  Building wheel for pyod (setup.py) ... done
  Created wheel for pyod: filename=pyod-0.8.8-py3-none-any.whl size=116965 sha256=4bff0abc63f27aaf26ccb9c6f4ee9af5c336d94730dd88558cd8bc79aadc58af
  Stored in directory: /root/.cache/pip/wheels/77/59/4c/18e7ef198e2c737674b0bd8b6fa0fb1163c83ecc4e622fbda4
Successfully built alembic databricks-cli gunicorn prometheus-flask-exporter pyLDAvis pyod
Installing collected packages: querystring-parser, prometheus-flask-exporter, gunicorn, databricks-cli, alembic, pyod, pyLDAvis, mlflow, imbalanced-learn, pycaret
  Attempting uninstall: alembic
    Found existing installation: alembic 1.5.8
    Uninstalling alembic-1.5.8:
      Successfully uninstalled alembic-1.5.8
  Attempting uninstall: pyLDAvis
    Found existing installation: pyLDAvis 3.3.1
    Uninstalling pyLDAvis-3.3.1:
      Successfully uninstalled pyLDAvis-3.3.1
  Attempting uninstall: imbalanced-learn
    Found existing installation: imbalanced-learn 0.8.0
    Uninstalling imbalanced-learn-0.8.0:
      Successfully uninstalled imbalanced-learn-0.8.0
Successfully installed alembic-1.4.1 databricks-cli-0.14.3 gunicorn-20.1.0 imbalanced-learn-0.7.0 mlflow-1.16.0 prometheus-flask-exporter-0.18.1 pyLDAvis-3.2.2 pycaret-2.2.3 pyod-0.8.8 querystring-parser-1.2.4

from pycaret.utils import version
import sklearn
print("pycaret version:", version())
print("sklearn version:", sklearn.__version__)

pycaret version: 2.2.3
sklearn version: 0.23.2

import pycaret.classification 
# from pycaret.regression import *
# reg1 = setup(data = train,target = 'Survived')
from pycaret.classification import *

caret_train.loc[:,'Embarked'].unique()
caret_train.loc[:,'Pclass'].unique()
caret_train.loc[:,'Sex'].unique()

array([1, 0])

category_caret={'Sex':['0','1'],'Pclass':['0','1','2'], 'Embarked':['0','1','2','3']}
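
The level lists in category_caret are typed by hand from the unique() output. As a rough sketch, the same dictionary could be derived from the data instead. The helper name build_ordinal_levels and the toy columns below are hypothetical, and the sketch assumes the levels should be listed as strings in ascending order, as in the hand-written dictionary; float-coded columns would come out as '0.0', '1.0', and so on.

```python
from typing import Dict, Iterable, List


def build_ordinal_levels(columns: Dict[str, Iterable]) -> Dict[str, List[str]]:
    """Map each column name to its sorted unique values, rendered as strings."""
    levels = {}
    for name, values in columns.items():
        unique_sorted = sorted(set(values))          # ascending order of the codes
        levels[name] = [str(v) for v in unique_sorted]
    return levels


# Toy stand-in for the encoded columns (hypothetical values). With the real
# DataFrame one could pass
# {c: caret_train[c] for c in ["Sex", "Pclass", "Embarked"]} instead.
toy_columns = {
    "Sex": [1, 0, 0, 1],
    "Pclass": [2, 0, 1, 1],
    "Embarked": [2, 2, 0, 3, 1],
}
category_caret_sketch = build_ordinal_levels(toy_columns)
print(category_caret_sketch)
```

Deriving the levels this way avoids listing a level that does not occur in the data, but the ascending order is only meaningful if the integer codes themselves encode the intended ranking.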

setup(data = caret_train, 
      target = 'Survived',
      ordinal_features= category_caret,
      #numeric_imputation = 'Age','SibSp','Name','Ticket','Fare',
      fold=5,
      silent = True,
      session_id=1,
      #data_split_shuffle=True
      fold_shuffle=True

     )

 0  session_id                               1
 1  Target                                   Survived
 2  Target Type                              Binary
 3  Label Encoded                            0.0: 0, 1.0: 1
 4  Original Data                            (100000, 18)
 5  Missing Values                           False
 6  Numeric Features                         14
 7  Categorical Features                     3
 8  Ordinal Features                         True
 9  High Cardinality Features                False
10  High Cardinality Method                  None
11  Transformed Train Set                    (69999, 17)
12  Transformed Test Set                     (30001, 17)
13  Shuffle Train-Test                       True
14  Stratify Train-Test                      False
15  Fold Generator                           StratifiedKFold
16  Fold Number                              5
17  CPU Jobs                                 -1
18  Use GPU                                  False
19  Log Experiment                           False
20  Experiment Name                          clf-default-name
21  USI                                      9220
22  Imputation Type                          simple
23  Iterative Imputation Iteration           None
24  Numeric Imputer                          mean
25  Iterative Imputation Numeric Model       None
26  Categorical Imputer                      constant
27  Iterative Imputation Categorical Model   None
28  Unknown Categoricals Handling            least_frequent
29  Normalize                                False
30  Normalize Method                         None
31  Transformation                           False
32  Transformation Method                    None
33  PCA                                      False
34  PCA Method                               None
35  PCA Components                           None
36  Ignore Low Variance                      False
37  Combine Rare Levels                      False
38  Rare Level Threshold                     None
39  Numeric Binning                          False
40  Remove Outliers                          False
41  Outliers Threshold                       None
42  Remove Multicollinearity                 False
43  Multicollinearity Threshold              None
44  Clustering                               False
45  Clustering Iteration                     None
46  Polynomial Features                      False
47  Polynomial Degree                        None
48  Trignometry Features                     False
49  Polynomial Threshold                     None
50  Group Features                           False
51  Feature Selection                        False
52  Features Selection Threshold             None
53  Feature Interaction                      False
54  Feature Ratio                            False
55  Interaction Threshold                    None
56  Fix Imbalance                            False
57  Fix Imbalance Method                     SMOTE
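
The setup summary above reports a Transformed Train Set of 69999 rows and a Transformed Test Set of 30001 rows out of 100000, rather than an even 70000 / 30000. A plausible explanation, sketched below, is floating-point rounding. The sketch assumes the split uses PyCaret's default train_size of 0.7, that the test fraction is computed as 1 - train_size, and that the test count is rounded up, as scikit-learn's train_test_split does for a float test_size. The variable names are illustrative only.

```python
import math

n_samples = 100_000              # rows reported as "Original Data"
train_size = 0.7                 # assumed: default train_size, not set in the setup() call
test_fraction = 1 - train_size   # slightly above 0.3 in binary floating point

# Assumed rounding rule: the test count is rounded up, the rest goes to train.
n_test = math.ceil(test_fraction * n_samples)
n_train = n_samples - n_test

print(test_fraction, n_test, n_train)
```

Under these assumptions the product lands a hair above 30000, so rounding up gives one extra test row, which matches the shapes in the summary.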





(1,
 {'lr': <pycaret.containers.models.classification.LogisticRegressionClassifierContainer at 0x7fdbc0f458d0>,
  'knn': <pycaret.containers.models.classification.KNeighborsClassifierContainer at 0x7fdbc0f45950>,
  'nb': <pycaret.containers.models.classification.GaussianNBClassifierContainer at 0x7fdbc0f45750>,
  'dt': <pycaret.containers.models.classification.DecisionTreeClassifierContainer at 0x7fdbc0f45790>,
  'svm': <pycaret.containers.models.classification.SGDClassifierContainer at 0x7fdbd7c5a2d0>,
  'rbfsvm': <pycaret.containers.models.classification.SVCClassifierContainer at 0x7fdbc188d650>,
  'gpc': <pycaret.containers.models.classification.GaussianProcessClassifierContainer at 0x7fdbc188d8d0>,
  'mlp': <pycaret.containers.models.classification.MLPClassifierContainer at 0x7fdbc188d490>,
  'ridge': <pycaret.containers.models.classification.RidgeClassifierContainer at 0x7fdbc188d810>,
  'rf': <pycaret.containers.models.classification.RandomForestClassifierContainer at 0x7fdbc0f45c90>,
  'qda': <pycaret.containers.models.classification.QuadraticDiscriminantAnalysisContainer at 0x7fdbc0857350>,
  'ada': <pycaret.containers.models.classification.AdaBoostClassifierContainer at 0x7fdbc0857410>,
  'gbc': <pycaret.containers.models.classification.GradientBoostingClassifierContainer at 0x7fdbc076e2d0>,
  'lda': <pycaret.containers.models.classification.LinearDiscriminantAnalysisContainer at 0x7fdbc076e250>,
  'et': <pycaret.containers.models.classification.ExtraTreesClassifierContainer at 0x7fdbc07621d0>,
  'xgboost': <pycaret.containers.models.classification.XGBClassifierContainer at 0x7fdbc07620d0>,
  'lightgbm': <pycaret.containers.models.classification.LGBMClassifierContainer at 0x7fdbc0762850>,
  'catboost': <pycaret.containers.models.classification.CatBoostClassifierContainer at 0x7fdbc0762890>},
 False,
 False,
 'clf-default-name',
 True,
 [<pandas.io.formats.style.Styler at 0x7fdbc0f33cd0>],
 {'acc': <pycaret.containers.metrics.classification.AccuracyMetricContainer at 0x7fdbc0f33590>,
  'auc': <pycaret.containers.metrics.classification.ROCAUCMetricContainer at 0x7fdbc0f335d0>,
  'recall': <pycaret.containers.metrics.classification.RecallMetricContainer at 0x7fdbc0f33650>,
  'precision': <pycaret.containers.metrics.classification.PrecisionMetricContainer at 0x7fdbc0f337d0>,
  'f1': <pycaret.containers.metrics.classification.F1MetricContainer at 0x7fdbc0f33950>,
  'kappa': <pycaret.containers.metrics.classification.KappaMetricContainer at 0x7fdbc0f33b10>,
  'mcc': <pycaret.containers.metrics.classification.MCCMetricContainer at 0x7fdbc0f33b90>},
 StratifiedKFold(n_splits=5, random_state=1, shuffle=True),
 {'USI',
  'X',
  'X_test',
  'X_train',
  '_all_metrics',
  '_all_models',
  '_all_models_internal',
  '_available_plots',
  '_gpu_n_jobs_param',
  '_internal_pipeline',
  '_ml_usecase',
  'create_model_container',
  'data_before_preprocess',
  'display_container',
  'exp_name_log',
  'experiment__',
  'fix_imbalance_method_param',
  'fix_imbalance_param',
  'fold_generator',
  'fold_groups_param',
  'fold_param',
  'fold_shuffle_param',
  'gpu_param',
  'html_param',
  'imputation_classifier',
  'imputation_regressor',
  'iterative_imputation_iters_param',
  'log_plots_param',
  'logging_param',
  'master_model_container',
  'n_jobs_param',
  'prep_pipe',
  'pycaret_globals',
  'seed',
  'stratify_param',
  'target_param',
  'transform_target_method_param',
  'transform_target_param',
  'y',
  'y_test',
  'y_train'},
 'Survived',
 True,
 -1,
             Age     SibSp      Fare     Name  Ticket  Sex  Pclass  Embarked  \
 62017 -1.786066 -0.539572 -0.425227  23058.0    38.0  0.0     1.0       2.0   
 5005   0.274926 -0.539572  0.041264   9979.0    49.0  1.0     1.0       2.0   
 56849 -1.361744 -0.539572  0.215883  14486.0    49.0  0.0     1.0       2.0   
 42434  1.426657 -0.539572  1.209948   8350.0    21.0  0.0     0.0       0.0   
 54712 -1.725448 -0.539572 -0.906172  18642.0     0.0  0.0     2.0       2.0   
 ...         ...       ...       ...      ...     ...  ...     ...       ...   
 50057  1.608510 -0.539572  1.938114   4224.0    49.0  1.0     0.0       2.0   
 98047 -0.149396  0.680848 -0.914255  21294.0    49.0  1.0     2.0       2.0   
 5192  -0.634335 -0.539572 -1.037225  24329.0    49.0  1.0     1.0       2.0   
 77708  0.820483 -0.539572  2.156857   5150.0    49.0  0.0     0.0       0.0   
 98539  0.396161 -0.539572  2.729733  25222.0    49.0  1.0     0.0       1.0   
 
        Cabin_A  Cabin_B  Cabin_C  Cabin_D  Cabin_E  Cabin_F  Cabin_G  Cabin_T  \
 62017        0        0        0        0        0        0        0        0   
 5005         0        0        0        0        0        0        0        0   
 56849        0        0        0        0        0        0        0        0   
 42434        0        0        1        0        0        0        0        0   
 54712        0        0        0        0        0        0        0        0   
 ...        ...      ...      ...      ...      ...      ...      ...      ...   
 50057        1        0        0        0        0        0        0        0   
 98047        0        0        0        0        0        0        0        0   
 5192         0        0        0        0        0        0        0        0   
 77708        0        1        0        0        0        0        0        0   
 98539        1        0        0        0        0        0        0        0   
 
        Cabin_X  
 62017        1  
 5005         1  
 56849        1  
 42434        0  
 54712        1  
 ...        ...  
 50057        0  
 98047        1  
 5192         1  
 77708        0  
 98539        0  
 
 [69999 rows x 17 columns],
 Pipeline(memory=None, steps=[('empty_step', 'passthrough')], verbose=False),
 'lightgbm',
 False,
 Pipeline(memory=None,
          steps=[('dtypes',
                  DataTypes_Auto_infer(categorical_features=[],
                                       display_types=False, features_todrop=[],
                                       id_columns=[],
                                       ml_usecase='classification',
                                       numerical_features=[], target='Survived',
                                       time_features=[])),
                 ('imputer',
                  Simple_Imputer(categorical_strategy='not_available',
                                 fill_value_categorical=None,
                                 fill_value_numerical=None,
                                 numeric_st...
                 ('scaling', 'passthrough'), ('P_transform', 'passthrough'),
                 ('binn', 'passthrough'), ('rem_outliers', 'passthrough'),
                 ('cluster_all', 'passthrough'),
                 ('dummy', Dummify(target='Survived')),
                 ('fix_perfect', Remove_100(target='Survived')),
                 ('clean_names', Clean_Colum_Names()),
                 ('feature_select', 'passthrough'), ('fix_multi', 'passthrough'),
                 ('dfs', 'passthrough'), ('pca', 'passthrough')],
          verbose=False),
 [],
 '9220',
 0        1
 1        0
 2        0
 3        0
 4        1
         ..
 99995    1
 99996    0
 99997    0
 99998    0
 99999    0
 Name: Survived, Length: 100000, dtype: int64,
 False,
 [],
 'box-cox',
 {'lr': <pycaret.containers.models.classification.LogisticRegressionClassifierContainer at 0x7fdbc078af90>,
  'knn': <pycaret.containers.models.classification.KNeighborsClassifierContainer at 0x7fdbc078ae50>,
  'nb': <pycaret.containers.models.classification.GaussianNBClassifierContainer at 0x7fdbc078ad10>,
  'dt': <pycaret.containers.models.classification.DecisionTreeClassifierContainer at 0x7fdbc078ae90>,
  'svm': <pycaret.containers.models.classification.SGDClassifierContainer at 0x7fdbc078aa10>,
  'rbfsvm': <pycaret.containers.models.classification.SVCClassifierContainer at 0x7fdbc078a710>,
  'gpc': <pycaret.containers.models.classification.GaussianProcessClassifierContainer at 0x7fdbc078a8d0>,
  'mlp': <pycaret.containers.models.classification.MLPClassifierContainer at 0x7fdbc078a550>,
  'ridge': <pycaret.containers.models.classification.RidgeClassifierContainer at 0x7fdbc078a2d0>,
  'rf': <pycaret.containers.models.classification.RandomForestClassifierContainer at 0x7fdbc078a990>,
  'qda': <pycaret.containers.models.classification.QuadraticDiscriminantAnalysisContainer at 0x7fdbc078b210>,
  'ada': <pycaret.containers.models.classification.AdaBoostClassifierContainer at 0x7fdbc078b250>,
  'gbc': <pycaret.containers.models.classification.GradientBoostingClassifierContainer at 0x7fdbc078b4d0>,
  'lda': <pycaret.containers.models.classification.LinearDiscriminantAnalysisContainer at 0x7fdbc078b7d0>,
  'et': <pycaret.containers.models.classification.ExtraTreesClassifierContainer at 0x7fdbc078b8d0>,
  'xgboost': <pycaret.containers.models.classification.XGBClassifierContainer at 0x7fdbc078bc10>,
  'lightgbm': <pycaret.containers.models.classification.LGBMClassifierContainer at 0x7fdbc0f33090>,
  'catboost': <pycaret.containers.models.classification.CatBoostClassifierContainer at 0x7fdbc078afd0>,
  'Bagging': <pycaret.containers.models.classification.BaggingClassifierContainer at 0x7fdbc0762ed0>,
  'Stacking': <pycaret.containers.models.classification.StackingClassifierContainer at 0x7fdbc0762f10>,
  'Voting': <pycaret.containers.models.classification.VotingClassifierContainer at 0x7fdbc0f33190>,
  'CalibratedCV': <pycaret.containers.models.classification.CalibratedClassifierCVContainer at 0x7fdbc0f334d0>},
                 Age     SibSp      Fare   Name  Ticket  Sex  Pclass  Embarked  \
 0     -8.614253e-16  1.901268  0.134351  17441      49    1       0         2   
 1     -8.614253e-16 -0.539572 -0.533837   3063      49    1       2         2   
 2     -2.069149e+00  0.680848  1.070483  17798      14    1       2         2   
 3     -9.374220e-01 -0.539572 -0.555506  12742       0    1       2         2   
 4     -5.737175e-01 -0.539572 -1.023540   2335      49    1       2         2   
 ...             ...       ...       ...    ...     ...  ...     ...       ...   
 99995  1.669127e+00 -0.539572 -0.434567   1590      21    0       1         0   
 99996  1.911597e+00 -0.539572 -0.698959   2992      49    1       1         2   
 99997  1.536915e-01 -0.539572 -0.802137   4219      49    1       2         2   
 99998  1.002335e+00 -0.539572  0.259408   3941      49    1       2         2   
 99999  1.244805e+00 -0.539572 -0.492531   7055      49    1       2         2   
 
        Cabin_A  Cabin_B  Cabin_C  Cabin_D  Cabin_E  Cabin_F  Cabin_G  Cabin_T  \
 0            0        0        1        0        0        0        0        0   
 1            0        0        0        0        0        0        0        0   
 2            0        0        0        0        0        0        0        0   
 3            0        0        0        0        0        0        0        0   
 4            0        0        0        0        0        0        0        0   
 ...        ...      ...      ...      ...      ...      ...      ...      ...   
 99995        0        0        0        1        0        0        0        0   
 99996        0        0        0        0        0        0        0        0   
 99997        0        0        0        0        0        0        0        0   
 99998        0        0        0        0        0        0        0        0   
 99999        0        0        0        0        0        0        0        0   
 
        Cabin_X  Survived  
 0            0       1.0  
 1            1       0.0  
 2            1       0.0  
 3            1       0.0  
 4            1       1.0  
 ...        ...       ...  
 99995        0       1.0  
 99996        1       0.0  
 99997        1       0.0  
 99998        1       0.0  
 99999        1       0.0  
 
 [100000 rows x 18 columns],
 5,
 [('Setup Config',
                                  Description             Value
   0                               session_id                 1
   1                                   Target          Survived
   2                              Target Type            Binary
   3                            Label Encoded    0.0: 0, 1.0: 1
   4                            Original Data      (100000, 18)
   5                           Missing Values             False
   6                         Numeric Features                14
   7                     Categorical Features                 3
   8                         Ordinal Features              True
   9                High Cardinality Features             False
   10                 High Cardinality Method              None
   11                   Transformed Train Set       (69999, 17)
   12                    Transformed Test Set       (30001, 17)
   13                      Shuffle Train-Test              True
   14                     Stratify Train-Test             False
   15                          Fold Generator   StratifiedKFold
   16                             Fold Number                 5
   17                                CPU Jobs                -1
   18                                 Use GPU             False
   19                          Log Experiment             False
   20                         Experiment Name  clf-default-name
   21                                     USI              9220
   22                         Imputation Type            simple
   23          Iterative Imputation Iteration              None
   24                         Numeric Imputer              mean
   25      Iterative Imputation Numeric Model              None
   26                     Categorical Imputer          constant
   27  Iterative Imputation Categorical Model              None
   28           Unknown Categoricals Handling    least_frequent
   29                               Normalize             False
   30                        Normalize Method              None
   31                          Transformation             False
   32                   Transformation Method              None
   33                                     PCA             False
   34                              PCA Method              None
   35                          PCA Components              None
   36                     Ignore Low Variance             False
   37                     Combine Rare Levels             False
   38                    Rare Level Threshold              None
   39                         Numeric Binning             False
   40                         Remove Outliers             False
   41                      Outliers Threshold              None
   42                Remove Multicollinearity             False
   43             Multicollinearity Threshold              None
   44                              Clustering             False
   45                    Clustering Iteration              None
   46                     Polynomial Features             False
   47                       Polynomial Degree              None
   48                    Trignometry Features             False
   49                    Polynomial Threshold              None
   50                          Group Features             False
   51                       Feature Selection             False
   52            Features Selection Threshold              None
   53                     Feature Interaction             False
   54                           Feature Ratio             False
   55                   Interaction Threshold              None
   56                           Fix Imbalance             False
   57                    Fix Imbalance Method             SMOTE),
  ('X_training Set',
               Age     SibSp      Fare     Name  Ticket  Sex  Pclass  Embarked  \
   62017 -1.786066 -0.539572 -0.425227  23058.0    38.0  0.0     1.0       2.0   
   5005   0.274926 -0.539572  0.041264   9979.0    49.0  1.0     1.0       2.0   
   56849 -1.361744 -0.539572  0.215883  14486.0    49.0  0.0     1.0       2.0   
   42434  1.426657 -0.539572  1.209948   8350.0    21.0  0.0     0.0       0.0   
   54712 -1.725448 -0.539572 -0.906172  18642.0     0.0  0.0     2.0       2.0   
   ...         ...       ...       ...      ...     ...  ...     ...       ...   
   50057  1.608510 -0.539572  1.938114   4224.0    49.0  1.0     0.0       2.0   
   98047 -0.149396  0.680848 -0.914255  21294.0    49.0  1.0     2.0       2.0   
   5192  -0.634335 -0.539572 -1.037225  24329.0    49.0  1.0     1.0       2.0   
   77708  0.820483 -0.539572  2.156857   5150.0    49.0  0.0     0.0       0.0   
   98539  0.396161 -0.539572  2.729733  25222.0    49.0  1.0     0.0       1.0   
   
          Cabin_A  Cabin_B  Cabin_C  Cabin_D  Cabin_E  Cabin_F  Cabin_G  Cabin_T  \
   62017        0        0        0        0        0        0        0        0   
   5005         0        0        0        0        0        0        0        0   
   56849        0        0        0        0        0        0        0        0   
   42434        0        0        1        0        0        0        0        0   
   54712        0        0        0        0        0        0        0        0   
   ...        ...      ...      ...      ...      ...      ...      ...      ...   
   50057        1        0        0        0        0        0        0        0   
   98047        0        0        0        0        0        0        0        0   
   5192         0        0        0        0        0        0        0        0   
   77708        0        1        0        0        0        0        0        0   
   98539        1        0        0        0        0        0        0        0   
   
          Cabin_X  
   62017        1  
   5005         1  
   56849        1  
   42434        0  
   54712        1  
   ...        ...  
   50057        0  
   98047        1  
   5192         1  
   77708        0  
   98539        0  
   
   [69999 rows x 17 columns]),
  ('y_training Set',
   62017    0
   5005     0
   56849    1
   42434    1
   54712    0
           ..
   50057    1
   98047    0
   5192     0
   77708    0
   98539    1
   Name: Survived, Length: 69999, dtype: int64),
  ('X_test Set',
               Age     SibSp      Fare     Name  Ticket  Sex  Pclass  Embarked  \
   43660  1.244805 -0.539572  0.820679  25313.0    49.0  0.0     0.0       2.0   
   87278 -0.210013 -0.539572 -0.555506   3156.0    49.0  1.0     0.0       0.0   
   14317  0.941718  0.680848  1.561942  11588.0    49.0  1.0     0.0       2.0   
   81932  0.638631 -0.539572  0.050516  16175.0    21.0  1.0     0.0       0.0   
   95321 -0.694952  0.680848  0.063476  10196.0    49.0  1.0     0.0       2.0   
   ...         ...       ...       ...      ...     ...  ...     ...       ...   
   42287 -1.179892 -0.539572 -0.463126  11629.0    49.0  1.0     2.0       2.0   
   4967  -0.452483 -0.539572  0.265297   3394.0    49.0  1.0     1.0       0.0   
   47725  1.244805 -0.539572 -0.364529  19040.0    49.0  1.0     1.0       2.0   
   42348 -0.694952 -0.539572 -1.197666  15816.0    49.0  1.0     2.0       2.0   
   80630 -0.452483 -0.539572 -0.900153  24505.0    49.0  1.0     2.0       2.0   
   
          Cabin_A  Cabin_B  Cabin_C  Cabin_D  Cabin_E  Cabin_F  Cabin_G  Cabin_T  \
   43660        0        0        1        0        0        0        0        0   
   87278        0        0        1        0        0        0        0        0   
   14317        1        0        0        0        0        0        0        0   
   81932        1        0        0        0        0        0        0        0   
   95321        0        0        1        0        0        0        0        0   
   ...        ...      ...      ...      ...      ...      ...      ...      ...   
   42287        0        0        0        0        0        0        0        0   
   4967         0        0        0        0        0        0        0        0   
   47725        0        0        0        0        0        0        0        0   
   42348        0        0        0        0        0        0        0        0   
   80630        0        0        0        0        0        0        0        0   
   
          Cabin_X  
   43660        0  
   87278        0  
   14317        0  
   81932        0  
   95321        0  
   ...        ...  
   42287        1  
   4967         1  
   47725        1  
   42348        1  
   80630        1  
   
   [30001 rows x 17 columns]),
  ('y_test Set',
   43660    1
   87278    0
   14317    0
   81932    1
   95321    0
           ..
   42287    0
   4967     1
   47725    0
   42348    0
   80630    0
   Name: Survived, Length: 30001, dtype: int64),
  ('Transformation Pipeline',
   Pipeline(memory=None,
            steps=[('dtypes',
                    DataTypes_Auto_infer(categorical_features=[],
                                         display_types=False, features_todrop=[],
                                         id_columns=[],
                                         ml_usecase='classification',
                                         numerical_features=[], target='Survived',
                                         time_features=[])),
                   ('imputer',
                    Simple_Imputer(categorical_strategy='not_available',
                                   fill_value_categorical=None,
                                   fill_value_numerical=None,
                                   numeric_st...
                   ('scaling', 'passthrough'), ('P_transform', 'passthrough'),
                   ('binn', 'passthrough'), ('rem_outliers', 'passthrough'),
                   ('cluster_all', 'passthrough'),
                   ('dummy', Dummify(target='Survived')),
                   ('fix_perfect', Remove_100(target='Survived')),
                   ('clean_names', Clean_Colum_Names()),
                   ('feature_select', 'passthrough'), ('fix_multi', 'passthrough'),
                   ('dfs', 'passthrough'), ('pca', 'passthrough')],
            verbose=False))],
 'lightgbm',
 False,
             Age     SibSp      Fare     Name  Ticket  Sex  Pclass  Embarked  \
 43660  1.244805 -0.539572  0.820679  25313.0    49.0  0.0     0.0       2.0   
 87278 -0.210013 -0.539572 -0.555506   3156.0    49.0  1.0     0.0       0.0   
 14317  0.941718  0.680848  1.561942  11588.0    49.0  1.0     0.0       2.0   
 81932  0.638631 -0.539572  0.050516  16175.0    21.0  1.0     0.0       0.0   
 95321 -0.694952  0.680848  0.063476  10196.0    49.0  1.0     0.0       2.0   
 ...         ...       ...       ...      ...     ...  ...     ...       ...   
 42287 -1.179892 -0.539572 -0.463126  11629.0    49.0  1.0     2.0       2.0   
 4967  -0.452483 -0.539572  0.265297   3394.0    49.0  1.0     1.0       0.0   
 47725  1.244805 -0.539572 -0.364529  19040.0    49.0  1.0     1.0       2.0   
 42348 -0.694952 -0.539572 -1.197666  15816.0    49.0  1.0     2.0       2.0   
 80630 -0.452483 -0.539572 -0.900153  24505.0    49.0  1.0     2.0       2.0   
 
        Cabin_A  Cabin_B  Cabin_C  Cabin_D  Cabin_E  Cabin_F  Cabin_G  Cabin_T  \
 43660        0        0        1        0        0        0        0        0   
 87278        0        0        1        0        0        0        0        0   
 14317        1        0        0        0        0        0        0        0   
 81932        1        0        0        0        0        0        0        0   
 95321        0        0        1        0        0        0        0        0   
 ...        ...      ...      ...      ...      ...      ...      ...      ...   
 42287        0        0        0        0        0        0        0        0   
 4967         0        0        0        0        0        0        0        0   
 47725        0        0        0        0        0        0        0        0   
 42348        0        0        0        0        0        0        0        0   
 80630        0        0        0        0        0        0        0        0   
 
        Cabin_X  
 43660        0  
 87278        0  
 14317        0  
 81932        0  
 95321        0  
 ...        ...  
 42287        1  
 4967         1  
 47725        1  
 42348        1  
 80630        1  
 
 [30001 rows x 17 columns],
 62017    0
 5005     0
 56849    1
 42434    1
 54712    0
         ..
 50057    1
 98047    0
 5192     0
 77708    0
 98539    1
 Name: Survived, Length: 69999, dtype: int64,
 {'parameter': 'Hyperparameters',
  'auc': 'AUC',
  'confusion_matrix': 'Confusion Matrix',
  'threshold': 'Threshold',
  'pr': 'Precision Recall',
  'error': 'Prediction Error',
  'class_report': 'Class Report',
  'rfe': 'Feature Selection',
  'learning': 'Learning Curve',
  'manifold': 'Manifold Learning',
  'calibration': 'Calibration Curve',
  'vc': 'Validation Curve',
  'dimension': 'Dimensions',
  'feature': 'Feature Importance',
  'feature_all': 'Feature Importance (All)',
  'boundary': 'Decision Boundary',
  'lift': 'Lift Chart',
  'gain': 'Gain Chart',
  'tree': 'Decision Tree'},
 None,
 <MLUsecase.CLASSIFICATION: 1>,
 -1,
 43660    1
 87278    0
 14317    0
 81932    1
 95321    0
         ..
 42287    0
 4967     1
 47725    0
 42348    0
 80630    0
 Name: Survived, Length: 30001, dtype: int64,
                 Age     SibSp      Fare     Name  Ticket  Sex  Pclass  \
 0     -8.614253e-16  1.901268  0.134351  17441.0    49.0  1.0     0.0   
 1     -8.614253e-16 -0.539572 -0.533837   3063.0    49.0  1.0     2.0   
 2     -2.069149e+00  0.680848  1.070483  17798.0    14.0  1.0     2.0   
 3     -9.374220e-01 -0.539572 -0.555506  12742.0     0.0  1.0     2.0   
 4     -5.737175e-01 -0.539572 -1.023540   2335.0    49.0  1.0     2.0   
 ...             ...       ...       ...      ...     ...  ...     ...   
 99995  1.669127e+00 -0.539572 -0.434567   1590.0    21.0  0.0     1.0   
 99996  1.911597e+00 -0.539572 -0.698959   2992.0    49.0  1.0     1.0   
 99997  1.536915e-01 -0.539572 -0.802137   4219.0    49.0  1.0     2.0   
 99998  1.002335e+00 -0.539572  0.259408   3941.0    49.0  1.0     2.0   
 99999  1.244805e+00 -0.539572 -0.492531   7055.0    49.0  1.0     2.0   
 
        Embarked  Cabin_A  Cabin_B  Cabin_C  Cabin_D  Cabin_E  Cabin_F  \
 0           2.0        0        0        1        0        0        0   
 1           2.0        0        0        0        0        0        0   
 2           2.0        0        0        0        0        0        0   
 3           2.0        0        0        0        0        0        0   
 4           2.0        0        0        0        0        0        0   
 ...         ...      ...      ...      ...      ...      ...      ...   
 99995       0.0        0        0        0        1        0        0   
 99996       2.0        0        0        0        0        0        0   
 99997       2.0        0        0        0        0        0        0   
 99998       2.0        0        0        0        0        0        0   
 99999       2.0        0        0        0        0        0        0   
 
        Cabin_G  Cabin_T  Cabin_X  
 0            0        0        0  
 1            0        0        1  
 2            0        0        1  
 3            0        0        1  
 4            0        0        1  
 ...        ...      ...      ...  
 99995        0        0        0  
 99996        0        0        1  
 99997        0        0        1  
 99998        0        0        1  
 99999        0        0        1  
 
 [100000 rows x 17 columns],
 5,
 None,
 False)


# best_model = compare_models(sort='Accuracy', n_select=4)

# gbc = create_model('gbc')

# gbc = tune_model(gbc)

# print(gbc)

lightgbm = create_model('lightgbm')

      Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
0       0.7825  0.8494  0.7359  0.7513  0.7435  0.5547  0.5548
1       0.7801  0.8490  0.7437  0.7431  0.7434  0.5510  0.5510
2       0.7863  0.8547  0.7467  0.7525  0.7496  0.5632  0.5632
3       0.7796  0.8490  0.7391  0.7447  0.7419  0.5496  0.5497
4       0.7833  0.8502  0.7475  0.7469  0.7472  0.5577  0.5577
Mean    0.7824  0.8505  0.7426  0.7477  0.7451  0.5552  0.5553
SD      0.0024  0.0022  0.0045  0.0037  0.0029  0.0049  0.0049
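The Mean and SD rows of the create_model('lightgbm') output above can be checked by hand. The following is a minimal sketch, not PyCaret's own code: it copies the five per-fold Accuracy and AUC scores from the table and assumes the SD row is a population standard deviation (dividing by the number of folds), which is the variant that reproduces the reported figures. The helper name `summarize` is made up for this illustration.

```python
from statistics import fmean, pstdev

# Per-fold scores copied from the create_model('lightgbm') table.
accuracy = [0.7825, 0.7801, 0.7863, 0.7796, 0.7833]
auc = [0.8494, 0.8490, 0.8547, 0.8490, 0.8502]


def summarize(scores):
    """Return (mean, population SD), rounded to 4 decimals like the table."""
    # Assumption: the SD row uses the population formula (ddof=0).
    return round(fmean(scores), 4), round(pstdev(scores), 4)


acc_mean, acc_sd = summarize(accuracy)
auc_mean, auc_sd = summarize(auc)

print("Accuracy:", acc_mean, acc_sd)
print("AUC:     ", auc_mean, auc_sd)
```

Because the inputs are already rounded to four decimals, a recomputed value could in principle differ from the table in the last digit. For these two columns it does not.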

lightgbm = tune_model(lightgbm, optimize='AUC')

      Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
0       0.7793  0.8496  0.7382  0.7444  0.7413  0.5488  0.5489
1       0.7797  0.8492  0.7439  0.7424  0.7431  0.5503  0.5503
2       0.7858  0.8547  0.7489  0.7505  0.7497  0.5625  0.5625
3       0.7777  0.8493  0.7467  0.7376  0.7422  0.5468  0.5469
4       0.7825  0.8508  0.7429  0.7477  0.7453  0.5555  0.5555
Mean    0.7810  0.8507  0.7441  0.7445  0.7443  0.5528  0.5528
SD      0.0028  0.0021  0.0036  0.0044  0.0030  0.0056  0.0056

print(lightgbm)

LGBMClassifier(bagging_fraction=0.8, bagging_freq=5, boosting_type='gbdt',
               class_weight=None, colsample_bytree=1.0, feature_fraction=0.9,
               importance_type='split', learning_rate=0.103, max_depth=-1,
               min_child_samples=30, min_child_weight=0.001, min_split_gain=0.4,
               n_estimators=40, n_jobs=-1, num_leaves=30, objective=None,
               random_state=1, reg_alpha=2, reg_lambda=0.2, silent=True,
               subsample=1.0, subsample_for_bin=200000, subsample_freq=0)

catboost = create_model('catboost')

            <tr>
                    <th id="T_cd7ef_level0_row0" class="row_heading level0 row0" >0</th>
                    <td id="T_cd7ef_row0_col0" class="data row0 col0" >0.7786</td>
                    <td id="T_cd7ef_row0_col1" class="data row0 col1" >0.8480</td>
                    <td id="T_cd7ef_row0_col2" class="data row0 col2" >0.7287</td>
                    <td id="T_cd7ef_row0_col3" class="data row0 col3" >0.7479</td>
                    <td id="T_cd7ef_row0_col4" class="data row0 col4" >0.7382</td>
                    <td id="T_cd7ef_row0_col5" class="data row0 col5" >0.5464</td>
                    <td id="T_cd7ef_row0_col6" class="data row0 col6" >0.5465</td>
        </tr>
        <tr>
                    <th id="T_cd7ef_level0_row1" class="row_heading level0 row1" >1</th>
                    <td id="T_cd7ef_row1_col0" class="data row1 col0" >0.7774</td>
                    <td id="T_cd7ef_row1_col1" class="data row1 col1" >0.8474</td>
                    <td id="T_cd7ef_row1_col2" class="data row1 col2" >0.7394</td>
                    <td id="T_cd7ef_row1_col3" class="data row1 col3" >0.7405</td>
                    <td id="T_cd7ef_row1_col4" class="data row1 col4" >0.7399</td>
                    <td id="T_cd7ef_row1_col5" class="data row1 col5" >0.5453</td>
                    <td id="T_cd7ef_row1_col6" class="data row1 col6" >0.5453</td>
        </tr>
        <tr>
                    <th id="T_cd7ef_level0_row2" class="row_heading level0 row2" >2</th>
                    <td id="T_cd7ef_row2_col0" class="data row2 col0" >0.7826</td>
                    <td id="T_cd7ef_row2_col1" class="data row2 col1" >0.8532</td>
                    <td id="T_cd7ef_row2_col2" class="data row2 col2" >0.7417</td>
                    <td id="T_cd7ef_row2_col3" class="data row2 col3" >0.7486</td>
                    <td id="T_cd7ef_row2_col4" class="data row2 col4" >0.7452</td>
                    <td id="T_cd7ef_row2_col5" class="data row2 col5" >0.5557</td>
                    <td id="T_cd7ef_row2_col6" class="data row2 col6" >0.5557</td>
        </tr>
        <tr>
                    <th id="T_cd7ef_level0_row3" class="row_heading level0 row3" >3</th>
                    <td id="T_cd7ef_row3_col0" class="data row3 col0" >0.7761</td>
                    <td id="T_cd7ef_row3_col1" class="data row3 col1" >0.8477</td>
                    <td id="T_cd7ef_row3_col2" class="data row3 col2" >0.7354</td>
                    <td id="T_cd7ef_row3_col3" class="data row3 col3" >0.7403</td>
                    <td id="T_cd7ef_row3_col4" class="data row3 col4" >0.7379</td>
                    <td id="T_cd7ef_row3_col5" class="data row3 col5" >0.5425</td>
                    <td id="T_cd7ef_row3_col6" class="data row3 col6" >0.5425</td>
        </tr>
        <tr>
                    <th id="T_cd7ef_level0_row4" class="row_heading level0 row4" >4</th>
                    <td id="T_cd7ef_row4_col0" class="data row4 col0" >0.7835</td>
                    <td id="T_cd7ef_row4_col1" class="data row4 col1" >0.8499</td>
                    <td id="T_cd7ef_row4_col2" class="data row4 col2" >0.7410</td>
                    <td id="T_cd7ef_row4_col3" class="data row4 col3" >0.7504</td>
                    <td id="T_cd7ef_row4_col4" class="data row4 col4" >0.7457</td>
                    <td id="T_cd7ef_row4_col5" class="data row4 col5" >0.5572</td>
                    <td id="T_cd7ef_row4_col6" class="data row4 col6" >0.5572</td>
        </tr>
        <tr>
                    <th id="T_cd7ef_level0_row5" class="row_heading level0 row5" >Mean</th>
                    <td id="T_cd7ef_row5_col0" class="data row5 col0" >0.7796</td>
                    <td id="T_cd7ef_row5_col1" class="data row5 col1" >0.8492</td>
                    <td id="T_cd7ef_row5_col2" class="data row5 col2" >0.7373</td>
                    <td id="T_cd7ef_row5_col3" class="data row5 col3" >0.7456</td>
                    <td id="T_cd7ef_row5_col4" class="data row5 col4" >0.7414</td>
                    <td id="T_cd7ef_row5_col5" class="data row5 col5" >0.5494</td>
                    <td id="T_cd7ef_row5_col6" class="data row5 col6" >0.5495</td>
        </tr>
        <tr>
                    <th id="T_cd7ef_level0_row6" class="row_heading level0 row6" >SD</th>
                    <td id="T_cd7ef_row6_col0" class="data row6 col0" >0.0029</td>
                    <td id="T_cd7ef_row6_col1" class="data row6 col1" >0.0022</td>
                    <td id="T_cd7ef_row6_col2" class="data row6 col2" >0.0048</td>
                    <td id="T_cd7ef_row6_col3" class="data row6 col3" >0.0043</td>
                    <td id="T_cd7ef_row6_col4" class="data row6 col4" >0.0034</td>
                    <td id="T_cd7ef_row6_col5" class="data row6 col5" >0.0059</td>
                    <td id="T_cd7ef_row6_col6" class="data row6 col6" >0.0059</td>
        </tr>
</tbody></table>

	Accuracy	AUC	Recall	Prec.	F1	Kappa	MCC

catboost = tune_model(catboost, optimize='AUC')

            <tr>
                    <th id="T_27911_level0_row0" class="row_heading level0 row0" >0</th>
                    <td id="T_27911_row0_col0" class="data row0 col0" >0.7815</td>
                    <td id="T_27911_row0_col1" class="data row0 col1" >0.8501</td>
                    <td id="T_27911_row0_col2" class="data row0 col2" >0.7455</td>
                    <td id="T_27911_row0_col3" class="data row0 col3" >0.7447</td>
                    <td id="T_27911_row0_col4" class="data row0 col4" >0.7451</td>
                    <td id="T_27911_row0_col5" class="data row0 col5" >0.5539</td>
                    <td id="T_27911_row0_col6" class="data row0 col6" >0.5539</td>
        </tr>
        <tr>
                    <th id="T_27911_level0_row1" class="row_heading level0 row1" >1</th>
                    <td id="T_27911_row1_col0" class="data row1 col0" >0.7802</td>
                    <td id="T_27911_row1_col1" class="data row1 col1" >0.8490</td>
                    <td id="T_27911_row1_col2" class="data row1 col2" >0.7485</td>
                    <td id="T_27911_row1_col3" class="data row1 col3" >0.7410</td>
                    <td id="T_27911_row1_col4" class="data row1 col4" >0.7448</td>
                    <td id="T_27911_row1_col5" class="data row1 col5" >0.5518</td>
                    <td id="T_27911_row1_col6" class="data row1 col6" >0.5518</td>
        </tr>
        <tr>
                    <th id="T_27911_level0_row2" class="row_heading level0 row2" >2</th>
                    <td id="T_27911_row2_col0" class="data row2 col0" >0.7861</td>
                    <td id="T_27911_row2_col1" class="data row2 col1" >0.8554</td>
                    <td id="T_27911_row2_col2" class="data row2 col2" >0.7539</td>
                    <td id="T_27911_row2_col3" class="data row2 col3" >0.7486</td>
                    <td id="T_27911_row2_col4" class="data row2 col4" >0.7512</td>
                    <td id="T_27911_row2_col5" class="data row2 col5" >0.5636</td>
                    <td id="T_27911_row2_col6" class="data row2 col6" >0.5636</td>
        </tr>
        <tr>
                    <th id="T_27911_level0_row3" class="row_heading level0 row3" >3</th>
                    <td id="T_27911_row3_col0" class="data row3 col0" >0.7786</td>
                    <td id="T_27911_row3_col1" class="data row3 col1" >0.8494</td>
                    <td id="T_27911_row3_col2" class="data row3 col2" >0.7462</td>
                    <td id="T_27911_row3_col3" class="data row3 col3" >0.7395</td>
                    <td id="T_27911_row3_col4" class="data row3 col4" >0.7428</td>
                    <td id="T_27911_row3_col5" class="data row3 col5" >0.5485</td>
                    <td id="T_27911_row3_col6" class="data row3 col6" >0.5486</td>
        </tr>
        <tr>
                    <th id="T_27911_level0_row4" class="row_heading level0 row4" >4</th>
                    <td id="T_27911_row4_col0" class="data row4 col0" >0.7830</td>
                    <td id="T_27911_row4_col1" class="data row4 col1" >0.8509</td>
                    <td id="T_27911_row4_col2" class="data row4 col2" >0.7482</td>
                    <td id="T_27911_row4_col3" class="data row4 col3" >0.7460</td>
                    <td id="T_27911_row4_col4" class="data row4 col4" >0.7471</td>
                    <td id="T_27911_row4_col5" class="data row4 col5" >0.5570</td>
                    <td id="T_27911_row4_col6" class="data row4 col6" >0.5570</td>
        </tr>
        <tr>
                    <th id="T_27911_level0_row5" class="row_heading level0 row5" >Mean</th>
                    <td id="T_27911_row5_col0" class="data row5 col0" >0.7819</td>
                    <td id="T_27911_row5_col1" class="data row5 col1" >0.8510</td>
                    <td id="T_27911_row5_col2" class="data row5 col2" >0.7485</td>
                    <td id="T_27911_row5_col3" class="data row5 col3" >0.7439</td>
                    <td id="T_27911_row5_col4" class="data row5 col4" >0.7462</td>
                    <td id="T_27911_row5_col5" class="data row5 col5" >0.5550</td>
                    <td id="T_27911_row5_col6" class="data row5 col6" >0.5550</td>
        </tr>
        <tr>
                    <th id="T_27911_level0_row6" class="row_heading level0 row6" >SD</th>
                    <td id="T_27911_row6_col0" class="data row6 col0" >0.0025</td>
                    <td id="T_27911_row6_col1" class="data row6 col1" >0.0023</td>
                    <td id="T_27911_row6_col2" class="data row6 col2" >0.0029</td>
                    <td id="T_27911_row6_col3" class="data row6 col3" >0.0033</td>
                    <td id="T_27911_row6_col4" class="data row6 col4" >0.0028</td>
                    <td id="T_27911_row6_col5" class="data row6 col5" >0.0051</td>
                    <td id="T_27911_row6_col6" class="data row6 col6" >0.0051</td>
        </tr>
</tbody></table>

	Accuracy	AUC	Recall	Prec.	F1	Kappa	MCC

# ada = create_model('ada')

# ada = tune_model(ada, optimize='AUC')

def create_submission(model, test, test_passenger_id, model_name):
    y_pred_test = model.predict_proba(test)[:, 1]
    submission = pd.DataFrame(
        {
            'PassengerId': test_passenger_id, 
            'Survived': (y_pred_test >= 0.5).astype(int),
        }
    )
    submission.to_csv(f"submission_{model_name}.csv", index=False)
    
    return y_pred_test

test = all_df.iloc[100000:, :]  # rows from index 100000 onward (the test set)
X_test=test.drop(drop_list,axis=1)
X_test.head()

	Age	SibSp	Fare	Name	Ticket	Sex	Pclass	Embarked	Cabin_B	Cabin_X
100000	-0.937422	-0.539572	0.949786	10830	49	1	2	2	0	1
100001	1.123570	-0.539572	-1.273379	17134	49	0	2	2	0	1
100002	-0.937422	-0.539572	0.481059	9978	49	0	0	0	1	0
100003	-0.573717	-0.539572	-0.563310	13303	49	1	1	2	0	1
100004	-1.058657	-0.539572	0.125497	4406	49	0	0	0	1	0

test_pred_lightgbm = create_submission(
    lightgbm, X_test, test_df["PassengerId"], "lightgbm"
)
# test_pred_ada = create_submission(
#     ada, X_test, test_df["PassengerId"], "ada"
# )
# test_pred_gbc = create_submission(
#     ada, X_test, test_df["PassengerId"], "gbc"
# )
test_pred_catboost = create_submission(
    catboost, X_test, test_df["PassengerId"], "catboost"
)

test_pred_merged = (

    test_pred_lightgbm + 
    test_pred_catboost
#     test_pred_ada +
#     test_pred_gbc
)

test_pred_merged = np.round(test_pred_merged / 2)

submission = pd.DataFrame(
    {
        'PassengerId': test_df["PassengerId"], 
        'Survived': test_pred_merged.astype(int),
    }
)
submission.to_csv("submission_merged.csv", index=False)
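The merge step above averages the two models' predicted probabilities and rounds the result. A minimal sketch of that idea follows, using made-up probabilities (`proba_a` and `proba_b` are hypothetical, not values from this notebook). It also shows one detail worth knowing: `np.round` rounds exact halves to the nearest even integer, so an average of exactly 0.5 becomes 0, whereas an explicit `>= 0.5` threshold would give 1.

```python
import numpy as np

# Hypothetical predicted probabilities from two models for four passengers.
proba_a = np.array([0.75, 0.25, 0.9, 0.1])
proba_b = np.array([0.25, 0.75, 0.7, 0.3])

# Simple average of the two models (soft voting with equal weights).
mean_proba = (proba_a + proba_b) / 2

# np.round rounds exact halves to the nearest even integer, so 0.5 -> 0.
labels_round = np.round(mean_proba).astype(int)

# An explicit threshold sends exact ties to class 1 instead.
labels_threshold = (mean_proba >= 0.5).astype(int)

print(labels_round)
print(labels_threshold)
```

Exact ties are rare with real model outputs, so the two variants usually agree. The threshold form just makes the tie-breaking rule explicit.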

Posted 2021-04-27 Updated 2021-04-28 이현정 23 minutes read (About 3519 words)

Titanic Playground EDA Study (2021-04-27)

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

https://www.kaggle.com/udbhavpangotra/tps-apr21-eda-model

https://www.kaggle.com/hiro5299834/tps-apr-2021-voting-pseudo-labeling

!pip install catboost

Collecting catboost
  Downloading https://files.pythonhosted.org/packages/47/80/8e9c57ec32dfed6ba2922bc5c96462cbf8596ce1a6f5de532ad1e43e53fe/catboost-0.25.1-cp37-none-manylinux1_x86_64.whl (67.3MB)
     |████████████████████████████████| 67.3MB 42kB/s 
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from catboost) (1.15.0)
Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from catboost) (1.4.1)
Requirement already satisfied: plotly in /usr/local/lib/python3.7/dist-packages (from catboost) (4.4.1)
Requirement already satisfied: numpy>=1.16.0 in /usr/local/lib/python3.7/dist-packages (from catboost) (1.19.5)
Requirement already satisfied: pandas>=0.24.0 in /usr/local/lib/python3.7/dist-packages (from catboost) (1.1.5)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from catboost) (3.2.2)
Requirement already satisfied: graphviz in /usr/local/lib/python3.7/dist-packages (from catboost) (0.10.1)
Requirement already satisfied: retrying>=1.3.3 in /usr/local/lib/python3.7/dist-packages (from plotly->catboost) (1.3.3)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.24.0->catboost) (2018.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.24.0->catboost) (2.8.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->catboost) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->catboost) (1.3.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->catboost) (2.4.7)
Installing collected packages: catboost
Successfully installed catboost-0.25.1

Kaggle Study

import pandas as pd
import numpy as np
import random
import os

from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split, KFold, StratifiedKFold

import lightgbm as lgb
import catboost as ctb
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier, export_graphviz

import graphviz
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.simplefilter('ignore')

!pip install kaggle

Requirement already satisfied: kaggle in /usr/local/lib/python3.7/dist-packages (1.5.12)
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.7/dist-packages (from kaggle) (2.8.1)
Requirement already satisfied: urllib3 in /usr/local/lib/python3.7/dist-packages (from kaggle) (1.24.3)
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from kaggle) (2.23.0)
Requirement already satisfied: six>=1.10 in /usr/local/lib/python3.7/dist-packages (from kaggle) (1.15.0)
Requirement already satisfied: certifi in /usr/local/lib/python3.7/dist-packages (from kaggle) (2020.12.5)
Requirement already satisfied: python-slugify in /usr/local/lib/python3.7/dist-packages (from kaggle) (4.0.1)
Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from kaggle) (4.41.1)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->kaggle) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->kaggle) (2.10)
Requirement already satisfied: text-unidecode>=1.3 in /usr/local/lib/python3.7/dist-packages (from python-slugify->kaggle) (1.3)

!mkdir -p ~/.kaggle
!echo '{ "username": "<YOUR_KAGGLE_USERNAME>", "key": "<YOUR_KAGGLE_API_KEY>"}' > ~/.kaggle/kaggle.json
!chmod 600 ~/.kaggle/kaggle.json

!kaggle competitions list

Warning: Looks like you're using an outdated API Version, please consider updating (server 1.5.12 / client 1.5.4)
ref                                            deadline             category            reward  teamCount  userHasEntered  
---------------------------------------------  -------------------  ---------------  ---------  ---------  --------------  
contradictory-my-dear-watson                   2030-07-01 23:59:00  Getting Started     Prizes        132           False  
gan-getting-started                            2030-07-01 23:59:00  Getting Started     Prizes        244           False  
tpu-getting-started                            2030-06-03 23:59:00  Getting Started  Knowledge        783           False  
digit-recognizer                               2030-01-01 00:00:00  Getting Started  Knowledge       4207           False  
titanic                                        2030-01-01 00:00:00  Getting Started  Knowledge      34273            True  
house-prices-advanced-regression-techniques    2030-01-01 00:00:00  Getting Started  Knowledge       8993            True  
connectx                                       2030-01-01 00:00:00  Getting Started  Knowledge        727           False  
nlp-getting-started                            2030-01-01 00:00:00  Getting Started  Knowledge       2403            True  
competitive-data-science-predict-future-sales  2022-12-31 23:59:00  Playground           Kudos      11051           False  
jane-street-market-prediction                  2021-08-23 23:59:00  Featured          $100,000       4245           False  
hungry-geese                                   2021-07-26 23:59:00  Playground          Prizes        556           False  
coleridgeinitiative-show-us-the-data           2021-06-22 23:59:00  Featured           $90,000        707           False  
bms-molecular-translation                      2021-06-02 23:59:00  Featured           $50,000        553            True  
birdclef-2021                                  2021-05-31 23:59:00  Research            $5,000        314           False  
iwildcam2021-fgvc8                             2021-05-26 23:59:00  Research         Knowledge         25           False  
herbarium-2021-fgvc8                           2021-05-26 23:59:00  Research         Knowledge         50           False  
plant-pathology-2021-fgvc8                     2021-05-26 23:59:00  Research         Knowledge        354           False  
hotel-id-2021-fgvc8                            2021-05-26 23:59:00  Research         Knowledge         67           False  
hashcode-2021-oqr-extension                    2021-05-25 23:59:00  Playground       Knowledge        136           False  
indoor-location-navigation                     2021-05-17 23:59:00  Research           $10,000       1020           False

!kaggle competitions download -c tabular-playground-series-apr-2021

Warning: Looks like you're using an outdated API Version, please consider updating (server 1.5.12 / client 1.5.4)
Downloading train.csv.zip to /content
  0% 0.00/2.13M [00:00<?, ?B/s]
100% 2.13M/2.13M [00:00<00:00, 72.1MB/s]
Downloading sample_submission.csv to /content
  0% 0.00/879k [00:00<?, ?B/s]
100% 879k/879k [00:00<00:00, 125MB/s]
Downloading test.csv.zip to /content
  0% 0.00/2.07M [00:00<?, ?B/s]
100% 2.07M/2.07M [00:00<00:00, 141MB/s]

TARGET = 'Survived'

N_ESTIMATORS = 1000
N_SPLITS = 10
SEED = 2021
EARLY_STOPPING_ROUNDS = 100
VERBOSE = 100

# Fix the random seeds for reproducibility
def set_seed(seed=42):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    
set_seed(SEED)

Data preprocessing

Load data

train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')
submission = pd.read_csv('sample_submission.csv')
#test_df['Survived'] = pd.read_csv("../input/submission-merged3/submission_merged3.csv")['Survived']

all_df = pd.concat([train_df, test_df]).reset_index(drop=True)
# reset_index rebuilds the index as 0..n-1; drop=True discards the old index instead of keeping it as a column.

print('Rows and Columns in train dataset:', train_df.shape)
print('Rows and Columns in test dataset:', test_df.shape)

Rows and Columns in train dataset: (100000, 12)
Rows and Columns in test dataset: (100000, 11)

Print the number of missing values

print('Missing values per columns in train dataset')
for col in train_df.columns:
    temp_col = train_df[col].isnull().sum()
    print(f'{col}: {temp_col}')
print()
print('Missing values per columns in test dataset')
for col in test_df.columns:
    temp_col = test_df[col].isnull().sum()
    print(f'{col}: {temp_col}')

Missing values per columns in train dataset
PassengerId: 0
Survived: 0
Pclass: 0
Name: 0
Sex: 0
Age: 3292
SibSp: 0
Parch: 0
Ticket: 4623
Fare: 134
Cabin: 67866
Embarked: 250

Missing values per columns in test dataset
PassengerId: 0
Pclass: 0
Name: 0
Sex: 0
Age: 3487
SibSp: 0
Parch: 0
Ticket: 5181
Fare: 133
Cabin: 70831
Embarked: 277

Filling missing values

# Fill missing ages with the mean age.
all_df['Age'] = all_df['Age'].fillna(all_df['Age'].mean())

# Cabin: fill missing values with 'X', then keep only the first character (the deck letter).
# x[0] is what selects the first character; strip() only removes surrounding whitespace from it.
all_df['Cabin'] = all_df['Cabin'].fillna('X').map(lambda x: x[0].strip())


#print(all_df['Ticket'].head(10))
# Ticket: fill missing values with 'X', split the string and keep the first token.
# split() splits on whitespace by default, and tickets with a text prefix separate it from the number with a space.
# Tickets with a single token (no prefix) are mapped to 'X'.
all_df['Ticket'] = all_df['Ticket'].fillna('X').map(lambda x:str(x).split()[0] if len(str(x).split()) > 1 else 'X')

# Build a dictionary of the median Fare for each Pclass.
fare_map = all_df[['Fare', 'Pclass']].dropna().groupby('Pclass').median().to_dict()
# Fill each missing Fare with the median fare of that row's Pclass.
all_df['Fare'] = all_df['Fare'].fillna(all_df['Pclass'].map(fare_map['Fare']))
# Some fares are unusually high or low, so take log(1 + Fare) to reduce the influence of outliers.
all_df['Fare'] = np.log1p(all_df['Fare'])


# Fill missing ports of embarkation with 'X'.
all_df['Embarked'] = all_df['Embarked'].fillna('X')

# Keep only the part of the name before the comma (the surname).
all_df['Name'] = all_df['Name'].map(lambda x: x.split(',')[0])
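The Fare step above combines a group-wise median, a `map` lookup and `fillna`. The sketch below replays that pattern on a small made-up frame (the `toy` data and the `Fare_filled` / `Fare_log` column names are hypothetical) so the intermediate values are easy to follow.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two passenger classes, one missing fare in each.
toy = pd.DataFrame({
    "Pclass": [1, 1, 1, 3, 3, 3],
    "Fare": [80.0, 100.0, np.nan, 7.0, 9.0, np.nan],
})

# Median fare per class; missing values are skipped by default.
fare_by_class = toy.groupby("Pclass")["Fare"].median()

# Look up each row's class median, then use it only where Fare is missing.
toy["Fare_filled"] = toy["Fare"].fillna(toy["Pclass"].map(fare_by_class))

# log1p(x) = log(1 + x) compresses large values and is defined at x = 0.
toy["Fare_log"] = np.log1p(toy["Fare_filled"])

print(toy)
```

Here the missing first-class fare is filled with 90.0 and the missing third-class fare with 8.0, the medians of the observed fares in each class.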

data_1=all_df.loc[all_df['Pclass']==1].groupby('Ticket')['Ticket'].count().sort_values(ascending=False)
print(data_1)
print()
data_2=all_df.loc[all_df['Pclass']==2].groupby('Ticket')['Ticket'].count().sort_values(ascending=False)
print(data_2)
print()
data_3=all_df.loc[all_df['Pclass']==3].groupby('Ticket')['Ticket'].count().sort_values(ascending=False)
print(data_3)
print()

Ticket
X             36336
PC            16814
C.A.            338
SC/Paris        334
SC/PARIS        260
W./C.           206
S.O.C.          192
S.C./PARIS      191
PP              186
F.C.            183
SC/AH           178
F.C.C.          167
STON/O          163
CA.             161
SOTON/O.Q.      123
A/4             115
A/5.            108
W.E.P.           94
WE/P             92
SOTON/OQ         87
STON/O2.         81
CA               81
A/5              70
C                67
A/4.             66
P/PP             66
SC               59
SOTON/O2         48
A./5.            46
S.O./P.P.        40
A.5.             33
AQ/4             27
A/S              23
SCO/W            19
S.P.             17
SC/A4            16
SW/PP            16
S.O.P.           15
SC/A.3           15
SO/C             14
S.C./A.4.        14
C.A./SOTON       14
A.               14
STON/OQ.         13
W/C              13
S.W./PP          11
LP               11
AQ/3.             8
Fa                7
A4.               6
Name: Ticket, dtype: int64

Ticket
X             31337
A.              997
C.A.            717
SC/PARIS        470
STON/O          387
PC              330
S.O.C.          313
PP              308
SC/AH           284
W./C.           259
SOTON/O.Q.      219
F.C.C.          203
A/5.            200
A/4             152
SC/Paris        135
S.C./PARIS      119
SOTON/O2        112
CA.             107
STON/O2.        106
C               104
F.C.            100
WE/P             92
SOTON/OQ         86
A/5              82
CA               66
W.E.P.           60
A./5.            60
S.O./P.P.        54
P/PP             50
A/4.             46
SCO/W            36
SC               33
A.5.             29
AQ/4             29
LP               25
SC/A.3           20
A/S              19
C.A./SOTON       19
SC/A4            17
Fa               15
S.C./A.4.        13
S.W./PP          13
SO/C             13
STON/OQ.         12
W/C              11
S.P.             10
S.O.P.            9
SW/PP             9
A4.               7
AQ/3.             6
Name: Ticket, dtype: int64

Ticket
X             84781
A.             6420
C.A.           2615
STON/O         1508
A/5.            918
SOTON/O.Q.      719
PP              679
SC/PARIS        642
W./C.           623
PC              595
F.C.C.          541
A/5             420
CA.             368
STON/O2.        363
SC/AH           331
A/4             268
SOTON/O2        264
S.O.C.          231
C               227
SC/Paris        177
S.O./P.P.       177
CA              172
SOTON/OQ        172
W.E.P.          154
F.C.            131
S.C./PARIS      127
A./5.           122
WE/P            121
SC              106
A/4.            104
SCO/W            74
A.5.             72
P/PP             68
SC/A4            67
AQ/4             56
LP               41
Fa               37
STON/OQ.         37
S.W./PP          32
SC/A.3           31
C.A./SOTON       31
SW/PP            30
SO/C             28
A/S              28
AQ/3.            26
S.P.             24
S.C./A.4.        23
S.O.P.           21
A4.              20
W/C              20
Name: Ticket, dtype: int64

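Encoding

Each group of variables is encoded differently: label encoding for Name, Ticket, Sex, Pclass and Embarked, one-hot encoding for Cabin, and standard scaling for the numerical columns.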

label_cols = ['Name', 'Ticket', 'Sex','Pclass','Embarked']
onehot_cols = [ 'Cabin',]
numerical_cols = [ 'Age', 'SibSp', 'Parch', 'Fare']

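# Label-encoding helper: takes a column c and returns it as integer codes.
def label_encoder(c):
    le = LabelEncoder()
    return le.fit_transform(c)


# StandardScaler(): removes the mean and scales the data to unit variance.
# Outliers shift the mean and standard deviation, so the spread of the transformed data can change considerably when they are present.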
scaler = StandardScaler()

onehot_encoded_df = pd.get_dummies(all_df[onehot_cols])
label_encoded_df = all_df[label_cols].apply(label_encoder)
numerical_df = pd.DataFrame(scaler.fit_transform(all_df[numerical_cols]), columns=numerical_cols)
target_df = all_df[TARGET]

all_df = pd.concat([numerical_df, label_encoded_df,onehot_encoded_df, target_df], axis=1)
#all_df = pd.concat([numerical_df, label_encoded_df, target_df], axis=1)
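To make the three encodings concrete, here is a plain-Python sketch of what each one does to a column. It is an illustration only, not the scikit-learn or pandas implementation: the helper names `label_encode`, `one_hot` and `standardize` and the sample values are hypothetical. It mirrors two behaviours of the real tools: `LabelEncoder` numbers the classes in sorted order, and `StandardScaler` divides by the population standard deviation.

```python
from statistics import mean, pstdev

def label_encode(values):
    """Map each distinct value to an integer, in sorted order of the values."""
    classes = sorted(set(values))
    index = {value: i for i, value in enumerate(classes)}
    return [index[v] for v in values], classes

def one_hot(values, prefix):
    """Return one 0/1 column per distinct value, named '<prefix>_<value>'."""
    classes = sorted(set(values))
    return {f"{prefix}_{c}": [int(v == c) for v in values] for c in classes}

def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical sample columns.
embarked = ["S", "C", "Q", "S"]
cabin = ["X", "B", "X", "C"]
age = [20.0, 30.0, 40.0, 50.0]

codes, classes = label_encode(embarked)
cabin_columns = one_hot(cabin, "Cabin")
age_scaled = standardize(age)

print(codes, classes)
print(cabin_columns)
print(age_scaled)
```

Label encoding imposes an arbitrary order on the categories. Tree-based models such as LightGBM and CatBoost usually tolerate that, which is presumably why it is used here for the high-cardinality Name and Ticket columns.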

Modeling

drop_list = ['Survived', 'Parch']

Without pseudo-labeling

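train = all_df.iloc[:100000, :]  # first 100000 rows (positions 0 to 99999): training set
test = all_df.iloc[100000:, :]  # rows from position 100000 onward: test set
# iloc selects rows by integer position
test = test.drop('Survived', axis=1)  # drop the target column from the test set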
model_results = pd.DataFrame()
folds = 5

y = train.loc[:, 'Survived']
X = train.drop(drop_list, axis=1)

With pseudo-labeling

# y = all_df.loc[:, 'Survived']
# X = all_df.drop('Survived', axis=1)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25, random_state=21)

from sklearn import metrics  
from sklearn.metrics import accuracy_score
import numpy as np

params = {
    'metric': 'binary_logloss',
    'n_estimators': N_ESTIMATORS,
    'objective': 'binary',
    'random_state': SEED,
    'learning_rate': 0.01,
    'min_child_samples': 150,
    'reg_alpha': 3e-5,
    'reg_lambda': 9e-2,
    'num_leaves': 20,
    'max_depth': 16,
    'colsample_bytree': 0.8,
    'subsample': 0.8,
    'subsample_freq': 2,
    'max_bin': 240,
}

lgbm_model=lgb.LGBMClassifier(**params)
lgbm_model.fit(X_train,y_train)
lgbm_pred=lgbm_model.predict(X_valid)

# Despite the name, lgbm_R2 holds classification accuracy, not an R² score.
lgbm_R2=metrics.accuracy_score(y_valid,lgbm_pred)
#lgbm_rmse = np.sqrt(mean_squared_error(lgbm_pred,y_valid))
print('R2 : ',lgbm_R2)
#print("RMSE : ", lgbm_rmse)

R2 :  0.78076

print(len(X_train.columns))
print(X_train.columns)

17
Index(['Age', 'SibSp', 'Fare', 'Name', 'Ticket', 'Sex', 'Pclass', 'Embarked',
       'Cabin_A', 'Cabin_B', 'Cabin_C', 'Cabin_D', 'Cabin_E', 'Cabin_F',
       'Cabin_G', 'Cabin_T', 'Cabin_X'],
      dtype='object')

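def cal_adjust_r2(r2):
    # Adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1)
    n=80000  # hard-coded sample count; the 75/25 split above actually leaves 75000 training rows
    k= len(X_train.columns)  # number of features
    temp=(1-r2)*(n-1)
    temp2=n-k-1
    ad_r2=1-(temp/temp2)
    return ad_r2

ad_r2_lgbm=cal_adjust_r2(lgbm_R2)
print(ad_r2_lgbm)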

0.7807134010152285
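The adjustment above follows the usual formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of samples and k the number of features. The sketch below takes n and k as arguments instead of hard-coding them; the function name `adjusted_r2` and its parameters are hypothetical. Note that adjusted R² is defined for a regression R²; the value fed in here is a classification accuracy, so the result is only that accuracy shrunk very slightly.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Same inputs as the cell above: score 0.78076, n = 80000, k = 17 features.
same_as_notebook = adjusted_r2(0.78076, 80000, 17)

# With the 75000 rows that a 75/25 split of 100000 rows leaves for training.
with_actual_rows = adjusted_r2(0.78076, 75000, 17)

print(same_as_notebook)
print(with_actual_rows)
```

With tens of thousands of rows and only 17 features the correction factor is very close to 1, so the choice of n barely changes the result.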

# Without pseudo-labeling
train_kf_feature=train.drop(drop_list,axis=1)
train_kf_label=train.loc[:,'Survived']

# With pseudo-labeling
# train_kf_feature=all_df.drop(drop_list,axis=1)
# train_kf_label=all_df.loc[:,'Survived']

lgbm_temp = lgbm_model.booster_

n_iter=0
kfold=StratifiedKFold(n_splits=5)
cv_accuracy=[]
feature_importances = pd.DataFrame()

for train_idx, test_idx in kfold.split(train_kf_feature,train_kf_label):

    X_train=train_kf_feature.iloc[train_idx]
    X_test=train_kf_feature.iloc[test_idx]
    y_train,y_test=train_kf_label.iloc[train_idx],train_kf_label.iloc[test_idx]
    # Train on this fold
    lgbm_model.fit(X_train,y_train)
    # Predict on the held-out fold
    fold_pred=lgbm_model.predict(X_test)
    
    # Accuracy
    n_iter+=1
    fold_accuracy=metrics.accuracy_score(y_test,fold_pred)
    print("\n Fold {} CV accuracy: {} , training set size: {}, validation set size: {} ".
          format(n_iter,fold_accuracy,X_train.shape[0],X_test.shape[0]))
    cv_accuracy.append(fold_accuracy)
    
    # Feature importance
    fi_tmp = pd.DataFrame()
    fi_tmp["feature"] = lgbm_temp.feature_name()
    fi_tmp["importance"] = lgbm_model.feature_importances_
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    feature_importances = pd.concat([feature_importances, fi_tmp])

print('\n Mean CV accuracy: ',np.mean(cv_accuracy))

 Fold 1 CV accuracy: 0.78015 , training set size: 80000, validation set size: 20000 

 Fold 2 CV accuracy: 0.7824 , training set size: 80000, validation set size: 20000 

 Fold 3 CV accuracy: 0.78185 , training set size: 80000, validation set size: 20000 

 Fold 4 CV accuracy: 0.7816 , training set size: 80000, validation set size: 20000 

 Fold 5 CV accuracy: 0.7809 , training set size: 80000, validation set size: 20000 

 Mean CV accuracy:  0.78138
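Stratified K-fold splits the data so that each fold keeps roughly the same class ratio as the whole set. The sketch below illustrates that property with a simple round-robin assignment. It is a simplified stand-in, not scikit-learn's actual algorithm, and the function name `stratified_folds` and the sample labels are hypothetical.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, n_splits):
    """Assign each sample index to a fold so that every class is spread evenly.

    Simplified round-robin sketch, not scikit-learn's exact algorithm.
    """
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's indices out to the folds in turn.
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return [sorted(fold) for fold in folds]

# Hypothetical labels: 60% class 0, 40% class 1.
labels = [0] * 12 + [1] * 8
folds = stratified_folds(labels, 4)

for fold_id, test_idx in enumerate(folds, start=1):
    counts = Counter(labels[i] for i in test_idx)
    print(fold_id, len(test_idx), dict(counts))
```

Every fold ends up with three samples of class 0 and two of class 1, the same 60/40 ratio as the full label list. In the notebook this matters because the survival classes are not perfectly balanced.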

order = list(feature_importances.groupby("feature").
             mean().sort_values("importance", ascending=False).index)
plt.figure(figsize=(10, 10))
sns.barplot(x="importance", y="feature", data=feature_importances, order=order)
plt.title("{} importance".format("LGBMClassifier"))
plt.tight_layout()

[Figure: bar plot of LightGBM feature importance, features sorted by mean importance across folds]
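The plotting cell orders the bars by each feature's mean importance over the folds. Here is a small sketch of that aggregation on made-up numbers (`fold_1`, `fold_2` and their values are hypothetical). As far as I know, `sns.barplot` by default also draws the mean per category when a feature has several rows, so the bar lengths should correspond to these means.

```python
import pandas as pd

# Hypothetical importances from two folds.
fold_1 = pd.DataFrame({"feature": ["Sex", "Fare", "Age"], "importance": [50, 30, 20]})
fold_2 = pd.DataFrame({"feature": ["Sex", "Fare", "Age"], "importance": [40, 20, 40]})

# Stack the per-fold tables into one long table, as the CV loop does.
all_folds = pd.concat([fold_1, fold_2], ignore_index=True)

# Average over folds and sort from most to least important.
mean_importance = (
    all_folds.groupby("feature")["importance"]
    .mean()
    .sort_values(ascending=False)
)
order = list(mean_importance.index)

print(mean_importance)
print(order)
```

Averaging over folds smooths out fold-to-fold noise: here Age ranks last in the first fold but second overall.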

CatBoost

params_cat = {
    'bootstrap_type': 'Poisson',
    'loss_function': 'Logloss',
    'eval_metric': 'Logloss',
    'random_seed': SEED,
    'task_type': 'GPU',
    'max_depth': 8,
    'learning_rate': 0.01,
    'n_estimators': N_ESTIMATORS,
    'max_bin': 280,
    'min_data_in_leaf': 64,
    'l2_leaf_reg': 0.01,
    'subsample': 0.8
}


# New train/validation split
X_train, X_valid, y_train, y_valid = train_test_split(X,y,test_size=0.25, random_state=21)


cat_model=ctb.CatBoostClassifier(**params_cat)
cat_model.fit(X_train, y_train,verbose=300)
cat_pred=cat_model.predict(X_valid)
print("\nAccuracy: ", metrics.accuracy_score(y_valid, cat_pred))
# Note: despite its name, cat_R2 holds the same accuracy score, not an R^2 value.
cat_R2=metrics.accuracy_score(y_valid,cat_pred)
#lgbm_rmse = np.sqrt(mean_squared_error(lgbm_pred,y_valid))
print('R2 : ',cat_R2)

0:    learn: 0.6881875    total: 18ms    remaining: 18s
300:    learn: 0.4671082    total: 3.33s    remaining: 7.73s
600:    learn: 0.4580212    total: 6.44s    remaining: 4.28s
900:    learn: 0.4512272    total: 9.49s    remaining: 1.04s
999:    learn: 0.4491741    total: 10.5s    remaining: 0us

Accuracy:  0.78044
R2 :  0.78044

# Note: n_iter is not reset here, so fold numbering continues from 6 in the output below.
cv_accuracy=[]
feature_importances = pd.DataFrame()

for train_idx, test_idx in kfold.split(train_kf_feature,train_kf_label):

    X_train=train_kf_feature.iloc[train_idx]
    X_test=train_kf_feature.iloc[test_idx]
    y_train,y_test=train_kf_label.iloc[train_idx],train_kf_label.iloc[test_idx]
    # Fit on this fold's training split
    cat_model.fit(X_train,y_train,verbose=500)
    # Predict on this fold's validation split
    fold_pred=cat_model.predict(X_test)
    
    # Accuracy
    n_iter+=1
    fold_accuracy=metrics.accuracy_score(y_test,fold_pred)
    print("\n Fold {} CV accuracy: {} , train size: {}, validation size: {} ".
          format(n_iter,fold_accuracy,X_train.shape[0],X_test.shape[0]))
    cv_accuracy.append(fold_accuracy)
    
    # Feature importance; the call differs from LightGBM's.
    fi_tmp = pd.DataFrame()
    fi_tmp["feature"] = X_test.columns.to_list()
    fi_tmp["importance"] = cat_model.get_feature_importance()
    feature_importances = pd.concat([feature_importances, fi_tmp])  # DataFrame.append was removed in pandas 2.0

print('\n Mean CV accuracy: ',np.mean(cv_accuracy))

0:    learn: 0.6881430    total: 11.2ms    remaining: 11.2s
500:    learn: 0.4620724    total: 5.17s    remaining: 5.15s
999:    learn: 0.4513527    total: 10.2s    remaining: 0us

 Fold 6 CV accuracy: 0.77945 , train size: 80000, validation size: 20000 
0:    learn: 0.6881914    total: 12.4ms    remaining: 12.3s
500:    learn: 0.4635447    total: 5.02s    remaining: 5s
999:    learn: 0.4529141    total: 10.2s    remaining: 0us

 Fold 7 CV accuracy: 0.78335 , train size: 80000, validation size: 20000 
0:    learn: 0.6881970    total: 13.6ms    remaining: 13.6s
500:    learn: 0.4635994    total: 5.2s    remaining: 5.18s
999:    learn: 0.4529137    total: 10.3s    remaining: 0us

 Fold 8 CV accuracy: 0.78265 , train size: 80000, validation size: 20000 
0:    learn: 0.6882583    total: 11.2ms    remaining: 11.2s
500:    learn: 0.4622575    total: 5.08s    remaining: 5.06s
999:    learn: 0.4513804    total: 10.1s    remaining: 0us

 Fold 9 CV accuracy: 0.7821 , train size: 80000, validation size: 20000 
0:    learn: 0.6882789    total: 15.4ms    remaining: 15.3s
500:    learn: 0.4630108    total: 5.1s    remaining: 5.08s
999:    learn: 0.4522854    total: 10.1s    remaining: 0us

 Fold 10 CV accuracy: 0.7802 , train size: 80000, validation size: 20000 

 Mean CV accuracy:  0.78155

# just to get ideas to improve
order = list(feature_importances.groupby("feature").mean().sort_values("importance", ascending=False).index)
plt.figure(figsize=(10, 10))
sns.barplot(x="importance", y="feature", data=feature_importances, order=order)
plt.title("{} importance".format("CatBoostClassifier"))
plt.tight_layout()

[Figure: CatBoost feature importance (bar plot)]

Submission

def create_submission(model, test, test_passenger_id, model_name):
    y_pred_test = model.predict_proba(test)[:, 1]
    submission = pd.DataFrame(
        {
            'PassengerId': test_passenger_id, 
            'Survived': (y_pred_test >= 0.5).astype(int),
        }
    )
    submission.to_csv(f"submission_{model_name}.csv", index=False)
    
    return y_pred_test

test_df.head()

	PassengerId	Pclass	Name	Sex	Age	Parch	Ticket	Fare	Cabin	Embarked
0	100000	3	Holliday, Daniel	male	19.0	0	24745	63.01	NaN	S
1	100001	3	Nguyen, Lorraine	female	53.0	0	13264	5.81	NaN	S
2	100002	1	Harris, Heather	female	19.0	0	25990	38.91	B15315	C
3	100003	2	Larsen, Eric	male	25.0	0	314011	12.93	NaN	S
4	100004	1	Cleary, Sarah	female	17.0	2	26203	26.89	B22515	C

#X_test=test.drop('Pclass',axis=1)
test = all_df.iloc[100000:, :]  # rows from position 100000 onward (the test portion)
X_test=test.drop(drop_list,axis=1)
X_test.head()

	Age	SibSp	Fare	Name	Ticket	Sex	Pclass	Embarked	Cabin_B	Cabin_X
100000	-0.937422	-0.539572	0.949786	10830	49	1	2	2	0	1
100001	1.123570	-0.539572	-1.273379	17134	49	0	2	2	0	1
100002	-0.937422	-0.539572	0.481059	9978	49	0	0	0	1	0
100003	-0.573717	-0.539572	-0.563310	13303	49	1	1	2	0	1
100004	-1.058657	-0.539572	0.125497	4406	49	0	0	0	1	0

test_pred_lightgbm = create_submission(
    lgbm_model, X_test, test_df["PassengerId"], "lightgbm"
)
test_pred_catboost = create_submission(
    cat_model, X_test, test_df["PassengerId"], "catboost"
)

test_pred_merged = (

    test_pred_lightgbm + 
    test_pred_catboost 
)
test_pred_merged = np.round(test_pred_merged / 2)

submission = pd.DataFrame(
    {
        'PassengerId': test_df["PassengerId"], 
        'Survived': test_pred_merged.astype(int),
    }
)
submission.to_csv("submission_merged3.csv", index=False)
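
The merge above amounts to a simple soft vote: average the two models' positive-class probabilities, then turn the average into a class label. The sketch below uses made-up probabilities (illustrative values, not outputs of the models in this post) to show one edge case: `np.round` rounds exact halves to the nearest even value, so an average of exactly 0.5 becomes 0, while the `>= 0.5` rule used in `create_submission` maps it to 1.

```python
import numpy as np

# Hypothetical positive-class probabilities from two models (illustrative values).
proba_a = np.array([0.9, 0.2, 0.75, 0.5])
proba_b = np.array([0.7, 0.4, 0.25, 0.5])

avg = (proba_a + proba_b) / 2              # soft-voting average

by_round = np.round(avg).astype(int)       # exact 0.5 rounds to the even value 0
by_threshold = (avg >= 0.5).astype(int)    # exact 0.5 is treated as class 1

print(avg)
print(by_round)
print(by_threshold)
```

On real model outputs an average of exactly 0.5 is rare, so the two rules will usually agree; using the same threshold rule in both places simply avoids the ambiguity.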

score

kaggle public score : 0.80354

Posted 2021-04-16 Updated 2021-04-16 4 minutes read (About 641 words)

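210416 Presentation

Presentation and Summary

Instructor feedback:

Readability and code matter, but it is better to organize the results the code produces into tables and compare the performance figures.
A conclusion is needed.
More detail is needed, including what the limitations are.
It needs to be tidier.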

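1. Add tables to improve readability

I realized that summarizing results in tables is necessary not only to communicate well with the audience but also to present the results well.

2. The need for a conclusion

I had analyzed the overall results, but I realized that my own conclusion was missing.
My conclusion is that for categorical data, CatBoost is probably the best choice even though it takes longer to train,
and for other, numerical data, LightGBM is probably the better choice because it was by far the fastest.
For hyperparameter tuning, I felt XGBoost would be a good choice because it exposes a wide range of hyperparameters that noticeably change its behavior.

I also concluded that whether to use grid search or random search has to be decided separately for each dataset.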

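3. Details and limitations

I added the detailed content to the practice notebook.
I think the limitations of our exercise lie in our level of knowledge and in not having done the feature engineering ourselves.
In addition, trying many hyperparameter settings takes too long, and because we lack hands-on experience, I felt we need much more tuning practice.

4. Tidiness

I think looking tidy is tied to visualization. As noted in item 1, had I organized the results neatly in tables, the presentation would have looked more systematic.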

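Conclusion

Taking the instructor's feedback on board, I decided to pay more attention to visualization and to produce higher-quality material from now on.
I will also draw well-refined conclusions from the visualized material.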

Posted 2021-04-15Updated 2021-04-16 이현정 6 minutes read (About 952 words)

210415study

Background knowledge for feature engineering

Code

pd.concat(): concatenates DataFrames (here, two) into a new DataFrame.
LabelEncoder().fit_transform(data['column']): label-encodes the given column.
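
A minimal sketch of the two calls together, assuming toy frames in place of the real train and test sets (the column values and the `Embarked_code` name are made up for illustration). Fitting the encoder on the combined column gives both parts one shared mapping.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frames standing in for the train and test sets (hypothetical data).
train_part = pd.DataFrame({"Embarked": ["S", "C", "Q"]})
test_part = pd.DataFrame({"Embarked": ["C", "S"]})

# Stack the two frames row-wise; ignore_index=True rebuilds a clean 0..n-1 index.
full = pd.concat([train_part, test_part], ignore_index=True)

# Fit the encoder on the combined column so both parts share one mapping.
encoder = LabelEncoder()
full["Embarked_code"] = encoder.fit_transform(full["Embarked"])

print(full)
print(list(encoder.classes_))                      # labels in sorted order
print(list(encoder.inverse_transform([0, 1, 2])))  # codes mapped back to labels
```

The integer codes follow the sorted order of the labels, so they carry no meaning beyond identity; for tree models that is usually acceptable, while linear models may prefer one-hot encoding.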

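LabelEncoder
Maps string labels to numbers, and numbers back to labels (via inverse_transform).
It converts labels to integers starting from 0.
sklearn.model_selection.StratifiedKFold
Cross-validator for stratified K-fold cross-validation: each fold keeps roughly the same class proportions as the full dataset.
Parameters
- n_splits : default=5
- shuffle : bool, default=False
- random_state: (int) default=None
  
  When shuffle is True, random_state affects the ordering of the indices, which controls the randomness of each fold for each class.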
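
As a rough illustration of the stratification, the sketch below splits a small, deliberately imbalanced label vector (made-up data) and counts the positives in each validation fold. `random_state` only takes effect because `shuffle=True` is set.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels: 8 negatives and 4 positives.
y = np.array([0] * 8 + [1] * 4)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature column

# random_state matters only when shuffle=True.
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)

fold_positive_counts = []
for fold, (train_idx, valid_idx) in enumerate(skf.split(X, y), start=1):
    n_pos = int(y[valid_idx].sum())
    fold_positive_counts.append(n_pos)
    print(f"fold {fold}: valid size={len(valid_idx)}, positives={n_pos}")
```

Each validation fold receives one of the four positives, mirroring the 2:1 class ratio of the full label vector.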

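Text mining

Text mining builds on natural language processing (NLP), the technology for analyzing and processing written human language by computer in order to understand its structure and meaning.
In short, text mining is the technique of finding (mining) valuable, meaningful information in unstructured text data. With it, users can extract meaningful information from large bodies of text, identify links to other information, and discover the categories a text belongs to, obtaining results that go beyond simple information retrieval. For a computer to analyze information written in human language in depth and uncover what is hidden in it, large-scale language resources and complex statistical and rule-based algorithms are required.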

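Cross-validation

If you evaluate with a single train set and test set and keep tuning the model, the result overfits to that test set: the model ends up working well only on the test set. Cross-validation addresses this problem.
The overfitting arises because the test set is a fixed portion of the data and the parameters are tuned repeatedly to score well on that portion. Cross-validation validates the model on every part of the data and does not fix a single test set.

The whole dataset is divided into k subsets and evaluated k times, with a different subset serving as the test set each time, so no test set is reused.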
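
The partitioning idea can be sketched in plain Python. The helper `k_fold_indices` below is a hypothetical name written for illustration, not a library function; it only shows that every sample index lands in exactly one test fold.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample lands in exactly one test fold."""
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


seen = []
for train_idx, test_idx in k_fold_indices(10, 3):
    print("train:", train_idx, "test:", test_idx)
    seen.extend(test_idx)
```

In practice the indices are usually shuffled first (or stratified by class), which this sketch leaves out for clarity.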

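K-Fold cross-validation

Definition: cross-validation carried out by creating K folds
Why use it

Improves accuracy on datasets with few samples
Splitting only into train and test leaves more training data than a three-way train, valid, and test split

Procedure

Split the data into a training set and a test set as usual
Split the training set into k folds (the original figure used 5)
In each round, use k-1 of the folds as training data and the remaining fold as the validation set.
Build the model, make predictions, and record the resulting error.
In the next round, pick a different fold as the validation set; the fold that served as the validation set in the previous round goes back into the training data.
Repeat this K times.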
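
The procedure above can be sketched with scikit-learn's `KFold`. The data here is synthetic and the decision tree is an arbitrary choice for illustration; the sketch covers only the fold loop applied to the training portion, not the initial train/test split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training set (assumption: any tabular X, y would do).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_errors = []

for fold, (train_idx, valid_idx) in enumerate(kfold.split(X), start=1):
    # A fresh model per round, so nothing learned from earlier folds carries over.
    model = DecisionTreeClassifier(max_depth=3, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    error = 1.0 - model.score(X[valid_idx], y[valid_idx])
    fold_errors.append(error)
    print(f"fold {fold}: validation error = {error:.3f}")

print(f"mean validation error = {np.mean(fold_errors):.3f}")
```

The mean of the per-fold errors is the cross-validated estimate; the spread across folds gives a rough sense of how stable that estimate is.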

Posted 2021-04-15 Updated 2021-04-15 2 hours read (About 13601 words)

Titanic data

Comparing Decision Tree, Random Forest, XGBoost, LightGBM, and CatBoost

Preprocessing

!pip install catboost

Requirement already satisfied: catboost in /usr/local/lib/python3.7/dist-packages (0.25.1)
Requirement already satisfied: plotly in /usr/local/lib/python3.7/dist-packages (from catboost) (4.4.1)
Requirement already satisfied: graphviz in /usr/local/lib/python3.7/dist-packages (from catboost) (0.10.1)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from catboost) (1.15.0)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from catboost) (3.2.2)
Requirement already satisfied: pandas>=0.24.0 in /usr/local/lib/python3.7/dist-packages (from catboost) (1.1.5)
Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from catboost) (1.4.1)
Requirement already satisfied: numpy>=1.16.0 in /usr/local/lib/python3.7/dist-packages (from catboost) (1.19.5)
Requirement already satisfied: retrying>=1.3.3 in /usr/local/lib/python3.7/dist-packages (from plotly->catboost) (1.3.3)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->catboost) (2.8.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->catboost) (1.3.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->catboost) (2.4.7)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->catboost) (0.10.0)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.24.0->catboost) (2018.9)

import os
import random

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import xgboost as xgb
import lightgbm as lgbm
import catboost as cb
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier

from sklearn import metrics    
from sklearn.model_selection import RandomizedSearchCV

def set_seed(seed_value):
    random.seed(seed_value)
    np.random.seed(seed_value)
    os.environ["PYTHONHASHSEED"] = str(seed_value)
    

SEED = 42
set_seed(SEED)

train_df = pd.read_csv('/content/sample_data/titanic_train.csv')
test_df = pd.read_csv('/content/sample_data/titanic_test.csv')
print(f"Train shape: {train_df.shape}")
train_df.sample(3)

Train shape: (891, 12)

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
709	710	1	3	Moubarek, Master. Halim Gonios ("William George")	male	NaN	1	1	2661	15.2458	NaN	C
439	440	0	2	Kvillner, Mr. Johan Henrik Johannesson	male	31.0	0	0	C.A. 18723	10.5000	NaN	S
840	841	0	3	Alhomaki, Mr. Ilmari Rudolf	male	20.0	0	0	SOTON/O2 3101287	7.9250	NaN	S

print(f"Test shape: {test_df.shape}")
test_df.sample(3)

Test shape: (418, 11)

	PassengerId	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
20	912	1	Rothschild, Mr. Martin	male	55.00	1	0	PC 17603	59.40	NaN	C
338	1230	2	Denbury, Mr. Herbert	male	25.00	0	0	C.A. 31029	31.50	NaN	S
250	1142	2	West, Miss. Barbara J	female	0.92	1	2	C.A. 34651	27.75	NaN	S

full_df = pd.concat(
    [
        train_df.drop(["PassengerId", "Survived"], axis=1), 
        test_df.drop(["PassengerId"], axis=1),
    ]
)
y_train = train_df["Survived"].values

full_df.isna().sum()

Pclass         0
Name           0
Sex            0
Age          263
SibSp          0
Parch          0
Ticket         0
Fare           1
Cabin       1014
Embarked       2
dtype: int64

full_df = full_df.drop(["Age", "Cabin"], axis=1)

plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.hist(full_df["Fare"], bins=20)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.title("Fare distribution", fontsize=16)

plt.subplot(1, 2, 2)
embarked_info = full_df["Embarked"].value_counts()
plt.bar(embarked_info.index, embarked_info.values)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.title("Embarked distribution", fontsize=16);

[Figure: Fare distribution (histogram) and Embarked distribution (bar chart)]

full_df["Embarked"] = full_df["Embarked"].fillna("S")
full_df["Fare"] = full_df["Fare"].fillna(full_df["Fare"].mean())

full_df["Title"] = full_df["Name"].str.extract(r" ([A-Za-z]+)\.", expand=False)  # raw string avoids the invalid "\." escape
full_df["Title"] = full_df["Title"].replace(["Ms", "Mlle"], "Miss")
full_df["Title"] = full_df["Title"].replace(["Mme", "Countess", "Lady", "Dona"], "Mrs")
full_df["Title"] = full_df["Title"].replace(["Dr", "Major", "Col", "Sir", "Rev", "Jonkheer", "Capt", "Don"], "Mr")
full_df = full_df.drop(["Name"], axis=1)


full_df["Sex"] = full_df["Sex"].map({"male": 1, "female": 0}).astype(int)    
full_df["Embarked"] = full_df["Embarked"].map({"S": 1, "C": 2, "Q": 3}).astype(int)    
full_df['Title'] = full_df['Title'].map({"Mr": 0, "Miss": 1, "Mrs": 2, "Master": 3}).astype(int)

full_df["TicketNumber"] = full_df["Ticket"].str.split()
full_df["TicketNumber"] = full_df["TicketNumber"].str[-1]
full_df["TicketNumber"] = LabelEncoder().fit_transform(full_df["TicketNumber"])
full_df = full_df.drop(["Ticket"], axis=1)

full_df["FamilySize"] = full_df["SibSp"] + full_df["Parch"] + 1
full_df["IsAlone"] = full_df["FamilySize"].apply(lambda x: 1 if x == 1 else 0)

full_df.head()

	Pclass	Sex	SibSp	Fare	Embarked	Title	TicketNumber	FamilySize	IsAlone
0	3	1	1	7.2500	1	0	209	2	0
1	1	0	1	71.2833	2	2	166	2	0
2	3	0	0	7.9250	1	1	466	1	1
3	1	0	1	53.1000	1	2	67	2	0
4	3	1	0	8.0500	1	0	832	1	1

X_train = full_df[:y_train.shape[0]]
X_test = full_df[y_train.shape[0]:]

print(f"Train X shape: {X_train.shape}")
print(f"Train y shape: {y_train.shape}")
print(f"Test X shape: {X_test.shape}")

Train X shape: (891, 10)
Train y shape: (891,)
Test X shape: (418, 10)

one_hot_cols = ["Embarked", "Title"]
for col in one_hot_cols:
    full_df = pd.concat(
        [full_df, pd.get_dummies(full_df[col], prefix=col)], 
        axis=1, 
        join="inner",
    )
full_df = full_df.drop(one_hot_cols, axis=1)

scaler = StandardScaler()
full_df.loc[:] = scaler.fit_transform(full_df)

print(full_df)

       Pclass       Sex     SibSp  ...   Title_1   Title_2   Title_3
0    0.841916  0.743497  0.481288  ... -0.502625 -0.425920 -0.221084
1   -1.546098 -1.344995  0.481288  ... -0.502625  2.347858 -0.221084
2    0.841916 -1.344995 -0.479087  ...  1.989556 -0.425920 -0.221084
3   -1.546098 -1.344995  0.481288  ... -0.502625  2.347858 -0.221084
4    0.841916  0.743497 -0.479087  ... -0.502625 -0.425920 -0.221084
..        ...       ...       ...  ...       ...       ...       ...
413  0.841916  0.743497 -0.479087  ... -0.502625 -0.425920 -0.221084
414 -1.546098 -1.344995 -0.479087  ... -0.502625  2.347858 -0.221084
415  0.841916  0.743497 -0.479087  ... -0.502625 -0.425920 -0.221084
416  0.841916  0.743497 -0.479087  ... -0.502625 -0.425920 -0.221084
417  0.841916  0.743497  0.481288  ... -0.502625 -0.425920  4.523164

[1309 rows x 15 columns]

X_train_norm = full_df[:y_train.shape[0]]
X_test_norm = full_df[y_train.shape[0]:]

print(f"Train norm X shape: {X_train_norm.shape}")
print(f"Train y shape: {y_train.shape}")
print(f"Test norm X shape: {X_test_norm.shape}")

Train norm X shape: (891, 15)
Train y shape: (891,)
Test norm X shape: (418, 15)

categorical_columns = ['Sex', 'Embarked', 'Title', 'TicketNumber', 'IsAlone']

cross_valid_scores = {}

X1_train, X1_test, y1_train, y1_test = train_test_split(X_train, y_train, test_size=0.3)

Building the decision tree

GridSearch

%%time
parameters = {
    "max_depth": [3, 5, 7, 9, 11, 13],
}

model_desicion_tree = DecisionTreeClassifier(
    random_state=SEED,
    class_weight='balanced',
)

model_desicion_tree = GridSearchCV(
    model_desicion_tree, 
    parameters, 
    cv=5,
    scoring='accuracy',
)

model_desicion_tree.fit(X_train, y_train)

print('-----')
print(f'Best parameters {model_desicion_tree.best_params_}')
print(
    f'Mean cross-validated accuracy score of the best_estimator: ' + \
    f'{model_desicion_tree.best_score_:.3f}'
)
cross_valid_scores['desicion_tree'] = model_desicion_tree.best_score_
print('-----')

-----
Best parameters {'max_depth': 11}
Mean cross-validated accuracy score of the best_estimator: 0.817
-----
CPU times: user 191 ms, sys: 4.34 ms, total: 196 ms
Wall time: 205 ms

Random search

%%time

params = {
    "max_depth":[3, 5, 7, 9, 11, 13],
}

model_desicion_tree_rs = DecisionTreeClassifier(
    random_state=SEED,
    class_weight='balanced',
)

model_desicion_tree_rs = RandomizedSearchCV(model_desicion_tree_rs,params,cv=5,n_iter=50,random_state=0,scoring="accuracy")

model_desicion_tree_rs.fit(X_train, y_train)

print('-----')
print(f'Best parameters {model_desicion_tree_rs.best_params_}')
print(
    f'Mean cross-validated accuracy score of the best_estimator: ' + \
    f'{model_desicion_tree_rs.best_score_:.3f}'
)
cross_valid_scores['desicion_tree'] = model_desicion_tree_rs.best_score_
print('-----')

-----
Best parameters {'max_depth': 11}
Mean cross-validated accuracy score of the best_estimator: 0.817
-----
CPU times: user 166 ms, sys: 680 µs, total: 167 ms
Wall time: 168 ms


/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_search.py:281: UserWarning: The total space of parameters 6 is smaller than n_iter=50. Running 6 iterations. For exhaustive searches, use GridSearchCV.
  % (grid_size, self.n_iter, grid_size), UserWarning)

Before hyperparameter tuning

%%time
model_dtree1=DecisionTreeClassifier(max_depth=5)
# Note: X1_test was split off from X_train, so the model is scored on rows it was fit on.
model_dtree1.fit(X_train, y_train)
y_pred_dtree1=model_dtree1.predict(X1_test)
print("\nAccuracy: ", metrics.accuracy_score(y1_test, y_pred_dtree1))
print("-----")

Accuracy:  0.8582089552238806
-----
CPU times: user 9.36 ms, sys: 46 µs, total: 9.41 ms
Wall time: 9.77 ms

After hyperparameter tuning

%%time
model_dtree2=DecisionTreeClassifier(max_depth=11)
model_dtree2.fit(X_train, y_train)
y_pred_dtree2=model_dtree2.predict(X1_test)
print("\nAccuracy: ", metrics.accuracy_score(y1_test, y_pred_dtree2))
print("-----")

Accuracy:  0.9850746268656716
-----
CPU times: user 9.64 ms, sys: 91 µs, total: 9.73 ms
Wall time: 13.7 ms

Random forest

Grid search

%%time
parameters = {
    "n_estimators": [5, 10, 15, 20, 25], 
    "max_depth": [3, 5, 7, 9, 11, 13],
}

model_random_forest = RandomForestClassifier(
    random_state=SEED,
    class_weight='balanced',
)

model_random_forest = GridSearchCV(
    model_random_forest, 
    parameters, 
    cv=5,
    scoring='accuracy',
)

model_random_forest.fit(X_train, y_train)

print('-----')
print(f'Best parameters {model_random_forest.best_params_}')
print(
    f'Mean cross-validated accuracy score of the best_estimator: '+ \
    f'{model_random_forest.best_score_:.3f}'
)
cross_valid_scores['random_forest'] = model_random_forest.best_score_
print('-----')

-----
Best parameters {'max_depth': 11, 'n_estimators': 25}
Mean cross-validated accuracy score of the best_estimator: 0.844
-----
CPU times: user 4.93 s, sys: 30.4 ms, total: 4.96 s
Wall time: 4.98 s

Random search

%%time

parameters = {
    "n_estimators": [5, 10, 15, 20, 25], 
    "max_depth": [3, 5, 7, 9, 11, 13],
}
model2_random_forest_rs = RandomForestClassifier(
    random_state=SEED,
    class_weight='balanced',
)

model2_random_forest_rs = RandomizedSearchCV(model2_random_forest_rs,parameters,cv=5,n_iter=50,random_state=0,scoring="accuracy")
model2_random_forest_rs.fit(X_train, y_train)

print('-----')
print(f'Best parameters {model2_random_forest_rs.best_params_}')
print(
    f'Mean cross-validated accuracy score of the best_estimator: '+ \
    f'{model2_random_forest_rs.best_score_:.3f}'
)
cross_valid_scores['random_forest'] = model2_random_forest_rs.best_score_
print('-----')

/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_search.py:281: UserWarning: The total space of parameters 30 is smaller than n_iter=50. Running 30 iterations. For exhaustive searches, use GridSearchCV.
  % (grid_size, self.n_iter, grid_size), UserWarning)


-----
Best parameters {'n_estimators': 25, 'max_depth': 11}
Mean cross-validated accuracy score of the best_estimator: 0.844
-----
CPU times: user 4.88 s, sys: 35.5 ms, total: 4.92 s
Wall time: 4.92 s

RandomForest without parameter tuning

%%time
model_rf1=RandomForestClassifier(max_depth=5)  # the default settings overfit, so max_depth was arbitrarily set to 5
model_rf1.fit(X_train,y_train)

y_pred_rf1=model_rf1.predict(X1_test)

print('\nAccuracy :', metrics.accuracy_score(y1_test, y_pred_rf1))
print("-----")

Accuracy : 0.8544776119402985
-----
CPU times: user 187 ms, sys: 1.96 ms, total: 189 ms
Wall time: 190 ms

%%time
model_rf2=RandomForestClassifier(n_estimators= 25, max_depth= 11)
model_rf2.fit(X_train,y_train)

y_pred_rf2=model_rf2.predict(X1_test)

print('\nAccuracy :', metrics.accuracy_score(y1_test, y_pred_rf2))
print("-----")

Accuracy : 0.9589552238805971
-----
CPU times: user 62.9 ms, sys: 2.11 ms, total: 65 ms
Wall time: 65.6 ms

print(model_rf1)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=5, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)

XGBoost

GridSearch

%%time
parameters = {
    'max_depth': [3, 5, 7, 9], 
    'n_estimators': [5, 10, 15, 20, 25, 50, 100],
    'learning_rate': [0.01, 0.05, 0.1]
}

model_xgb = xgb.XGBClassifier(
    random_state=SEED,
)

model_xgb = GridSearchCV(
    model_xgb, 
    parameters, 
    cv=5,
    scoring='accuracy',
)

model_xgb.fit(X_train, y_train)

print('-----')
print(f'Best parameters {model_xgb.best_params_}')
print(
    f'Mean cross-validated accuracy score of the best_estimator: ' + 
    f'{model_xgb.best_score_:.3f}'
)
cross_valid_scores['xgboost'] = model_xgb.best_score_
print('-----')

-----
Best parameters {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 100}
Mean cross-validated accuracy score of the best_estimator: 0.846
-----
CPU times: user 13.8 s, sys: 189 ms, total: 14 s
Wall time: 14.1 s

Ran GridSearch to tune the XGBoost hyperparameters.
time : 14.3 s
Best parameters {'n_estimators': 100, 'max_depth': 7, 'learning_rate': 0.1}

Random search

%%time
params = {
    'max_depth': [3, 5, 7, 9], 
    'n_estimators': [5, 10, 15, 20, 25, 50, 100],
    'learning_rate': [0.01, 0.05, 0.1]
}
model_xgb_random = xgb.XGBClassifier(
    random_state=SEED,
)
model_xgb_random =RandomizedSearchCV(model_xgb_random ,params,cv=5,n_iter=50,random_state=0,scoring="accuracy")

model_xgb_random .fit(X_train, y_train)

print('-----')
print(f'Best parameters {model_xgb_random .best_params_}')
print(
    f'Mean cross-validated accuracy score of the best_estimator: ' + 
    f'{model_xgb_random .best_score_:.3f}'
)
cross_valid_scores['xgboost'] = model_xgb_random .best_score_
print('-----')

-----
Best parameters {'n_estimators': 100, 'max_depth': 7, 'learning_rate': 0.1}
Mean cross-validated accuracy score of the best_estimator: 0.846
-----
CPU times: user 9.31 s, sys: 113 ms, total: 9.42 s
Wall time: 9.41 s

To compare the two searches, ran RandomSearch on XGBoost in the same environment.
time : 9.46 s
Best parameters {'n_estimators': 100, 'max_depth': 7, 'learning_rate': 0.1}
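
One way to see where the time difference comes from is to count the candidates each search evaluates. The sketch below reuses the XGBoost grid from this post with scikit-learn's `ParameterGrid` and `ParameterSampler`; it only enumerates parameter combinations and fits nothing, and it assumes the number of model fits dominates the runtime.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

params = {
    "max_depth": [3, 5, 7, 9],
    "n_estimators": [5, 10, 15, 20, 25, 50, 100],
    "learning_rate": [0.01, 0.05, 0.1],
}

# Grid search tries every combination; random search draws n_iter of them.
grid_candidates = list(ParameterGrid(params))
random_candidates = list(ParameterSampler(params, n_iter=50, random_state=0))

cv = 5  # each candidate is fit once per fold
print(f"grid search:   {len(grid_candidates)} candidates, {len(grid_candidates) * cv} fits")
print(f"random search: {len(random_candidates)} candidates, {len(random_candidates) * cv} fits")
```

Random search returned the same best parameters here because its sample happened to include the best combination; with a smaller `n_iter` or a larger grid that is not guaranteed.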

XGBoost without parameter tuning

%%time
model_1=xgb.XGBClassifier()
model_1.fit(X_train,y_train)
pred_y1=model_1.predict(X1_test)

print('\nAccuracy :', metrics.accuracy_score(y1_test, pred_y1))
print("-----")

Accuracy : 0.8843283582089553
-----
CPU times: user 63.7 ms, sys: 993 µs, total: 64.7 ms
Wall time: 63.4 ms

With tuned hyperparameters

%%time
model_2=xgb.XGBClassifier(learning_rate= 0.1, max_depth= 7, n_estimators=100)
model_2.fit(X_train,y_train)

pred_y2=model_2.predict(X1_test)

print('\nAccuracy :', metrics.accuracy_score(y1_test, pred_y2))
print("-----")

Accuracy : 0.9514925373134329
-----
CPU times: user 118 ms, sys: 2.04 ms, total: 120 ms
Wall time: 119 ms

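Random search ran faster than grid search here (about 9.4 s versus 14.1 s wall time), which is expected because it evaluated 50 of the 84 parameter combinations.
The accuracy after tuning is also much higher (0.951 versus 0.884). Note, however, that X1_test is a subset of the X_train data the models were fit on, so both figures are measured on already-seen rows and the gain largely reflects a closer fit to the training data rather than better generalization.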
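
The models in this post are fit on X_train and scored on X1_test, which was itself split off from X_train. A minimal sketch on synthetic data (not the Titanic set; the unrestricted decision tree is chosen only to make the effect obvious) contrasts that setup with a held-out evaluation:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption); flip_y adds label noise so the task is not trivial.
X, y = make_classification(n_samples=600, n_features=10, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Leaky: the model sees every row, including the ones used for scoring.
leaky = DecisionTreeClassifier(random_state=0).fit(X, y)
leaky_acc = leaky.score(X_te, y_te)

# Held-out: the model never sees the scoring rows.
honest = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
honest_acc = honest.score(X_te, y_te)

print(f"leaky accuracy:    {leaky_acc:.3f}")
print(f"held-out accuracy: {honest_acc:.3f}")
```

Applied to this post, that would mean fitting on X1_train and scoring on X1_test, or relying on the cross-validated `best_score_` values, which are already computed on held-out folds.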

LightGBM

GridSearch

%%time
parameters = {
    'n_estimators': [5, 10, 15, 20, 25, 50, 100],
    'learning_rate': [0.01, 0.05, 0.1],
    'num_leaves': [7, 15,  31],
}

model_lgbm = lgbm.LGBMClassifier(
    random_state=SEED,
    class_weight='balanced',
)

model_lgbm = GridSearchCV(
    model_lgbm, 
    parameters, 
    cv=5,
    scoring='accuracy',
)

model_lgbm.fit(
    X_train, 
    y_train, 
    categorical_feature=categorical_columns
)

print('-----')
print(f'Best parameters {model_lgbm.best_params_}')
print(
    f'Mean cross-validated accuracy score of the best_estimator: ' + 
    f'{model_lgbm.best_score_:.3f}'
)
cross_valid_scores['lightgbm'] = model_lgbm.best_score_
print('-----')

/usr/local/lib/python3.7/dist-packages/lightgbm/basic.py:1209: UserWarning: categorical_feature in Dataset is overridden.
New categorical_feature is ['Embarked', 'IsAlone', 'Sex', 'TicketNumber', 'Title']
  'New categorical_feature is {}'.format(sorted(list(categorical_feature))))


-----
Best parameters {'learning_rate': 0.1, 'n_estimators': 25, 'num_leaves': 15}
Mean cross-validated accuracy score of the best_estimator: 0.827
-----
CPU times: user 5.83 s, sys: 346 ms, total: 6.18 s
Wall time: 6.2 s

Random search

%%time
params = {
    'n_estimators': [5, 10, 15, 20, 25, 50, 100],
    'learning_rate': [0.01, 0.05, 0.1],
    'num_leaves': [7, 15, 31],
}
model_lgbm_random = lgbm.LGBMClassifier(
    random_state=SEED,
)
model_lgbm_random =RandomizedSearchCV(model_lgbm_random ,params,cv=5,n_iter=50,random_state=0,scoring="accuracy")

model_lgbm_random .fit(X_train, y_train)

print('-----')
print(f'Best parameters {model_lgbm_random .best_params_}')
print(
    f'Mean cross-validated accuracy score of the best_estimator: ' + 
    f'{model_lgbm_random .best_score_:.3f}'
)
cross_valid_scores['LightGBM'] = model_lgbm_random .best_score_
print('-----')

-----
Best parameters {'num_leaves': 31, 'n_estimators': 100, 'learning_rate': 0.05}
Mean cross-validated accuracy score of the best_estimator: 0.846
-----
CPU times: user 4.66 s, sys: 210 ms, total: 4.87 s
Wall time: 4.87 s

LightGBM without parameter tuning

%%time
model_1=lgbm.LGBMClassifier()
model_1.fit(X_train,y_train)
pred_y1=model_1.predict(X1_test)

print('\nAccuracy :', metrics.accuracy_score(y1_test, pred_y1))
print('-----')

Accuracy : 0.9514925373134329
-----
CPU times: user 69.5 ms, sys: 3.96 ms, total: 73.5 ms
Wall time: 76.1 ms

With tuned hyperparameters (grid search result)

%%time

model_2=lgbm.LGBMClassifier(num_leaves= 15,n_estimators=25, learning_rate= 0.1)
model_2.fit(X_train,y_train)
pred_y2=model_2.predict(X1_test)

print('\nAccuracy :', metrics.accuracy_score(y1_test, pred_y2))
print('-----')

Accuracy : 0.8582089552238806
-----
CPU times: user 20.3 ms, sys: 1.01 ms, total: 21.4 ms
Wall time: 22.4 ms

With tuned hyperparameters 2 (random search result)

%%time

model_2=lgbm.LGBMClassifier(num_leaves= 31,n_estimators=100, learning_rate= 0.05)
model_2.fit(X_train,y_train)
pred_y2=model_2.predict(X1_test)

print('\nAccuracy :', metrics.accuracy_score(y1_test, pred_y2))

Accuracy : 0.914179104477612
CPU times: user 66.4 ms, sys: 6.96 ms, total: 73.3 ms
Wall time: 74.2 ms

CatBoost

Grid search

%%time
parameters = {
    'iterations': [5, 10, 15, 20, 25, 50, 100],
    'learning_rate': [0.01, 0.05, 0.1],
    'depth': [3, 5, 7, 9, 11, 13],
}

model_catboost = cb.CatBoostClassifier(
    verbose=False,
)

model_catboost = GridSearchCV(
    model_catboost, 
    parameters, 
    cv=5,
    scoring='accuracy',
)

model_catboost.fit(X_train, y_train)

print('-----')
print(f'Best parameters {model_catboost.best_params_}')
print(
    f'Mean cross-validated accuracy score of the best_estimator: ' + 
    f'{model_catboost.best_score_:.3f}'
)
cross_valid_scores['catboost'] = model_catboost.best_score_
print('-----')

-----
Best parameters {'depth': 13, 'iterations': 100, 'learning_rate': 0.1}
Mean cross-validated accuracy score of the best_estimator: 0.838
-----
CPU times: user 3min 49s, sys: 6.43 s, total: 3min 55s
Wall time: 2min 27s

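Random search

%%time

model_catboost_rs = cb.CatBoostClassifier(
    verbose=False,
)

# Note: `params` is the LightGBM grid defined earlier (num_leaves, n_estimators, learning_rate),
# not the CatBoost grid `parameters`; its num_leaves entries trigger the max_leaves warning below.
model_catboost_rs=RandomizedSearchCV(model_catboost_rs ,params,cv=5,n_iter=50,random_state=0,scoring="accuracy")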
model_catboost_rs.fit(X_train,y_train)


print('-----')
print(f'Best parameters {model_catboost_rs.best_params_}')
print(
    f'Mean cross-validated accuracy score of the best_estimator: ' + 
    f'{model_catboost_rs.best_score_:.3f}'
)
cross_valid_scores['catboost'] = model_catboost_rs.best_score_
print('-----')

/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py:536: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
_catboost.CatBoostError: catboost/private/libs/options/catboost_options.cpp:893: max_leaves option works only with lossguide tree growing

  FitFailedWarning)


-----
Best parameters {'num_leaves': 31, 'n_estimators': 100, 'learning_rate': 0.01}
Mean cross-validated accuracy score of the best_estimator: 0.832
-----
CPU times: user 6.45 s, sys: 702 ms, total: 7.15 s
Wall time: 7.18 s

Before hyperparameter tuning

from catboost import Pool, CatBoostClassifier, cv

%%time
model_cb1=cb.CatBoostClassifier()
model_cb1.fit(X_train, y_train)
y_pred_cb1=model_cb1.predict(X1_test)
print("\nAccuracy: ", metrics.accuracy_score(y1_test, y_pred_cb1))

Learning rate set to 0.009807
0:    learn: 0.6864423    total: 1.15ms    remaining: 1.15s
1:    learn: 0.6796678    total: 3.78ms    remaining: 1.89s
2:    learn: 0.6736018    total: 7.16ms    remaining: 2.38s
3:    learn: 0.6665873    total: 10.6ms    remaining: 2.63s
4:    learn: 0.6600996    total: 12.4ms    remaining: 2.47s
5:    learn: 0.6534897    total: 14.2ms    remaining: 2.36s
6:    learn: 0.6472195    total: 15.8ms    remaining: 2.25s
7:    learn: 0.6415114    total: 17.4ms    remaining: 2.16s
8:    learn: 0.6353287    total: 19ms    remaining: 2.09s
9:    learn: 0.6295821    total: 20.5ms    remaining: 2.03s
10:    learn: 0.6242876    total: 21.6ms    remaining: 1.95s
11:    learn: 0.6184887    total: 23.1ms    remaining: 1.9s
12:    learn: 0.6131840    total: 24.7ms    remaining: 1.87s
13:    learn: 0.6078112    total: 26.2ms    remaining: 1.84s
14:    learn: 0.6030343    total: 27.9ms    remaining: 1.83s
15:    learn: 0.5977374    total: 29.4ms    remaining: 1.81s
16:    learn: 0.5928195    total: 31ms    remaining: 1.79s
17:    learn: 0.5892999    total: 32.1ms    remaining: 1.75s
18:    learn: 0.5855036    total: 33.5ms    remaining: 1.73s
19:    learn: 0.5812113    total: 35.3ms    remaining: 1.73s
20:    learn: 0.5765712    total: 36.8ms    remaining: 1.72s
21:    learn: 0.5720403    total: 38.3ms    remaining: 1.7s
22:    learn: 0.5680552    total: 39.8ms    remaining: 1.69s
23:    learn: 0.5638984    total: 41.4ms    remaining: 1.68s
24:    learn: 0.5596681    total: 42.9ms    remaining: 1.67s
25:    learn: 0.5553698    total: 44.4ms    remaining: 1.66s
26:    learn: 0.5516134    total: 46ms    remaining: 1.66s
27:    learn: 0.5481987    total: 47.5ms    remaining: 1.65s
28:    learn: 0.5447553    total: 49.1ms    remaining: 1.64s
29:    learn: 0.5408223    total: 50.6ms    remaining: 1.64s
30:    learn: 0.5369519    total: 52.1ms    remaining: 1.63s
31:    learn: 0.5330715    total: 53.6ms    remaining: 1.62s
32:    learn: 0.5299491    total: 55.1ms    remaining: 1.61s
33:    learn: 0.5267190    total: 56.4ms    remaining: 1.6s
34:    learn: 0.5231781    total: 57.9ms    remaining: 1.6s
35:    learn: 0.5208511    total: 58.8ms    remaining: 1.57s
36:    learn: 0.5177569    total: 60.3ms    remaining: 1.57s
37:    learn: 0.5150180    total: 61.8ms    remaining: 1.56s
38:    learn: 0.5118389    total: 63.4ms    remaining: 1.56s
39:    learn: 0.5092100    total: 64.9ms    remaining: 1.56s
40:    learn: 0.5064001    total: 66.3ms    remaining: 1.55s
41:    learn: 0.5038346    total: 67.9ms    remaining: 1.55s
42:    learn: 0.5013958    total: 69.5ms    remaining: 1.54s
43:    learn: 0.4984976    total: 71ms    remaining: 1.54s
44:    learn: 0.4968538    total: 72.5ms    remaining: 1.54s
45:    learn: 0.4948834    total: 73.6ms    remaining: 1.53s
46:    learn: 0.4922797    total: 75ms    remaining: 1.52s
47:    learn: 0.4898559    total: 76.5ms    remaining: 1.52s
48:    learn: 0.4882097    total: 78.1ms    remaining: 1.51s
49:    learn: 0.4858639    total: 79.7ms    remaining: 1.51s
50:    learn: 0.4839866    total: 81.4ms    remaining: 1.51s
51:    learn: 0.4815496    total: 83ms    remaining: 1.51s
52:    learn: 0.4792559    total: 84.6ms    remaining: 1.51s
53:    learn: 0.4770495    total: 86.2ms    remaining: 1.51s
54:    learn: 0.4747478    total: 87.8ms    remaining: 1.51s
55:    learn: 0.4727874    total: 89.3ms    remaining: 1.5s
56:    learn: 0.4712635    total: 90.4ms    remaining: 1.5s
57:    learn: 0.4694542    total: 92ms    remaining: 1.49s
58:    learn: 0.4679098    total: 93.5ms    remaining: 1.49s
59:    learn: 0.4666425    total: 94.6ms    remaining: 1.48s
60:    learn: 0.4651955    total: 96.2ms    remaining: 1.48s
61:    learn: 0.4629838    total: 97.8ms    remaining: 1.48s
62:    learn: 0.4613419    total: 99.3ms    remaining: 1.48s
63:    learn: 0.4596180    total: 101ms    remaining: 1.48s
64:    learn: 0.4582795    total: 103ms    remaining: 1.48s
65:    learn: 0.4567869    total: 104ms    remaining: 1.48s
66:    learn: 0.4554270    total: 106ms    remaining: 1.48s
67:    learn: 0.4537690    total: 108ms    remaining: 1.48s
68:    learn: 0.4520165    total: 109ms    remaining: 1.47s
69:    learn: 0.4504949    total: 111ms    remaining: 1.47s
70:    learn: 0.4492505    total: 113ms    remaining: 1.47s
71:    learn: 0.4477988    total: 115ms    remaining: 1.48s
72:    learn: 0.4470907    total: 116ms    remaining: 1.47s
73:    learn: 0.4456455    total: 117ms    remaining: 1.47s
74:    learn: 0.4439957    total: 119ms    remaining: 1.46s
75:    learn: 0.4424007    total: 120ms    remaining: 1.46s
76:    learn: 0.4411398    total: 122ms    remaining: 1.46s
77:    learn: 0.4393015    total: 123ms    remaining: 1.46s
78:    learn: 0.4377323    total: 125ms    remaining: 1.46s
79:    learn: 0.4366939    total: 127ms    remaining: 1.46s
80:    learn: 0.4357767    total: 128ms    remaining: 1.45s
81:    learn: 0.4342050    total: 129ms    remaining: 1.45s
82:    learn: 0.4329284    total: 131ms    remaining: 1.45s
83:    learn: 0.4319752    total: 134ms    remaining: 1.46s
84:    learn: 0.4305898    total: 136ms    remaining: 1.46s
85:    learn: 0.4292832    total: 140ms    remaining: 1.49s
86:    learn: 0.4279310    total: 143ms    remaining: 1.5s
87:    learn: 0.4265082    total: 145ms    remaining: 1.5s
88:    learn: 0.4252939    total: 147ms    remaining: 1.5s
89:    learn: 0.4243060    total: 148ms    remaining: 1.5s
90:    learn: 0.4236286    total: 150ms    remaining: 1.5s
91:    learn: 0.4225358    total: 151ms    remaining: 1.49s
92:    learn: 0.4220645    total: 153ms    remaining: 1.49s
93:    learn: 0.4210174    total: 154ms    remaining: 1.49s
94:    learn: 0.4199543    total: 156ms    remaining: 1.48s
95:    learn: 0.4188798    total: 158ms    remaining: 1.48s
96:    learn: 0.4179159    total: 159ms    remaining: 1.48s
97:    learn: 0.4168335    total: 161ms    remaining: 1.48s
98:    learn: 0.4164371    total: 162ms    remaining: 1.47s
99:    learn: 0.4156060    total: 163ms    remaining: 1.47s
100:    learn: 0.4147372    total: 165ms    remaining: 1.47s
101:    learn: 0.4137972    total: 167ms    remaining: 1.47s
102:    learn: 0.4133928    total: 168ms    remaining: 1.46s
103:    learn: 0.4123880    total: 169ms    remaining: 1.46s
104:    learn: 0.4114707    total: 171ms    remaining: 1.46s
105:    learn: 0.4106357    total: 173ms    remaining: 1.46s
106:    learn: 0.4098909    total: 174ms    remaining: 1.45s
107:    learn: 0.4091430    total: 176ms    remaining: 1.45s
108:    learn: 0.4085852    total: 177ms    remaining: 1.45s
109:    learn: 0.4079336    total: 181ms    remaining: 1.46s
110:    learn: 0.4068446    total: 184ms    remaining: 1.47s
111:    learn: 0.4059413    total: 186ms    remaining: 1.47s
112:    learn: 0.4051074    total: 187ms    remaining: 1.47s
113:    learn: 0.4045525    total: 189ms    remaining: 1.47s
114:    learn: 0.4039265    total: 190ms    remaining: 1.46s
115:    learn: 0.4033559    total: 192ms    remaining: 1.46s
116:    learn: 0.4026959    total: 194ms    remaining: 1.46s
117:    learn: 0.4021791    total: 195ms    remaining: 1.46s
118:    learn: 0.4013881    total: 197ms    remaining: 1.46s
119:    learn: 0.4005294    total: 199ms    remaining: 1.46s
120:    learn: 0.4000457    total: 201ms    remaining: 1.46s
121:    learn: 0.3993558    total: 203ms    remaining: 1.46s
122:    learn: 0.3987516    total: 205ms    remaining: 1.46s
123:    learn: 0.3983254    total: 206ms    remaining: 1.46s
124:    learn: 0.3981335    total: 207ms    remaining: 1.45s
125:    learn: 0.3975284    total: 209ms    remaining: 1.45s
126:    learn: 0.3968148    total: 211ms    remaining: 1.45s
127:    learn: 0.3962797    total: 213ms    remaining: 1.45s
128:    learn: 0.3955650    total: 214ms    remaining: 1.45s
129:    learn: 0.3949156    total: 216ms    remaining: 1.45s
130:    learn: 0.3943753    total: 218ms    remaining: 1.44s
131:    learn: 0.3939230    total: 219ms    remaining: 1.44s
132:    learn: 0.3933117    total: 221ms    remaining: 1.44s
133:    learn: 0.3928159    total: 222ms    remaining: 1.44s
134:    learn: 0.3922164    total: 224ms    remaining: 1.44s
135:    learn: 0.3918342    total: 225ms    remaining: 1.43s
136:    learn: 0.3912796    total: 227ms    remaining: 1.43s
137:    learn: 0.3905823    total: 229ms    remaining: 1.43s
138:    learn: 0.3898593    total: 230ms    remaining: 1.43s
139:    learn: 0.3893237    total: 232ms    remaining: 1.43s
140:    learn: 0.3888304    total: 234ms    remaining: 1.43s
141:    learn: 0.3884924    total: 236ms    remaining: 1.42s
142:    learn: 0.3881707    total: 238ms    remaining: 1.42s
143:    learn: 0.3878014    total: 239ms    remaining: 1.42s
144:    learn: 0.3873020    total: 241ms    remaining: 1.42s
145:    learn: 0.3868475    total: 243ms    remaining: 1.42s
146:    learn: 0.3862259    total: 245ms    remaining: 1.42s
147:    learn: 0.3858858    total: 246ms    remaining: 1.42s
148:    learn: 0.3856409    total: 248ms    remaining: 1.42s
149:    learn: 0.3855390    total: 249ms    remaining: 1.41s
150:    learn: 0.3850505    total: 251ms    remaining: 1.41s
151:    learn: 0.3844543    total: 253ms    remaining: 1.41s
152:    learn: 0.3839649    total: 254ms    remaining: 1.41s
153:    learn: 0.3832490    total: 256ms    remaining: 1.41s
154:    learn: 0.3828330    total: 257ms    remaining: 1.4s
155:    learn: 0.3823213    total: 259ms    remaining: 1.4s
156:    learn: 0.3819428    total: 261ms    remaining: 1.4s
157:    learn: 0.3813389    total: 262ms    remaining: 1.4s
158:    learn: 0.3809790    total: 264ms    remaining: 1.4s
159:    learn: 0.3805731    total: 266ms    remaining: 1.39s
160:    learn: 0.3799352    total: 267ms    remaining: 1.39s
161:    learn: 0.3794951    total: 269ms    remaining: 1.39s
162:    learn: 0.3790710    total: 271ms    remaining: 1.39s
163:    learn: 0.3786079    total: 272ms    remaining: 1.39s
164:    learn: 0.3783021    total: 274ms    remaining: 1.39s
165:    learn: 0.3778044    total: 275ms    remaining: 1.38s
166:    learn: 0.3773915    total: 277ms    remaining: 1.38s
167:    learn: 0.3772473    total: 278ms    remaining: 1.38s
168:    learn: 0.3768942    total: 280ms    remaining: 1.37s
169:    learn: 0.3762634    total: 281ms    remaining: 1.37s
170:    learn: 0.3758273    total: 283ms    remaining: 1.37s
171:    learn: 0.3754783    total: 284ms    remaining: 1.37s
172:    learn: 0.3753974    total: 285ms    remaining: 1.36s
173:    learn: 0.3752110    total: 287ms    remaining: 1.36s
174:    learn: 0.3750740    total: 288ms    remaining: 1.36s
175:    learn: 0.3747491    total: 290ms    remaining: 1.35s
176:    learn: 0.3740551    total: 291ms    remaining: 1.35s
177:    learn: 0.3736829    total: 293ms    remaining: 1.35s
178:    learn: 0.3733106    total: 295ms    remaining: 1.35s
179:    learn: 0.3730030    total: 296ms    remaining: 1.35s
180:    learn: 0.3727365    total: 298ms    remaining: 1.35s
181:    learn: 0.3726708    total: 299ms    remaining: 1.34s
182:    learn: 0.3724766    total: 300ms    remaining: 1.34s
183:    learn: 0.3720786    total: 302ms    remaining: 1.34s
184:    learn: 0.3717549    total: 304ms    remaining: 1.34s
185:    learn: 0.3713830    total: 305ms    remaining: 1.33s
186:    learn: 0.3710702    total: 307ms    remaining: 1.33s
187:    learn: 0.3707764    total: 308ms    remaining: 1.33s
188:    learn: 0.3707388    total: 309ms    remaining: 1.33s
189:    learn: 0.3704293    total: 311ms    remaining: 1.32s
190:    learn: 0.3700587    total: 313ms    remaining: 1.32s
191:    learn: 0.3699942    total: 314ms    remaining: 1.32s
192:    learn: 0.3696027    total: 318ms    remaining: 1.33s
193:    learn: 0.3695621    total: 319ms    remaining: 1.32s
194:    learn: 0.3691724    total: 320ms    remaining: 1.32s
195:    learn: 0.3688717    total: 322ms    remaining: 1.32s
196:    learn: 0.3685092    total: 323ms    remaining: 1.32s
197:    learn: 0.3682595    total: 325ms    remaining: 1.32s
198:    learn: 0.3679573    total: 327ms    remaining: 1.32s
199:    learn: 0.3678256    total: 328ms    remaining: 1.31s
200:    learn: 0.3675041    total: 331ms    remaining: 1.31s
201:    learn: 0.3670877    total: 332ms    remaining: 1.31s
202:    learn: 0.3668744    total: 334ms    remaining: 1.31s
203:    learn: 0.3665374    total: 337ms    remaining: 1.31s
204:    learn: 0.3662455    total: 338ms    remaining: 1.31s
205:    learn: 0.3662061    total: 339ms    remaining: 1.31s
206:    learn: 0.3659549    total: 341ms    remaining: 1.3s
207:    learn: 0.3657520    total: 342ms    remaining: 1.3s
208:    learn: 0.3651990    total: 344ms    remaining: 1.3s
209:    learn: 0.3650540    total: 346ms    remaining: 1.3s
210:    learn: 0.3648195    total: 347ms    remaining: 1.3s
211:    learn: 0.3643852    total: 349ms    remaining: 1.29s
212:    learn: 0.3643304    total: 350ms    remaining: 1.29s
213:    learn: 0.3639860    total: 352ms    remaining: 1.29s
214:    learn: 0.3636619    total: 356ms    remaining: 1.3s
215:    learn: 0.3634521    total: 359ms    remaining: 1.3s
216:    learn: 0.3631182    total: 361ms    remaining: 1.3s
217:    learn: 0.3629000    total: 362ms    remaining: 1.3s
218:    learn: 0.3628546    total: 363ms    remaining: 1.29s
219:    learn: 0.3626871    total: 364ms    remaining: 1.29s
220:    learn: 0.3625245    total: 365ms    remaining: 1.29s
221:    learn: 0.3622324    total: 367ms    remaining: 1.28s
222:    learn: 0.3620531    total: 368ms    remaining: 1.28s
223:    learn: 0.3617895    total: 369ms    remaining: 1.28s
224:    learn: 0.3613159    total: 371ms    remaining: 1.28s
225:    learn: 0.3611309    total: 372ms    remaining: 1.27s
226:    learn: 0.3608805    total: 373ms    remaining: 1.27s
227:    learn: 0.3603855    total: 374ms    remaining: 1.27s
228:    learn: 0.3603646    total: 375ms    remaining: 1.26s
229:    learn: 0.3601897    total: 376ms    remaining: 1.26s
230:    learn: 0.3601518    total: 377ms    remaining: 1.25s
231:    learn: 0.3601231    total: 377ms    remaining: 1.25s
232:    learn: 0.3599573    total: 379ms    remaining: 1.25s
233:    learn: 0.3599279    total: 379ms    remaining: 1.24s
234:    learn: 0.3595377    total: 381ms    remaining: 1.24s
235:    learn: 0.3592815    total: 382ms    remaining: 1.24s
236:    learn: 0.3589589    total: 383ms    remaining: 1.23s
237:    learn: 0.3586604    total: 385ms    remaining: 1.23s
238:    learn: 0.3584376    total: 386ms    remaining: 1.23s
239:    learn: 0.3583077    total: 387ms    remaining: 1.23s
240:    learn: 0.3580581    total: 388ms    remaining: 1.22s
241:    learn: 0.3577347    total: 390ms    remaining: 1.22s
242:    learn: 0.3577167    total: 390ms    remaining: 1.22s
243:    learn: 0.3576102    total: 391ms    remaining: 1.21s
244:    learn: 0.3573276    total: 393ms    remaining: 1.21s
245:    learn: 0.3570597    total: 394ms    remaining: 1.21s
246:    learn: 0.3570164    total: 395ms    remaining: 1.2s
247:    learn: 0.3568129    total: 396ms    remaining: 1.2s
248:    learn: 0.3566178    total: 397ms    remaining: 1.2s
249:    learn: 0.3563533    total: 399ms    remaining: 1.2s
250:    learn: 0.3560881    total: 400ms    remaining: 1.19s
251:    learn: 0.3559918    total: 401ms    remaining: 1.19s
252:    learn: 0.3556106    total: 403ms    remaining: 1.19s
253:    learn: 0.3552943    total: 404ms    remaining: 1.19s
254:    learn: 0.3551120    total: 405ms    remaining: 1.18s
255:    learn: 0.3549299    total: 407ms    remaining: 1.18s
256:    learn: 0.3546934    total: 408ms    remaining: 1.18s
257:    learn: 0.3544919    total: 410ms    remaining: 1.18s
258:    learn: 0.3543437    total: 411ms    remaining: 1.18s
259:    learn: 0.3541476    total: 413ms    remaining: 1.17s
260:    learn: 0.3538909    total: 414ms    remaining: 1.17s
261:    learn: 0.3537310    total: 416ms    remaining: 1.17s
262:    learn: 0.3535340    total: 417ms    remaining: 1.17s
263:    learn: 0.3534128    total: 419ms    remaining: 1.17s
264:    learn: 0.3531355    total: 420ms    remaining: 1.17s
265:    learn: 0.3529686    total: 422ms    remaining: 1.16s
266:    learn: 0.3527602    total: 423ms    remaining: 1.16s
267:    learn: 0.3524416    total: 425ms    remaining: 1.16s
268:    learn: 0.3520992    total: 426ms    remaining: 1.16s
269:    learn: 0.3519173    total: 428ms    remaining: 1.16s
270:    learn: 0.3517764    total: 429ms    remaining: 1.16s
271:    learn: 0.3516484    total: 431ms    remaining: 1.15s
272:    learn: 0.3513071    total: 432ms    remaining: 1.15s
273:    learn: 0.3511635    total: 434ms    remaining: 1.15s
274:    learn: 0.3507677    total: 435ms    remaining: 1.15s
275:    learn: 0.3504876    total: 437ms    remaining: 1.15s
276:    learn: 0.3503400    total: 438ms    remaining: 1.14s
277:    learn: 0.3502119    total: 440ms    remaining: 1.14s
278:    learn: 0.3500942    total: 442ms    remaining: 1.14s
279:    learn: 0.3500215    total: 443ms    remaining: 1.14s
280:    learn: 0.3498679    total: 445ms    remaining: 1.14s
281:    learn: 0.3494740    total: 446ms    remaining: 1.14s
282:    learn: 0.3493393    total: 448ms    remaining: 1.13s
283:    learn: 0.3492231    total: 452ms    remaining: 1.14s
284:    learn: 0.3489882    total: 453ms    remaining: 1.14s
285:    learn: 0.3488344    total: 454ms    remaining: 1.13s
286:    learn: 0.3486306    total: 456ms    remaining: 1.13s
287:    learn: 0.3484715    total: 457ms    remaining: 1.13s
288:    learn: 0.3483100    total: 458ms    remaining: 1.13s
289:    learn: 0.3481646    total: 460ms    remaining: 1.13s
290:    learn: 0.3480016    total: 461ms    remaining: 1.12s
291:    learn: 0.3477597    total: 463ms    remaining: 1.12s
292:    learn: 0.3476850    total: 464ms    remaining: 1.12s
293:    learn: 0.3476563    total: 466ms    remaining: 1.12s
294:    learn: 0.3474599    total: 467ms    remaining: 1.12s
295:    learn: 0.3471187    total: 469ms    remaining: 1.11s
296:    learn: 0.3469055    total: 470ms    remaining: 1.11s
297:    learn: 0.3466185    total: 472ms    remaining: 1.11s
298:    learn: 0.3465162    total: 473ms    remaining: 1.11s
299:    learn: 0.3464710    total: 475ms    remaining: 1.11s
300:    learn: 0.3463116    total: 476ms    remaining: 1.11s
301:    learn: 0.3462097    total: 478ms    remaining: 1.1s
302:    learn: 0.3460624    total: 479ms    remaining: 1.1s
303:    learn: 0.3458503    total: 481ms    remaining: 1.1s
304:    learn: 0.3457330    total: 483ms    remaining: 1.1s
305:    learn: 0.3454716    total: 484ms    remaining: 1.1s
306:    learn: 0.3453798    total: 490ms    remaining: 1.1s
307:    learn: 0.3452549    total: 494ms    remaining: 1.11s
308:    learn: 0.3451684    total: 497ms    remaining: 1.11s
309:    learn: 0.3448519    total: 498ms    remaining: 1.11s
310:    learn: 0.3447327    total: 500ms    remaining: 1.11s
311:    learn: 0.3445245    total: 502ms    remaining: 1.11s
312:    learn: 0.3444054    total: 503ms    remaining: 1.1s
313:    learn: 0.3442425    total: 505ms    remaining: 1.1s
314:    learn: 0.3441252    total: 506ms    remaining: 1.1s
315:    learn: 0.3438394    total: 508ms    remaining: 1.1s
316:    learn: 0.3437785    total: 510ms    remaining: 1.1s
317:    learn: 0.3436580    total: 511ms    remaining: 1.1s
318:    learn: 0.3434121    total: 513ms    remaining: 1.09s
319:    learn: 0.3434015    total: 514ms    remaining: 1.09s
320:    learn: 0.3432309    total: 515ms    remaining: 1.09s
321:    learn: 0.3430105    total: 517ms    remaining: 1.09s
322:    learn: 0.3430040    total: 518ms    remaining: 1.08s
323:    learn: 0.3428893    total: 519ms    remaining: 1.08s
324:    learn: 0.3427908    total: 521ms    remaining: 1.08s
325:    learn: 0.3426938    total: 522ms    remaining: 1.08s
326:    learn: 0.3424298    total: 526ms    remaining: 1.08s
327:    learn: 0.3422154    total: 528ms    remaining: 1.08s
328:    learn: 0.3420228    total: 531ms    remaining: 1.08s
329:    learn: 0.3419105    total: 533ms    remaining: 1.08s
330:    learn: 0.3417604    total: 535ms    remaining: 1.08s
331:    learn: 0.3417005    total: 537ms    remaining: 1.08s
332:    learn: 0.3415153    total: 538ms    remaining: 1.08s
333:    learn: 0.3413515    total: 540ms    remaining: 1.08s
334:    learn: 0.3412281    total: 542ms    remaining: 1.07s
335:    learn: 0.3412011    total: 543ms    remaining: 1.07s
336:    learn: 0.3410229    total: 544ms    remaining: 1.07s
337:    learn: 0.3409003    total: 546ms    remaining: 1.07s
338:    learn: 0.3407465    total: 547ms    remaining: 1.07s
339:    learn: 0.3405374    total: 549ms    remaining: 1.06s
340:    learn: 0.3404344    total: 550ms    remaining: 1.06s
341:    learn: 0.3403684    total: 552ms    remaining: 1.06s
342:    learn: 0.3400907    total: 554ms    remaining: 1.06s
343:    learn: 0.3398546    total: 555ms    remaining: 1.06s
344:    learn: 0.3397414    total: 557ms    remaining: 1.06s
345:    learn: 0.3396128    total: 558ms    remaining: 1.05s
346:    learn: 0.3395383    total: 560ms    remaining: 1.05s
347:    learn: 0.3393384    total: 561ms    remaining: 1.05s
348:    learn: 0.3391067    total: 564ms    remaining: 1.05s
349:    learn: 0.3389667    total: 567ms    remaining: 1.05s
350:    learn: 0.3387429    total: 568ms    remaining: 1.05s
351:    learn: 0.3385726    total: 571ms    remaining: 1.05s
352:    learn: 0.3384251    total: 573ms    remaining: 1.05s
353:    learn: 0.3381414    total: 575ms    remaining: 1.05s
354:    learn: 0.3379741    total: 576ms    remaining: 1.05s
355:    learn: 0.3378329    total: 577ms    remaining: 1.04s
356:    learn: 0.3377206    total: 578ms    remaining: 1.04s
357:    learn: 0.3375257    total: 579ms    remaining: 1.04s
358:    learn: 0.3373769    total: 581ms    remaining: 1.04s
359:    learn: 0.3372383    total: 582ms    remaining: 1.03s
360:    learn: 0.3370866    total: 583ms    remaining: 1.03s
361:    learn: 0.3370745    total: 584ms    remaining: 1.03s
362:    learn: 0.3369338    total: 586ms    remaining: 1.03s
363:    learn: 0.3368527    total: 587ms    remaining: 1.02s
364:    learn: 0.3367434    total: 588ms    remaining: 1.02s
365:    learn: 0.3365576    total: 589ms    remaining: 1.02s
366:    learn: 0.3364554    total: 591ms    remaining: 1.02s
367:    learn: 0.3363495    total: 592ms    remaining: 1.02s
368:    learn: 0.3361014    total: 593ms    remaining: 1.01s
369:    learn: 0.3359145    total: 595ms    remaining: 1.01s
370:    learn: 0.3358289    total: 596ms    remaining: 1.01s
371:    learn: 0.3356082    total: 598ms    remaining: 1.01s
372:    learn: 0.3354244    total: 599ms    remaining: 1.01s
373:    learn: 0.3351816    total: 601ms    remaining: 1s
374:    learn: 0.3349759    total: 602ms    remaining: 1s
375:    learn: 0.3348936    total: 604ms    remaining: 1s
376:    learn: 0.3346268    total: 605ms    remaining: 1s
377:    learn: 0.3343676    total: 607ms    remaining: 999ms
378:    learn: 0.3341921    total: 609ms    remaining: 998ms
379:    learn: 0.3339870    total: 611ms    remaining: 996ms
380:    learn: 0.3337607    total: 612ms    remaining: 995ms
381:    learn: 0.3337276    total: 614ms    remaining: 993ms
382:    learn: 0.3335497    total: 615ms    remaining: 991ms
383:    learn: 0.3333378    total: 617ms    remaining: 990ms
384:    learn: 0.3331444    total: 619ms    remaining: 989ms
385:    learn: 0.3328706    total: 620ms    remaining: 987ms
386:    learn: 0.3327889    total: 622ms    remaining: 985ms
387:    learn: 0.3326313    total: 624ms    remaining: 985ms
388:    learn: 0.3324981    total: 626ms    remaining: 983ms
389:    learn: 0.3323998    total: 628ms    remaining: 982ms
390:    learn: 0.3323033    total: 629ms    remaining: 980ms
391:    learn: 0.3322477    total: 631ms    remaining: 978ms
392:    learn: 0.3319772    total: 632ms    remaining: 977ms
393:    learn: 0.3317993    total: 634ms    remaining: 975ms
394:    learn: 0.3316346    total: 636ms    remaining: 973ms
395:    learn: 0.3315460    total: 637ms    remaining: 972ms
396:    learn: 0.3314495    total: 639ms    remaining: 970ms
397:    learn: 0.3313958    total: 640ms    remaining: 969ms
398:    learn: 0.3313636    total: 642ms    remaining: 966ms
399:    learn: 0.3312036    total: 643ms    remaining: 965ms
400:    learn: 0.3310957    total: 645ms    remaining: 964ms
401:    learn: 0.3309663    total: 647ms    remaining: 962ms
402:    learn: 0.3309633    total: 648ms    remaining: 959ms
403:    learn: 0.3308602    total: 649ms    remaining: 958ms
404:    learn: 0.3307102    total: 651ms    remaining: 957ms
405:    learn: 0.3306083    total: 653ms    remaining: 955ms
406:    learn: 0.3303933    total: 655ms    remaining: 954ms
407:    learn: 0.3303254    total: 657ms    remaining: 953ms
408:    learn: 0.3301241    total: 658ms    remaining: 951ms
409:    learn: 0.3300132    total: 660ms    remaining: 950ms
410:    learn: 0.3298714    total: 662ms    remaining: 949ms
411:    learn: 0.3297545    total: 666ms    remaining: 951ms
412:    learn: 0.3296896    total: 670ms    remaining: 953ms
413:    learn: 0.3293730    total: 672ms    remaining: 951ms
414:    learn: 0.3291582    total: 673ms    remaining: 949ms
415:    learn: 0.3289779    total: 675ms    remaining: 947ms
416:    learn: 0.3288515    total: 676ms    remaining: 945ms
417:    learn: 0.3286225    total: 677ms    remaining: 943ms
418:    learn: 0.3284417    total: 679ms    remaining: 941ms
419:    learn: 0.3282645    total: 680ms    remaining: 939ms
420:    learn: 0.3282057    total: 682ms    remaining: 937ms
421:    learn: 0.3280742    total: 683ms    remaining: 936ms
422:    learn: 0.3279897    total: 684ms    remaining: 933ms
423:    learn: 0.3279414    total: 685ms    remaining: 931ms
424:    learn: 0.3278043    total: 687ms    remaining: 929ms
425:    learn: 0.3277831    total: 687ms    remaining: 926ms
426:    learn: 0.3276888    total: 689ms    remaining: 924ms
427:    learn: 0.3276084    total: 690ms    remaining: 922ms
428:    learn: 0.3274960    total: 692ms    remaining: 921ms
429:    learn: 0.3273183    total: 693ms    remaining: 919ms
430:    learn: 0.3271913    total: 694ms    remaining: 917ms
431:    learn: 0.3271639    total: 696ms    remaining: 915ms
432:    learn: 0.3269948    total: 698ms    remaining: 913ms
433:    learn: 0.3268786    total: 700ms    remaining: 912ms
434:    learn: 0.3267243    total: 702ms    remaining: 911ms
435:    learn: 0.3266135    total: 703ms    remaining: 910ms
436:    learn: 0.3264746    total: 705ms    remaining: 908ms
437:    learn: 0.3260983    total: 706ms    remaining: 906ms
438:    learn: 0.3260188    total: 708ms    remaining: 904ms
439:    learn: 0.3259822    total: 709ms    remaining: 902ms
440:    learn: 0.3257988    total: 710ms    remaining: 901ms
441:    learn: 0.3256852    total: 712ms    remaining: 899ms
442:    learn: 0.3255033    total: 713ms    remaining: 897ms
443:    learn: 0.3254455    total: 715ms    remaining: 895ms
444:    learn: 0.3253134    total: 716ms    remaining: 893ms
445:    learn: 0.3252633    total: 717ms    remaining: 891ms
446:    learn: 0.3251332    total: 719ms    remaining: 889ms
447:    learn: 0.3249648    total: 720ms    remaining: 887ms
448:    learn: 0.3248574    total: 722ms    remaining: 886ms
449:    learn: 0.3247845    total: 723ms    remaining: 884ms
450:    learn: 0.3245354    total: 724ms    remaining: 882ms
451:    learn: 0.3244206    total: 726ms    remaining: 880ms
452:    learn: 0.3243711    total: 727ms    remaining: 878ms
453:    learn: 0.3243288    total: 729ms    remaining: 877ms
454:    learn: 0.3242242    total: 730ms    remaining: 875ms
455:    learn: 0.3239490    total: 732ms    remaining: 873ms
456:    learn: 0.3238705    total: 733ms    remaining: 871ms
457:    learn: 0.3238606    total: 734ms    remaining: 869ms
458:    learn: 0.3236528    total: 735ms    remaining: 867ms
459:    learn: 0.3235039    total: 737ms    remaining: 865ms
460:    learn: 0.3234211    total: 738ms    remaining: 863ms
461:    learn: 0.3233266    total: 740ms    remaining: 861ms
462:    learn: 0.3231303    total: 741ms    remaining: 859ms
463:    learn: 0.3229646    total: 742ms    remaining: 857ms
464:    learn: 0.3227962    total: 744ms    remaining: 855ms
465:    learn: 0.3226196    total: 745ms    remaining: 854ms
466:    learn: 0.3223571    total: 746ms    remaining: 852ms
467:    learn: 0.3222110    total: 748ms    remaining: 850ms
468:    learn: 0.3220662    total: 749ms    remaining: 848ms
469:    learn: 0.3219840    total: 750ms    remaining: 846ms
470:    learn: 0.3219059    total: 752ms    remaining: 844ms
471:    learn: 0.3218827    total: 754ms    remaining: 843ms
472:    learn: 0.3217921    total: 755ms    remaining: 841ms
473:    learn: 0.3216864    total: 757ms    remaining: 840ms
474:    learn: 0.3216762    total: 758ms    remaining: 837ms
475:    learn: 0.3215663    total: 759ms    remaining: 836ms
476:    learn: 0.3214919    total: 760ms    remaining: 834ms
477:    learn: 0.3213385    total: 762ms    remaining: 832ms
478:    learn: 0.3212131    total: 764ms    remaining: 830ms
479:    learn: 0.3211449    total: 765ms    remaining: 829ms
480:    learn: 0.3210588    total: 767ms    remaining: 827ms
481:    learn: 0.3210299    total: 768ms    remaining: 825ms
482:    learn: 0.3207860    total: 770ms    remaining: 824ms
483:    learn: 0.3206680    total: 773ms    remaining: 825ms
484:    learn: 0.3204886    total: 776ms    remaining: 823ms
485:    learn: 0.3201907    total: 778ms    remaining: 823ms
486:    learn: 0.3201631    total: 780ms    remaining: 821ms
487:    learn: 0.3200827    total: 782ms    remaining: 820ms
488:    learn: 0.3199985    total: 784ms    remaining: 819ms
489:    learn: 0.3198093    total: 786ms    remaining: 818ms
490:    learn: 0.3197330    total: 787ms    remaining: 816ms
491:    learn: 0.3195681    total: 789ms    remaining: 814ms
492:    learn: 0.3194094    total: 790ms    remaining: 813ms
493:    learn: 0.3193594    total: 792ms    remaining: 811ms
494:    learn: 0.3192246    total: 793ms    remaining: 809ms
495:    learn: 0.3190883    total: 795ms    remaining: 808ms
496:    learn: 0.3190022    total: 796ms    remaining: 806ms
497:    learn: 0.3189700    total: 798ms    remaining: 804ms
498:    learn: 0.3187582    total: 799ms    remaining: 803ms
499:    learn: 0.3186219    total: 801ms    remaining: 801ms
500:    learn: 0.3185476    total: 803ms    remaining: 799ms
501:    learn: 0.3184275    total: 804ms    remaining: 798ms
502:    learn: 0.3183448    total: 806ms    remaining: 796ms
503:    learn: 0.3182237    total: 808ms    remaining: 795ms
504:    learn: 0.3179514    total: 809ms    remaining: 793ms
505:    learn: 0.3179136    total: 811ms    remaining: 792ms
506:    learn: 0.3176711    total: 813ms    remaining: 790ms
507:    learn: 0.3175794    total: 814ms    remaining: 788ms
508:    learn: 0.3174719    total: 816ms    remaining: 787ms
509:    learn: 0.3172911    total: 817ms    remaining: 785ms
510:    learn: 0.3172861    total: 818ms    remaining: 783ms
511:    learn: 0.3172683    total: 819ms    remaining: 781ms
512:    learn: 0.3172510    total: 820ms    remaining: 779ms
513:    learn: 0.3171707    total: 822ms    remaining: 777ms
514:    learn: 0.3170907    total: 824ms    remaining: 776ms
515:    learn: 0.3170215    total: 825ms    remaining: 774ms
516:    learn: 0.3169492    total: 827ms    remaining: 772ms
517:    learn: 0.3168907    total: 828ms    remaining: 771ms
518:    learn: 0.3166368    total: 830ms    remaining: 769ms
519:    learn: 0.3165108    total: 831ms    remaining: 767ms
520:    learn: 0.3164277    total: 834ms    remaining: 767ms
521:    learn: 0.3163670    total: 836ms    remaining: 766ms
522:    learn: 0.3162119    total: 838ms    remaining: 764ms
523:    learn: 0.3159909    total: 840ms    remaining: 763ms
524:    learn: 0.3158627    total: 843ms    remaining: 763ms
525:    learn: 0.3157602    total: 845ms    remaining: 762ms
526:    learn: 0.3157078    total: 847ms    remaining: 760ms
527:    learn: 0.3156260    total: 848ms    remaining: 758ms
528:    learn: 0.3154974    total: 850ms    remaining: 756ms
529:    learn: 0.3153165    total: 851ms    remaining: 755ms
530:    learn: 0.3151412    total: 853ms    remaining: 753ms
531:    learn: 0.3150721    total: 854ms    remaining: 751ms
532:    learn: 0.3149426    total: 856ms    remaining: 750ms
533:    learn: 0.3148334    total: 858ms    remaining: 748ms
534:    learn: 0.3146896    total: 859ms    remaining: 747ms
535:    learn: 0.3144815    total: 861ms    remaining: 745ms
536:    learn: 0.3143912    total: 862ms    remaining: 744ms
537:    learn: 0.3143369    total: 864ms    remaining: 742ms
538:    learn: 0.3142618    total: 868ms    remaining: 743ms
539:    learn: 0.3141539    total: 870ms    remaining: 741ms
540:    learn: 0.3140318    total: 872ms    remaining: 740ms
541:    learn: 0.3139340    total: 874ms    remaining: 739ms
542:    learn: 0.3138693    total: 883ms    remaining: 743ms
543:    learn: 0.3137137    total: 884ms    remaining: 741ms
544:    learn: 0.3136775    total: 888ms    remaining: 741ms
545:    learn: 0.3135136    total: 890ms    remaining: 740ms
546:    learn: 0.3132667    total: 892ms    remaining: 738ms
547:    learn: 0.3131847    total: 897ms    remaining: 740ms
548:    learn: 0.3131351    total: 899ms    remaining: 738ms
549:    learn: 0.3130413    total: 900ms    remaining: 736ms
550:    learn: 0.3129377    total: 901ms    remaining: 735ms
551:    learn: 0.3127922    total: 903ms    remaining: 733ms
552:    learn: 0.3126559    total: 904ms    remaining: 731ms
553:    learn: 0.3125413    total: 906ms    remaining: 729ms
554:    learn: 0.3123807    total: 907ms    remaining: 727ms
555:    learn: 0.3122795    total: 908ms    remaining: 725ms
556:    learn: 0.3121609    total: 910ms    remaining: 724ms
557:    learn: 0.3120492    total: 911ms    remaining: 722ms
558:    learn: 0.3120059    total: 912ms    remaining: 720ms
559:    learn: 0.3118993    total: 914ms    remaining: 718ms
560:    learn: 0.3117650    total: 915ms    remaining: 716ms
561:    learn: 0.3116605    total: 917ms    remaining: 714ms
562:    learn: 0.3114923    total: 918ms    remaining: 713ms
563:    learn: 0.3114171    total: 919ms    remaining: 711ms
564:    learn: 0.3112964    total: 921ms    remaining: 709ms
565:    learn: 0.3111950    total: 922ms    remaining: 707ms
566:    learn: 0.3111227    total: 924ms    remaining: 705ms
567:    learn: 0.3109404    total: 925ms    remaining: 703ms
568:    learn: 0.3109345    total: 926ms    remaining: 701ms
569:    learn: 0.3107864    total: 927ms    remaining: 699ms
570:    learn: 0.3107062    total: 928ms    remaining: 697ms
571:    learn: 0.3105907    total: 930ms    remaining: 696ms
572:    learn: 0.3104622    total: 932ms    remaining: 694ms
573:    learn: 0.3104295    total: 933ms    remaining: 693ms
574:    learn: 0.3102594    total: 935ms    remaining: 691ms
575:    learn: 0.3099801    total: 936ms    remaining: 689ms
576:    learn: 0.3098861    total: 938ms    remaining: 688ms
577:    learn: 0.3097799    total: 940ms    remaining: 686ms
578:    learn: 0.3096591    total: 941ms    remaining: 684ms
579:    learn: 0.3094861    total: 943ms    remaining: 683ms
580:    learn: 0.3094233    total: 945ms    remaining: 681ms
581:    learn: 0.3093261    total: 947ms    remaining: 680ms
582:    learn: 0.3092554    total: 948ms    remaining: 678ms
583:    learn: 0.3091984    total: 950ms    remaining: 677ms
584:    learn: 0.3090779    total: 951ms    remaining: 675ms
585:    learn: 0.3089114    total: 953ms    remaining: 673ms
586:    learn: 0.3087072    total: 955ms    remaining: 672ms
587:    learn: 0.3086577    total: 957ms    remaining: 671ms
588:    learn: 0.3085933    total: 959ms    remaining: 669ms
589:    learn: 0.3084591    total: 960ms    remaining: 667ms
590:    learn: 0.3082292    total: 962ms    remaining: 666ms
591:    learn: 0.3081197    total: 964ms    remaining: 665ms
592:    learn: 0.3078835    total: 966ms    remaining: 663ms
593:    learn: 0.3077445    total: 968ms    remaining: 661ms
594:    learn: 0.3076817    total: 969ms    remaining: 660ms
595:    learn: 0.3076539    total: 971ms    remaining: 658ms
596:    learn: 0.3075941    total: 973ms    remaining: 657ms
597:    learn: 0.3074667    total: 975ms    remaining: 655ms
598:    learn: 0.3074483    total: 976ms    remaining: 653ms
599:    learn: 0.3073675    total: 977ms    remaining: 651ms
600:    learn: 0.3072547    total: 979ms    remaining: 650ms
601:    learn: 0.3072363    total: 980ms    remaining: 648ms
602:    learn: 0.3070790    total: 982ms    remaining: 646ms
603:    learn: 0.3069212    total: 984ms    remaining: 645ms
604:    learn: 0.3067527    total: 993ms    remaining: 648ms
605:    learn: 0.3066531    total: 995ms    remaining: 647ms
606:    learn: 0.3065907    total: 997ms    remaining: 645ms
607:    learn: 0.3064529    total: 998ms    remaining: 644ms
608:    learn: 0.3063562    total: 999ms    remaining: 642ms
609:    learn: 0.3062009    total: 1s    remaining: 640ms
610:    learn: 0.3060596    total: 1s    remaining: 638ms
611:    learn: 0.3059804    total: 1s    remaining: 636ms
612:    learn: 0.3059462    total: 1s    remaining: 635ms
613:    learn: 0.3058976    total: 1.01s    remaining: 633ms
614:    learn: 0.3057840    total: 1.01s    remaining: 632ms
615:    learn: 0.3057076    total: 1.01s    remaining: 630ms
616:    learn: 0.3055833    total: 1.01s    remaining: 628ms
617:    learn: 0.3055278    total: 1.01s    remaining: 627ms
618:    learn: 0.3054147    total: 1.01s    remaining: 625ms
619:    learn: 0.3052816    total: 1.02s    remaining: 624ms
620:    learn: 0.3052263    total: 1.02s    remaining: 623ms
621:    learn: 0.3051848    total: 1.02s    remaining: 621ms
622:    learn: 0.3051363    total: 1.02s    remaining: 619ms
623:    learn: 0.3049397    total: 1.02s    remaining: 618ms
624:    learn: 0.3049107    total: 1.03s    remaining: 616ms
625:    learn: 0.3047204    total: 1.03s    remaining: 615ms
626:    learn: 0.3045337    total: 1.03s    remaining: 613ms
627:    learn: 0.3044718    total: 1.03s    remaining: 611ms
628:    learn: 0.3044015    total: 1.03s    remaining: 610ms
629:    learn: 0.3043145    total: 1.03s    remaining: 608ms
630:    learn: 0.3042671    total: 1.04s    remaining: 606ms
631:    learn: 0.3041630    total: 1.04s    remaining: 605ms
632:    learn: 0.3040370    total: 1.04s    remaining: 603ms
633:    learn: 0.3039320    total: 1.04s    remaining: 601ms
634:    learn: 0.3038157    total: 1.04s    remaining: 599ms
635:    learn: 0.3037399    total: 1.04s    remaining: 598ms
636:    learn: 0.3037030    total: 1.04s    remaining: 596ms
637:    learn: 0.3035112    total: 1.05s    remaining: 595ms
638:    learn: 0.3033812    total: 1.05s    remaining: 594ms
639:    learn: 0.3032892    total: 1.05s    remaining: 593ms
640:    learn: 0.3031651    total: 1.05s    remaining: 591ms
641:    learn: 0.3030900    total: 1.06s    remaining: 590ms
642:    learn: 0.3030082    total: 1.06s    remaining: 588ms
643:    learn: 0.3029507    total: 1.06s    remaining: 586ms
644:    learn: 0.3027877    total: 1.06s    remaining: 585ms
645:    learn: 0.3027289    total: 1.06s    remaining: 583ms
646:    learn: 0.3026159    total: 1.06s    remaining: 581ms
647:    learn: 0.3025691    total: 1.07s    remaining: 580ms
648:    learn: 0.3024699    total: 1.07s    remaining: 578ms
649:    learn: 0.3023822    total: 1.07s    remaining: 576ms
650:    learn: 0.3022204    total: 1.07s    remaining: 574ms
651:    learn: 0.3021950    total: 1.07s    remaining: 573ms
652:    learn: 0.3020537    total: 1.07s    remaining: 571ms
653:    learn: 0.3018853    total: 1.08s    remaining: 569ms
654:    learn: 0.3018510    total: 1.08s    remaining: 568ms
655:    learn: 0.3017033    total: 1.08s    remaining: 566ms
656:    learn: 0.3016264    total: 1.08s    remaining: 564ms
657:    learn: 0.3015334    total: 1.08s    remaining: 563ms
658:    learn: 0.3014958    total: 1.08s    remaining: 561ms
659:    learn: 0.3014106    total: 1.08s    remaining: 559ms
660:    learn: 0.3013390    total: 1.09s    remaining: 557ms
661:    learn: 0.3012686    total: 1.09s    remaining: 556ms
662:    learn: 0.3011331    total: 1.09s    remaining: 554ms
663:    learn: 0.3010920    total: 1.09s    remaining: 552ms
664:    learn: 0.3009207    total: 1.09s    remaining: 552ms
665:    learn: 0.3008078    total: 1.1s    remaining: 550ms
666:    learn: 0.3007882    total: 1.1s    remaining: 549ms
667:    learn: 0.3007260    total: 1.1s    remaining: 548ms
668:    learn: 0.3006588    total: 1.1s    remaining: 546ms
669:    learn: 0.3004912    total: 1.11s    remaining: 545ms
670:    learn: 0.3004723    total: 1.11s    remaining: 543ms
671:    learn: 0.3003003    total: 1.11s    remaining: 542ms
672:    learn: 0.3001678    total: 1.11s    remaining: 540ms
673:    learn: 0.3000326    total: 1.11s    remaining: 538ms
674:    learn: 0.2999521    total: 1.11s    remaining: 537ms
675:    learn: 0.2997049    total: 1.12s    remaining: 535ms
676:    learn: 0.2995324    total: 1.12s    remaining: 533ms
677:    learn: 0.2994917    total: 1.12s    remaining: 531ms
678:    learn: 0.2994258    total: 1.12s    remaining: 530ms
679:    learn: 0.2992969    total: 1.12s    remaining: 528ms
680:    learn: 0.2991673    total: 1.12s    remaining: 527ms
681:    learn: 0.2990420    total: 1.13s    remaining: 525ms
682:    learn: 0.2989613    total: 1.13s    remaining: 523ms
683:    learn: 0.2988837    total: 1.13s    remaining: 521ms
684:    learn: 0.2988569    total: 1.13s    remaining: 520ms
685:    learn: 0.2988010    total: 1.13s    remaining: 518ms
686:    learn: 0.2986281    total: 1.13s    remaining: 516ms
687:    learn: 0.2985230    total: 1.14s    remaining: 515ms
688:    learn: 0.2984262    total: 1.14s    remaining: 513ms
689:    learn: 0.2983868    total: 1.14s    remaining: 511ms
690:    learn: 0.2983422    total: 1.14s    remaining: 510ms
691:    learn: 0.2982868    total: 1.14s    remaining: 508ms
692:    learn: 0.2981226    total: 1.14s    remaining: 507ms
693:    learn: 0.2980691    total: 1.15s    remaining: 505ms
694:    learn: 0.2980002    total: 1.15s    remaining: 504ms
695:    learn: 0.2977908    total: 1.15s    remaining: 502ms
696:    learn: 0.2977052    total: 1.15s    remaining: 500ms
697:    learn: 0.2976894    total: 1.15s    remaining: 498ms
698:    learn: 0.2975668    total: 1.15s    remaining: 497ms
699:    learn: 0.2974088    total: 1.16s    remaining: 495ms
700:    learn: 0.2972473    total: 1.16s    remaining: 493ms
701:    learn: 0.2971332    total: 1.16s    remaining: 492ms
702:    learn: 0.2970478    total: 1.16s    remaining: 490ms
703:    learn: 0.2969479    total: 1.16s    remaining: 488ms
704:    learn: 0.2968603    total: 1.16s    remaining: 486ms
705:    learn: 0.2968035    total: 1.16s    remaining: 485ms
706:    learn: 0.2967609    total: 1.17s    remaining: 483ms
707:    learn: 0.2965663    total: 1.17s    remaining: 481ms
708:    learn: 0.2963934    total: 1.17s    remaining: 479ms
709:    learn: 0.2961962    total: 1.17s    remaining: 478ms
710:    learn: 0.2961123    total: 1.17s    remaining: 476ms
711:    learn: 0.2960399    total: 1.17s    remaining: 474ms
712:    learn: 0.2959317    total: 1.17s    remaining: 473ms
713:    learn: 0.2957971    total: 1.18s    remaining: 471ms
714:    learn: 0.2956215    total: 1.18s    remaining: 469ms
715:    learn: 0.2954575    total: 1.18s    remaining: 468ms
716:    learn: 0.2953437    total: 1.18s    remaining: 466ms
717:    learn: 0.2952515    total: 1.18s    remaining: 465ms
718:    learn: 0.2951054    total: 1.18s    remaining: 463ms
719:    learn: 0.2950176    total: 1.19s    remaining: 461ms
720:    learn: 0.2949562    total: 1.19s    remaining: 460ms
721:    learn: 0.2948627    total: 1.19s    remaining: 458ms
722:    learn: 0.2946604    total: 1.19s    remaining: 456ms
723:    learn: 0.2945315    total: 1.19s    remaining: 455ms
724:    learn: 0.2943817    total: 1.2s    remaining: 453ms
725:    learn: 0.2942896    total: 1.2s    remaining: 452ms
726:    learn: 0.2942133    total: 1.2s    remaining: 452ms
727:    learn: 0.2941519    total: 1.21s    remaining: 451ms
728:    learn: 0.2940815    total: 1.21s    remaining: 450ms
729:    learn: 0.2939787    total: 1.21s    remaining: 449ms
730:    learn: 0.2938856    total: 1.22s    remaining: 447ms
731:    learn: 0.2937554    total: 1.22s    remaining: 446ms
732:    learn: 0.2936156    total: 1.22s    remaining: 444ms
733:    learn: 0.2934647    total: 1.22s    remaining: 442ms
734:    learn: 0.2933211    total: 1.22s    remaining: 441ms
735:    learn: 0.2932438    total: 1.23s    remaining: 440ms
736:    learn: 0.2931398    total: 1.23s    remaining: 439ms
737:    learn: 0.2929100    total: 1.23s    remaining: 437ms
738:    learn: 0.2927724    total: 1.23s    remaining: 436ms
739:    learn: 0.2926515    total: 1.24s    remaining: 434ms
740:    learn: 0.2924777    total: 1.24s    remaining: 432ms
741:    learn: 0.2922992    total: 1.24s    remaining: 430ms
742:    learn: 0.2922210    total: 1.24s    remaining: 429ms
743:    learn: 0.2919906    total: 1.24s    remaining: 427ms
744:    learn: 0.2919204    total: 1.24s    remaining: 425ms
745:    learn: 0.2917676    total: 1.24s    remaining: 424ms
746:    learn: 0.2917028    total: 1.25s    remaining: 422ms
747:    learn: 0.2916301    total: 1.25s    remaining: 420ms
748:    learn: 0.2915584    total: 1.25s    remaining: 418ms
749:    learn: 0.2914372    total: 1.25s    remaining: 416ms
750:    learn: 0.2913890    total: 1.25s    remaining: 415ms
751:    learn: 0.2912267    total: 1.25s    remaining: 413ms
752:    learn: 0.2910140    total: 1.25s    remaining: 411ms
753:    learn: 0.2908936    total: 1.25s    remaining: 409ms
754:    learn: 0.2907613    total: 1.25s    remaining: 407ms
755:    learn: 0.2906129    total: 1.26s    remaining: 406ms
756:    learn: 0.2905391    total: 1.26s    remaining: 404ms
757:    learn: 0.2903551    total: 1.26s    remaining: 402ms
758:    learn: 0.2902709    total: 1.26s    remaining: 400ms
759:    learn: 0.2901705    total: 1.26s    remaining: 398ms
760:    learn: 0.2899778    total: 1.26s    remaining: 397ms
761:    learn: 0.2898210    total: 1.26s    remaining: 395ms
762:    learn: 0.2897040    total: 1.26s    remaining: 393ms
763:    learn: 0.2896362    total: 1.27s    remaining: 391ms
764:    learn: 0.2895478    total: 1.27s    remaining: 390ms
765:    learn: 0.2894245    total: 1.27s    remaining: 388ms
766:    learn: 0.2893510    total: 1.27s    remaining: 386ms
767:    learn: 0.2891791    total: 1.27s    remaining: 384ms
768:    learn: 0.2891680    total: 1.27s    remaining: 383ms
769:    learn: 0.2890059    total: 1.27s    remaining: 381ms
770:    learn: 0.2889026    total: 1.28s    remaining: 379ms
771:    learn: 0.2886843    total: 1.28s    remaining: 377ms
772:    learn: 0.2885872    total: 1.28s    remaining: 376ms
773:    learn: 0.2884660    total: 1.28s    remaining: 374ms
774:    learn: 0.2883364    total: 1.28s    remaining: 372ms
775:    learn: 0.2882417    total: 1.28s    remaining: 371ms
776:    learn: 0.2881458    total: 1.29s    remaining: 369ms
777:    learn: 0.2880760    total: 1.29s    remaining: 367ms
778:    learn: 0.2880472    total: 1.29s    remaining: 366ms
779:    learn: 0.2879450    total: 1.29s    remaining: 364ms
780:    learn: 0.2877392    total: 1.29s    remaining: 362ms
781:    learn: 0.2876009    total: 1.29s    remaining: 361ms
782:    learn: 0.2873887    total: 1.29s    remaining: 359ms
783:    learn: 0.2872053    total: 1.3s    remaining: 358ms
784:    learn: 0.2870761    total: 1.3s    remaining: 356ms
785:    learn: 0.2869364    total: 1.3s    remaining: 355ms
786:    learn: 0.2868042    total: 1.31s    remaining: 354ms
787:    learn: 0.2866404    total: 1.31s    remaining: 352ms
788:    learn: 0.2864998    total: 1.31s    remaining: 351ms
789:    learn: 0.2863503    total: 1.31s    remaining: 349ms
790:    learn: 0.2862753    total: 1.31s    remaining: 347ms
791:    learn: 0.2861606    total: 1.31s    remaining: 346ms
792:    learn: 0.2861053    total: 1.32s    remaining: 344ms
793:    learn: 0.2859749    total: 1.32s    remaining: 342ms
794:    learn: 0.2858847    total: 1.32s    remaining: 340ms
795:    learn: 0.2857806    total: 1.32s    remaining: 339ms
796:    learn: 0.2857007    total: 1.32s    remaining: 337ms
797:    learn: 0.2856353    total: 1.32s    remaining: 335ms
798:    learn: 0.2855295    total: 1.33s    remaining: 334ms
799:    learn: 0.2854726    total: 1.33s    remaining: 332ms
800:    learn: 0.2854507    total: 1.33s    remaining: 330ms
801:    learn: 0.2853060    total: 1.33s    remaining: 329ms
802:    learn: 0.2852216    total: 1.33s    remaining: 327ms
803:    learn: 0.2851484    total: 1.33s    remaining: 325ms
804:    learn: 0.2850786    total: 1.33s    remaining: 324ms
805:    learn: 0.2850163    total: 1.34s    remaining: 322ms
806:    learn: 0.2849433    total: 1.34s    remaining: 320ms
807:    learn: 0.2848859    total: 1.34s    remaining: 319ms
808:    learn: 0.2847031    total: 1.34s    remaining: 317ms
809:    learn: 0.2846279    total: 1.34s    remaining: 315ms
810:    learn: 0.2844524    total: 1.35s    remaining: 314ms
811:    learn: 0.2843816    total: 1.35s    remaining: 312ms
812:    learn: 0.2842996    total: 1.35s    remaining: 310ms
813:    learn: 0.2841085    total: 1.35s    remaining: 309ms
814:    learn: 0.2839898    total: 1.35s    remaining: 307ms
815:    learn: 0.2839256    total: 1.35s    remaining: 305ms
816:    learn: 0.2838326    total: 1.36s    remaining: 304ms
817:    learn: 0.2837922    total: 1.36s    remaining: 302ms
818:    learn: 0.2837661    total: 1.36s    remaining: 300ms
819:    learn: 0.2837009    total: 1.36s    remaining: 299ms
820:    learn: 0.2836201    total: 1.36s    remaining: 297ms
821:    learn: 0.2834531    total: 1.36s    remaining: 295ms
822:    learn: 0.2833575    total: 1.36s    remaining: 294ms
823:    learn: 0.2832791    total: 1.37s    remaining: 292ms
824:    learn: 0.2832314    total: 1.37s    remaining: 290ms
825:    learn: 0.2831767    total: 1.37s    remaining: 289ms
826:    learn: 0.2830808    total: 1.37s    remaining: 287ms
827:    learn: 0.2829269    total: 1.37s    remaining: 286ms
828:    learn: 0.2828895    total: 1.38s    remaining: 284ms
829:    learn: 0.2828296    total: 1.38s    remaining: 282ms
830:    learn: 0.2826825    total: 1.38s    remaining: 281ms
831:    learn: 0.2825853    total: 1.38s    remaining: 279ms
832:    learn: 0.2824905    total: 1.39s    remaining: 278ms
833:    learn: 0.2823213    total: 1.39s    remaining: 276ms
834:    learn: 0.2822432    total: 1.39s    remaining: 274ms
835:    learn: 0.2821129    total: 1.39s    remaining: 273ms
836:    learn: 0.2820086    total: 1.39s    remaining: 271ms
837:    learn: 0.2819346    total: 1.39s    remaining: 269ms
838:    learn: 0.2817989    total: 1.4s    remaining: 268ms
839:    learn: 0.2817084    total: 1.4s    remaining: 266ms
840:    learn: 0.2813877    total: 1.4s    remaining: 265ms
841:    learn: 0.2812499    total: 1.4s    remaining: 263ms
842:    learn: 0.2811951    total: 1.41s    remaining: 262ms
843:    learn: 0.2810829    total: 1.41s    remaining: 260ms
844:    learn: 0.2810392    total: 1.41s    remaining: 259ms
845:    learn: 0.2809627    total: 1.41s    remaining: 257ms
846:    learn: 0.2808334    total: 1.41s    remaining: 255ms
847:    learn: 0.2807824    total: 1.41s    remaining: 253ms
848:    learn: 0.2807220    total: 1.41s    remaining: 252ms
849:    learn: 0.2806410    total: 1.42s    remaining: 251ms
850:    learn: 0.2805382    total: 1.43s    remaining: 250ms
851:    learn: 0.2804362    total: 1.43s    remaining: 249ms
852:    learn: 0.2803172    total: 1.44s    remaining: 248ms
853:    learn: 0.2802016    total: 1.44s    remaining: 247ms
854:    learn: 0.2800346    total: 1.45s    remaining: 245ms
855:    learn: 0.2800234    total: 1.45s    remaining: 244ms
856:    learn: 0.2798293    total: 1.45s    remaining: 242ms
857:    learn: 0.2797126    total: 1.45s    remaining: 240ms
858:    learn: 0.2795744    total: 1.45s    remaining: 238ms
859:    learn: 0.2794426    total: 1.45s    remaining: 237ms
860:    learn: 0.2791854    total: 1.46s    remaining: 235ms
861:    learn: 0.2791077    total: 1.46s    remaining: 233ms
862:    learn: 0.2790048    total: 1.46s    remaining: 232ms
863:    learn: 0.2788733    total: 1.46s    remaining: 230ms
864:    learn: 0.2788182    total: 1.46s    remaining: 228ms
865:    learn: 0.2786590    total: 1.46s    remaining: 227ms
866:    learn: 0.2785414    total: 1.47s    remaining: 225ms
867:    learn: 0.2784293    total: 1.47s    remaining: 223ms
868:    learn: 0.2783252    total: 1.47s    remaining: 221ms
869:    learn: 0.2781887    total: 1.47s    remaining: 220ms
870:    learn: 0.2779748    total: 1.47s    remaining: 218ms
871:    learn: 0.2779000    total: 1.47s    remaining: 216ms
872:    learn: 0.2778225    total: 1.48s    remaining: 215ms
873:    learn: 0.2777576    total: 1.48s    remaining: 213ms
874:    learn: 0.2777006    total: 1.48s    remaining: 211ms
875:    learn: 0.2776684    total: 1.48s    remaining: 210ms
876:    learn: 0.2776048    total: 1.49s    remaining: 208ms
877:    learn: 0.2775366    total: 1.49s    remaining: 207ms
878:    learn: 0.2774703    total: 1.49s    remaining: 205ms
879:    learn: 0.2773172    total: 1.49s    remaining: 203ms
880:    learn: 0.2772617    total: 1.49s    remaining: 202ms
881:    learn: 0.2771427    total: 1.49s    remaining: 200ms
882:    learn: 0.2770267    total: 1.5s    remaining: 198ms
883:    learn: 0.2769163    total: 1.5s    remaining: 196ms
884:    learn: 0.2767768    total: 1.5s    remaining: 195ms
885:    learn: 0.2767300    total: 1.5s    remaining: 193ms
886:    learn: 0.2766843    total: 1.5s    remaining: 191ms
887:    learn: 0.2765526    total: 1.5s    remaining: 190ms
888:    learn: 0.2764557    total: 1.5s    remaining: 188ms
889:    learn: 0.2763369    total: 1.51s    remaining: 186ms
890:    learn: 0.2762687    total: 1.51s    remaining: 184ms
891:    learn: 0.2762030    total: 1.51s    remaining: 183ms
892:    learn: 0.2761240    total: 1.51s    remaining: 181ms
893:    learn: 0.2760416    total: 1.51s    remaining: 179ms
894:    learn: 0.2759239    total: 1.51s    remaining: 178ms
895:    learn: 0.2757411    total: 1.52s    remaining: 176ms
896:    learn: 0.2756781    total: 1.52s    remaining: 174ms
897:    learn: 0.2755656    total: 1.52s    remaining: 173ms
898:    learn: 0.2755017    total: 1.53s    remaining: 172ms
899:    learn: 0.2753771    total: 1.53s    remaining: 170ms
900:    learn: 0.2752091    total: 1.53s    remaining: 168ms
901:    learn: 0.2751239    total: 1.53s    remaining: 167ms
902:    learn: 0.2750431    total: 1.53s    remaining: 165ms
903:    learn: 0.2748063    total: 1.53s    remaining: 163ms
904:    learn: 0.2747393    total: 1.54s    remaining: 161ms
905:    learn: 0.2747020    total: 1.54s    remaining: 160ms
906:    learn: 0.2746334    total: 1.54s    remaining: 158ms
907:    learn: 0.2744415    total: 1.54s    remaining: 156ms
908:    learn: 0.2743686    total: 1.54s    remaining: 155ms
909:    learn: 0.2742677    total: 1.55s    remaining: 153ms
910:    learn: 0.2741824    total: 1.55s    remaining: 151ms
911:    learn: 0.2741178    total: 1.55s    remaining: 150ms
912:    learn: 0.2739814    total: 1.55s    remaining: 148ms
913:    learn: 0.2737799    total: 1.55s    remaining: 146ms
914:    learn: 0.2736915    total: 1.55s    remaining: 144ms
915:    learn: 0.2736188    total: 1.56s    remaining: 143ms
916:    learn: 0.2734803    total: 1.56s    remaining: 142ms
917:    learn: 0.2733654    total: 1.57s    remaining: 140ms
918:    learn: 0.2732999    total: 1.57s    remaining: 138ms
919:    learn: 0.2731856    total: 1.57s    remaining: 136ms
920:    learn: 0.2731466    total: 1.57s    remaining: 135ms
921:    learn: 0.2730750    total: 1.57s    remaining: 133ms
922:    learn: 0.2729086    total: 1.57s    remaining: 131ms
923:    learn: 0.2728371    total: 1.57s    remaining: 130ms
924:    learn: 0.2727040    total: 1.58s    remaining: 128ms
925:    learn: 0.2726765    total: 1.58s    remaining: 126ms
926:    learn: 0.2726018    total: 1.58s    remaining: 124ms
927:    learn: 0.2725268    total: 1.58s    remaining: 123ms
928:    learn: 0.2723675    total: 1.58s    remaining: 121ms
929:    learn: 0.2723010    total: 1.58s    remaining: 119ms
930:    learn: 0.2721701    total: 1.59s    remaining: 118ms
931:    learn: 0.2721272    total: 1.59s    remaining: 116ms
932:    learn: 0.2720654    total: 1.59s    remaining: 115ms
933:    learn: 0.2717534    total: 1.6s    remaining: 113ms
934:    learn: 0.2716384    total: 1.6s    remaining: 111ms
935:    learn: 0.2714989    total: 1.6s    remaining: 109ms
936:    learn: 0.2712303    total: 1.6s    remaining: 108ms
937:    learn: 0.2711671    total: 1.6s    remaining: 106ms
938:    learn: 0.2711033    total: 1.6s    remaining: 104ms
939:    learn: 0.2709477    total: 1.61s    remaining: 103ms
940:    learn: 0.2708769    total: 1.61s    remaining: 101ms
941:    learn: 0.2705752    total: 1.61s    remaining: 99.2ms
942:    learn: 0.2704340    total: 1.61s    remaining: 97.5ms
943:    learn: 0.2702520    total: 1.61s    remaining: 95.8ms
944:    learn: 0.2701982    total: 1.62s    remaining: 94.1ms
945:    learn: 0.2700223    total: 1.62s    remaining: 92.3ms
946:    learn: 0.2699653    total: 1.62s    remaining: 90.6ms
947:    learn: 0.2698782    total: 1.62s    remaining: 88.9ms
948:    learn: 0.2698224    total: 1.62s    remaining: 87.2ms
949:    learn: 0.2697586    total: 1.62s    remaining: 85.5ms
950:    learn: 0.2696519    total: 1.63s    remaining: 83.8ms
951:    learn: 0.2695586    total: 1.63s    remaining: 82.1ms
952:    learn: 0.2694578    total: 1.63s    remaining: 80.3ms
953:    learn: 0.2693905    total: 1.63s    remaining: 78.6ms
954:    learn: 0.2693518    total: 1.63s    remaining: 76.9ms
955:    learn: 0.2692253    total: 1.63s    remaining: 75.2ms
956:    learn: 0.2690230    total: 1.64s    remaining: 73.5ms
957:    learn: 0.2689603    total: 1.64s    remaining: 71.8ms
958:    learn: 0.2689104    total: 1.64s    remaining: 70ms
959:    learn: 0.2688855    total: 1.64s    remaining: 68.3ms
960:    learn: 0.2688177    total: 1.64s    remaining: 66.6ms
961:    learn: 0.2686715    total: 1.64s    remaining: 64.9ms
962:    learn: 0.2686072    total: 1.64s    remaining: 63.2ms
963:    learn: 0.2685623    total: 1.65s    remaining: 61.5ms
964:    learn: 0.2685164    total: 1.65s    remaining: 59.7ms
965:    learn: 0.2683830    total: 1.65s    remaining: 58ms
966:    learn: 0.2683233    total: 1.65s    remaining: 56.3ms
967:    learn: 0.2681680    total: 1.65s    remaining: 54.6ms
968:    learn: 0.2680945    total: 1.65s    remaining: 52.9ms
969:    learn: 0.2680299    total: 1.66s    remaining: 51.2ms
970:    learn: 0.2679874    total: 1.66s    remaining: 49.5ms
971:    learn: 0.2678332    total: 1.66s    remaining: 47.8ms
972:    learn: 0.2677245    total: 1.66s    remaining: 46.1ms
973:    learn: 0.2675624    total: 1.66s    remaining: 44.4ms
974:    learn: 0.2674923    total: 1.66s    remaining: 42.6ms
975:    learn: 0.2674601    total: 1.66s    remaining: 40.9ms
976:    learn: 0.2674036    total: 1.67s    remaining: 39.2ms
977:    learn: 0.2673396    total: 1.67s    remaining: 37.5ms
978:    learn: 0.2672855    total: 1.67s    remaining: 35.8ms
979:    learn: 0.2672348    total: 1.67s    remaining: 34.1ms
980:    learn: 0.2671636    total: 1.67s    remaining: 32.4ms
981:    learn: 0.2671071    total: 1.67s    remaining: 30.7ms
982:    learn: 0.2670526    total: 1.68s    remaining: 29ms
983:    learn: 0.2670075    total: 1.68s    remaining: 27.3ms
984:    learn: 0.2669261    total: 1.68s    remaining: 25.6ms
985:    learn: 0.2668792    total: 1.68s    remaining: 23.9ms
986:    learn: 0.2668012    total: 1.68s    remaining: 22.2ms
987:    learn: 0.2667278    total: 1.68s    remaining: 20.4ms
988:    learn: 0.2665946    total: 1.69s    remaining: 18.7ms
989:    learn: 0.2664965    total: 1.69s    remaining: 17ms
990:    learn: 0.2663784    total: 1.69s    remaining: 15.3ms
991:    learn: 0.2663341    total: 1.69s    remaining: 13.6ms
992:    learn: 0.2662716    total: 1.69s    remaining: 11.9ms
993:    learn: 0.2661899    total: 1.69s    remaining: 10.2ms
994:    learn: 0.2660623    total: 1.69s    remaining: 8.51ms
995:    learn: 0.2660156    total: 1.7s    remaining: 6.81ms
996:    learn: 0.2659742    total: 1.7s    remaining: 5.11ms
997:    learn: 0.2658469    total: 1.7s    remaining: 3.4ms
998:    learn: 0.2657372    total: 1.7s    remaining: 1.7ms
999:    learn: 0.2656287    total: 1.7s    remaining: 0us

Accuracy:  0.9216417910447762
CPU times: user 2.54 s, sys: 282 ms, total: 2.82 s
Wall time: 2.04 s
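
The accuracy printed above is the fraction of test rows whose predicted label equals the true label. As a minimal sketch of that metric in plain Python (the helper name `accuracy` and the toy labels are illustrative assumptions, not part of the notebook, which uses `metrics.accuracy_score`):

```python
def accuracy(y_true, y_pred):
    """Return the fraction of positions where prediction and truth agree."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of empty input")
    # Count matching label pairs, then normalize by the number of samples
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy labels (hypothetical, for illustration only)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(accuracy(y_true, y_pred))  # 6 of the 8 labels match
```

For scale, a score of about 0.9216 is what 247 correct predictions out of 268 would give; the actual test-set size is not shown in this section, so treat that pairing as an example rather than a reading of the data.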

After hyperparameter tuning

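%%time
# Refit CatBoost with the tuned hyperparameters (depth=13, learning_rate=0.1)
model_cb2 = cb.CatBoostClassifier(depth=13, learning_rate=0.1)
model_cb2.fit(X_train, y_train)
# Score the tuned model on the held-out test split
y_pred_cb2 = model_cb2.predict(X1_test)
print("\nAccuracy: ", metrics.accuracy_score(y1_test, y_pred_cb2))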

0:    learn: 0.6291440    total: 1.36ms    remaining: 1.36s
1:    learn: 0.5748476    total: 6.07ms    remaining: 3.03s
2:    learn: 0.5279115    total: 65.1ms    remaining: 21.6s
3:    learn: 0.4926697    total: 67.6ms    remaining: 16.8s
4:    learn: 0.4642216    total: 83.3ms    remaining: 16.6s
5:    learn: 0.4382921    total: 144ms    remaining: 23.9s
6:    learn: 0.4215442    total: 159ms    remaining: 22.6s
7:    learn: 0.4041541    total: 179ms    remaining: 22.2s
8:    learn: 0.3930502    total: 181ms    remaining: 20s
9:    learn: 0.3799636    total: 241ms    remaining: 23.9s
10:    learn: 0.3699306    total: 257ms    remaining: 23.1s
11:    learn: 0.3651040    total: 259ms    remaining: 21.3s
12:    learn: 0.3566676    total: 316ms    remaining: 24s
13:    learn: 0.3551631    total: 317ms    remaining: 22.3s
14:    learn: 0.3513713    total: 319ms    remaining: 21s
15:    learn: 0.3442301    total: 389ms    remaining: 23.9s
16:    learn: 0.3419305    total: 394ms    remaining: 22.8s
17:    learn: 0.3397292    total: 396ms    remaining: 21.6s
18:    learn: 0.3385750    total: 397ms    remaining: 20.5s
19:    learn: 0.3357366    total: 400ms    remaining: 19.6s
20:    learn: 0.3336362    total: 402ms    remaining: 18.7s
21:    learn: 0.3320698    total: 403ms    remaining: 17.9s
22:    learn: 0.3246891    total: 484ms    remaining: 20.6s
23:    learn: 0.3184729    total: 561ms    remaining: 22.8s
24:    learn: 0.3148940    total: 567ms    remaining: 22.1s
25:    learn: 0.3147399    total: 568ms    remaining: 21.3s
26:    learn: 0.3102915    total: 612ms    remaining: 22s
27:    learn: 0.3079201    total: 622ms    remaining: 21.6s
28:    learn: 0.3016318    total: 707ms    remaining: 23.7s
29:    learn: 0.2984929    total: 794ms    remaining: 25.7s
30:    learn: 0.2959946    total: 882ms    remaining: 27.6s
31:    learn: 0.2956280    total: 884ms    remaining: 26.7s
32:    learn: 0.2925718    total: 927ms    remaining: 27.2s
33:    learn: 0.2875422    total: 1.01s    remaining: 28.6s
34:    learn: 0.2860851    total: 1.01s    remaining: 27.9s
35:    learn: 0.2829340    total: 1.03s    remaining: 27.7s
36:    learn: 0.2789452    total: 1.12s    remaining: 29.1s
37:    learn: 0.2783872    total: 1.12s    remaining: 28.4s
38:    learn: 0.2749716    total: 1.21s    remaining: 29.7s
39:    learn: 0.2719646    total: 1.28s    remaining: 30.9s
40:    learn: 0.2709034    total: 1.29s    remaining: 30.2s
41:    learn: 0.2673618    total: 1.38s    remaining: 31.4s
42:    learn: 0.2642811    total: 1.46s    remaining: 32.4s
43:    learn: 0.2637187    total: 1.46s    remaining: 31.7s
44:    learn: 0.2635482    total: 1.46s    remaining: 31s
45:    learn: 0.2630957    total: 1.46s    remaining: 30.3s
46:    learn: 0.2606977    total: 1.54s    remaining: 31.2s
47:    learn: 0.2588854    total: 1.62s    remaining: 32.1s
48:    learn: 0.2582335    total: 1.62s    remaining: 31.4s
49:    learn: 0.2562634    total: 1.64s    remaining: 31.1s
50:    learn: 0.2533263    total: 1.72s    remaining: 32s
51:    learn: 0.2517890    total: 1.74s    remaining: 31.7s
52:    learn: 0.2509030    total: 1.74s    remaining: 31.2s
53:    learn: 0.2476277    total: 1.83s    remaining: 32.1s
54:    learn: 0.2450707    total: 1.91s    remaining: 32.9s
55:    learn: 0.2450066    total: 1.91s    remaining: 32.3s
56:    learn: 0.2445548    total: 1.93s    remaining: 31.9s
57:    learn: 0.2425602    total: 2s    remaining: 32.6s
58:    learn: 0.2424471    total: 2s    remaining: 32s
59:    learn: 0.2394531    total: 2.09s    remaining: 32.8s
60:    learn: 0.2382583    total: 2.12s    remaining: 32.6s
61:    learn: 0.2374918    total: 2.12s    remaining: 32s
62:    learn: 0.2336260    total: 2.2s    remaining: 32.8s
63:    learn: 0.2329741    total: 2.24s    remaining: 32.8s
64:    learn: 0.2325789    total: 2.24s    remaining: 32.3s
65:    learn: 0.2299386    total: 2.32s    remaining: 32.9s
66:    learn: 0.2281529    total: 2.42s    remaining: 33.6s
67:    learn: 0.2257874    total: 2.46s    remaining: 33.7s
68:    learn: 0.2232054    total: 2.54s    remaining: 34.3s
69:    learn: 0.2222198    total: 2.55s    remaining: 33.9s
70:    learn: 0.2199046    total: 2.63s    remaining: 34.4s
71:    learn: 0.2177259    total: 2.71s    remaining: 35s
72:    learn: 0.2144199    total: 2.8s    remaining: 35.6s
73:    learn: 0.2127443    total: 2.89s    remaining: 36.1s
74:    learn: 0.2111802    total: 2.96s    remaining: 36.6s
75:    learn: 0.2091821    total: 3.04s    remaining: 37s
76:    learn: 0.2066723    total: 3.13s    remaining: 37.5s
77:    learn: 0.2058984    total: 3.15s    remaining: 37.3s
78:    learn: 0.2048881    total: 3.17s    remaining: 37s
79:    learn: 0.2031754    total: 3.26s    remaining: 37.4s
80:    learn: 0.2030224    total: 3.26s    remaining: 37s
81:    learn: 0.2014270    total: 3.34s    remaining: 37.4s
82:    learn: 0.1992327    total: 3.44s    remaining: 38s
83:    learn: 0.1974043    total: 3.52s    remaining: 38.4s
84:    learn: 0.1969113    total: 3.55s    remaining: 38.2s
85:    learn: 0.1952535    total: 3.63s    remaining: 38.6s
86:    learn: 0.1949188    total: 3.63s    remaining: 38.1s
87:    learn: 0.1941748    total: 3.71s    remaining: 38.5s
88:    learn: 0.1933274    total: 3.79s    remaining: 38.8s
89:    learn: 0.1920005    total: 3.88s    remaining: 39.2s
90:    learn: 0.1902233    total: 3.96s    remaining: 39.5s
91:    learn: 0.1886434    total: 4.04s    remaining: 39.9s
92:    learn: 0.1878609    total: 4.12s    remaining: 40.2s
93:    learn: 0.1873971    total: 4.14s    remaining: 39.9s
94:    learn: 0.1860442    total: 4.22s    remaining: 40.2s
95:    learn: 0.1841309    total: 4.3s    remaining: 40.5s
96:    learn: 0.1836029    total: 4.31s    remaining: 40.1s
97:    learn: 0.1830616    total: 4.39s    remaining: 40.5s
98:    learn: 0.1818786    total: 4.47s    remaining: 40.7s
99:    learn: 0.1812028    total: 4.52s    remaining: 40.7s
100:    learn: 0.1801801    total: 4.59s    remaining: 40.9s
101:    learn: 0.1789388    total: 4.68s    remaining: 41.2s
102:    learn: 0.1772152    total: 4.76s    remaining: 41.4s
103:    learn: 0.1766453    total: 4.84s    remaining: 41.7s
104:    learn: 0.1749187    total: 4.92s    remaining: 41.9s
105:    learn: 0.1742036    total: 5s    remaining: 42.1s
106:    learn: 0.1726317    total: 5.08s    remaining: 42.4s
107:    learn: 0.1708060    total: 5.17s    remaining: 42.7s
108:    learn: 0.1694547    total: 5.25s    remaining: 42.9s
109:    learn: 0.1689197    total: 5.33s    remaining: 43.1s
110:    learn: 0.1674540    total: 5.41s    remaining: 43.3s
111:    learn: 0.1665666    total: 5.49s    remaining: 43.5s
112:    learn: 0.1655337    total: 5.57s    remaining: 43.7s
113:    learn: 0.1640463    total: 5.65s    remaining: 43.9s
114:    learn: 0.1632857    total: 5.73s    remaining: 44.1s
115:    learn: 0.1624247    total: 5.81s    remaining: 44.3s
116:    learn: 0.1609198    total: 5.89s    remaining: 44.5s
117:    learn: 0.1599200    total: 5.97s    remaining: 44.7s
118:    learn: 0.1588361    total: 6.05s    remaining: 44.8s
119:    learn: 0.1579644    total: 6.06s    remaining: 44.4s
120:    learn: 0.1569607    total: 6.14s    remaining: 44.6s
121:    learn: 0.1560274    total: 6.22s    remaining: 44.7s
122:    learn: 0.1551732    total: 6.29s    remaining: 44.9s
123:    learn: 0.1544019    total: 6.38s    remaining: 45s
124:    learn: 0.1537151    total: 6.46s    remaining: 45.3s
125:    learn: 0.1511900    total: 6.53s    remaining: 45.3s
126:    learn: 0.1507331    total: 6.59s    remaining: 45.3s
127:    learn: 0.1499530    total: 6.65s    remaining: 45.3s
128:    learn: 0.1490988    total: 6.72s    remaining: 45.4s
129:    learn: 0.1484887    total: 6.78s    remaining: 45.4s
130:    learn: 0.1476770    total: 6.84s    remaining: 45.4s
131:    learn: 0.1465705    total: 6.9s    remaining: 45.4s
132:    learn: 0.1462378    total: 6.93s    remaining: 45.2s
133:    learn: 0.1457998    total: 6.99s    remaining: 45.2s
134:    learn: 0.1452223    total: 7s    remaining: 44.8s
135:    learn: 0.1439411    total: 7.05s    remaining: 44.8s
136:    learn: 0.1435812    total: 7.11s    remaining: 44.8s
137:    learn: 0.1428338    total: 7.18s    remaining: 44.8s
138:    learn: 0.1416907    total: 7.24s    remaining: 44.8s
139:    learn: 0.1414533    total: 7.24s    remaining: 44.5s
140:    learn: 0.1405029    total: 7.3s    remaining: 44.5s
141:    learn: 0.1398655    total: 7.36s    remaining: 44.5s
142:    learn: 0.1388855    total: 7.43s    remaining: 44.5s
143:    learn: 0.1382867    total: 7.49s    remaining: 44.5s
144:    learn: 0.1376590    total: 7.5s    remaining: 44.3s
145:    learn: 0.1368349    total: 7.58s    remaining: 44.4s
146:    learn: 0.1356921    total: 7.67s    remaining: 44.5s
147:    learn: 0.1354803    total: 7.67s    remaining: 44.1s
148:    learn: 0.1346190    total: 7.75s    remaining: 44.3s
149:    learn: 0.1341611    total: 7.83s    remaining: 44.4s
150:    learn: 0.1334851    total: 7.92s    remaining: 44.5s
151:    learn: 0.1329815    total: 8s    remaining: 44.6s
152:    learn: 0.1325443    total: 8.03s    remaining: 44.5s
153:    learn: 0.1314231    total: 8.12s    remaining: 44.6s
154:    learn: 0.1303450    total: 8.14s    remaining: 44.4s
155:    learn: 0.1300380    total: 8.22s    remaining: 44.5s
156:    learn: 0.1291482    total: 8.29s    remaining: 44.5s
157:    learn: 0.1285500    total: 8.38s    remaining: 44.6s
158:    learn: 0.1276863    total: 8.47s    remaining: 44.8s
159:    learn: 0.1267840    total: 8.54s    remaining: 44.9s
160:    learn: 0.1261672    total: 8.63s    remaining: 45s
161:    learn: 0.1258289    total: 8.65s    remaining: 44.7s
162:    learn: 0.1256662    total: 8.69s    remaining: 44.6s
163:    learn: 0.1251198    total: 8.77s    remaining: 44.7s
164:    learn: 0.1243440    total: 8.85s    remaining: 44.8s
165:    learn: 0.1238743    total: 8.93s    remaining: 44.9s
166:    learn: 0.1232649    total: 9.01s    remaining: 45s
167:    learn: 0.1223070    total: 9.09s    remaining: 45s
168:    learn: 0.1222166    total: 9.17s    remaining: 45.1s
169:    learn: 0.1214763    total: 9.25s    remaining: 45.2s
170:    learn: 0.1208712    total: 9.33s    remaining: 45.2s
171:    learn: 0.1203816    total: 9.35s    remaining: 45s
172:    learn: 0.1195046    total: 9.43s    remaining: 45.1s
173:    learn: 0.1191177    total: 9.52s    remaining: 45.2s
174:    learn: 0.1190539    total: 9.52s    remaining: 44.9s
175:    learn: 0.1186965    total: 9.6s    remaining: 45s
176:    learn: 0.1186388    total: 9.61s    remaining: 44.7s
177:    learn: 0.1180183    total: 9.69s    remaining: 44.7s
178:    learn: 0.1174797    total: 9.77s    remaining: 44.8s
179:    learn: 0.1170483    total: 9.87s    remaining: 45s
180:    learn: 0.1163883    total: 9.95s    remaining: 45s
181:    learn: 0.1161562    total: 9.99s    remaining: 44.9s
182:    learn: 0.1157569    total: 10s    remaining: 44.7s
183:    learn: 0.1157034    total: 10s    remaining: 44.4s
184:    learn: 0.1150241    total: 10.1s    remaining: 44.5s
185:    learn: 0.1143419    total: 10.2s    remaining: 44.5s
186:    learn: 0.1142538    total: 10.2s    remaining: 44.3s
187:    learn: 0.1138790    total: 10.3s    remaining: 44.3s
188:    learn: 0.1133401    total: 10.3s    remaining: 44.4s
189:    learn: 0.1132788    total: 10.3s    remaining: 44.1s
190:    learn: 0.1131678    total: 10.4s    remaining: 43.8s
191:    learn: 0.1125761    total: 10.4s    remaining: 43.9s
192:    learn: 0.1119667    total: 10.5s    remaining: 43.8s
193:    learn: 0.1116648    total: 10.6s    remaining: 43.9s
194:    learn: 0.1109593    total: 10.6s    remaining: 43.9s
195:    learn: 0.1102323    total: 10.7s    remaining: 44s
196:    learn: 0.1096026    total: 10.8s    remaining: 44s
197:    learn: 0.1095286    total: 10.8s    remaining: 43.8s
198:    learn: 0.1090764    total: 10.9s    remaining: 43.8s
199:    learn: 0.1085613    total: 11s    remaining: 43.9s
200:    learn: 0.1081235    total: 11s    remaining: 43.9s
201:    learn: 0.1075332    total: 11.1s    remaining: 44s
202:    learn: 0.1070773    total: 11.2s    remaining: 44s
203:    learn: 0.1066367    total: 11.3s    remaining: 44.1s
204:    learn: 0.1061256    total: 11.4s    remaining: 44.1s
205:    learn: 0.1057380    total: 11.5s    remaining: 44.1s
206:    learn: 0.1052409    total: 11.5s    remaining: 44.2s
207:    learn: 0.1049293    total: 11.6s    remaining: 44.2s
208:    learn: 0.1048501    total: 11.6s    remaining: 44s
209:    learn: 0.1046446    total: 11.7s    remaining: 43.9s
210:    learn: 0.1037971    total: 11.7s    remaining: 43.9s
211:    learn: 0.1035343    total: 11.8s    remaining: 44s
212:    learn: 0.1030647    total: 11.9s    remaining: 44s
213:    learn: 0.1027919    total: 12s    remaining: 44.1s
214:    learn: 0.1025205    total: 12.1s    remaining: 44.1s
215:    learn: 0.1022581    total: 12.2s    remaining: 44.2s
216:    learn: 0.1019889    total: 12.2s    remaining: 44.2s
217:    learn: 0.1017563    total: 12.3s    remaining: 44.2s
218:    learn: 0.1012053    total: 12.4s    remaining: 44.2s
219:    learn: 0.1008358    total: 12.5s    remaining: 44.3s
220:    learn: 0.1002939    total: 12.6s    remaining: 44.3s
221:    learn: 0.0998744    total: 12.7s    remaining: 44.3s
222:    learn: 0.0995125    total: 12.7s    remaining: 44.4s
223:    learn: 0.0994734    total: 12.8s    remaining: 44.2s
224:    learn: 0.0991801    total: 12.8s    remaining: 44.2s
225:    learn: 0.0989244    total: 12.9s    remaining: 44.2s
226:    learn: 0.0985547    total: 13s    remaining: 44.3s
227:    learn: 0.0981702    total: 13.1s    remaining: 44.3s
228:    learn: 0.0978596    total: 13.2s    remaining: 44.3s
229:    learn: 0.0975044    total: 13.2s    remaining: 44.4s
230:    learn: 0.0972430    total: 13.3s    remaining: 44.4s
231:    learn: 0.0972029    total: 13.3s    remaining: 44.2s
232:    learn: 0.0970625    total: 13.4s    remaining: 44.2s
233:    learn: 0.0969125    total: 13.5s    remaining: 44.2s
234:    learn: 0.0965909    total: 13.6s    remaining: 44.2s
235:    learn: 0.0961394    total: 13.7s    remaining: 44.3s
236:    learn: 0.0956679    total: 13.8s    remaining: 44.3s
237:    learn: 0.0954217    total: 13.8s    remaining: 44.3s
238:    learn: 0.0952190    total: 13.9s    remaining: 44.3s
239:    learn: 0.0946616    total: 14s    remaining: 44.3s
240:    learn: 0.0946270    total: 14s    remaining: 44.1s
241:    learn: 0.0944199    total: 14.1s    remaining: 44.1s
242:    learn: 0.0941707    total: 14.2s    remaining: 44.2s
243:    learn: 0.0939590    total: 14.3s    remaining: 44.2s
244:    learn: 0.0935693    total: 14.3s    remaining: 44.2s
245:    learn: 0.0932719    total: 14.4s    remaining: 44.2s
246:    learn: 0.0929236    total: 14.4s    remaining: 44s
247:    learn: 0.0925825    total: 14.5s    remaining: 44s
248:    learn: 0.0923408    total: 14.6s    remaining: 44s
249:    learn: 0.0921541    total: 14.7s    remaining: 44s
250:    learn: 0.0917607    total: 14.8s    remaining: 44s
251:    learn: 0.0916959    total: 14.8s    remaining: 43.8s
252:    learn: 0.0916654    total: 14.8s    remaining: 43.6s
253:    learn: 0.0914250    total: 14.9s    remaining: 43.6s
254:    learn: 0.0912275    total: 14.9s    remaining: 43.6s
255:    learn: 0.0909331    total: 15s    remaining: 43.5s
256:    learn: 0.0907212    total: 15s    remaining: 43.5s
257:    learn: 0.0903835    total: 15.1s    remaining: 43.5s
258:    learn: 0.0899619    total: 15.2s    remaining: 43.5s
259:    learn: 0.0895930    total: 15.3s    remaining: 43.5s
260:    learn: 0.0894562    total: 15.3s    remaining: 43.3s
261:    learn: 0.0891225    total: 15.4s    remaining: 43.3s
262:    learn: 0.0889105    total: 15.4s    remaining: 43.1s
263:    learn: 0.0887177    total: 15.5s    remaining: 43.1s
264:    learn: 0.0885528    total: 15.5s    remaining: 43s
265:    learn: 0.0884071    total: 15.5s    remaining: 42.8s
266:    learn: 0.0880970    total: 15.6s    remaining: 42.7s
267:    learn: 0.0878855    total: 15.6s    remaining: 42.7s
268:    learn: 0.0876066    total: 15.7s    remaining: 42.7s
269:    learn: 0.0874569    total: 15.8s    remaining: 42.6s
270:    learn: 0.0872114    total: 15.8s    remaining: 42.6s
271:    learn: 0.0868274    total: 15.9s    remaining: 42.5s
272:    learn: 0.0865278    total: 15.9s    remaining: 42.5s
273:    learn: 0.0862970    total: 16s    remaining: 42.4s
274:    learn: 0.0859278    total: 16.1s    remaining: 42.3s
275:    learn: 0.0857807    total: 16.1s    remaining: 42.3s
276:    learn: 0.0855715    total: 16.2s    remaining: 42.3s
277:    learn: 0.0853894    total: 16.2s    remaining: 42.1s
278:    learn: 0.0853644    total: 16.2s    remaining: 41.9s
279:    learn: 0.0851855    total: 16.3s    remaining: 41.8s
280:    learn: 0.0850113    total: 16.3s    remaining: 41.8s
281:    learn: 0.0848486    total: 16.4s    remaining: 41.7s
282:    learn: 0.0844737    total: 16.5s    remaining: 41.7s
283:    learn: 0.0841315    total: 16.5s    remaining: 41.6s
284:    learn: 0.0839427    total: 16.6s    remaining: 41.6s
285:    learn: 0.0837591    total: 16.6s    remaining: 41.5s
286:    learn: 0.0835672    total: 16.7s    remaining: 41.5s
287:    learn: 0.0834055    total: 16.8s    remaining: 41.5s
288:    learn: 0.0832002    total: 16.9s    remaining: 41.5s
289:    learn: 0.0829168    total: 17s    remaining: 41.5s
290:    learn: 0.0826480    total: 17s    remaining: 41.5s
291:    learn: 0.0823963    total: 17.1s    remaining: 41.5s
292:    learn: 0.0821079    total: 17.2s    remaining: 41.5s
293:    learn: 0.0817978    total: 17.3s    remaining: 41.5s
294:    learn: 0.0816177    total: 17.3s    remaining: 41.5s
295:    learn: 0.0813600    total: 17.4s    remaining: 41.4s
296:    learn: 0.0812284    total: 17.5s    remaining: 41.4s
297:    learn: 0.0810590    total: 17.6s    remaining: 41.4s
298:    learn: 0.0807188    total: 17.7s    remaining: 41.4s
299:    learn: 0.0805831    total: 17.8s    remaining: 41.4s
300:    learn: 0.0802811    total: 17.8s    remaining: 41.4s
301:    learn: 0.0801180    total: 17.9s    remaining: 41.4s
302:    learn: 0.0800344    total: 18s    remaining: 41.4s
303:    learn: 0.0799437    total: 18.1s    remaining: 41.4s
304:    learn: 0.0795872    total: 18.2s    remaining: 41.4s
305:    learn: 0.0793887    total: 18.2s    remaining: 41.4s
306:    learn: 0.0792553    total: 18.3s    remaining: 41.3s
307:    learn: 0.0790550    total: 18.4s    remaining: 41.3s
308:    learn: 0.0788949    total: 18.5s    remaining: 41.3s
309:    learn: 0.0787694    total: 18.6s    remaining: 41.3s
310:    learn: 0.0786095    total: 18.6s    remaining: 41.3s
311:    learn: 0.0783436    total: 18.7s    remaining: 41.3s
312:    learn: 0.0781246    total: 18.8s    remaining: 41.3s
313:    learn: 0.0780335    total: 18.9s    remaining: 41.3s
314:    learn: 0.0778421    total: 19s    remaining: 41.3s
315:    learn: 0.0776884    total: 19.1s    remaining: 41.3s
316:    learn: 0.0774921    total: 19.1s    remaining: 41.2s
317:    learn: 0.0773268    total: 19.2s    remaining: 41.2s
318:    learn: 0.0771909    total: 19.3s    remaining: 41.1s
319:    learn: 0.0771140    total: 19.3s    remaining: 41.1s
320:    learn: 0.0769700    total: 19.4s    remaining: 41.1s
321:    learn: 0.0767145    total: 19.5s    remaining: 41s
322:    learn: 0.0765796    total: 19.6s    remaining: 41s
323:    learn: 0.0764386    total: 19.7s    remaining: 41s
324:    learn: 0.0763118    total: 19.7s    remaining: 41s
325:    learn: 0.0760452    total: 19.8s    remaining: 41s
326:    learn: 0.0759214    total: 19.9s    remaining: 41s
327:    learn: 0.0758953    total: 19.9s    remaining: 40.8s
328:    learn: 0.0758316    total: 19.9s    remaining: 40.6s
329:    learn: 0.0755765    total: 20s    remaining: 40.6s
330:    learn: 0.0754392    total: 20.1s    remaining: 40.7s
331:    learn: 0.0753975    total: 20.1s    remaining: 40.5s
332:    learn: 0.0752450    total: 20.2s    remaining: 40.5s
333:    learn: 0.0750084    total: 20.3s    remaining: 40.4s
334:    learn: 0.0747665    total: 20.4s    remaining: 40.4s
335:    learn: 0.0746458    total: 20.4s    remaining: 40.4s
336:    learn: 0.0744223    total: 20.5s    remaining: 40.4s
337:    learn: 0.0742375    total: 20.6s    remaining: 40.4s
338:    learn: 0.0741206    total: 20.7s    remaining: 40.3s
339:    learn: 0.0738751    total: 20.8s    remaining: 40.3s
340:    learn: 0.0738069    total: 20.8s    remaining: 40.3s
341:    learn: 0.0736982    total: 20.9s    remaining: 40.3s
342:    learn: 0.0735798    total: 21s    remaining: 40.2s
343:    learn: 0.0733606    total: 21.1s    remaining: 40.2s
344:    learn: 0.0731264    total: 21.2s    remaining: 40.2s
345:    learn: 0.0729077    total: 21.2s    remaining: 40.1s
346:    learn: 0.0727358    total: 21.3s    remaining: 40s
347:    learn: 0.0726109    total: 21.4s    remaining: 40s
348:    learn: 0.0723158    total: 21.4s    remaining: 40s
349:    learn: 0.0721597    total: 21.5s    remaining: 39.9s
350:    learn: 0.0720101    total: 21.6s    remaining: 39.9s
351:    learn: 0.0718753    total: 21.7s    remaining: 39.9s
352:    learn: 0.0718178    total: 21.7s    remaining: 39.8s
353:    learn: 0.0716506    total: 21.8s    remaining: 39.8s
354:    learn: 0.0713943    total: 21.9s    remaining: 39.8s
355:    learn: 0.0711916    total: 22s    remaining: 39.7s
356:    learn: 0.0710649    total: 22.1s    remaining: 39.7s
357:    learn: 0.0709570    total: 22.1s    remaining: 39.7s
358:    learn: 0.0709018    total: 22.2s    remaining: 39.7s
359:    learn: 0.0708145    total: 22.3s    remaining: 39.6s
360:    learn: 0.0707875    total: 22.4s    remaining: 39.6s
361:    learn: 0.0706960    total: 22.5s    remaining: 39.6s
362:    learn: 0.0704835    total: 22.5s    remaining: 39.6s
363:    learn: 0.0704660    total: 22.6s    remaining: 39.4s
364:    learn: 0.0704422    total: 22.6s    remaining: 39.3s
365:    learn: 0.0703499    total: 22.7s    remaining: 39.3s
366:    learn: 0.0702633    total: 22.8s    remaining: 39.2s
367:    learn: 0.0702122    total: 22.8s    remaining: 39.2s
368:    learn: 0.0700797    total: 22.9s    remaining: 39.1s
369:    learn: 0.0699954    total: 23s    remaining: 39.1s
370:    learn: 0.0698622    total: 23s    remaining: 39.1s
371:    learn: 0.0697984    total: 23.1s    remaining: 39s
372:    learn: 0.0697504    total: 23.2s    remaining: 39s
373:    learn: 0.0696325    total: 23.3s    remaining: 39s
374:    learn: 0.0695827    total: 23.3s    remaining: 38.8s
375:    learn: 0.0694660    total: 23.4s    remaining: 38.8s
376:    learn: 0.0693729    total: 23.4s    remaining: 38.8s
377:    learn: 0.0691513    total: 23.5s    remaining: 38.7s
378:    learn: 0.0690860    total: 23.6s    remaining: 38.6s
379:    learn: 0.0689910    total: 23.7s    remaining: 38.6s
380:    learn: 0.0688406    total: 23.7s    remaining: 38.6s
381:    learn: 0.0687428    total: 23.8s    remaining: 38.5s
382:    learn: 0.0686822    total: 23.9s    remaining: 38.5s
383:    learn: 0.0685927    total: 24s    remaining: 38.5s
384:    learn: 0.0684202    total: 24.1s    remaining: 38.4s
385:    learn: 0.0682822    total: 24.1s    remaining: 38.4s
386:    learn: 0.0682056    total: 24.2s    remaining: 38.4s
387:    learn: 0.0681720    total: 24.2s    remaining: 38.2s
388:    learn: 0.0680716    total: 24.3s    remaining: 38.2s
389:    learn: 0.0679639    total: 24.4s    remaining: 38.2s
390:    learn: 0.0677765    total: 24.5s    remaining: 38.2s
391:    learn: 0.0677630    total: 24.5s    remaining: 38s
392:    learn: 0.0676595    total: 24.6s    remaining: 38s
393:    learn: 0.0675553    total: 24.7s    remaining: 38s
394:    learn: 0.0674711    total: 24.8s    remaining: 37.9s
395:    learn: 0.0673370    total: 24.8s    remaining: 37.9s
396:    learn: 0.0673117    total: 24.9s    remaining: 37.8s
397:    learn: 0.0672352    total: 24.9s    remaining: 37.7s
398:    learn: 0.0671287    total: 25s    remaining: 37.7s
399:    learn: 0.0670080    total: 25.1s    remaining: 37.6s
400:    learn: 0.0668827    total: 25.2s    remaining: 37.6s
401:    learn: 0.0667930    total: 25.3s    remaining: 37.6s
402:    learn: 0.0667226    total: 25.3s    remaining: 37.5s
403:    learn: 0.0666502    total: 25.4s    remaining: 37.5s
404:    learn: 0.0665833    total: 25.5s    remaining: 37.4s
405:    learn: 0.0664556    total: 25.6s    remaining: 37.4s
406:    learn: 0.0663563    total: 25.7s    remaining: 37.4s
407:    learn: 0.0662776    total: 25.7s    remaining: 37.2s
408:    learn: 0.0662696    total: 25.7s    remaining: 37.1s
409:    learn: 0.0662331    total: 25.7s    remaining: 37s
410:    learn: 0.0662140    total: 25.8s    remaining: 36.9s
411:    learn: 0.0660842    total: 25.9s    remaining: 36.9s
412:    learn: 0.0659252    total: 25.9s    remaining: 36.9s
413:    learn: 0.0658368    total: 26s    remaining: 36.8s
414:    learn: 0.0657215    total: 26.1s    remaining: 36.8s
415:    learn: 0.0656556    total: 26.2s    remaining: 36.8s
416:    learn: 0.0655536    total: 26.3s    remaining: 36.7s
417:    learn: 0.0655416    total: 26.3s    remaining: 36.6s
418:    learn: 0.0655031    total: 26.4s    remaining: 36.5s
419:    learn: 0.0654451    total: 26.4s    remaining: 36.5s
420:    learn: 0.0653336    total: 26.5s    remaining: 36.5s
421:    learn: 0.0652475    total: 26.6s    remaining: 36.4s
422:    learn: 0.0652147    total: 26.6s    remaining: 36.3s
423:    learn: 0.0650865    total: 26.7s    remaining: 36.3s
424:    learn: 0.0649944    total: 26.8s    remaining: 36.2s
425:    learn: 0.0649487    total: 26.9s    remaining: 36.2s
426:    learn: 0.0648933    total: 26.9s    remaining: 36.2s
427:    learn: 0.0647768    total: 27s    remaining: 36.1s
428:    learn: 0.0646220    total: 27.1s    remaining: 36.1s
429:    learn: 0.0645673    total: 27.2s    remaining: 36s
430:    learn: 0.0644823    total: 27.3s    remaining: 36s
431:    learn: 0.0644046    total: 27.4s    remaining: 36s
432:    learn: 0.0643120    total: 27.4s    remaining: 35.9s
433:    learn: 0.0642234    total: 27.5s    remaining: 35.9s
434:    learn: 0.0641458    total: 27.6s    remaining: 35.8s
435:    learn: 0.0640602    total: 27.7s    remaining: 35.8s
436:    learn: 0.0639917    total: 27.8s    remaining: 35.8s
437:    learn: 0.0638438    total: 27.8s    remaining: 35.7s
438:    learn: 0.0637128    total: 27.9s    remaining: 35.7s
439:    learn: 0.0636231    total: 28s    remaining: 35.7s
440:    learn: 0.0635662    total: 28.1s    remaining: 35.6s
441:    learn: 0.0634986    total: 28.2s    remaining: 35.6s
442:    learn: 0.0634137    total: 28.3s    remaining: 35.5s
443:    learn: 0.0633724    total: 28.3s    remaining: 35.5s
444:    learn: 0.0632873    total: 28.4s    remaining: 35.4s
445:    learn: 0.0632346    total: 28.5s    remaining: 35.4s
446:    learn: 0.0631867    total: 28.6s    remaining: 35.3s
447:    learn: 0.0631247    total: 28.7s    remaining: 35.3s
448:    learn: 0.0630819    total: 28.7s    remaining: 35.3s
449:    learn: 0.0630281    total: 28.8s    remaining: 35.2s
450:    learn: 0.0629809    total: 28.9s    remaining: 35.2s
451:    learn: 0.0629226    total: 29s    remaining: 35.1s
452:    learn: 0.0628627    total: 29.1s    remaining: 35.1s
453:    learn: 0.0627670    total: 29.1s    remaining: 35s
454:    learn: 0.0627075    total: 29.2s    remaining: 35s
455:    learn: 0.0626426    total: 29.3s    remaining: 35s
456:    learn: 0.0625294    total: 29.4s    remaining: 34.9s
457:    learn: 0.0624632    total: 29.5s    remaining: 34.9s
458:    learn: 0.0623462    total: 29.5s    remaining: 34.8s
459:    learn: 0.0622866    total: 29.6s    remaining: 34.8s
460:    learn: 0.0622428    total: 29.7s    remaining: 34.7s
461:    learn: 0.0622024    total: 29.8s    remaining: 34.7s
462:    learn: 0.0621693    total: 29.9s    remaining: 34.6s
463:    learn: 0.0620835    total: 29.9s    remaining: 34.6s
464:    learn: 0.0620437    total: 30s    remaining: 34.5s
465:    learn: 0.0620116    total: 30s    remaining: 34.4s
466:    learn: 0.0619525    total: 30.1s    remaining: 34.4s
467:    learn: 0.0619261    total: 30.2s    remaining: 34.3s
468:    learn: 0.0618856    total: 30.3s    remaining: 34.3s
469:    learn: 0.0618618    total: 30.4s    remaining: 34.3s
470:    learn: 0.0617853    total: 30.5s    remaining: 34.2s
471:    learn: 0.0616711    total: 30.6s    remaining: 34.2s
472:    learn: 0.0616231    total: 30.6s    remaining: 34.1s
473:    learn: 0.0615676    total: 30.7s    remaining: 34s
474:    learn: 0.0614766    total: 30.8s    remaining: 34s
475:    learn: 0.0614160    total: 30.9s    remaining: 34s
476:    learn: 0.0613391    total: 30.9s    remaining: 33.9s
477:    learn: 0.0612957    total: 31s    remaining: 33.9s
478:    learn: 0.0612656    total: 31.1s    remaining: 33.8s
479:    learn: 0.0611741    total: 31.2s    remaining: 33.8s
480:    learn: 0.0611504    total: 31.2s    remaining: 33.7s
481:    learn: 0.0610432    total: 31.3s    remaining: 33.7s
482:    learn: 0.0609883    total: 31.4s    remaining: 33.6s
483:    learn: 0.0608631    total: 31.5s    remaining: 33.6s
484:    learn: 0.0608306    total: 31.6s    remaining: 33.5s
485:    learn: 0.0607123    total: 31.6s    remaining: 33.5s
486:    learn: 0.0606671    total: 31.7s    remaining: 33.4s
487:    learn: 0.0605979    total: 31.8s    remaining: 33.4s
488:    learn: 0.0604872    total: 31.9s    remaining: 33.3s
489:    learn: 0.0603922    total: 32s    remaining: 33.3s
490:    learn: 0.0602644    total: 32s    remaining: 33.2s
491:    learn: 0.0601780    total: 32.1s    remaining: 33.2s
492:    learn: 0.0601248    total: 32.2s    remaining: 33.1s
493:    learn: 0.0600746    total: 32.3s    remaining: 33.1s
494:    learn: 0.0600219    total: 32.4s    remaining: 33s
495:    learn: 0.0599442    total: 32.5s    remaining: 33s
496:    learn: 0.0598624    total: 32.5s    remaining: 32.9s
497:    learn: 0.0598182    total: 32.6s    remaining: 32.9s
498:    learn: 0.0597954    total: 32.7s    remaining: 32.8s
499:    learn: 0.0597308    total: 32.8s    remaining: 32.8s
500:    learn: 0.0596585    total: 32.9s    remaining: 32.7s
501:    learn: 0.0596202    total: 32.9s    remaining: 32.7s
502:    learn: 0.0595124    total: 33s    remaining: 32.6s
503:    learn: 0.0594510    total: 33.1s    remaining: 32.6s
504:    learn: 0.0593790    total: 33.2s    remaining: 32.5s
505:    learn: 0.0593038    total: 33.3s    remaining: 32.5s
506:    learn: 0.0591878    total: 33.3s    remaining: 32.4s
507:    learn: 0.0591315    total: 33.4s    remaining: 32.4s
508:    learn: 0.0590626    total: 33.5s    remaining: 32.3s
509:    learn: 0.0589951    total: 33.6s    remaining: 32.3s
510:    learn: 0.0589847    total: 33.7s    remaining: 32.2s
511:    learn: 0.0589442    total: 33.7s    remaining: 32.2s
512:    learn: 0.0589154    total: 33.8s    remaining: 32.1s
513:    learn: 0.0588723    total: 33.9s    remaining: 32.1s
514:    learn: 0.0588471    total: 34s    remaining: 32s
515:    learn: 0.0588147    total: 34.1s    remaining: 31.9s
516:    learn: 0.0587502    total: 34.1s    remaining: 31.9s
517:    learn: 0.0587225    total: 34.2s    remaining: 31.8s
518:    learn: 0.0586514    total: 34.3s    remaining: 31.8s
519:    learn: 0.0585996    total: 34.4s    remaining: 31.7s
520:    learn: 0.0585270    total: 34.5s    remaining: 31.7s
521:    learn: 0.0584892    total: 34.5s    remaining: 31.6s
522:    learn: 0.0584533    total: 34.6s    remaining: 31.6s
523:    learn: 0.0583926    total: 34.7s    remaining: 31.5s
524:    learn: 0.0583423    total: 34.8s    remaining: 31.5s
525:    learn: 0.0582639    total: 34.9s    remaining: 31.4s
526:    learn: 0.0582035    total: 34.9s    remaining: 31.4s
527:    learn: 0.0581350    total: 35s    remaining: 31.3s
528:    learn: 0.0581145    total: 35.1s    remaining: 31.3s
529:    learn: 0.0580438    total: 35.2s    remaining: 31.2s
530:    learn: 0.0580047    total: 35.3s    remaining: 31.1s
531:    learn: 0.0579979    total: 35.3s    remaining: 31s
532:    learn: 0.0579513    total: 35.4s    remaining: 31s
533:    learn: 0.0578789    total: 35.4s    remaining: 30.9s
534:    learn: 0.0578597    total: 35.5s    remaining: 30.9s
535:    learn: 0.0578236    total: 35.6s    remaining: 30.8s
536:    learn: 0.0577576    total: 35.6s    remaining: 30.7s
537:    learn: 0.0577555    total: 35.6s    remaining: 30.6s
538:    learn: 0.0576661    total: 35.7s    remaining: 30.6s
539:    learn: 0.0576423    total: 35.8s    remaining: 30.5s
540:    learn: 0.0576112    total: 35.9s    remaining: 30.5s
541:    learn: 0.0576056    total: 35.9s    remaining: 30.3s
542:    learn: 0.0575204    total: 36s    remaining: 30.3s
543:    learn: 0.0574944    total: 36.1s    remaining: 30.2s
544:    learn: 0.0574624    total: 36.1s    remaining: 30.2s
545:    learn: 0.0574339    total: 36.2s    remaining: 30.1s
546:    learn: 0.0574113    total: 36.2s    remaining: 30s
547:    learn: 0.0573514    total: 36.3s    remaining: 30s
548:    learn: 0.0572865    total: 36.4s    remaining: 29.9s
549:    learn: 0.0572687    total: 36.5s    remaining: 29.8s
550:    learn: 0.0572307    total: 36.6s    remaining: 29.8s
551:    learn: 0.0571639    total: 36.6s    remaining: 29.7s
552:    learn: 0.0571003    total: 36.7s    remaining: 29.7s
553:    learn: 0.0570416    total: 36.8s    remaining: 29.6s
554:    learn: 0.0570358    total: 36.8s    remaining: 29.5s
555:    learn: 0.0570145    total: 36.9s    remaining: 29.5s
556:    learn: 0.0570025    total: 37s    remaining: 29.4s
557:    learn: 0.0569711    total: 37s    remaining: 29.3s
558:    learn: 0.0569280    total: 37.1s    remaining: 29.2s
559:    learn: 0.0568576    total: 37.2s    remaining: 29.2s
560:    learn: 0.0567969    total: 37.2s    remaining: 29.1s
561:    learn: 0.0567652    total: 37.2s    remaining: 29s
562:    learn: 0.0567616    total: 37.3s    remaining: 28.9s
563:    learn: 0.0567243    total: 37.3s    remaining: 28.9s
564:    learn: 0.0566695    total: 37.4s    remaining: 28.8s
565:    learn: 0.0566430    total: 37.5s    remaining: 28.8s
566:    learn: 0.0566045    total: 37.6s    remaining: 28.7s
567:    learn: 0.0566044    total: 37.6s    remaining: 28.6s
568:    learn: 0.0566016    total: 37.6s    remaining: 28.5s
569:    learn: 0.0565509    total: 37.7s    remaining: 28.4s
570:    learn: 0.0565482    total: 37.7s    remaining: 28.3s
571:    learn: 0.0564908    total: 37.8s    remaining: 28.2s
572:    learn: 0.0564157    total: 37.8s    remaining: 28.2s
573:    learn: 0.0563966    total: 37.9s    remaining: 28.1s
574:    learn: 0.0563267    total: 38s    remaining: 28.1s
575:    learn: 0.0563010    total: 38.1s    remaining: 28s
576:    learn: 0.0562972    total: 38.1s    remaining: 27.9s
577:    learn: 0.0562788    total: 38.1s    remaining: 27.8s
578:    learn: 0.0562703    total: 38.1s    remaining: 27.7s
579:    learn: 0.0562224    total: 38.2s    remaining: 27.6s
580:    learn: 0.0562128    total: 38.2s    remaining: 27.5s
581:    learn: 0.0561442    total: 38.3s    remaining: 27.5s
582:    learn: 0.0561258    total: 38.3s    remaining: 27.4s
583:    learn: 0.0560771    total: 38.4s    remaining: 27.4s
584:    learn: 0.0560251    total: 38.5s    remaining: 27.3s
585:    learn: 0.0560085    total: 38.6s    remaining: 27.3s
586:    learn: 0.0559557    total: 38.7s    remaining: 27.2s
587:    learn: 0.0559067    total: 38.7s    remaining: 27.1s
588:    learn: 0.0558655    total: 38.8s    remaining: 27.1s
589:    learn: 0.0558199    total: 38.9s    remaining: 27s
590:    learn: 0.0557767    total: 39s    remaining: 27s
591:    learn: 0.0557538    total: 39.1s    remaining: 26.9s
592:    learn: 0.0557506    total: 39.1s    remaining: 26.8s
593:    learn: 0.0557241    total: 39.2s    remaining: 26.8s
594:    learn: 0.0556955    total: 39.2s    remaining: 26.7s
595:    learn: 0.0556704    total: 39.3s    remaining: 26.6s
596:    learn: 0.0556488    total: 39.4s    remaining: 26.6s
597:    learn: 0.0556369    total: 39.5s    remaining: 26.5s
598:    learn: 0.0556082    total: 39.6s    remaining: 26.5s
599:    learn: 0.0555476    total: 39.6s    remaining: 26.4s
600:    learn: 0.0555254    total: 39.7s    remaining: 26.3s
601:    learn: 0.0555042    total: 39.7s    remaining: 26.3s
602:    learn: 0.0554939    total: 39.8s    remaining: 26.2s
603:    learn: 0.0554583    total: 39.9s    remaining: 26.2s
604:    learn: 0.0554121    total: 40s    remaining: 26.1s
605:    learn: 0.0553839    total: 40.1s    remaining: 26s
606:    learn: 0.0553179    total: 40.1s    remaining: 26s
607:    learn: 0.0552890    total: 40.2s    remaining: 25.9s
608:    learn: 0.0552693    total: 40.3s    remaining: 25.9s
609:    learn: 0.0552215    total: 40.4s    remaining: 25.8s
610:    learn: 0.0551917    total: 40.5s    remaining: 25.8s
611:    learn: 0.0551509    total: 40.6s    remaining: 25.7s
612:    learn: 0.0551163    total: 40.6s    remaining: 25.7s
613:    learn: 0.0550651    total: 40.7s    remaining: 25.6s
614:    learn: 0.0549956    total: 40.8s    remaining: 25.5s
615:    learn: 0.0549358    total: 40.9s    remaining: 25.5s
616:    learn: 0.0548813    total: 40.9s    remaining: 25.4s
617:    learn: 0.0548195    total: 41s    remaining: 25.3s
618:    learn: 0.0547734    total: 41.1s    remaining: 25.3s
619:    learn: 0.0547143    total: 41.1s    remaining: 25.2s
620:    learn: 0.0546948    total: 41.2s    remaining: 25.2s
621:    learn: 0.0546472    total: 41.3s    remaining: 25.1s
622:    learn: 0.0546137    total: 41.4s    remaining: 25s
623:    learn: 0.0545772    total: 41.5s    remaining: 25s
624:    learn: 0.0545610    total: 41.5s    remaining: 24.9s
625:    learn: 0.0545310    total: 41.6s    remaining: 24.9s
626:    learn: 0.0544733    total: 41.7s    remaining: 24.8s
627:    learn: 0.0544322    total: 41.8s    remaining: 24.8s
628:    learn: 0.0543813    total: 41.9s    remaining: 24.7s
629:    learn: 0.0543221    total: 42s    remaining: 24.6s
630:    learn: 0.0542785    total: 42s    remaining: 24.6s
631:    learn: 0.0542340    total: 42.1s    remaining: 24.5s
632:    learn: 0.0542043    total: 42.2s    remaining: 24.5s
633:    learn: 0.0541720    total: 42.3s    remaining: 24.4s
634:    learn: 0.0541348    total: 42.4s    remaining: 24.4s
635:    learn: 0.0541280    total: 42.4s    remaining: 24.2s
636:    learn: 0.0540815    total: 42.4s    remaining: 24.2s
637:    learn: 0.0540410    total: 42.5s    remaining: 24.1s
638:    learn: 0.0540121    total: 42.6s    remaining: 24.1s
639:    learn: 0.0539801    total: 42.7s    remaining: 24s
640:    learn: 0.0539751    total: 42.7s    remaining: 23.9s
641:    learn: 0.0539553    total: 42.8s    remaining: 23.9s
642:    learn: 0.0539153    total: 42.9s    remaining: 23.8s
643:    learn: 0.0538733    total: 42.9s    remaining: 23.7s
644:    learn: 0.0538325    total: 43s    remaining: 23.7s
645:    learn: 0.0538109    total: 43.1s    remaining: 23.6s
646:    learn: 0.0537777    total: 43.2s    remaining: 23.6s
647:    learn: 0.0537541    total: 43.3s    remaining: 23.5s
648:    learn: 0.0537379    total: 43.3s    remaining: 23.4s
649:    learn: 0.0536988    total: 43.4s    remaining: 23.4s
650:    learn: 0.0536781    total: 43.5s    remaining: 23.3s
651:    learn: 0.0536645    total: 43.5s    remaining: 23.2s
652:    learn: 0.0536346    total: 43.6s    remaining: 23.2s
653:    learn: 0.0536225    total: 43.7s    remaining: 23.1s
654:    learn: 0.0535963    total: 43.8s    remaining: 23s
655:    learn: 0.0535883    total: 43.8s    remaining: 23s
656:    learn: 0.0535883    total: 43.8s    remaining: 22.9s
657:    learn: 0.0535780    total: 43.9s    remaining: 22.8s
658:    learn: 0.0535419    total: 44s    remaining: 22.8s
659:    learn: 0.0535116    total: 44.1s    remaining: 22.7s
660:    learn: 0.0534520    total: 44.2s    remaining: 22.7s
661:    learn: 0.0534199    total: 44.3s    remaining: 22.6s
662:    learn: 0.0533850    total: 44.3s    remaining: 22.5s
663:    learn: 0.0533775    total: 44.4s    remaining: 22.5s
664:    learn: 0.0533391    total: 44.5s    remaining: 22.4s
665:    learn: 0.0533314    total: 44.5s    remaining: 22.3s
666:    learn: 0.0532823    total: 44.6s    remaining: 22.3s
667:    learn: 0.0532240    total: 44.7s    remaining: 22.2s
668:    learn: 0.0532200    total: 44.7s    remaining: 22.1s
669:    learn: 0.0531951    total: 44.7s    remaining: 22s
670:    learn: 0.0531700    total: 44.8s    remaining: 21.9s
671:    learn: 0.0531532    total: 44.8s    remaining: 21.9s
672:    learn: 0.0531407    total: 44.9s    remaining: 21.8s
673:    learn: 0.0531192    total: 45s    remaining: 21.8s
674:    learn: 0.0530832    total: 45.1s    remaining: 21.7s
675:    learn: 0.0530787    total: 45.1s    remaining: 21.6s
676:    learn: 0.0530344    total: 45.2s    remaining: 21.5s
677:    learn: 0.0529996    total: 45.2s    remaining: 21.5s
678:    learn: 0.0529649    total: 45.3s    remaining: 21.4s
679:    learn: 0.0529292    total: 45.4s    remaining: 21.4s
680:    learn: 0.0529233    total: 45.4s    remaining: 21.3s
681:    learn: 0.0528912    total: 45.5s    remaining: 21.2s
682:    learn: 0.0528811    total: 45.6s    remaining: 21.2s
683:    learn: 0.0528611    total: 45.7s    remaining: 21.1s
684:    learn: 0.0528421    total: 45.8s    remaining: 21.1s
685:    learn: 0.0528292    total: 45.9s    remaining: 21s
686:    learn: 0.0528068    total: 45.9s    remaining: 20.9s
687:    learn: 0.0527948    total: 46s    remaining: 20.9s
688:    learn: 0.0527652    total: 46.1s    remaining: 20.8s
689:    learn: 0.0527163    total: 46.2s    remaining: 20.7s
690:    learn: 0.0526741    total: 46.3s    remaining: 20.7s
691:    learn: 0.0526589    total: 46.3s    remaining: 20.6s
692:    learn: 0.0526302    total: 46.4s    remaining: 20.6s
693:    learn: 0.0526016    total: 46.5s    remaining: 20.5s
694:    learn: 0.0525761    total: 46.6s    remaining: 20.4s
695:    learn: 0.0525395    total: 46.7s    remaining: 20.4s
696:    learn: 0.0525001    total: 46.7s    remaining: 20.3s
697:    learn: 0.0524684    total: 46.8s    remaining: 20.3s
698:    learn: 0.0524328    total: 46.9s    remaining: 20.2s
699:    learn: 0.0524017    total: 47s    remaining: 20.1s
700:    learn: 0.0523965    total: 47s    remaining: 20s
701:    learn: 0.0523931    total: 47.1s    remaining: 20s
702:    learn: 0.0523444    total: 47.1s    remaining: 19.9s
703:    learn: 0.0523082    total: 47.2s    remaining: 19.9s
704:    learn: 0.0523046    total: 47.2s    remaining: 19.8s
705:    learn: 0.0522693    total: 47.3s    remaining: 19.7s
706:    learn: 0.0522304    total: 47.4s    remaining: 19.6s
707:    learn: 0.0521832    total: 47.5s    remaining: 19.6s
708:    learn: 0.0521677    total: 47.5s    remaining: 19.5s
709:    learn: 0.0521393    total: 47.6s    remaining: 19.5s
710:    learn: 0.0521192    total: 47.7s    remaining: 19.4s
711:    learn: 0.0520910    total: 47.8s    remaining: 19.3s
712:    learn: 0.0520517    total: 47.9s    remaining: 19.3s
713:    learn: 0.0520207    total: 47.9s    remaining: 19.2s
714:    learn: 0.0519936    total: 48s    remaining: 19.1s
715:    learn: 0.0519737    total: 48.1s    remaining: 19.1s
716:    learn: 0.0519543    total: 48.2s    remaining: 19s
717:    learn: 0.0519059    total: 48.3s    remaining: 19s
718:    learn: 0.0519015    total: 48.3s    remaining: 18.9s
719:    learn: 0.0518879    total: 48.4s    remaining: 18.8s
720:    learn: 0.0518622    total: 48.5s    remaining: 18.8s
721:    learn: 0.0518434    total: 48.5s    remaining: 18.7s
722:    learn: 0.0518251    total: 48.6s    remaining: 18.6s
723:    learn: 0.0518006    total: 48.7s    remaining: 18.6s
724:    learn: 0.0517861    total: 48.8s    remaining: 18.5s
725:    learn: 0.0517640    total: 48.9s    remaining: 18.4s
726:    learn: 0.0517639    total: 48.9s    remaining: 18.4s
727:    learn: 0.0517346    total: 49s    remaining: 18.3s
728:    learn: 0.0517253    total: 49s    remaining: 18.2s
729:    learn: 0.0516807    total: 49.1s    remaining: 18.1s
730:    learn: 0.0516806    total: 49.1s    remaining: 18.1s
731:    learn: 0.0516739    total: 49.1s    remaining: 18s
732:    learn: 0.0516491    total: 49.2s    remaining: 17.9s
733:    learn: 0.0516491    total: 49.2s    remaining: 17.8s
734:    learn: 0.0515986    total: 49.3s    remaining: 17.8s
735:    learn: 0.0515787    total: 49.3s    remaining: 17.7s
736:    learn: 0.0515602    total: 49.4s    remaining: 17.6s
737:    learn: 0.0515559    total: 49.5s    remaining: 17.6s
738:    learn: 0.0515476    total: 49.5s    remaining: 17.5s
739:    learn: 0.0515234    total: 49.6s    remaining: 17.4s
740:    learn: 0.0514953    total: 49.7s    remaining: 17.4s
741:    learn: 0.0514785    total: 49.8s    remaining: 17.3s
742:    learn: 0.0514769    total: 49.8s    remaining: 17.2s
743:    learn: 0.0514759    total: 49.8s    remaining: 17.1s
744:    learn: 0.0514683    total: 49.8s    remaining: 17s
745:    learn: 0.0514682    total: 49.8s    remaining: 16.9s
746:    learn: 0.0514407    total: 49.9s    remaining: 16.9s
747:    learn: 0.0514155    total: 49.9s    remaining: 16.8s
748:    learn: 0.0513860    total: 50s    remaining: 16.7s
749:    learn: 0.0513621    total: 50s    remaining: 16.7s
750:    learn: 0.0513435    total: 50.1s    remaining: 16.6s
751:    learn: 0.0513006    total: 50.2s    remaining: 16.6s
752:    learn: 0.0512598    total: 50.3s    remaining: 16.5s
753:    learn: 0.0512391    total: 50.4s    remaining: 16.4s
754:    learn: 0.0512084    total: 50.5s    remaining: 16.4s
755:    learn: 0.0512066    total: 50.5s    remaining: 16.3s
756:    learn: 0.0511812    total: 50.5s    remaining: 16.2s
757:    learn: 0.0511776    total: 50.5s    remaining: 16.1s
758:    learn: 0.0511644    total: 50.6s    remaining: 16.1s
759:    learn: 0.0511427    total: 50.7s    remaining: 16s
760:    learn: 0.0511171    total: 50.8s    remaining: 15.9s
761:    learn: 0.0510823    total: 50.8s    remaining: 15.9s
762:    learn: 0.0510748    total: 50.9s    remaining: 15.8s
763:    learn: 0.0510643    total: 50.9s    remaining: 15.7s
764:    learn: 0.0510521    total: 51s    remaining: 15.7s
765:    learn: 0.0510316    total: 51s    remaining: 15.6s
766:    learn: 0.0510176    total: 51.1s    remaining: 15.5s
767:    learn: 0.0510026    total: 51.2s    remaining: 15.5s
768:    learn: 0.0509902    total: 51.3s    remaining: 15.4s
769:    learn: 0.0509799    total: 51.4s    remaining: 15.3s
770:    learn: 0.0509573    total: 51.4s    remaining: 15.3s
771:    learn: 0.0509404    total: 51.5s    remaining: 15.2s
772:    learn: 0.0509252    total: 51.6s    remaining: 15.2s
773:    learn: 0.0509216    total: 51.7s    remaining: 15.1s
774:    learn: 0.0508946    total: 51.8s    remaining: 15s
775:    learn: 0.0508798    total: 51.8s    remaining: 15s
776:    learn: 0.0508618    total: 51.9s    remaining: 14.9s
777:    learn: 0.0508549    total: 52s    remaining: 14.8s
778:    learn: 0.0508241    total: 52s    remaining: 14.8s
779:    learn: 0.0508036    total: 52.1s    remaining: 14.7s
780:    learn: 0.0507911    total: 52.2s    remaining: 14.6s
781:    learn: 0.0507803    total: 52.3s    remaining: 14.6s
782:    learn: 0.0507605    total: 52.4s    remaining: 14.5s
783:    learn: 0.0507604    total: 52.4s    remaining: 14.4s
784:    learn: 0.0507474    total: 52.5s    remaining: 14.4s
785:    learn: 0.0507139    total: 52.5s    remaining: 14.3s
786:    learn: 0.0507050    total: 52.6s    remaining: 14.2s
787:    learn: 0.0506937    total: 52.7s    remaining: 14.2s
788:    learn: 0.0506593    total: 52.8s    remaining: 14.1s
789:    learn: 0.0506356    total: 52.9s    remaining: 14s
790:    learn: 0.0506036    total: 52.9s    remaining: 14s
791:    learn: 0.0505768    total: 53s    remaining: 13.9s
792:    learn: 0.0505455    total: 53.1s    remaining: 13.9s
793:    learn: 0.0505455    total: 53.1s    remaining: 13.8s
794:    learn: 0.0505376    total: 53.2s    remaining: 13.7s
795:    learn: 0.0505344    total: 53.2s    remaining: 13.6s
796:    learn: 0.0505222    total: 53.3s    remaining: 13.6s
797:    learn: 0.0505047    total: 53.4s    remaining: 13.5s
798:    learn: 0.0504874    total: 53.4s    remaining: 13.4s
799:    learn: 0.0504531    total: 53.5s    remaining: 13.4s
800:    learn: 0.0504323    total: 53.6s    remaining: 13.3s
801:    learn: 0.0504025    total: 53.7s    remaining: 13.3s
802:    learn: 0.0503826    total: 53.8s    remaining: 13.2s
803:    learn: 0.0503826    total: 53.8s    remaining: 13.1s
804:    learn: 0.0503747    total: 53.9s    remaining: 13s
805:    learn: 0.0503445    total: 53.9s    remaining: 13s
806:    learn: 0.0503168    total: 54s    remaining: 12.9s
807:    learn: 0.0502967    total: 54.1s    remaining: 12.9s
808:    learn: 0.0502850    total: 54.2s    remaining: 12.8s
809:    learn: 0.0502785    total: 54.2s    remaining: 12.7s
810:    learn: 0.0502558    total: 54.3s    remaining: 12.7s
811:    learn: 0.0502286    total: 54.4s    remaining: 12.6s
812:    learn: 0.0502024    total: 54.5s    remaining: 12.5s
813:    learn: 0.0501775    total: 54.5s    remaining: 12.5s
814:    learn: 0.0501774    total: 54.5s    remaining: 12.4s
815:    learn: 0.0501544    total: 54.6s    remaining: 12.3s
816:    learn: 0.0501311    total: 54.7s    remaining: 12.2s
817:    learn: 0.0501118    total: 54.8s    remaining: 12.2s
818:    learn: 0.0500872    total: 54.8s    remaining: 12.1s
819:    learn: 0.0500770    total: 54.9s    remaining: 12.1s
820:    learn: 0.0500710    total: 55s    remaining: 12s
821:    learn: 0.0500498    total: 55.1s    remaining: 11.9s
822:    learn: 0.0500335    total: 55.2s    remaining: 11.9s
823:    learn: 0.0500208    total: 55.2s    remaining: 11.8s
824:    learn: 0.0500152    total: 55.3s    remaining: 11.7s
825:    learn: 0.0500075    total: 55.4s    remaining: 11.7s
826:    learn: 0.0500043    total: 55.5s    remaining: 11.6s
827:    learn: 0.0499862    total: 55.6s    remaining: 11.5s
828:    learn: 0.0499706    total: 55.6s    remaining: 11.5s
829:    learn: 0.0499416    total: 55.7s    remaining: 11.4s
830:    learn: 0.0499375    total: 55.8s    remaining: 11.3s
831:    learn: 0.0499345    total: 55.8s    remaining: 11.3s
832:    learn: 0.0499043    total: 55.9s    remaining: 11.2s
833:    learn: 0.0498772    total: 56s    remaining: 11.1s
834:    learn: 0.0498745    total: 56.1s    remaining: 11.1s
835:    learn: 0.0498592    total: 56.1s    remaining: 11s
836:    learn: 0.0498441    total: 56.2s    remaining: 10.9s
837:    learn: 0.0498164    total: 56.3s    remaining: 10.9s
838:    learn: 0.0498028    total: 56.4s    remaining: 10.8s
839:    learn: 0.0497924    total: 56.5s    remaining: 10.8s
840:    learn: 0.0497684    total: 56.6s    remaining: 10.7s
841:    learn: 0.0497584    total: 56.6s    remaining: 10.6s
842:    learn: 0.0497366    total: 56.7s    remaining: 10.6s
843:    learn: 0.0497284    total: 56.8s    remaining: 10.5s
844:    learn: 0.0497170    total: 56.9s    remaining: 10.4s
845:    learn: 0.0497163    total: 56.9s    remaining: 10.4s
846:    learn: 0.0496952    total: 56.9s    remaining: 10.3s
847:    learn: 0.0496952    total: 56.9s    remaining: 10.2s
848:    learn: 0.0496852    total: 57s    remaining: 10.1s
849:    learn: 0.0496637    total: 57.1s    remaining: 10.1s
850:    learn: 0.0496536    total: 57.2s    remaining: 10s
851:    learn: 0.0496440    total: 57.2s    remaining: 9.94s
852:    learn: 0.0496241    total: 57.3s    remaining: 9.88s
853:    learn: 0.0496156    total: 57.4s    remaining: 9.81s
854:    learn: 0.0496015    total: 57.5s    remaining: 9.75s
855:    learn: 0.0495849    total: 57.6s    remaining: 9.68s
856:    learn: 0.0495672    total: 57.6s    remaining: 9.62s
857:    learn: 0.0495489    total: 57.7s    remaining: 9.55s
858:    learn: 0.0495285    total: 57.8s    remaining: 9.49s
859:    learn: 0.0495165    total: 57.9s    remaining: 9.42s
860:    learn: 0.0495049    total: 58s    remaining: 9.36s
861:    learn: 0.0494951    total: 58s    remaining: 9.29s
862:    learn: 0.0494693    total: 58.1s    remaining: 9.23s
863:    learn: 0.0494637    total: 58.2s    remaining: 9.16s
864:    learn: 0.0494637    total: 58.2s    remaining: 9.08s
865:    learn: 0.0494380    total: 58.3s    remaining: 9.02s
866:    learn: 0.0494372    total: 58.4s    remaining: 8.95s
867:    learn: 0.0494135    total: 58.4s    remaining: 8.88s
868:    learn: 0.0493852    total: 58.5s    remaining: 8.81s
869:    learn: 0.0493704    total: 58.5s    remaining: 8.75s
870:    learn: 0.0493704    total: 58.5s    remaining: 8.67s
871:    learn: 0.0493493    total: 58.6s    remaining: 8.6s
872:    learn: 0.0493493    total: 58.6s    remaining: 8.53s
873:    learn: 0.0493342    total: 58.7s    remaining: 8.46s
874:    learn: 0.0493237    total: 58.7s    remaining: 8.39s
875:    learn: 0.0493067    total: 58.8s    remaining: 8.32s
876:    learn: 0.0492978    total: 58.8s    remaining: 8.25s
877:    learn: 0.0492735    total: 58.9s    remaining: 8.18s
878:    learn: 0.0492658    total: 58.9s    remaining: 8.11s
879:    learn: 0.0492395    total: 59s    remaining: 8.04s
880:    learn: 0.0492164    total: 59s    remaining: 7.97s
881:    learn: 0.0492039    total: 59.1s    remaining: 7.91s
882:    learn: 0.0491852    total: 59.2s    remaining: 7.84s
883:    learn: 0.0491852    total: 59.2s    remaining: 7.76s
884:    learn: 0.0491851    total: 59.2s    remaining: 7.69s
885:    learn: 0.0491670    total: 59.2s    remaining: 7.62s
886:    learn: 0.0491507    total: 59.3s    remaining: 7.55s
887:    learn: 0.0491381    total: 59.4s    remaining: 7.49s
888:    learn: 0.0491154    total: 59.4s    remaining: 7.42s
889:    learn: 0.0491012    total: 59.5s    remaining: 7.35s
890:    learn: 0.0490930    total: 59.5s    remaining: 7.28s
891:    learn: 0.0490792    total: 59.6s    remaining: 7.21s
892:    learn: 0.0490792    total: 59.6s    remaining: 7.14s
893:    learn: 0.0490739    total: 59.6s    remaining: 7.07s
894:    learn: 0.0490451    total: 59.7s    remaining: 7s
895:    learn: 0.0490220    total: 59.7s    remaining: 6.93s
896:    learn: 0.0490141    total: 59.7s    remaining: 6.86s
897:    learn: 0.0489974    total: 59.8s    remaining: 6.79s
898:    learn: 0.0489858    total: 59.8s    remaining: 6.72s
899:    learn: 0.0489852    total: 59.9s    remaining: 6.65s
900:    learn: 0.0489770    total: 59.9s    remaining: 6.58s
901:    learn: 0.0489587    total: 60s    remaining: 6.52s
902:    learn: 0.0489458    total: 1m    remaining: 6.45s
903:    learn: 0.0489361    total: 1m    remaining: 6.38s
904:    learn: 0.0489139    total: 1m    remaining: 6.32s
905:    learn: 0.0489041    total: 1m    remaining: 6.25s
906:    learn: 0.0489041    total: 1m    remaining: 6.18s
907:    learn: 0.0488924    total: 1m    remaining: 6.11s
908:    learn: 0.0488831    total: 1m    remaining: 6.04s
909:    learn: 0.0488636    total: 1m    remaining: 5.98s
910:    learn: 0.0488445    total: 1m    remaining: 5.91s
911:    learn: 0.0488279    total: 1m    remaining: 5.84s
912:    learn: 0.0488278    total: 1m    remaining: 5.76s
913:    learn: 0.0488112    total: 1m    remaining: 5.7s
914:    learn: 0.0487917    total: 1m    remaining: 5.63s
915:    learn: 0.0487757    total: 1m    remaining: 5.57s
916:    learn: 0.0487548    total: 1m    remaining: 5.5s
917:    learn: 0.0487503    total: 1m    remaining: 5.43s
918:    learn: 0.0487350    total: 1m    remaining: 5.36s
919:    learn: 0.0487270    total: 1m    remaining: 5.29s
920:    learn: 0.0487001    total: 1m    remaining: 5.23s
921:    learn: 0.0487001    total: 1m    remaining: 5.16s
922:    learn: 0.0486902    total: 1m    remaining: 5.08s
923:    learn: 0.0486902    total: 1m    remaining: 5.01s
924:    learn: 0.0486807    total: 1m 1s    remaining: 4.95s
925:    learn: 0.0486789    total: 1m 1s    remaining: 4.88s
926:    learn: 0.0486717    total: 1m 1s    remaining: 4.81s
927:    learn: 0.0486567    total: 1m 1s    remaining: 4.75s
928:    learn: 0.0486567    total: 1m 1s    remaining: 4.67s
929:    learn: 0.0486428    total: 1m 1s    remaining: 4.61s
930:    learn: 0.0486347    total: 1m 1s    remaining: 4.54s
931:    learn: 0.0486260    total: 1m 1s    remaining: 4.47s
932:    learn: 0.0486114    total: 1m 1s    remaining: 4.41s
933:    learn: 0.0486028    total: 1m 1s    remaining: 4.34s
934:    learn: 0.0485810    total: 1m 1s    remaining: 4.27s
935:    learn: 0.0485567    total: 1m 1s    remaining: 4.21s
936:    learn: 0.0485481    total: 1m 1s    remaining: 4.14s
937:    learn: 0.0485389    total: 1m 1s    remaining: 4.08s
938:    learn: 0.0485326    total: 1m 1s    remaining: 4.01s
939:    learn: 0.0485190    total: 1m 1s    remaining: 3.94s
940:    learn: 0.0485043    total: 1m 1s    remaining: 3.87s
941:    learn: 0.0484906    total: 1m 1s    remaining: 3.81s
942:    learn: 0.0484815    total: 1m 1s    remaining: 3.74s
943:    learn: 0.0484684    total: 1m 1s    remaining: 3.67s
944:    learn: 0.0484594    total: 1m 1s    remaining: 3.61s
945:    learn: 0.0484474    total: 1m 2s    remaining: 3.54s
946:    learn: 0.0484398    total: 1m 2s    remaining: 3.47s
947:    learn: 0.0484324    total: 1m 2s    remaining: 3.41s
948:    learn: 0.0484188    total: 1m 2s    remaining: 3.34s
949:    learn: 0.0484188    total: 1m 2s    remaining: 3.27s
950:    learn: 0.0484104    total: 1m 2s    remaining: 3.21s
951:    learn: 0.0483957    total: 1m 2s    remaining: 3.14s
952:    learn: 0.0483880    total: 1m 2s    remaining: 3.08s
953:    learn: 0.0483805    total: 1m 2s    remaining: 3.01s
954:    learn: 0.0483613    total: 1m 2s    remaining: 2.95s
955:    learn: 0.0483613    total: 1m 2s    remaining: 2.88s
956:    learn: 0.0483527    total: 1m 2s    remaining: 2.81s
957:    learn: 0.0483453    total: 1m 2s    remaining: 2.75s
958:    learn: 0.0483278    total: 1m 2s    remaining: 2.68s
959:    learn: 0.0483195    total: 1m 2s    remaining: 2.62s
960:    learn: 0.0482972    total: 1m 2s    remaining: 2.55s
961:    learn: 0.0482971    total: 1m 2s    remaining: 2.49s
962:    learn: 0.0482848    total: 1m 3s    remaining: 2.42s
963:    learn: 0.0482810    total: 1m 3s    remaining: 2.36s
964:    learn: 0.0482642    total: 1m 3s    remaining: 2.29s
965:    learn: 0.0482608    total: 1m 3s    remaining: 2.22s
966:    learn: 0.0482587    total: 1m 3s    remaining: 2.16s
967:    learn: 0.0482500    total: 1m 3s    remaining: 2.09s
968:    learn: 0.0482373    total: 1m 3s    remaining: 2.03s
969:    learn: 0.0482161    total: 1m 3s    remaining: 1.96s
970:    learn: 0.0482095    total: 1m 3s    remaining: 1.9s
971:    learn: 0.0481939    total: 1m 3s    remaining: 1.83s
972:    learn: 0.0481842    total: 1m 3s    remaining: 1.76s
973:    learn: 0.0481776    total: 1m 3s    remaining: 1.7s
974:    learn: 0.0481723    total: 1m 3s    remaining: 1.64s
975:    learn: 0.0481654    total: 1m 3s    remaining: 1.57s
976:    learn: 0.0481611    total: 1m 3s    remaining: 1.5s
977:    learn: 0.0481550    total: 1m 4s    remaining: 1.44s
978:    learn: 0.0481407    total: 1m 4s    remaining: 1.37s
979:    learn: 0.0481281    total: 1m 4s    remaining: 1.31s
980:    learn: 0.0481096    total: 1m 4s    remaining: 1.24s
981:    learn: 0.0481066    total: 1m 4s    remaining: 1.18s
982:    learn: 0.0480850    total: 1m 4s    remaining: 1.11s
983:    learn: 0.0480813    total: 1m 4s    remaining: 1.05s
984:    learn: 0.0480682    total: 1m 4s    remaining: 982ms
985:    learn: 0.0480566    total: 1m 4s    remaining: 917ms
986:    learn: 0.0480384    total: 1m 4s    remaining: 852ms
987:    learn: 0.0480320    total: 1m 4s    remaining: 786ms
988:    learn: 0.0480199    total: 1m 4s    remaining: 720ms
989:    learn: 0.0480085    total: 1m 4s    remaining: 655ms
990:    learn: 0.0479959    total: 1m 4s    remaining: 590ms
991:    learn: 0.0479911    total: 1m 5s    remaining: 524ms
992:    learn: 0.0479851    total: 1m 5s    remaining: 459ms
993:    learn: 0.0479672    total: 1m 5s    remaining: 393ms
994:    learn: 0.0479602    total: 1m 5s    remaining: 328ms
995:    learn: 0.0479478    total: 1m 5s    remaining: 262ms
996:    learn: 0.0479352    total: 1m 5s    remaining: 197ms
997:    learn: 0.0479208    total: 1m 5s    remaining: 131ms
998:    learn: 0.0478981    total: 1m 5s    remaining: 65.6ms
999:    learn: 0.0478903    total: 1m 5s    remaining: 0us

Accuracy:  0.9888059701492538
CPU times: user 1min 58s, sys: 1.57 s, total: 2min
Wall time: 1min 7s

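Of the hyperparameters found by grid search, I set two of them, 'depth': 13 and 'learning_rate': 0.1, and the accuracy came out at about 98.9%.
Training is relatively slow compared with the other boosting models, but the accuracy is the highest of all of them.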

Conclusion

For RandomForest, training took less time after hyperparameter tuning, at 65.6 ms. To understand why, let's look at the default parameters.

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
criterion='gini', max_depth=5, max_features='auto',
max_leaf_nodes=None, max_samples=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100,
n_jobs=None, oob_score=False, random_state=None,
verbose=0, warm_start=False)

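Here, n_estimators=25, the value we used for hyperparameter tuning, is the number of trees that make up the random forest. The default is 100 trees, but with 25 the accuracy went up and the training time went down. This suggests that, because the dataset is small, a forest of 25 trees is both faster and a better fit.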
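
To make the effect of `n_estimators` on training time concrete, here is a minimal sketch that fits a forest with 25 trees and one with 100 trees and times each fit. It assumes a synthetic dataset from `make_classification` as a stand-in (not the data used in this post), and the helper name `time_forest` is hypothetical, introduced only for this illustration. Timings depend on the machine, so no expected numbers are shown.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a small tabular dataset (assumption, not the post's data).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)


def time_forest(n_estimators):
    """Fit a random forest and return (fit time in seconds, number of trees built)."""
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
    start = time.perf_counter()
    model.fit(X, y)
    elapsed = time.perf_counter() - start
    return elapsed, len(model.estimators_)


results = {}
for n in (25, 100):
    elapsed, n_trees = time_forest(n)
    results[n] = (elapsed, n_trees)
    print(f"n_estimators={n}: {n_trees} trees, fit time {elapsed:.3f}s")
```

Because each tree is trained separately, the fit time should grow roughly in proportion to the number of trees, which is consistent with the faster run observed at 25 trees.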

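I also paid particular attention to CatBoost's performance on this data.
What stands out about CatBoost is that its accuracy is strong out of the box compared with the other models.
This comes from a characteristic of CatBoost: it performs especially well on categorical data, more so than on numerical data. I think that is why, after hyperparameter tuning, the accuracy reached 0.988, a value close to 1.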

Posted 2021-04-13, updated 2021-04-13, 14 minutes read (about 2100 words)

Titanic data

Comparing decision tree, random forest, XGBoost, LightGBM, and CatBoost

Preprocessing

!pip install catboost

Collecting catboost
  Downloading https://files.pythonhosted.org/packages/47/80/8e9c57ec32dfed6ba2922bc5c96462cbf8596ce1a6f5de532ad1e43e53fe/catboost-0.25.1-cp37-none-manylinux1_x86_64.whl (67.3MB)
Requirement already satisfied: plotly in /usr/local/lib/python3.7/dist-packages (from catboost) (4.4.1)
Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from catboost) (1.4.1)
Requirement already satisfied: numpy>=1.16.0 in /usr/local/lib/python3.7/dist-packages (from catboost) (1.19.5)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from catboost) (3.2.2)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from catboost) (1.15.0)
Requirement already satisfied: pandas>=0.24.0 in /usr/local/lib/python3.7/dist-packages (from catboost) (1.1.5)
Requirement already satisfied: graphviz in /usr/local/lib/python3.7/dist-packages (from catboost) (0.10.1)
Requirement already satisfied: retrying>=1.3.3 in /usr/local/lib/python3.7/dist-packages (from plotly->catboost) (1.3.3)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->catboost) (1.3.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->catboost) (2.4.7)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->catboost) (0.10.0)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->catboost) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.24.0->catboost) (2018.9)
Installing collected packages: catboost
Successfully installed catboost-0.25.1

import os
import random

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import xgboost as xgb
import lightgbm as lgbm
import catboost as cb
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier

from sklearn import metrics    
from sklearn.model_selection import RandomizedSearchCV

def set_seed(seed_value):
    random.seed(seed_value)
    np.random.seed(seed_value)
    os.environ["PYTHONHASHSEED"] = str(seed_value)
    

SEED = 42
set_seed(SEED)

train_df = pd.read_csv('/content/sample_data/titanic_train.csv')
test_df = pd.read_csv('/content/sample_data/titanic_test.csv')
print(f"Train shape: {train_df.shape}")
train_df.sample(3)

Train shape: (891, 12)

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
709	710	1	3	Moubarek, Master. Halim Gonios ("William George")	male	NaN	1	1	2661	15.2458	NaN	C
439	440	0	2	Kvillner, Mr. Johan Henrik Johannesson	male	31.0	0	0	C.A. 18723	10.5000	NaN	S
840	841	0	3	Alhomaki, Mr. Ilmari Rudolf	male	20.0	0	0	SOTON/O2 3101287	7.9250	NaN	S

print(f"Test shape: {test_df.shape}")
test_df.sample(3)

Test shape: (418, 11)

	PassengerId	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
20	912	1	Rothschild, Mr. Martin	male	55.00	1	0	PC 17603	59.40	NaN	C
338	1230	2	Denbury, Mr. Herbert	male	25.00	0	0	C.A. 31029	31.50	NaN	S
250	1142	2	West, Miss. Barbara J	female	0.92	1	2	C.A. 34651	27.75	NaN	S

full_df = pd.concat(
    [
        train_df.drop(["PassengerId", "Survived"], axis=1), 
        test_df.drop(["PassengerId"], axis=1),
    ]
)
y_train = train_df["Survived"].values

full_df.isna().sum()

Pclass         0
Name           0
Sex            0
Age          263
SibSp          0
Parch          0
Ticket         0
Fare           1
Cabin       1014
Embarked       2
dtype: int64

full_df = full_df.drop(["Age", "Cabin"], axis=1)

plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.hist(full_df["Fare"], bins=20)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.title("Fare distribution", fontsize=16)

plt.subplot(1, 2, 2)
embarked_info = full_df["Embarked"].value_counts()
plt.bar(embarked_info.index, embarked_info.values)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.title("Embarked distribution", fontsize=16);

[Figure: "Fare distribution" histogram (left) and "Embarked distribution" bar chart (right)]

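# Fill missing values: the value "S" for Embarked, the mean for Fare.
# Assigning the result back avoids relying on in-place modification of a column selection.
full_df["Embarked"] = full_df["Embarked"].fillna("S")
full_df["Fare"] = full_df["Fare"].fillna(full_df["Fare"].mean())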

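# Extract the title (the word followed by a period) from each name.
# A raw string keeps the regex escape "\." from being treated as a string escape.
full_df["Title"] = full_df["Name"].str.extract(r" ([A-Za-z]+)\.")
# Fold rare titles into the main groups Miss, Mrs, and Mr (Master is kept as-is).
full_df["Title"] = full_df["Title"].replace(["Ms", "Mlle"], "Miss")
full_df["Title"] = full_df["Title"].replace(["Mme", "Countess", "Lady", "Dona"], "Mrs")
full_df["Title"] = full_df["Title"].replace(["Dr", "Major", "Col", "Sir", "Rev", "Jonkheer", "Capt", "Don"], "Mr")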
full_df = full_df.drop(["Name"], axis=1)
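
As a small illustration of the title extraction above, the following sketch applies the same regular expression to four passenger names taken from the sample rows printed earlier. It is only a check of what the pattern captures, not part of the original pipeline, and the variable names `names` and `titles` are introduced here for the example.

```python
import pandas as pd

# Names copied from the train/test sample rows shown earlier in the post.
names = pd.Series(
    [
        "Kvillner, Mr. Johan Henrik Johannesson",
        "West, Miss. Barbara J",
        'Moubarek, Master. Halim Gonios ("William George")',
        "Rothschild, Mr. Martin",
    ]
)

# The pattern captures the first run of letters that follows a space
# and is immediately followed by a period, which is the title.
titles = names.str.extract(r" ([A-Za-z]+)\.", expand=False)
print(titles.tolist())  # ['Mr', 'Miss', 'Master', 'Mr']
```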

full_df["Sex"] = full_df["Sex"].map({"male": 1, "female": 0}).astype(int)    
full_df["Embarked"] = full_df["Embarked"].map({"S": 1, "C": 2, "Q": 3}).astype(int)    
full_df['Title'] = full_df['Title'].map({"Mr": 0, "Miss": 1, "Mrs": 2, "Master": 3}).astype(int)

full_df["TicketNumber"] = full_df["Ticket"].str.split()
full_df["TicketNumber"] = full_df["TicketNumber"].str[-1]
full_df["TicketNumber"] = LabelEncoder().fit_transform(full_df["TicketNumber"])
full_df = full_df.drop(["Ticket"], axis=1)

full_df["FamilySize"] = full_df["SibSp"] + full_df["Parch"] + 1
full_df["IsAlone"] = full_df["FamilySize"].apply(lambda x: 1 if x == 1 else 0)

full_df.head()

	Pclass	Sex	SibSp	Fare	Embarked	Title	TicketNumber	FamilySize	IsAlone
0	3	1	1	7.2500	1	0	209	2	0
1	1	0	1	71.2833	2	2	166	2	0
2	3	0	0	7.9250	1	1	466	1	1
3	1	0	1	53.1000	1	2	67	2	0
4	3	1	0	8.0500	1	0	832	1	1

X_train = full_df[:y_train.shape[0]]
X_test = full_df[y_train.shape[0]:]

print(f"Train X shape: {X_train.shape}")
print(f"Train y shape: {y_train.shape}")
print(f"Test X shape: {X_test.shape}")

Train X shape: (891, 10)
Train y shape: (891,)
Test X shape: (418, 10)

one_hot_cols = ["Embarked", "Title"]
for col in one_hot_cols:
    full_df = pd.concat(
        [full_df, pd.get_dummies(full_df[col], prefix=col)], 
        axis=1, 
        join="inner",
    )
full_df = full_df.drop(one_hot_cols, axis=1)
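
The loop above builds the dummy columns one feature at a time and then drops the originals. A more compact way to get the same kind of columns is to pass `columns=` to `pd.get_dummies`. The sketch below shows this on a tiny made-up frame (the three rows are illustrative values, not rows from the Titanic data). Note that in this form the dummy columns replace the originals in one step.

```python
import pandas as pd

# Tiny illustrative frame; the values are made up for the example.
df = pd.DataFrame(
    {
        "Fare": [7.25, 71.28, 7.92],
        "Embarked": [1, 2, 3],
        "Title": [0, 2, 1],
    }
)

# One call encodes both columns and drops the originals.
encoded = pd.get_dummies(df, columns=["Embarked", "Title"])
print(encoded.columns.tolist())
# ['Fare', 'Embarked_1', 'Embarked_2', 'Embarked_3', 'Title_0', 'Title_1', 'Title_2']
```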

scaler = StandardScaler()
full_df.loc[:] = scaler.fit_transform(full_df)

print(full_df)

	Pclass	Sex	SibSp	Parch	Fare	TicketNumber	FamilySize	IsAlone	Embarked_1	Embarked_2	Embarked_3	Title_0	Title_1	Title_2	Title_3
0	0.841916	0.743497	0.481288	-0.445000	-0.503595	-0.846179	0.073352	-1.233758	0.655011	-0.50977	-0.32204	0.819619	-0.502625	-0.425920	-0.221084
1	-1.546098	-1.344995	0.481288	-0.445000	0.734503	-1.004578	0.073352	-1.233758	-1.526692	1.96167	-0.32204	-1.220079	-0.502625	2.347858	-0.221084
2	0.841916	-1.344995	-0.479087	-0.445000	-0.490544	0.100529	-0.558346	0.810532	0.655011	-0.50977	-0.32204	-1.220079	1.989556	-0.425920	-0.221084
3	-1.546098	-1.344995	0.481288	-0.445000	0.382925	-1.369263	0.073352	-1.233758	0.655011	-0.50977	-0.32204	-1.220079	-0.502625	2.347858	-0.221084
4	0.841916	0.743497	-0.479087	-0.445000	-0.488127	1.448759	-0.558346	0.810532	0.655011	-0.50977	-0.32204	0.819619	-0.502625	-0.425920	-0.221084
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
413	0.841916	0.743497	-0.479087	-0.445000	-0.488127	0.347336	-0.558346	0.810532	0.655011	-0.50977	-0.32204	0.819619	-0.502625	-0.425920	-0.221084
414	-1.546098	-1.344995	-0.479087	-0.445000	1.461829	-0.938271	-0.558346	0.810532	-1.526692	1.96167	-0.32204	-1.220079	-0.502625	2.347858	-0.221084
415	0.841916	0.743497	-0.479087	-0.445000	-0.503595	0.026855	-0.558346	0.810532	0.655011	-0.50977	-0.32204	0.819619	-0.502625	-0.425920	-0.221084
416	0.841916	0.743497	-0.479087	-0.445000	-0.488127	1.183533	-0.558346	0.810532	0.655011	-0.50977	-0.32204	0.819619	-0.502625	-0.425920	-0.221084
417	0.841916	0.743497	0.481288	0.710763	-0.211473	-0.253105	0.705051	-1.233758	-1.526692	1.96167	-0.32204	-1.220079	-0.502625	-0.425920	4.523164

1309 rows × 15 columns

X_train_norm = full_df[:y_train.shape[0]]
X_test_norm = full_df[y_train.shape[0]:]

print(f"Train norm X shape: {X_train_norm.shape}")
print(f"Train y shape: {y_train.shape}")
print(f"Test norm X shape: {X_test_norm.shape}")

Train norm X shape: (891, 15)
Train y shape: (891,)
Test norm X shape: (418, 15)

categorical_columns = ['Sex', 'Embarked', 'Title', 'TicketNumber', 'IsAlone']

cross_valid_scores = {}

X1_train, X1_test, y1_train, y1_test = train_test_split(X_train, y_train, test_size=0.3)

Building a decision tree

%%time
parameters = {
    "max_depth": [3, 5, 7, 9, 11, 13],
}

model_desicion_tree = DecisionTreeClassifier(
    random_state=SEED,
    class_weight='balanced',
)

model_desicion_tree = GridSearchCV(
    model_desicion_tree, 
    parameters, 
    cv=5,
    scoring='accuracy',
)

model_desicion_tree.fit(X_train, y_train)

print('-----')
print(f'Best parameters {model_desicion_tree.best_params_}')
print(
    f'Mean cross-validated accuracy score of the best_estimator: ' + \
    f'{model_desicion_tree.best_score_:.3f}'
)
cross_valid_scores['desicion_tree'] = model_desicion_tree.best_score_
print('-----')

-----
Best parameters {'max_depth': 11}
Mean cross-validated accuracy score of the best_estimator: 0.817
-----
CPU times: user 180 ms, sys: 2.04 ms, total: 182 ms
Wall time: 202 ms

Random forest

Grid search

%%time
parameters = {
    "n_estimators": [5, 10, 15, 20, 25], 
    "max_depth": [3, 5, 7, 9, 11, 13],
}

model_random_forest = RandomForestClassifier(
    random_state=SEED,
    class_weight='balanced',
)

model_random_forest = GridSearchCV(
    model_random_forest, 
    parameters, 
    cv=5,
    scoring='accuracy',
)

model_random_forest.fit(X_train, y_train)

print('-----')
print(f'Best parameters {model_random_forest.best_params_}')
print(
    f'Mean cross-validated accuracy score of the best_estimator: '+ \
    f'{model_random_forest.best_score_:.3f}'
)
cross_valid_scores['random_forest'] = model_random_forest.best_score_
print('-----')

-----
Best parameters {'max_depth': 11, 'n_estimators': 25}
Mean cross-validated accuracy score of the best_estimator: 0.844
-----
CPU times: user 4.84 s, sys: 43 ms, total: 4.89 s
Wall time: 4.9 s

Random search

%%time

parameters = {
    "n_estimators": [5, 10, 15, 20, 25], 
    "max_depth": [3, 5, 7, 9, 11, 13],
}


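# model2_random_forest is not defined in the cells shown here; it is assumed
# to mirror the base estimator used for the grid search above.
model2_random_forest = RandomForestClassifier(
    random_state=SEED,
    class_weight='balanced',
)
model2_random_forest_rs = RandomizedSearchCV(
    model2_random_forest,
    parameters,
    cv=5,
    n_iter=50,
    random_state=0,
    scoring="accuracy",
)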
model2_random_forest_rs.fit(X_train, y_train)

print('-----')
print(f'Best parameters {model2_random_forest_rs.best_params_}')
print(
    f'Mean cross-validated accuracy score of the best_estimator: '+ \
    f'{model2_random_forest_rs.best_score_:.3f}'
)
cross_valid_scores['random_forest'] = model2_random_forest_rs.best_score_
print('-----')

/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_search.py:281: UserWarning: The total space of parameters 30 is smaller than n_iter=50. Running 30 iterations. For exhaustive searches, use GridSearchCV.
  % (grid_size, self.n_iter, grid_size), UserWarning)


-----
Best parameters {'n_estimators': 25, 'max_depth': 11}
Mean cross-validated accuracy score of the best_estimator: 0.844
-----
CPU times: user 4.68 s, sys: 27.1 ms, total: 4.71 s
Wall time: 4.73 s

Random forest without parameter tuning


model_rf1=RandomForestClassifier(max_depth=5)
model_rf1.fit(X_train,y_train)

y_pred_rf1=model_rf1.predict(X1_test)

print('\nAccuracy :', metrics.accuracy_score(y1_test, y_pred_rf1))

Accuracy : 1.0

model_rf2=RandomForestClassifier(n_estimators= 25, max_depth= 11)
model_rf2.fit(X_train,y_train)

y_pred_rf2=model_rf2.predict(X1_test)

print('\nAccuracy :', metrics.accuracy_score(y1_test, y_pred_rf2))

Accuracy : 0.9589552238805971
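In the cells above the models are fit on X_train and then scored on X1_test, which train_test_split carved out of X_train, so the scored rows were already seen during fitting. A minimal sketch of a hold-out evaluation follows. The synthetic data from make_classification and the names X_fit, X_hold, seen_acc, and hold_acc are assumptions for illustration, not the notebook's Titanic data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Titanic features (assumption: not the original data).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_fit, X_hold, y_fit, y_hold = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=25, max_depth=11, random_state=0)
model.fit(X_fit, y_fit)  # fit on the training part only

seen_acc = accuracy_score(y_fit, model.predict(X_fit))    # rows the model was fit on
hold_acc = accuracy_score(y_hold, model.predict(X_hold))  # rows the model never saw
print(f"accuracy on fitted rows:   {seen_acc:.3f}")
print(f"accuracy on held-out rows: {hold_acc:.3f}")
```

The second number is the one that says something about unseen passengers; the first mostly reflects how well the forest memorized its training rows.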

XGBOOST

gridSearch

%%time
parameters = {
    'max_depth': [3, 5, 7, 9], 
    'n_estimators': [5, 10, 15, 20, 25, 50, 100],
    'learning_rate': [0.01, 0.05, 0.1]
}

model_xgb = xgb.XGBClassifier(
    random_state=SEED,
)

model_xgb = GridSearchCV(
    model_xgb, 
    parameters, 
    cv=5,
    scoring='accuracy',
)

model_xgb.fit(X_train, y_train)

print('-----')
print(f'Best parameters {model_xgb.best_params_}')
print(
    f'Mean cross-validated accuracy score of the best_estimator: ' + 
    f'{model_xgb.best_score_:.3f}'
)
cross_valid_scores['xgboost'] = model_xgb.best_score_
print('-----')

-----
Best parameters {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 100}
Mean cross-validated accuracy score of the best_estimator: 0.846
-----
CPU times: user 13.7 s, sys: 177 ms, total: 13.9 s
Wall time: 14 s

GridSearch was run to tune the XGBoost hyperparameters.
time : 14.3 s
Best parameters {'n_estimators': 100, 'max_depth': 7, 'learning_rate': 0.1}

Random search

%%time
params = {
    'max_depth': [3, 5, 7, 9], 
    'n_estimators': [5, 10, 15, 20, 25, 50, 100],
    'learning_rate': [0.01, 0.05, 0.1]
}
model_xgb_random = xgb.XGBClassifier(
    random_state=SEED,
)
model_xgb_random = RandomizedSearchCV(
    model_xgb_random, params, cv=5, n_iter=50, random_state=0, scoring="accuracy"
)

model_xgb_random.fit(X_train, y_train)

print('-----')
print(f'Best parameters {model_xgb_random.best_params_}')
print(
    f'Mean cross-validated accuracy score of the best_estimator: ' + 
    f'{model_xgb_random.best_score_:.3f}'
)
cross_valid_scores['xgboost'] = model_xgb_random.best_score_
print('-----')

-----
Best parameters {'n_estimators': 100, 'max_depth': 7, 'learning_rate': 0.1}
Mean cross-validated accuracy score of the best_estimator: 0.846
-----
CPU times: user 9.22 s, sys: 119 ms, total: 9.34 s
Wall time: 9.34 s

To compare the two searches on XGBoost, RandomSearch was run in the same setup.
time : 9.46 s
Best parameters {'n_estimators': 100, 'max_depth': 7, 'learning_rate': 0.1}

XGBoost without parameter tuning


model_1=xgb.XGBClassifier()
model_1.fit(X_train,y_train)
pred_y1=model_1.predict(X1_test)

print('\nAccuracy :', metrics.accuracy_score(y1_test, pred_y1))

Accuracy : 0.8843283582089553

Applying the tuned hyperparameters

model_2=xgb.XGBClassifier(learning_rate= 0.1, max_depth= 7, n_estimators=100)
model_2.fit(X_train,y_train)

pred_y2=model_2.predict(X1_test)

print('\nAccuracy :', metrics.accuracy_score(y1_test, pred_y2))

Accuracy : 0.9514925373134329

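The random search ran faster than the grid search (9.34 s versus 14 s wall time).
The accuracy is also clearly higher after tuning the hyperparameters (0.951 versus 0.884). Note, however, that these models are fit on all of X_train and scored on X1_test, which was split off from X_train, so the accuracies are measured on rows seen during training and overstate how well the models generalize.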
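The speed difference follows from how many parameter settings each search tries. The sketch below counts them for the grids used above; candidates_evaluated is a hypothetical helper written for this illustration, not a scikit-learn function.

```python
from itertools import product

xgb_grid = {
    "max_depth": [3, 5, 7, 9],
    "n_estimators": [5, 10, 15, 20, 25, 50, 100],
    "learning_rate": [0.01, 0.05, 0.1],
}
rf_grid = {
    "n_estimators": [5, 10, 15, 20, 25],
    "max_depth": [3, 5, 7, 9, 11, 13],
}

def candidates_evaluated(grid, n_iter=None):
    """Number of parameter settings a search tries (hypothetical helper)."""
    total = len(list(product(*grid.values())))
    # Grid search tries every combination; a random search over lists of
    # discrete values cannot try more settings than the grid contains.
    return total if n_iter is None else min(n_iter, total)

xgb_grid_count = candidates_evaluated(xgb_grid)        # grid search
xgb_random_count = candidates_evaluated(xgb_grid, 50)  # random search, n_iter=50
rf_random_count = candidates_evaluated(rf_grid, 50)    # random search, n_iter=50

print(xgb_grid_count, xgb_random_count, rf_random_count)  # → 84 50 30
```

With cv=5 each setting costs five fits, so the XGBoost grid search runs 420 fits against 250 for the random search. For the random forest the grid has only 30 settings, which is why the warning in the random search output says it ran 30 iterations and why its timing matches the grid search.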

LightGBM

%%time
parameters = {
    'n_estimators': [5, 10, 15, 20, 25, 50, 100],
    'learning_rate': [0.01, 0.05, 0.1],
    'num_leaves': [7, 15, 31],
}

model_lgbm = lgbm.LGBMClassifier(
    random_state=SEED,
    class_weight='balanced',
)

model_lgbm = GridSearchCV(
    model_lgbm, 
    parameters, 
    cv=5,
    scoring='accuracy',
)

model_lgbm.fit(
    X_train, 
    y_train, 
    categorical_feature=categorical_columns
)

print('-----')
print(f'Best parameters {model_lgbm.best_params_}')
print(
    f'Mean cross-validated accuracy score of the best_estimator: ' + 
    f'{model_lgbm.best_score_:.3f}'
)
cross_valid_scores['lightgbm'] = model_lgbm.best_score_
print('-----')

CatBoost

%%time
parameters = {
    'iterations': [5, 10, 15, 20, 25, 50, 100],
    'learning_rate': [0.01, 0.05, 0.1],
    'depth': [3, 5, 7, 9, 11, 13],
}

model_catboost = cb.CatBoostClassifier(
    verbose=False,
)

model_catboost = GridSearchCV(
    model_catboost, 
    parameters, 
    cv=5,
    scoring='accuracy',
)

model_catboost.fit(X_train, y_train)

print('-----')
print(f'Best parameters {model_catboost.best_params_}')
print(
    f'Mean cross-validated accuracy score of the best_estimator: ' + 
    f'{model_catboost.best_score_:.3f}'
)
cross_valid_scores['catboost'] = model_catboost.best_score_
print('-----')

Posted 2021-04-07 Updated 2021-04-07 8 hours read (About 70588 words)

0407 Python basics

"10 minutes to pandas" practice
Reference site

Ten minutes, my foot....

In [ ]:

import numpy as np
import pandas as pd

Object creation

Creating a Series object

In [ ]:

s= pd.Series([1,3,5,np.nan,6,8])
s

Out[ ]:

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
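The output is float64 even though the other values were written as integers, because NaN is a floating-point value. A small sketch of that effect (the variable names ints and with_nan are illustrative):

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 3, 5, 6, 8])               # integers only
with_nan = pd.Series([1, 3, 5, np.nan, 6, 8])   # one missing value

# Without NaN the Series keeps an integer dtype; with NaN it is upcast to float64.
print(ints.dtype, with_nan.dtype)

# isna() marks the missing entries, so their count can be checked directly.
missing_count = int(with_nan.isna().sum())
print(missing_count)
```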

Creating a DataFrame object

In [ ]:

dates = pd.date_range("20130101", periods=6)  # built-in pandas function: six consecutive days starting 2013-01-01
dates

Out[ ]:

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [ ]:

df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))  # 6x4 array of samples from the standard normal distribution
df

Out[ ]:

	A	B	C	D
2013-01-01	0.076484	0.747200	0.378133	-0.459580
2013-01-02	0.331515	-0.308679	2.492327	1.544272
2013-01-03	0.175129	1.136551	1.649191	1.887292
2013-01-04	-0.602191	0.424951	1.219719	0.728339
2013-01-05	-0.289813	0.310278	1.205465	1.814098
2013-01-06	0.123148	2.174075	-0.157369	-0.103182

A DataFrame can also be built from a dict of objects that can be converted to Series.

In [ ]:

df2 = pd.DataFrame({  # the keys become the column names
    "A": 1.0,  # a scalar is broadcast to every row
    "B": pd.Timestamp("20130102"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),  # the array [3, 3, 3, 3] fills the rows in order
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo"})
df2

Out[ ]:

	A	B	C	D	E	F
0	1.0	2013-01-02	1.0	3	test	foo
1	1.0	2013-01-02	1.0	3	train	foo
2	1.0	2013-01-02	1.0	3	test	foo
3	1.0	2013-01-02	1.0	3	train	foo

In [ ]:

df2.dtypes

Out[ ]:

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object

Viewing data

In [ ]:

df.head()   # first rows (5 by default)
df.tail(3)  # last 3 rows

Out[ ]:

	A	B	C	D
2013-01-04	-0.602191	0.424951	1.219719	0.728339
2013-01-05	-0.289813	0.310278	1.205465	1.814098
2013-01-06	0.123148	2.174075	-0.157369	-0.103182

In [ ]:

df.index  # the index (row labels) of df

Out[ ]:

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [ ]:

df.columns  # the column labels of df

Out[ ]:

Index(['A', 'B', 'C', 'D'], dtype='object')

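A NumPy array has a single dtype for the whole array, while a pandas DataFrame has one dtype per column. When DataFrame.to_numpy() is called, pandas looks for a NumPy dtype that can hold every dtype in the DataFrame. That may end up being object, which means every value has to be cast to a Python object, so the conversion can be expensive.

Here every column of df is float64, so DataFrame.to_numpy() is fast and does not need to copy the data.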

In [ ]:

df.to_numpy()

Out[ ]:

array([[ 0.07648391,  0.74719975,  0.37813257, -0.4595797 ],
       [ 0.33151464, -0.30867886,  2.49232684,  1.54427172],
       [ 0.17512946,  1.13655097,  1.64919136,  1.88729241],
       [-0.60219069,  0.42495131,  1.21971929,  0.72833864],
       [-0.28981328,  0.31027806,  1.20546499,  1.81409759],
       [ 0.12314811,  2.17407458, -0.15736865, -0.10318153]])

df2 has columns of several different dtypes, so the conversion is expensive.

In [ ]:

df2.to_numpy()

Out[ ]:

array([[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo']],
      dtype=object)

Note: DataFrame.to_numpy() does not include the index or column labels in its output.
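The dtype behavior described above can be checked on two small frames. This is a minimal sketch; the frames uniform and mixed are made up for the illustration.

```python
import numpy as np
import pandas as pd

# All columns share one dtype, so the array keeps it.
uniform = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
uniform_arr = uniform.to_numpy()

# Float and string columns have no common numeric dtype, so the array falls back to object.
mixed = pd.DataFrame({"A": [1.0, 2.0], "F": ["foo", "bar"]})
mixed_arr = mixed.to_numpy()

print(uniform_arr.dtype, mixed_arr.dtype)
# Only the values survive; index and column labels are not part of the array.
print(uniform_arr.shape, mixed_arr.shape)
```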

describe() shows a quick statistical summary of the data.

In [ ]:

df.describe()

Out[ ]:

	A	B	C	D
count	6.000000	6.000000	6.000000	6.000000
mean	-0.030955	0.747396	1.131244	0.901873
std	0.347121	0.848197	0.934008	1.010909
min	-0.602191	-0.308679	-0.157369	-0.459580
25%	-0.198239	0.338946	0.584966	0.104699
50%	0.099816	0.586076	1.212592	1.136305
75%	0.162134	1.039213	1.541823	1.746641
max	0.331515	2.174075	2.492327	1.887292

Transposing the data

In [ ]:

df.T

Out[ ]:

	2013-01-01	2013-01-02	2013-01-03	2013-01-04	2013-01-05	2013-01-06
A	0.076484	0.331515	0.175129	-0.602191	-0.289813	0.123148
B	0.747200	-0.308679	1.136551	0.424951	0.310278	2.174075
C	0.378133	2.492327	1.649191	1.219719	1.205465	-0.157369
D	-0.459580	1.544272	1.887292	0.728339	1.814098	-0.103182

Sorting
sort_index sorts by the index

In [ ]:

df.sort_index(axis=0,ascending=False)
# axis=0 sorts by the row index (the dates)
# axis=1 sorts by the column labels (A, B, C, D)

Out[ ]:

	A	B	C	D
2013-01-06	0.123148	2.174075	-0.157369	-0.103182
2013-01-05	-0.289813	0.310278	1.205465	1.814098
2013-01-04	-0.602191	0.424951	1.219719	0.728339
2013-01-03	0.175129	1.136551	1.649191	1.887292
2013-01-02	0.331515	-0.308679	2.492327	1.544272
2013-01-01	0.076484	0.747200	0.378133	-0.459580
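In the output above the dates are reversed while the columns stay in A, B, C, D order, which is what axis=0 does. A short sketch of both axes on a made-up 2x3 frame (the name small and its labels are illustrative):

```python
import pandas as pd

# Hypothetical 2x3 frame with easy-to-read labels.
small = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["row_a", "row_b"],
    columns=["A", "B", "C"],
)

by_rows = small.sort_index(axis=0, ascending=False)  # reorders the row labels
by_cols = small.sort_index(axis=1, ascending=False)  # reorders the column labels

print(list(by_rows.index), list(by_rows.columns))
print(list(by_cols.index), list(by_cols.columns))
```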

sort_values sorts by the data values

In [ ]:

df.sort_values(by="B")  # sort by the values in column B

Out[ ]:

	A	B	C	D
2013-01-02	0.331515	-0.308679	2.492327	1.544272
2013-01-05	-0.289813	0.310278	1.205465	1.814098
2013-01-04	-0.602191	0.424951	1.219719	0.728339
2013-01-01	0.076484	0.747200	0.378133	-0.459580
2013-01-03	0.175129	1.136551	1.649191	1.887292
2013-01-06	0.123148	2.174075	-0.157369	-0.103182

To view a single column, use df["column name"].

In [ ]:

df["A"]

Out[ ]:

2013-01-01    0.076484
2013-01-02    0.331515
2013-01-03    0.175129
2013-01-04   -0.602191
2013-01-05   -0.289813
2013-01-06    0.123148
Freq: D, Name: A, dtype: float64

Row slicing
df[start:stop] by position, or df["start label":"end label"] by label

In [ ]:

df[0:3]

Out[ ]:

	A	B	C	D
2013-01-01	0.076484	0.747200	0.378133	-0.459580
2013-01-02	0.331515	-0.308679	2.492327	1.544272
2013-01-03	0.175129	1.136551	1.649191	1.887292

In [ ]:

df["20130102":"20130104"]

Out[ ]:

	A	B	C	D
2013-01-02	0.331515	-0.308679	2.492327	1.544272
2013-01-03	0.175129	1.136551	1.649191	1.887292
2013-01-04	-0.602191	0.424951	1.219719	0.728339
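The two slices above differ in how they treat the end point: the positional slice stops before it, while the label slice includes it. A minimal sketch with a deterministic frame (frame, by_position, and by_label are illustrative names):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
frame = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

by_position = frame[0:3]                 # positions 0, 1, 2; the stop position is excluded
by_label = frame["20130102":"20130104"]  # Jan 2 through Jan 4; the end label is included

print(by_position.index[-1].date(), by_label.index[-1].date())
```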

Selection by label

df.loc[label]

In [ ]:

df

Out[ ]:

	A	B	C	D
2013-01-01	0.076484	0.747200	0.378133	-0.459580
2013-01-02	0.331515	-0.308679	2.492327	1.544272
2013-01-03	0.175129	1.136551	1.649191	1.887292
2013-01-04	-0.602191	0.424951	1.219719	0.728339
2013-01-05	-0.289813	0.310278	1.205465	1.814098
2013-01-06	0.123148	2.174075	-0.157369	-0.103182

In [ ]:

df.loc[dates[0]]  # dates[0] is 2013-01-01, so this returns the row labeled 2013-01-01

Out[ ]:

A    0.076484
B    0.747200
C    0.378133
D   -0.459580
Name: 2013-01-01 00:00:00, dtype: float64

Rows and columns can be selected by label on several axes at once: df.loc[rows, columns]

In [ ]:

df.loc[:,["A","B"]]

Out[ ]:

	A	B
2013-01-01	0.076484	0.747200
2013-01-02	0.331515	-0.308679
2013-01-03	0.175129	1.136551
2013-01-04	-0.602191	0.424951
2013-01-05	-0.289813	0.310278
2013-01-06	0.123148	2.174075

In [ ]:

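matplotlib plotting functions take array-like data (lists, NumPy arrays, pandas Series),
so the columns to plot are pulled out of the DataFrame and passed one by one.
matplotlib --> array-like inputs, one per argument
seaborn --> accepts pandas objects, including a whole DataFrame. It is similar to ggplot.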
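A small sketch of handing pandas columns to matplotlib directly, without converting them to lists first. The scores table is made up for the illustration, and the Agg backend is chosen only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, not from the original notebook.
scores = pd.DataFrame({"name": ["a", "b", "c"], "score": [1.0, 2.5, 2.0]})

fig, ax = plt.subplots()
# Each argument is a pandas Series; matplotlib reads it as array-like data.
ax.bar(scores["name"], scores["score"])

bar_heights = [float(patch.get_height()) for patch in ax.patches]
print(bar_heights)
```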

In [ ]:

Posted 2021-04-07 Updated 2021-04-07 8 hours read (About 71595 words)

Matplotlib practice

Matplotlib practice 210407

1. Drawing a plot with matplotlib

In [47]:

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import (MultipleLocator, AutoMinorLocator, FuncFormatter)

tips = sns.load_dataset("tips")
tips.head()

Out[47]:

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4

In [6]:

# groupby builds the per-day means; reset_index() turns the grouped "day" index back into a column with a 0, 1, 2, ... index.
tips_day = tips.groupby("day").mean().reset_index()
 
tips_day

Out[6]:

	day	total_bill	tip	size
0	Thur	17.682742	2.771452	2.451613
1	Fri	17.151579	2.734737	2.105263
2	Sat	20.441379	2.993103	2.517241
3	Sun	21.410000	3.255132	2.842105
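The output above comes from the pandas version used in the notebook, which silently dropped the non-numeric columns. As far as I know, pandas 2.0 and later raise a TypeError here instead, because columns such as sex and smoker cannot be averaged. A hedged sketch of the same aggregation with numeric_only=True on a made-up miniature table (mini_tips is not the real tips dataset):

```python
import pandas as pd

# Hypothetical miniature of the tips table.
mini_tips = pd.DataFrame({
    "day": ["Sat", "Sat", "Sun", "Sun"],
    "sex": ["Male", "Female", "Male", "Female"],
    "tip": [2.0, 4.0, 3.0, 5.0],
})

# numeric_only=True averages only the numeric columns and skips "sex".
mini_day = mini_tips.groupby("day").mean(numeric_only=True).reset_index()
print(mini_day)
```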

In [12]:

fig, ax = plt.subplots()
ax.bar(tips_day["day"], tips_day["tip"], color="lightgray")
ax.set_title("tip (mean)", fontsize=16, pad=12)

Out[12]:

Text(0.5, 1.0, 'tip (mean)')

In [ ]:

fig, ax = plt.subplots()  # create the figure and axes
# fig is the Figure: the container that holds every subplot, however many there are.
# ax is an Axes: one individual plot. With several subplots, ax is an array, e.g. ax[1], ax[3].
ax.bar(tips_day["day"], tips_day["tip"], color="lightgray")  # draw the bars
ax.set_title("tip (mean)", fontsize=16, pad=12)  # set the title

plt.show()

In [13]:

ax.patches  # the Patch objects drawn on this Axes (here, the bars)

Out[13]:

[<matplotlib.patches.Rectangle at 0x7f3e844b0450>,
 <matplotlib.patches.Rectangle at 0x7f3e844f47d0>,
 <matplotlib.patches.Rectangle at 0x7f3e844b0890>,
 <matplotlib.patches.Rectangle at 0x7f3e8443d310>]

This shows that there are four Rectangle objects. Their heights are exactly the per-day mean values.

In [14]:

for i in range(len(ax.patches)):
    print(f"height of patch[{i}] = {ax.patches[i].get_height()}")

height of patch[0] = 2.771451612903226
height of patch[1] = 2.734736842105263
height of patch[2] = 2.993103448275862
height of patch[3] = 3.255131578947369

2. Changing the color of a specific bar

In [15]:

# Color only the Sunday bar dark red
fig, ax = plt.subplots()
ax.bar(tips_day["day"], tips_day["tip"], color="lightgray")
ax.set_title("tip (mean)", fontsize=16, pad=12)

# Sunday
# Sunday is the fourth bar, so it is patches[3]
ax.patches[3].set_facecolor("darkred")  # facecolor: fill color
ax.patches[3].set_edgecolor("black")    # edgecolor: outline color

plt.show()

3. Adding text to the plot

In [16]:

fig, ax = plt.subplots()
ax.bar(tips_day["day"], tips_day["tip"], color="lightgray")
ax.set_title("tip (mean)", fontsize=16, pad=12)

# Values
h_pad = 0.1  # vertical gap between the top of a bar and its label
for i in range(4):
    fontweight = "normal"
    color = "k"
    if i == 3: # Sunday
        fontweight = "bold"  # bold text for Sunday
        color = "darkred"    # dark red text for Sunday
    
    # ax.text places text at the data coordinates (x, y).
    # tips_day["tip"].loc[i] is the i-th mean tip; 0.2f keeps two decimal places.
    ax.text(i, tips_day["tip"].loc[i] + h_pad, f"{tips_day['tip'].loc[i]:0.2f}",
            horizontalalignment='center', fontsize=12, fontweight=fontweight, color=color)



# Sunday
ax.patches[3].set_facecolor("darkred")
ax.patches[3].set_edgecolor("black")

# set_range
ax.set_ylim(0, 4)

plt.show()

To focus on ax.spines, the plotting code is wrapped in a function.

In [48]:

def plot_example(ax, zorder=0):
    ax.bar(tips_day["day"], tips_day["tip"], color="lightgray", zorder=zorder)
    ax.set_title("tip (mean)", fontsize=16, pad=12)

    # Values
    h_pad = 0.1
    for i in range(4):
        fontweight = "normal"
        color = "k"
        if i == 3:
            fontweight = "bold"
            color = "darkred"

        ax.text(i, tips_day["tip"].loc[i] + h_pad, f"{tips_day['tip'].loc[i]:0.2f}", 
                horizontalalignment='center', fontsize=12, fontweight=fontweight, color=color)

    # Sunday
    ax.patches[3].set_facecolor("darkred")
    ax.patches[3].set_edgecolor("black")

    # set_range
    ax.set_ylim(0, 4)
    return ax

In [23]:

fig, ax = plt.subplots()
ax = plot_example(ax)

4. Spines

In [24]:

type(ax.spines)

Out[24]:

collections.OrderedDict

It is an OrderedDict object, which is a kind of dictionary.

In [27]:

for k,v in ax.spines.items():
  print(f"spines[{k}] = {v}")

ax.spines.values()

spines[left] = Spine
spines[right] = Spine
spines[bottom] = Spine
spines[top] = Spine

Out[27]:

odict_values([<matplotlib.spines.Spine object at 0x7f3e84001650>, <matplotlib.spines.Spine object at 0x7f3e840b35d0>, <matplotlib.spines.Spine object at 0x7f3e8414af90>, <matplotlib.spines.Spine object at 0x7f3e84096f50>])

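The keys are the four borders (left, right, bottom, top) and the values are matplotlib.spines.Spine objects.

According to the official documentation, Spine is a subclass of Patch. Calling set_patch_circle or set_patch_arc makes it draw a circle or an arc; set_patch_line, which draws a line, is the default.

Hiding a spine: set_visible(False)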

In [28]:

fig, ax = plt.subplots()
ax = plot_example(ax)

ax.spines["top"].set_visible(False)
ax.spines["left"].set_visible(False)
ax.spines["right"].set_visible(False)

# The top and side spines are now hidden.

In [29]:

fig, ax = plt.subplots()
ax = plot_example(ax)

ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

ax.spines["left"].set_bounds(1, 3)  # set_bounds draws the spine only from 1 to 3

In [30]:

ax.spines["left"].get_position()  # current setting of the left spine: ("outward", 0.0) means it is offset 0 points outward from the data area

Out[30]:

('outward', 0.0)

In [ ]:

# Use set_position to put a gap between the left spine (y axis) and the plot

fig, ax = plt.subplots()
ax = plot_example(ax)

ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_position(("outward", 10))  # move the left spine outward by 10 points

In [31]:

# To pull the spine inward instead, pass a negative value with "outward".

fig, ax = plt.subplots(ncols=3, figsize=(15, 3))

for i in range(3): # draw three plots at once
    ax[i] = plot_example(ax[i])
    ax[i].spines["top"].set_visible(False)
    ax[i].spines["right"].set_visible(False)
    
# ax[0]: move the spine a given distance (in points) out from the data area
ax[0].spines["left"].set_position(("outward", -50)) 

# ax[1]: place the spine at a given position in axes coordinates
ax[1].spines["left"].set_position(("axes", 0.3))

# ax[2]: place the spine at a given position in data coordinates
ax[2].spines["left"].set_position(("data", 2.5))

5. Adding a grid

In [32]:

fig, ax = plt.subplots(ncols=3, figsize=(15, 3))

for i in range(3): # draw three plots at once
    ax[i] = plot_example(ax[i])
    ax[i].spines["top"].set_visible(False)
    ax[i].spines["right"].set_visible(False)
    ax[i].spines["left"].set_position(("outward", 10))  # was ax[0]; ax[i] offsets the left spine of every plot


# The axis argument ("both", "x", or "y") selects the direction of the grid lines.
# ax[0]: both x and y
ax[0].grid(axis="both")

# ax[1]: x axis only
ax[1].grid(axis="x")

# ax[2]: y axis only
ax[2].grid(axis="y")

Separate major and minor grids

In [56]:

# !!! This code raises an error !!!

from matplotlib.ticker import (MultipleLocator, AutoMinorLocator)

fig, ax = plt.subplots()
ax = plot_example(ax)

# hide the top, right, and left spines
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_visible(False)

# y-axis tick settings
ax.yaxis.set_major_locator(MultipleLocator(1))    # major ticks every 1
ax.yaxis.set_major_formatter('{x:0.2f}')          # major tick label format
ax.yaxis.set_minor_locator(MultipleLocator(0.5))  # minor ticks every 0.5

plt.plot()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-56-33d5c6f4fbaf> in <module>()
     11 # y축 tick 설정
     12 ax.yaxis.set_major_locator(MultipleLocator(1))    # major tick을 1 단위로 설정
---> 13 ax.yaxis.set_major_formatter('{x:0.2f}')          # major tick format 지정
     14 ax.yaxis.set_minor_locator(MultipleLocator(0.5))  # minor tick을 0.5 단위로 지정
     15 

/usr/local/lib/python3.7/dist-packages/matplotlib/axis.py in set_major_formatter(self, formatter)
   1626         formatter : `~matplotlib.ticker.Formatter`
   1627         """
-> 1628         cbook._check_isinstance(mticker.Formatter, formatter=formatter)
   1629         self.isDefault_majfmt = False
   1630         self.major.formatter = formatter

/usr/local/lib/python3.7/dist-packages/matplotlib/cbook/__init__.py in _check_isinstance(_types, **kwargs)
   2126                     ", ".join(names[:-1]) + " or " + names[-1]
   2127                     if len(names) > 1 else names[0],
-> 2128                     type_name(type(v))))
   2129 
   2130 

TypeError: 'formatter' must be an instance of matplotlib.ticker.Formatter, not a str

In this environment, the call to ax.yaxis.set_major_formatter raises an error.

TypeError: 'formatter' must be an instance of matplotlib.ticker.Formatter, not a str

With the installed matplotlib version, set_major_formatter must be given a matplotlib.ticker.Formatter instance.

Solution

Import FuncFormatter:

from matplotlib.ticker import (MultipleLocator, AutoMinorLocator, FuncFormatter)

Then define a formatter function separately:

def major_formatter(x, pos):
    return "%.2f" % x
formatter = FuncFormatter(major_formatter)

In [ ]:

from matplotlib.ticker import (MultipleLocator, AutoMinorLocator, FuncFormatter)

def major_formatter(x, pos):
    # "%.2f" gives labels such as 1.00; wrapping it in braces would print the braces literally
    return "%.2f" % x
formatter = FuncFormatter(major_formatter)
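A formatter object can be called directly to check what label it produces for a tick value. The sketch below does that for a FuncFormatter and, as an alternative, for StrMethodFormatter, which wraps a format string such as '{x:0.2f}' in a Formatter instance. The names two_decimals, func_label, and str_label are illustrative.

```python
from matplotlib.ticker import FuncFormatter, StrMethodFormatter

def two_decimals(x, pos):
    """Format a tick value with two decimals; pos is the tick position (unused)."""
    return f"{x:0.2f}"

func_formatter = FuncFormatter(two_decimals)
# StrMethodFormatter turns a str.format-style template into a Formatter instance.
str_formatter = StrMethodFormatter("{x:0.2f}")

# Formatters are callables taking (value, position) and returning the label text.
func_label = func_formatter(1, 0)
str_label = str_formatter(1, 0)
print(func_label, str_label)  # → 1.00 1.00
```

Either object can be passed to ax.yaxis.set_major_formatter in place of the bare string that caused the TypeError.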

In [49]:

# Works correctly

from matplotlib.ticker import (MultipleLocator, AutoMinorLocator)

fig, ax = plt.subplots()
ax = plot_example(ax)

# hide the top, right, and left spines
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_visible(False)

# y-axis tick settings
# set the major and minor tick locations first
ax.yaxis.set_major_locator(MultipleLocator(1))    # major ticks every 1
ax.yaxis.set_major_formatter(formatter)           # major tick label format (if this raises an error, upgrade matplotlib)
ax.yaxis.set_minor_locator(MultipleLocator(0.5))  # minor ticks every 0.5

plt.plot()

Out[49]:

[]

In [52]:

fig, ax = plt.subplots(ncols=3, figsize=(15, 3))

for i in range(3): # draw three plots at once
    ax[i] = plot_example(ax[i], zorder=2) # zorder: draw the bars in front of the grid
    ax[i].spines["top"].set_visible(False)
    ax[i].spines["right"].set_visible(False)
    ax[i].spines["left"].set_position(("outward", 10))
    ax[i].yaxis.set_major_locator(MultipleLocator(1))
    ax[i].yaxis.set_major_formatter(formatter)
    ax[i].yaxis.set_minor_locator(MultipleLocator(0.5))
    
# ax[0]: both major and minor
ax[0].grid(axis="y", which="both")

# ax[1]: major only
ax[1].grid(axis="y", which="major")

# ax[2]: major only, with extra options
ax[2].grid(axis="y", which="major", color="r", ls=":", lw=0.5, alpha=0.5) # red, dotted, thin, semi-transparent

plt.show()

In [55]:

fig, ax = plt.subplots()
ax = plot_example(ax, zorder=2)

ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_visible(False)
    
ax.yaxis.set_major_locator(MultipleLocator(1))
ax.yaxis.set_major_formatter(formatter)
ax.yaxis.set_minor_locator(MultipleLocator(0.5))

ax.grid(axis="y", which="major", color="lightgray") # solid light gray lines to emphasize the major grid
ax.grid(axis="y", which="minor", ls=":") # dotted lines for the minor grid

Posted 2021-04-06 Updated 2021-04-06 이현정 3 minutes read (About 520 words)

List vs. tuple

What is the difference between a list and a tuple?

1. Lists and tuples

my_list=[1,2,3]
print(my_list)

my_tuple=(1,2,3)
print(my_tuple)

Both can hold a sequence of elements of any type.
Both types keep their elements in order and are indexed by position (unlike a set or a dict).

2. The difference between a list and a tuple

my_list[1]="two"
print(my_list)

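my_tuple[1]="two"  # TypeError: a tuple does not support item assignment

The technical difference between a list and a tuple is mutability.
A list is mutable (it can be changed) and a tuple is immutable (it cannot be changed).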
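The difference can be seen by catching the error instead of letting it stop the script. This is a small sketch; tuple_error and new_tuple are names introduced for the illustration.

```python
my_list = [1, 2, 3]
my_tuple = (1, 2, 3)

my_list[1] = "two"  # lists support item assignment

tuple_error = None
try:
    my_tuple[1] = "two"  # tuples do not
except TypeError as exc:
    tuple_error = str(exc)

# "Changing" a tuple really means building a new one from pieces of the old one.
new_tuple = my_tuple[:1] + ("two",) + my_tuple[2:]

print(my_list)      # → [1, 'two', 3]
print(my_tuple)     # → (1, 2, 3)
print(new_tuple)    # → (1, 'two', 3)
print(tuple_error)
```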

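3. Characteristics and advantages of tuples

If each position has its own meaning, use a tuple!

A list is mainly used when the number or kind of elements is not known in advance.
A tuple is used when the number and kinds of elements are known exactly beforehand.
Why? Because the position of each element carries meaning.

A tuple behaves like a struct.
That is why it is often used to store values of different types together.

Tuples are space-efficient.

A list reserves extra space so that appending elements stays fast.
A tuple has a fixed size, so it does not take up that extra space.

Copying a tuple is a shallow copy

Copying a list (for example with list(x) or x[:]) creates a new object.
Copying a tuple the same way returns an object with the same id.
In other words, no new memory is allocated; the copied variable points to the same object as the original.

4. So when should a tuple be used?

Tuples, which are lightweight and fast, are often used for read-only data.
They are also often used when the elements are of different types.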
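The space and copy behavior described above can be checked in the interpreter. This sketch relies on CPython implementation details (object sizes and the reuse of the same tuple object on copy), so other Python implementations may behave differently; the variable names are illustrative.

```python
import sys

numbers_list = [1, 2, 3]
numbers_tuple = (1, 2, 3)

# Size of the container object itself, in bytes (CPython-specific values).
list_size = sys.getsizeof(numbers_list)
tuple_size = sys.getsizeof(numbers_tuple)
print(list_size, tuple_size)

# Copying a list builds a new object; "copying" a tuple hands back the same one,
# which is safe because a tuple can never be modified.
list_copy = list(numbers_list)
tuple_copy = tuple(numbers_tuple)
list_copy_is_same = list_copy is numbers_list
tuple_copy_is_same = tuple_copy is numbers_tuple
print(list_copy_is_same, tuple_copy_is_same)
```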