Hackathon & me

Machine learning hackathons serve as a recreational activity for me, and until last night my last indulgence was sometime around 2021, in the pre-COVID-19 era, on the Zindi platform. Half the intent here is to acknowledge that I've recently taken up DSN and Microsoft Skills for Job as a pastime with an underlying research angle, and so far I've come to enjoy the entire process all over again. It's thrilling to see how minor adjustments can move one up the leaderboard: self-gratification and then, possibly, the inevitable shakedown as other similarly skilled data scientists improve on their models.

Basically, one's position on the leaderboard reflects performance on only a subset of the whole dataset provided by the organizers until the end of the competition. Hence, shakedown. There are other technicalities to it (for instance, "overfitting to the holdout"), but those would be a bore. Read this paper if you have an interest in the shenanigans a leaderboard can get up to.

Every ML competition takes on the same format: Objective, Timeline, Data (partially or fully provided, depending on the learning task; typically partitioned), Evaluation method, Prizes (ranging from points to $$, even jobs), and RULES. And this one is no different:

Objective
The objective of this hackathon is to create a powerful and accurate predictive model that can estimate the prices of houses in Nigeria. By leveraging the provided dataset, you will analyze various factors that impact house prices, identify meaningful patterns, and build a model that can generate reliable price predictions. The ultimate goal is to provide Wazobia Real Estate Limited with an effective tool to make informed pricing decisions and enhance their competitiveness in the market.

Timeline
This challenge starts on 12 June at 13:00. The competition closes on 29 July at midnight.

...Rules
If your solution places 1st, 2nd, or 3rd on the final leaderboard, you will be required to submit your winning solution code to us for verification, and you thereby agree to assign all worldwide rights of copyright in and to such winning solution to Zindi.

yikes!

Every other detail can be seen here.

This notebook is public so that participants can derive an idea or two from my implementation and perhaps get into the top 3%.

I presently sit in 2nd position (down from 1st) with a very similar approach.

[Screenshot: Zindi leaderboard]

In [ ]:
 

A random seed is used mainly for reproducibility. It is an arbitrary number that ensures the results seen in this notebook can be easily reproduced when run on another machine, as long as it is correctly included wherever randomness is involved.
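
As a quick illustration (a toy sketch using NumPy's legacy seeding API, separate from the competition code), reseeding with the same value makes the "random" numbers repeat exactly:

```python
import numpy as np

seed = 42  # arbitrary but fixed

np.random.seed(seed)        # seed the global generator...
first = np.random.rand(3)   # ...and draw three "random" numbers

np.random.seed(seed)        # reseed with the same value
second = np.random.rand(3)  # the exact same numbers come out

print(np.array_equal(first, second))  # True
```

Newer NumPy code would use `np.random.default_rng(seed)` instead, but the idea is identical.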

In [1]:
seed = 42 #the meaning of life
In [2]:
#import computational/visualizational packages

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

%matplotlib inline
In [3]:
#import provided data
submission = pd.read_csv('Sample_submission.csv')
train = pd.read_csv('Housing_dataset_train.csv')
test = pd.read_csv('Housing_dataset_test.csv')
In [4]:
train.shape, test.shape, test.shape[0] == submission.shape[0]
Out[4]:
((14000, 7), (6000, 6), True)
In [5]:
train.head(2)
Out[5]:
ID loc title bedroom bathroom parking_space price
0 3583 Katsina Semi-detached duplex 2.0 2.0 1.0 1149999.565
1 2748 Ondo Apartment NaN 2.0 4.0 1672416.689

pandas.DataFrame.sample essentially returns a random sample of items from an axis of the object. The lines of code below show that while sample is a random draw, setting a random_state makes the draw deterministic. That is, the same result can be obtained over and over again.

In [6]:
test.sample(3, random_state=seed) #random_state set
Out[6]:
ID loc title bedroom bathroom parking_space
1782 1696 Kwara Mansion 5 1 3
3917 11276 Adamawa Flat 7 3 2
221 10879 Nasarawa Townhouse 9 4 2
In [7]:
test.sample(3, random_state=seed) #random_state set to same seed for reproducibility
Out[7]:
ID loc title bedroom bathroom parking_space
1782 1696 Kwara Mansion 5 1 3
3917 11276 Adamawa Flat 7 3 2
221 10879 Nasarawa Townhouse 9 4 2
In [ ]:
 
In [8]:
def check_nan_percentage(df):
    """a simple function that returns sum of NaNs in every column"""
    return df.isna().mean()*100
In [9]:
check_nan_percentage(train)
Out[9]:
ID                0.000000
loc              12.950000
title            12.300000
bedroom          12.850000
bathroom         12.892857
parking_space    12.935714
price             0.000000
dtype: float64
In [10]:
check_nan_percentage(test)
Out[10]:
ID               0.0
loc              0.0
title            0.0
bedroom          0.0
bathroom         0.0
parking_space    0.0
dtype: float64
In [ ]:
 

Having a test dataset with no missing values, as seen above, means any imputation is only needed for the train dataset, saving a bit of time in the process. Let's focus on our train dataset.
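
If one did want to impute rather than drop, a minimal sketch could look like the following (a hypothetical toy frame mimicking the train schema, with a median/mode strategy of my own choosing; this notebook itself ends up dropping NaNs later):

```python
import pandas as pd
import numpy as np

# toy frame mimicking the train schema (hypothetical values)
df = pd.DataFrame({
    'loc': ['Ondo', None, 'Kano'],
    'bedroom': [2.0, np.nan, 4.0],
})

# numeric columns: fill with the column median
df['bedroom'] = df['bedroom'].fillna(df['bedroom'].median())
# categorical columns: fill with the most frequent value
df['loc'] = df['loc'].fillna(df['loc'].mode()[0])

print(df.isna().sum().sum())  # 0 — no NaNs left
```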

In [11]:
sns.pairplot(train)
Out[11]:
<seaborn.axisgrid.PairGrid at 0x24e397b3580>
In [12]:
train['price'].hist(bins=50, figsize=(15,6))
plt.show()
In [13]:
train['price'].describe()
Out[13]:
count    1.400000e+04
mean     2.138082e+06
std      1.083057e+06
min      4.319673e+05
25%      1.393990e+06
50%      1.895223e+06
75%      2.586699e+06
max      1.656849e+07
Name: price, dtype: float64
In [14]:
train.loc[:, 'price']= np.log1p(train['price'])
train['price'].head()
Out[14]:
0    13.955273
1    14.329781
2    15.028879
3    14.695265
4    14.771292
Name: price, dtype: float64
In [15]:
train['price'].hist(bins=50, figsize=(15,8))
plt.show()
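
One practical consequence of the log transform above: model predictions come out on the log scale, so they have to be mapped back with np.expm1 (the exact inverse of np.log1p) before building a submission. A quick round-trip check:

```python
import numpy as np

# roughly the min/median/max prices from describe() above
price = np.array([431967.3, 1895223.0, 16568490.0])

logged = np.log1p(price)     # log(1 + x): the scale the model sees
restored = np.expm1(logged)  # inverse transform, back to raw prices

print(np.allclose(price, restored))  # True
```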
In [ ]:
 
In [16]:
#check if ID is unique and follows no pattern.

pd.Series(train['ID']).is_unique
Out[16]:
False
In [17]:
train['ID'].hist(bins=30, figsize=(15,7))
Out[17]:
<AxesSubplot:>
In [18]:
train[train['ID'] == 3583]
Out[18]:
ID loc title bedroom bathroom parking_space price
0 3583 Katsina Semi-detached duplex 2.0 2.0 1.0 13.955273
5682 3583 Edo Penthouse 3.0 5.0 5.0 14.778512
In [19]:
train['ID'].describe()
Out[19]:
count    14000.000000
mean      4862.700357
std       3818.348214
min          0.000000
25%       1672.750000
50%       3527.000000
75%       8011.250000
max      12999.000000
Name: ID, dtype: float64
In [20]:
train.loc[:, 'ID'] = np.log1p(train['ID'])
test.loc[:, 'ID'] = np.log1p(test['ID'])
In [21]:
test['ID'].describe()
Out[21]:
count    6000.000000
mean        8.006512
std         1.178420
min         1.098612
25%         7.407318
50%         8.141481
75%         8.983565
max         9.472397
Name: ID, dtype: float64
In [22]:
sns.histplot(train.loc[:, 'ID'])
Out[22]:
<AxesSubplot:xlabel='ID', ylabel='Count'>
In [23]:
sns.histplot(test.loc[:, 'ID'])
Out[23]:
<AxesSubplot:xlabel='ID', ylabel='Count'>
In [ ]:
 
In [24]:
sns.pairplot(train, hue='bedroom')
Out[24]:
<seaborn.axisgrid.PairGrid at 0x24e3dc45c10>
In [25]:
sns.barplot(data=train, x='title', y='bedroom')
Out[25]:
<AxesSubplot:xlabel='title', ylabel='bedroom'>
In [26]:
sns.barplot(data=train, x='title', y='bathroom')
Out[26]:
<AxesSubplot:xlabel='title', ylabel='bathroom'>
In [27]:
sns.barplot(data=train, x='title', y='parking_space')
Out[27]:
<AxesSubplot:xlabel='title', ylabel='parking_space'>
In [28]:
sns.pairplot(train)
Out[28]:
<seaborn.axisgrid.PairGrid at 0x24e41d06880>
In [29]:
sns.boxplot(data=train, y='price')
Out[29]:
<AxesSubplot:ylabel='price'>
In [ ]:
 
In [30]:
X = train.dropna().drop(['loc', 'title', 'price'], axis=1)
y = train.dropna()['price']
In [ ]:
 
In [31]:
#get outliers/inliers
from sklearn.linear_model import RANSACRegressor, LinearRegression

ransacor = RANSACRegressor(LinearRegression(), min_samples=60, loss='squared_error', residual_threshold=5.0, random_state=seed)

ransacor.fit(X, y)
Out[31]:
RANSACRegressor(estimator=LinearRegression(), loss='squared_error',
                min_samples=60, random_state=42, residual_threshold=5.0)
In [32]:
inliers = ransacor.inlier_mask_
outliers = np.logical_not(inliers) #inverted-inliers
In [33]:
X[inliers]
Out[33]:
ID bedroom bathroom parking_space
0 8.184235 2.0 2.0 1.0
3 7.707512 5.0 2.0 4.0
7 8.007700 3.0 3.0 5.0
10 9.439386 1.0 2.0 6.0
11 7.872836 3.0 4.0 2.0
... ... ... ... ...
13989 9.221082 4.0 7.0 2.0
13990 8.878358 8.0 7.0 3.0
13992 8.138565 1.0 2.0 2.0
13994 9.257033 8.0 1.0 6.0
13997 9.322865 8.0 6.0 5.0

5689 rows × 4 columns

In [34]:
X[outliers]
Out[34]:
ID bedroom bathroom parking_space
In [35]:
train = train.dropna()
In [ ]:
 
In [36]:
q75 = train['price'].quantile(0.75)
q25 = train['price'].quantile(0.25)
iqr = q75-q25
In [37]:
high, low = q75 + (1.5 * iqr), q25 - (1.5 * iqr) #my sweet chariotttt
In [38]:
high, low
Out[38]:
(15.679543936496026, 13.233412461000762)
In [39]:
train[train['price'] > high].shape
Out[39]:
(32, 7)
In [40]:
train[train['price'] < low].shape
Out[40]:
(3, 7)
In [41]:
sns.histplot(train[train['price'] < high]['price'])
Out[41]:
<AxesSubplot:xlabel='price', ylabel='Count'>
In [42]:
train = train[train['price'] < high]
In [43]:
train = train[train['price'] > low]
In [44]:
sns.pairplot(train)
Out[44]:
<seaborn.axisgrid.PairGrid at 0x24e45079280>
In [45]:
#check correlation

corrmat= train.corr()
corrmat["price"].sort_values()
plt.figure(figsize=(15,7))  
sns.heatmap(corrmat,annot=True,center=0)
Out[45]:
<AxesSubplot:>
In [46]:
sns.boxplot(data=train, y='price')
Out[46]:
<AxesSubplot:ylabel='price'>
In [ ]:
 
In [47]:
check_nan_percentage(train)
Out[47]:
ID               0.0
loc              0.0
title            0.0
bedroom          0.0
bathroom         0.0
parking_space    0.0
price            0.0
dtype: float64
In [48]:
train['title'].value_counts()
Out[48]:
Flat                    656
Detached duplex         626
Townhouse               625
Penthouse               610
Apartment               609
Terrace duplex          593
Bungalow                585
Mansion                 566
Semi-detached duplex    564
Cottage                 220
Name: title, dtype: int64
In [50]:
k = train[['title', 'bedroom', 'bathroom']].groupby('title')
k.mean()
Out[50]:
bedroom bathroom
title
Apartment 4.336617 3.154351
Bungalow 4.449573 3.502564
Cottage 2.845455 1.486364
Detached duplex 4.399361 3.201278
Flat 4.461890 3.219512
Mansion 4.167845 3.166078
Penthouse 4.196721 3.001639
Semi-detached duplex 4.271277 3.253546
Terrace duplex 4.325464 3.217538
Townhouse 4.321600 3.160000
In [51]:
k.agg([np.sum, np.mean, np.std])
Out[51]:
bedroom bathroom
sum mean std sum mean std
title
Apartment 2641.0 4.336617 2.455523 1921.0 3.154351 2.042916
Bungalow 2603.0 4.449573 2.472715 2049.0 3.502564 2.115762
Cottage 626.0 2.845455 1.434642 327.0 1.486364 0.500954
Detached duplex 2754.0 4.399361 2.453458 2004.0 3.201278 2.090221
Flat 2927.0 4.461890 2.491833 2112.0 3.219512 2.027442
Mansion 2359.0 4.167845 2.415677 1792.0 3.166078 2.066329
Penthouse 2560.0 4.196721 2.376124 1831.0 3.001639 1.951379
Semi-detached duplex 2409.0 4.271277 2.383146 1835.0 3.253546 2.078710
Terrace duplex 2565.0 4.325464 2.395867 1908.0 3.217538 2.066846
Townhouse 2701.0 4.321600 2.472398 1975.0 3.160000 1.985525
In [52]:
#group dataset by title to check mean housing price
train.groupby(['title'])['price'].mean().reset_index()
Out[52]:
title price
0 Apartment 14.206367
1 Bungalow 14.309069
2 Cottage 14.007748
3 Detached duplex 14.563482
4 Flat 14.311766
5 Mansion 15.000560
6 Penthouse 14.715764
7 Semi-detached duplex 14.397414
8 Terrace duplex 14.408853
9 Townhouse 14.476695
In [53]:
train.groupby(['bedroom'])['bathroom'].median().reset_index()
Out[53]:
bedroom bathroom
0 1.0 2.0
1 2.0 2.0
2 3.0 2.0
3 4.0 2.0
4 5.0 2.0
5 6.0 4.0
6 7.0 4.0
7 8.0 4.0
8 9.0 4.0
In [54]:
train.groupby(['title', 'bedroom'])['price'].mean().reset_index()
Out[54]:
title bedroom price
0 Apartment 1.0 13.803295
1 Apartment 2.0 13.940348
2 Apartment 3.0 14.029750
3 Apartment 4.0 14.201082
4 Apartment 5.0 14.294487
... ... ... ...
81 Townhouse 5.0 14.543027
82 Townhouse 6.0 14.671119
83 Townhouse 7.0 14.829499
84 Townhouse 8.0 14.858607
85 Townhouse 9.0 14.964500

86 rows × 3 columns

In [55]:
train['title'].value_counts()
Out[55]:
Flat                    656
Detached duplex         626
Townhouse               625
Penthouse               610
Apartment               609
Terrace duplex          593
Bungalow                585
Mansion                 566
Semi-detached duplex    564
Cottage                 220
Name: title, dtype: int64
In [56]:
train[(train['title'] == 'Townhouse') & (train['bedroom'] == 9)].head()
Out[56]:
ID loc title bedroom bathroom parking_space price
1098 8.407378 Kano Townhouse 9.0 5.0 6.0 15.008351
1147 8.806274 Taraba Townhouse 9.0 5.0 2.0 14.888087
1526 6.651572 Adamawa Townhouse 9.0 1.0 5.0 14.860846
1699 9.428994 Ekiti Townhouse 9.0 6.0 1.0 14.976812
1968 9.443672 Adamawa Townhouse 9.0 3.0 1.0 14.851737
In [ ]:
 
In [57]:
train.groupby(['loc'])['price'].mean().reset_index()
Out[57]:
loc price
0 Abia 14.279171
1 Adamawa 14.374544
2 Akwa Ibom 14.705823
3 Anambra 14.579752
4 Bauchi 14.294989
5 Bayelsa 14.875713
6 Benue 14.379780
7 Borno 14.300886
8 Cross River 14.644476
9 Delta 14.702339
10 Ebonyi 14.249751
11 Edo 14.578267
12 Ekiti 14.526841
13 Enugu 14.543229
14 Gombe 14.370078
15 Imo 14.439650
16 Jigawa 14.265463
17 Kaduna 14.333098
18 Kano 14.479540
19 Katsina 14.395788
20 Kebbi 14.226796
21 Kogi 14.306193
22 Kwara 14.349098
23 Lagos 15.074740
24 Nasarawa 14.464896
25 Niger 14.376803
26 Ogun 14.685502
27 Ondo 14.575023
28 Osun 14.514564
29 Oyo 14.592162
30 Plateau 14.392665
31 Rivers 14.756781
32 Sokoto 14.266587
33 Taraba 14.356615
34 Yobe 14.277455
35 Zamfara 14.285468
In [58]:
train.groupby(['loc'])['bedroom'].std().reset_index().head()
Out[58]:
loc bedroom
0 Abia 2.367225
1 Adamawa 2.555558
2 Akwa Ibom 2.318507
3 Anambra 2.370779
4 Bauchi 2.478535
In [59]:
train.groupby(['ID'])['bedroom'].mean().reset_index().head()
Out[59]:
ID bedroom
0 0.000000 6.0
1 0.693147 4.0
2 1.098612 7.0
3 1.386294 1.0
4 1.791759 4.0
In [60]:
train[(train['ID'] == 0)]
Out[60]:
ID loc title bedroom bathroom parking_space price
8591 0.0 Benue Penthouse 5.0 2.0 1.0 14.692917
12637 0.0 Kaduna Townhouse 7.0 4.0 3.0 14.718922
In [ ]:
 
In [61]:
sns.scatterplot(data=train, x='price', y='ID', hue='bedroom')
Out[61]:
<AxesSubplot:xlabel='price', ylabel='ID'>
In [62]:
train['ID'].min(), train['ID'].max(), test['ID'].min(), test['ID'].max()
Out[62]:
(0.0, 9.472550778454295, 1.0986122886681098, 9.472396896788991)
In [63]:
q1 = train['ID'].quantile(0.25)
q3 = train['ID'].quantile(0.75)
qr = q3 - q1
high = q3 + 1.5 * qr
low = q1 - 1.5 * qr
In [64]:
low, high
Out[64]:
(5.112974557913137, 11.308176213004618)
In [65]:
train[~(train['ID'] < low) | (train['ID'] > high)].corr(), train[(train['ID'] < low) | (train['ID'] > high)].shape
Out[65]:
(                     ID   bedroom  bathroom  parking_space     price
 ID             1.000000  0.213961  0.325527       0.168767  0.218169
 bedroom        0.213961  1.000000  0.229302       0.109424  0.627396
 bathroom       0.325527  0.229302  1.000000       0.179171  0.274750
 parking_space  0.168767  0.109424  0.179171       1.000000  0.144898
 price          0.218169  0.627396  0.274750       0.144898  1.000000,
 (142, 7))
In [ ]:
 
In [66]:
sns.scatterplot(data=train[~(train['ID'] < low) | (train['ID'] > high)], x='price', y='ID', hue='bedroom')
Out[66]:
<AxesSubplot:xlabel='price', ylabel='ID'>
In [67]:
train = train[~(train['ID'] < low) | (train['ID'] > high)]
train.head(2)
Out[67]:
ID loc title bedroom bathroom parking_space price
0 8.184235 Katsina Semi-detached duplex 2.0 2.0 1.0 13.955273
3 7.707512 Anambra Detached duplex 5.0 2.0 4.0 14.695265
In [68]:
sns.boxplot(x=train['ID'])
Out[68]:
<AxesSubplot:xlabel='ID'>
In [69]:
train[train['title'] == 'Mansion']
Out[69]:
ID loc title bedroom bathroom parking_space price
43 8.654169 Delta Mansion 3.0 6.0 1.0 15.256161
71 7.010312 Zamfara Mansion 1.0 1.0 3.0 14.873885
77 7.268920 Imo Mansion 2.0 2.0 3.0 14.283410
90 7.388328 Rivers Mansion 3.0 1.0 4.0 15.571653
92 9.231710 Imo Mansion 2.0 6.0 3.0 14.917547
... ... ... ... ... ... ... ...
13896 7.380256 Bayelsa Mansion 2.0 1.0 4.0 14.675163
13927 8.495152 Plateau Mansion 4.0 4.0 3.0 14.964919
13928 9.436120 Anambra Mansion 5.0 6.0 5.0 15.327209
13956 9.111072 Ogun Mansion 8.0 1.0 2.0 15.505564
13972 9.298717 Yobe Mansion 9.0 3.0 5.0 15.269816

548 rows × 7 columns

In [ ]:
 
In [70]:
print(train.loc[(train['title'] == 'Detached duplex')]['bedroom'].mode())
print(train.loc[(train['title'] == 'Bungalow')]['bedroom'].mode())
print(train.loc[(train['title'] == 'Cottage')]['bedroom'].mode())
0    5.0
Name: bedroom, dtype: float64
0    2.0
Name: bedroom, dtype: float64
0    1.0
Name: bedroom, dtype: float64
In [71]:
mansion_bed1 = train.loc[(train['title'] == 'Mansion') & (train['bedroom'] == 1)]
mansion_bed1.head(), mansion_bed1.shape
Out[71]:
(           ID          loc    title  bedroom  bathroom  parking_space  \
 71   7.010312      Zamfara  Mansion      1.0       1.0            3.0   
 250  9.369137        Benue  Mansion      1.0       5.0            2.0   
 263  7.520776      Katsina  Mansion      1.0       1.0            2.0   
 439  9.164925  Cross River  Mansion      1.0       4.0            4.0   
 553  9.469854        Gombe  Mansion      1.0       1.0            4.0   
 
          price  
 71   14.873885  
 250  14.636338  
 263  14.901996  
 439  14.957212  
 553  14.544841  ,
 (79, 7))
In [73]:
#check correlation when mansions with weirdly 1 bedroom are dropped

corrmat= train.drop(mansion_bed1.index.to_list(), axis=0).corr()
corrmat["price"].sort_values()
plt.figure(figsize=(15,7))  
sns.heatmap(corrmat,annot=True,center=0)
Out[73]:
<AxesSubplot:>
In [74]:
train = train.drop(mansion_bed1.index.to_list(), axis=0)
In [75]:
train.head()
Out[75]:
ID loc title bedroom bathroom parking_space price
0 8.184235 Katsina Semi-detached duplex 2.0 2.0 1.0 13.955273
3 7.707512 Anambra Detached duplex 5.0 2.0 4.0 14.695265
7 8.007700 Katsina Penthouse 3.0 3.0 5.0 14.529983
10 9.439386 Ogun Bungalow 1.0 2.0 6.0 14.100850
11 7.872836 Bayelsa Apartment 3.0 4.0 2.0 14.453025
In [76]:
# train[['bedroom', 'bathroom', 'parking_space']] = train[['bedroom', 'bathroom', 'parking_space']].fillna(0)
In [77]:
# c = train.copy()
In [78]:
# c[['bedroom', 'bathroom', 'parking_space']] = c[['bedroom', 'bathroom', 'parking_space']].fillna(0)
In [79]:
# c
In [80]:
train.dropna(inplace=True)
train.reset_index(drop=True, inplace=True)
In [81]:
train
Out[81]:
ID loc title bedroom bathroom parking_space price
0 8.184235 Katsina Semi-detached duplex 2.0 2.0 1.0 13.955273
1 7.707512 Anambra Detached duplex 5.0 2.0 4.0 14.695265
2 8.007700 Katsina Penthouse 3.0 3.0 5.0 14.529983
3 9.439386 Ogun Bungalow 1.0 2.0 6.0 14.100850
4 7.872836 Bayelsa Apartment 3.0 4.0 2.0 14.453025
... ... ... ... ... ... ... ...
5428 9.221082 Kebbi Terrace duplex 4.0 7.0 2.0 14.273607
5429 8.878358 Kebbi Penthouse 8.0 7.0 3.0 14.942516
5430 8.138565 Ogun Cottage 1.0 2.0 2.0 14.226529
5431 9.257033 Taraba Detached duplex 8.0 1.0 6.0 14.858328
5432 9.322865 Plateau Bungalow 8.0 6.0 5.0 14.693814

5433 rows × 7 columns

In [ ]:
 
In [82]:
build_types = train['title'].unique()
build_types
Out[82]:
array(['Semi-detached duplex', 'Detached duplex', 'Penthouse', 'Bungalow',
       'Apartment', 'Terrace duplex', 'Townhouse', 'Flat', 'Mansion',
       'Cottage'], dtype=object)
In [ ]:
 
In [83]:
#check house title stats by bedroom
def check_median_house(build_types: list, df):
    medians = []
    for i, build_type in enumerate(build_types):
        medians.append(df.loc[df['title'] == build_type]['bedroom'].median())
        print(build_type, medians[i])
    return medians

def check_mode_house(build_types: list, df):
    for build_type in build_types:
        print(build_type, df.loc[df['title'] == build_type]['bedroom'].mode())
    return
In [84]:
# data = {
#     'build_types': build_types,
#     'df': train
# }

# medians = check_median_house(**data)
# print('=======')
# check_mode_house(**data)
In [85]:
check_nan_percentage(train)
Out[85]:
ID               0.0
loc              0.0
title            0.0
bedroom          0.0
bathroom         0.0
parking_space    0.0
price            0.0
dtype: float64
In [86]:
train.describe().T
Out[86]:
count mean std min 25% 50% 75% max
ID 5433.0 8.132588 0.993830 5.123964 7.499977 8.218518 9.000360 9.472551
bedroom 5433.0 4.339039 2.411370 1.000000 2.000000 4.000000 6.000000 9.000000
bathroom 5433.0 3.165286 2.044947 1.000000 1.000000 2.000000 5.000000 7.000000
parking_space 5433.0 3.162157 1.620136 1.000000 2.000000 3.000000 4.000000 6.000000
price 5433.0 14.464770 0.428951 13.271034 14.150462 14.451082 14.756224 15.676339
In [87]:
train.describe(include='O').T
Out[87]:
count unique top freq
loc 5433 36 Imo 170
title 5433 10 Flat 641
In [88]:
def feature_eng(df):
    df['bed_per_bath'] = df['bedroom']/(df['bathroom'])
    df['bed_per_park'] = df['bedroom']/df['parking_space']
    df['allrooms'] = df['bedroom'] + df['bathroom'] + 1
    df['IDbath'] = df['ID'] * df['bathroom']
    df['IDbed'] = df['ID'] * df['bedroom']
    return df.sample(5)
In [89]:
train_locprice_std = train.groupby('loc')['price'].std().astype(np.float16)
train_titleprice_std = train.groupby('title')['price'].std().astype(np.float16)
train_bedprice_std = train.groupby('bedroom')['price'].std().astype(np.float16)
In [90]:
#shameful naming

train['x'] = train['loc'].map(train_locprice_std)
train['y'] = train['title'].map(train_titleprice_std)
train['z'] = train['bedroom'].map(train_bedprice_std)

train.head()
Out[90]:
ID loc title bedroom bathroom parking_space price x y z
0 8.184235 Katsina Semi-detached duplex 2.0 2.0 1.0 13.955273 0.392090 0.364746 0.351807
1 7.707512 Anambra Detached duplex 5.0 2.0 4.0 14.695265 0.406982 0.347900 0.322266
2 8.007700 Katsina Penthouse 3.0 3.0 5.0 14.529983 0.392090 0.324951 0.365723
3 9.439386 Ogun Bungalow 1.0 2.0 6.0 14.100850 0.353027 0.358643 0.333008
4 7.872836 Bayelsa Apartment 3.0 4.0 2.0 14.453025 0.345947 0.340576 0.365723
In [91]:
train.corr()['price']
Out[91]:
ID               0.219236
bedroom          0.654641
bathroom         0.278323
parking_space    0.148138
price            1.000000
x               -0.222529
y               -0.124272
z               -0.542893
Name: price, dtype: float64
In [92]:
test['x'] = test['loc'].map(train_locprice_std)
test['y'] = test['title'].map(train_titleprice_std)
test['z'] = test['bedroom'].map(train_bedprice_std)
In [93]:
test.head()
Out[93]:
ID loc title bedroom bathroom parking_space x y z
0 6.740519 Kano Penthouse 4 1 2 0.384033 0.324951 0.348633
1 7.562681 Adamawa Apartment 2 2 4 0.395752 0.340576 0.351807
2 9.279773 Adamawa Bungalow 2 7 2 0.395752 0.358643 0.351807
3 9.399058 Lagos Mansion 9 5 2 0.365967 0.358887 0.248535
4 9.413689 Gombe Semi-detached duplex 5 6 1 0.439209 0.364746 0.322266
In [94]:
#combine train and test dataset
train['marker'] = 'train'
test['marker'] = 'test'

combo = pd.concat([train, test], axis=0)
combo.head()
Out[94]:
ID loc title bedroom bathroom parking_space price x y z marker
0 8.184235 Katsina Semi-detached duplex 2.0 2.0 1.0 13.955273 0.392090 0.364746 0.351807 train
1 7.707512 Anambra Detached duplex 5.0 2.0 4.0 14.695265 0.406982 0.347900 0.322266 train
2 8.007700 Katsina Penthouse 3.0 3.0 5.0 14.529983 0.392090 0.324951 0.365723 train
3 9.439386 Ogun Bungalow 1.0 2.0 6.0 14.100850 0.353027 0.358643 0.333008 train
4 7.872836 Bayelsa Apartment 3.0 4.0 2.0 14.453025 0.345947 0.340576 0.365723 train
In [95]:
feature_eng(combo)
Out[95]:
ID loc title bedroom bathroom parking_space price x y z marker bed_per_bath bed_per_park allrooms IDbath IDbed
4081 8.167636 Kebbi Detached duplex 4.0 2.0 2.0 NaN 0.490723 0.347900 0.348633 test 2.000000 2.00 7.0 16.335271 32.670543
2403 7.383989 Katsina Terrace duplex 4.0 4.0 4.0 14.345581 0.392090 0.366211 0.348633 train 1.000000 1.00 9.0 29.535958 29.535958
1839 9.123693 Gombe Terrace duplex 6.0 3.0 6.0 NaN 0.439209 0.366211 0.288330 test 2.000000 1.00 10.0 27.371078 54.742155
2769 8.385261 Plateau Apartment 5.0 7.0 5.0 14.332247 0.400879 0.340576 0.322266 train 0.714286 1.00 13.0 58.696824 41.926303
3098 8.992930 Sokoto Penthouse 3.0 3.0 4.0 14.397119 0.369629 0.324951 0.365723 train 1.000000 0.75 7.0 26.978791 26.978791
In [96]:
corrmat= combo[combo['marker'] == 'train'].corr()
corrmat["price"].sort_values()
plt.figure(figsize=(15,7))
sns.heatmap(corrmat,annot=True,center=0)
Out[96]:
<AxesSubplot:>
In [97]:
bed_median = np.median(combo['bedroom'])
In [ ]:
 
In [99]:
#flag rows where the bathroom count is high relative to the bedroom count (which also accounts for #ofbed == #ofbath): privacy ensured; this also catches rows where bath may have been entered for bed
combo.loc[(combo['bathroom'] > (combo['bedroom']-(bed_median-1)))]
Out[99]:
ID loc title bedroom bathroom parking_space price x y z marker bed_per_bath bed_per_park allrooms IDbath IDbed
0 8.184235 Katsina Semi-detached duplex 2.0 2.0 1.0 13.955273 0.392090 0.364746 0.351807 train 1.000000 2.000000 5.0 16.368470 16.368470
2 8.007700 Katsina Penthouse 3.0 3.0 5.0 14.529983 0.392090 0.324951 0.365723 train 1.000000 0.600000 7.0 24.023100 24.023100
3 9.439386 Ogun Bungalow 1.0 2.0 6.0 14.100850 0.353027 0.358643 0.333008 train 0.500000 0.166667 4.0 18.878773 9.439386
4 7.872836 Bayelsa Apartment 3.0 4.0 2.0 14.453025 0.345947 0.340576 0.365723 train 0.750000 1.500000 8.0 31.491345 23.618509
5 8.268219 Abia Terrace duplex 3.0 3.0 3.0 14.073091 0.411133 0.366211 0.365723 train 1.000000 1.000000 7.0 24.804657 24.804657
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5993 7.588324 Anambra Flat 1.0 7.0 1.0 NaN 0.406982 0.369385 0.333008 test 0.142857 1.000000 9.0 53.118266 7.588324
5994 9.333708 Katsina Detached duplex 3.0 5.0 5.0 NaN 0.392090 0.347900 0.365723 test 0.600000 0.600000 9.0 46.668539 28.001123
5995 7.374629 Ekiti Flat 4.0 5.0 2.0 NaN 0.387207 0.369385 0.348633 test 0.800000 2.000000 10.0 36.873145 29.498516
5996 7.790282 Adamawa Terrace duplex 5.0 7.0 1.0 NaN 0.395752 0.366211 0.322266 test 0.714286 5.000000 13.0 54.531977 38.951412
5998 9.154405 Bauchi Flat 3.0 7.0 5.0 NaN 0.391357 0.369385 0.365723 test 0.428571 0.600000 11.0 64.080833 27.463214

7719 rows × 16 columns

In [100]:
combo.loc[(combo['bathroom'] > (combo['bedroom']-(bed_median-1))), 'privacy'] = 1
combo.loc[~(combo['bathroom'] > (combo['bedroom']-(bed_median-1))), 'privacy'] = 0
In [101]:
combo.loc[:, 'privacy'] = combo['privacy'].astype('category')
combo.loc[:, 'privacy'] = combo['privacy'].cat.codes
In [102]:
combo.sample(5, random_state=seed)
Out[102]:
ID loc title bedroom bathroom parking_space price x y z marker bed_per_bath bed_per_park allrooms IDbath IDbed privacy
3082 8.686598 Osun Penthouse 1.0 4.0 1.0 14.328241 0.390137 0.324951 0.333008 train 0.25 1.0 6.0 34.746393 8.686598 1
5791 8.484463 Adamawa Flat 9.0 4.0 1.0 NaN 0.395752 0.369385 0.248535 test 2.25 9.0 14.0 33.937853 76.360170 0
4890 8.893298 Edo Terrace duplex 1.0 4.0 1.0 14.078313 0.364502 0.366211 0.333008 train 0.25 1.0 6.0 35.573193 8.893298 1
3723 7.248504 Delta Semi-detached duplex 3.0 3.0 2.0 NaN 0.359619 0.364746 0.365723 test 1.00 1.5 7.0 21.745512 21.745512 1
5362 7.322510 Ogun Mansion 4.0 1.0 1.0 14.698166 0.353027 0.358887 0.348633 train 4.00 4.0 6.0 7.322510 29.290042 0
In [ ]:
 
In [104]:
combo[combo['marker'] == 'train'].corr()['price']
Out[104]:
ID               0.219236
bedroom          0.654641
bathroom         0.278323
parking_space    0.148138
price            1.000000
x               -0.222529
y               -0.124272
z               -0.542893
bed_per_bath     0.227709
bed_per_park     0.326153
allrooms         0.612600
IDbath           0.285645
IDbed            0.651088
privacy         -0.329917
Name: price, dtype: float64
In [ ]:
 
In [105]:
#same technique as above; if there is sufficient parking space then the area is big (otherwise okay/low, likely shared)
combo.loc[(combo['parking_space'] >= (combo['bedroom']-(bed_median-1)))]
Out[105]:
ID loc title bedroom bathroom parking_space price x y z marker bed_per_bath bed_per_park allrooms IDbath IDbed privacy
0 8.184235 Katsina Semi-detached duplex 2.0 2.0 1.0 13.955273 0.392090 0.364746 0.351807 train 1.000000 2.000000 5.0 16.368470 16.368470 1
1 7.707512 Anambra Detached duplex 5.0 2.0 4.0 14.695265 0.406982 0.347900 0.322266 train 2.500000 1.250000 8.0 15.415024 38.537561 0
2 8.007700 Katsina Penthouse 3.0 3.0 5.0 14.529983 0.392090 0.324951 0.365723 train 1.000000 0.600000 7.0 24.023100 24.023100 1
3 9.439386 Ogun Bungalow 1.0 2.0 6.0 14.100850 0.353027 0.358643 0.333008 train 0.500000 0.166667 4.0 18.878773 9.439386 1
4 7.872836 Bayelsa Apartment 3.0 4.0 2.0 14.453025 0.345947 0.340576 0.365723 train 0.750000 1.500000 8.0 31.491345 23.618509 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5994 9.333708 Katsina Detached duplex 3.0 5.0 5.0 NaN 0.392090 0.347900 0.365723 test 0.600000 0.600000 9.0 46.668539 28.001123 1
5995 7.374629 Ekiti Flat 4.0 5.0 2.0 NaN 0.387207 0.369385 0.348633 test 0.800000 2.000000 10.0 36.873145 29.498516 1
5997 9.229751 Oyo Townhouse 4.0 1.0 4.0 NaN 0.374268 0.340820 0.348633 test 4.000000 1.000000 6.0 9.229751 36.919003 0
5998 9.154405 Bauchi Flat 3.0 7.0 5.0 NaN 0.391357 0.369385 0.365723 test 0.428571 0.600000 11.0 64.080833 27.463214 1
5999 9.370502 Sokoto Mansion 6.0 1.0 6.0 NaN 0.369629 0.358887 0.288330 test 6.000000 1.000000 8.0 9.370502 56.223009 0

9132 rows × 17 columns

In [106]:
combo.loc[(combo['parking_space'] > (combo['bedroom']-(bed_median-1))), 'space_area'] = 1
combo.loc[~(combo['parking_space'] > (combo['bedroom']-(bed_median-1))), 'space_area'] = 0

combo.loc[:, 'space_area'] = combo['space_area'].astype('category')
combo.loc[:, 'space_area'] = combo['space_area'].cat.codes
In [107]:
#using cat.codes here, as high privacy/space_area is more valued
In [108]:
combo[combo['marker'] == 'train'].corr()
Out[108]:
ID bedroom bathroom parking_space price x y z bed_per_bath bed_per_park allrooms IDbath IDbed privacy space_area
ID 1.000000 0.217131 0.323989 0.170371 0.219236 0.003541 -0.010281 -0.237539 -0.056961 0.065502 0.338318 0.461212 0.397860 -0.028593 -0.124479
bedroom 0.217131 1.000000 0.232718 0.110751 0.654641 0.033225 0.025427 -0.859055 0.455004 0.550241 0.823535 0.240099 0.975767 -0.613352 -0.696406
bathroom 0.323989 0.232718 1.000000 0.179109 0.278323 -0.000614 0.037360 -0.250940 -0.570146 0.069874 0.743342 0.983160 0.268046 0.305499 -0.132872
parking_space 0.170371 0.110751 0.179109 1.000000 0.148138 0.014755 -0.021144 -0.128378 -0.040429 -0.594174 0.180645 0.185752 0.132372 0.000351 0.305391
price 0.219236 0.654641 0.278323 0.148138 1.000000 -0.222529 -0.124272 -0.542893 0.227709 0.326153 0.612600 0.285645 0.651088 -0.329917 -0.431993
x 0.003541 0.033225 -0.000614 0.014755 -0.222529 1.000000 0.011372 -0.038963 0.021527 0.011105 0.022494 -0.000614 0.032510 -0.038860 -0.012484
y -0.010281 0.025427 0.037360 -0.021144 -0.124272 0.011372 1.000000 -0.008649 -0.016124 0.007684 0.039280 0.033399 0.023190 0.005129 -0.002831
z -0.237539 -0.859055 -0.250940 -0.128378 -0.542893 -0.038963 -0.008649 1.000000 -0.319930 -0.446707 -0.737222 -0.259105 -0.855180 0.538902 0.666233
bed_per_bath -0.056961 0.455004 -0.570146 -0.040429 0.227709 0.021527 -0.016124 -0.319930 1.000000 0.283375 -0.019605 -0.547151 0.415152 -0.703524 -0.340727
bed_per_park 0.065502 0.550241 0.069874 -0.594174 0.326153 0.011105 0.007684 -0.446707 0.283375 1.000000 0.419209 0.072183 0.526055 -0.384445 -0.705099
allrooms 0.338318 0.823535 0.743342 0.180645 0.612600 0.022494 0.039280 -0.737222 -0.019605 0.419209 1.000000 0.738596 0.827473 -0.243669 -0.556486
IDbath 0.461212 0.240099 0.983160 0.185752 0.285645 -0.000614 0.033399 -0.259105 -0.547151 0.072183 0.738596 1.000000 0.305236 0.288969 -0.136429
IDbed 0.397860 0.975767 0.268046 0.132372 0.651088 0.032510 0.023190 -0.855180 0.415152 0.526055 0.827473 0.305236 1.000000 -0.573772 -0.674687
privacy -0.028593 -0.613352 0.305499 0.000351 -0.329917 -0.038860 0.005129 0.538902 -0.703524 -0.384445 -0.243669 0.288969 -0.573772 1.000000 0.474285
space_area -0.124479 -0.696406 -0.132872 0.305391 -0.431993 -0.012484 -0.002831 0.666233 -0.340727 -0.705099 -0.556486 -0.136429 -0.674687 0.474285 1.000000
In [109]:
combo.drop(['IDbath', 'IDbed'], axis=1, inplace=True)
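Dropping `IDbath` and `IDbed` follows from the correlation matrix above — both are near-duplicates of `bathroom` and `bedroom` (r ≈ 0.98). As a hedged sketch on toy data (the 0.95 threshold is an illustrative choice, not the notebook's), the same decision can be automated:

```python
import numpy as np
import pandas as pd

# toy frame: 'b' is an exact multiple of 'a', 'c' is unrelated
df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [2, 4, 6, 8, 10],
    'c': [5, 3, 8, 1, 9],
})

corr = df.corr().abs()
# keep only the upper triangle so each pair is counted once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
# to_drop == ['b']
df = df.drop(columns=to_drop)
```

The upper-triangle mask matters: without it each correlated pair would flag both of its members for dropping.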
In [110]:
combo['loc'].unique()
Out[110]:
array(['Katsina', 'Anambra', 'Ogun', 'Bayelsa', 'Abia', 'Rivers',
       'Ebonyi', 'Enugu', 'Edo', 'Kwara', 'Kano', 'Osun', 'Delta',
       'Benue', 'Kogi', 'Cross River', 'Adamawa', 'Taraba', 'Oyo',
       'Kaduna', 'Sokoto', 'Imo', 'Jigawa', 'Ondo', 'Nasarawa', 'Borno',
       'Lagos', 'Gombe', 'Zamfara', 'Yobe', 'Kebbi', 'Akwa Ibom', 'Niger',
       'Ekiti', 'Bauchi', 'Plateau'], dtype=object)
In [111]:
def group_by_zone(state):
    # map a (lower-cased) state name to its geopolitical zone
    zones = {
        'North Central': ['benue', 'kogi', 'kwara', 'nasarawa', 'niger', 'plateau', 'fct'],
        'North East': ['adamawa', 'bauchi', 'borno', 'gombe', 'taraba', 'yobe'],
        'North West': ['jigawa', 'kaduna', 'kano', 'katsina', 'kebbi', 'sokoto', 'zamfara'],
        'South East': ['abia', 'anambra', 'ebonyi', 'enugu', 'imo'],
        'South South': ['akwa ibom', 'bayelsa', 'cross river', 'delta', 'edo', 'rivers'],
        'South West': ['ekiti', 'lagos', 'ogun', 'ondo', 'osun', 'oyo']
    }
    
    for zone, states in zones.items():
        if state in states:
            return zone
    return None
In [112]:
group_by_zone('ekiti'),group_by_zone('lagos'),group_by_zone('adamawa')
Out[112]:
('South West', 'South West', 'North East')
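A design note: since a state's zone never changes, the per-call loop can be replaced by a flat state→zone lookup built once. A sketch, assuming the same `zones` dict as above:

```python
zones = {
    'North Central': ['benue', 'kogi', 'kwara', 'nasarawa', 'niger', 'plateau', 'fct'],
    'North East': ['adamawa', 'bauchi', 'borno', 'gombe', 'taraba', 'yobe'],
    'North West': ['jigawa', 'kaduna', 'kano', 'katsina', 'kebbi', 'sokoto', 'zamfara'],
    'South East': ['abia', 'anambra', 'ebonyi', 'enugu', 'imo'],
    'South South': ['akwa ibom', 'bayelsa', 'cross river', 'delta', 'edo', 'rivers'],
    'South West': ['ekiti', 'lagos', 'ogun', 'ondo', 'osun', 'oyo'],
}

# invert the dict once: state -> zone
state_to_zone = {state: zone for zone, states in zones.items() for state in states}

def group_by_zone(state):
    # O(1) lookup; returns None for unknown states
    return state_to_zone.get(state)
```

The behaviour is identical; it just avoids re-scanning six lists on every `.apply` call.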
In [113]:
combo['loc'] = combo['loc'].str.lower()
combo.head(2)
Out[113]:
ID loc title bedroom bathroom parking_space price x y z marker bed_per_bath bed_per_park allrooms privacy space_area
0 8.184235 katsina Semi-detached duplex 2.0 2.0 1.0 13.955273 0.392090 0.364746 0.351807 train 1.0 2.00 5.0 1 1
1 7.707512 anambra Detached duplex 5.0 2.0 4.0 14.695265 0.406982 0.347900 0.322266 train 2.5 1.25 8.0 0 1
In [114]:
combo['zone'] = combo['loc'].apply(group_by_zone)
combo[['loc', 'zone']].head()
Out[114]:
loc zone
0 katsina North West
1 anambra South East
2 katsina North West
3 ogun South West
4 bayelsa South South
In [115]:
combo.head(3)
Out[115]:
ID loc title bedroom bathroom parking_space price x y z marker bed_per_bath bed_per_park allrooms privacy space_area zone
0 8.184235 katsina Semi-detached duplex 2.0 2.0 1.0 13.955273 0.392090 0.364746 0.351807 train 1.0 2.00 5.0 1 1 North West
1 7.707512 anambra Detached duplex 5.0 2.0 4.0 14.695265 0.406982 0.347900 0.322266 train 2.5 1.25 8.0 0 1 South East
2 8.007700 katsina Penthouse 3.0 3.0 5.0 14.529983 0.392090 0.324951 0.365723 train 1.0 0.60 7.0 1 1 North West
In [116]:
combo.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 11433 entries, 0 to 5999
Data columns (total 17 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   ID             11433 non-null  float64
 1   loc            11433 non-null  object 
 2   title          11433 non-null  object 
 3   bedroom        11433 non-null  float64
 4   bathroom       11433 non-null  float64
 5   parking_space  11433 non-null  float64
 6   price          5433 non-null   float64
 7   x              11433 non-null  float16
 8   y              11433 non-null  float16
 9   z              11433 non-null  float16
 10  marker         11433 non-null  object 
 11  bed_per_bath   11433 non-null  float64
 12  bed_per_park   11433 non-null  float64
 13  allrooms       11433 non-null  float64
 14  privacy        11433 non-null  int8   
 15  space_area     11433 non-null  int8   
 16  zone           11433 non-null  object 
dtypes: float16(3), float64(8), int8(2), object(4)
memory usage: 1.2+ MB
In [117]:
cat_cols = combo.loc[:, combo.dtypes == 'object'].columns
cat_cols = list(cat_cols)
cat_cols
Out[117]:
['loc', 'title', 'marker', 'zone']
In [118]:
cat_cols.remove('marker')
cat_cols
Out[118]:
['loc', 'title', 'zone']
In [119]:
set(combo.columns)
Out[119]:
{'ID',
 'allrooms',
 'bathroom',
 'bed_per_bath',
 'bed_per_park',
 'bedroom',
 'loc',
 'marker',
 'parking_space',
 'price',
 'privacy',
 'space_area',
 'title',
 'x',
 'y',
 'z',
 'zone'}
In [120]:
num_cols_ = set(combo.columns) - set(cat_cols)
In [121]:
num_cols_ = list(num_cols_)
num_cols_
Out[121]:
['bedroom',
 'parking_space',
 'space_area',
 'bathroom',
 'price',
 'z',
 'x',
 'bed_per_bath',
 'ID',
 'y',
 'bed_per_park',
 'marker',
 'allrooms',
 'privacy']
In [122]:
num_cols_.remove('marker')
num_cols_.remove('price')
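One caveat with deriving `num_cols_` via `set` subtraction: Python sets don't preserve insertion order, so the list handed to the ColumnTransformer can come out in a different order between runs. A sketch of an order-stable alternative, using the column names from `combo` above:

```python
columns = ['ID', 'loc', 'title', 'bedroom', 'bathroom', 'parking_space',
           'price', 'x', 'y', 'z', 'marker', 'bed_per_bath', 'bed_per_park',
           'allrooms', 'privacy', 'space_area', 'zone']
cat_cols = ['loc', 'title', 'zone']
excluded = set(cat_cols) | {'marker', 'price'}

# list comprehension preserves the DataFrame's column order deterministically
num_cols_ = [c for c in columns if c not in excluded]
```

Since the transformed arrays are later re-wrapped with `get_feature_names_out()`, the nondeterminism is harmless here, but a fixed order makes diffs and saved artifacts reproducible.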
In [123]:
#num_cols_.remove('ID')
In [124]:
# _num_cols_ = ['parking_space', 'allrooms', 'bed_per_bath','IDbed',
#  'privacy',
#  'space_area',
#  'bed_per_park',
#  'bathroom',
#  'IDbath',
#  'bedroom',
#  ]
# # cat_cols = 
In [125]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
In [126]:
cat_pipeline = Pipeline(
    steps=[(
        # note: sparse=False is spelled sparse_output=False from scikit-learn 1.2 on
        'one_hot', OneHotEncoder(handle_unknown='ignore', sparse=False)
    )]
)

num_pipeline = Pipeline(
    steps=[(
        'st_scaler', StandardScaler()
    )]
)


columns_transformed = ColumnTransformer(transformers=[
    ('num_pipe', num_pipeline, num_cols_),
    ('cat_pipe', cat_pipeline, cat_cols)], n_jobs=-1
)


# preview the composed transformer
columns_transformed
Out[126]:
ColumnTransformer(n_jobs=-1,
                  transformers=[('num_pipe',
                                 Pipeline(steps=[('st_scaler',
                                                  StandardScaler())]),
                                 ['bedroom', 'parking_space', 'space_area',
                                  'bathroom', 'z', 'x', 'bed_per_bath', 'ID',
                                  'y', 'bed_per_park', 'allrooms', 'privacy']),
                                ('cat_pipe',
                                 Pipeline(steps=[('one_hot',
                                                  OneHotEncoder(handle_unknown='ignore',
                                                                sparse=False))]),
                                 ['loc', 'title', 'zone'])])
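Worth noting why `handle_unknown='ignore'` is set: a category that appears only at prediction time (say, a state missing from the training split) encodes to an all-zero row instead of raising an error. A minimal sketch:

```python
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(handle_unknown='ignore')
enc.fit([['lagos'], ['ogun'], ['kano']])

# a seen category gets its single one-hot column ...
seen = enc.transform([['lagos']]).toarray()
# ... an unseen one quietly encodes to all zeros
unseen = enc.transform([['abuja']]).toarray()
```

Here that safety net costs nothing, since train and test are concatenated in `combo` before encoding anyway.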
In [127]:
_train = combo[combo['marker'] == 'train']
_test = combo[combo['marker'] == 'test']
In [128]:
_train.shape, _test.shape
Out[128]:
((5433, 17), (6000, 17))
In [129]:
cols_to_use = cat_cols + num_cols_
# cols_to_use.remove('ID')
In [130]:
cols_to_use
Out[130]:
['loc',
 'title',
 'zone',
 'bedroom',
 'parking_space',
 'space_area',
 'bathroom',
 'z',
 'x',
 'bed_per_bath',
 'ID',
 'y',
 'bed_per_park',
 'allrooms',
 'privacy']
In [131]:
X = _train[cols_to_use]
y = _train['price']
In [132]:
y.head(3)
Out[132]:
0    13.955273
1    14.695265
2    14.529983
Name: price, dtype: float64
In [133]:
# #uncomment to save

# y_ = pd.DataFrame(y, columns=['price'])
# y_.to_csv('processed_trainY.csv')
In [134]:
from sklearn.model_selection import train_test_split as TTS
In [135]:
# split off a validation set; seed is the global random seed set earlier

Xtrain, Xtest, ytrain, ytest = TTS(X, y, test_size=.2, random_state=seed)
In [136]:
Xtrain.head(2)
Out[136]:
loc title zone bedroom parking_space space_area bathroom z x bed_per_bath ID y bed_per_park allrooms privacy
5017 imo Mansion South East 9.0 6.0 0 1.0 0.248535 0.368408 9.0 8.861634 0.358887 1.500000 11.0 0
1612 nasarawa Flat North Central 2.0 3.0 1 5.0 0.351807 0.377930 0.4 9.092457 0.369385 0.666667 8.0 1
In [137]:
_Xtrain = columns_transformed.fit_transform(Xtrain)
_Xtest = columns_transformed.transform(Xtest)
_test = columns_transformed.transform(_test[cols_to_use])
In [138]:
_Xtrain.shape, _Xtest.shape, _test.shape
Out[138]:
((4346, 64), (1087, 64), (6000, 64))
In [140]:
#store the column names after transformation
cols_transformed_names = list(columns_transformed.get_feature_names_out())
In [141]:
Xtrain = pd.DataFrame(_Xtrain, columns=cols_transformed_names)
Xtest = pd.DataFrame(_Xtest, columns=cols_transformed_names)
test_df = pd.DataFrame(_test, columns=cols_transformed_names)
In [142]:
Xtrain.columns
Out[142]:
Index(['num_pipe__bedroom', 'num_pipe__parking_space', 'num_pipe__space_area',
       'num_pipe__bathroom', 'num_pipe__z', 'num_pipe__x',
       'num_pipe__bed_per_bath', 'num_pipe__ID', 'num_pipe__y',
       'num_pipe__bed_per_park', 'num_pipe__allrooms', 'num_pipe__privacy',
       'cat_pipe__loc_abia', 'cat_pipe__loc_adamawa',
       'cat_pipe__loc_akwa ibom', 'cat_pipe__loc_anambra',
       'cat_pipe__loc_bauchi', 'cat_pipe__loc_bayelsa', 'cat_pipe__loc_benue',
       'cat_pipe__loc_borno', 'cat_pipe__loc_cross river',
       'cat_pipe__loc_delta', 'cat_pipe__loc_ebonyi', 'cat_pipe__loc_edo',
       'cat_pipe__loc_ekiti', 'cat_pipe__loc_enugu', 'cat_pipe__loc_gombe',
       'cat_pipe__loc_imo', 'cat_pipe__loc_jigawa', 'cat_pipe__loc_kaduna',
       'cat_pipe__loc_kano', 'cat_pipe__loc_katsina', 'cat_pipe__loc_kebbi',
       'cat_pipe__loc_kogi', 'cat_pipe__loc_kwara', 'cat_pipe__loc_lagos',
       'cat_pipe__loc_nasarawa', 'cat_pipe__loc_niger', 'cat_pipe__loc_ogun',
       'cat_pipe__loc_ondo', 'cat_pipe__loc_osun', 'cat_pipe__loc_oyo',
       'cat_pipe__loc_plateau', 'cat_pipe__loc_rivers', 'cat_pipe__loc_sokoto',
       'cat_pipe__loc_taraba', 'cat_pipe__loc_yobe', 'cat_pipe__loc_zamfara',
       'cat_pipe__title_Apartment', 'cat_pipe__title_Bungalow',
       'cat_pipe__title_Cottage', 'cat_pipe__title_Detached duplex',
       'cat_pipe__title_Flat', 'cat_pipe__title_Mansion',
       'cat_pipe__title_Penthouse', 'cat_pipe__title_Semi-detached duplex',
       'cat_pipe__title_Terrace duplex', 'cat_pipe__title_Townhouse',
       'cat_pipe__zone_North Central', 'cat_pipe__zone_North East',
       'cat_pipe__zone_North West', 'cat_pipe__zone_South East',
       'cat_pipe__zone_South South', 'cat_pipe__zone_South West'],
      dtype='object')
In [143]:
# from sklearn.cluster import KMeans
In [144]:
# cluster_cols = ['cat_pipe__zone_South South', 'num_pipe__bedroom', 'num_pipe__allrooms']
In [145]:
# kmeans = KMeans(n_clusters=5, random_state=seed, n_init='auto').fit(Xtrain[cluster_cols])
In [146]:
# Xtrain['cluster'] = kmeans.predict(Xtrain[cluster_cols])
# Xtest['cluster'] = kmeans.predict(Xtest[cluster_cols])
# test_df['cluster'] = kmeans.predict(test_df[cluster_cols])
In [147]:
# sns.scatterplot(x=Xtrain['cluster'], y=ytrain)
In [148]:
# Xtrain['cluster'] = Xtrain['cluster'].astype('category')
# Xtrain['cluster'] = Xtrain['cluster'].cat.codes

# Xtest['cluster'] = Xtest['cluster'].astype('category')
# Xtest['cluster'] = Xtest['cluster'].cat.codes

# test_df['cluster'] = test_df['cluster'].astype('category')
# test_df['cluster'] = test_df['cluster'].cat.codes
In [149]:
Xtrain.shape, _test.shape
Out[149]:
((4346, 64), (6000, 64))
In [165]:
# grid search: hyperparameter tuning
from sklearn import metrics

from catboost import CatBoostRegressor as CAT
from lightgbm import LGBMRegressor as LGBM
from xgboost import XGBRegressor as XGB

from sklearn.model_selection import GridSearchCV


cat = CAT(loss_function='RMSE', random_state=seed)


# l2_leaf_reg: coefficient of the L2 regularization term on leaf values
clf = GridSearchCV(cat, param_grid={
    'max_depth': [4, 5, 7, 9],
    'learning_rate': [0.025, 0.035, 0.05, 0.1],
    'n_estimators': [400, 1500],
    'l2_leaf_reg': [0.05, 0.5, 1, 5]
}, cv=5, n_jobs=-1, scoring='neg_root_mean_squared_error', verbose=0)

clf.fit(Xtrain, ytrain)
0:	learn: 0.4044025	total: 3.9ms	remaining: 1.56s
1:	learn: 0.3810597	total: 7.03ms	remaining: 1.4s
2:	learn: 0.3607123	total: 9.88ms	remaining: 1.31s
...
399:	learn: 0.0725002	total: 690ms	remaining: 0us
Out[165]:
GridSearchCV(cv=5,
             estimator=<catboost.core.CatBoostRegressor object at 0x0000024E4A0D2490>,
             n_jobs=-1,
             param_grid={'l2_leaf_reg': [0.05, 0.5, 1, 5],
                         'learning_rate': [0.025, 0.035, 0.05, 0.1],
                         'max_depth': [4, 5, 7, 9],
                         'n_estimators': [400, 1500]},
             scoring='neg_root_mean_squared_error')
In [166]:
clf.best_estimator_, clf.best_score_, clf.best_params_  # previous run: 0.09355667 with {'l2_leaf_reg': 1, 'learning_rate': 0.1, 'max_depth': 4, 'n_estimators': 400}
Out[166]:
(<catboost.core.CatBoostRegressor at 0x24e4921ba30>,
 -0.09372559894894998,
 {'l2_leaf_reg': 5, 'learning_rate': 0.1, 'max_depth': 4, 'n_estimators': 400})
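One detail worth flagging about the score above: `scoring='neg_root_mean_squared_error'` makes `GridSearchCV` maximize a *negated* RMSE, which is why `best_score_` comes back negative. A minimal illustration, using the `best_score_` value from the cell above:

```python
# GridSearchCV maximizes its scoring function, so RMSE is stored negated;
# take the absolute value to read it as an error.
best_score = -0.09372559894894998  # clf.best_score_ from the cell above
cv_rmse = abs(best_score)
print(round(cv_rmse, 4))  # → 0.0937
```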
In [158]:
list(sorted(zip(clf.best_estimator_.feature_importances_, cols_transformed_names), reverse=True))
Out[158]:
[(19.826856114277668, 'cat_pipe__title_Mansion'),
 (13.734025548060096, 'num_pipe__bedroom'),
 (12.25075699595523, 'num_pipe__z'),
 (10.022527239847616, 'num_pipe__allrooms'),
 (7.175310461594522, 'num_pipe__x'),
 (6.687459790271547, 'num_pipe__y'),
 (5.6095790781730805, 'cat_pipe__loc_lagos'),
 (4.896085257249919, 'cat_pipe__zone_South South'),
 (4.542830450649301, 'cat_pipe__title_Apartment'),
 (2.821965344271707, 'cat_pipe__title_Penthouse'),
 (2.595669489484209, 'cat_pipe__zone_South West'),
 (1.3541876852135442, 'num_pipe__ID'),
 (0.9230398767911064, 'cat_pipe__zone_North West'),
 (0.8571870262379999, 'cat_pipe__title_Cottage'),
 (0.7714379251327118, 'cat_pipe__loc_anambra'),
 (0.6865370379163105, 'cat_pipe__zone_North East'),
 (0.6599510374672051, 'cat_pipe__title_Detached duplex'),
 (0.46595333737531025, 'num_pipe__parking_space'),
 (0.45378070325310105, 'cat_pipe__title_Bungalow'),
 (0.43850185825661697, 'cat_pipe__loc_ebonyi'),
 (0.34985884415344726, 'cat_pipe__loc_zamfara'),
 (0.28073849546640606, 'num_pipe__bathroom'),
 (0.21027296381980262, 'cat_pipe__title_Flat'),
 (0.20327097743479927, 'cat_pipe__loc_kwara'),
 (0.20172216070725024, 'cat_pipe__loc_kano'),
 (0.19753747457339826, 'num_pipe__bed_per_bath'),
 (0.19674889052119818, 'num_pipe__bed_per_park'),
 (0.18642529649221806, 'cat_pipe__loc_sokoto'),
 (0.128355446186675, 'cat_pipe__loc_ondo'),
 (0.11187516088592121, 'cat_pipe__loc_kebbi'),
 (0.10688893767332795, 'cat_pipe__loc_enugu'),
 (0.09563460468386166, 'cat_pipe__loc_edo'),
 (0.08320966306462066, 'cat_pipe__loc_delta'),
 (0.08101904244614899, 'cat_pipe__loc_nasarawa'),
 (0.08096805410678497, 'cat_pipe__loc_adamawa'),
 (0.07042127048869207, 'cat_pipe__loc_bauchi'),
 (0.06682059731428176, 'cat_pipe__loc_abia'),
 (0.06193792834319466, 'cat_pipe__loc_kogi'),
 (0.052913963548734115, 'cat_pipe__loc_kaduna'),
 (0.05150559837092767, 'cat_pipe__loc_akwa ibom'),
 (0.04750967598470922, 'cat_pipe__loc_bayelsa'),
 (0.03758475043445373, 'cat_pipe__loc_gombe'),
 (0.03652771707206825, 'cat_pipe__loc_oyo'),
 (0.034774730751367784, 'cat_pipe__loc_ekiti'),
 (0.033318967036690315, 'cat_pipe__loc_jigawa'),
 (0.03064559347253121, 'num_pipe__privacy'),
 (0.029627859570519985, 'cat_pipe__loc_rivers'),
 (0.027274789393340593, 'cat_pipe__loc_yobe'),
 (0.024310758683233374, 'cat_pipe__loc_katsina'),
 (0.021408419234861996, 'cat_pipe__loc_osun'),
 (0.01833738141147518, 'cat_pipe__loc_niger'),
 (0.018244910307782736, 'cat_pipe__loc_imo'),
 (0.014167365590012299, 'cat_pipe__zone_South East'),
 (0.008599150123753486, 'num_pipe__space_area'),
 (0.00830401968081456, 'cat_pipe__loc_taraba'),
 (0.004556704814126429, 'cat_pipe__loc_plateau'),
 (0.004395734740144501, 'cat_pipe__loc_cross river'),
 (0.0034902674491263884, 'cat_pipe__title_Semi-detached duplex'),
 (0.0030361795927086603, 'cat_pipe__loc_borno'),
 (0.002117396895735483, 'cat_pipe__title_Townhouse'),
 (0.0, 'cat_pipe__zone_North Central'),
 (0.0, 'cat_pipe__title_Terrace duplex'),
 (0.0, 'cat_pipe__loc_ogun'),
 (0.0, 'cat_pipe__loc_benue')]
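A handful of one-hot columns at the bottom of that list (`zone_North Central`, `loc_ogun`, and friends) received exactly zero importance. A quick sketch of pruning them — the arrays here are stand-ins for `clf.best_estimator_.feature_importances_` and the transformed column names, not the real values:

```python
# Stand-in importances/names; in the notebook these come from
# clf.best_estimator_.feature_importances_ and cols_transformed_names.
importances = [19.83, 0.0, 0.86, 0.0]
names = ['cat_pipe__title_Mansion', 'cat_pipe__zone_North Central',
         'cat_pipe__title_Cottage', 'cat_pipe__loc_ogun']

# Keep only columns the tuned model actually used.
keep = [n for imp, n in zip(importances, names) if imp > 0]
print(keep)  # → ['cat_pipe__title_Mansion', 'cat_pipe__title_Cottage']
```

Whether dropping them helps the leaderboard score is an experiment, not a given — zero importance in one fitted model is not proof the column carries no signal.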
In [159]:
clf.best_params_.update({'random_state':seed})  #used: {'l2_leaf_reg': 0.5,'learning_rate': 0.05,'max_depth': 5, 'n_estimators': 400, 'random_state': 42}
clf.best_params_
Out[159]:
{'l2_leaf_reg': 1,
 'learning_rate': 0.1,
 'max_depth': 4,
 'n_estimators': 400,
 'random_state': 42}
In [ ]:
 
In [173]:
from sklearn.ensemble import GradientBoostingRegressor

boost = GradientBoostingRegressor(loss='squared_error', random_state=seed)

clf_boost = GridSearchCV(boost, param_grid={
    'max_depth': [3, 5, 7],
    'learning_rate': [0.05, 0.1],
    'alpha': [0.05, 0.5, 0.7],  # alpha only affects 'huber'/'quantile' losses, so it is inert with loss='squared_error'
    'max_features': [8, 30, 55, 60]
}, cv=5, n_jobs=-1, scoring='neg_root_mean_squared_error', verbose=0)

clf_boost.fit(Xtrain, ytrain)
clf_boost.best_estimator_, clf_boost.best_score_, clf_boost.best_params_ #0.094
Out[173]:
(GradientBoostingRegressor(alpha=0.05, max_depth=5, max_features=60,
                           random_state=42),
 -0.10019006020313508,
 {'alpha': 0.05, 'learning_rate': 0.1, 'max_depth': 5, 'max_features': 60})
In [174]:
clf_boost.best_params_.update({'random_state': seed})
clf_boost.best_params_
Out[174]:
{'alpha': 0.05,
 'learning_rate': 0.1,
 'max_depth': 5,
 'max_features': 60,
 'random_state': 42}
In [176]:
list(sorted(zip(clf_boost.best_estimator_.feature_importances_, Xtrain.columns), reverse=True))
Out[176]:
[(0.1903137290138727, 'num_pipe__bedroom'),
 (0.17064072491210072, 'cat_pipe__title_Mansion'),
 (0.16764065041035492, 'num_pipe__z'),
 (0.09828283903843699, 'num_pipe__allrooms'),
 (0.08142861304043052, 'num_pipe__x'),
 (0.05444199503052348, 'num_pipe__y'),
 (0.04130681521418009, 'cat_pipe__loc_lagos'),
 (0.033946359014390685, 'cat_pipe__zone_South South'),
 (0.026953904391943805, 'cat_pipe__zone_South West'),
 (0.026189822206071786, 'cat_pipe__title_Apartment'),
 (0.025146044959371765, 'cat_pipe__title_Penthouse'),
 (0.011987819777304752, 'num_pipe__ID'),
 (0.00816116349684588, 'cat_pipe__title_Detached duplex'),
 (0.007995960323085034, 'cat_pipe__title_Cottage'),
 (0.0075810899235957616, 'cat_pipe__loc_anambra'),
 (0.005880762976640448, 'cat_pipe__zone_North West'),
 (0.00572518061854685, 'cat_pipe__zone_North East'),
 (0.004114503187822084, 'num_pipe__parking_space'),
 (0.004046783664344774, 'cat_pipe__title_Bungalow'),
 (0.0033296180693536057, 'cat_pipe__loc_zamfara'),
 (0.0028894534245060993, 'cat_pipe__loc_ebonyi'),
 (0.0023070780148777194, 'num_pipe__bed_per_bath'),
 (0.002234404994655097, 'cat_pipe__title_Flat'),
 (0.002052300763574446, 'cat_pipe__loc_kano'),
 (0.001575456975430424, 'num_pipe__bed_per_park'),
 (0.0014349728581337714, 'cat_pipe__loc_kwara'),
 (0.0012615444780660452, 'cat_pipe__loc_edo'),
 (0.001167906786387391, 'num_pipe__bathroom'),
 (0.0009158864687112136, 'cat_pipe__loc_nasarawa'),
 (0.0008787563686522747, 'cat_pipe__title_Townhouse'),
 (0.000840193500967368, 'cat_pipe__loc_enugu'),
 (0.000800351991636211, 'cat_pipe__loc_kebbi'),
 (0.0007881792504909776, 'cat_pipe__loc_sokoto'),
 (0.0005902244767755395, 'cat_pipe__loc_ondo'),
 (0.0004997465505169001, 'cat_pipe__loc_adamawa'),
 (0.0004758187525404365, 'cat_pipe__loc_abia'),
 (0.00046046881789762735, 'cat_pipe__loc_bayelsa'),
 (0.00041158993874230613, 'cat_pipe__loc_katsina'),
 (0.0003512978202185523, 'cat_pipe__zone_North Central'),
 (0.00029053020377139196, 'cat_pipe__loc_taraba'),
 (0.00028385190457362467, 'cat_pipe__loc_kogi'),
 (0.00028000767546136374, 'cat_pipe__loc_gombe'),
 (0.00024399279511865053, 'cat_pipe__loc_kaduna'),
 (0.0002348568933572725, 'cat_pipe__loc_delta'),
 (0.00017384147175090724, 'cat_pipe__loc_akwa ibom'),
 (0.00016790066651363526, 'cat_pipe__zone_South East'),
 (0.00015842121210184648, 'cat_pipe__loc_ekiti'),
 (0.00014107875195050243, 'num_pipe__privacy'),
 (0.00012145700355408032, 'cat_pipe__loc_imo'),
 (0.00010941838500873316, 'cat_pipe__loc_rivers'),
 (0.00010835681236424844, 'cat_pipe__loc_niger'),
 (0.00010786348630002455, 'cat_pipe__loc_cross river'),
 (9.278834172394983e-05, 'cat_pipe__loc_borno'),
 (8.620060357316536e-05, 'cat_pipe__loc_yobe'),
 (8.084641712175252e-05, 'cat_pipe__loc_bauchi'),
 (8.055094256423472e-05, 'cat_pipe__loc_jigawa'),
 (7.389136841278812e-05, 'cat_pipe__loc_benue'),
 (3.368640610125256e-05, 'cat_pipe__title_Semi-detached duplex'),
 (1.9580219120665134e-05, 'cat_pipe__loc_ogun'),
 (1.864807843526846e-05, 'num_pipe__space_area'),
 (1.8290458788556973e-05, 'cat_pipe__loc_plateau'),
 (1.5478846920669147e-05, 'cat_pipe__loc_oyo'),
 (6.19277639614632e-06, 'cat_pipe__title_Terrace duplex'),
 (2.256777018236654e-06, 'cat_pipe__loc_osun')]
In [177]:
xgb = XGB(n_jobs=-1, random_state=seed)
cat = CAT(**clf.best_params_)
# rfc = RFC(n_jobs=-1, random_state=seed)
lgbm = LGBM(**clf.best_params_)  # note: reuses the CatBoost-tuned params wholesale, including keys LightGBM does not define (e.g. l2_leaf_reg)
gbr = GradientBoostingRegressor(**clf_boost.best_params_)
In [178]:
print(Xtrain.shape, ytrain.shape)
print(Xtest.shape, ytest.shape)
print(_test.shape)
(4346, 64) (4346,)
(1087, 64) (1087,)
(6000, 64)
In [180]:
scores = []
test_preds = []
models_ = []

train_X = Xtrain
test_X = Xtest
test_ = test_df

for model in [cat, lgbm, xgb, gbr]:
    if model is xgb:
        model.fit(train_X, ytrain, eval_set=[(test_X, ytest)])
    elif model in [cat, lgbm]:  # elif, so xgb is not silently refit without its eval_set by the else branch
        model.fit(train_X, ytrain, eval_set=(test_X, ytest))
    else:
        model.fit(train_X, ytrain)
    train_pred = model.predict(test_X)
    scores.append([model.__class__.__name__, metrics.mean_squared_error(ytest, train_pred, squared=False)])
    test_preds.append(model.predict(test_))
    models_.append(model)
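The loop accumulates each model's holdout RMSE in `scores` and its predictions on the competition test set in `test_preds`. One simple way to combine those predictions — a sketch of an unweighted mean blend, with stand-in arrays rather than the real predictions:

```python
import numpy as np

# Stand-ins for the four per-model prediction arrays collected in test_preds;
# each covers the same three hypothetical test rows.
test_preds = [
    np.array([0.10, 0.20, 0.30]),
    np.array([0.12, 0.18, 0.28]),
    np.array([0.11, 0.22, 0.31]),
    np.array([0.09, 0.20, 0.29]),
]

# Stack to shape (n_models, n_rows), then average across models.
blend = np.mean(np.vstack(test_preds), axis=0)
print(blend.round(3))
```

Weighting each model by its holdout score (or dropping the weakest) is a natural refinement, but even the plain mean often beats any single member.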
0:	learn: 0.4041663	test: 0.4021356	best: 0.4021356 (0)	total: 2.12ms	remaining: 845ms
[... CatBoost per-iteration training log trimmed (iterations 1–269) ...]
270:	learn: 0.0808082	test: 0.1096809	best: 0.1095249 (252)	total: 325ms	remaining: 155ms
271:	learn: 0.0807270	test: 0.1096464	best: 0.1095249 (252)	total: 326ms	remaining: 154ms
272:	learn: 0.0806942	test: 0.1096466	best: 0.1095249 (252)	total: 328ms	remaining: 152ms
273:	learn: 0.0805725	test: 0.1095109	best: 0.1095109 (273)	total: 329ms	remaining: 151ms
274:	learn: 0.0804954	test: 0.1094933	best: 0.1094933 (274)	total: 330ms	remaining: 150ms
275:	learn: 0.0803473	test: 0.1093743	best: 0.1093743 (275)	total: 331ms	remaining: 149ms
276:	learn: 0.0802195	test: 0.1093133	best: 0.1093133 (276)	total: 332ms	remaining: 147ms
277:	learn: 0.0800580	test: 0.1092416	best: 0.1092416 (277)	total: 333ms	remaining: 146ms
278:	learn: 0.0800139	test: 0.1092060	best: 0.1092060 (278)	total: 334ms	remaining: 145ms
279:	learn: 0.0799581	test: 0.1091992	best: 0.1091992 (279)	total: 336ms	remaining: 144ms
280:	learn: 0.0798338	test: 0.1092328	best: 0.1091992 (279)	total: 337ms	remaining: 143ms
281:	learn: 0.0797745	test: 0.1092402	best: 0.1091992 (279)	total: 339ms	remaining: 142ms
282:	learn: 0.0797432	test: 0.1091953	best: 0.1091953 (282)	total: 340ms	remaining: 141ms
283:	learn: 0.0796916	test: 0.1091297	best: 0.1091297 (283)	total: 341ms	remaining: 139ms
284:	learn: 0.0796256	test: 0.1091398	best: 0.1091297 (283)	total: 342ms	remaining: 138ms
285:	learn: 0.0795587	test: 0.1090387	best: 0.1090387 (285)	total: 343ms	remaining: 137ms
286:	learn: 0.0794361	test: 0.1090678	best: 0.1090387 (285)	total: 344ms	remaining: 135ms
287:	learn: 0.0793803	test: 0.1090518	best: 0.1090387 (285)	total: 345ms	remaining: 134ms
288:	learn: 0.0793080	test: 0.1090551	best: 0.1090387 (285)	total: 346ms	remaining: 133ms
289:	learn: 0.0791528	test: 0.1093794	best: 0.1090387 (285)	total: 347ms	remaining: 132ms
290:	learn: 0.0790887	test: 0.1093520	best: 0.1090387 (285)	total: 348ms	remaining: 130ms
291:	learn: 0.0789821	test: 0.1093748	best: 0.1090387 (285)	total: 349ms	remaining: 129ms
292:	learn: 0.0789558	test: 0.1093766	best: 0.1090387 (285)	total: 350ms	remaining: 128ms
293:	learn: 0.0788757	test: 0.1094978	best: 0.1090387 (285)	total: 351ms	remaining: 127ms
294:	learn: 0.0788250	test: 0.1095292	best: 0.1090387 (285)	total: 353ms	remaining: 126ms
295:	learn: 0.0787167	test: 0.1095005	best: 0.1090387 (285)	total: 354ms	remaining: 124ms
296:	learn: 0.0786404	test: 0.1095661	best: 0.1090387 (285)	total: 355ms	remaining: 123ms
297:	learn: 0.0786078	test: 0.1095683	best: 0.1090387 (285)	total: 356ms	remaining: 122ms
298:	learn: 0.0785690	test: 0.1095983	best: 0.1090387 (285)	total: 357ms	remaining: 121ms
299:	learn: 0.0784890	test: 0.1095712	best: 0.1090387 (285)	total: 358ms	remaining: 119ms
300:	learn: 0.0784298	test: 0.1095531	best: 0.1090387 (285)	total: 359ms	remaining: 118ms
301:	learn: 0.0783555	test: 0.1095242	best: 0.1090387 (285)	total: 360ms	remaining: 117ms
302:	learn: 0.0782543	test: 0.1095253	best: 0.1090387 (285)	total: 361ms	remaining: 116ms
303:	learn: 0.0782189	test: 0.1095064	best: 0.1090387 (285)	total: 362ms	remaining: 114ms
304:	learn: 0.0780890	test: 0.1097640	best: 0.1090387 (285)	total: 363ms	remaining: 113ms
305:	learn: 0.0780314	test: 0.1098000	best: 0.1090387 (285)	total: 364ms	remaining: 112ms
306:	learn: 0.0779651	test: 0.1098240	best: 0.1090387 (285)	total: 366ms	remaining: 111ms
307:	learn: 0.0779274	test: 0.1097944	best: 0.1090387 (285)	total: 367ms	remaining: 110ms
308:	learn: 0.0778709	test: 0.1097594	best: 0.1090387 (285)	total: 368ms	remaining: 108ms
309:	learn: 0.0778112	test: 0.1097265	best: 0.1090387 (285)	total: 369ms	remaining: 107ms
310:	learn: 0.0777392	test: 0.1097005	best: 0.1090387 (285)	total: 370ms	remaining: 106ms
311:	learn: 0.0776931	test: 0.1096900	best: 0.1090387 (285)	total: 371ms	remaining: 105ms
312:	learn: 0.0776543	test: 0.1096935	best: 0.1090387 (285)	total: 372ms	remaining: 103ms
313:	learn: 0.0776145	test: 0.1096572	best: 0.1090387 (285)	total: 373ms	remaining: 102ms
314:	learn: 0.0775602	test: 0.1096280	best: 0.1090387 (285)	total: 374ms	remaining: 101ms
315:	learn: 0.0775176	test: 0.1096576	best: 0.1090387 (285)	total: 375ms	remaining: 99.8ms
316:	learn: 0.0774579	test: 0.1096574	best: 0.1090387 (285)	total: 376ms	remaining: 98.5ms
317:	learn: 0.0773092	test: 0.1096170	best: 0.1090387 (285)	total: 377ms	remaining: 97.3ms
318:	learn: 0.0772504	test: 0.1095731	best: 0.1090387 (285)	total: 378ms	remaining: 96.1ms
319:	learn: 0.0771873	test: 0.1095794	best: 0.1090387 (285)	total: 380ms	remaining: 95ms
320:	learn: 0.0771387	test: 0.1095748	best: 0.1090387 (285)	total: 381ms	remaining: 93.8ms
321:	learn: 0.0770773	test: 0.1096046	best: 0.1090387 (285)	total: 383ms	remaining: 92.8ms
322:	learn: 0.0769867	test: 0.1095040	best: 0.1090387 (285)	total: 385ms	remaining: 91.7ms
323:	learn: 0.0769273	test: 0.1095005	best: 0.1090387 (285)	total: 386ms	remaining: 90.5ms
324:	learn: 0.0768674	test: 0.1094842	best: 0.1090387 (285)	total: 387ms	remaining: 89.3ms
325:	learn: 0.0768219	test: 0.1095822	best: 0.1090387 (285)	total: 388ms	remaining: 88.1ms
326:	learn: 0.0767553	test: 0.1095586	best: 0.1090387 (285)	total: 389ms	remaining: 86.9ms
327:	learn: 0.0766546	test: 0.1097053	best: 0.1090387 (285)	total: 390ms	remaining: 85.7ms
328:	learn: 0.0766077	test: 0.1097065	best: 0.1090387 (285)	total: 392ms	remaining: 84.5ms
329:	learn: 0.0765629	test: 0.1096836	best: 0.1090387 (285)	total: 393ms	remaining: 83.3ms
330:	learn: 0.0765068	test: 0.1096924	best: 0.1090387 (285)	total: 394ms	remaining: 82.1ms
331:	learn: 0.0764696	test: 0.1096944	best: 0.1090387 (285)	total: 395ms	remaining: 81ms
332:	learn: 0.0764181	test: 0.1096828	best: 0.1090387 (285)	total: 396ms	remaining: 79.8ms
333:	learn: 0.0763792	test: 0.1096845	best: 0.1090387 (285)	total: 398ms	remaining: 78.7ms
334:	learn: 0.0763387	test: 0.1096890	best: 0.1090387 (285)	total: 399ms	remaining: 77.5ms
335:	learn: 0.0762353	test: 0.1099751	best: 0.1090387 (285)	total: 401ms	remaining: 76.3ms
336:	learn: 0.0761228	test: 0.1099220	best: 0.1090387 (285)	total: 402ms	remaining: 75.1ms
337:	learn: 0.0760731	test: 0.1099102	best: 0.1090387 (285)	total: 403ms	remaining: 73.9ms
338:	learn: 0.0760260	test: 0.1098701	best: 0.1090387 (285)	total: 404ms	remaining: 72.7ms
339:	learn: 0.0759487	test: 0.1097652	best: 0.1090387 (285)	total: 405ms	remaining: 71.5ms
340:	learn: 0.0759018	test: 0.1097560	best: 0.1090387 (285)	total: 406ms	remaining: 70.3ms
341:	learn: 0.0758178	test: 0.1097233	best: 0.1090387 (285)	total: 407ms	remaining: 69.1ms
342:	learn: 0.0757667	test: 0.1096959	best: 0.1090387 (285)	total: 408ms	remaining: 67.9ms
343:	learn: 0.0757136	test: 0.1096878	best: 0.1090387 (285)	total: 410ms	remaining: 66.7ms
344:	learn: 0.0756636	test: 0.1096745	best: 0.1090387 (285)	total: 412ms	remaining: 65.7ms
345:	learn: 0.0755979	test: 0.1097821	best: 0.1090387 (285)	total: 414ms	remaining: 64.6ms
346:	learn: 0.0754717	test: 0.1098580	best: 0.1090387 (285)	total: 415ms	remaining: 63.4ms
347:	learn: 0.0753690	test: 0.1099916	best: 0.1090387 (285)	total: 416ms	remaining: 62.2ms
348:	learn: 0.0753065	test: 0.1098898	best: 0.1090387 (285)	total: 417ms	remaining: 61ms
349:	learn: 0.0752608	test: 0.1098556	best: 0.1090387 (285)	total: 418ms	remaining: 59.8ms
350:	learn: 0.0752130	test: 0.1098460	best: 0.1090387 (285)	total: 420ms	remaining: 58.6ms
351:	learn: 0.0751146	test: 0.1098014	best: 0.1090387 (285)	total: 421ms	remaining: 57.3ms
352:	learn: 0.0750433	test: 0.1097119	best: 0.1090387 (285)	total: 422ms	remaining: 56.1ms
353:	learn: 0.0749517	test: 0.1096523	best: 0.1090387 (285)	total: 423ms	remaining: 54.9ms
354:	learn: 0.0749208	test: 0.1096464	best: 0.1090387 (285)	total: 424ms	remaining: 53.7ms
355:	learn: 0.0748039	test: 0.1097219	best: 0.1090387 (285)	total: 425ms	remaining: 52.5ms
356:	learn: 0.0747607	test: 0.1096895	best: 0.1090387 (285)	total: 426ms	remaining: 51.3ms
357:	learn: 0.0746868	test: 0.1097147	best: 0.1090387 (285)	total: 427ms	remaining: 50.1ms
358:	learn: 0.0746453	test: 0.1096680	best: 0.1090387 (285)	total: 428ms	remaining: 48.9ms
359:	learn: 0.0745351	test: 0.1096516	best: 0.1090387 (285)	total: 430ms	remaining: 47.7ms
360:	learn: 0.0744792	test: 0.1096190	best: 0.1090387 (285)	total: 431ms	remaining: 46.5ms
361:	learn: 0.0744303	test: 0.1095743	best: 0.1090387 (285)	total: 432ms	remaining: 45.3ms
362:	learn: 0.0743779	test: 0.1095727	best: 0.1090387 (285)	total: 433ms	remaining: 44.1ms
363:	learn: 0.0742372	test: 0.1094344	best: 0.1090387 (285)	total: 434ms	remaining: 42.9ms
364:	learn: 0.0741868	test: 0.1093899	best: 0.1090387 (285)	total: 435ms	remaining: 41.7ms
365:	learn: 0.0741318	test: 0.1094603	best: 0.1090387 (285)	total: 436ms	remaining: 40.5ms
366:	learn: 0.0740437	test: 0.1095081	best: 0.1090387 (285)	total: 437ms	remaining: 39.3ms
367:	learn: 0.0740015	test: 0.1094998	best: 0.1090387 (285)	total: 438ms	remaining: 38.1ms
368:	learn: 0.0739510	test: 0.1094563	best: 0.1090387 (285)	total: 439ms	remaining: 36.9ms
369:	learn: 0.0738709	test: 0.1094638	best: 0.1090387 (285)	total: 440ms	remaining: 35.7ms
370:	learn: 0.0738019	test: 0.1095729	best: 0.1090387 (285)	total: 441ms	remaining: 34.5ms
371:	learn: 0.0737557	test: 0.1095458	best: 0.1090387 (285)	total: 443ms	remaining: 33.3ms
372:	learn: 0.0736786	test: 0.1096602	best: 0.1090387 (285)	total: 444ms	remaining: 32.1ms
373:	learn: 0.0736420	test: 0.1096754	best: 0.1090387 (285)	total: 445ms	remaining: 30.9ms
374:	learn: 0.0735951	test: 0.1096732	best: 0.1090387 (285)	total: 446ms	remaining: 29.7ms
375:	learn: 0.0735158	test: 0.1096646	best: 0.1090387 (285)	total: 447ms	remaining: 28.5ms
376:	learn: 0.0733290	test: 0.1095347	best: 0.1090387 (285)	total: 448ms	remaining: 27.3ms
377:	learn: 0.0732774	test: 0.1095442	best: 0.1090387 (285)	total: 449ms	remaining: 26.1ms
378:	learn: 0.0732155	test: 0.1095253	best: 0.1090387 (285)	total: 450ms	remaining: 25ms
379:	learn: 0.0731786	test: 0.1095054	best: 0.1090387 (285)	total: 451ms	remaining: 23.8ms
380:	learn: 0.0731490	test: 0.1094936	best: 0.1090387 (285)	total: 452ms	remaining: 22.6ms
381:	learn: 0.0731123	test: 0.1094864	best: 0.1090387 (285)	total: 453ms	remaining: 21.4ms
382:	learn: 0.0730133	test: 0.1094942	best: 0.1090387 (285)	total: 454ms	remaining: 20.2ms
383:	learn: 0.0729677	test: 0.1094586	best: 0.1090387 (285)	total: 456ms	remaining: 19ms
384:	learn: 0.0729205	test: 0.1094087	best: 0.1090387 (285)	total: 457ms	remaining: 17.8ms
385:	learn: 0.0728466	test: 0.1094079	best: 0.1090387 (285)	total: 458ms	remaining: 16.6ms
386:	learn: 0.0728255	test: 0.1093853	best: 0.1090387 (285)	total: 460ms	remaining: 15.4ms
387:	learn: 0.0727509	test: 0.1093346	best: 0.1090387 (285)	total: 461ms	remaining: 14.2ms
388:	learn: 0.0727097	test: 0.1093376	best: 0.1090387 (285)	total: 462ms	remaining: 13.1ms
389:	learn: 0.0726827	test: 0.1093439	best: 0.1090387 (285)	total: 463ms	remaining: 11.9ms
390:	learn: 0.0726486	test: 0.1093401	best: 0.1090387 (285)	total: 464ms	remaining: 10.7ms
391:	learn: 0.0725629	test: 0.1092887	best: 0.1090387 (285)	total: 465ms	remaining: 9.48ms
392:	learn: 0.0725171	test: 0.1092742	best: 0.1090387 (285)	total: 466ms	remaining: 8.3ms
393:	learn: 0.0724457	test: 0.1095158	best: 0.1090387 (285)	total: 467ms	remaining: 7.11ms
394:	learn: 0.0723768	test: 0.1094662	best: 0.1090387 (285)	total: 468ms	remaining: 5.92ms
395:	learn: 0.0723423	test: 0.1094853	best: 0.1090387 (285)	total: 469ms	remaining: 4.74ms
396:	learn: 0.0723156	test: 0.1094920	best: 0.1090387 (285)	total: 470ms	remaining: 3.55ms
397:	learn: 0.0722912	test: 0.1094988	best: 0.1090387 (285)	total: 471ms	remaining: 2.37ms
398:	learn: 0.0722248	test: 0.1095238	best: 0.1090387 (285)	total: 472ms	remaining: 1.18ms
399:	learn: 0.0721866	test: 0.1095631	best: 0.1090387 (285)	total: 474ms	remaining: 0us

bestTest = 0.109038714
bestIteration = 285

Shrink model to first 286 iterations.
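That "Shrink model to first 286 iterations" line is CatBoost's `use_best_model` behaviour: after training, it discards every tree added after the best validation iteration (285, so 286 trees survive). The idea is simple enough to sketch in plain Python — this is an illustrative toy, not CatBoost's internals:

```python
def best_iteration(test_losses):
    """Index of the lowest evaluation loss seen during boosting."""
    return min(range(len(test_losses)), key=test_losses.__getitem__)

def shrink(trees, test_losses):
    """Keep only the trees up to and including the best iteration,
    mirroring CatBoost's 'Shrink model to first N iterations' message."""
    return trees[: best_iteration(test_losses) + 1]

# Toy loss curve: improves until index 2, then degrades.
losses = [0.30, 0.20, 0.15, 0.18, 0.22]
kept = shrink(["t0", "t1", "t2", "t3", "t4"], losses)
# kept == ["t0", "t1", "t2"] — "shrink model to first 3 iterations"
```

With the real log above, `best_iteration` would land on 285 and the ensemble keeps trees 0–285.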
[LightGBM] [Warning] Unknown parameter: l2_leaf_reg
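The warning above is worth a pause: `l2_leaf_reg` is CatBoost's L2-regularisation parameter, and LightGBM silently ignores it (its equivalent is `lambda_l2`, alias `reg_lambda`). Reusing one parameter dict across libraries is convenient, but it can drop regularisation on the floor. A small translation step avoids that — the mapping below is a hypothetical example; verify each pair against the libraries' docs:

```python
# Hypothetical CatBoost -> LightGBM parameter renames (check the docs
# before relying on any entry; unknown keys pass through unchanged).
CATBOOST_TO_LGBM = {
    "l2_leaf_reg": "reg_lambda",
    "iterations": "n_estimators",
    "depth": "max_depth",
}

def translate_params(params, mapping=CATBOOST_TO_LGBM):
    """Rename cross-library parameters; leave unknown keys untouched."""
    return {mapping.get(k, k): v for k, v in params.items()}

shared = {"l2_leaf_reg": 3.0, "learning_rate": 0.05}
lgbm_params = translate_params(shared)
# {"reg_lambda": 3.0, "learning_rate": 0.05}
```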
[LightGBM training log truncated: 400 iterations. valid_0's l2 falls steeply from 0.15983 to ~0.0135 by iteration 100, then creeps down to a best of 0.0123407 at iteration 370, ending at 0.0123847.]
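One gotcha when eyeballing these logs side by side: LightGBM is reporting `l2` (mean squared error) while CatBoost and XGBoost report RMSE. Take the square root before comparing:

```python
import math

final_l2 = 0.0123847           # last valid_0's l2 from the LightGBM log
rmse = math.sqrt(final_l2)
print(round(rmse, 4))          # 0.1113
```

That 0.1113 lines up with the LGBMRegressor entry in the score summary further down, confirming the metrics agree once put on the same scale.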
[XGBoost training log truncated: 100 rounds. validation_0-rmse plunges from 9.78854 to 0.11618 by round 39 — the best round — then degrades slowly to 0.12078 at round 99.]
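Notice the XGBoost run keeps training long after validation RMSE bottoms out around round 39, and its ~0.1208 entry in the score summary reflects the final round, not the best one. Early stopping with a patience window fixes this (in XGBoost itself that's the `early_stopping_rounds` argument); here is a pure-Python sketch of the idea, not XGBoost's implementation:

```python
def early_stop_round(losses, patience=10):
    """Return (round training stops at, best round seen), stopping after
    `patience` consecutive rounds without a new best loss."""
    best_round, best_loss = 0, float("inf")
    for i, loss in enumerate(losses):
        if loss < best_loss:
            best_round, best_loss = i, loss
        elif i - best_round >= patience:
            return i, best_round
    return len(losses) - 1, best_round

# Toy curve: best at index 2, patience of 2 stops two rounds later.
stop, best = early_stop_round([5, 4, 3, 3.1, 3.2, 3.3], patience=2)
# stop == 4, best == 2
```

Applied to the log above with, say, `patience=30`, training would have halted well before round 99 and the model snapshot at round 39 kept.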
In [181]:
scores, test_preds #0.09152509120742014
Out[181]:
([['CatBoostRegressor', 0.10903871215663595],
  ['LGBMRegressor', 0.1112864047191497],
  ['XGBRegressor', 0.12077656391223711],
  ['GradientBoostingRegressor', 0.1157004087536634]],
 [array([14.62862902, 13.81366005, 14.03759899, ..., 14.49412432,
         14.11853428, 15.02925027]),
  array([14.64822627, 13.83499445, 14.02397003, ..., 14.47579454,
         14.07962661, 15.04850494]),
  array([14.648105 , 13.8144245, 14.021985 , ..., 14.498511 , 14.087087 ,
         14.911426 ], dtype=float32),
  array([14.60639374, 13.81127295, 14.00952014, ..., 14.45821006,
         14.08153861, 14.98560428])])
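The `#0.09152509120742014` comment beside `scores, test_preds` suggests a blended submission that beat every single model (each individual score sits above 0.109). The blend itself isn't shown, but the most common first attempt is an unweighted mean of the per-model prediction arrays — a sketch under that assumption, with stand-in arrays:

```python
import numpy as np

# Stand-ins for the four models' test_preds arrays above.
preds = [
    np.array([14.63, 13.81, 14.04]),
    np.array([14.65, 13.83, 14.02]),
    np.array([14.65, 13.81, 14.02]),
    np.array([14.61, 13.81, 14.01]),
]

# Simple unweighted mean across models, per test row.
blend = np.mean(preds, axis=0)
```

The ~14 magnitudes hint that the target was log-transformed, in which case something like `np.expm1(blend)` would be applied before writing the submission file — again an inference from the outputs, not something stated in the notebook.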
In [188]:
list(zip(train_X.columns,models_[0].feature_importances_)), list(zip(train_X.columns, models_[2].feature_importances_))
Out[188]:
([('num_pipe__bedroom', 14.440752380821504),
  ('num_pipe__parking_space', 0.42193031392624447),
  ('num_pipe__space_area', 0.0),
  ('num_pipe__bathroom', 0.1685810124930052),
  ('num_pipe__z', 14.258398761160132),
  ('num_pipe__x', 7.029880413386731),
  ('num_pipe__bed_per_bath', 0.10594722421654382),
  ('num_pipe__ID', 0.7186007876871836),
  ('num_pipe__y', 6.453046313850768),
  ('num_pipe__bed_per_park', 0.10204489773661836),
  ('num_pipe__allrooms', 9.695146903211823),
  ('num_pipe__privacy', 0.002175669662605856),
  ('cat_pipe__loc_abia', 0.023669016163109426),
  ('cat_pipe__loc_adamawa', 0.08513412751441925),
  ('cat_pipe__loc_akwa ibom', 0.02263391315098677),
  ('cat_pipe__loc_anambra', 0.7812246452115704),
  ('cat_pipe__loc_bauchi', 0.024531931302872338),
  ('cat_pipe__loc_bayelsa', 0.026687751181418716),
  ('cat_pipe__loc_benue', 0.0),
  ('cat_pipe__loc_borno', 0.0),
  ('cat_pipe__loc_cross river', 0.0),
  ('cat_pipe__loc_delta', 0.08642133513787315),
  ('cat_pipe__loc_ebonyi', 0.4054426596739886),
  ('cat_pipe__loc_edo', 0.10193154666996522),
  ('cat_pipe__loc_ekiti', 0.027058541215460155),
  ('cat_pipe__loc_enugu', 0.1248647906765626),
  ('cat_pipe__loc_gombe', 0.0026008395341342813),
  ('cat_pipe__loc_imo', 0.010745198793584767),
  ('cat_pipe__loc_jigawa', 0.020195197165683302),
  ('cat_pipe__loc_kaduna', 0.02121994338051149),
  ('cat_pipe__loc_kano', 0.16076781421972364),
  ('cat_pipe__loc_katsina', 0.010239705340432182),
  ('cat_pipe__loc_kebbi', 0.017852118962980203),
  ('cat_pipe__loc_kogi', 0.04861386487038757),
  ('cat_pipe__loc_kwara', 0.17573765376327447),
  ('cat_pipe__loc_lagos', 5.515003720309134),
  ('cat_pipe__loc_nasarawa', 0.0586856189593374),
  ('cat_pipe__loc_niger', 0.005019238344099219),
  ('cat_pipe__loc_ogun', 0.0),
  ('cat_pipe__loc_ondo', 0.09520836427831393),
  ('cat_pipe__loc_osun', 0.01199644537041218),
  ('cat_pipe__loc_oyo', 0.00966565246085248),
  ('cat_pipe__loc_plateau', 0.0),
  ('cat_pipe__loc_rivers', 0.022238956783261526),
  ('cat_pipe__loc_sokoto', 0.11564131131470311),
  ('cat_pipe__loc_taraba', 0.012945381214568145),
  ('cat_pipe__loc_yobe', 0.009043503347280551),
  ('cat_pipe__loc_zamfara', 0.2719377879515702),
  ('cat_pipe__title_Apartment', 4.268372540856749),
  ('cat_pipe__title_Bungalow', 0.5689998619089702),
  ('cat_pipe__title_Cottage', 0.778094239747613),
  ('cat_pipe__title_Detached duplex', 0.7131329954135393),
  ('cat_pipe__title_Flat', 0.3572321709246325),
  ('cat_pipe__title_Mansion', 19.463944383554104),
  ('cat_pipe__title_Penthouse', 2.862438497541186),
  ('cat_pipe__title_Semi-detached duplex', 0.0),
  ('cat_pipe__title_Terrace duplex', 0.0),
  ('cat_pipe__title_Townhouse', 0.0),
  ('cat_pipe__zone_North Central', 0.0),
  ('cat_pipe__zone_North East', 0.7526379964514219),
  ('cat_pipe__zone_North West', 1.1223772677518224),
  ('cat_pipe__zone_South East', 0.009804131509289214),
  ('cat_pipe__zone_South South', 4.926577053134576),
  ('cat_pipe__zone_South West', 2.474925608790404)],
 [('num_pipe__bedroom', 0.0480243),
  ('num_pipe__parking_space', 0.0024457106),
  ('num_pipe__space_area', 0.00084930006),
  ('num_pipe__bathroom', 0.0019813702),
  ('num_pipe__z', 0.0021756971),
  ('num_pipe__x', 0.030812604),
  ('num_pipe__bed_per_bath', 0.0016389332),
  ('num_pipe__ID', 0.0020954357),
  ('num_pipe__y', 0.028387096),
  ('num_pipe__bed_per_park', 0.0018977473),
  ('num_pipe__allrooms', 0.028883876),
  ('num_pipe__privacy', 0.004111518),
  ('cat_pipe__loc_abia', 0.0020608378),
  ('cat_pipe__loc_adamawa', 0.005641747),
  ('cat_pipe__loc_akwa ibom', 0.0018180764),
  ('cat_pipe__loc_anambra', 0.024468262),
  ('cat_pipe__loc_bauchi', 0.001720732),
  ('cat_pipe__loc_bayelsa', 0.0),
  ('cat_pipe__loc_benue', 0.0013331126),
  ('cat_pipe__loc_borno', 0.0014226404),
  ('cat_pipe__loc_cross river', 0.0008946016),
  ('cat_pipe__loc_delta', 0.0048682974),
  ('cat_pipe__loc_ebonyi', 0.022416944),
  ('cat_pipe__loc_edo', 0.0058245515),
  ('cat_pipe__loc_ekiti', 0.0017138303),
  ('cat_pipe__loc_enugu', 0.007102661),
  ('cat_pipe__loc_gombe', 0.0012816362),
  ('cat_pipe__loc_imo', 0.0025925613),
  ('cat_pipe__loc_jigawa', 0.0019231027),
  ('cat_pipe__loc_kaduna', 0.0060560587),
  ('cat_pipe__loc_kano', 0.025255717),
  ('cat_pipe__loc_katsina', 0.009032063),
  ('cat_pipe__loc_kebbi', 0.0),
  ('cat_pipe__loc_kogi', 0.003573745),
  ('cat_pipe__loc_kwara', 0.014363877),
  ('cat_pipe__loc_lagos', 0.047000673),
  ('cat_pipe__loc_nasarawa', 0.011622808),
  ('cat_pipe__loc_niger', 0.00066343957),
  ('cat_pipe__loc_ogun', 0.0011680889),
  ('cat_pipe__loc_ondo', 0.0093935905),
  ('cat_pipe__loc_osun', 0.0014578883),
  ('cat_pipe__loc_oyo', 0.0022439603),
  ('cat_pipe__loc_plateau', 0.0011962275),
  ('cat_pipe__loc_rivers', 0.0024049634),
  ('cat_pipe__loc_sokoto', 0.012324219),
  ('cat_pipe__loc_taraba', 0.0025147994),
  ('cat_pipe__loc_yobe', 0.0009801777),
  ('cat_pipe__loc_zamfara', 0.014584765),
  ('cat_pipe__title_Apartment', 0.03922433),
  ('cat_pipe__title_Bungalow', 0.009075927),
  ('cat_pipe__title_Cottage', 0.005521773),
  ('cat_pipe__title_Detached duplex', 0.035667427),
  ('cat_pipe__title_Flat', 0.0),
  ('cat_pipe__title_Mansion', 0.057276484),
  ('cat_pipe__title_Penthouse', 0.0),
  ('cat_pipe__title_Semi-detached duplex', 0.0008675065),
  ('cat_pipe__title_Terrace duplex', 0.00023227978),
  ('cat_pipe__title_Townhouse', 0.0033762741),
  ('cat_pipe__zone_North Central', 0.005761145),
  ('cat_pipe__zone_North East', 0.02963099),
  ('cat_pipe__zone_North West', 0.03442689),
  ('cat_pipe__zone_South East', 0.025362281),
  ('cat_pipe__zone_South South', 0.21231754),
  ('cat_pipe__zone_South West', 0.13503486)])
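The two lists above pair each transformed feature name with an importance score, one list per scoring pass. A minimal sketch (using a few illustrative tuples in the same shape, not the full output) of sorting such pairs to surface the strongest drivers and drop the zero-importance features:

```python
# Illustrative (feature, importance) pairs in the shape printed above.
scores = [
    ('cat_pipe__title_Mansion', 19.464),
    ('num_pipe__allrooms', 9.695),
    ('cat_pipe__loc_lagos', 5.515),
    ('cat_pipe__loc_benue', 0.0),
]

# Keep only features that moved the metric, sorted descending by score.
top = sorted((s for s in scores if s[1] > 0), key=lambda s: s[1], reverse=True)
for name, score in top:
    print(f'{name:30s} {score:.3f}')
```

On the real output, the same filter-and-sort makes it obvious that `title_Mansion`, `allrooms`, and `loc_lagos` dominate while many state dummies contribute nothing.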

In [189]:
from sklearn.linear_model import Ridge

ridge = Ridge(random_state=seed, max_iter=700)
ridge.fit(train_X, ytrain)
Out[189]:
Ridge(max_iter=700, random_state=42)
In [190]:
c = ridge.predict(test_)
In [197]:
v = np.expm1(c)*0.5 + np.expm1(test_preds[0])*0.3 + np.expm1(test_preds[1]) * 0.2
v
Out[197]:
array([2267315.65458889, 1019008.82447468, 1246684.67757294, ...,
       1999802.57934304, 1332212.24953827, 3347915.34696873])
In [191]:
# array([2250188.09538975, 1021480.99448109, 1250830.58885351, ...,
#        2015230.89372306, 1346352.30086387, 3334390.27243935])
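The blend in the cell above averages back-transformed predictions with hand-picked weights: since the target was modeled in log space, each model's output goes through `np.expm1` before weighting. A toy sketch of the same pattern (the log-space prediction arrays here are illustrative, not the actual model outputs):

```python
import numpy as np

# Illustrative log-space predictions from three models.
pred_a = np.array([14.6, 13.8])
pred_b = np.array([14.7, 13.9])
pred_c = np.array([14.5, 13.7])

# Back-transform each with expm1, then combine with weights summing to 1.
weights = (0.5, 0.3, 0.2)
blend = sum(w * np.expm1(p) for w, p in zip(weights, (pred_a, pred_b, pred_c)))
```

Note the order of operations: back-transform first, then average. Averaging in log space and exponentiating afterwards gives a (geometric-mean-like) result that is systematically lower.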
In [198]:
from sklearn.linear_model import HuberRegressor
In [199]:
g = GridSearchCV(HuberRegressor(),param_grid={
    'alpha': [0.003, 0.03, 0.3, 1, 10, 100],
    'max_iter': [300, 500, 700, 1000],
    'tol': [0.0001, 0.001, 1, 0.1, 0.0005]
}, cv=5, n_jobs=-1, scoring='neg_root_mean_squared_error', verbose=1)
g.fit(Xtrain, ytrain)
g.best_estimator_, g.best_score_, g.best_params_ #0.094
Fitting 5 folds for each of 120 candidates, totalling 600 fits
Out[199]:
(HuberRegressor(alpha=0.03, max_iter=300, tol=1),
 -0.10565428895042357,
 {'alpha': 0.03, 'max_iter': 300, 'tol': 1})
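Grid search exposes the winning configuration both as a refit estimator (`g.best_estimator_`) and as a plain dict (`g.best_params_`). The next cell uses the dict-unpacking pattern; a small sketch of it with an illustrative params dict:

```python
from sklearn.linear_model import HuberRegressor

# Illustrative params dict, in the shape GridSearchCV returns via best_params_.
best_params = {'alpha': 0.03, 'max_iter': 300, 'tol': 1}

# ** unpacks the dict into keyword arguments for a fresh estimator.
est = HuberRegressor(**best_params)
```

Passing `g.best_params_` directly avoids transcription slips when copying values by hand.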
In [201]:
rg = HuberRegressor(**{'alpha': 0.3, 'max_iter': 300, 'tol': 1})
rg.fit(Xtrain, ytrain)

f = rg.predict(test_df)
f = np.expm1(f)
f
Out[201]:
array([2269590.07945439, 1036740.22958681, 1254395.06742561, ...,
       2048880.17029156, 1343889.57619463, 3349416.28753735])
In [203]:
k =f*0.6 + np.expm1(test_preds[0])*0.4
k
Out[203]:
array([2263725.22953444, 1021304.22061335, 1252109.36150216, ...,
       2017785.04627761, 1347911.97939606, 3356069.05570083])
In [206]:
f*0.4 + np.expm1(c)*0.25 + np.expm1(test_preds[0])*0.35
Out[206]:
array([2262524.03081264, 1021863.1512526 , 1251658.7260722 , ...,
       2020147.74380393, 1344223.24036186, 3343784.99466788])
In [207]:
ID = pd.read_csv('Sample_submission.csv')['ID']

submission_cat = pd.DataFrame({
    'ID': ID,
    'price': np.expm1(test_preds[0])
})

submission_rcl = pd.DataFrame({
    'ID': ID,
    'price':v
})


submission_hc = pd.DataFrame({
    'ID': ID,
    'price': k
})

submission_hrc =  pd.DataFrame({
    'ID': ID,
    'price': f*0.4 + np.expm1(c)*0.25 + np.expm1(test_preds[0])*0.35
})
In [212]:
submission_cat.head(3)
Out[212]:
ID price
0 845 2.254928e+06
1 1924 9.981502e+05
2 10718 1.248681e+06
In [215]:
submission_rcl.head(3)
Out[215]:
ID price
0 845 2.267316e+06
1 1924 1.019009e+06
2 10718 1.246685e+06
In [216]:
submission_hc.head(3)
Out[216]:
ID price
0 845 2.263725e+06
1 1924 1.021304e+06
2 10718 1.252109e+06
In [217]:
submission_hrc.head(3)
Out[217]:
ID price
0 845 2.262524e+06
1 1924 1.021863e+06
2 10718 1.251659e+06
In [218]:
# submission.to_csv('meh.csv', index=False)
# submission_c.to_csv('combo.csv', index=False)
In [219]:
submission_cat.to_csv('CAT_submission.csv', index=False)
submission_rcl.to_csv('RCL_submission.csv', index=False)
submission_hc.to_csv('HC_submission.csv', index=False)
submission_hrc.to_csv('HRC_submission.csv', index=False)
In [220]:
#save model
In [223]:
import joblib
In [224]:
joblib.dump(cat, 'catt.joblib')
joblib.dump(lgbm, 'lgbmm.joblib')
joblib.dump(ridge, 'ridge.joblib')
joblib.dump(rg, 'huber.joblib')
Out[224]:
['huber.joblib']
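`Out[224]` only echoes the return value of the last `joblib.dump` call; all four models are still written to disk. A hedged sketch of the round trip, using a small stand-in `Ridge` model rather than the actual fitted estimators:

```python
import joblib
import numpy as np
from sklearn.linear_model import Ridge

# Fit and persist a small stand-in model (mirrors the dump calls above).
model = Ridge()
model.fit(np.array([[1.0], [2.0], [3.0]]), np.array([2.0, 4.0, 6.0]))
joblib.dump(model, 'ridge_demo.joblib')

# Reload; the restored estimator predicts exactly like the original.
restored = joblib.load('ridge_demo.joblib')
preds = restored.predict(np.array([[4.0]]))
```

One caveat worth remembering: joblib pickles are tied to the library versions that produced them, so reloading under a different scikit-learn version may warn or fail.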
In [225]:
#save processed csv
Xtrain.to_csv('processed_xtrainx.csv', index=False)
Xtest.to_csv('processed_xtestx.csv', index=False)
test_df.to_csv('processed_testt.csv', index=False)
The gratification of being in the top 3% on the leaderboard can only last so long - the big(ger) deal is staying in the top 3% after the shakedown. Or maybe taking top place?