The aim of this competition is to predict the sale price of each property. The target variable is called price_doc in train.csv.

The training data is from August 2011 to June 2015, and the test set is from July 2015 to May 2016. The dataset also includes information about overall conditions in Russia's economy and finance sector, so you can focus on generating accurate price forecasts for individual properties, without needing to second-guess what the business cycle will do.

Data Files

train.csv, test.csv: information about individual transactions. The rows are indexed by the "id" field, which refers to individual transactions (a particular property might appear more than once, in separate transactions). These files also include supplementary information about the local area of each property.
macro.csv: data on Russia's macroeconomy and financial sector (can be joined to the train and test sets on the "timestamp" column).
sample_submission.csv: an example submission file in the correct format.
data_dictionary.txt: explanations of the fields available in the other data files.
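The join on "timestamp" mentioned for macro.csv can be sketched as follows. This is a minimal illustration on made-up toy tables, not the notebook's own code: the names train_toy, macro_toy and joined are hypothetical, and the usdrub values are invented. It uses base R merge, which also has a method for data.table objects.

```r
# Toy stand-ins for train.csv and macro.csv; all values are made up.
train_toy <- data.frame(
  id        = 1:3,
  timestamp = c("2011-08-20", "2011-08-23", "2011-08-23"),
  price_doc = c(5850000, 6000000, 5700000),
  stringsAsFactors = FALSE
)
macro_toy <- data.frame(
  timestamp = c("2011-08-20", "2011-08-23"),
  usdrub    = c(29.0, 28.8),  # illustrative macro column with invented values
  stringsAsFactors = FALSE
)

# Left join: keep every transaction and attach the macro values for its date.
joined <- merge(train_toy, macro_toy, by = "timestamp", all.x = TRUE)
print(joined)
```

Using all.x = TRUE keeps transactions whose date has no macro row; those rows would get NA in the macro columns instead of being dropped.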


In [158]:
library(mlbench)
library(caret)
library(corrplot)

In [159]:
library(data.table)
  
train <- fread(
    "https://raw.githubusercontent.com/jsphyg/ml_practice_notebooks/master/SRHM/train.csv",
    stringsAsFactors = FALSE,
    na.strings = c("NA", "")
    )

test <- fread(
    "https://raw.githubusercontent.com/jsphyg/ml_practice_notebooks/master/SRHM/test.csv",
    stringsAsFactors = FALSE,
    na.strings = c("NA", "")
    )

macro <- fread(
    "https://raw.githubusercontent.com/jsphyg/ml_practice_notebooks/master/SRHM/macro.csv",
    stringsAsFactors = FALSE,
    na.strings = c("NA", "")
    )

In [160]:
# refer to the training data as `dataset` for the rest of the analysis
dataset <- train

In [161]:
# create the row indices for 80% of the original dataset, to be used for training
validation_index <- createDataPartition(dataset$price_doc, p=0.80, list=FALSE)
# hold out the remaining 20% of the data for validation
validation <- dataset[-validation_index,]
# use the selected 80% of the data for training and testing the models
dataset <- dataset[validation_index,]
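The split above is random, so the selected rows change from run to run unless the random seed is fixed first (for example by calling set.seed before createDataPartition). The sketch below shows the same 80/20 idea in base R on a toy vector. It is a simplified stand-in, not the notebook's method: it draws a plain random sample without the grouping on the target that createDataPartition applies, and the seed value and the names price_toy, train_idx, train_part and validation_part are assumptions made for illustration.

```r
set.seed(7)  # arbitrary seed, chosen only for illustration

# Toy target vector standing in for dataset$price_doc.
price_toy <- runif(100, min = 1e6, max = 1e7)

# Draw 80% of the row indices at random (no stratification on the target).
n <- length(price_toy)
train_idx <- sort(sample(seq_len(n), size = floor(0.80 * n)))

# Rows in train_idx form the training part; all others are held out.
train_part      <- price_toy[train_idx]
validation_part <- price_toy[-train_idx]

print(length(train_part))
print(length(validation_part))
```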

In [162]:
dim(dataset)


  1. 24378
  2. 292

In [163]:
sapply(dataset, class)


id
'integer'
timestamp
'character'
full_sq
'integer'
life_sq
'integer'
floor
'integer'
max_floor
'integer'
material
'integer'
build_year
'integer'
num_room
'integer'
kitch_sq
'integer'
state
'integer'
product_type
'character'
sub_area
'character'
area_m
'numeric'
raion_popul
'integer'
green_zone_part
'numeric'
indust_part
'numeric'
children_preschool
'integer'
preschool_quota
'integer'
preschool_education_centers_raion
'integer'
children_school
'integer'
school_quota
'integer'
school_education_centers_raion
'integer'
school_education_centers_top_20_raion
'integer'
hospital_beds_raion
'integer'
healthcare_centers_raion
'integer'
university_top_20_raion
'integer'
sport_objects_raion
'integer'
additional_education_raion
'integer'
culture_objects_top_25
'character'
culture_objects_top_25_raion
'integer'
shopping_centers_raion
'integer'
office_raion
'integer'
thermal_power_plant_raion
'character'
incineration_raion
'character'
oil_chemistry_raion
'character'
radiation_raion
'character'
railroad_terminal_raion
'character'
big_market_raion
'character'
nuclear_reactor_raion
'character'
detention_facility_raion
'character'
full_all
'integer'
male_f
'integer'
female_f
'integer'
young_all
'integer'
young_male
'integer'
young_female
'integer'
work_all
'integer'
work_male
'integer'
work_female
'integer'
ekder_all
'integer'
ekder_male
'integer'
ekder_female
'integer'
0_6_all
'integer'
0_6_male
'integer'
0_6_female
'integer'
7_14_all
'integer'
7_14_male
'integer'
7_14_female
'integer'
0_17_all
'integer'
0_17_male
'integer'
0_17_female
'integer'
16_29_all
'integer'
16_29_male
'integer'
16_29_female
'integer'
0_13_all
'integer'
0_13_male
'integer'
0_13_female
'integer'
raion_build_count_with_material_info
'integer'
build_count_block
'integer'
build_count_wood
'integer'
build_count_frame
'integer'
build_count_brick
'integer'
build_count_monolith
'integer'
build_count_panel
'integer'
build_count_foam
'integer'
build_count_slag
'integer'
build_count_mix
'integer'
raion_build_count_with_builddate_info
'integer'
build_count_before_1920
'integer'
build_count_1921-1945
'integer'
build_count_1946-1970
'integer'
build_count_1971-1995
'integer'
build_count_after_1995
'integer'
ID_metro
'integer'
metro_min_avto
'numeric'
metro_km_avto
'numeric'
metro_min_walk
'numeric'
metro_km_walk
'numeric'
kindergarten_km
'numeric'
school_km
'numeric'
park_km
'numeric'
green_zone_km
'numeric'
industrial_km
'numeric'
water_treatment_km
'numeric'
cemetery_km
'numeric'
incineration_km
'numeric'
railroad_station_walk_km
'numeric'
railroad_station_walk_min
'numeric'
ID_railroad_station_walk
'integer'
railroad_station_avto_km
'numeric'
railroad_station_avto_min
'numeric'
ID_railroad_station_avto
'integer'
public_transport_station_km
'numeric'
public_transport_station_min_walk
'numeric'
water_km
'numeric'
water_1line
'character'
mkad_km
'numeric'
ttk_km
'numeric'
sadovoe_km
'numeric'
bulvar_ring_km
'numeric'
kremlin_km
'numeric'
big_road1_km
'numeric'
ID_big_road1
'integer'
big_road1_1line
'character'
big_road2_km
'numeric'
ID_big_road2
'integer'
railroad_km
'numeric'
railroad_1line
'character'
zd_vokzaly_avto_km
'numeric'
ID_railroad_terminal
'integer'
bus_terminal_avto_km
'numeric'
ID_bus_terminal
'integer'
oil_chemistry_km
'numeric'
nuclear_reactor_km
'numeric'
radiation_km
'numeric'
power_transmission_line_km
'numeric'
thermal_power_plant_km
'numeric'
ts_km
'numeric'
big_market_km
'numeric'
market_shop_km
'numeric'
fitness_km
'numeric'
swim_pool_km
'numeric'
ice_rink_km
'numeric'
stadium_km
'numeric'
basketball_km
'numeric'
hospice_morgue_km
'numeric'
detention_facility_km
'numeric'
public_healthcare_km
'numeric'
university_km
'numeric'
workplaces_km
'numeric'
shopping_centers_km
'numeric'
office_km
'numeric'
additional_education_km
'numeric'
preschool_km
'numeric'
big_church_km
'numeric'
church_synagogue_km
'numeric'
mosque_km
'numeric'
theater_km
'numeric'
museum_km
'numeric'
exhibition_km
'numeric'
catering_km
'numeric'
ecology
'character'
green_part_500
'numeric'
prom_part_500
'numeric'
office_count_500
'integer'
office_sqm_500
'integer'
trc_count_500
'integer'
trc_sqm_500
'integer'
cafe_count_500
'integer'
cafe_sum_500_min_price_avg
'numeric'
cafe_sum_500_max_price_avg
'numeric'
cafe_avg_price_500
'numeric'
cafe_count_500_na_price
'integer'
cafe_count_500_price_500
'integer'
cafe_count_500_price_1000
'integer'
cafe_count_500_price_1500
'integer'
cafe_count_500_price_2500
'integer'
cafe_count_500_price_4000
'integer'
cafe_count_500_price_high
'integer'
big_church_count_500
'integer'
church_count_500
'integer'
mosque_count_500
'integer'
leisure_count_500
'integer'
sport_count_500
'integer'
market_count_500
'integer'
green_part_1000
'numeric'
prom_part_1000
'numeric'
office_count_1000
'integer'
office_sqm_1000
'integer'
trc_count_1000
'integer'
trc_sqm_1000
'integer'
cafe_count_1000
'integer'
cafe_sum_1000_min_price_avg
'numeric'
cafe_sum_1000_max_price_avg
'numeric'
cafe_avg_price_1000
'numeric'
cafe_count_1000_na_price
'integer'
cafe_count_1000_price_500
'integer'
cafe_count_1000_price_1000
'integer'
cafe_count_1000_price_1500
'integer'
cafe_count_1000_price_2500
'integer'
cafe_count_1000_price_4000
'integer'
cafe_count_1000_price_high
'integer'
big_church_count_1000
'integer'
church_count_1000
'integer'
mosque_count_1000
'integer'
leisure_count_1000
'integer'
sport_count_1000
'integer'
market_count_1000
'integer'
green_part_1500
'numeric'
prom_part_1500
'numeric'
office_count_1500
'integer'
office_sqm_1500
'integer'
trc_count_1500
'integer'
trc_sqm_1500
'integer'
cafe_count_1500
'integer'
cafe_sum_1500_min_price_avg
'numeric'
cafe_sum_1500_max_price_avg
'numeric'
cafe_avg_price_1500
'numeric'
cafe_count_1500_na_price
'integer'
cafe_count_1500_price_500
'integer'
cafe_count_1500_price_1000
'integer'
cafe_count_1500_price_1500
'integer'
cafe_count_1500_price_2500
'integer'
cafe_count_1500_price_4000
'integer'
cafe_count_1500_price_high
'integer'
big_church_count_1500
'integer'
church_count_1500
'integer'
mosque_count_1500
'integer'
leisure_count_1500
'integer'
sport_count_1500
'integer'
market_count_1500
'integer'
green_part_2000
'numeric'
prom_part_2000
'numeric'
office_count_2000
'integer'
office_sqm_2000
'integer'
trc_count_2000
'integer'
trc_sqm_2000
'integer'
cafe_count_2000
'integer'
cafe_sum_2000_min_price_avg
'numeric'
cafe_sum_2000_max_price_avg
'numeric'
cafe_avg_price_2000
'numeric'
cafe_count_2000_na_price
'integer'
cafe_count_2000_price_500
'integer'
cafe_count_2000_price_1000
'integer'
cafe_count_2000_price_1500
'integer'
cafe_count_2000_price_2500
'integer'
cafe_count_2000_price_4000
'integer'
cafe_count_2000_price_high
'integer'
big_church_count_2000
'integer'
church_count_2000
'integer'
mosque_count_2000
'integer'
leisure_count_2000
'integer'
sport_count_2000
'integer'
market_count_2000
'integer'
green_part_3000
'numeric'
prom_part_3000
'numeric'
office_count_3000
'integer'
office_sqm_3000
'integer'
trc_count_3000
'integer'
trc_sqm_3000
'integer'
cafe_count_3000
'integer'
cafe_sum_3000_min_price_avg
'numeric'
cafe_sum_3000_max_price_avg
'numeric'
cafe_avg_price_3000
'numeric'
cafe_count_3000_na_price
'integer'
cafe_count_3000_price_500
'integer'
cafe_count_3000_price_1000
'integer'
cafe_count_3000_price_1500
'integer'
cafe_count_3000_price_2500
'integer'
cafe_count_3000_price_4000
'integer'
cafe_count_3000_price_high
'integer'
big_church_count_3000
'integer'
church_count_3000
'integer'
mosque_count_3000
'integer'
leisure_count_3000
'integer'
sport_count_3000
'integer'
market_count_3000
'integer'
green_part_5000
'numeric'
prom_part_5000
'numeric'
office_count_5000
'integer'
office_sqm_5000
'integer'
trc_count_5000
'integer'
trc_sqm_5000
'integer'
cafe_count_5000
'integer'
cafe_sum_5000_min_price_avg
'numeric'
cafe_sum_5000_max_price_avg
'numeric'
cafe_avg_price_5000
'numeric'
cafe_count_5000_na_price
'integer'
cafe_count_5000_price_500
'integer'
cafe_count_5000_price_1000
'integer'
cafe_count_5000_price_1500
'integer'
cafe_count_5000_price_2500
'integer'
cafe_count_5000_price_4000
'integer'
cafe_count_5000_price_high
'integer'
big_church_count_5000
'integer'
church_count_5000
'integer'
mosque_count_5000
'integer'
leisure_count_5000
'integer'
sport_count_5000
'integer'
market_count_5000
'integer'
price_doc
'integer'
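With 292 columns, the per-column listing above is easier to digest as a tally of classes plus the names of the character columns, which usually need encoding before modelling. A minimal sketch on a made-up toy data frame follows; toy, class_counts and char_cols are hypothetical names, and the same calls should also work on the notebook's dataset object.

```r
# Toy data frame with a mix of column classes; values are made up.
toy <- data.frame(
  id           = 1:3,
  timestamp    = c("2011-08-20", "2011-08-23", "2011-08-27"),
  area_m       = c(1.5, 2.5, 3.5),
  product_type = c("Investment", "OwnerOccupier", "Investment"),
  stringsAsFactors = FALSE
)

# Count how many columns there are of each class.
class_counts <- table(sapply(toy, class))
print(class_counts)

# List the character columns by name.
char_cols <- names(toy)[sapply(toy, is.character)]
print(char_cols)
```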

In [164]:
# take a peek at the first 20 rows of the data
head(dataset, n=20)


id timestamp full_sq life_sq floor max_floor material build_year num_room kitch_sq ... cafe_count_5000_price_2500 cafe_count_5000_price_4000 cafe_count_5000_price_high big_church_count_5000 church_count_5000 mosque_count_5000 leisure_count_5000 sport_count_5000 market_count_5000 price_doc
1 2011-08-20 43 27 4 NA NA NA NA NA ... 9 4 0 13 22 1 0 52 4 5850000
2 2011-08-23 34 19 3 NA NA NA NA NA ... 15 3 0 15 29 1 10 66 14 6000000
3 2011-08-27 43 29 2 NA NA NA NA NA ... 10 3 0 11 27 0 4 67 10 5700000
5 2011-09-05 77 77 4 NA NA NA NA NA ... 319 108 17 135 236 2 91 195 14 16331452
7 2011-09-08 25 14 10 NA NA NA NA NA ... 81 16 3 38 80 1 27 127 8 5500000
8 2011-09-09 44 44 5 NA NA NA NA NA ... 9 4 0 11 18 1 0 47 4 2000000
9 2011-09-10 42 27 5 NA NA NA NA NA ... 19 8 1 18 34 1 3 85 11 5300000
10 2011-09-13 36 21 9 NA NA NA NA NA ... 19 13 0 10 20 1 3 67 1 2000000
11 2011-09-16 36 19 12 NA NA NA NA NA ... 1 1 0 5 9 0 2 17 6 4650000
13 2011-09-17 43 28 4 NA NA NA NA NA ... 13 9 1 7 15 0 2 47 0 5100000
14 2011-09-19 31 31 4 NA NA NA NA NA ... 254 108 22 57 102 1 72 166 7 5200000
15 2011-09-19 31 21 3 NA NA NA NA NA ... 88 19 2 63 100 0 28 132 14 5000000
16 2011-09-20 51 31 15 NA NA NA NA NA ... 6 1 0 9 21 0 1 53 9 1850000
17 2011-09-20 47 31 4 NA NA NA NA NA ... 10 2 0 7 23 0 4 62 13 6300000
18 2011-09-20 42 28 2 NA NA NA NA NA ... 32 6 0 13 33 1 10 72 12 5900000
19 2011-09-22 59 33 10 NA NA NA NA NA ... 1 1 0 6 9 0 2 17 6 7900000
20 2011-09-22 44 29 4 NA NA NA NA NA ... 9 2 0 10 14 0 2 51 5 5200000
22 2011-09-22 39 39 7 NA NA NA NA NA ... 18 3 0 12 14 0 1 64 9 5200000
23 2011-09-23 48 34 6 NA NA NA NA NA ... 16 4 1 11 10 0 1 55 8 6250000
24 2011-09-23 32 18 3 NA NA NA NA NA ... 10 1 0 7 21 1 1 42 13 5750000
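The first rows above show NA in max_floor, material, build_year, num_room and kitch_sq, so it is worth checking how much of each column is missing before modelling. A minimal sketch on a made-up toy data frame follows; toy, na_counts and na_share are hypothetical names and the values are invented, but the same two calls should apply to the notebook's dataset object.

```r
# Toy data frame with some missing values; values are made up.
toy <- data.frame(
  full_sq    = c(43L, 34L, 43L, 77L),
  life_sq    = c(27L, 19L, NA, 77L),
  max_floor  = c(NA, NA, NA, 9L),
  build_year = c(NA, NA, NA, 1979L)
)

# Number of missing values per column.
na_counts <- colSums(is.na(toy))

# Share of missing values per column, most incomplete columns first.
na_share <- round(na_counts / nrow(toy), 2)
print(sort(na_share, decreasing = TRUE))
```

Sorting by the missing share makes it easy to decide which columns need imputation and which might be dropped.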

In [165]:
# summarize attribute distributions
summary(dataset)


       id         timestamp            full_sq           life_sq      
 Min.   :    1   Length:24378       Min.   :   0.00   Min.   :  0.00  
 1st Qu.: 7688   Class :character   1st Qu.:  38.00   1st Qu.: 20.00  
 Median :15282   Mode  :character   Median :  49.00   Median : 30.00  
 Mean   :15260                      Mean   :  54.16   Mean   : 34.04  
 3rd Qu.:22858                      3rd Qu.:  63.00   3rd Qu.: 43.00  
 Max.   :30472                      Max.   :5326.00   Max.   :802.00  
                                                      NA's   :5091    
     floor          max_floor         material       build_year      
 Min.   : 0.000   Min.   :  0.00   Min.   :1.000   Min.   :       0  
 1st Qu.: 3.000   1st Qu.:  9.00   1st Qu.:1.000   1st Qu.:    1966  
 Median : 7.000   Median : 12.00   Median :1.000   Median :    1979  
 Mean   : 7.661   Mean   : 12.56   Mean   :1.826   Mean   :    3355  
 3rd Qu.:11.000   3rd Qu.: 17.00   3rd Qu.:2.000   3rd Qu.:    2005  
 Max.   :77.000   Max.   :117.00   Max.   :6.000   Max.   :20052009  
 NA's   :129      NA's   :7628     NA's   :7628    NA's   :10826     
    num_room         kitch_sq            state        product_type      
 Min.   : 0.000   Min.   :   0.000   Min.   : 1.000   Length:24378      
 1st Qu.: 1.000   1st Qu.:   1.000   1st Qu.: 1.000   Class :character  
 Median : 2.000   Median :   6.000   Median : 2.000   Mode  :character  
 Mean   : 1.904   Mean   :   6.338   Mean   : 2.109                     
 3rd Qu.: 2.000   3rd Qu.:   9.000   3rd Qu.: 3.000                     
 Max.   :17.000   Max.   :2014.000   Max.   :33.000                     
 NA's   :7628     NA's   :7628       NA's   :10839                      
   sub_area             area_m           raion_popul     green_zone_part   
 Length:24378       Min.   :  2081628   Min.   :  2546   Min.   :0.001879  
 Class :character   1st Qu.:  7307411   1st Qu.: 21819   1st Qu.:0.063755  
 Mode  :character   Median : 10416575   Median : 83844   Median :0.167526  
                    Mean   : 17608827   Mean   : 84216   Mean   :0.219060  
                    3rd Qu.: 18036437   3rd Qu.:122862   3rd Qu.:0.336177  
                    Max.   :206071809   Max.   :247469   Max.   :0.852923  
                                                                           
  indust_part      children_preschool preschool_quota
 Min.   :0.00000   Min.   :  175      Min.   :    0  
 1st Qu.:0.01951   1st Qu.: 1706      1st Qu.: 1874  
 Median :0.07216   Median : 4857      Median : 2868  
 Mean   :0.11905   Mean   : 5149      Mean   : 3273  
 3rd Qu.:0.19578   3rd Qu.: 7103      3rd Qu.: 4050  
 Max.   :0.52187   Max.   :19223      Max.   :11926  
                                      NA's   :5338   
 preschool_education_centers_raion children_school  school_quota  
 Min.   : 0.000                    Min.   :  168   Min.   : 1012  
 1st Qu.: 2.000                    1st Qu.: 1564   1st Qu.: 5782  
 Median : 4.000                    Median : 5261   Median : 7377  
 Mean   : 4.068                    Mean   : 5360   Mean   : 8328  
 3rd Qu.: 6.000                    3rd Qu.: 7227   3rd Qu.: 9891  
 Max.   :13.000                    Max.   :19083   Max.   :24750  
                                                   NA's   :5337   
 school_education_centers_raion school_education_centers_top_20_raion
 Min.   : 0.000                 Min.   :0.0000                       
 1st Qu.: 2.000                 1st Qu.:0.0000                       
 Median : 5.000                 Median :0.0000                       
 Mean   : 4.704                 Mean   :0.1083                       
 3rd Qu.: 7.000                 3rd Qu.:0.0000                       
 Max.   :14.000                 Max.   :2.0000                       
                                                                     
 hospital_beds_raion healthcare_centers_raion university_top_20_raion
 Min.   :  30        Min.   :0.000            Min.   :0.0000         
 1st Qu.: 520        1st Qu.:0.000            1st Qu.:0.0000         
 Median : 990        Median :1.000            Median :0.0000         
 Mean   :1193        Mean   :1.326            Mean   :0.1358         
 3rd Qu.:1786        3rd Qu.:2.000            3rd Qu.:0.0000         
 Max.   :4849        Max.   :6.000            Max.   :3.0000         
 NA's   :11524                                                       
 sport_objects_raion additional_education_raion culture_objects_top_25
 Min.   : 0.000      Min.   : 0.0               Length:24378          
 1st Qu.: 1.000      1st Qu.: 1.0               Class :character      
 Median : 5.000      Median : 2.0               Mode  :character      
 Mean   : 6.614      Mean   : 2.9                                     
 3rd Qu.:10.000      3rd Qu.: 4.0                                     
 Max.   :29.000      Max.   :16.0                                     
                                                                      
 culture_objects_top_25_raion shopping_centers_raion  office_raion    
 Min.   : 0.0000              Min.   : 0.000         Min.   :  0.000  
 1st Qu.: 0.0000              1st Qu.: 1.000         1st Qu.:  0.000  
 Median : 0.0000              Median : 3.000         Median :  2.000  
 Mean   : 0.2836              Mean   : 4.186         Mean   :  8.118  
 3rd Qu.: 0.0000              3rd Qu.: 6.000         3rd Qu.:  5.000  
 Max.   :10.0000              Max.   :23.000         Max.   :141.000  
                                                                      
 thermal_power_plant_raion incineration_raion oil_chemistry_raion
 Length:24378              Length:24378       Length:24378       
 Class :character          Class :character   Class :character   
 Mode  :character          Mode  :character   Mode  :character   
                                                                 
                                                                 
                                                                 
                                                                 
 radiation_raion    railroad_terminal_raion big_market_raion  
 Length:24378       Length:24378            Length:24378      
 Class :character   Class :character        Class :character  
 Mode  :character   Mode  :character        Mode  :character  
                                                              
                                                              
                                                              
                                                              
 nuclear_reactor_raion detention_facility_raion    full_all      
 Length:24378          Length:24378             Min.   :   2693  
 Class :character      Class :character         1st Qu.:  31167  
 Mode  :character      Mode  :character         Median :  85083  
                                                Mean   : 146807  
                                                3rd Qu.: 125111  
                                                Max.   :1716730  
                                                                 
     male_f          female_f        young_all       young_male   
 Min.   :  1264   Min.   :  1430   Min.   :  365   Min.   :  189  
 1st Qu.: 14906   1st Qu.: 15167   1st Qu.: 3459   1st Qu.: 1782  
 Median : 39227   Median : 45410   Median :10988   Median : 5470  
 Mean   : 67440   Mean   : 79367   Mean   :11194   Mean   : 5732  
 3rd Qu.: 58226   3rd Qu.: 67872   3rd Qu.:14906   3rd Qu.: 7597  
 Max.   :774585   Max.   :942145   Max.   :40692   Max.   :20977  
                                                                  
  young_female      work_all        work_male      work_female   
 Min.   :  177   Min.   :  1633   Min.   :  863   Min.   :  771  
 1st Qu.: 1677   1st Qu.: 13996   1st Qu.: 7394   1st Qu.: 6661  
 Median : 5333   Median : 52450   Median :26382   Median :26096  
 Mean   : 5462   Mean   : 53766   Mean   :27305   Mean   :26461  
 3rd Qu.: 7617   3rd Qu.: 77612   3rd Qu.:38841   3rd Qu.:37942  
 Max.   :19715   Max.   :161290   Max.   :79622   Max.   :81668  
                                                                 
   ekder_all       ekder_male     ekder_female      0_6_all         0_6_male   
 Min.   :  548   Min.   :  156   Min.   :  393   Min.   :  175   Min.   :  91  
 1st Qu.: 4695   1st Qu.: 1331   1st Qu.: 3365   1st Qu.: 1706   1st Qu.: 862  
 Median :20184   Median : 6180   Median :13540   Median : 4857   Median :2435  
 Mean   :19256   Mean   : 5826   Mean   :13430   Mean   : 5149   Mean   :2636  
 3rd Qu.:29172   3rd Qu.: 8775   3rd Qu.:20165   3rd Qu.: 7103   3rd Qu.:3589  
 Max.   :57086   Max.   :19275   Max.   :37811   Max.   :19223   Max.   :9987  
                                                                               
   0_6_female      7_14_all       7_14_male     7_14_female      0_17_all    
 Min.   :  85   Min.   :  168   Min.   :  87   Min.   :  82   Min.   :  411  
 1st Qu.: 844   1st Qu.: 1564   1st Qu.: 821   1st Qu.: 743   1st Qu.: 3831  
 Median :2390   Median : 5261   Median :2693   Median :2535   Median :12508  
 Mean   :2513   Mean   : 5360   Mean   :2747   Mean   :2613   Mean   :12558  
 3rd Qu.:3455   3rd Qu.: 7227   3rd Qu.:3585   3rd Qu.:3534   3rd Qu.:16727  
 Max.   :9236   Max.   :19083   Max.   :9761   Max.   :9322   Max.   :45170  
                                                                             
   0_17_male      0_17_female      16_29_all        16_29_male    
 Min.   :  214   Min.   :  198   Min.   :   575   Min.   :   308  
 1st Qu.: 1973   1st Qu.: 1858   1st Qu.:  5829   1st Qu.:  2955  
 Median : 6085   Median : 6185   Median : 17864   Median :  8896  
 Mean   : 6433   Mean   : 6125   Mean   : 31423   Mean   : 15422  
 3rd Qu.: 8599   3rd Qu.: 8549   3rd Qu.: 27107   3rd Qu.: 13683  
 Max.   :23233   Max.   :21937   Max.   :367659   Max.   :172958  
                                                                  
  16_29_female       0_13_all       0_13_male      0_13_female   
 Min.   :   267   Min.   :  322   Min.   :  166   Min.   :  156  
 1st Qu.:  2874   1st Qu.: 3112   1st Qu.: 1600   1st Qu.: 1512  
 Median :  9353   Median : 9633   Median : 4835   Median : 4667  
 Mean   : 16001   Mean   : 9855   Mean   : 5045   Mean   : 4810  
 3rd Qu.: 14145   3rd Qu.:13121   3rd Qu.: 6684   3rd Qu.: 6699  
 Max.   :194701   Max.   :36035   Max.   :18574   Max.   :17461  
                                                                 
 raion_build_count_with_material_info build_count_block build_count_wood
 Min.   :   1.0                       Min.   :  0.00    Min.   :  0.00  
 1st Qu.: 180.0                       1st Qu.: 13.00    1st Qu.:  0.00  
 Median : 273.0                       Median : 42.00    Median :  0.00  
 Mean   : 328.3                       Mean   : 50.35    Mean   : 41.04  
 3rd Qu.: 400.0                       3rd Qu.: 72.00    3rd Qu.:  7.00  
 Max.   :1681.0                       Max.   :223.00    Max.   :793.00  
 NA's   :3968                         NA's   :3968      NA's   :3968    
 build_count_frame build_count_brick build_count_monolith build_count_panel
 Min.   : 0.000    Min.   :  0.0     Min.   :  0          Min.   :  0.0    
 1st Qu.: 0.000    1st Qu.: 10.0     1st Qu.:  2          1st Qu.: 35.0    
 Median : 0.000    Median : 67.0     Median :  6          Median : 92.0    
 Mean   : 4.993    Mean   :107.2     Mean   : 12          Mean   :107.5    
 3rd Qu.: 1.000    3rd Qu.:156.0     3rd Qu.: 13          3rd Qu.:157.0    
 Max.   :97.000    Max.   :664.0     Max.   :127          Max.   :431.0    
 NA's   :3968      NA's   :3968      NA's   :3968         NA's   :3968     
 build_count_foam build_count_slag build_count_mix
 Min.   : 0.000   Min.   : 0.000   Min.   :0.000  
 1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.:0.000  
 Median : 0.000   Median : 0.000   Median :0.000  
 Mean   : 0.164   Mean   : 4.486   Mean   :0.573  
 3rd Qu.: 0.000   3rd Qu.: 2.000   3rd Qu.:0.000  
 Max.   :11.000   Max.   :84.000   Max.   :9.000  
 NA's   :3968     NA's   :3968     NA's   :3968   
 raion_build_count_with_builddate_info build_count_before_1920
 Min.   :   1.0                        Min.   :  0.00         
 1st Qu.: 178.0                        1st Qu.:  0.00         
 Median : 271.0                        Median :  0.00         
 Mean   : 327.9                        Mean   : 18.41         
 3rd Qu.: 400.0                        3rd Qu.:  3.00         
 Max.   :1680.0                        Max.   :371.00         
 NA's   :3968                          NA's   :3968           
 build_count_1921-1945 build_count_1946-1970 build_count_1971-1995
 Min.   :  0.00        Min.   :  0.0         Min.   :  0.00       
 1st Qu.:  0.00        1st Qu.: 14.0         1st Qu.: 38.00       
 Median :  2.00        Median :135.0         Median : 71.00       
 Mean   : 26.64        Mean   :141.5         Mean   : 80.15       
 3rd Qu.: 20.00        3rd Qu.:216.0         3rd Qu.:125.00       
 Max.   :382.00        Max.   :845.0         Max.   :246.00       
 NA's   :3968          NA's   :3968          NA's   :3968         
 build_count_after_1995    ID_metro      metro_min_avto   metro_km_avto   
 Min.   :  0.00         Min.   :  1.00   Min.   : 0.000   Min.   : 0.000  
 1st Qu.: 14.00         1st Qu.: 27.00   1st Qu.: 1.725   1st Qu.: 1.037  
 Median : 24.00         Median : 53.00   Median : 2.805   Median : 1.790  
 Mean   : 61.18         Mean   : 72.32   Mean   : 4.921   Mean   : 3.662  
 3rd Qu.: 57.00         3rd Qu.:108.00   3rd Qu.: 4.789   3rd Qu.: 3.777  
 Max.   :799.00         Max.   :223.00   Max.   :61.438   Max.   :74.906  
 NA's   :3968                                                             
 metro_min_walk   metro_km_walk     kindergarten_km      school_km      
 Min.   :  0.00   Min.   : 0.0000   Min.   : 0.00047   Min.   : 0.0000  
 1st Qu.: 11.54   1st Qu.: 0.9619   1st Qu.: 0.20008   1st Qu.: 0.2697  
 Median : 20.53   Median : 1.7110   Median : 0.35294   Median : 0.4769  
 Mean   : 42.30   Mean   : 3.5251   Mean   : 0.96988   Mean   : 1.3005  
 3rd Qu.: 45.32   3rd Qu.: 3.7768   3rd Qu.: 0.96683   3rd Qu.: 0.8899  
 Max.   :687.32   Max.   :57.2764   Max.   :25.50644   Max.   :47.3947  
 NA's   :21       NA's   :21                                            
    park_km         green_zone_km    industrial_km     water_treatment_km
 Min.   : 0.00374   Min.   :0.0000   Min.   : 0.0000   Min.   : 0.2741   
 1st Qu.: 0.97378   1st Qu.:0.1014   1st Qu.: 0.2883   1st Qu.: 5.2994   
 Median : 1.79991   Median :0.2143   Median : 0.5766   Median :10.3780   
 Mean   : 3.07460   Mean   :0.2997   Mean   : 0.7668   Mean   :11.1668   
 3rd Qu.: 3.39175   3rd Qu.:0.4135   3rd Qu.: 1.0402   3rd Qu.:16.8053   
 Max.   :47.35154   Max.   :1.9824   Max.   :14.0482   Max.   :47.5912   
                                                                         
  cemetery_km     incineration_km   railroad_station_walk_km
 Min.   : 0.000   Min.   : 0.1981   Min.   : 0.02815        
 1st Qu.: 1.336   1st Qu.: 6.2061   1st Qu.: 1.93177        
 Median : 1.968   Median :10.3175   Median : 3.23554        
 Mean   : 2.314   Mean   :10.8460   Mean   : 4.37332        
 3rd Qu.: 3.090   3rd Qu.:13.3851   3rd Qu.: 5.14764        
 Max.   :13.846   Max.   :58.6320   Max.   :24.65304        
                                    NA's   :21              
 railroad_station_walk_min ID_railroad_station_walk railroad_station_avto_km
 Min.   :  0.3378          Min.   :  1.00           Min.   : 0.02815        
 1st Qu.: 23.1812          1st Qu.: 18.00           1st Qu.: 2.11338        
 Median : 38.8265          Median : 33.00           Median : 3.43212        
 Mean   : 52.4799          Mean   : 38.71           Mean   : 4.57249        
 3rd Qu.: 61.7717          3rd Qu.: 52.00           3rd Qu.: 5.38987        
 Max.   :295.8365          Max.   :133.00           Max.   :24.65398        
 NA's   :21                NA's   :21                                       
 railroad_station_avto_min ID_railroad_station_avto public_transport_station_km
 Min.   : 0.03519          Min.   :  1.00           Min.   : 0.003733          
 1st Qu.: 3.23769          1st Qu.: 19.00           1st Qu.: 0.101156          
 Median : 4.94456          Median : 34.00           Median : 0.160421          
 Mean   : 6.06944          Mean   : 45.53           Mean   : 0.407560          
 3rd Qu.: 7.29978          3rd Qu.: 73.00           3rd Qu.: 0.277879          
 Max.   :38.69192          Max.   :138.00           Max.   :17.413002          
                                                                               
 public_transport_station_min_walk    water_km        water_1line       
 Min.   :  0.0448                  Min.   :0.006707   Length:24378      
 1st Qu.:  1.2139                  1st Qu.:0.339637   Class :character  
 Median :  1.9250                  Median :0.619856   Mode  :character  
 Mean   :  4.8907                  Mean   :0.690766                     
 3rd Qu.:  3.3346                  3rd Qu.:0.967451                     
 Max.   :208.9560                  Max.   :2.743788                     
                                                                        
    mkad_km             ttk_km           sadovoe_km       bulvar_ring_km    
 Min.   : 0.01363   Min.   : 0.00193   Min.   : 0.00036   Min.   : 0.00195  
 1st Qu.: 2.63051   1st Qu.: 5.35504   1st Qu.: 8.37280   1st Qu.: 9.28184  
 Median : 5.44795   Median : 9.83387   Median :12.74954   Median :13.61322  
 Mean   : 6.23268   Mean   :11.28974   Mean   :14.03432   Mean   :15.00061  
 3rd Qu.: 8.18475   3rd Qu.:15.67545   3rd Qu.:18.62930   3rd Qu.:19.87318  
 Max.   :53.27783   Max.   :66.03320   Max.   :68.85305   Max.   :69.98487  
                                                                            
   kremlin_km       big_road1_km       ID_big_road1   big_road1_1line   
 Min.   : 0.0729   Min.   :0.000364   Min.   : 1.00   Length:24378      
 1st Qu.:10.4753   1st Qu.:0.785019   1st Qu.: 2.00   Class :character  
 Median :14.8772   Median :1.728433   Median :10.00   Mode  :character  
 Mean   :16.0232   Mean   :1.884760   Mean   :11.45                     
 3rd Qu.:20.6482   3rd Qu.:2.806477   3rd Qu.:14.00                     
 Max.   :70.7388   Max.   :6.995416   Max.   :48.00                     
                                                                        
  big_road2_km        ID_big_road2    railroad_km        railroad_1line    
 Min.   : 0.001935   Min.   : 1.00   Min.   : 0.002299   Length:24378      
 1st Qu.: 2.107386   1st Qu.: 4.00   1st Qu.: 0.652358   Class :character  
 Median : 3.210544   Median :21.00   Median : 1.238357   Mode  :character  
 Mean   : 3.389901   Mean   :22.35   Mean   : 1.879070                     
 3rd Qu.: 4.306233   3rd Qu.:38.00   3rd Qu.: 2.519546                     
 Max.   :13.798346   Max.   :58.00   Max.   :16.656237                     
                                                                           
 zd_vokzaly_avto_km ID_railroad_terminal bus_terminal_avto_km ID_bus_terminal 
 Min.   : 0.1367    Min.   :  5.00       Min.   : 0.06203     Min.   : 1.000  
 1st Qu.:10.0111    1st Qu.: 32.00       1st Qu.: 5.21388     1st Qu.: 3.000  
 Median :14.7628    Median : 50.00       Median : 7.45701     Median : 8.000  
 Mean   :17.1938    Mean   : 51.67       Mean   : 9.96392     Mean   : 6.703  
 3rd Qu.:24.0612    3rd Qu.: 83.00       3rd Qu.:13.21233     3rd Qu.: 9.000  
 Max.   :91.2151    Max.   :121.00       Max.   :74.79611     Max.   :14.000  
                                                                              
 oil_chemistry_km  nuclear_reactor_km  radiation_km     
 Min.   : 0.5107   Min.   : 0.3098    Min.   : 0.00546  
 1st Qu.: 8.7126   1st Qu.: 5.2528    1st Qu.: 1.22756  
 Median :16.6881   Median : 8.9960    Median : 2.43392  
 Mean   :17.3670   Mean   :10.9190    Mean   : 4.37800  
 3rd Qu.:23.4245   3rd Qu.:16.3725    3rd Qu.: 4.68705  
 Max.   :70.4134   Max.   :64.2570    Max.   :53.89016  
                                                        
 power_transmission_line_km thermal_power_plant_km     ts_km       
 Min.   : 0.03027           Min.   : 0.4006        Min.   : 0.000  
 1st Qu.: 0.97315           1st Qu.: 3.7771        1st Qu.: 2.046  
 Median : 1.88587           Median : 5.8999        Median : 3.954  
 Mean   : 3.46226           Mean   : 7.3138        Mean   : 4.896  
 3rd Qu.: 4.92655           3rd Qu.: 9.7932        3rd Qu.: 5.515  
 Max.   :43.32437           Max.   :56.8561        Max.   :54.081  
                                                                   
 big_market_km     market_shop_km       fitness_km       swim_pool_km   
 Min.   : 0.7056   Min.   : 0.02157   Min.   : 0.0000   Min.   : 0.000  
 1st Qu.: 7.5296   1st Qu.: 1.53980   1st Qu.: 0.3640   1st Qu.: 1.721  
 Median :11.9104   Median : 2.93128   Median : 0.6595   Median : 2.877  
 Mean   :13.2595   Mean   : 3.94502   Mean   : 1.1491   Mean   : 4.198  
 3rd Qu.:16.5513   3rd Qu.: 5.46021   3rd Qu.: 1.3428   3rd Qu.: 5.370  
 Max.   :59.5016   Max.   :41.10365   Max.   :26.6525   Max.   :53.359  
                                                                        
  ice_rink_km       stadium_km      basketball_km      hospice_morgue_km 
 Min.   : 0.000   Min.   : 0.1148   Min.   : 0.00546   Min.   : 0.00252  
 1st Qu.: 3.044   1st Qu.: 4.0182   1st Qu.: 1.30833   1st Qu.: 1.12070  
 Median : 5.547   Median : 6.9541   Median : 2.89067   Median : 1.89079  
 Mean   : 6.101   Mean   : 9.3920   Mean   : 4.76008   Mean   : 2.63201  
 3rd Qu.: 7.943   3rd Qu.:13.5516   3rd Qu.: 6.36452   3rd Qu.: 3.29394  
 Max.   :38.765   Max.   :83.3985   Max.   :56.70379   Max.   :43.69464  
                                                                         
 detention_facility_km public_healthcare_km university_km      workplaces_km   
 Min.   : 0.07958      Min.   : 0.00266     Min.   : 0.00093   Min.   : 0.000  
 1st Qu.: 5.66383      1st Qu.: 1.28021     1st Qu.: 2.20422   1st Qu.: 1.017  
 Median :11.30765      Median : 2.34085     Median : 4.32835   Median : 2.044  
 Mean   :14.51069      Mean   : 3.32347     Mean   : 6.82586   Mean   : 3.905  
 3rd Qu.:24.73506      3rd Qu.: 3.98390     3rd Qu.: 9.38027   3rd Qu.: 5.420  
 Max.   :89.37137      Max.   :76.05514     Max.   :84.86215   Max.   :55.278  
                                                                               
 shopping_centers_km   office_km       additional_education_km
 Min.   : 0.0000     Min.   : 0.0000   Min.   : 0.0000        
 1st Qu.: 0.4893     1st Qu.: 0.5615   1st Qu.: 0.4742        
 Median : 0.8428     Median : 1.0530   Median : 0.9022        
 Mean   : 1.4945     Mean   : 2.0014   Mean   : 1.3234        
 3rd Qu.: 1.5618     3rd Qu.: 3.0165   3rd Qu.: 1.5716        
 Max.   :26.2595     Max.   :18.9589   Max.   :24.2682        
                                                              
  preschool_km     big_church_km      church_synagogue_km   mosque_km       
 Min.   : 0.0000   Min.   : 0.00407   Min.   : 0.0000     Min.   : 0.00554  
 1st Qu.: 0.2854   1st Qu.: 0.86276   1st Qu.: 0.5327     1st Qu.: 3.76607  
 Median : 0.4944   Median : 1.49079   Median : 0.8618     Median : 6.52078  
 Mean   : 1.3212   Mean   : 2.30683   Mean   : 0.9731     Mean   : 7.72111  
 3rd Qu.: 0.9363   3rd Qu.: 2.90870   3rd Qu.: 1.2500     3rd Qu.:10.04295  
 Max.   :47.3947   Max.   :45.66906   Max.   :15.6157     Max.   :44.84983  
                                                                            
   theater_km         museum_km       exhibition_km       catering_km       
 Min.   : 0.02679   Min.   : 0.0079   Min.   : 0.00895   Min.   : 0.000357  
 1st Qu.: 4.22525   1st Qu.: 2.8828   1st Qu.: 2.24065   1st Qu.: 0.209446  
 Median : 8.61201   Median : 5.6433   Median : 4.10261   Median : 0.413909  
 Mean   : 9.60979   Mean   : 7.0418   Mean   : 5.51957   Mean   : 0.686772  
 3rd Qu.:13.45959   3rd Qu.:10.3286   3rd Qu.: 6.95087   3rd Qu.: 0.836744  
 Max.   :87.60069   Max.   :59.2032   Max.   :54.43124   Max.   :10.671808  
                                                                            
   ecology          green_part_500   prom_part_500    office_count_500 
 Length:24378       Min.   :  0.00   Min.   : 0.000   Min.   : 0.0000  
 Class :character   1st Qu.:  1.48   1st Qu.: 0.000   1st Qu.: 0.0000  
 Mode  :character   Median :  8.45   Median : 0.000   Median : 0.0000  
                    Mean   : 13.42   Mean   : 5.742   Mean   : 0.7246  
                    3rd Qu.: 19.92   3rd Qu.: 5.760   3rd Qu.: 0.0000  
                    Max.   :100.00   Max.   :98.770   Max.   :34.0000  
                                                                       
 office_sqm_500   trc_count_500    trc_sqm_500      cafe_count_500   
 Min.   :     0   Min.   :0.000   Min.   :      0   Min.   :  0.000  
 1st Qu.:     0   1st Qu.:0.000   1st Qu.:      0   1st Qu.:  0.000  
 Median :     0   Median :0.000   Median :      0   Median :  1.000  
 Mean   : 13732   Mean   :0.553   Mean   :  21692   Mean   :  3.826  
 3rd Qu.:     0   3rd Qu.:1.000   3rd Qu.:      0   3rd Qu.:  3.000  
 Max.   :611015   Max.   :8.000   Max.   :1500000   Max.   :120.000  
                                                                     
 cafe_sum_500_min_price_avg cafe_sum_500_max_price_avg cafe_avg_price_500
 Min.   : 300.0             Min.   : 500               Min.   : 400.0    
 1st Qu.: 500.0             1st Qu.:1000               1st Qu.: 750.0    
 Median : 666.7             Median :1154               Median : 916.7    
 Mean   : 741.2             Mean   :1248               Mean   : 994.5    
 3rd Qu.: 954.8             3rd Qu.:1500               3rd Qu.:1250.0    
 Max.   :4000.0             Max.   :6000               Max.   :5000.0    
 NA's   :10643              NA's   :10643              NA's   :10643     
 cafe_count_500_na_price cafe_count_500_price_500 cafe_count_500_price_1000
 Min.   : 0.0000         Min.   : 0.0000          Min.   : 0.0000          
 1st Qu.: 0.0000         1st Qu.: 0.0000          1st Qu.: 0.0000          
 Median : 0.0000         Median : 0.0000          Median : 0.0000          
 Mean   : 0.3404         Mean   : 0.9804          Mean   : 0.9732          
 3rd Qu.: 0.0000         3rd Qu.: 1.0000          3rd Qu.: 1.0000          
 Max.   :13.0000         Max.   :33.0000          Max.   :37.0000          
                                                                           
 cafe_count_500_price_1500 cafe_count_500_price_2500 cafe_count_500_price_4000
 Min.   : 0.0000           Min.   : 0.0000           Min.   : 0.0000          
 1st Qu.: 0.0000           1st Qu.: 0.0000           1st Qu.: 0.0000          
 Median : 0.0000           Median : 0.0000           Median : 0.0000          
 Mean   : 0.8291           Mean   : 0.5372           Mean   : 0.1364          
 3rd Qu.: 1.0000           3rd Qu.: 0.0000           3rd Qu.: 0.0000          
 Max.   :29.0000           Max.   :22.0000           Max.   :14.0000          
                                                                              
 cafe_count_500_price_high big_church_count_500 church_count_500 
 Min.   :0.00000           Min.   : 0.0000      Min.   : 0.0000  
 1st Qu.:0.00000           1st Qu.: 0.0000      1st Qu.: 0.0000  
 Median :0.00000           Median : 0.0000      Median : 0.0000  
 Mean   :0.02917           Mean   : 0.2829      Mean   : 0.5788  
 3rd Qu.:0.00000           3rd Qu.: 0.0000      3rd Qu.: 0.0000  
 Max.   :3.00000           Max.   :11.0000      Max.   :17.0000  
                                                                 
 mosque_count_500   leisure_count_500 sport_count_500   market_count_500
 Min.   :0.000000   Min.   :0.00000   Min.   : 0.0000   Min.   :0.0000  
 1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.: 0.0000   1st Qu.:0.0000  
 Median :0.000000   Median :0.00000   Median : 0.0000   Median :0.0000  
 Mean   :0.004882   Mean   :0.06957   Mean   : 0.9065   Mean   :0.1231  
 3rd Qu.:0.000000   3rd Qu.:0.00000   3rd Qu.: 1.0000   3rd Qu.:0.0000  
 Max.   :1.000000   Max.   :9.00000   Max.   :11.0000   Max.   :4.0000  
                                                                        
 green_part_1000  prom_part_1000   office_count_1000 office_sqm_1000  
 Min.   :  0.00   Min.   : 0.000   Min.   : 0.000    Min.   :      0  
 1st Qu.:  6.31   1st Qu.: 0.000   1st Qu.: 0.000    1st Qu.:      0  
 Median : 13.18   Median : 4.010   Median : 0.000    Median :      0  
 Mean   : 17.00   Mean   : 8.802   Mean   : 3.032    Mean   :  61575  
 3rd Qu.: 24.36   3rd Qu.:12.620   3rd Qu.: 2.000    3rd Qu.:  54500  
 Max.   :100.00   Max.   :72.200   Max.   :91.000    Max.   :2244723  
                                                                      
 trc_count_1000    trc_sqm_1000     cafe_count_1000 
 Min.   : 0.000   Min.   :      0   Min.   :  0.00  
 1st Qu.: 0.000   1st Qu.:      0   1st Qu.:  1.00  
 Median : 1.000   Median :   7670   Median :  4.00  
 Mean   : 1.963   Mean   :  65545   Mean   : 15.21  
 3rd Qu.: 3.000   3rd Qu.:  65978   3rd Qu.: 11.00  
 Max.   :20.000   Max.   :1500000   Max.   :449.00  
                                                    
 cafe_sum_1000_min_price_avg cafe_sum_1000_max_price_avg cafe_avg_price_1000
 Min.   : 300.0              Min.   : 500                Min.   : 400.0     
 1st Qu.: 542.9              1st Qu.:1000                1st Qu.: 750.0     
 Median : 666.7              Median :1143                Median : 912.5     
 Mean   : 709.8              Mean   :1205                Mean   : 957.6     
 3rd Qu.: 833.8              3rd Qu.:1391                3rd Qu.:1115.0     
 Max.   :2500.0              Max.   :4000                Max.   :3250.0     
 NA's   :5207                NA's   :5207                NA's   :5207       
 cafe_count_1000_na_price cafe_count_1000_price_500 cafe_count_1000_price_1000
 Min.   : 0.000           Min.   :  0.000           Min.   :  0.0             
 1st Qu.: 0.000           1st Qu.:  0.000           1st Qu.:  0.0             
 Median : 0.000           Median :  1.000           Median :  1.0             
 Mean   : 1.009           Mean   :  4.089           Mean   :  3.9             
 3rd Qu.: 1.000           3rd Qu.:  3.000           3rd Qu.:  4.0             
 Max.   :28.000           Max.   :112.000           Max.   :107.0             
                                                                              
 cafe_count_1000_price_1500 cafe_count_1000_price_2500
 Min.   :  0.000            Min.   : 0.000            
 1st Qu.:  0.000            1st Qu.: 0.000            
 Median :  1.000            Median : 0.000            
 Mean   :  3.477            Mean   : 1.922            
 3rd Qu.:  3.000            3rd Qu.: 1.000            
 Max.   :104.000            Max.   :79.000            
                                                      
 cafe_count_1000_price_4000 cafe_count_1000_price_high big_church_count_1000
 Min.   : 0.0000            Min.   :0.00000            Min.   : 0.0000      
 1st Qu.: 0.0000            1st Qu.:0.00000            1st Qu.: 0.0000      
 Median : 0.0000            Median :0.00000            Median : 0.0000      
 Mean   : 0.7574            Mean   :0.05854            Mean   : 0.8034      
 3rd Qu.: 0.0000            3rd Qu.:0.00000            3rd Qu.: 1.0000      
 Max.   :40.0000            Max.   :7.00000            Max.   :27.0000      
                                                                            
 church_count_1000 mosque_count_1000 leisure_count_1000 sport_count_1000
 Min.   : 0.000    Min.   :0.00000   Min.   : 0.0000    Min.   : 0.000  
 1st Qu.: 0.000    1st Qu.:0.00000   1st Qu.: 0.0000    1st Qu.: 0.000  
 Median : 1.000    Median :0.00000   Median : 0.0000    Median : 2.000  
 Mean   : 1.806    Mean   :0.01887   Mean   : 0.4614    Mean   : 2.897  
 3rd Qu.: 1.000    3rd Qu.:0.00000   3rd Qu.: 0.0000    3rd Qu.: 4.000  
 Max.   :38.000    Max.   :1.00000   Max.   :30.0000    Max.   :25.000  
                                                                        
 market_count_1000 green_part_1500 prom_part_1500  office_count_1500
 Min.   :0.000     Min.   : 0.00   Min.   : 0.00   Min.   :  0.000  
 1st Qu.:0.000     1st Qu.: 8.53   1st Qu.: 1.52   1st Qu.:  0.000  
 Median :0.000     Median :15.03   Median : 7.81   Median :  1.000  
 Mean   :0.382     Mean   :19.23   Mean   :10.61   Mean   :  7.192  
 3rd Qu.:1.000     3rd Qu.:26.80   3rd Qu.:15.34   3rd Qu.:  4.000  
 Max.   :6.000     Max.   :90.41   Max.   :63.00   Max.   :173.000  
                                                                    
 office_sqm_1500   trc_count_1500  trc_sqm_1500     cafe_count_1500 
 Min.   :      0   Min.   : 0.0   Min.   :      0   Min.   :  0.00  
 1st Qu.:      0   1st Qu.: 0.0   1st Qu.:      0   1st Qu.:  2.00  
 Median :  16765   Median : 3.0   Median :  49410   Median : 10.00  
 Mean   : 139395   Mean   : 3.7   Mean   : 127239   Mean   : 32.01  
 3rd Qu.: 117300   3rd Qu.: 5.0   3rd Qu.: 153965   3rd Qu.: 23.00  
 Max.   :2908344   Max.   :27.0   Max.   :1533000   Max.   :784.00  
                                                                    
 cafe_sum_1500_min_price_avg cafe_sum_1500_max_price_avg cafe_avg_price_1500
 Min.   : 300.0              Min.   : 500                Min.   : 400.0     
 1st Qu.: 585.7              1st Qu.:1000                1st Qu.: 794.4     
 Median : 690.9              Median :1167                Median : 925.0     
 Mean   : 713.2              Mean   :1205                Mean   : 959.1     
 3rd Qu.: 820.0              3rd Qu.:1366                3rd Qu.:1092.5     
 Max.   :2500.0              Max.   :4000                Max.   :3250.0     
 NA's   :3365                NA's   :3365                NA's   :3365       
 cafe_count_1500_na_price cafe_count_1500_price_500 cafe_count_1500_price_1000
 Min.   : 0.000           Min.   :  0.00            Min.   :  0.000           
 1st Qu.: 0.000           1st Qu.:  0.00            1st Qu.:  1.000           
 Median : 1.000           Median :  2.00            Median :  3.000           
 Mean   : 2.082           Mean   :  8.11            Mean   :  8.662           
 3rd Qu.: 2.000           3rd Qu.:  6.00            3rd Qu.:  8.000           
 Max.   :54.000           Max.   :195.00            Max.   :177.000           
                                                                              
 cafe_count_1500_price_1500 cafe_count_1500_price_2500
 Min.   :  0.000            Min.   :  0.000           
 1st Qu.:  0.000            1st Qu.:  0.000           
 Median :  2.000            Median :  1.000           
 Mean   :  7.772            Mean   :  3.768           
 3rd Qu.:  6.000            3rd Qu.:  2.000           
 Max.   :183.000            Max.   :127.000           
                                                      
 cafe_count_1500_price_4000 cafe_count_1500_price_high big_church_count_1500
 Min.   : 0.000             Min.   : 0.0000            Min.   : 0.00        
 1st Qu.: 0.000             1st Qu.: 0.0000            1st Qu.: 0.00        
 Median : 0.000             Median : 0.0000            Median : 1.00        
 Mean   : 1.432             Mean   : 0.1882            Mean   : 1.95        
 3rd Qu.: 0.000             3rd Qu.: 0.0000            3rd Qu.: 1.00        
 Max.   :55.000             Max.   :12.0000            Max.   :44.00        
                                                                            
 church_count_1500 mosque_count_1500 leisure_count_1500 sport_count_1500
 Min.   : 0.000    Min.   :0.00000   Min.   : 0.0000    Min.   : 0.000  
 1st Qu.: 1.000    1st Qu.:0.00000   1st Qu.: 0.0000    1st Qu.: 1.000  
 Median : 1.000    Median :0.00000   Median : 0.0000    Median : 5.000  
 Mean   : 3.631    Mean   :0.03741   Mean   : 0.9285    Mean   : 5.846  
 3rd Qu.: 3.000    3rd Qu.:0.00000   3rd Qu.: 1.0000    3rd Qu.: 9.000  
 Max.   :75.000    Max.   :1.00000   Max.   :39.0000    Max.   :37.000  
                                                                        
 market_count_1500 green_part_2000 prom_part_2000  office_count_2000
 Min.   :0.0000    Min.   : 0.01   Min.   : 0.00   Min.   :  0.00   
 1st Qu.:0.0000    1st Qu.:10.21   1st Qu.: 3.12   1st Qu.:  0.00   
 Median :0.0000    Median :17.71   Median : 8.80   Median :  2.00   
 Mean   :0.7661    Mean   :20.87   Mean   :11.24   Mean   : 13.13   
 3rd Qu.:1.0000    3rd Qu.:28.42   3rd Qu.:16.21   3rd Qu.:  7.00   
 Max.   :7.0000    Max.   :75.30   Max.   :56.10   Max.   :250.00   
                                                                    
 office_sqm_2000   trc_count_2000    trc_sqm_2000     cafe_count_2000  
 Min.   :      0   Min.   : 0.000   Min.   :      0   Min.   :   0.00  
 1st Qu.:      0   1st Qu.: 1.000   1st Qu.:  12065   1st Qu.:   3.00  
 Median :  58411   Median : 5.000   Median : 115856   Median :  18.00  
 Mean   : 244211   Mean   : 5.932   Mean   : 212261   Mean   :  54.32  
 3rd Qu.: 207193   3rd Qu.: 9.000   3rd Qu.: 284727   3rd Qu.:  37.00  
 Max.   :3602982   Max.   :37.000   Max.   :2442600   Max.   :1115.00  
                                                                       
 cafe_sum_2000_min_price_avg cafe_sum_2000_max_price_avg cafe_avg_price_2000
 Min.   : 300.0              Min.   : 500                Min.   : 400.0     
 1st Qu.: 607.0              1st Qu.:1000                1st Qu.: 823.2     
 Median : 682.5              Median :1155                Median : 918.0     
 Mean   : 719.4              Mean   :1210                Mean   : 964.8     
 3rd Qu.: 791.3              3rd Qu.:1321                3rd Qu.:1056.8     
 Max.   :2166.7              Max.   :3500                Max.   :2833.3     
 NA's   :1368                NA's   :1368                NA's   :1368       
 cafe_count_2000_na_price cafe_count_2000_price_500 cafe_count_2000_price_1000
 Min.   : 0.000           Min.   :  0.00            Min.   :  0.00            
 1st Qu.: 0.000           1st Qu.:  1.00            1st Qu.:  1.00            
 Median : 1.000           Median :  4.00            Median :  6.00            
 Mean   : 3.548           Mean   : 13.42            Mean   : 15.07            
 3rd Qu.: 3.000           3rd Qu.: 10.00            3rd Qu.: 13.00            
 Max.   :70.000           Max.   :278.00            Max.   :261.00            
                                                                              
 cafe_count_2000_price_1500 cafe_count_2000_price_2500
 Min.   :  0.00             Min.   :  0.000           
 1st Qu.:  1.00             1st Qu.:  0.000           
 Median :  4.00             Median :  1.000           
 Mean   : 13.09             Mean   :  6.546           
 3rd Qu.:  9.00             3rd Qu.:  3.000           
 Max.   :261.00             Max.   :167.000           
                                                      
 cafe_count_2000_price_4000 cafe_count_2000_price_high big_church_count_2000
 Min.   : 0.000             Min.   : 0.0000            Min.   : 0.000       
 1st Qu.: 0.000             1st Qu.: 0.0000            1st Qu.: 0.000       
 Median : 0.000             Median : 0.0000            Median : 1.000       
 Mean   : 2.281             Mean   : 0.3707            Mean   : 3.225       
 3rd Qu.: 1.000             3rd Qu.: 0.0000            3rd Qu.: 2.000       
 Max.   :74.000             Max.   :16.0000            Max.   :70.000       
                                                                            
 church_count_2000 mosque_count_2000 leisure_count_2000 sport_count_2000
 Min.   :  0.000   Min.   :0.00000   Min.   : 0.000     Min.   : 0.000  
 1st Qu.:  2.000   1st Qu.:0.00000   1st Qu.: 0.000     1st Qu.: 2.000  
 Median :  3.000   Median :0.00000   Median : 0.000     Median : 9.000  
 Mean   :  6.154   Mean   :0.08819   Mean   : 1.888     Mean   : 9.816  
 3rd Qu.:  5.000   3rd Qu.:0.00000   3rd Qu.: 1.000     3rd Qu.:14.000  
 Max.   :108.000   Max.   :1.00000   Max.   :55.000     Max.   :54.000  
                                                                        
 market_count_2000 green_part_3000 prom_part_3000  office_count_3000
 Min.   :0.00      Min.   : 0.31   Min.   : 0.00   Min.   :  0.00   
 1st Qu.:0.00      1st Qu.:12.15   1st Qu.: 4.28   1st Qu.:  0.00   
 Median :1.00      Median :20.30   Median : 9.69   Median :  5.00   
 Mean   :1.17      Mean   :22.74   Mean   :10.99   Mean   : 28.96   
 3rd Qu.:2.00      3rd Qu.:30.20   3rd Qu.:15.73   3rd Qu.: 17.00   
 Max.   :8.00      Max.   :74.02   Max.   :45.10   Max.   :493.00   
                                                                    
 office_sqm_3000   trc_count_3000   trc_sqm_3000     cafe_count_3000 
 Min.   :      0   Min.   : 0.00   Min.   :      0   Min.   :   0.0  
 1st Qu.:      0   1st Qu.: 2.00   1st Qu.:  41100   1st Qu.:   7.0  
 Median : 130303   Median :11.00   Median : 294350   Median :  41.0  
 Mean   : 538727   Mean   :11.79   Mean   : 437423   Mean   : 109.5  
 3rd Qu.: 491883   3rd Qu.:17.00   3rd Qu.: 651639   3rd Qu.:  78.0  
 Max.   :6106112   Max.   :66.00   Max.   :2654102   Max.   :1815.0  
                                                                     
 cafe_sum_3000_min_price_avg cafe_sum_3000_max_price_avg cafe_avg_price_3000
 Min.   : 300.0              Min.   : 500                Min.   : 400.0     
 1st Qu.: 650.0              1st Qu.:1101                1st Qu.: 875.3     
 Median : 711.0              Median :1211                Median : 961.1     
 Mean   : 765.3              Mean   :1283                Mean   :1023.9     
 3rd Qu.: 815.2              3rd Qu.:1333                3rd Qu.:1083.3     
 Max.   :1833.3              Max.   :3000                Max.   :2416.7     
 NA's   :773                 NA's   :773                 NA's   :773        
 cafe_count_3000_na_price cafe_count_3000_price_500 cafe_count_3000_price_1000
 Min.   :  0.000          Min.   :  0.00            Min.   :  0.00            
 1st Qu.:  0.000          1st Qu.:  1.00            1st Qu.:  2.00            
 Median :  3.000          Median :  9.00            Median : 14.00            
 Mean   :  7.196          Mean   : 27.45            Mean   : 30.11            
 3rd Qu.:  6.000          3rd Qu.: 22.00            3rd Qu.: 26.00            
 Max.   :114.000          Max.   :449.00            Max.   :441.00            
                                                                              
 cafe_count_3000_price_1500 cafe_count_3000_price_2500
 Min.   :  0.00             Min.   :  0.00            
 1st Qu.:  2.00             1st Qu.:  1.00            
 Median : 10.00             Median :  3.00            
 Mean   : 26.34             Mean   : 13.11            
 3rd Qu.: 17.00             3rd Qu.:  6.00            
 Max.   :446.00             Max.   :263.00            
                                                      
 cafe_count_3000_price_4000 cafe_count_3000_price_high big_church_count_3000
 Min.   :  0.000            Min.   : 0.0000            Min.   :  0.000      
 1st Qu.:  0.000            1st Qu.: 0.0000            1st Qu.:  1.000      
 Median :  1.000            Median : 0.0000            Median :  2.000      
 Mean   :  4.572            Mean   : 0.6915            Mean   :  6.057      
 3rd Qu.:  2.000            3rd Qu.: 0.0000            3rd Qu.:  5.000      
 Max.   :112.000            Max.   :22.0000            Max.   :102.000      
                                                                            
 church_count_3000 mosque_count_3000 leisure_count_3000 sport_count_3000
 Min.   :  0.00    Min.   :0.000     Min.   : 0.000     Min.   :  0.00  
 1st Qu.:  3.00    1st Qu.:0.000     1st Qu.: 0.000     1st Qu.:  5.00  
 Median :  6.00    Median :0.000     Median : 0.000     Median : 18.00  
 Mean   : 12.17    Mean   :0.199     Mean   : 3.817     Mean   : 20.18  
 3rd Qu.: 10.00    3rd Qu.:0.000     3rd Qu.: 2.000     3rd Qu.: 29.00  
 Max.   :164.00    Max.   :2.000     Max.   :85.000     Max.   :100.00  
                                                                        
 market_count_3000 green_part_5000 prom_part_5000  office_count_5000
 Min.   : 0.000    Min.   : 3.53   Min.   : 0.21   Min.   :  0.00   
 1st Qu.: 0.000    1st Qu.:14.78   1st Qu.: 6.05   1st Qu.:  2.00   
 Median : 2.000    Median :19.77   Median : 8.96   Median : 15.00   
 Mean   : 2.317    Mean   :22.76   Mean   :10.35   Mean   : 70.57   
 3rd Qu.: 4.000    3rd Qu.:31.39   3rd Qu.:13.97   3rd Qu.: 52.00   
 Max.   :10.000    Max.   :68.35   Max.   :28.56   Max.   :789.00   
                                   NA's   :136                      
 office_sqm_5000    trc_count_5000    trc_sqm_5000     cafe_count_5000 
 Min.   :       0   Min.   :  0.00   Min.   :      0   Min.   :   0.0  
 1st Qu.:   85159   1st Qu.:  6.00   1st Qu.: 262000   1st Qu.:  20.0  
 Median :  429442   Median : 31.00   Median :1076162   Median : 108.0  
 Mean   : 1391525   Mean   : 30.06   Mean   :1172851   Mean   : 262.9  
 3rd Qu.: 1430674   3rd Qu.: 43.00   3rd Qu.:1683553   3rd Qu.: 221.0  
 Max.   :12372993   Max.   :119.00   Max.   :4585477   Max.   :2645.0  
                                                                       
 cafe_sum_5000_min_price_avg cafe_sum_5000_max_price_avg cafe_avg_price_5000
 Min.   : 300.0              Min.   : 500                Min.   : 400.0     
 1st Qu.: 670.5              1st Qu.:1144                1st Qu.: 909.4     
 Median : 721.1              Median :1212                Median : 966.0     
 Mean   : 764.0              Mean   :1277                Mean   :1020.4     
 3rd Qu.: 815.4              3rd Qu.:1341                3rd Qu.:1088.1     
 Max.   :1875.0              Max.   :3000                Max.   :2437.5     
 NA's   :235                 NA's   :235                 NA's   :235        
 cafe_count_5000_na_price cafe_count_5000_price_500 cafe_count_5000_price_1000
 Min.   :  0.00           Min.   :  0.00            Min.   :  0.00            
 1st Qu.:  1.00           1st Qu.:  4.00            1st Qu.:  8.00            
 Median :  8.00           Median : 28.00            Median : 36.00            
 Mean   : 17.64           Mean   : 65.54            Mean   : 72.83            
 3rd Qu.: 15.00           3rd Qu.: 59.00            3rd Qu.: 69.00            
 Max.   :174.00           Max.   :650.00            Max.   :648.00            
                                                                              
 cafe_count_5000_price_1500 cafe_count_5000_price_2500
 Min.   :  0.00             Min.   :  0.00            
 1st Qu.:  6.00             1st Qu.:  2.00            
 Median : 24.00             Median :  8.00            
 Mean   : 62.84             Mean   : 31.67            
 3rd Qu.: 50.00             3rd Qu.: 21.00            
 Max.   :641.00             Max.   :377.00            
                                                      
 cafe_count_5000_price_4000 cafe_count_5000_price_high big_church_count_5000
 Min.   :  0.00             Min.   : 0.000             Min.   :  0.0        
 1st Qu.:  1.00             1st Qu.: 0.000             1st Qu.:  2.0        
 Median :  2.00             Median : 0.000             Median :  7.0        
 Mean   : 10.63             Mean   : 1.746             Mean   : 14.9        
 3rd Qu.:  5.00             3rd Qu.: 0.000             3rd Qu.: 12.0        
 Max.   :147.00             Max.   :30.000             Max.   :151.0        
                                                                            
 church_count_5000 mosque_count_5000 leisure_count_5000 sport_count_5000
 Min.   :  0       Min.   :0.0000    Min.   :  0.000    Min.   :  0.0   
 1st Qu.:  9       1st Qu.:0.0000    1st Qu.:  0.000    1st Qu.: 11.0   
 Median : 16       Median :0.0000    Median :  2.000    Median : 48.0   
 Mean   : 30       Mean   :0.4422    Mean   :  8.553    Mean   : 52.7   
 3rd Qu.: 28       3rd Qu.:1.0000    3rd Qu.:  7.000    3rd Qu.: 75.0   
 Max.   :250       Max.   :2.0000    Max.   :106.000    Max.   :218.0   
                                                                        
 market_count_5000   price_doc       
 Min.   : 0.000    Min.   :  100000  
 1st Qu.: 1.000    1st Qu.: 4740002  
 Median : 5.000    Median : 6274186  
 Mean   : 5.979    Mean   : 7103225  
 3rd Qu.:10.000    3rd Qu.: 8300000  
 Max.   :21.000    Max.   :95122496  
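
The summary above prints an NA's row only for columns that contain missing values, for example cafe_sum_500_min_price_avg with 10643 NA's out of 24378 rows (about 43.66%). With this many columns those rows are easy to miss. Below is a minimal sketch of one way to list only the affected columns, sorted by how much is missing. It uses a small made-up data frame (`toy`, with illustrative values) rather than the real `dataset`; the assumption is that the same calls would work on `dataset`, since `is.na()` returns a logical matrix for data frames.

```r
# Toy data frame standing in for `dataset` (values are illustrative only)
toy <- data.frame(
  full_sq = c(43, 34, NA, 89, 77),
  life_sq = c(27, NA, NA, 50, NA),
  floor   = c(4, 3, 2, 9, 4)
)

# Count missing values per column
na_count <- colSums(is.na(toy))

# Keep only the columns with at least one NA, largest count first
na_cols <- sort(na_count[na_count > 0], decreasing = TRUE)

# Express the counts as a percentage of rows, rounded to 2 decimal places
na_pct <- round(na_cols / nrow(toy) * 100, 2)
print(na_pct)
```

Filtering before printing keeps the result short when most columns are complete, which appears to be the case here.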
                                     

In [166]:
# Percentage of missing values in each feature, rounded to 2 decimal places.
# mean(is.na(col)) is the fraction of NA entries, i.e. sum(is.na(col)) / length(col).
sapply(dataset, function(col) {
  round(mean(is.na(col)) * 100, 2)
})


id
0
timestamp
0
full_sq
0
life_sq
20.88
floor
0.53
max_floor
31.29
material
31.29
build_year
44.41
num_room
31.29
kitch_sq
31.29
state
44.46
product_type
0
sub_area
0
area_m
0
raion_popul
0
green_zone_part
0
indust_part
0
children_preschool
0
preschool_quota
21.9
preschool_education_centers_raion
0
children_school
0
school_quota
21.89
school_education_centers_raion
0
school_education_centers_top_20_raion
0
hospital_beds_raion
47.27
healthcare_centers_raion
0
university_top_20_raion
0
sport_objects_raion
0
additional_education_raion
0
culture_objects_top_25
0
culture_objects_top_25_raion
0
shopping_centers_raion
0
office_raion
0
thermal_power_plant_raion
0
incineration_raion
0
oil_chemistry_raion
0
radiation_raion
0
railroad_terminal_raion
0
big_market_raion
0
nuclear_reactor_raion
0
detention_facility_raion
0
full_all
0
male_f
0
female_f
0
young_all
0
young_male
0
young_female
0
work_all
0
work_male
0
work_female
0
ekder_all
0
ekder_male
0
ekder_female
0
0_6_all
0
0_6_male
0
0_6_female
0
7_14_all
0
7_14_male
0
7_14_female
0
0_17_all
0
0_17_male
0
0_17_female
0
16_29_all
0
16_29_male
0
16_29_female
0
0_13_all
0
0_13_male
0
0_13_female
0
raion_build_count_with_material_info
16.28
build_count_block
16.28
build_count_wood
16.28
build_count_frame
16.28
build_count_brick
16.28
build_count_monolith
16.28
build_count_panel
16.28
build_count_foam
16.28
build_count_slag
16.28
build_count_mix
16.28
raion_build_count_with_builddate_info
16.28
build_count_before_1920
16.28
build_count_1921-1945
16.28
build_count_1946-1970
16.28
build_count_1971-1995
16.28
build_count_after_1995
16.28
ID_metro
0
metro_min_avto
0
metro_km_avto
0
metro_min_walk
0.09
metro_km_walk
0.09
kindergarten_km
0
school_km
0
park_km
0
green_zone_km
0
industrial_km
0
water_treatment_km
0
cemetery_km
0
incineration_km
0
railroad_station_walk_km
0.09
railroad_station_walk_min
0.09
ID_railroad_station_walk
0.09
railroad_station_avto_km
0
railroad_station_avto_min
0
ID_railroad_station_avto
0
public_transport_station_km
0
public_transport_station_min_walk
0
water_km
0
water_1line
0
mkad_km
0
ttk_km
0
sadovoe_km
0
bulvar_ring_km
0
kremlin_km
0
big_road1_km
0
ID_big_road1
0
big_road1_1line
0
big_road2_km
0
ID_big_road2
0
railroad_km
0
railroad_1line
0
zd_vokzaly_avto_km
0
ID_railroad_terminal
0
bus_terminal_avto_km
0
ID_bus_terminal
0
oil_chemistry_km
0
nuclear_reactor_km
0
radiation_km
0
power_transmission_line_km
0
thermal_power_plant_km
0
ts_km
0
big_market_km
0
market_shop_km
0
fitness_km
0
swim_pool_km
0
ice_rink_km
0
stadium_km
0
basketball_km
0
hospice_morgue_km
0
detention_facility_km
0
public_healthcare_km
0
university_km
0
workplaces_km
0
shopping_centers_km
0
office_km
0
additional_education_km
0
preschool_km
0
big_church_km
0
church_synagogue_km
0
mosque_km
0
theater_km
0
museum_km
0
exhibition_km
0
catering_km
0
ecology
0
green_part_500
0
prom_part_500
0
office_count_500
0
office_sqm_500
0
trc_count_500
0
trc_sqm_500
0
cafe_count_500
0
cafe_sum_500_min_price_avg
43.66
cafe_sum_500_max_price_avg
43.66
cafe_avg_price_500
43.66
cafe_count_500_na_price
0
cafe_count_500_price_500
0
cafe_count_500_price_1000
0
cafe_count_500_price_1500
0
cafe_count_500_price_2500
0
cafe_count_500_price_4000
0
cafe_count_500_price_high
0
big_church_count_500
0
church_count_500
0
mosque_count_500
0
leisure_count_500
0
sport_count_500
0
market_count_500
0
green_part_1000
0
prom_part_1000
0
office_count_1000
0
office_sqm_1000
0
trc_count_1000
0
trc_sqm_1000
0
cafe_count_1000
0
cafe_sum_1000_min_price_avg
21.36
cafe_sum_1000_max_price_avg
21.36
cafe_avg_price_1000
21.36
cafe_count_1000_na_price
0
cafe_count_1000_price_500
0
cafe_count_1000_price_1000
0
cafe_count_1000_price_1500
0
cafe_count_1000_price_2500
0
cafe_count_1000_price_4000
0
cafe_count_1000_price_high
0
big_church_count_1000
0
church_count_1000
0
mosque_count_1000
0
leisure_count_1000
0
sport_count_1000
0
market_count_1000
0
green_part_1500
0
prom_part_1500
0
office_count_1500
0
office_sqm_1500
0
trc_count_1500
0
trc_sqm_1500
0
cafe_count_1500
0
cafe_sum_1500_min_price_avg
13.8
cafe_sum_1500_max_price_avg
13.8
cafe_avg_price_1500
13.8
cafe_count_1500_na_price
0
cafe_count_1500_price_500
0
cafe_count_1500_price_1000
0
cafe_count_1500_price_1500
0
cafe_count_1500_price_2500
0
cafe_count_1500_price_4000
0
cafe_count_1500_price_high
0
big_church_count_1500
0
church_count_1500
0
mosque_count_1500
0
leisure_count_1500
0
sport_count_1500
0
market_count_1500
0
green_part_2000
0
prom_part_2000
0
office_count_2000
0
office_sqm_2000
0
trc_count_2000
0
trc_sqm_2000
0
cafe_count_2000
0
cafe_sum_2000_min_price_avg
5.61
cafe_sum_2000_max_price_avg
5.61
cafe_avg_price_2000
5.61
cafe_count_2000_na_price
0
cafe_count_2000_price_500
0
cafe_count_2000_price_1000
0
cafe_count_2000_price_1500
0
cafe_count_2000_price_2500
0
cafe_count_2000_price_4000
0
cafe_count_2000_price_high
0
big_church_count_2000
0
church_count_2000
0
mosque_count_2000
0
leisure_count_2000
0
sport_count_2000
0
market_count_2000
0
green_part_3000
0
prom_part_3000
0
office_count_3000
0
office_sqm_3000
0
trc_count_3000
0
trc_sqm_3000
0
cafe_count_3000
0
cafe_sum_3000_min_price_avg
3.17
cafe_sum_3000_max_price_avg
3.17
cafe_avg_price_3000
3.17
cafe_count_3000_na_price
0
cafe_count_3000_price_500
0
cafe_count_3000_price_1000
0
cafe_count_3000_price_1500
0
cafe_count_3000_price_2500
0
cafe_count_3000_price_4000
0
cafe_count_3000_price_high
0
big_church_count_3000
0
church_count_3000
0
mosque_count_3000
0
leisure_count_3000
0
sport_count_3000
0
market_count_3000
0
green_part_5000
0
prom_part_5000
0.56
office_count_5000
0
office_sqm_5000
0
trc_count_5000
0
trc_sqm_5000
0
cafe_count_5000
0
cafe_sum_5000_min_price_avg
0.96
cafe_sum_5000_max_price_avg
0.96
cafe_avg_price_5000
0.96
cafe_count_5000_na_price
0
cafe_count_5000_price_500
0
cafe_count_5000_price_1000
0
cafe_count_5000_price_1500
0
cafe_count_5000_price_2500
0
cafe_count_5000_price_4000
0
cafe_count_5000_price_high
0
big_church_count_5000
0
church_count_5000
0
mosque_count_5000
0
leisure_count_5000
0
sport_count_5000
0
market_count_5000
0
price_doc
0

In [167]:
dataset$timestamp <- as.Date(dataset$timestamp)
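With timestamp stored as a Date, the macro table could be joined to the transactions on that column, as the data description suggests. A minimal sketch on toy data, assuming base R data frames; all values and the column name macro_value are invented for illustration and are not taken from macro.csv:

```r
# Toy stand-ins for the transaction and macro tables (values are invented).
transactions <- data.frame(
  id = 1:3,
  timestamp = c("2011-08-20", "2011-08-23", "2011-08-20"),
  stringsAsFactors = FALSE
)
macro_toy <- data.frame(
  timestamp = c("2011-08-20", "2011-08-23"),
  macro_value = c(10.5, 11.0),   # hypothetical macro indicator
  stringsAsFactors = FALSE
)

# Convert both keys to Date so they compare as dates, not as strings.
transactions$timestamp <- as.Date(transactions$timestamp)
macro_toy$timestamp <- as.Date(macro_toy$timestamp)

# Left join: keep every transaction and attach the macro row for its date.
joined <- merge(transactions, macro_toy, by = "timestamp", all.x = TRUE)
joined <- joined[order(joined$id), ]
```

Using all.x = TRUE keeps transactions whose date has no macro row; their macro columns would then be NA.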

In [168]:
sapply(dataset, class)


id
'integer'
timestamp
'Date'
full_sq
'integer'
life_sq
'integer'
floor
'integer'
max_floor
'integer'
material
'integer'
build_year
'integer'
num_room
'integer'
kitch_sq
'integer'
state
'integer'
product_type
'character'
sub_area
'character'
area_m
'numeric'
raion_popul
'integer'
green_zone_part
'numeric'
indust_part
'numeric'
children_preschool
'integer'
preschool_quota
'integer'
preschool_education_centers_raion
'integer'
children_school
'integer'
school_quota
'integer'
school_education_centers_raion
'integer'
school_education_centers_top_20_raion
'integer'
hospital_beds_raion
'integer'
healthcare_centers_raion
'integer'
university_top_20_raion
'integer'
sport_objects_raion
'integer'
additional_education_raion
'integer'
culture_objects_top_25
'character'
culture_objects_top_25_raion
'integer'
shopping_centers_raion
'integer'
office_raion
'integer'
thermal_power_plant_raion
'character'
incineration_raion
'character'
oil_chemistry_raion
'character'
radiation_raion
'character'
railroad_terminal_raion
'character'
big_market_raion
'character'
nuclear_reactor_raion
'character'
detention_facility_raion
'character'
full_all
'integer'
male_f
'integer'
female_f
'integer'
young_all
'integer'
young_male
'integer'
young_female
'integer'
work_all
'integer'
work_male
'integer'
work_female
'integer'
ekder_all
'integer'
ekder_male
'integer'
ekder_female
'integer'
0_6_all
'integer'
0_6_male
'integer'
0_6_female
'integer'
7_14_all
'integer'
7_14_male
'integer'
7_14_female
'integer'
0_17_all
'integer'
0_17_male
'integer'
0_17_female
'integer'
16_29_all
'integer'
16_29_male
'integer'
16_29_female
'integer'
0_13_all
'integer'
0_13_male
'integer'
0_13_female
'integer'
raion_build_count_with_material_info
'integer'
build_count_block
'integer'
build_count_wood
'integer'
build_count_frame
'integer'
build_count_brick
'integer'
build_count_monolith
'integer'
build_count_panel
'integer'
build_count_foam
'integer'
build_count_slag
'integer'
build_count_mix
'integer'
raion_build_count_with_builddate_info
'integer'
build_count_before_1920
'integer'
build_count_1921-1945
'integer'
build_count_1946-1970
'integer'
build_count_1971-1995
'integer'
build_count_after_1995
'integer'
ID_metro
'integer'
metro_min_avto
'numeric'
metro_km_avto
'numeric'
metro_min_walk
'numeric'
metro_km_walk
'numeric'
kindergarten_km
'numeric'
school_km
'numeric'
park_km
'numeric'
green_zone_km
'numeric'
industrial_km
'numeric'
water_treatment_km
'numeric'
cemetery_km
'numeric'
incineration_km
'numeric'
railroad_station_walk_km
'numeric'
railroad_station_walk_min
'numeric'
ID_railroad_station_walk
'integer'
railroad_station_avto_km
'numeric'
railroad_station_avto_min
'numeric'
ID_railroad_station_avto
'integer'
public_transport_station_km
'numeric'
public_transport_station_min_walk
'numeric'
water_km
'numeric'
water_1line
'character'
mkad_km
'numeric'
ttk_km
'numeric'
sadovoe_km
'numeric'
bulvar_ring_km
'numeric'
kremlin_km
'numeric'
big_road1_km
'numeric'
ID_big_road1
'integer'
big_road1_1line
'character'
big_road2_km
'numeric'
ID_big_road2
'integer'
railroad_km
'numeric'
railroad_1line
'character'
zd_vokzaly_avto_km
'numeric'
ID_railroad_terminal
'integer'
bus_terminal_avto_km
'numeric'
ID_bus_terminal
'integer'
oil_chemistry_km
'numeric'
nuclear_reactor_km
'numeric'
radiation_km
'numeric'
power_transmission_line_km
'numeric'
thermal_power_plant_km
'numeric'
ts_km
'numeric'
big_market_km
'numeric'
market_shop_km
'numeric'
fitness_km
'numeric'
swim_pool_km
'numeric'
ice_rink_km
'numeric'
stadium_km
'numeric'
basketball_km
'numeric'
hospice_morgue_km
'numeric'
detention_facility_km
'numeric'
public_healthcare_km
'numeric'
university_km
'numeric'
workplaces_km
'numeric'
shopping_centers_km
'numeric'
office_km
'numeric'
additional_education_km
'numeric'
preschool_km
'numeric'
big_church_km
'numeric'
church_synagogue_km
'numeric'
mosque_km
'numeric'
theater_km
'numeric'
museum_km
'numeric'
exhibition_km
'numeric'
catering_km
'numeric'
ecology
'character'
green_part_500
'numeric'
prom_part_500
'numeric'
office_count_500
'integer'
office_sqm_500
'integer'
trc_count_500
'integer'
trc_sqm_500
'integer'
cafe_count_500
'integer'
cafe_sum_500_min_price_avg
'numeric'
cafe_sum_500_max_price_avg
'numeric'
cafe_avg_price_500
'numeric'
cafe_count_500_na_price
'integer'
cafe_count_500_price_500
'integer'
cafe_count_500_price_1000
'integer'
cafe_count_500_price_1500
'integer'
cafe_count_500_price_2500
'integer'
cafe_count_500_price_4000
'integer'
cafe_count_500_price_high
'integer'
big_church_count_500
'integer'
church_count_500
'integer'
mosque_count_500
'integer'
leisure_count_500
'integer'
sport_count_500
'integer'
market_count_500
'integer'
green_part_1000
'numeric'
prom_part_1000
'numeric'
office_count_1000
'integer'
office_sqm_1000
'integer'
trc_count_1000
'integer'
trc_sqm_1000
'integer'
cafe_count_1000
'integer'
cafe_sum_1000_min_price_avg
'numeric'
cafe_sum_1000_max_price_avg
'numeric'
cafe_avg_price_1000
'numeric'
cafe_count_1000_na_price
'integer'
cafe_count_1000_price_500
'integer'
cafe_count_1000_price_1000
'integer'
cafe_count_1000_price_1500
'integer'
cafe_count_1000_price_2500
'integer'
cafe_count_1000_price_4000
'integer'
cafe_count_1000_price_high
'integer'
big_church_count_1000
'integer'
church_count_1000
'integer'
mosque_count_1000
'integer'
leisure_count_1000
'integer'
sport_count_1000
'integer'
market_count_1000
'integer'
green_part_1500
'numeric'
prom_part_1500
'numeric'
office_count_1500
'integer'
office_sqm_1500
'integer'
trc_count_1500
'integer'
trc_sqm_1500
'integer'
cafe_count_1500
'integer'
cafe_sum_1500_min_price_avg
'numeric'
cafe_sum_1500_max_price_avg
'numeric'
cafe_avg_price_1500
'numeric'
cafe_count_1500_na_price
'integer'
cafe_count_1500_price_500
'integer'
cafe_count_1500_price_1000
'integer'
cafe_count_1500_price_1500
'integer'
cafe_count_1500_price_2500
'integer'
cafe_count_1500_price_4000
'integer'
cafe_count_1500_price_high
'integer'
big_church_count_1500
'integer'
church_count_1500
'integer'
mosque_count_1500
'integer'
leisure_count_1500
'integer'
sport_count_1500
'integer'
market_count_1500
'integer'
green_part_2000
'numeric'
prom_part_2000
'numeric'
office_count_2000
'integer'
office_sqm_2000
'integer'
trc_count_2000
'integer'
trc_sqm_2000
'integer'
cafe_count_2000
'integer'
cafe_sum_2000_min_price_avg
'numeric'
cafe_sum_2000_max_price_avg
'numeric'
cafe_avg_price_2000
'numeric'
cafe_count_2000_na_price
'integer'
cafe_count_2000_price_500
'integer'
cafe_count_2000_price_1000
'integer'
cafe_count_2000_price_1500
'integer'
cafe_count_2000_price_2500
'integer'
cafe_count_2000_price_4000
'integer'
cafe_count_2000_price_high
'integer'
big_church_count_2000
'integer'
church_count_2000
'integer'
mosque_count_2000
'integer'
leisure_count_2000
'integer'
sport_count_2000
'integer'
market_count_2000
'integer'
green_part_3000
'numeric'
prom_part_3000
'numeric'
office_count_3000
'integer'
office_sqm_3000
'integer'
trc_count_3000
'integer'
trc_sqm_3000
'integer'
cafe_count_3000
'integer'
cafe_sum_3000_min_price_avg
'numeric'
cafe_sum_3000_max_price_avg
'numeric'
cafe_avg_price_3000
'numeric'
cafe_count_3000_na_price
'integer'
cafe_count_3000_price_500
'integer'
cafe_count_3000_price_1000
'integer'
cafe_count_3000_price_1500
'integer'
cafe_count_3000_price_2500
'integer'
cafe_count_3000_price_4000
'integer'
cafe_count_3000_price_high
'integer'
big_church_count_3000
'integer'
church_count_3000
'integer'
mosque_count_3000
'integer'
leisure_count_3000
'integer'
sport_count_3000
'integer'
market_count_3000
'integer'
green_part_5000
'numeric'
prom_part_5000
'numeric'
office_count_5000
'integer'
office_sqm_5000
'integer'
trc_count_5000
'integer'
trc_sqm_5000
'integer'
cafe_count_5000
'integer'
cafe_sum_5000_min_price_avg
'numeric'
cafe_sum_5000_max_price_avg
'numeric'
cafe_avg_price_5000
'numeric'
cafe_count_5000_na_price
'integer'
cafe_count_5000_price_500
'integer'
cafe_count_5000_price_1000
'integer'
cafe_count_5000_price_1500
'integer'
cafe_count_5000_price_2500
'integer'
cafe_count_5000_price_4000
'integer'
cafe_count_5000_price_high
'integer'
big_church_count_5000
'integer'
church_count_5000
'integer'
mosque_count_5000
'integer'
leisure_count_5000
'integer'
sport_count_5000
'integer'
market_count_5000
'integer'
price_doc
'integer'

In [169]:
# summarize correlations between input variables
#cor(dataset[,3:11])

In [170]:
# split input and output
# alternatively, name the features and split into x and y by feature name and label name, or just by column numbers
x <- dataset[,2:4]
y <- dataset[,292]


#how to refer to specific columns by number
#df[,c(1:3,6,9:12)]
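Positional indices such as 292 break silently if the column order changes. The comment above hints at splitting by name instead; a minimal sketch on a toy data frame follows. The feature list is an arbitrary illustration, not the selection used in this notebook, and the floor values are invented:

```r
# Toy data frame reusing a few of the real column names.
toy <- data.frame(
  id = 1:4,
  full_sq = c(43, 34, 43, 77),
  floor = c(4, 3, 2, 9),                      # invented values
  price_doc = c(5850000, 6000000, 5700000, 16331452)
)

label_name <- "price_doc"
feature_names <- c("full_sq", "floor")        # illustrative choice

x <- toy[, feature_names, drop = FALSE]       # stays a data frame
y <- toy[[label_name]]                        # plain numeric vector
```

For a data.table such as the one returned by fread, the equivalent selection would be dataset[, feature_names, with = FALSE].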

In [171]:
#dataset[3:9] <- lapply(dataset[3:9], as.numeric)

In [172]:
# scatterplot matrix
pairs(dataset[,2:4])



In [173]:
# correlation plot
#correlations <- cor(dataset[,3:7])
#corrplot(correlations, method="circle")
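The commented-out correlation call needs numeric columns only, and any column with missing values yields NA correlations unless cor() is told how to handle them. A sketch under those assumptions, on toy data (the life_sq and sub_area values are invented):

```r
toy <- data.frame(
  full_sq = c(43, 34, 43, 77, 25, 44),
  life_sq = c(27, 19, 29, 50, NA, 30),        # one missing value
  sub_area = c("a", "b", "a", "c", "b", "a"), # non-numeric, must be excluded
  stringsAsFactors = FALSE
)

# Keep numeric columns only, then use every pair of rows that is complete.
numeric_cols <- sapply(toy, is.numeric)
correlations <- cor(toy[, numeric_cols], use = "pairwise.complete.obs")
```

The resulting matrix could then be passed to corrplot(correlations, method="circle") as in the commented line.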

In [174]:
# keep full_sq (3), three transport columns (103:105) and the target price_doc (292)
dataset <- dataset[,c(3,103:105,292)]  # alternative: dataset <- dataset[,c(292,3,6,7:8)]

In [175]:
# percentage of missing values per column, rounded to 2 decimals
sapply(dataset, function(col) {
  round(mean(is.na(col)) * 100, 2)
})


full_sq
0
ID_railroad_station_avto
0
public_transport_station_km
0
public_transport_station_min_walk
0
price_doc
0

In [176]:
head(dataset)


full_sq  ID_railroad_station_avto  public_transport_station_km  public_transport_station_min_walk  price_doc
     43                         1                   0.27498514                          3.2998217    5850000
     34                         2                   0.06526334                          0.7831601    6000000
     43                         3                   0.32875604                          3.9450725    5700000
     77                       113                   0.07148032                          0.8577639   16331452
     25                         7                   0.05021105                          0.6025326    5500000
     44                         1                   0.25481389                          3.0577667    2000000

In [177]:
# Evaluate Algorithms: Baseline

# Run algorithms using 10-fold cross-validation, repeated 3 times
control <- trainControl(method="repeatedcv", number=10, repeats=3)
metric <- "RMSE"
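Here method="repeatedcv" with number=10 and repeats=3 means 10-fold cross-validation run three times on different random partitions, so each model is scored on 30 held-out folds. A base R sketch of that resampling scheme follows; it illustrates the idea on a toy row count and is not caret's internal implementation:

```r
set.seed(7)
n <- 100          # toy number of rows
k <- 10           # folds per repeat
repeats <- 3

holdout_sets <- list()
for (r in seq_len(repeats)) {
  # Randomly assign each row to one of k folds of equal size.
  fold_id <- sample(rep(seq_len(k), length.out = n))
  for (f in seq_len(k)) {
    # Rows in fold f are held out; the remaining rows would train the model.
    holdout_sets[[length(holdout_sets) + 1]] <- which(fold_id == f)
  }
}

n_resamples <- length(holdout_sets)   # k * repeats
```

Within each repeat every row is held out exactly once, which is why the per-model summaries later report 30 resamples.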

In [178]:
# lm
set.seed(7)
fit.lm <- train(price_doc~., data=dataset, method="lm", metric=metric, preProc=c("center", "scale"), trControl=control, na.action=na.omit)
# GLM
set.seed(7)
fit.glm <- train(price_doc~., data=dataset, method="glm", metric=metric, preProc=c("center", "scale"), trControl=control, na.action=na.omit)
# GLMNET
set.seed(7)
fit.glmnet <- train(price_doc~., data=dataset, method="glmnet", metric=metric, preProc=c("center", "scale"), trControl=control, na.action=na.omit)
# SVM
set.seed(7)
#fit.svm <- train(price_doc~., data=dataset, method="svmRadial", metric=metric, preProc=c("center", "scale"), trControl=control, na.action=na.omit)
# CART
set.seed(7)
grid <- expand.grid(.cp=c(0, 0.05, 0.1))
fit.cart <- train(price_doc~., data=dataset, method="rpart", metric=metric, tuneGrid=grid, preProc=c("center", "scale"), trControl=control, na.action=na.omit)
# kNN
set.seed(7)
fit.knn <- train(price_doc~., data=dataset, method="knn", metric=metric, preProc=c("center", "scale"), trControl=control, na.action=na.omit)


Warning message in predict.lm(modelFit, newdata):
"prediction from a rank-deficient fit may be misleading"
(warning repeated 30 times)

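The rank-deficient warning is consistent with the head() output above, where public_transport_station_min_walk appears to be 12 times public_transport_station_km in every row (for example 0.27498514 × 12 ≈ 3.2998217), so the two predictors carry the same information. A sketch of how such a dependency shows up, using the six rows printed by head(); treating the relation as exact is an assumption based on those rows only:

```r
km <- c(0.27498514, 0.06526334, 0.32875604, 0.07148032, 0.05021105, 0.25481389)
min_walk <- c(3.2998217, 0.7831601, 3.9450725, 0.8577639, 0.6025326, 3.0577667)
price <- c(5850000, 6000000, 5700000, 16331452, 5500000, 2000000)

# The ratio is about 12 minutes per km in every row (up to display rounding).
ratio <- min_walk / km

# With an exact multiple, the design matrix has fewer independent columns
# than columns, and lm() cannot estimate a separate coefficient for it.
min_walk_exact <- km * 12
X <- cbind(intercept = 1, km = km, min_walk = min_walk_exact)
rank_X <- qr(X)$rank
fit <- lm(price ~ km + min_walk_exact)
coef(fit)
```

Dropping one of the two columns before calling train() would likely remove the warning; caret's findLinearCombos() can flag such columns on the full dataset.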
In [179]:
# Compare algorithms
results <- resamples(list(LM=fit.lm
                          , GLM=fit.glm
                          , GLMNET=fit.glmnet
                          #, SVM=fit.svm
                          , CART=fit.cart
                          , KNN=fit.knn))
summary(results)
dotplot(results)


Call:
summary.resamples(object = results)

Models: LM, GLM, GLMNET, CART, KNN 
Number of resamples: 30 

RMSE 
          Min. 1st Qu.  Median    Mean 3rd Qu.     Max. NA's
LM     3717000 4150000 4254000 5204000 4623000 13700000    0
GLM    3713000 4148000 4251000 5203000 4622000 13700000    0
GLMNET 3797000 4246000 4351000 5143000 4717000 12230000    0
CART   3031000 3201000 3279000 3309000 3385000  3692000    0
KNN    3016000 3113000 3217000 3254000 3363000  3579000    0

Rsquared 
          Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
LM     0.01489  0.2616 0.2817 0.2621  0.3061 0.3586    0
GLM    0.01504  0.2590 0.2796 0.2618  0.3076 0.3540    0
GLMNET 0.01413  0.2938 0.3254 0.3024  0.3604 0.4209    0
CART   0.42380  0.4797 0.5328 0.5260  0.5710 0.6443    0
KNN    0.45800  0.5160 0.5371 0.5331  0.5572 0.5938    0
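For reference, the two metrics in the tables above can be computed by hand. A minimal sketch, assuming R-squared is taken as the squared correlation between observed and predicted values (believed to match caret's default); the predictions are invented toy numbers:

```r
# Root mean squared error: typical size of a prediction error, in price units.
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Squared Pearson correlation between observed and predicted values.
r_squared <- function(obs, pred) cor(obs, pred)^2

obs  <- c(5.85, 6.00, 5.70, 16.33, 5.50, 2.00)   # prices in millions
pred <- c(5.50, 6.20, 6.00, 12.00, 5.00, 3.50)   # invented predictions

rmse(obs, pred)
r_squared(obs, pred)
```

Because RMSE squares the errors, a single badly predicted expensive property can dominate a fold, which may explain why the maximum RMSE of the linear models sits far above their median.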
