The aim of this competition is to predict the sale price of each property. The target variable is called price_doc in train.csv.
The training data is from August 2011 to June 2015, and the test set is from July 2015 to May 2016. The dataset also includes information about overall conditions in Russia's economy and finance sector, so you can focus on generating accurate price forecasts for individual properties, without needing to second-guess what the business cycle will do.
Data Files
train.csv, test.csv: information about individual transactions. The rows are indexed by the "id" field, which refers to individual transactions (particular properties might appear more than once, in separate transactions). These files also include supplementary information about the local area of each property.
macro.csv: data on Russia's macroeconomy and financial sector (could be joined to the train and test sets on the "timestamp" column).
sample_submission.csv: an example submission file in the correct format.
data_dictionary.txt: explanations of the fields available in the other data files.
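The join on "timestamp" mentioned for macro.csv can be sketched in base R as a left join, so that every transaction is kept and receives the macro values for its date. The two small tables below are toy stand-ins with invented values, and `macro_indicator` is a hypothetical column name, not a field taken from the real file.

```r
# Toy stand-in for train.csv (values invented for illustration)
train_toy <- data.frame(
  id = 1:3,
  timestamp = c("2011-08-20", "2011-08-23", "2011-08-23"),
  price_doc = c(5850000, 6000000, 5700000),
  stringsAsFactors = FALSE
)

# Toy stand-in for macro.csv: one row per date
# (`macro_indicator` is a hypothetical column name)
macro_toy <- data.frame(
  timestamp = c("2011-08-20", "2011-08-23"),
  macro_indicator = c(29.0, 28.9),
  stringsAsFactors = FALSE
)

# Left join: all.x = TRUE keeps every transaction even if a date
# has no matching macro row (those rows would get NA).
joined <- merge(train_toy, macro_toy, by = "timestamp", all.x = TRUE)
print(joined)
```

Because macro data has one row per date while several transactions can share a date, the joined table keeps one row per transaction.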
In [158]:
library(mlbench)
library(caret)
library(corrplot)
In [159]:
library(data.table)
# read the three CSV files; treat both "NA" and empty strings as missing
train <- fread(
  "https://raw.githubusercontent.com/jsphyg/ml_practice_notebooks/master/SRHM/train.csv",
  stringsAsFactors = FALSE,
  na.strings = c("NA", "")
)
test <- fread(
  "https://raw.githubusercontent.com/jsphyg/ml_practice_notebooks/master/SRHM/test.csv",
  stringsAsFactors = FALSE,
  na.strings = c("NA", "")
)
macro <- fread(
  "https://raw.githubusercontent.com/jsphyg/ml_practice_notebooks/master/SRHM/macro.csv",
  stringsAsFactors = FALSE,
  na.strings = c("NA", "")
)
In [160]:
# refer to the training data under the generic name `dataset`
dataset <- train
In [161]:
# get the row indices for 80% of the original dataset, to be used for training
validation_index <- createDataPartition(dataset$price_doc, p=0.80, list=FALSE)
# hold out the remaining 20% of the data for validation
validation <- dataset[-validation_index,]
# keep the selected 80% of the data for training and testing the models
dataset <- dataset[validation_index,]
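The split above is drawn at random, so the selected rows change from run to run unless a seed is set first. Below is a minimal base R sketch of a reproducible 80/20 split on toy data. It uses a simple random sample rather than `createDataPartition`, which for a numeric outcome samples within percentile groups, so it illustrates the idea rather than reproducing the call above. The seed value and the toy table are arbitrary assumptions.

```r
set.seed(7)  # arbitrary seed, chosen only for illustration

# Toy data standing in for the training table (values invented)
toy <- data.frame(
  id = 1:100,
  price_doc = seq(1e6, by = 5e4, length.out = 100)
)

# Simple random 80/20 split (not stratified)
train_idx <- sample(seq_len(nrow(toy)), size = floor(0.8 * nrow(toy)))
toy_train <- toy[train_idx, ]   # 80% used for fitting
toy_valid <- toy[-train_idx, ]  # 20% held out for validation

dim(toy_train)
dim(toy_valid)
```

Calling `set.seed()` with the same value immediately before the partitioning call gives the same rows on every run.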
In [162]:
dim(dataset)
- 24378
- 292
In [163]:
sapply(dataset, class)
- id
- 'integer'
- timestamp
- 'character'
- full_sq
- 'integer'
- life_sq
- 'integer'
- floor
- 'integer'
- max_floor
- 'integer'
- material
- 'integer'
- build_year
- 'integer'
- num_room
- 'integer'
- kitch_sq
- 'integer'
- state
- 'integer'
- product_type
- 'character'
- sub_area
- 'character'
- area_m
- 'numeric'
- raion_popul
- 'integer'
- green_zone_part
- 'numeric'
- indust_part
- 'numeric'
- children_preschool
- 'integer'
- preschool_quota
- 'integer'
- preschool_education_centers_raion
- 'integer'
- children_school
- 'integer'
- school_quota
- 'integer'
- school_education_centers_raion
- 'integer'
- school_education_centers_top_20_raion
- 'integer'
- hospital_beds_raion
- 'integer'
- healthcare_centers_raion
- 'integer'
- university_top_20_raion
- 'integer'
- sport_objects_raion
- 'integer'
- additional_education_raion
- 'integer'
- culture_objects_top_25
- 'character'
- culture_objects_top_25_raion
- 'integer'
- shopping_centers_raion
- 'integer'
- office_raion
- 'integer'
- thermal_power_plant_raion
- 'character'
- incineration_raion
- 'character'
- oil_chemistry_raion
- 'character'
- radiation_raion
- 'character'
- railroad_terminal_raion
- 'character'
- big_market_raion
- 'character'
- nuclear_reactor_raion
- 'character'
- detention_facility_raion
- 'character'
- full_all
- 'integer'
- male_f
- 'integer'
- female_f
- 'integer'
- young_all
- 'integer'
- young_male
- 'integer'
- young_female
- 'integer'
- work_all
- 'integer'
- work_male
- 'integer'
- work_female
- 'integer'
- ekder_all
- 'integer'
- ekder_male
- 'integer'
- ekder_female
- 'integer'
- 0_6_all
- 'integer'
- 0_6_male
- 'integer'
- 0_6_female
- 'integer'
- 7_14_all
- 'integer'
- 7_14_male
- 'integer'
- 7_14_female
- 'integer'
- 0_17_all
- 'integer'
- 0_17_male
- 'integer'
- 0_17_female
- 'integer'
- 16_29_all
- 'integer'
- 16_29_male
- 'integer'
- 16_29_female
- 'integer'
- 0_13_all
- 'integer'
- 0_13_male
- 'integer'
- 0_13_female
- 'integer'
- raion_build_count_with_material_info
- 'integer'
- build_count_block
- 'integer'
- build_count_wood
- 'integer'
- build_count_frame
- 'integer'
- build_count_brick
- 'integer'
- build_count_monolith
- 'integer'
- build_count_panel
- 'integer'
- build_count_foam
- 'integer'
- build_count_slag
- 'integer'
- build_count_mix
- 'integer'
- raion_build_count_with_builddate_info
- 'integer'
- build_count_before_1920
- 'integer'
- build_count_1921-1945
- 'integer'
- build_count_1946-1970
- 'integer'
- build_count_1971-1995
- 'integer'
- build_count_after_1995
- 'integer'
- ID_metro
- 'integer'
- metro_min_avto
- 'numeric'
- metro_km_avto
- 'numeric'
- metro_min_walk
- 'numeric'
- metro_km_walk
- 'numeric'
- kindergarten_km
- 'numeric'
- school_km
- 'numeric'
- park_km
- 'numeric'
- green_zone_km
- 'numeric'
- industrial_km
- 'numeric'
- water_treatment_km
- 'numeric'
- cemetery_km
- 'numeric'
- incineration_km
- 'numeric'
- railroad_station_walk_km
- 'numeric'
- railroad_station_walk_min
- 'numeric'
- ID_railroad_station_walk
- 'integer'
- railroad_station_avto_km
- 'numeric'
- railroad_station_avto_min
- 'numeric'
- ID_railroad_station_avto
- 'integer'
- public_transport_station_km
- 'numeric'
- public_transport_station_min_walk
- 'numeric'
- water_km
- 'numeric'
- water_1line
- 'character'
- mkad_km
- 'numeric'
- ttk_km
- 'numeric'
- sadovoe_km
- 'numeric'
- bulvar_ring_km
- 'numeric'
- kremlin_km
- 'numeric'
- big_road1_km
- 'numeric'
- ID_big_road1
- 'integer'
- big_road1_1line
- 'character'
- big_road2_km
- 'numeric'
- ID_big_road2
- 'integer'
- railroad_km
- 'numeric'
- railroad_1line
- 'character'
- zd_vokzaly_avto_km
- 'numeric'
- ID_railroad_terminal
- 'integer'
- bus_terminal_avto_km
- 'numeric'
- ID_bus_terminal
- 'integer'
- oil_chemistry_km
- 'numeric'
- nuclear_reactor_km
- 'numeric'
- radiation_km
- 'numeric'
- power_transmission_line_km
- 'numeric'
- thermal_power_plant_km
- 'numeric'
- ts_km
- 'numeric'
- big_market_km
- 'numeric'
- market_shop_km
- 'numeric'
- fitness_km
- 'numeric'
- swim_pool_km
- 'numeric'
- ice_rink_km
- 'numeric'
- stadium_km
- 'numeric'
- basketball_km
- 'numeric'
- hospice_morgue_km
- 'numeric'
- detention_facility_km
- 'numeric'
- public_healthcare_km
- 'numeric'
- university_km
- 'numeric'
- workplaces_km
- 'numeric'
- shopping_centers_km
- 'numeric'
- office_km
- 'numeric'
- additional_education_km
- 'numeric'
- preschool_km
- 'numeric'
- big_church_km
- 'numeric'
- church_synagogue_km
- 'numeric'
- mosque_km
- 'numeric'
- theater_km
- 'numeric'
- museum_km
- 'numeric'
- exhibition_km
- 'numeric'
- catering_km
- 'numeric'
- ecology
- 'character'
- green_part_500
- 'numeric'
- prom_part_500
- 'numeric'
- office_count_500
- 'integer'
- office_sqm_500
- 'integer'
- trc_count_500
- 'integer'
- trc_sqm_500
- 'integer'
- cafe_count_500
- 'integer'
- cafe_sum_500_min_price_avg
- 'numeric'
- cafe_sum_500_max_price_avg
- 'numeric'
- cafe_avg_price_500
- 'numeric'
- cafe_count_500_na_price
- 'integer'
- cafe_count_500_price_500
- 'integer'
- cafe_count_500_price_1000
- 'integer'
- cafe_count_500_price_1500
- 'integer'
- cafe_count_500_price_2500
- 'integer'
- cafe_count_500_price_4000
- 'integer'
- cafe_count_500_price_high
- 'integer'
- big_church_count_500
- 'integer'
- church_count_500
- 'integer'
- mosque_count_500
- 'integer'
- leisure_count_500
- 'integer'
- sport_count_500
- 'integer'
- market_count_500
- 'integer'
- green_part_1000
- 'numeric'
- prom_part_1000
- 'numeric'
- office_count_1000
- 'integer'
- office_sqm_1000
- 'integer'
- trc_count_1000
- 'integer'
- trc_sqm_1000
- 'integer'
- cafe_count_1000
- 'integer'
- cafe_sum_1000_min_price_avg
- 'numeric'
- cafe_sum_1000_max_price_avg
- 'numeric'
- cafe_avg_price_1000
- 'numeric'
- cafe_count_1000_na_price
- 'integer'
- cafe_count_1000_price_500
- 'integer'
- cafe_count_1000_price_1000
- 'integer'
- cafe_count_1000_price_1500
- 'integer'
- cafe_count_1000_price_2500
- 'integer'
- cafe_count_1000_price_4000
- 'integer'
- cafe_count_1000_price_high
- 'integer'
- big_church_count_1000
- 'integer'
- church_count_1000
- 'integer'
- mosque_count_1000
- 'integer'
- leisure_count_1000
- 'integer'
- sport_count_1000
- 'integer'
- market_count_1000
- 'integer'
- green_part_1500
- 'numeric'
- prom_part_1500
- 'numeric'
- office_count_1500
- 'integer'
- office_sqm_1500
- 'integer'
- trc_count_1500
- 'integer'
- trc_sqm_1500
- 'integer'
- cafe_count_1500
- 'integer'
- cafe_sum_1500_min_price_avg
- 'numeric'
- cafe_sum_1500_max_price_avg
- 'numeric'
- cafe_avg_price_1500
- 'numeric'
- cafe_count_1500_na_price
- 'integer'
- cafe_count_1500_price_500
- 'integer'
- cafe_count_1500_price_1000
- 'integer'
- cafe_count_1500_price_1500
- 'integer'
- cafe_count_1500_price_2500
- 'integer'
- cafe_count_1500_price_4000
- 'integer'
- cafe_count_1500_price_high
- 'integer'
- big_church_count_1500
- 'integer'
- church_count_1500
- 'integer'
- mosque_count_1500
- 'integer'
- leisure_count_1500
- 'integer'
- sport_count_1500
- 'integer'
- market_count_1500
- 'integer'
- green_part_2000
- 'numeric'
- prom_part_2000
- 'numeric'
- office_count_2000
- 'integer'
- office_sqm_2000
- 'integer'
- trc_count_2000
- 'integer'
- trc_sqm_2000
- 'integer'
- cafe_count_2000
- 'integer'
- cafe_sum_2000_min_price_avg
- 'numeric'
- cafe_sum_2000_max_price_avg
- 'numeric'
- cafe_avg_price_2000
- 'numeric'
- cafe_count_2000_na_price
- 'integer'
- cafe_count_2000_price_500
- 'integer'
- cafe_count_2000_price_1000
- 'integer'
- cafe_count_2000_price_1500
- 'integer'
- cafe_count_2000_price_2500
- 'integer'
- cafe_count_2000_price_4000
- 'integer'
- cafe_count_2000_price_high
- 'integer'
- big_church_count_2000
- 'integer'
- church_count_2000
- 'integer'
- mosque_count_2000
- 'integer'
- leisure_count_2000
- 'integer'
- sport_count_2000
- 'integer'
- market_count_2000
- 'integer'
- green_part_3000
- 'numeric'
- prom_part_3000
- 'numeric'
- office_count_3000
- 'integer'
- office_sqm_3000
- 'integer'
- trc_count_3000
- 'integer'
- trc_sqm_3000
- 'integer'
- cafe_count_3000
- 'integer'
- cafe_sum_3000_min_price_avg
- 'numeric'
- cafe_sum_3000_max_price_avg
- 'numeric'
- cafe_avg_price_3000
- 'numeric'
- cafe_count_3000_na_price
- 'integer'
- cafe_count_3000_price_500
- 'integer'
- cafe_count_3000_price_1000
- 'integer'
- cafe_count_3000_price_1500
- 'integer'
- cafe_count_3000_price_2500
- 'integer'
- cafe_count_3000_price_4000
- 'integer'
- cafe_count_3000_price_high
- 'integer'
- big_church_count_3000
- 'integer'
- church_count_3000
- 'integer'
- mosque_count_3000
- 'integer'
- leisure_count_3000
- 'integer'
- sport_count_3000
- 'integer'
- market_count_3000
- 'integer'
- green_part_5000
- 'numeric'
- prom_part_5000
- 'numeric'
- office_count_5000
- 'integer'
- office_sqm_5000
- 'integer'
- trc_count_5000
- 'integer'
- trc_sqm_5000
- 'integer'
- cafe_count_5000
- 'integer'
- cafe_sum_5000_min_price_avg
- 'numeric'
- cafe_sum_5000_max_price_avg
- 'numeric'
- cafe_avg_price_5000
- 'numeric'
- cafe_count_5000_na_price
- 'integer'
- cafe_count_5000_price_500
- 'integer'
- cafe_count_5000_price_1000
- 'integer'
- cafe_count_5000_price_1500
- 'integer'
- cafe_count_5000_price_2500
- 'integer'
- cafe_count_5000_price_4000
- 'integer'
- cafe_count_5000_price_high
- 'integer'
- big_church_count_5000
- 'integer'
- church_count_5000
- 'integer'
- mosque_count_5000
- 'integer'
- leisure_count_5000
- 'integer'
- sport_count_5000
- 'integer'
- market_count_5000
- 'integer'
- price_doc
- 'integer'
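With 292 columns, the per-column listing above is long to scan. A compact alternative is to tabulate how many columns fall into each class and to list the character columns, which usually need encoding before modelling. The sketch below uses a small toy table with invented values, so its counts do not describe the real data.

```r
# Toy table with a mix of column types (values invented for illustration)
toy <- data.frame(
  id = 1:3,
  timestamp = c("2011-08-20", "2011-08-23", "2011-08-27"),
  full_sq = c(43L, 34L, 43L),
  area_m = c(6407578.1, 9589336.9, 4808269.8),
  product_type = c("A", "A", "B"),
  stringsAsFactors = FALSE
)

# Number of columns per class
class_counts <- table(sapply(toy, class))
print(class_counts)

# Names of the character columns
char_cols <- names(toy)[sapply(toy, is.character)]
print(char_cols)
```

Applied to `dataset`, the same two calls would give the number of integer, numeric, and character columns and the names of the character ones.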
In [164]:
# take a peek at the first 20 rows of the data
head(dataset, n=20)
id timestamp full_sq life_sq floor max_floor material build_year num_room kitch_sq ... cafe_count_5000_price_2500 cafe_count_5000_price_4000 cafe_count_5000_price_high big_church_count_5000 church_count_5000 mosque_count_5000 leisure_count_5000 sport_count_5000 market_count_5000 price_doc
1 2011-08-20 43 27 4 NA NA NA NA NA ... 9 4 0 13 22 1 0 52 4 5850000
2 2011-08-23 34 19 3 NA NA NA NA NA ... 15 3 0 15 29 1 10 66 14 6000000
3 2011-08-27 43 29 2 NA NA NA NA NA ... 10 3 0 11 27 0 4 67 10 5700000
5 2011-09-05 77 77 4 NA NA NA NA NA ... 319 108 17 135 236 2 91 195 14 16331452
7 2011-09-08 25 14 10 NA NA NA NA NA ... 81 16 3 38 80 1 27 127 8 5500000
8 2011-09-09 44 44 5 NA NA NA NA NA ... 9 4 0 11 18 1 0 47 4 2000000
9 2011-09-10 42 27 5 NA NA NA NA NA ... 19 8 1 18 34 1 3 85 11 5300000
10 2011-09-13 36 21 9 NA NA NA NA NA ... 19 13 0 10 20 1 3 67 1 2000000
11 2011-09-16 36 19 12 NA NA NA NA NA ... 1 1 0 5 9 0 2 17 6 4650000
13 2011-09-17 43 28 4 NA NA NA NA NA ... 13 9 1 7 15 0 2 47 0 5100000
14 2011-09-19 31 31 4 NA NA NA NA NA ... 254 108 22 57 102 1 72 166 7 5200000
15 2011-09-19 31 21 3 NA NA NA NA NA ... 88 19 2 63 100 0 28 132 14 5000000
16 2011-09-20 51 31 15 NA NA NA NA NA ... 6 1 0 9 21 0 1 53 9 1850000
17 2011-09-20 47 31 4 NA NA NA NA NA ... 10 2 0 7 23 0 4 62 13 6300000
18 2011-09-20 42 28 2 NA NA NA NA NA ... 32 6 0 13 33 1 10 72 12 5900000
19 2011-09-22 59 33 10 NA NA NA NA NA ... 1 1 0 6 9 0 2 17 6 7900000
20 2011-09-22 44 29 4 NA NA NA NA NA ... 9 2 0 10 14 0 2 51 5 5200000
22 2011-09-22 39 39 7 NA NA NA NA NA ... 18 3 0 12 14 0 1 64 9 5200000
23 2011-09-23 48 34 6 NA NA NA NA NA ... 16 4 1 11 10 0 1 55 8 6250000
24 2011-09-23 32 18 3 NA NA NA NA NA ... 10 1 0 7 21 1 1 42 13 5750000
In [165]:
# summarize attribute distributions
summary(dataset)
id timestamp full_sq life_sq
Min. : 1 Length:24378 Min. : 0.00 Min. : 0.00
1st Qu.: 7688 Class :character 1st Qu.: 38.00 1st Qu.: 20.00
Median :15282 Mode :character Median : 49.00 Median : 30.00
Mean :15260 Mean : 54.16 Mean : 34.04
3rd Qu.:22858 3rd Qu.: 63.00 3rd Qu.: 43.00
Max. :30472 Max. :5326.00 Max. :802.00
NA's :5091
floor max_floor material build_year
Min. : 0.000 Min. : 0.00 Min. :1.000 Min. : 0
1st Qu.: 3.000 1st Qu.: 9.00 1st Qu.:1.000 1st Qu.: 1966
Median : 7.000 Median : 12.00 Median :1.000 Median : 1979
Mean : 7.661 Mean : 12.56 Mean :1.826 Mean : 3355
3rd Qu.:11.000 3rd Qu.: 17.00 3rd Qu.:2.000 3rd Qu.: 2005
Max. :77.000 Max. :117.00 Max. :6.000 Max. :20052009
NA's :129 NA's :7628 NA's :7628 NA's :10826
num_room kitch_sq state product_type
Min. : 0.000 Min. : 0.000 Min. : 1.000 Length:24378
1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 1.000 Class :character
Median : 2.000 Median : 6.000 Median : 2.000 Mode :character
Mean : 1.904 Mean : 6.338 Mean : 2.109
3rd Qu.: 2.000 3rd Qu.: 9.000 3rd Qu.: 3.000
Max. :17.000 Max. :2014.000 Max. :33.000
NA's :7628 NA's :7628 NA's :10839
sub_area area_m raion_popul green_zone_part
Length:24378 Min. : 2081628 Min. : 2546 Min. :0.001879
Class :character 1st Qu.: 7307411 1st Qu.: 21819 1st Qu.:0.063755
Mode :character Median : 10416575 Median : 83844 Median :0.167526
Mean : 17608827 Mean : 84216 Mean :0.219060
3rd Qu.: 18036437 3rd Qu.:122862 3rd Qu.:0.336177
Max. :206071809 Max. :247469 Max. :0.852923
indust_part children_preschool preschool_quota
Min. :0.00000 Min. : 175 Min. : 0
1st Qu.:0.01951 1st Qu.: 1706 1st Qu.: 1874
Median :0.07216 Median : 4857 Median : 2868
Mean :0.11905 Mean : 5149 Mean : 3273
3rd Qu.:0.19578 3rd Qu.: 7103 3rd Qu.: 4050
Max. :0.52187 Max. :19223 Max. :11926
NA's :5338
preschool_education_centers_raion children_school school_quota
Min. : 0.000 Min. : 168 Min. : 1012
1st Qu.: 2.000 1st Qu.: 1564 1st Qu.: 5782
Median : 4.000 Median : 5261 Median : 7377
Mean : 4.068 Mean : 5360 Mean : 8328
3rd Qu.: 6.000 3rd Qu.: 7227 3rd Qu.: 9891
Max. :13.000 Max. :19083 Max. :24750
NA's :5337
school_education_centers_raion school_education_centers_top_20_raion
Min. : 0.000 Min. :0.0000
1st Qu.: 2.000 1st Qu.:0.0000
Median : 5.000 Median :0.0000
Mean : 4.704 Mean :0.1083
3rd Qu.: 7.000 3rd Qu.:0.0000
Max. :14.000 Max. :2.0000
hospital_beds_raion healthcare_centers_raion university_top_20_raion
Min. : 30 Min. :0.000 Min. :0.0000
1st Qu.: 520 1st Qu.:0.000 1st Qu.:0.0000
Median : 990 Median :1.000 Median :0.0000
Mean :1193 Mean :1.326 Mean :0.1358
3rd Qu.:1786 3rd Qu.:2.000 3rd Qu.:0.0000
Max. :4849 Max. :6.000 Max. :3.0000
NA's :11524
sport_objects_raion additional_education_raion culture_objects_top_25
Min. : 0.000 Min. : 0.0 Length:24378
1st Qu.: 1.000 1st Qu.: 1.0 Class :character
Median : 5.000 Median : 2.0 Mode :character
Mean : 6.614 Mean : 2.9
3rd Qu.:10.000 3rd Qu.: 4.0
Max. :29.000 Max. :16.0
culture_objects_top_25_raion shopping_centers_raion office_raion
Min. : 0.0000 Min. : 0.000 Min. : 0.000
1st Qu.: 0.0000 1st Qu.: 1.000 1st Qu.: 0.000
Median : 0.0000 Median : 3.000 Median : 2.000
Mean : 0.2836 Mean : 4.186 Mean : 8.118
3rd Qu.: 0.0000 3rd Qu.: 6.000 3rd Qu.: 5.000
Max. :10.0000 Max. :23.000 Max. :141.000
thermal_power_plant_raion incineration_raion oil_chemistry_raion
Length:24378 Length:24378 Length:24378
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
radiation_raion railroad_terminal_raion big_market_raion
Length:24378 Length:24378 Length:24378
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
nuclear_reactor_raion detention_facility_raion full_all
Length:24378 Length:24378 Min. : 2693
Class :character Class :character 1st Qu.: 31167
Mode :character Mode :character Median : 85083
Mean : 146807
3rd Qu.: 125111
Max. :1716730
male_f female_f young_all young_male
Min. : 1264 Min. : 1430 Min. : 365 Min. : 189
1st Qu.: 14906 1st Qu.: 15167 1st Qu.: 3459 1st Qu.: 1782
Median : 39227 Median : 45410 Median :10988 Median : 5470
Mean : 67440 Mean : 79367 Mean :11194 Mean : 5732
3rd Qu.: 58226 3rd Qu.: 67872 3rd Qu.:14906 3rd Qu.: 7597
Max. :774585 Max. :942145 Max. :40692 Max. :20977
young_female work_all work_male work_female
Min. : 177 Min. : 1633 Min. : 863 Min. : 771
1st Qu.: 1677 1st Qu.: 13996 1st Qu.: 7394 1st Qu.: 6661
Median : 5333 Median : 52450 Median :26382 Median :26096
Mean : 5462 Mean : 53766 Mean :27305 Mean :26461
3rd Qu.: 7617 3rd Qu.: 77612 3rd Qu.:38841 3rd Qu.:37942
Max. :19715 Max. :161290 Max. :79622 Max. :81668
ekder_all ekder_male ekder_female 0_6_all 0_6_male
Min. : 548 Min. : 156 Min. : 393 Min. : 175 Min. : 91
1st Qu.: 4695 1st Qu.: 1331 1st Qu.: 3365 1st Qu.: 1706 1st Qu.: 862
Median :20184 Median : 6180 Median :13540 Median : 4857 Median :2435
Mean :19256 Mean : 5826 Mean :13430 Mean : 5149 Mean :2636
3rd Qu.:29172 3rd Qu.: 8775 3rd Qu.:20165 3rd Qu.: 7103 3rd Qu.:3589
Max. :57086 Max. :19275 Max. :37811 Max. :19223 Max. :9987
0_6_female 7_14_all 7_14_male 7_14_female 0_17_all
Min. : 85 Min. : 168 Min. : 87 Min. : 82 Min. : 411
1st Qu.: 844 1st Qu.: 1564 1st Qu.: 821 1st Qu.: 743 1st Qu.: 3831
Median :2390 Median : 5261 Median :2693 Median :2535 Median :12508
Mean :2513 Mean : 5360 Mean :2747 Mean :2613 Mean :12558
3rd Qu.:3455 3rd Qu.: 7227 3rd Qu.:3585 3rd Qu.:3534 3rd Qu.:16727
Max. :9236 Max. :19083 Max. :9761 Max. :9322 Max. :45170
0_17_male 0_17_female 16_29_all 16_29_male
Min. : 214 Min. : 198 Min. : 575 Min. : 308
1st Qu.: 1973 1st Qu.: 1858 1st Qu.: 5829 1st Qu.: 2955
Median : 6085 Median : 6185 Median : 17864 Median : 8896
Mean : 6433 Mean : 6125 Mean : 31423 Mean : 15422
3rd Qu.: 8599 3rd Qu.: 8549 3rd Qu.: 27107 3rd Qu.: 13683
Max. :23233 Max. :21937 Max. :367659 Max. :172958
16_29_female 0_13_all 0_13_male 0_13_female
Min. : 267 Min. : 322 Min. : 166 Min. : 156
1st Qu.: 2874 1st Qu.: 3112 1st Qu.: 1600 1st Qu.: 1512
Median : 9353 Median : 9633 Median : 4835 Median : 4667
Mean : 16001 Mean : 9855 Mean : 5045 Mean : 4810
3rd Qu.: 14145 3rd Qu.:13121 3rd Qu.: 6684 3rd Qu.: 6699
Max. :194701 Max. :36035 Max. :18574 Max. :17461
raion_build_count_with_material_info build_count_block build_count_wood
Min. : 1.0 Min. : 0.00 Min. : 0.00
1st Qu.: 180.0 1st Qu.: 13.00 1st Qu.: 0.00
Median : 273.0 Median : 42.00 Median : 0.00
Mean : 328.3 Mean : 50.35 Mean : 41.04
3rd Qu.: 400.0 3rd Qu.: 72.00 3rd Qu.: 7.00
Max. :1681.0 Max. :223.00 Max. :793.00
NA's :3968 NA's :3968 NA's :3968
build_count_frame build_count_brick build_count_monolith build_count_panel
Min. : 0.000 Min. : 0.0 Min. : 0 Min. : 0.0
1st Qu.: 0.000 1st Qu.: 10.0 1st Qu.: 2 1st Qu.: 35.0
Median : 0.000 Median : 67.0 Median : 6 Median : 92.0
Mean : 4.993 Mean :107.2 Mean : 12 Mean :107.5
3rd Qu.: 1.000 3rd Qu.:156.0 3rd Qu.: 13 3rd Qu.:157.0
Max. :97.000 Max. :664.0 Max. :127 Max. :431.0
NA's :3968 NA's :3968 NA's :3968 NA's :3968
build_count_foam build_count_slag build_count_mix
Min. : 0.000 Min. : 0.000 Min. :0.000
1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.:0.000
Median : 0.000 Median : 0.000 Median :0.000
Mean : 0.164 Mean : 4.486 Mean :0.573
3rd Qu.: 0.000 3rd Qu.: 2.000 3rd Qu.:0.000
Max. :11.000 Max. :84.000 Max. :9.000
NA's :3968 NA's :3968 NA's :3968
raion_build_count_with_builddate_info build_count_before_1920
Min. : 1.0 Min. : 0.00
1st Qu.: 178.0 1st Qu.: 0.00
Median : 271.0 Median : 0.00
Mean : 327.9 Mean : 18.41
3rd Qu.: 400.0 3rd Qu.: 3.00
Max. :1680.0 Max. :371.00
NA's :3968 NA's :3968
build_count_1921-1945 build_count_1946-1970 build_count_1971-1995
Min. : 0.00 Min. : 0.0 Min. : 0.00
1st Qu.: 0.00 1st Qu.: 14.0 1st Qu.: 38.00
Median : 2.00 Median :135.0 Median : 71.00
Mean : 26.64 Mean :141.5 Mean : 80.15
3rd Qu.: 20.00 3rd Qu.:216.0 3rd Qu.:125.00
Max. :382.00 Max. :845.0 Max. :246.00
NA's :3968 NA's :3968 NA's :3968
build_count_after_1995 ID_metro metro_min_avto metro_km_avto
Min. : 0.00 Min. : 1.00 Min. : 0.000 Min. : 0.000
1st Qu.: 14.00 1st Qu.: 27.00 1st Qu.: 1.725 1st Qu.: 1.037
Median : 24.00 Median : 53.00 Median : 2.805 Median : 1.790
Mean : 61.18 Mean : 72.32 Mean : 4.921 Mean : 3.662
3rd Qu.: 57.00 3rd Qu.:108.00 3rd Qu.: 4.789 3rd Qu.: 3.777
Max. :799.00 Max. :223.00 Max. :61.438 Max. :74.906
NA's :3968
metro_min_walk metro_km_walk kindergarten_km school_km
Min. : 0.00 Min. : 0.0000 Min. : 0.00047 Min. : 0.0000
1st Qu.: 11.54 1st Qu.: 0.9619 1st Qu.: 0.20008 1st Qu.: 0.2697
Median : 20.53 Median : 1.7110 Median : 0.35294 Median : 0.4769
Mean : 42.30 Mean : 3.5251 Mean : 0.96988 Mean : 1.3005
3rd Qu.: 45.32 3rd Qu.: 3.7768 3rd Qu.: 0.96683 3rd Qu.: 0.8899
Max. :687.32 Max. :57.2764 Max. :25.50644 Max. :47.3947
NA's :21 NA's :21
park_km green_zone_km industrial_km water_treatment_km
Min. : 0.00374 Min. :0.0000 Min. : 0.0000 Min. : 0.2741
1st Qu.: 0.97378 1st Qu.:0.1014 1st Qu.: 0.2883 1st Qu.: 5.2994
Median : 1.79991 Median :0.2143 Median : 0.5766 Median :10.3780
Mean : 3.07460 Mean :0.2997 Mean : 0.7668 Mean :11.1668
3rd Qu.: 3.39175 3rd Qu.:0.4135 3rd Qu.: 1.0402 3rd Qu.:16.8053
Max. :47.35154 Max. :1.9824 Max. :14.0482 Max. :47.5912
cemetery_km incineration_km railroad_station_walk_km
Min. : 0.000 Min. : 0.1981 Min. : 0.02815
1st Qu.: 1.336 1st Qu.: 6.2061 1st Qu.: 1.93177
Median : 1.968 Median :10.3175 Median : 3.23554
Mean : 2.314 Mean :10.8460 Mean : 4.37332
3rd Qu.: 3.090 3rd Qu.:13.3851 3rd Qu.: 5.14764
Max. :13.846 Max. :58.6320 Max. :24.65304
NA's :21
railroad_station_walk_min ID_railroad_station_walk railroad_station_avto_km
Min. : 0.3378 Min. : 1.00 Min. : 0.02815
1st Qu.: 23.1812 1st Qu.: 18.00 1st Qu.: 2.11338
Median : 38.8265 Median : 33.00 Median : 3.43212
Mean : 52.4799 Mean : 38.71 Mean : 4.57249
3rd Qu.: 61.7717 3rd Qu.: 52.00 3rd Qu.: 5.38987
Max. :295.8365 Max. :133.00 Max. :24.65398
NA's :21 NA's :21
railroad_station_avto_min ID_railroad_station_avto public_transport_station_km
Min. : 0.03519 Min. : 1.00 Min. : 0.003733
1st Qu.: 3.23769 1st Qu.: 19.00 1st Qu.: 0.101156
Median : 4.94456 Median : 34.00 Median : 0.160421
Mean : 6.06944 Mean : 45.53 Mean : 0.407560
3rd Qu.: 7.29978 3rd Qu.: 73.00 3rd Qu.: 0.277879
Max. :38.69192 Max. :138.00 Max. :17.413002
public_transport_station_min_walk water_km water_1line
Min. : 0.0448 Min. :0.006707 Length:24378
1st Qu.: 1.2139 1st Qu.:0.339637 Class :character
Median : 1.9250 Median :0.619856 Mode :character
Mean : 4.8907 Mean :0.690766
3rd Qu.: 3.3346 3rd Qu.:0.967451
Max. :208.9560 Max. :2.743788
mkad_km ttk_km sadovoe_km bulvar_ring_km
Min. : 0.01363 Min. : 0.00193 Min. : 0.00036 Min. : 0.00195
1st Qu.: 2.63051 1st Qu.: 5.35504 1st Qu.: 8.37280 1st Qu.: 9.28184
Median : 5.44795 Median : 9.83387 Median :12.74954 Median :13.61322
Mean : 6.23268 Mean :11.28974 Mean :14.03432 Mean :15.00061
3rd Qu.: 8.18475 3rd Qu.:15.67545 3rd Qu.:18.62930 3rd Qu.:19.87318
Max. :53.27783 Max. :66.03320 Max. :68.85305 Max. :69.98487
kremlin_km big_road1_km ID_big_road1 big_road1_1line
Min. : 0.0729 Min. :0.000364 Min. : 1.00 Length:24378
1st Qu.:10.4753 1st Qu.:0.785019 1st Qu.: 2.00 Class :character
Median :14.8772 Median :1.728433 Median :10.00 Mode :character
Mean :16.0232 Mean :1.884760 Mean :11.45
3rd Qu.:20.6482 3rd Qu.:2.806477 3rd Qu.:14.00
Max. :70.7388 Max. :6.995416 Max. :48.00
big_road2_km ID_big_road2 railroad_km railroad_1line
Min. : 0.001935 Min. : 1.00 Min. : 0.002299 Length:24378
1st Qu.: 2.107386 1st Qu.: 4.00 1st Qu.: 0.652358 Class :character
Median : 3.210544 Median :21.00 Median : 1.238357 Mode :character
Mean : 3.389901 Mean :22.35 Mean : 1.879070
3rd Qu.: 4.306233 3rd Qu.:38.00 3rd Qu.: 2.519546
Max. :13.798346 Max. :58.00 Max. :16.656237
zd_vokzaly_avto_km ID_railroad_terminal bus_terminal_avto_km ID_bus_terminal
Min. : 0.1367 Min. : 5.00 Min. : 0.06203 Min. : 1.000
1st Qu.:10.0111 1st Qu.: 32.00 1st Qu.: 5.21388 1st Qu.: 3.000
Median :14.7628 Median : 50.00 Median : 7.45701 Median : 8.000
Mean :17.1938 Mean : 51.67 Mean : 9.96392 Mean : 6.703
3rd Qu.:24.0612 3rd Qu.: 83.00 3rd Qu.:13.21233 3rd Qu.: 9.000
Max. :91.2151 Max. :121.00 Max. :74.79611 Max. :14.000
oil_chemistry_km nuclear_reactor_km radiation_km
Min. : 0.5107 Min. : 0.3098 Min. : 0.00546
1st Qu.: 8.7126 1st Qu.: 5.2528 1st Qu.: 1.22756
Median :16.6881 Median : 8.9960 Median : 2.43392
Mean :17.3670 Mean :10.9190 Mean : 4.37800
3rd Qu.:23.4245 3rd Qu.:16.3725 3rd Qu.: 4.68705
Max. :70.4134 Max. :64.2570 Max. :53.89016
power_transmission_line_km thermal_power_plant_km ts_km
Min. : 0.03027 Min. : 0.4006 Min. : 0.000
1st Qu.: 0.97315 1st Qu.: 3.7771 1st Qu.: 2.046
Median : 1.88587 Median : 5.8999 Median : 3.954
Mean : 3.46226 Mean : 7.3138 Mean : 4.896
3rd Qu.: 4.92655 3rd Qu.: 9.7932 3rd Qu.: 5.515
Max. :43.32437 Max. :56.8561 Max. :54.081
big_market_km market_shop_km fitness_km swim_pool_km
Min. : 0.7056 Min. : 0.02157 Min. : 0.0000 Min. : 0.000
1st Qu.: 7.5296 1st Qu.: 1.53980 1st Qu.: 0.3640 1st Qu.: 1.721
Median :11.9104 Median : 2.93128 Median : 0.6595 Median : 2.877
Mean :13.2595 Mean : 3.94502 Mean : 1.1491 Mean : 4.198
3rd Qu.:16.5513 3rd Qu.: 5.46021 3rd Qu.: 1.3428 3rd Qu.: 5.370
Max. :59.5016 Max. :41.10365 Max. :26.6525 Max. :53.359
ice_rink_km stadium_km basketball_km hospice_morgue_km
Min. : 0.000 Min. : 0.1148 Min. : 0.00546 Min. : 0.00252
1st Qu.: 3.044 1st Qu.: 4.0182 1st Qu.: 1.30833 1st Qu.: 1.12070
Median : 5.547 Median : 6.9541 Median : 2.89067 Median : 1.89079
Mean : 6.101 Mean : 9.3920 Mean : 4.76008 Mean : 2.63201
3rd Qu.: 7.943 3rd Qu.:13.5516 3rd Qu.: 6.36452 3rd Qu.: 3.29394
Max. :38.765 Max. :83.3985 Max. :56.70379 Max. :43.69464
detention_facility_km public_healthcare_km university_km workplaces_km
Min. : 0.07958 Min. : 0.00266 Min. : 0.00093 Min. : 0.000
1st Qu.: 5.66383 1st Qu.: 1.28021 1st Qu.: 2.20422 1st Qu.: 1.017
Median :11.30765 Median : 2.34085 Median : 4.32835 Median : 2.044
Mean :14.51069 Mean : 3.32347 Mean : 6.82586 Mean : 3.905
3rd Qu.:24.73506 3rd Qu.: 3.98390 3rd Qu.: 9.38027 3rd Qu.: 5.420
Max. :89.37137 Max. :76.05514 Max. :84.86215 Max. :55.278
shopping_centers_km office_km additional_education_km
Min. : 0.0000 Min. : 0.0000 Min. : 0.0000
1st Qu.: 0.4893 1st Qu.: 0.5615 1st Qu.: 0.4742
Median : 0.8428 Median : 1.0530 Median : 0.9022
Mean : 1.4945 Mean : 2.0014 Mean : 1.3234
3rd Qu.: 1.5618 3rd Qu.: 3.0165 3rd Qu.: 1.5716
Max. :26.2595 Max. :18.9589 Max. :24.2682
preschool_km big_church_km church_synagogue_km mosque_km
Min. : 0.0000 Min. : 0.00407 Min. : 0.0000 Min. : 0.00554
1st Qu.: 0.2854 1st Qu.: 0.86276 1st Qu.: 0.5327 1st Qu.: 3.76607
Median : 0.4944 Median : 1.49079 Median : 0.8618 Median : 6.52078
Mean : 1.3212 Mean : 2.30683 Mean : 0.9731 Mean : 7.72111
3rd Qu.: 0.9363 3rd Qu.: 2.90870 3rd Qu.: 1.2500 3rd Qu.:10.04295
Max. :47.3947 Max. :45.66906 Max. :15.6157 Max. :44.84983
theater_km museum_km exhibition_km catering_km
Min. : 0.02679 Min. : 0.0079 Min. : 0.00895 Min. : 0.000357
1st Qu.: 4.22525 1st Qu.: 2.8828 1st Qu.: 2.24065 1st Qu.: 0.209446
Median : 8.61201 Median : 5.6433 Median : 4.10261 Median : 0.413909
Mean : 9.60979 Mean : 7.0418 Mean : 5.51957 Mean : 0.686772
3rd Qu.:13.45959 3rd Qu.:10.3286 3rd Qu.: 6.95087 3rd Qu.: 0.836744
Max. :87.60069 Max. :59.2032 Max. :54.43124 Max. :10.671808
ecology green_part_500 prom_part_500 office_count_500
Length:24378 Min. : 0.00 Min. : 0.000 Min. : 0.0000
Class :character 1st Qu.: 1.48 1st Qu.: 0.000 1st Qu.: 0.0000
Mode :character Median : 8.45 Median : 0.000 Median : 0.0000
Mean : 13.42 Mean : 5.742 Mean : 0.7246
3rd Qu.: 19.92 3rd Qu.: 5.760 3rd Qu.: 0.0000
Max. :100.00 Max. :98.770 Max. :34.0000
office_sqm_500 trc_count_500 trc_sqm_500 cafe_count_500
Min. : 0 Min. :0.000 Min. : 0 Min. : 0.000
1st Qu.: 0 1st Qu.:0.000 1st Qu.: 0 1st Qu.: 0.000
Median : 0 Median :0.000 Median : 0 Median : 1.000
Mean : 13732 Mean :0.553 Mean : 21692 Mean : 3.826
3rd Qu.: 0 3rd Qu.:1.000 3rd Qu.: 0 3rd Qu.: 3.000
Max. :611015 Max. :8.000 Max. :1500000 Max. :120.000
cafe_sum_500_min_price_avg cafe_sum_500_max_price_avg cafe_avg_price_500
Min. : 300.0 Min. : 500 Min. : 400.0
1st Qu.: 500.0 1st Qu.:1000 1st Qu.: 750.0
Median : 666.7 Median :1154 Median : 916.7
Mean : 741.2 Mean :1248 Mean : 994.5
3rd Qu.: 954.8 3rd Qu.:1500 3rd Qu.:1250.0
Max. :4000.0 Max. :6000 Max. :5000.0
NA's :10643 NA's :10643 NA's :10643
cafe_count_500_na_price cafe_count_500_price_500 cafe_count_500_price_1000
Min. : 0.0000 Min. : 0.0000 Min. : 0.0000
1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.0000
Median : 0.0000 Median : 0.0000 Median : 0.0000
Mean : 0.3404 Mean : 0.9804 Mean : 0.9732
3rd Qu.: 0.0000 3rd Qu.: 1.0000 3rd Qu.: 1.0000
Max. :13.0000 Max. :33.0000 Max. :37.0000
cafe_count_500_price_1500 cafe_count_500_price_2500 cafe_count_500_price_4000
Min. : 0.0000 Min. : 0.0000 Min. : 0.0000
1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.0000
Median : 0.0000 Median : 0.0000 Median : 0.0000
Mean : 0.8291 Mean : 0.5372 Mean : 0.1364
3rd Qu.: 1.0000 3rd Qu.: 0.0000 3rd Qu.: 0.0000
Max. :29.0000 Max. :22.0000 Max. :14.0000
cafe_count_500_price_high big_church_count_500 church_count_500
Min. :0.00000 Min. : 0.0000 Min. : 0.0000
1st Qu.:0.00000 1st Qu.: 0.0000 1st Qu.: 0.0000
Median :0.00000 Median : 0.0000 Median : 0.0000
Mean :0.02917 Mean : 0.2829 Mean : 0.5788
3rd Qu.:0.00000 3rd Qu.: 0.0000 3rd Qu.: 0.0000
Max. :3.00000 Max. :11.0000 Max. :17.0000
mosque_count_500 leisure_count_500 sport_count_500 market_count_500
Min. :0.000000 Min. :0.00000 Min. : 0.0000 Min. :0.0000
1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.: 0.0000 1st Qu.:0.0000
Median :0.000000 Median :0.00000 Median : 0.0000 Median :0.0000
Mean :0.004882 Mean :0.06957 Mean : 0.9065 Mean :0.1231
3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.: 1.0000 3rd Qu.:0.0000
Max. :1.000000 Max. :9.00000 Max. :11.0000 Max. :4.0000
green_part_1000 prom_part_1000 office_count_1000 office_sqm_1000
Min. : 0.00 Min. : 0.000 Min. : 0.000 Min. : 0
1st Qu.: 6.31 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0
Median : 13.18 Median : 4.010 Median : 0.000 Median : 0
Mean : 17.00 Mean : 8.802 Mean : 3.032 Mean : 61575
3rd Qu.: 24.36 3rd Qu.:12.620 3rd Qu.: 2.000 3rd Qu.: 54500
Max. :100.00 Max. :72.200 Max. :91.000 Max. :2244723
trc_count_1000 trc_sqm_1000 cafe_count_1000
Min. : 0.000 Min. : 0 Min. : 0.00
1st Qu.: 0.000 1st Qu.: 0 1st Qu.: 1.00
Median : 1.000 Median : 7670 Median : 4.00
Mean : 1.963 Mean : 65545 Mean : 15.21
3rd Qu.: 3.000 3rd Qu.: 65978 3rd Qu.: 11.00
Max. :20.000 Max. :1500000 Max. :449.00
cafe_sum_1000_min_price_avg cafe_sum_1000_max_price_avg cafe_avg_price_1000
Min. : 300.0 Min. : 500 Min. : 400.0
1st Qu.: 542.9 1st Qu.:1000 1st Qu.: 750.0
Median : 666.7 Median :1143 Median : 912.5
Mean : 709.8 Mean :1205 Mean : 957.6
3rd Qu.: 833.8 3rd Qu.:1391 3rd Qu.:1115.0
Max. :2500.0 Max. :4000 Max. :3250.0
NA's :5207 NA's :5207 NA's :5207
cafe_count_1000_na_price cafe_count_1000_price_500 cafe_count_1000_price_1000
Min. : 0.000 Min. : 0.000 Min. : 0.0
1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.0
Median : 0.000 Median : 1.000 Median : 1.0
Mean : 1.009 Mean : 4.089 Mean : 3.9
3rd Qu.: 1.000 3rd Qu.: 3.000 3rd Qu.: 4.0
Max. :28.000 Max. :112.000 Max. :107.0
cafe_count_1000_price_1500 cafe_count_1000_price_2500
Min. : 0.000 Min. : 0.000
1st Qu.: 0.000 1st Qu.: 0.000
Median : 1.000 Median : 0.000
Mean : 3.477 Mean : 1.922
3rd Qu.: 3.000 3rd Qu.: 1.000
Max. :104.000 Max. :79.000
cafe_count_1000_price_4000 cafe_count_1000_price_high big_church_count_1000
Min. : 0.0000 Min. :0.00000 Min. : 0.0000
1st Qu.: 0.0000 1st Qu.:0.00000 1st Qu.: 0.0000
Median : 0.0000 Median :0.00000 Median : 0.0000
Mean : 0.7574 Mean :0.05854 Mean : 0.8034
3rd Qu.: 0.0000 3rd Qu.:0.00000 3rd Qu.: 1.0000
Max. :40.0000 Max. :7.00000 Max. :27.0000
church_count_1000 mosque_count_1000 leisure_count_1000 sport_count_1000
Min. : 0.000 Min. :0.00000 Min. : 0.0000 Min. : 0.000
1st Qu.: 0.000 1st Qu.:0.00000 1st Qu.: 0.0000 1st Qu.: 0.000
Median : 1.000 Median :0.00000 Median : 0.0000 Median : 2.000
Mean : 1.806 Mean :0.01887 Mean : 0.4614 Mean : 2.897
3rd Qu.: 1.000 3rd Qu.:0.00000 3rd Qu.: 0.0000 3rd Qu.: 4.000
Max. :38.000 Max. :1.00000 Max. :30.0000 Max. :25.000
market_count_1000 green_part_1500 prom_part_1500 office_count_1500
Min. :0.000 Min. : 0.00 Min. : 0.00 Min. : 0.000
1st Qu.:0.000 1st Qu.: 8.53 1st Qu.: 1.52 1st Qu.: 0.000
Median :0.000 Median :15.03 Median : 7.81 Median : 1.000
Mean :0.382 Mean :19.23 Mean :10.61 Mean : 7.192
3rd Qu.:1.000 3rd Qu.:26.80 3rd Qu.:15.34 3rd Qu.: 4.000
Max. :6.000 Max. :90.41 Max. :63.00 Max. :173.000
office_sqm_1500 trc_count_1500 trc_sqm_1500 cafe_count_1500
Min. : 0 Min. : 0.0 Min. : 0 Min. : 0.00
1st Qu.: 0 1st Qu.: 0.0 1st Qu.: 0 1st Qu.: 2.00
Median : 16765 Median : 3.0 Median : 49410 Median : 10.00
Mean : 139395 Mean : 3.7 Mean : 127239 Mean : 32.01
3rd Qu.: 117300 3rd Qu.: 5.0 3rd Qu.: 153965 3rd Qu.: 23.00
Max. :2908344 Max. :27.0 Max. :1533000 Max. :784.00
cafe_sum_1500_min_price_avg cafe_sum_1500_max_price_avg cafe_avg_price_1500
Min. : 300.0 Min. : 500 Min. : 400.0
1st Qu.: 585.7 1st Qu.:1000 1st Qu.: 794.4
Median : 690.9 Median :1167 Median : 925.0
Mean : 713.2 Mean :1205 Mean : 959.1
3rd Qu.: 820.0 3rd Qu.:1366 3rd Qu.:1092.5
Max. :2500.0 Max. :4000 Max. :3250.0
NA's :3365 NA's :3365 NA's :3365
cafe_count_1500_na_price cafe_count_1500_price_500 cafe_count_1500_price_1000
Min. : 0.000 Min. : 0.00 Min. : 0.000
1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 1.000
Median : 1.000 Median : 2.00 Median : 3.000
Mean : 2.082 Mean : 8.11 Mean : 8.662
3rd Qu.: 2.000 3rd Qu.: 6.00 3rd Qu.: 8.000
Max. :54.000 Max. :195.00 Max. :177.000
cafe_count_1500_price_1500 cafe_count_1500_price_2500
Min. : 0.000 Min. : 0.000
1st Qu.: 0.000 1st Qu.: 0.000
Median : 2.000 Median : 1.000
Mean : 7.772 Mean : 3.768
3rd Qu.: 6.000 3rd Qu.: 2.000
Max. :183.000 Max. :127.000
cafe_count_1500_price_4000 cafe_count_1500_price_high big_church_count_1500
Min. : 0.000 Min. : 0.0000 Min. : 0.00
1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.00
Median : 0.000 Median : 0.0000 Median : 1.00
Mean : 1.432 Mean : 0.1882 Mean : 1.95
3rd Qu.: 0.000 3rd Qu.: 0.0000 3rd Qu.: 1.00
Max. :55.000 Max. :12.0000 Max. :44.00
church_count_1500 mosque_count_1500 leisure_count_1500 sport_count_1500
Min. : 0.000 Min. :0.00000 Min. : 0.0000 Min. : 0.000
1st Qu.: 1.000 1st Qu.:0.00000 1st Qu.: 0.0000 1st Qu.: 1.000
Median : 1.000 Median :0.00000 Median : 0.0000 Median : 5.000
Mean : 3.631 Mean :0.03741 Mean : 0.9285 Mean : 5.846
3rd Qu.: 3.000 3rd Qu.:0.00000 3rd Qu.: 1.0000 3rd Qu.: 9.000
Max. :75.000 Max. :1.00000 Max. :39.0000 Max. :37.000
market_count_1500 green_part_2000 prom_part_2000 office_count_2000
Min. :0.0000 Min. : 0.01 Min. : 0.00 Min. : 0.00
1st Qu.:0.0000 1st Qu.:10.21 1st Qu.: 3.12 1st Qu.: 0.00
Median :0.0000 Median :17.71 Median : 8.80 Median : 2.00
Mean :0.7661 Mean :20.87 Mean :11.24 Mean : 13.13
3rd Qu.:1.0000 3rd Qu.:28.42 3rd Qu.:16.21 3rd Qu.: 7.00
Max. :7.0000 Max. :75.30 Max. :56.10 Max. :250.00
office_sqm_2000 trc_count_2000 trc_sqm_2000 cafe_count_2000
Min. : 0 Min. : 0.000 Min. : 0 Min. : 0.00
1st Qu.: 0 1st Qu.: 1.000 1st Qu.: 12065 1st Qu.: 3.00
Median : 58411 Median : 5.000 Median : 115856 Median : 18.00
Mean : 244211 Mean : 5.932 Mean : 212261 Mean : 54.32
3rd Qu.: 207193 3rd Qu.: 9.000 3rd Qu.: 284727 3rd Qu.: 37.00
Max. :3602982 Max. :37.000 Max. :2442600 Max. :1115.00
cafe_sum_2000_min_price_avg cafe_sum_2000_max_price_avg cafe_avg_price_2000
Min. : 300.0 Min. : 500 Min. : 400.0
1st Qu.: 607.0 1st Qu.:1000 1st Qu.: 823.2
Median : 682.5 Median :1155 Median : 918.0
Mean : 719.4 Mean :1210 Mean : 964.8
3rd Qu.: 791.3 3rd Qu.:1321 3rd Qu.:1056.8
Max. :2166.7 Max. :3500 Max. :2833.3
NA's :1368 NA's :1368 NA's :1368
cafe_count_2000_na_price cafe_count_2000_price_500 cafe_count_2000_price_1000
Min. : 0.000 Min. : 0.00 Min. : 0.00
1st Qu.: 0.000 1st Qu.: 1.00 1st Qu.: 1.00
Median : 1.000 Median : 4.00 Median : 6.00
Mean : 3.548 Mean : 13.42 Mean : 15.07
3rd Qu.: 3.000 3rd Qu.: 10.00 3rd Qu.: 13.00
Max. :70.000 Max. :278.00 Max. :261.00
cafe_count_2000_price_1500 cafe_count_2000_price_2500
Min. : 0.00 Min. : 0.000
1st Qu.: 1.00 1st Qu.: 0.000
Median : 4.00 Median : 1.000
Mean : 13.09 Mean : 6.546
3rd Qu.: 9.00 3rd Qu.: 3.000
Max. :261.00 Max. :167.000
cafe_count_2000_price_4000 cafe_count_2000_price_high big_church_count_2000
Min. : 0.000 Min. : 0.0000 Min. : 0.000
1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.000
Median : 0.000 Median : 0.0000 Median : 1.000
Mean : 2.281 Mean : 0.3707 Mean : 3.225
3rd Qu.: 1.000 3rd Qu.: 0.0000 3rd Qu.: 2.000
Max. :74.000 Max. :16.0000 Max. :70.000
church_count_2000 mosque_count_2000 leisure_count_2000 sport_count_2000
Min. : 0.000 Min. :0.00000 Min. : 0.000 Min. : 0.000
1st Qu.: 2.000 1st Qu.:0.00000 1st Qu.: 0.000 1st Qu.: 2.000
Median : 3.000 Median :0.00000 Median : 0.000 Median : 9.000
Mean : 6.154 Mean :0.08819 Mean : 1.888 Mean : 9.816
3rd Qu.: 5.000 3rd Qu.:0.00000 3rd Qu.: 1.000 3rd Qu.:14.000
Max. :108.000 Max. :1.00000 Max. :55.000 Max. :54.000
market_count_2000 green_part_3000 prom_part_3000 office_count_3000
Min. :0.00 Min. : 0.31 Min. : 0.00 Min. : 0.00
1st Qu.:0.00 1st Qu.:12.15 1st Qu.: 4.28 1st Qu.: 0.00
Median :1.00 Median :20.30 Median : 9.69 Median : 5.00
Mean :1.17 Mean :22.74 Mean :10.99 Mean : 28.96
3rd Qu.:2.00 3rd Qu.:30.20 3rd Qu.:15.73 3rd Qu.: 17.00
Max. :8.00 Max. :74.02 Max. :45.10 Max. :493.00
office_sqm_3000 trc_count_3000 trc_sqm_3000 cafe_count_3000
Min. : 0 Min. : 0.00 Min. : 0 Min. : 0.0
1st Qu.: 0 1st Qu.: 2.00 1st Qu.: 41100 1st Qu.: 7.0
Median : 130303 Median :11.00 Median : 294350 Median : 41.0
Mean : 538727 Mean :11.79 Mean : 437423 Mean : 109.5
3rd Qu.: 491883 3rd Qu.:17.00 3rd Qu.: 651639 3rd Qu.: 78.0
Max. :6106112 Max. :66.00 Max. :2654102 Max. :1815.0
cafe_sum_3000_min_price_avg cafe_sum_3000_max_price_avg cafe_avg_price_3000
Min. : 300.0 Min. : 500 Min. : 400.0
1st Qu.: 650.0 1st Qu.:1101 1st Qu.: 875.3
Median : 711.0 Median :1211 Median : 961.1
Mean : 765.3 Mean :1283 Mean :1023.9
3rd Qu.: 815.2 3rd Qu.:1333 3rd Qu.:1083.3
Max. :1833.3 Max. :3000 Max. :2416.7
NA's :773 NA's :773 NA's :773
cafe_count_3000_na_price cafe_count_3000_price_500 cafe_count_3000_price_1000
Min. : 0.000 Min. : 0.00 Min. : 0.00
1st Qu.: 0.000 1st Qu.: 1.00 1st Qu.: 2.00
Median : 3.000 Median : 9.00 Median : 14.00
Mean : 7.196 Mean : 27.45 Mean : 30.11
3rd Qu.: 6.000 3rd Qu.: 22.00 3rd Qu.: 26.00
Max. :114.000 Max. :449.00 Max. :441.00
cafe_count_3000_price_1500 cafe_count_3000_price_2500
Min. : 0.00 Min. : 0.00
1st Qu.: 2.00 1st Qu.: 1.00
Median : 10.00 Median : 3.00
Mean : 26.34 Mean : 13.11
3rd Qu.: 17.00 3rd Qu.: 6.00
Max. :446.00 Max. :263.00
cafe_count_3000_price_4000 cafe_count_3000_price_high big_church_count_3000
Min. : 0.000 Min. : 0.0000 Min. : 0.000
1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 1.000
Median : 1.000 Median : 0.0000 Median : 2.000
Mean : 4.572 Mean : 0.6915 Mean : 6.057
3rd Qu.: 2.000 3rd Qu.: 0.0000 3rd Qu.: 5.000
Max. :112.000 Max. :22.0000 Max. :102.000
church_count_3000 mosque_count_3000 leisure_count_3000 sport_count_3000
Min. : 0.00 Min. :0.000 Min. : 0.000 Min. : 0.00
1st Qu.: 3.00 1st Qu.:0.000 1st Qu.: 0.000 1st Qu.: 5.00
Median : 6.00 Median :0.000 Median : 0.000 Median : 18.00
Mean : 12.17 Mean :0.199 Mean : 3.817 Mean : 20.18
3rd Qu.: 10.00 3rd Qu.:0.000 3rd Qu.: 2.000 3rd Qu.: 29.00
Max. :164.00 Max. :2.000 Max. :85.000 Max. :100.00
market_count_3000 green_part_5000 prom_part_5000 office_count_5000
Min. : 0.000 Min. : 3.53 Min. : 0.21 Min. : 0.00
1st Qu.: 0.000 1st Qu.:14.78 1st Qu.: 6.05 1st Qu.: 2.00
Median : 2.000 Median :19.77 Median : 8.96 Median : 15.00
Mean : 2.317 Mean :22.76 Mean :10.35 Mean : 70.57
3rd Qu.: 4.000 3rd Qu.:31.39 3rd Qu.:13.97 3rd Qu.: 52.00
Max. :10.000 Max. :68.35 Max. :28.56 Max. :789.00
NA's :136
office_sqm_5000 trc_count_5000 trc_sqm_5000 cafe_count_5000
Min. : 0 Min. : 0.00 Min. : 0 Min. : 0.0
1st Qu.: 85159 1st Qu.: 6.00 1st Qu.: 262000 1st Qu.: 20.0
Median : 429442 Median : 31.00 Median :1076162 Median : 108.0
Mean : 1391525 Mean : 30.06 Mean :1172851 Mean : 262.9
3rd Qu.: 1430674 3rd Qu.: 43.00 3rd Qu.:1683553 3rd Qu.: 221.0
Max. :12372993 Max. :119.00 Max. :4585477 Max. :2645.0
cafe_sum_5000_min_price_avg cafe_sum_5000_max_price_avg cafe_avg_price_5000
Min. : 300.0 Min. : 500 Min. : 400.0
1st Qu.: 670.5 1st Qu.:1144 1st Qu.: 909.4
Median : 721.1 Median :1212 Median : 966.0
Mean : 764.0 Mean :1277 Mean :1020.4
3rd Qu.: 815.4 3rd Qu.:1341 3rd Qu.:1088.1
Max. :1875.0 Max. :3000 Max. :2437.5
NA's :235 NA's :235 NA's :235
cafe_count_5000_na_price cafe_count_5000_price_500 cafe_count_5000_price_1000
Min. : 0.00 Min. : 0.00 Min. : 0.00
1st Qu.: 1.00 1st Qu.: 4.00 1st Qu.: 8.00
Median : 8.00 Median : 28.00 Median : 36.00
Mean : 17.64 Mean : 65.54 Mean : 72.83
3rd Qu.: 15.00 3rd Qu.: 59.00 3rd Qu.: 69.00
Max. :174.00 Max. :650.00 Max. :648.00
cafe_count_5000_price_1500 cafe_count_5000_price_2500
Min. : 0.00 Min. : 0.00
1st Qu.: 6.00 1st Qu.: 2.00
Median : 24.00 Median : 8.00
Mean : 62.84 Mean : 31.67
3rd Qu.: 50.00 3rd Qu.: 21.00
Max. :641.00 Max. :377.00
cafe_count_5000_price_4000 cafe_count_5000_price_high big_church_count_5000
Min. : 0.00 Min. : 0.000 Min. : 0.0
1st Qu.: 1.00 1st Qu.: 0.000 1st Qu.: 2.0
Median : 2.00 Median : 0.000 Median : 7.0
Mean : 10.63 Mean : 1.746 Mean : 14.9
3rd Qu.: 5.00 3rd Qu.: 0.000 3rd Qu.: 12.0
Max. :147.00 Max. :30.000 Max. :151.0
church_count_5000 mosque_count_5000 leisure_count_5000 sport_count_5000
Min. : 0 Min. :0.0000 Min. : 0.000 Min. : 0.0
1st Qu.: 9 1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.: 11.0
Median : 16 Median :0.0000 Median : 2.000 Median : 48.0
Mean : 30 Mean :0.4422 Mean : 8.553 Mean : 52.7
3rd Qu.: 28 3rd Qu.:1.0000 3rd Qu.: 7.000 3rd Qu.: 75.0
Max. :250 Max. :2.0000 Max. :106.000 Max. :218.0
market_count_5000 price_doc
Min. : 0.000 Min. : 100000
1st Qu.: 1.000 1st Qu.: 4740002
Median : 5.000 Median : 6274186
Mean : 5.979 Mean : 7103225
3rd Qu.:10.000 3rd Qu.: 8300000
Max. :21.000 Max. :95122496
In [166]:
# Share of missing values in each feature, as a percentage rounded to 2 decimal places
sapply(dataset, function(column) {
  round(mean(is.na(column)) * 100, 2)
})
- id
- 0
- timestamp
- 0
- full_sq
- 0
- life_sq
- 20.88
- floor
- 0.53
- max_floor
- 31.29
- material
- 31.29
- build_year
- 44.41
- num_room
- 31.29
- kitch_sq
- 31.29
- state
- 44.46
- product_type
- 0
- sub_area
- 0
- area_m
- 0
- raion_popul
- 0
- green_zone_part
- 0
- indust_part
- 0
- children_preschool
- 0
- preschool_quota
- 21.9
- preschool_education_centers_raion
- 0
- children_school
- 0
- school_quota
- 21.89
- school_education_centers_raion
- 0
- school_education_centers_top_20_raion
- 0
- hospital_beds_raion
- 47.27
- healthcare_centers_raion
- 0
- university_top_20_raion
- 0
- sport_objects_raion
- 0
- additional_education_raion
- 0
- culture_objects_top_25
- 0
- culture_objects_top_25_raion
- 0
- shopping_centers_raion
- 0
- office_raion
- 0
- thermal_power_plant_raion
- 0
- incineration_raion
- 0
- oil_chemistry_raion
- 0
- radiation_raion
- 0
- railroad_terminal_raion
- 0
- big_market_raion
- 0
- nuclear_reactor_raion
- 0
- detention_facility_raion
- 0
- full_all
- 0
- male_f
- 0
- female_f
- 0
- young_all
- 0
- young_male
- 0
- young_female
- 0
- work_all
- 0
- work_male
- 0
- work_female
- 0
- ekder_all
- 0
- ekder_male
- 0
- ekder_female
- 0
- 0_6_all
- 0
- 0_6_male
- 0
- 0_6_female
- 0
- 7_14_all
- 0
- 7_14_male
- 0
- 7_14_female
- 0
- 0_17_all
- 0
- 0_17_male
- 0
- 0_17_female
- 0
- 16_29_all
- 0
- 16_29_male
- 0
- 16_29_female
- 0
- 0_13_all
- 0
- 0_13_male
- 0
- 0_13_female
- 0
- raion_build_count_with_material_info
- 16.28
- build_count_block
- 16.28
- build_count_wood
- 16.28
- build_count_frame
- 16.28
- build_count_brick
- 16.28
- build_count_monolith
- 16.28
- build_count_panel
- 16.28
- build_count_foam
- 16.28
- build_count_slag
- 16.28
- build_count_mix
- 16.28
- raion_build_count_with_builddate_info
- 16.28
- build_count_before_1920
- 16.28
- build_count_1921-1945
- 16.28
- build_count_1946-1970
- 16.28
- build_count_1971-1995
- 16.28
- build_count_after_1995
- 16.28
- ID_metro
- 0
- metro_min_avto
- 0
- metro_km_avto
- 0
- metro_min_walk
- 0.09
- metro_km_walk
- 0.09
- kindergarten_km
- 0
- school_km
- 0
- park_km
- 0
- green_zone_km
- 0
- industrial_km
- 0
- water_treatment_km
- 0
- cemetery_km
- 0
- incineration_km
- 0
- railroad_station_walk_km
- 0.09
- railroad_station_walk_min
- 0.09
- ID_railroad_station_walk
- 0.09
- railroad_station_avto_km
- 0
- railroad_station_avto_min
- 0
- ID_railroad_station_avto
- 0
- public_transport_station_km
- 0
- public_transport_station_min_walk
- 0
- water_km
- 0
- water_1line
- 0
- mkad_km
- 0
- ttk_km
- 0
- sadovoe_km
- 0
- bulvar_ring_km
- 0
- kremlin_km
- 0
- big_road1_km
- 0
- ID_big_road1
- 0
- big_road1_1line
- 0
- big_road2_km
- 0
- ID_big_road2
- 0
- railroad_km
- 0
- railroad_1line
- 0
- zd_vokzaly_avto_km
- 0
- ID_railroad_terminal
- 0
- bus_terminal_avto_km
- 0
- ID_bus_terminal
- 0
- oil_chemistry_km
- 0
- nuclear_reactor_km
- 0
- radiation_km
- 0
- power_transmission_line_km
- 0
- thermal_power_plant_km
- 0
- ts_km
- 0
- big_market_km
- 0
- market_shop_km
- 0
- fitness_km
- 0
- swim_pool_km
- 0
- ice_rink_km
- 0
- stadium_km
- 0
- basketball_km
- 0
- hospice_morgue_km
- 0
- detention_facility_km
- 0
- public_healthcare_km
- 0
- university_km
- 0
- workplaces_km
- 0
- shopping_centers_km
- 0
- office_km
- 0
- additional_education_km
- 0
- preschool_km
- 0
- big_church_km
- 0
- church_synagogue_km
- 0
- mosque_km
- 0
- theater_km
- 0
- museum_km
- 0
- exhibition_km
- 0
- catering_km
- 0
- ecology
- 0
- green_part_500
- 0
- prom_part_500
- 0
- office_count_500
- 0
- office_sqm_500
- 0
- trc_count_500
- 0
- trc_sqm_500
- 0
- cafe_count_500
- 0
- cafe_sum_500_min_price_avg
- 43.66
- cafe_sum_500_max_price_avg
- 43.66
- cafe_avg_price_500
- 43.66
- cafe_count_500_na_price
- 0
- cafe_count_500_price_500
- 0
- cafe_count_500_price_1000
- 0
- cafe_count_500_price_1500
- 0
- cafe_count_500_price_2500
- 0
- cafe_count_500_price_4000
- 0
- cafe_count_500_price_high
- 0
- big_church_count_500
- 0
- church_count_500
- 0
- mosque_count_500
- 0
- leisure_count_500
- 0
- sport_count_500
- 0
- market_count_500
- 0
- green_part_1000
- 0
- prom_part_1000
- 0
- office_count_1000
- 0
- office_sqm_1000
- 0
- trc_count_1000
- 0
- trc_sqm_1000
- 0
- cafe_count_1000
- 0
- cafe_sum_1000_min_price_avg
- 21.36
- cafe_sum_1000_max_price_avg
- 21.36
- cafe_avg_price_1000
- 21.36
- cafe_count_1000_na_price
- 0
- cafe_count_1000_price_500
- 0
- cafe_count_1000_price_1000
- 0
- cafe_count_1000_price_1500
- 0
- cafe_count_1000_price_2500
- 0
- cafe_count_1000_price_4000
- 0
- cafe_count_1000_price_high
- 0
- big_church_count_1000
- 0
- church_count_1000
- 0
- mosque_count_1000
- 0
- leisure_count_1000
- 0
- sport_count_1000
- 0
- market_count_1000
- 0
- green_part_1500
- 0
- prom_part_1500
- 0
- office_count_1500
- 0
- office_sqm_1500
- 0
- trc_count_1500
- 0
- trc_sqm_1500
- 0
- cafe_count_1500
- 0
- cafe_sum_1500_min_price_avg
- 13.8
- cafe_sum_1500_max_price_avg
- 13.8
- cafe_avg_price_1500
- 13.8
- cafe_count_1500_na_price
- 0
- cafe_count_1500_price_500
- 0
- cafe_count_1500_price_1000
- 0
- cafe_count_1500_price_1500
- 0
- cafe_count_1500_price_2500
- 0
- cafe_count_1500_price_4000
- 0
- cafe_count_1500_price_high
- 0
- big_church_count_1500
- 0
- church_count_1500
- 0
- mosque_count_1500
- 0
- leisure_count_1500
- 0
- sport_count_1500
- 0
- market_count_1500
- 0
- green_part_2000
- 0
- prom_part_2000
- 0
- office_count_2000
- 0
- office_sqm_2000
- 0
- trc_count_2000
- 0
- trc_sqm_2000
- 0
- cafe_count_2000
- 0
- cafe_sum_2000_min_price_avg
- 5.61
- cafe_sum_2000_max_price_avg
- 5.61
- cafe_avg_price_2000
- 5.61
- cafe_count_2000_na_price
- 0
- cafe_count_2000_price_500
- 0
- cafe_count_2000_price_1000
- 0
- cafe_count_2000_price_1500
- 0
- cafe_count_2000_price_2500
- 0
- cafe_count_2000_price_4000
- 0
- cafe_count_2000_price_high
- 0
- big_church_count_2000
- 0
- church_count_2000
- 0
- mosque_count_2000
- 0
- leisure_count_2000
- 0
- sport_count_2000
- 0
- market_count_2000
- 0
- green_part_3000
- 0
- prom_part_3000
- 0
- office_count_3000
- 0
- office_sqm_3000
- 0
- trc_count_3000
- 0
- trc_sqm_3000
- 0
- cafe_count_3000
- 0
- cafe_sum_3000_min_price_avg
- 3.17
- cafe_sum_3000_max_price_avg
- 3.17
- cafe_avg_price_3000
- 3.17
- cafe_count_3000_na_price
- 0
- cafe_count_3000_price_500
- 0
- cafe_count_3000_price_1000
- 0
- cafe_count_3000_price_1500
- 0
- cafe_count_3000_price_2500
- 0
- cafe_count_3000_price_4000
- 0
- cafe_count_3000_price_high
- 0
- big_church_count_3000
- 0
- church_count_3000
- 0
- mosque_count_3000
- 0
- leisure_count_3000
- 0
- sport_count_3000
- 0
- market_count_3000
- 0
- green_part_5000
- 0
- prom_part_5000
- 0.56
- office_count_5000
- 0
- office_sqm_5000
- 0
- trc_count_5000
- 0
- trc_sqm_5000
- 0
- cafe_count_5000
- 0
- cafe_sum_5000_min_price_avg
- 0.96
- cafe_sum_5000_max_price_avg
- 0.96
- cafe_avg_price_5000
- 0.96
- cafe_count_5000_na_price
- 0
- cafe_count_5000_price_500
- 0
- cafe_count_5000_price_1000
- 0
- cafe_count_5000_price_1500
- 0
- cafe_count_5000_price_2500
- 0
- cafe_count_5000_price_4000
- 0
- cafe_count_5000_price_high
- 0
- big_church_count_5000
- 0
- church_count_5000
- 0
- mosque_count_5000
- 0
- leisure_count_5000
- 0
- sport_count_5000
- 0
- market_count_5000
- 0
- price_doc
- 0
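The listing above prints every column, although most have no missing values. A shorter view keeps only the columns with missing values, sorted by share. This is a minimal sketch on a toy data frame: `toy` and its values are made up for illustration and are not taken from train.csv.

```r
# Toy data frame standing in for `dataset` (illustrative values only)
toy <- data.frame(
  full_sq = c(43, 34, 43, 77),
  life_sq = c(27, NA, 29, NA),
  state   = c(NA, NA, NA, 2)
)

# Percentage of missing values per column, rounded to 2 decimal places
na_pct <- round(colMeans(is.na(toy)) * 100, 2)

# Keep only the columns that have missing values, largest share first
na_pct_sorted <- sort(na_pct[na_pct > 0], decreasing = TRUE)
print(na_pct_sorted)
```

Replacing `toy` with `dataset` should give the same kind of summary for the real data.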
In [167]:
dataset$timestamp <- as.Date(dataset$timestamp)
In [168]:
sapply(dataset, class)
- id
- 'integer'
- timestamp
- 'Date'
- full_sq
- 'integer'
- life_sq
- 'integer'
- floor
- 'integer'
- max_floor
- 'integer'
- material
- 'integer'
- build_year
- 'integer'
- num_room
- 'integer'
- kitch_sq
- 'integer'
- state
- 'integer'
- product_type
- 'character'
- sub_area
- 'character'
- area_m
- 'numeric'
- raion_popul
- 'integer'
- green_zone_part
- 'numeric'
- indust_part
- 'numeric'
- children_preschool
- 'integer'
- preschool_quota
- 'integer'
- preschool_education_centers_raion
- 'integer'
- children_school
- 'integer'
- school_quota
- 'integer'
- school_education_centers_raion
- 'integer'
- school_education_centers_top_20_raion
- 'integer'
- hospital_beds_raion
- 'integer'
- healthcare_centers_raion
- 'integer'
- university_top_20_raion
- 'integer'
- sport_objects_raion
- 'integer'
- additional_education_raion
- 'integer'
- culture_objects_top_25
- 'character'
- culture_objects_top_25_raion
- 'integer'
- shopping_centers_raion
- 'integer'
- office_raion
- 'integer'
- thermal_power_plant_raion
- 'character'
- incineration_raion
- 'character'
- oil_chemistry_raion
- 'character'
- radiation_raion
- 'character'
- railroad_terminal_raion
- 'character'
- big_market_raion
- 'character'
- nuclear_reactor_raion
- 'character'
- detention_facility_raion
- 'character'
- full_all
- 'integer'
- male_f
- 'integer'
- female_f
- 'integer'
- young_all
- 'integer'
- young_male
- 'integer'
- young_female
- 'integer'
- work_all
- 'integer'
- work_male
- 'integer'
- work_female
- 'integer'
- ekder_all
- 'integer'
- ekder_male
- 'integer'
- ekder_female
- 'integer'
- 0_6_all
- 'integer'
- 0_6_male
- 'integer'
- 0_6_female
- 'integer'
- 7_14_all
- 'integer'
- 7_14_male
- 'integer'
- 7_14_female
- 'integer'
- 0_17_all
- 'integer'
- 0_17_male
- 'integer'
- 0_17_female
- 'integer'
- 16_29_all
- 'integer'
- 16_29_male
- 'integer'
- 16_29_female
- 'integer'
- 0_13_all
- 'integer'
- 0_13_male
- 'integer'
- 0_13_female
- 'integer'
- raion_build_count_with_material_info
- 'integer'
- build_count_block
- 'integer'
- build_count_wood
- 'integer'
- build_count_frame
- 'integer'
- build_count_brick
- 'integer'
- build_count_monolith
- 'integer'
- build_count_panel
- 'integer'
- build_count_foam
- 'integer'
- build_count_slag
- 'integer'
- build_count_mix
- 'integer'
- raion_build_count_with_builddate_info
- 'integer'
- build_count_before_1920
- 'integer'
- build_count_1921-1945
- 'integer'
- build_count_1946-1970
- 'integer'
- build_count_1971-1995
- 'integer'
- build_count_after_1995
- 'integer'
- ID_metro
- 'integer'
- metro_min_avto
- 'numeric'
- metro_km_avto
- 'numeric'
- metro_min_walk
- 'numeric'
- metro_km_walk
- 'numeric'
- kindergarten_km
- 'numeric'
- school_km
- 'numeric'
- park_km
- 'numeric'
- green_zone_km
- 'numeric'
- industrial_km
- 'numeric'
- water_treatment_km
- 'numeric'
- cemetery_km
- 'numeric'
- incineration_km
- 'numeric'
- railroad_station_walk_km
- 'numeric'
- railroad_station_walk_min
- 'numeric'
- ID_railroad_station_walk
- 'integer'
- railroad_station_avto_km
- 'numeric'
- railroad_station_avto_min
- 'numeric'
- ID_railroad_station_avto
- 'integer'
- public_transport_station_km
- 'numeric'
- public_transport_station_min_walk
- 'numeric'
- water_km
- 'numeric'
- water_1line
- 'character'
- mkad_km
- 'numeric'
- ttk_km
- 'numeric'
- sadovoe_km
- 'numeric'
- bulvar_ring_km
- 'numeric'
- kremlin_km
- 'numeric'
- big_road1_km
- 'numeric'
- ID_big_road1
- 'integer'
- big_road1_1line
- 'character'
- big_road2_km
- 'numeric'
- ID_big_road2
- 'integer'
- railroad_km
- 'numeric'
- railroad_1line
- 'character'
- zd_vokzaly_avto_km
- 'numeric'
- ID_railroad_terminal
- 'integer'
- bus_terminal_avto_km
- 'numeric'
- ID_bus_terminal
- 'integer'
- oil_chemistry_km
- 'numeric'
- nuclear_reactor_km
- 'numeric'
- radiation_km
- 'numeric'
- power_transmission_line_km
- 'numeric'
- thermal_power_plant_km
- 'numeric'
- ts_km
- 'numeric'
- big_market_km
- 'numeric'
- market_shop_km
- 'numeric'
- fitness_km
- 'numeric'
- swim_pool_km
- 'numeric'
- ice_rink_km
- 'numeric'
- stadium_km
- 'numeric'
- basketball_km
- 'numeric'
- hospice_morgue_km
- 'numeric'
- detention_facility_km
- 'numeric'
- public_healthcare_km
- 'numeric'
- university_km
- 'numeric'
- workplaces_km
- 'numeric'
- shopping_centers_km
- 'numeric'
- office_km
- 'numeric'
- additional_education_km
- 'numeric'
- preschool_km
- 'numeric'
- big_church_km
- 'numeric'
- church_synagogue_km
- 'numeric'
- mosque_km
- 'numeric'
- theater_km
- 'numeric'
- museum_km
- 'numeric'
- exhibition_km
- 'numeric'
- catering_km
- 'numeric'
- ecology
- 'character'
- green_part_500
- 'numeric'
- prom_part_500
- 'numeric'
- office_count_500
- 'integer'
- office_sqm_500
- 'integer'
- trc_count_500
- 'integer'
- trc_sqm_500
- 'integer'
- cafe_count_500
- 'integer'
- cafe_sum_500_min_price_avg
- 'numeric'
- cafe_sum_500_max_price_avg
- 'numeric'
- cafe_avg_price_500
- 'numeric'
- cafe_count_500_na_price
- 'integer'
- cafe_count_500_price_500
- 'integer'
- cafe_count_500_price_1000
- 'integer'
- cafe_count_500_price_1500
- 'integer'
- cafe_count_500_price_2500
- 'integer'
- cafe_count_500_price_4000
- 'integer'
- cafe_count_500_price_high
- 'integer'
- big_church_count_500
- 'integer'
- church_count_500
- 'integer'
- mosque_count_500
- 'integer'
- leisure_count_500
- 'integer'
- sport_count_500
- 'integer'
- market_count_500
- 'integer'
- green_part_1000
- 'numeric'
- prom_part_1000
- 'numeric'
- office_count_1000
- 'integer'
- office_sqm_1000
- 'integer'
- trc_count_1000
- 'integer'
- trc_sqm_1000
- 'integer'
- cafe_count_1000
- 'integer'
- cafe_sum_1000_min_price_avg
- 'numeric'
- cafe_sum_1000_max_price_avg
- 'numeric'
- cafe_avg_price_1000
- 'numeric'
- cafe_count_1000_na_price
- 'integer'
- cafe_count_1000_price_500
- 'integer'
- cafe_count_1000_price_1000
- 'integer'
- cafe_count_1000_price_1500
- 'integer'
- cafe_count_1000_price_2500
- 'integer'
- cafe_count_1000_price_4000
- 'integer'
- cafe_count_1000_price_high
- 'integer'
- big_church_count_1000
- 'integer'
- church_count_1000
- 'integer'
- mosque_count_1000
- 'integer'
- leisure_count_1000
- 'integer'
- sport_count_1000
- 'integer'
- market_count_1000
- 'integer'
- green_part_1500
- 'numeric'
- prom_part_1500
- 'numeric'
- office_count_1500
- 'integer'
- office_sqm_1500
- 'integer'
- trc_count_1500
- 'integer'
- trc_sqm_1500
- 'integer'
- cafe_count_1500
- 'integer'
- cafe_sum_1500_min_price_avg
- 'numeric'
- cafe_sum_1500_max_price_avg
- 'numeric'
- cafe_avg_price_1500
- 'numeric'
- cafe_count_1500_na_price
- 'integer'
- cafe_count_1500_price_500
- 'integer'
- cafe_count_1500_price_1000
- 'integer'
- cafe_count_1500_price_1500
- 'integer'
- cafe_count_1500_price_2500
- 'integer'
- cafe_count_1500_price_4000
- 'integer'
- cafe_count_1500_price_high
- 'integer'
- big_church_count_1500
- 'integer'
- church_count_1500
- 'integer'
- mosque_count_1500
- 'integer'
- leisure_count_1500
- 'integer'
- sport_count_1500
- 'integer'
- market_count_1500
- 'integer'
- green_part_2000
- 'numeric'
- prom_part_2000
- 'numeric'
- office_count_2000
- 'integer'
- office_sqm_2000
- 'integer'
- trc_count_2000
- 'integer'
- trc_sqm_2000
- 'integer'
- cafe_count_2000
- 'integer'
- cafe_sum_2000_min_price_avg
- 'numeric'
- cafe_sum_2000_max_price_avg
- 'numeric'
- cafe_avg_price_2000
- 'numeric'
- cafe_count_2000_na_price
- 'integer'
- cafe_count_2000_price_500
- 'integer'
- cafe_count_2000_price_1000
- 'integer'
- cafe_count_2000_price_1500
- 'integer'
- cafe_count_2000_price_2500
- 'integer'
- cafe_count_2000_price_4000
- 'integer'
- cafe_count_2000_price_high
- 'integer'
- big_church_count_2000
- 'integer'
- church_count_2000
- 'integer'
- mosque_count_2000
- 'integer'
- leisure_count_2000
- 'integer'
- sport_count_2000
- 'integer'
- market_count_2000
- 'integer'
- green_part_3000
- 'numeric'
- prom_part_3000
- 'numeric'
- office_count_3000
- 'integer'
- office_sqm_3000
- 'integer'
- trc_count_3000
- 'integer'
- trc_sqm_3000
- 'integer'
- cafe_count_3000
- 'integer'
- cafe_sum_3000_min_price_avg
- 'numeric'
- cafe_sum_3000_max_price_avg
- 'numeric'
- cafe_avg_price_3000
- 'numeric'
- cafe_count_3000_na_price
- 'integer'
- cafe_count_3000_price_500
- 'integer'
- cafe_count_3000_price_1000
- 'integer'
- cafe_count_3000_price_1500
- 'integer'
- cafe_count_3000_price_2500
- 'integer'
- cafe_count_3000_price_4000
- 'integer'
- cafe_count_3000_price_high
- 'integer'
- big_church_count_3000
- 'integer'
- church_count_3000
- 'integer'
- mosque_count_3000
- 'integer'
- leisure_count_3000
- 'integer'
- sport_count_3000
- 'integer'
- market_count_3000
- 'integer'
- green_part_5000
- 'numeric'
- prom_part_5000
- 'numeric'
- office_count_5000
- 'integer'
- office_sqm_5000
- 'integer'
- trc_count_5000
- 'integer'
- trc_sqm_5000
- 'integer'
- cafe_count_5000
- 'integer'
- cafe_sum_5000_min_price_avg
- 'numeric'
- cafe_sum_5000_max_price_avg
- 'numeric'
- cafe_avg_price_5000
- 'numeric'
- cafe_count_5000_na_price
- 'integer'
- cafe_count_5000_price_500
- 'integer'
- cafe_count_5000_price_1000
- 'integer'
- cafe_count_5000_price_1500
- 'integer'
- cafe_count_5000_price_2500
- 'integer'
- cafe_count_5000_price_4000
- 'integer'
- cafe_count_5000_price_high
- 'integer'
- big_church_count_5000
- 'integer'
- church_count_5000
- 'integer'
- mosque_count_5000
- 'integer'
- leisure_count_5000
- 'integer'
- sport_count_5000
- 'integer'
- market_count_5000
- 'integer'
- price_doc
- 'integer'
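Several columns above are of class 'character' (for example product_type and water_1line). A count of columns per class, followed by converting the character columns to factors, can be sketched as below. The `toy` data frame and its values are assumptions for illustration, and whether factors are the right encoding depends on the model used later.

```r
# Toy data frame standing in for `dataset` (illustrative values only)
toy <- data.frame(
  full_sq      = c(43L, 34L, 43L),
  product_type = c("Investment", "OwnerOccupier", "Investment"),
  water_1line  = c("no", "no", "yes"),
  stringsAsFactors = FALSE
)

# Count how many columns there are of each class
class_counts <- table(sapply(toy, class))
print(class_counts)

# Convert every character column to a factor
char_cols <- names(toy)[sapply(toy, is.character)]
toy[char_cols] <- lapply(toy[char_cols], factor)
print(sapply(toy, class))
```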
In [169]:
# summarize correlations between input variables
#cor(dataset[,3:11])
In [170]:
# split input and output
# name the features, then split into x and y by feature name and label name, or just by column numbers
x <- dataset[,2:4]
y <- dataset[,292]
# how to refer to specific columns by number:
# df[,c(1:3,6,9:12)]
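Selecting by column number breaks silently if the column order changes. The name-based split mentioned in the comment can be sketched as follows, on a toy base-R data frame. The `toy` frame, the `label` and `features` variables and the choice to drop `id` are assumptions for illustration.

```r
# Toy data frame standing in for `dataset` (illustrative values only)
toy <- data.frame(
  id        = 1:3,
  full_sq   = c(43, 34, 43),
  floor     = c(4, 3, 2),
  price_doc = c(5850000, 6000000, 5700000)
)

# Name of the target column and of the feature columns
label    <- "price_doc"
features <- setdiff(names(toy), c("id", label))

# Split into inputs (data frame) and output (plain vector)
x <- toy[, features, drop = FALSE]
y <- toy[[label]]

print(names(x))
print(y)
```

Because `fread` returns a data.table, the equivalent selection on the real `dataset` would be `dataset[, features, with = FALSE]`.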
In [171]:
#dataset[3:9] <- lapply(dataset[3:9], as.numeric)
In [172]:
# scatterplot matrix
pairs(dataset[,2:4])
In [173]:
# correlation plot
#correlations <- cor(dataset[,3:7])
#corrplot(correlations, method="circle")
In [174]:
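# keep full_sq (3), three transport features (103:105) and the target price_doc (292)
# alternative selection: dataset <- dataset[,c(292,3,6,7:8)]
dataset <- dataset[,c(3,103:105,292)]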
In [175]:
sapply(dataset, function(column) {
  round(mean(is.na(column)) * 100, 2)
})
- full_sq
- 0
- ID_railroad_station_avto
- 0
- public_transport_station_km
- 0
- public_transport_station_min_walk
- 0
- price_doc
- 0
In [176]:
head(dataset)
full_sq ID_railroad_station_avto public_transport_station_km public_transport_station_min_walk price_doc
43 1 0.27498514 3.2998217 5850000
34 2 0.06526334 0.7831601 6000000
43 3 0.32875604 3.9450725 5700000
77 113 0.07148032 0.8577639 16331452
25 7 0.05021105 0.6025326 5500000
44 1 0.25481389 3.0577667 2000000
In [177]:
# Evaluate Algorithms: Baseline
# Run algorithms using 10-fold cross-validation, repeated 3 times (30 resamples per model)
control <- trainControl(method="repeatedcv", number=10, repeats=3)
metric <- "RMSE"
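For reference, RMSE is the square root of the mean squared difference between observed and predicted prices, and 10 folds repeated 3 times give 30 resamples per model. A minimal base-R sketch follows. The `rmse` helper is hypothetical (caret computes this internally), and the predicted values are made up.

```r
# Hypothetical helper: root mean squared error
rmse <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2))
}

# Observed prices from the first rows of the data; predictions are invented
observed  <- c(5850000, 6000000, 5700000)
predicted <- c(5800000, 6100000, 5700000)

rmse_value <- rmse(observed, predicted)
print(rmse_value)

# Number of resamples produced by number = 10, repeats = 3
n_resamples <- 10 * 3
print(n_resamples)
```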
In [178]:
# lm
set.seed(7)
fit.lm <- train(price_doc~., data=dataset, method="lm", metric=metric, preProc=c("center", "scale"), trControl=control, na.action=na.omit)
# GLM
set.seed(7)
fit.glm <- train(price_doc~., data=dataset, method="glm", metric=metric, preProc=c("center", "scale"), trControl=control, na.action=na.omit)
# GLMNET
set.seed(7)
fit.glmnet <- train(price_doc~., data=dataset, method="glmnet", metric=metric, preProc=c("center", "scale"), trControl=control, na.action=na.omit)
# SVM
set.seed(7)
#fit.svm <- train(price_doc~., data=dataset, method="svmRadial", metric=metric, preProc=c("center", "scale"), trControl=control, na.action=na.omit)
# CART
set.seed(7)
grid <- expand.grid(.cp=c(0, 0.05, 0.1))
fit.cart <- train(price_doc~., data=dataset, method="rpart", metric=metric, tuneGrid=grid, preProc=c("center", "scale"), trControl=control, na.action=na.omit)
# kNN
set.seed(7)
fit.knn <- train(price_doc~., data=dataset, method="knn", metric=metric, preProc=c("center", "scale"), trControl=control, na.action=na.omit)
Warning message in predict.lm(modelFit, newdata):
"prediction from a rank-deficient fit may be misleading"
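The rank-deficiency warning above comes from predict.lm and usually means that some predictors are exact linear combinations of others. A minimal sketch of how such columns could be located with caret's findLinearCombos follows. The toy matrix and the names x1, x2, x3, toy, and toy_reduced are made up for illustration and are not part of the notebook's data.

```r
library(caret)

# Hypothetical toy design matrix: x3 is exactly x1 + x2, so the matrix is rank deficient
set.seed(7)
x1 <- rnorm(20)
x2 <- rnorm(20)
toy <- cbind(x1 = x1, x2 = x2, x3 = x1 + x2)

# findLinearCombos() reports groups of linearly dependent columns and
# suggests column indices to drop so the remaining matrix has full rank
combos <- findLinearCombos(toy)
print(combos$linearCombos)
print(combos$remove)

# Drop the suggested columns before fitting a linear model
toy_reduced <- toy[, -combos$remove, drop = FALSE]
print(qr(toy_reduced)$rank == ncol(toy_reduced))
```

On the real data the same call would need a numeric matrix with no missing values (for example, built with model.matrix on complete cases), which is an assumption about preprocessing that the notebook does not show here.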
In [179]:
# Compare algorithms
results <- resamples(list(LM=fit.lm
, GLM=fit.glm
, GLMNET=fit.glmnet
#, SVM=fit.svm
, CART=fit.cart
, KNN=fit.knn))
summary(results)
dotplot(results)
Call:
summary.resamples(object = results)

Models: LM, GLM, GLMNET, CART, KNN
Number of resamples: 30

RMSE
          Min. 1st Qu.  Median    Mean 3rd Qu.     Max. NA's
LM     3717000 4150000 4254000 5204000 4623000 13700000    0
GLM    3713000 4148000 4251000 5203000 4622000 13700000    0
GLMNET 3797000 4246000 4351000 5143000 4717000 12230000    0
CART   3031000 3201000 3279000 3309000 3385000  3692000    0
KNN    3016000 3113000 3217000 3254000 3363000  3579000    0

Rsquared
          Min. 1st Qu.  Median    Mean 3rd Qu.    Max. NA's
LM     0.01489  0.2616  0.2817  0.2621  0.3061  0.3586    0
GLM    0.01504  0.2590  0.2796  0.2618  0.3076  0.3540    0
GLMNET 0.01413  0.2938  0.3254  0.3024  0.3604  0.4209    0
CART   0.42380  0.4797  0.5328  0.5260  0.5710  0.6443    0
KNN    0.45800  0.5160  0.5371  0.5331  0.5572  0.5938    0
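The summary shows CART and KNN with lower RMSE than the linear models, but it does not say whether those gaps are larger than the resampling noise. One way to check, sketched here on the built-in mtcars data rather than the housing data, is caret's diff() method for resamples objects, which runs paired tests on the per-resample metrics. The toy models, the toy.* names, and the 5-fold, 2-repeat control are assumptions chosen only to keep the example small.

```r
library(caret)

data(mtcars)

# Small resampling scheme for illustration only (not the notebook's control object)
toy.control <- trainControl(method = "repeatedcv", number = 5, repeats = 2)

# Use the same seed before each fit so both models see identical resamples
set.seed(7)
toy.lm <- train(mpg ~ wt + hp, data = mtcars, method = "lm",
                trControl = toy.control)
set.seed(7)
toy.knn <- train(mpg ~ wt + hp, data = mtcars, method = "knn",
                 preProc = c("center", "scale"), trControl = toy.control)

# Collect the resampled metrics and test the pairwise differences
toy.results <- resamples(list(LM = toy.lm, KNN = toy.knn))
toy.diffs <- diff(toy.results)
print(summary(toy.diffs))
```

Applied to the notebook's own results object, diff(results) would give the same kind of pairwise table for LM, GLM, GLMNET, CART, and KNN, provided every model was trained on the same resamples.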
Content source: jsphyg/ml_practice_notebooks