http://myweatherproject.s3-website-us-east-1.amazonaws.com/
* Self-healing mechanism: if a problem is encountered with the parquet files, the tables are recreated from the raw data. This was used during development, but the problem hasn't been encountered in production.
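A minimal sketch of how such a fallback could look, in plain Python. The names `load_or_rebuild`, `read_table` and `rebuild_table` are hypothetical and are not taken from the project; the two callables stand in for "read the parquet table" and "recreate the table from raw data".

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_rebuild(read_table: Callable[[], T], rebuild_table: Callable[[], T]) -> T:
    """Return the stored table; if reading it fails, rebuild it from raw data.

    Both arguments are hypothetical callables supplied by the caller.
    """
    try:
        return read_table()
    except Exception as err:  # broad on purpose: any read failure triggers a rebuild
        print(f"read failed ({err!r}); rebuilding from raw data")
        return rebuild_table()
```

With Spark, `read_table` might wrap `spark.read.parquet(...)`. Spark reads lazily, so a corrupt file may only surface once an action runs, and the reader passed in would probably need to force one.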
Problem - a DataFrame can't explode several array columns in step with each other
Solution - switch to an RDD, zip the arrays, then flatMap
DataFrame:
City, Date, Time, [forecast date/times], [forecast temperatures], [forecast humidity], [ ]...
RDD:
Zip:
City, Date, Time, zip(forecast date/times, forecast temps, hum etc.)
City, Date, Time, [(dt, temp, hum, ...), (dt, temp, hum, ...), (dt, temp, hum, ...), ...]
Reshape:
[(city, date, time, dt, temp, hum, ...), (city, date, time, dt, temp, hum, ...), ...]
FlatMap:
(city, date, time, dt, temp, hum, ...)
Switch Back to DF
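The Zip / Reshape / FlatMap steps above can be sketched in plain Python, with no Spark required. The function name `explode_forecasts`, the six-column layout and the sample values are made up for illustration; the real rows carry more forecast columns.

```python
from itertools import chain


def explode_forecasts(row):
    """Turn one wide row into one narrow row per forecast."""
    city, date, time, dts, temps, hums = row
    # Zip: pair up the i-th element of each parallel list.
    forecasts = zip(dts, temps, hums)
    # Reshape: prefix every forecast tuple with the row's key columns.
    return [(city, date, time, dt, temp, hum) for dt, temp, hum in forecasts]


# Made-up sample rows in the wide (DataFrame-like) layout.
wide_rows = [
    ("Seattle", "2016-01-01", "06:00",
     ["2016-01-01 09:00", "2016-01-01 12:00"], [41, 45], [80, 75]),
    ("Austin", "2016-01-01", "06:00",
     ["2016-01-01 09:00"], [60], [55]),
]

# FlatMap: apply the function to every row and flatten the resulting lists.
narrow_rows = list(chain.from_iterable(explode_forecasts(r) for r in wide_rows))
```

In PySpark the same function could presumably be passed to `df.rdd.flatMap(...)`, followed by `.toDF([...])` to switch back to a DataFrame, assuming the columns arrive in the order the function unpacks them.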
Deduplicate - keep one row per city and date, taking the highest source:
select *
from (select *
           , row_number() over (partition by city, date order by source desc) as rk
      from cityDay2V)
where rk = 1').drop('rk')
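What the query does can be sketched in plain Python: for each (city, date) pair, only the row whose `source` sorts highest survives. The function name `keep_top_source`, the integer `source` values and the `temp` field are assumptions for illustration; the real meaning of `source` is not stated here.

```python
def keep_top_source(rows):
    """Keep one row per (city, date): the one with the highest 'source'.

    Mirrors row_number() over (partition by city, date order by source desc)
    followed by the rk = 1 filter. On ties the first row seen is kept, whereas
    SQL row_number() leaves the tie-break unspecified.
    """
    best = {}
    for row in rows:
        key = (row["city"], row["date"])
        if key not in best or row["source"] > best[key]["source"]:
            best[key] = row
    return list(best.values())


# Made-up sample rows.
rows = [
    {"city": "Seattle", "date": "2016-01-01", "source": 1, "temp": 41},
    {"city": "Seattle", "date": "2016-01-01", "source": 2, "temp": 43},
    {"city": "Austin", "date": "2016-01-01", "source": 1, "temp": 60},
]
deduped = keep_top_source(rows)
```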
How does my system have this property?
How does my system fall short and how could it be improved?