Author: Kat Chuang @katychuang on Twitter
The goal of this exercise is to parse JSON from the Facebook Graph API using the Aeson library.
I collected data from my most recent posts, together with JSON output previously saved by a Python version of this code, and stored it in a .json file.
First, import the modules we'll need for this exercise. After that, we define the record type that maps the JSON fields to Haskell fields.
In [1]:
{-# LANGUAGE DeriveGeneric #-}
import Data.Aeson
import Data.Aeson.Types
import Data.Time.Clock (UTCTime)
import qualified Data.ByteString.Lazy as B
import Options.Generic
Define the record type for reading JSON. The fields correspond to JSON fields.
In [2]:
data Post = Post
  { created_time :: UTCTime
  , id :: String
  , message :: Maybe String
  , story :: Maybe String
  } deriving (Show, Generic)
instance FromJSON Post
instance ToJSON Post
input <- B.readFile "_+posts2017.json"
We now have the input with the data read in from the JSON file and can proceed with parsing the values.
1. eitherDecode
Here's one way to parse the JSON: eitherDecode returns an Either, so a failed parse is captured instead of lost. The Left side carries the error message, whereas the Right side contains the parsed value.
The types we're working with:
putStrLn :: String -> IO ()
print :: Show a => a -> IO ()
mapM_ :: (Monad m, Foldable t) => (a -> m b) -> t a -> m ()
Data.Text.take :: Int -> Text -> Text
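The snippet below prints the first timestamp. Note that both branches are IO actions returning (), so the variable recent is bound to () rather than to the timestamp itself.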
In [3]:
let myJSON = eitherDecode input :: Either String [Post]
recent <- case myJSON of
  Left err -> putStrLn err
  Right value -> mapM_ print $ take 1 [ created_time x | x <- value ]
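To make the Left/Right behaviour concrete, here is a minimal sketch that decodes two inline JSON strings instead of the file. The sample payloads and the helper name describe are made up for illustration; the record mirrors the Post type above.

```haskell
{-# LANGUAGE DeriveGeneric #-}
import Data.Aeson (FromJSON, eitherDecode)
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Time.Clock (UTCTime)
import GHC.Generics (Generic)

-- Same shape as the notebook's Post record.
data Post = Post
  { created_time :: UTCTime
  , id           :: String
  , message      :: Maybe String
  , story        :: Maybe String
  } deriving (Show, Generic)

instance FromJSON Post

-- Hypothetical sample payloads (not taken from the real data file).
goodInput, badInput :: BL.ByteString
goodInput = BL.pack "[{\"created_time\":\"2017-03-30T14:05:00Z\",\"id\":\"1_2\",\"message\":\"hello\"}]"
badInput  = BL.pack "[{\"id\":\"1_2\"}]"  -- required created_time is missing

-- Summarise the outcome of decoding without throwing on failure.
describe :: BL.ByteString -> String
describe bytes =
  case eitherDecode bytes :: Either String [Post] of
    Left err    -> "parse failed: " ++ err
    Right posts -> "parsed " ++ show (length posts) ++ " post(s)"
```

Calling describe goodInput reports one parsed post, while describe badInput returns the parser's error message because the non-optional created_time field is absent. The optional story field can be left out without causing a failure.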
2. decode
We can be more concise because the JSON formatting is predictable: I saved the data myself in the Python version of this project (part 1). So instead we can use decode and bind the result to a variable. It parses the list of JSON dictionaries into a list of Haskell records. Note that the pattern (Just allData) fails at runtime if decode returns Nothing.
In [4]:
let (Just allData) = decode input :: Maybe [Post]
Now that we have allData with the type [Post], which is a list of Post elements, we can easily display the information in a number of ways.
To demonstrate what the data structure looks like, let's print out the created_time field from the first item in the list.
In [5]:
-- show most recent timestamp
print $ created_time $ head allData
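Both the (Just allData) pattern and head are partial: they crash on Nothing or on an empty list. As a hedged alternative, the sketch below chains the two steps in Maybe. The one-field Post here is a simplified stand-in for the notebook's record, and firstTimestamp is a hypothetical helper name.

```haskell
import Data.Maybe (listToMaybe)

-- Simplified stand-in for the notebook's Post (timestamp kept as a String).
newtype Post = Post { created_time :: String } deriving (Show, Eq)

-- Nothing if decoding failed or the list is empty;
-- otherwise the timestamp of the first post.
firstTimestamp :: Maybe [Post] -> Maybe String
firstTimestamp decoded = created_time <$> (decoded >>= listToMaybe)
```

With the real data this would be applied to the result of decode input directly, without the let (Just allData) pattern.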
Now we are in a position to worry about how Haskell interprets timestamps.
To get an idea of activity per day, we need the timestamps. We can map over allData to get a list of timestamps. Follow the structure below to collect all the timestamps in a list; let's try both the map and the list comprehension approaches.
1. Using map
In [6]:
-- show all the timestamps
print $ show (length allData) ++ " items in the file. Showing them all with map "
print (map created_time allData) --same as: print $ map (\post -> created_time post) allData
2. Using List Comprehension
We can import the utctDay function from Data.Time.Clock to extract the calendar day from each timestamp.
In [7]:
import Data.Time.Clock (utctDay)
:t utctDay
In [8]:
print $ show (length allData) ++ " items in the file. Showing them with utctDay and list comprehension "
print [ utctDay (created_time x) | x <- allData ]
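The two approaches give the same result. The sketch below shows that on a small list of hand-built timestamps; sampleTimes is made-up data, and the functions work on plain UTCTime values rather than on Post records to keep the example short.

```haskell
import Data.Time.Calendar (Day, fromGregorian)
import Data.Time.Clock (UTCTime (..), secondsToDiffTime)

-- Made-up timestamps: a calendar day plus seconds since midnight.
sampleTimes :: [UTCTime]
sampleTimes =
  [ UTCTime (fromGregorian 2017 3 30) (secondsToDiffTime 3600)
  , UTCTime (fromGregorian 2017 3 28) (secondsToDiffTime 7200)
  ]

-- Extract the calendar day with map ...
daysWithMap :: [UTCTime] -> [Day]
daysWithMap = map utctDay

-- ... and with a list comprehension.
daysWithComprehension :: [UTCTime] -> [Day]
daysWithComprehension ts = [ utctDay t | t <- ts ]
```

Which one to use is mostly a matter of taste. The comprehension becomes handier once a filter condition is added, as in the March filter later on.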
If we are interested in relative time, we can also look at the time difference from a reference date (here, the end of March 2017).
In [9]:
import Data.Time.Clock (diffUTCTime, UTCTime)
:t diffUTCTime
import Data.Time.Calendar (fromGregorian)
let marchEnd = fromGregorian 2017 03 31
:t marchEnd
diffDays
In [10]:
import Data.Time.Calendar (diffDays)
print [ diffDays marchEnd (utctDay $ created_time x) | x <- allData ]
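The argument order of diffDays is easy to get backwards: diffDays a b computes a minus b in whole days. A small sketch, where daysBeforeMarchEnd is a hypothetical helper name:

```haskell
import Data.Time.Calendar (Day, diffDays, fromGregorian)

marchEnd :: Day
marchEnd = fromGregorian 2017 3 31

-- Positive for days before marchEnd, negative for days after it.
daysBeforeMarchEnd :: Day -> Integer
daysBeforeMarchEnd day = diffDays marchEnd day
```

For example, daysBeforeMarchEnd (fromGregorian 2017 3 1) is 30, and a date in April gives a negative number.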
In [11]:
daysAgo x = timestamp ++ show (delta x) ++ " days ago"
  where
    timestamp = show (created_time x) ++ " = "
    postDate p = utctDay $ created_time p
    delta x = diffDays marchEnd (postDate x)
mapM_ print $ take 5 [ daysAgo x | x <- allData ]
NominalDiffTime
In [12]:
let marchEndUTC = read "2017-03-31 23:59:59 UTC" :: UTCTime
print [ marchEndUTC `diffUTCTime` created_time x | x <- allData ]
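diffUTCTime returns a NominalDiffTime, which is measured in seconds. If fractional days are easier to read, one possible conversion is sketched below; elapsedDays is a hypothetical helper, and the reference time is built with the UTCTime constructor instead of read.

```haskell
import Data.Time.Calendar (fromGregorian)
import Data.Time.Clock (UTCTime (..), diffUTCTime, secondsToDiffTime)

-- 2017-03-31 23:59:59 UTC (86399 seconds after midnight).
marchEndUTC :: UTCTime
marchEndUTC = UTCTime (fromGregorian 2017 3 31) (secondsToDiffTime 86399)

-- Seconds between the reference time and t, converted to days.
elapsedDays :: UTCTime -> Double
elapsedDays t = realToFrac (marchEndUTC `diffUTCTime` t) / 86400
```

A timestamp exactly 24 hours before marchEndUTC yields 1.0.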
In [13]:
-- Practice filtering
let m = [ created_time x | x <- allData, (>=) (utctDay $ created_time x) (fromGregorian 2017 3 1) ]
In [14]:
print $ show (length m) ++ " items posted in March"
print m
With m we're ready to replicate the process from the Python version: building a nested-list data structure, which we'll need later for the heatmap visualization. This may not be the optimal way to convert a list of timestamps into a nested list, but I'm new to Haskell, so for now I'm only worrying about getting it done.
Let's be fancy and see how to make timestamps more readable as days of the week.
We have a list, and from each timestamp we can produce both the week and the day of the week. The day of the week is the x of an (x,y) matrix; the week is the y.
In [15]:
import Data.Time.Calendar.WeekDate
import Data.Time.Format
-- print week numbers of each
print [ formatTime defaultTimeLocale "%U" x | x <- m]
-- print the day of the week
print [ formatTime defaultTimeLocale "%a" x | x <- m]
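The format codes are worth checking on a known date. In the time library, "%U" is the week of the year with weeks starting on Sunday, "%u" is the day of the week from 1 (Monday) to 7 (Sunday), and "%a" is the abbreviated day name. The sketch below wraps the first two in small helpers; the names weekOfYear and isoWeekday are hypothetical.

```haskell
import Data.Time.Calendar (Day, fromGregorian)
import Data.Time.Format (defaultTimeLocale, formatTime)

-- Week of the year, weeks starting on Sunday ("%U").
weekOfYear :: Day -> Int
weekOfYear = read . formatTime defaultTimeLocale "%U"

-- Day of the week, 1 = Monday .. 7 = Sunday ("%u").
isoWeekday :: Day -> Int
isoWeekday = read . formatTime defaultTimeLocale "%u"

-- Abbreviated day name ("%a").
dayName :: Day -> String
dayName = formatTime defaultTimeLocale "%a"
```

For Wednesday, 1 March 2017 these give week 9 and day 3. One thing to watch: a Sunday starts a new "%U" week but is day 7 under "%u", so Sunday, 5 March 2017 is week 10, day 7.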
Turning an element into a tuple/list
The next step is to turn each timestamp into a pair of list indices, so that building the nested list becomes more seamless.
In [16]:
march = [ created_time x | x <- allData, (>=) (utctDay $ created_time x) (fromGregorian 2017 3 1) ]
getXY u = [(formatTime defaultTimeLocale "%U" u), formatTime defaultTimeLocale "%u" u]
print [ getXY x | x <- march]
getX r = i - 9 where i = read $ (formatTime defaultTimeLocale "%U" r) :: Int -- week index: week 9 of the year becomes 0
getY r = read $ (formatTime defaultTimeLocale "%u" r) :: Int -- day of the week, 1 (Monday) to 7 (Sunday)
getXY' u = [getX u, getY u]
coords = [ getXY x | x <- march]
print coords
In [17]:
-- Create a nested list 'month': 5 weeks of 7 days, every day initialised to 1
week = take 7 (repeat 1)
month = take 5 (repeat week)
------------------------------------------------------------
month
What can we reason about this data structure so far? Here is a hint:
-- Accessing items by index
getWeek m j = m !! j -- returns a list
getDay m w d = m !! w !! d -- returns a number value
When a day is marked, we want to change the value for that day of the week. The function mark does this for a single week.
In [18]:
------------------------------------------------------------
-- Takes a list and an index, edits element at index
mark w d = frt ++ (e: tail end)
  where
    s = d - 1 -- getting the index to work with. index starts at 0. days of week starts at 1.
    splice = splitAt s w
    frt = fst splice -- before the indexed element
    end = snd splice -- after the indexed element
    norm = 1 / 21 -- hardwire the max number for now
    e = head end - norm -- new element to replace old one
-- show example of using mark
mark week 4
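The same splitAt idea can be written as a general "update the element at an index" helper that does not crash when the index is out of range. This is only a sketch: updateAt and markDay are hypothetical names, and the step size is passed in rather than hardwired.

```haskell
-- Apply f to the element at 0-based index i.
-- The list is returned unchanged if i is out of range.
updateAt :: Int -> (a -> a) -> [a] -> [a]
updateAt i f xs =
  case splitAt i xs of
    (before, x : after) | i >= 0 -> before ++ f x : after
    _                            -> xs

-- Lower the value of day d (1-based, as in mark) by a fixed step.
markDay :: Double -> Int -> [Double] -> [Double]
markDay step d = updateAt (d - 1) (subtract step)
```

markDay (1 / 21) 4 week would behave like mark week 4, while an index such as 9 leaves the week untouched instead of failing on head or tail.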
In [19]:
-- try a combination of using pair on mark
-- how can I return the month?
myfunction1 (w,d) = mark thisWeek dayIndex
  where
    thisWeek = month !! weekIndex
    weekIndex = (read w :: Int ) - 9
    dayIndex = read d :: Int
[ myfunction1 (w,d) | [w,d] <- coords]
That sort of works, but we don't want a resulting list of 21 lists. Instead, we want to group the output so that we are left with 5 weeks.
Let's try another way: create a key that lets us group the lists by week (1 through 5), and then reduce each group to a single list.
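As an aside on the "how can I return the month?" question, one possible answer is to fold over the coordinates and thread the whole grid through each step. The sketch below is an alternative to the grouping approach that follows, not part of it; Grid, adjust, markCell and markAll are hypothetical names, and indices are 0-based here.

```haskell
import Data.List (foldl')

type Grid = [[Double]]

-- Apply f at 0-based index i; out-of-range indices change nothing.
adjust :: Int -> (a -> a) -> [a] -> [a]
adjust i f xs = [ if j == i then f x else x | (j, x) <- zip [0 ..] xs ]

-- Mark one (week, day) cell by subtracting step from it.
markCell :: Double -> Grid -> (Int, Int) -> Grid
markCell step grid (w, d) = adjust w (adjust d (subtract step)) grid

-- Mark every coordinate in turn, returning the whole updated grid.
markAll :: Double -> Grid -> [(Int, Int)] -> Grid
markAll step = foldl' (markCell step)
```

Because each step returns the complete grid, the result always has as many weeks as the starting grid, no matter how many posts there are.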
2. Reduction
In [20]:
-- Working with lists
-- https://wiki.haskell.org/How_to_work_on_lists
mm i = xs ++ [norm] ++ ys
  where
    xs = replicate (i-1) 0 -- if i = 2; x = 1, y = 5
    ys = replicate (7-i) 0 -- 7 days per week, so the list always has 7 elements
    norm = 1 / 21 -- hardwire the max number for now
    e = 0.1 -- new element to replace old one (currently unused)
-- get a list of key + week
getMM r = (i, mm y)
  where
    y = read (formatTime defaultTimeLocale "%u" r) :: Int
    i = (read (formatTime defaultTimeLocale "%U" r) :: Int) - 9
mapM_ print [ getMM x | x <- march]
Now we have the first element of the tuple that acts as a key from which we can sort and group!
Below is a function I found on Stack Overflow ("How to group similar items in a list using Haskell?") that does the grouping. Let's borrow it and see what happens.
In [21]:
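import Data.Map (toList, fromListWith) -- import only what we need; Data.Map also exports names such as map and filter that clash with Prelude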
sortAndGroup assocs = toList $ fromListWith (++) [(k, [v]) | (k, v) <- assocs]
Oh yeah, we still have to apply sortAndGroup to the March posts.
In [22]:
sample = sortAndGroup [ getMM x | x <- march]
mapM_ print sample
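To see what sortAndGroup does on its own, here is a small sketch with made-up key/value pairs. It uses a qualified import of Data.Map, which is an assumption about style rather than what the cell above does.

```haskell
import qualified Data.Map as Map

-- Group values by key; keys come back in ascending order.
sortAndGroup :: Ord k => [(k, v)] -> [(k, [v])]
sortAndGroup assocs =
  Map.toList (Map.fromListWith (++) [ (k, [v]) | (k, v) <- assocs ])
```

For example, sortAndGroup [(2,'b'),(1,'a'),(1,'c')] gives [(1,"ca"),(2,"b")]. The values within a key come out in reverse input order, because fromListWith puts each new value in front of the ones already collected. That does not matter here, since the lists are summed in the next step.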
Getting closer: we have three groups, one for each of the 3 active weeks in March. Let's combine the lists within each week to get a single list of active days per week. To do this, I found another example on Stack Overflow ("Sum a list of lists?").
It takes a list of lists and transposes the columns to rows so that you can apply the sum function to each one.
In [23]:
--
import Data.List
week = replicate 7 1
reduction input = Prelude.map sum . transpose $ (input)
-- try the first one:
reduction $ snd (head sample)
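A tiny worked example of the transpose-then-sum trick, using made-up rows. sumColumns is a hypothetical name for the same idea as reduction.

```haskell
import Data.List (transpose)

-- Sum a list of rows column by column.
sumColumns :: Num a => [[a]] -> [a]
sumColumns = map sum . transpose
```

sumColumns [[1,0,0],[0,1,0],[1,0,0]] is [2,1,0]. Note that transpose tolerates rows of different lengths without complaint: sumColumns [[1,2],[3]] is [4,2]. So if the rows are meant to be 7 days long, it is worth checking their lengths beforehand.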
In [24]:
-- http://stackoverflow.com/questions/23660295/how-to-map-a-function-on-the-elements-of-a-nested-list
tryAgain = [ [ 1- e | e <- reduction $ snd weekly] | weekly <- sample]
mapM_ print tryAgain
At this point, I don't know if the values are correct but at least the data type is correct for drawing the chart. We'll come back later to fix this.
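One way to cross-check the values later might be to build the grid directly from (week, day) counts and compare. The sketch below is an assumption-laden alternative, not the method used above: heatGrid and countCells are hypothetical names, indices are 0-based, and the step is taken as 1 divided by the number of posts, in place of the hardwired 1 / 21.

```haskell
import qualified Data.Map as Map

-- Count posts per (week, day) cell.
countCells :: [(Int, Int)] -> Map.Map (Int, Int) Int
countCells cells = Map.fromListWith (+) [ (c, 1) | c <- cells ]

-- Build a weeks-by-7 grid: 1 means no posts, lower values mean more posts.
-- Assumption: each post lowers its cell by 1 / (total number of posts).
heatGrid :: Int -> [(Int, Int)] -> [[Double]]
heatGrid weeks cells =
  [ [ 1 - fromIntegral (Map.findWithDefault 0 (w, d) counts) * step
    | d <- [0 .. 6] ]
  | w <- [0 .. weeks - 1] ]
  where
    counts = countCells cells
    step   = if null cells then 0 else 1 / fromIntegral (length cells)
```

Unlike the grouped result, this always produces one row per week, including weeks without any posts.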
In [25]:
import Plots.Axis (Axis, r2Axis)
import Plots.Axis.ColourBar (colourBar)
import Plots.Axis.Render (renderAxis)
import Plots.Axis.Scale (axisExtend, noExtend)
import Plots.Style (axisColourMap, magma, greys)
import Plots.Types (display)
import Plots.Types.HeatMap
import Control.Lens ((&~), (.=))
import Diagrams.Backend.Cairo (B, Cairo)
import Diagrams.TwoD.Types (V2)
import Diagrams.Core.Types (QDiagram)
import IHaskell.Display.Diagrams
import IHaskell.Display.Juicypixels hiding (display)
In [26]:
heatMapAxis' :: Axis Cairo V2 Double
heatMapAxis' = r2Axis &~ do
  display colourBar
  axisExtend Control.Lens..= noExtend
  axisColourMap Control.Lens..= greys
  --let result = [[1,1,3], [3,2,1],[1,2,2]]
  heatMap' $ reverse tryAgain
heatMapExample' = renderAxis heatMapAxis'
diagram heatMapExample'