Classifier for uncorrected XML

some common patterns:

Each meeting entry starts its first sentence with a run of CAPITAL LETTERS.

The capitals are often followed by ".-", e.g. "CAPITAL LETTERS.- the meeting..."

A day of the week - usually appears somewhere in the paragraph.

The phrase 'in the evening' - often appears in the paragraph.
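
The paragraph-level patterns above could be turned into simple boolean features. This is only a rough sketch: the function name meeting_features, the feature names and the exact regular expressions are assumptions made for illustration, and real OCR output will need looser matching.

```python
import re

# Run of capitals at the very start of the paragraph, optionally followed
# by the ".-" separator described in the notes above.
LEADING_CAPS = re.compile(r"^\s*([A-Z][A-Z' ]*[A-Z])\s*(\.\s*-)?")
# Any day of the week, in any letter case.
WEEKDAY = re.compile(r"\b(mon|tues|wednes|thurs|fri|satur|sun)day\b", re.IGNORECASE)
# The literal phrase 'in the evening'.
EVENING = re.compile(r"\bin the evening\b", re.IGNORECASE)


def meeting_features(paragraph):
    """Return a dict of boolean pattern features for one paragraph (hypothetical helper)."""
    caps = LEADING_CAPS.match(paragraph)
    return {
        "leading_caps": caps is not None,
        "caps_then_dot_dash": bool(caps and caps.group(2)),
        "weekday": WEEKDAY.search(paragraph) is not None,
        "in_the_evening": EVENING.search(paragraph) is not None,
    }
```

None of these features is reliable on its own (the notes only say "usually" and "often"), so they are probably best fed into a classifier or combined into a score rather than used as hard rules.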

The start of the column is always headed 'forthcoming Chartist meetings', but the XML for this heading is often mangled - e.g.

fortDoorainvi gtart(Ot -;Plcc.t(tTO
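
Because the heading is rarely intact, an exact string match will miss it. One possible approach is fuzzy matching against the expected text, sketched here with difflib from the Python standard library. The helper names and the 0.6 threshold are assumptions and would need tuning against real pages.

```python
from difflib import SequenceMatcher

# Expected heading, lower-cased for comparison.
HEADER = "forthcoming chartist meetings"


def header_similarity(line):
    """Return a similarity between 0 and 1 for an OCR line against the expected heading."""
    return SequenceMatcher(None, line.lower().strip(), HEADER).ratio()


def looks_like_header(line, threshold=0.6):
    # The threshold is a guess, not a measured value.
    return header_similarity(line) >= threshold
```

A lightly damaged heading such as "Forthcoming Chartist Meetlngs" passes easily, but a line as badly mangled as the example above may score well below any sensible threshold, so position in the column may be needed as a fallback signal.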