Pickle data


In [48]:
import gzip
import pickle
import json
from tqdm.notebook import tqdm  # tqdm_notebook is deprecated in recent tqdm releases

In [49]:
gz_file = '/Users/hernan/Downloads/signalmedia-1m.jsonl.gz'
# gz_file = '/Users/hernan/Downloads/smedia-100.json.gz'

output_file = '/Users/hernan/Downloads/signalmedia/signalmedia.pkl.gz'

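# data[0] collects the article titles, data[1] the article bodies
data = ([], [])
with gzip.open(gz_file, 'rb') as fi:
    # total is only an approximate line count used to size the progress bar
    for line in tqdm(fi, total=1000010):
        article = json.loads(line)  # one JSON object per line (JSON Lines)
        data[0].append(article["title"])
        data[1].append(article["content"])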




In [50]:
print("writing...")
with gzip.open(output_file, 'wb') as fo:
    pickle.dump(data, fo)

print('finished.')


writing...
finished.
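
The two cells above can be folded into one reusable step and checked on a tiny sample before running on the full corpus. The sketch below is only an illustration: the helper name `jsonl_gz_to_pickle`, the two sample records, and the temporary paths are hypothetical and not part of the original notebook. It assumes each input line is a JSON object with `title` and `content` keys, as in the cells above.

```python
import gzip
import json
import os
import pickle
import tempfile


def jsonl_gz_to_pickle(src_path, dst_path):
    """Read a gzipped JSON Lines file and pickle (titles, contents) to a gzip file.

    Hypothetical helper: mirrors the notebook cells above, without the progress bar.
    """
    titles, contents = [], []
    # 'rt' decodes the gzip stream to text so json.loads receives str lines
    with gzip.open(src_path, 'rt', encoding='utf-8') as fi:
        for line in fi:
            article = json.loads(line)
            titles.append(article["title"])
            contents.append(article["content"])
    with gzip.open(dst_path, 'wb') as fo:
        pickle.dump((titles, contents), fo)
    return len(titles)


# Build a two-record sample file (made-up data, for illustration only)
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'sample.jsonl.gz')
dst = os.path.join(tmp_dir, 'sample.pkl.gz')
sample = [
    {"title": "t1", "content": "c1"},
    {"title": "t2", "content": "c2"},
]
with gzip.open(src, 'wt', encoding='utf-8') as f:
    for rec in sample:
        f.write(json.dumps(rec) + "\n")

n_written = jsonl_gz_to_pickle(src, dst)

# Read the pickle back the same way the next cell does
with gzip.open(dst, 'rb') as pi:
    heads, desc = pickle.load(pi)
```

Running the round trip on a small file first makes it cheap to confirm that the titles and contents stay aligned before committing to the multi-minute run on the full dataset.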

In [54]:
%%time
# del data

print('reading again...')

with gzip.open(output_file, 'rb') as pi:
    heads, desc = pickle.load(pi)

print(len(heads))
print(len(desc))

print(heads[-1])
print(desc[-1])


reading again...
1000000
1000000
Green groups urge Malcolm Turnbull to drop Tony Abbott's tax 'vendetta'
Environment groups urge new Prime Minister Malcolm Turnbull to abandon any plans to change the tax status of green charities. 

Environment groups are urging Prime Minister Malcolm Turnbull to abandon any plans to change the tax status of green charities.

A demonstration is expected outside the Victorian Parliament on Monday to coincide with hearings in Melbourne of a federal inquiry into the administration and transparency of environment groups.

Green groups see the the inquiry, set up by the Abbott government in March, as a "vendetta" and fear changes that will remove the tax deductibility for donations to organisations pushing for environmental protection.

Tony Abbott was particularly scathing of legal wrangling by environment groups to delay a proposal for a massive expansion of coal exports through the Great Barrier Reef.

Mark Wakeham​ from Environment Victoria said about 1000 demonstrators were expected to protest over the inquiry.

"It does appear to be an attack on environment groups," Mr Wakeham said. He accused the Abbott government of attempting to silence critics.

Environmental groups had been singled out ahead of other charities, he said.

"We'll be highlighting we've got a legitimate role to play in a democracy. That might be inconvenient for governments at times, but only for governments that don't have credible environmental policies."

But the inquiry has also heard submissions from the Minerals Council of Australia, stating some environmental groups have exploited their tax deductible status to pursue "ideological campaigns" and encourage illegal behaviour, such as blockades.

The Queensland Resources Council said many environmental groups were not operating within the rules of a charity or pursuing "practical" environmental work.

The Victorian government urged the inquiry to "take into account the various ways in which environmental organisations fulfil their goal of improving the natural environment".

Mr Wakeham said the change of prime minister was a chance to press a "reset button"

Liberal senator Arthur Sinodinos​, a key driver in Malcolm Turnbull's toppling of Mr Abbott last week, appeared on Sunday to flag a more conciliatory approach in the politics of the environment.

"I think you'll see that there'll be a bit of an end to the idea that the environment and development have to be at loggerheads, that somehow it's a zero sum game. It's not," Senator Sinodinos told ABC TV.

"Good environmental policies can also be good economic policies and good economic policies give you a capacity to deal with environmental issues."

The inquiry into the Register of Environmental Organisations has received almost 700 submissions. The story first appeared on The Sydney Morning Herald.
CPU times: user 1min 14s, sys: 50.1 s, total: 2min 4s
Wall time: 2min 57s
Parser   : 6.86 s
