Ministry of Education News

Script to parse a local copy of the news pages from education.govt.nz and collect each article's intro paragraph.



In [75]:

    
import bs4
import requests
import os
import mammoth



In [27]:

    
opnewsfo = '/media/removable/lemonyellow/educ/www.education.govt.nz/news'



In [28]:

    
osliz = os.listdir(opnewsfo)



In [29]:

    
osliz









    Out[29]:





['index.html',
 'early-learning-funding-reminders-july-2015',
 'changes-to-the-ministry-of-education-websites-and-email-addresses',
 'managing-child-illness-in-ece-services-and-kohanga-reo',
 'early-learning-taskforce-news-july-2015',
 'over-1800-schools-on-the-managed-network',
 'safety-checking-childrens-workers-in-force',
 'record-highs-for-education-participation-and-achievement',
 'teachers-keen-on-innovation-funding',
 'making-teaching-and-learning-easier',
 'early-learning-regional-news-july-2015',
 'new-zealand-educators-among-worlds-best',
 'buildings-at-bay-of-islands-college',
 'prime-ministers-education-excellence-awards-finalists-announced']



In [30]:

    
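osliz.remove('index.html')  # remove() edits osliz in place and returns None
osrem = osliz  # alias the trimmed list; assigning remove()'s result would give None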



In [31]:

    
osrem
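
list.remove() changes the list in place and returns None, which is why a name bound to its return value displays nothing. A minimal sketch of that behaviour, with a non-mutating alternative; the file names here are invented for illustration:

```python
names = ["index.html", "article-one", "article-two"]

# remove() mutates `names` and hands back None.
result = names.remove("index.html")
print(result)
print(names)

# Non-mutating alternative: build a filtered copy and leave the original alone.
listing = ["index.html", "article-one", "article-two"]
articles = [name for name in listing if name != "index.html"]
print(articles)
```

The filtered copy is the safer choice when the full listing is still needed later in the notebook.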



In [73]:

    
firlpar = list()



In [95]:

    
for repoz in osliz:
    # each entry in osliz is one article directory under the news folder
    artdir = os.path.join(opnewsfo, repoz)
    for ind in os.listdir(artdir):
        # the with block closes each file once it has been parsed
        with open(os.path.join(artdir, ind), 'r') as opso:
            # name the parser explicitly so results do not depend on which parsers are installed
            souprep = bs4.BeautifulSoup(opso, 'html.parser')

        # keep only the <p class="intro"> paragraphs rather than every <p>
        for link in souprep.find_all('p', class_="intro"):
            print(link)
            firlpar.append(link.text)









    



<p class="intro">
        1815 schools are now connected to the government-funded internet Managed Network.  
    </p>
<p class="intro">
        The Ministry of Education is working with our partners to implement new Vulnerable Children Act safety checking regulations, in force from 1 July 2015.
    </p>
<p class="intro">
        The latest Better Public Service (BPS) target results show that child participation in quality early childhood education (ECE) and NCEA level 2 achievement rates are at record highs.
    </p>
<p class="intro">
Teacher enthusiasm for the Teacher-led Innovation Fund is huge. 40 projects, worth $2.7 million, involving 78 schools, have been funded from the first application round.
</p>
<p class="intro">
        Modern, multi-purpose and transportable classrooms will soon be in use in many New Zealand schools.
    </p>
<p class="intro">
        The experience, skill and dedication of New Zealand teachers have been highlighted in new international data.  
    </p>
<p class="intro">
        There has been some reporting about the condition of some of the buildings at Bay of Islands College. The full facts haven’t been reported, so it’s important to put them on the table. 
    </p>
<p class="intro">
        Finalists in the 2015 Prime Minister’s Education Excellence Awards have been announced, and judges are currently visiting them, to determine the eventual winners.
    </p>
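
The loop above can also be sketched with the standard library only, which may help where bs4 is not installed. This is an illustrative Python 3 alternative, not the notebook's method: it swaps BeautifulSoup for html.parser.HTMLParser, and IntroParser, collect_intros and demo are hypothetical names. It assumes the same layout as the news folder, one directory per article, with top-level files such as index.html skipped.

```python
import os
import tempfile
from html.parser import HTMLParser


class IntroParser(HTMLParser):
    """Collect the stripped text of every <p class="intro"> element."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_intro = False
        self.chunks = []
        self.intros = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "intro" in classes:
            self.in_intro = True
            self.chunks = []

    def handle_data(self, data):
        if self.in_intro:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self.in_intro:
            self.intros.append("".join(self.chunks).strip())
            self.in_intro = False


def collect_intros(news_dir):
    """Return the intro text from every file in every article directory."""
    intros = []
    for name in sorted(os.listdir(news_dir)):
        article_dir = os.path.join(news_dir, name)
        if not os.path.isdir(article_dir):
            continue  # skips top-level files such as index.html
        for filename in sorted(os.listdir(article_dir)):
            parser = IntroParser()
            with open(os.path.join(article_dir, filename), encoding="utf-8") as fh:
                parser.feed(fh.read())
            parser.close()
            intros.extend(parser.intros)
    return intros


def demo():
    """Build a throwaway news folder with one article and parse it."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "index.html"), "w", encoding="utf-8") as fh:
            fh.write('<p class="intro">ignored</p>')
        article = os.path.join(root, "sample-article")
        os.mkdir(article)
        with open(os.path.join(article, "index.html"), "w", encoding="utf-8") as fh:
            fh.write('<p>Body text.</p><p class="intro">\n  Sample intro.  \n</p>')
        return collect_intros(root)


print(demo())
```

Skipping anything that is not a directory avoids having to remove index.html from the listing by hand.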



In [94]:

    
link.unwrap









    Out[94]:





<bound method Tag.unwrap of <p class="intro">
        Finalists in the 2015 Prime Minister’s Education Excellence Awards have been announced, and judges are currently visiting them, to determine the eventual winners.
    </p>>



In [ ]:

    
#a = BeautifulSoup.BeautifulSoup("<html><body><script>aaa</script></body></html>")
#[x.extract() for x in a.findAll('"intro"')]



In [81]:

    
for firl in firlpar:
    # dropping pairs of spaces removes the HTML indentation but keeps the newlines
    print(firl.replace('  ', ''))









    



1815 schools are now connected to the government-funded internet Managed Network.


The Ministry of Education is working with our partners to implement new Vulnerable Children Act safety checking regulations, in force from 1 July 2015.


The latest Better Public Service (BPS) target results show that child participation in quality early childhood education (ECE) and NCEA level 2 achievement rates are at record highs.


Teacher enthusiasm for the Teacher-led Innovation Fund is huge. 40 projects, worth $2.7 million, involving 78 schools, have been funded from the first application round.


Modern, multi-purpose and transportable classrooms will soon be in use in many New Zealand schools.


The experience, skill and dedication of New Zealand teachers have been highlighted in new international data.


There has been some reporting about the condition of some of the buildings at Bay of Islands College. The full facts haven’t been reported, so it’s important to put them on the table. 


Finalists in the 2015 Prime Minister’s Education Excellence Awards have been announced, and judges are currently visiting them, to determine the eventual winners.
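
Replacing pairs of spaces removes the indentation but leaves the surrounding newlines, and a run with an odd number of spaces keeps one space. A small sketch of a stricter clean-up that collapses every run of whitespace; collapse_whitespace is a hypothetical helper and the sample string is invented:

```python
def collapse_whitespace(text):
    """Collapse every run of whitespace to one space and trim both ends."""
    return " ".join(text.split())


# Sample shaped like the .text of one intro paragraph (invented for illustration).
raw = "\n        A sample intro   sentence.  \n    "

# Approach used in the notebook: newlines survive.
print(repr(raw.replace("  ", "")))

# Stricter approach: a single clean line.
print(repr(collapse_whitespace(raw)))
```

The stricter form suits cases where the intros are written to a file or compared as strings.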



In [41]:

    
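with open("/home/wcmckee/Downloads/test.docx", "rb") as docx_file:  # .docx is a zip archive, so read bytes
    result = mammoth.extract_raw_text(docx_file)
    text = result.value  # the raw text
    messages = result.messages  # any messages
    docx_file.seek(0)  # rewind before converting a second time
    # assumption: the `html` parsed in the next cells came from mammoth.convert_to_html
    html = mammoth.convert_to_html(docx_file).value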



In [42]:

    
import bs4



In [43]:

    
soudocx = bs4.BeautifulSoup(html, 'html.parser')



In [44]:

    
soupnop = soudocx.find_all('p')[1:]  # every paragraph except the first



In [21]:

    
for sonp in soupnop:
    print(sonp.text)









    



Senior Data Analyst – Data Quality (ELI)
Evidence Data and Knowledge
Develop, implement and run processes for checking and correcting early childhood service information in the Early Learning Information system (ELI). Improve data quality processes to ensure they reflect changes to Ministry policy and data needs.
Reports to Manager: Data Collection Unit
Staff: no staff
What  
Our Purpose
 
 
Lift aspiration, raise educational achievement for every New Zealander
Why 
 
Our Vision
 
 
Every New Zealander:
 
 
•Is strong in their national and cultural identity
 
 
•Aspires for themselves and their children to achieve more
 
 
•Has the choice and opportunity to be the best they can be
 
 
•Is an active participant and citizen in creating a strong civil society
 
 
•Is productive, valued and competitive in the world
 
 
New Zealand and New Zealanders lead globally
How 
 
Our Behaviours:
 
 
•We get the job done
 
 
•  We are respectful, we listen, we learn
 
 
•  We back ourselves and others to win
 
 
•  We work together for maximum impact
 
 
Great results are our bottom line
Senior Data Analyst – Data Quality ELIEvidence Data and Knowledge
The Senior Data Analyst needs to have strong working relationships with members of the Collection Team, ECE Analysis Team; Early Years, Parents and Whānau group; and ministry employees working with ELI.  As well as the following external relationships:
. 
Senior Data Analyst – Data Quality ELIEvidence Data and Knowledge
 Action oriented
Written communication
Interpersonal Savvy
Problem solving
Perspective
Managing and measuring work
Tātai Pou 
Demonstration of Tātai Pou competencies at least a ‘developing’ level: 
Customer focus


