dpp runs the Knesset data pipelines periodically on our server.
This notebook shows how to run the pipelines that render pages for the static website at https://oknesset.org.
In [ ]:
!{'cd /pipelines; KNESSET_LOAD_FROM_URL=1 dpp run --concurrency 4 '\
'./committees/kns_committee,'\
'./people/committee-meeting-attendees,'\
'./members/mk_individual'}
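If you are not sure which pipeline IDs are available, running dpp without arguments from the /pipelines directory prints the list of registered pipelines and their status (standard datapackage-pipelines CLI behaviour):
In [ ]:
!{'cd /pipelines; dpp'}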
This pipeline aggregates the relevant data and supports filtering for quicker development cycles.
To change the filter, uncomment and modify the filter step under the build pipeline in
committees/dist/knesset.source-spec.yaml.
The build pipeline can take a few minutes the first time it runs.
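To see what the build pipeline currently looks like before editing it, you can print the spec file from the notebook. This is a minimal sketch; the path assumes the notebook runs inside the same environment the dpp commands above use, where the pipelines live under /pipelines.
In [ ]:
# print the raw spec so the commented-out filter step is visible as well
print(open('/pipelines/committees/dist/knesset.source-spec.yaml').read())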
In [2]:
!{'cd /pipelines; dpp run --verbose ./committees/dist/build'}
In [ ]:
!{'pip install --upgrade dataflows'}
Restart the kernel if an upgrade was installed.
Choose some session IDs to download protocol files for:
In [1]:
session_ids = [2063122, 2063126]
In [2]:
from dataflows import Flow, load, printer, filter_rows

# load the committee sessions datapackage and keep only the selected sessions
sessions_data = Flow(
    load('/pipelines/data/committees/kns_committeesession/datapackage.json'),
    filter_rows(lambda row: row['CommitteeSessionID'] in session_ids),
    printer(tablefmt='html')
).results()
In [7]:
import os
import subprocess
import sys

# download the parsed protocol text and parts files for the selected sessions
# from the production server into the local pipelines data directory
for session in sessions_data[0][0]:
    for attr in ['text_parsed_filename', 'parts_parsed_filename']:
        pathpart = 'meeting_protocols_text' if attr == 'text_parsed_filename' else 'meeting_protocols_parts'
        url = 'https://production.oknesset.org/pipelines/data/committees/{}/{}'.format(pathpart, session[attr])
        filename = '/pipelines/data/committees/{}/{}'.format(pathpart, session[attr])
        os.makedirs(os.path.dirname(filename), exist_ok=True)
        cmd = 'curl -s -o {} {}'.format(filename, url)
        print(cmd, file=sys.stderr)
        subprocess.check_call(cmd, shell=True)
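Optionally, a quick sanity check (not part of the original pipeline) to confirm the protocol files were downloaded and are not empty before re-rendering:
In [ ]:
import os

for session in sessions_data[0][0]:
    for attr in ['text_parsed_filename', 'parts_parsed_filename']:
        pathpart = 'meeting_protocols_text' if attr == 'text_parsed_filename' else 'meeting_protocols_parts'
        filename = '/pipelines/data/committees/{}/{}'.format(pathpart, session[attr])
        # a missing or zero-byte file usually means the curl download failed silently
        print(filename, os.path.getsize(filename) if os.path.exists(filename) else 'MISSING')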
In [8]:
%%bash
# remove the cached .hash files so the render pipelines below re-process the downloaded protocols
find /pipelines/data/committees/dist -type f -name '*.hash' -delete
In [9]:
!{'cd /pipelines; dpp run ./committees/dist/render_meetings'}
In [10]:
from dataflows import Flow, load, printer, filter_rows, add_field

def add_filenames():
    # add the paths of the rendered html / json files for each meeting
    def _add_filenames(row):
        for ext in ['html', 'json']:
            row['rendered_' + ext] = '/pipelines/data/committees/dist/dist/meetings/{}/{}/{}.{}'.format(
                str(row['CommitteeSessionID'])[0], str(row['CommitteeSessionID'])[1],
                str(row['CommitteeSessionID']), ext)
    return Flow(
        add_field('rendered_html', 'string'),
        add_field('rendered_json', 'string'),
        _add_filenames
    )

rendered_meetings = Flow(
    load('/pipelines/data/committees/dist/rendered_meetings_stats/datapackage.json'),
    add_filenames(),
    filter_rows(lambda row: row['CommitteeSessionID'] in session_ids),
    printer(tablefmt='html')
).results()[0][0]
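To verify that a meeting was actually rendered, you can open one of the rendered JSON files. This is only a sketch: the structure of the rendered JSON is not documented here, so it just prints the top-level keys.
In [ ]:
import json

with open(rendered_meetings[0]['rendered_json']) as f:
    meeting_json = json.load(f)
# print the top-level keys (assuming the rendered JSON is a mapping)
print(list(meeting_json) if isinstance(meeting_json, dict) else type(meeting_json))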
In [13]:
!{'cd /pipelines; dpp run ./committees/dist/render_committees'}
In [12]:
!{'cd /pipelines; dpp run ./committees/dist/create_members,./committees/dist/build_positions,./committees/dist/create_factions'}
To serve the site, locate the local directory corresponding to /pipelines/data/committees/dist/dist and run:
python -m http.server 8000
Pages should be available at http://localhost:8000/
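For convenience, a small helper (not part of the original notebook) to print the local URLs of the meetings rendered above, assuming the http.server is started from the dist/dist directory on port 8000:
In [ ]:
for meeting in rendered_meetings:
    # strip the local dist path prefix to get the path relative to the web root
    path = meeting['rendered_html'].replace('/pipelines/data/committees/dist/dist', '')
    print('http://localhost:8000' + path)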