Create Layer Config Backup

This notebook outlines the process for creating a remote backup of gfw layers.

Rough process:

  • Run this notebook from the gfw/data folder
  • Wait...
  • Check _metadata.json files in the production and staging folders for changes
  • If everything looks good, make a PR

First, install the latest version of LMIPy


In [1]:
!pip install --upgrade LMIPy

from IPython.display import clear_output
clear_output()

print('LMI ready!')


LMI ready!

Next, import the relevant modules.


In [23]:
import LMIPy as lmi
import os
import json
from pprint import pprint
from datetime import datetime
import shutil

Next, pull the gfw repo and check that the path below correctly points to the data/layers folder, which should contain a production and a staging folder.


In [3]:
envs = ['staging', 'production']

In [4]:
path = './backup/configs'

In [27]:
# Archive the previous backup as a zip named after its last update time
with open(path + '/metadata.json') as f:
    date = json.load(f)[0]['updatedAt']
    
shutil.make_archive(f'./backup/archived/archive_{date}', 'zip', path)


Out[27]:
'/Users/vizzuality/Workspace/gfw/data/layers/archived/archive_2019-06-21@09h-31m-18s.zip'
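
The archiving step can be tried out safely on a throwaway directory before pointing it at a real backup. The following is a minimal sketch, assuming the same `metadata.json` layout as above (a list of per-environment records, each with an `updatedAt` field). The helper name `archive_backup` and the temporary folder layout are illustrative and not part of the notebook.

```python
import json
import os
import shutil
import tempfile


def archive_backup(config_dir, archive_dir):
    """Zip config_dir into archive_dir, naming the archive after the last backup time.

    archive_backup is a hypothetical helper; it mirrors the cell above but
    takes both directories as arguments.
    """
    with open(os.path.join(config_dir, 'metadata.json')) as f:
        # metadata.json is assumed to hold a list with one record per environment
        date = json.load(f)[0]['updatedAt']

    os.makedirs(archive_dir, exist_ok=True)
    # make_archive appends the '.zip' extension and returns the archive path
    return shutil.make_archive(os.path.join(archive_dir, f'archive_{date}'), 'zip', config_dir)


# Demonstration on a temporary directory, so no real backup is touched
with tempfile.TemporaryDirectory() as tmp:
    config_dir = os.path.join(tmp, 'configs')
    os.makedirs(config_dir)
    with open(os.path.join(config_dir, 'metadata.json'), 'w') as f:
        json.dump([{'env': 'staging', 'updatedAt': '2019-06-21@09h-31m-18s'}], f)

    archive_path = archive_backup(config_dir, os.path.join(tmp, 'archived'))
    archive_name = os.path.basename(archive_path)
    archive_exists = os.path.exists(archive_path)

print(archive_name, archive_exists)
```

Keeping the archive folder outside the folder being zipped avoids the archive including earlier archives of itself.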

In [28]:
# Check correct folders are found

if not all([folder in os.listdir(path) for folder in envs]):
    print(f'Boo! Incorrect path: {path}')
else:
    print('Good to go!')


Good to go!
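
When the check fails, it can help to know which folder is missing. A minimal sketch of such a check follows, assuming the same `path` and `envs` conventions as above. `missing_env_folders` is a hypothetical helper, not something LMIPy or the notebook provides.

```python
import os
import tempfile


def missing_env_folders(path, envs):
    """Return the environment folders that are absent from path (hypothetical helper)."""
    return [env for env in envs if not os.path.isdir(os.path.join(path, env))]


# Demonstration on a temporary directory containing only a staging folder
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'staging'))
    missing = missing_env_folders(tmp, ['staging', 'production'])

print(missing)  # ['production']
```

An empty list means both folders were found and the backup can proceed.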

Run the following cell to save the current datasets as .json files and log what has changed since the previous backup.


In [29]:
%%time
for env in envs:
    
    # Get all old ids
    old_ids = [file.split('.json')[0] for file in os.listdir(path + f'/{env}') if '_metadata' not in file]
    
    old_datasets = []
    files = os.listdir(path + f'/{env}')
    
    # Extract all old datasets
    for file in files:
        if '_metadata' not in file:
            with open(path + f'/{env}/{file}') as f:
                old_datasets.append(json.load(f))
    
    # Now pull all current gfw datasets and save
    col = lmi.Collection(app=['gfw'], env=env)
    col.save(path + f'/{env}')
    
    # Get all new ids
    new_ids = [file.split('.json')[0] for file in os.listdir(path + f'/{env}') if '_metadata' not in file]
    
    # See which are new, and which have been removed
    added = list(set(new_ids) - set(old_ids))
    removed = list(set(old_ids) - set(new_ids))
    changed = []
    
    # Compare old and new, logging those that have changed
    for old_dataset in old_datasets:
        ds_id = old_dataset['id']
        new_file = path + f'/{env}/{ds_id}.json'
        # Skip datasets with no current file; they are already listed under 'removed'
        if not os.path.exists(new_file):
            continue
        with open(new_file) as f:
            new_dataset = json.load(f)
        
        if old_dataset != new_dataset:
            changed.append(ds_id)
    
    # Create metadata json
    with open(path + f'/{env}/_metadata.json', 'w') as f:
        
        meta = {
            'updatedAt': datetime.today().strftime('%Y-%m-%d@%Hh-%Mm-%Ss'),
            'env': env,
            'differences': {
                'changed': changed,
                'added': added,
                'removed': removed
            }
        }
        
        # And save it too!
        json.dump(meta,f)
        
print('Done!')


  0%|          | 0/15 [00:00<?, ?it/s]
Saving to path: ./layers/staging
100%|██████████| 15/15 [00:04<00:00,  3.33it/s]
  0%|          | 0/298 [00:00<?, ?it/s]
Saving to path: ./layers/production
100%|██████████| 298/298 [01:22<00:00,  3.63it/s]
Done!
CPU times: user 6.31 s, sys: 564 ms, total: 6.87 s
Wall time: 1min 29s
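
The change detection in the loop above boils down to set arithmetic on dataset ids plus an equality check on the configs present in both snapshots. The sketch below isolates that logic in a pure function so it can be checked without calling the API. `diff_configs` and the sample configs are made up for illustration, and the results are sorted for stable output, which the notebook itself does not do.

```python
def diff_configs(old, new):
    """Compare two {dataset_id: config} mappings (hypothetical helper).

    Returns the same three lists that the loop above writes to _metadata.json.
    """
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # Only ids present on both sides can be 'changed'
    changed = sorted(ds_id for ds_id in set(old) & set(new) if old[ds_id] != new[ds_id])
    return {'changed': changed, 'added': added, 'removed': removed}


# Made-up configs for illustration only
old = {'a': {'name': 'Tree cover'}, 'b': {'name': 'Mining'}, 'c': {'name': 'Old layer'}}
new = {'a': {'name': 'Tree cover'}, 'b': {'name': 'Mining v2'}, 'd': {'name': 'New layer'}}

differences = diff_configs(old, new)
print(differences)
```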


In [30]:
# Generate rich metadata

metadata = []
for env in envs:
    with open(path + f'/{env}/_metadata.json') as f:
        metadata.append(json.load(f))
        
for env in metadata:
    for change_type, ds_list in env['differences'].items():
        tmp = []
        for dataset in ds_list:
            # generate Dataset entity to get name etc...
            tmp.append(str(lmi.Dataset(dataset)))
        env['differences'][change_type] = tmp
        
with open(path + '/metadata.json', 'w') as f:
    # And save it too!
    json.dump(metadata, f)

In [31]:
pprint(metadata)


[{'differences': {'added': ['Dataset 49d200dd-1fac-4326-a520-eea87e7a7401 '
                            'Template Layer',
                            'Dataset c8f95aa8-62e1-4c6c-96ca-0f0e508d4160 '
                            'Protected areas - El Impenetrable',
                            'Dataset 3bad1bb2-570e-404d-8694-a4eb59ed5971 '
                            'Ituna / Itata',
                            'Dataset dc3e9507-d247-40f9-9d78-a16363d9e4d3 Luli '
                            '- Gran Chaco'],
                  'changed': ['Dataset 52f774e3-7a72-4092-a859-1a8c3e412c29 '
                              'Tree cover TEST',
                              'Dataset 4839cc85-a4fe-41e5-96fd-51796aa694a6 '
                              'Tree cover loss - 2018',
                              'Dataset 42c1bd92-8f37-4b5f-b746-9e8496cb370b '
                              'Soil Organic Carbon 2019',
                              'Dataset 9c4bc723-6c9b-4cea-92a7-9f0f0f9e7697 '
                              'Global Mangrove Forests (OLD)',
                              'Dataset 284b3e2d-72e6-4f97-bf20-83f57ecbedfc '
                              'Alliance for zero extinction sites - 2018'],
                  'removed': []},
  'env': 'staging',
  'updatedAt': '2019-06-21@10h-06m-39s'},
 {'differences': {'added': ['Dataset 5bc5cd49-706f-409c-b10d-77fdfecb010f Fire '
                            'alerts summary stats',
                            'Dataset 9c0dfd21-53dd-40a2-9239-6cf292bd80c0 ISO '
                            'test data 2 Thomas',
                            'Dataset e10f4382-c7c4-4205-a484-17b9d60a68f5 Test '
                            'Data Rows',
                            'Dataset 3dd68cff-a4b8-4d45-8799-a5d433e75e60 NDC '
                            'stats for countries',
                            'Dataset 64632828-d5fa-4b94-be53-92b9e7b069a5 Test '
                            'Data C Thomas',
                            'Dataset 97546f05-3dce-4dd0-9abf-80fd1bff9cee Tree '
                            'cover loss 2018 - GADMv3.6 ISO summary - '
                            'v20190423',
                            'Dataset bd42375f-0983-4e4f-9602-806eb2c26401 Tree '
                            'cover loss 2018 - GADMv3.6 ADM2 summary - '
                            'v20190423',
                            'Dataset 01e90557-91f1-4da2-a810-a1bdd38e7824 Test '
                            'Data A Thomas',
                            'Dataset 0f24299d-2aaa-4afc-945c-b614028c12d1 Fire '
                            'alerts summary stats',
                            'Dataset b3bfa285-ab43-4562-b2e0-0ab3e92c59e3 '
                            'Brazil Land Cover 1985-2017',
                            'Dataset 2b247346-2a1c-4dbf-a934-dd529deed869 CMR '
                            '9.1 test data NULL Thomas',
                            'Dataset cdc5217b-09b7-461d-961d-dc262ba2b4be Tree '
                            'cover loss 2018 - GADMv3.6 ADM1 summary - '
                            'v20190423',
                            'Dataset f56a1761-d6be-40ec-9cd3-df16d3588480 Tree '
                            'cover loss 2018 - GADMv3.6 ADM2 summary - '
                            'v20190429'],
                  'changed': ['Dataset b70f070b-c9ae-4452-aa8e-2280a2604666 '
                              'Major Dams',
                              'Dataset 89755b9f-df05-4e22-a9bc-05217c8eafc8 '
                              'Tree Cover Loss by Dominant Driver',
                              'Dataset c8c7e5ae-d7bd-4e00-98e7-48677791d8f6 '
                              'Palm oil mills',
                              'Dataset fdc8dc1b-2728-4a79-b23f-b09485052b8d '
                              'Dynamic Boundaries (GADM36)',
                              'Dataset 63e88e53-0a88-416e-9532-fa06f703d435 '
                              'Summarized GLAD alerts for admin stats',
                              'Dataset 714339c1-c775-4303-aad4-16d975b2f023 '
                              'Primary Forest',
                              'Dataset 098b33df-6871-4e53-a5ff-b56a7d989f9a '
                              'Subnational Political Boundaries',
                              'Dataset 3d170908-043f-49db-b26b-9e9bfaaa40ce '
                              'GFW - Climate: Insights - Glad Alerts Countries '
                              'Data',
                              'Dataset 461e6f3f-c03c-40b2-8a40-47d1354c93bf '
                              'Deforestation alerts (Terra-i)',
                              'Dataset 55eec37b-e491-447f-b0d2-b8d5b7acdaf7 '
                              'Soil Organic Carbon 2019',
                              'Dataset 8d59a30f-9537-44ff-a6ca-29cf5c62a607 '
                              'Mexico Forest Cover',
                              'Dataset a20e9c0e-8d7d-422f-90f5-3b9bca355aaf '
                              'country page data for admin level 2',
                              'Dataset c4d4e07c-c5b4-4e2c-9db1-5c3bec185f0e '
                              'Oil Palm',
                              'Dataset ba9a6f22-c89c-4d8a-b843-bca2067b09de '
                              'Peru Permanent Production Forests',
                              'Dataset da9b92dd-ccdc-44e1-9dfd-6e8268e36dd0 '
                              'Oil and gas concessions',
                              'Dataset a705fce9-601c-455c-b97b-6237da5cedba '
                              'AGB Gains',
                              'Dataset 86863b72-bf1e-47db-8e7e-007cf3d00291 '
                              'Terrestrial Ecoregions',
                              'Dataset e8b873a3-5665-4b46-ae7e-040c531a77d2 '
                              'USA Conservation Easements',
                              'Dataset d4550e06-9ae3-4c82-a104-459b58efbba0 '
                              'Cambodia Economic Land Concessions',
                              'Dataset 391ca96d-303f-4aef-be4b-9cdb4856832c '
                              'GLAD alerts summary stats grouped by year, week '
                              'and iso',
                              'Dataset fee5fc38-7a62-49b8-8874-dfa31cbb1ef6 '
                              'Global Biodiversity',
                              'Dataset 3668bb78-d77e-4215-bc2a-07433e204823 '
                              'Recent Satellite Imagery',
                              'Dataset f8c77a33-d6ea-478b-9acd-2047b75b0cb8 '
                              'RSPO oil palm concessions',
                              'Dataset 37198e19-651f-4f79-96fb-3beb2746acd2 '
                              'Land Rights',
                              'Dataset 044f4af8-be72-4999-b7dd-13434fc4a394 '
                              'Tree cover',
                              'Dataset d7b12b17-9ed4-43ab-b8e4-efa2668c47f8 '
                              'GFW Stories',
                              'Dataset 7411c30d-88e4-487a-b809-3028c60ee207 '
                              'RTRS Guides for Responsible Soy Expansion',
                              'Dataset 93e67a77-1a31-4d04-a75d-86a4d6e35d54 '
                              'Wood fiber concessions - depricated',
                              'Dataset 4251b827-c6dc-4b27-9850-c6c652e18de3 '
                              'Sabah Timber Plantations Licenses',
                              'Dataset c7c76cc1-5178-474a-8b6a-60b895e02260 '
                              'Tiger Conservation Landscapes',
                              'Dataset 9bd34150-71d2-4fe0-86ae-f8911378d7e3 '
                              'Population Density',
                              'Dataset 05a6d516-e045-498d-bc9f-04673990860f '
                              'Brazil Biomes',
                              'Dataset c7a1d922-e320-4e92-8e4c-11ea33dd6e35 '
                              'GLAD alerts summary stats grouped by year, '
                              'week, iso, adm1 v2',
                              'Dataset 936b191c-8119-4752-8472-c918b9638241 '
                              'Liberia Development Exploration License',
                              'Dataset 887a8991-b5d9-421f-9e84-e26d3ed95779 '
                              'Sabah Logging Concessions',
                              'Dataset e663eb09-04de-4f39-b871-35c6c2ed10b5 '
                              'Deforestation alerts (GLAD)',
                              'Dataset 8a0a08ec-1a92-453a-9caa-6927de719357 '
                              'Canada Petroleum and Natural Gas',
                              'Dataset 493ea3f3-90ea-4fc8-89d6-98f1f4ac341f '
                              'Resource Rights',
                              'Dataset 746089a3-0c24-402f-81b6-f8d91fab77fe '
                              'Guatemala Forest Cover',
                              'Dataset c876f097-ad66-4ebc-ac36-a069790ad9a7 '
                              'Liberia Mineral Development Agreements',
                              'Dataset bb1dced4-3ae8-4908-9f36-6514ae69713f '
                              'Tree plantations',
                              'Dataset ff289906-aa83-4a89-bba0-562edd8c16c6 '
                              'Fire alerts summary stats',
                              'Dataset 3103075e-64d4-4a52-83a3-1094cf9cf04a '
                              'Indonesia Peat Lands',
                              'Dataset 428db321-5ebb-4e86-a3df-32c63b6d3c83 '
                              'GLAD alerts summary stats grouped by year, '
                              'week, iso, adm1 and adm2',
                              'Dataset 5da0c609-c20c-4e99-9d2c-3b1120a2983b '
                              'PRODES Deforestation',
                              'Dataset 9b26177b-1a28-4078-a4b9-8267ac4df669 '
                              'Soil carbon density',
                              'Dataset b3b8ca9d-a071-4383-b898-d7f64573b51f '
                              'Mangrove biomass density',
                              'Dataset 4145f642-5455-4414-b214-58ad39b83e1e '
                              'Fire alerts (MODIS and VIIRS) summary stats '
                              'grouped by date, polyname, iso, adm1 and adm2',
                              'Dataset 3bc67d97-cd01-4242-b72b-315e7f320543 '
                              'BirdLife Endemic Bird Areas',
                              'Dataset c36c3108-2581-4b68-852a-c929fc758001 '
                              'dis.007 Landslide Susceptibility',
                              'Dataset 759de49c-a599-4369-821a-8d27350b0393 '
                              'Malaysia Peat Lands',
                              'Dataset 94e0494e-f652-4ff8-8e4f-8ec0586c4b62 '
                              'Honduras Forest Type',
                              'Dataset dcf70e60-ff2b-4bc9-a4cb-1f12c0e370c8 '
                              'Indonesia Leuser Ecosystem',
                              'Dataset 0f0ea013-20ac-4f4b-af56-c57e99f39e08 '
                              'Fire Alerts (VIIRS)',
                              'Dataset 795633ea-88ba-4019-b4ed-d9575886e8ee '
                              'Liberia Mineral Exploration Licenses',
                              'Dataset 9cd1da2d-ab39-4fd9-9487-beea1d56dbac '
                              'Forma Activity',
                              'Dataset 134caa0a-21f7-451d-a7fe-30db31a424aa '
                              'Political boundaries (GADM)',
                              'Dataset 63295b05-55a1-456c-a56c-c9ccb3a711ec '
                              'River Basin Boundaries',
                              'Dataset 7cc6ac21-c8ef-4dd8-a181-8967721a15a4 '
                              'Political boundaries Admin 2 level (GADM 3.6)',
                              'Dataset 85f82851-e16e-4126-a630-93bb63d4ef42 '
                              'Terra I alerts summarized by admin 1 boundary '
                              'from GADM2.8',
                              'Dataset 3a638102-ab50-4717-a0fe-b27bd79d18c2 '
                              'bio.001 Alliance for Zero Extinction Endangered '
                              'Species Sites',
                              'Dataset 916022a9-2802-4cc6-a0f2-a77f81dd0c09 '
                              'Global Forest Watch - Home page news',
                              'Dataset ae1e485a-5b39-43b3-9a4e-0edc38fd11a6 '
                              'Carbon dioxide emissions from tree cover loss '
                              'in drained peat',
                              'Dataset 81c802aa-5feb-4fbe-9986-8f30c0597c4d '
                              'Tree biomass density',
                              'Dataset b67fc529-af07-4443-85a9-24b5cf6f2eae '
                              'Mangrove biomass density',
                              'Dataset ab35761b-ac75-4a82-a6b9-8c949a5af4da '
                              'Canada Protected Areas',
                              'Dataset a684a9bb-63f2-4bea-bf62-fd5e80d23d75 '
                              '2016 Biodiversity hot spots',
                              'Dataset f5809771-24eb-4cca-89ab-ea1697272b51 '
                              'Sarawak Logging Concessions',
                              'Dataset 10964a62-eff1-469a-8513-770e71f29445 '
                              'USA Forest Ownership Type',
                              'Dataset 897ecc76-2308-4c51-aeb3-495de0bdca79 '
                              'Tree Cover Loss',
                              'Dataset 9333e015-6699-41e6-b0a6-d44222cadcaf '
                              'Cambodia Protected Areas',
                              'Dataset 7a4d9a64-ecb1-45ec-a01e-658f1364fb2e '
                              'Mining',
                              'Dataset fb8987b6-7ad8-4172-b6ef-9c8f917fdafb '
                              'Mexico Payments for Ecosystem Services',
                              'Dataset 13e28550-3fc9-45ec-bb00-5a48a82b77e1 '
                              'Intact Forest Landscapes',
                              'Dataset 2d6ed2f7-4dc1-42ad-94b9-a65a5594037a '
                              'Sarawak Licenses for Planted Forests (LPFs)',
                              'Dataset 6556cbd3-1470-453f-8e69-d8adf4467e31 '
                              'Logging-concessions',
                              'Dataset 6d663b23-5ed8-4d1a-85ff-6cb04d9812d6 '
                              'Indonesia Forest Area',
                              'Dataset 83f8365b-f40b-4b91-87d6-829425093da1 '
                              'Tree Plantations',
                              'Dataset 41a26503-d708-4b95-bbde-c613fba04f44 '
                              'Sarawak Protected Areas',
                              'Dataset 81469de5-176c-487f-9b1a-7217d61de080 '
                              'Mexico forest zoning by category',
                              'Dataset a4e9c32d-d037-4c50-a893-967cad193537 '
                              'Population Density',
                              'Dataset aaf2d74a-4a75-441e-9b3c-73bcb590611e '
                              'Congo Basin Logging Roads',
                              'Dataset 9b9e56fc-270e-486d-8db5-e0a839c9a1a9 '
                              'Fire alerts summary stats - adm1',
                              'Dataset 1bdceabb-fed6-4d4d-9b38-0f04ef538434 '
                              'Peru forest concessions',
                              'Dataset 8e76424f-18a8-415c-affd-45e1158e148f '
                              'Active clearing alerts (FORMA)',
                              'Dataset 3f633a05-a3c9-44a5-939c-aecae35fe63e '
                              'NDC stats for countries',
                              'Dataset 51267795-de96-462f-9dfb-dd1d07b44057 '
                              'Indonesia Primary Forest',
                              'Dataset 7ce357f0-ca71-45f6-88ab-a2f13568017e '
                              'WRI Oil Palm Suitability Standard',
                              'Dataset 60db4603-84fd-487b-b0b8-2db9e13df0f5 '
                              'Mongabay Stories',
                              'Dataset 091cab6a-3a78-4015-a7b4-7a5d46ccf50b '
                              'Tree plantations by type - 2013-2014 CLONE',
                              'Dataset 8f96c227-b45a-43a7-9235-d08d722867ba '
                              'Guatemala Forest Density',
                              'Dataset acee82c1-e621-4ba6-8e37-0e7075aa73ff '
                              'Global Forest Watch - Countries config',
                              'Dataset b3fa1221-db2a-4826-95a1-37ac0973cc4b '
                              'SAD Alerts',
                              'Dataset 853ba748-f980-40d7-b0d8-d9b0fb5d748c '
                              'Indonesia Forest Moratorium',
                              'Dataset 70e2549c-d722-44a6-a8d7-4a385d78565e '
                              'Tree Cover Gain',
                              'Dataset a9cc6ec0-5c1c-4e36-9b26-b4ee0b50587b '
                              'Carbon dioxide emissions from tree cover loss',
                              'Dataset 4fc24a03-cb3e-4df3-a2ee-e2a8dca342b3 '
                              'Logging concessions',
                              'Dataset 69199c5c-31a3-46e4-9ae2-068160b90d79 '
                              'Logging concessions',
                              'Dataset 8f22dec5-2aea-49d6-8a7b-c494dbb8095c '
                              'Political boundaries Admin 1 level (GADM 3.6)',
                              'Dataset f3fc0f1e-aa26-49b6-8741-45df2eea9ac2 '
                              'Brazil Land Cover',
                              'Dataset bd5d7924-611e-4302-9185-8054acb0b44b '
                              'Global Mangrove Watch',
                              'Dataset b3d076cc-b150-4ccb-a93e-eca05d9ac2bf '
                              'soc.064.02 Political Boundaries (Second '
                              'Subnational Level)',
                              'Dataset dddcba3c-f746-4787-9915-f24c141a94da '
                              'USA Land Cover',
                              'Dataset c2615e10-584e-4e7f-ba27-7c4f52594150 '
                              'Peru Protected Areas',
                              'Dataset 9c00b73f-9a6e-453c-b730-e240b56e5c88 '
                              'Sarawak Oil Palm Concessions',
                              'Dataset 4316b45c-e744-4f4c-9823-142eb7638c8d '
                              'Indonesia land cover',
                              'Dataset 3a8e0ae1-fcc5-4a50-abd1-37f158f173ec '
                              'Mexico protected areas',
                              'Dataset 3b12cc5f-4bf8-4857-909e-a8791125bbf1 '
                              'Protected Areas',
                              'Dataset b7a34457-1d8a-456e-af46-876e0b42fb96 '
                              'Projected carbon storage from forest regrowth',
                              'Dataset a8dc9474-ba42-4ae3-a7d3-d8df5f1e78df '
                              'Political boundaries (GADM 3.6)',
                              'Dataset e5aed7ff-b569-4918-887f-192d66fd95de '
                              'Guatemala Forest Change',
                              'Dataset fe80bbb1-90e5-4ab6-ae10-3bce6abcc0fb '
                              'Global Mangrove Forests',
                              'Dataset 588f2f1f-cc62-46aa-9859-befa031412ca '
                              'Land Cover'],
                  'removed': []},
  'env': 'production',
  'updatedAt': '2019-06-21@10h-08m-02s'}]
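
With hundreds of entries, the full listing is hard to scan when deciding whether to make a PR. A minimal sketch of a per-environment summary follows, assuming records shaped like the output above. `summarize` is a hypothetical helper and the sample record is made up.

```python
from datetime import datetime


def summarize(metadata):
    """Return one summary line per environment record (hypothetical helper)."""
    lines = []
    for record in metadata:
        # Timestamps use the same format string as the backup loop
        updated = datetime.strptime(record['updatedAt'], '%Y-%m-%d@%Hh-%Mm-%Ss')
        counts = {kind: len(ids) for kind, ids in record['differences'].items()}
        lines.append(
            f"{record['env']}: {counts['added']} added, {counts['changed']} changed, "
            f"{counts['removed']} removed (backed up {updated:%Y-%m-%d %H:%M})"
        )
    return lines


# Made-up record in the same shape as the output above
sample = [{
    'env': 'staging',
    'updatedAt': '2019-06-21@10h-06m-39s',
    'differences': {'added': ['x', 'y'], 'changed': ['z'], 'removed': []},
}]

summary = summarize(sample)
print('\n'.join(summary))
```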
