In [1]:
import os
print(os.getcwd())
#os.chdir('../blocking/')
import pandas as pd
import py_entitymatching as em
import math
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import style
import re
from numpy import genfromtxt
/Users/andrew/workspace/endangeredanimals/analysis
/Users/andrew/anaconda/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
In [2]:
t = pd.read_csv("labeled.csv", encoding="ISO-8859-1", index_col=0)
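t.set_index('_id')  # no effect: set_index returns a new DataFrame and the result is not assigned, so '_id' stays a column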
t
Out[2]:
_id
ltable_name
ltable_family
ltable_ecology
ltable_countries
ltable_threat_paragraph
ltable_conservation_paragraph
ltable_pop_trend
ltable_status
ltable_country_count
...
rtable_size
rtable_threats
rtable_conservation
rtable_threat_keywords
rtable_conservation_keywords
rtable_status
rtable_countries
rtable_country_count
rtable_tCount
label
7
1522478
caterpillar slug
Veronicellidae
NaN
South Africa (KwaZulu-Natal);
NaN
NaN
Unknown
Endangered
1
...
Extended length: up to 90 mm (2)
The caterpillar slug is threatened by habitat loss and degradation as a result of ongoing urbani...
Although there are currently no conservation measures directly targeting the caterpillar slug in...
loss;environment;
Endangered
['India', 'Russia', 'Malaysia', 'China', 'Indonesia']
5
2
1.0
17
8689373
catalina mahogany
Rosaceae
NaN
United States (California);
NaN
NaN
NaN
Critically Endangered
1
...
Height: 3 - 7 m (2)Trunk diameter: c. 20 cm (2)
Historically a major threat to the Catalina mahogany was the introduction of <strong>herbivores<...
Conservation efforts began in the 1970s with a detailed inventory of the remaining Catalina maho...
loss;invasive;
Critically Endangered
['Ukraine', 'Morocco', 'Russia', 'Hungary']
4
2
1.0
21
7512846
lorenz von liburnaus woolly lemur, western avahi, western woolly lemur
Indriidae
Terrestrial
Madagascar;
\r\r\r\r\r\n The major threat is forest destruction due to annual burning that creates new ca...
['\n This species is listed on Appendix I of CITES. ', <span lang="EN-CA">This species is kno...
Decreasing
Endangered
1
...
700 – 900 g (2)
15 species of lemur have become extinct since sea-faring humans arrived on Madagascar’s shore...
The western woolly lemur is confirmed in only two protected areas Ankarafantsika Nature Reserve ...
hunting;
protected;
Critically Endangered
['Australia']
1
1
1.0
23
1191527
bluelegged mantella, tular golden frog, tular mantella, tulear golden frog
Mantellidae
Terrestrial; Freshwater
Madagascar;
\r\r\r\r\r\n The main threat to this species is habitat loss due to grazing and fire, and in ...
['\n It occurs in Parque Nacional de Isalo. Trade in this species needs to be very carefully ...
Decreasing
Endangered
1
...
1 – 3 g (3)
Several thousand blue-legged mantellas are thought to be collected every year from some regions ...
Listing on Appendix II of the Convention on International Trade in Endangered Species provides t...
loss;
Endangered
['Taiwan', 'China', 'Vietnam']
3
1
1.0
25
4646125
malagasy giant jumping rat, malagasy giant rat
Nesomyidae
Terrestrial
Madagascar;
\r\r\r\r\r\n The historical decline of this species has been partly through climatic change l...
['\n The new Menabe-Antimena protected area has temporary protection order and covers the ent...
Decreasing
Endangered
1
...
1 – 1.5 kg (2)
Like many of Madagascar’s unique species the Malagasy giant rat is thought to have become hig...
This large rodent is in urgent need of conservation and its future remains highly uncertain. The...
loss;pet;
captive breeding;protected;
Endangered
['Canada']
1
2
1.0
33
8966650
lydenburg cycad
Zamiaceae
Terrestrial
South Africa (Limpopo Province);
\r\r\r\r\r\n This species has suffered much from the activities of collectors and in addition...
['\n This species is listed on Appendix I of the CITES Appendices.\n\n \n ']
Decreasing
Critically Endangered
1
...
Height: up to 3 m (2)
Over the past few decades many South African cycads have become increasingly scarce in the wild ...
There are not known to be any specific conservation measures in place for this Critically Endang...
loss;
cites;protected;
Critically Endangered
['Australia', 'South Africa', 'Brazil']
3
1
1.0
48
1199442
bluelegged mantella, tular golden frog, tular mantella, tulear golden frog
Mantellidae
Terrestrial; Freshwater
Madagascar;
\r\r\r\r\r\n The main threat to this species is habitat loss due to grazing and fire, and in ...
['\n It occurs in Parque Nacional de Isalo. Trade in this species needs to be very carefully ...
Decreasing
Endangered
1
...
1 – 3 g (3)
Several thousand blue-legged mantellas are thought to be collected every year from some regions ...
Listing on Appendix II of the Convention on International Trade in Endangered Species provides t...
loss;
Endangered
['Turkey']
1
1
1.0
49
4143489
jeweled toad
Bufonidae
Terrestrial; Freshwater
Mexico;
\r\r\r\r\r\n The main threat is habitat loss due to agricultural expansion, wood extraction, ...
['\n Protection of the suburban and tropical dry areas around Acapulco represents the only ch...
Decreasing
Endangered
1
...
Snout-vent length: up to 8 cm (2)
The main threat to the jeweled toad is habitat loss due to the spread of agriculture the expansi...
The jeweled toad is legally protected in Mexico but its only real chance of survival is likely t...
loss;disease;pollution;
captive breeding;protected;
Endangered
['Cambodia', 'Vietnam']
2
3
1.0
53
4979395
mitred leaf monkey, sumatran surili
Cercopithecidae
Terrestrial
Indonesia (Sumatera);
\r\r\r\r\r\n There has been extensive loss of habitat, especially for oil palm plantations, a...
['\n This species is listed under CITES Appendix II, and is protected by national law. It is ...
Decreasing
Endangered
1
...
5.8 – 7.4 kg (2)
Indonesia's status as the world's number-one supplier of plywood and a major supplier of palm oi...
The mitred leaf monkey is protected by national law in Indonesia (1) and is listed under Appendi...
loss;poaching;pet;
cites;protected;
Endangered
['China']
1
3
1.0
55
7199307
twofingered skink
Scincidae
Terrestrial
Spain; Algeria; Morocco;
\r\r\r\r\r\n Development of coastal areas for tourism and military purposes are major threats...
['\n Further surveys are needed to better determine the range of this species. It is known to...
Decreasing
Endangered
3
...
Length: 6 - 8 cm (2)
The two-fingered skink occupies a restricted and fragmented range and is under threat from a dec...
The two-fingered skink occurs in a few protected areas including Embouchure de la Moulouya and S...
protected;
Critically Endangered
['Montenegro']
1
0
1.0
56
4464235
nilgiri longtailed tree mouse
Muridae
Terrestrial
India;
\r\r\r\r\r\n Human disturbance, use of pesticides and exotic trees are found to be major thre...
['\n It is listed as a vermin under Schedule V of the Indian Wildlife (Protection) Act. It ha...
Decreasing
Endangered
1
...
c. 10 g (2)
Occupying an area no more than 500 square kilometres the long-tailed climbing mouse is threatene...
There are currently no conservation measures in place for the long-tailed climbing mouse and it ...
loss;fragmentation;
protected;
Endangered
['Sri Lanka']
1
2
1.0
61
6186039
salvins mushroomtongue salamander
Plethodontidae
Terrestrial
El Salvador; Guatemala;
\r\r\r\r\r\n The major threat in the past has been habitat loss, due mainly to subsistence ag...
['\n It is not currently known from any protected areas in Guatemala, although protected area...
Decreasing
Endangered
2
...
Snout-vent length: 51 - 68 mm (2) (3)Tail length: c. 51 mm (2)
Salvin’s mushroomtongue salamander was once relatively common but has undergone a decline as ...
No specific conservation measures are currently known to be in place for Salvin’s mushroomton...
loss;fragmentation;environment;disease;
protected;
Critically Endangered
['Bermuda']
1
4
1.0
70
7610113
rednosed bearded saki, rednosed saki, whitenosed bearded saki, whitenosed saki
Pitheciidae
Terrestrial
Brazil;
\r\r\r\r\r\n The Trans-Amazon highway bisects the range of this species from east to west and...
['\n This species occurs in the Parque Nacional da Amazônia [Tapajós] (10,000 km²), ...
Decreasing
Endangered
1
...
Male head-body length: c. 42.7 cm (2)Female head-body length: c. 41.8 cm (2)Tail length: 30 - 50...
The white-nosed saki is mainly threatened by habitat destruction as its range has been divided b...
As well as being listed on Appendix I of the Convention on International Trade in Endangered Spe...
cites;protected;endangered species act;
Critically Endangered
['Malta']
1
0
NaN
80
8862181
club naiad, clubshell, clubshell pearly mussel
Unionidae
NaN
United States;
NaN
NaN
NaN
Critically Endangered
1
...
Length: up to 7.6 cm (2)
It is estimated that the clubshell pearly mussel has been extirpated from more than 95 percent o...
The clubshell pearly mussel is federally listed as ‘Endangered’ in the United States (4) (...
invasive;pollution;
Critically Endangered
['South Africa']
1
2
NaN
89
9125918
delta green ground beetle
Carabidae
NaN
United States;
NaN
NaN
NaN
Critically Endangered
1
...
Length: 6 mm (2).
The historical distribution of the delta ground beetle is unknown but it is thought reasonable t...
The delta green ground beetle is protected by the Lacey Act which prohibits its import export tr...
protected;
Critically Endangered
['Angola', 'Equatorial Guinea', 'Gabon', 'Central African Republic', 'Cameroon']
5
0
NaN
111
2099479
dark red meranti
Dipterocarpaceae
Terrestrial
Indonesia (Sumatera);
NaN
['\n Some subpopulations are found in primary forest reserves.\n\n \n ']
NaN
Endangered
1
...
Trunk diameter: up to 172 cm (2)
Dipterocarp forests have become amongst the most endangered in the world widely logged for use i...
Some subpopulations of this species are found in primary forest reserves where they receive vary...
Endangered
['Seychelles']
1
0
NaN
121
1089171
blue mountain water skink
Scincidae
NaN
Australia;
NaN
NaN
NaN
Endangered
1
...
up to 10 g (2)
Restricted to a very specific habitat dependant on delicate associations between the fauna and f...
With high levels of <strong>endemism</strong> and large numbers of threatened species the Blue M...
pollution;
Endangered
['Russia', 'Georgia', 'Turkey']
3
1
NaN
123
5575006
blackfaced black spider monkey, chamek spider monkey, peruvian black spider monkey, peruvian spi...
Atelidae
Terrestrial
Bolivia, Plurinational States of; Brazil (Acre, Amazonas, Mato Grosso, Rondônia); Peru;
\r\r\r\r\r\n The major threat is subsistence and market hunting for food (with guns). An addi...
['\n This species is confirmed, or may occur, in numerous protected areas.', <br/>, <br/>, 'B...
Decreasing
Endangered
3
...
ca. 7 kg (2)
The Peruvian spider monkey is under serious threat from hunting for food as well as from habitat...
The Peruvian spider monkey occurs in many protected areas throughout its range (1) and internati...
loss;hunting;
cites;protected;
Critically Endangered
['Taiwan', 'Philippines', 'China']
3
2
NaN
124
4638210
malagasy giant jumping rat, malagasy giant rat
Nesomyidae
Terrestrial
Madagascar;
\r\r\r\r\r\n The historical decline of this species has been partly through climatic change l...
['\n The new Menabe-Antimena protected area has temporary protection order and covers the ent...
Decreasing
Endangered
1
...
1 – 1.5 kg (2)
Like many of Madagascar’s unique species the Malagasy giant rat is thought to have become hig...
This large rodent is in urgent need of conservation and its future remains highly uncertain. The...
loss;pet;
captive breeding;protected;
Endangered
['Tanzania', 'Kenya']
2
2
NaN
139
3047256
golden bamboo lemur, golden lemur
Lemuridae
Terrestrial
Madagascar;
\r\r\r\r\r\n The major threat is habitat loss due to slash-and-burn agriculture and harvestin...
['\n This species is listed on Appendix I of CITES. ', <span lang="EN-CA">This species has a ...
Decreasing
Critically Endangered
1
...
1 - 1.5 kg (2)
Mainly threatened by habitat loss through slash-and-burn agriculture (6) although the golden ba...
In 1991 three areas of land around the village of Ranomafana were designated as Ranomafana Natio...
loss;hunting;pet;
Endangered
['Australia', 'United States']
2
3
NaN
152
5817875
queen alexandras birdwing
Papilionidae
NaN
Papua New Guinea;
NaN
NaN
NaN
Endangered
1
...
Wingspan: 19 – 28 cm (2)
As one of the world’s most beautiful butterflies Queen Alexandra’s birdwing is extremely a...
Threatened by illegal trade and habitat loss the survival of Queen Alexandra’s birdwing is de...
loss;
cites;
Critically Endangered
[]
0
1
NaN
158
2081657
danube salmon, huchen
Salmonidae
Freshwater
Austria; Bosnia and Herzegovina; Croatia; Czech Republic; Germany; Hungary; Montenegro; Poland; ...
\r\r\r\r\r\n Historically overfishing, pollution and dam construction caused the decline of t...
['\n Restocking, fishing regulations. A EU LIFE project in Austria to improve habitat conditi...
Unknown
Endangered
13
...
up to 53 kg (2)
Once widespread the Danube salmon is now amongst the most endangered fish species in Europe (3) ...
Conservation efforts to date have involved the establishment of reserves restocking of populatio...
pollution;
Endangered
['United States']
1
1
NaN
170
4472150
nilgiri longtailed tree mouse
Muridae
Terrestrial
India;
\r\r\r\r\r\n Human disturbance, use of pesticides and exotic trees are found to be major thre...
['\n It is listed as a vermin under Schedule V of the Indian Wildlife (Protection) Act. It ha...
Decreasing
Endangered
1
...
c. 10 g (2)
Occupying an area no more than 500 square kilometres the long-tailed climbing mouse is threatene...
There are currently no conservation measures in place for the long-tailed climbing mouse and it ...
loss;fragmentation;
protected;
Endangered
['Sri Lanka']
1
2
NaN
172
9108068
delacours langur
Cercopithecidae
Terrestrial
Viet Nam;
\r\r\r\r\r\n Hunting for the purposes of traditional "medicine" is the primary threat facing ...
["\n This species is currently listed only as CITES Appendix II. It is considered 'endangered...
Decreasing
Critically Endangered
1
...
Male head-to-body length: 57 – 62 cm (2)Female head-to-body length: 57 – 59 cm (2)Mail tai...
With as few as 270 to 300 estimated individuals remaining in 19 isolated populations and 14 of t...
Four areas where Delacour’s langurs are protected include: Cuc Phuong National Park Pu Luong ...
loss;hunting;environment;
protected;
Critically Endangered
['Ecuador']
1
3
NaN
193
5893603
orangebellied racer, redbellied racer, saba racer
Dipsadidae
NaN
Saint Kitts and Nevis; Bonaire, Sint Eustatius and Saba (Saba, Sint Eustatius);
\r\r\r\r\r\n Extirpated historically from Nevis and St Kitts due to the introduction of mongo...
NaN
NaN
Endangered
2
...
Maximum snout-vent length: 92 cm (2)
The red-bellied racer now occupies only 11 percent of its original range (5) largely due to the ...
Fortunately there are some systems in place to conserve the red-bellied racer. For example on St...
protected;
Critically Endangered
['Italy', 'Mexico']
2
0
NaN
195
83513
albanian water frog
Ranidae
Terrestrial; Freshwater
Albania; Montenegro;
\r\r\r\r\r\n The major threat is drainage of wetland habitats and aquatic pollution of many w...
["\n It is listed on Appendix III of the Bern Convention. 'Green frogs', including ", <em>R. ...
Decreasing
Endangered
2
...
Male length: c. 71 mm (2)Female length: c. 74 mm (2)
The main threats to the Albanian water frog are the drainage of wetlands and agrochemical and in...
The Albanian water frog is listed on Appendix III of the Bern Convention which means that it sho...
pet;pollution;
protected;
Endangered
['Montenegro', 'Albania']
2
2
NaN
209
1823172
cornish pathmoss
Ditrichaceae
Terrestrial
United Kingdom (Great Britain);
\r\r\r\r\r\n Habitat is threatened by encroachment of rank vegetation and excessive human dis...
NaN
Unknown
Endangered
1
...
Plant size: 1 - 5 mm
Although the reasons for the decline of this species are not fully understood it is believed tha...
This species' rarity and the precarious state of its habitat meant that it was included in the o...
loss;
Endangered
['Australia', 'United States']
2
1
NaN
210
2604058
celebes tortoise, forstens tortoise, travancore tortoise
Testudinidae
NaN
Indonesia;
NaN
['\n It is listed on CITES Appendix II.\n\n \n ']
NaN
Endangered
1
...
2.5 kg (2)
The destruction of habitat on the islands of Sulawesi and Halmahera is greatly threatening the s...
International trade in Forsten's tortoise is restricted by its listing on Appendix II of the Con...
pet;
cites;
Endangered
['Dominican Republic', 'Haiti']
2
1
NaN
211
8421174
blackeared golden mantella, blackeared mantella
Mantellidae
Terrestrial; Freshwater
Madagascar;
\r\r\r\r\r\n The area where this species occurs is severely threatened, with its forest habit...
['\n It is not known from any protected areas, making protection of remaining habitat a top p...
Decreasing
Critically Endangered
1
...
Length: 15 – 18 mm (2)
As for many Madagascan frogs the threats to this species are serious and numerous. Having suffer...
For this species to persist trade must be carefully regulated. The black-eared mantella is curre...
loss;pet;
protected;
Critically Endangered
['India', 'Bangladesh', 'China', 'Laos', 'Myanmar', 'Cambodia', 'Vietnam', 'Bhutan', 'Thailand',...
13
2
NaN
215
7004837
tehuantepec hare, tehuantepec jackrabbit, tehuantepec jack rabbit, tropical hare
Leporidae
Terrestrial
Mexico (Oaxaca);
\r\r\r\r\r\n The species' habitat is threatened by encroaching agriculture as the local human...
['\n The Tehuantepec jackrabbit is listed as critically endangered in the Mexican Official No...
Decreasing
Endangered
1
...
2.5 - 4 kg (4)
The Tehuantepec jackrabbit has a small and declining range and is now found in only four small a...
The Tehuantepec jackrabbit is listed as Critically Endangered by the Mexican government (1) (13...
loss;hunting;
captive breeding;protected;
Critically Endangered
[]
0
2
NaN
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
793
5701602
NaN
Nepenthaceae
Terrestrial
Malaysia (Sabah);
NaN
['\n On CITES Appendix II.\n\n \n ']
NaN
Endangered
1
...
Height: up to 15 m (2)
<i>Nepenthes</i> species are threatened by a combination of over-collection and habitat loss (2)...
<i>Nepenthes burbidgeae</i> is found only on Mount Kinabalu which is situated within Kinabalu Na...
loss;environmental;environment;
cites;protected;
Endangered
['Cuba']
1
3
1.0
794
5693687
NaN
Nepenthaceae
Terrestrial
Malaysia (Sabah);
NaN
['\n On CITES Appendix II.\n\n \n ']
NaN
Endangered
1
...
Height: up to 15 m (2)
<i>Nepenthes</i> species are threatened by a combination of over-collection and habitat loss (2)...
<i>Nepenthes burbidgeae</i> is found only on Mount Kinabalu which is situated within Kinabalu Na...
loss;environmental;environment;
cites;protected;
Endangered
['China']
1
3
1.0
795
7689716
melodius coqui, wightmans robber frog
Eleutherodactylidae
Terrestrial
Puerto Rico;
\r\r\r\r\r\n Although some habitat destruction is taking place (due to agriculture and infras...
["\n It occurs in several protected areas, most of which are well managed. Further research i...
Decreasing
Endangered
1
...
Female snout-vent length: 2 cm (2)
Already restricted in range Wightman’s robber frog populations are in a continuing decline es...
Despite not being the target of any known conservation measures Wightman’s robber frog is aff...
disease;pollution;
protected;
Critically Endangered
['Turkey', 'Greece', 'Mauritania']
3
2
1.0
801
1491981
NaN
Bufonidae
Terrestrial
Brazil;
\r\r\r\r\r\n The major threats are habitat loss due to agricultural expansion, livestock graz...
['\n It might occur in the Estacion Biólogica Santa Lucia, and Reserva Biólogica August...
Decreasing
Endangered
1
...
Length: 14 - 19 mm (2)
Carvalho’s tree toad like many other amphibians in the <strong>Atlantic forest</strong> is un...
Carvalho’s tree toad is found in protected areas in the Caparao National Park and the Santa T...
loss;
protected;
Endangered
['United States']
1
1
1.0
811
75598
albanian water frog
Ranidae
Terrestrial; Freshwater
Albania; Montenegro;
\r\r\r\r\r\n The major threat is drainage of wetland habitats and aquatic pollution of many w...
["\n It is listed on Appendix III of the Bern Convention. 'Green frogs', including ", <em>R. ...
Decreasing
Endangered
2
...
Male length: c. 71 mm (2)Female length: c. 74 mm (2)
The main threats to the Albanian water frog are the drainage of wetlands and agrochemical and in...
The Albanian water frog is listed on Appendix III of the Bern Convention which means that it sho...
pet;pollution;
protected;
Endangered
['Montenegro', 'Albania']
2
2
1.0
813
234110
aran rock lizard
Lacertidae
Terrestrial
France; Spain;
\r\r\r\r\r\n This species is possibly threatened by overgrazing of habitat by cattle, collect...
['\n This species is listed on Appendix III of the Bern Convention. It does not occur in any ...
Decreasing
Endangered
2
...
Snout-vent length: up to 6 cm (2)
The rocky alpine habitat of the Aran rock lizard is currently threatened by overgrazing by cattl...
The Aran rock lizard is listed on Appendix III of the Bern Convention a convention which aims to...
protected;
Endangered
['Thailand', 'Bangladesh', 'India', 'Malaysia']
4
0
1.0
817
3621991
black howling monkey, guatemalan black howler, guatemalan black howler monkey, guatemalan howler...
Atelidae
Terrestrial
Belize; Guatemala; Mexico (Campeche, Chiapas, Quintana Roo, Tabasco, Yucatán);
\r\r\r\r\r\n The main threats to this species are deforestation, hunting (for food and for ca...
['\n This species occurs, or may occur, in several protected areas:', <br/>, <br/>, 'Belize',...
Decreasing
Endangered
3
...
Male head-and-body length: 67 - 71 cm (2)Female head-and-body length: 52 - 64 cm (2)Male tail ...
The Guatemalan black howler is threatened throughout most of its range from hunting and habitat ...
The Guatemalan black howler is known to occur in six protected areas: Cockscomb Basin Wildlife S...
hunting;
protected;
Endangered
['New Zealand']
1
1
1.0
831
8429089
blackeared golden mantella, blackeared mantella
Mantellidae
Terrestrial; Freshwater
Madagascar;
\r\r\r\r\r\n The area where this species occurs is severely threatened, with its forest habit...
['\n It is not known from any protected areas, making protection of remaining habitat a top p...
Decreasing
Critically Endangered
1
...
Length: 15 – 18 mm (2)
As for many Madagascan frogs the threats to this species are serious and numerous. Having suffer...
For this species to persist trade must be carefully regulated. The black-eared mantella is curre...
loss;pet;
protected;
Critically Endangered
[]
0
2
1.0
854
166554
blind swamp eel
Synbranchidae
NaN
Mexico;
NaN
NaN
NaN
Endangered
1
...
Length: up to at least 32.5 cm (2)
The main threats to this species are various forms of water pollution caused by humans. In rural...
There are currently no conservation measures targeting the Anguila ciega.
pollution;
Endangered
['Oman', 'United Arab Emirates']
2
1
1.0
857
4616327
ginkgo, maidenhair tree
Ginkgoaceae
Terrestrial
China (Zhejiang);
NaN
['\n The species has been widespread in cultivation for several centuries.\n\n \n ']
NaN
Endangered
1
...
Height: 30 - 40 m (2)Diameter: 3 - 4 m (2)
The maidenhair tree was thought to have become extinct similarly to the other members of its anc...
It is uncertain whether the maidenhair tree still persists in the wild and at present there are ...
loss;
Endangered
['Philippines', 'China']
2
1
1.0
869
2837560
giant bushytailed cloud rat, luzon bushytailed cloud rat, luzon crateromys
Muridae
Terrestrial
Philippines;
\r\r\r\r\r\n It is supposed that hunting is the greatest threat to this species. This species...
['\n Stricter enforcement of hunting restrictions in combination with awareness raising may b...
Decreasing
Endangered
1
...
1.4 – 1.5 kg (3)
The giant bushy-tailed cloud rat is actively hunted by the local people of central northern Luzo...
The giant bushy-tailed cloud rat occurs in several national parks in northern Luzon (8) includin...
loss;pet;
Endangered
['New Caledonia']
1
2
1.0
880
2013291
NaN
Zamiaceae
Terrestrial
Mexico (Oaxaca, Veracruz);
\r\r\r\r\r\n This species is affected by severe habitat destruction as a result of farming, r...
['\n This species is listed on Appendix II of the CITES Appendices. Plants are protected by l...
Decreasing
Endangered
1
...
Height: up to 16 m (2) (3)Trunk diameter: up to 40 cm (3)Leaf length: up to 2 m (2) (3)
Already made vulnerable by its rather restricted distribution <i>Dioon spinulosum</i> is under t...
Cycads are of great conservation interest as they are an ancient group with considerable economi...
loss;
cites;
Endangered
['Philippines']
1
1
1.0
882
2520162
fivekeeled spinytailed iguana, oaxacan spinytailed iguana, oaxacan spinytail iguana
Iguanidae
Terrestrial
Costa Rica; Nicaragua;
\r\r\r\r\r\n Habitat loss through deforestation and regular burning of habitat, and collectio...
['\n This species currently is not under any legal protection and is not known to occur withi...
Decreasing
Endangered
2
...
Total male length: c. 47.5 cm (2)Total female length: c. 32 cm (2)
The population of five-keeled spiny-tailed iguanas could decline by as much as 30 percent if cur...
Five-keeled spiny-tailed iguanas currently have no legal protection (1). The Lost Canyon Nature ...
loss;pet;
Endangered
['South Africa']
1
2
1.0
884
3379785
greater bigfooted mouse, longtailed bigfooted mouse
Nesomyidae
Terrestrial
Madagascar;
\r\r\r\r\r\n This species is threatened by predation by feral cats and dogs (L. Dollar pers. ...
['\n This species is present in the Ankarafantsika National Park. There is a need to manage t...
Decreasing
Endangered
1
...
50 – 60 g (2)
The greater big-footed mouse is very vulnerable to any threats due to its very restricted distri...
The greater big-footed mouse occurs within the Ankarafantsika National Park. Unfortunately this...
Endangered
['Mauritius', 'Reunion']
2
0
1.0
888
1722091
central anatolian spined loach
Cobitidae
Freshwater
Turkey (Turkey-in-Asia);
\r\r\r\r\r\n Droughts due to over exploitation of surface and groundwater is the main threat ...
['\n', <p><span class="st"><span lang="EN-GB">There is no conservation action in place for this ...
Decreasing
Endangered
1
...
Maximum length: 8.1 cm (2)
The distribution of <em>Cobitis turcica</em> can be split into five separate populations all of ...
<em>Cobitis turcica</em> has not been the target of any known conservation measures.
pet;pollution;
Endangered
['Vietnam', 'Malaysia', 'Indonesia', 'Thailand']
4
2
1.0
914
8777688
china alligator, chinese alligator
Alligatoridae
Terrestrial; Freshwater
China (Anhui, Jiangsu, Zhejiang);
NaN
['\n It is listed on CITES Appendix I.\n\n \n ']
NaN
Critically Endangered
1
...
up to 40 kg (2)
A survey by the Wildlife Conservation Society in 1999 found the wild population of Chinese allig...
In contrast to the decimated wild population the breeding of captive Chinese alligators has been...
protected;
Critically Endangered
[]
0
0
1.0
915
2426517
NaN
Cyprinidae
Freshwater
Greece;
\r\r\r\r\r\n Water extraction and pollution (agriculture), and drought.\r\r\r\r\r\n\r\r\r\r\r...
['\n None.\n\n \n ']
Decreasing
Endangered
1
...
Maximum length: 25 cm (2)
Restricted to just a single river the European dace is primarily threatened by the loss of its h...
The Evrotas River home to the European dace is a unique biodiversity hotspot within Greece. Its ...
loss;pollution;
protected;
Endangered
['China']
1
2
1.0
924
5054359
NaN
Amaryllidaceae
Terrestrial
Spain;
\r\r\r\r\r\n This species is vulnerable to modifications of the water regime induced by natur...
['\n This species is included in various National Parks. Seeds from some populations are stoc...
Decreasing
Endangered
1
...
NaN
Changes in the water regime are one of the biggest threats to <em>Narcissus longispathus</em> ei...
<em>Narcissus longispathus</em> is present in several National Parks including the Parque Natura...
Endangered
['Italy']
1
0
1.0
938
2820516
germains langur, germains silver langur, indochinese lutung, indochinese silvered langur
Cercopithecidae
Terrestrial
Cambodia; Lao People's Democratic Republic; Myanmar; Thailand; Viet Nam;
\r\r\r\r\r\n The major threats to this species are hunting, mainly for subsistence use and tr...
['\n This species is listed on CITES Appendix II. It has been recorded from Phu Quoc National...
Decreasing
Endangered
5
...
Head-body length: 49 - 59 cm (2) (3)Tail length: 72 - 84 cm (2) (3)
Although a relatively widespread species Germain’s langur is very rare throughout most of its...
Germain’s langur occurs in a number of protected areas including Phu Quoc and Cat Tien Nation...
loss;hunting;pet;
cites;protected;
Endangered
['Brazil']
1
3
1.0
950
6178124
salvins mushroomtongue salamander
Plethodontidae
Terrestrial
El Salvador; Guatemala;
\r\r\r\r\r\n The major threat in the past has been habitat loss, due mainly to subsistence ag...
['\n It is not currently known from any protected areas in Guatemala, although protected area...
Decreasing
Endangered
2
...
Snout-vent length: 51 - 68 mm (2) (3)Tail length: c. 51 mm (2)
Salvin’s mushroomtongue salamander was once relatively common but has undergone a decline as ...
No specific conservation measures are currently known to be in place for Salvin’s mushroomton...
loss;fragmentation;environment;disease;
protected;
Critically Endangered
['Bermuda', 'South Africa', 'United States']
3
4
1.0
954
4932151
mexican water mouse
Cricetidae
Terrestrial; Freshwater
Mexico;
\r\r\r\r\r\n This species is threatened by human activity within its range, specifically by c...
['\n More research is needed to determine the status of this species population and its speci...
Unknown
Endangered
1
...
c. 88 g (3)
In general water mice are considered very difficult to capture and so the exact status of many s...
The Mexican water mouse is classed as ‘Rare’ by the Mexican government (8) but there are n...
pollution;
Endangered
['Sri Lanka']
1
1
1.0
959
8879437
cooks holly
Aquifoliaceae
Terrestrial
Puerto Rico;
\r\r\r\r\r\n The construction of communication towers is likely to have destroyed a large par...
['\n It is listed on the US Endangered Species Act.\n\n \n ']
NaN
Critically Endangered
1
...
Max height: 2-3 m (2)
Having last been reviewed on the IUCN Red List in 1998 the conservation status of Cook’s holl...
Cook’s holly was listed on the US Endangered Species Act in 1987 and a Recovery Plan was draw...
endangered species act;
Critically Endangered
[]
0
0
1.0
962
3012210
glittering demoiselle
Calopterygidae
Terrestrial; Freshwater
Algeria; Morocco; Tunisia;
\r\r\r\r\r\n Water pollution, drying up of streams due to water extraction for irrigation, ov...
['\n Control of water pollution and reserve establishment through policy-based actions, incre...
Decreasing
Endangered
3
...
Male length of abdomen: 39 - 42 mm (2)Female length of abdomen: 37 - 39 mm (2)Male hind wing: 30...
The glittering demoiselle is threatened by habitat loss and degradation as a result of water pol...
There are currently no known conservation initiatives targeting the glittering demoiselle but th...
loss;pollution;
Endangered
['India']
1
2
1.0
974
5062274
NaN
Amaryllidaceae
Terrestrial
Spain;
\r\r\r\r\r\n This species is vulnerable to modifications of the water regime induced by natur...
['\n This species is included in various National Parks. Seeds from some populations are stoc...
Decreasing
Endangered
1
...
NaN
Changes in the water regime are one of the biggest threats to <em>Narcissus longispathus</em> ei...
<em>Narcissus longispathus</em> is present in several National Parks including the Parque Natura...
Endangered
['South Africa']
1
0
1.0
976
1364211
NaN
Cactaceae
Terrestrial
Brazil (Bahia);
\r\r\r\r\r\n The species is threatened by small-holder and agro-industrial agriculture, cattl...
['\n The species occurs in the Parque Estadual Morro do Chap�©u. It\xa0is included in CITES...
Decreasing
Endangered
1
...
Stem height: 13 - 18 cm (2)Stem diameter: 12.5 - 24 cm (2)
This cactus is particularly attractive and has been highly prized by collectors over the years; ...
<i>Melocactus glaucescens</i> has been placed on Appendix I of the Convention on International T...
cites;protected;
Endangered
['Tunisia', 'Morocco', 'Algeria']
3
0
1.0
978
1348381
NaN
Cactaceae
Terrestrial
Brazil (Bahia);
\r\r\r\r\r\n The species is threatened by small-holder and agro-industrial agriculture, cattl...
['\n The species occurs in the Parque Estadual Morro do Chap�©u. It\xa0is included in CITES...
Decreasing
Endangered
1
...
Stem height: 13 - 18 cm (2)Stem diameter: 12.5 - 24 cm (2)
This cactus is particularly attractive and has been highly prized by collectors over the years; ...
<i>Melocactus glaucescens</i> has been placed on Appendix I of the Convention on International T...
cites;protected;
Endangered
['Cuba']
1
0
1.0
981
4753626
marleys golden mole
Chrysochloridae
Terrestrial
South Africa (KwaZulu-Natal);
\r\r\r\r\r\n The single major threat is likely to be habitat degradation, either through over...
['\n Known to occur in only the Pongola Wilderness Area. Research is needed to search for oth...
Unknown
Endangered
1
...
30 - 34 g (2)
Habitat degradation brought about chiefly by overgrazing firewood collection and urbanization is...
There are no specific conservation measures in place for Marleyâ??s golden mole but it is known...
pet;
protected;
Endangered
['Turkey', 'Sri Lanka']
2
1
1.0
983
1136401
blue shiner
Cyprinidae
Freshwater
United States;
\r\r\r\r\r\n Declines have been caused by water pollution, siltation, and construction of res...
['\n Hatchery spawning techniques need to be developed. If spawning in captivity can be achie...
Decreasing
Endangered
1
...
Maximum size: 10 cm (2)
Numbering no more than 2500 individuals the blue shiner population is suspected to have undergon...
The future of the blue shiner very much depends on the protection of its habitat particularly th...
loss;pollution;
Endangered
['United Kingdom', 'New Zealand']
2
2
1.0
992
3808519
haitian solenodon, hispaniolan solenodon
Solenodontidae
Terrestrial
Dominican Republic; Haiti;
\r\r\r\r\r\n The most significant threat to this species appears to be the continuing demise ...
['\n It is protected by law in the Dominican Republic (General Environmental Law 64 - 00). Th...
Decreasing
Endangered
2
...
700 â?? 1000 g (2)
Out of approximately 25 <strong>endemic</strong> land (non-flying) mammal species that once inha...
There is thought to be little hope for this species in Haiti (2) but in the Dominican Republic t...
captive breeding;protected;
Endangered
[]
0
0
1.0
994
3800604
haitian solenodon, hispaniolan solenodon
Solenodontidae
Terrestrial
Dominican Republic; Haiti;
\r\r\r\r\r\n The most significant threat to this species appears to be the continuing demise ...
['\n It is protected by law in the Dominican Republic (General Environmental Law 64 - 00). Th...
Decreasing
Endangered
2
...
700 â?? 1000 g (2)
Out of approximately 25 <strong>endemic</strong> land (non-flying) mammal species that once inha...
There is thought to be little hope for this species in Haiti (2) but in the Dominican Republic t...
captive breeding;protected;
Endangered
['Czech Republic', 'Spain', 'Ecuador']
3
0
1.0
148 rows × 28 columns
In [3]:
for c in t.columns:
    print(c)
_id
ltable_name
ltable_family
ltable_ecology
ltable_countries
ltable_threat_paragraph
ltable_conservation_paragraph
ltable_pop_trend
ltable_status
ltable_country_count
scientific_name
rtable_name
rtable_kingdom
rtable_phylum
rtable_class
rtable_order
rtable_family
genus
rtable_size
rtable_threats
rtable_conservation
rtable_threat_keywords
rtable_conservation_keywords
rtable_status
rtable_countries
rtable_country_count
rtable_tCount
label
In [ ]:
t.head()
In [6]:
print('Number of tuples:', len(t))
# print('Number of unique rtable values', len(t.rtable_scientific_name.unique()))
print('Number of unique genera:', len(t.genus.unique()))
print('Number of unique sizes:', len(t.rtable_size.unique()))
Number of tuples: 148
Number of unique genera: 92
Number of unique sizes: 102
In [13]:
for u in t.rtable_size.unique():
    print(u)
    print()
Extended length: up to 90 mm (2)
Height: 3 - 7 m (2)Trunk diameter: c. 20 cm (2)
700 â?? 900 g (2)
1 â?? 3 g (3)
1 â?? 1.5 kg (2)
Height: up to 3 m (2)
Snout-vent length: up to 8 cm (2)
5.8 â?? 7.4 kg (2)
Length: 6 - 8 cm (2)
c. 10 g (2)
Snout-vent length: 51 - 68 mm (2) (3)Tail length: c. 51 mm (2)
Male head-body length: c. 42.7 cm (2)Female head-body length: c. 41.8 cm (2)Tail length: 30 - 50.7 cm (3)
Length: up to 7.6 cm (2)
Length: 6 mm (2).
Trunk diameter: up to 172 cm (2)
up to 10 g (2)
ca. 7 kg (2)
1 - 1.5 kg (2)
Wingspan: 19 â?? 28 cm (2)
up to 53 kg (2)
Male head-to-body length: 57 â?? 62 cm (2)Female head-to-body length: 57 â?? 59 cm (2)Mail tail length: 82 - 88 cm (2)Female tail length: 84 â?? 86 cm (2)Male weight: 7.5 â?? 10.5 kg (2)Female weight: 6.2 â?? 9.2 kg (2)
Maximum snout-vent length: 92 cm (2)
Male length: c. 71 mm (2)Female length: c. 74 mm (2)
Plant size: 1 - 5 mm
2.5 kg (2)
Length: 15 â?? 18 mm (2)
2.5 - 4 kg (4)
Megophrys (1)
Male length: 20 â?? 30 mm (2)Female length: 30 â?? 40 mm (2)
Height: 90 â?? 180 cm (2)Leaf length: 7.5 - 15 cm (2)Flower length: up to 7.5 cm (2)
Montivipera (1)
Height: up to 25 metres (3)
Length: c. 15 cm (2) (3)
up to 315 kg (2)
Wingspan: 13 â?? 19 cm (2)
Length: up to 42 cm (2)
Height: 30 - 40 m (2)Diameter: 3 - 4 m (2)
Male length: 68 - 78 mm (2)Female length: 78-83 mm (2)Male length of abdomen: 50 - 60 mm (2)Female length of abdomen: 59 - 63 mm (2)Male hindwing: 41 - 46 mm (2)Female hindwing: 46 - 49 mm (2)
Length: 1.8 m (2)
142 â?? 179 g (2)
Stem height: 1 - 2 m (2)
Length: 16.1 cm (2)
150 - 320 kg (3) (4)
Male length: 18.5 cm (2)Female length: 18.3 cm (2)
Length: 1 - 1.5m (2)
115 g (2)
Height: c. 60 metres (2)Trunk diameter: up to 3 m (2)
c. 3 - 4 g (2)
c. 88 g (3)
Female snout-vent length: up to 34 mm (2)
Male length: 3.1 - 3.7 cm (2)Female length: 3.6 - 4 cm (2)
250 - 600g (2)
Male head-and-body length: 67Â - 71 cm (2)Female head-and-body length: 52Â - 64 cm (2)Male tail length: 60Â - 67 cm (2)Female tail length: 50Â - 54 cm (2)Male weight: c. 11.4 kg (2)Female weight: c. 6.4 kg (2)
Height: up to 60 m (2)
Male length: 22 â?? 25 mm (2)Female length: 25 â?? 30 mm (2)
Male length (excluding tail): 89 mm (2)Female length (excluding tail): 83 - 90.5 mm (2)
Maximum length: 8.1 cm (2)
ca. 1.3 kg (2)
Height: 45 - 60 cm (2)
Length: up to 25 cm (2)
up to 255.6 kg (2)
Length: up to 30 cm (2)Weight of male: 125 â?? 130 g (3)Weight of female: 170 â?? 180 g (3)
1.4 â?? 1.5 kg (3)
Diameter: up to 9.7 mm (2)
Male snout-vent length: 133 mm (2)Female snout-vent length: 120 mm (2)Total length: up to 300 mm (2)
Height: 10 - 15 m (1)
130 kg (2)
Male total length: up to 35.7 cm (2)Female total length: up to 33.4 cm (2)Male weight: up to 60 g (2)Female weight: up to 47g (2)
3 â?? 4.5 kg (2)
Carapace length: 26 cm (2)
Length at birth: 9 - 19 mm (2)Adult length: up to 75 mm (2)
1 - 8 kg (2)
Snout-vent length: up to 6 cm (2)
nan
up to 40 kg (2)
Length: up to 8.2 cm (2)
up to 1.25 kg (2)
Hieght: up to 20 m (2)
Stem height: 13 - 18 cm (2)Stem diameter: 12.5 - 24 cm (2)
Height: up to 10 m (2)
Height: 2 - 4 m (2)
Diameter: 2 - 5 cm (2)
Height: up to 30 m (2) (3)Trunk diameter: up to 0.6 m (2)
50 â?? 60 g (2)
Trunk length: up to 4.2 m (2)Trunk diameter: 35 - 40 cm (2)
3.5 kg (2)
Max height: 2-3 m (2)
Height: up to 5m (2)Pitcher height: 20 cm (2)
Male snout-vent length: 42 mm (2)Female snout-vent length: 52 mm (2)
Length: 10 - 16.5 cm (2)
Height: up to 15 m (2)
Female snout-vent length: 2 cm (2)
Length: 14 - 19 mm (2)
Length: up to at least 32.5 cm (2)
Height: up to 16 m (2) (3)Trunk diameter: up to 40 cm (3)Leaf length: up to 2 m (2) (3)
Total male length: c. 47.5 cm (2)Total female length: c. 32 cm (2)
Maximum length: 25 cm (2)
Head-body length: 49 - 59 cm (2) (3)Tail length: 72 - 84 cm (2) (3)
Male length of abdomen: 39 - 42 mm (2)Female length of abdomen: 37 - 39 mm (2)Male hind wing: 30 - 32 mm (2)Female hind wing: 32 - 36 mm (2)
30 - 34 g (2)
Maximum size: 10 cm (2)
700 â?? 1000 g (2)
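Several of the size strings above carry encoding damage: a range dash shows up as "â??" and a stray "Â" sits before some hyphens. A minimal sketch of a cleanup step that could run before any parsing follows. The helper name normalize_size is hypothetical, and the replacements assume the damage looks exactly as printed above. In this column "â??" always appears where a range dash belongs, which is why it is mapped to "-"; that mapping would be wrong for the prose columns, where the same debris stands for apostrophes and quotes.

```python
import re

def normalize_size(text):
    """Return a cleaned copy of one rtable_size string (hypothetical helper)."""
    if not isinstance(text, str):
        return text  # leave NaN / None untouched
    cleaned = text.replace("â??", "-")      # damaged range dash, as printed above
    cleaned = cleaned.replace("Â", "")      # stray character before some hyphens
    cleaned = cleaned.replace("\xa0", " ")  # non-breaking space, if one is present
    # collapse any run of spaces left behind by the replacements
    return re.sub(r" {2,}", " ", cleaned).strip()

print(normalize_size("700 â?? 900 g (2)"))
# 700 - 900 g (2)
print(normalize_size("Male head-and-body length: 67Â - 71 cm (2)"))
# Male head-and-body length: 67 - 71 cm (2)
```

Cleaning first keeps the later regular expressions simple, since they only need to handle a plain hyphen between the two ends of a range.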
In [35]:
# Figure out how to parse the weight out of the size column
u = t.rtable_size.unique()
lengths = [l for l in u if isinstance(l, str) and 'length' in l]
print('number of lengths:', len(lengths))
notlengths = [l for l in u if isinstance(l, str) and 'length' not in l]
weights = [l for l in u if isinstance(l, str) and (' g ' in l or ' kg ' in l)]
print('Number of weights', len(weights))
print()
print()
print()
for w in weights:
    match = None
    # find the last occurrence of "<number> g" or "<number> kg"
    # (\d+ stops at a decimal point, so "1.5 kg" is captured as "5 kg")
    for match in re.finditer(r"\d+ (k)?g", w):
        pass
    # keep only the digits
    m = float(re.sub(r'\D', "", match.group()))
    if 'k' in match.group():
        m *= 1000
    print(match.group())
    print(m)
    print()
for size in t.rtable_size.unique():
    print(size)
    print()
number of lengths: 30
Number of weights 33
900 g
900.0
3 g
3.0
5 kg
5000.0
4 kg
4000.0
10 g
10.0
10 g
10.0
7 kg
7000.0
5 kg
5000.0
53 kg
53000.0
2 kg
2000.0
5 kg
5000.0
4 kg
4000.0
315 kg
315000.0
179 g
179.0
320 kg
320000.0
115 g
115.0
4 g
4.0
88 g
88.0
4 kg
4000.0
3 kg
3000.0
6 kg
6000.0
180 g
180.0
5 kg
5000.0
130 kg
130000.0
60 g
60.0
5 kg
5000.0
8 kg
8000.0
40 kg
40000.0
25 kg
25000.0
60 g
60.0
5 kg
5000.0
34 g
34.0
1000 g
1000.0
Extended length: up to 90 mm (2)
Height: 3 - 7 m (2)Trunk diameter: c. 20 cm (2)
700 â?? 900 g (2)
1 â?? 3 g (3)
1 â?? 1.5 kg (2)
Height: up to 3 m (2)
Snout-vent length: up to 8 cm (2)
5.8 â?? 7.4 kg (2)
Length: 6 - 8 cm (2)
c. 10 g (2)
Snout-vent length: 51 - 68 mm (2) (3)Tail length: c. 51 mm (2)
Male head-body length: c. 42.7 cm (2)Female head-body length: c. 41.8 cm (2)Tail length: 30 - 50.7 cm (3)
Length: up to 7.6 cm (2)
Length: 6 mm (2).
Trunk diameter: up to 172 cm (2)
up to 10 g (2)
ca. 7 kg (2)
1 - 1.5 kg (2)
Wingspan: 19 â?? 28 cm (2)
up to 53 kg (2)
Male head-to-body length: 57 â?? 62 cm (2)Female head-to-body length: 57 â?? 59 cm (2)Mail tail length: 82 - 88 cm (2)Female tail length: 84 â?? 86 cm (2)Male weight: 7.5 â?? 10.5 kg (2)Female weight: 6.2 â?? 9.2 kg (2)
Maximum snout-vent length: 92 cm (2)
Male length: c. 71 mm (2)Female length: c. 74 mm (2)
Plant size: 1 - 5 mm
2.5 kg (2)
Length: 15 â?? 18 mm (2)
2.5 - 4 kg (4)
Megophrys (1)
Male length: 20 â?? 30 mm (2)Female length: 30 â?? 40 mm (2)
Height: 90 â?? 180 cm (2)Leaf length: 7.5 - 15 cm (2)Flower length: up to 7.5 cm (2)
Montivipera (1)
Height: up to 25 metres (3)
Length: c. 15 cm (2) (3)
up to 315 kg (2)
Wingspan: 13 â?? 19 cm (2)
Length: up to 42 cm (2)
Height: 30 - 40 m (2)Diameter: 3 - 4 m (2)
Male length: 68 - 78 mm (2)Female length: 78-83 mm (2)Male length of abdomen: 50 - 60 mm (2)Female length of abdomen: 59 - 63 mm (2)Male hindwing: 41 - 46 mm (2)Female hindwing: 46 - 49 mm (2)
Length: 1.8 m (2)
142 â?? 179 g (2)
Stem height: 1 - 2 m (2)
Length: 16.1 cm (2)
150 - 320 kg (3) (4)
Male length: 18.5 cm (2)Female length: 18.3 cm (2)
Length: 1 - 1.5m (2)
115 g (2)
Height: c. 60 metres (2)Trunk diameter: up to 3 m (2)
c. 3 - 4 g (2)
c. 88 g (3)
Female snout-vent length: up to 34 mm (2)
Male length: 3.1 - 3.7 cm (2)Female length: 3.6 - 4 cm (2)
250 - 600g (2)
Male head-and-body length: 67Â - 71 cm (2)Female head-and-body length: 52Â - 64 cm (2)Male tail length: 60Â - 67 cm (2)Female tail length: 50Â - 54 cm (2)Male weight: c. 11.4 kg (2)Female weight: c. 6.4 kg (2)
Height: up to 60 m (2)
Male length: 22 â?? 25 mm (2)Female length: 25 â?? 30 mm (2)
Male length (excluding tail): 89 mm (2)Female length (excluding tail): 83 - 90.5 mm (2)
Maximum length: 8.1 cm (2)
ca. 1.3 kg (2)
Height: 45 - 60 cm (2)
Length: up to 25 cm (2)
up to 255.6 kg (2)
Length: up to 30 cm (2)Weight of male: 125 â?? 130 g (3)Weight of female: 170 â?? 180 g (3)
1.4 â?? 1.5 kg (3)
Diameter: up to 9.7 mm (2)
Male snout-vent length: 133 mm (2)Female snout-vent length: 120 mm (2)Total length: up to 300 mm (2)
Height: 10 - 15 m (1)
130 kg (2)
Male total length: up to 35.7 cm (2)Female total length: up to 33.4 cm (2)Male weight: up to 60 g (2)Female weight: up to 47g (2)
3 â?? 4.5 kg (2)
Carapace length: 26 cm (2)
Length at birth: 9 - 19 mm (2)Adult length: up to 75 mm (2)
1 - 8 kg (2)
Snout-vent length: up to 6 cm (2)
nan
up to 40 kg (2)
Length: up to 8.2 cm (2)
up to 1.25 kg (2)
Hieght: up to 20 m (2)
Stem height: 13 - 18 cm (2)Stem diameter: 12.5 - 24 cm (2)
Height: up to 10 m (2)
Height: 2 - 4 m (2)
Diameter: 2 - 5 cm (2)
Height: up to 30 m (2) (3)Trunk diameter: up to 0.6 m (2)
50 â?? 60 g (2)
Trunk length: up to 4.2 m (2)Trunk diameter: 35 - 40 cm (2)
3.5 kg (2)
Max height: 2-3 m (2)
Height: up to 5m (2)Pitcher height: 20 cm (2)
Male snout-vent length: 42 mm (2)Female snout-vent length: 52 mm (2)
Length: 10 - 16.5 cm (2)
Height: up to 15 m (2)
Female snout-vent length: 2 cm (2)
Length: 14 - 19 mm (2)
Length: up to at least 32.5 cm (2)
Height: up to 16 m (2) (3)Trunk diameter: up to 40 cm (3)Leaf length: up to 2 m (2) (3)
Total male length: c. 47.5 cm (2)Total female length: c. 32 cm (2)
Maximum length: 25 cm (2)
Head-body length: 49 - 59 cm (2) (3)Tail length: 72 - 84 cm (2) (3)
Male length of abdomen: 39 - 42 mm (2)Female length of abdomen: 37 - 39 mm (2)Male hind wing: 30 - 32 mm (2)Female hind wing: 32 - 36 mm (2)
30 - 34 g (2)
Maximum size: 10 cm (2)
700 â?? 1000 g (2)
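The weights printed above show two limits of the exploratory pattern: it stops at a decimal point ("1.5 kg" is read as "5 kg"), and the ' g ' / ' kg ' filter skips strings such as "250 - 600g (2)" that have no space before the unit. The following is a sketch of a more tolerant parser, not the notebook's method. The names WEIGHT_RE and parse_weight_grams are hypothetical, and it returns the largest weight in the string rather than the last one, which is a deliberate change of rule.

```python
import re

# A number with an optional decimal part, optional whitespace,
# then "kg" or "g" as a whole word.
WEIGHT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(kg|g)\b")

def parse_weight_grams(size):
    """Largest weight mentioned in a size string, in grams, or None."""
    if not isinstance(size, str):
        return None  # NaN or missing value
    grams = [
        float(number) * (1000.0 if unit == "kg" else 1.0)
        for number, unit in WEIGHT_RE.findall(size)
    ]
    return max(grams) if grams else None

print(parse_weight_grams("1 â?? 1.5 kg (2)"))     # 1500.0
print(parse_weight_grams("250 - 600g (2)"))       # 600.0
print(parse_weight_grams("Length: 6 - 8 cm (2)")) # None
```

Because the function returns None for anything without a weight, it can be applied to every row without a separate filter.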
In [11]:
# add weight column
weightColumn = []
# parse one weight (in grams) per row from rtable_size
for index, row in t.iterrows():
    size = row['rtable_size']
    # if this is a weight
    if isinstance(size, str) and (' g ' in size or ' kg ' in size):
        match = None
        # find the last occurrence of "<digit> g" or "<digit> kg"
        # NOTE: \d matches a single digit, so "900 g" is captured as "0 g"
        for match in re.finditer(r"\d (k)?g", size):
            pass
        # keep only the digit
        m = float(re.sub(r'\D', "", match.group()))
        if 'k' in match.group():
            m *= 1000
        weightColumn.append(m)
    else:
        # missing or non-weight sizes get None so the column stays aligned with t
        weightColumn.append(None)
t['weight'] = weightColumn
t.head()
Out[11]:
_id
ltable_name
ltable_family
ltable_ecology
ltable_countries
ltable_threat_paragraph
ltable_conservation_paragraph
ltable_pop_trend
ltable_status
ltable_country_count
...
rtable_threats
rtable_conservation
rtable_threat_keywords
rtable_conservation_keywords
rtable_status
rtable_countries
rtable_country_count
rtable_tCount
label
weight
7
1522478
caterpillar slug
Veronicellidae
NaN
South Africa (KwaZulu-Natal);
NaN
NaN
Unknown
Endangered
1
...
The caterpillar slug is threatened by habitat loss and degradation as a result of ongoing urbani...
Although there are currently no conservation measures directly targeting the caterpillar slug in...
loss;environment;
Endangered
['India', 'Russia', 'Malaysia', 'China', 'Indonesia']
5
2
1.0
NaN
17
8689373
catalina mahogany
Rosaceae
NaN
United States (California);
NaN
NaN
NaN
Critically Endangered
1
...
Historically a major threat to the Catalina mahogany was the introduction of <strong>herbivores<...
Conservation efforts began in the 1970s with a detailed inventory of the remaining Catalina maho...
loss;invasive;
Critically Endangered
['Ukraine', 'Morocco', 'Russia', 'Hungary']
4
2
1.0
NaN
21
7512846
lorenz von liburnaus woolly lemur, western avahi, western woolly lemur
Indriidae
Terrestrial
Madagascar;
\r\r\r\r\r\n The major threat is forest destruction due to annual burning that creates new ca...
['\n This species is listed on Appendix I of CITES. ', <span lang="EN-CA">This species is kno...
Decreasing
Endangered
1
...
15 species of lemur have become extinct since sea-faring humans arrived on Madagascarâ??s shore...
The western woolly lemur is confirmed in only two protected areas Ankarafantsika Nature Reserve ...
hunting;
protected;
Critically Endangered
['Australia']
1
1
1.0
0.0
23
1191527
bluelegged mantella, tular golden frog, tular mantella, tulear golden frog
Mantellidae
Terrestrial; Freshwater
Madagascar;
\r\r\r\r\r\n The main threat to this species is habitat loss due to grazing and fire, and in ...
['\n It occurs in Parque Nacional de Isalo. Trade in this species needs to be very carefully ...
Decreasing
Endangered
1
...
Several thousand blue-legged mantellas are thought to be collected every year from some regions ...
Listing on Appendix II of the Convention on International Trade in Endangered Species provides t...
loss;
Endangered
['Taiwan', 'China', 'Vietnam']
3
1
1.0
3.0
25
4646125
malagasy giant jumping rat, malagasy giant rat
Nesomyidae
Terrestrial
Madagascar;
\r\r\r\r\r\n The historical decline of this species has been partly through climatic change l...
['\n The new Menabe-Antimena protected area has temporary protection order and covers the ent...
Decreasing
Endangered
1
...
Like many of Madagascarâ??s unique species the Malagasy giant rat is thought to have become hig...
This large rodent is in urgent need of conservation and its future remains highly uncertain. The...
loss;pet;
captive breeding;protected;
Endangered
['Canada']
1
2
1.0
5000.0
5 rows × 29 columns
In [50]:
# Figure out how to parse length
def strToMeters(srcString):
    # strip everything except digits and the decimal point, then scale by unit
    n = float(re.sub(r'[^0-9.]', "", srcString))
    if 'cm' in srcString:
        n /= 100
    if 'mm' in srcString:
        n /= 1000
    return n

u = t.rtable_size.unique()
lengths = [l for l in u if isinstance(l, str) and 'length' in l]
print('number of lengths:', len(lengths))
out = ''
numMatches = 0
for l in lengths:
    out += '\n'
    out += l + '\n'
    match = None
    # find the largest "<number> cm", "<number> mm" or "<number> m"
    maxLen = 0
    maxStr = ""
    # NOTE: the "." is unescaped, so it also matches "-" and "78-83 mm" is read as 7.883
    for match in re.finditer(r"([0-9]+.)?[0-9]+ (c)?(m+)", l, re.I):
        tmp = strToMeters(match.group())
        if maxLen < tmp:
            maxLen = tmp
            maxStr = match.group()
    # skip strings with no measurement
    if not match:
        continue
    numMatches += 1
    out += maxStr + '\n' + str(maxLen) + '\n'
print('number of matches:', numMatches)
print()
print()
print()
print(out)
number of lengths: 30
number of matches: 30
Extended length: up to 90 mm (2)
90 mm
0.09
Snout-vent length: up to 8 cm (2)
8 cm
0.08
Snout-vent length: 51 - 68 mm (2) (3)Tail length: c. 51 mm (2)
68 mm
0.068
Male head-body length: c. 42.7 cm (2)Female head-body length: c. 41.8 cm (2)Tail length: 30 - 50.7 cm (3)
50.7 cm
0.507
Male head-to-body length: 57 â?? 62 cm (2)Female head-to-body length: 57 â?? 59 cm (2)Mail tail length: 82 - 88 cm (2)Female tail length: 84 â?? 86 cm (2)Male weight: 7.5 â?? 10.5 kg (2)Female weight: 6.2 â?? 9.2 kg (2)
88 cm
0.88
Maximum snout-vent length: 92 cm (2)
92 cm
0.92
Male length: c. 71 mm (2)Female length: c. 74 mm (2)
74 mm
0.074
Male length: 20 â?? 30 mm (2)Female length: 30 â?? 40 mm (2)
40 mm
0.04
Height: 90 â?? 180 cm (2)Leaf length: 7.5 - 15 cm (2)Flower length: up to 7.5 cm (2)
180 cm
1.8
Male length: 68 - 78 mm (2)Female length: 78-83 mm (2)Male length of abdomen: 50 - 60 mm (2)Female length of abdomen: 59 - 63 mm (2)Male hindwing: 41 - 46 mm (2)Female hindwing: 46 - 49 mm (2)
78-83 mm
7.883
Male length: 18.5 cm (2)Female length: 18.3 cm (2)
18.5 cm
0.185
Female snout-vent length: up to 34 mm (2)
34 mm
0.034
Male length: 3.1 - 3.7 cm (2)Female length: 3.6 - 4 cm (2)
4 cm
0.04
Male head-and-body length: 67Â - 71 cm (2)Female head-and-body length: 52Â - 64 cm (2)Male tail length: 60Â - 67 cm (2)Female tail length: 50Â - 54 cm (2)Male weight: c. 11.4 kg (2)Female weight: c. 6.4 kg (2)
71 cm
0.71
Male length: 22 â?? 25 mm (2)Female length: 25 â?? 30 mm (2)
30 mm
0.03
Male length (excluding tail): 89 mm (2)Female length (excluding tail): 83 - 90.5 mm (2)
90.5 mm
0.0905
Maximum length: 8.1 cm (2)
8.1 cm
0.081
Male snout-vent length: 133 mm (2)Female snout-vent length: 120 mm (2)Total length: up to 300 mm (2)
300 mm
0.3
Male total length: up to 35.7 cm (2)Female total length: up to 33.4 cm (2)Male weight: up to 60 g (2)Female weight: up to 47g (2)
35.7 cm
0.35700000000000004
Carapace length: 26 cm (2)
26 cm
0.26
Length at birth: 9 - 19 mm (2)Adult length: up to 75 mm (2)
75 mm
0.075
Snout-vent length: up to 6 cm (2)
6 cm
0.06
Trunk length: up to 4.2 m (2)Trunk diameter: 35 - 40 cm (2)
4.2 m
4.2
Male snout-vent length: 42 mm (2)Female snout-vent length: 52 mm (2)
52 mm
0.052
Female snout-vent length: 2 cm (2)
2 cm
0.02
Height: up to 16 m (2) (3)Trunk diameter: up to 40 cm (3)Leaf length: up to 2 m (2) (3)
16 m
16.0
Total male length: c. 47.5 cm (2)Total female length: c. 32 cm (2)
47.5 cm
0.475
Maximum length: 25 cm (2)
25 cm
0.25
Head-body length: 49 - 59 cm (2) (3)Tail length: 72 - 84 cm (2) (3)
84 cm
0.84
Male length of abdomen: 39 - 42 mm (2)Female length of abdomen: 37 - 39 mm (2)Male hind wing: 30 - 32 mm (2)Female hind wing: 32 - 36 mm (2)
42 mm
0.042
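One entry in the output above stands out: "78-83 mm" is reported as 7.883 m, because the unescaped "." in the pattern matches the hyphen and the two numbers are then fused into one. The following is a sketch of a stricter parser, assuming the units seen in this column are mm, cm, m and the spelled-out "metres". LENGTH_RE, DIVISOR and parse_length_meters are hypothetical names, not part of the notebook.

```python
import re

# A number with an optional decimal part, optional whitespace, then a unit.
LENGTH_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm|metres|m)\b")
# How many of each unit make up one metre.
DIVISOR = {"mm": 1000.0, "cm": 100.0, "m": 1.0, "metres": 1.0}

def parse_length_meters(size):
    """Largest measurement in a size string, in metres, or None."""
    if not isinstance(size, str):
        return None  # NaN or missing value
    meters = [
        float(number) / DIVISOR[unit]
        for number, unit in LENGTH_RE.findall(size)
    ]
    return max(meters) if meters else None

print(parse_length_meters("Male length: 68 - 78 mm (2)Female length: 78-83 mm (2)"))
# 0.083
print(parse_length_meters("Length: 1 - 1.5m (2)"))
# 1.5
```

Like the cell above, this takes the largest measurement of any kind, so a string that also lists a height or diameter can still return that value instead of a length.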
In [51]:
def strToMeters(srcString):
    n = float(re.sub(r'[^0-9.]', "", srcString))
    if 'cm' in srcString:
        n /= 100
    if 'mm' in srcString:
        n /= 1000
    return n

# add length column
lengthColumn = []
# parse the largest length (in meters) per row from rtable_size
for index, row in t.iterrows():
    size = row['rtable_size']
    # if this is a length
    if isinstance(size, str) and 'length' in size:
        maxLen = 0
        # find the largest "<number> mm", "<number> cm" or "<number> m";
        # the dot is escaped so that "78-83 mm" is not read as one number
        for match in re.finditer(r"([0-9]+\.)?[0-9]+ (c)?(m+)",
                                 size, re.I):
            tmp = strToMeters(match.group())
            if maxLen < tmp:
                maxLen = tmp
        lengthColumn.append(maxLen)
    else:
        # missing or non-length sizes get None so the column stays aligned with t
        lengthColumn.append(None)
t['length'] = lengthColumn
t.head()
Out[51]:
_id
ltable_name
ltable_family
ltable_ecology
ltable_countries
ltable_threat_paragraph
ltable_conservation_paragraph
ltable_pop_trend
ltable_status
ltable_country_count
...
rtable_conservation
rtable_threat_keywords
rtable_conservation_keywords
rtable_status
rtable_countries
rtable_country_count
rtable_tCount
label
weight
length
7
1522478
caterpillar slug
Veronicellidae
NaN
South Africa (KwaZulu-Natal);
NaN
NaN
Unknown
Endangered
1
...
Although there are currently no conservation measures directly targeting the caterpillar slug in...
loss;environment;
Endangered
['India', 'Russia', 'Malaysia', 'China', 'Indonesia']
5
2
1.0
NaN
0.09
17
8689373
catalina mahogany
Rosaceae
NaN
United States (California);
NaN
NaN
NaN
Critically Endangered
1
...
Conservation efforts began in the 1970s with a detailed inventory of the remaining Catalina maho...
loss;invasive;
Critically Endangered
['Ukraine', 'Morocco', 'Russia', 'Hungary']
4
2
1.0
NaN
NaN
21
7512846
lorenz von liburnaus woolly lemur, western avahi, western woolly lemur
Indriidae
Terrestrial
Madagascar;
\r\r\r\r\r\n The major threat is forest destruction due to annual burning that creates new ca...
['\n This species is listed on Appendix I of CITES. ', <span lang="EN-CA">This species is kno...
Decreasing
Endangered
1
...
The western woolly lemur is confirmed in only two protected areas Ankarafantsika Nature Reserve ...
hunting;
protected;
Critically Endangered
['Australia']
1
1
1.0
0.0
NaN
23
1191527
bluelegged mantella, tular golden frog, tular mantella, tulear golden frog
Mantellidae
Terrestrial; Freshwater
Madagascar;
\r\r\r\r\r\n The main threat to this species is habitat loss due to grazing and fire, and in ...
['\n It occurs in Parque Nacional de Isalo. Trade in this species needs to be very carefully ...
Decreasing
Endangered
1
...
Listing on Appendix II of the Convention on International Trade in Endangered Species provides t...
loss;
Endangered
['Taiwan', 'China', 'Vietnam']
3
1
1.0
3.0
NaN
25
4646125
malagasy giant jumping rat, malagasy giant rat
Nesomyidae
Terrestrial
Madagascar;
\r\r\r\r\r\n The historical decline of this species has been partly through climatic change l...
['\n The new Menabe-Antimena protected area has temporary protection order and covers the ent...
Decreasing
Endangered
1
...
This large rodent is in urgent need of conservation and its future remains highly uncertain. The...
loss;pet;
captive breeding;protected;
Endangered
['Canada']
1
2
1.0
5000.0
NaN
5 rows × 30 columns
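The length column takes the largest measurement in any string that mentions "length", so an entry such as "Height: 90 - 180 cm ... Leaf length: 7.5 - 15 cm" ends up with the height (1.8 m). The following is a sketch of one way to restrict the search to fields whose label contains "length". It assumes each field has the form "Label: value (ref)" and that a new field starts right after a closing parenthesis followed by a capital letter, as in the strings printed earlier. All names here are hypothetical.

```python
import re

LENGTH_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm|metres|m)\b")
DIVISOR = {"mm": 1000.0, "cm": 100.0, "m": 1.0, "metres": 1.0}

def split_fields(size):
    """Split 'Label: value (ref)Label: value (ref)' into (label, value) pairs."""
    # Insert a line break wherever ")" is directly followed by a capital letter.
    parts = re.sub(r"\)(?=[A-Z])", ")\n", size).split("\n")
    fields = []
    for part in parts:
        label, sep, value = part.partition(":")
        if sep:  # skip unlabeled strings such as "2.5 kg (2)"
            fields.append((label.strip(), value.strip()))
    return fields

def longest_length_meters(size):
    """Largest value among fields labeled as a length, in metres, or None."""
    if not isinstance(size, str):
        return None
    meters = [
        float(number) / DIVISOR[unit]
        for label, value in split_fields(size)
        if "length" in label.lower()
        for number, unit in LENGTH_RE.findall(value)
    ]
    return max(meters) if meters else None

example = "Height: 90 - 180 cm (2)Leaf length: 7.5 - 15 cm (2)Flower length: up to 7.5 cm (2)"
print(split_fields(example))
print(longest_length_meters(example))  # 0.15
```

Whether a leaf or tail length is the right value to compare across species is a separate modelling question; the sketch only shows how to keep heights and diameters out of the column.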
In [52]:
t.to_csv('labeled.csv')
In [ ]:
Content source: andrewedstrom/cs638project